LLMs for Understanding Narrative Discourse

In our new paper for the Workshop on Narrative Understanding, we test the affordances of large language models for the analysis of narrative discourse, understood here as the three key linking functions between the primary nodes in Genette’s classic narratological triangle.

Considerable work in NLP has focused on understanding the two original (lower) nodes of Genette’s triangle. For the task of story understanding (i.e. the lower left node), work has focused on key areas such as the detection of character types (Stammbach et al., 2022; Bamman et al., 2014), event types (Parekh et al., 2023; Chambers and Jurafsky, 2009), and story lines (Caselli et al., 2015). Similarly, narrative structure (i.e. the lower right node, traditionally referred to as discourse) has been amply addressed through concepts such as plot arcs (Reagan et al., 2016; Fudolig et al., 2023), turning points (Ouyang and McKeown, 2015), and non-linearity (Piper and Toubia, 2023).

Our paper aims to develop a preliminary prompting framework to capture the broader concept of narrative discourse, which Genette identified as the interactions between the three key story poles. We translate Genette’s linguistic terminology (tense, mood, voice) into the more colloquial terms Time, Setting, and Perspective, and we develop multiple natural language prompts to capture each dimension. The table below illustrates our overall approach.

We then manually annotated 188 passages drawn from the NarraDetect dataset using a 3-point Likert scale (2=strongly present, 1=weakly present, 0=not present). We compare these student annotations to the answers of language models of different sizes. The table below provides the accuracy of our models under two conditions: agreement with the majority vote of the annotators, and agreement with at least one annotator. Given the subjectivity of the task, we found that a match with at least one annotator almost always resulted in a reasonable answer. As we can see in the table, both GPT-4 and our fine-tuned Llama3 model perform very well overall on this task.
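To make the two scoring conditions concrete, here is a minimal Python sketch of how they could be computed. It is an illustration under stated assumptions, not the evaluation code from the paper: the function names, the input layout (one list of annotator labels per passage, one model label per passage), and the choice to skip passages whose top vote is tied when scoring against the majority are all our assumptions.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def majority_label(labels: Sequence[int]) -> Optional[int]:
    """Return the most common label, or None when the top count is tied.

    Assumes `labels` is non-empty (each passage has at least one annotation).
    """
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # assumption: tied passages have no majority label
    return counts[0][0]


def accuracy_scores(
    annotations: Sequence[Sequence[int]], predictions: Sequence[int]
) -> Dict[str, float]:
    """Score model labels against annotator labels on the 0/1/2 scale."""
    majority_hits = majority_total = any_hits = 0
    for labels, pred in zip(annotations, predictions):
        gold = majority_label(labels)
        if gold is not None:
            majority_total += 1
            majority_hits += int(pred == gold)
        # Lenient condition: the model matches at least one annotator.
        any_hits += int(pred in labels)
    n = len(predictions)
    return {
        "majority": majority_hits / majority_total if majority_total else float("nan"),
        "at_least_one": any_hits / n if n else float("nan"),
    }
```

With three annotators on a 3-point scale, a passage labeled 0, 1, and 2 has no majority, which is why the sketch treats ties explicitly. The lenient score can never be lower than the majority score on the same passages, since a majority label always belongs to at least one annotator.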

Finally, we examined the features that most strongly helped predict whether a passage was labeled as narrative or not. We found strong evidence supporting the “deictic theory” of narrative – the idea that stories serve to focus our attention on specific human experiences happening at a distance in time and space. Our models consistently identified features that support this theory, such as:

  • Strong emphasis on character perspectives and experiences
  • Frequent use of concrete, sensory details to build distant worlds
  • Predominant use of past tense to create temporal distance

We also found that our classification experiment did not result in a strong clustering of any one of our higher-level classes (POV, setting, time). Rather, one of the distinguishing features of narrative communication appears to be a reliance on multiple dimensions of discourse (i.e. an intermixing of all three of Genette’s linking functions). We observe, for example, that just under 90% of all narrative passages use at least one feature from each of our three classes, while non-narrative passages do so just 25% of the time.
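The coverage statistic above can be sketched in a few lines of Python. This is a hedged illustration rather than our actual analysis code: the class keys, the feature names in the usage example, and the rule that a feature counts as used when its Likert score is above 0 are assumptions made for the example.

```python
from typing import Mapping, Sequence

# Hypothetical keys for the three higher-level classes.
CLASSES = ("pov", "setting", "time")

# One passage: class name -> {feature name -> Likert score (0, 1, or 2)}.
Passage = Mapping[str, Mapping[str, int]]


def uses_all_classes(passage: Passage) -> bool:
    """True if the passage has at least one present feature in every class."""
    return all(
        any(score > 0 for score in passage.get(cls, {}).values())
        for cls in CLASSES
    )


def coverage_rate(passages: Sequence[Passage]) -> float:
    """Share of passages that draw on all three classes (0.0 if empty)."""
    if not passages:
        return 0.0
    return sum(uses_all_classes(p) for p in passages) / len(passages)
```

Running `coverage_rate` separately on the narrative and the non-narrative passages would give the two percentages being compared.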

In sum, we found that LLMs like GPT can reasonably approximate human judgments on a range of narrative comprehension tasks and provide an exciting new resource for deeper narrative understanding.