Understanding narrative character roles from the ground up

When we read the news, we meet characters just like we do in novels. There are heroes and villains, victims and visionaries, public servants and profiteers. These portrayals shape how we understand events and decide whom to trust or blame. Yet most computational methods for analyzing media narratives still rely on fixed sets of roles like “hero,” “villain,” or “victim.”

In our new paper at EMNLP 2025, we introduce a different approach: Taxonomy-Free Character Role Labeling (TF-CRL). Instead of slotting people into preset categories, TF-CRL uses large language models (LLMs) to generate open-ended, compositional role labels: phrases such as Resilient Leader, Scapegoated Visionary, or Righteous Protester. These labels capture not only what a person does in a story, but how they do it.

How it works

Once the set of characters for a given article has been identified, we use large language models to generate a descriptive role label for each entity. Rather than selecting from a predefined taxonomy of roles, the models produce short, open-ended noun phrases that capture a character's narrative function as depicted by the author. To guide the model's outputs, we ask for a primary label (leader, protester, ally) combined with a modifier (violent, innocent, strategic, etc.). The resulting expressions provide a fluid array of higher-level roles and modifiers.
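A minimal sketch of how such a labeling prompt might be assembled. The template wording and function name are our own illustration, not the paper's exact prompt:

```python
# Sketch of a prompt builder for compositional role labels.
# The template and function name are illustrative assumptions,
# not the exact prompt used in the paper.

def build_role_prompt(article_text: str, character: str) -> str:
    """Assemble an instruction asking an LLM for a 'modifier + primary label' role."""
    return (
        "Read the news article below and describe the narrative role of "
        f"the character '{character}' as the author portrays them.\n"
        "Answer with a short noun phrase of the form '<modifier> <primary label>', "
        "e.g. 'Resilient Leader' or 'Righteous Protester'.\n\n"
        f"Article:\n{article_text}\n\nRole label:"
    )

prompt = build_role_prompt(
    "Protesters gathered peacefully downtown...", "the protesters"
)
```

The key design point is that the prompt constrains the *shape* of the answer (modifier plus primary label) without constraining its vocabulary, which is what keeps the label space open-ended.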

How we evaluate our models

At the core of our evaluation lies what we call the Goldilocks problem of narrative labeling: determining the level of abstraction that is “just right.” A label that is too broad risks flattening the distinctiveness of a character’s portrayal, while one that is too narrow loses its capacity for comparison across narratives. To navigate this tension, we assess role labels through four interrelated criteria. Generalizability asks whether a label can extend beyond a single instance—whether a term like Righteous Protester describes a recurring type rather than a one-off characterization. Informativeness considers the degree of nuance and specificity a label provides—whether it adds meaningful detail to our understanding of the character. Relevance evaluates whether the label isolates the most thematically central aspect of the portrayal, and Faithfulness determines whether it accurately reflects the text itself. Taken together, these criteria aim to identify labels that are neither overfitted to a particular context nor so generic as to lose analytical value.
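As a toy illustration of how judgments along these four criteria could be combined into a single label score, here is a small sketch; the 1-to-5 rating scale and the equal-weight mean are our assumptions, not the paper's aggregation scheme:

```python
# Sketch: aggregating a judge's ratings along the four criteria.
# The 1-5 scale and equal-weight mean are illustrative assumptions.
from statistics import mean

CRITERIA = ["generalizability", "informativeness", "relevance", "faithfulness"]

def label_score(ratings: dict) -> float:
    """Average a judge's 1-5 ratings across the four criteria."""
    return mean(ratings[c] for c in CRITERIA)

score = label_score({"generalizability": 4, "informativeness": 5,
                     "relevance": 4, "faithfulness": 5})
# score == 4.5
```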

Using crowd-based human judges, we found that participants preferred LLM-generated character role labels over human-written labels quite decisively. This suggests that LLMs are reliable annotators for this kind of narrative synthesis and generalization.

What can it do?

We’re excited about this positive evaluation because it means our narrative understanding can expand well beyond traditional hero-villain-victim frameworks. First, our method achieves greater character role resolution. For example, both Putin and Zelensky are “leaders,” but with very different modifiers depending on the news source. Character role entropy also lets us observe how unified a persona’s identity is: we can see how uniform Zelensky’s roles are versus Putin, whose roles carry a range of negative framing around the conflict.
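Role entropy here can be read as ordinary Shannon entropy over the distribution of labels a character receives. A minimal sketch, with invented example counts (the specific labels below are illustrative, not the paper's data):

```python
# Sketch: Shannon entropy over a character's role-label distribution,
# one way to quantify how unified a persona's framing is.
# The example labels and counts are invented for illustration.
from collections import Counter
from math import log2

def role_entropy(labels: list[str]) -> float:
    """Entropy in bits of the empirical distribution over role labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

# One consistent framing -> entropy 0.0
uniform = role_entropy(["Resilient Leader"] * 8)

# Contested framing across sources -> higher entropy (here 1.5 bits)
mixed = role_entropy(["Aggressive Leader"] * 4 +
                     ["Isolated Autocrat"] * 2 +
                     ["Strategic Villain"] * 2)
```

A character labeled the same way everywhere scores zero, while a character whose framing varies across outlets scores higher, matching the Zelensky-versus-Putin contrast above.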

Our method also enables novel role discovery. Instead of being limited to existing taxonomies, we can discover important narrative functions of recurring character roles. Finally, we can observe cross-topic overlap, i.e., the extent to which certain roles recur across different topic areas. For example, we show that stories about climate and immigration share many of the same roles, such as victim and legal ally, offering another way that LLM-assisted role labeling can give us a more precise understanding of collective narratives.
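One simple way to quantify this cross-topic overlap is Jaccard similarity between the sets of roles observed in each topic. A sketch with toy role sets (these sets are our illustration, not the paper's measurements):

```python
# Sketch: cross-topic role overlap via Jaccard similarity of role sets.
# The role sets below are toy examples, not the paper's data.
def role_overlap(roles_a: set[str], roles_b: set[str]) -> float:
    """Jaccard similarity: shared roles over all distinct roles."""
    return len(roles_a & roles_b) / len(roles_a | roles_b)

climate = {"Innocent Victim", "Legal Ally", "Corporate Villain"}
immigration = {"Innocent Victim", "Legal Ally", "Hostile Enforcer"}
overlap = role_overlap(climate, immigration)  # 2 shared of 4 distinct -> 0.5
```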