How Do We Know When People Are Telling Stories? Meet NarraDetect

Narrative detection is an important means of studying storytelling at scale. Among the piles of documents out in the world, which ones are stories? Answering that question can help us better understand why and when people tell stories.

Over the past few years, we’ve been developing a dataset to support this task, and we’ve just released it as part of the Workshop on Narrative Understanding.

The Challenge: Stories Aren’t All the Same

Narrative detection—figuring out whether a piece of text is a story—is surprisingly hard. Why? Three big reasons:

  1. Stories are social creatures. They show up in wildly different places: Reddit threads, fairy tales, autobiographies, even legal decisions. Each context shapes how a story gets told.
  2. Stories shift inside themselves. A novel might start with a gripping scene but shift into philosophical reflection. Not every part of a story is equally “story-like.”
  3. No one agrees on what a story really is. Scholars debate the essential ingredients of narrative—is it characters? Events? Time? A beginning, middle, and end?

Our Solution: NarraDetect

To tackle this, we built NarraDetect, a new dataset that takes a two-pronged approach:

  • A large corpus of over 13,000 text passages from 18 narrative and non-narrative genres. This includes everything from flash fiction to U.S. Supreme Court decisions.
  • A manually annotated corpus of nearly 400 passages rated for how narrative they are: not just whether they are stories, but how much they feel like one.

We broke narrativity down into three dimensions:

  • Agency – Are we hearing about characters doing things?
  • Event Sequencing – Do events unfold over time?
  • World Building – Can we imagine the world the story is set in?

Each passage was rated on a 5-point scale by trained annotators. Some texts scored high (vivid scenes with clear action), others low (abstract arguments or definitions), and many in between.
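To make the annotation scheme concrete, here’s a minimal sketch of what one annotated passage might look like as a Python record. The field names and the simple averaging are our own illustration, not the dataset’s actual schema or aggregation method.

```python
from dataclasses import dataclass

@dataclass
class AnnotatedPassage:
    """One passage with 1-5 ratings on the three narrativity dimensions.

    Field names are illustrative only; consult the released dataset
    for its actual columns and format.
    """
    text: str
    genre: str
    agency: int            # Are characters doing things?
    event_sequencing: int  # Do events unfold over time?
    world_building: int    # Can we picture the story world?

    @property
    def narrativity(self) -> float:
        # One simple way to collapse the three dimensions into a single
        # scalar score (an assumption, not necessarily how we aggregate).
        return (self.agency + self.event_sequencing + self.world_building) / 3


passage = AnnotatedPassage(
    text="She grabbed the lantern and ran for the shore...",
    genre="flash fiction",
    agency=5, event_sequencing=4, world_building=5,
)
print(passage.narrativity)  # ~4.67 -> reads as highly story-like
```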

The Models: Who’s Better at Spotting Stories?

We ran both traditional supervised machine learning models and state-of-the-art large language models (LLMs) like GPT-4 and LLaMA on the task.

Here’s what we found:

  • Supervised models (e.g., an SVM over BERT embeddings) performed best within the dataset they were trained on, but struggled to generalize to other domains (see the sketch just after this list).
  • LLMs, while not always the top performers, showed a surprising ability to generalize and to align with human judgments of narrativity, especially for scalar judgments (a prompt sketch follows the summary below).
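To make the supervised baseline concrete, here’s a minimal sketch of an SVM trained on BERT-style sentence embeddings. The encoder name and toy data are placeholders, not our exact training setup.

```python
# Minimal sketch of the "SVM + BERT embeddings" style of baseline.
# Assumes sentence-transformers and scikit-learn are installed.
from sentence_transformers import SentenceTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any BERT-style encoder works

train_texts = [
    "She grabbed the lantern and ran for the shore.",            # story-like
    "The court holds that the statute applies retroactively.",   # not story-like
]
train_labels = [1, 0]  # 1 = narrative, 0 = non-narrative

X_train = encoder.encode(train_texts)                  # passage embeddings
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, train_labels)

X_test = encoder.encode(["Once upon a time, a fox found a key in the snow."])
print(clf.predict(X_test))  # -> [1], i.e. predicted narrative
```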

In short: models trained on narrow definitions of narrative can do well in those contexts, but large models understand the feel of a story across contexts.
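And here is roughly what a scalar narrativity judgment from an LLM looks like: ask the model to rate a passage from 1 to 5. The prompt wording and the API call are illustrative assumptions, not our exact evaluation protocol.

```python
# Rough sketch of prompting an LLM for a scalar narrativity judgment.
# The prompt wording and model choice are assumptions, not our exact protocol.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPT = (
    "On a scale of 1 (not at all) to 5 (very much), how much does the "
    "following passage read like a story? Consider agency, event "
    "sequencing, and world building. Answer with a single number.\n\n"
    "Passage: {passage}"
)

def rate_narrativity(passage: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(passage=passage)}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())

print(rate_narrativity("She grabbed the lantern and ran for the shore."))
```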

Why This Matters

Understanding when and how stories appear in our textual worlds is crucial for everything from misinformation detection to literary analysis. But first, we need a way to identify stories at scale—and in a way that respects their complexity.

NarraDetect is a step toward that goal. It provides both breadth (many genres) and depth (fine-grained annotations), and we’ve made it openly available to support further research.

📦 Explore the dataset here: https://doi.org/10.5683/SP3/HEEEKN