Using Citizen Science to study literary social networks
Understanding how characters interact in narratives is at the heart of studying stories. These interactions are not only crucial to plot structure but also provide insights into the social fabric of narratives, enhancing our understanding of how stories model human relationships. To delve deeper into these dynamics, we turned to an innovative combination of citizen science and computational methods. The full paper is available in the NLP4DH proceedings.
Citizen Science Meets Computational Literary Analysis
Citizen science involves engaging the general public in scientific research, and we’ve found it has tremendous potential to support computational literary studies. By mobilizing volunteers to annotate character interactions, we gathered a high-quality dataset of 13,395 labeled interactions from contemporary fiction and non-fiction books. This dataset forms the foundation for understanding how genres and audience factors influence the social structures in narratives.
Our project, hosted on the Zooniverse platform, asked participants to identify types of interactions between characters in brief text passages. These interactions were categorized into five types: communicating, thinking about, observing, touching, and associating. Participants also determined whether interactions were unilateral (only one character is aware) or bilateral. Through this effort, over 1,900 citizen scientists contributed nearly 74,000 annotations in just three months.
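Because each passage was seen by several volunteers, the individual annotations have to be merged into one label per passage. The post does not describe how this was done, so the following is only a minimal sketch that assumes a simple majority vote. The function name `aggregate_labels`, the `min_agreement` threshold, and the passage IDs are hypothetical; only the five interaction types come from the project.

```python
from collections import Counter, defaultdict

# The five interaction types used in the project.
INTERACTION_TYPES = {
    "communicating", "thinking about", "observing", "touching", "associating",
}


def aggregate_labels(annotations, min_agreement=0.5):
    """Merge volunteer annotations into one label per passage.

    annotations: iterable of (passage_id, label) pairs, one per volunteer vote.
    min_agreement: assumed threshold; the winning label must exceed this
        share of the votes, otherwise the passage is left unlabeled (None).
    """
    votes = defaultdict(list)
    for passage_id, label in annotations:
        if label not in INTERACTION_TYPES:
            raise ValueError(f"unknown interaction type: {label!r}")
        votes[passage_id].append(label)

    merged = {}
    for passage_id, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        # Keep the label only when a clear majority of volunteers agree.
        merged[passage_id] = label if count / len(labels) > min_agreement else None
    return merged


# Hypothetical votes: three volunteers on p1, two on p2.
example = [
    ("p1", "touching"), ("p1", "touching"), ("p1", "observing"),
    ("p2", "communicating"), ("p2", "observing"),
]
print(aggregate_labels(example))  # {'p1': 'touching', 'p2': None}
```

Passages that end up as `None` are the ones where volunteers disagreed; in a sketch like this they could be dropped or sent back for more annotations.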
Building Better Social Networks
With this annotated dataset, we fine-tuned a small language model (Phi-3) to detect interaction types in narrative texts. The fine-tuned model achieved an F1 score of 0.70, just ahead of models like GPT-4. Using this model, we analyzed 390 books to construct character social networks. These networks were examined for various structural properties such as density, centrality, and modularity, offering a quantitative lens through which to compare narratives.
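To make the three structural properties concrete, here is a small sketch in plain Python. It assumes an unweighted, undirected network in which two characters are linked once the model detects any interaction between them; the actual network construction in the paper may differ (for example, by weighting edges). The character names, the function names, and the hand-picked community split are all hypothetical.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected graph {character: set of neighbours} from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def density(graph):
    """Share of possible character pairs that are actually connected."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(nbrs) for nbrs in graph.values()) / 2
    return edges / (n * (n - 1) / 2)


def degree_centrality(graph):
    """Fraction of the other characters each character is connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def modularity(graph, communities):
    """Newman modularity of a given partition (a list of disjoint node sets)."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2  # total number of edges
    if m == 0:
        return 0.0
    q = 0.0
    for community in communities:
        # Edges with both ends inside the community (each counted twice, so halve).
        internal = sum(1 for a in community for b in graph[a] if b in community) / 2
        degree_sum = sum(len(graph[a]) for a in community)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Two tight friend groups joined by a single acquaintance (Cal and Dee).
pairs = [
    ("Ann", "Ben"), ("Ben", "Cal"), ("Ann", "Cal"),
    ("Dee", "Eli"), ("Eli", "Fay"), ("Dee", "Fay"),
    ("Cal", "Dee"),
]
network = build_network(pairs)
print(round(density(network), 3))
print(degree_centrality(network)["Cal"])
print(round(modularity(network, [{"Ann", "Ben", "Cal"}, {"Dee", "Eli", "Fay"}]), 3))
```

In practice a graph library would also find the communities automatically; here the partition is supplied by hand to keep the example self-contained.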
What We Discovered
Fiction and non-fiction narratives differ markedly in their social structures. Fictional narratives favor dense and interconnected networks, emphasizing embodied interactions such as physical contact and observation. These characteristics align with theories suggesting that fiction immerses readers in social cognition by modeling close-knit, relatable social environments. In contrast, non-fiction narratives, like biographies, exhibit more modular and loosely connected networks, reflecting their often broader and less immersive storytelling style.
Youth-oriented fiction also stood out, displaying even simpler and more centralized networks, likely tailored to younger audiences for easier comprehension.
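One common way to put a number on "centralized" is Freeman's degree centralization, which is 1 for a star-shaped network built around a single character and 0 when every character has the same number of connections. The post does not say which centralization measure the study used, so this is an illustrative sketch under that assumption, with hypothetical names throughout.

```python
def degree_centralization(edges):
    """Freeman degree centralization of an undirected, unweighted network.

    edges: iterable of (a, b) character pairs; duplicates and self-loops are ignored.
    """
    degree = {}
    seen = set()
    for a, b in edges:
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue
        seen.add(pair)
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1

    n = len(degree)
    if n < 3:
        return 0.0  # the measure is not meaningful for fewer than three nodes
    max_degree = max(degree.values())
    # Sum of gaps to the best-connected character, normalized by the star-graph maximum.
    return sum(max_degree - d for d in degree.values()) / ((n - 1) * (n - 2))


# A story built around one hero versus a ring of equally connected characters.
star = [("Hero", "A"), ("Hero", "B"), ("Hero", "C"), ("Hero", "D")]
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]
print(degree_centralization(star))  # 1.0
print(degree_centralization(ring))  # 0.0
```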
Why It Matters
This study highlights the power of citizen science in advancing the humanities. By involving volunteers, we not only reduced annotation costs but also demonstrated the potential for high-quality data collection through community engagement. This collaborative approach bridges the gap between computational methods and humanistic inquiry, enabling scalable, nuanced analysis of narrative structures.
Furthermore, understanding literary social networks has broader implications for digital humanities, narrative theory, and cognitive science. Fiction, with its dense social networks, might play a unique role in developing readers’ social cognition—a concept with significant educational and psychological implications.
Looking Ahead
We’ve got a number of new initiatives under way at our citizen science project, “Citizen Readers.” From character emotions to a world map of story morals to understanding illustrations in children’s books, we see citizen science as an integral part of computational literary studies.