Hathi1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust

Really pleased to announce the release of a new dataset that I’ve been working on with my collaborator Sunyam Bagga. In it we build on the prior work of Ted Underwood and his team to develop parallel corpora of fiction and non-fiction writing over

Introducing the World Literature Data Collective

Welcome to the next moonshot. Together with a growing and dynamic group of researchers, I am extremely proud to announce a new initiative aimed at understanding human storytelling across numerous world cultures. The goal of globalizing our understanding of storytelling has long been a dream

The difference Queer FanFic makes

Fanfic isn’t all about sex. It’s about connection. Two students in our lab, Nikoo Sarraf and Jennifer Chen, have a new lab collaboration paper out that explores the way queer fanfiction differs from mainstream publishing. As they write in their introduction: Fanfiction is a powerful 

Detecting narrativity across long time scales

Our lab has a new article out that will be appearing in the Computational Humanities Research Conference Proceedings (CHR 2021). In this project, we develop computational methods for measuring the degree of narrativity in over 335,000 text passages distributed across two- to three-hundred years of 

Let the hypothesis testing begin

We believe that a turn toward hypothesis testing will help us become more aware of exactly what we are doing and why we are doing it. I have a new piece out with Matt Erlin in Public Books. In it we describe why we think 

Celebrating 5 Years of Cultural Analytics

I am very proud to announce that the journal that I edit and co-founded has turned five! In the past five years, the Journal of Cultural Analytics has published 107 articles, 5 special issues, moved through two open-access publishing platforms, accrued over 1,000 subscribers, is 

Narrative Theory for Computational Narrative Understanding

I have a new piece out with co-authors David Bamman and Richard Jean So in the forthcoming proceedings of the Empirical Methods in Natural Language Processing (EMNLP) conference. Our goal in the paper is to provide NLP researchers with a clear theoretical framework to computationally 

Fiction’s Functions, or Year-End Round-Up

It’s been a fun year at .txtlab. We’ve done projects on queer fan fiction, the effects of corporate ownership on local news, narrativity across long time scales, minor literature and literary nationalism, measuring bias in machine learning, narratives to support sustainable business practices in small