Tag: Digital Humanities

Hathi1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust

Really pleased to announce the release of a new data set that I’ve been working on with my collaborator Sunyam Bagga. In it we build on the prior work of Ted Underwood and his team to develop parallel corpora of fiction and non-fiction writing over 

Introducing the World Literature Data Collective

Welcome to the next moonshot. Together with a growing and dynamic group of researchers I am extremely proud to announce a new initiative aimed at understanding human storytelling across numerous world cultures. The goal of globalizing our understanding of storytelling has long been a dream 

Gettin’ into GitHub

One of my New Year’s Resolutions was to get stuff off my computer and onto GitHub. I know I’m late to the party. But better late than never. GitHub is an amazing resource where researchers all over the world share code. When we talk about 

Let the hypothesis testing begin

We believe that a turn toward hypothesis testing will help us become more aware of exactly what we are doing and why we are doing it. I have a new piece out with Matt Erlin in Public Books. In it we describe why we think 

Can We Be Wrong?

I have a new book out. It’s called “Can We Be Wrong? The Problem of Textual Evidence in a Time of Data.” The goal of the book is to change the terms of debate surrounding the place of computational literary analysis within the field of literary 

The Coding Turn in the Humanities

As part of my new book, I have made the code and all derived text data freely available online. The underlying text data has been shared as far as copyright restrictions would allow. As I mentioned in my initial post, this entailed a massive amount 

Enumerations is out!

My new book, Enumerations: Data and Literary Study, is now out with the University of Chicago Press. It’s a long-form exploration of the meaning of quantity in literature, from a study of punctuation in poetry, to plot structure in novels, to the semantics of character 

Why are non-data driven representations of data-driven research in the humanities so bad?

One of the more frustrating aspects of working in data-driven research today is the representation of such research by people who do not use data. Why? Because it is not subject to the same rules of evidence. If you don’t like data, it turns out