Can We Be Wrong?

I have a new book out. It’s called “Can We Be Wrong? The Problem of Textual Evidence in a Time of Data.”

The goal of the book is to change the terms of debate surrounding the place of computational literary analysis within the field of literary studies. Most of these debates have centred, and continue to centre, on the question, "What does this new thing add?" The answers then range from "nothing" to "something exciting."

In other words, no one starts out by asking what’s wrong with literary studies. That’s a big problem because it overlooks the very real methodological deficiencies of literary studies that severely impact its credibility as a discipline and that computational methods are expressly trying to address.

So what are these deficiencies? For me, the single biggest problem facing the field of literary studies is “generalization”: how do we move in a credible (or incredible) way from individual observations of small numbers of texts to large-scale claims about the world?

As I show using data-driven methods of NLP and machine learning in the book, this practice of generalization, whether about genres, periods, or things as large as “modernity” or the “anthropocene,” is a regular and consistent feature of the discipline. Indeed, the practice appears to happen almost as frequently as in quantitative disciplines like sociology.

For those who argue that literary studies, and the humanities more generally, ought to be guided by a focus on particulars and particularity, the research in my book suggests that this is not an accurate description of how researchers actually behave.

So what should we do about it? What should we do about this problem of evidentiary insufficiency?

Can We Be Wrong? tries to show how the values associated with the open science movement can be an excellent resource in realigning our methods around more restrained and transparent forms of generalization. One of the most important affordances of computational literary analysis is the way it allows researchers to be more open about the methodological steps taken to arrive at generalizations about literary behaviour.

Notice how what I'm saying is NOT that the exclusive answer is big data. Rather, the principles of "open generalization" — the means through which a generalized empirical claim about the world is made — can and should be applied to all forms of research in the field. Being more transparent about how many documents were consulted, out of what size population, and what methods were used to arrive at one's claims — these principles should be applied in assessing the credibility of any article or book, whether it uses computation or not. This will go a long way towards closing the tremendous gaps that currently exist between what we say our research shows and what we have actually observed.
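To make the transparency idea concrete, here is a toy sketch (the function name, numbers, and example are all hypothetical illustrations of mine, not anything from the book): given how many documents a study consulted out of an estimated population, we can state plainly how much coverage a generalized claim actually rests on.

```python
# Toy illustration of "open generalization" reporting (hypothetical
# numbers): make explicit how much of a population a claim rests on.

def coverage_report(consulted: int, population: int) -> str:
    """Return a one-line transparency statement for a generalized claim."""
    if population <= 0 or not (0 <= consulted <= population):
        raise ValueError("counts must satisfy 0 <= consulted <= population")
    share = consulted / population
    return (f"Claim based on {consulted} of ~{population} documents "
            f"({share:.2%} of the population).")

# Example: a period-level claim about "the Victorian novel" made after
# close-reading 30 novels out of an estimated 25000 published.
print(coverage_report(30, 25000))
# → Claim based on 30 of ~25000 documents (0.12% of the population).
```

The point is not the arithmetic but the habit: any article, computational or not, could carry a statement like this so readers can judge the distance between evidence and claim.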

But that's only Part 1 of the argument. This is not a book about how literary studies should be guided solely by empirical best practices. When we make empirical claims about behaviour out in the world, then yes, absolutely: we need better methods for these purposes.

But this overlooks an entirely different value system in the field, one that prioritizes creativity and visionary thinking with respect to the textual past and that I show goes back in one form or another to the work of Lorenzo Valla (who exposed the Donation of Constantine as a forgery). As I also show in the book, literary studies as a field is far more tolerant of discursive and conceptual openness than the sciences. As Valla himself demonstrated in his empirical take-down of the papacy's founding document, imagining what the past might hold was as important as showing what it did not.

In other words, we need to continue to valorize the critical and visionary thinking that is going on in the humanities. But not at the expense of credible empirical methods that make normative claims about how literature works. We can do both, just not at the same time.

What the book ultimately concludes with is a call for a greater mutuality of method. The theoretical leaps and bounds of creative literary scholarship can inform better and more nuanced empirical models of actual literary behaviour, just as the empirical findings of new data-driven research can guide how we dream up possible worlds. We need to see these as separate yet interconnected practices, distinct in their methodologies but complementary in their aims.

It's hopeful, for sure, but, I mean, don't we need a bit of that right now?