Why are non-data-driven representations of data-driven research in the humanities so bad?

September 17, 2017

Data Theory

One of the more frustrating aspects of working in data-driven research today is the representation of such research by people who do not use data. Why? Because it is not subject to the same rules of evidence. If you don’t like data, it turns out you can say whatever you want about people who do use data.

Take for example this sentence, from a recent special issue in Genre:

At the heart of much data-driven literary criticism lies the hope that the voice of data, speaking through its visualizations, can break the hermeneutic circle.

Where is the evidence for this claim? If you’re wondering who has been cited so far in the piece, you can guess: it’s Moretti. That’s it. Does it matter that others have made the exact opposite claim? For example, in this piece:

In particular, I want us to see the necessary integration of qualitative and quantitative reasoning, which, as I will try to show, has a fundamentally circular and therefore hermeneutic nature.

But does a single piece of counter-evidence really matter? Wouldn’t the responsible thing be to try to account for some summary judgment of all “data-driven literary criticism” and its relationship to interpretive practices?

To be concerned about the hegemony of data and data science today is absolutely reasonable and warranted. Data-driven research has a powerful multiplier effect in its ability to be covered by the press and circulate as social certainty. Projects like “Calling Bullshit” by Carl Bergstrom and Jevin West are all the more urgent for this reason.

But there is another dimension of calling bullshit that we shouldn’t overlook. It’s when people invent statements to confirm their prior belief systems. To suggest that data is omnipotent in its ability to shape public opinion misses one of the great tragedies of facticity of our time: climate collapse (a phrase I prefer to climate “change,” which is too wishy-washy a word for where we’re headed; “change is good!”).

In other words, calling bullshit is a multidimensional problem. It’s not just about data certainty. It’s also about certainty in the absence of data. It’s about rhetorical tactics that are used to represent phenomena without adequate evidence, something that happens all too often in the humanities these days when it comes to understanding things as disparate as the novel or our own discipline.

As authors, journal editors, peer reviewers, researchers, and teachers, we need to wake up to this problem and stop allowing it to pass with a mild nod of the head. We need to start asking that hard question: Where’s your evidence for that?