I am pleased to announce the publication of a new piece I have written that appears today in CA: Journal of Cultural Analytics. The aim of the piece is to take a first look, using computational approaches, at the ways in which fictional language distinguishes itself from non-fiction. When authors set out to write an imaginary narrative as opposed to an ostensibly “true” one, what kinds of language do they use to signal such fictionality? One of the piece's interesting findings is that such signalling has remained remarkably constant over the past two centuries. Using a classification algorithm trained on nineteenth-century fiction, we can still predict contemporary fiction with above 91% accuracy (down from about 95% when tested against data from its own time period). These results hold across at least one other European language (German). In the future I hope to test more languages to better understand just how constant such fictional discourse can be said to be.

In addition to showing the constancy of these features across time and languages, the piece also highlights the specific nature of those features. As I argue in the piece, fictional language distinguishes itself most strongly through a phenomenological investment: an attention to the language of embodied individuals who sense and perceive. It is this heightened focus on sense perception, on the world’s feltness, that makes fiction stand out as a genre. When we look at the ways novels in particular distinguish themselves from other kinds of fictional texts, we see a very interesting language of “doubt” and “prevarication” emerge, suggesting that the novel does not put us into the world in a fundamentally realist way, but places people in a skeptical, testing, hypothetical relationship to the world around them.

This piece is part of a nascent project to use computation to better understand creative human practices. The aim is not to replace human judgments about literary meaning or quality, but to make more transparent the semantic profiles of different types of cultural practices. Computation can be a useful tool in showing us how different cultures use different kinds of writing to convey meaning to readers over time. It helps us move beyond the impressionistic ideas we develop when we read a small sample of novels or stories, and test the extent to which those beliefs hold across much broader collections of writing.

While the original text data could not be shared in this project, all derived data has been shared as part of the article. One of the advantages of using non-word-based feature sets, as I do in the piece, is that the derived data can then be freely shared.
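
For readers who want a feel for what a cross-period test on derived features looks like, here is a minimal sketch in Python. It is not the method used in the article: the nearest-centroid classifier is a simple stand-in, the two feature names (a "perception" rate and a "hedging" rate) are hypothetical, and every number is invented for illustration. The point is only that a model can be trained on feature vectors from one period and scored on vectors from another without ever touching the original texts.

```python
from statistics import mean


def centroid(rows):
    """Average each feature column across a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def train(samples):
    """Build one centroid per label from (features, label) pairs."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: dist(model[label]))


def accuracy(model, samples):
    """Share of samples whose predicted label matches the true label."""
    hits = sum(predict(model, features) == label for features, label in samples)
    return hits / len(samples)


# Invented toy data, NOT from the article. Each vector holds two hypothetical
# per-document rates: (perception vocabulary, hedging vocabulary).
earlier_period = [
    ([0.08, 0.05], "fiction"), ([0.07, 0.06], "fiction"),
    ([0.02, 0.01], "nonfiction"), ([0.03, 0.02], "nonfiction"),
]
later_period = [
    ([0.06, 0.05], "fiction"), ([0.07, 0.04], "fiction"),
    ([0.03, 0.01], "nonfiction"), ([0.04, 0.03], "nonfiction"),
]

# Train on the earlier period only, then score on the later period.
model = train(earlier_period)
cross_period_accuracy = accuracy(model, later_period)
```

Because the toy vectors were made up to be cleanly separable, the score here says nothing about the accuracies reported in the piece. With the shared derived data in place of the toy lists, the same train-on-one-period, test-on-another structure would apply.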