Below is a list of data sets that are made available by .txtLAB.


A collection of 450 novels in German, French, and English that span 1770 to 1930. Each language is represented by 150 novels with a roughly even distribution across time, length, and gender. The data can be downloaded here, and the metadata is here. Please cite:

Piper, Andrew (2016): txtlab Multilingual Novels. figshare.

Contemporary Novels

A collection of 1,211 novels published between 2000 and 2015. They are categorized into the following six groups: Bestsellers (BS), Prizewinners (PW), Novels reviewed in the New York Times (NYT), Mysteries (MYST), Romances (ROM), and Science Fiction (SCIFI). Metadata is available here.

Please cite: Andrew Piper and Eva Portelance, “How Cultural Capital Works: Prizewinning Novels, Bestsellers, and the Time of Reading,” Post45 (2016).

Academic Publishing I: Prestige Data

Metadata on institutional affiliation for 5,000+ academic articles published in four prestige journals within the humanities (PMLA, Critical Inquiry, New Literary History, Representations). Included in the metadata are the author’s institutional affiliation at time of publication, the author’s PhD institution, and the author’s gender. 3,500 authors are represented from close to 350 PhD-granting institutions and 725 authorial institutions. We also include supplementary data on gender and publication on another ~3,800 articles published since 2010 in 16 further journals. The data is available here.

Please cite: Chad Wellmon and Andrew Piper, “Publication, Power and Patronage: On Inequality and Academic Publishing,” Critical Inquiry (July 2017).

Academic Publishing II: MLA Author Data

This data represents 1,937 and 6,252 bibliographic records of articles in the field of literary studies published in 1970 and 2015, respectively. The data was downloaded from the MLA database using the ProQuest interface in January 2017. The full data set can be accessed here.

Please cite: Andrew Piper, “Think Small: On Literary Modeling.” PMLA 132.3 (2017): 651-658.

LIWC for Literature

LIWC tables for 25,000+ documents, consisting of both fiction and non-fiction texts drawn from different periods (the nineteenth-century canon, Hathi Trust nineteenth-century documents, the twentieth-century repositories of Gutenberg and Amazon, and multiple contemporary literary genres from mysteries to prizewinners) as well as two separate languages (German and English). The data is available here.

Please cite: Andrew Piper, “Fictionality,” Cultural Analytics (December 2016). DOI: 10.22148/16.011