Below is a list of data sets that are made available by .txtLAB.
A collection of 450 novels in German, French, and English spanning 1770 to 1930. Each language is represented by 150 novels, distributed roughly evenly across time, length, and gender. The data can be downloaded here, and the metadata is available here. Please cite:
Piper, Andrew (2016): txtlab Multilingual Novels. figshare.
A collection of 1,211 novels published between 2000 and 2015. They are divided into the following six groups: Bestsellers (BS), Prizewinners (PW), Novels reviewed in the New York Times (NYT), Mysteries (MYST), Romances (ROM), and Science Fiction (SCIFI). Metadata is available here.
Metadata on institutional affiliation for 5,664 academic articles published in four prestigious humanities journals (PMLA, Critical Inquiry, New Literary History, Representations). For each article, the metadata records the author's institutional affiliation at the time of publication, the institution where the author received the PhD, and the author's gender. In total, 3,547 authors are represented, drawn from 344 PhD-granting institutions and 721 institutions of affiliation. We also include supplementary data on gender and publication for another 3,845 articles published since 2010 in 16 additional journals. The data is available here.
LIWC for Literature
LIWC (Linguistic Inquiry and Word Count) tables for more than 25,000 documents, comprising both fiction and non-fiction texts drawn from different periods (the nineteenth-century canon, HathiTrust nineteenth-century documents, the twentieth-century repositories of Gutenberg and Amazon, and multiple contemporary literary genres ranging from mysteries to prizewinners) and from two languages (German and English). The data is available here.
Please cite: Andrew Piper, “Fictionality,” CA: Journal of Cultural Analytics (December 2016): http://culturalanalytics.org/2016/12/fictionality/.