Hathi1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust

I’m really pleased to announce the release of a new dataset that I’ve been working on with my collaborator Sunyam Bagga. In it, we build on the prior work of Ted Underwood and his team to develop parallel corpora of fiction and non-fiction writing over