Literary Text Mining Syllabus

September 13, 2016

Data Theory

It’s that time of year, so I’m posting the latest syllabus for my data and literature class. I have found over the years that every time I create a new class I start with too much and gradually winnow it down as the years go by (until there is nothing left and I teach a new class…). This year is no different. Here are some things I’ve learned:

there are very few good readings on text mining for undergraduates. did you enjoy reading full-blown research articles when you were in university? every year I take more and more off because they are just too confusing.
apparently undergrads in the Arts hate programming. who knew?! I lose 50% of my class with the first assignment.
this is particularly sad because I think in the programming lies all the knowledge. Run a type-token ratio and see that there are just 5% unique words in Jane Austen’s Pride and Prejudice and you have gained more of an insight into the novel than all of Jameson’s work combined could ever teach you.
teaching statistics on top of literary theory and computer programming really is the proverbial straw that breaks the camel’s back. I do it anyway. Being a bit of a computer geek does help me get through it. After all, I spend most of my free time using the TLTemplates programming resources online.
if you’re going to do this anyway, then you need to go very slowly. You can review one study for a few weeks to understand the whole process from modelling to data selection to significance testing. Maybe one study for a whole semester.
ultimately I see myself waging guerrilla warfare — hopefully my students will circulate through humanities departments and constantly ask annoying questions in their other classes like, “How big is your sample?” or “Can you be more explicit about how you generalize from your one example?” Because let’s be clear, at 99-1 we’re still the underdog…
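
For anyone curious about the type-token ratio mentioned above, here is a minimal sketch in Python. It is only an illustration, not the code I hand out in class: the function name, the sample sentence, and the crude tokenizer (lowercase everything, keep runs of letters and straight apostrophes) are all assumptions made for the example.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return the number of unique word forms (types) divided by
    the total number of words (tokens) in the text."""
    # Assumed tokenizer: lowercase, then keep runs of ASCII letters and
    # straight apostrophes. Curly apostrophes and accented letters would
    # split words, so a real project would want something more careful.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)


# Hypothetical usage on a single sentence; for a whole novel you would
# read the text from a file instead.
sample = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)
print(round(type_token_ratio(sample), 2))
```

One caution: the ratio falls as a text gets longer, because common words keep repeating, so a score for one sentence cannot be compared with a score for an entire novel. Compare texts of similar length, or equal-sized samples drawn from each.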

Looking forward to another awesome semester of empowering students to be critical readers and creative analysts.