LLCU 255: Intro to Literary Text Mining — New Syllabus 2017

Less but better. That’s the essentialist’s motto and that’s the one I use every year when I revise my syllabus. I keep removing things and students keep learning more every year. While there is clearly a ceiling for this approach, it works remarkably well as a pedagogical tactic. Here’s the full syllabus.

This year’s class will focus on three things:

  1. understanding what text mining or literary modeling is. I am always struck by how few students have ever heard of this field.
  2. being able to undertake a variety of analytical tasks, including preparing your data, significance testing, clustering, machine learning, sentiment analysis, and social network analysis.
  3. starting to generate ideas about how to apply these tools to good questions.

It’s the last one that is always the hardest. Learning how to use R may seem intimidating at first, but coming up with creative models and measures for complex literary concepts is the truly difficult part of this research.

The most rewarding part of this class is seeing the mental transformation of students when the light bulb goes on: oh, you mean I can test my beliefs on more than one text!?! That’s awesome!

Literary Text Mining Syllabus


It’s that time of year and so I’m posting the latest syllabus for my data and literature class. I have found over the years that every time I create a new class I start with too much and gradually winnow as the years go by (until there is nothing left and I teach a new class…). This year is no different. Here are some things I’ve learned:

  • there are very few good readings on text mining for undergraduates. did you enjoy reading full-blown research articles when you were in university? every year I take more and more off because they are just too confusing.
  • apparently undergrads in the Arts hate programming. who knew?! I lose 50% of my class with the first assignment.
  • this is particularly sad because I think in the programming lies all the knowledge. Run a type-token ratio and see that there are just 5% unique words in Jane Austen’s Pride and Prejudice and you have gained more of an insight into the novel than all of Jameson’s work combined could ever teach you.
  • teaching statistics on top of literary theory and computer programming really is the proverbial straw that breaks the camel’s back. I do it anyway.
  • if you’re going to do this anyway, then you need to go very slowly. You can review one study for a few weeks to understand the whole process from modelling to data selection to significance testing. Maybe one study for a whole semester.
  • ultimately I see myself waging guerrilla warfare: hopefully my students will circulate through humanities departments and constantly ask annoying questions in their other classes like, “How big is your sample?” or “Can you be more explicit about how you generalize from your one example?” Because let’s be clear, at 99-1 we’re still the underdog…
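
For readers who have never computed the type-token ratio mentioned above, here is a minimal sketch of the idea. It is written in Python for brevity, although the class itself works in R. The function names (`tokenize`, `type_token_ratio`) and the deliberately crude tokenizer are assumptions made for illustration, not the code used in the course.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic words.

    Assumption: a word is a run of letters, optionally followed by a
    straight apostrophe and more letters (e.g. "don't"). Real projects
    usually need a more careful tokenizer.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(text):
    """Return unique words (types) divided by total words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    # Opening sentence of Pride and Prejudice, used as a tiny sample.
    sample = (
        "It is a truth universally acknowledged, that a single man in "
        "possession of a good fortune, must be in want of a wife."
    )
    print(f"{type_token_ratio(sample):.2f}")
```

One caution when trying this on whole novels: the ratio drops as a text gets longer, because common words keep repeating, so it is only meaningful to compare texts (or samples) of similar length.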

Looking forward to another awesome semester of empowering students to be critical readers and creative analysts.