One of my New Year’s Resolutions was to get stuff off my computer and onto GitHub. I know I’m late to the party. But better late than never.
GitHub is an amazing resource where researchers all over the world share code. When we talk about the “commons” or “public humanities,” GitHub needs to be a big part of that conversation (not the first to say this, just underscoring here).
So I’ve taken the plunge and am now moving the code I regularly use for teaching and research onto GitHub and keeping it updated there. For newbies like me, I totally recommend the GitHub Desktop app. It makes life really, really simple.
While my site is light years behind resources like Melanie Walsh’s Intro to Cultural Analytics or David Bamman’s BookNLP, it offers another place where you can learn to do text mining from scratch, in this case using R.
So what you can find there is, first, material for my course on literary text mining: step-by-step code for moving from ingesting large numbers of text files to creating document-term matrices to studying them using machine learning, clustering, topic modeling, sentiment analysis, and more. One day I hope to have videos to go with it. While that’s a long way off, I’ll definitely try to transition to R Markdown soon, which is easier to interact with.
You can also find much more streamlined code in a repository for solving individual tasks, whether it’s making a document-term matrix, using word2vec, or generating sentiment arcs for the plots of stories. It’s less commented but more efficient if you already know what you’re doing.
The work of getting more and more students and researchers fluent in computational text analysis is ongoing and urgent. This needs to happen, and it needs to happen at a much larger scale than is currently the case. GitHub can be a great way of helping with that onboarding process and increasing that fluency in the community. I’m not the first to suggest this, but I’m feeling good about getting into it.
Hopefully you’ll come check out what’s there and make suggestions on what to add or improve. I can always use the help (who can’t?).