Gender Trouble: Literary Studies’ He/She Problem

Pronouns have become a hot topic of late, and I thought it would be interesting to explore their use in the new JSTOR data set I have been working on, which represents 60 years of literary studies articles.

Previous work has shown how men and women use personal pronouns at differing rates (you can guess how). I wanted to see whether over the past 60 years an assumed bias towards masculine pronouns in the field might have subsided with the rise of gender studies and the entry of more women into the profession.

Unfortunately not.


Topic Stability, Part 2

In my previous post I tried to illustrate how different runs of the same topic modelling process can produce topics that appear to be slightly semantically different from one another. If you keep k and all other parameters constant, but change your initial seed, you’ll see the kind of variation that I showed.

The question that I want to address here is whether we can put a number to that variation, so that we can understand which topics are subject to more semantic variability than others.

I’ve gone ahead and written a script in R that calculates the average difference between a given topic and the most similar topic to it from all other runs. You can download it from GitHub.
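To make the idea concrete, here is a minimal sketch in Python (not the R script itself). It rests on several assumptions that are mine rather than the script's: each topic is represented as a list of word probabilities over a shared vocabulary, "difference" is measured as cosine distance, and the "most similar topic" is picked separately within each other run before averaging. The names `cosine_distance` and `topic_instability` and the toy data are hypothetical.

```python
import math


def cosine_distance(p, q):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


def topic_instability(runs, run_index, topic_index):
    """Average distance from one topic to its closest match in every other run.

    `runs` is a list of runs; each run is a list of topics; each topic is a
    list of word weights over the same vocabulary (an assumed representation).
    """
    target = runs[run_index][topic_index]
    nearest = []
    for i, run in enumerate(runs):
        if i == run_index:
            continue  # compare only against the other runs
        # Distance to the most similar topic in this run.
        nearest.append(min(cosine_distance(target, topic) for topic in run))
    return sum(nearest) / len(nearest)


# Toy data: three runs with k = 2 topics over a four-word vocabulary.
runs = [
    [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]],
    [[0.0, 0.1, 0.2, 0.7], [0.7, 0.2, 0.1, 0.0]],  # same topics, reordered
    [[0.4, 0.3, 0.2, 0.1], [0.0, 0.1, 0.2, 0.7]],  # first topic has drifted
]

stable = topic_instability(runs, 0, 1)    # reappears unchanged in every run
unstable = topic_instability(runs, 0, 0)  # drifts in the third run
print(stable, unstable)
```

A higher score means the topic has no close counterpart in the other runs, which is one way of reading "more semantic variability." Other distance measures, such as Jensen-Shannon divergence, would slot into the same structure.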


Where’s the data? Notes from an international forum on limited use text mining

I’m attending a two-day workshop on issues related to data access for text and data mining (TDM). We are 25 participants from different areas, including researchers who do TDM, librarians who oversee digital content, content providers who package and sell data to academic libraries (principally large publishers), and, finally, lawyers.

I am excited to be here because these issues strike me as both complicated and intractable. I have for several years tried to gain greater access to data in our university library with no success. I have also worked extensively with limited use data and wished I could be more open with the data. Whenever I ask how the situation can improve, a finger-pointing circle begins in which everyone points at someone else and nothing changes.

The overarching question that we are all implicitly asking ourselves: Will anything change after our meeting?

Here we go.


z, p, t, d, and counting

I made a list the other day of all of the letters, names, and new terms I have had to learn to undertake the computational study of literature and culture. It was very long. It made me realize that when researchers speak of the “bilingualism” of interdisciplinary work, we should take this idea very literally. I feel like I’m learning German all over again. It started as a novelty (Ich is so funny sounding!), then a frustration (I have no idea what you’re saying), and then magically you could do something with it (ich hätte gern ein Bier). And then you waited, and waited, and waited until you stopped noticing you were thinking in this other thing.


The Replication Crisis I: Restoring confidence in research through replication clusters

Much has been written about the so-called “replication crisis” going on across the sciences today. These issues affect literary and cultural studies in many ways, though not always in the most straightforward way. “Replication” has a complicated fit with more interpretive disciplines, and its implications warrant some thought. In the next few weeks I’ll be writing some posts about this to try to generate a conversation around the place of replication in the humanities.
