Gender Trouble: Literary Studies’ He/She Problem

Pronouns have become a hot topic of late, and I thought it would be interesting to explore their use in the new JSTOR data set I have been working with, which represents 60 years of literary studies articles.

Previous work has shown how men and women use personal pronouns at differing rates (you can guess how). I wanted to see whether over the past 60 years an assumed bias towards masculine pronouns in the field might have subsided with the rise of gender studies and the entry of more women into the profession.

Unfortunately not.
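The basic measurement is simple to sketch. Here is a minimal Python example (not the code behind the JSTOR analysis; the pronoun lists and sample sentence are purely illustrative) that computes the masculine share of gendered third-person pronouns in a text:

```python
import re
from collections import Counter

MASC = {"he", "him", "his", "himself"}
FEM = {"she", "her", "hers", "herself"}

def pronoun_ratio(text):
    """Return the masculine share of gendered third-person pronouns,
    or None if the text contains no such pronouns."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in MASC | FEM)
    masc = sum(counts[t] for t in MASC)
    fem = sum(counts[t] for t in FEM)
    total = masc + fem
    return masc / total if total else None

sample = "He said his reading of her essay changed him; she disagreed."
print(pronoun_ratio(sample))  # → 0.6 (3 masculine of 5 gendered pronouns)
```

Tracking that ratio per article, per year, is enough to plot a 60-year trend line.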

Continue reading “Gender Trouble: Literary Studies’ He/She Problem”

Topic Stability, Part 2

In my previous post I tried to illustrate how different runs of the same topic modelling process can produce topics that appear to be slightly semantically different from one another. If you keep k and all other parameters constant, but change your initial seed, you’ll see the kind of variation that I showed.

The question that I want to address here is whether we can put a number to that variation, so that we can understand which topics are subject to more semantic variability than others.

I’ve gone ahead and written a script in R that calculates the average difference between a given topic and the most similar topic to it from every other run. You can download it on GitHub.
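The script itself is in R, but the core calculation is easy to sketch in Python. Assuming each topic is represented as a probability distribution over a shared vocabulary, and using Jensen–Shannon divergence as the distance (one common choice; the distance measure and the toy data below are illustrative, not taken from the R script):

```python
import math

def js_divergence(p, q):
    """Jensen–Shannon divergence between two distributions (base 2, in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def topic_instability(topic, other_runs):
    """Average distance from `topic` to its nearest neighbour in each other run.

    `other_runs` is a list of runs; each run is a list of topics, and each
    topic is a probability distribution over the same vocabulary.
    """
    nearest = [min(js_divergence(topic, t) for t in run) for run in other_runs]
    return sum(nearest) / len(nearest)

# Toy example: a 3-word vocabulary, one topic scored against two other runs.
topic = [0.7, 0.2, 0.1]
runs = [
    [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8]],  # run 2: a near match and a far one
    [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]],  # run 3: contains an exact match
]
print(topic_instability(topic, runs))
```

A score of 0 means the topic reappears identically in every run; higher scores flag the semantically unstable topics.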

Continue reading “Topic Stability, Part 2”

Where’s the data? Notes from an international forum on limited use text mining

I’m attending a two-day workshop on issues related to data access for text and data mining (TDM). We are 25 participants from different areas: researchers who do TDM, librarians who oversee digital content, content providers who package and sell data to academic libraries (principally large publishers), and lawyers.

I am excited to be here because these issues strike me as both complicated and intractable. For several years I have tried, without success, to gain greater access to data through our university library. I have also worked extensively with limited-use data and wished I could be more open with it. Whenever I ask how the situation can improve, a finger-pointing circle begins where everyone points at someone else and nothing changes.

The overarching question that we are all implicitly asking ourselves is: Will anything change after our meeting?

Here we go.

Continue reading “Where’s the data? Notes from an international forum on limited use text mining”

The Replication Crisis I: Restoring confidence in research through replication clusters

Much has been written about the so-called “replication crisis” unfolding across the sciences today. These issues bear on literary and cultural studies in many ways, though not always straightforwardly: “replication” has a complicated fit with more interpretive disciplines, and its implications warrant careful thought. In the next few weeks I’ll be writing a series of posts to try to generate a conversation around the place of replication in the humanities.

Continue reading “The Replication Crisis I: Restoring confidence in research through replication clusters”

An Open Letter to the MLA

Dear Prof. Taylor,

I am writing to you as a member of the MLA who has concerns about the practices and policies relating to the society’s data and their impact on research. This is an issue that affects many scholarly organizations. For this reason I have chosen to write an open letter.

The MLA has emerged as an important champion of the principles of open access scholarship. The creation of the MLA Commons represents a recent positive example of such pro-active work.

It is all the more troubling to realize that such open access does not apply to the MLA’s own data. I was recently served with a take-down notice by my university library for publicly sharing data and code used in a recent publication in PMLA. The data was drawn from the MLA database and represented two years’ worth of records, one collection from 2015 and one from 1970. When I contacted the MLA to ask for the data outside of such corporate mediation, I was refused. Here we have a case where data from the MLA was used to support an article published in the flagship journal of the MLA and is now being suppressed from public view.

The MLA database is an essential source of knowledge about the practices within our field. As we have begun to learn, metadata alone can reveal a great deal of information about the behaviour of a community. In my own work I am interested in studying the concentration of attention surrounding literary authors, especially with respect to gender and racial diversity and how such concentration has changed (or not) over time.

Below I attach a screenshot of the licensing agreement that my university has signed with ProQuest, which distributes the data for the MLA to our university library. As you can see, clauses i-k all violate essential norms of research. Not being able to mine the database (i) means that it has been walled off from standard research practices. Not being able to communicate materials received from the service (j) means that the evidentiary bases of claims using the data cannot be publicly shared or externally validated. And not being able to download parts of the service in a systematic manner (k) means that we cannot study the contents of the database in any responsible fashion. These clauses all favour a mode of interaction with information that is both outdated and prohibitive by the accepted norms of academic research today.

The MLA, and it should be added numerous other scholarly organizations, have contracted out the organization of, and access to, their data to third parties, most of which are private, for-profit enterprises. These parties’ business models are in direct conflict with the scholarly mission of the society, indeed of any academic society. While this arrangement may once have been convenient, not to mention profitable, it is no longer an acceptable way of curating data in an academic context. Libraries need to stop signing license agreements that limit access to data in the library. And scholarly organizations need to stop signing license agreements that limit access to and the public circulation of their data. Anything short of this represents a serious abrogation of scholarly responsibility.

I would be happy to work with you to craft data policies that are more in line with the values and norms of scholarship. The MLA has an opportunity once again to take the lead in this important matter.


Andrew Piper

The Legibility Project: Reversing the dark economy of academic labor

Here is an example of the kind of registry I am thinking of, using my own activity as a starting point.

Ongoing duties include: Undergraduate Advisor, European Studies Minor; Editor, Cultural Analytics; Board Member, Centre for Social and Cultural Data Science.

Table of Review Commitments since September 2017.
Activity | Request Date | Due Date | Accepted/Denied | Name | Institution
Grant Proposal | 12/28/2017 | | Denied | ###### | ######
Faculty Recommendation | 12/19/2017 | 01/10/2018 | Accepted | ###### | ######
Book MS | 12/13/2017 | | Denied | ###### | ######
Grant Proposal | 12/13/2017 | 01/30/2018 | Accepted | ###### | ######
Faculty Recommendation | 12/07/2017 | 12/18/2018 | Accepted | ###### | ######
Book MS | 11/27/2017 | | Denied | ###### | ######
Book MS | 10/09/2017 | 01/20/2018 | Accepted | ###### | ######
Grant Committee | 09/22/2017 | 11/27/2017 | Accepted | ###### | ######
Faculty Recommendation | 09/01/2017 | 10/15/2017 | Accepted | ###### | ######
University Committee | 01/01/2017 | 01/01/2018 | Accepted | ###### | ######

Over the years I have become aware that a significant portion of my time is spent on tasks for which I am not directly paid, either in the form of money or public credit, and about which no one outside of my chair or dean is aware. I am talking about work known as “peer review.” Typically we associate this term with the reviewing of scientific articles. However, the scope of “peer review” is considerably larger than that understanding implies.

Peer review can encompass:

  • Scholarly articles, the most familiar category.
  • Entire book manuscripts, especially in the humanities. If the average scholarly article runs between five and seven thousand words, the average academic book is anywhere between ten and twenty-four times as long (and correspondingly time-consuming to review). Sometimes I am asked to review a book proposal instead, which is considerably shorter, between 20 and 75 pages.
  • Promotion dossiers, either for tenure or full professor. These include publications produced over the course of a career. If someone has published several books and dozens of articles, then the time commitment is now potentially 100x the extent of reviewing a single academic article.
  • Faculty recommendation letters. These entail familiarity with the candidate’s entire scholarly output, which in some cases may be even larger than a promotion dossier if the person has already been a full professor for a while.
  • Grant or prize committees. A book prize committee can mean reading upwards of 100 scholarly monographs (i.e. the equivalent of 2,400 academic articles), while a grant committee can mean reading either a single proposal (about as long as an article and just as dense) or adjudicating up to 20-30 proposals at a time.
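The arithmetic behind those estimates is worth making explicit. Here is a back-of-the-envelope check in Python (the word counts and multipliers are the post’s own rough estimates, not measured figures):

```python
# Rough workload arithmetic using the estimates above.
article_words = (5_000 + 7_000) / 2           # average scholarly article
book_multiplier = 24                          # upper estimate: one book ≈ 24 articles
book_words = article_words * book_multiplier  # ≈ 144,000 words per monograph

# A book prize committee reading 100 monographs, in article-equivalents:
prize_books = 100
prize_article_equivalents = prize_books * book_multiplier
print(prize_article_equivalents)  # → 2400
```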

I should note that I have not included writing letters of recommendation for undergraduate and graduate students because I consider those to be part of my teaching and supervision, for which I am directly paid. Nor have I included my own research writing, again because my assumption is that I am directly paid for this work.

These tasks will be familiar to anyone in the profession. They are almost entirely unknown to those outside of it.

Some might say, ah, come on, that’s part of your job, too. You may not have known about it when you started out, but in addition to teaching, advising, mentoring, researching, writing, and sitting on committees, it was implied that you would also be doing a lot of reviewing of other people’s work. After all, how do you think your articles and books get published? Someone’s got to do it.

Absolutely. But the bigger issue for me is that these activities are almost all confidentially recorded, which means that no one knows you are doing them, except the chair or dean to whom you might report your yearly activities, or the individual parties that made the request. That’s why you never anticipate this work: you don’t see your advisors doing it when you’re training in grad school. It just suddenly appears — and keeps on appearing. I am not opposed to the work. I am opposed to the way we hide it.

Why does this lack of transparency matter?

I think for two reasons. First, it means there is all this work going on — work which has serious consequences in the lives of real people — which is totally inscrutable. How many books or articles were or were not published last year because I, or someone else, reviewed them for a press or journal editor? What kind of biases do I bring to my judgments and do we have any way of assessing that? What individuals, and now more importantly, what social networks are making things happen in the field?

All of these questions are currently unanswerable because of this dark economy of labor. While I have a tremendous amount of freedom in the classroom, I still have to submit course proposals, still have to get approvals for new classes, still have to have my performance evaluated by my students, and so on. There is an important degree of accountability for what and how I teach. That accountability is totally missing from peer review.

The second reason this matters is purely practical. I am totally exhausted by these requests. If I said yes to every request I would have no time for anything else. Adjudicating everything people have asked me to read could reasonably be a full-time job. Period. No teaching, no research, no writing (other than “reports”), no recommendation letters for students, no advocacy on campus for things I believe in, no advising duties. Just “peer review.”

So inevitably I say no, and then yes, and then no, and maybe a few more no’s with some guilt-ridden yes’s dropped in for good measure. I try to construct some rationale, but really it’s random. That’s not a good way to make decisions, it’s not a good way for me to apportion my work time, and it’s not a good way for the field to allocate its labor.

I also don’t think I’m special. My working assumption is that many, many in the field experience the same thing. I hear this anecdotally all the time when it becomes my turn to ask people to review something for the journal I edit. But it’s hard to know because everything is so invisible. And as the tenure labor market continues to shrink, the problem will only worsen, as fewer people are called upon to do more and more things.

So here is what I suggest: We need a peer-review registry.

We need a place where this work is recorded and made visible. But it’s confidential, you’ll say! That’s a fair concern. But we can create a registry that contains minimal information for public consumption and confidential information for auditing purposes. For example, I can list that I am doing a “promotion review” right now. You don’t need to know whom I’m doing it for. But it is important for people to be aware of who is doing this kind of work. Who are the gatekeepers? I can guarantee you will start to see biases and unintended networks appear. It will also help me in my decision making to be able to say to a requestor: look, I’m doing six different reviews right now, I really can’t say yes. One of the main reasons we say yes is to maintain social bonds. A bare no communicates a lot of negative will; a no accompanied by a very good reason is very different. Right now, it’s hard to know if someone is just dodging work or is legitimately swamped.

But we also need a confidential section for auditing purposes. If all an academic has to do to look busy is check a box, s/he will. We need some way of validating that the public-facing representation is accurate. And we also need some way of further delving into the data. The point wouldn’t be to disclose embarrassing information — did you know that Prof. X was the reviewer for these 20 articles! — but to work with stakeholders to help them understand where problems might lie. We’re seeing a strong network effect here around this group of people, perhaps Editor Y you might consider expanding your pool a bit. Or Grant Agency Z you have traditionally been relying on reviewers with these gender/institutional/ethnic/disciplinary backgrounds. You might want to take steps to address that.