Congratulations to Victoria Svaikovsky, ARIA Intern for 2017

McGill hosted its annual event showcasing undergraduate summer research projects. Among the many amazing projects was one led by .txtLAB’s Victoria Svaikovsky, who worked with two other students, Anne Meisner and Eve Kraicer-Melamed, to study the intersections of race and Hollywood film using computational analysis.

Their project aimed to better understand the racial inequality that has long been identified in academic and popular criticisms of Hollywood. Focusing on questions of visible and audible marginalization as well as linguistic tokenization with respect to visible minorities, Svaikovsky and her team have produced an impressive study of over 800 screenplays.

Their work will be coming out as a lab white paper in the near future. Keep an eye out!

The Problem of (Gender) Binaries

In their simplest form, computers work in binary. There are ones and there are zeros, and all the rest is context and combination, building more and more complex functions on top of that binary. So it is perhaps unsurprising that at .txtLAB, when we deal with complex entities like characters in a novel, we want to boil them down into binaries, too. Binaries are easy to analyze. And while I love the statistical and computational elegance of this sort of reductiveness, I worry about its implications.

The gender binary is perhaps the most common of these simplifications in our questions and models. Do women and men write differently? Are gendered pronouns or aliases positioned and patterned differently in literary texts? How do protagonists who are women function differently in their character networks than protagonists who are men? These are research questions we can ask to understand how women and men produce, and are produced within, cultural objects. But in doing so, this research falls into two potentially harmful traps. First, we buy into the gender binary; second, we assume that all bodies falling within the category of “woman” or “man” experience that categorization in identical ways.

On the first point: gender is not binary. Our models that measure gender do not account for trans/non-binary folks. This is a problem of both data and representation. In terms of data, we simply do not have enough bodies in our corpus that we could classify as neither a woman nor a man to do rigorous statistical analysis on them. This is not because we are unwilling to ask these difficult questions of our own methods; we want to challenge the binary as much as we’ve challenged the patriarchy, but we do not have enough data to do it. This is a condition of the larger culture we are trying to analyze. Why are there so few trans/non-binary authors and characters in our data set? Systemic oppression and the invisibility of these bodies are part of the issue. We cannot study a facet of a cultural object that does not exist en masse. To challenge the binary computationally, we must first support the elevation of these voices culturally.

On the second point, individual experiences within the binary, we face another set of reductions. When we measure men and compare them to women, we do not take into account any of the intersecting identities an individual within a category may carry. Class, race, sexuality, and ability are critical variables that nuance the way gender oppression operates. Upper- and middle-class cis white women, for example, hold immense privileges that other women do not. We know this, and in some of our research we have worked to analyze how these multiple identities interact (forthcoming research, “Racial Lines”). But more needs to be done to articulate these intersectional identities. We’re trying to find ways to evolve our tools to address both the issues within the binaries and the issues of the binaries themselves.

In the absence both of representation of non-binary folks in our data set and of computational methods able to parse out differences within the binaries we use, should we still do this kind of gender research? It’s a question without an easy answer. Using our current methods, we consistently find troubling patterns in which men are overrepresented relative to women, and that imbalance needs dismantling, too. So how do we reconcile the benefit of continuing to use the gender binary to measure these biases with the cost of normalizing “men” and “women” as unique and uniform categories?

On Prestige Bias in the Chronicle of Higher Ed

The Chronicle of Higher Education ran a version of our essay on the concentration of institutional prestige as its cover story this week. In it we expand on our reflections about how to change the current system. The essay is based on our original piece that appeared in Critical Inquiry. Here is an excerpt from the new essay:

The current system of double-blind peer review that underlies most academic publications is essentially an invention of the second half of the 20th century. Its failings have been well documented and numerous projects in the sciences as well as the humanities are now underway to change it. Almost all of these fixes, however, continue to rely on two basic principles: First, that communities of scholars still make intuitive judgments about quality (judgments which are rarely, if ever, made explicit); and second, that they largely rely on established publishing practices that essentially transfer content from one place (the lab or the desk) to another (the library).

What we are imagining, by contrast, is a new form of algorithmic openness, in which computation is used not as an afterthought or means of searching for things that have already been selected and sorted, but instead as a form of forethought, as a means of generating more diverse ecosystems of knowledge. What values do we care about in terms of human knowledge and how can we use the tools of data science to capture and more adequately represent those values in our system of scholarly communication? Instead of subject indexes and citation rankings, imagine filtering by institutional diversity, citational novelty, matters of public concern, or any number of other priorities. How might we encode these values to create smarter, more adaptable, and more open platforms and practices?

It is clear from our study and others like it that elite institutions continue to be the locus of the practices, techniques, virtues, and values that have come to define modern academic knowledge. They diffuse it, whether in the form of academic labor (personnel) or ideas (publication), from a concentrated center to a broader periphery. Using digital technologies to guide the circulation of knowledge does not inherently make one complicit in the “neoliberalization and corporatization” of higher education or a practitioner of “weapons of math destruction,” to use the data scientist Cathy O’Neil’s well-turned phrase. Wisely and openly used, such technologies can help us not only reveal, but potentially undo, longstanding disparities of institutional concentration. It is time we built a scholarly infrastructure that is more inclusive and more responsive to a broader range of voices, including those outside of the academy.

Over the course of the 19th century, universities adopted many of the norms of print culture and in so doing transformed themselves into modern research universities. We need a similar reinvention for our own universities as they enter a new age.

Addressing epistemic inequality, and not simply publication inequities, will require us to rethink what universities do and what they are for in a digital age. “Digitization” means more than just transferring print practices to digital formats. We need to integrate data science, knowledge of our past practices, and contemporary understandings of institutional norms to reinvigorate the intellectual openness of the university. We need to use all of our analytical and interpretive capabilities to rethink who and what counts. The university is a technology. Let’s treat it like one.

The Prestige Trap

I am pleased to announce the publication of a new piece with Chad Wellmon in Critical Inquiry entitled “Publication, Power, and Patronage: On Inequality and Academic Publishing.” In it we discuss how publication is concentrated among a few elite institutions in a sample of four humanities journals stretching back over forty years.

Our goal is to begin to shed light on the academic publication system, with a particular emphasis on questions of institutional and intellectual inequality. As other research has shown with respect to academic hiring, there is a strong bias towards a few elite institutions, which exercise outsized influence not only on who gets tenure-track jobs but also on who gets published and where.

The article combines a quantitative analysis of contemporary publishing patterns in the humanities with a conceptual account of the historical relationship between publishing practices and the modern research university. The quantitative study is based on a new, hand-curated data set covering 45 years of publishing in four leading humanities journals and encompassing over 5,000 articles. At the same time, we also try to show how the contemporary norms of academic publication have a long and complex genealogy in the scholarly and institutional practices that make up the history of the university. As academics, we need to better understand both the past and present of our publication system and to have open conversations about what a more egalitarian and institutionally diverse intellectual system might look like. Data + historical context, we argue, are important tools to help us better imagine alternative futures.

To give you a sense of the problem, we present two graphs taken from the piece. The first shows the skew towards a small number of elite institutions, whether in terms of PhD training or of where authors are employed at the time of publication. Between 84% and 89% of all publications are accounted for by fewer than 25% of the institutions in our data set. Indeed, 50% of all publications are accounted for by just ten PhD-granting institutions.

Lorenz curves showing the cumulative fraction of all articles published in four humanities journals as a function of PhD and author institutions. Here we see how 25% of the institutions produce between 84% and 89% of all articles.
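For readers curious about the arithmetic behind figures like “25% of institutions produce 84-89% of articles,” here is a minimal sketch, assuming a hypothetical list of per-institution article counts (not the study’s data). The function names lorenz_curve and top_share are illustrative, and this is not necessarily how the piece computed its curves; it simply shows the standard Lorenz construction (sort institutions from least to most productive and accumulate shares) alongside the complementary “top share” reading.

```python
from typing import List, Tuple


def lorenz_curve(counts: List[int]) -> List[Tuple[float, float]]:
    """Return Lorenz curve points as (cumulative share of institutions,
    cumulative share of articles), institutions sorted least to most
    productive, starting from (0.0, 0.0)."""
    ordered = sorted(counts)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, c in enumerate(ordered, start=1):
        running += c
        points.append((i / n, running / total))
    return points


def top_share(counts: List[int], fraction: float) -> float:
    """Share of all articles produced by the most productive `fraction`
    of institutions (group size rounded down to a whole institution)."""
    ordered = sorted(counts, reverse=True)
    k = int(len(ordered) * fraction)
    return sum(ordered[:k]) / sum(ordered)


if __name__ == "__main__":
    # Hypothetical article counts for 12 institutions (illustration only).
    counts = [120, 80, 60, 10, 5, 5, 4, 3, 2, 1, 1, 1]
    share = top_share(counts, 0.25)
    print(f"Top 25% of institutions: {share:.1%} of articles")  # prints 89.0%
```

With these made-up counts, the top three of twelve institutions account for 260 of 292 articles. Reading the same point off the Lorenz curve means taking one minus the curve’s height at 75% of institutions.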

The second figure shows the gender bias in elite publications that persists to this day. As we show, while PMLA and Representations have made real strides towards gender parity (something we attribute to their process of blind peer review), two of our four journals have not had a single year since their inception in which female authors outnumbered male authors. In a larger sample of twenty humanities journals taken over the past five years, we found that three quarters of them had average rates of female authors well below parity. Patronage and patrimony remain strongly linked in humanities publishing.

Percentage of female authors published per year in the four journals in our data set. In 1991, Representations became the first journal to surpass the 50% mark.
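A yearly percentage like the one charted above, and the first year a journal crosses the 50% mark, can be computed with a short sketch like the following. It assumes hypothetical per-year tallies of authors coded female and male; the function names and numbers are illustrative, not the study’s data or code. Note that it inherits the binary author coding the chart itself relies on, with the limitations discussed in “The Problem of (Gender) Binaries.”

```python
from typing import Dict, Optional, Tuple


def female_share_by_year(counts: Dict[int, Tuple[int, int]]) -> Dict[int, float]:
    """Map each year to the percentage of authors coded female.
    `counts` maps year -> (female_authors, male_authors); years with
    no authors are skipped to avoid dividing by zero."""
    return {
        year: 100.0 * f / (f + m)
        for year, (f, m) in sorted(counts.items())
        if f + m > 0
    }


def first_year_above_parity(shares: Dict[int, float]) -> Optional[int]:
    """Earliest year in which female authors outnumbered male authors
    (strictly above 50%), or None if that never happens."""
    for year in sorted(shares):
        if shares[year] > 50.0:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical tallies for one journal (illustration only).
    journal = {2000: (4, 16), 2001: (6, 14), 2002: (11, 9)}
    shares = female_share_by_year(journal)
    print(shares)  # prints {2000: 20.0, 2001: 30.0, 2002: 55.0}
    print(first_year_above_parity(shares))  # prints 2002
```

A journal that, like two of the four in our sample, never has a year above parity would return None here.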


Cultural Advocacy Internship – “Gender Bias in Book Reviews”

We are excited to announce the 2017-2018 Internship in Cultural Advocacy, focusing on gender bias in book reviews. The internship will address how women are both misrepresented and underrepresented in the public discourse of book reviewing. Book reviews are a significant cultural outlet that bestows authority, but as our lab’s new website, “Just Review,” shows, women writers are still being framed in a variety of ways as though they belong to a Victorian set of values. A team of interns will be responsible for crafting a year-long advocacy plan to address how book reviews represent women, using a combination of computational approaches, social media campaigns, and social advocacy to engage key stakeholders. We are looking for motivated, self-directed students who want to make a positive change in the world. The internship will begin on October 1, 2017, and end on April 30, 2018.

Award: $1,000
Application deadline: Wednesday, September 20, 2017
To apply, send a cover letter and resumé to