In their simplest form, computers work in binary. There are ones and there are zeros, and all the rest is context and combinations building more and more complex functions off of that binary. So it is maybe unsurprising that at .txtLAB, when we are dealing with complex entities like characters in a novel, we want to boil them down into binaries, too. Binaries are easy to analyze. And while I love the statistical and computational elegance of this sort of reductiveness, I worry about its implications.

The gender binary is perhaps the most common of these simplifications we fall into in our questions and models. Do women and men write differently? Are gendered pronouns or aliases positioned and patterned differently in literary texts? How do protagonists who are women function differently in their character networks than protagonists who are men? These are research questions we can ask to understand how women and men produce, and are produced within, cultural objects. But in doing so, this research falls into two potentially harmful traps. First, we buy into the gender binary, and second, we assume all bodies that fall within the category of “woman” or “man” experience that categorization in identical ways.

To the former, gender is not binary. Our models that measure gender do not account for trans/non-binary folks. This is a problem of data and representation. In terms of data, we simply do not have enough bodies in our corpus that we could classify as neither a woman nor a man to do rigorous statistical analysis on. It is not because we do not want to ask these difficult questions of our own methods; we want to challenge the binary as much as we’ve challenged the patriarchy, but we do not have enough data to do it. It’s a condition of the larger culture that we are trying to analyze. Why are there so few trans/non-binary authors and characters in our data set? Systemic oppression and invisibility of these bodies is part of this issue. We cannot study a facet of a cultural object that does not exist en masse. To challenge the binary computationally, we must first support the elevation of these voices culturally.

To the latter question of individual experiences within the binary, we are faced with another set of reductions. When we measure men and compare them to women, we are not taking into account any of the intersecting identities any individual within a category may carry. Issues like class, race, sexuality, and ability are critical controls that nuance the way gender oppression operates. Upper- and middle-class cis white women, for example, hold immense privileges that other women do not. We know this, and we have, in some of our research, worked to analyze how these multiple identities interact (forthcoming research, “Racial Lines”). But more needs to be done to articulate these intersectional identities. We’re trying to find ways to evolve our tools to get at both the issues within the binaries and the issues of the binaries themselves.

In the absence of both representation of non-binary folks in our data set and computational methods able to parse out differences within the binaries we use, should we still do this kind of gender research? It’s a question without an easy answer, because using our current methods, we consistently find troubling patterns of the overrepresentation of men relative to women, which itself needs dismantling, too. So how do we reconcile the benefits of continuing to use the gender binary to measure these biases with the cost of normalizing “men” and “women” as unique and uniform categories?