Detecting gender in 26,000 literary characters

Eve Kraicer and I have co-authored a new piece in The Journal of Cultural Analytics. It’s called “Social Characters” and looks at the distribution of gender among characters in a collection of ca. 1,300 contemporary novels. Where previous work has emphasized the growing gender parity with respect to authorship in the publishing industry, we wanted to test the composition of fictional characters. When we make people up, do we do so with any kind of regularity when it comes to their assessed gender?

A few things that I think are worth highlighting from the piece:

  • the distribution of gender across all characters is roughly 60:40, approximating the rule of 2-men-for-every-woman in the cultural sphere.
  • while women authors write more women characters than men authors do, they still write more men than women overall.
  • it is very rare to see three women interacting with each other in a transitive relationship (A is connected to B, who is connected to C, who is connected to A). Only about 18% of all transitive relationships in novels (N ≈ 50,000) are women-only.
  • we are the first (to my knowledge) to use Rogan and Gladen’s method to account for detection errors in this kind of analysis, i.e. we adjust our estimates of the share of women characters based on the estimated error rates of the gender detection. We see this as an important dimension to add to future cultural analysis.
  • we use the network measure of assortativity as a tool to study the “heteronormativity” of novels, i.e. the likelihood that men are paired with women (and vice versa) compared with same-sex pairings. When we do so, we see that mixed-gender interactions significantly exceed what we would expect under a random pairing model.
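
For readers who want a feel for the last two points, here is a minimal Python sketch of both ideas. It is not the code from the paper. The function names, the example sensitivity and specificity values, and the toy character network are all made up for illustration. The first function applies the standard Rogan-Gladen correction to an observed proportion; the second computes Newman's assortativity coefficient for a categorical attribute on an undirected network, where values below zero mean mixed-gender pairings occur more often than random pairing would predict.

```python
def rogan_gladen(apparent, sensitivity, specificity):
    """Correct an observed proportion for known detection error.

    apparent:    share of characters the detector labels as women
    sensitivity: share of true women the detector labels as women
    specificity: share of true non-women the detector labels as non-women
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("detector must perform better than chance")
    estimate = (apparent + specificity - 1.0) / denom
    # The raw estimate can fall slightly outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))


def gender_assortativity(edges, gender):
    """Newman's assortativity coefficient for a categorical attribute.

    edges:  list of (character, character) pairs, treated as undirected
    gender: dict mapping each character to a gender label
    Returns a value from -1 (only mixed pairs) to 1 (only same-gender pairs).
    """
    if not edges:
        raise ValueError("need at least one edge")
    n = len(edges)
    mixing = {}  # mixing[(g, h)] = fraction of edge ends joining g to h
    for u, v in edges:
        gu, gv = gender[u], gender[v]
        # Count each undirected edge half in each direction.
        mixing[(gu, gv)] = mixing.get((gu, gv), 0.0) + 0.5 / n
        mixing[(gv, gu)] = mixing.get((gv, gu), 0.0) + 0.5 / n
    labels = {g for pair in mixing for g in pair}
    # Row sums: share of edge ends attached to each gender.
    share = {g: sum(mixing.get((g, h), 0.0) for h in labels) for g in labels}
    expected_same = sum(s * s for s in share.values())
    observed_same = sum(mixing.get((g, g), 0.0) for g in labels)
    if expected_same == 1.0:
        raise ValueError("assortativity is undefined with a single gender")
    return (observed_same - expected_same) / (1.0 - expected_same)


# Hypothetical numbers, NOT taken from the paper.
corrected = rogan_gladen(apparent=0.35, sensitivity=0.80, specificity=0.95)
print(round(corrected, 2))

# Toy network: two women and two men, with mostly mixed-gender pairings.
toy_gender = {"Ada": "W", "Bea": "W", "Cal": "M", "Dan": "M"}
toy_edges = [("Ada", "Cal"), ("Ada", "Dan"), ("Bea", "Cal"), ("Ada", "Bea")]
print(gender_assortativity(toy_edges, toy_gender))
```

A negative result for the toy network reflects the same pattern described above: when men and women are paired with each other more often than with their own gender, assortativity by gender drops below zero.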