Modelling Plot: On the “conversional novel”

I am pleased to announce the acceptance of a new piece that will be appearing soon in New Literary History. In it, I explore techniques for identifying narratives of conversion in the modern novel in German, French and English. A great deal of new work has been circulating recently that addresses the question of plot structures within different genres and how we might or might not be able to model these computationally. My hope is that this piece offers a compelling new way of computationally studying different plot types and understanding their meaning within different genres.

Looking over recent work, in addition to Ben Schmidt’s original post examining plot “arcs” in TV shows using PCA, there have been posts by Ted Underwood and Matthew Jockers looking at novels, as well as a new piece in LLC that tries to identify plot units in fairy tales using the tools of natural language processing (frame nets and identity extraction). In this vein, my work offers an attempt to think about a single plot “type” (narrative conversion) and its role in the development of the novel over the long nineteenth century. How might we develop models that register the novel’s relationship to the narration of profound change, and how might such narratives be indicative of readerly investment? Is there something intrinsic, I have been asking myself, to the way novels ask us to commit to them? If so, does this have something to do with larger linguistic currents within them – not just a single line, passage, or character, or even something like “style” – but the way a greater shift of language over the course of the novel can be generative of affective states such as allegiance, belief or conviction? Can linguistic change, in other words, serve as an efficacious vehicle of readerly devotion?

While the full paper is available here, I wanted to post a distilled version of what I see as its primary findings. It’s a long essay that not only experiments with the project of modelling plot, but also reflects on the process of model building itself and its place within critical reading practices. In many ways, it’s a polemic against the unfortunate binariness that surrounds debates in our field right now (distant/close, surface/depth etc.). Instead, I want us to see how computational modelling is in many ways conversional in nature, if by that we understand it as a circular process of gradually approaching some imaginary, yet never attainable centre, one that oscillates between quantitative and qualitative stances (distant and close practices of reading).

This diagram captures the different stages of computational reading and the different types of practices each stage entails. Traditional close reading encompasses the first stage of “belief.” Current understandings of distant reading bring us as far as “measurement.” This model advocates for the continuation of the process in an oscillatory fashion, moving back and forth between close and distant forms of reading in order to approach an imaginary conceptual center. The initial sample (here Augustine’s Confessions) is chosen and understood with reference to a larger category (here “The Novel”), as is the new sample of quantitatively significant texts derived from the model (“Sample2”). “Sample2” is also mediated by the larger sample from which it is drawn (“Whole’”, here my subset of 450 novels that are representative of “The Novel”). The process of interpreting “Sample2” is both one of validation – did the model work – and also one of refinement – in what other ways can we understand and thus measure this group of texts? The overall process is represented as a spiral that does not return to the initial sample, but gradually, though never completely, converges on an imagined generic center.

My approach consisted of creating measures that tried to identify the degree of lexical binariness within a given text across two different, yet related dimensions. Drawing on the archetype of narrative conversion in Augustine’s Confessions, my belief was that “conversion” was something performed through lexical change: profound personal transformation required new ways of speaking. So my first measure looks at the degree of difference between the language of the first and second halves of a text, and the second compares the variation within each half (how much more heterogeneous one half is than the other). As I show, this does very well at accounting for Augustine’s source text, and interestingly it also highlights the ways in which novels appear to be far more binarily structured than autobiographies over the same time period. Counter to my initial assumptions, the ostensibly true narrative of a life exhibits a greater degree of narrative continuity than its fictional counterpart (even when we take into account factors such as point of view, length, time period, and gender).
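
The two measures can be sketched roughly as follows. This is a minimal illustration, not the code used in the paper: the function names, the choice of cosine distance over raw word counts, and the default of ten chunks are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from itertools import combinations, product


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def split_chunks(tokens, n_chunks):
    """Split a token list into n_chunks contiguous, near-equal parts."""
    size, extra = divmod(len(tokens), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(tokens[start:end])
        start = end
    return chunks


def cosine_distance(a, b):
    """Cosine distance between two word-count vectors (Counter objects)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean(values):
    """Arithmetic mean of an iterable, 0.0 if it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def conversion_scores(text, n_chunks=10):
    """Return (cross_half, in_half) for a text (hypothetical definitions).

    cross_half: mean distance between first-half and second-half chunks.
    in_half: mean distance within the second half minus that within the first.
    """
    chunks = [Counter(c) for c in split_chunks(tokenize(text), n_chunks)]
    first, second = chunks[: n_chunks // 2], chunks[n_chunks // 2:]
    cross_half = mean(cosine_distance(a, b) for a, b in product(first, second))
    within_first = mean(cosine_distance(a, b) for a, b in combinations(first, 2))
    within_second = mean(cosine_distance(a, b) for a, b in combinations(second, 2))
    return cross_half, within_second - within_first
```

Calling `conversion_scores(novel_text)` on the full text of a novel returns the pair of scores; `n_chunks` should be even and at least 4 so that each half contains more than one chunk. A positive second value means the second half is more heterogeneous than the first.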

Bivariate plot of autobiographies and novels in German according to the two conversion scores. While there is considerable overlap, in general novels pull upwards and towards the right, which indicates a higher score on both measures. Here are the results of comparing the two sets:

Measure      Novel Mean     Autobiography Mean   p-value (t.test)   p-value (wilcoxon)
CrossHalf    0.013847732    0.009992367          2.2e-16            2.2e-16
InHalf       0.001141110    0.000797757          4.03e-05           0.0003017

The second interesting finding is part of what I call the validation-discovery section of the paper. Assuming that there is a greater degree of “change” within novels, what are those novels about that exhibit abnormally high degrees of such change? Do they have anything to do with “conversion”? Below is a list of those novels that have significantly high conversion scores, which leaves us with something of an interesting literary historical riddle: what do Heidi, Lesabéndio, White Fang, Impey Barbicane, Barnabas Thayer, Poil de Carotte, and Kafka’s K. all have in common?

[Table: novels with significantly high conversion scores]

As I show in the paper, what is interesting about these novels is that each is marked by a high degree of binariness, yet such schismatic plotting is used for different ends. In some cases, as in the Heidi novels, the point is literal conversion: the Grandfather at the novel’s end is converted to Christianity and of course the Book. In Jack London’s White Fang, conversion isn’t religious, but social in nature: coming in from the Wild. In other cases, as in science fiction novels, the point is planetary escape, the experience of a highly transformative spatial rupture that poses fundamental communicative quandaries, as in Jules Verne’s De la Terre à la Lune. How can you communicate this new state to the planetary remainder? Finally, in the double marriage plot, we see how a strong degree of binariness is used to generate social order: the realignment of a misaligned quadrant into its respective halves. This can either take the form of resolution, as in Mary Freeman’s Pembroke, or one of constriction and loss, as in Fontane’s Irrungen, Wirrungen.

Beyond the individual cases, the main point is the way this method is linguistically neutral in its assumptions about conversion: it does not start with a given vocabulary of conversion (though I looked at that too), but rather looks only at the relationships between words within a given text. Such linguistic neutrality allows for the capturing of different kinds of conversional experience, but ones that nevertheless share certain structural properties. It allows us to see the continuities of difference that I think are computational reading’s hallmark.

As you’ll see in the piece, I offer 5 hypotheses on how to further measure this notion of conversionality in the novel. The validation of the model (close reading) is not an end in itself. Rather, it is the beginning of further testing (distant reading). By way of conclusion, I post these here:

Hypothesis 1: Conversional novels are defined by nature/culture dichotomies, where nature is a proxy for the divine. Create dictionaries based on these novels (consisting of words for “culture,” such as civilization, justice, reading etc. on the one hand, and words for “nature,” i.e. alps, trees, wilderness, etc., on the other) and measure their intensities. The stronger both vocabularies are, the more conversional a novel can be said to be.
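
A dictionary measure of this kind might be sketched as below. The seed words are the examples named in the hypothesis; the function names and the use of the minimum as a joint score are assumptions made for illustration, not a measure taken from the paper.

```python
import re
from collections import Counter

# Seed words taken from the hypothesis text. A real dictionary would be
# much larger and built from the high-scoring novels themselves.
CULTURE = {"civilization", "justice", "reading"}
NATURE = {"alps", "trees", "wilderness"}


def dictionary_intensity(text, dictionary):
    """Share of the text's word tokens that belong to the dictionary."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[w] for w in dictionary) / len(tokens)


def nature_culture_score(text):
    """Hypothetical joint score: the weaker of the two intensities.

    Taking the minimum rewards texts in which BOTH vocabularies are strong,
    which is one possible reading of the hypothesis.
    """
    return min(dictionary_intensity(text, CULTURE),
               dictionary_intensity(text, NATURE))
```

Other ways of combining the two intensities (a product, a sum) would rank novels differently, so the choice should be checked against the novels already identified as conversional.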

Hypothesis 2: Conversional novels are defined by a topos of incommunicability. Create measures of phrases that articulate a communicative impasse such as: a) subjunctive phrases like “als wäre” or “as though + verb”; or b) said + negation (“said nothing,” “did not say,” “could not say” etc.). Higher conditionality and higher negativity should correlate with greater conversionality and its incommunicability.
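
One way to approximate such a measure is with simple pattern counts. The sketch below covers only the phrases named in the hypothesis; the pattern names and the per-1,000-token normalisation are assumptions, and the subjunctive pattern does not check that a verb follows “as though”.

```python
import re

# Illustrative patterns, limited to the examples named in the hypothesis.
IMPASSE_PATTERNS = {
    "subjunctive": re.compile(r"\b(?:as though|als wäre)\b", re.IGNORECASE),
    "said_negation": re.compile(
        r"\b(?:said nothing|(?:did|could) not say)\b", re.IGNORECASE
    ),
}


def impasse_rates(text):
    """Occurrences of each pattern per 1,000 word tokens."""
    n_tokens = len(re.findall(r"[^\W\d_]+", text))
    if n_tokens == 0:
        return {name: 0.0 for name in IMPASSE_PATTERNS}
    return {
        name: 1000 * len(pattern.findall(text)) / n_tokens
        for name, pattern in IMPASSE_PATTERNS.items()
    }
```

Computing these rates for each novel would allow them to be correlated with the conversion scores, which is what the hypothesis predicts.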

Hypothesis 3a: Conversional novels are structured by strong binary geographies, which are marked by different ways of speaking. Using named entity recognition, do we find that names group into different lexical communities? The stronger the dichotomy between them (the clearer their difference), the more conversional a novel could be said to be.

Hypothesis 3b: Conversional novels are marked by an increase in polysemy over the course of the novel. Lexical reduction corresponds to semantic complexity. Could we create a measure that accounts for the semantic ambiguity of a text, the ways in which it becomes increasingly difficult to locate a word’s particular meaning? Implement a series of tools – part of speech tagging, machine translation – and see the degree to which they fail. Ambiguity should correlate with greater difficulty of automation and these values should subsequently increase as the novel progresses.

Hypothesis 4: Conversional novels are recursive. They recapitulate themselves as they progress, slowing themselves down as they expand internally. This is devotion as a form of imbrication (we cannot get out). Diegetic levels – narration within narration – should therefore increase over the course of the novel. There is also an aspect of social network analysis to this – the introduction of new characters retards rather than furthers narrative progression. Is there a correlation between the growth of characters, the growth of intra-diegetic narration, and the slowing down of plot?

Novel Conversions

 

What would it mean for a novel to turn us, even as we turn it? How are we not simply moved, but transformed – turned around, converted – through the novel’s combination of gestural and affective structures? How might we think about the correspondences between the novel’s technics and its tropes in its ability to assume meaning for us as a genre at a profound personal level?

The history of the novel, as Hans Blumenberg once remarked, has most often been read as an extended referendum on the Platonic notion that the poets lie. Major studies of the novel, from Auerbach’s mimesis to Barthes’ reality effect to contemporary interest in thing theory, reliably begin with the novel’s representation of an estranged reality as its primary generic identity. Novels are where we go to experience our alienation and therefore orientation to the social world. But rather than consider the history of the novel as an extended engagement with the problem of givenness, of das Gegebene – what Lukacs called “the immediate and unshatterable givenness of the world” – how might we construct a history of the novel’s performance of what we could call its devotionality, its Ergebenheit – the experience of giving ourselves completely and entirely over to it? What if novels are where we go to experience a sense of profound internal difference, not an estrangement from the world – a primordial Heimweh in Lukacs’ terms – but an experience of a completed identification with a world? What would such a conversional history of the novel look like?

In order to answer such questions, I have over the past year been constructing different computational models to think about the novel’s relationship to radical change, at both the narrative and semantic levels. Such thinking about the novel’s turning is of course deeply Augustinian in its lineages. The Confessions has served as a kind of foundational document in the Western tradition in establishing this conjunction of subjective transformation with the technological instrument of the codex. Indeed, this project initially took shape as an inquiry into the Augustinian legacy on modern autobiography after Rousseau. What remained, I was interested in asking, of this confessional archetype within an emerging commercial environment of life writing? And yet what I found over and over again was that according to the models I was building, Augustinian conversion lived on most strongly not in the genre of autobiography to which it belonged at a more general genealogical level, but instead in the genre of the novel. The novel, it seemed, was the genre where readers increasingly turned to experience conversion in a double sense – not just as a profound sense of change, but also as a form of devotion, as the experience of giving oneself over.

Overview

This project began with a graph.

[Figure: multi-dimensional scaling plot of the thirteen books of Augustine’s Confessions]

Here we see a plot of the thirteen books of Augustine’s Confessions in their discursive relation to each other using the method of multi-dimensional scaling, where spatial relations are based on the degree of lexical similarity between the units. There are two features that I want to draw attention to in this graph and that emerged as the basis of my subsequent models of what I am calling narrative conversionality. The first is the relative distance between the pre- and post-conversional books of the Confessions (Augustine’s conversion occurs towards the close of Book Eight). We can see how the distance between the halves, and in particular Books 10-13, is greater than most of the distances between the individual parts of each half. There is a strong dissimilarity, according to this graph, of language between pre- and post-conversional narration.

[Figure: second plot of the books of Augustine’s Confessions]

The second feature is the relative distance between the books within the pre- and post-conversional parts. We can see how the clustering of the pre-conversional books is significantly tighter than that of the post-conversional books. There is, in other words, a strong intra-discursive difference between pre- and post-conversional narration, or said another way, the language after conversion becomes far more heterogeneous than that before. Augustine not only speaks in very different terms before and after his conversion, but he also speaks increasingly differently after his conversion. According to this graph, conversion is an entry into discursive plenitude.

In the paper version of this post, I go on to discuss the series of tests that I developed to model these two features of “conversionality”, moving from k-means, to hierarchical clustering, to the use of distance matrices to identify the greatest degree of binariness within sets of works, as well as looking for the lexical presence of Augustine’s vocabulary of conversion (drawn from Book 8.12). I test these models against two primary corpora: 150 novels drawn over a 150-year period from 1774 to 1932 and a similar collection of autobiographies with the same range.

I conclude with a discussion of examples of those novels which seem to score particularly high on my tests and ask what a realist novel of marriage, one of the most famous children’s novels of all time (Heidi), an “asteroid novel,” and Kafka’s The Castle all have in common as distinctively conversional forms.

The exciting aspect of this work for me is the way it allows for new kinds of literary groupings to take place that register levels of formal complexity that we would otherwise not be able to capture through our normal reading practices. But it is also an occasion to reflect on the question of conversional reading more generally and its technological a priori. What happens to the history of this bibliographic model of attachment — forms of devotional reading, from Augustine to Kafka — under new technological conditions? How does the conversionality of computational modeling — its reliance on principles of translation and resolution rather than readerly revolution — impact the future of readerly devotion?