1000 Words

Lab member Fedor Karmanov has created a beautiful new project that combines machine vision, machine learning, and poetry. It is called “1,000 Words,” and takes the self-portraits of Van Gogh and generates poems based on the colours and items in the portrait. The poems consist of 10 lines randomly drawn from an archive of about 70,000 twentieth-century poems.

As we ask in the introduction to the piece: “Would this process of machines learning to see also help us as human beings see differently, and think about seeing differently?” It’s an example of the more creative use of algorithms, something that is equally important to our lab. While we often focus on the analytical functions of algorithms (identifying things like gender bias in book reviews, prestige bias in academic publishing, or nostalgia bias in prizewinning literature), it is important to think about the ways in which machines change our understanding of language, and, in this case, vision. These kinds of projects can tap into the chance encounter with words, but also into our curiosity about how machines focus on an image.

It is all part of a much bigger effort to better understand how we think with machines, rather than have them think for us.

When innovation isn’t

Having moved through two models of poetic careers — the compaction of Whitman and the expansion of Goethe — I thought I had found a third model in the case of William Wordsworth. Using the same measures as before, I found that it was Wordsworth’s middle period that registered as the most “innovative” or experimental. We can imagine how in this sense a poet grows into greater degrees of diverse writing styles — a sense of “maturity” — which gradually then compress as the poet ages. There is a strong biological model of rise and fall at work behind this theory.

For the purposes of this project I am understanding innovation primarily as a sense of linguistic diversity — the more a poet experiments with different vocabularies the more a particular period in his or her life can be thought of as a period of experimentation or innovation. That’s clearly just one way of thinking about periods and innovation of course, and I’m open to many more suggestions of how to think about this.

But what interested me about Wordsworth was the way the middle period appears at first glance to fit this model and then doesn’t. Here are the same measures used for each of the three periods. (Period 1 = 1787-1815; Period 2 = 1816-1832; Period 3 = 1833-1851). As you can see, the average lexical distance between works increases in the middle period, as does the overall diameter of the network, just as the transitivity (the number of closed loops) decreases.

WordsworthMeasures

There’s obviously lots to quibble with in terms of how to mark out the periods of Wordsworth’s writing, but I’ll leave that aside for now (suggestions welcome). But to tell Wordsworth scholars that the years 1816-1832 were a period marked by a great deal of innovation will seem deeply counter-intuitive. This is after all the era in his writing most defined by the ecclesiastical sonnets. Surely a turn to religion, and its formulaic writing, can’t be seen as innovative (I’m parodying here). If we plot those distances, however, we begin to see a different story.

Wordsworth_Plots_DistanceBetweenWorks_ByPeriod

What you can see happening in the middle period is the way the average lexical distance between works increases, but the variance shrinks (the standard deviation score that we saw above decreases in the middle period). That big clump in the middle, where the curve rises, consists of the ecclesiastical sonnets. So there is in fact a great deal of formal consolidation at work, but it takes place at an average distance that is greater than in the rest of the corpus.

It’s not that we should ignore the fact of an overall lexical heterogeneity, but that we need to see it in a particular light. There is a greater average difference between poems but also a greater degree of homogeneity to that difference.
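To make the distinction between average difference and homogeneity of difference concrete, here is a minimal sketch. The numbers are invented toy values, not Wordsworth's actual measures, and the helper name `summarize` is hypothetical. It only shows that a set of distances can have a higher mean and a lower standard deviation at the same time.

```python
from statistics import mean, pstdev

def summarize(distances):
    """Return (mean, population standard deviation) of a list of distances."""
    return mean(distances), pstdev(distances)

# Toy values for illustration only (not taken from the Wordsworth corpus).
early_period = [0.2, 0.5, 0.8, 0.3, 0.7]        # lower average, wide spread
middle_period = [0.68, 0.70, 0.72, 0.69, 0.71]  # higher average, tight spread

early_mean, early_sd = summarize(early_period)
middle_mean, middle_sd = summarize(middle_period)

# The "middle" list is further apart on average, yet more uniform.
print(middle_mean > early_mean, middle_sd < early_sd)
```

Read this way, a rising average with a shrinking spread describes works that are consistently far apart rather than erratically so.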

I’m not sure if innovative or experimental would be useful terms anymore here, but it does tell us something about the poet’s career and the different shape periods can take.

In my next post I’ll try a different method using community detection to model this idea of linguistic coherence as a marker of poetic period.

Blue Periods: On Aging and Writing

As a follow-up to my last post on Whitman, I wanted to explore more examples of how writing develops over the course of a poet’s career. As I wrote there, I’m interested in using network theory to better understand how a poet’s career might have a particular shape or orientation, indeed, how one might visualize the “career” itself. What are the connections between the highly local creative process of writing poems and a larger sense of the whole, of how one’s writing and a sense of one’s life line up?

If Whitman gives us one idea to work with — that the arc of the poetic corpus is towards consolidation, consensus, and compaction — I want to explore two other writers here, the Romantic poets J.W. Goethe and William Wordsworth. The point is not simply to find different types of careers, but also to understand how the patterns and groupings within a corpus work over time. To borrow Ted Underwood’s expression, not just how literary periods matter, but how periods within a person’s life matter, too.

The first example is the career of J.W. Goethe. Below you can see network graphs of his published poetry (I’ll return to the distinction about publication later). There is the full graph followed by his poetry broken down into the traditional three-part period scheme usually used to understand his career (early, middle, late). In addition, I’m including the measures used to calculate differences between the different period networks.

Goethe’s published poems color-coded by genre.
Goethe’s Early Period. Notice the strong affiliation with the bright green nodes, which are “Lieder.” The large purple node (his poem on the opening of the Ilmenau mine) indicates the beginnings of what will be a new period and cluster.
Goethe’s middle or “classical” period. This period is dominated by the Roman Elegies and the Venetian epigrams. Notice the very strong grouping of the poetry from this period in the lower left.
Goethe’s late period. It is longer and larger than the others, but also marked by a much higher degree of generic heterogeneity (see the measures below). The oriental translations from the West-East Divan are the blue nodes in the lower left corner that mark a group unto themselves.

The two salient points that these graphs illustrate are the formal expansions of the late period (the generic entropy, the greater irregularity of distances between works, the increasing diameter) but also just how compact the so-called classical or middle period is. There is a tremendous formal consolidation that occurs during this phase of Goethe’s career that makes the late opening out all the more emphatic. The number of closed loops decreases by close to 30%, the variance of distances between works increases by 50%, the avg. distance doubles, and the diameter is 2.5 times as wide. (It should be added that the avg. distance that matches the early period is very significant given the greater number of poems and thus the greater linguistic horizon that the late poems can connect with. The similarity of distance given the overall greater availability of language suggests a profound commitment to diversity.)

These points get even more interesting if we explore the periods as a longer horizon of time. What you see below is a graph of the distances in vocabulary between the poems for every connected pair of poems in the network. On one level, it reiterates the insight about that classical consolidation in the middle of Goethe’s career, the point at which we might say in more colloquial terms he has found his “voice.” (It’s the sag around poem 200.)

GoethePoetry_Plot_DistanceBetweenWorks

But actually, what seems most interesting to me about this graph are those spikes before the dips. In fact they often coincide with the *beginnings* of the different periods in Goethe’s life. If we plot it by differentiating between the three periods, we can see how each period is marked by an opening spike followed by a dip. In other words, one way to think about “periods” in a poet’s work is as moments of increased experimentation followed by lexical consolidation. This might give us one way of thinking about what we mean by “period”: not as something internally coherent, but quite the opposite, as that which is marked by a distinct rise and fall of dissimilarity. The truly interesting insight here is the way it appears that towards the end of Goethe’s life he was working his way towards a new period, as we see that distinct uptick of vocabulary change when he is in his late 70s. Faustian striving indeed.

The lexical difference between connected poems across Goethe’s corpus, color-coded by period.

So what we have so far are two different examples of how poets imagined their careers: Whitman moving ever more in the direction of closure and compaction, Goethe moving in the direction of expansion, one spiraling in, the other out. Wordsworth will give us a third model to work with, but for that I’ll create a separate post.

On Biopoetics: The Evolution of Whitman’s Leaves of Grass

What if the Leaves of Grass was instead called Webs of Grass?

I’ve created a series of network graphs that represent the multiple editions of Whitman’s Leaves of Grass, in which pages are represented as nodes and the edges represent the lexical similarities between them. The networks are drawn in an “evolutionary” way, so for each page that enters, the program measures the distance of the new page to all prior ones based on its vocabulary. It then selects the top 6 closest pages and continues on through the whole work. So what you see is an evolving web of relations between the pages of Whitman’s Leaves.
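The construction described above can be sketched roughly as follows. This is a minimal illustration, not the program used for the graphs: the post does not say which distance measure was used, so the cosine distance over raw word counts here is an assumption, as are the whitespace tokenization and the function names `cosine_distance` and `build_evolutionary_network`.

```python
from collections import Counter
from math import sqrt

def cosine_distance(a: Counter, b: Counter) -> float:
    """1 minus the cosine similarity of two word-count vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def build_evolutionary_network(pages, k=6):
    """Link each page, in order of entry, to its k closest earlier pages.

    Returns a list of (new_page, earlier_page, distance) edges, where pages
    are identified by their position in the input list.
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    vectors = [Counter(text.lower().split()) for text in pages]
    edges = []
    for i in range(1, len(vectors)):
        # Rank all earlier pages by their distance to the entering page.
        ranked = sorted(
            (cosine_distance(vectors[i], vectors[j]), j) for j in range(i)
        )
        for dist, j in ranked[:k]:
            edges.append((i, j, dist))
    return edges

# Tiny invented example; real input would be the text of each page.
pages = ["grass leaves grass", "leaves of grass", "ship captain sea", "sea ship"]
edges = build_evolutionary_network(pages, k=1)
```

Because each page can only connect backwards, the order of entry matters: the same pages in a different sequence would produce a different web.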

This project is part of a larger investigation into the relationship between human aging and poetic expression. Instead of the strong determinism of a lot of recent writing in literary evolution or biopoetics, I’m interested in seeing how writers’ work develops over time and whether there are consistencies between that development. Is there something common about the “career” or the “corpus” when taken as a whole or do we find that careers are more culturally conditioned, or more expressive of the writer’s personal poetic aims? When I was in graduate school I was trained to studiously avoid biographical criticism, for reasons that are in retrospect not entirely clear to me. It’s telling, in a sign of the times, that one of the greatest proponents of this doxa, Stephen Greenblatt, is finishing his career by writing biographies of poets. We know for example from the work of Ian Lancashire that large-scale linguistic patterns can indicate the onset of mental illness in writers. As he has shown, we can identify with a great deal of precision when Alzheimer’s emerges in writers, most often well before they are officially diagnosed. But aside from these more dramatic physiological states reflected in language, I’m interested in the extent to which there might be associations between how we write and how we age.

Whitman_1855_Evol_Page
Leaves of Grass (1855)
Leaves of Grass (1860)
Leaves of Grass (1891). Light to dark green are the earlier pages, pinks the next section, blues and purples the concluding section with the darkest nodes the final pages.

Network graphs can be hard to interpret visually, so in order to better understand these networks, I have initially created five different measures: the average lexical distance between pages, the overall diameter of the graph (the distance between the two furthest nodes), the average page range between any two connected nodes in the network, the robustness of each graph, which measures how many random nodes need to be removed before it breaks in half, i.e. how fragile it is, and finally transitivity, which measures the number of closed loops or triangles in the graph as a percentage of all connections. The higher the number the more closed loops you have, the lower the number the more openness you have.
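Four of these measures can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code behind the figures: edges are assumed to be (page, page, distance) triples, the diameter is counted in hops on an undirected, connected graph, and transitivity uses the standard definition (the share of connected triples that close into triangles). The robustness measure is left out because it depends on random node removal. All function names are hypothetical.

```python
from collections import defaultdict, deque

def adjacency(edges):
    """Undirected neighbor sets built from (u, v, distance) triples."""
    adj = defaultdict(set)
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def average_distance(edges):
    """Mean lexical distance over all connected pairs."""
    return sum(d for _, _, d in edges) / len(edges)

def average_page_range(edges):
    """Mean gap in page position between connected pages."""
    return sum(abs(u - v) for u, v, _ in edges) / len(edges)

def diameter(adj):
    """Longest shortest path in hops; assumes the graph is connected."""
    longest = 0
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from this source
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        longest = max(longest, max(seen.values()))
    return longest

def transitivity(adj):
    """Fraction of connected triples whose two end nodes are also linked."""
    closed = 0
    triples = 0
    for nbrs in adj.values():
        ordered = sorted(nbrs)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                triples += 1
                if b in adj[a]:
                    closed += 1
    return closed / triples if triples else 0.0

# Invented toy graph: a triangle (pages 0, 1, 2) with page 3 hanging off page 2.
edges = [(1, 0, 0.2), (2, 0, 0.4), (2, 1, 0.3), (3, 2, 0.5)]
adj = adjacency(edges)
print(average_distance(edges), average_page_range(edges), diameter(adj), transitivity(adj))
```

On the toy graph, three of the five connected triples close into the one triangle, so transitivity comes out at 0.6; removing the hanging page would raise it to 1.0, the fully closed case.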

Slide1

Slide2

What is interesting about these measures is the way they begin to tell a particular story about The Leaves of Grass as an evolving poetic entity. That story is, I would suggest, one of compaction and consolidation. Over time, The Leaves of Grass becomes less linguistically heterogeneous, curling in on itself, as the pages begin to connect to pages closer to themselves, as the lexical differences between connected pages decrease, as the overall range of the work shrinks, as the network gets more robust and thus tighter, and finally as the number of closed loops grows. There is a sense of closure to the poetic process of rewriting, which you can see in this next graph that captures the lexical distance between connected pages in the network of the final edition. There is a downward trend of difference within the final work, as the work itself recapitulates this larger process of consensus and closure.

Whitman_Plots_DistanceBetweenPagesGGPlot

If we compare those 1860 and 1891 networks above to each other, we see some confirmation of this insofar as there is something like a paginal arc evolving. Where in the 1860 edition we have what appear to me to be discrete blocks or zones, in the 1891 edition we move from the light green in the lower left, to the pinks in the upper left, to the blues in the right, and then to the very darkest nodes, which are the final pages of the work entering back into the heart of the graph, so that we could speak of the final edition of The Leaves of Grass as a poetic spiral. Incidentally, this is the Ur-Symbol of all plant life for Whitman’s predecessor Johann Wolfgang von Goethe.

Goethe, Spiral

The Poetic Body

What can knowledge of the shape of a poet’s corpus tell us about that writing or indeed about the nature of the corpus as such? In this project we model different poets’ corpuses in three languages from the seventeenth to the twentieth centuries as evolutionary networks. Our aim is to better understand how a body of writing evolves and the features poetry assumes at this larger scale, from work to corpus.