How Cultural Capital Works: Prizewinning Novels, Bestsellers, and the Time of Reading


This new essay published in Post45 is about the relationship between prizewinning novels and their economic counterparts, bestsellers. It is about the ways in which social distinction is symbolically manifested within the contemporary novel and how we read social difference through language. Not only can we observe very strong stylistic differences between bestselling and prizewinning writing, but this process of cultural distinction appears to revolve most strongly around the question of time. The high cultural work of prizewinning novels appears most strongly defined by an attention to childhood, nature, and retrospection, while the economic work of bestsellers is defined by a diligent attention to the moment. As the forthcoming work of James English has shown, it is these temporal frameworks, or what Bakhtin might have called “chronotopes,” that emerge as some of the more meaningful ways to distinguish the work of cultural capital from that of economic capital.

The approach we use draws on the emerging field of textual analytics within the framework of Bourdieu’s theory of the literary field. Our interest lies in exploring a larger population of works, but also the ways in which groups of works help to mutually define one another through their differences. As Bourdieu writes, “Only at the level of the field of positions is it possible to grasp both the generic interests associated with the fact of taking part in the game and the specific interests attached to different positions.” We wanted to test the extent to which “bestsellers” and “prizewinners” cohere as categories and how this coherence may be based on meaningful, and meaningfully distinguishing, textual features.

Our aim is to help us see the values and ideological investments that accrue around different cultural categories (the act of “position taking,” in Bourdieu’s words) and the ways in which these horizons of expectation help maintain positions of power and social hierarchy. Our project is ultimately about asking how forms of social and symbolic distinction correspond and how that knowledge may be used to critique normative assumptions about what counts as significant within the literary field. The high cultural investment in retrospection that we see on display should not be taken as a default but can also be seen in a critical light, as privileging a more regressive, backward-looking narrative mode.

For the full article, you can go here.

CBC interview on using algorithms to predict prizewinners and bestsellers

This past weekend I participated in an interview with Jeanette Kelly on the CBC to discuss our new work on using computers to predict bestsellers and prizewinning novels. In it I discuss the Devoir challenge in which local Quebec writers try to impersonate a bestseller using our data and our successful attempt at predicting this year’s Giller Prize winner that was foiled by my misjudgment of committee behaviour.

Here is a link to the interview.

Interview with BookNet Canada on algorithms, publishing and creative writing

I recently did a podcast with the BookNet group in Canada that focuses on the intersection of technology and books. They were interested in our research focusing on prizewinning and bestselling novels. My main emphasis in the discussion was to focus on the way computers can be useful for different kinds of audiences: for publishers to better understand the books they are selecting and marketing; for readers to better understand the books they want to enjoy but also engage with more critically and/or analytically; and for writers who want to use data to create new works that are aligned with existing markets in fresh and novel ways.

The .txtLAB Guide on How to Write Like a Bestseller

Here is a humble 1-page guideline that we produced after studying a sample of 10 years’ worth of bestselling novels from the NY Times Bestseller list. It was used as part of the Devoir Challenge, in which some local Montreal writers were asked to try to write stories “like an American bestseller.”

One of the most interesting things we found when we sampled this past year’s bestsellers was that nothing much seems to have changed. In fact, the only really strong difference we detected was more emphasis on technology (more texting, phones, email, laptops, photographs, screens, and video). At the same time, there was less bitterness, genuineness, learning, and faith, and sadly more murders, police, lawyers and detection.

One question we were left with is just how stable this vocabulary is over time. Do bestsellers really reflect their times, and if so, what is the relevant time-frame (a year, a decade, a generation)? Or maybe they just consist of a relatively consistent set of tropes (action, police procedures, etc) recycled into a variety of insignificant sub-plots. More work to be done there.

How to write like a Bestseller

***Things to focus on:

Try to use many more characters than normal (about 30% more per novel).

Try to use more dialogue, about 50% more than you would normally.

Try to focus more on people, pronouns and actions:

  • More than 50% of the unique grammatical patterns in Bestsellers involve proper names
  • This is another popular formulation: gerund – to – verb, as in “going to run”

Try to focus on the following themes:

  • police and law (investigate, gun, kill, shot, file, lawyer, evidence)
  • technology (phone, photo, cell, text, program, scan, camera, screen, tape, button, but not “telephone”, that is more indicative of serious books)
  • conflict oriented words (problem, challenge)
  • facial expressions (nod, frown, sigh, grin, blink)
  • simple actions (grab, rip, gasp, ring, shook, crash, pull, get)
  • greater certainty (absolutely, totally, especially)
  • oddities: pretty, coffee, showers, porches

***Things to avoid:

Try to avoid using sentences longer than 11 words on average.

Try to avoid over-emphasizing nouns instead of proper names, in other words, think people not things (or even worse, abstractions).

Also try to avoid using nouns around conjunctions.

Some of the more popular grammatical patterns of serious literature involve nouns, adjectives, prepositions and determiners (such as “every,” “few,” and “this”).

Try to avoid the following themes:

  • complex emotions (shame, weeping, pity, abandon)
  • nostalgia (children, childhood, mothers, fathers)
  • nature (sea, winter, trees, desert, branches, mountains, spring, clouds)
  • imagination (pretend, imagine, dream)
  • the act of writing (write, wrote, language, books)
  • tentativeness (sometimes, perhaps)

  • oddities: tea, coughing, meat, soap, socks
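For anyone who wants to check a draft against the guide, here is a minimal sketch of how a few of the rules above could be counted. It is not the lab's tool: the word sets are hypothetical samples drawn from the lists above, the sentence splitter is deliberately crude, and words are matched exactly rather than stemmed (so “grabbed” does not count as “grab”).

```python
import re

# Hypothetical word sets, sampled from the guide's own examples.
FAVOUR = {"phone", "text", "gun", "lawyer", "nod", "grab", "absolutely", "coffee"}
AVOID = {"shame", "childhood", "sea", "imagine", "perhaps", "sometimes", "tea"}

def sentences(text):
    """Split after ., ! or ? followed by whitespace (a rough approximation)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def words(text):
    """Lowercased word tokens; no stemming is applied in this sketch."""
    return re.findall(r"[a-z']+", text.lower())

def check_draft(text):
    """Report mean sentence length and how often favoured/avoided words occur."""
    sents = sentences(text)
    toks = words(text)
    return {
        "mean_sentence_length": len(toks) / len(sents) if sents else 0.0,
        "favoured_hits": sum(t in FAVOUR for t in toks),
        "avoided_hits": sum(t in AVOID for t in toks),
    }

draft = "She grabbed the phone. Nothing. Perhaps the lawyer had the gun."
report = check_draft(draft)
print(report)
```

A draft that follows the guide should keep the mean sentence length at or below about 11 words and score more favoured hits than avoided ones.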

The Devoir Challenge. How to write like an American Bestseller

When the books editor of Le Devoir, Catherine Lalonde, called to ask if my lab would supply a data-driven guide on how to write like a bestseller, I enthusiastically said yes. But I expected everyone else would say no. Surely writers will be allergic to data. And surely Quebecois and Canadian writers won’t want to write like an American bestseller! But this turns out not to be the case. The volunteers lined up, including this year’s Giller Prize winner. Here is a link to the final interview.

The reason I love this experiment is because it challenges our assumptions about data, creativity and culture. Understanding the tropes and tricks of bestselling writing offered a way for these writers to play with words and conventions. Writing a story, in this view, doesn’t start with the imaginary blank page (the way creative writing is often depicted in movies). Instead, it starts with explicit knowledge about how words always precede us before we begin to create something new. Data can be an instigation.

The same could be said of the cultural mash-up of asking francophone writers to write like American bestsellers. It’s an exercise in mental travel, something we do physically here all the time, since so many of us live so close to the U.S. border. These kinds of cultural border crossings are important. They are about trying to think our way into the conventions of other people. The world would be better off if all of us did this more often. For our part in the lab, we’re going to look at more than just U.S. bestsellers next time. What are the different popular cultures of reading that exist around the globe? This is something we want to know more about.

The results of the experiment have been delightful to read — funny, clever, urgent. They take some of the bestseller’s love of emergency and give it a thought-provoking spin. One is about a writer trying to break through the constraints of writing by talking to herself. One is about a girl storming her home after a terrible day like it’s Star Wars. Another reads like a classic mystery in miniature, wealthy manor and all. One is about a man shifting his gender towards being a woman, and finally, the most recent is a complex allegory about sheep and an obsession with coffee and lost property (“sheepish” is wryly translated by André Alexis literally into French as “moutonnière,” once again showing us his brilliant thinking through animals). Each story, in its own way, boils down to a sense of identity in peril, something out of kilter or uncertain. You can still hear the pulse of Quebec beneath the thrum of l’américaine.

But did they succeed, you might be asking yourself. For the curious, we went ahead and asked the computer to predict which of the five stories sounded most like an “American bestseller”. As you will see, three of the five stories succeeded, with “Annie courait” by Daniel Grenier the most likely to be a bestseller. This doesn’t mean the others aren’t excellent in their own way. It just means that M. Grenier was able to mimic the conventions in incredibly droll ways. Then again, this could be one test where failing is a good thing!

If we take a quick look at Grenier’s story, we see how he does all the right things. He focuses on body parts like heads and faces; he conveys a sense of urgency through phrases like “La porte allait se refermer d’une seconde à l’autre” or “Soudain, elle fut stoppée net dans sa fuite.” He uses a lot of dialogue and has short, choppy sentences (“Rien. Silence Radio.”). And of course, there is a gun.

But he also plays with these mundane rules, too. The dialogue is actually Annie talking to herself. And her obsession is with breaking through a door — the door of “8,000 signs,” which we gradually learn is the story she can’t finish. This is a story about constraint, the constraint of a newspaper imposing strict word limits, about being handed a list of do’s and don’ts that were generated by a computer, about all those little voices in our head telling us what we should do in life. “You are going to do more with less, Annie,” she says to herself pointing the gun at the table of multiple columns of the 8,000 signs.

This is the breakthrough we are all hoping for: the discovery of something new and exciting, more from less.

 

The Devoir Challenge

Story                        Author                 Score
Annie courait                Daniel Grenier         83%
On ne rit pas                Monique Proulx         65%
Millionnaire fauché          Stéphane Dompierre     33%
Les sécrétions magnifiques   Marie Hélène Poitras   71%
Au Mouton Grincheux          André Alexis           46%

* Each score is the probability the computer assigned to the story being a bestseller. Results are based on a sample of 44,270 passages of bestselling and random novels.

 

Quantifying the Weepy Bestseller

I have a new piece out in The New Republic. In a number of recent book reviews, literary critics and novelists arrive at the consensus that to be a great writer, one must avoid being “sentimental.” One famous novelist describes it as a “cardinal sin” of writing. But is it actually true? Using a computer science method called “sentiment analysis,” we tested this claim on a large corpus of novels from the early twentieth century to the present, and found the opposite. Writers who win book prizes and get reviewed in the New York Times are not any less sentimental than novelists who write popular fiction, such as romances or bestsellers. The only group for which this was not true was the 50 most canonical novels written since about 1950. Our analysis tells us that if you want to write one of the most important books of the next half century, then you should tone down the sentiment. But if you want to be reviewed in a major newspaper, sell books, or win prizes, go ahead and emote away.

But the larger point for us is the way our cultural taste-makers are often wrong or extremely biased in their assumptions about what matters. We found that a computer, ironically, can paint a more nuanced picture of what makes great literature.

Here is an excerpt:

If you want to be a great writer, should you withhold your sentimental tendencies? The answer for most critics and writers seems to be yes. Sentimentality is often seen as a useful way of distinguishing between serious literature and the not-so-serious, probably best-selling kind. “Sentimentality,” James Baldwin wrote, is “the ostentatious parading of excessive and spurious emotion…the mark of dishonesty, the inability to feel.” While sentimentality is false, grandiose, manipulative, and over-boiled, high literature is subtle, nuanced, cool, and true. As Roland Barthes, the dean of high cultural criticism, once remarked: “It is no longer the sexual which is indecent, it is the sentimental.” This sentiment (yes sentiment) has been around since at least the early twentieth century and is still a subject of debate in the review pages of numerous media outlets today. But is it true? Whether you are for subtlety or against sentimentality, is this a good way to think about writing your next novel?

Read more here.

Prizewinners versus Bestsellers. Timeless Reads or the Spotlight of Fame

This post is the first in a series by this year’s .txtLAB interns. It is authored by Eva Portelance.

Building Corpuses

The first step in our search for answers required that we build solid corpuses for comparison. The prizewinner (PW) corpus was selected from five main literary awards given in the United States, Canada and Britain. These were the National Book Awards, the PEN/Faulkner Award for Fiction, the Governor General’s Literary Award for Fiction, the Scotiabank Giller Prize and the Man Booker Prize; this last one also honours international authors who have been published in the United Kingdom. From these awards, all shortlisted books, including the winners, from the years 2005 to 2014 that were available as e-publications in Canada were selected. This amounted to 216 books. Publications that had won several prizes were added to the set only once. As for the bestseller (BS) corpus, the 200 most popular books from the New York Times Bestsellers list from 2008 to 2014 were selected. Popularity was defined by the number of weeks spent on the list. The additional criterion that the novels had to have been published after 2000 was also applied, to better match the publication dates of the PW.
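The two selection rules can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the titles and the numbers are all made up, and “published after 2000” is read here as a publication year of 2001 or later.

```python
def select_bestsellers(records, top=200, min_year=2001):
    """records: hypothetical (title, weeks_on_list, publication_year) tuples.

    Keep books published in min_year or later, rank them by weeks on the
    list, and return the titles of the `top` longest-running ones.
    """
    eligible = [r for r in records if r[2] >= min_year]
    eligible.sort(key=lambda r: r[1], reverse=True)
    return [r[0] for r in eligible[:top]]

def dedupe_shortlists(shortlists):
    """Merge several prize shortlists, keeping each title once (first seen wins)."""
    seen, merged = set(), []
    for shortlist in shortlists:
        for title in shortlist:
            if title not in seen:
                seen.add(title)
                merged.append(title)
    return merged

# Made-up records: book "B" is excluded because it predates 2001.
records = [("A", 40, 2009), ("B", 55, 1998), ("C", 12, 2011), ("D", 70, 2005)]
print(select_bestsellers(records, top=2))
print(dedupe_shortlists([["X", "Y"], ["Y", "Z"]]))
```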

Defining Dictionaries

With the corpuses created, we began testing different avenues in search of clues that could help us create a clearer picture of what made these groups distinct within their shared fictionality. The two sets were rather similar, but the most interesting differences seemed to lie in their distinct lexicons, suggesting different themes and approaches to written work in general. To illustrate these differences, we built dictionaries highlighting these themes and behaviours. The process that led to their creation was thorough and avoided subjective criteria as far as possible to ensure their validity. First, we ran a likelihood test, which creates a matrix of words common to a first set, that is, words that seem to be present throughout the corpus and are thus possibly representative of the set. This matrix is then cross-referenced with a second set so as to look only at words that are present in both corpuses, and a Wilcoxon Rank Sum test is used to rank and select the 400 most distinctive words, which in turn are likely to be indicative of characteristics of the first set. We ran the test in both directions, thereby creating a dictionary representing each of the corpuses. It is important to note that both sets were stripped of stop words and stemmed, so do not be surprised by the unconventional orthography or lack of inflection in the resulting dictionaries presented in the graphs below. The words used for the subsequent dictionaries investigating theme and language use were selected from these two resulting lists.
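To make the ranking step more concrete, here is a rough sketch of how a rank-sum comparison of word frequencies might look. It is not our actual pipeline. It assumes each book is reduced to per-word relative frequencies, uses the normal approximation to the Wilcoxon rank-sum statistic without a tie correction, tokenises on whitespace, and skips the stop-word removal and stemming described above. All function names and the toy texts are hypothetical.

```python
from collections import Counter
from math import sqrt

def relative_freqs(docs):
    """For each document, map word -> share of that document's tokens."""
    out = []
    for doc in docs:
        toks = doc.lower().split()
        out.append({w: c / len(toks) for w, c in Counter(toks).items()})
    return out

def rank_sum_z(xs, ys):
    """Wilcoxon rank-sum z-score for xs vs ys (average ranks for ties)."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n, m = len(xs), len(ys)
    return (w - n * (n + m + 1) / 2) / sqrt(n * m * (n + m + 1) / 12)

def distinctive_words(first, second, top=400):
    """Words shared by both corpora, ranked by how strongly they favour `first`."""
    f1, f2 = relative_freqs(first), relative_freqs(second)
    shared = set().union(*f1) & set().union(*f2)
    scored = sorted(
        ((rank_sum_z([d.get(w, 0.0) for d in f1],
                     [d.get(w, 0.0) for d in f2]), w) for w in shared),
        reverse=True,
    )
    return [w for z, w in scored[:top] if z > 0]

# Toy corpora, invented for illustration only.
pw = ["the sea in winter was the sea of childhood",
      "a winter sea and the mother",
      "the mother by the sea phone"]
bs = ["the phone rang and the gun fired",
      "a gun a phone the minute",
      "the phone and the gun sea"]
print(distinctive_words(pw, bs))
print(distinctive_words(bs, pw))
```

Running the function in both directions, as in the last two lines, mirrors the way one dictionary was built for each corpus. On the toy bestseller side, function words such as “the” and “and” slip into the list, which shows why stop words need to be removed first.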

Timelessness and Momentary

Recurring themes for the PW corpus seemed to be family, nature, sadness and spirituality: the key components of a good soul-searching endeavor. This concentration on nature also suggested the importance of descriptive passages. As for BS, the interesting sets of words that we explored were technology-related words and vernacular words. What piqued my curiosity the most, however, was not necessarily the distinct themes themselves, but rather the distinctive words used within similar categories. The example I will share here is that of time. PW seemed to use words that spoke of time in terms of visual cues or spatial relations, referencing the age of characters or the seasons, whereas the BS had words that were based on the factual nature of time, like “minute”, “hour” or “yesterday”. With this in mind, I found that the other key categories mentioned for PW could also be looked at in this light, as seeking universal and timeless values such as family and spirituality, whether they are discussed in a positive or a negative light. Those mentioned for BS speak of things in passing: technology and popular speech are always evolving and certainly do not represent language or ideas that are expected to withstand time, often expiring within a few years. These are things that readers will understand and breathe in the moment. To this extent, they propose a very different relation to time than do the writings in the PW corpus. They speak of momentary ideas, and if this also applies to their storylines, it would suggest events of ephemeral pleasure or pain, rather than contemplation.

Language and Thought

To generalise this idea even further, I question whether the use of language in PW and BS is indicative of different intuitions about language, and also about the world it chooses to describe. Whether something is well written is often judged largely by prescriptive rules, and thus there is less interest here in knowing what makes a good book. However, what is chosen to be written about, and the perspective used to do so, is anchored in descriptive thought processing. Therefore, I center my attention for further reflection on a new question: is the language used by the authors of the books in each of these two distinct sets indicative of a shared thought process or perspective on writing, or even on the world they choose to describe?

List of Graphs

Here are bar graphs representing each dictionary mentioned. The dictionaries were originally a little longer, but only the most indicative words were selected here. This selection was based on the sparsity level of each word within its most representative corpus, that is, on the share of that corpus's books in which the word appears. The data presented compares the scaled word count of each word across the two sets.
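The two quantities behind the graphs can be sketched as follows. This is an illustration, not our code: the share of books containing a word stands in for the sparsity level, and the scaling of counts is assumed here to be occurrences per 1,000 tokens, which may differ from the scaling actually used. The toy corpus is invented.

```python
def document_share(word, books):
    """Percentage of books (lists of tokens) that contain the word at least once."""
    return 100 * sum(word in book for book in books) / len(books)

def scaled_count(word, books):
    """Occurrences of the word per 1,000 tokens across the whole corpus (assumed scaling)."""
    total_tokens = sum(len(book) for book in books)
    return 1000 * sum(book.count(word) for book in books) / total_tokens

# Invented toy corpus of four very short "books".
toy_pw = [
    ["winter", "sea", "mother"],
    ["sea", "sea", "phone"],
    ["mother", "winter", "sea"],
    ["tree", "phone", "sea"],
]
print(document_share("sea", toy_pw), document_share("mother", toy_pw))
print(round(scaled_count("sea", toy_pw), 1))
```

A word that appears in nearly every book of one corpus while remaining rare in the other is the kind of word the graphs are meant to surface.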

 

The words chosen were present in between 95 and 100 percent of the books in the PW corpus, indicating very high relevance. It is interesting to note the lower presence of male characters in BS compared to their female counterparts.
The words chosen were present in between 75 and 100 percent of the books in the PW corpus, indicating very high relevance.
The words chosen were present in between 83 and 95 percent of the books in the PW corpus, indicating very high relevance.
The words chosen were present in between 70 and 90 percent of the books in the PW corpus, indicating very high relevance.
The words chosen were present in between 71 and 98 percent of the books in the BS corpus, indicating very high relevance, whereas in the PW corpus they appear in only 30 to 90 percent, a much less unified pattern.
These words are less common than those in the other sets; however, most are still very common in the BS and very uncommon in the PW, indicating a clear point of comparison. They were present in between 40 and 95 percent of the books in the BS corpus, whereas in the PW they appear in only 23 to 80 percent.