University of Valencia. Master's Degree in Language and Literature Research

Literary Digitalization

If we look for the opposite of the macro-level study of literature that grew out of Franco Moretti's work, we find philology, or close reading. Moretti defended a comparative reading of international literature in which the microscopic (figures, poems, books or individual authors) would have less importance than the macroscopic (genres or number of translations). However, over 15 years have passed since Moretti’s works were published, and the process of digitisation is unstoppable. Big Data “has changed our way of understanding the world”, as Jorge Carrión explained in his article “Literature and big data”, published by La Vanguardia.

13 September 2016

Carrión starts from Moretti’s books and his classes at Stanford, as well as from his publications in the New Left Review, where his “Conjectures on World Literature” (2000) described the speechless system created by modern novels. Moretti maintained that there was a correlation between the economy and literary forms. For some critics, however, Spanish American poetry, for example, does not fit this claim, and they therefore reject it.

To all of them Moretti dedicated “More Conjectures” (2003), which gathers the criticisms he had received, each with its corresponding response, as a way of reaffirming his position: “moving attention from isolated and presumably extraordinary texts to a great textual mass”.

In 2004, Google Print (soon to become Google Books) was presented at the Frankfurt Book Fair. This meant the creation of “a database which seemed to be designed to maximize the analysis and results of quantitative readings of literature after centuries of studies based on aloof and random concepts such as taste and quality”, wrote Carrión.

However, we have to be more precise because, as the journalist states, the big data revolution in the digital humanities lies not only in quantity but also in quality. This is not the quality of the texts themselves, but that of the patterns extracted from the data: “the interpretations created by our senses”, explained Carrión. Hence, we are dealing with a new way of building models and narratives that can be as good as, or even better than, those traditionally grounded in reflection, intuition or the linking of a limited number of personal or group readings.

For his part, Matthew L. Jockers (Moretti’s disciple) has cross-referenced some 3,592 texts published between 1780 and 1900 to discover the most influential authors. They turned out to be Jane Austen and Sir Walter Scott: “both in terms of stylistic resources and contagiousness of themes there was no other author who could hold an influence similar to theirs”, stated Carrión.

In this sense, we can also highlight Ryan Heuser and Long Le-Khac, who built another corpus of 19th-century novels (2,958 titles) for their research and determined that, as the century progressed, there was a rise in the number of terms indicating “action” and of those describing the human body (finger or face). This shift can be read as a trace of “the urbanisation process and the birth of the modern mass”, wrote the La Vanguardia journalist.

Likewise, this corpus-driven study of literature needs a new kind of literary researcher: someone with knowledge of computer science and mathematics. We should highlight the manifesto signed by 14 authors together with the Google Books team, entitled “Quantitative Analysis of Culture Using Millions of Digitized Books”, which talks about working in culturomics: the economy of culture or quantified culture, explained Carrión.

Google Ngram Viewer was born in the middle of these developments, and it allows readers to run their own statistical searches. “The words introduced are tracked and found in over five million books in English, Spanish, French, Russian, Chinese, German and Hebrew published between 1500 and 2008 and turned into a graph”, commented Carrión. However, the data are not the same as interpretations, which can take years to formulate.
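
To make the idea of tracking a word across dated books concrete, here is a minimal sketch in Python. It does not use Google's data or any Google interface: the four-sentence corpus, the `tokenize` helper and the `relative_frequency_by_year` function are hypothetical, invented only for illustration. The sketch counts how often a word appears in the texts of each year and divides by the total number of words for that year, which is roughly the kind of figure a frequency graph plots.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (publication year, text) pairs.
# A real study would work with thousands or millions of digitised books.
CORPUS = [
    (1800, "the hand of the lady trembled"),
    (1800, "virtue and honour guided the gentleman"),
    (1850, "he raised a finger and turned his face to the crowd"),
    (1850, "her face was pale and her finger cold"),
]


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency_by_year(corpus, word):
    """Return {year: occurrences of `word` / total tokens in that year}."""
    hits = Counter()    # occurrences of the word, per year
    totals = Counter()  # total number of tokens, per year
    target = word.lower()
    for year, text in corpus:
        tokens = tokenize(text)
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Years with no tokens at all get a frequency of 0.0 instead of an error.
    return {
        year: (hits[year] / totals[year]) if totals[year] else 0.0
        for year in sorted(totals)
    }


if __name__ == "__main__":
    for year, freq in relative_frequency_by_year(CORPUS, "face").items():
        print(year, round(freq, 4))
```

The output of such a count is only the raw fact. Deciding what a rise or fall in a word's frequency means remains the work of interpretation.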

In the end, this development is significant because both lexicon and rhetoric are treated statistically and can tip the scales when there are doubts about who really wrote a work; according to Carrión, such doubts can be resolved, for example, by searching for anachronisms. In the digital era, the flow of information is so great that it is hard to distinguish a new idea from the textuality that surrounds us. Hence, many new ways of working online have arisen since the turn of the century.