The Text(s) of Shakespeare: Back to Basics, Forward in the Digital Era

Jesús Tronch Pérez (Universitat de València)

29 November 2016

Universitat Jaume I

"If Early Modern plays continue to be read at all, the young reader’s first encounter with them will increasingly be through some mobile device."

Martin Mueller, 22 June 2013 post

Looking for the text of Hamlet or Macbeth on the Internet

First hits for Hamlet

> text of Moratín's 1798 translation in Wikisource (modernized text)

problems: "Éste sólo gusta de ver hablar" (Act II, Scene X). Also in cervantesvirtual.com

Compare facsimile of 1798 edition

> text in English from Wikisource: "the source document of this text is not known" (accessed 27 Nov 2016)

First hits for Macbeth

> Macbeth in Wikisource (First Folio transcription)

problems: verse as prose with oddly capitalized words

compare facsimile (indicated in sources: )

> Macbeth in The Shakespeare Project

Source of the text: intro > Moby(TM) at MIT (index page)

"Moby Shakespeare" (Ward's Moby project, apparently based on the 1866 Globe edition [Johnson])

Back to basics

Textual criticism - a short introduction

Forward in the Digital Era

electronic medium: what is an electronic text?

What's behind an electronic text?

The text of play or poem by Shakespeare is a sequence of symbols, an arrangement of words (made up of letters) and punctuation marks, recorded in a given document (paper, electronic). These symbols are inscribed instructions for a reader or receiver to process that information, to decode that text in order to understand the message.

While we can read the letter "i", a sign in the Roman alphabet code, computers don't read the letter "i". Almost all modern computers use the binary numeral system to encode information, and so they need to have the letter "i" encoded as a given sequence of 0s and 1s, such as "110 1001" in ASCII code.
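
As a minimal sketch of this encoding, the following Python lines show the number and the 7-bit pattern that ASCII assigns to the letter "i", and how the pattern is decoded back into the letter. The variable names are illustrative only.

```python
# ASCII assigns each character a number; the computer stores that number in binary.
letter = "i"
code_point = ord(letter)            # the number assigned to "i"
bits = format(code_point, "07b")    # the same number written as 7 binary digits

print(letter, code_point, bits)

# Decoding reverses the process: binary digits -> number -> character.
decoded = chr(int(bits, 2))
print(decoded)
```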

Further reading: Hugh Cayless' "Character Encoding and TEI XML"

Computers' processing of information is called "computation" or "calculation"

Calculation = "deliberate process for transforming one or more inputs into one or more results, with variable change" ("calculation" Wikipedia)

In computing, this process involves a step-by-step procedure, that is, an algorithm = "systematic procedure that produces—in a finite number of steps—the answer to a question or the solution of a problem." ("algorithm." Encyclopædia Britannica Online Academic Edition. Web. 01 Mar. 2013)

Before we discuss what sort of problems we can deal with in the study of Shakespeare through electronic texts, we must point out the obvious fact that computers need the text of the poem or play, etc., that is, an "input" in electronic form.

When we produce or read a text, we not only encode or decode the signs such as words and punctuation marks, but also apply cultural codes (mostly typographical conventions) that tell how the text is structured.

When we read the following text:

" PEANUT-BUTTER ON A SPOON

peanut-butter

Stick a spoon on a jar of peanut-butter, scoop and pull out a big glob of peanut-butter "

we interpret its typographical features and content as belonging to the text genre of "recipe", as we observe that this text is organized into a title, followed by the list of ingredients, and then instructions.

Now, computers can only read a text as a sequence of signs, a string of symbols. If the "input" were just the text, the computer would only read "peanut-butteronaspoonpeanut-butterstickaspoononajarofpeanut-butter,scoopandpulloutabigglobofpeanut-butter".

The input needs to be encoded in such a way that the computer distinguishes sections of the text and their structure.

For this, computers need a markup language.

Markup language =

"A markup language is a modern system for annotating a document in a way that is syntactically distinguishable from the text. [ . . . ] Examples are typesetting instructions such as those found in troff, TeX and LaTeX, or structural markers such as XML tags. Markup instructs the software displaying the text to carry out appropriate actions, but is omitted from the version of the text that is displayed to users. Some markup languages, such as HTML, have pre-defined presentation semantics, meaning that their specification prescribes how the structured data are to be presented; others, such as XML, do not." ("Markup language" Wikipedia)

SGML encoding of Literature Online

XML (entry in Wikipedia )

HTML (entry in Wikipedia)

LaTeX (explanation in Wikipedia)
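
To make the idea of structural markup concrete, here is a minimal sketch that wraps the peanut-butter recipe in XML tags and lets Python's standard library tell the title, the ingredients and the instructions apart. The tag names (recipe, title, ingredient, instructions) are invented for this illustration and do not belong to any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag names below are invented for this illustration.
recipe_xml = """
<recipe>
  <title>PEANUT-BUTTER ON A SPOON</title>
  <ingredients>
    <ingredient>peanut-butter</ingredient>
  </ingredients>
  <instructions>Stick a spoon on a jar of peanut-butter, scoop and pull out a big glob of peanut-butter</instructions>
</recipe>
"""

# Parse the string into a tree of elements.
root = ET.fromstring(recipe_xml)

# Each part of the text can now be retrieved by its role, not by its position.
title = root.findtext("title")
ingredients = [item.text for item in root.iter("ingredient")]
instructions = root.findtext("instructions")

print("Title:", title)
print("Ingredients:", ingredients)
print("Instructions:", instructions)
```

Without the tags, the program would have no way of knowing where the title ends and the list of ingredients begins.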

What if we try to use the same encoding conventions to make texts machine-readable?

TEI Text Encoding Initiative

TEI by Example

"Character Encoding and TEI XML"

An example of a Shakespeare text in TEI-conformant XML: Hamlet from Folger Digital Texts
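
The following sketch suggests how a program might read a speech encoded with TEI-style elements (sp for a speech, speaker for the speech prefix, l for a verse line). The fragment is a simplified assumption written for this handout: it omits the TEI namespace and the word-level tagging, uses a common modernized wording, and is not the actual Folger Digital Texts encoding.

```python
import xml.etree.ElementTree as ET

# Simplified TEI-style fragment (no namespace, no word-level tagging);
# it is NOT the actual Folger Digital Texts encoding.
fragment = """
<sp who="#Hamlet">
  <speaker>HAMLET</speaker>
  <l>To be, or not to be, that is the question:</l>
  <l>Whether 'tis nobler in the mind to suffer</l>
</sp>
"""

speech = ET.fromstring(fragment)

# The markup separates the speech prefix from the verse lines.
speaker = speech.findtext("speaker")
lines = [line.text for line in speech.findall("l")]

print(speaker, "speaks", len(lines), "verse lines")
for number, text in enumerate(lines, start=1):
    print(number, text)
```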

What sort of PROBLEMS can we have in the study of Shakespeare through the texts?

For instance, in an analysis of Frankenstein we may be interested in the contrast between positive terms related to "happiness" and "beauty" and negative terms related to "darkness". Robert Harris suggests the following question: "How do the passages relating to happiness and beauty fall within the novel, and how do they occur in relation to those of darkness?"

To answer that we need to "Look for such terms as lovely, beautiful, beauty, pleasure, delight, joy, delighted, enchanted, gay, happy and contrast them with terms such as ugly, disconsolate, melancholy." ("Ideas")
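
A search of this kind could be sketched as follows. The term lists are abridged from Harris's suggestion; the function name locate_terms and the sample sentence are invented for illustration (a real analysis would load the full text of the novel).

```python
import re

# Term lists abridged from Harris's suggestion.
POSITIVE = {"lovely", "beautiful", "beauty", "pleasure", "delight", "joy", "happy"}
NEGATIVE = {"ugly", "disconsolate", "melancholy"}

def locate_terms(text, terms):
    """Return (word_position, word) pairs for every occurrence of a listed term."""
    words = re.findall(r"[a-z']+", text.lower())
    return [(position, word) for position, word in enumerate(words) if word in terms]

# Invented sample sentence; a real analysis would load the full novel.
sample = "The lovely valley gave me joy, yet a melancholy thought made the scene ugly."

print("positive:", locate_terms(sample, POSITIVE))
print("negative:", locate_terms(sample, NEGATIVE))
```

Comparing the word positions of the two lists shows where positive and negative terms fall in relation to each other.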

How can specific tools, such as computers, assist me in solving these problems?

Computers can perform certain tasks more efficiently than humans can: for instance, search and find text or patterns

In order to have computers perform a given task, humans need to give them instructions

Readers understand a text because they share the code (the language, e.g. English) with the sender or writer, as well as its writing conventions (instructions). Thus humans can read the letter "i", a sign in the Roman alphabet code, and understand "I" as the first-person pronoun because they learned that. Computers "learn" as long as they are given instructions.

Instructions can be communicated to computers by means of a programming language. A sequence or collection of such instructions is a computer program.

Basic computer programs to work with texts are text editors and word processors.

Text editor: Bluefish (explanation in Wikipedia) | Word processor: Writer in LibreOffice

Robert Harris's 1994 article explains ways of using word processors in literary analysis:

- finding occurrences and themes

- testing theories or claims

- discovering thematic shapes

- performing specialized searches

Text Analysis Tools

With electronic texts, it is easier to find and study patterns in texts, see words and phrases in context, examine their frequencies, analyse word clusters, obtain concordances, etc.
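
Seeing a word in context is what a concordancer does with a keyword-in-context (KWIC) display. The sketch below is a bare-bones version under simple assumptions: words are runs of letters and apostrophes, and the function name kwic and the width parameter are invented for this example.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context rows: up to `width` words on each side of every hit."""
    words = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((left, word, right))
    return rows

sample = "To be, or not to be, that is the question"

# Print each hit with its left context right-aligned, as concordances usually do.
for left, hit, right in kwic(sample, "be"):
    print(f"{left:>20} [{hit}] {right}")
```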

Text analysis software listed in DiRT (Digital Research Tools) and in "Text mining" Project Bamboo

For other digital research tools, see Project Bamboo

Concordances

Use of concordances in the study of language and literature: "Concordance (publishing)" Wikipedia

For instance, in an analysis of Frankenstein, Robert Harris suggests running the novel through a concordance program in order to "determine the frequencies of word occurrence. After the common "glue words" (such as the, of, is, and, etc.), what words are used most often, and of what kind are they? Approach the meaning of "kind" from several aspects. That is, are they nouns, verbs, or adjectives? Are they tonal words such as dark or gloom? Are they words conveying excess of a quality rather than moderateness (such as terror rather than fear, elated rather than happy)? And of course, so what?" ("Ideas")
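
A frequency count of this kind can be sketched with Python's standard library. The list of glue words is a short illustrative assumption (real concordance programs use much longer lists), and the sample sentence is invented, not taken from the novel.

```python
import re
from collections import Counter

# A small, illustrative list of glue words; real tools use much longer ones.
GLUE_WORDS = {"the", "of", "is", "and", "a", "to", "in", "or", "that", "not"}

def content_word_frequencies(text, top=5):
    """Count every word that is not a glue word and return the `top` most frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in GLUE_WORDS)
    return counts.most_common(top)

# Invented sample sentence; a real analysis would load the full novel.
sample = "Terror and gloom filled the dark night, and the dark gloom bred terror."

for word, count in content_word_frequencies(sample, top=3):
    print(word, count)
```

The counts only answer the first of Harris's questions; deciding what "kind" the frequent words are, and "so what", remains the reader's task.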

Software:

AntConc (freeware concordance programme) | Concordance

see "Concordancer" entry in Wikipedia

Open Source Shakespeare

WordCruncher : an electronic text viewer with several tools with which one can look for specific references, search for words or phrases, perform simple word frequency distribution studies, study multiple languages in synchronized windows, and see various reports (e.g., collocation, vocabulary dispersion, vocabulary usage)

WordSmith Tools (concordance, key words, word list alphabetical and frequency)

Word Count Tools (identifies unique words, difficult words; average sentence length, readability, keyword density, estimated reading time)

Wordcounter (shows top 10 keywords)

Visualization tools

TokenX | Tagline Generator | ManyEyes

TagCrowd | Wordle | ManiWordle | WordItOut (word cloud generator) |

TextArc | TextArc Hamlet

Facsimiles:

Texts in quarto:
Texts from the First Folio (1623):

Electronic transcriptions and editions of "readable" text:

Hylton, Jeremy's site at MIT (based on Moby)

Open Source Shakespeare by Eric Johnson (based on Moby, with further corrections from the 1866 Globe edition)

WordHoard Shakespeare edited by Craig Berry, Martin Mueller, and Clifford Wulfman (Tufts University and Northwestern University) (text derived from the Globe edition)

Internet Shakespeare Editions - Hamlet

Folger Digital Texts ("download" section: see formats)

ebook version of Folger Shakespeare Library edition

MorphAdorner

"... digital texts can be enriched with levels of metadata that support new forms of corpus-wide exploration. You need to do a lot “to” texts before you can do these new things “with” them, but once you have done them, there are oceans of opportunity."

Martin Mueller, 22 June 2013 post

Works cited and consulted

Best, Michael. “Shakespeare and the Electronic Text.” Blackwell Concise Companion to Shakespeare and the Text. Blackwell, 2007.

Harris, Robert. "Ideas for Analyzing Frankenstein." VirtualSalt. 18 Jun 2000. Web. 1 Mar 2013. http://www.virtualsalt.com/lit/franidea.htm

Harris, Robert. "The Personal Computer as a Tool for Student Literary Analysis." VirtualSalt. 30 Dec 1994. Web. 1 Mar 2013. http://www.virtualsalt.com/comptool.htm

Johnson, Eric. “Introduction: The History of Open Source Shakespeare.” Open Source Shakespeare. http://www.opensourceshakespeare.org/info/aboutsite.php (accessed 27 Nov 2016)

Jones, Sarah. "When Computers Read: Literary Analysis and Digital Technology." ASIS&T Bulletin (April/May 2012).

Lancashire, Ian. “The Public-Domain Shakespeare.” A paper delivered at the Modern Language Association, New York, December 29, 1992. http://www.library.utoronto.ca/utel/ret/mla1292.html