Electronic Jane Austen and S. T. Coleridge

by

Eric Johnson

Version 2.0 of an article first published in
TEXT Technology, 4.2 (Summer 1994), 93-100.

It is almost impossible to imagine a kind of research about Jane Austen or Samuel Taylor Coleridge that could not be assisted by employing electronic (machine-readable) versions of their works; teaching, too, might profit from using electronic editions. The Oxford Electronic Text Library edition of The Complete Works of Jane Austen (OETL Austen) and The Oxford Electronic Text Library edition of The Poetical Works of Samuel Taylor Coleridge (OETL STC) are admirable electronic texts that modern scholars and teachers should be using.

Both the OETL Austen and the OETL STC are based on standard texts and provide useful information in an accessible format. The OETL Austen is a rendering of R. W. Chapman's Oxford Illustrated Jane Austen; the OETL STC contains the text of Ernest Hartley Coleridge's Coleridge: Poetical Works. They include valuable information encoded in Standard Generalized Markup Language (SGML). They are distributed in both MS-DOS and Macintosh formats. Site licenses are available, and thus the files can be processed by multiple users on a local area network. These texts will be employed in a multitude of ways by many students of Austen and Coleridge for years to come.

The OETL files contain standard ASCII text together with SGML tags and references (more about which in a moment). The tags and references can be removed, and the necessary substitutions made, to format the text into a readable form. The result of such formatting for the last sixteen lines (plus Coleridge's last marginal gloss -- see note 1) of The Rime of the Ancient Mariner is shown in Figure 1.
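The formatting step can be illustrated with a minimal Python sketch. It is not the SPITBOL program used for this review, and the tag name, the "n" attribute, and the entity table below are assumptions made only for illustration; the real OETL files define their own tags and references.

```python
import re

# Matches any SGML-style tag such as <l n=616> or </l>.
TAG = re.compile(r"<[^>]+>")

# Hypothetical entity references and their plain-text substitutions.
# "&amp;" is handled last so that its output is never re-interpreted.
ENTITIES = {"&mdash;": "--", "&amp;": "&"}


def format_text(marked_up):
    """Remove tags, then substitute entity references."""
    text = TAG.sub("", marked_up)
    for reference, replacement in ENTITIES.items():
        text = text.replace(reference, replacement)
    return text


# Two lines of The Rime wrapped in invented line tags.
sample = (
    "<l n=616>For the dear God who loveth us,</l>\n"
    "<l n=617>He made and loveth all.</l>"
)
print(format_text(sample))
```

A real formatter would also need rules for indentation, stanza breaks, and marginal glosses, which depend on the actual encoding.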

The formatted version of the text is useful to scholars and students since they can load large or small parts of it directly into most word processors. A writer who wants to quote the well-known stated moral of The Rime can paste the lines into a document with a few keystrokes or mouse clicks:

     He prayeth well, who loveth well
     Both man and bird and beast.

     He prayeth best, who loveth best
     All things both great and small;
     For the dear God who loveth us,
     He made and loveth all.

It is thus easier to write a paper or to construct a test. The advantage of electronically inserting part of one document into another is not only the saving of time and effort, but the assurance of accuracy; a writer on The Rime can confidently assert that all of the included quotations conform to Ernest Hartley Coleridge's edition.

The formatted version of the text can also be searched using a word processor or other widely available software. In this way, a student can find the context of each of the five occurrences of "loveth" in The Rime and may note that all five are at the very end of the poem. The problem, however, with searching a formatted text is that it is difficult to identify just where in a text an occurrence is found, and it is nearly impossible to catalog multiple occurrences spread over many pages.
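A search of this kind can be sketched in a few lines of Python. The text is the closing gloss and two stanzas reproduced in Figure 1; the function name is invented for illustration. Note that the only "location" such a search can report is a position within the formatted file, which is exactly the limitation described above.

```python
import re

FORMATTED = """\
And to teach,
by his own
example, love
and reverence
to all things
that God made
and loveth.

Farewell, farewell! but this I tell
To thee, thou Wedding-Guest!
He prayeth well, who loveth well
Both man and bird and beast.

He prayeth best, who loveth best
All things both great and small;
For the dear God who loveth us,
He made and loveth all.
"""


def find_word(text, word):
    """Return (file line number, line) for each line containing the word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


for number, line in find_word(FORMATTED, "loveth"):
    print(number, line)
```

The numbers printed are offsets in this excerpt, not the line numbers of the poem or the pages of the printed edition.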

The OETL Austen and the OETL STC contain SGML tags that indicate the divisions and parts of documents: volumes, chapters, pages, stanzas, lines, and so on. Therefore, using the full OETL texts, a researcher can search for specific words, and when they are located, record the references, and thus produce a selected (or complete) index.

Robert Penn Warren has said that in The Rime, "the good events take place under the aegis of the moon, the bad events under that of the sun" (see note 2). As a start in examining whether such symbolism runs throughout Coleridge's poetry, a student might want a listing stating the location of occurrences of the words "moon" and "sun" in the poems. Figure 2 furnishes exactly that: it gives a short title, the page number, and the line number for every occurrence of the selected words.
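How such an index can be built from tagged text is sketched below in Python. The markup shown (a poem tag with a title, a page-break tag, and numbered line tags) is a simplified assumption and not the actual OETL encoding; the program that produced Figure 2 was written in SPITBOL. The sample lines and their references follow the Ancient Mariner entries in Figure 2.

```python
import re
from collections import defaultdict

# Hypothetical tags: <poem title="...">, <pb n=PAGE>, <l n=LINE>text</l>.
TOKEN = re.compile(
    r'<poem title="([^"]*)">|<pb n=(\d+)>|<l n=(\d+)>(.*?)</l>'
)


def build_index(sgml, words):
    """Map each wanted word to a list of (title, page, line) references."""
    wanted = {word.lower() for word in words}
    index = defaultdict(list)
    title, page = None, None
    for match in TOKEN.finditer(sgml):
        if match.group(1) is not None:      # start of a new poem
            title = match.group(1)
        elif match.group(2) is not None:    # page break
            page = int(match.group(2))
        else:                               # a line of verse
            line_number = int(match.group(3))
            for token in re.findall(r"[a-z]+", match.group(4).lower()):
                if token in wanted:
                    index[token].append((title, page, line_number))
    return dict(index)


sample = """\
<poem title="Ancient Mariner">
<pb n=187>
<l n=25>The Sun came up upon the left,</l>
<pb n=190>
<l n=112>The bloody Sun, at noon,</l>
<l n=113>Right up above the mast did stand,</l>
<l n=114>No bigger than the Moon.</l>
"""

index = build_index(sample, ["sun", "moon"])
print(index["sun"])   # [('Ancient Mariner', 187, 25), ('Ancient Mariner', 190, 112)]
print(index["moon"])  # [('Ancient Mariner', 190, 114)]
```

Because the references are read from the tags rather than from positions in a formatted file, the same approach scales to occurrences spread over many pages.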

The OETL Austen provides additional encoded information. Tags in the OETL Austen indicate direct (and indirect) quotation, and the speaker for each. Therefore, a researcher can isolate and analyze the dialogue of each character (see note 3). It is interesting to compare how much the heroine of Emma and the heroine of Pride and Prejudice talk: Emma Woodhouse speaks about 27 percent of the words of dialogue, and Elizabeth Bennet speaks about 15 percent. Elizabeth speaks only about one-third as much as the narrator in her novel, but Emma talks almost as much as the narrator in hers.
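A count of this kind might be sketched as follows in Python. The quotation tag and its "who" attribute are assumed for illustration and are not the actual OETL Austen encoding; the sample text is a placeholder, not Austen's.

```python
import re
from collections import Counter

# Hypothetical tag: <q who="SPEAKER">speech</q>; all else is narrative.
QUOTE = re.compile(r'<q who="([^"]+)">(.*?)</q>', re.DOTALL)
INNER_TAG = re.compile(r"<[^>]+>")


def dialogue_shares(sgml):
    """Return each speaker's percentage of all words of dialogue."""
    counts = Counter()
    for speaker, speech in QUOTE.findall(sgml):
        # Drop any tags nested inside the speech before counting words.
        counts[speaker] += len(INNER_TAG.sub(" ", speech).split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {speaker: 100.0 * n / total for speaker, n in counts.items()}


sample = (
    'Narration here. <q who="A">one two three</q> '
    'More narration. <q who="B">four</q>'
)
print(dialogue_shares(sample))  # {'A': 75.0, 'B': 25.0}
```

Comparing a character with the narrator would require counting the words outside the quotation tags as well.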

It can be helpful to a scholar to know how many nouns, verbs, adjectives, and so on are in a text, and to observe how they are used. Of course, a form such as "thought" commonly functions as a noun ("Emma would speak every thought") and as a verb ("Emma thought of her friend"). Therefore, the OETL Austen texts contain reference codes to identify the function of homographic forms such as "thought." With such references to distinguish forms, it is possible to make counts. Although the first chapters of all Austen novels are similar, the first chapter of Sense and Sensibility has a higher percentage of nouns and a lower percentage of verbs than the first chapter of Mansfield Park; they have nearly identical percentages of prepositions and pronouns.

Thus, researchers can use the OETL Austen to collect the dialogue of each speaker, and they then can analyze occurrences of specific words and phrases as well as the word forms used by each in order to differentiate and characterize a wide range of Austen's speakers. An index of occurrences of words (like that for Coleridge shown in Figure 2) could be made for Austen's novels, and it could be restricted to those occurrences in Emma's dialogue -- or to those of any specific characters. It might be found that some characters use far more adjectives and adverbs than others. Such analyses can be more sophisticated and far more detailed than would be feasible without the OETL texts.

The OETL STC was prepared by David Miall, Tim McRory, and Susan Fisher; it was encoded in SGML by Alan Morrison and Lou Burnard. The electronic edition upon which the OETL Austen is based was prepared by John Burrows and Alexis Antonia; conversion of their encoding scheme was done by Lou Burnard. They should be proud of their work, and users will be grateful.

Although there are a few small slips, these OETL texts appear to be accurate and complete electronic renderings of standard printed texts of two major English writers. As currently shipped, the OETL Austen does not contain the Minor Works, but it includes the six major novels published in the Chapman Oxford Illustrated Jane Austen. The OETL STC does not contain about one hundred pages of fragments, metrical experiments, or early drafts of the Poetical Works, but it has all 492 pages of major (and minor) poems. In the OETL STC, once in a while lines of marginal glosses or epigraphs are mistakenly coded as lines of poems. Occasionally a space is omitted from the text following a reference code in the OETL Austen. These are slight flaws that affect only line numbering (in Coleridge) or the appearance of a formatted text (in Austen).

Many folks say that the electronic book is the pattern of publishing in the future, but marketing electronic text files is still a new idea, and a traditional book publisher may be a little uncertain about how to do it. The OETL Austen and OETL STC contain only files of texts; there is no software included to use with the texts. (The analyses mentioned in this review were produced by SPITBOL-386 programs that I created especially for these texts.) Oxford University Press would probably find that the OETL Austen and OETL STC appealed to a larger market if they included some kind of software: perhaps a search program similar to the one that produced the output shown in Figure 2, or a formatting program like the one that produced the output shown in Figure 1. Of course, since the texts are encoded with SGML, they should be usable with any software designed to process SGML. It would be an immense mistake for any publisher to encode electronic texts with a proprietary markup system since they then could not be processed by standard SGML software.

The forty-two page printed documentation for the OETL STC is a clear, helpful introduction to SGML encoding; it is exactly what a programmer needs in order to write software for the texts. The printed documentation for the OETL Austen gives an interesting account of how the electronic texts were prepared and used by Burrows. However, the texts as Burrows used them were in a format different from the OETL Austen SGML format. Since the documentation mostly describes Burrows's format, it contributes little to an understanding of the complex form of the OETL Austen texts. Perhaps it will be revised along the lines of the documentation for the OETL STC.

Oxford University Press deserves praise and support for publishing the OETL Austen and the OETL STC -- as well as similar editions for other major English writers. At a time when many texts are available free of charge (especially on Internet servers), we might ask why we should pay for the OETL texts. Well, the answer is that you get what you pay for. The OETL texts are accurate renditions of standard editions that contain SGML markup; they allow students and scholars to do many kinds of research that would not be possible without them. Indeed, they both enable and stimulate precise textual and linguistic research.


Title: The Oxford Electronic Text Library edition of The Complete
Works of Jane Austen.

Category: SGML-conformant electronic (machine-readable) text.

System Requirements: Currently (without the Minor Works) the OETL
Austen consists of 77 files which require approximately 6.7 MB of
disk space; additional space is required to concatenate the files
into useful entities.  Available in Macintosh disk format or
either 5.25-inch or 3.5-inch MS-DOS disk format.

Documentation: Twenty-five page pamphlet.

Company: Oxford University Press
         198 Madison Avenue
         New York, NY 10016
         Tel: 212-726-6000
         FAX: 212-726-6440

Price:    $95.00    Site License: $295.00

Title: The Oxford Electronic Text Library edition of The Poetical
Works of Samuel Taylor Coleridge.

Category: SGML-conformant electronic (machine-readable) text.

System Requirements: The OETL STC consists of 12 files which
require approximately 724 KB of disk space; additional space is
required to concatenate the files.  Available in Macintosh disk
format or either 5.25-inch or 3.5-inch MS-DOS disk format.

Documentation: Forty-two page pamphlet.

Company: Oxford University Press
         198 Madison Avenue
         New York, NY 10016
         Tel: 212-726-6000
         FAX: 212-726-6440

Price:    $95.00    Site License: $295.00


Eric Johnson is the author or editor of over one hundred volumes and articles -- mostly about computers, writing, and literature. He can be contacted via email as JohnsonE@jupiter.dsu.edu




Notes

1 I have made some minor changes in an OETL STC file to ensure that the line breaks in the Rime's marginal glosses correspond with the printed version.

2 Warren, Robert Penn. "A Poem of Pure Imagination: An Experiment in Reading" in Selected Essays. New York, 1958, 233-234. Warren specifically suggests tracking the sun-moon symbolism throughout Coleridge's works.

3 Exactly such analysis is the content of Burrows, J. F. Computation into Criticism: A Study of Jane Austen's Novels and an Experiment in Method. Oxford, 1987. The electronic editions of the Austen novels that were used for his research are the basis of the OETL Austen.
 



And to teach,
by his own
example, love
and reverence
to all things
that God made
and loveth.

Farewell, farewell! but this I tell
To thee, thou Wedding-Guest!
He prayeth well, who loveth well
Both man and bird and beast.

He prayeth best, who loveth best
All things both great and small;
For the dear God who loveth us,
He made and loveth all.

The Mariner, whose eye is bright,
Whose beard with age is hoar,
Is gone: and now the Wedding-Guest
Turned from the bridegroom's door.

He went like one that hath been stunned,
And is of sense forlorn:
A sadder and a wiser man,
He rose the morrow morn.

Figure 1. Formatted text of lines from OETL STC.



Words        Title,page-number,line-number

sun          Anthem for children,6,31; Nose,8,14; To the Muse,10,14;
             Honour,24,2; 26,67; Absence,30,17; Happiness,32,83;
             Effusion at Evening,49,13; 49,24; Autumnal Evening,51,13;
             Beautiful Spring,59,25; To Lesbia,61,6; Pantisocracy,69,13;
             Erskine,80,13; La Fayette,82,10;
             Answer to a Melancholy Letter,90,5;
             Religious Musings,109,15; 110,19; 113,98; 113,111; 114,139;
             118,255; 123,385; 125,417; Destiny of Nations,132,26;
             133,65; 146,448; J. Horne Tooke,150,8;
             Lime-Tree Bower,179,11; 179,33; Ancient Mariner,187,25;
             189,83; 190,98; 190,112; 193,174; 193,176; 193,177;
             193,180; 193,183; 193,185; 195,198; 200,355; 201,383;
             Christabel,231,516; War Eclogue,238,35; France,244,17;
             245,48; Old man of the Alps,249,62;
             Fears in Solitude,257,20; 258,40; 259,85;
             The Nightingale,265,27; Three Graves,271,62; 276,224;
             276,240; 280,393; 284,507; 284,509; 284,511;
             Wanderings of Cain,288,0; Dark Ladie,293,13;
             Story of the Mad Ox,299,6; 299,7; Hymn to the Earth,328,17;
             Mad Monk,349,42; Inscription for a Seat,350,20;
             The Picture,374,167; Hymn Before Sunrise,379,55; 380,84;
             Sonnet from Marini,392,3; Blossoming date-tree,396,1;
             Two Sisters,411,13; Israel's Lament,433,6;
             Alie du Clos,469,1; 471,68; Love, Hope, and Patience,481,2

moon         Absence,30,19; Happiness,32,85; Songs of Pixies,41,9;
             44,85; Beautiful Spring,59,30; Nightingale,94,16;
             Foster-Mothers' Tale,182,9; Ancient Mariner,190,114;
             195,202; 196,210; 196,212; 197,262; 197,263; 198,271;
             199,321; 199,323; 199,329; 202,417; 203,432; 203,437;
             204,475; I,209,2; Christabel,216,18; 222,175;
             Frost at Midnight,242,74; Circassian Love-chaunt,253,5;
             254,16; 254,19; 254,32; 255,62; The Nightingale,266,76;
             266,77; 267,102; Three Graves,278,306; 280,393;
             Kubla Khan,297,24; Hymn to the Earth,328,18;
             View of Saddleback,347,6; Stranger Minstrel,352,57;
             Ode to Tranquility,361,24; Dejection,362,1; 362,2; 363,17;
             364,39; Hymn Before Sunrise,379,55; Day-Dream,385,14;
             A Sunset,394,4; To William Wordsworth,408,101;
             Grateful People,436,18; Ideal Object,456,20

Figure 2. A select index for Coleridge's Poetical Works.

