Version 2.0 of an article first published in TEXT Technology, 4.4 (Winter, 1994), 263-267.
The Oxford Electronic Text Library Edition of The Complete Works of Jane Austen contains
SGML tags that precisely identify, among other things, the speaker of every word
of dialogue in Austen's six novels. (See note 1.) A computer program that I created,
JADIALOG, notes the speaker tags in the text and separates the dialogue of each of the
characters (including narrators) accordingly. Some dialogue had to be assigned to
composites (such as "inseparable groups") or to imaginary or unidentifiable speakers, but
most of the talking is done by the named characters. In all, there are 141 speakers in the six
Austen novels, and the number of speakers in each novel is rather uniform: four novels have
24 speakers; one has 23; one has 22. Using such an electronic edition and computer
programs designed to read the tags in the texts (see note 2), scholars can do research about
characters' dialogue that would not be possible without them -- or would be possible only
with very arduous labor.
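The tag-reading step described above can be illustrated with a minimal Python sketch. It is
not JADIALOG, and the markup it reads is an assumption: the article does not give the tag
names used in the Oxford edition, so the hypothetical form <q who="NAME">...</q> stands in
for them, and any text outside such tags is treated as narration. The sample passage is
invented for the illustration.

```python
import re
from collections import defaultdict

# Hypothetical markup: <q who="NAME">...</q> wraps speech; all other text is narration.
# The real tag names in the Oxford edition are not given in the article.
SPEECH = re.compile(r'<q who="([^"]+)">(.*?)</q>', re.DOTALL)

def split_by_speaker(text, narrator="NARRATOR"):
    """Return a dict mapping each speaker to the list of text pieces assigned to them."""
    pieces = defaultdict(list)
    pos = 0
    for match in SPEECH.finditer(text):
        # Text between the previous tag and this one belongs to the narrator.
        narration = text[pos:match.start()].strip()
        if narration:
            pieces[narrator].append(narration)
        pieces[match.group(1)].append(match.group(2).strip())
        pos = match.end()
    # Any text after the last speech tag is also narration.
    tail = text[pos:].strip()
    if tail:
        pieces[narrator].append(tail)
    return dict(pieces)

# Invented sample passage, not a quotation from the novels.
sample = '<q who="ANNE">I am quite well,</q> said she. <q who="MARY">So you say.</q>'
speeches = split_by_speaker(sample)
for speaker, parts in speeches.items():
    print(speaker, parts)
```

Each speaker's pieces could then be written to a separate file, one per character, which is
the kind of per-speaker text that the later counts assume.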
Using my program JATALK, it is easy to determine how much each character speaks. As
might be supposed, the narrators talk more than most of the characters: from a high of
69,240 words (Mansfield Park) to a low of 40,686 (Northanger Abbey). Emma Woodhouse talks
more than any other character (42,800 words); Fanny Price is the next most vocal, but she
speaks less than half as much as Emma (21,156 words). Ford, the owner of the shop Emma
visits, speaks less than any other named character: 14 words. Mr. Collins, who will be
remembered as giving frequent, long, formal speeches in Pride and Prejudice, speaks less
than might be thought: 5,057 words.
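A count of this kind can be sketched in a few lines of Python. This is an illustration, not
JATALK itself: the rule for what counts as a word (a run of letters, optionally joined by
apostrophes or hyphens) is an assumption, since the article does not state the rule behind
its counts, and the speeches are invented sample data.

```python
import re

# Assumed word rule: letters, optionally joined by internal apostrophes or hyphens.
WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def count_words(pieces):
    """Total number of words across all text pieces of one speaker."""
    return sum(len(WORD.findall(piece)) for piece in pieces)

def rank_speakers(speeches):
    """Return (speaker, word count) pairs, most talkative speaker first."""
    totals = {speaker: count_words(pieces) for speaker, pieces in speeches.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Invented sample data, not taken from the novels.
speeches = {
    "ANNE": ["I am quite well,", "Do not trouble yourself."],
    "MARY": ["So you say."],
    "NARRATOR": ["said she."],
}
ranking = rank_speakers(speeches)
for speaker, total in ranking:
    print(f"{speaker}: {total} words")
```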
It can be interesting to determine, with a program called FINDLIST, what kinds of words
the characters use. Figure 1 shows the percentage of words (from seven lists) that are found
in the dialogue of characters and narrators. Thirteen male characters and thirteen female
characters were selected; most of these are major characters, but a few memorable minor
ones (like Mr. Collins) are included. Also, for comparison, the percentage for the full text of
the six novels is given. See Figure 1.
The first four lists contain pronouns. List 1 holds masculine pronouns ("he," "him," "his,"
and "himself"). List 2 has feminine pronouns ("she," "her," "hers," and "herself"). List 3
includes all the pronouns of Lists 1 and 2, plus the plural forms ("they," "them,"
"their," and "themselves"). List 4 consists of first-person pronouns ("I," "me," "my," and
One rationale for computing the percentage of pronouns (and other words) used by
speakers is that such data may indicate the subjects about which characters speak a lot (or a
little) and thus reveal something of the speakers' personalities. In some cases, the
percentages are approximately what readers might expect; other cases may be a little
surprising, and thus they may lead to new scrutiny of some of the characters.
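The percentages discussed here are presumably simple ratios: the number of words in a
speaker's text that appear on a list, divided by the total number of words in that text. A
minimal Python sketch of that calculation follows. It is not FINDLIST; the word-splitting
rule and the case-insensitive matching are assumptions, and only Lists 1 to 3 are shown,
with the members given in the article. The sample sentence is invented.

```python
import re

# Assumed word rule: letters, optionally joined by internal apostrophes or hyphens.
WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

# Lists 1 to 3 as described in the article.
HE_WORDS = {"he", "him", "his", "himself"}
SHE_WORDS = {"she", "her", "hers", "herself"}
OTHER_WORDS = HE_WORDS | SHE_WORDS | {"they", "them", "their", "themselves"}

def list_percentage(text, word_list):
    """Percentage of the words in `text` that belong to `word_list`."""
    words = [w.lower() for w in WORD.findall(text)]  # assumption: case is ignored
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in word_list)
    return 100.0 * hits / len(words)

# Invented sample sentence of 13 words.
sample = "She told him that they would see her sister and his friend soon."
for name, word_list in [("He", HE_WORDS), ("She", SHE_WORDS), ("Other", OTHER_WORDS)]:
    print(f"{name} Words: {list_percentage(sample, word_list):.2f}%")
```

A short sentence gives far higher percentages than a whole novel would; the point is only
the form of the calculation.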
Frank Churchill and Edward Ferrars might be thought rather self-centered and even selfish.
The high percentage of Me Words and the low percentage of Other Words for these two
characters seem to show that they talk much more about themselves than about others
(especially Ferrars), thus confirming what readers expect of them. Something similar is
shown for Jane Fairfax -- who talks the most of herself and the least of others of any
character in this study.
The lowest percentages of Me Words and the high percentages of Other Words for Fanny
Price and Anne Elliot match readers' expectations for Austen's most selfless heroines. The
percentages for Emma Woodhouse may not be exactly as supposed. Her use of Me Words
is the third lowest, and her use of Other Words is the second highest. Probably few readers
would place Emma in a category with Fanny and Anne -- she is frequently interfering,
scheming, and match-making; she acts rather self-centered. However, Emma's pronoun use
might suggest a re-evaluation of her nature.
Jane Bennet, Harriet Smith, and Colonel Brandon seem to talk a good deal about themselves
as well as about others. In contrast, Henry Tilney says little about himself and little about
others. Austen scholars might consider what is thus shown about these characters.
Based on their use of pronouns in Lists 1 and 2, it appears that characters tend to talk more
about the opposite sex than about their own -- especially the female speakers. Ten of the
thirteen women speak more about men than about women. Seven of the thirteen men speak
more about women than about men. By contrast, all six narrators speak more about women,
and they all do so by a substantial margin.
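These tallies can be re-derived from the He Words and She Words columns of Figure 1. The
Python sketch below copies those two columns and counts who speaks more about which sex.
It covers only the twelve female characters whose rows are labeled and complete in Figure 1
as reproduced here, so its count for women is out of twelve rather than thirteen; the
function names are made up for the illustration.

```python
# (He Words %, She Words %) copied from Figure 1.
WOMEN = {
    "ELINOR.DAS": (3.21, 2.22), "MARIANNE.DAS": (2.17, 0.77),
    "ELIZABET.BEN": (3.10, 1.97), "JANE.BEN": (3.37, 1.54),
    "FANNY.PRI": (2.46, 5.32), "MARY.CRA": (2.13, 1.37),
    "EMMA.WH": (2.50, 3.44), "JANE.FF": (0.68, 0.54),
    "HARRIET.SM": (3.47, 1.82), "CATHERIN.MOR": (1.82, 3.02),
    "ELEANOR.TIL": (1.43, 1.08), "ANNE.ELL": (2.97, 2.58),
}
MEN = {
    "FERRARS.ED": (1.05, 1.09), "JOHN.DAS": (2.10, 2.36),
    "BRANDON.COL": (1.79, 3.13), "DARCY.FW": (2.09, 0.87),
    "COLLINS.MR": (1.54, 1.90), "EDMUND.BER": (1.52, 2.56),
    "HENRY.CRA": (2.53, 1.99), "KNIGHTLE.MR": (3.28, 2.68),
    "FRANK.CH": (1.18, 2.18), "HENRY.TIL": (1.10, 1.22),
    "JOHN.THP": (2.06, 0.38), "WENTWORT.CPT": (1.80, 1.61),
    "WALTER.SIR": (2.68, 1.64),
}
NARRATORS = {
    "NARRATOR.SS": (2.51, 5.38), "NARRATOR.PP": (3.05, 5.27),
    "NARRATOR.MP": (2.72, 4.54), "NARRATOR.EM": (2.75, 4.33),
    "NARRATOR.NA": (1.87, 5.28), "NARRATOR.PER": (2.45, 4.10),
}

def more_about_men(rows):
    """Names whose He Words percentage exceeds their She Words percentage."""
    return sorted(name for name, (he, she) in rows.items() if he > she)

def more_about_women(rows):
    """Names whose She Words percentage exceeds their He Words percentage."""
    return sorted(name for name, (he, she) in rows.items() if she > he)

print(len(more_about_men(WOMEN)), "of", len(WOMEN), "women speak more about men")
print(len(more_about_women(MEN)), "of", len(MEN), "men speak more about women")
print(len(more_about_women(NARRATORS)), "of", len(NARRATORS), "narrators speak more about women")
```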
The words in List 5 ("love," "loved," "loves," "affection," and "passion") are found in the six
novels 940 times: they are only thirteen one-hundredths of one percent (0.13%) of all words
used. Since the plots of Austen novels deal with courtship and marriage, it is somewhat
surprising that such low percentages of Love Words are found in the dialogue and
throughout the novels. More women seem to speak of love: seven women use the words
0.20% or more, but only two men do so. It may seem natural that the elderly Sir Walter and
Mrs. Clay never use the words. There may or may not be a reason that Jane Fairfax does not
use the words: readers will remember that she is concealing her engagement to Frank
Churchill. It does not seem obvious that Elizabeth Bennet should use a higher percentage of
Love Words than any other character.
There is little color in an Austen novel. The fourteen words in List 6 ("blue," "green," "red,"
and so on) are found 152 times in the six novels: only 0.02% of all words. By comparison,
several novels by Anthony Trollope have about double the percentage of color words found in
Austen's, and two novels by Nathaniel Hawthorne have about twelve times the percentage;
even the no-nonsense, western action novels of James Janke have an average of six times
Austen's percentage of color words. The percentages are so small for each
Austen character that it is difficult to make distinctions, but it seems odd that Sir Walter's
dialogue has the highest percentage of color words.
List 7 consists of 125 words found very commonly in all kinds of writing: prepositions,
pronouns, articles, forms of "to be" and "to have," and so on. It might be thought that the
nature of narration ("she said...," "he has been ...") would require Austen's narrators
to use more common words than the speakers, but just the opposite is true. All but two
of the speakers use at least three percentage points more common words than the narrators.
Only the dialogue of Mrs. Clay and Sir Walter Elliot falls within the range of the narrators'
use of common words. There is no apparent reason that those two characters should use
fewer common words than other characters.
The overall percentage of common words in the novels of Austen is a little higher than that
in novels of other nineteenth-century writers such as Trollope, Dickens, James, and Twain.
If there were tags to identify the dialogue of speakers in the electronic editions of these
writers, it would be interesting to determine whether their characters use more common
words than their narrators use.
This article has only touched the surface of the kind of study of the novels of Jane Austen
that can be done when the texts are in electronic form, with the dialogue of each speaker
identified by SGML tags. Not only is more sophisticated analysis of her novels possible with
editions in which characters' speech can be isolated, compared, and contrasted, but such
editions also stimulate it. Similar electronic versions of the novels of other writers would
be truly helpful for literary scholars.
1. For information about this edition, see my "Electronic Jane Austen and S. T. Coleridge,"
TEXT Technology, 4.2 (Summer, 1994), 93-100.
2. The computer analysis of novels mentioned in this article, including the output shown in
Figure 1, was produced by programs that I wrote in SPITBOL-386. A description of this
valuable compiler is found in my "SPITBOL-386: The Language of Choice for Non-numeric
Computing," TEXT Technology, 4.3 (Autumn, 1994), 177-185.
Character      List 1   List 2   List 3   List 4   List 5   List 6   List 7
Text File      He       She      Other    Me       Love     Color    Common
               Words    Words    Words    Words    Words    Words    Words
ELINOR.DAS     3.21%    2.22%    6.28%    3.83%    0.21%    0.00%    55.82%
MARIANNE.DAS   2.17%    0.77%    3.67%    6.71%    0.21%    0.00%    56.85%
ELIZABET.BEN   3.10%    1.97%    5.87%    4.32%    0.27%    0.01%    56.07%
JANE.BEN       3.37%    1.54%    6.24%    6.20%    0.21%    0.02%    55.35%
FANNY.PRI      2.46%    5.32%    8.55%    2.11%    0.22%    0.01%    55.18%
MARY.CRA       2.13%    1.37%    4.27%    4.73%    0.20%    0.03%    55.97%
EMMA.WH        2.50%    3.44%    6.66%    2.76%    0.23%    0.01%    54.57%
JANE.FF        0.68%    0.54%    1.86%    8.53%    0.00%    0.00%    57.17%
HARRIET.SM     3.47%    1.82%    6.06%    5.98%    0.10%    0.03%    57.59%
CATHERIN.MOR   1.82%    3.02%    5.79%    4.32%    0.13%    0.07%    56.19%
ELEANOR.TIL    1.43%    1.08%    3.05%    5.80%    0.15%    0.00%    57.75%
ANNE.ELL       2.97%    2.58%    6.51%    2.71%    0.16%    0.02%    54.07%
0.51% 4.71% 4.04% 0.00%
FERRARS.ED     1.05%    1.09%    2.71%    8.42%    0.12%    0.04%    56.33%
JOHN.DAS       2.10%    2.36%    5.32%    3.34%    0.06%    0.00%    54.80%
BRANDON.COL    1.79%    3.13%    5.48%    6.46%    0.13%    0.00%    57.24%
DARCY.FW       2.09%    0.87%    3.60%    6.54%    0.17%    0.00%    55.58%
COLLINS.MR     1.54%    1.90%    4.03%    5.91%    0.08%    0.00%    54.62%
EDMUND.BER     1.52%    2.56%    4.86%    4.45%    0.19%    0.02%    56.84%
HENRY.CRA      2.53%    1.99%    4.99%    4.62%    0.13%    0.01%    55.51%
KNIGHTLE.MR    3.28%    2.68%    6.57%    4.37%    0.23%    0.00%    55.40%
FRANK.CH       1.18%    2.18%    4.00%    7.15%    0.09%    0.00%    55.11%
HENRY.TIL      1.10%    1.22%    3.18%    3.59%    0.21%    0.08%    54.68%
JOHN.THP       2.06%    0.38%    3.38%    5.88%    0.03%    0.00%    56.42%
WENTWORT.CPT   1.80%    1.61%    3.97%    6.37%    0.13%    0.00%    57.89%
WALTER.SIR     2.68%    1.64%    5.14%    3.17%    0.00%    0.11%    50.87%
NARRATOR.SS    2.51%    5.38%    9.53%    0.00%    0.11%    0.02%    50.62%
NARRATOR.PP    3.05%    5.27%    9.95%    0.00%    0.09%    0.01%    50.93%
NARRATOR.MP    2.72%    4.54%    8.24%    0.01%    0.13%    0.03%    50.61%
NARRATOR.EM    2.75%    4.33%    8.07%    0.00%    0.10%    0.02%    49.94%
NARRATOR.NA    1.87%    5.28%    8.51%    0.07%    0.09%    0.02%    50.16%
NARRATOR.PER   2.45%    4.10%    8.02%    0.01%    0.06%    0.02%    50.55%
AUSTEN.ALL     2.43%    3.41%    6.90%    2.51%    0.13%    0.02%    53.20%
Figure 1. Percentage of Words Found in Austen Characters' Text Files.
Eric Johnson is a former Editor of TEXT Technology: The Journal of Computer Text
Processing. He can be contacted via email as JohnsonE@jupiter.dsu.edu