Version 2.0 of an article first published in TEXT Technology, 4.4 (Winter, 1994), 263-267.
The Oxford Electronic Text Library Edition of The
Complete Works of Jane Austen
contains SGML tags that precisely identify, among
other things, the speaker of every word
of dialogue in Austen's six novels. (See note 1.)
A computer program that I created,
JADIALOG, notes the speaker tags in the text and
separates the dialogue of each of the
characters (including narrators) accordingly. Some
dialogue had to be assigned to
composites (such as "inseparable groups") or to
imaginary or unidentifiable speakers, but
most of the talking is done by the named characters.
In all, there are 141 speakers in the six
Austen novels, and the number of speakers in each
novel is rather uniform: four novels have
24 speakers; one has 23; one has 22. Using such
an electronic edition and computer
programs designed to read the tags in the texts
(see note 2), scholars can do research about
characters' dialogue that would not be possible
without them -- or would be possible only
with very arduous labor.
Using my program JATALK, it is easy to determine
how much each character speaks. As
might be supposed, the narrators talk more than
most of the characters: from a high of
69,240 words (Mansfield Park) to a low of 40,686
(Northanger Abbey). Emma Woodhouse talks
more than any other character (42,800 words); Fanny
Price is the next most vocal, but she
speaks less than half as much as Emma (21,156 words).
Ford, the owner of the shop Emma
visits, speaks less than any other named character:
14 words. Mr. Collins, who will be
remembered as giving frequent, long, formal speeches
in Pride and Prejudice, speaks less
than might be thought: 5,057 words.
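
The two steps described so far, separating dialogue by speaker tag and totalling each speaker's words, can be sketched in Python. This is only an illustration and not the JADIALOG or JATALK programs themselves: the tag form `<said who="...">`, the function names, and the sample text are all assumptions made for the example, since the edition's actual tag names are not reproduced in this article.

```python
import re
from collections import defaultdict

# Hypothetical tag form; the real edition's SGML tag names may differ.
SPEECH = re.compile(r'<said who="([^"]+)">(.*?)</said>', re.DOTALL)

def split_by_speaker(tagged_text):
    """Map each speaker id to the list of passages tagged for that speaker."""
    passages = defaultdict(list)
    for speaker, passage in SPEECH.findall(tagged_text):
        # Collapse line breaks and runs of spaces inside a passage.
        passages[speaker].append(" ".join(passage.split()))
    return dict(passages)

def words_per_speaker(passages):
    """Total whitespace-separated words for each speaker."""
    return {s: sum(len(p.split()) for p in ps) for s, ps in passages.items()}

# Invented sample text, not a quotation from the novels.
sample = '''<said who="NARRATOR">The morning was fine.</said>
<said who="EMMA">I shall walk
to the village.</said>
<said who="NARRATOR">Nothing more was said.</said>'''

by_speaker = split_by_speaker(sample)
totals = words_per_speaker(by_speaker)
print(totals)
```

Treating the narrator as one more speaker id, as the sample does, mirrors the way narrators are counted among the speakers above.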
With a program called FINDLIST, it is possible to
determine what kinds of words the characters use.
Figure 1 shows the percentage of
words (from seven lists) that are found
in the dialogue of characters and narrators. Thirteen
male characters and thirteen female
characters were selected; most of these are major
characters, but a few memorable minor
ones (like Mr. Collins) are included. Also, for
comparison, the percentage for the full text of
the six novels is given. See Figure 1.
The first four lists contain pronouns. List 1 holds
masculine pronouns ("he," "him," "his,"
and "himself"). List 2 has feminine pronouns ("she,"
"her," "hers," and "herself"). List 3
includes all the pronouns of Lists 1 and 2 plus
the plural forms ("they," "them,"
"their," and "themselves"). List 4 consists of first-person
pronouns ("I," "me," "my," and
"myself").
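
A percentage of this kind can be computed along the following lines. The sketch uses the four pronoun lists exactly as given above, but the tokenizing rule (lowercase runs of letters and apostrophes), the function names, and the sample sentence are assumptions for illustration; how FINDLIST itself divides text into words is not stated here.

```python
import re

# Lists 1-4 as described in the article.
HE_WORDS = {"he", "him", "his", "himself"}
SHE_WORDS = {"she", "her", "hers", "herself"}
OTHER_WORDS = HE_WORDS | SHE_WORDS | {"they", "them", "their", "themselves"}
ME_WORDS = {"i", "me", "my", "myself"}

def tokenize(text):
    """Lowercase the text and split it into words (assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())

def list_percentage(text, word_list):
    """Percentage of the words in `text` that appear in `word_list`."""
    words = tokenize(text)
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in word_list)
    return 100.0 * hits / len(words)

# Invented sample sentence of eleven words, not a quotation from the novels.
sample = "I told her that my brother would call on them himself."

me_pct = list_percentage(sample, ME_WORDS)        # "i", "my"
other_pct = list_percentage(sample, OTHER_WORDS)  # "her", "them", "himself"
print(round(me_pct, 2), round(other_pct, 2))
```

Under this tokenizing rule a contraction such as "I'm" would count as a single word that matches none of the lists, so a different rule could shift the percentages slightly.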
One rationale for computing the percentage of pronouns
(and other words) used by
speakers is that such data may indicate the subjects
about which characters speak a lot (or a
little) and thus reveal something of the speakers'
personalities. In some cases, the
percentages are approximately what readers might
expect; other cases may be a little
surprising, and thus they may lead to new scrutiny
of some of the characters.
Frank Churchill and Edward Ferrars might be thought
rather self-centered and even selfish.
The high percentage of Me Words and the low percentage
of Other Words for these two
characters seem to show that they talk much more
about themselves than about others
(especially Ferrars), which is consistent with that
impression. Something similar is
shown for Jane Fairfax -- who talks the most of
herself and the least of others of any
character in this study.
The two lowest percentages of Me Words, together with
high percentages of Other Words, belong to Fanny
Price and Anne Elliot, which matches readers' expectations
for Austen's most selfless heroines. The
percentages for Emma Woodhouse may not be exactly
as supposed. Her use of Me Words
is the third lowest, and her use of Other Words
is the second highest. Probably few readers
would place Emma in a category with Fanny and Anne
-- she is frequently interfering,
scheming, and match-making; she acts in a rather
self-centered way. However, Emma's pronoun use
might suggest a re-evaluation of her nature.
Jane Bennet, Harriet Smith, and Colonel Brandon seem
to talk a good deal about themselves
as well as about others. In contrast, Henry Tilney
says little about himself and little about
others. Austen scholars might consider what is thus
shown about these characters.
Based on their use of pronouns in Lists 1 and 2,
it appears that characters tend to talk more
about the opposite sex than about their own -- especially
the female speakers. Ten of the
thirteen women speak more about men than about women.
Seven of the thirteen men speak
more about women than about men. By contrast, all
six narrators speak more about women,
and they all do so by a substantial margin.
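
These tallies can be reproduced from the List 1 and List 2 columns of Figure 1. In the sketch below the percentages are copied from the figure; the grouping of text files into women, men, and narrators follows the order of the rows, and the variable names are illustrative only.

```python
# (He Words %, She Words %) copied from Lists 1 and 2 of Figure 1.
WOMEN = {
    "ELINOR.DAS": (3.21, 2.22), "MARIANNE.DAS": (2.17, 0.77),
    "ELIZABET.BEN": (3.10, 1.97), "JANE.BEN": (3.37, 1.54),
    "FANNY.PRI": (2.46, 5.32), "MARY.CRA": (2.13, 1.37),
    "EMMA.WH": (2.50, 3.44), "JANE.FF": (0.68, 0.54),
    "HARRIET.SM": (3.47, 1.82), "CATHERIN.MOR": (1.82, 3.02),
    "ELEANOR.TIL": (1.43, 1.08), "ANNE.ELL": (2.97, 2.58),
    "CLAY.MRS": (1.68, 0.51),
}
MEN = {
    "FERRARS.ED": (1.05, 1.09), "JOHN.DAS": (2.10, 2.36),
    "BRANDON.COL": (1.79, 3.13), "DARCY.FW": (2.09, 0.87),
    "COLLINS.MR": (1.54, 1.90), "EDMUND.BER": (1.52, 2.56),
    "HENRY.CRA": (2.53, 1.99), "KNIGHTLE.MR": (3.28, 2.68),
    "FRANK.CH": (1.18, 2.18), "HENRY.TIL": (1.10, 1.22),
    "JOHN.THP": (2.06, 0.38), "WENTWORT.CPT": (1.80, 1.61),
    "WALTER.SIR": (2.68, 1.64),
}
NARRATORS = {
    "NARRATOR.SS": (2.51, 5.38), "NARRATOR.PP": (3.05, 5.27),
    "NARRATOR.MP": (2.72, 4.54), "NARRATOR.EM": (2.75, 4.33),
    "NARRATOR.NA": (1.87, 5.28), "NARRATOR.PER": (2.45, 4.10),
}

# A higher He Words percentage is read as "speaks more about men".
women_more_about_men = sum(1 for he, she in WOMEN.values() if he > she)
men_more_about_women = sum(1 for he, she in MEN.values() if she > he)
narrators_more_about_women = sum(1 for he, she in NARRATORS.values() if she > he)

print(women_more_about_men, men_more_about_women, narrators_more_about_women)
# prints: 10 7 6
```

The comparison treats pronoun frequency as a proxy for the subject of conversation, which is the same simplification made in the discussion above.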
The words in List 5 ("love," "loved," "loves," "affection,"
and "passion") are found in the six
novels 940 times: they are only thirteen one-hundredths
of one percent (0.13%) of all words
used. Since the plots of Austen novels deal with
courtship and marriage, it is somewhat
surprising that such low percentages of Love Words
are found in the dialogue and
throughout the novels. More women seem to speak
of love: seven women use the words
0.20% or more, but only two men do so. It may seem
natural that the elderly Sir Walter and
Mrs. Clay never use the words. There may or may not
be a reason that Jane Fairfax does not
use the words: readers will remember that she is
concealing her engagement to Frank
Churchill. It does not seem obvious that Elizabeth
Bennet should use a higher percentage of
Love Words than any other character.
There is little color in an Austen novel. The fourteen
words in List 6 ("blue," "green," "red,"
and so on) are found 152 times in the six novels:
only 0.02%. By comparison, several
novels by Anthony Trollope have about double Austen's
percentage of color words, and two novels by Nathaniel
Hawthorne have about twelve times that percentage;
even the no-nonsense, western action novels of James
Janke average six times Austen's
percentage of color words. The
percentages are so small for each
Austen character that it is difficult to make distinctions,
but it seems odd that Sir Walter's
dialogue has the highest percentage of color words.
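
The counts and percentages quoted for Love Words and Color Words can be checked against each other with a little arithmetic. The sketch below assumes that both percentages are taken over the same total number of words in the six novels; that total is not stated in the article and is only estimated here from the rounded Love Words figure.

```python
love_count, love_pct = 940, 0.13   # List 5 figures from the article
color_count = 152                  # List 6 count from the article

# Total words implied by List 5. Because 0.13% is itself a rounded
# figure, this is an approximation, not an exact word count.
total_words = love_count / (love_pct / 100)

# Percentage of color words implied by that estimated total.
color_pct = 100 * color_count / total_words

print(round(total_words), round(color_pct, 2))
```

The estimate comes to roughly 723,000 words, and 152 color words out of that total rounds to the 0.02% reported above, so the two sets of figures are consistent.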
List 7 consists of 125 words found very commonly
in all kinds of writing: prepositions,
pronouns, articles, forms of "to be" and "to have,"
and so on. It might be thought that the
nature of narration ("she said...," "he has been
...") would require Austen's narrators
to use more common words than the speakers, but
just the opposite is true. For all but two
of the speakers, the percentage of common words is at least
three percentage points higher than that of any narrator. Only the
dialogue of Mrs. Clay and Sir Walter Elliot falls within
the range of the narrators' use of common
words. There is no apparent reason that those two
characters should use fewer common
words than other characters.
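
The comparison between speakers and narrators can be checked directly from the List 7 column of Figure 1. In this sketch the percentages are copied from the figure, and the variable names are illustrative only.

```python
# Common Words % (List 7) copied from Figure 1.
SPEAKERS = {
    "ELINOR.DAS": 55.82, "MARIANNE.DAS": 56.85, "ELIZABET.BEN": 56.07,
    "JANE.BEN": 55.35, "FANNY.PRI": 55.18, "MARY.CRA": 55.97,
    "EMMA.WH": 54.57, "JANE.FF": 57.17, "HARRIET.SM": 57.59,
    "CATHERIN.MOR": 56.19, "ELEANOR.TIL": 57.75, "ANNE.ELL": 54.07,
    "CLAY.MRS": 50.34, "FERRARS.ED": 56.33, "JOHN.DAS": 54.80,
    "BRANDON.COL": 57.24, "DARCY.FW": 55.58, "COLLINS.MR": 54.62,
    "EDMUND.BER": 56.84, "HENRY.CRA": 55.51, "KNIGHTLE.MR": 55.40,
    "FRANK.CH": 55.11, "HENRY.TIL": 54.68, "JOHN.THP": 56.42,
    "WENTWORT.CPT": 57.89, "WALTER.SIR": 50.87,
}
NARRATORS = {
    "NARRATOR.SS": 50.62, "NARRATOR.PP": 50.93, "NARRATOR.MP": 50.61,
    "NARRATOR.EM": 49.94, "NARRATOR.NA": 50.16, "NARRATOR.PER": 50.55,
}

narrator_low = min(NARRATORS.values())
narrator_high = max(NARRATORS.values())

# Speakers whose percentage lies inside the narrators' range.
within_range = sorted(
    name for name, pct in SPEAKERS.items()
    if narrator_low <= pct <= narrator_high
)

# Smallest gap, in percentage points, between any other speaker
# and the highest narrator figure.
smallest_margin = min(
    pct - narrator_high
    for name, pct in SPEAKERS.items()
    if name not in within_range
)

print(within_range, round(smallest_margin, 2))
```

The gap is measured against the highest narrator figure, which is the strictest reading of the claim; measured against each novel's own narrator, the gaps could differ.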
The overall percentage of common words in the novels
of Austen is a little higher than that
in novels of other nineteenth-century writers such
as Trollope, Dickens, James, and Twain.
If there were tags to identify the dialogue of speakers
in the electronic editions of these
writers, it would be interesting to determine whether
their characters use more common
words than their narrators use.
This article has just touched the surface of the
kind of study of the novels of Jane Austen
that can be done with the texts in electronic form
with the dialogue of each speaker identified
with SGML tags. Editions in which characters' speech
can be isolated, compared, and contrasted not only make
more sophisticated analysis of her novels possible;
they also stimulate it. Similar electronic versions of the
novels of other writers would be truly helpful
for literary scholars.
1. For information about this edition, see my "Electronic
Jane Austen and S. T. Coleridge,"
TEXT Technology, 4.2 (Summer, 1994), 93-100.
2. The computer analysis of novels mentioned in this
article, including the output shown in
Figure 1, was produced by programs that I wrote
in SPITBOL-386. A description of this
valuable compiler is found in my "SPITBOL-386: The
Language of Choice for Non-numeric
Computing," TEXT Technology, 4.3 (Autumn, 1994),
177-185.
Character       List 1   List 2   List 3   List 4   List 5   List 6   List 7
Text File       He       She      Other    Me       Love     Color    Common
                Words    Words    Words    Words    Words    Words    Words
ELINOR.DAS      3.21%    2.22%    6.28%    3.83%    0.21%    0.00%    55.82%
MARIANNE.DAS    2.17%    0.77%    3.67%    6.71%    0.21%    0.00%    56.85%
ELIZABET.BEN    3.10%    1.97%    5.87%    4.32%    0.27%    0.01%    56.07%
JANE.BEN        3.37%    1.54%    6.24%    6.20%    0.21%    0.02%    55.35%
FANNY.PRI       2.46%    5.32%    8.55%    2.11%    0.22%    0.01%    55.18%
MARY.CRA        2.13%    1.37%    4.27%    4.73%    0.20%    0.03%    55.97%
EMMA.WH         2.50%    3.44%    6.66%    2.76%    0.23%    0.01%    54.57%
JANE.FF         0.68%    0.54%    1.86%    8.53%    0.00%    0.00%    57.17%
HARRIET.SM      3.47%    1.82%    6.06%    5.98%    0.10%    0.03%    57.59%
CATHERIN.MOR    1.82%    3.02%    5.79%    4.32%    0.13%    0.07%    56.19%
ELEANOR.TIL     1.43%    1.08%    3.05%    5.80%    0.15%    0.00%    57.75%
ANNE.ELL        2.97%    2.58%    6.51%    2.71%    0.16%    0.02%    54.07%
CLAY.MRS        1.68%    0.51%    4.71%    4.04%    0.00%    0.00%    50.34%
FERRARS.ED      1.05%    1.09%    2.71%    8.42%    0.12%    0.04%    56.33%
JOHN.DAS        2.10%    2.36%    5.32%    3.34%    0.06%    0.00%    54.80%
BRANDON.COL     1.79%    3.13%    5.48%    6.46%    0.13%    0.00%    57.24%
DARCY.FW        2.09%    0.87%    3.60%    6.54%    0.17%    0.00%    55.58%
COLLINS.MR      1.54%    1.90%    4.03%    5.91%    0.08%    0.00%    54.62%
EDMUND.BER      1.52%    2.56%    4.86%    4.45%    0.19%    0.02%    56.84%
HENRY.CRA       2.53%    1.99%    4.99%    4.62%    0.13%    0.01%    55.51%
KNIGHTLE.MR     3.28%    2.68%    6.57%    4.37%    0.23%    0.00%    55.40%
FRANK.CH        1.18%    2.18%    4.00%    7.15%    0.09%    0.00%    55.11%
HENRY.TIL       1.10%    1.22%    3.18%    3.59%    0.21%    0.08%    54.68%
JOHN.THP        2.06%    0.38%    3.38%    5.88%    0.03%    0.00%    56.42%
WENTWORT.CPT    1.80%    1.61%    3.97%    6.37%    0.13%    0.00%    57.89%
WALTER.SIR      2.68%    1.64%    5.14%    3.17%    0.00%    0.11%    50.87%
NARRATOR.SS     2.51%    5.38%    9.53%    0.00%    0.11%    0.02%    50.62%
NARRATOR.PP     3.05%    5.27%    9.95%    0.00%    0.09%    0.01%    50.93%
NARRATOR.MP     2.72%    4.54%    8.24%    0.01%    0.13%    0.03%    50.61%
NARRATOR.EM     2.75%    4.33%    8.07%    0.00%    0.10%    0.02%    49.94%
NARRATOR.NA     1.87%    5.28%    8.51%    0.07%    0.09%    0.02%    50.16%
NARRATOR.PER    2.45%    4.10%    8.02%    0.01%    0.06%    0.02%    50.55%
AUSTEN.ALL      2.43%    3.41%    6.90%    2.51%    0.13%    0.02%    53.20%
Figure 1. Percentage of Words Found in Austen Characters' Text Files.
Eric Johnson is a former Editor of
TEXT Technology: The Journal of Computer Text
Processing. He can be contacted via
email as JohnsonE@jupiter.dsu.edu