How Jane Austen's Characters Talk
    By
    Eric Johnson

    Version 2.0 of an article first published in TEXT Technology, 4.4 (Winter, 1994), 263-267.

        The Oxford Electronic Text Library Edition of The Complete Works of Jane Austen
        contains SGML tags that precisely identify, among other things, the speaker of every word
        of dialogue in Austen's six novels. (See note 1.) A computer program that I created,
        JADIALOG, notes the speaker tags in the text and separates the dialogue of each of the
        characters (including narrators) accordingly. Some dialogue had to be assigned to
        composites (such as "inseparable groups") or to imaginary or unidentifiable speakers, but
        most of the talking is done by the named characters. In all, there are 141 speakers in the six
        Austen novels, and the number of speakers in each novel is rather uniform: four novels have
        24 speakers; one has 23; one has 22. Using such an electronic edition and computer
        programs designed to read the tags in the texts (see note 2), scholars can do research about
        characters' dialogue that would not be possible without them -- or would be possible only
        with very arduous labor.
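
        The separation step can be pictured with a minimal Python sketch. JADIALOG itself was
        written in SPITBOL-386 (see note 2) and is not reproduced here. The tag format
        <q who="...">, the function name split_by_speaker, and the sample passage are all
        invented for illustration, since this article does not give the edition's actual tag names.

```python
import re
from collections import defaultdict

# ASSUMPTION: speech is marked as <q who="NAME">...</q>. This tag format is
# hypothetical; the edition's real SGML tag names are not given in the article.
SPEECH = re.compile(r'<q who="([^"]+)">(.*?)</q>', re.DOTALL)

def split_by_speaker(text, narrator="NARRATOR"):
    """Return {speaker: [passages]}; untagged text is assigned to the narrator."""
    parts = defaultdict(list)
    pos = 0
    for match in SPEECH.finditer(text):
        outside = text[pos:match.start()].strip()
        if outside:
            parts[narrator].append(outside)
        parts[match.group(1)].append(match.group(2).strip())
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts[narrator].append(tail)
    return dict(parts)

# Invented sample text, not a quotation from the novels.
sample = ('<q who="SPEAKER_A">It is a fine morning,</q> said the first. '
          '<q who="SPEAKER_B">So it is.</q>')
by_speaker = split_by_speaker(sample)
```

        Treating everything outside a speech tag as narration is a simplifying assumption of the
        sketch; composite or unidentifiable speakers would need labels of their own.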

        Using my program JATALK, it is easy to determine how much each character speaks. As
        might be supposed, the narrators talk more than most of the characters: from a high of
        69,240 words (Mansfield Park) to a low of 40,686 (Northanger Abbey). Emma Woodhouse talks
        more than any other character (42,800 words); Fanny Price is the next most vocal, but at
        21,156 words she speaks less than half as much as Emma. Ford, the owner of the shop Emma
        visits, speaks less than any other named character: 14 words. Mr. Collins, who will be
        remembered as giving frequent, long, formal speeches in Pride and Prejudice, speaks less
        than might be thought: 5,057 words.
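
        A per-speaker word count of this kind might be sketched in Python as follows. This is
        not JATALK: the tokenizing rule (runs of letters, with internal apostrophes or hyphens
        kept inside one word), the speaker labels, and the sample passages are assumptions made
        for illustration, and a different rule would give slightly different totals.

```python
import re

# ASSUMPTION: a "word" is a run of letters, optionally joined by apostrophes
# or hyphens (so "match-making" counts once). JATALK's own rule is not stated.
WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def count_words(passages):
    """Total number of words across a speaker's passages."""
    return sum(len(WORD.findall(passage)) for passage in passages)

# Invented dialogue, keyed by hypothetical speaker labels.
dialogue = {
    "SPEAKER_A": ["It is a fine morning.", "I think we should walk."],
    "SPEAKER_B": ["Yes."],
}
totals = {who: count_words(passages) for who, passages in dialogue.items()}

# Most talkative speaker first.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Check on the article's figures: Fanny Price (21,156 words) speaks less
# than half as much as Emma Woodhouse (42,800 words).
fanny_less_than_half = 21_156 < 42_800 / 2
```

        Sorting the totals gives the kind of ranking reported above, from the most vocal speaker
        down to those who say only a few words.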

        With a program called FINDLIST, it is possible to determine what kinds of words the
        characters use. Figure 1 shows the percentage of words (from seven lists) that are found
        in the dialogue of characters and narrators. Thirteen male characters and thirteen female
        characters were selected; most of these are major characters, but a few memorable minor
        ones (like Mr. Collins) are included. Also, for comparison, the percentage for the full text of
        the six novels is given. See Figure 1.

        The first four lists contain pronouns. List 1 holds masculine pronouns ("he," "him," "his,"
        and "himself"). List 2 has feminine pronouns ("she," "her," "hers," and "herself"). List 3
        includes all the pronouns of lists 1 and 2, plus it contains the plural forms ("they," "them,"
        "their," and "themselves"). List 4 consists of first-person pronouns ("I," "me," "my," and
        "myself").

        One rationale for computing the percentage of pronouns (and other words) used by
        speakers is that such data may indicate the subjects about which characters speak a lot (or a
        little) and thus reveal something of the speakers' personalities. In some cases, the
        percentages are approximately what readers might expect; other cases may be a little
        surprising, and thus they may lead to new scrutiny of some of the characters.
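
        The percentage calculation for the four pronoun lists can be sketched in Python. The
        word lists are those given above; everything else (the function name, the
        case-insensitive matching, the tokenizing rule, and the sample sentence) is an
        assumption for illustration and may differ from what FINDLIST does.

```python
import re

# Lists 1-4 as described in the article.
HE = {"he", "him", "his", "himself"}
SHE = {"she", "her", "hers", "herself"}
OTHER = HE | SHE | {"they", "them", "their", "themselves"}
ME = {"i", "me", "my", "myself"}
LISTS = {"He": HE, "She": SHE, "Other": OTHER, "Me": ME}

# ASSUMPTION: words are matched case-insensitively after lowercasing.
WORD = re.compile(r"[a-z]+(?:['-][a-z]+)*")

def list_percentages(text):
    """Percentage of all words in text that belong to each list."""
    words = WORD.findall(text.lower())
    total = len(words)
    if total == 0:
        return {name: 0.0 for name in LISTS}
    return {
        name: 100.0 * sum(word in members for word in words) / total
        for name, members in LISTS.items()
    }

# Invented 20-word sample, not a quotation from the novels.
sample = ("I told her that he gave me his book and they took it "
          "to my house before she saw them")
pct = list_percentages(sample)
```

        Because List 3 contains every word in Lists 1 and 2, its percentage is always at least
        as large as the sum of theirs, which is also the pattern in Figure 1.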

        Frank Churchill and Edward Ferrars might be thought rather self-centered and even selfish.
        The high percentage of Me Words and the low percentage of Other Words for these two
        characters (especially Ferrars) seem to show that they talk much more about themselves
        than about others, as readers might expect. Something similar is
        shown for Jane Fairfax -- who talks the most of herself and the least of others of any
        character in this study.

        The low percentages of Me Words and the high percentages of Other Words for Fanny
        Price and Anne Elliot match readers' expectations for Austen's most selfless heroines. The
        percentages for Emma Woodhouse may not be exactly as supposed. Her use of Me Words
        is the third lowest, and her use of Other Words is the second highest. Probably few readers
        would place Emma in a category with Fanny and Anne -- she is frequently interfering,
        scheming, and match-making; she acts rather self-centered. However, Emma's pronoun use
        might suggest a re-evaluation of her nature.

        Jane Bennet, Harriet Smith, and Colonel Brandon seem to talk a good deal about themselves
        as well as about others. In contrast, Henry Tilney says little about himself and little about
        others. Austen scholars might consider what is thus shown about these characters.

        Based on their use of pronouns in Lists 1 and 2, it appears that characters tend to talk more
        about the opposite sex than about their own -- especially the female speakers. Ten of the
        thirteen women speak more about men than about women. Seven of the thirteen men speak
        more about women than about men. By contrast, all six narrators speak more about women,
        and they all do so by a substantial margin.
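
        These counts can be rechecked from the List 1 and List 2 columns of Figure 1. The
        Python sketch below copies those two columns; the variable and function names are
        illustrative only.

```python
# (He Words %, She Words %) copied from Figure 1, Lists 1 and 2.
WOMEN = {
    "ELINOR.DAS": (3.21, 2.22), "MARIANNE.DAS": (2.17, 0.77),
    "ELIZABET.BEN": (3.10, 1.97), "JANE.BEN": (3.37, 1.54),
    "FANNY.PRI": (2.46, 5.32), "MARY.CRA": (2.13, 1.37),
    "EMMA.WH": (2.50, 3.44), "JANE.FF": (0.68, 0.54),
    "HARRIET.SM": (3.47, 1.82), "CATHERIN.MOR": (1.82, 3.02),
    "ELEANOR.TIL": (1.43, 1.08), "ANNE.ELL": (2.97, 2.58),
    "CLAY.MRS": (1.68, 0.51),
}
MEN = {
    "FERRARS.ED": (1.05, 1.09), "JOHN.DAS": (2.10, 2.36),
    "BRANDON.COL": (1.79, 3.13), "DARCY.FW": (2.09, 0.87),
    "COLLINS.MR": (1.54, 1.90), "EDMUND.BER": (1.52, 2.56),
    "HENRY.CRA": (2.53, 1.99), "KNIGHTLE.MR": (3.28, 2.68),
    "FRANK.CH": (1.18, 2.18), "HENRY.TIL": (1.10, 1.22),
    "JOHN.THP": (2.06, 0.38), "WENTWORT.CPT": (1.80, 1.61),
    "WALTER.SIR": (2.68, 1.64),
}
NARRATORS = {
    "NARRATOR.SS": (2.51, 5.38), "NARRATOR.PP": (3.05, 5.27),
    "NARRATOR.MP": (2.72, 4.54), "NARRATOR.EM": (2.75, 4.33),
    "NARRATOR.NA": (1.87, 5.28), "NARRATOR.PER": (2.45, 4.10),
}

def more_he_than_she(group):
    """Number of speakers whose He Words exceed their She Words."""
    return sum(he > she for he, she in group.values())

def more_she_than_he(group):
    """Number of speakers whose She Words exceed their He Words."""
    return sum(she > he for he, she in group.values())

women_about_men = more_he_than_she(WOMEN)            # 10 of 13
men_about_women = more_she_than_he(MEN)              # 7 of 13
narrators_about_women = more_she_than_he(NARRATORS)  # 6 of 6
```

        The comparison treats pronoun frequency as a stand-in for the subject of talk, which is
        the same working assumption the article makes.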

        The words in List 5 ("love," "loved," "loves," "affection," and "passion") are found in the six
        novels 940 times: they are only thirteen one-hundredths of one percent (0.13%) of all words
        used. Since the plots of Austen novels deal with courtship and marriage, it is somewhat
        surprising that such low percentages of Love Words are found in the dialogue and
        throughout the novels. More women seem to speak of love: seven women use the words
        0.20% or more, but only two men do so. It may seem natural that the elderly Sir Walter and
        Mrs. Clay never use the words. There may or may not be a reason that Jane Fairfax does not
        use the words: readers will remember that she is concealing her engagement to Frank
        Churchill. It does not seem obvious that Elizabeth Bennet should use a higher percentage of
        Love Words than any other character.

        There is little color in an Austen novel. The fourteen words in List 6 ("blue," "green," "red,"
        and so on) are found 152 times in the six novels: only 0.02% of all words. By comparison,
        several novels by Anthony Trollope have about double Austen's percentage of color words,
        and two novels by Nathaniel Hawthorne have about twelve times her percentage; even the
        no-nonsense, western action novels of James Janke average six times her percentage.
        The percentages are so small for each
        Austen character that it is difficult to make distinctions, but it seems odd that Sir Walter's
        dialogue has the highest percentage of color words.

        List 7 consists of 125 words found very commonly in all kinds of writing: prepositions,
        pronouns, articles, forms of "to be" and "to have," and so on. It might be thought that the
        nature of narration ("she said...," "he has been ...") would require Austen's narrators
        to use more common words than the speakers, but just the opposite is true. All but two
        of the speakers use at least three percentage points more common words than the narrators.
        Only the dialogue of Mrs. Clay and Sir Walter Elliot falls within the range of the narrators'
        use of common words. There is no apparent reason that those two characters should use fewer common
        words than other characters.
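
        The comparison with the narrators can be rechecked from the List 7 column of Figure 1.
        The Python sketch below copies that column and measures the gap in percentage points;
        the variable names are illustrative only.

```python
# Common Words % (List 7) copied from Figure 1.
CHARACTERS = {
    "ELINOR.DAS": 55.82, "MARIANNE.DAS": 56.85, "ELIZABET.BEN": 56.07,
    "JANE.BEN": 55.35, "FANNY.PRI": 55.18, "MARY.CRA": 55.97,
    "EMMA.WH": 54.57, "JANE.FF": 57.17, "HARRIET.SM": 57.59,
    "CATHERIN.MOR": 56.19, "ELEANOR.TIL": 57.75, "ANNE.ELL": 54.07,
    "CLAY.MRS": 50.34, "FERRARS.ED": 56.33, "JOHN.DAS": 54.80,
    "BRANDON.COL": 57.24, "DARCY.FW": 55.58, "COLLINS.MR": 54.62,
    "EDMUND.BER": 56.84, "HENRY.CRA": 55.51, "KNIGHTLE.MR": 55.40,
    "FRANK.CH": 55.11, "HENRY.TIL": 54.68, "JOHN.THP": 56.42,
    "WENTWORT.CPT": 57.89, "WALTER.SIR": 50.87,
}
NARRATORS = {
    "NARRATOR.SS": 50.62, "NARRATOR.PP": 50.93, "NARRATOR.MP": 50.61,
    "NARRATOR.EM": 49.94, "NARRATOR.NA": 50.16, "NARRATOR.PER": 50.55,
}

# Range of the six narrators.
low, high = min(NARRATORS.values()), max(NARRATORS.values())

# Characters whose percentage falls inside the narrators' range.
within = sorted(name for name, pct in CHARACTERS.items() if low <= pct <= high)

# Smallest gap, in percentage points, between any other character
# and the highest narrator.
others = [pct for name, pct in CHARACTERS.items() if name not in within]
smallest_gap = round(min(others) - high, 2)
```

        On these figures, within holds CLAY.MRS and WALTER.SIR, and the smallest gap for the
        remaining characters is 3.14 percentage points (Anne Elliot against the narrator of
        Pride and Prejudice).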

        The overall percentage of common words in the novels of Austen is a little higher than that
        in novels of other nineteenth-century writers such as Trollope, Dickens, James, and Twain.
        If there were tags to identify the dialogue of speakers in the electronic editions of these
        writers, it would be interesting to determine whether their characters use more common
        words than their narrators use.

        This article has only scratched the surface of the kind of study of the novels of Jane Austen
        that can be done with the texts in electronic form with the dialogue of each speaker identified
        with SGML tags. Editions in which characters' speech can be isolated, compared, and
        contrasted not only make more sophisticated analysis of her novels possible; they
        stimulate it. Similar electronic versions of the novels of other writers would be truly helpful
        for literary scholars.



        Notes

        1. For information about this edition, see my "Electronic Jane Austen and S. T. Coleridge,"
        TEXT Technology, 4.2 (Summer, 1994), 93-100.

        2. The computer analysis of novels mentioned in this article, including the output shown in
        Figure 1, was produced by programs that I wrote in SPITBOL-386. A description of this
        valuable compiler is found in my "SPITBOL-386: The Language of Choice for Non-numeric
        Computing," TEXT Technology, 4.3 (Autumn, 1994), 177-185.


    Character       List 1  List 2  List 3  List 4  List 5  List 6  List 7
    Text File       He      She     Other   Me      Love    Color   Common
                    Words   Words   Words   Words   Words   Words   Words

    ELINOR.DAS      3.21%   2.22%   6.28%   3.83%   0.21%   0.00%  55.82%

    MARIANNE.DAS    2.17%   0.77%   3.67%   6.71%   0.21%   0.00%  56.85%

    ELIZABET.BEN    3.10%   1.97%   5.87%   4.32%   0.27%   0.01%  56.07%

    JANE.BEN        3.37%   1.54%   6.24%   6.20%   0.21%   0.02%  55.35%

    FANNY.PRI       2.46%   5.32%   8.55%   2.11%   0.22%   0.01%  55.18%

    MARY.CRA        2.13%   1.37%   4.27%   4.73%   0.20%   0.03%  55.97%

    EMMA.WH         2.50%   3.44%   6.66%   2.76%   0.23%   0.01%  54.57%

    JANE.FF         0.68%   0.54%   1.86%   8.53%   0.00%   0.00%  57.17%

    HARRIET.SM      3.47%   1.82%   6.06%   5.98%   0.10%   0.03%  57.59%

    CATHERIN.MOR    1.82%   3.02%   5.79%   4.32%   0.13%   0.07%  56.19%

    ELEANOR.TIL     1.43%   1.08%   3.05%   5.80%   0.15%   0.00%  57.75%

    ANNE.ELL        2.97%   2.58%   6.51%   2.71%   0.16%   0.02%  54.07%

    CLAY.MRS        1.68%   0.51%   4.71%   4.04%   0.00%   0.00%  50.34%
     
     

    FERRARS.ED      1.05%   1.09%   2.71%   8.42%   0.12%   0.04%  56.33%

    JOHN.DAS        2.10%   2.36%   5.32%   3.34%   0.06%   0.00%  54.80%

    BRANDON.COL     1.79%   3.13%   5.48%   6.46%   0.13%   0.00%  57.24%

    DARCY.FW        2.09%   0.87%   3.60%   6.54%   0.17%   0.00%  55.58%

    COLLINS.MR      1.54%   1.90%   4.03%   5.91%   0.08%   0.00%  54.62%

    EDMUND.BER      1.52%   2.56%   4.86%   4.45%   0.19%   0.02%  56.84%

    HENRY.CRA       2.53%   1.99%   4.99%   4.62%   0.13%   0.01%  55.51%

    KNIGHTLE.MR     3.28%   2.68%   6.57%   4.37%   0.23%   0.00%  55.40%

    FRANK.CH        1.18%   2.18%   4.00%   7.15%   0.09%   0.00%  55.11%

    HENRY.TIL       1.10%   1.22%   3.18%   3.59%   0.21%   0.08%  54.68%

    JOHN.THP        2.06%   0.38%   3.38%   5.88%   0.03%   0.00%  56.42%

    WENTWORT.CPT    1.80%   1.61%   3.97%   6.37%   0.13%   0.00%  57.89%

    WALTER.SIR      2.68%   1.64%   5.14%   3.17%   0.00%   0.11%  50.87%
     
     

    NARRATOR.SS     2.51%   5.38%   9.53%   0.00%   0.11%   0.02%  50.62%

    NARRATOR.PP     3.05%   5.27%   9.95%   0.00%   0.09%   0.01%  50.93%

    NARRATOR.MP     2.72%   4.54%   8.24%   0.01%   0.13%   0.03%  50.61%

    NARRATOR.EM     2.75%   4.33%   8.07%   0.00%   0.10%   0.02%  49.94%

    NARRATOR.NA     1.87%   5.28%   8.51%   0.07%   0.09%   0.02%  50.16%

    NARRATOR.PER    2.45%   4.10%   8.02%   0.01%   0.06%   0.02%  50.55%
     
     

    AUSTEN.ALL      2.43%   3.41%   6.90%   2.51%   0.13%   0.02%  53.20%

    Figure 1. Percentage of Words Found in Austen Characters' Text Files.
     


        Eric Johnson is a former Editor of TEXT Technology: The Journal of Computer Text
        Processing. He can be contacted via email as JohnsonE@jupiter.dsu.edu


