Professor-Created Computer Programs for Student Research

by Eric Johnson

Dakota State University, Madison, South Dakota 57042
Email: JohnsonE@jupiter.dsu.edu

Published in Computers and the Humanities, 30.2 (April, 1996), 171-179.

Key words: computer programming, Icon, SNOBOL4, SPITBOL, text analysis

Abstract
Many university students are eager to use computers to analyze and compare texts and to do various kinds of computerized literary research. If students themselves cannot create the software they want to use, and if the desired software is not available from commercial sources, a professor can be of great assistance by writing computer programs for students. Fifteen professor-created programs for text analysis are described.

Introduction
Increasingly, university students are willing, and, often, extremely eager, to use computers for their research. At the same time, it is rare that students who have not had fairly extensive training in computer science can create programs sufficiently sophisticated to be of significant assistance to them. Therefore, students are at the mercy of software developers to provide them with the computer applications that they need. However, commercial software developers often do not know what applications are wanted (especially in the humanities) and, in any case, there is little financial incentive for firms to create and market specialized programs for academic research.
Professors who can create computer software, and who, of course, know the kinds of work their students need to do, can greatly assist their students by writing programs for their use. As a professor of English who considered a career in computer science, I have written a series of programs for my students and for my colleagues' students. I have found programming in the area of text analysis for students of literature to be rewarding and enjoyable. Sometimes I wrote programs because students told me that they could not find software that would help them do specific kinds of research that they wanted to do; other times I wrote a program and then suggested to my students that they might find it useful. In any case, a number of students have made use of my professor-created software, and they were able to perform research that was often impossible without it.

Examples of Professor-Created Programs
DIALOG and DIALOG2
A student enrolled in my section of English Literature II was interested in writing a paper about how much dialogue there is in several nineteenth-century novels. The novels she wanted to study were available on campus in machine-readable form. I wrote a program called DIALOG that simply counted the words found inside quotation marks and compared that number with the total number of words in the novel. Using that program, the student wrote an excellent paper concluding that some novels thought of as filled with talking often contain less dialogue than others thought to be mostly narrative. DIALOG was a little trickier to construct than I had anticipated (due to the practice of using quotation marks at the start of a new paragraph that contains dialogue continuing from the preceding paragraph), but writing the code required only a few hours one evening. (Brief descriptions of DIALOG and of the other programs mentioned in this paper are found in the Appendix at the end of this paper.)
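The kind of counting DIALOG performs can be sketched roughly as follows. This is a minimal sketch in Python, not the SPITBOL of the original program; the function name `count_quoted_words`, the use of straight double quotation marks, and the blank-line paragraph separator are all assumptions made for illustration. A quotation mark that opens a paragraph while a quotation is still open is treated as a continuation mark, not as a closing mark.

```python
import re

# A token is either a double quotation mark or a word (letters and digits,
# with internal apostrophes or hyphens allowed).
WORD_OR_QUOTE = re.compile(r"""["]|[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*""")

def count_quoted_words(text):
    """Return (quoted_words, total_words) for a plain-text string.

    Assumed conventions (not taken from DIALOG itself): dialogue is marked
    with straight double quotes; paragraphs are separated by blank lines.
    """
    quoted = total = 0
    inside = False  # True while between an opening and a closing mark
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.lstrip()
        # Dialogue continuing from the preceding paragraph: the leading
        # mark re-opens the quotation, so it must not toggle the state.
        if inside and paragraph.startswith('"'):
            paragraph = paragraph[1:]
        for token in WORD_OR_QUOTE.findall(paragraph):
            if token == '"':
                inside = not inside
            else:
                total += 1
                if inside:
                    quoted += 1
    return quoted, total
```

Given the two counts, the percentage of quotation is simply `100 * quoted / total`.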
DIALOG was used by students other than the one for whom it was written, and several of them could see how extensions to the program could make it more useful for their research. DIALOG2 contains two kinds of extensions. First, it computes the number and percentage of words in quotation marks in each chapter of a text file that is marked with appropriate tags (such as <C 1> at the start of the first chapter). DIALOG2 also produces a simple graph of the percentage of quotation in each chapter.
Total words in JEREMIAH.ASC: 55,241.
Quoted words: 19,586 (35.4555493%)


Words Quoted:
Chapter Number Percent Graph of Percent
------- ------ -------
   1      632    19.2  XXXXXXXXXXXXXXXXXXX
   2      937    57.2  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   3      943    40.6  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   4     1174    23.8  XXXXXXXXXXXXXXXXXXXXXXXX
   5     1643    38.9  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   6      663    36.2  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   7     1141    44.3  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   8     3131    43.6  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
   9      411    14.9  XXXXXXXXXXXXXXX
  10      808    37.5  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  11      833    42.9  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  12      547    31.8  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  13     1071    36.1  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  14      952    40.6  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  15      854    29.8  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  16      539    37.9  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  17     1358    43.5  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  18      922    54.5  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  19      220    12.8  XXXXXXXXXXXXX
  20      608    29.0  XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  21      199    44.4  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Figure 1. DIALOG2 indicates the amount of quotation in chapters of Jeremiah Bacon.

Using the output from DIALOG2 (see Figure 1), students were able to note the relatively large amount of dialogue in the American western novel Jeremiah Bacon and to compare that with other novels. Some students were interested in tracking the rhythm of chapters containing larger and smaller amounts of talking.
One student cleverly used DIALOG2 to count and locate special classes of words. She wanted to record the number and frequency of words referring to love in a series of sonnets. The texts of the sonnets (which contained no dialogue, no quotation marks) were edited to place words such as "love," "affection," and "passion" within quotation marks. When this edited text was processed by DIALOG2, the output was similar to Figure 1, but it showed the number and percentage of love-words.
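A per-chapter report of the kind shown in Figure 1 might be produced along these lines. This is a hedged sketch in Python: the `<C n>` chapter tag comes from the description above, but the function names, the simplified quotation handling (each double quotation mark simply toggles the state), and the choice of one X per whole percentage point are assumptions, not details of DIALOG2 itself.

```python
import re

CHAPTER_TAG = re.compile(r"<C\s+(\d+)>")
WORD_OR_QUOTE = re.compile(r"""["]|[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*""")

def quoted_by_chapter(text):
    """Return a list of (chapter, quoted_words, total_words) tuples."""
    parts = CHAPTER_TAG.split(text)
    # parts = [text before first tag, "1", chapter 1 text, "2", ...]
    rows = []
    for number, body in zip(parts[1::2], parts[2::2]):
        quoted = total = 0
        inside = False
        for token in WORD_OR_QUOTE.findall(body):
            if token == '"':
                inside = not inside
            else:
                total += 1
                if inside:
                    quoted += 1
        rows.append((int(number), quoted, total))
    return rows

def format_report(rows):
    """Format rows as chapter, quoted count, percent, and a bar of X's."""
    lines = []
    for chapter, quoted, total in rows:
        percent = 100.0 * quoted / total if total else 0.0
        bar = "X" * round(percent)  # assumed scale: one X per percent
        lines.append(f"{chapter:4d} {quoted:8d} {percent:7.1f}  {bar}")
    return "\n".join(lines)
```

The same routine serves the love-words experiment described above without change: once the chosen words have been placed in quotation marks, the "quoted" column counts them.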
BITZER
A technical writing teacher wanted an indexing program that his students could use to produce indexes for the software manuals they had written. He wanted the program to index the words in a text by page number or by line number. I created a program called BITZER to do such indexing. (The program is named after a character in Dickens' Hard Times; he is a light porter of "the steadiest principle," and "all his proceedings were the result of the nicest and coldest calculation." Unlike most commercial software producers, professors can be whimsical in naming programs.) BITZER will produce an index showing the page (or line) location of every word in a text, and it can be set to exclude indexing for specific words (found on a "stop list").
BITZER recognizes unique page codes in the text, and the program thus captures page numbers for indexing. An ingenious student wanted to index the words of lyrics of songs by the song titles. He simply substituted song titles for page numbers in the texts of the lyrics, and BITZER produced a list of words indexed by the titles of the songs in which they are found.
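Indexing by page code with a stop list can be sketched as follows. The sketch is in Python, and the page-code format `<P label>` is hypothetical (the format BITZER actually recognizes is not given here); `build_index` is likewise an illustrative name. Because the label is kept as an arbitrary string, a song title works as well as a page number.

```python
import re
from collections import defaultdict

PAGE_CODE = re.compile(r"<P ([^>]+)>")  # hypothetical page-code format
WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def build_index(text, stop_words=()):
    """Map each word to the page labels on which it occurs.

    Labels are kept in order of first appearance; words on the stop
    list are left out of the index.
    """
    stop = {w.lower() for w in stop_words}
    index = defaultdict(list)
    parts = PAGE_CODE.split(text)
    # parts = [text before the first code, label, page text, label, ...]
    for label, body in zip(parts[1::2], parts[2::2]):
        for word in WORD.findall(body):
            word = word.lower()
            if word not in stop and label not in index[word]:
                index[word].append(label)
    return dict(sorted(index.items()))
```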
WORDS and IDENT
Students often want to count the number of words in a text file. There are many programs (including most word processors) that will do that. However, sometimes students need to control the definition of what constitutes a word form: for example, hyphens, apostrophes, and numeric digits may or may not be wanted as constituent parts of word forms. WORDS is a program that counts the number of running words in a text and the number of unique word forms -- based on parameters for recognition of a word that the user can set. Perhaps in order to collect data to support an argument about the disputed authorship of texts, students sometimes want side-by-side comparisons of the counts and percentages of selected words in two texts. IDENT produces the counts for such comparisons.
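How much the definition of a word form matters can be seen in a small sketch. It is written in Python, and the function name `count_words` and its three switches are assumptions for illustration; WORDS itself reads its parameters from a control file whose format is not described here.

```python
import re
from collections import Counter

def count_words(text, hyphens=True, apostrophes=True, digits=False):
    """Return (running_words, unique_forms, Counter of forms).

    The switches decide whether hyphens and apostrophes may join parts
    of a word form and whether digits may appear in one.
    """
    chars = "A-Za-z" + ("0-9" if digits else "")
    joiners = ("-" if hyphens else "") + ("'" if apostrophes else "")
    if joiners:
        pattern = rf"[{chars}]+(?:[{re.escape(joiners)}][{chars}]+)*"
    else:
        pattern = rf"[{chars}]+"
    forms = Counter(w.lower() for w in re.findall(pattern, text))
    return sum(forms.values()), len(forms), forms
```

For the sentence "The well-known cat's toy; the cat, 2 cats." the default settings give 7 running words and 6 unique forms; with hyphens and apostrophes disallowed and digits allowed, the same sentence gives 10 running words and 8 unique forms.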
ACTORS
A director of a campus play was extremely interested to know if a computer program could identify which characters in a play were never on stage simultaneously -- using such information a director could cast actors in multiple roles. ACTORS is a program that processes a play's text and notes each character's entrance or exit. With this information, ACTORS can produce tables of characters who do (and do not) appear on stage simultaneously. Using these tables, the program can suggest multiple casting -- often with the minimum number of actors -- see Figure 2.
Possible Doubling for Performance of HAMLET with Minimum Number of Actors


Actor 1   Claudius, Army, Barnardo, OneWithRecorder, Reynaldo, Sailors,
          SecondClown, Servant

Actor 2   Gertrude, Captain, Francisco

Actor 3   Guard, Ambassadors, Cornelius, FirstClown, Followers, Ghost

Actor 4   Guildenstern, Attendants, Council, Marcellus, Messenger, Priest

Actor 5   Hamlet

Actor 6   Horatio, Others

Actor 7   Lords, Valtemand

Actor 8   Ophelia, Colours

Actor 9   PlayerFive, Drummer, Lucianus, Prologue

Actor 10  PlayerFour, Fortinbras

Actor 11  PlayerKing, Laertes

Actor 12  PlayerQueen, Osric

Actor 13  PlayerThree

Actor 14  Polonius

Actor 15  Rosencrantz
Figure 2. Output from ACTORS.

For no other program mentioned in this paper was the code as time-consuming to write and as troublesome to debug as ACTORS. I started creating the application during a Christmas break as a labor of love; by the time it ran properly -- more than a semester later -- love was far from my chief feeling for the program. To write the code to compute which characters are on stage simultaneously and to compute which characters are never on stage simultaneously is not a simple task, but it was child's play compared with writing the code to determine the possible doubling shown in Figure 2.
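The two computations can be illustrated with a short sketch in Python. It is not the method ACTORS uses: the event list of ("enter" or "exit", name) tuples is a hypothetical stand-in for a tagged play text, and the doubling is found by a simple greedy assignment, which avoids conflicts but is not guaranteed to reach the minimum number of actors.

```python
def together(events):
    """Return the set of character pairs that ever share the stage.

    `events` is a list of (kind, name) tuples in playing order, where
    kind is "enter" or "exit" (a hypothetical input format).
    """
    on_stage, pairs = set(), set()
    for kind, name in events:
        if kind == "enter":
            pairs.update(frozenset((name, other)) for other in on_stage)
            on_stage.add(name)
        else:
            on_stage.discard(name)
    return pairs

def suggest_doubling(characters, pairs):
    """Greedily group roles so that no actor plays two characters who
    share the stage. Returns one list of roles per actor."""
    def conflicts(c):
        return sum(1 for p in pairs if c in p)
    actors = []
    # Place the characters with the most conflicts first.
    for c in sorted(characters, key=lambda c: (-conflicts(c), c)):
        for roles in actors:
            if all(frozenset((c, r)) not in pairs for r in roles):
                roles.append(c)
                break
        else:
            actors.append([c])  # no existing actor is free: add one
    return actors
```

Finding a casting with the fewest possible actors is an instance of graph coloring, which is hard in general; that may help explain why this part of the program was so much more troublesome than the tables.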
SHAKWORD and CONCORD
A colleague teaching a course in Shakespeare wanted his students to be able to search an electronic text (containing SGML-type tags -- see note 1) in order to index the location (play title, act, scene, and line) of any word in the complete works; I created a program named SHAKWORD to do that -- see Figure 3. I also created a menu system for my colleague's students to allow them to access SHAKWORD and seven other programs -- including a program called CONCORD that produces a key-word-in-context concordance for a text. CONCORD was designed to process novels such as Moby Dick (see Figure 4), but it works well with plays too.
Words        References

play         Ham, 1:2,84; Ham, 1:2,255; Ham, 2:2,438; Ham, 2:2,442;
             Ham, 2:2,538; Ham, 2:2,539; Ham, 2:2,591; Ham, 2:2,597;
             Ham, 3:1,22B; Ham, 3:1,135; Ham, 3:1,184; Ham, 3:2,29;
             Ham, 3:2,38; Ham, 3:2,43; Ham, 3:2,73; Ham, 3:2,86;
             Ham, 3:2,88; Ham, 3:2,129; Ham, 3:2,133; Ham, 3:2,141;
             Ham, 3:2,218; Ham, 3:2,225; Ham, 3:2,226; Ham, 3:2,256;
             Ham, 3:2,260; Ham, 3:2,338; Ham, 3:2,352; Ham, 3:2,359;
             Ham, 4:7,88; Ham, 5:1,90; Ham, 5:2,32; Ham, 5:2,199;
             Ham, 5:2,213A; Ham, 5:2,228; Ham, 5:2,236; Ham, 5:2,237B;
             Ham, 5:2,253B; Ham, AP:P,4; Ham, AP:P,12; Oth, 1:1,134;
             Oth, 2:1,118; Oth, 2:1,177; Oth, 2:3,327; Oth, 2:3,338;
             Oth, 3:1,1; Oth, 5:2,254
act          Ham, 1:2,205; Ham, 1:3,60; Ham, 1:5,84; Ham, 3:1,129;
             Ham, 3:2,76; Ham, 3:3,91; Ham, 3:4,39B; Ham, 3:4,50B;
             Ham, 3:4,50B; Ham, 5:1,11; Ham, 5:1,11; Ham, 5:1,12;
             Ham, 5:2,286; Oth, 1:1,62; Oth, 1:1,153; Oth, 1:1,173;
             Oth, 2:1,228; Oth, 3:3,139; Oth, 3:3,332; Oth, 4:2,167;
             Oth, 5:2,196B; Oth, 5:2,210; Oth, 5:2,218; Oth, 5:2,381
scene        Ham, 2:2,401; Ham, 2:2,592; Ham, 3:2,74
actor        Ham, 2:2,393; Ham, 2:2,397; Ham, 3:2,97
actress      
actors       Ham, 2:2,394; Ham, 2:2,398
Figure 3. Output from SHAKWORD showing the locations of six words in Hamlet and in Othello.
       239.10 nging sign over the door with a WHITE painting upon it, faintly repre
       418.6  ply brown and burnt, making his WHITE teeth dazzling by the contrast;
       609.4   me.  I remembered a story of a WHITE man --a whaleman too--who, fall
       616.12  heard of a hot sun's tanning a WHITE man into a purplish yellow one.
      1297.11 e yet afloat.  And ever, as the WHITE moon shows her affrighted face 
      1330.12 observe his prayer, and so many WHITE bolts, upon his prison.  Then J
      1733.10 e so companionable; as though a WHITE man were anything more dignifie
      1864.14 starboard hand till we opened a WHITE church to the larboard, and the
      3003.2   in the moonlight; and like the WHITE ivory tusks of some huge elepha
      3046.9  eedlessly, ye harpooneers; good WHITE cedar plank is raised full thre
      3462.8  ity in looking up at him; and a WHITE man standing before him seemed 
      3462.15 an standing before him seemed a WHITE flag come to beg truce of a for
      3556.2  ers of discernment.  So that no WHITE sailor seriously contradicted h
      3562.5  mness was owing to the barbaric WHITE leg upon which he partly stood.
                                                .
                                                .
                                                .
      8274.6  -head. In the distance, a great WHITE mass lazily rose, and rising hi
      8281.10 he breaches!  right ahead!  The WHITE Whale, the White Whale!  Upon t
      8282.2  ht ahead!  The White Whale, the WHITE Whale!  Upon this, the seamen r
      8292.3   did he distinctly perceive the WHITE mass, than with a quick intensi
      8307.13 m, than to have seen thee, thou WHITE ghost!                         
                                                .
                                                .
                                                .
     16362.12 c fountain in his head, did the WHITE Whale now reveal his vicinity; 
     16370.3  his immeasureable bravadoes the WHITE Whale tossed himself salmon-lik
     16390.14 p's three masts to his eye; the WHITE Whale churning himself into fur
     16399.3  his untraceable evolutions, the WHITE Whale so crossed and recrossed,
     16414.12  fast again.  That instant, the WHITE Whale made a sudden rush among 
     16429.6  rpendicularly from the sea, the WHITE Whale dashed his broad forehead
Figure 4. A small part of the output of CONCORD for Moby Dick.

Often professors and students are familiar with computer software that performs a desired kind of processing, but sometimes it will not work with the electronic texts at hand. My colleagues had purchased a site license to use Shakespeare: The Complete Works: Electronic Edition produced by Oxford University Press. There are several kinds of programs that produce indexes of texts, but they cannot read and store the titles, acts, scenes, and lines as they are encoded in the Oxford electronic text of Shakespeare that we owned. Thus I created SHAKWORD to work with the specific electronic text that we had.
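The key-word-in-context layout of Figure 4 can be approximated in a few lines. This is a rough sketch in Python with an assumed function name `kwic`; unlike CONCORD it works on plain untagged text and omits the line and word location printed at the left of Figure 4.

```python
import re

def kwic(text, keyword, width=30):
    """Return key-word-in-context lines for each occurrence of keyword.

    Each line shows `width` characters of context on either side, with
    the keyword itself in capitals so that it stands out in a column.
    """
    flat = " ".join(text.split())  # collapse line breaks and extra spaces
    lines = []
    for m in re.finditer(rf"\b{re.escape(keyword)}\b", flat, re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        lines.append(f"{left:>{width}}{m.group().upper()}{right:<{width}}")
    return lines
```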
WORDCELL
Like SHAKWORD, a program called WORDCELL records the location of selected words in a text. However, WORDCELL records only the relative location of words within 66 cells (or blocks) in the text. Thus, words that are used in the first half of a text will appear in the first 33 cells; those that are used in the last tenth of a text will appear in the last six or seven cells.
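The mapping from a word's position to one of the 66 cells is a matter of simple proportion, as this sketch suggests. It is written in Python; the function name `word_cells` and the pre-split list of words are assumptions for illustration, and the graphing that WORDCELL also does is left out.

```python
def word_cells(words, target, cells=66):
    """Count occurrences of `target` in each of `cells` equal blocks.

    `words` is the text as a list of running words; the word at index i
    of n falls in cell i * cells // n (numbered from 0).
    """
    counts = [0] * cells
    n = len(words)
    for i, word in enumerate(words):
        if word.lower() == target:
            counts[i * cells // n] += 1
    return counts
```

With this rule, every word in the first half of the text lands in cells 0 through 32, the first 33 cells.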
SENT
In a programming class, students had learned to create a program that computed the number and percent of words of various lengths in a text (the number and percent of one-letter words, two-letter words, and so on). I was asked to create a program that would provide similar information about the lengths of sentences in a text. SENT calculates the number and percent of one-word sentences, two-word sentences, and so on.
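A sentence-length tally of this kind might look like the following sketch in Python. The rule used to find sentence boundaries (a sentence ends at a period, exclamation mark, or question mark) is a naive assumption that abbreviations such as "Mr." will mislead; how SENT itself recognizes sentences is not stated here, and `sentence_lengths` is an illustrative name.

```python
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")

def sentence_lengths(text):
    """Return {length_in_words: (count, percent)} for sentences in text."""
    lengths = Counter()
    for sentence in re.split(r"[.!?]+", text):  # naive boundary rule
        n = len(WORD.findall(sentence))
        if n:  # ignore empty fragments between marks
            lengths[n] += 1
    total = sum(lengths.values())
    return {n: (c, 100.0 * c / total) for n, c in sorted(lengths.items())}
```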
JAFORMAT
The Oxford Electronic Text Library Edition of the Complete Works of Jane Austen is an excellent, accurate electronic edition of the six Austen novels encoded with SGML tags in conformance with the principles of the Text Encoding Initiative (see note 2). I wrote a series of programs to allow students to use the electronic editions of the Austen novels for research.
JAFORMAT processes the encoded Austen texts and removes the tags and reformats the text as necessary to produce clean ASCII text that can be loaded directly into a word-processed paper about Austen. It seems rather perverse to convert a carefully tagged text into a plain ASCII text, but sometimes a text without tags is what is needed.
JAWORDS
Because this Austen electronic edition codes page and line numbers corresponding to the standard printed edition (edited by R. W. Chapman), it was rather straightforward for me to create a computer program to index words. JAWORDS produces an index of the page and line location of selected words in the SGML texts.
JATALK and JADIALOG
This Austen edition identifies the words of dialogue of each of the speakers in each novel. Using the tags of dialogue identification, JATALK counts the number of words of dialogue for each speaker (including the narrator), and thus shows exactly how much each character speaks. JADIALOG separates the dialogue into individual files for each speaker, thus making it available for processing with other programs.
Each of the programs designed to be used with the Austen electronic texts can be used with other electronic texts if they are encoded with exactly the same kinds of SGML markup. Students have, in fact, used the Austen programs to process other SGML texts.
FINDLIST
FINDLIST is a program that computes the percent of words on multiple lists that are found in multiple text files. As Figure 5 shows, it can be used for analysis of the multiple files of dialogue of characters in Austen novels (produced by JADIALOG). Students find it interesting to note how frequently various male and female characters use masculine and feminine pronouns (as well as other pronouns) and what percent of their dialogue contains words for love and color. Characters also differ in the percentage of their words that appear on a list of 125 common English words.
Character      List 1  List 2  List 3  List 4  List 5  List 6  List 7
Text File        He      She    Other    Me     Love    Color  Common
                Words   Words   Words   Words   Words   Words   Words

ELINOR.DAS      3.21%   2.22%   6.28%   3.83%   0.21%   0.00%  55.82%
MARIANNE.DAS    2.17%   0.77%   3.67%   6.71%   0.21%   0.00%  56.85%
ELIZABET.BEN    3.10%   1.97%   5.87%   4.32%   0.27%   0.01%  56.07%
JANE.BEN        3.37%   1.54%   6.24%   6.20%   0.21%   0.02%  55.35%
FANNY.PRI       2.46%   5.32%   8.55%   2.11%   0.22%   0.01%  55.18%
MARY.CRA        2.13%   1.37%   4.27%   4.73%   0.20%   0.03%  55.97%
EMMA.WH         2.50%   3.44%   6.66%   2.76%   0.23%   0.01%  54.57%
JANE.FF         0.68%   0.54%   1.86%   8.53%   0.00%   0.00%  57.17%
HARRIET.SM      3.47%   1.82%   6.06%   5.98%   0.10%   0.03%  57.59%
CATHERIN.MOR    1.82%   3.02%   5.79%   4.32%   0.13%   0.07%  56.19%
ELEANOR.TIL     1.43%   1.08%   3.05%   5.80%   0.15%   0.00%  57.75%
ANNE.ELL        2.97%   2.58%   6.51%   2.71%   0.16%   0.02%  54.07%
CLAY.MRS        1.68%   0.51%   4.71%   4.04%   0.00%   0.00%  50.34%

FERRARS.ED      1.05%   1.09%   2.71%   8.42%   0.12%   0.04%  56.33%
JOHN.DAS        2.10%   2.36%   5.32%   3.34%   0.06%   0.00%  54.80%
BRANDON.COL     1.79%   3.13%   5.48%   6.46%   0.13%   0.00%  57.24%
DARCY.FW        2.09%   0.87%   3.60%   6.54%   0.17%   0.00%  55.58%
COLLINS.MR      1.54%   1.90%   4.03%   5.91%   0.08%   0.00%  54.62%
EDMUND.BER      1.52%   2.56%   4.86%   4.45%   0.19%   0.02%  56.84%
HENRY.CRA       2.53%   1.99%   4.99%   4.62%   0.13%   0.01%  55.51%
KNIGHTLE.MR     3.28%   2.68%   6.57%   4.37%   0.23%   0.00%  55.40%
FRANK.CH        1.18%   2.18%   4.00%   7.15%   0.09%   0.00%  55.11%
HENRY.TIL       1.10%   1.22%   3.18%   3.59%   0.21%   0.08%  54.68%
JOHN.THP        2.06%   0.38%   3.38%   5.88%   0.03%   0.00%  56.42%
WENTWORT.CPT    1.80%   1.61%   3.97%   6.37%   0.13%   0.00%  57.89%
WALTER.SIR      2.68%   1.64%   5.14%   3.17%   0.00%   0.11%  50.87%

NARRATOR.SS     2.51%   5.38%   9.53%   0.00%   0.11%   0.02%  50.62%
NARRATOR.PP     3.05%   5.27%   9.95%   0.00%   0.09%   0.01%  50.93%
NARRATOR.MP     2.72%   4.54%   8.24%   0.01%   0.13%   0.03%  50.61%
NARRATOR.EM     2.75%   4.33%   8.07%   0.00%   0.10%   0.02%  49.94%
NARRATOR.NA     1.87%   5.28%   8.51%   0.07%   0.09%   0.02%  50.16%
NARRATOR.PER    2.45%   4.10%   8.02%   0.01%   0.06%   0.02%  50.55%

AUSTEN.ALL      2.43%   3.41%   6.90%   2.51%   0.13%   0.02%  53.20%
Figure 5. Output from FINDLIST showing the percentage of words found in Austen characters' text files.
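The percentages in a table like Figure 5 come from a simple ratio: the number of running words in a file that appear on a given list, divided by the total running words in that file. A minimal sketch in Python, with an assumed function name and with texts and lists passed in as dictionaries where FINDLIST reads them from files:

```python
import re

WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def percent_on_lists(texts, word_lists):
    """Return {text_name: {list_name: percent of running words on list}}.

    texts: {name: text string}; word_lists: {list_name: iterable of words}.
    """
    sets = {name: {w.lower() for w in words}
            for name, words in word_lists.items()}
    table = {}
    for name, text in texts.items():
        words = [w.lower() for w in WORD.findall(text)]
        total = len(words) or 1  # avoid dividing by zero for an empty text
        table[name] = {
            list_name: 100.0 * sum(w in members for w in words) / total
            for list_name, members in sets.items()
        }
    return table
```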

PICKWICK
Occasionally professors who do not program want software created for their own research. One of my colleagues who teaches Shakespeare wanted to compare portions of plays attributed to Shakespeare that might have been written by someone else. He wanted only those questionable passages isolated. I created a program called PICKWICK that allows the user to specify lines of a scene in a Shakespeare play that can be picked or abstracted from the play and placed in a file. The play to be thus processed must contain SGML-type tags. The tags are removed from the portion of the play that is isolated. My colleague used IDENT to determine whether the disputed passages contained the same percentage of selected words as passages that were undisputed.
In passing, it might be noted that asking students to use programs is an excellent way of testing the programs. Often students use programs in truly unexpected ways. An early version of DIALOG contained a programming error (in a rounding routine) that could produce inaccurate results -- but only if the word counts were very small. Since the idea behind the program was to compute the amount of quotation in novels, the texts that DIALOG was tested with were rather large, and the counts were large, and thus the error went undetected for some time. A student used DIALOG to count the amount of quotation in a short poem, and the error in the program was discovered (and corrected) immediately since the word counts generated were very small.

What is Needed for Professors to Create Programs
Some things are essential in order for a professor to do programming. It is obvious that a programmer must have access to a computer and the necessary editor and compiler. In addition, and less obvious, it is essential that the professor has skills and knowledge of an appropriate computer language. Perhaps the reason so few faculty in liberal arts write programs is that they have learned (or attempted to learn) computer languages that are not good at doing what is wanted. As a student of COBOL, I attempted to create programs for word counting and analysis of texts. It sometimes took me months to write the programs, and often it was simply impossible to figure out how to make them do the kinds of text analysis that I wanted to do. It was the same with PL/I. RPG was hopeless. The C language is powerful and effective for many kinds of programming, but it can be difficult to use, and it is certainly not my computer language of choice for text analysis.
After being introduced to the SNOBOL4 computer language, I found, immediately, that I could write programs of greater sophistication than in any other language -- and I could write them far more quickly. I have used SNOBOL4 (or its speedy implementation called SPITBOL -- see note 3) for almost all of my programming ever since I learned the language. All of the programs mentioned in this paper were constructed in a 32-bit version of SPITBOL. Another language called Icon (see note 4) is newer than SNOBOL4, and it has similar power and ease of use. If professors in liberal arts (in particular, in English) are to have success creating computer programs, they will almost always write them in SNOBOL4 or Icon.
In addition to knowledge and skills of programming in an appropriate language, professors will create software for students only if they enjoy creating computer code. It is difficult to say why some folks like to write programs and others do not. It would be an incentive for professors to create software if they were paid by their universities to produce programs. Junior professors could be granted credit toward promotion and tenure for programming. However, it is not easy to work out fair procedures to provide such incentives.

Documentation
Sometimes students can be given a demonstration of how a program works, and thus they can learn to use it without having any written directions. However, software that has no permanent documentation will usually not be widely used, and it often is quickly forgotten. Thus, some kind of directions should be produced -- ideally, perhaps, written by the programmer. However, even if they are English composition teachers, most programmers do not like writing documentation for a computer program. Moreover, the more complex the program (and thus in need of directions for its use), the less the programmer wants to write documentation -- probably because after a long struggle to get the code to execute properly, writing directions for users seems like an anti-climax. An assistant who can write clear user documentation is a terrific asset to a programmer. In any case, documentation of one kind or another is usually necessary. I have sometimes published articles about programs that I have created (citations of my articles are included in the program descriptions in the appendix); often I give copies of my articles to students who want to use the software. They seem to learn how to use the programs easily after reading the articles.

Conclusion
There is little doubt that students can make much better use of computers if at least one of their professors can create the programs that are needed. At a minimum, professors must have programming skills and knowledge and a determination to write software -- and some incentives and perhaps assistance are probably also necessary if many programs are to be produced by professors.

Notes
¹ SGML-type tags are used in a form of markup similar to the international system known as Standard Generalized Markup Language (SGML). See note 2.
² The Standard Generalized Markup Language (SGML) and the Text Encoding Initiative (TEI) are exhaustively described in 1300 pages of Guidelines edited by Sperberg-McQueen and Burnard (1994). Electronic versions of the Guidelines are available on CD-ROM and via anonymous FTP; for information, visit the TEI Web site at http://www.uic.edu:80/orgs/tei/info/elect.html
Special issues of journals about SGML and the TEI have been edited by Lou Burnard (1995) and Nancy Ide and Jean Véronis (1995).
³ For an introduction to SNOBOL4 and SPITBOL, see Hockey (1985) and Johnson (1995).
⁴ The standard introduction and reference text for the Icon programming language is Griswold and Griswold (1990). Information about the language can be found on the Web at http://www.cs.arizona.edu:80/icon/www/

References
Burnard, Lou, Guest Editor. Special issue of TEXT Technology on Electronic Texts and the Text Encoding Initiative. 5.3 (Autumn, 1995).
Griswold, Ralph E. and Madge T. Griswold. The Icon Programming Language. Second ed. Englewood Cliffs, NJ: Prentice Hall, 1990.
Hockey, Susan. SNOBOL Programming for the Humanities. New York: Oxford University Press, 1985.
Ide, Nancy and Jean Véronis, Guest Editors. Special triple issue of Computers and the Humanities on The Text Encoding Initiative: Background and Contexts. Volume 29, numbers 1, 2, and 3 (February, April, and June, 1995).
Johnson, Eric. Computer Programming for the Humanities in SNOBOL4. Madison, SD: Dakota State University Press, 1995.
Sperberg-McQueen, C. M. and Lou Burnard, Editors. Guidelines for Electronic Text Encoding and Interchange (TEI P3). Chicago, Oxford: Text Encoding Initiative, 1994.

Appendix
Descriptions of Professor-Created Programs
Following are descriptions of programs that were created to run on 32-bit PCs (containing 386SX or better processors); they execute under native DOS 3.1 or above and under Windows 386 Enhanced Mode or under other DPMI hosts. Although the amount of RAM needed by the programs depends on the size of the texts to be processed, a minimum of 2 MB is necessary.

ACTORS

processes the text of a play and provides (1) listings of the characters that are on stage simultaneously, generated each time there is an entrance or exit; (2) a table indexed for each character showing which characters appear on stage at least once with the indexed character, and a table showing which characters never appear on stage with the indexed character; and (3) a listing of possible doubling of roles for a performance of a play. Requires text with SGML-like tags such as those in William Shakespeare: The Complete Works: Electronic Edition (Oxford University Press). The program is described in my "Project Report: ACTORS: Computing Dramatic Characters that are on Stage Simultaneously," Computers and the Humanities, 28.6 (December, 1994), 393-400.

BITZER

generates an index of page numbers (or line numbers) for all words (or for selected words) in a text automatically. It is described in my "Creating an Index: Project Report: BITZER," TEXT Technology, 5.2 (Summer, 1995), 91-100.

CONCORD

produces a key-word-in-context concordance for all words (or for selected words) in a text file. The program is briefly described in my "SPITBOL-386: The Language of Choice for Non-numeric Computing," TEXT Technology, 4.3 (Autumn, 1994), 177-185.

DIALOG

counts the number of words contained within quotation marks and computes that number's percentage of the total words in a text file. DIALOG2 computes the number and percentage of words in quotation marks by chapters in a text file that contains appropriate tags (such as <C 1> at the start of chapter one). These programs are described in my "Computing the Amount of Quotation in Novels," TEXT Technology, 3.6 (November, 1993), 3-5.

FINDLIST

computes the percent of words on multiple lists that are found in multiple ASCII text files. The names of the text files can be entered as requested by the program, or the program will read a file of the names of the text files. The results of this program are described in my "How Jane Austen's Characters Talk," TEXT Technology, 4.4 (Winter, 1994), 263-267.

IDENT

compares the number and percentage of occurrence of selected words in two text files. The program is described in my "Comparing Texts and Identifying Authors," TEXT Technology, 4.1 (Spring, 1994), 7-12.

The following four programs are designed to process the Oxford Electronic Text Library Edition of the Complete Works of Jane Austen. The programs are described (although not named) in my "Oxford Electronic Text Library Edition of the Complete Works of Jane Austen," Computers and the Humanities, 28.4-5 (Aug/Oct, 1994), 317-321.

JAFORMAT

processes SGML files and removes tags and entity references, leaving only ASCII text with paragraphs indented in an output file. The program requires a file named IBMENTS.DTD (which is distributed with the Oxford electronic texts) in order to convert the entity references in the SGML files to the proper characters used by a PC.

JAWORDS

produces an index of locations for selected words in SGML text files.

JADIALOG

processes SGML files to create untagged ASCII files that contain the words of dialogue of each speaker. The program requires files distributed with the Oxford electronic texts: IBMENTS.DTD and an OET file for the text file to be processed (e.g., NA.OET, which lists the codes for each of the "speaking characters" in Northanger Abbey, is required for the processing of the SGML file NA.TXT).

JATALK

counts the number of words of dialogue for each speaker in an SGML file. The program requires files distributed with the Oxford electronic texts: IBMENTS.DTD and an OET file for the text file to be processed (e.g., NA.OET, which lists the codes for each of the "speaking characters" in Northanger Abbey, is required for the processing of the SGML file NA.TXT).

PICKWICK

isolates all or part of a specified scene of a play and places it in a file. The program requires text with SGML-like tags such as those in William Shakespeare: The Complete Works: Electronic Edition (Oxford University Press).

SENT

computes the number and percent of sentences of various lengths in an ASCII file and graphs the results. That is, SENT counts and graphs the percent of one-word sentences, two-word sentences, and so on.

SHAKWORD

finds selected words in a text of a play, and it notes the location of all occurrences by play title, act, scene, and line numbers. The program requires text with SGML-like tags such as those in William Shakespeare: The Complete Works: Electronic Edition (Oxford University Press).

WORDCELL

records the location of words in an ASCII text file and graphs the positions within 66 cells (or blocks). Output from the program is described in my "Rhythm in the Novel," TEXT Technology, 3.5 (September, 1993), 3-6.

WORDS

is designed to read a text file and (1) to count the number of running words in a text, (2) to count the number of unique word forms, and (3) to list the number of occurrences of each unique form in either alphabetical order or in order of frequency of occurrence. A control file allows the user to set parameters for recognition of words. The program is described in my "Counting Words and Computing Word Frequency: Project Report: WORDS," TEXT Technology, 5.1 (Spring, 1995), 8-17.

Eric Johnson is the author of more than one hundred articles, papers, and monographs about computers, writing, and literature. He is Professor of English and Dean of the College of Liberal Arts at Dakota State University. He can be contacted by email at johnsone@jupiter.dsu.edu
