Author: | Sergio Talens-Oliag |
---|---|
Contact: | sto@iti.upv.es |
Organization: | Instituto Tecnológico de Informática |
Date: | 8 July 2006 |
This article is about the importance of documentation in software-related projects and the tools and systems that developers can use to be more productive when writing that documentation.
We will start by explaining what we mean by the term documentation in the context of software-related projects and why we believe it is important. Then we will move to the problem of formats and tools: we will talk about the systems used to write documentation, centering our discussion on the types of markup languages available and their advantages and problems.
After this general exposition we will move on to our main topic, agile documentation tools. We will enumerate the features we want from a documentation system (e.g. we want lightweight markup formats so that we can write fast) and compare some of the available formats and tools that provide the desired characteristics, discussing their advantages and drawbacks.
Note that we will simply talk about tools and formats useful to write and manipulate documentation, not about what we should document nor when or how we should do it; that would be the subject of another kind of article, probably related to project development methodologies.
When talking about software projects, the term documentation refers to the written text that accompanies the software produced by the project. This text can be of various types:
Architecture / design: documents of this kind provide an overview of how the project is going to be developed and why it is going to be done that way. They are usually written at the beginning of the project but, to remain useful, must be reviewed when things change during the project life cycle.
These documents do not need to be very detailed; the idea is to have a high-level description of the components that are going to be used, why they have been chosen, what their expected functionality is and how they relate to each other. The design document can also include notes about how the components should be implemented (data types, algorithms, etc.).
Technical: documentation of code, algorithms, interfaces, APIs, etc. It is more detailed and usually has to be written while developing the code. Generally the best approach is to keep it inside the source code, at least for the API documentation; there are two approaches to do that:
Of course there are always some documents that should be in separate files, for example the project guidelines for developers; those guidelines can include documents explaining how the project code is organized, how to use the SCM system, how the code should be formatted and commented, etc.
End User: manuals for end users, system administrators and support staff. Usually written by specialized personnel, this kind of documentation generally has nothing to do with the source code; it simply describes how to use the software produced by the project.
We are mostly interested in the first two types of documents, as those are the ones that should be written by computer science specialists (analysts, developers, system and network administrators, testers, etc.), but the tools and formats discussed can also be used to write end-user documentation.
Note that we will limit the scope of this article to formats and tools useful to write standalone documents, leaving the documentation generators and the literate programming tools for another time.
Anyway, from the writer's point of view, documentation generators are not too different from the tools we will talk about; they tend to be very specialized, but the markup formats they use are quite similar to the ones used in the systems we will be discussing.
Similarly, literate programming tools can also use lightweight markup languages or more complex systems like TeX, but the markup is usually not the main concern when discussing the use of those systems; their main problem is that using them means changing the way developers work, and that is a totally different issue. For the reader interested in the ideas behind literate programming and the tools available for it, there is a very interesting website devoted to the subject at http://www.literateprogramming.com/.
There are a lot of valid reasons to write documentation in a software-related project:
In fact, in traditional development models documentation is quite important in all the phases of the software life cycle:
That being said, keep in mind that our idea is to have and work with light documents; that is, the documentation should be short, terse and easy to understand by qualified individuals. The documentation has to serve the needs of new project developers, helping someone not involved in the project to start working on it, and it also has to keep a history of important events (requirement changes, design decisions, etc.) for the people already working on it.
Almost all texts written on a computer use some kind of markup system to include information about the structure and characteristics of the text; this markup is usually used to know how to display and process the document content.
Depending on the complexity of the markup format the text can be written using a plain text editor (an editor that simply manipulates characters and lines using a fixed-width font), a programmable plain text editor (when the text is readable but the format has so many rules that we prefer to use the help of the computer to mark the text) or a word processor that hides the markup complexity and usually lets us work using a graphical view that is approximately equivalent to the desired output format (what we know as WYSIWYG).
In the article titled Markup systems and the future of scholarly text processing by James H. Coombs, Allen H. Renear, and Steven J. DeRose, published in the November 1987 CACM (available at <http://xml.coverpages.org/coombs.html>), the authors identify six different types of markup:
Punctuational. Punctuational markup consists of the use of a closed set of marks to provide primarily syntactic information about written words. This kind of markup is part of writing systems and is not usually considered a special markup language (punctuation is included in all the other types of markup).
Presentational. This kind of markup uses cues in the text layout, like leaving empty lines between paragraphs or putting spaces before a string to center it and mark it as a title, to hint at the structure of the document.
This kind of markup clarifies the presentation of a document and makes it suitable for reading, but it does not provide explicit information about the structure of the document (word-processing and desktop publishing products sometimes attempt to deduce structure from presentational conventions, but it is an almost impossible task, as there is no common standard).
Procedural. In many text-processing systems, presentational markup is replaced by procedural markup, which consists of commands indicating how text should be formatted; documents written using this kind of markup can be viewed as programs to generate a final document, as the commands are usually visible to the user and, in most cases, the procedural markup capabilities comprise a Turing-complete programming language.
As happens with presentational markup, procedural markup by itself does not provide information about the logical structure of the document, although some systems define commands that implicitly give some structural information (e.g., a command to build a title).
Systems like nroff/troff, TeX and PostScript are examples of this kind of markup system.
Descriptive or semantic. Markup systems of this kind identify the element types of text tokens, not how they should be presented. The output format of documents written using descriptive markup is generated by processing the descriptive elements and applying procedural markup rules to them. The good thing about this type of markup is that the procedural rules can be replaced by others, allowing us to have multiple output formats and styles without extra effort.
Examples of this kind of system are SGML and XML.
Referential. Referential markup refers to entities external to the document and is replaced by those entities during processing; that is the kind of markup used to replace special characters or abbreviations. Almost all text formatters that support procedural markup support some kind of referential functionality using variables or macros, but usually referential markup is associated with descriptive markup systems like SGML.
Metamarkup. Metamarkup provides facilities for controlling the interpretation of markup and extending the vocabulary of descriptive markup languages. Metamarkup provides tools to define macros, tags, valid and default attributes, etc. Almost all nontrivial systems support metamarkup, but most do not provide a suitable interface for non-programmers.
Of these six types of markup we will only take three into account, presentational, procedural and descriptive, as the others are usually available in the systems that use one of those three paradigms.
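To make the distinction between descriptive and procedural markup more concrete, the following minimal Python sketch keeps a document as a list of typed elements and applies two interchangeable sets of presentation rules to it. The element names, the `DOCUMENT` data and both rendering functions are assumptions made for illustration; they do not come from any of the systems discussed here.

```python
import html

# A tiny "descriptive" document: each entry names an element type and its
# text, and says nothing about how the element should look.
# (Hypothetical element names, chosen only for this sketch.)
DOCUMENT = [
    ("title", "Agile documentation"),
    ("paragraph", "Lightweight markup keeps the focus on content."),
    ("quote", "Structure first, presentation later."),
]


def render_html(document):
    """Apply one set of presentation rules: HTML tags."""
    tags = {"title": "h1", "paragraph": "p", "quote": "blockquote"}
    lines = []
    for kind, text in document:
        tag = tags[kind]
        lines.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(lines)


def render_text(document):
    """Apply a different set of rules: plain text for a terminal."""
    lines = []
    for kind, text in document:
        if kind == "title":
            # Upper-case the title and underline it.
            lines.append(text.upper())
            lines.append("=" * len(text))
        elif kind == "quote":
            lines.append("> " + text)
        else:
            lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_html(DOCUMENT))
    print()
    print(render_text(DOCUMENT))
```

The document data never changes; only the rules applied to each element type do, which is the property that makes descriptive markup attractive when several output formats are needed.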
So, what are the advantages or disadvantages of using one of those three markup systems?
It depends on what the author is interested in. If he or she is interested in the content of the document (usually the case for developers), descriptive markup is better, as it helps the author focus on the structure and content of the document (the structure has to be declared explicitly), while presentational and procedural markup fail to support the writer in developing the structure; even worse, they distract from the content.
As discussed in the article cited above, the first step of marking up a document, element recognition, is the same for all forms of markup. The next step, markup selection, always involves some additional effort. Descriptive markup keeps this effort focused on the element and its role in the document, whereas presentational markup turns the author's attention toward typographic conventions and style sheets, and procedural markup leads even further away from the document, toward the special markup required to make a particular text formatter produce the selected presentational markup.
We are basically looking for a document format that allows a developer to write structured documents with the following characteristics:
As those goals are shared by many people, a lot of lightweight markup languages and tools have been developed in recent years. The main idea behind almost all of them is to avoid complex descriptive markup formats based on SGML or XML, like HTML, XHTML or DocBook; the approach taken is to define markup formats that use a set of simple rules to provide a subset of the information that can be given using the heavyweight languages.
Usually the markup used by those systems can be seen as punctuation by the casual reader; sometimes it is quite weird punctuation, but generally it is not distracting if you are only interested in the text content.
Once the format is defined we just need a parser for the markup rules and conversion tools to transform the parsed document into another format directly usable by output systems like text terminals, web browsers or printers or into an intermediate format like TeX or XML that has to be processed by other tools to be readable on the output systems.
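As a rough illustration of the parser plus conversion tool split described above, here is a minimal Python sketch for a made-up format with only two rules: blocks are separated by blank lines, and a two-line block whose second line consists only of `=` characters is a title. The format, the function names `parse` and `to_html` and the `SAMPLE` text are all hypothetical; real lightweight markup languages have many more rules.

```python
import html


def parse(text):
    """Parse the toy markup into a list of (element_type, text) pairs."""
    elements = []
    for block in text.strip().split("\n\n"):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        # Rule 1: a line followed by a line made only of "=" is a title.
        if len(lines) == 2 and set(lines[1]) == {"="}:
            elements.append(("title", lines[0]))
        else:
            # Rule 2: anything else is a paragraph; join wrapped lines.
            elements.append(("paragraph", " ".join(lines)))
    return elements


def to_html(elements):
    """Convert the parsed elements into HTML, one element per line."""
    tags = {"title": "h1", "paragraph": "p"}
    return "\n".join(
        f"<{tags[kind]}>{html.escape(text)}</{tags[kind]}>"
        for kind, text in elements
    )


SAMPLE = """\
Project notes
=============

The parser only knows two rules:
titles and paragraphs.
"""

if __name__ == "__main__":
    print(to_html(parse(SAMPLE)))
```

Keeping the parsed elements separate from the HTML conversion means that another converter, for instance one that emits TeX, could reuse the same `parse` step.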
Inside this category we find a lot of formats and tools with different features; many of the tools are only used to transform text into HTML and are used in web-based systems like blogging software, content managers or wiki systems.
The use of those systems has multiple advantages: it is a good way to allow normal users to edit dynamic web content without forcing them to learn HTML, and it avoids some of the problems of letting them use HTML directly (it is easier to keep a consistent look and feel, and harmful or annoying JavaScript code is kept out of the pages).
In the following section we will talk about some free Lightweight Markup Languages and their related tools.
The following table lists some lightweight markup languages and the URLs of tools and documentation related to them:
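The safety argument can be sketched in a few lines of Python: if user text is escaped before the markup rules are applied, the only HTML that reaches the page is the HTML the converter itself produces. The single `*emphasis*` rule and the function name `render_user_text` are assumptions for this example, not the behaviour of any particular wiki or blog engine.

```python
import html
import re

# The only markup rule this sketch allows: *text* becomes emphasis.
EMPHASIS = re.compile(r"\*([^*\n]+)\*")


def render_user_text(text):
    """Escape raw HTML first, then apply the one markup rule we allow."""
    escaped = html.escape(text)
    return EMPHASIS.sub(r"<em>\1</em>", escaped)


if __name__ == "__main__":
    print(render_user_text("This is *important* <script>alert(1)</script>"))
```

Because escaping happens first, a `<script>` element typed by a user is shown as text instead of being executed by the browser.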
Language or Tool | URL |
---|---|
Almost Free Text (aft) | http://www.maplefish.com/todd/aft.html |
AsciiDoc (adoc) | http://www.methods.co.nz/asciidoc/ |
Markdown (mdwn) | http://daringfireball.net/projects/markdown/ |
StructuredText (stx) | http://sange.fi/~atehwa/cgi-bin/piki.cgi/stx2any |
reStructuredText (rst) | http://docutils.sourceforge.net/rst.html |
Textile (txtl) | http://www.textism.com/tools/textile/ |
As I am a Debian GNU/Linux user and advocate, and all the formats have tools to handle them on that distribution, I am also listing here the packages that a user has to install to be able to write and process documents using those formats on a Debian Sid system. Note that the tools may need additional programs to generate final formats; e.g. to generate a PDF file from a reStructuredText document we generate a .tex file using rst2latex (a tool included in the python-docutils package), but later we need to process the resulting file with pdflatex, a program that is included in a separate package not listed here:
Tool | Package |
---|---|
Almost Free Text (aft) | aft |
AsciiDoc (adoc) | asciidoc |
Markdown (mdwn) | markdown or python-markdown |
StructuredText (stx) | stx2any |
reStructuredText (rst) | python-docutils |
Textile (txtl) | python-textile |
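The two-step PDF generation mentioned before the table (rst2latex first, pdflatex afterwards) could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, the file name is an example, and it assumes that both programs are installed and available on the PATH under exactly those names (some Docutils installations ship the first one as `rst2latex.py`).

```python
import subprocess
from pathlib import Path


def pdf_build_commands(rst_file):
    """Return the two commands of the rst -> tex -> pdf pipeline."""
    source = Path(rst_file)
    tex_file = source.with_suffix(".tex")
    return [
        # Step 1: reStructuredText to LaTeX (tool from python-docutils).
        ["rst2latex", str(source), str(tex_file)],
        # Step 2: LaTeX to PDF (pdflatex comes from a separate package).
        ["pdflatex", str(tex_file)],
    ]


def build_pdf(rst_file):
    """Run both steps in order, stopping at the first failure."""
    for command in pdf_build_commands(rst_file):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the commands; call build_pdf("manual.rst") to run them.
    for command in pdf_build_commands("manual.rst"):
        print(" ".join(command))
```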
To compare those languages and tools we should look at the following features:
Quoting the reStructuredText main page:
reStructuredText is an easy-to-read, what-you-see-is-what-you-get plain text markup syntax and parser system. It is useful for in-line program documentation (such as Python docstrings), for quickly creating simple web pages, and for standalone documents. reStructuredText is designed for extensibility for specific application domains. The reStructuredText parser is a component of Docutils. reStructuredText is a revision and reinterpretation of the StructuredText and Setext lightweight markup systems.
The primary goal of reStructuredText is to define and implement a markup syntax for use in Python docstrings and other documentation domains, that is readable and simple, yet powerful enough for non-trivial use. The intended purpose of the markup is the conversion of reStructuredText documents into useful structured data formats.
The main advantage of reStructuredText over other lightweight markup systems is that it has a well-defined syntax and there are a lot of tools already available to transform rst text files into different output formats like HTML, XML, pseudo-XML, LaTeX and S5 (for making HTML slides).
Besides, as the parser is a component of Docutils and can be easily used from other programs implemented in Python, a lot of web systems written in Python are including plugins to enable the use of reStructuredText and generate HTML from it.
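As an example of how a Python program might call the parser, the sketch below converts a small reStructuredText sample into HTML with the `publish_parts` function from `docutils.core`. It assumes Docutils is installed; the sample text and the wrapper function `rst_to_html_body` are made up for this illustration, and newer Docutils releases may prefer a `writer` argument over `writer_name`.

```python
try:
    # Docutils is a third-party package (python-docutils on Debian).
    from docutils.core import publish_parts
except ImportError:  # keep the sketch importable without Docutils
    publish_parts = None

# A short reStructuredText sample: a section title, a paragraph with
# inline markup and a bullet list.
SAMPLE = """\
Release notes
=============

This paragraph has *emphasis* and ``inline code``.

- first item
- second item
"""


def rst_to_html_body(source):
    """Convert reStructuredText to an HTML body using Docutils."""
    if publish_parts is None:
        raise RuntimeError("Docutils is not installed")
    parts = publish_parts(source=source, writer_name="html")
    # "html_body" holds the document body, including the title.
    return parts["html_body"]


if __name__ == "__main__":
    print(rst_to_html_body(SAMPLE))
```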
Some of the programs that have support for rst include:
Some examples of the markup rules of the reStructuredText format:
For a complete reference of the reStructuredText syntax see the reStructuredText Markup Specification, available at http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html.