Agile Documentation Tools

V Jornades Programari Lliure

Author: Sergio Talens-Oliag
Contact: sto@iti.upv.es
Organization: Instituto Tecnológico de Informática
Date: 8 July 2006

Introduction

This article is about the importance of documentation in software-related projects and about the tools and systems that developers can use to be more productive when writing that documentation.

We will start by explaining what we mean by the term documentation in the context of software-related projects and why we believe it is important. Then we will move on to the problem of formats and tools, that is, the systems used to write documentation, centering our discussion on the types of markup languages available and their advantages and problems.

After this general exposition we will move on to our main topic, agile documentation tools. We will enumerate the features we want from a documentation system (e.g. we want lightweight markup formats so that we can write fast) and compare some of the available formats and tools that provide the desired characteristics, discussing their advantages and drawbacks.

Note that we will only talk about tools and formats useful for writing and manipulating documentation, not about what we should document, nor when or how we should do it; that would be the subject of another kind of article, probably related to project development methodologies.

What is documentation?

When talking about software projects, the term documentation refers to the written text that accompanies the software produced by the project. This text can be of various types:

We are mostly interested in the first two types of documents, as those are the ones that should be written by computer science specialists (analysts, developers, system and network administrators, testers, etc.), but the tools and formats discussed can also be used to write end user documentation.

Note that we will limit the scope of this article to formats and tools useful to write standalone documents, leaving the documentation generators and the literate programming tools for another time.

In any case, from the writer's point of view, documentation generators are not too different from the tools we will talk about: they tend to be very specialized, but the markup formats they use are quite similar to the ones used by the systems we will be discussing.

Similarly, literate programming tools can also use lightweight markup languages or more complex systems like TeX, but the markup is usually not the main concern when discussing the use of those systems. Their main problem is that using them means changing the way developers work, and that is a totally different issue. For readers interested in the ideas and tools available for literate programming, there is a very interesting website devoted to the subject at http://www.literateprogramming.com/.

Why is documentation important?

There are a lot of valid reasons to write documentation in a software-related project:

In fact, in the traditional development models documentation is quite important in all the phases of the software life cycle:

That being said, keep in mind that our idea is to have and work with light documents: the documentation should be short, terse and easy to understand by qualified individuals. It has to serve two needs: helping someone not yet involved in the project to start working on it, and keeping a history of important events (requirement changes, design decisions, etc.) for the people already working on it.

Markup languages

Almost all texts written on a computer use some kind of markup system to include information about the structure and characteristics of the text; this markup is usually what determines how the document content is displayed and processed.

Depending on the complexity of the markup format, the text can be written using a plain text editor (an editor that simply manipulates characters and lines using a fixed-width font), a programmable plain text editor (when the text is readable but the format has so many rules that we prefer to let the computer help us mark up the text) or a word processor that hides the markup complexity and usually lets us work in a graphical view that is approximately equivalent to the desired output format (what we know as WYSIWYG).

In the article titled Markup Systems and the Future of Scholarly Text Processing by James H. Coombs, Allen H. Renear, and Steven J. DeRose, published in the November 1987 CACM (available at <http://xml.coverpages.org/coombs.html>), the authors identify six different types of markup: punctuational, presentational, procedural, descriptive, referential and metamarkup.

Of these six types of markup we will only take three into account (presentational, procedural and descriptive), as the others are almost always available in systems that use one of those three paradigms.

So, what are the advantages and disadvantages of each of those three markup systems?

It depends on what the author is interested in. If he or she is interested in the content of the document (usually the case for developers), descriptive markup is better, as it helps the author to focus on the structure and content of the document (which has to be declared explicitly), while presentational and procedural markup fail to support the writer in developing the structure; even worse, they distract from the content.

As discussed in the article cited above, the first step of marking up a document, element recognition, is the same for all forms of markup, and the next step, markup selection, always involves some additional effort. Descriptive markup keeps this effort focused on the element and its role in the document. Presentational markup turns the author's attention toward typographic conventions and style sheets, and procedural markup leads even further away from the document, toward the special markup required to make a particular text formatter produce the selected presentational markup.
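To make the distinction more concrete, the following Python sketch writes the same heading in the three styles. It is only an illustration under our own assumptions: the troff-style requests, the <title> element and all the names (HEADING, render_descriptive, plain_style, html_style) are hypothetical examples and are not taken from the cited article.

```python
# Hypothetical illustration: one heading marked up in the three styles.

HEADING = "Why is documentation important?"

# Presentational: the layout itself (blank lines, centering) is the markup.
presentational = "\n\n" + HEADING.center(60).rstrip() + "\n\n"

# Procedural: commands tell one particular formatter what to do
# (troff-style requests: space, center, bold font, roman font).
procedural = ".sp 2\n.ce\n.ft B\n" + HEADING + "\n.ft R\n.sp 1\n"

# Descriptive: the text is only labelled with its role in the document.
descriptive = "<title>" + HEADING + "</title>"


def render_descriptive(element, text, style):
    """Apply a presentation, chosen outside the document, to a labelled element."""
    return style[element](text)


# Two interchangeable "style sheets" for the same descriptive element.
plain_style = {"title": lambda t: t + "\n" + "=" * len(t)}
html_style = {"title": lambda t: "<h1>" + t + "</h1>"}

print(render_descriptive("title", HEADING, plain_style))
print(render_descriptive("title", HEADING, html_style))
```

Only the descriptive form can be sent to different outputs without touching the text; the other two would have to be rewritten for each new presentation or formatter.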

Lightweight Markup Languages

We are basically looking for a document format that allows a developer to write structured documents with the following characteristics:

As those goals are shared by many people, a lot of lightweight markup formats and tools have been developed in recent years. The main idea behind almost all of them is to avoid the use of complex descriptive markup formats based on SGML or XML, like HTML, XHTML or DocBook; the approach taken is to define markup formats that use a set of simple rules to provide a subset of the information that can be given using the heavyweight languages.

The markup used by those systems can usually be seen as punctuation by the casual reader: sometimes quite weird punctuation, but generally not distracting if you are only interested in the text content.

Once the format is defined we just need a parser for the markup rules and conversion tools to transform the parsed document into another format directly usable by output systems like text terminals, web browsers or printers or into an intermediate format like TeX or XML that has to be processed by other tools to be readable on the output systems.
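The parser plus converter arrangement described above can be sketched in a few lines of Python. This is a toy format invented for the example (a line underlined with "=" is a title, *word* is emphasis, blank lines separate paragraphs), not the syntax of any of the real languages discussed in this article, and the function names parse and to_html are our own.

```python
import html
import re


def parse(text):
    """Parse the toy markup into a list of (kind, content) blocks."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = chunk.splitlines()
        # A line followed by a row of '=' at least as long is a title.
        if (len(lines) == 2 and lines[1] and set(lines[1]) == {"="}
                and len(lines[1]) >= len(lines[0])):
            blocks.append(("title", lines[0].strip()))
        else:
            blocks.append(("para", " ".join(l.strip() for l in lines)))
    return blocks


def to_html(blocks):
    """Convert parsed blocks to HTML, one of many possible back ends."""
    out = []
    for kind, content in blocks:
        escaped = html.escape(content)
        # *word* becomes emphasis.
        escaped = re.sub(r"\*([^*\s](?:[^*]*[^*\s])?)\*", r"<em>\1</em>", escaped)
        tag = "h1" if kind == "title" else "p"
        out.append("<%s>%s</%s>" % (tag, escaped, tag))
    return "\n".join(out)


SAMPLE = """\
Release notes
=============

This version is *much* faster
than the previous one.
"""

print(to_html(parse(SAMPLE)))
```

Because the parser returns a neutral list of blocks, a second back end (for LaTeX or plain text, for instance) could be added without touching the parsing rules.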

In this category we find a lot of formats and tools with different features; many tools are only used to transform text into HTML and are used in web-based systems like blogging software, content managers or wiki systems.

The use of those systems has multiple advantages: it is a good way to allow normal users to edit dynamic web content without forcing them to learn HTML, and it avoids some of the problems of letting them use HTML directly (it is easier to keep a consistent look and feel, and harmful or annoying JavaScript code is kept out of the pages).

In the following section we will talk about some free lightweight markup languages and their related tools.

Lightweight Markup Languages

The following table lists some lightweight markup languages and the URLs of the tools and documentation related to them:

Language or Tool          URL
Almost Free Text (aft)    http://www.maplefish.com/todd/aft.html
AsciiDoc (adoc)           http://www.methods.co.nz/asciidoc/
Markdown (mdwn)           http://daringfireball.net/projects/markdown/
StructuredText (stx)      http://sange.fi/~atehwa/cgi-bin/piki.cgi/stx2any
reStructuredText (rst)    http://docutils.sourceforge.net/rst.html
Textile (txtl)            http://www.textism.com/tools/textile/

As I am a Debian GNU/Linux user and advocate, and all the formats have tools to handle them in that distribution, I am also listing here the packages that a user has to install to be able to write and process documents using those formats on a Debian Sid system. Note that the tools may need additional programs to generate final formats; e.g. to generate a PDF file from a reStructuredText document we generate a .tex file using rst2latex (a tool included in the python-docutils package), but later we need to process the resulting file with pdflatex, a program that is included in a separate package not listed here:

Tool                      Package
Almost Free Text (aft)    aft
AsciiDoc (adoc)           asciidoc
Markdown (mdwn)           markdown or python-markdown
StructuredText (stx)      stx2any
reStructuredText (rst)    python-docutils
Textile (txtl)            python-textile
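The two-step reStructuredText to PDF chain mentioned before the table (rst2latex, then pdflatex) could be driven from a small Python script like the one below. This is only a sketch: the file name article.rst and the function names are hypothetical, and we assume that the docutils front end is installed under the name rst2latex (some installations call it rst2latex.py) and that both programs are on the PATH.

```python
import subprocess
from pathlib import Path


def pdf_build_commands(rst_file):
    """Return the two commands of the rst -> tex -> pdf chain."""
    source = Path(rst_file)
    tex_file = source.with_suffix(".tex")
    return [
        # Step 1: docutils front end, shipped in python-docutils.
        ["rst2latex", str(source), str(tex_file)],
        # Step 2: pdflatex comes from a separate TeX package.
        ["pdflatex", str(tex_file)],
    ]


def build_pdf(rst_file):
    """Run both steps, stopping at the first one that fails."""
    for command in pdf_build_commands(rst_file):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only show the commands; call build_pdf("article.rst") to run them.
    for command in pdf_build_commands("article.rst"):
        print(" ".join(command))
```

Keeping the command list separate from its execution makes it easy to check what would be run on a system where the TeX packages are not installed yet.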

To compare those languages and tools we should look at the following features:

An example system: reStructuredText

Quoting the reStructuredText main page:

reStructuredText is an easy-to-read, what-you-see-is-what-you-get plain text markup syntax and parser system. It is useful for in-line program documentation (such as Python docstrings), for quickly creating simple web pages, and for standalone documents. reStructuredText is designed for extensibility for specific application domains. The reStructuredText parser is a component of Docutils. reStructuredText is a revision and reinterpretation of the StructuredText and Setext lightweight markup systems.

The primary goal of reStructuredText is to define and implement a markup syntax for use in Python docstrings and other documentation domains, that is readable and simple, yet powerful enough for non-trivial use. The intended purpose of the markup is the conversion of reStructuredText documents into useful structured data formats.

The main advantage of reStructuredText over other lightweight markup systems is that it has a well-defined syntax and there are already a lot of tools available to transform rst text files into different output formats such as HTML, XML, pseudo-XML, LaTeX and S5 (for making HTML slides).

Besides, as the parser is a component of Docutils and can easily be used from other programs implemented in Python, a lot of web systems written in Python include plugins that enable the use of reStructuredText and generate HTML from it.
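As a rough idea of what such a plugin does, the sketch below uses the publish_parts function of docutils.core to turn a reStructuredText string into an HTML fragment. The helper name rst_to_html_fragment and the sample text are our own, the example needs the docutils package to be installed, and a real plugin would also have to decide which Docutils settings are safe for untrusted input.

```python
def rst_to_html_fragment(rst_source):
    """Return the HTML body generated by Docutils for an rst string."""
    # Imported here so this module still loads when docutils is missing.
    from docutils.core import publish_parts

    parts = publish_parts(source=rst_source, writer_name="html")
    # "html_body" holds the document body without the <html>/<head> wrapper.
    return parts["html_body"]


# Hypothetical user comment, as a wiki or blog engine might receive it.
COMMENT = "This is *emphasised* and this is **strong**.\n"

if __name__ == "__main__":
    try:
        print(rst_to_html_fragment(COMMENT))
    except ImportError:
        print("docutils is not installed")
```

Returning only the body fragment is what lets the host application embed the result inside its own page template.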

Some of the programs that have support for rst include:

Some examples of the markup rules of the reStructuredText format:

For a complete reference of the reStructuredText syntax, see the reStructuredText Markup Specification, available at http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html.