There are lots of good reasons why many organizations are likely to adopt XML dialects for many of their documentation needs. For these same reasons, developerWorks has developed its own XML DTD for articles. Once you have an XML source -- either a shared standard like DocBook or an in-house dialect -- it is easy to transform the source into arbitrary target formats (HTML, PDF, other XML, and so on). Moreover, validation against a DTD provides a nice check that a document contains all the parts it needs to have, with all the right relationships between them. In addition, XML is a much more platform- and tool-neutral format than those used by proprietary (or even open source) word processors and publishing applications.
Source formats and human interfaces
The problem with XML, however, is that it is a really crummy human interface. Even though XML is just ASCII bytes, typing the element tags into a text editor takes a lot of extra keystrokes. Besides requiring a littering of angle brackets and punctuation to interrupt the flow of a touch typist, it is difficult to make sure that every tag gets closed in the proper order as you type. And how many of us understand even a moderately complex DTD well enough to remember exactly what elements and attributes are allowed at each point in a document? Worst of all, the abundance of XML tags makes visually scanning a document significantly harder.
At least two approaches ease the pain of editing XML documents with a text editor. One approach is to use a higher-level tool for the editing. An XML-aware editor can automate conformance with a DTD, and some of these editors can even hide or highlight the XML markup to make visual scanning easier. Some developerWorks writers, myself included, are particularly fond of XMetaL, but many excellent programs exist. All of these programs, however, run on specific platforms; they each have their own set of quirks (different from those of a favorite text editor); and many of them will set you back a large number of dollars.
The second approach is the one txt2dw takes: Let writers write using tools that don't get in their way. Then let computers worry about how the documents need to be formatted. Word processors try to take this approach, but the state of tools for getting from a word processor to XML is still crude. Personally, I prefer to use the "smart ASCII" markup format that has informally evolved in e-mail, on the Usenet, and in project documentation for open-source software projects. One can formalize it just a little bit without getting in the way of writers (while simultaneously aiding the converter).
The use of txt2dw could hardly be simpler. Just read some "smart ASCII" input from STDIN, and write some valid XML to STDOUT. For example:
% txt2dw.py < MyArticle.txt > MyArticle.xml
At this point, one has an XML-formatted document. The eventual target will most likely be something different from XML. In my own case -- and this is true for many writers -- the eventual target format is not really all that noteworthy (that's for editors and publishers to worry about and change as needed). All that really matters is that the XML version is valid according to article.dtd.
However, someone will want to transform the XML into something else. XSLT is a common transformation technique, and one for which developerWorks uses the custom style sheet article-html.xsl. Assuming you want the HTML version developerWorks will use, you can simply run something like this:
% xslt article-html.xsl < MyArticle.xml > MyArticle.html
The exact details will vary with the XSLT engine one uses, but the idea will be the same.
For the most part, "smart ASCII" is what you have been writing for years if you use e-mail and the Usenet. Most of the details are documented at the top of the script. Asterisks surround bold or heavily emphasized phrases; dashes surround italicized or lightly emphasized phrases; underscores introduce Book or Series Titles. I have adopted the use of single quotes to set apart appnames and filenames (usually rendered in a fixed font), and square brackets to indicate libraries and modules. Take a look at the ASCII version of this Tip in the Resources for how these features started out.
These conventions are not quite universal, but they will also not be unfamiliar to readers. They are all very quick to type.
Anything that looks like a URL is turned into a link automatically. A fairly simple special format with curly braces and the ALT text before a colon is used to insert images, such as charts and graphs.
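The inline conventions above can be approximated with a handful of regular expressions. What follows is only a minimal sketch, not the code in txt2dw.py: the output tags (b, i, cite, code, a, img) are placeholders chosen for illustration, since the real element names are dictated by article.dtd, and the exact patterns the script uses may well differ.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical output tags: the real element names come from article.dtd.
INLINE_RULES = [
    (re.compile(r"\*(\S(?:[^*]*\S)?)\*"), r"<b>\1</b>"),                    # *bold*
    (re.compile(r"(?<![\w-])-(\w(?:[^-]*\w)?)-(?![\w-])"), r"<i>\1</i>"),   # -italic-
    (re.compile(r"(?<!\w)_(\w(?:[^_]*\w)?)_(?!\w)"), r"<cite>\1</cite>"),   # _Book Title_
    (re.compile(r"(?<!\w)'([^'\s][^']*?)'(?!\w)"), r"<code>\1</code>"),     # 'filename'
    (re.compile(r"\[(\w[\w.]*)\]"), r"<code>\1</code>"),                    # [module]
]

# Images and bare URLs are located first so that the emphasis rules
# never touch the characters inside them.
SPECIAL = re.compile(
    r"\{(?P<alt>[^{}:]+):\s*(?P<src>[^\s{}]+)\}"                # {ALT text: image-url}
    r"|(?P<url>(?:https?|ftp)://[^\s<>{}]*[^\s<>{}.,;:!?)])"    # bare URL
)


def emphasize(text):
    """Escape XML specials, then apply the emphasis rules in order."""
    text = escape(text)
    for pattern, replacement in INLINE_RULES:
        text = pattern.sub(replacement, text)
    return text


def convert_inline(text):
    """Convert one run of smart ASCII text to (hypothetical) inline markup."""
    out = []
    pos = 0
    for match in SPECIAL.finditer(text):
        out.append(emphasize(text[pos:match.start()]))
        if match.group("url"):
            url = match.group("url")
            out.append("<a href=%s>%s</a>" % (quoteattr(url), escape(url)))
        else:
            out.append("<img src=%s alt=%s/>" % (
                quoteattr(match.group("src")),
                quoteattr(match.group("alt").strip()),
            ))
        pos = match.end()
    out.append(emphasize(text[pos:]))
    return "".join(out)
```

Trailing sentence punctuation is deliberately left outside the link, so a URL at the end of a sentence does not swallow the period.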
At the paragraph level, a few types of paragraphs are allowed, and are indicated by indentation level. Headers are not indented. In addition, any header line that only consists of a row of dashes is stripped out (this helps beautify the ASCII originals). Regular text paragraphs are indented two spaces. Block quotes are indented four spaces. Code samples are indented six spaces (or more). If a code sample begins with a line that consists of a pound sign, some dashes, a title, some more dashes, then another pound sign, then that line is treated as a label for the code sample (in many programming languages, it would be a comment line anyway). If not, no harm is done.
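The indentation rules just described translate into a small classifier. This is a sketch under stated assumptions rather than the script's actual logic: blocks are taken to be separated by blank lines (so a code sample containing a blank line would be split, which the real tool presumably handles better), odd indentation widths are rounded down to the nearest rule, and the names split_blocks and classify are made up for the example.

```python
import re

# "#----- Title of code sample -----#" labels a code sample.
CODE_LABEL = re.compile(r"^#-+\s*(.*?)\s*-+#$")
# A header line made up only of dashes is decoration and gets dropped.
DASH_ROW = re.compile(r"^-+$")


def split_blocks(text):
    """Split source text into blocks separated by blank lines (an assumption)."""
    return [block for block in re.split(r"\n\s*\n", text) if block.strip()]


def classify(block):
    """Classify one non-empty block by its smallest indentation."""
    lines = [ln.rstrip() for ln in block.splitlines() if ln.strip()]
    indent = min(len(ln) - len(ln.lstrip(" ")) for ln in lines)
    body = [ln[indent:] for ln in lines]   # keep relative indentation
    title = None
    if indent == 0:
        kind = "header"
        body = [ln for ln in body if not DASH_ROW.match(ln)]
    elif indent < 4:
        kind = "para"       # two spaces: regular paragraph
    elif indent < 6:
        kind = "quote"      # four spaces: block quote
    else:
        kind = "code"       # six or more spaces: code sample
        label = CODE_LABEL.match(body[0])
        if label:
            title, body = label.group(1), body[1:]
    return {"kind": kind, "title": title, "lines": body}
```

Running `[classify(b) for b in split_blocks(source)]` over a whole article yields a list of typed blocks that a converter could then wrap in whatever elements the DTD requires.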
There are a few features of txt2dw that are more rigid than I would like. These were concessions to the fairly rigid format of article.dtd. On the plus side, the rigid constraints were exactly the conventions I had adopted anyway, so obeying them was not difficult. Moreover, none of them look odd or unnatural (but you still have to remember to use these features, or create a template that does so). A few moderately intelligent changes are made when ALLCAPS sections are encountered. Here is a usable template:
Template for txt2dw "smart ASCII" source
SERIES: Main Title
Subtitle
Author Name
Title, Affiliation
Date

    Abstract of the article (block quote indented)...

FIRST SECTION
----------------------------------------------------------

  Regular paragraph...

      #----- Title of code sample -----#
      Sample code line 1
      [...]

  Regular paragraph...

MORE SECTIONS...
----------------------------------------------------------

  [...]

  {Picture of Author: http://mysite/mypic.png}
  Author blurb...
Sometimes computer tools that are chosen for good technical reasons wind up forcing users to think like computers. XML markup can have this quality. A writer should not need to spend a lot of time thinking about formats, but should be allowed to focus on content. In any ongoing documentation process, it is worth a little extra up-front programming work to allow writers to think a little bit less about the nitty-gritty of formatting and markup. txt2dw is one tool that lets computers worry about computer matters, while writers worry about words.
- Download the txt2dw.py utility from: http://gnosis.cx/download/txt2dw.py
- Users who want to include Python source code examples might want to pick up the supporting module dw_colorize (other languages might be supported later): http://gnosis.cx/download/dw_colorize.py
- This article, in its original "smart ASCII" form, can be found at http://www.gnosis.cx/publish/programming/txt2dw_tip.txt.
- Terrence Parr has written a wonderful essay in the XML zone Soapbox opinion department, called Humans should not have to grok XML. I couldn't agree more.
- I looked at a number of custom XML editors in my column XML Matters #6: A roundup of editors.
- Smart ASCII can also be converted directly to HTML using a related utility, Txt2Html, that I discussed in Charming Python: Converting text to HTML using Txt2Html.
- The DocBook dialect of XML and many of the reasons one would want to use XML for prose-oriented documents were discussed in several of my XML Matters column installments:
- Getting started with the DocBook XML dialect
- Getting comfortable with the DocBook XML dialect
- Transforming DocBook documents using XSLT
- IBM trial software: Build your next development project with trial software available for download directly from developerWorks.
- Find more XML resources on the developerWorks XML zone. For a complete list of XML tips to date, check out the tips summary page.






