Using Emacs for XML documents

Install add-ons to the powerful Emacs text editor to build a platform-independent (and free) environment for working with XML

Emacs, best known as a powerful text editor for UNIX developers, can be an ideal XML editor for MS-DOS, Windows, and MacOS. The author describes how to install the right add-on packages and modify settings to create a powerful XML/SGML editing-and-validation environment in Emacs with extensions such as PSGML and OpenSP. Most of the work involved in setting up this environment ends with downloading and installing Emacs and the individual packages, but you must also configure Emacs properly and enable the DTDs you plan to work with. The article includes sample configuration files and XHTML DTDs.


Brian Gillan (bgillan@us.ibm.com), Software engineer, IBM, Software Group

Brian Gillan works in the ID Technology and Design group for IBM in Research Triangle Park, North Carolina, providing programming, integration, strategy, and support for publishing tools used by the IBM Information Design community. You can contact Brian at bgillan@us.ibm.com.



01 December 2001

Though it's best known as a powerful text editor favored by UNIX developers, Emacs can be used to work with XML in non-UNIX platforms such as Windows, MS-DOS, and MacOS. Emacs (see the sidebar Emacs in a nutshell) works as a full-blown development environment for processing text, writing applications, and, as I'll discuss, creating structured information like XML and SGML. I use it as a general-purpose editor for creating and managing some of my programming projects, and for writing XHTML and playing around with SGML and XML. In fact, I used it to write this article.

This article tells how to install Emacs and the extensions PSGML and OpenSP. It also outlines how to customize Emacs to make it function with a variety of DTDs. I present many of the Emacs customizations one piece at a time. However, you can download a zip file with sample DTDs and all of the Emacs customizations (see Resources). My intent is to get you started using Emacs by providing you with just enough information for you to understand what's going on. Then you'll be able to add DTDs and customize Emacs based on your needs and preferences.

Getting and installing Emacs

Start by installing Emacs. You can access additional Emacs information and distributions from the GNU Web site or its mirrors (see Resources). Some UNIX-based distributions come with Emacs. For example, my Red Hat Linux 7.1 came with Emacs version 20.5.1 (and an older version of PSGML) already installed.

Linux and UNIX installation

Because most UNIX and Linux users are savvy enough to get and install software without any guidance from me, I'll just direct you to the GNU project site. The customizations I describe in the rest of the article will apply to UNIX/Linux environments.

Emacs in a nutshell

Not long ago UNIX users had two choices of editor: vi or Emacs. vi was (and still is) a powerful editor if you're mainly interested in editing flat text. However, the real power users prefer Emacs, one of the original offerings of the GNU project. Described as "the extensible, customizable, self-documenting real-time display editor," Emacs can run in tty or character mode -- for example in a telnet or xterm session -- or in a graphical user interface on just about any UNIX platform. It has also been ported to non-UNIX platforms including MS-DOS, Windows, and MacOS.

The extensibility of Emacs contributes to its popularity. That extensibility comes from Emacs's architecture (which is designed for adding new functions), and goes as far as its own language, Emacs Lisp, for crafting custom functions. You can customize Emacs through variable settings and macros, or by adding packages. Emacs is "self documenting" in the way it lets you query its environment, and also in the way documentation is accessed for the editor itself and for any installed packages.

Windows installation

Windows users can find the latest binary distribution in the windows/emacs/ directory of any of the FTP sites listed on the GNU FTP list. The emacs-20.7-bin-i386.tar.gz file does not include the Emacs Lisp source. If you're interested in programming Emacs or seeing how particular functions are implemented, download the emacs-20.7-fullbin-i386.tar.gz file instead.

Editor's note: A newer version, version 21.1, was released in late October, while this article was in production. This article is based on the 20.7 version and will be updated to include details on the new version.

Download the .gz file to your local hard drive. Use WinZip or some other .gz-aware tool to extract the contents to your hard drive (make sure you "retain folder information" when you extract so the appropriate directory structure is created). If you unpack to drive d: and allow the original directory structure to be created, you will end up with a base path of d:\emacs-20.7. For the remainder of this article, I'll refer to this directory as d:\Emacs. The readme suggests that you avoid spaces in your install path. I'd heed this warning.

After you've unpacked the distribution, there will be a number of files and four subdirectories (bin, etc, info, and lisp) under the main directory. The README.W32 file contains information on obtaining future distributions, setting up Emacs, and so on. (The README file also includes a URL for the FAQ for GNU Emacs on Windows 95/98/ME and 2000.) Though it is not required, I suggest that you run the addpm.exe file in the bin subdirectory to register Emacs so that it's accessible from your Start menu. Once it is registered, select Start->Gnu Emacs->Emacs. If you opt not to register Emacs, start it up by double-clicking the runemacs.exe file installed in the d:\Emacs\bin directory.

You can take a tutorial by starting Emacs and selecting Help->Emacs Tutorial. Don't get discouraged by the fact that you must use control-key sequences for many of the functions. You can begin by learning a few commonly used control-key sequences and learn new ones as you find you need them. Besides, in the GUI version of Emacs, many functions are accessible from menus. See Resources for a couple of suggestions for other tutorials on Emacs and PSGML.

Customizing Emacs
The next step is to customize Emacs to suit your needs. Customization can involve:

  • Setting variables to control various behaviors
  • Adding packages
  • Writing your own Emacs Lisp code

I'll first cover how to set variables and add packages.

Your first step is to create or open your Emacs initialization file, which Emacs looks for in your home directory. In a UNIX environment, this file is typically named .emacs.

On Windows, I use a file named _emacs since Windows doesn't generally like filenames that start with a period. On Windows, you specify the home directory by setting an environment variable or by setting a registry entry. As a last resort, Emacs looks for the initialization file in the directory c:\. (So for now, either create this file in c:\, or consult the GNU Emacs FAQ For Windows (see Resources) for other options.)

To test that Emacs is finding your initialization file, use your favorite text editor to add the entry in Listing 1, which turns on the clock in the Emacs status bar. After turning on the clock and starting Emacs, look for the time in the status area (after the name of the current file). If you see the clock, all is well.

Listing 1. Testing the Emacs initialization file
; Display the time in the Emacs status area (an easy way to test
; that we are picking up our Emacs customizations).
(display-time)

Now that you have Emacs installed and you've laid the foundation for customizing it, we'll look at how to add packages that provide an environment for editing and validating SGML and XML documents.


Adding PSGML for SGML and XML modes

The current distribution of GNU Emacs includes major editing modes for HTML and SGML. Generally, the function provided by these is limited to assisting with element/attribute entry and navigating among elements. The HTML support is based on earlier versions of HTML.

Around the time SGML was starting to become popular for document publishing, Lennart Staflin created PSGML, a package for adding an SGML major editing mode to Emacs. Because HTML is an application of SGML and XML is a subset of SGML, you can use PSGML for editing those as well. In fact, recent PSGML versions provide an XML editing mode.

PSGML also includes a built-in SGML parser that is DTD aware. If you have your own dialect of SGML or XML, you simply install your DTD(s). Changes in HTML standards are handled by installing a new DTD (or set of DTDs). PSGML provides context-sensitive editing, so you can add elements or attributes based on where you are in the document. Navigation features allow you to move among elements and even move to the next-trouble-spot to locate markup that doesn't conform to your DTD. Formatting features indent elements, based on nesting, or hide element content so you can restrict the view to specific areas. Finally, you can validate documents with an external validating parser, which I discuss later in this article.

Figure 1. Emacs with PSGML installed (editing the DITA FAQs)

Figure 1 shows some of the structured editing features PSGML adds to Emacs, including:

  • Colored markup syntax.
  • Markup indented based on nesting level.
  • Element folding. Note how the <prolog> and first two <section> elements have been collapsed to one line, to get them out of the way, while you can see the subelements in the unfolded Tips and Techniques <section> element.
  • Validation using an external parser. The results of the validation are displayed in a buffer below the document buffer. In this case, I used OpenSP to validate the document. If validation results in errors, you can use the Emacs next-error command ([Ctrl]-x `) to locate the error(s) in the source.

In addition to the features visible in Figure 1, PSGML adds many functions that you can access via pull-down menus, pop-up menus, or control-key sequences or commands.

Windows tip

Here's a Windows tip that I find invaluable for tools such as editors. Rather than associating Emacs with the particular file types you want to edit with it (because you may want to associate another application with them), add Emacs to your SendTo menu.

  1. Open the folder where you installed Emacs and navigate to the \bin directory.
  2. Select runemacs.exe, click mouse-button-2 and select Create Shortcut. The new shortcut appears, highlighted.
  3. Click mouse-button-2 again and select Cut (you're going to move it).
  4. Move to the Start button and click mouse-button-2 to open the menu.
  5. Select Explore, which should open Windows Explorer to the Start Menu folder.
  6. In the navigation pane, select the SendTo folder (usually just above the Start Menu folder) to open it.
  7. Within the SendTo folder, click mouse-button-2 and select Paste to insert the runemacs.exe shortcut.
  8. Rename the shortcut, if you like.

Now when you navigate files in your system you can select a file (or several), click mouse-button-2, and select SendTo->Emacs to open the file(s) in Emacs.

Getting and installing PSGML

You'll need to download the current version of PSGML. Version 1.2.2, current as of this writing, is available from SourceForge (see Resources). As with Emacs, PSGML is downloaded as a .gz file; unpack it using a .gz-aware utility such as WinZip. I unpacked the PSGML distribution into the site-lisp directory of my Emacs installation. Again, remember to retain directory information when you unpack. In my installation, I have d:\Emacs\site-lisp\psgml-1.2.2.

Once unpacked, consult README.psgml for some basic information, including how to install it in the UNIX version of Emacs.

Installing PSGML in Emacs for Windows
To prepare to install PSGML in the Windows version of Emacs, first create a directory for it (mine is site-lisp) and unpack the .gz file into it, retaining directory information.

Next you need to make sure that Emacs can find the files that comprise PSGML. You do that by adding the contents of Listing 2 to your Emacs initialization file (_emacs).
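
A minimal sketch of the kind of entries this step calls for, assuming PSGML was unpacked to d:/Emacs/site-lisp/psgml-1.2.2 as described above (the directory is an assumption; adjust it to your installation). It adds the PSGML directory to load-path and registers sgml-mode and xml-mode so that Emacs loads them from the psgml library on first use:

```elisp
;; Assumed location of the unpacked PSGML files; note the forward slashes.
(setq load-path (cons "d:/Emacs/site-lisp/psgml-1.2.2" load-path))

;; Load PSGML on demand the first time either mode is invoked.
(autoload 'sgml-mode "psgml" "Major mode to edit SGML files." t)
(autoload 'xml-mode "psgml" "Major mode to edit XML files." t)
```

Because the modes are autoloaded, PSGML costs nothing at startup and is read only when you first open an SGML or XML document.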

Now Emacs should have access to the PSGML files and it will use PSGML whenever you invoke sgml-mode or xml-mode. Later I'll show how to invoke those modes automatically, based on the file extension of the file being edited.

Compiling the PSGML files

Whether you're working in Linux or Windows, there's one more thing to do to complete the installation: compile the PSGML files. Look in the psgml directory and you'll find a number of .el files. These are Emacs Lisp source files. If you compile them, the PSGML support runs faster. Here's a simple way to accomplish this:

  1. Start Emacs.
  2. Type [Alt]-x.
  3. When prompted for a command, enter byte-force-recompile [Enter].
  4. When prompted for a directory name, change the path to your PSGML files, for example d:/Emacs/site-lisp/psgml-1.2.2 and press [Enter].

That ought to compile most of the .el files and display the results in a "*Compile-log*" buffer. (I received a couple of warnings about obsolete variables when I compiled, but I believe they are harmless enough to ignore.) The end result should be an .elc file for most of the .el files in the psgml directory (not all of the files will be compiled, so don't worry if some are missing).
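
If the interactive command leaves some files uncompiled, the same job can be requested from Lisp. This is a hedged alternative rather than the procedure described above: it assumes the same example path, and it relies on the standard byte-recompile-directory function, whose second argument of 0 asks Emacs to compile .el files even when no .elc file exists yet.

```elisp
;; Evaluate once, for example by typing [Alt]-: and entering the expression.
;; The path is the example location used in this article; adjust as needed.
;; An ARG of 0 means "also compile .el files that have no .elc file yet".
(byte-recompile-directory "d:/Emacs/site-lisp/psgml-1.2.2" 0)
```

This is a one-time step, so there is no need to keep the expression in your initialization file.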


Adding DTDs

SGML and XML modes aren't much use without incorporating the DTDs to describe the types of documents you need to create. So here's how to add some DTDs and the appropriate configuration to make them useful with PSGML.

Let's start with XHTML 1.0, which is an XMLized version of HTML 4.01 (see Resources for more information on XHTML). The XHTML DTDs will let you create HTML that conforms to the XML standard and can be validated with a parser (more on this later), thereby providing more robust and manageable documents. (See Resources for a zip file that contains the XHTML 1.0 DTDs and catalog file I discuss in this section).

Here's how to download the XHTML DTDs and the related entities:

  1. Create a subdirectory for the XHTML DTDs. I keep all of my DTDs in one place on my system; let's assume they will reside under a DTDs folder at the same level as Emacs: d:\DTDs. Under there, create a folder for the XHTML DTDs, d:\DTDs\xhtml1.
  2. After creating a folder to hold them, simply go to the W3C's DTD site (see Resources) to obtain the XHTML DTDs. There are three document types (strict, transitional, and frameset).
  3. For each of the three document types, click mouse-button-2 on the links and then save the target as a file. (You may need to remove the extra .txt extension that the browser adds when saving the files).
  4. Save the three entity sets (xhtml-lat1.ent, xhtml-special.ent, and xhtml-symbol.ent) into the same subdirectory as the DTDs.

Next, you need to create an SGML catalog file that PSGML can use to find these DTDs.

In the same directory as the DTDs, create a file called xhtml1.soc. The content should look like Listing 3.

See Resources for background on SGML Open Catalogs. For this article, I'll just explain the particular features that are used in Listing 3. The PUBLIC entries map what is referred to as a formal public identifier to a file system entity, which in this case is the file containing one of the DTDs. This will allow us to refer to these DTDs without having to actually know where they are in the file system. They require that your documents have a <!DOCTYPE xxxxx PUBLIC "yyyyy"> document type declaration, where the "yyyyy" matches the public identifier in one of the entries in your catalog file. The DTDDECL entries are not actually used by PSGML, but they will be used by the SGML parser (stay tuned!), and they indicate what SGML declaration should be used with the DTD that has the same formal public identifier.

Lastly, the DOCTYPE entry allows us to refer to a particular DTD without using the formal public identifier or an actual filename. The downside to this is that, for XHTML, there are several DTDs that define the same document type html, so you have to pick one. I would simply choose the one you'd expect to use the most. In Listing 3, I've chosen the transitional DTD. Remember, you can use any of the XHTML document types as long as you include the full !DOCTYPE declaration.

There's one more piece of configuration that you need to do. PSGML needs to know where to find the SGML catalog files. There are a couple of ways to accomplish this, as described in the PSGML documentation. I use the method that makes use of the environment variable SGML_CATALOG_FILES because it is also used by the SGML parser (patience, I come to it in the next section of this article). So, now that you have a set of DTDs and a catalog file, create the aforementioned environment variable and set it to include the path to your xhtml1.soc file, for example d:\DTDs\xhtml1\xhtml1.soc. If you have more than one catalog file, you can include them all, separating them with a path delimiter (";" on Windows, ":" on UNIX-based systems).
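
If you would rather keep this setting with your other Emacs customizations, one of the other ways is to set it from the initialization file. The following is a sketch under assumptions, not the method used in this article: it uses the example catalog path from above, sets the environment variable only for Emacs and the processes Emacs starts, and also sets PSGML's sgml-catalog-files variable directly.

```elisp
;; Visible to Emacs and to programs started from Emacs (such as an
;; external parser), but not to the rest of the system.
(setenv "SGML_CATALOG_FILES" "d:/DTDs/xhtml1/xhtml1.soc")

;; Tell PSGML's own parser where the catalog is, independent of the
;; environment variable. The value is a list of file names.
(setq sgml-catalog-files '("d:/DTDs/xhtml1/xhtml1.soc"))
```

A system-wide environment variable remains the simpler choice if you also run the parser outside Emacs.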

I'll show you how to add one more set of DTDs:

  1. If necessary, create a subdirectory for the new DTDs, such as d:\DTDs\dita.
  2. Download the current DITA zip.
  3. Once you have the download, use your favorite utility to unpack the distribution to d:\DTDs\dita, once again preserving the directory information.
  4. Add the included catalog file to your SGML_CATALOG_FILES environment variable, so you might now have d:\DTDs\xhtml1\xhtml1.soc;d:\DTDs\dita\dtd\dita.soc.

Listing 4. dita.soc - SGML catalog file for DITA DTDs
OVERRIDE        YES

-- For documents that don't include a DOCTYPE declaration --
DOCTYPE topic "topic.dtd"
--DOCTYPE topic "ditabase.dtd"--
DOCTYPE task "task.dtd"
DOCTYPE reftopic "reftopic.dtd"
DOCTYPE concept "concept.dtd"
DOCTYPE APIdesc "APIdesc.dtd"
DOCTYPE bctask "bctask.dtd"

-- There should probably be an entry here referencing the standard --
-- XML SGML declaration for example SGMLDECL or DTDDECL  --
-- (once we have public identifiers for the DTDs) --

As you can see, once you get things initially set up, adding new DTDs is relatively easy.


Editing a document with PSGML

Now that you have Emacs with PSGML installed and you have a set of DTDs to work with, you can begin editing documents using PSGML. Whenever you edit a document with an extension of .sgml or .xml, you will note that Emacs invokes SGML major mode (indicated in the status area) and the menu changes to look like the one shown in Figure 2.

Figure 2. Emacs menu with SGML editing mode

So far, if you edit an .html document, the old HTML major mode will be invoked. I'll show you how to fix that in a moment. In the meantime, you could invoke [Alt]-x and key in xml-mode to force XML mode.

To try using PSGML, edit a test file called test.html and insert beginning and ending html tags:

<html>
</html>

Turn on XML mode by invoking [Alt]-x and then keying in xml-mode. Next, click on the menu item DTD->Info->General DTD Info. This causes PSGML to parse the DTD and display general information in a buffer below your document. If your test was not successful, check for an error in your catalog file or environment variable. Also, this test assumes you have the DOCTYPE html entry in one of your SGML catalog files so that PSGML knows what DTD to associate with a doctype of "html". Alternatively, you could include a doctype declaration, such as <!DOCTYPE html PUBLIC ...>, where the PUBLIC identifier matches an entry in one of your SGML catalog files. If you have your catalogs and environment variables set up correctly, you should see something like this:

            Doctype: html
      Element types: 89
           Entities: 253
 Parameter entities: 63
         Files used: d:/DTDs/xhtml1/xhtml-special.ent
                     d:/DTDs/xhtml1/xhtml-symbol.ent
                     d:/DTDs/xhtml1/xhtml-lat1.ent
                     d:/DTDs/xhtml1/xhtml1-transitional.dtd

The output indicates that PSGML was able to locate the DTD and parse it, including all of the referenced entity modules.

Now PSGML is aware of your DTD, and you can begin utilizing some of PSGML's more powerful features. For example, place the cursor after the <html> tag and select menu item Markup->Insert Element. You will be presented with a list of elements that are valid at that location in the document. But before getting into any more of the editing features, let's do some more customization to get more out of PSGML.


More customization

Now that you can edit documents with PSGML, let's explore some more customizations that will exploit more of PSGML's features and make it easier to use. Listing 5 shows some more customizations you can append to your existing Emacs initialization file.

The first section of Listing 5 tells Emacs which major mode to invoke when you load a file with a particular extension, similar to the way Windows associates applications based on file type. Note here that I've set .htm and .html files to use xml-mode. This is because I'm actually writing XHTML.
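
As an illustration of this kind of association, here is a minimal sketch using the standard auto-mode-alist variable. The exact set of extensions is an assumption; only the mapping of .htm and .html to xml-mode is taken from the text.

```elisp
;; Each entry pairs a regular expression matched against the file name
;; with the major mode to use. Entries added at the front take precedence.
(setq auto-mode-alist
      (append '(("\\.sgml?\\'" . sgml-mode)   ; .sgm and .sgml
                ("\\.xml\\'"   . xml-mode)    ; .xml
                ("\\.html?\\'" . xml-mode))   ; .htm and .html (XHTML)
              auto-mode-alist))
```

If you still write older, non-XML HTML, you may prefer to leave the .htm and .html entry out and switch to xml-mode by hand.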

The next four sections of Listing 5 provide for syntax-based highlighting which causes different markup constructs to appear in different colors in the editor. By default, PSGML simply defines tags to appear in bold and comments to appear in italic. Here, I've set start and end tags to appear in blue, comments to appear in purple, entity references to appear in blue, PIs to appear in magenta, and so on. In addition to the constructs I've modified, you can also define the appearance of ignored marked sections, marked section start and ends, and short references. The purpose of the four sections is to:

  • Define a face
  • Set the characteristics of the face
  • Associate the face with the particular markup type
  • Activate the settings
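
The four steps can be sketched as follows. The face names are hypothetical and chosen only for illustration; the colors follow the description above, and sgml-markup-faces and sgml-set-face are the PSGML variables that associate faces with markup types and switch the feature on.

```elisp
;; 1. Define a face for each markup construct (names are illustrative).
(make-face 'my-sgml-start-tag-face)
(make-face 'my-sgml-end-tag-face)
(make-face 'my-sgml-comment-face)
(make-face 'my-sgml-entity-face)
(make-face 'my-sgml-pi-face)

;; 2. Set the characteristics of each face.
(set-face-foreground 'my-sgml-start-tag-face "blue")
(set-face-foreground 'my-sgml-end-tag-face "blue")
(set-face-foreground 'my-sgml-comment-face "purple")
(set-face-foreground 'my-sgml-entity-face "blue")
(set-face-foreground 'my-sgml-pi-face "magenta")

;; 3. Associate each face with a markup type.
(setq sgml-markup-faces
      '((start-tag . my-sgml-start-tag-face)
        (end-tag   . my-sgml-end-tag-face)
        (comment   . my-sgml-comment-face)
        (entity    . my-sgml-entity-face)
        (pi        . my-sgml-pi-face)))

;; 4. Activate the settings.
(setq sgml-set-face t)
```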

The next section of Listing 5, sgml-auto-activate-dtd, causes the DTD associated with the document to be parsed as soon as the document is loaded. This is set to false by default because of the processing required. With processors as fast as they are, this shouldn't be a concern. Also, if this is not set to true, when a document is initially loaded, the syntax coloring will not take effect until you explicitly parse the DTD, using either the DTD->Parse DTD menu item or the [Ctrl]-c [Ctrl]-p key sequence.
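
In Emacs Lisp, "true" and "false" are written t and nil, so the setting described here amounts to a single line:

```elisp
;; Parse the document's DTD as soon as the file is loaded, so that
;; syntax coloring and context-sensitive editing work immediately.
(setq sgml-auto-activate-dtd t)
```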

The next section modifies the DTD->Insert DTD menu item to allow you to quickly insert the DOCTYPE declaration for a new document. I've included a variety of document types, including both SGML and XML document types (some are commented out). Note how the XML document types include the XML declaration. Whenever you add a new DTD, you'll probably want to update the sgml-custom-dtd variable to add your new DTD to the Insert DTD menu.
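
A minimal sketch of such a menu definition, showing a single entry. Each entry in sgml-custom-dtd is a list whose first element is the menu title and whose second element is the text to insert. The menu title matches the one used later in this article; the declaration text uses the W3C's published identifiers for XHTML 1.0 Transitional, and the rest of the author's list is not reproduced here.

```elisp
;; Each entry: (MENU-TITLE DOCTYPE-TEXT). The XML document type
;; includes the XML declaration ahead of the DOCTYPE declaration.
(setq sgml-custom-dtd
      '(("XHTML 1.0 Transitional"
         "<?xml version=\"1.0\"?>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"
    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">")))
```

To add another DTD to the menu, append a second list of the same shape inside the outer list.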

The last section defines my-psgml-hook and hooks it into the SGML mode. This allows you to launch your default browser against the current file you are editing. This is handy for viewing HTML and XHTML as you edit. It will be even more handy when browsers more fully support XML and XSLT.
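
A hook of this kind might look like the following sketch. The function name my-psgml-hook comes from the text; the key binding [Ctrl]-c [Ctrl]-b and the use of browse-url-of-buffer (a command from the browse-url package that ships with Emacs) are assumptions made for illustration.

```elisp
(defun my-psgml-hook ()
  "Add a key for viewing the current buffer in the default browser."
  ;; Assumed binding; pick another key if it clashes with your setup.
  (local-set-key "\C-c\C-b" 'browse-url-of-buffer))

;; Run the function each time SGML mode is turned on in a buffer.
(add-hook 'sgml-mode-hook 'my-psgml-hook)
```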


A quick PSGML test drive

Now that you have some customizations in place, let's take a quick test drive to see some of the PSGML editing features.

  1. Start Emacs and open a file ([Ctrl]-x[Ctrl]-f) called test.html. That should put Emacs into XML mode, which you can verify by looking at the status line.
  2. From the menu, select DTD->Insert DTD->XHTML 1.0 Transitional. That should insert the XML declaration and a <!DOCTYPE html...> declaration for a document with the default name "html." Also notice syntax coloring of these two entries.
  3. Next, place the cursor after the DOCTYPE declaration and from the menu select Markup->Insert Element (or press Shift and mouse-button-2). You should see a pop-up menu with a list of elements that are valid at this point in the document, in this case the html element. Notice that when you insert the HTML element, its required elements, head and body, are also inserted. Also, a comment appears prompting you that you must insert either a title or base element. This feature is handy until you get used to a particular markup language, after which it's more annoying than helpful. You can disable the prompting by setting the sgml-insert-missing-element-comment variable to false in your Emacs initialization file.
  4. You can use the same technique to add or modify attributes: Place the cursor inside a start-tag and select from the menu Markup->Insert Attribute (or press [Shift]mouse-button-2). A pop-up menu appears that offers valid attributes for the selected element. Select an attribute from the pop-up menu.
  5. Note how the structure is indented based on element nesting. If you insert an H1 inside the body, it will not be indented. This is because the default settings do not indent mixed content elements (elements that may contain both markup and text, or PCDATA in SGML/DTD parlance). You can change the indenting assumptions by setting sgml-indent-data to true in your Emacs initialization file. Before doing that, consider whether white-space will be significant in your XML application (see Resources).
  6. If you have already installed an external validator, try validating your document: Select SGML->Validate and then press Enter (you may be prompted to save your file) or press [Ctrl]-c [Ctrl]-v and then press Enter.
    Note: If validation doesn't work, install an external validator (as I explain how to do in the next section) and test drive that feature later. If validation does work, you should receive an error indicating the "head" is not finished. If you press [Ctrl]-x ` (note the backtick), you will be taken to the line in the source where the error occurred. Go ahead and insert a title element.
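
The two settings mentioned in steps 3 and 5 can be placed in the initialization file as shown in this short sketch. Both variable names come from the text; setq-default is used for the indentation setting on the assumption that PSGML treats it as a per-buffer option, in which case the default value is what new buffers pick up.

```elisp
;; Step 3: stop PSGML from inserting comments that prompt for
;; missing required elements.
(setq sgml-insert-missing-element-comment nil)

;; Step 5: indent mixed-content elements too. Consider first whether
;; white space is significant in your XML application.
(setq-default sgml-indent-data t)
```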

Using SP or OpenSP for SGML and XML validation

Although PSGML contains an SGML parser, it is not a fully functional parser. It does, however, provide the ability to validate SGML and XML documents using an external parser. This allows you to fully validate your source and find, for example, elements with IDREFs that lack a corresponding target element with a matching ID.

When you invoke SGML->Validate from the menu or keyboard ([Ctrl]-c [Ctrl]-v), PSGML starts a shell process to invoke the SGML parser against the file you are currently editing. It displays the results of the validation in a buffer below the file you are currently editing. If it encounters errors, use the Emacs [Ctrl]-x ` key sequence (note the backtick) to have Emacs take you to the location of the error in your source document.

By default, PSGML is configured to invoke nsgmls, part of SP, an SGML parser originally written by James Clark. SP is no longer being supported, but is the foundation for OpenSP, which is now maintained on SourceForge.net as part of the OpenJade project. (See Resources for more information on SP and OpenSP.) You can download and use SP or OpenSP. I chose OpenSP because it is actively supported, and it contains support for the DTDDECL keyword of SGML catalogs whereas SP does not (DTDDECL is supported as of the 1.4 version of OpenSP). If you are dealing only with XML, you will need only a single SGML declaration defined for XML. If, however, you will also be dealing with SGML, the DTD you are using will probably reference its own declaration. Because PSGML allows you to specify only one SGML declaration, via the sgml-declaration variable (or sgml-xml-declaration for XML mode), the DTDDECL catalog feature can come in handy. One last consideration is that I was unable to locate binaries for OpenSP for the Windows platform. Because SourceForge.net maintains only source code, you will need to build the binaries yourself or locate them by searching more diligently than I did.

Using SP

If you prefer to use SP, all you really need to do is download SP (see Resources), unpack it, and update two environment variables. You will need to append to your PATH so that nsgmls can be found when invoked by PSGML. Assuming you unpack the distribution to the path d:\SP, you would need to add d:\SP\bin to your PATH. Also, you will need to add an entry to your SGML_CATALOG_FILES so the SGML declaration for XML can be found. If you don't pick up the correct SGML declaration when validating your XML, you will probably receive a lot of error messages. This is because XML doesn't support SGML's OMITTAG feature, which requires the DTD to specify minimization information (XML DTDs do not include this information because all tags are required). Again, assuming you installed SP in d:\SP, an SGML declaration for XML will be in d:\SP\pubtext\xml.dcl, which is referenced by d:\SP\pubtext\xml.soc (see the SGMLDECL entry). So simply add d:\SP\pubtext\xml.soc to your SGML_CATALOG_FILES so nsgmls can find this catalog. Alternatively, you can set the Emacs/PSGML variable sgml-xml-declaration in your Emacs initialization file to point to this file as shown in Listing 6.

Listing 6. _emacs - enabling SP for validation
; Note the forward slashes in the path!!!! 
(setq sgml-xml-declaration "d:/SP/pubtext/xml.dcl")

Using OpenSP

If you wish to use OpenSP, you need to make a couple of slight modifications to PSGML. However, all of this can be done using the Emacs initialization file.

Assuming you have built and installed OpenSP or found a pre-built binary distribution, again the first thing you need to do is update your PATH so the executables can be found. Assuming OpenSP is installed in d:\OpenSP, you would need to add d:\OpenSP\bin to your PATH. Note that you can have both SP and OpenSP installed and accessible at the same time because the executables in OpenSP have been renamed.

The next thing you need to do is update your Emacs configuration to alter the command used for validation. This would normally be done by setting the Emacs variable sgml-validate-command, and in fact we will set this variable to handle the case of using OpenSP's onsgmls executable to validate in sgml-mode. For xml-mode, however, this doesn't seem to work correctly: When I set this variable in my Emacs initialization file, sgml-mode picks up the change, but xml-mode does not. You can get around this issue by providing a mode-hook. The goal is to override the default validate command, which is defined as nsgmls -wxml -s %s %s, setting it to onsgmls -wxml -s %s %s. The fragment of Emacs initialization code in Listing 7 takes care of both of these tasks.

You really don't need to understand what's going on here to make PSGML work with OpenSP. However if you're interested, a mode-hook basically defines an Emacs function that will be invoked after the mode is initialized. This gives you an opportunity to override functions and settings established by that mode. In this case, since the validate command is hardwired in the PSGML code, you can use the mode-hook to override that setting without having to modify the PSGML code and recompile it (which would need to be done each time you install a new version of PSGML).
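
To make the idea concrete, here is a sketch of how the two tasks could be written. The XML command string is the one quoted above. The sgml-mode command string (the same command without -wxml), the hook function name, and the use of xml-mode-hook are assumptions for illustration and may differ from the author's listing.

```elisp
;; Task 1: use onsgmls when validating in sgml-mode.
(setq sgml-validate-command "onsgmls -s %s %s")

;; Task 2: xml-mode installs its own validate command when it starts,
;; so replace that command after the mode has initialized.
(defun my-psgml-xml-hook ()
  "Use OpenSP's onsgmls for validation in XML mode."
  (setq sgml-validate-command "onsgmls -wxml -s %s %s"))

(add-hook 'xml-mode-hook 'my-psgml-xml-hook)
```

Because the override lives in the initialization file, it survives a PSGML upgrade without any change to the PSGML sources.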


Suggestions and tips

Once you get comfortable with the basic functions I've described, try exploring each of the menus that PSGML adds to the Emacs menu bar:

  • On the SGML menu, experimenting with the File Options and User Options can give you a good idea of what you can customize within PSGML. For more information on particular settings, you can refer to the online documentation or consult the "Editing SGML with Emacs and PSGML" document included with PSGML. Changes you make through this menu persist only for that particular editing session. If you prefer to make a permanent change, you have to update your Emacs initialization file.
  • The Modify menu mainly provides functions for changing existing markup. Some of these functions, such as Normalize, can come in handy when you clean up HTML to turn it into XHTML.
  • Functions under the Move menu let you navigate the structure of your document more quickly.
  • The Markup menu provides menu access for inserting elements, tags, attributes, entities, and so on. I'll just point out two things that might not be obvious. Tag Region allows you to wrap existing text inside an element, using PSGML's internal parser to determine what elements are valid for the highlighted location. Insert Entity allows you to insert general text entities defined in your DTD. If you define new text entities in your internal subset at the beginning of the document, you will need to reparse the DTD to pick up the newly defined entities during your editing session.
  • Items under View are self-explanatory.
  • Most of the items under the DTD menu have been covered. The Info items are worth a mention, however, because they can be useful for exploring your DTD if you are not already familiar with it.
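As noted in the first item above, changes made through the SGML menu last only for the current session. A permanent change goes into your Emacs initialization file, roughly as in the sketch below. The variable names are ones I believe the PSGML manual documents, and the values are illustrative only; treat both as assumptions and confirm each variable with C-h v before relying on it.

```elisp
;; Sketch only: illustrative values, not recommendations from the article.
;; setq-default is used because some PSGML options are buffer-local.
(setq-default sgml-set-face t)                       ; highlight markup with faces
(setq-default sgml-indent-step 2)                    ; indent each nesting level by 2 columns
(setq-default sgml-indent-data t)                    ; indent character data as well as tags
(setq-default sgml-auto-insert-required-elements t)  ; insert required child elements automatically
```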

Download

Description                     Name                     Size
Source code for this article    x-emacs/emacscust.zip    35 KB

Resources

  • The XHTML 1.0 DTDs, _emacs customizations, and updated dita.soc files I described in this article are available in the emacscust.zip.
  • Download PSGML version 1.2.2 (or whatever version is current) from SourceForge.
  • The GNU Web site provides information on Emacs as well as numerous other GNU projects.
  • If you prefer to learn from a book, O'Reilly & Associates publishes a good book called Learning GNU Emacs, which provides information on how to accomplish basic editing tasks, use many of the major editing modes, customize, and even program Emacs.
  • There's also an excellent tutorial in Bob DuCharme's book SGML CD, in the chapter "Editing SGML with the Emacs Text Editor", which is available online. In addition to providing a tutorial on using Emacs, Bob also discusses using PSGML for editing SGML documents, and in fact this chapter is what got me started.
  • Check out the GNU Emacs FAQ for Windows.
  • For more information on XHTML, visit the XHTML 1.0 section of the W3C Web site.
  • The Darwin Information Typing Architecture, DITA, is an architecture for creating article-based information. DITA includes a base set of DTDs and a framework that allows for specialization using derived DTDs and processing conventions.
  • In the DTD samples file provided, I've included a DTD I used to edit Host On Demand (HOD) macros (in the hodmacro directory). This demonstrates how Emacs with PSGML can be used to edit XML which is not of the traditional book or article type of information. For more information on HOD, see WebSphere Host Publisher. You can learn more about WebSphere Host Publisher file formats from WebSphere Host Publisher Programmer's and Reference.
  • For a more data-oriented XML editing tool, check out the replacement for the WebSphere Studio Application Developer environment, WebSphere Studio Site Developer, which contains a visual XML editor. You can also check out the Downloads and products section of the developerWorks XML zone (to view editing tools only, select Editing in the View by field).
  • Find out more about SGML Open Catalogs.
  • SP is an SGML parser originally written by James Clark. It is no longer supported, but it is the foundation for OpenSP, which is now maintained on SourceForge.net as part of the OpenJade project. If you're looking for a pre-built RPM package for Linux, you can try RPM Find (SP) or RPM Find (OpenSP).
  • Another good source of publicly available XML/SGML tools is The XML Cover Pages.
  • For details on comparing XML documents, including a discussion of significant and nonsignificant whitespace, see Brett McLaughlin's tip, What's the diff?.
