Note: This tip assumes that you have a basic understanding of DTDs and access to a validating parser to check your documents. Any validating parser, such as the one provided with JAXP, will do; the tip itself discusses only XML and DTD files.
In this tip, I'm working with documents from the Millennium Memory Project, which catalogs home movies submitted to a central location. In some cases, these documents take the long, or full, form, as in Listing 1:
Listing 1. The document -- full form
<?xml version="1.0"?>
<memories>
<memory tapeid="T1">
<media mediaid="T1" status="vhs" />
<subdate>2001-05-23</subdate>
<donor>John Baker</donor>
<subject>Fishing off Pier 60</subject>
<location>
<description>Outside in the woods</description>
</location>
</memory>
<memory tapeid="T2">
<media mediaid="T2" status="vhs"/>
<subdate>2001-05-18</subdate>
<donor>Elizabeth Davison</donor>
<subject>Beach volleyball</subject>
<location>
<place>Clearwater beach</place>
</location>
</memory>
</memories>
Sometimes users submit only the most basic information, the short form, as in Listing 2:
Listing 2. The document -- short form
<?xml version="1.0"?>
<memories>
<memory tapeid="T1">
<media mediaid="T1" status="vhs" />
<subdate>2001-05-23</subdate>
<subject>Fishing off Pier 60</subject>
</memory>
<memory tapeid="T2">
<media mediaid="T2" status="vhs"/>
<subdate>2001-05-18</subdate>
<subject>Beach volleyball</subject>
</memory>
</memories>
The DTD needs to let the user choose between these two structures.
To make this possible, you create a DTD that contains both sets of definitions but activates only one set at a time. You do this with conditional sections within the DTD:
Listing 3. Conditional sections within a DTD
<!ELEMENT memories (memory)* >
<!-- Short form -->
<![IGNORE[
<!ELEMENT memory (media | subdate | subject+)* >
]]>
<!-- Full form -->
<![INCLUDE[
<!ELEMENT memory (media | subdate | donor?| subject+| location)* >
<!ELEMENT location (description|place) >
<!ELEMENT description (#PCDATA) >
<!ELEMENT place (#PCDATA) >
<!ELEMENT donor (#PCDATA) >
]]>
<!ATTLIST memory tapeid IDREF #REQUIRED>
<!ELEMENT subdate (#PCDATA) >
<!ELEMENT subject (#PCDATA) >
<!ELEMENT media EMPTY >
<!ATTLIST media mediaid ID #REQUIRED
status CDATA #IMPLIED >
In Listing 3, only the full form definitions actually become part of the DTD; the short form definitions are excluded by the IGNORE keyword. You can include as many conditional sections as you need in a DTD and control each one individually. Note that conditional sections are allowed only in the external subset of a DTD (a separate DTD file), not in the internal subset inside a document.
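Conditional sections can also be nested. The fragment below is a minimal, standalone sketch rather than part of the project's DTD: the camera and nickname elements are hypothetical names used only for illustration. Under the XML 1.0 rules, the contents of an INCLUDE section are processed normally, while everything inside an IGNORE section, including any sections nested in it, is skipped. Here the camera declaration takes effect and the nickname declaration does not.

```xml
<![INCLUDE[
<!-- Processed: the enclosing section is included -->
<!-- camera is a hypothetical element used only for illustration -->
<!ELEMENT camera (#PCDATA) >
<![IGNORE[
<!-- Skipped: this inner section is ignored even though its parent is included -->
<!-- nickname is a hypothetical element used only for illustration -->
<!ELEMENT nickname (#PCDATA) >
]]>
]]>
```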
To control these sections from the XML document itself, however, you need to create parameter entities.
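A parameter entity is a placeholder for a value, similar to a variable. Parameter entities can be used only within a DTD, and both their declarations and their references are marked with a percent sign (%): you declare one as <!ENTITY % name "value"> and reference it as %name;.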
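Because a parameter entity reference is simply replaced by the entity's value, a parameter entity can also hold part of a declaration, not only a keyword. The sketch below shows an alternative way to write the subdate and subject declarations used in this tip's DTD; the entity name text is a hypothetical choice. This pattern belongs in an external DTD file, because XML 1.0 does not allow parameter entity references inside markup declarations in a document's internal subset.

```xml
<!-- A parameter entity holding a content model (the name "text" is illustrative) -->
<!ENTITY % text "(#PCDATA)">
<!-- After expansion, both declarations have the content model (#PCDATA) -->
<!ELEMENT subdate %text; >
<!ELEMENT subject %text; >
```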
Listing 4. Creating parameter entities
<!ENTITY % short "IGNORE">
<!ENTITY % full "INCLUDE">
<!ELEMENT memories (memory)* >
<!-- Short form -->
<![%short;[
<!ELEMENT memory (media | subdate | subject+)* >
]]>
<!-- Full form -->
<![%full;[
<!ELEMENT memory (media | subdate | donor?| subject+| location)* >
<!ELEMENT location (description|place) >
<!ELEMENT description (#PCDATA) >
<!ELEMENT place (#PCDATA) >
<!ELEMENT donor (#PCDATA) >
]]>
<!ATTLIST memory tapeid IDREF #REQUIRED>
<!ELEMENT subdate (#PCDATA) >
<!ELEMENT subject (#PCDATA) >
<!ELEMENT media EMPTY >
<!ATTLIST media mediaid ID #REQUIRED
status CDATA #IMPLIED >
When a parser reads the DTD, it replaces each parameter entity reference (%short; and %full;) with the entity's value, in this case IGNORE or INCLUDE, so the full form definitions are still the ones in effect. To make the DTD use the short form instead, change the values of the short and full entities to INCLUDE and IGNORE, respectively.
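Concretely, switching the default to the short form means editing only the two entity declarations at the top of the DTD file (called memory.dtd in Listing 5); the conditional sections stay unchanged. Exactly one of the two entities should be INCLUDE: if both were INCLUDE, memory would be declared twice, which violates the XML 1.0 validity constraint that an element type be declared only once, and if both were IGNORE, memory would not be declared at all.

```xml
<!-- memory.dtd: the short form is now the default -->
<!ENTITY % short "INCLUDE">
<!ENTITY % full "IGNORE">
```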
This is certainly more convenient than having to rebuild the DTD, but the real power comes when you can allow the user to make the choice from within the XML document itself.
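You can let the user choose which form of the DTD to use because of two features of XML:
- A DTD can have both an internal subset and an external subset.
- The internal subset is read before the external subset, and when the same entity is declared more than once, the first declaration is binding, so entity declarations in the internal subset take precedence over those in the external subset.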
For these reasons, the user can redefine the values of the short and full parameter entities from within the document:
Listing 5. Redefining the parameter entities
<?xml version="1.0"?>
<!DOCTYPE memories SYSTEM "memory.dtd" [
<!ENTITY % short "INCLUDE">
<!ENTITY % full "IGNORE">
]>
<memories>
<memory tapeid="T1">
<media mediaid="T1" status="vhs" />
<subdate>2001-05-23</subdate>
<subject>Fishing off Pier 60</subject>
</memory>
<memory tapeid="T2">
<media mediaid="T2" status="vhs"/>
<subdate>2001-05-18</subdate>
<subject>Beach volleyball</subject>
</memory>
</memories>
The external DTD subset, the declarations in the memory.dtd file, still applies, but because the internal subset's declarations of the short and full parameter entities are read first, they override the declarations in memory.dtd, enabling the short form and disabling the full form.
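By contrast, a document that wants the default full form can simply omit the internal subset. The sketch below reuses the first record from Listing 1 and assumes the DTD from Listing 4 is saved as memory.dtd:

```xml
<?xml version="1.0"?>
<!-- No internal subset: the defaults in memory.dtd (full form) apply -->
<!DOCTYPE memories SYSTEM "memory.dtd">
<memories>
<memory tapeid="T1">
<media mediaid="T1" status="vhs" />
<subdate>2001-05-23</subdate>
<donor>John Baker</donor>
<subject>Fishing off Pier 60</subject>
<location>
<description>Outside in the woods</description>
</location>
</memory>
</memories>
```

Because the first declaration of an entity is binding, a document that overrode only short, leaving full at its INCLUDE default from memory.dtd, would enable both forms and declare memory twice. Override both entities together, as Listing 5 does.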
This tip demonstrated how to create conditional DTDs and let the user choose a particular form from within the document. You can combine any number of such switches in this way, including switches that pull in or leave out external declarations.
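For example, a conditional section can switch an entire external file of declarations on or off. In this sketch, the entity names extras and extras-decls and the file memory-extras.dtd are hypothetical. The declarations go in memory.dtd, and the external parameter entity is both declared and referenced inside the conditional section, so nothing is loaded while the switch is IGNORE.

```xml
<!-- In memory.dtd: an optional module, off by default (all names are hypothetical) -->
<!ENTITY % extras "IGNORE">
<![%extras;[
<!-- Declare an external parameter entity whose replacement text
     is the contents of memory-extras.dtd, then reference it -->
<!ENTITY % extras-decls SYSTEM "memory-extras.dtd">
%extras-decls;
]]>
```

A document turns the module on the same way Listing 5 switches forms: by declaring <!ENTITY % extras "INCLUDE"> in its internal subset.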
- Check out the XML 1.0 recommendation for more information about Document Type Definitions.
- Get more information on validating documents in the Validating XML tutorial (developerWorks, September 2001).
- Download JAXP or Xerces-Java.
- Find more XML resources on the developerWorks XML zone.

Nicholas Chase has been involved in Web site development for companies such as Lucent Technologies, Sun Microsystems, Oracle, and the Tampa Bay Buccaneers. Nick has been a high school physics teacher, a low-level radioactive waste facility manager, an online science fiction magazine editor, a multimedia engineer, and an Oracle instructor. More recently, he was the Chief Technology Officer of Site Dynamics Interactive Communications in Clearwater, FL, USA, and is the author of three books on Web development, including Java and XML From Scratch (Que) and the upcoming Primer Plus XML Programming (Sams). He loves to hear from readers and can be reached at: nicholas@nicholaschase.com.