XML basics for new users
An introduction to proper markup
XML stands for Extensible Markup Language, with the markup bit being the key. You can create content and mark it up with delimiting tags, making each word, phrase, or chunk into identifiable, sortable information. The files, or document instances, you create consist of elements (tags) and content, and the elements help the documents to be understood fairly well when read from printouts or even processed electronically. The more descriptive the elements, the more a document's parts can be identified. From the early days of markup to today, one advantage of tagging content is that if a computer system is lost, the data in print can still be understood from its tags.
Markup languages evolved from early, private company and government forms into Standard Generalized Markup Language (SGML), Hypertext Markup Language (HTML), and eventually into XML. SGML can seem complex, and HTML (which was really just an element set) was just not powerful enough to identify information. XML is designed as an easy-to-use and easy-to-extend markup language.
With XML, you can create your own elements, giving you the freedom to precisely represent your pieces of information. Rather than treating your documents as headings and paragraphs, you can identify each part within the document. For efficiency, you'll want to define a finite list of your elements and stick to them. (You can define your elements in a Document Type Definition (DTD) or in a schema, which I will discuss briefly later.) As you start out and get used to XML, feel free to experiment with element names as you build practice files.
As I mentioned, your XML files will consist of content plus markup. You place much of your content in elements by surrounding your content with tags. For example, suppose you need to create an XML cookbook. You have a recipe named Ice Cream Sundae to prepare in XML. To mark up the recipe name, you enclose that text in your element by placing the beginning tag before your text and the ending tag after your text. You might call the element recipename. To mark the beginning tag of the element, place the element's name inside angle brackets (<>) like this: <recipename>. Then type your text, Ice Cream Sundae. After the text, enter the element's ending tag, which is the element's name inside angle brackets plus a forward slash (/) before the element's name: </recipename>. These tags form an element, into which you can enter content or even other elements.
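As a minimal sketch of the steps just described, the recipe name marked up with the recipename element might look like this (the comment is only an annotation and is not required):

```xml
<!-- beginning tag, then the content, then the ending tag -->
<recipename>Ice Cream Sundae</recipename>
```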
You can create element names for individual documents or for document sets. You can craft the rules for how the elements fit together based on your specific needs. You can be very specific or keep element names more generic. You can create rules for what each element is allowed to contain and make these rules strict, lax, or something in between. Just be sure to create elements that identify the parts of your documents that you feel are important.
Start your XML file
The first line of your XML document might be an XML declaration. This optional part of the file identifies it as an XML file, which can help tools and humans identify the file as XML rather than SGML or some other markup. When you include it, the declaration must state the XML version (<?xml version="1.0"?>) and can also name the character encoding, such as <?xml version="1.0" encoding="UTF-8"?> for Unicode. Because this declaration must be first in the file, if you plan to combine smaller XML files into a larger file, you might want to omit this optional information.
Create your root element
The root element's beginning and end tags surround your XML document's
content. Only one root element is in the file, and you need this "wrapper"
to contain it all. Listing 1 shows a truncated
portion of the example I use here with a root element named
<recipe>. (See Download for the
full XML file.)
Listing 1. The root element
<?xml version="1.0" encoding="UTF-8"?>
<recipe>
</recipe>
As you build your document, your content and additional tags will go between the <recipe> and </recipe> tags.
Name your elements
So far, you have <recipe> as your root element. With XML, you choose the names for your elements, then define the corresponding DTD or schema based on those names. The names you create can contain alphabetic characters, numbers, and special characters such as underscores (_). Here are a few things to note about your naming:
- Spaces are not allowed in the element names.
- Names must begin with an alphabetic character or an underscore, not a number or other symbol. (After this first character, you can use any combination of letters, numbers and the allowed symbols.)
- Case matters. XML is case-sensitive, so <Recipe> and <recipe> are different elements, and each end tag must match its beginning tag exactly.
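The naming rules can be sketched with a few examples. The element names prep_time and step2 are hypothetical and used only for illustration. The names that break the rules appear only inside a comment, so the block itself stays well formed. Note that XML parsers compare names case-sensitively, which is why the last counterexample fails.

```xml
<!-- Acceptable names: letters, digits after the first character, underscores -->
<recipe>
  <prep_time>5 minutes</prep_time>
  <step2>Drizzle chocolate syrup over the ice cream.</step2>
</recipe>

<!-- Not acceptable, shown only inside this comment:
     <prep time>            a name cannot contain a space
     <2ndstep>              a name cannot begin with a digit
     <Recipe> ... </recipe> beginning and end tags differ in case
-->
```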
Building on the prior example, if you add an element named <recipename>, it will have a beginning tag <recipename> and a corresponding end tag </recipename>, as in Listing 2.
Listing 2. More elements
<?xml version="1.0" encoding="UTF-8"?>
<recipe>
  <recipename>Ice Cream Sundae</recipename>
  <preptime>5 minutes</preptime>
</recipe>
An XML document can have empty elements, which contain nothing and can be expressed as a single tag instead of as a pair of beginning and end tags. To use an HTML-like example, you might have <img src="mylogo.gif"> as a stand-alone element. It doesn't contain any child elements or text, so it is an empty element, and in XML you express it as <img src="mylogo.gif" /> (finished off with an optional space and the familiar ending slash).
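In the recipe example, an empty element might look like the sketch below. The photo element and the sundae.gif file name are hypothetical and are not part of the article's recipe file. Both forms are equivalent to an XML parser, and the space before the slash is optional.

```xml
<recipe>
  <recipename>Ice Cream Sundae</recipename>
  <!-- An empty element written as a beginning and end tag pair -->
  <photo src="sundae.gif"></photo>
  <!-- The same empty element written as a single tag -->
  <photo src="sundae.gif"/>
</recipe>
```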
Nest the elements
Nesting is the placement of elements inside other elements. These new elements are called child elements, and the elements that enclose them are their parent elements. Several elements are nested inside the <recipe> root element, as in Listing 3. These nested child items include <recipename>, <ingredlist>, and <preptime>. Inside the <ingredlist> element are multiple occurrences of its own child element, <listitem>. Nesting can be many levels deep in an XML document.
A common syntax error is improper nesting of parent and child elements. Any child element must be completely enclosed between the starting and end tags of its parent element. Sibling elements must each end before the next sibling begins.
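To make the error concrete, here is a small sketch that uses the listitem, quantity, and itemdescription elements from the recipe. The improperly nested version appears only inside a comment, because a parser would reject it.

```xml
<!-- Improperly nested, shown only inside this comment:
     <listitem><quantity>1</quantity><itemdescription>cherry</listitem></itemdescription>
     The itemdescription element is still open when its parent, listitem, closes.
-->

<!-- Properly nested: each child ends before its parent ends,
     and quantity ends before its sibling itemdescription begins -->
<listitem>
  <quantity>1</quantity>
  <itemdescription>cherry</itemdescription>
</listitem>
```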
The code in Listing 3 shows proper nesting. The tags begin and end without intermingling with other tags.
Listing 3. Properly nested XML elements
<?xml version="1.0" encoding="UTF-8"?>
<recipe>
  <recipename>Ice Cream Sundae</recipename>
  <ingredlist>
    <listitem>
      <quantity>3</quantity>
      <itemdescription>chocolate syrup or chocolate fudge</itemdescription>
    </listitem>
    <listitem>
      <quantity>1</quantity>
      <itemdescription>nuts</itemdescription>
    </listitem>
    <listitem>
      <quantity>1</quantity>
      <itemdescription>cherry</itemdescription>
    </listitem>
  </ingredlist>
  <preptime>5 minutes</preptime>
</recipe>
Attributes are sometimes added to elements. Attributes consist of a name-value pair, with the value in double quotation marks ("), like this: type="dessert". Attributes provide a way to store additional information each time you use an element, varying the attribute value as needed from one instance of an element to another within the same document.
You type the attribute—or even multiple attributes—within the
starting tag of an element:
<recipe type="dessert">. If
you add multiple attributes, separate them with spaces:
<recipename cuisine="american" servings="1">. Listing 4 shows the XML file as it currently stands.
Listing 4. The current XML file with elements and attributes
<?xml version="1.0" encoding="UTF-8"?>
<recipe type="dessert">
  <recipename cuisine="american" servings="1">Ice Cream Sundae</recipename>
  <preptime>5 minutes</preptime>
</recipe>
You can use as few or as many attributes as you feel you need. Consider the details you might add to your documents. Attributes are especially helpful if documents will be sorted, for example, by type of recipe. Attribute names can include the same characters as element names, with similar rules for omitting spaces and starting names with alphabetic characters.
Well-formed versus valid XML
If you follow the rules outlined so far, you can easily produce well-formed XML. Well-formed XML is XML that follows all the rules of XML: proper element naming, nesting, attribute naming, and so on.
Depending on what you do with your XML, well-formed XML might be all you need. But consider the aforementioned example of sorting by recipe type. You need to ensure that every <recipe> element contains the type attribute in order to sort recipes. Being able to validate your documents and ensure that this attribute is always present can be invaluable (no pun intended).
Validation is checking your document's structure against rules for your elements and how you defined child elements for each parent element. You define these rules in a Document Type Definition (DTD) or in a schema. This validation requires you to create your DTD or schema, and then reference the DTD or schema file within your XML files.
To enable validation, you include the document type (DOCTYPE) in your XML documents near the top, after the XML declaration and before the root element. This line refers to the DTD (your list of elements and rules) to be used to validate that document. For example, your DOCTYPE might read something like Listing 5.
Listing 5. DOCTYPE
<!DOCTYPE MyDocs SYSTEM "filename.dtd">
This example assumes that your element list file is named filename.dtd and resides on your computer. (Use PUBLIC instead of SYSTEM if pointing to a public file location.)
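To give a rough idea of what the rules themselves might look like, the sketch below writes a few DTD declarations inline, inside square brackets in the DOCTYPE, rather than in a separate file, so that the example is self-contained. The content rules are assumptions based on the short recipe in Listing 4 and are not taken from an actual DTD for this article. Note that the name after DOCTYPE matches the root element, recipe, which a validating parser requires.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe [
  <!-- A recipe must contain one recipename followed by one preptime -->
  <!ELEMENT recipe (recipename, preptime)>
  <!-- Every recipe must carry a type attribute -->
  <!ATTLIST recipe type CDATA #REQUIRED>
  <!-- recipename and preptime hold text only -->
  <!ELEMENT recipename (#PCDATA)>
  <!-- cuisine and servings are optional attributes -->
  <!ATTLIST recipename
            cuisine  CDATA #IMPLIED
            servings CDATA #IMPLIED>
  <!ELEMENT preptime (#PCDATA)>
]>
<recipe type="dessert">
  <recipename cuisine="american" servings="1">Ice Cream Sundae</recipename>
  <preptime>5 minutes</preptime>
</recipe>
```

With rules like these, a validating parser would report a recipe element that lacks the type attribute, which is the check the sorting example calls for.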
Entities can be phrases of text or special characters. They can point internally or externally. Entities must be declared and expressed properly to avoid errors and to ensure proper display.
Some special characters, such as < and &, cannot be typed directly into your content because XML reads them as markup. To use such a symbol in your text, enter it as an entity or as a character reference that uses its character code. You can also set up phrases such as a company name as an entity, then type the entity throughout your content. To set up an entity, create a name for it, and type it within your content, starting with an ampersand (&) and ending with a semicolon (;), for example &coname; (or whatever you name it). You then enter code within your DOCTYPE inside square brackets ([ ]), as in Listing 6. This code identifies the text that stands in for the entity.
Listing 6. ENTITY
<!DOCTYPE MyDocs SYSTEM "filename.dtd" [
  <!ENTITY coname "Rabid Turtle Industries">
]>
Using entities might help you avoid typing the same phrase or information repeatedly. It can also make it easier to adjust the text—perhaps if the company name changes—in many places with a simple adjustment in the entity definition.
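As a sketch of how an entity is used in content, the example below declares coname inline and then refers to it. The source and note elements are hypothetical and serve only to hold the text. It also shows two of XML's predefined entities (&amp; and &lt;) and a numeric character reference (&#169;, the copyright sign).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe [
  <!ENTITY coname "Rabid Turtle Industries">
]>
<recipe type="dessert">
  <recipename>Ice Cream Sundae</recipename>
  <!-- The coname reference is replaced by the declared company name;
       the numeric reference 169 is the copyright sign -->
  <source>Recipe &#169; &coname;</source>
  <!-- The ampersand and less-than sign are written as predefined entities -->
  <note>Nuts &amp; cherries; serves &lt; 2 people</note>
</recipe>
```

If the company name changes, only the ENTITY line needs to be edited.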
As you learn to create your XML files, open them in an XML editor to check for well-formedness and confirm that you're following the rules of XML. If, for example, you have Windows® Internet Explorer®, you can open your XML file in the browser. If it displays your elements, attributes, and content, then the XML is well formed. If instead errors are displayed, you likely have a syntax error and need to review your document carefully for typos or missing tags and punctuation.
As mentioned in Nest the elements, an element that contains another element is the parent of that contained element. In the example below, <recipe> is the root element and contains the full content of the file. This parent element, <recipe>, contains child elements <recipename>, <ingredlist>, <directions>, and several others. This structure makes <recipename>, <ingredlist>, and <directions> siblings. Remember to nest your sibling elements properly, as well. Listing 7 shows well-formed and properly nested XML.
Listing 7. Well-formed XML
<?xml version="1.0" encoding="UTF-8"?>
<recipe type="dessert">
  <recipename cuisine="american" servings="1">Ice Cream Sundae</recipename>
  <ingredlist>
    <listitem>
      <quantity units="cups">0.5</quantity>
      <itemdescription>vanilla ice cream</itemdescription>
    </listitem>
    <listitem>
      <quantity units="tablespoons">3</quantity>
      <itemdescription>chocolate syrup or chocolate fudge</itemdescription>
    </listitem>
    <listitem>
      <quantity units="tablespoons">1</quantity>
      <itemdescription>nuts</itemdescription>
    </listitem>
    <listitem>
      <quantity units="each">1</quantity>
      <itemdescription>cherry</itemdescription>
    </listitem>
  </ingredlist>
  <utensils>
    <listitem>
      <quantity units="each">1</quantity>
      <utensilname>bowl</utensilname>
    </listitem>
    <listitem>
      <quantity units="each">1</quantity>
      <utensilname>spoons</utensilname>
    </listitem>
    <listitem>
      <quantity units="each">1</quantity>
      <utensilname>ice cream scoop</utensilname>
    </listitem>
  </utensils>
  <directions>
    <step>Using ice cream scoop, place vanilla ice cream into bowl.</step>
    <step>Drizzle chocolate syrup or chocolate fudge over the ice cream.</step>
    <step>Sprinkle nuts over the mound of chocolate and ice cream.</step>
    <step>Place cherry on top of mound with stem pointing upward.</step>
    <step>Serve.</step>
  </directions>
  <variations>
    <option>Replace nuts with raisins.</option>
    <option>Use chocolate ice cream instead of vanilla ice cream.</option>
  </variations>
  <preptime>5 minutes</preptime>
</recipe>
Note: The line breaks make it easier for you to read your code and do not affect the XML.
You might wish to experiment with your test files, and move the end tags and beginning tags, to become familiar with the resulting error messages.
In Figure 1, your elements show up clearly when viewed within Internet Explorer. Beginning and end tags surround your content. Small plus (+) and minus (-) symbols are available next to parent elements so you can collapse all elements nested inside them (their descendants).
Figure 1. A sample XML instance (file) with some siblings collapsed
Beyond a few simple rules, you have flexibility in designing your XML elements and attributes. XML’s rules are not difficult. Typing an XML document is also not difficult. What is difficult is figuring out what you need from your documents in terms of sortability or searchability, then designing elements and attributes to meet your needs.
When you have a good idea of your goals and how to mark up your content, you can build efficient elements and attributes. From that point, careful tagging is all you need to create well-formed and valid XML.