What is XML?

XML, or Extensible Markup Language, is a platform-independent way to represent data. Simply put, XML enables you to create data that can be read by any application on any platform. You can even create and edit it by hand, because it is based on the same tag-based technology that underlies HTML.

An example

For example, suppose you want to use XML to store information about a transaction. This transaction originates on your salesman's iBook, so you'll want to store it there. But it will then be sent to the data application on your Windows server, and ultimately archived on your mainframe, so it needs to be very flexible. XML enables you to create something like that shown in Listing 1.

Listing 1. XML example

<?xml version="1.0"?>
<transaction ID="THX1138">
  <salesperson>bluemax</salesperson>
  <order>
    <product productNumber="3263827">
      <quantity>1</quantity>
      <unitprice currency="standard">3000000</unitprice>
      <description>Medium Trash Compactor</description>
    </product>
  </order>
  <return></return>
</transaction>

Serialized this way, as text, the information is available in any environment in which you might need it. Even without a special application, you can see both the content and the markup that describes it.
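Because the transaction is plain text, a few lines of code can read it back on any platform. The following is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module. The summarize function and the dictionary keys are names invented for this illustration, not part of any XML standard.

```python
import xml.etree.ElementTree as ET

# The transaction from Listing 1, held as plain text.
LISTING_1 = """<?xml version="1.0"?>
<transaction ID="THX1138">
  <salesperson>bluemax</salesperson>
  <order>
    <product productNumber="3263827">
      <quantity>1</quantity>
      <unitprice currency="standard">3000000</unitprice>
      <description>Medium Trash Compactor</description>
    </product>
  </order>
  <return></return>
</transaction>"""


def summarize(xml_text):
    """Return a small dict of the values stored in a transaction document."""
    root = ET.fromstring(xml_text)
    product = root.find("order/product")
    return {
        "id": root.get("ID"),                      # attribute on the root
        "salesperson": root.findtext("salesperson"),
        "product_number": product.get("productNumber"),
        "quantity": int(product.findtext("quantity")),
        "unit_price": int(product.findtext("unitprice")),
        "description": product.findtext("description"),
    }


summary = summarize(LISTING_1)
print(summary)
```

The same text could be handed to a parser in any other language and would yield the same values, which is the point of serializing the data as XML.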

Learning more

XML is fairly straightforward to use, once you understand its structure. It also provides several different methods by which you can control the structure, and even the content, of your data. Once you begin to use XML, you'll also have questions about the best way to design your XML structures, but it doesn't have to be a complicated process.

Get started with these resources:

The flexibility of XML makes it useful for many kinds of applications, such as configuration files, Web services, and data storage.

What can I do with XML?

Since its introduction, developers have found numerous uses for XML. Here are some resources that give you an idea of how you can put XML to work.

Storing data

The most obvious use of XML is to store data. XML provides advantages for both data-centric information (such as the data you find in a database) and document-centric information (such as data you store in XML so you can display it differently in different environments).

Learn more about XML as a data-centric storage medium in these resources.

If you're interested in storing XML data, you should know that IBM provides a no-charge version of the new DB2 9, IBM DB2 Express-C 9. You should also check out the new DB2 Developer Workbench, which makes it easier to use XQuery and SQL/XML with DB2 9.

Web services

Web services began as a way to pass non-HTML information over HTTP. They have grown into the foundation for fields ranging from Ajax, which adds interactivity to Web sites, to today's Service Oriented Architectures (SOA), which are complex message-based applications. XML is integral to the field of Web services. SOAP and XML-RPC messages are written in XML, and REST services, although not tied to any one format, very often exchange XML as well.

See the section below on XML and Web services for more information.

Podcasting and other data syndication

One of the most common uses of XML today is in the realm of syndication. Millions of bloggers use RSS feeds to keep up with the latest information on their favorite blogs. Commercial interests have also begun taking an interest in podcasting, the distribution of audio and video over the Internet to devices such as iPods, which relies on XML as well.

See what the syndication landscape looks like in these resources:

Platform-independent configuration and deployment instructions

A common place to find XML is behind the scenes of your favorite applications and development environments, where it serves as a common means for creating files of configurations or instructions. Providing configuration instructions in a human-readable XML file enables users to control the behavior of applications much more easily than before.
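As a rough illustration of an application reading its settings from XML, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. The configuration format is entirely hypothetical: the config, server, and logging elements and their attributes are invented for this example and do not belong to any real application.

```python
import xml.etree.ElementTree as ET

# A hypothetical configuration file; every element and attribute name
# here is invented for illustration.
CONFIG = """<config>
  <server host="localhost" port="8080"/>
  <logging level="info"/>
</config>"""

root = ET.fromstring(CONFIG)
server = root.find("server")

# Pull the values the application needs out of the parsed document.
settings = {
    "host": server.get("host"),
    "port": int(server.get("port")),   # attributes are always text
    "log_level": root.find("logging").get("level"),
}
print(settings)
```

A user can change the port or logging level by editing the text file, with no need to touch the application's code.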

Does XML lend itself to application development?

Although the tags you see in Listing 1 are the most common serialization of XML, you will often deal with XML data from within an application. In that case, you will typically use one of several models, including the following.

The Document Object Model (DOM)

The Document Object Model, or DOM, is an object-based, tree-like way to view XML data. For example, in Listing 1, the salesperson, order, and return elements are children of the transaction element, meaning that they are contained below it in the hierarchy. DOM is the primary way in which most XML-based applications deal with XML.
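The parent and child relationship can be seen directly in code. The following is a minimal sketch in Python, using the standard library's xml.dom.minidom module on a trimmed copy of the Listing 1 transaction. The status attribute added at the end is invented for this example.

```python
from xml.dom.minidom import parseString

# A trimmed copy of the Listing 1 transaction, written without whitespace
# between tags so that the tree contains only element nodes.
DOC = (
    '<transaction ID="THX1138">'
    '<salesperson>bluemax</salesperson>'
    '<order><product productNumber="3263827"/></order>'
    '<return></return>'
    '</transaction>'
)

dom = parseString(DOC)
transaction = dom.documentElement

# childNodes holds every node directly below the element.
children = [node.tagName for node in transaction.childNodes
            if node.nodeType == node.ELEMENT_NODE]
print(children)

# The whole tree is an in-memory object, so it can be changed in place.
# The "status" attribute is a made-up example.
transaction.setAttribute("status", "archived")
print(transaction.getAttribute("status"))
```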

The Simple API for XML (SAX)

DOM is useful when you are trying to manipulate data, because everything resides in memory. On the other hand, it can be quite a resource hog, because everything resides in memory.

The Simple API for XML, or SAX, solves the problem of having everything in memory at one time by analyzing data from the beginning of the document to the end, and notifying your application of every event, such as "start element" or "characters". It's more resource friendly than DOM, but you can't manipulate the data in quite the same way.
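The event-driven style can be sketched with Python's standard xml.sax module. The EventLogger class name is an assumption made for this illustration; the startElement, characters, and endElement methods are the callbacks that the ContentHandler interface defines.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record each SAX event in the order the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start element", name))

    def characters(self, content):
        # The parser may deliver text in several chunks;
        # whitespace-only chunks are skipped here.
        if content.strip():
            self.events.append(("characters", content))

    def endElement(self, name):
        self.events.append(("end element", name))


handler = EventLogger()
xml.sax.parseString(b"<order><quantity>1</quantity></order>", handler)
for event in handler.events:
    print(event)
```

The handler never holds the document as a tree. It sees each event once, in document order, which is why memory use stays low and why going back to change earlier data is not possible in the way it is with DOM.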

Start to understand SAX with these resources:

DOM and SAX are the most common ways of programmatically interacting with XML, but sometimes you don't need to build an application to manipulate XML data.

Transforming XML data (XSLT)

Sometimes the manipulation you want to do with XML doesn't even require programming. You can manipulate XML using Extensible Stylesheet Language Transformations, or XSLT. XSLT enables you to transform an XML document into a different XML structure, or even into a non-XML format such as plain text. XSLT is extremely powerful and very commonly used.

Can I use XML with my favorite programming language?

XML is independent of platform and programming language, so you can use it with virtually any language, as long as the underlying software is available. The key piece is a parser, which reads the text file of tags and creates the XML document for manipulation. Learn how to work with XML using various programming languages with these resources:

Java

XML parsing and other capabilities are built directly into Java.

PHP

PHP support for XML started out a bit rough; early implementations weren't quite in sync with the DOM specification. These days, however, the situation is much better, with support that follows the standards more closely.

Perl

Perl was designed to work with text, so the temptation is sometimes to work on the text directly rather than use XML methods, but the benefits of using those methods are definitely there.

Python

With Python's ease of use and XML's emphasis on cross-platform availability, the pair is a match made in heaven.

C++

C++ programmers can also get their hands on XML capabilities.

Ruby

The REXML library provides XML support for the Ruby programming language.

JavaScript

JavaScript support for XML is very similar to that of Java, at least in the more basic operations.

Are there existing XML vocabularies and applications?

As developers began to use XML for various applications, standard vocabularies, or XML applications, began to emerge. For example, XHTML is an XML version of HTML, and podcasting takes place using various flavors of an XML vocabulary called RSS. The Scalable Vector Graphics (SVG) language provides a way to define graphic images in XML so that browsers such as Firefox can render them.

Some examples of XML in action are discussed below.

RSS and syndication

Bloggers often provide external feeds that show their most recent posts and provide links back to the original material. These feeds have turned into big business, with advertisers taking note, and the distribution of audio and/or video, or podcasting, becoming the focus of major media companies such as the broadcast television networks. These feeds are in the form of XML, either in one of the varieties of RSS, or Atom.
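To show what reading such a feed involves, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. The feed is a made-up RSS 2.0 document: the blog name, post titles, and example.com URLs are placeholders, and a real feed would normally carry more elements per item.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; the titles and example.com URLs are placeholders.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>Recent posts</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""

channel = ET.fromstring(FEED).find("channel")

# Each item element is one post, with a link back to the original material.
posts = [(item.findtext("title"), item.findtext("link"))
         for item in channel.findall("item")]
for title, link in posts:
    print(title, "->", link)
```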

Scalable Vector Graphics (SVG)

SVG tries to do for graphics what HTML did for desktop publishing: provide a way to specify graphics using small, simple text instructions. SVG enables you to create complex graphics that are both small in terms of bandwidth and controllable programmatically.
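Because an SVG image is just XML, a program can build one with ordinary XML tools. The following is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module. The circle's size, position, and color are arbitrary values chosen for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without a namespace prefix.
ET.register_namespace("", SVG_NS)

# Build the image as a small tree: one svg element holding one circle.
svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "120", "height": "120"})
ET.SubElement(svg, f"{{{SVG_NS}}}circle",
              {"cx": "60", "cy": "60", "r": "50", "fill": "steelblue"})

# The whole image is a short string of text.
markup = ET.tostring(svg, encoding="unicode")
print(markup)
```

Saving the printed text to a file with an .svg extension should give an image that an SVG-capable browser can display, and changing an attribute value in the code changes the picture.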

XForms

Think of XForms as the next generation of HTML forms, providing a way to specify the information to be collected in a presentation-independent way. This enables you to not only add more functionality more easily than before, but also to easily reuse forms in other mediums, such as cell phones, where the information is the same, but the presentation might be totally different.

More XML in action

You can find XML in a variety of places, such as publishing, encoding semantic data, and even those voice recognition units you talk to over the telephone. Here are some more examples:

How is XML related to Web services and SOA?

Although you can implement Service Oriented Architectures (SOA) using a variety of technologies, the most common approach is to use Web services, and that means XML. Of the two most popular ways to implement Web services, SOAP is defined in XML, and REST very commonly uses XML for its messages.

An example

For example, you can make a request to the Google Web service by sending this SOAP document as a Web request (see Listing 2).

Listing 2. Making a request to the Google Web service by sending a SOAP document

<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doGoogleSearch xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <key xsi:type="xsd:string">00000000000000000000000000000000</key>
      <q xsi:type="xsd:string">death star trash compactor</q>
      <start xsi:type="xsd:int">0</start>
      <maxResults xsi:type="xsd:int">10</maxResults>
      <filter xsi:type="xsd:boolean">true</filter>
      <restrict xsi:type="xsd:string"></restrict>
      <safeSearch xsi:type="xsd:boolean">false</safeSearch>
      <lr xsi:type="xsd:string"></lr>
      <ie xsi:type="xsd:string">latin1</ie>
      <oe xsi:type="xsd:string">latin1</oe>
    </ns1:doGoogleSearch>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

Here you see the SOAP envelope, a standard format that the Web service engine can understand. The content of this message, in this case the doGoogleSearch element, is known as the payload, and consists of the information to be processed by the Web service.
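The split between envelope and payload can be sketched in Python with the standard library's xml.etree.ElementTree module. The envelope below is a shortened copy of the Listing 2 request that keeps only the q element, and the "soap" prefix in the NS mapping is a local alias chosen for this example. The sketch only parses the text; it does not contact any service.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the Listing 2 request, keeping only the q element.
ENVELOPE = """<?xml version='1.0'?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doGoogleSearch xmlns:ns1="urn:GoogleSearch">
      <q xsi:type="xsd:string">death star trash compactor</q>
    </ns1:doGoogleSearch>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

# Map a local alias to the SOAP envelope namespace for lookups.
NS = {"soap": "http://schemas.xmlsoap.org/soap/envelope/"}

root = ET.fromstring(ENVELOPE)
body = root.find("soap:Body", NS)

# The first child element of Body is the payload.
payload = body[0]
print(payload.tag)      # the namespace-qualified element name

# The q element carries no namespace, so a plain name finds it.
query = payload.findtext("q")
print(query)
```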

The overall Web services picture

In fact, most of the standards surrounding Web services -- and there are many -- are essentially XML vocabularies. For example, a Web Services Description Language (WSDL) document is an XML file that describes a service.

Get started with XML and Web services with these resources:

The ETTK for Web Services, an alphaWorks technology, makes it easy to set up a Web services environment, complete with a server.

You can get more information on XML and Web services on the New to SOA and Web services page.

What does the future hold for XML?

XML is at the heart of many of today's nascent technologies. For example, as search engines improve and the world moves towards the Semantic Web, XML is how webmasters can add meaningful information to their pages. Grid computing and autonomic computing continue to gain ground, and XML figures prominently in these technologies as well. Database vendors continue to look at storing XML more efficiently, and XQuery, the XML query language, is gaining steam.

In the following sections are resources to help you glimpse the future of XML:

RDF, microformats, and other semantic technologies

The Semantic Web doesn't require XML, but you'd be hard-pressed to see that from the way the technology currently looks. Most information is encoded in some form of XML, whether it is the Resource Description Framework (RDF) or independent microformats. This is because XML is so widely readable and understood.

Grid and autonomic computing

As the world becomes smaller, computer systems get bigger. Specifically, researchers, companies, and other organizations are beginning to see the advantage of melding their systems together into a single larger system, either to provide enhanced computing power or to save money by eliminating waste. Because of its platform independence, XML is well suited to exchanging information between disparate systems.

Asynchronous JavaScript and XML (Ajax)

As the Web becomes more functional, users in turn expect more from the applications they deal with every day. Asynchronous JavaScript and XML (Ajax) provides a more seamless experience for the user by requesting information -- in XML, more often than not -- in the background and replacing only part of the page, rather than forcing the user to request a whole new Web page. As a result, the Web has advanced by leaps and bounds in this area in just the last year or so.

Mashups

As more information becomes available through Web services, more enterprising developers find more things to do with it. One way much of this data has been utilized is in the mashup, a rapidly growing type of application that combines data from multiple sources into a single view.

What is the best way for me to improve my XML skills?

If you want to improve your XML skills, the best way to do it is to get a grounding in the essentials, and then simply use it. Start with the resources listed under "What is XML?" and move on to those under "Does XML lend itself to application development?" From there, you can move on to any of the other areas that interest you.