If you have some XML to be bundled with related media, such as an image file, you can use a simple URI reference. Listing 1 is an example of some XML with such a reference -- based, incidentally, on a construct in the IBM developerWorks content format.
Listing 1. Sample XML with reference to an image
<author>
<bio>Mr. Smiley is always in a good mood.</bio>
<image source="smiley.png"
width="64" height="80"
alt="Smiley photo"/>
<name>
<display>Guy Smiley</display>
</name>
</author>
The URI reference smiley.png is a reference to an external image file. It could, of course, be a full URL such as http://www.ibm.com/developerworks/i/photo.jpg. One possible use of this might be to render HTML such as that in Listing 2.
Listing 2. Sample HTML output based on Listing 1
<div class="author">
<img src="smiley.png"
width="64" height="80"
alt="Smiley photo"/>
<p>Guy Smiley</p>
<p>Mr. Smiley is always in a good mood.</p>
</div>
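The conversion from Listing 1 to Listing 2 would normally be done with XSLT, but the mapping is simple enough to sketch in a few lines of Python. This is only an illustration under assumptions: the function name author_to_html is made up for this example, and it expects exactly one image element carrying all four attributes.

```python
import xml.etree.ElementTree as ET

# The author fragment from Listing 1.
AUTHOR_XML = """<author>
  <bio>Mr. Smiley is always in a good mood.</bio>
  <image source="smiley.png" width="64" height="80" alt="Smiley photo"/>
  <name>
    <display>Guy Smiley</display>
  </name>
</author>"""


def author_to_html(xml_text):
    """Build the <div class="author"> fragment for one author element.

    Hypothetical helper: assumes one <image> child with source, width,
    height, and alt attributes all present.
    """
    author = ET.fromstring(xml_text)
    image = author.find("image")

    div = ET.Element("div", {"class": "author"})
    # The source attribute is copied through untouched, whatever URI it holds.
    ET.SubElement(div, "img", {
        "src": image.get("source"),
        "width": image.get("width"),
        "height": image.get("height"),
        "alt": image.get("alt"),
    })
    ET.SubElement(div, "p").text = author.findtext("name/display")
    ET.SubElement(div, "p").text = author.findtext("bio")
    return ET.tostring(div, encoding="unicode")


html = author_to_html(AUTHOR_XML)
print(html)
```

Because the sketch copies the source value without interpreting it, the same code works whether that value is a relative reference, a full URL, or any other URI.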
To create a more intimate relationship between the two files, you can include the image as an entity. This might surprise you if you're accustomed to thinking of XML entities as well-formed XML rather than binary files, but XML provides for non-well-formed attachments through unparsed entities and notations, two little-known features. Listing 3 shows the image reference using an unparsed entity.
Listing 3. Sample XML using an unparsed entity to reference an image
<?xml version="1.0" standalone="no"?>
<!DOCTYPE author [
<!ELEMENT author (bio, image, name)>
<!ELEMENT bio (#PCDATA)>
<!ELEMENT image EMPTY>
<!ATTLIST image
source ENTITY #REQUIRED
alt CDATA #IMPLIED
width CDATA #IMPLIED
height CDATA #IMPLIED
>
<!ENTITY smiley SYSTEM "smiley.png" NDATA PNG>
<!NOTATION PNG PUBLIC
'-//TEI//NOTATION IETF RFC2083 Portable Network Graphics//EN'>
<!ELEMENT name (display)>
<!ELEMENT display (#PCDATA)>
]>
<author>
<bio>Mr. Smiley is always in a good mood.</bio>
<image source="smiley"
width="64" height="80"
alt="Smiley photo"/>
<name>
<display>Guy Smiley</display>
</name>
</author>
Clearly, there's a lot more to this listing. Notations are part of the full regalia of DTDs, and I've included within the file a DTD internal subset that's required to declare the entity for the external file. Notice that I explicitly use standalone="no" in the XML declaration, just to make it clear that the XML document relies on an external file. The key bit is the declaration of the smiley entity. As an external unparsed entity, this file is a logical part of the XML document. A notation named PNG provides clues to the XML processing layer for how to handle this unparsed data. I chose to use a public identifier for PNG files defined in the well-regarded Text Encoding Initiative (TEI) specification (see Resources). Because of this entity and notation declaration, the source attribute becomes more than just the string smiley; it becomes a logical stand-in for the PNG file itself.
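To make the role of these declarations concrete, here is a minimal sketch of how application code might follow the source attribute back to the file and its notation. It uses Python's standard xml.sax module, whose DTDHandler interface reports unparsed entity and notation declarations. The class name ImageResolver and the shape of the result dictionaries are assumptions made for this example, and the parser does not validate, so the code simply trusts that source is an ENTITY attribute.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, DTDHandler

# Listing 3, with the image attributes on one line for brevity.
DOC = b"""<?xml version="1.0" standalone="no"?>
<!DOCTYPE author [
<!ELEMENT author (bio, image, name)>
<!ELEMENT bio (#PCDATA)>
<!ELEMENT image EMPTY>
<!ATTLIST image
  source ENTITY #REQUIRED
  alt CDATA #IMPLIED
  width CDATA #IMPLIED
  height CDATA #IMPLIED
>
<!ENTITY smiley SYSTEM "smiley.png" NDATA PNG>
<!NOTATION PNG PUBLIC
  '-//TEI//NOTATION IETF RFC2083 Portable Network Graphics//EN'>
<!ELEMENT name (display)>
<!ELEMENT display (#PCDATA)>
]>
<author>
<bio>Mr. Smiley is always in a good mood.</bio>
<image source="smiley" width="64" height="80" alt="Smiley photo"/>
<name><display>Guy Smiley</display></name>
</author>"""


class ImageResolver(ContentHandler, DTDHandler):
    """Hypothetical handler: maps each image's entity name to its file."""

    def __init__(self):
        super().__init__()
        self.entities = {}   # entity name -> (system ID, notation name)
        self.notations = {}  # notation name -> public or system ID
        self.images = []

    def notationDecl(self, name, publicId, systemId):
        self.notations[name] = publicId or systemId

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.entities[name] = (systemId, ndata)

    def startElement(self, name, attrs):
        if name == "image":
            # The attribute value is an entity name, not a file name.
            uri, notation = self.entities[attrs["source"]]
            self.images.append({
                "uri": uri,
                "notation": notation,
                "notation_id": self.notations.get(notation),
            })


resolver = ImageResolver()
parser = xml.sax.make_parser()
parser.setContentHandler(resolver)
parser.setDTDHandler(resolver)
parser.parse(io.BytesIO(DOC))
print(resolver.images)
```

The parser never opens smiley.png. It only reports the declarations, and the application still has to fetch the file and decide what the PNG notation means for it.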
One thing that might put you off the unparsed entity and notation solution is that it involves DTD technology, which many XML developers choose to avoid. Another is that you still have to write all the XML processing code to grab and handle the smiley.png file. (Helpful hint: The key to getting the file associated with an unparsed entity in XSLT is the function unparsed-entity-uri.) A third solution you might consider is a data scheme URI.
RFC 2397 (see Resources) defines a special URI type where the resource data is all contained within the URI itself. The best way to understand such URIs is by example. Listing 4 is a URI for a simple HTML Web page.
Listing 4. A simple data URI for a Web page
data:text/html,%3Cp%3E%3Cb%3EHello%3C/b%3E%20world%3C/p%3E
If you use this URI as the address in a Web browser that supports data URIs, it would be just as if you had given it the address of a document with the content <p><b>Hello</b> world</p>. You can see how the angle brackets and whitespace are escaped according to URI standard form. Remember that this is valid HTML even though many structural elements are not explicitly written. The first part of the URI is the data: scheme. After that comes an optional MIME type for the data (text/html), then the optional token ;base64 to mark the data as base64-encoded. Listing 4 omits that token, so the data is read as text in standard URI encoding (ASCII with percent escapes). Finally, after a comma, comes the data itself.
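The structure just described can be sketched in Python with the standard urllib.parse and base64 modules. The helper names make_text_data_uri and parse_data_uri are invented for this example, and the parser is a simplification rather than a full RFC 2397 implementation: it does not interpret media type parameters such as charset.

```python
import base64
from urllib.parse import quote, unquote_to_bytes


def make_text_data_uri(text, media_type="text/html"):
    """Percent-escape text and wrap it in a data URI (hypothetical helper)."""
    return "data:" + media_type + "," + quote(text)


def parse_data_uri(uri):
    """Split a data URI into (media type, decoded bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, comma, payload = uri[len("data:"):].partition(",")
    if not comma:
        raise ValueError("data URI has no comma before the data")

    params = header.split(";")
    is_base64 = params[-1] == "base64"
    if is_base64:
        params = params[:-1]
    media_type = ";".join(params)
    if not media_type:
        # RFC 2397 default when the media type is left out entirely.
        media_type = "text/plain;charset=US-ASCII"
    elif media_type.startswith(";"):
        # Only parameters were given, so text/plain is implied.
        media_type = "text/plain" + media_type

    raw = unquote_to_bytes(payload)
    # b64decode skips characters outside the base64 alphabet by default,
    # so line breaks or spaces inside the payload are tolerated.
    data = base64.b64decode(raw) if is_base64 else raw
    return media_type, data


uri = make_text_data_uri("<p><b>Hello</b> world</p>")
print(uri)
print(parse_data_uri(uri))
```

Round-tripping the Hello world paragraph through these two functions reproduces the URI in Listing 4 and then recovers the original markup.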
This works just as well for binary data such as the image example above. Listing 5 is an example of the sample author XML using a data URI for the image.
Listing 5. Sample XML with data URI reference for an image
<author>
<bio>Mr. Smiley is always in a good mood.</bio>
<image source="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABMAAAATC
AYAAAByUDbMAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAAsSAAALEgHS3X78AAAAB3RJT
UUH0gwDAQszGYGHgQAAAMFJREFUeJydVNsRBCEIA1vantyatiesifu4Q5GHupcZf2KIYUQRc
vBiDyOyJCZMBMDsF9HQLA77ioiAmfeLyBvquMzbszwQh4+0yb/4VpQVd6i2e7IplS54wWOJU
r2FpEOb6p9ksheNRhfagowXhGargtWNO7PsFk90xQpqPTOrNTZ0Ux9xdvIjLpwz3cbzDO6+R
Rxqe84wCQAwX2PZRPaNbt/marZsqjLxCNDaLNbfj0Zrvr0IRxcgrU0pN6YZwroPdCriaYFg3
d8AAAAASUVORK5CYII="
width="64" height="80"
alt="Smiley photo"/>
<name>
<display>Guy Smiley</display>
</name>
</author>
This time the encoding is explicitly given as base64, and the data is the encoded content of the file smiley.png (I added line breaks for formatting). You could convert this file into HTML using the same XSLT that you might use for the conversion from Listing 1 to Listing 2. The resulting image would be displayed normally in any browser that supports data URIs.
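Going from the form in Listing 1 to the form in Listing 5 is mechanical, as the following Python sketch suggests. The names to_data_uri and inline_images are assumptions for this example, as is the load_bytes callback that stands in for reading the file. The demonstration feeds it only the eight-byte PNG signature as placeholder data, not a real image.

```python
import base64
import xml.etree.ElementTree as ET


def to_data_uri(data, media_type):
    """Wrap raw bytes in a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:" + media_type + ";base64," + encoded


def inline_images(author_xml, load_bytes, media_type="image/png"):
    """Replace each image's source reference with a data URI.

    load_bytes is a caller-supplied function (hypothetical) that returns
    the bytes for a given reference, for example by reading a file.
    """
    root = ET.fromstring(author_xml)
    for image in root.iter("image"):
        source = image.get("source")
        if source and not source.startswith("data:"):
            image.set("source", to_data_uri(load_bytes(source), media_type))
    return ET.tostring(root, encoding="unicode")


AUTHOR_XML = """<author>
  <bio>Mr. Smiley is always in a good mood.</bio>
  <image source="smiley.png" width="64" height="80" alt="Smiley photo"/>
  <name><display>Guy Smiley</display></name>
</author>"""

# Placeholder content: just the PNG file signature, not a complete image.
FAKE_FILES = {"smiley.png": b"\x89PNG\r\n\x1a\n"}

bundled = inline_images(AUTHOR_XML, FAKE_FILES.__getitem__)
print(bundled)
```

With real files, load_bytes could be something like lambda ref: pathlib.Path(ref).read_bytes(). Base64 emits four characters for every three bytes of input, so the embedded form is roughly a third larger than the file it replaces.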
One big problem with data URIs is that they are not yet supported in Internet Explorer. Just about every other browser -- including Mozilla (Firefox), Opera, Safari, and Konqueror -- supports them. There is some discussion of this feature among the browser user community for Internet Explorer 7, so there is a chance for future data URI coverage across the board. Data URIs do take up a lot of space, especially given the usual encoding schemes, and they can clutter up XML files. But if the convenience of not having to deal with separate files is more important than the extra space, and if you can target a browser base that supports data URIs (perhaps in intranet projects), then they can be a great tool for wrapping support files up in primary content.
Learn
- Learn the details of data URIs from RFC 2397, which is brief and readable, as RFCs go. For a more general discussion, including pros and cons, see the Wikipedia page.
- Learn about notations and other advanced topics that often puzzle even XML pros in "The skew.org XML Tutorial," by Mike Brown.
- Discover the Text Encoding Initiative (TEI), an XML (and SGML) document format that focuses on humanities texts. Most users would actually use TEI Lite. Standard public identifiers for graphics formats such as PNG are defined in Section 22: Tables, Formulae, and Graphics.
- Find more XML resources in the developerWorks XML zone, including articles, tutorials, tips, and standards. For a complete list of XML tips to date, check out the tips summary page.
- Learn how you can become an IBM Certified Developer in XML and related technologies.
Get products and technologies
- A useful resource for experimenting with data URIs is Ian Hickson's data URI kitchen, where you can provide content to be converted to a data URI.

Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is also a lead developer of the Versa RDF query language. He is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can find more about Mr. Ogbuji at his Weblog Copia or contact him at uche@ogbuji.net.