Tip: Use data URIs to include media in XML

One of the many uses of data scheme URIs is to embed media directly into XML

Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought Inc.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is also a lead developer of the Versa RDF query language. He is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can find more about Mr. Ogbuji at his Weblog Copia or contact him at uche@ogbuji.net.

Summary:  There are many ways to link to non-XML content within XML, including binary content. Sometimes you need to roll all such external content directly into the XML. Data scheme URIs are one way to specify a full resource within a URI, which you can then use in XML constructs. In this tip, Uche Ogbuji shows how to use this to bundle related media into a single file.


Date:  15 Feb 2006
Level:  Intermediate


If you have some XML to be bundled with related media, such as an image file, you can use a simple URI reference. Listing 1 is an example of some XML with such a reference. It is based, incidentally, on a construct in the IBM developerWorks content format.


Listing 1. Sample XML with reference to an image
                
<author>
  <bio>Mr. Smiley is always in a good mood.</bio>
  <image source="smiley.png"
       width="64" height="80"
       alt="Smiley photo"/>
  <name>
    <display>Guy Smiley</display>
  </name>
</author>

The URI reference smiley.png is a reference to an external image file. It could, of course, be a full URL such as http://www.ibm.com/developerworks/i/photo.jpg. One possible use of this might be to render HTML such as that in Listing 2.


Listing 2. Sample HTML output based on Listing 1
                
<div class="author">
  <img src="smiley.png"
       width="64" height="80"
       alt="Smiley photo"/>
  <p>Guy Smiley</p>
  <p>Mr. Smiley is always in a good mood.</p>
</div>
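
The rendering step from Listing 1 to Listing 2 can be sketched in a few lines. The article has XSLT in mind for this conversion; the Python below is only a stand-in that uses the standard library's ElementTree, and the function name author_to_html is made up for illustration. It assumes the image element always carries a source attribute.

```python
import xml.etree.ElementTree as ET

# The XML from Listing 1.
AUTHOR_XML = """<author>
  <bio>Mr. Smiley is always in a good mood.</bio>
  <image source="smiley.png"
       width="64" height="80"
       alt="Smiley photo"/>
  <name>
    <display>Guy Smiley</display>
  </name>
</author>"""


def author_to_html(xml_text):
    """Render an <author> element as an HTML <div class="author"> string.

    Hypothetical helper; assumes <image> has a source attribute.
    """
    author = ET.fromstring(xml_text)
    image = author.find("image")

    div = ET.Element("div", {"class": "author"})
    # Copy the image attributes, renaming source to src; skip absent ones.
    img_attrs = {"src": image.get("source")}
    for name in ("width", "height", "alt"):
        if image.get(name) is not None:
            img_attrs[name] = image.get(name)
    ET.SubElement(div, "img", img_attrs)

    # Listing 2 puts the display name first, then the bio.
    ET.SubElement(div, "p").text = author.findtext("name/display")
    ET.SubElement(div, "p").text = author.findtext("bio")
    return ET.tostring(div, encoding="unicode")


print(author_to_html(AUTHOR_XML))
```

Because the source value is copied through untouched, the same sketch works whether the attribute holds a relative file name, a full URL, or any other URI.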

To create a more intimate relationship between the two files, you can include the image as an entity. This might surprise you if you're accustomed to thinking of XML entities as well-formed XML rather than binary files, but XML provides for non-XML attachments through unparsed entities and notations, two little-known features. Listing 3 shows the image reference using an unparsed entity.


Listing 3. Sample XML using an unparsed entity to reference an image
                
<?xml version="1.0" standalone="no"?>
<!DOCTYPE author [
<!ELEMENT author (bio, image, name)>
<!ELEMENT bio (#PCDATA)>
<!ELEMENT image EMPTY>
<!ATTLIST image
  source ENTITY #REQUIRED
  alt    CDATA  #IMPLIED
  width  CDATA  #IMPLIED
  height CDATA  #IMPLIED
>
<!ENTITY smiley SYSTEM "smiley.png" NDATA PNG>
<!NOTATION PNG PUBLIC
  '-//TEI//NOTATION IETF RFC2083 Portable Network Graphics//EN'>
<!ELEMENT name (display)>
<!ELEMENT display (#PCDATA)>
]>
<author>
  <bio>Mr. Smiley is always in a good mood.</bio>
  <image source="smiley"
       width="64" height="80"
       alt="Smiley photo"/>
  <name>
    <display>Guy Smiley</display>
  </name>
</author>

Clearly, there's a lot more to this listing. Notations are part of the full regalia of DTDs, and I've included within the file a DTD internal subset that's required to declare the entity for the external file. Notice that I explicitly use standalone="no" in the XML declaration, just to make it clear that the XML document relies on an external file. The key bit is the declaration of the smiley entity. As an external unparsed entity, this file is a logical part of the XML document. A notation named PNG provides clues to the XML processing layer for how to handle this unparsed data. I chose to use a public identifier for PNG files defined in the well-regarded Text Encoding Initiative (TEI) specification (see Resources). Because of this entity and notation declaration, the source attribute becomes more than just the string smiley; it becomes a logical stand-in for the PNG file itself.

One thing that might put you off the unparsed entity and notation solution is that it involves DTD technology, which many XML developers choose to avoid. Another is that you still have to write all the XML processing code to grab and handle the smiley.png file. (Helpful hint: The key to getting the file associated with an unparsed entity in XSLT is the function unparsed-entity-uri.) A third solution you might consider is a data scheme URI.
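
To give a feel for the processing code the unparsed entity approach requires outside XSLT, here is a minimal Python sketch using the standard library's xml.parsers.expat module. It records each unparsed entity declaration and then maps the source attribute of every image element back to the entity's system identifier and notation. The function name resolve_unparsed_entities and the trimmed-down DTD are assumptions made for illustration; actually loading smiley.png is left out.

```python
import xml.parsers.expat

# A trimmed version of Listing 3: same entity and notation declarations.
DOC = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE author [
<!ELEMENT author (image)>
<!ELEMENT image EMPTY>
<!ATTLIST image source ENTITY #REQUIRED>
<!ENTITY smiley SYSTEM "smiley.png" NDATA PNG>
<!NOTATION PNG PUBLIC
  '-//TEI//NOTATION IETF RFC2083 Portable Network Graphics//EN'>
]>
<author><image source="smiley"/></author>
"""


def resolve_unparsed_entities(xml_text):
    """Return (system_id, notation) for each <image source="..."> found."""
    entities = {}  # entity name -> (system identifier, notation name)
    sources = []   # entity names referenced by image elements

    def on_unparsed_entity(name, base, system_id, public_id, notation):
        entities[name] = (system_id, notation)

    def on_start_element(tag, attrs):
        if tag == "image":
            sources.append(attrs["source"])

    parser = xml.parsers.expat.ParserCreate()
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.StartElementHandler = on_start_element
    parser.Parse(xml_text, True)

    # The DTD is read before the document element, so every declared
    # entity is known by the time the image elements are resolved here.
    return [entities[name] for name in sources]


print(resolve_unparsed_entities(DOC))
```

Expat is a non-validating parser, so it reports the declarations but does not check that source really names a declared entity; a KeyError from the final lookup is the only signal in this sketch.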

Data scheme URIs

RFC 2397 (see Resources) defines a special URI type where the resource data is all contained within the URI itself. The best way to understand such URIs is by example. Listing 4 is a URI for a simple HTML Web page.


Listing 4. A simple data URI for a Web page
                
data:text/html,%3Cp%3E%3Cb%3EHello%3C/b%3E%20world%3C/p%3E

If you use this URI as the address in a Web browser that supports data URIs, the result is just as if you had given it the address of a document with the content <p><b>Hello</b> world</p>. You can see how the angle brackets and the space are percent-encoded according to standard URI escaping rules (%3C, %3E, and %20). Remember that this is valid HTML even though many structural elements are not explicitly written. The first part of the URI is the data: scheme. After that comes an optional MIME type for the data (text/html), then an optional ;base64 marker that gives the encoding, then a comma. The marker is omitted in Listing 4, so the data uses standard URI percent-encoding (based on ASCII). If the MIME type is omitted as well, RFC 2397 defaults it to text/plain;charset=US-ASCII. Finally, after the comma, comes the data itself.
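
The structure just described is simple enough to take apart by hand. The following Python sketch splits a data URI into its media type and decoded bytes; parse_data_uri is a hypothetical helper name, and the sketch handles only the scheme, optional media type, optional ;base64 marker, and data, without validating the media type itself.

```python
import base64
from urllib.parse import unquote_to_bytes

# Default media type that RFC 2397 specifies when none is given.
DEFAULT_MEDIA_TYPE = "text/plain;charset=US-ASCII"


def parse_data_uri(uri):
    """Split a data URI into (media_type, decoded_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header; the rest is data.
    header, comma, payload = uri[len("data:"):].partition(",")
    if not comma:
        raise ValueError("data URI has no comma before the data")

    is_base64 = header.endswith(";base64")
    if is_base64:
        header = header[:-len(";base64")]
    media_type = header or DEFAULT_MEDIA_TYPE

    # Undo percent-encoding first, then base64 if the marker was present.
    raw = unquote_to_bytes(payload)
    data = base64.b64decode(raw) if is_base64 else raw
    return media_type, data


# The URI from Listing 4.
media_type, data = parse_data_uri(
    "data:text/html,%3Cp%3E%3Cb%3EHello%3C/b%3E%20world%3C/p%3E"
)
print(media_type, data)
```

Applied to the URI in Listing 4, this yields the media type text/html and the bytes of the HTML fragment discussed above.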

This works just as well for binary data such as the image example above. Listing 5 is an example of the sample author XML using a data URI for the image.


Listing 5. Sample XML with data URI reference for an image
                
<author>
  <bio>Mr. Smiley is always in a good mood.</bio>
  <image source="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABMAAAATC
AYAAAByUDbMAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAAsSAAALEgHS3X78AAAAB3RJT
UUH0gwDAQszGYGHgQAAAMFJREFUeJydVNsRBCEIA1vantyatiesifu4Q5GHupcZf2KIYUQRc
vBiDyOyJCZMBMDsF9HQLA77ioiAmfeLyBvquMzbszwQh4+0yb/4VpQVd6i2e7IplS54wWOJU
r2FpEOb6p9ksheNRhfagowXhGargtWNO7PsFk90xQpqPTOrNTZ0Ux9xdvIjLpwz3cbzDO6+R
Rxqe84wCQAwX2PZRPaNbt/marZsqjLxCNDaLNbfj0Zrvr0IRxcgrU0pN6YZwroPdCriaYFg3
d8AAAAASUVORK5CYII="
       width="64" height="80"
       alt="Smiley photo"/>
  <name>
    <display>Guy Smiley</display>
  </name>
</author>

This time the encoding is explicitly given as base64, and the data is the base64-encoded content of the file smiley.png. (I added line breaks for formatting; keep the URI on one line in a real document, because XML attribute-value normalization turns each line break into a space.) You could convert this file into HTML using the same XSLT that you might use for the conversion from Listing 1 to Listing 2. The resulting image would be displayed normally in any browser that supports data URIs.
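
Producing a document like Listing 5 from one like Listing 1 is mostly a matter of base64-encoding the file and rewriting the attribute. The Python sketch below shows one way to do that with the standard library. The names to_data_uri and inline_images are made up for illustration, the load_bytes callable stands in for whatever reads the referenced file, and the media type is hard-coded to image/png on the assumption that every image is a PNG.

```python
import base64
import xml.etree.ElementTree as ET


def to_data_uri(data, media_type):
    """Build a base64 data URI from raw bytes and a MIME type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def inline_images(xml_text, load_bytes):
    """Replace each <image source="file"> with an inline data URI.

    load_bytes is a caller-supplied function that maps a source
    reference to the file's bytes (hypothetical; for example, a
    function that opens the path in binary mode and reads it).
    """
    root = ET.fromstring(xml_text)
    for image in root.iter("image"):
        source = image.get("source")
        # Leave references that are already data URIs untouched.
        if source and not source.startswith("data:"):
            # Assumption: all images are PNG files.
            image.set("source", to_data_uri(load_bytes(source), "image/png"))
    return ET.tostring(root, encoding="unicode")


# The 8-byte PNG signature, enough to show the familiar "iVBORw0KGgo" prefix.
print(to_data_uri(b"\x89PNG\r\n\x1a\n", "image/png"))
```

The base64 output contains no line breaks, so the resulting attribute value is a single unbroken URI.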

Wrap up

One big problem with data URIs is that they are not yet supported in Internet Explorer. Just about every other browser, including Mozilla (Firefox), Opera, Safari, and Konqueror, supports them. There is some discussion of this feature among the browser user community for Internet Explorer 7, so there is a chance for future data URI coverage across the board. Data URIs also take up a lot of space (base64 encoding inflates binary data by about a third), and they can clutter up XML files. But if the convenience of not having to deal with separate files is more important than the extra space, and if you can target a browser base that supports data URIs (perhaps in intranet projects), then they can be a great tool for wrapping support files up in primary content.


Resources

Learn

Get products and technologies
