XML and Related Technologies certification prep, Part 1: Architecture

Learn where and when to use XML in system design

A software system's architecture and performance requirements affect your decision of which XML technologies are most appropriate for your application's needs. This tutorial on architecture teaches you how to discern where and when to use XML in system design. It is the first tutorial in a series of five tutorials that you can use to help prepare for the IBM certification Test 142, XML and Related Technologies.

Mark Lorenz (mlorenz@nc.rr.com), Senior Application Architect, Hatteras Software, Inc.

Mark Lorenz is the founder of Hatteras Software, an object-oriented consulting firm, and the author of multiple books on software development. He is certified in object-oriented analysis and design (OOAD), XML, RAD, and Java. He uses XHTML, Web services, Ajax, JSF, Spring, BIRT, and related Eclipse-based tools to develop Java enterprise applications.



29 August 2006


Before you start

In this section, you'll find out what to expect from this tutorial and how to get the most out of it.

About this series

This series of five tutorials helps you prepare to take the IBM certification Test 142, XML and Related Technologies, to attain the IBM Certified Solution Developer - XML and Related Technologies certification. This certification identifies an intermediate-level developer who designs and implements applications that make use of XML and related technologies, such as XML Schema, Extensible Stylesheet Language Transformations (XSLT), and XPath. This developer has a strong understanding of XML fundamentals; has knowledge of XML concepts and related technologies; understands how data relates to XML, in particular with issues associated with information modeling, XML processing, XML rendering, and Web services; has a thorough knowledge of core XML-related World Wide Web Consortium (W3C) recommendations; and is familiar with well-known best practices.

Anyone working in software development for the last few years is aware that XML provides cross-platform capabilities for data, just as the Java™ programming language does for application logic. This series of tutorials is for anyone who wants to go beyond the basics of using XML technologies.

About this tutorial

This tutorial is the first in the "XML and Related Technologies certification prep" series that takes you through the key aspects of effectively using XML technologies with Java projects. This first tutorial focuses on architecture -- that is, which technologies to use in which situations in ways that will perform well.

This tutorial lays the groundwork for Part 2, which focuses on information modeling, including the use of namespaces and the definition of Document Type Definition (DTD) schemas.

This tutorial is written for Java programmers who have a basic understanding of XML and whose skills and experience are at a beginning to intermediate level. You should have a general familiarity with defining, validating, and reading XML documents and a working knowledge of the Java language.

Objectives

After completing this tutorial, you will know how to:

  • Determine the implications of a given architecture on XML design considerations

  • Select appropriate XML technologies for a given architecture

  • Assess performance considerations for XML parsing, validation, and transformation

  • Implement Java classes using Java Architecture for XML Binding (JAXB)

  • Address XML security using XML encryption and signatures

Prerequisites

This tutorial is written for developers who have a background in programming and scripting and who have an understanding of basic computer-science models and data structures. You should be familiar with the following XML-related, computer-science concepts: tree traversal, recursion, and reuse of data. You should be familiar with Internet standards and concepts, such as Web browser, client-server, documenting, formatting, e-commerce, and Web applications. Experience designing and implementing Java-based computer applications and working with relational databases is also recommended.

System requirements

You need a system with an up-to-date browser.


XML Architecture

This section of the tutorial will discuss the most effective uses of XML technologies given the particular aspects of your system architecture. By the end of this section, you will:

  • Identify areas of your system as they relate to the use of XML

  • Choose optimal XML technologies for different portions of your system, taking into account performance and security

  • Understand how to bind XML to Java

Uses of XML abound, such as Asynchronous JavaScript and XML (Ajax) for dynamic Web pages, and Rich Site Summary (RSS) for blogs and feeds. The future will bring even more. This series focuses on the core technologies, including Simple API for XML (SAX), Document Object Model (DOM), DTD, XML Schema, XPath, XLink, and XQuery.

You can find an alphabet soup of acronyms out there. Just read technical articles and you'll find XDI, RDF, REST, SVG, XUL, and much more. That's to be expected, as XML is not just a hot topic, it's the über-hot topic. Why all the hype? The main reason is that XML offers cross-platform, cross-language capabilities for data, just as Java offers cross-platform support for application logic. Take a look at some uses of XML that have hit the world market recently:

  • Feeds (RSS and Atom)

  • Dynamic Web (Ajax)

  • Blogs (Representational State Transfer, or REST)

  • Service-Oriented Architecture (SOA) and Web services

These uses are depicted in Figure 1 and Figure 2, which show how you can integrate XML technologies in an application architecture for e-business and the dynamic Web, respectively.

Figure 1. E-commerce using XML technologies
Figure 2. Dynamic Web using XML technologies

To benefit from any of these uses, you need grounding in XML technologies, which is what this tutorial series provides.

What is an architecture and how does it relate to XML?

"An architecture is a framework for the disciplined introduction of change." -- Tom DeMarco

If you've ever received a support call late at night for a system with a less-than-optimal architecture, you know how important it is to make wise choices in the technologies you use. An architecture has several aspects, including physical and logical. Figure 3 shows an example of a physical architecture.

Figure 3. Example of a physical architecture
"The fundamental organization of a system, embodied in its components, their relationships to each other and the environment, and the principles governing its design and evolution." -- ANSI/IEEE 1471-2000, Recommended Practice for Architecture Description of Software-Intensive Systems

Definitions of system architecture abound. For the purposes of this tutorial, let's view software system architecture as:

  • Building on top of an existing structure, where available (such as extending a framework and reusing common components)

  • Distributing across processes and processors as appropriate for the requirements, with published interfaces to each piece of the system

A particular technology can help certain areas in an architecture and not help others. In the example system from Figure 3, XML could play a role in multiple areas:

  • Browser
    You can render Web pages using XML content and related XSL stylesheets. XSLT supports this capability as well as conversion to many different formats.

  • Client request
    An XMLHttpRequest is at the heart of Ajax.

  • Server reply
    When an XMLHttpRequest comes back to you, the response contents can be in XML. But even if they aren't, the browser will use the DOM to manipulate the Web page. As you'll see in Part 3 of this tutorial series, the DOM is built from XML.

  • Web services
    SOAP is an XML-based protocol for exchanging information through HTTP (in other words, over the Web). Its primary use is to request Web services remotely. It is a successor to XML-RPC (XML Remote Procedure Call).

  • Java Message Service (JMS)
    JMS is for sending messages between processes asynchronously. Connectivity and latency issues are bypassed with guaranteed delivery. XML content of the messages provides a lingua franca, so that all parties can understand, no matter what language they use or what platform they run on.

  • Reporting
    Besides rendering content for Web browsers, PDAs, and other devices, you can use XSLT to render XML as reports in multiple formats.

  • Database
    This isn't your dad's database anymore. Not wanting to be left out of the XML opportunities, both IBM® and Oracle have come out with native XML databases that store XML document structures and support XQuery. The third installment of this series will cover this in more detail, but for now keep in mind that XML is plain-text at heart, so you can store it in flat files and databases even if you don't have an XML-aware database.

BIRT

Business Intelligence and Reporting Tools (BIRT) is an open source Eclipse-based framework written in Java that supports the design of reports with output to HTML and PDF. The report designs are stored on disk as XML .rptdesign files (see Resources).

This is just one example architecture. Kevin Dick's book, XML: A Manager's Guide (p. 216; see Resources), lists five different enterprise applications that receive significant benefits from the use of XML:

  1. Workforce automation

  2. Knowledge management

  3. Trading partner coordination

  4. Application integration

  5. Data integration

The point is that XML can be used in many different domains, including yours.

OK, now that you have some ideas of where XML can play a part, how do you choose which technologies and which locations in your system to actually use it? I'll touch on a number of considerations in this part of the tutorial, so read on.

Using XML with an existing application

One of XML's strengths is its ability to be understood by disparate systems. If you have an existing application, whether it's written in C and running on a Linux® machine or in Java code running on a Microsoft® Windows® machine, you can integrate the legacy application into other parts of the system through XML-based communication.

In addition, some products and frameworks use XML for configuration files. For example, Struts uses a struts-config.xml file to define how the controlling servlet should work; Web applications use web.xml files to define how to deploy the application for running on a server. More peripheral uses of XML are appearing all the time. Your applications can certainly make good use of these capabilities.

I'll focus instead on the more core, integrated use of XML technologies with your applications. Table 1 lists some characteristics of applications, and gives advice on when XML technologies can play a role.

Table 1. Advice on XML use

| Characteristic | Discussion | Advice |
|---|---|---|
| Output targets and formats (PDA, browser, iPod, PDF) | The more types of output, the more benefit from XML transformation. | Use XML when multiple output formats are required. |
| Content size | The larger the content, the more performance hurdles you'll have to overcome using XML. This leads to consideration of alternatives, such as compression, or another format entirely, such as Abstract Syntax Notation One (ASN.1), which loses the human readability benefit. | Use XML when messaging and processing efficiency is less important than interoperability and availability of standard tools. |
| Interoperability | XML's greatest strength is arguably its cross-language, cross-platform format that diverse systems can understand. | Use XML when you must communicate with diverse systems. |
| Searching | XML supports relatively simple queries through XPath and more complex queries with the more recent XQuery. While maturing, XML technologies have been relatively weaker at searching. It is yet to be seen if XML-aware databases can help with this, since they store the XML in a tree structure. See XML-aware databases. | Don't use XML documents when searching is important. Instead, store the content in a database or use an XML-aware database. |
| Summarizing | XML technologies are weak at summarizing data -- for example, for reports. See XML-aware databases. | Don't use XML documents when summarization is important. Instead, store the content in a database or use an XML-aware database. |
| Project size | To use XML, you need a parser and code to deal with the XML events or tree. | For small projects with simple requirements, you might not want to incur the overhead of XML. |

XML-aware databases

Database vendors want to support projects using XML technologies, but relational databases don't make it easy to store and retrieve XML files. IBM has introduced a new DB2® version formerly known as Viper, which supports XML data storage and indexing in a native format (in other words, it doesn't pull the XML apart to fit a relational model). Databases that store XML support XQuery, which is the XML equivalent of SQL.

XML plain-text alternatives

More efficient alternatives to plain-text XML are being examined, including binary XML and XML compression (see Resources).

So, what do these new database capabilities mean for your projects? The main thing is that you can achieve the typical strengths of databases, such as searching and summarizing, with XML data in its native form.

Performance

In this section of the tutorial, I'll discuss some of the issues that can affect performance when using XML technologies.

Choosing an appropriate processing model

As outlined in the book, Designing Web Services with the J2EE™ 1.4 Platform: JAX-RPC, SOAP, and XML Technologies (see Resources), you can choose from one of four main XML processing models, available through the following APIs:

1. SAX: Provides an event-based programming model

2. DOM: Provides an in-memory tree-traversal programming model

3. XML data binding: Provides an in-memory Java content class-bound programming model

4. XSLT: Provides a template-based programming model

SAX and DOM are the most common programming models. Along with XSLT, these two models are available through Java API for XML Processing (JAXP). The XML data binding model is available through the JAXB technology.

All of these choices will be discussed later in this series of tutorials, but let's examine the implications of the processing model on performance. Table 2 compares some attributes of the SAX parser to the DOM parser.

Table 2. Parsers: SAX versus DOM

| SAX | DOM |
|---|---|
| Event-driven | Tree manipulation |
| Scales to large sizes with little change in memory use | Larger documents take more memory |
| Must write to new document to change the contents | Can manipulate the document in memory |
| More difficult to manage complex changes | Easier to make complex changes |
| In general, faster | Comparatively slower |
| More control over parsing, but can be more work for you | Generally, less work for you |

The system requirements, as in most things, usually determine which parser to use. Some examples include:

  • Merging documents
    This certainly requires working with a DOM tree. It hurts my head to think about doing this tag-by-tag using SAX.

  • Small devices
    If memory is at a premium, SAX uses very little. DOM must build a tree structure of the entire document.

  • Looking for certain tags
    If a certain event is to happen whenever a certain tag occurs, SAX will work nicely.

  • Complex manipulation
    If changes to different parts of the document are required based upon values from other portions of the document, then it will most likely be easier to use the DOM parser.

Finally, you can also use the two parsers in tandem. For example, you can parse a number of small documents with the SAX parser to pull out information that you need to merge into an existing document, and modify the document using the DOM parser and tree manipulation.
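
To make the comparison in Table 2 more concrete, here is a minimal sketch that counts elements of a given name both ways, using only the JAXP classes bundled with the JDK. The class name ParserComparison, its method names, and the sample order document are my own illustrative assumptions rather than part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only.
final class ParserComparison {

    /** SAX: react to start-element events as they stream past; no tree is built. */
    static int countWithSax(String xml, String tag) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return count[0];
    }

    /** DOM: build the whole document tree in memory, then query it. */
    static int countWithDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><item>pen</item><item>ink</item><note>rush</note></order>";
        System.out.println("SAX count: " + countWithSax(xml, "item"));
        System.out.println("DOM count: " + countWithDom(xml, "item"));
    }
}
```

Both methods return the same answer. The difference is that the SAX version never holds more than the current event in memory, while the DOM version keeps the full tree available for later manipulation.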

StAX

A new API called Streaming API for XML (StAX) is to be released in late 2006. It's a pull API, as opposed to SAX's push model, so it keeps control with the application rather than the parser. StAX can also modify the document being parsed. See Resources for more details.
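
As a rough illustration of the pull model, the following sketch uses the javax.xml.stream API to collect the text of every element with a given name. The application asks for the next event in its own loop instead of receiving callbacks. The class name StaxPullExample and the method textOf are hypothetical names chosen for this example, and the sketch assumes the matched elements contain only text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class for illustration only.
final class StaxPullExample {

    /** Returns the text content of each element named tag, in document order. */
    static List<String> textOf(String xml, String tag) {
        List<String> values = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            // The application drives the parse by pulling one event at a time.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(tag)) {
                    // Reads the text and leaves the reader on the matching end tag.
                    values.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<order><item>pen</item><item>ink</item><note>rush</note></order>";
        System.out.println(textOf(xml, "item"));
    }
}
```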

Caching stylesheets

If you use XSLT to convert XML documents into different formats, you can cache the compiled thread-safe stylesheet Templates in memory, and reuse them for individual users to create their own Transformers (see Figure 4). This results in a smaller footprint for your application, and it saves the time for parsing and compiling the stylesheets.

Figure 4. Caching XSLT stylesheets
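
One way to sketch this caching approach with the JAXP transformation API is shown here. The StylesheetCache class, its method names, and the idea of keying stylesheets by a string name are assumptions made for illustration. The sketch relies on the documented behavior that a Templates object is safe to share across threads, while each Transformer it creates should be used by one thread at a time.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical cache class for illustration only.
final class StylesheetCache {
    private final Map<String, Templates> cache = new ConcurrentHashMap<>();
    private final TransformerFactory factory = TransformerFactory.newInstance();

    /** Parses and compiles the stylesheet the first time a name is seen, then reuses it. */
    synchronized Templates compile(String name, String xslt) {
        Templates templates = cache.get(name);
        if (templates == null) {
            try {
                templates = factory.newTemplates(new StreamSource(new StringReader(xslt)));
            } catch (TransformerException e) {
                throw new IllegalArgumentException("Invalid stylesheet: " + name, e);
            }
            cache.put(name, templates);
        }
        return templates;
    }

    /** Creates a fresh, inexpensive Transformer per request from the cached Templates. */
    String transform(String name, String xml) {
        Templates templates = cache.get(name);
        if (templates == null) {
            throw new IllegalStateException("No stylesheet registered as " + name);
        }
        try {
            StringWriter out = new StringWriter();
            templates.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }
}
```

A caller would invoke compile once per stylesheet, for example at application startup, and then call transform for each user request.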

Using namespaces

As you might know already, namespaces are used to declare names in your documents independent of names declared elsewhere. This can become an issue when stylesheets and other documents are incorporated through statements such as include or import. It can also be an issue if you merge multiple documents, each with its own grammar. A colon in an element or attribute name separates the namespace prefix (to the left of the colon) from the name within the context of the namespace (in other words, the name local to the namespace). For example, the declaration xmlns:prefix=URI allows you to use names like this: prefix:myname.

An upcoming tutorial in this series will discuss namespaces at length. At this time, though, I'll mention how namespaces affect performance. As you saw earlier, SAX is an event-based parser. When the parser encounters a namespace declaration, it sends the application a startPrefixMapping call and an endPrefixMapping call. These callbacks slow down your application processing. The point is not to avoid namespaces altogether -- in fact, you probably can't -- but rather to use them sparingly if you think performance will be an issue.
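
If you want to see how many of these callbacks a given document generates, a small counting handler is enough. The following sketch is only a way to observe the callbacks, not a benchmark; the class name PrefixMappingCounter and its method are hypothetical. JAXP factories are not namespace-aware by default, so the sketch switches that on explicitly.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only.
final class PrefixMappingCounter {

    /** Counts the startPrefixMapping callbacks the parser delivers for a document. */
    static int countPrefixMappings(String xml) {
        final int[] starts = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                starts[0]++;   // one call per namespace declaration encountered
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);   // off by default in JAXP
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return starts[0];
    }

    public static void main(String[] args) {
        String repeated = "<a:root xmlns:a='urn:a'>"
                + "<b:child xmlns:b='urn:b'/><b:child xmlns:b='urn:b'/></a:root>";
        String hoisted = "<a:root xmlns:a='urn:a' xmlns:b='urn:b'>"
                + "<b:child/><b:child/></a:root>";
        System.out.println("Redeclared on each child: " + countPrefixMappings(repeated));
        System.out.println("Declared once on the root: " + countPrefixMappings(hoisted));
    }
}
```

Declaring a namespace once on an enclosing element, rather than repeating the declaration on every child, is one simple way to reduce the number of callbacks.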

Binding to Java classes

As you know, XML documents contain tags and other content in a plain-text format. This incurs a performance hit. What if you could speed this up? I'll discuss two ways: JAXB and XSLT Compiler (XSLTC).

JAXB

JAXB takes XML documents and creates a semantic tree of Java objects that represents the document contents (see Figure 5). You can then manipulate these objects according to the rules defined in the related XML schema, which you previously compiled and used to create a JAXB binding framework. You can also use this framework to marshal the tree into a resulting XML document.

Besides processing documents faster, JAXB enables you to manipulate XML through Java objects. JAXB also makes it easy to keep up with schema changes.

Figure 5. JAXB

Schemas

Technically speaking, DTDs, XML Schemas (capital S), and RELAX NG are all types of XML schema (little s). XML Schemas (capital S) are strictly called W3C XML Schemas. In this tutorial, whenever you see XML Schema, realize that it is the W3C language and not the generic schema document description.

Note: JAXB does not support the use of DTDs -- you must use XML Schema as your schema language.

XSLT Compiler

You know what XSL Transformation is. XSLTC adds a compiled aspect to the mix. XSLTC is composed of two parts (see Figure 6). The first part is a compiler that creates a translet, which is a set of Java classes, from an XSL stylesheet. The second part is a processor that applies the translet to an XML instance document to transform it to the desired output format. This allows you to parse the stylesheet once and reuse it later, and thus speed up processing.

Figure 6. XSLTC

Security

Applications must feature end-to-end data security when they communicate over the Internet. No one whose computer is hit with a virus or whose site is hacked into will question the importance of securing a company's information.

So, what is available to secure communications involving XML? At its heart, sending XML document contents over the Internet securely involves both XML encryption and XML digital signature.

XML encryption involves converting the content into an unintelligible form to enforce confidentiality. Of course, the intended recipient must be able to convert it back to its original form. XML encryption has some unique capabilities too, such as being able to encrypt certain elements or element contents. This is useful, for example, when conducting sales transactions between a customer, a vendor, and the customer's bank, where different parties need to read certain portions of the document contents but should not read other portions.

XML digital signature handles the integrity part of XML security (in other words, it determines if content was changed in any way). Like its encryption peer, XML digital signature allows more granularity -- in other words, you can sign portions of documents.

XML digital signatures raise additional issues, such as preserving the order of attributes during document manipulation, that determine whether the document can still be verified on the receiving end of a communication. These issues are beyond the scope of this tutorial, but you can read more about them on the JavaWorld Web site (see Resources).
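
For a feel of how the integrity check works in practice, here is a minimal sketch that uses the javax.xml.crypto.dsig API shipped with the JDK to sign a whole document with an enveloped signature and verify it again. The class name EnvelopedSignatureSketch, its method names, the throwaway RSA key pair, and the choice of RSA with SHA-256 are all assumptions made for illustration. A real system would load keys from a key store and decide how the recipient obtains the public key.

```java
import java.io.StringReader;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class EnvelopedSignatureSketch {
    // Algorithm URI for RSA with SHA-256 (an assumed choice for this sketch).
    private static final String RSA_SHA256 =
            "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256";

    /** Parses XML into a namespace-aware DOM tree, which the signature API requires. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Generates a throwaway key pair; a real system would use managed keys. */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (Exception e) {
            throw new IllegalStateException("RSA is not available", e);
        }
    }

    /** Appends a Signature element that covers the whole document to the root element. */
    static void sign(Document doc, PrivateKey key) {
        try {
            XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
            // An empty URI means "the whole document"; the enveloped transform
            // excludes the Signature element itself from the digest.
            Reference ref = fac.newReference("",
                    fac.newDigestMethod(DigestMethod.SHA256, null),
                    Collections.singletonList(fac.newTransform(
                            Transform.ENVELOPED, (TransformParameterSpec) null)),
                    null, null);
            SignedInfo info = fac.newSignedInfo(
                    fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE,
                            (C14NMethodParameterSpec) null),
                    fac.newSignatureMethod(RSA_SHA256, null),
                    Collections.singletonList(ref));
            fac.newXMLSignature(info, null)
                    .sign(new DOMSignContext(key, doc.getDocumentElement()));
        } catch (Exception e) {
            throw new IllegalStateException("Signing failed", e);
        }
    }

    /** Returns true only if a signature is present and the content is unchanged. */
    static boolean verify(Document doc, PublicKey key) {
        Node sigNode = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
        if (sigNode == null) {
            return false;
        }
        try {
            XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
            DOMValidateContext ctx = new DOMValidateContext(key, sigNode);
            return fac.unmarshalXMLSignature(ctx).validate(ctx);
        } catch (Exception e) {
            throw new IllegalStateException("Verification failed", e);
        }
    }
}
```

Changing any signed content after calling sign, such as the text of an element, should cause verify to return false. Signing only a portion of a document would use a reference URI that points at that portion instead of the empty URI used here.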


Conclusion

XML technologies have numerous uses in the marketplace. The key to their successful integration into an application architecture is to recognize where to use them to leverage their strengths. Knowledge of the core XML technologies as well as an understanding of architectural choices are key to the successful introduction of XML into your projects.

Summary

In this tutorial on Architecture, you learned how to:

  • Determine the implications of a given architecture on XML design considerations

  • Select appropriate XML technologies for a given architecture

  • Assess performance considerations for XML parsing, validation, and transformation

  • Implement Java classes using JAXB

  • Address XML security using XML encryption and signatures

Part 2 of this five-part series focuses on information modeling, including the use of namespaces and the definition of DTDs and schemas.

Resources

Learn

Get products and technologies

Discuss
