When people first thought about how to use XML, one of the early questions was: "If I have a lot of related XML files, how can I best manage them in a system that allows individual control over each file where necessary, plus broader access and management across files where necessary?" Systems that provide some practical answer to this question are called XML repositories. XML repositories support collections of XML documents, and provide persistent storage and services (whether as a mainstream Web service or some other form) for access and manipulation of the contents. Once your XML repository is in place, you'll probably want to perform ad-hoc as well as programmatic queries on the data, and so developers of early XML databases, server frameworks, and programming language interfaces (commercial as well as open source) came up with proprietary interfaces for data access. Soon, as is the custom in most areas of XML development, an informal group got together to write common specifications for XML repository APIs. In this case it was the XML:DB group (see Resources).
The XML:DB group is even less formal than most, and has developed its specifications in very sketchy fashion. Even so, it has come up with several specifications that are influential as well as widely implemented. XUpdate (see Resources) is probably the best known example, but another important entry is the Application Programming Interface for XML Databases (XAPI). In the XML:DB's own words (from the XAPI specification page):
The XML:DB API [XAPI] is designed to enable a common access mechanism to XML databases. The API enables the construction of applications to store, retrieve, modify and query data that is stored in an XML database. These facilities are intended to enable the construction of applications for any XML database that claims conformance with the XML:DB API. The API can be considered generally equivalent to technologies such as ODBC, JDBC or Perl [or Python] DBI.
In this article I introduce XAPI and contrast it with similar specifications.
With the words "XML", "API", and "query" featuring so prominently in the description so far, you might wonder: "But what about DOM, XQuery, and the like?" XAPI neatly falls into a niche between some of the better known specifications. DOM focuses on node-level access of individual documents. XPath is also typically used in association with individual documents, although technically it can work on any grove of XML nodes. A grove is a collection of trees (in computer science terms, a Directed Acyclic Graph), and XSLT is an example of a host language for XPath that takes advantage of this distinction to allow XPath to process the contents of multiple source documents (courtesy of the XSLT document() function). XAPI also takes advantage of XPath's flexibility, as I shall show.
XQuery is designed from the ground up to support aggregations that span multiple documents, but its basic method is to define a very rigorous data model and semantics of the abstract data being queried. XQuery is designed to work just as well accessing legacy data stores such as relational and object databases, and the resulting abstraction and complexity give it a very different feel from that of popular XML repositories, which are often just collections of XML as plain files or in simple hash databases. So where XQuery essentially provides a calculus for all conceptualizations of how one might access XML data, there is still room for a simple arithmetic of basic XML collections, focusing on the idea of file-system-like hierarchies of XML documents without too much conceptual load. To offer another analogy, XQuery feels like a high-end enterprise DBMS, with all the associated power and cost, whereas XAPI is more like the GNU utilities for UNIX file processing -- focusing on pipelines of input and output text, with very simple operations in each processing stage. The two need not be mutually exclusive, and indeed some repositories that implement XAPI also implement XQuery.
XAPI is designed for modular understanding and implementation. As with DOM, it is broken into modules, each of which is defined in Interface Definition Language (IDL) for language independence, although this imparts a very strongly typed, object-oriented bias that may not suit all languages very well. Again like DOM, the XAPI modules are organized into levels of conformance: Minimum Conformance Level, Core Level 0, and Core Level 1.
Minimum Conformance Level defines interfaces for basic repository features such as Resource -- the basic unit of data (typically an individual document) -- and Collection -- representing a collection of resources (typically some sort of folder or container). In addition, Service provides extensions to collections for query and management tasks, Database abstracts a connection to a particular repository, and ResourceIterator and ResourceSet generally represent result sets from queries. This conformance level deals strictly with abstract objects as the content of resources, but it does support the idea of extensible types and identifiers for resources.
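The relationship between these abstract objects can be sketched in a few lines of Java. What follows is only an illustration: the Sketch* classes are simplified local stand-ins invented for this example, not the org.xmldb.api interfaces, and the two movie entries are made-up sample data. The method names loosely echo the ones used in Listing 1.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for a resource: an identifier plus opaque content. */
class SketchResource {
    private final String id;
    private final Object content;
    SketchResource(String id, Object content) { this.id = id; this.content = content; }
    String getId() { return id; }
    Object getContent() { return content; }
}

/** Stand-in for an iterator over a set of resources. */
class SketchResourceIterator {
    private final Iterator<SketchResource> inner;
    SketchResourceIterator(Iterator<SketchResource> inner) { this.inner = inner; }
    boolean hasMoreResources() { return inner.hasNext(); }
    SketchResource nextResource() { return inner.next(); }
}

/** Stand-in for a collection: a named container of resources. */
class SketchCollection {
    private final String name;
    private final Map<String, SketchResource> resources = new LinkedHashMap<>();
    SketchCollection(String name) { this.name = name; }
    String getName() { return name; }
    void storeResource(SketchResource r) { resources.put(r.getId(), r); }
    SketchResource getResource(String id) { return resources.get(id); }
    int getResourceCount() { return resources.size(); }
    SketchResourceIterator getIterator() {
        // Iterate over a snapshot so later stores do not disturb the iteration.
        return new SketchResourceIterator(new ArrayList<>(resources.values()).iterator());
    }
}

class MinimumLevelSketch {
    /** Builds a toy "movies" collection holding two opaque resources. */
    static SketchCollection sampleMovies() {
        SketchCollection col = new SketchCollection("movies");
        col.storeResource(new SketchResource("musicman.xml", "<movie title='Music Man'/>"));
        col.storeResource(new SketchResource("oklahoma.xml", "<movie title='Oklahoma!'/>"));
        return col;
    }

    /** Walks the collection and gathers resource identifiers in storage order. */
    static List<String> listIds(SketchCollection col) {
        List<String> ids = new ArrayList<>();
        SketchResourceIterator it = col.getIterator();
        while (it.hasMoreResources()) {
            ids.add(it.nextResource().getId());
        }
        return ids;
    }

    public static void main(String[] args) {
        SketchCollection col = sampleMovies();
        System.out.println(col.getName() + ": " + listIds(col));
    }
}
```

Note that the content is held as a plain Object here, which mirrors the point that this level says nothing about XML specifically.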
Core Level 0 refines the abstract idea of resources to add XML particulars. It allows you to get the contents of an XMLResource as a DOM node (method getContentAsDOM()) or as a series of SAX events (method getContentAsSAX()). Similarly, Core Level 0 includes methods for modification of XML content, as well as an interface (BinaryResource) for content defined as (non-XML) byte streams.
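To make the difference between the two views concrete, here is a minimal sketch that reads the same XML text once as a DOM tree and once as a stream of SAX events. It uses only the standard JAXP classes that ship with Java rather than an XAPI driver, so it illustrates the kind of object each view hands you, not the XAPI calls themselves. The class name and the sample document are assumptions made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVersusSax {
    // Made-up sample content standing in for one XML resource.
    static final String MOVIES_XML = "<movies><movie title=\"Music Man\"/></movies>";

    /** Tree view: parse everything into memory, then navigate to a node. */
    static String firstTitleViaDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element movie = (Element) doc.getElementsByTagName("movie").item(0);
            return movie.getAttribute("title");
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    /** Event view: receive callbacks as the parser encounters each element. */
    static List<String> elementNamesViaSax(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            names.add(qName);
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(firstTitleViaDom(MOVIES_XML));   // prints "Music Man"
        System.out.println(elementNamesViaSax(MOVIES_XML)); // prints "[movies, movie]"
    }
}
```

The DOM view suits random access and modification of a modest-sized resource, while the SAX view suits streaming through content without holding it all in memory.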
Core Level 1 builds upon the other levels, and adds the following interfaces for common query and manipulation services:
- XPathQueryService: Allows you to use XPath to query a collection or resource, including methods for namespace mapping and for query execution.
- XUpdateQueryService: Allows you to use XUpdate to modify a collection or resource.
- CollectionManagementService: Allows you to add and remove collections (think "make directory" and "remove directory").
- TransactionService: Defines transaction context within services, allowing for clean data operations when multiple tasks are operating at the same time.
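The idea behind an XPath query service, one expression applied across every document in a collection with prefixes mapped to namespaces beforehand, can be sketched with the standard javax.xml.xpath package. This is not an XAPI implementation: the "collection" is just an in-memory map, and the class name, the namespace URI, and the sample documents are all assumptions invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CollectionXPathSketch {
    // Hypothetical namespace URI, used only for this illustration.
    static final String MOVIE_NS = "http://example.org/movies";

    /** A toy collection: resource id mapped to XML text, in insertion order. */
    static Map<String, String> sampleCollection() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("musicman.xml", "<m:movie xmlns:m='" + MOVIE_NS + "' title='Music Man'/>");
        docs.put("oklahoma.xml", "<m:movie xmlns:m='" + MOVIE_NS + "' title='Oklahoma!'/>");
        return docs;
    }

    /** Maps a single prefix to a namespace URI for use inside XPath expressions. */
    static NamespaceContext singlePrefix(final String prefix, final String uri) {
        return new NamespaceContext() {
            public String getNamespaceURI(String p) {
                return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String u) {
                return uri.equals(u) ? prefix : null;
            }
            public Iterator<String> getPrefixes(String u) {
                return uri.equals(u)
                        ? Collections.singletonList(prefix).iterator()
                        : Collections.<String>emptyList().iterator();
            }
        };
    }

    /** Returns the ids of all documents in which the expression selects at least one node. */
    static List<String> matchingIds(Map<String, String> docs, String expr,
                                    String prefix, String uri) {
        List<String> ids = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for prefixed names to match
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(singlePrefix(prefix, uri));
            for (Map.Entry<String, String> entry : docs.entrySet()) {
                Document doc = dbf.newDocumentBuilder()
                        .parse(new InputSource(new StringReader(entry.getValue())));
                NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
                if (hits.getLength() > 0) {
                    ids.add(entry.getKey());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("query failed", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        // The query prefix "mv" differs from the "m" used in the documents;
        // only the namespace URI has to agree.
        System.out.println(matchingIds(sampleCollection(),
                "//mv:movie[@title='Music Man']", "mv", MOVIE_NS)); // prints "[musicman.xml]"
    }
}
```

A real repository would presumably use indexes rather than parsing every document per query, but the contract is the same: an expression and a set of namespace mappings go in, and a set of matching resources comes out.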
Listing 1 is a simple Java program excerpt from the XAPI use cases that exercises all three XAPI levels. It queries a movie database for movies with the title "Music Man".
Listing 1. Simple query of a movie database
import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;

/**
 * Simple XML:DB API example to query the database.
 */
public class Example1 {
    public static void main(String[] args) throws Exception {
        Collection col = null;
        try {
            /* Section A */
            String driver = "org.vendorx.xmldb.DatabaseImpl";
            Class c = Class.forName(driver);
            Database database = (Database) c.newInstance();
            DatabaseManager.registerDatabase(database);
            col =
                DatabaseManager.getCollection("xmldb:vendorx://db.xmlmovies.com:2030/movies");

            /* Section B */
            String xpath = "//movie[@title='Music Man']";
            XPathQueryService service =
                (XPathQueryService) col.getService("XPathQueryService", "1.0");
            ResourceSet resultSet = service.query(xpath);

            /* Section C */
            ResourceIterator results = resultSet.getIterator();
            while (results.hasMoreResources()) {
                Resource res = results.nextResource();
                System.out.println((String) res.getContent());
            }
        }
        catch (XMLDBException e) {
            System.err.println("XML:DB Exception occurred " + e.errorCode);
        }
        finally {
            if (col != null) {
                col.close();
            }
        }
    }
}
The driver is the module that ties the XAPI interfaces to the actual database or library implementation. The code section labeled "A" essentially creates a database connection and selects a collection within the database. The section labeled "B" sets up and executes an XPath query (for simplicity, no namespaces are used). The section labeled "C" iterates over and prints the results of the query.
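The collection URI passed in section A packs several pieces of information into one string. Assuming it follows the pattern it appears to (an "xmldb:" prefix, then a vendor identifier, a host and port, and a collection path), the parts can be pulled apart with java.net.URI as in the sketch below. The class and method names are hypothetical, and this is only a reading of the example URI, not the parsing logic of any particular driver.

```java
import java.net.URI;
import java.net.URISyntaxException;

class XmldbUriSketch {
    static final String PREFIX = "xmldb:";

    /**
     * Strips the leading "xmldb:" and parses the remainder as an ordinary
     * hierarchical URI, so the vendor id shows up as the scheme.
     */
    static URI vendorPart(String xmldbUri) {
        if (!xmldbUri.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not an xmldb URI: " + xmldbUri);
        }
        try {
            return new URI(xmldbUri.substring(PREFIX.length()));
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("malformed URI: " + xmldbUri, e);
        }
    }

    public static void main(String[] args) {
        URI u = vendorPart("xmldb:vendorx://db.xmlmovies.com:2030/movies");
        System.out.println("vendor:     " + u.getScheme()); // vendorx
        System.out.println("host:       " + u.getHost());   // db.xmlmovies.com
        System.out.println("port:       " + u.getPort());   // 2030
        System.out.println("collection: " + u.getPath());   // /movies
    }
}
```

Read this way, the vendor part tells the manager which registered driver should handle the request, and the path names the collection to open within that database.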
The current XAPI working drafts are three years old, which is cause for some caution but should not put you off entirely. For one thing, remember that the much-touted XQuery has been winding its way along for at least as many years. For another, the XML:DB group is (in-)famous for developing specifications that hit a bit of a rut in mid-development, and yet are simple and clean enough to become widely implemented. (XUpdate is a good example; that spec is also in need of repair, yet it is widely implemented.) You can download a good number of XAPI implementations right away (see Resources), including a reference implementation hosted on SourceForge. I have seen more and more lightweight XML repository systems emerging, and if you are developing one you should certainly consider providing an XAPI-like interface. It's simple enough to understand, and probably just as easy to implement.
- Check out the XAPI home page and the overall XML:DB page. The XAPI Use Cases are a good way to get a quick idea of the API, which is set forth in more detail (IDL and Javadocs) in the Working Draft.
- Try out some XAPI implementations, such as the reference implementation, Apache Xindice, or eXist. Be aware that one of the earliest XAPI implementations, dbXML, no longer exists, having merged with the Sleepycat Berkeley DB XML project. XAPI support was dropped in the process.
- Learn more about XUpdate, which defines update facilities for modifying data in XML documents. XUpdate is designed to work on regular XML documents as well as XML in database collections, and even virtual XML data models. It is an XML vocabulary similar to XSLT, but is much simpler and is a very accessible vocabulary overall. Like XSLT, it uses XPath for accessing the document to be modified, and has specialized elements that define output operations. XUpdate is also widely implemented, mostly among open-source tools such as XML DBMS and XML difference and patching tools. The XUpdate Use Cases draft also serves as an excellent introduction to XUpdate.
- Find more XML resources on the developerWorks XML zone, including Uche Ogbuji's Thinking XML column. See also Part 2 and Part 4 of the XML standards survey, which mention XUpdate and XAPI.
- Browse for books on these and other technical topics.
- Find out how you can become an IBM Certified Developer in XML and related technologies.

Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is also a lead developer of the Versa RDF query language. He is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche@ogbuji.net.