Wolfgang Meier's open source eXist database is probably the most popular native XML database available today (which is not at all the same thing as saying it's the best). eXist is written in the Java™ programming language and runs on most major platforms. Programs interface with eXist through its bundled HTTP server. SOAP, XML-RPC, and RESTful interfaces are all provided, and through these you can submit XPath, XQuery, and XUpdate requests to the core server. Command-line and GUI clients are also available.
eXist requires Java 1.4 or later; otherwise, all necessary dependencies are bundled (a nice touch). In fact, installing eXist is shockingly easy for a server-side open source project. A lot of other projects, closed and open source, might learn from it. The installer is built with IzPack. The distribution is a single JAR archive. To install eXist, just run the archive like so:
$ java -jar eXist-1.0b2-build-1107.jar
The installer brings up a GUI that asks you where you want to install the eXist directory. I put it in /home/elharo/eXist. The eXist/bin directory contains the necessary startup scripts. To launch the server, execute startup.sh (UNIX®) or startup.bat (Microsoft® Windows®):
$ ./startup.sh
This command runs the server on port 8080 and begins serving the files in /eXist. You can connect to eXist from any Web browser. For instance, I installed eXist on eliza.elharo.com, so I can connect to it at the following URL:
http://eliza.elharo.com:8080/exist/
(Don't try this at home -- my firewall will block you. You'll have to connect to your own server.)
Initially, you'll see the eXist documentation, as well as some samples that you can try out.
eXist isn't really a Web server; it just uses one as a convenient interface to the underlying database server. The package also includes independent GUI clients and programming APIs that you can use to perform various operations. You can even browse it from Microsoft Windows Explorer using WebDAV. For initial experimentation, it's probably easiest to use the simple GUI client. To launch the client, execute client.sh (UNIX) or client.bat (Windows) from the eXist/bin directory:
$ ./client.sh
As you can see in Figure 1, by default the client tries to connect to an eXist database running on localhost on port 8080. You can specify a different host and port in the URL text field. The same window also asks you for a username and a password. By default, the username is admin; you can leave the password field blank. (Haven't software companies learned by now not to ship servers with default usernames and passwords?)
Figure 1. Connect to eXist
After you've logged in, the client displays the GUI shown in Figure 2. Initially, eXist comes with one collection, called system, in which the user information is stored. You want to stay out of this collection for now. Instead, create a new collection for your documents by selecting File > New Collection. I created a collection named books. To open the collection, double-click it in the GUI. After you open a collection, to upload documents, click the icon that looks like a bent piece of paper with a plus sign next to it.
Figure 2. The eXist admin client
I first uploaded a couple of small documents, and the database accepted them without complaint. I then tried to upload the complete text of my book Processing XML with Java. This operation failed silently, with no error message. Uploading through the Web interface instead of the GUI client also failed. However, that interface showed me a stack trace to help debug the problem. It turned out that eXist didn't resolve the relative URL used in the document type declaration. To load documents with external DTD subsets, you must manually install the DTDs on the server's filesystem and edit a catalog file to tell the database where they are; then, you have to restart the database server to make it reload the catalog file. This is a major hassle, although you normally only need to install each different DTD once. eXist works best if your documents either don't use DTDs or use only a small number of infrequently changed DTDs.
eXist supports both XPath and XQuery (see Resources for more information on both). eXist uses the XQuery syntax from the November 2003 XQuery working draft. Work is ongoing to update the database to use the syntax from more recent working drafts. The differences between the drafts for basic For-Let-Where-Order-Return (FLWOR) queries aren't large.
To enter queries against a collection, click the little binoculars icon in the GUI client to bring up the window shown in Figure 3.
Figure 3. eXist query window
Annoyingly, copy and paste functions don't work in this interface, so you have to manually type in all queries. Of course, this program is really just for testing and experiments -- you wouldn't use it for heavy-duty interaction with the database any more than you'd type raw SQL into an Oracle database. After you have a fairly good idea of the queries that you want to run, you can write programs that generate and submit the queries algorithmically, as I discuss next.
Write programs that interface with eXist
IBM®, Oracle, and the other members of the JSR 225 expert group are currently working to define an API that will do for XQuery what JDBC does for SQL. However, until this process is finished and the API is implemented in eXist, it will be necessary to use eXist's native API. You can access this API through SOAP, XML-RPC, WebDAV, or HTTP interfaces. Any API that supports one of these protocols can communicate with eXist. For instance, you can use JAX-RPC to talk to eXist over SOAP or java.net to talk to it over HTTP.
The RESTful HTTP interface is the simplest and most broadly available of the options.
For example, suppose you want to find all para elements
in the books collection that contain the word "XSLT." The XQuery in Listing 1
locates all such elements.
Listing 1. A sample XQuery
for $p in //para where contains($p, "XSLT") return $p
You send this query in a GET request to the URL of the collection you want to search:
http://eliza.elharo.com:8080/exist/servlet/db/books/
Here, eliza.elharo.com is the network host on which the database is running; 8080 is
the port; /exist/servlet/db identifies the Web app, the servlet, and the database, respectively;
and books is the specific collection you're querying in that database. eXist allows
nested collections. For instance, the books collection might contain separate fiction
and nonfiction collections, which are available at the following URLs:
http://eliza.elharo.com:8080/exist/servlet/db/books/fiction/
http://eliza.elharo.com:8080/exist/servlet/db/books/nonfiction/
For the purposes of this article, however, you want to query all the books, both fiction
and nonfiction. The XQuery is sent as the value of the _query
field in the URL's query string (the part of the URL after a question mark). It must be
percent-encoded in the usual way (for example, spaces become %20,
the double quotation mark becomes %22, and so forth).
Thus, you can send the query in Listing 1 to the server by
GETting the following URL:
http://eliza.elharo.com:8080/exist/servlet/db/books/?_query=for%20$p%20in%20//para%20where%20contains($p,%20%22XSLT%22)%20return%20$p
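The percent-encoding step can be sketched in Java as follows. This is a minimal illustration, and the class name QueryEncoder is hypothetical. Note that java.net.URLEncoder produces form encoding, which turns a space into a plus sign and also escapes characters such as $, /, and the parentheses that the hand-written URL above leaves alone. Both spellings decode to the same query.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryEncoder {

    // Percent-encode an XQuery for use as the value of the _query field.
    static String encode(String xquery) {
        try {
            // URLEncoder turns each space into "+". Replacing "+" with "%20"
            // is safe because a literal plus sign in the input has already
            // been encoded as "%2B" by this point.
            return URLEncoder.encode(xquery, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xquery = "for $p in //para where contains($p, \"XSLT\") return $p";
        System.out.println(encode(xquery));
    }
}
```

Run against the query in Listing 1, this prints for%20%24p%20in%20%2F%2Fpara%20where%20contains%28%24p%2C%20%22XSLT%22%29%20return%20%24p.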
The server sends back the query results wrapped in an exist:result
element like the one in Listing 2.
Listing 2. Results of sample query
<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist"
              exist:hits="148" exist:start="1" exist:count="10">
  <para><quote>HTML? You must be joking</quote> said the fourth, a computer science professor on sabbatical from MIT, who was engrossed in an XSLT stylesheet ...</para>
  <para>XSLT and the TrAX API</para>
  <para>Combine functional XSLT transforms with traditional imperative Java code</para>
  <para>The TrAX API for XSLT processing</para>
  <para>Once you're comfortable with one or more of these APIs, you can read Chapters 16 and 17 on XPath and XSLT. However, those APIs and chapters do require some knowledge of at least one of the three major APIs.</para>
  ...
</exist:result>
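The attributes on the exist:result wrapper tell a client how many matches exist and which slice of them came back. A minimal sketch of reading them with the JDK's built-in DOM parser follows. It assumes the attributes are namespace-qualified exactly as in Listing 2; the class name ResultSummary is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

class ResultSummary {

    // Namespace URI bound to the exist prefix in Listing 2
    static final String EXIST_NS = "http://exist.sourceforge.net/NS/exist";

    // Returns {hits, start, count} read from the exist:result root element.
    static int[] summarize(String resultXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getAttributeNS()
            Element result = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    resultXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return new int[] {
                Integer.parseInt(result.getAttributeNS(EXIST_NS, "hits")),
                Integer.parseInt(result.getAttributeNS(EXIST_NS, "start")),
                Integer.parseInt(result.getAttributeNS(EXIST_NS, "count"))
            };
        } catch (Exception e) {
            throw new IllegalArgumentException(
                "Not a readable exist:result document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<exist:result xmlns:exist=\"" + EXIST_NS + "\""
            + " exist:hits=\"148\" exist:start=\"1\" exist:count=\"10\">"
            + "<para>XSLT and the TrAX API</para></exist:result>";
        int[] s = summarize(xml);
        System.out.println("hits=" + s[0] + " start=" + s[1] + " count=" + s[2]);
    }
}
```

For the abbreviated sample in main, this prints hits=148 start=1 count=10.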
Other optional query string variables control whether the results are pretty printed, what elements wrap the results, how many matches return (by default, eXist only returns the first 10 hits), and so forth.
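As a rough sketch, a client might page through a large result set by adding such variables to the query string. The parameter names _start, _howmany, and _indent used here are assumptions based on eXist's REST documentation rather than anything shown in this article, so check them against the documentation for your version. The class name QueryUrl is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryUrl {

    // Builds a REST query URL that asks for howMany hits beginning at start.
    // _start, _howmany, and _indent are assumed parameter names.
    static String build(String collectionUrl, String xquery,
                        int start, int howMany) {
        try {
            return collectionUrl
                + "?_query=" + URLEncoder.encode(xquery, "UTF-8")
                + "&_start=" + start
                + "&_howmany=" + howMany
                + "&_indent=yes";
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Ask for hits 11 through 30 of a simple query
        System.out.println(build(
            "http://localhost:8080/exist/servlet/db/books/", "//para", 11, 20));
    }
}
```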
Because this is all done with HTTP GET, you can make
this query simply by typing the appropriate URL into a Web browser. Of course, any software
library that speaks HTTP can also send this query and get back the result as a stream
of XML. If you were to write this query in the Java language, you might use the
URLEncoder class to encode the query string, the
URL class to submit it, and XOM to process the results,
as shown in Listing 3.
Listing 3. Query eXist in Java code
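// Requires java.io.InputStream, java.net.URL, java.net.URLEncoder,
// and XOM's nu.xom.Builder and nu.xom.Document
String xquery = "for $p in //para"
  + " where contains($p, \"XSLT\")"
  + " return $p";
// Name the encoding explicitly; the one-argument encode() is deprecated
String encodedQuery = URLEncoder.encode(xquery, "UTF-8");
URL u = new URL("http://eliza.elharo.com:8080/exist/servlet/db/books/?_query="
  + encodedQuery);
InputStream in = u.openStream();
Document doc = (new Builder()).build(in);
// work with the document...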
An HTTP interface like this one is completely language independent. You can easily reproduce the functionality in Listing 3 in Perl, Python, C, C#, or any other language that has a simple HTTP library and some XML support. One of the most effective ways to query such a database is to write an XSLT stylesheet that formats the results.
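To make the stylesheet idea concrete, here is a small sketch that uses the JDK's built-in javax.xml.transform API to turn an exist:result document into an HTML list. The stylesheet and the class name ResultFormatter are hypothetical illustrations, not part of eXist, and the code assumes the para elements are in no namespace, as in Listing 2.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ResultFormatter {

    // Hypothetical stylesheet: emits one list item per para element.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:exist='http://exist.sourceforge.net/NS/exist'"
        + " exclude-result-prefixes='exist'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/exist:result'>"
        + "<ul><xsl:apply-templates select='para'/></ul>"
        + "</xsl:template>"
        + "<xsl:template match='para'>"
        + "<li><xsl:value-of select='.'/></li>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to a query result and returns the markup.
    static String format(String resultXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(resultXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException(
                "Could not transform the result", e);
        }
    }
}
```

In a real client, the string passed to format would be the response read from the query URL rather than a literal.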
XQuery allows you to get information out of the database. But what about putting
data in? This is even easier. Instead of sending a GET
request, you send a PUT request. The URL where you
PUT the data is the URL where the document
will be placed inside the database; the body of the request is the document to
store. For example, the Java code in Listing 4 grabs the RSS feed from the Cafe
con Leche Web site and puts it in the syndication collection with the name 20050401.
Listing 4. Insert documents into eXist with Java code
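// Requires java.io.InputStream, java.io.OutputStream,
// java.net.HttpURLConnection, and java.net.URL
URL source = new URL("http://www.cafeaulait.org/today.rss");
InputStream in = source.openStream();
URL u = new URL("http://eliza.elharo.com:8080/exist/servlet/db/syndication/20050401");
HttpURLConnection conn = (HttpURLConnection) u.openConnection();
conn.setDoOutput(true);
conn.setRequestMethod("PUT");
// setRequestProperty() is the URLConnection method that sets a request header
conn.setRequestProperty("Content-type", "application/xml");
OutputStream out = conn.getOutputStream();
// Copy the feed to the request body one byte at a time
for (int c = in.read(); c != -1; c = in.read()) {
  out.write(c);
}
out.flush();
out.close();
in.close();
// read the response...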
PUTting new documents into the database
typically requires authentication. eXist's REST interface supports HTTP Basic
authentication. The Java language supports this through the
java.net.Authenticator class. Complete details
would take this discussion a little too far afield; but in brief, you have to subclass
Authenticator with a class that knows (or knows
how to ask for) the user name and password for the database, and then install
an instance of this subclass as the system default authenticator.
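A minimal sketch of such a subclass follows. The class name ExistAuthenticator is hypothetical, and this version simply hands the same credentials to every server that asks; a production version would likely check getRequestingHost() before answering.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

class ExistAuthenticator extends Authenticator {

    private final String user;
    private final char[] password;

    ExistAuthenticator(String user, String password) {
        this.user = user;
        this.password = password.toCharArray();
    }

    // Called by the java.net classes when a server demands authentication.
    // This sketch answers every challenge, whichever host issued it.
    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        return new PasswordAuthentication(user, password);
    }

    public static void main(String[] args) {
        // The default eXist account described earlier: admin, blank password
        Authenticator.setDefault(new ExistAuthenticator("admin", ""));
        // Later URLConnection requests that receive a 401 challenge
        // now retry with these credentials.
    }
}
```

Once the default authenticator is installed, the PUT code in Listing 4 needs no other changes.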
Need to remove a document from the collection? Just send a
DELETE request to the appropriate URL, as shown in
Listing 5.
Listing 5. Delete a document in eXist
URL u = new URL("http://eliza.elharo.com:8080/exist/servlet/db/syndication/20050401");
HttpURLConnection conn = (HttpURLConnection) u.openConnection();
conn.setRequestMethod("DELETE");
conn.connect();
// read the response...
Again, in practice you also need to supply a username and a password via an
Authenticator object.
The final and trickiest operation is to modify information in the database. For
example, suppose I change my e-mail address from elharo@metalab.unc.edu
to elharo@macfaq.com. Therefore, I want to change all
<email>elharo@metalab.unc.edu</email> elements
to <email>elharo@macfaq.com</email>. XQuery
doesn't provide this capability, so eXist uses XUpdate instead. The XUpdate query
in Listing 6 makes the change.
Listing 6. Using XUpdate to update documents in eXist
<xupdate:modifications version="1.0"
    xmlns:xupdate="http://www.xmldb.org/xupdate">
  <xupdate:update select="//email[.='elharo@metalab.unc.edu']">elharo@macfaq.com</xupdate:update>
</xupdate:modifications>
Because this operation changes a resource, you need to use the
POST method to send it to the server. You post to the
URL of the document you want to change and give the XUpdate instructions in the
body of the request.
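In Java, the POST might look like the following sketch. It wraps the update instruction in the xupdate:modifications root element that the XUpdate working draft calls for. The class and method names are hypothetical, and the post method has not been exercised against a live server here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class XUpdatePoster {

    // Builds an XUpdate document that replaces the content of every node
    // matched by the select expression with newText.
    static String buildUpdate(String select, String newText) {
        return "<xupdate:modifications version=\"1.0\""
            + " xmlns:xupdate=\"http://www.xmldb.org/xupdate\">"
            + "<xupdate:update select=\"" + escape(select) + "\">"
            + escape(newText)
            + "</xupdate:update>"
            + "</xupdate:modifications>";
    }

    // Escapes the characters that are unsafe in XML text and in
    // double-quoted attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace("\"", "&quot;");
    }

    // POSTs the XUpdate document to the URL of the document to change
    // and returns the HTTP status code.
    static int post(String documentUrl, String body) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(documentUrl).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/xml");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes("UTF-8"));
        }
        return conn.getResponseCode();
    }
}
```

A call such as post(url, buildUpdate("//email[.='elharo@metalab.unc.edu']", "elharo@macfaq.com")) would then send the change, with credentials supplied through an Authenticator as for PUT and DELETE.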
I've just hit the highlights of the REST interface. It also includes instructions to create and drop collections, to specify how the query results are formatted, and to supply user credentials. Nor is HTTP the only interface to eXist. eXist also has native APIs for Perl, PHP, and the Java language, along with generic WebDAV, SOAP, and XML-RPC interfaces. Broad API support is one of the particular strengths of eXist.
Performance, robustness, and stability
eXist is not the fastest database on the planet. You can easily use a stopwatch to measure the time it takes to load a medium-sized document, even on fast hardware connecting to a local database. Query speed is of similar quality. Complex queries over moderately large collections give you enough time to brew a cup of coffee. To improve both document loading and query times, you can give eXist more memory. The default configuration that ships with eXist specifies settings that are appropriate for machines with about 256 MB of memory. If you have a beefier server, you can modify the conf.xml file to allocate more memory.
To tune the database, you can add indexes. By default, eXist indexes element
and attribute nodes as well as the full text of the document. You can specify additional
range indexes for particular node-sets that are likely to occur in your queries. For instance,
if you know that you are likely to do a lot of queries that look at para
elements, you can define an index on //para. This tells
eXist to precompute and store the values of all the para
elements in the document because they're likely to be needed later.
Still, eXist is mostly suitable for small collections where speed isn't critical. If you have gigabyte-sized documents or you process thousands of transactions per hour, plan to look elsewhere.
Similarly, I'm not sure I'm ready to trust my critical data to eXist. I haven't personally experienced any database corruption. However, the developers are still finding and fixing database corruption problems more frequently than I'm comfortable with. On the plus side, eXist does make it quite easy to back up the database. Just as important, the backup format saves the contents as real textual XML, not in some proprietary binary format; this means that in a worst-case scenario, you can fix problems with a text editor. If you make frequent archival backups, eXist is unlikely to do anything that makes the data irretrievable.
Feature-wise, eXist suffices for basic needs and includes some unexpected lagniappes such as XInclude support. Transactions, rollover, fallback, and similar enterprise-level features are all missing (transactions are on the "to do" list); but many applications don't need such advanced functionality.
One of my biggest concerns about eXist (or any other XQuery-based native XML database, for that matter) is the stability of the underlying standards and APIs. This article is based on the latest beta of eXist, from November 2004, which is based on the XQuery drafts from November 2003. The version of eXist now in CVS has made quite a few backwards-incompatible changes that are not yet fully documented. More changes will come in the future, both in eXist and in the W3C specs it depends on. Do not put eXist into production unless you're comfortable with frequent updates that will require you to retest and rewrite some of your own code.
The more data you have, the more important it becomes to use some sort of database system to manage it. If the data is XML, a solid native XML database is an obvious choice. Is eXist such a solid system? Sadly, the answer is no. eXist is an interesting research project that might develop into a useful tool in a year or two. However, it's hard to recommend in its current state. Documentation is incomplete and often misleading. Error messages are nonexistent. (Note to programmers everywhere: Exception stack traces don't count as decent error messages -- and sometimes eXist doesn't even give you those.) GUIs violate user interface standards at every turn. Basic features like copy and paste are omitted. During the very basic testing I did for this article, I encountered multiple bugs.
eXist isn't finished yet. It's currently classified as a beta. Many of the problems I encountered might be fixed before version 1.0 ships, but that won't happen tomorrow. I know some people use eXist for real work today, and that worries me. Either they're very lucky, or they carefully craft their queries and documents to avoid eXist's bugs. If you're interested in contributing to an open source project, eXist is a worthwhile candidate. However, the same incompleteness that makes it a fun project for programmers with time on their hands makes it unsuitable for production systems.
Resources
- Download eXist from SourceForge.
- eXist sits on top of the Cocoon application server from the XML Apache Project and bundles the Jetty servlet engine. However, it can be integrated into other servlet containers, such as the Apache Jakarta Project's Tomcat.
- The eXist installer was built with Julien Ponge's open source IzPack.
- Read Elliotte Rusty Harold's book Processing XML with Java (Addison Wesley Professional, 2002) online or buy it on paper.
- Explore Java Network Programming (O'Reilly Media, 2004) and its explanation of how the URL and URLConnection classes talk to HTTP servers such as eXist's REST interface.
- Read Ronald Bourret's solid introduction to using XML with various types of database systems.
- Check out how IBM and Oracle lead the expert group for Java Specification Request 225, XQuery API for Java. Currently, an early draft review of this specification is available.
- Printed out, the W3C's XQuery specs run to hundreds of pages. The author recommends that you start with the XML Query Use Cases.
- Learn more about the XML Path Language (XPath) by reading the W3C Recommendation.
- Read the XUpdate specification.
- The author's server Eliza is named after Eliza de la Zeur in Neal Stephenson's Baroque Cycle.
- Read the previous installments of Elliotte Rusty Harold's Managing XML data column here on developerWorks.
- Find hundreds more XML resources on the developerWorks XML zone.
- Learn how you can become an IBM Certified Developer in XML and related technologies.

Elliotte Rusty Harold is originally from New Orleans, to which he returns periodically in search of a decent bowl of gumbo. However, he resides in the Prospect Heights neighborhood of Brooklyn with his wife Beth and cats Charm (named after the quark) and Marjorie (named after his mother-in-law). He's an adjunct professor of computer science at Polytechnic University, where he teaches Java technology and object-oriented programming. His Cafe au Lait Web site has become one of the most popular independent Java sites on the Internet, and his spin-off site, Cafe con Leche, has become one of the most popular XML sites. His books include Effective XML, Processing XML with Java, Java Network Programming, and The XML 1.1 Bible. He's currently working on the XOM API for processing XML and the Jaxen XPath engine. You can contact him at elharo@metalab.unc.edu.



