The busy Java developer's guide to db4o: Queries, updates, and identity

Count the ways to query in db4o

Whereas the RDBMS uses SQL as its principal mechanism for finding and retrieving data, an OODBMS can use one of several different mechanisms. In this second installment of his series, Ted Neward introduces a few of the options, including Query by Example and custom mechanisms unique to the OODBMS. As he explains, some of the alternatives can be easier to use than SQL itself.

Ted Neward, Principal, Neward & Associates

Ted Neward is the principal of Neward & Associates, where he consults, mentors, teaches, and presents on Java, .NET, XML Services, and other platforms. He resides near Seattle, Washington.



27 March 2007

In the first article in this series I discussed the failure of the RDBMS as a solution for Java™ object storage. As I explained, an object database like db4o simply has more to offer to object-oriented developers, in today's object-oriented world, than its relational cousins.

About this series

Information storage and retrieval has been nearly synonymous with RDBMS for about a decade now, but recently that has begun to change. Java developers in particular are frustrated with the so-called object-relational impedance mismatch, and impatient with the solutions that attempt to resolve it. This, along with the emergence of a viable alternative, has led to a renaissance of interest in object persistence and retrieval. This series is a working introduction to db4o, an open source database that leverages today's object-oriented languages, systems, and mindset. See the db4o home page to download db4o now; you'll need it to follow the examples.

In this and future articles, I'll continue making the case for the object database. I'll use examples to demonstrate the power of a storage system optimized for the same "shape" of entities that you work with in your object-oriented programming language of choice -- in this case, the Java language. In particular, I'll introduce the various mechanisms available for retrieving, modifying, and restoring objects back into db4o. As you'll learn, it's actually quite amazing what you can do once you're freed from the constraints of SQL.

If you haven't already done so, you may want to download db4o now. You need it to compile the examples.

Query by Example

Query by Example (QBE) is a database query language that allows you to create queries by designing a "template" against which to do comparisons, rather than a language using predicate criteria (as in SQL). I demonstrated data retrieval using db4o's QBE engine last time, but I'll quickly recap here. Start with a look at my admittedly primitive database. It consists of one type, whose definition appears in Listing 1:

Listing 1. The Person class
package com.tedneward.model;

public class Person
{
    public Person()
    { }
    public Person(String firstName, String lastName, int age)
    {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }
    
    public String getFirstName() { return firstName; }
    public void setFirstName(String value) { firstName = value; }
    
    public String getLastName() { return lastName; }
    public void setLastName(String value) { lastName = value; }
    
    public int getAge() { return age; }
    public void setAge(int value) { age = value; }

    public String toString()
    {
        return 
            "[Person: " +
            "firstName = " + firstName + " " +
            "lastName = " + lastName + " " +
            "age = " + age + 
            "]";
    }
    
    public boolean equals(Object rhs)
    {
        if (rhs == this)
            return true;
        
        if (!(rhs instanceof Person))
            return false;
        
        Person other = (Person)rhs;
        return (this.firstName.equals(other.firstName) &&
                this.lastName.equals(other.lastName) &&
                this.age == other.age);
    }
    
    private String firstName;
    private String lastName;
    private int age;
}

As POJOs go, Person is hardly a complex beast. It consists of three fields and some basic methods to support POJO-like activities, namely toString() and equals(). (Astute readers of Joshua Bloch's Effective Java will notice that I've left out the hashCode() implementation, a clear violation of Item 8. In the classic parlance of authors everywhere, I leave hashCode() as "an exercise for the reader," which typically means the author either doesn't want to bother with it or doesn't think it's necessary to the example at hand. I also leave it as an exercise for the reader to decide which is the case here.)
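
For readers who would rather not do the exercise, here is one possible shape for it. This is a minimal sketch, not part of the original listing: the class name HashablePerson is hypothetical, and it assumes java.util.Objects (Java 7 or later), which postdates the JDKs discussed in this article. It hashes exactly the fields that equals() compares, and it also makes equals() tolerate null fields, which the version in Listing 1 does not.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical variant of the Person class from Listing 1, trimmed to the
// members that matter for equals() and hashCode().
class HashablePerson {
    private final String firstName;
    private final String lastName;
    private final int age;

    HashablePerson(String firstName, String lastName, int age) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }

    @Override
    public boolean equals(Object rhs) {
        if (rhs == this)
            return true;
        if (!(rhs instanceof HashablePerson))
            return false;
        HashablePerson other = (HashablePerson) rhs;
        // Objects.equals() tolerates null fields on either side.
        return Objects.equals(firstName, other.firstName)
            && Objects.equals(lastName, other.lastName)
            && age == other.age;
    }

    @Override
    public int hashCode() {
        // Hash exactly the fields that equals() compares.
        return Objects.hash(firstName, lastName, age);
    }
}

class HashCodeDemo {
    public static void main(String[] args) {
        Set<HashablePerson> people = new HashSet<HashablePerson>();
        people.add(new HashablePerson("Brian", "Goetz", 39));
        // An equal but distinct instance is found because the hash codes agree.
        System.out.println(people.contains(new HashablePerson("Brian", "Goetz", 39)));
    }
}
```

Without a matching hashCode(), the lookup in the demo would usually fail even though equals() reports the two objects as equal.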

In Listing 2, I create a half dozen objects, place them into a file, and then use QBE to call up the two objects whose first names match the pattern "Brian." This style of query uses a prototype object (the one passed in to the get() call) to decide if objects in the database match and returns an ObjectSet (essentially a collection) of those objects that match the criteria.

Listing 2. Query by Example
import com.db4o.*;

import com.tedneward.model.*;

public class Hellodb4o
{
    public static void main(String[] args)
        throws Exception
    {
        ObjectContainer db = null;
        try
        {
            db = Db4o.openFile("persons.data");

            Person brian = new Person("Brian", "Goetz", 39);
            Person jason = new Person("Jason", "Hunter", 35);
            Person brians = new Person("Brian", "Sletten", 38);
            Person david = new Person("David", "Geary", 55);
            Person glenn = new Person("Glenn", "Vanderberg", 40);
            Person neal = new Person("Neal", "Ford", 39);
            
            db.set(brian);
            db.set(jason);
            db.set(brians);
            db.set(david);
            db.set(glenn);
            db.set(neal);

            db.commit();
            
            // Find all the Brians (the name "brians" is already taken
            // by the Person declared above, so use a different one)
            ObjectSet results = db.get(new Person("Brian", null, 0));
            while (results.hasNext())
                System.out.println(results.next());
        }
        finally
        {
            if (db != null)
                db.close();
        }
    }
}

Rules of query

Because QBE uses a prototype object as its template to search for data, there are a few simple rules regarding its usage. When db4o is searching all the objects of the Person type for a given target (an oversimplification of what happens, but conceptually accurate), to determine if a particular object in the data store meets the criteria, the field values are compared one by one. If a field in the prototype is "null," then that value matches against any value in the data store; otherwise, the values must match exactly. For primitive types, because primitive types cannot really hold a value of "null," zero is used as the wildcard value. (This also points out a limitation of the QBE approach -- zero cannot effectively be used as a value to search on.) Should multiple field values be specified, then all the field values must be met by the object in the database for the candidate object to meet the query criteria; in essence, this means that the fields are "AND"-ed together to form the query predicate.

In Listing 2, the query is looking for all Person types where the firstName field is equal to "Brian," and the lastName and age fields are effectively ignored. In a table, this call would roughly correspond to an SQL query of SELECT * FROM Person WHERE firstName = 'Brian'. (Be careful, though, about trying to map OODBMS queries to SQL: the analogy isn't perfect and can lead to misunderstanding about the nature and performance of particular queries.)
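
The matching rules above can be modeled in a few lines of plain Java. The following is a conceptual sketch only, not db4o's implementation: QbePerson and QbeMatcher are hypothetical names, and the sketch handles only reference fields and numeric primitives (boolean and char fields are left out for brevity).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for the article's Person class, reduced to its fields.
class QbePerson {
    String firstName;
    String lastName;
    int age;

    QbePerson(String firstName, String lastName, int age) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }
}

class QbeMatcher {
    // Returns true when every non-wildcard field of the template equals the
    // same field of the candidate; the fields are effectively AND-ed together.
    static boolean matches(Object template, Object candidate) {
        if (template.getClass() != candidate.getClass())
            return false;
        try {
            for (Field f : template.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()))
                    continue;
                f.setAccessible(true);
                Object wanted = f.get(template);
                if (isWildcard(f, wanted))
                    continue;
                if (!wanted.equals(f.get(candidate)))
                    return false;
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot read field", e);
        }
    }

    // null (references) and zero (numeric primitives) act as wildcards.
    private static boolean isWildcard(Field f, Object value) {
        if (value == null)
            return true;
        return f.getType().isPrimitive()
            && value instanceof Number
            && ((Number) value).doubleValue() == 0.0;
    }
}

class QbeMatcherDemo {
    public static void main(String[] args) {
        QbePerson template = new QbePerson("Brian", null, 0);
        System.out.println(QbeMatcher.matches(template, new QbePerson("Brian", "Goetz", 39)));
        System.out.println(QbeMatcher.matches(template, new QbePerson("Jason", "Hunter", 35)));
    }
}
```

The sketch also makes the zero limitation visible: a template with age 0 cannot distinguish a newborn from a 39-year-old, because zero is indistinguishable from "don't care."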

The returned object from a query is an ObjectSet, which is similar to a JDBC ResultSet in that it's a simple container of objects. Walking the results is a simple exercise in using the Iterator interface implemented by ObjectSet. Using the particular methods of Person would require a downcast on the objects returned by next().


Updates and identity

While a simple display of data is interesting in itself, most objects will also need to be modified and restored back to the database. This is probably the trickiest part of working with an OODBMS because an object database uses a different notion of identity from a relational database. Practically speaking, this means you have to be more careful about objects-in-memory-versus-objects-in-storage when working with an object database.

The simple example in Listing 3 demonstrates this differing notion of identity:

Listing 3. The three Brians
import com.db4o.*;

import com.tedneward.model.*;

public class Hellodb4o
{
    public static void main(String[] args)
        throws Exception
    {
        ObjectContainer db = null;
        try
        {
            db = Db4o.openFile("persons.data");

            Person brian = new Person("Brian", "Goetz", 39);
            Person jason = new Person("Jason", "Hunter", 35);
            Person brians = new Person("Brian", "Sletten", 38);
            Person david = new Person("David", "Geary", 55);
            Person glenn = new Person("Glenn", "Vanderberg", 40);
            Person neal = new Person("Neal", "Ford", 39);
            
            db.set(brian);
            db.set(jason);
            db.set(brians);
            db.set(david);
            db.set(glenn);
            db.set(neal);

            db.commit();
            
            // Find all the Brians
            ObjectSet results = db.get(new Person("Brian", null, 0));
            while (results.hasNext())
                System.out.println(results.next());

            // Store a second, separately constructed Brian Goetz
            Person brian2 = new Person("Brian", "Goetz", 39);
            db.set(brian2);
            db.commit();

            // Find all the Brians again
            results = db.get(new Person("Brian", null, 0));
            while (results.hasNext())
                System.out.println(results.next());
        }
        finally
        {
            if (db != null)
                db.close();
        }
    }
}

When you run the query in Listing 3, the database reports three Brians, two of them Brian Goetz. (A similar effect would occur if the persons.data file already existed in the current directory -- all of the Persons created would be stored into the persons.data file, and all the Brians stored there would be returned by the query.)

The Extension interface

The db4o development team occasionally finds that certain APIs are less frequently used, or represent "experiments" on the API that the team isn't sure should be a part of the core ObjectContainer API. In those cases, the methods are provided on the ExtObjectContainer instance returned by the ext() method. Methods available on this class vary from release to release as they are introduced, removed, or moved to the core ObjectContainer class itself. The list has been known to include methods to test objects in memory to see if they are associated with the db4o container instance, get a list of all the classes known by the container, or to set/release semaphores for concurrency. As always, see the db4o documentation for the complete details of the ExtObjectContainer class.

Clearly, the old rules regarding primary keys aren't in force here; so how does an object database deal with notions of uniqueness?

Embrace the OID

When an object is stored into an object database, a unique key is created, called an Object identifier or OID (pronounced similarly to the last syllable of avoid), which uniquely identifies that object. The OID, like the this pointer/reference found in C# and Java programming, is silent unless explicitly requested. In db4o, the OID for a given object can be found through a call to db.ext().getID(). (You can also use the db.ext().getByID() method to retrieve objects by OID. Calling this method has some implications too complex to discuss here, but it remains an option.)

In practice, all this means is that it falls to the developer to determine whether an object previously exists in the system, usually by querying the container for that object before inserting it, as shown in Listing 4:

Listing 4. Query before inserting
// ... as before
        ObjectContainer db = null;
        try
        {
            db = Db4o.openFile("persons.data");
            
            ...

            // We want to add Brian Goetz to the database; is he already there?
            if (!db.get(new Person("Brian", "Goetz", 0)).hasNext())
            {
                // Nope, no Brian Goetz here, go ahead and add him
                db.set(new Person("Brian", "Goetz", 39));
                db.commit();
            }
        }

In this particular case, let's assume that the uniqueness of a Person in the system is its first name-last name combination. When searching for Brian in the database, you therefore need only look for those attributes on the Person instances. (Maybe Brian was added a couple of years ago, before he turned 39.)

If you want to modify the object in the database, it's simple to take the object retrieved from the container, modify it in some way, and then store it back, as shown in Listing 5:

Listing 5. Updating an object

// ... as before
        ObjectContainer db = null;
        try
        {
            db = Db4o.openFile("persons.data");
            
            ...

            // Happy Birthday, David Geary!
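            ObjectSet set = db.get(new Person("David", "Geary", 0));
            if (set.hasNext())
            {
                // The retrieved object is already known to the container,
                // so set() updates it instead of inserting a copy
                Person davidG = (Person)set.next();
                davidG.setAge(davidG.getAge() + 1);
                db.set(davidG);
                db.commit();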
            }
            else
                throw new MissingPersonsException(
                    "David Geary doesn't seem to be in the database");
        }

The db4o container doesn't run into problems of identity here because the object in question has already been identified as one coming from the database, meaning its OID is already stored inside the db4o bookkeeping infrastructure. Accordingly, when you call set, db4o knows to update the existing object rather than insert a new one.
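
The difference between Listing 3 (a duplicate Brian) and Listing 5 (an update in place) comes down to reference identity versus value equality. The toy model below illustrates the idea; it is emphatically not how db4o is implemented, and ToyContainer and its methods are hypothetical names. It keys its bookkeeping on object references, so an equal-valued but separately constructed object counts as new, while the same reference stored twice keeps its identifier.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// A toy model of identity bookkeeping; NOT db4o's implementation.
class ToyContainer {
    // Keys are compared by reference (==), never by equals().
    private final Map<Object, Long> idsByReference =
        new IdentityHashMap<Object, Long>();
    private long nextId = 1;

    // Returns the identifier for the object. A reference seen before keeps
    // its identifier (an update); a new reference gets a fresh one (an insert).
    long set(Object obj) {
        Long existing = idsByReference.get(obj);
        if (existing != null)
            return existing;
        long id = nextId++;
        idsByReference.put(obj, id);
        return id;
    }

    int size() {
        return idsByReference.size();
    }
}

class ToyContainerDemo {
    public static void main(String[] args) {
        ToyContainer db = new ToyContainer();
        // Two equal values held in two distinct objects.
        String brian = new String("Brian Goetz");
        String brian2 = new String("Brian Goetz");

        long first = db.set(brian);
        long second = db.set(brian2); // equal value, different reference: insert
        long again = db.set(brian);   // same reference: update

        // prints "1 2 1 size=2"
        System.out.println(first + " " + second + " " + again + " size=" + db.size());
    }
}
```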


A search utility method

The notion of a primary key that is application-specific is worth keeping around, even though not inherent to QBE. What you need is a utility method to simplify identity-based searches. This section shows you a solution based on using the Reflection APIs to poke the right values into the right fields, as well as suggesting ways to tune the solution for various preferences and aesthetics.

Let's start with a basic premise: I have a db4o database in which I have a type (Person) that I want to query based on a set of fields that have certain values in them. Within this method, I use the Reflection APIs on Class to create a new instance of that type (invoking its default constructor). I then iterate through an array of Strings that have the fields in them, getting back each Field object in the Class. Following that, I iterate through the array of objects that correspond to the values for each of those fields and then call Field.set() to poke that value into my template object.

Once that's all done, I call get() on the db4o database and check to see if the ObjectSet returned contains any objects. This gives me a basic method outline that looks like the one shown in Listing 6:

Listing 6. A utility method for doing QBE identity searches
import java.lang.reflect.*;
import com.db4o.*;

public class Util
{
    public static boolean identitySearch(ObjectContainer db, Class type,
        String[] fields, Object[] values)
            throws InstantiationException, IllegalAccessException,
                NoSuchFieldException
    {
            // Create an instance of our type
            Object template = type.newInstance();
            
            // Populate its fields with the passed-in template values
            for (int i=0; i<fields.length; i++)
            {
                // getDeclaredField() never returns null; it throws
                // NoSuchFieldException if the field doesn't exist
                Field f = type.getDeclaredField(fields[i]);
                if (Modifier.isStatic(f.getModifiers()))
                    throw new IllegalArgumentException("Field " + fields[i] + 
                        " is a static field and cannot be used in a QBE query");
                f.setAccessible(true);
                f.set(template, values[i]);
            }
            
            // Do the query; any hit means a matching object is already stored
            ObjectSet set = db.get(template);
            return set.hasNext();
    }
}

Obviously a great deal could be done to tune this method to taste, such as catching all of the exception types and rethrowing them as runtime exceptions instead, or returning the ObjectSet itself instead of true/false, or even returning an array of objects containing the contents of the ObjectSet (which would then make it easy to check the length of the returned array). What is apparent from Listing 7, however, is that its usage is arguably not much simpler than the basic QBE version shown already:

Listing 7. The utility method at work
// Is Brian already in the database?
if (!Util.identitySearch(db, Person.class,
    new String[] {"firstName", "lastName"},
    new Object[] {"Brian", "Goetz"}))
{
    db.set(new Person("Brian", "Goetz", 39));
    db.commit();
}

Actually, much of the utility method's utility becomes apparent when it is placed onto the stored class itself, as shown in Listing 8:

Listing 8. Using the utility method from within Person
public class Person
{
    // ... as before
    
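    public static boolean exists(ObjectContainer db, Person instance)
        throws InstantiationException, IllegalAccessException,
            NoSuchFieldException
    {
        return Util.identitySearch(db, Person.class,
            new String[] {"firstName", "lastName"},
            new Object[] {instance.getFirstName(), instance.getLastName()});
    }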
}

Or, again, you could tweak the method to return the instance found, so that the Person instance had its OID appropriately associated, and so on. The key thing to remember is that you can build convenience methods on top of the db4o infrastructure to make it easier to use.
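
As one illustration of that kind of tuning, the reflective half of Listing 6 can be pulled out into its own helper that rethrows the checked reflection exceptions as runtime exceptions. This is a sketch under assumptions: TemplateBuilder and TemplatePerson are hypothetical names, the code uses generics and multi-exception hierarchy features from Java 7 or later, and it deliberately stops short of calling db4o, so the resulting template would still have to be passed to get() by the caller.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for the stored class; it needs a no-argument constructor.
class TemplatePerson {
    private String firstName;
    private String lastName;
    private int age;

    TemplatePerson() { }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
    int getAge() { return age; }
}

class TemplateBuilder {
    // Builds a QBE-style template: the named fields receive the given values,
    // and every other field keeps its default (null or zero).
    static <T> T build(Class<T> type, String[] fields, Object[] values) {
        if (fields.length != values.length)
            throw new IllegalArgumentException("fields and values differ in length");
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T template = ctor.newInstance();
            for (int i = 0; i < fields.length; i++) {
                Field f = type.getDeclaredField(fields[i]);
                if (Modifier.isStatic(f.getModifiers()))
                    throw new IllegalArgumentException("Field " + fields[i] +
                        " is static and cannot be used in a template");
                f.setAccessible(true);
                f.set(template, values[i]);
            }
            return template;
        } catch (NoSuchFieldException e) {
            // A misspelled field name is a caller error.
            throw new IllegalArgumentException(
                "Field " + e.getMessage() + " not found on type " + type, e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not build template for " + type, e);
        }
    }
}

class TemplateBuilderDemo {
    public static void main(String[] args) {
        TemplatePerson template = TemplateBuilder.build(TemplatePerson.class,
            new String[] {"firstName", "lastName"},
            new Object[] {"Brian", "Goetz"});
        System.out.println(template.getFirstName() + " " + template.getLastName());
    }
}
```

Callers no longer need a throws clause, at the cost of finding out about a misspelled field name only at runtime.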

Note that there is a more efficient way to perform this style of query against the underlying objects stored on disk using the db4o SODA query API, but it's slightly out of scope for this article, so I'll leave it for a later discussion.


Advanced queries

So far you've seen how to query for individual objects, or objects that meet particular criteria. Although this makes for a fairly easy way to issue queries, it also makes for somewhat limited options; for example, what if you needed to retrieve all Persons whose last name started with G, or all Persons of an age greater than 21? A QBE approach would fail pretty badly for these types of queries because QBE does equality matches, not comparison.

Historically, even moderately complex comparison has been a weakness of the OODBMS and a strength of the relational model and SQL. Issuing a comparison query in SQL is trivial, but to do the same in the OODBMS required one of several unappealing approaches:

  • Fetch all the objects and do the relative comparison yourself.
  • Extend the QBE API to include predicates.
  • Create a query language to be translated into a query against your object model.

Weak on comparison

Clearly the first option above is only viable for the most trivial of databases because it puts an obvious upper-bound on the size of the database that you can practically use. Fetching a million objects is not something even the hardiest hardware will shrug off easily, particularly if it's across a network connection. (This is not an indictment of the OODBMS, by the way -- fetching a million rows across a network connection may be within the RDBMS server's capabilities, but it will still crush the network it's on.)

The second option pollutes the simplicity of the QBE approach and leads to monstrosities like the one shown in Listing 9:

Listing 9. A QBE call with predicates
Query q = new Query();
q.setClass(Person.class);
q.setPredicate(new Predicate(
    new And(
        new Equals(new Field("firstName"), "David"),
        new GreaterThan(new Field("age"), 21)
    )));
q.Execute();

It's fairly easy to see how any moderately complicated query will rapidly become unworkable using this technique, particularly when compared against the simplicity of a query language like SQL.

The third option is to create a query language that can then be used to query the database's object model. In the past, the OODBMS folks created a standard query language, Object Query Language, or OQL, which looked something like what you see in Listing 10:

Listing 10. A snippet of OQL
SELECT p FROM Person p
WHERE p.firstName = "David" AND p.age > 21

On the surface, OQL seems remarkably similar to SQL, and thus supposedly just as powerful and easy to use. The drawback of OQL is that it wants to return ... what? A language so similar to SQL would seem to want to return column sets (tuples), like SQL does, but an object database doesn't work that way -- it wants to return objects, not arbitrary sets. Especially in a strongly typed language like C# or Java programming, these object types have to be known a priori, unlike the set-based notion of SQL.


Native queries in db4o

Rather than force a complex query API onto developers or introduce a new "something-QL," db4o offers a facility called native queries, which is both powerful and remarkably easy to use, as you can begin to see in Listing 11. (A query API for db4o is available in the form of SODA queries, which are used principally for fine-grained query control. As you'll see in a second, however, SODA is generally only necessary for hand-optimizing queries.)

Listing 11. A db4o native query
// ... as before
        ObjectContainer db = null;
        try
        {
            db = Db4o.openFile("persons.data");
            
            ...

            // Who wants to get a beer?
            // (requires java.util.List and com.db4o.query.Predicate)
            List<Person> drinkers = db.query(new Predicate<Person>() {
                public boolean match(Person candidate) {
                    return candidate.getAge() > 21;
                }
            });
            for (Person drinker : drinkers)
                System.out.println("Here's your beer, " + drinker.getFirstName());
        }

The "native" part of the query is the fact that it is written in the programming language itself -- in this case the Java language -- rather than in some arbitrary language that must then be translated into something else. (A non-generics version of the Predicate API is available for versions prior to Java 5, though it isn't quite as easy to use.)

Thinking about this for a moment, you will probably begin to wonder how exactly this particular approach is being implemented. Either a source preprocessor has to be used to translate the source file with the query in it into something the database engine understands (a la SQL/J or other embedded preprocessors of note), or the database is sending all the Person objects back to the client where the predicate is executed against the complete set (in other words, exactly the approach rejected earlier).

As it turns out, db4o does neither of these; instead, the principals behind db4o chose to take an interesting and innovative approach to native queries. Loosely put, the db4o system sends a predicate to the database, where it performs bytecode analysis at runtime on the bytecode for the match() method. If the bytecode is easy enough to understand, db4o will turn that query into a SODA query for efficiency, in which case there is no need to instantiate all the objects to pass into the match() method. In this way, programmers can continue to write queries in the language they're comfortable with, but the query itself can be translated into something the database can understand and execute efficiently. (Sort of a "JQL" -- Java Query Language -- if you will. But please don't repeat that name to the db4o developers; you'll get me in trouble.)

Be sure to include BLOAT!

The db4o Java distribution includes several jar files, including a core db4o implementation for each of the JDK 1.1, JDK 1.2, and Java 5 releases. Also included in the distribution is a jar file called BLOAT. Despite its name, this is a Java bytecode optimizer developed at Purdue University that must be present, along with the db4o-5.0-nqopt.jar, in the runtime classpath for native queries to work. Failing to include these libraries will not generate an error of any kind but will simply cause every native query to be unoptimized. (Developers can find this out, but only in a passive fashion, using the listeners described in this section.)

Let db4o tell you ...

The native queries approach isn't perfect. For example, it is entirely possible to write a native query complex enough to defeat the bytecode analyzer, thus requiring the worst-case execution model to take place. In this worst-case scenario, db4o would have to instantiate every object of the queried type in the database and pass each one through the match() implementation. Predictably, this would kill query performance, but you can work around it by installing listeners where you need them.
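
To make the cost of that worst case concrete, here is a rough model of the fallback path in plain Java. It is a sketch, not db4o code: MatchPredicate and FullScan are hypothetical names, and a list of ages stands in for the stored Person objects. The point is that the number of objects visited equals the size of the whole extent, no matter how few of them match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical predicate type, modeled loosely on the match() method in Listing 11.
interface MatchPredicate<T> {
    boolean match(T candidate);
}

class FullScan {
    private int visited = 0;

    // Worst case: every stored object is handed to match(), one by one.
    <T> List<T> query(List<T> stored, MatchPredicate<T> predicate) {
        List<T> hits = new ArrayList<T>();
        for (T obj : stored) {
            visited++;
            if (predicate.match(obj))
                hits.add(obj);
        }
        return hits;
    }

    int visited() {
        return visited;
    }
}

class FullScanDemo {
    public static void main(String[] args) {
        // The ages of the six Persons stored in Listing 2.
        List<Integer> ages = Arrays.asList(39, 35, 38, 55, 40, 39);
        FullScan scan = new FullScan();
        List<Integer> over38 = scan.query(ages, new MatchPredicate<Integer>() {
            public boolean match(Integer age) {
                return age > 38;
            }
        });
        // prints "4 matches, 6 objects visited"
        System.out.println(over38.size() + " matches, "
            + scan.visited() + " objects visited");
    }
}
```

With six objects the difference is invisible; with a million, every one of them would have to be read and instantiated before match() could reject it.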

Intuition isn't always sufficient for anticipating a failure to optimize because the reasons can be entirely different from what a code review would imply. For example, including a console print statement (System.out.println in your Java code, or System.Console.WriteLine in C#) causes the optimizer to fail in the .NET version of db4o, whereas the Java version optimizes the statement away. You can't really anticipate variations of this type (although you can learn about them by experience), so it's always a good idea to Let The System Tell You, as they say in extreme programming.

Simply register a listener (a DiagnosticListener) with the ObjectContainer's diagnostic configuration to inform you if a native query cannot be optimized, as shown in Listing 12:

Listing 12. DiagnosticListener
// ... as before
        ObjectContainer db = null;
        try
        {
            db = Db4o.openFile("persons.data");
            
            db.ext().configure().diagnostic().addListener(new DiagnosticListener() {
                public void onDiagnostic(Diagnostic d) {
                    if (d instanceof NativeQueryNotOptimized)
                    {
                        // could display information here, but for simplicity
                        // let's just fail loudly
                        throw new RuntimeException("Native query failed optimization!");
                    }
                }
            });
        }

Obviously, this would only be desirable during development -- in production it would be preferable to log the failure to a log4j error stream or something similarly unobtrusive to the user.


In conclusion

In this second article in The busy Java developer's guide to db4o, I used the OODBMS notion of identity as a launching point for explaining how db4o stores and retrieves objects, as well as introducing its native query facility.

QBE is the preferred mechanism for simple query situations because it's an easier API to work with, but it does require that your domain objects permit any or all of the fields containing data to be set to null, which may violate some of your domain rules. For example, it would be nice to be able to enforce both a first name and a last name for Person objects. Using Person in a QBE query for just last name, however, mandates that the first name be allowed to be null, which effectively means you have to choose between the domain constraint and the query capability, and giving up either one is not entirely acceptable.

Native queries provide a powerful way to execute complex queries without having to learn a new query language or resort to complicated object structures to model a predicate. And for those situations where db4o's native query facility fails to meet the need, the SODA API (which originally appeared as a standalone query system for any object system, and still lives on SourceForge) allows you to tune the query down to its tiniest detail, at the expense of simplicity.

This multi-faceted approach to querying databases may strike you as complex and confusing and entirely different from how an RDBMS works. In truth, this isn't quite the case: Most large-scale databases translate SQL text into a bytecode format that is analyzed and optimized and then executed against the data stored on disk, assembled back into text, and returned. The db4o native query approach puts the compilation into bytecode back into the hands of the Java (or C#) compiler, thus allowing for type safety and earlier detection of incorrect query syntax. (Type safety is sadly missing from JDBC's approach to accessing SQL, by the way, because it is a call-level interface and is thus restricted to strings that can only be checked at runtime. This is true of any CLI, not just JDBC; ODBC and .NET's ADO.NET suffer from the same limitation.) Optimization is still done inside the database, but instead of text being returned real objects are sent back, ready for use. This is in marked contrast to the SQL/Hibernate or other ORM approach, which Esther Dyson famously described as follows:

Using tables to store objects is like driving your car home and then disassembling it to put it into the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.

Indeed. See you next time.
