The busy Java developer's guide to db4o: Database refactoring with db4o

Why fear change -- db4o doesn't!

Refactoring Java™ code is far simpler than refactoring a relational database, but fortunately that isn't so much the case with object databases. In this installment of The busy Java developer's guide to db4o, Ted Neward introduces you to yet another advantage of his favorite object database: db4o simplifies refactoring to the point where it's almost trivial.


Ted Neward, Principal, Neward & Associates

Ted Neward is the principal of Neward & Associates, where he consults, mentors, teaches, and presents on Java, .NET, XML Services, and other platforms. He resides near Seattle, Washington.



22 May 2007


In the last article in this series, I talked about the difference between querying an RDBMS and querying an object database like db4o. As I showed, db4o offers quite a few more ways to query than your typical relational database can, giving you a range of options for dealing with various application scenarios.

This time, I'll continue that theme -- the many options found in db4o -- with a look at how it handles refactoring. As of version 6.1, db4o automatically recognizes and handles three different kinds of refactoring: adding a field, removing a field, and adding a new interface to a class. I won't cover all three; instead, I'll focus on adding a field and then on a change that db4o needs some help with, changing a class name. Along the way, I'll introduce you to what's most exciting about refactoring with db4o: its introduction of backward- and forward-compatibility to database change management.

As you'll see, db4o's ability to silently roll with updates and ensure consistency from code to disk takes a lot of the stress out of refactoring the persistence portion of your system. That same flexibility also makes db4o a good candidate for inclusion in a test-driven development process.

About this series

Information storage and retrieval has been nearly synonymous with RDBMS for about a decade now, but recently that has begun to change. Java developers in particular are frustrated with the so-called object-relational impedance mismatch, and impatient with the solutions that attempt to resolve it. This, along with the emergence of a viable alternative, has led to a renaissance of interest in object persistence and retrieval. The busy Java developer's guide to db4o introduces db4o, an open source database that leverages today's object-oriented languages, systems, and mindset. See the db4o home page to download db4o now; you'll need it to follow the examples.

Refactoring in the real world

Last month, I talked about querying db4o using both native and QBE-style queries. In that discussion, I suggested readers running the example code should delete the existing database file containing the results from previous runs. This was to avoid "weird" results stemming from the fact that the OODBMS notion of identity isn't the same as that found in relational theory.

That workaround suited my example, but it poses an interesting question in real life. What happens to an OODBMS when the code that defines the objects it stores changes? In an RDBMS, the line between "storage" and "object" is supposedly pretty clear: The RDBMS obeys a relational schema defined by DDL statements executed at some point prior to working with the database. The Java code then either uses handwritten JDBC processing code to map the results of the query into the Java objects, or else the mapping is done "automatically" via a library like Hibernate or the new Java Persistence API (JPA). Either way, the mapping is explicit and has to be modified every time a refactoring takes place.

In theory, there is no difference between theory and practice. But that's true only in theory. Refactoring a relational database and object/relational mapping files should be simple. But in real life, RDBMS refactoring is only clear if the refactoring is purely a Java-code-level issue; in that case, simply changing the mapping is enough to complete the refactoring. If the change is to the relational storage of the data itself, however, then suddenly you're in a whole new world of complexity, so much so that an entire book has been written on the subject. (A book once described by one of my colleagues as "500 pages of database tables, triggers, and views.") Suffice it to say that because a real-world RDBMS frequently contains data that needs to be preserved, just dropping the schema and rebuilding it from DDL statements is not an option.

So now we know what happens to an RDBMS when the Java code defining its objects changes. (Or at least we know what happens to the RDBMS manager, which is a great big headache.) Now let's find out what happens in a db4o database when the code changes.


Setting up the database

If you've read the previous two articles in this series, then my admittedly primitive database is familiar to you. It currently consists of one type, the Person type, whose definition appears in Listing 1:

Listing 1. Person
package com.tedneward.model;

public class Person
{
    public Person()
    { }
    public Person(String firstName, String lastName, int age)
    {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }
    
    public String getFirstName() { return firstName; }
    public void setFirstName(String value) { firstName = value; }
    
    public String getLastName() { return lastName; }
    public void setLastName(String value) { lastName = value; }
    
    public int getAge() { return age; }
    public void setAge(int value) { age = value; }

    public String toString()
    {
        return 
            "[Person: " +
            "firstName = " + firstName + " " +
            "lastName = " + lastName + " " +
            "age = " + age + 
            "]";
    }
    
    public boolean equals(Object rhs)
    {
        if (rhs == this)
            return true;
        
        if (!(rhs instanceof Person))
            return false;
        
        Person other = (Person)rhs;
        return (this.firstName.equals(other.firstName) &&
                this.lastName.equals(other.lastName) &&
                this.age == other.age);
    }
    
    private String firstName;
    private String lastName;
    private int age;
}

Next, I populate the database, as shown in Listing 2:

Listing 2. Database at 't0'
import java.io.*;
import java.lang.reflect.*;
import com.db4o.*;
import com.tedneward.model.*;

// Version 1
public class BuildV1
{
    public static void main(String[] args)
        throws Exception
    {
        new File(".", "persons.data").delete();
        
        ObjectContainer db = null;
        try
        {
            db = Db4o.openFile("persons.data");

            Person brianG = new Person("Brian", "Goetz", 39);
            Person jason = new Person("Jason", "Hunter", 35);
            Person brianS = new Person("Brian", "Sletten", 38);
            Person david = new Person("David", "Geary", 55);
            Person glenn = new Person("Glenn", "Vanderberg", 40);
            Person neal = new Person("Neal", "Ford", 39);
            Person clinton = new Person("Clinton", "Begin", 19);
            
            db.set(brianG);
            db.set(jason);
            db.set(brianS);
            db.set(david);
            db.set(glenn);
            db.set(neal);
            db.set(clinton);

            db.commit();
            
            // Find all the Brians
            ObjectSet brians = db.get(new Person("Brian", null, 0));
            while (brians.hasNext())
                System.out.println(brians.next());            
        }
        finally
        {
            if (db != null)
                db.close(); 
        }
    }
}

Notice that I explicitly deleted the file "persons.data" at the beginning of the code snippet in Listing 2. Doing this ensures that I start with a clean slate. In future versions of the Build application, I'll leave the persons.data file alone to demonstrate the refactoring process. Also note that the Person type will change (this will be the focus of my refactorings), so be sure to familiarize yourself with the version being stored and/or fetched for each example. (Look for comments in each version of Person in the source code for this article, as well as Person.java.svn files in the source tree of the code. These will make the examples easier to follow.)


Refactor me once!

Up until now, things around the old shop have been going pretty well. The company database is full of Persons that can be queried, stored, and used anytime anyone wants them, and basically everyone is happy. But Upper Management has just read the latest best-selling upper management book, called People have feelings too!, and they have decided the database needs to be modified to include the Person's mood.

In a traditional object/relational scenario, this implies two major undertakings: refactoring the code (which I'll discuss below) and refactoring the database schema to include the new data reflecting Persons' mood. Now, Scott Ambler has produced some great resources for RDBMS refactoring (see Resources), but nothing changes the fact that refactoring a relational database is far more complicated than refactoring Java code, particularly when you have to preserve existing production data.

Things are much simpler in an OODBMS, however, because the refactoring takes place entirely in the code, in this case, Java code. It's important to remember that in an OODBMS, the code is the schema. As a result, an OODBMS presents a "single source of truth," so to speak, as opposed to the O/R world where that truth (so called) is encoded in two different places: the database schema and the object model. (Which one "wins" in the event of a conflict is the subject of much debate and angst amongst Java developers.)

Refactoring the database schema

My first step is to create a new type that defines all the moods to track. This is easily done using a Java 5 enumeration type, as shown in Listing 3:

Listing 3. Howyadoin'?
package com.tedneward.model;

public enum Mood
{
    HAPPY, CONTENT, BLAH, CRANKY, DEPRESSED, PSYCHOTIC, WRITING_AN_ARTICLE
}

Second, I need to change the Person code by adding a field and the appropriate property methods to track mood, as shown in Listing 4:

Listing 4. No, howYOUdoin'?
package com.tedneward.model;

// Person v2
public class Person
{
    // ... as before, with appropriate modifications to public constructor and
    // toString() method
    
    public Mood getMood() { return mood; }
    public void setMood(Mood value) { mood = value; }

    private Mood mood;
}
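
Listing 4 elides most of the class, so here is one way the complete Person v2 might look. Treat it as a sketch under assumptions rather than the article's actual source: the four-argument constructor is inferred from the call in Listing 5, the toString() format is inferred from the output in Listing 6, equals() is left out because it is unchanged from Listing 1, and Mood and Person are placed in a single file without the package declaration so the example compiles on its own.

```java
// Sketch only: Mood and Person share one file and omit the article's
// package declaration so that the example is self-contained.
enum Mood
{
    HAPPY, CONTENT, BLAH, CRANKY, DEPRESSED, PSYCHOTIC, WRITING_AN_ARTICLE
}

// Person v2 (assumed shape; equals() from Listing 1 omitted for brevity)
class Person
{
    public Person()
    { }
    public Person(String firstName, String lastName, int age, Mood mood)
    {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
        this.mood = mood;
    }

    public String getFirstName() { return firstName; }
    public void setFirstName(String value) { firstName = value; }

    public String getLastName() { return lastName; }
    public void setLastName(String value) { lastName = value; }

    public int getAge() { return age; }
    public void setAge(int value) { age = value; }

    public Mood getMood() { return mood; }
    public void setMood(Mood value) { mood = value; }

    public String toString()
    {
        return
            "[Person: " +
            "firstName = " + firstName + " " +
            "lastName = " + lastName + " " +
            "age = " + age + " " +
            "mood = " + mood +
            "]";
    }

    private String firstName;
    private String lastName;
    private int age;
    private Mood mood; // the new field; older stored objects have no value for it

    public static void main(String[] args)
    {
        // A query-by-example style template: only firstName is filled in
        System.out.println(new Person("Brian", null, 0, null));
    }
}
```

Passing null for the mood in a template keeps it out of a query-by-example match, in the same way that the null lastName and zero age do.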

Checking in with db4o

Before I do anything else, let's see how db4o would respond to a query that looked for all the Brians in the database right now. In other words, how will db4o react if I run an existing Person-based query against the database when no Mood instances are stored (shown in Listing 5)?

Listing 5. How's everybody doing?
import com.db4o.*;
import com.tedneward.model.*;

// Version 2
public class ReadV2
{
    public static void main(String[] args)
        throws Exception
    {
        // Note the absence of the File.delete() call
        
        ObjectContainer db = null;
        try
        {
            db = Db4o.openFile("persons.data");

            // Find all the Brians
            ObjectSet brians = db.get(new Person("Brian", null, 0, null));
            while (brians.hasNext())
                System.out.println(brians.next());            
        }
        finally
        {
            if (db != null)
                db.close(); 
        }
    }
}

The results are somewhat startling in their passivity, as shown in Listing 6:

Listing 6. db4o takes it in stride
[Person: firstName = Brian lastName = Sletten age = 38 mood = null]
[Person: firstName = Brian lastName = Goetz age = 39 mood = null]

Not only did db4o not choke on the fact that the two definitions of Person (the one on disk and the one in code) weren't identical, it went one step further: it looked at the data on disk, determined that the Person instances there didn't have a mood field, and silently substituted in the default value of null. (Which, by the way, is exactly what the Java Object Serialization API would do in the same situation.)

The most important thing here is that db4o silently handled the mismatch between what it saw on the disk and in the type definition. This turns out to be a pretty consistent theme throughout the db4o refactoring story: Wherever possible, db4o silently deals with version mismatches. It either expands the elements on disk to include added fields, or, if the fields don't exist in the class definition it is working with in the given JVM, it ignores them.
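
The matching behavior described above can be pictured with a toy model in plain Java. This is only an illustration of the idea, not db4o's implementation: the stored object is faked as a Map of field names to values, and the names PersonV2Sketch, FieldMatchSketch, materialize, and the "nickname" entry are all made up for this sketch.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the class as the running JVM sees it
class PersonV2Sketch
{
    String firstName;
    String lastName;
    int age;
    String mood; // added after the data was stored
}

class FieldMatchSketch
{
    // Copies stored values onto a new instance, matching purely by field name.
    // Fields with no stored value keep their Java defaults (null, 0, false);
    // stored values with no matching field are skipped.
    static <T> T materialize(Class<T> type, Map<String, Object> stored)
    {
        try
        {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T instance = ctor.newInstance();
            for (Field field : type.getDeclaredFields())
            {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic())
                    continue;
                if (stored.containsKey(field.getName()))
                {
                    field.setAccessible(true);
                    field.set(instance, stored.get(field.getName()));
                }
            }
            return instance;
        }
        catch (ReflectiveOperationException e)
        {
            throw new IllegalStateException("Could not rebuild " + type.getName(), e);
        }
    }

    public static void main(String[] args)
    {
        // A "V1" record: no mood, plus one value the current class doesn't declare
        Map<String, Object> onDisk = new HashMap<String, Object>();
        onDisk.put("firstName", "Brian");
        onDisk.put("lastName", "Goetz");
        onDisk.put("age", 39);
        onDisk.put("nickname", "bg");

        PersonV2Sketch p = materialize(PersonV2Sketch.class, onDisk);
        System.out.println(p.firstName + " " + p.lastName + " " + p.age + " mood = " + p.mood);
    }
}
```

Matching by name in both directions is what lets old code read new data and new code read old data without either side failing.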


Code-to-disk compatibility

This idea that db4o somehow adjusts as necessary to missing or extraneous fields on disk deserves exploration, so let's see what happens when I update the data on disk to include mood, as shown in Listing 7:

Listing 7. We're alright
import com.db4o.*;
import com.tedneward.model.*;

// Version 2
public class BuildV2
{
    public static void main(String[] args)
        throws Exception
    {
        ObjectContainer db = null;
        try
        {
            db = Db4o.openFile("persons.data");

            // Find all the Persons, and give them moods
            ObjectSet people = db.get(Person.class);
            while (people.hasNext())
            {
                Person person = (Person)people.next();
                
                System.out.print("Setting " + person.getFirstName() + "'s mood ");
                int moodVal = (int)(Math.random() * Mood.values().length);
                person.setMood(Mood.values()[moodVal]);
                System.out.println("to " + person.getMood());
                db.set(person);
            }
            
            db.commit();            
        }
        finally
        {
            if (db != null)
                db.close(); 
        }
    }
}

In Listing 7, I've found all the Persons in the database and randomly assigned them Moods. In a more real-world application, I would likely be working with a baseline set of data rather than a randomly chosen one, but this works for the example. Running the code produces the output shown in Listing 8:

Listing 8. How's everybody feeling today?
Setting Brian's mood to BLAH
Setting David's mood to WRITING_AN_ARTICLE
Setting Brian's mood to CONTENT
Setting Jason's mood to PSYCHOTIC
Setting Glenn's mood to BLAH
Setting Neal's mood to HAPPY
Setting Clinton's mood to DEPRESSED

You can verify this output by running ReadV2 again. Better yet, you could run the original query version, ReadV1 (which looks just like ReadV2 except that it was compiled against the V1 version of Person). When you do so, it produces the following:

Listing 9. The old version of 'How's everybody feeling today?'
[Person: firstName = Brian lastName = Sletten age = 38]
[Person: firstName = Brian lastName = Goetz age = 39]

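What's remarkable about the output in Listing 9 is that it's no different from what the V1 code printed before I added the Mood extension to the Person class: code compiled against the old Person simply ignores the mood data that is now on disk. Together with Listing 6, where the new Person read data written by the old one, this means db4o is both backward- and forward-compatible.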


Refactor me twice!

Suppose you want to change the type of a field in an existing class, for example by changing Person's age from an integer type to a short type? (People don't generally live beyond 32,000 years, after all -- and I think it's safe to suggest that if that does ever become a concern, you'll be able to refactor the code back to an integer field.) Assuming the two types are similar in nature, such as int-to-short or float-to-double, db4o just silently rolls with the changes -- once again more or less emulating the Java Object Serialization API. The downside of this sort of operation is that db4o could accidentally truncate a value. This would only happen in a "narrowing conversion," where a stored value exceeds the range allowed by the new type, such as when converting from long to int (or from int to short, as in this example). Caveat emptor -- and be sure to unit-test thoroughly during development or prototyping.
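
To see what truncation looks like, the following sketch uses ordinary Java casts. It shows how the language itself narrows values, not how db4o converts stored fields, and the helper toShortChecked is a hypothetical name introduced here as one way to fail loudly instead of silently losing data.

```java
class NarrowingSketch
{
    // Hypothetical guard: refuses to narrow a value that does not fit in a short
    static short toShortChecked(int value)
    {
        if (value < Short.MIN_VALUE || value > Short.MAX_VALUE)
            throw new ArithmeticException("Value out of short range: " + value);
        return (short) value;
    }

    public static void main(String[] args)
    {
        int age = 39;
        System.out.println((short) age);      // fits, so the value is unchanged: 39

        int tooBig = 40000;                   // larger than Short.MAX_VALUE (32767)
        System.out.println((short) tooBig);   // high-order bits dropped: -25536

        long huge = 5000000000L;              // larger than Integer.MAX_VALUE
        System.out.println((int) huge);       // high-order bits dropped: 705032704

        System.out.println(toShortChecked(age)); // 39
    }
}
```

A unit test that runs a check of this kind over the existing data before the field type is changed is one way to catch out-of-range values early.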

db4o saves old data?

If you remove a field and then add the field back later, db4o is actually able to pick up the original values that were stored back when the field was originally present. No, db4o doesn't silently track all removed field values forever -- it removes them when the database is asked to conduct a so-called defragment operation. Stay tuned for more on defragmenting in a future article in this series.

Actually, db4o's knack for backward-compatibility deserves a bit more explanation. Basically, when db4o sees the field of the new type, it creates a new field on disk with the same name but a new type, just as if it were any other new field added to the class. This also means that the old values are still present in the field of the old type. So, once again, you can always "call back" old values by refactoring the field back to its original type, which can either be viewed as a feature or a bug, depending on your point of view at the time.

Note that method changes to the class are irrelevant to db4o because it doesn't store methods or method implementations as part of the stored object data, and ditto for constructor refactorings. Only fields and the name of the class itself (discussed next) are of any importance to db4o.


Third time's ... tricky

In some cases, the refactoring that needs to happen is a bit more drastic, such as changing the name of a class entirely (meaning either the class name or the package it lives in). Something like this is a drastic change to db4o because it stores objects in a manner that keys off the class name. When db4o is looking for instances of Person, for example, it looks in specific areas at blocks that are tagged with the name com.tedneward.model.Person. So, changing the name effectively puts db4o in an untenable situation: it can't magically infer that com.tedneward.model.Person is now com.tedneward.persons.model.Individual. Fortunately, there are a couple of ways to teach db4o how to manage the transition.

Changing the names on disk

One way you can ease db4o into such a dramatic change is to write a refactoring tool of your own, using the db4o Refactoring API to open the existing data file and change the name on disk. You can do this with a pretty simple set of calls, as shown in Listing 10:

Listing 10. Refactoring from Person to Individual
import com.db4o.*;
import com.db4o.config.*;

// ...

Db4o.configure().objectClass("com.tedneward.model.Person")
    .rename("com.tedneward.persons.model.Individual");

Notice that the code in Listing 10 uses the db4o Configuration API to get hold of a configuration object, which in turn is used as a sort of "meta-control" over most of db4o's options -- you will use this API rather than command-line flags or configuration files to set particular settings at run time. (Though there's nothing stopping you from creating your own command-line flags or configuration files to drive Configuration API calls.) The Configuration object is then used to obtain the ObjectClass instance for the Person class ... or, to be more precise, the ObjectClass instance representing the stored Person instances on disk. ObjectClass contains a number of other options as well, some of which I'll show you later in the series.

Using an alias

In some cases, the data on disk has to remain in place to support earlier applications that cannot be recompiled for whatever reasons, technical or political. In these cases, the V2 application has to somehow accommodate pulling V1 instances in and turn them into V2 instances in memory. Fortunately, you can rely on db4o's alias feature to create a shuffle step while storing and retrieving objects to/from disk. This allows you to vary the types stored from the types used in memory.

db4o supports three different kinds of aliases, one of which is only useful when sharing data files between the .NET and Java flavors of db4o. The alias at work in Listing 11 is TypeAlias, which effectively tells db4o to swap out an "A" type in memory (the runtime name) for a "B" type on disk (the stored name). Enabling this is a two-line operation.

Listing 11. The TypeAlias shuffle
import com.db4o.config.*;

// ...

TypeAlias fromPersonToIndividual = 
    new TypeAlias("com.tedneward.model.Person", "com.tedneward.persons.model.Individual");
Db4o.configure().addAlias(fromPersonToIndividual);

With this alias in place, db4o will recognize any call to query Individual objects from the database as a request to instead look across stored Person instances; this means that the Individual class should have fields of a similar name and type to those stored in Person, which db4o will map appropriately. Individual instances will then be stored under the Person name.
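
Because it is easy to mix up which name lives on disk and which lives in memory, here is a toy model of the two-way lookup an alias performs. It is a plain-Java sketch, not db4o's TypeAlias class: TypeAliasSketch, toStoredName, and toRuntimeName are hypothetical names, and the only thing borrowed from Listing 11 is the argument order (stored name first, runtime name second).

```java
class TypeAliasSketch
{
    private final String storedName;  // the class name recorded on disk
    private final String runtimeName; // the class name the application uses

    TypeAliasSketch(String storedName, String runtimeName)
    {
        this.storedName = storedName;
        this.runtimeName = runtimeName;
    }

    // Which name to look for on disk when the application asks for a type.
    // Names the alias doesn't cover pass through unchanged.
    String toStoredName(String runtimeType)
    {
        return runtimeName.equals(runtimeType) ? storedName : runtimeType;
    }

    // Which class to instantiate in memory for a name found on disk
    String toRuntimeName(String storedType)
    {
        return storedName.equals(storedType) ? runtimeName : storedType;
    }

    public static void main(String[] args)
    {
        TypeAliasSketch alias = new TypeAliasSketch(
            "com.tedneward.model.Person",
            "com.tedneward.persons.model.Individual");

        // A query for Individual is answered from the stored Person data
        System.out.println(alias.toStoredName("com.tedneward.persons.model.Individual"));

        // Stored Person data comes back as Individual instances
        System.out.println(alias.toRuntimeName("com.tedneward.model.Person"));
    }
}
```

The file itself never changes under this scheme, which is why applications compiled against the original Person class can keep using it.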

More ways to refactor

I haven't covered all the ways db4o supports refactoring, which means there's lots more to learn. Even if you find that db4o's refactoring options don't quite handle your situation, there is always the old fallback option, which is to create the new class in the desired location using a temporary name, write some code to create objects of the new class from the old ones, and then delete the old objects and rename the temporary class back to its correct name. If you're impatiently curious about this, see "Refactoring and meta-information" in the Advanced Type Handling section of db4o's doc\reference directory.

In conclusion

Every refactoring example in this article was made much simpler by the fact that the schema in an object database is the class definition itself, not a stand-alone DDL definition in a different language. Refactoring in db4o is an exercise in code, which can often be established easily through a configuration call, or at worst by writing and running a conversion utility to upgrade existing instances from the old type to the new one. This type of conversion is necessary for almost all RDBMS refactorings in production as well.

db4o's powerful refactoring capability makes it useful during development, when the rich domain objects being designed are still undergoing a lot of churn and you are refactoring on a daily, if not hourly, basis. Using db4o for unit testing and test-driven development can save you a great deal of time mucking around in the database, particularly if the refactorings are simple field addition/removal or type/name changes.

That's all for now, but remember this: If you're going to write with objects, and persistence truly is "just an implementation issue," then why would you look to flatten perfectly good objects into flat squares if you don't have to?


Download

Description    Name                  Size
Sample code    j-db4o3-source.zip    28KB

Resources

Learn

Get products and technologies

  • Download db4o: An open source object database native to Java and .NET.

Discuss
