In the last article in this series, I talked about the difference between querying an RDBMS and querying an object database like db4o. As I showed, db4o offers quite a few more ways to query than your typical relational database can, giving you a range of options for dealing with various application scenarios.
This time, I'll continue that theme -- the many options found in db4o -- with a look at how it handles refactoring. As of version 6.1, db4o automatically recognizes and handles three different kinds of refactoring: adding a field, removing a field, and adding a new interface to a class. I won't cover all three; instead, I'll focus on adding a field and then look at a change db4o can't handle on its own, changing a class name. Along the way, I'll introduce you to what's most exciting about refactoring with db4o: the backward- and forward-compatibility it brings to database change management.
As you'll see, db4o's ability to silently roll with updates and ensure consistency from code to disk takes a lot of the stress out of refactoring the persistence portion of your system. That same flexibility also makes db4o a good candidate for inclusion in a test-driven development process.
Last month, I talked about querying db4o using both native and QBE-style queries. In that discussion, I suggested readers running the example code should delete the existing database file containing the results from previous runs. This was to avoid "weird" results stemming from the fact that the OODBMS notion of identity isn't the same as that found in relational theory.
That workaround suited my example, but it poses an interesting question in real life. What happens to an OODBMS when the code that defines the objects it stores changes? In an RDBMS, the line between "storage" and "object" is supposedly pretty clear: The RDBMS obeys a relational schema defined by DDL statements executed at some point prior to working with the database. The Java code then either uses handwritten JDBC processing code to map the results of the query into the Java objects, or else the mapping is done "automatically" via a library like Hibernate or the new Java Persistence API (JPA). Either way, the mapping is explicit and has to be modified every time a refactoring takes place.
In theory, there is no difference between theory and practice. But that's true only in theory. Refactoring a relational database and object/relational mapping files should be simple. But in real life, RDBMS refactoring is only clear if the refactoring is purely a Java-code-level issue; in that case, simply changing the mapping is enough to complete the refactoring. If the change is to the relational storage of the data itself, however, then suddenly you're in a whole new world of complexity, so much so that an entire book has been written on the subject. (A book once described by one of my colleagues as "500 pages of database tables, triggers, and views.") Suffice it to say that because a real-world RDBMS frequently contains data that needs to be preserved, just dropping the schema and rebuilding it from DDL statements is not an option.
So now we know what happens to an RDBMS when the Java code defining its objects changes. (Or at least we know what happens to the RDBMS manager, which is a great big headache.) Now let's find out what happens in a db4o database when the code changes.
If you've read the previous two articles in this series, then my
admittedly primitive database is familiar to you. It currently consists
of one type, the Person type, whose definition
appears in Listing 1:
Listing 1. Person
package com.tedneward.model;

public class Person
{
    public Person()
    { }
    public Person(String firstName, String lastName, int age)
    {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }

    public String getFirstName() { return firstName; }
    public void setFirstName(String value) { firstName = value; }

    public String getLastName() { return lastName; }
    public void setLastName(String value) { lastName = value; }

    public int getAge() { return age; }
    public void setAge(int value) { age = value; }

    public String toString()
    {
        return
            "[Person: " +
            "firstName = " + firstName + " " +
            "lastName = " + lastName + " " +
            "age = " + age +
            "]";
    }

    public boolean equals(Object rhs)
    {
        if (rhs == this)
            return true;

        if (!(rhs instanceof Person))
            return false;

        Person other = (Person)rhs;
        return (this.firstName.equals(other.firstName) &&
                this.lastName.equals(other.lastName) &&
                this.age == other.age);
    }

    private String firstName;
    private String lastName;
    private int age;
}
Next, I populate the database, as shown in Listing 2:
Listing 2. Database at 't0'
import java.io.*;
import java.lang.reflect.*;
import com.db4o.*;
import com.tedneward.model.*;

// Version 1
public class BuildV1
{
    public static void main(String[] args)
        throws Exception
    {
        new File(".", "persons.data").delete();

        ObjectContainer db = null;
        try
        {
            db = Db4o.openFile("persons.data");

            Person brianG = new Person("Brian", "Goetz", 39);
            Person jason = new Person("Jason", "Hunter", 35);
            Person brianS = new Person("Brian", "Sletten", 38);
            Person david = new Person("David", "Geary", 55);
            Person glenn = new Person("Glenn", "Vanderberg", 40);
            Person neal = new Person("Neal", "Ford", 39);
            Person clinton = new Person("Clinton", "Begin", 19);

            db.set(brianG);
            db.set(jason);
            db.set(brianS);
            db.set(david);
            db.set(glenn);
            db.set(neal);
            db.set(clinton);

            db.commit();

            // Find all the Brians
            ObjectSet brians = db.get(new Person("Brian", null, 0));
            while (brians.hasNext())
                System.out.println(brians.next());
        }
        finally
        {
            if (db != null)
                db.close();
        }
    }
}
Notice that I explicitly deleted the file "persons.data" at the beginning
of the code snippet in Listing 2. Doing this ensures that I start with a
clean slate. In future versions of the Build application, I'll leave the
persons.data file alone to demonstrate the refactoring process. Also
note that the Person type will change (this will
be the focus of my refactorings), so be sure to
familiarize yourself with the version being stored and/or fetched for each
example. (Look for comments in each version of Person in the source code for this article, as well as the Person.java.svn files in the source tree
of the code. These will make the examples easier to follow.)
Up until now, things around the old shop have been going pretty well. The
company database is full of Persons that can be
queried, stored, and used anytime anyone wants them, and basically everyone
is happy. But Upper Management has just read the latest best-selling upper
management book, called People have feelings too!, and they have
decided the database needs to be modified to include the Person's mood.
In a traditional object/relational scenario, this implies two major
undertakings: refactoring the code (which I'll discuss below) and
refactoring the database schema to include the new data reflecting Persons' mood. Now, Scott Ambler has produced some
great resources for RDBMS refactoring (see Resources), but nothing changes the fact that
refactoring a relational database is far more complicated than refactoring
Java code, particularly when you have to preserve existing production
data.
Things are much simpler in an OODBMS, however, because the refactoring takes place entirely in the code, in this case, Java code. It's important to remember that in an OODBMS, the code is the schema. As a result, an OODBMS presents a "single source of truth," so to speak, as opposed to the O/R world where that truth (so called) is encoded in two different places: the database schema and the object model. (Which one "wins" in the event of a conflict is the subject of much debate and angst amongst Java developers.)
Refactoring the database schema
My first step is to create a new type that defines all the moods to track. This is easily done using a Java 5 enumeration type, as shown in Listing 3:
Listing 3. Howyadoin'?
package com.tedneward.model;

public enum Mood
{
    HAPPY, CONTENT, BLAH, CRANKY, DEPRESSED, PSYCHOTIC, WRITING_AN_ARTICLE
}
Second, I need to change the Person code by
adding a field and the appropriate property methods to track mood, as shown
in Listing 4:
Listing 4. No, howYOUdoin'?
package com.tedneward.model;

// Person v2
public class Person
{
    // ... as before, with appropriate modifications to public constructor and
    // toString() method

    public Mood getMood() { return mood; }
    public void setMood(Mood value) { mood = value; }

    private Mood mood;
}
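Listing 4 leaves out the constructor and toString() changes. As a rough sketch only, here is what a complete V2 class might look like, assuming the four-argument constructor that Listing 5 calls and the "mood = ..." output format that Listing 6 prints. To keep the sketch self-contained I have dropped the package statement and nested both types inside a hypothetical PersonV2Sketch wrapper class; neither choice comes from the article's source code.

```java
// Sketch of Person v2. Anything not shown in Listing 4 is an assumption.
public class PersonV2Sketch {
    enum Mood { HAPPY, CONTENT, BLAH, CRANKY, DEPRESSED, PSYCHOTIC, WRITING_AN_ARTICLE }

    static class Person {
        private String firstName;
        private String lastName;
        private int age;
        private Mood mood; // new in v2; stays null for objects stored by v1 code

        public Person() { }

        // Assumed signature, inferred from the call in Listing 5.
        public Person(String firstName, String lastName, int age, Mood mood) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
            this.mood = mood;
        }

        public Mood getMood() { return mood; }
        public void setMood(Mood value) { mood = value; }

        // Assumed format, inferred from the output in Listing 6.
        @Override
        public String toString() {
            return "[Person: " +
                "firstName = " + firstName + " " +
                "lastName = " + lastName + " " +
                "age = " + age + " " +
                "mood = " + mood +
                "]";
        }
    }

    public static void main(String[] args) {
        // Prints: [Person: firstName = Brian lastName = Goetz age = 39 mood = null]
        System.out.println(new Person("Brian", "Goetz", 39, null));
    }
}
```

The only structural change from V1 is the extra field and its accessors; the rest of the class is untouched, which is the point of the exercise.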
Before I do anything else, let's see how db4o would respond to a
query that looked for all the Brians in the
database right now. In other words, how will db4o react if I run an
existing Person-based query against the database
when no Mood instances are stored (shown in Listing 5)?
Listing 5. How's everybody doing?
import com.db4o.*;
import com.tedneward.model.*;

// Version 2
public class ReadV2
{
    public static void main(String[] args)
        throws Exception
    {
        // Note the absence of the File.delete() call

        ObjectContainer db = null;
        try
        {
            db = Db4o.openFile("persons.data");

            // Find all the Brians
            ObjectSet brians = db.get(new Person("Brian", null, 0, null));
            while (brians.hasNext())
                System.out.println(brians.next());
        }
        finally
        {
            if (db != null)
                db.close();
        }
    }
}
The results are somewhat startling in their passivity, as shown in Listing 6:
Listing 6. db4o takes it in stride
[Person: firstName = Brian lastName = Sletten age = 38 mood = null]
[Person: firstName = Brian lastName = Goetz age = 39 mood = null]
Not only did db4o not choke on the fact that the two definitions
of Person (the one on disk and the one in code)
weren't identical, it went one step further: it looked at the data on disk,
determined that the Person instances there didn't
have a mood field, and silently substituted in the default value of
null. (Which, by the way, is exactly what the Java Object
Serialization API would do in the same situation.)
The most important thing here is that db4o silently handled the mismatch between what it saw on the disk and in the type definition. This turns out to be a pretty consistent theme throughout the db4o refactoring story: Wherever possible, db4o silently deals with version mismatches. It either expands the elements on disk to include added fields, or, if the fields don't exist in the class definition it is working with in the given JVM, it ignores them.
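To make the idea concrete, here is a toy sketch of that tolerant-read behavior. It is emphatically not how db4o is implemented; the hydrate() helper, the Map standing in for a stored record, and the PersonV2 class are all hypothetical names I made up for illustration. The sketch copies whatever stored values match a field in the current class, leaves unmatched fields at their Java defaults, and ignores stored values the class no longer declares.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Toy illustration only: NOT db4o's internal mechanism.
public class TolerantReadSketch {
    // Stand-in for the class definition loaded in the current JVM.
    static class PersonV2 {
        String firstName;
        int age;
        String mood; // older stored records have no value for this field
    }

    // Builds an instance from a stored record (field name -> value).
    // Fields missing from the record keep their defaults (null, 0);
    // record entries with no matching field are silently skipped.
    static <T> T hydrate(Class<T> type, Map<String, Object> stored) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T instance = ctor.newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (field.isSynthetic() || !stored.containsKey(field.getName()))
                    continue;
                field.setAccessible(true);
                field.set(instance, stored.get(field.getName()));
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not hydrate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> oldRecord = new HashMap<>();
        oldRecord.put("firstName", "Brian");
        oldRecord.put("age", 39);
        oldRecord.put("shoeSize", 11); // no longer declared by the class

        PersonV2 p = hydrate(PersonV2.class, oldRecord);
        System.out.println(p.firstName + " " + p.age + " " + p.mood); // prints: Brian 39 null
    }
}
```

Matching by field name is the design choice that matters here: neither side has to know the other's version number, so old code and new code can both read the same record.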
This idea that db4o somehow adjusts as necessary to missing or extraneous fields on disk deserves exploration, so let's see what happens when I update the data on disk to include mood, as shown in Listing 7:
Listing 7. We're alright
import com.db4o.*;
import com.tedneward.model.*;

// Version 2
public class BuildV2
{
    public static void main(String[] args)
        throws Exception
    {
        ObjectContainer db = null;
        try
        {
            db = Db4o.openFile("persons.data");

            // Find all the Persons, and give them moods
            ObjectSet people = db.get(Person.class);
            while (people.hasNext())
            {
                Person person = (Person)people.next();

                System.out.print("Setting " + person.getFirstName() + "'s mood ");
                int moodVal = (int)(Math.random() * Mood.values().length);
                person.setMood(Mood.values()[moodVal]);
                System.out.println("to " + person.getMood());

                db.set(person);
            }

            db.commit();
        }
        finally
        {
            if (db != null)
                db.close();
        }
    }
}
In Listing 7, I've found all the Persons in the database and randomly assigned them
Moods. In a more real-world application, I would
likely be working with a baseline set of data rather than a randomly chosen
one, but this works for the example. Running the code produces the output shown in Listing
8:
Listing 8. How's everybody feeling today?
Setting Brian's mood to BLAH
Setting David's mood to WRITING_AN_ARTICLE
Setting Brian's mood to CONTENT
Setting Jason's mood to PSYCHOTIC
Setting Glenn's mood to BLAH
Setting Neal's mood to HAPPY
Setting Clinton's mood to DEPRESSED
You can verify this output by running ReadV2
again. Better yet, you could run the original query version, ReadV1 (which looks just like ReadV2 except that it
was compiled against the V1 version of Person).
When you do so, it produces the following:
Listing 9. The old version of 'How's everybody feeling today?'
[Person: firstName = Brian lastName = Sletten age = 38]
[Person: firstName = Brian lastName = Goetz age = 39]
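What's remarkable about the output in Listing 9 is that it's no different
from what db4o returned before I added the Mood
extension to the Person class (compare Listing 6, which differs only in the
mood value printed by the V2 toString() method). The V1 code simply ignores
the mood data now stored on disk, which means db4o is both backward- and
forward-compatible.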
What if you want to change the type of a field in an existing class, for
example by changing Person's age from an int
to a short? (People don't generally live beyond 32,000 years,
after all -- and I think it's safe to suggest that if that ever does
become a concern, you'll be able to refactor the code back to an int
field.) Assuming the two types are similar in nature, such as int-to-short
or float-to-double, db4o just silently rolls with the changes -- once
again more or less emulating the Java Object Serialization API. The downside
of this sort of operation is that db4o could accidentally truncate a value.
This would only happen in a narrowing conversion, where a stored
value exceeds the range allowed by the new type, such as
when converting from long to int. Caveat emptor -- and
be sure to unit-test thoroughly during development or prototyping.
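If the risk of a narrowing conversion seems abstract, the following sketch shows what plain Java does when an int that doesn't fit is cast to a short. It demonstrates the language's own casting rules, not db4o's conversion code, and the fitsInShort() helper is a hypothetical name for the kind of check a unit test might run before such a refactoring.

```java
// Demonstrates Java's narrowing cast; says nothing about db4o internals.
public class NarrowingSketch {
    // Returns true when the value can be stored in a short without loss.
    static boolean fitsInShort(int value) {
        return value >= Short.MIN_VALUE && value <= Short.MAX_VALUE;
    }

    public static void main(String[] args) {
        int plausibleAge = 39;
        int implausibleAge = 40000;

        System.out.println((short) plausibleAge);        // prints 39
        System.out.println((short) implausibleAge);      // prints -25536
        System.out.println(fitsInShort(plausibleAge));   // prints true
        System.out.println(fitsInShort(implausibleAge)); // prints false
    }
}
```

A range check like this over the existing data is cheap insurance before you narrow a field's type.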
Actually, db4o's knack for backward-compatibility deserves a bit more explanation. Basically, when db4o sees the field of the new type, it creates a new field on disk with the same name but a new type, just as if it were any other new field added to the class. This also means that the old values are still present in the field of the old type. So you can always "call back" the old values by refactoring the field back to its original type, which can be viewed as either a feature or a bug, depending on your point of view at the time.
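One way to picture this is as a record whose slots are keyed by both field name and field type. The toy model below is my own illustration of that idea, not db4o's on-disk format; the TypedSlotSketch class and its read() and write() methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: slots are identified by field name plus field type,
// so changing a field's type adds a slot instead of overwriting one.
public class TypedSlotSketch {
    private final Map<String, Object> slots = new HashMap<>();

    private static String key(String fieldName, Class<?> type) {
        return fieldName + ":" + type.getName();
    }

    void write(String fieldName, Class<?> type, Object value) {
        slots.put(key(fieldName, type), value);
    }

    // Returns null when nothing was stored under this name/type pair.
    Object read(String fieldName, Class<?> type) {
        return slots.get(key(fieldName, type));
    }

    public static void main(String[] args) {
        TypedSlotSketch record = new TypedSlotSketch();
        record.write("age", int.class, 39);           // written by the old class
        record.write("age", short.class, (short) 40); // written after the type change

        System.out.println(record.read("age", int.class));   // prints 39
        System.out.println(record.read("age", short.class)); // prints 40
    }
}
```

Under this model, switching the field back to int makes the old value visible again, which matches the "call back" behavior described above.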
Note that method changes to the class are irrelevant to db4o because it doesn't store methods or method implementations as part of the stored object data, and ditto for constructor refactorings. Only fields and the name of the class itself (discussed next) are of any importance to db4o.
In some cases, the refactoring that needs to happen is a bit more
drastic, such as changing the name of a class entirely (meaning either the
class name or the package it lives in). Something like this is a drastic
change to db4o because it stores objects in a manner that keys off the class name. When db4o is looking for instances of Person, for example, it looks in specific areas at
blocks that are tagged with the name com.tedneward.model.Person. So changing the name
puts db4o in an untenable situation: it can't magically infer
that com.tedneward.model.Person is now com.tedneward.persons.model.Individual. Fortunately,
there are a couple of ways to teach db4o how to manage the transition.
One way you can ease db4o into such a dramatic change is to write a refactoring tool of your own, using the db4o Refactoring API to open the existing data file and change the name on disk. You can do this with a pretty simple set of calls, as shown in Listing 10:
Listing 10. Refactoring from Person to Individual
import com.db4o.*;
import com.db4o.config.*;

// ...

Db4o.configure().objectClass("com.tedneward.model.Person")
    .rename("com.tedneward.persons.model.Individual");
Notice that the code in Listing 10 uses the db4o Configuration API to get
hold of a configuration object, which in turn is used as a sort of
"meta-control" over most of db4o's options -- you will use this API rather than command-line flags or configuration files to set particular settings at
run time. (Though there's nothing stopping you from creating your
own command-line flags or configuration files to drive Configuration API
calls.) The Configuration object is then used to
obtain the ObjectClass instance for the Person class ... or, to be more precise, the ObjectClass instance representing the stored Person instances on disk. ObjectClass contains a number of other options as well,
some of which I'll show you later in the series.
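As a sketch of what "creating your own command-line flags" might look like, the class below parses a made-up --rename=old:new flag into pairs of class names. The flag syntax and the RenameFlagsSketch class are my own invention, and the db4o call appears only in a comment so that the example compiles without db4o on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Parses a hypothetical flag of the form --rename=old.ClassName:new.ClassName.
public class RenameFlagsSketch {
    private static final String PREFIX = "--rename=";

    // Returns old-name -> new-name pairs in the order given; other args are ignored.
    static Map<String, String> parseRenames(String[] args) {
        Map<String, String> renames = new LinkedHashMap<>();
        for (String arg : args) {
            if (!arg.startsWith(PREFIX))
                continue;
            String[] pair = arg.substring(PREFIX.length()).split(":", 2);
            if (pair.length != 2 || pair[0].isEmpty() || pair[1].isEmpty())
                throw new IllegalArgumentException("Expected --rename=old:new but got " + arg);
            renames.put(pair[0], pair[1]);
        }
        return renames;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : parseRenames(args).entrySet()) {
            // With db4o available, each pair would drive the call from Listing 10:
            // Db4o.configure().objectClass(e.getKey()).rename(e.getValue());
            System.out.println("rename " + e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Run with --rename=com.tedneward.model.Person:com.tedneward.persons.model.Individual, the sketch prints the pair it would hand to the Configuration API.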
In some cases, the data on disk has to remain in place to support earlier applications that cannot be recompiled for whatever reasons, technical or political. In these cases, the V2 application has to somehow accommodate pulling V1 instances in and turning them into V2 instances in memory. Fortunately, you can rely on db4o's alias feature to create a shuffle step while storing and retrieving objects to/from disk. This allows the types stored on disk to differ from the types used in memory.
db4o supports three different kinds of aliases, one of which is only
useful when sharing data files between the .NET and Java flavors of db4o.
The alias at work in Listing 11 is TypeAlias, which
effectively tells db4o to swap out an "A" type in memory (the runtime
name) for a "B" type on disk (the stored name). Enabling this is
a two-line operation.
Listing 11. The TypeAlias shuffle
import com.db4o.*;
import com.db4o.config.*;

// ...

// First argument: the stored name on disk; second: the runtime name in memory
TypeAlias fromPersonToIndividual =
    new TypeAlias("com.tedneward.model.Person", "com.tedneward.persons.model.Individual");
Db4o.configure().addAlias(fromPersonToIndividual);
With the alias in place, db4o treats any request to query
Individual objects from the database as a request to
look across the stored Person instances instead. This means
that the Individual class should have fields
similar in name and type to those stored in Person,
which db4o will map appropriately. Individual instances will then be stored under the Person name.
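Conceptually, an alias is just a two-way name translation applied at the boundary between memory and disk. The sketch below models that translation with two maps; it illustrates the idea rather than db4o's TypeAlias implementation, and the AliasSketch class and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy two-way name table; not db4o's TypeAlias implementation.
public class AliasSketch {
    private final Map<String, String> storedToRuntime = new HashMap<>();
    private final Map<String, String> runtimeToStored = new HashMap<>();

    void addAlias(String storedName, String runtimeName) {
        storedToRuntime.put(storedName, runtimeName);
        runtimeToStored.put(runtimeName, storedName);
    }

    // Name to look for on disk when the application asks for runtimeName.
    String storedNameFor(String runtimeName) {
        String stored = runtimeToStored.get(runtimeName);
        return stored != null ? stored : runtimeName;
    }

    // Class to instantiate in memory for data stored under storedName.
    String runtimeNameFor(String storedName) {
        String runtime = storedToRuntime.get(storedName);
        return runtime != null ? runtime : storedName;
    }

    public static void main(String[] args) {
        AliasSketch aliases = new AliasSketch();
        aliases.addAlias("com.tedneward.model.Person",
                         "com.tedneward.persons.model.Individual");

        // prints com.tedneward.model.Person
        System.out.println(aliases.storedNameFor("com.tedneward.persons.model.Individual"));
        // prints com.tedneward.persons.model.Individual
        System.out.println(aliases.runtimeNameFor("com.tedneward.model.Person"));
    }
}
```

Because nothing on disk is rewritten, older applications that still know the class as Person keep working against the same file.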
Every refactoring example in this article was made much simpler by the fact that the schema in an object database is the class definition itself, not a stand-alone DDL definition in a different language. Refactoring in db4o is an exercise in code: often a single configuration call is enough, and at worst you write and run a conversion utility to upgrade existing instances from the old type to the new one. That kind of conversion is necessary for almost all RDBMS refactorings in production as well.
db4o's powerful refactoring capability makes it useful during development, when the rich domain objects being designed are still undergoing a lot of churn and you are refactoring on a daily, if not hourly, basis. Using db4o for unit testing and test-driven development can save you a great deal of time mucking around in the database, particularly if the refactorings are simple field addition/removal or type/name changes.
That's all for now, but remember this: If you're going to write with objects, and persistence truly is "just an implementation issue," then why would you look to flatten perfectly good objects into flat squares if you don't have to?
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample code | j-db4o3-source.zip | 28KB | HTTP |
Resources
Learn
- "The busy Java developer's guide to db4o: Introduction and overview" (Ted Neward, developerWorks, March 2007): Introduces db4o and explains why it has become an important alternative to today's relational databases.
- "The busy Java developer's guide to db4o: Queries, updates, and identity" (Ted Neward, developerWorks, March 2007): Explores db4o's various mechanisms for finding and retrieving data.
- Refactoring Databases: Evolutionary Database Design (Scott Ambler and Pramod J. Sadalage; Addison-Wesley Signature Series, 2006): A 500-page tome on database refactoring.
- Book review -- Refactoring Databases: Evolutionary Database Design (Eric Naiburg, developerWorks, September 2006): A positive review from the Rational Edge.
- The db4o home page: Learn more about db4o.
- New to IBM Information Management: Still not sold on OODBMS? Get more information about IBM's powerful family of relational database management system (RDBMS) servers.
- ODBMS.org: An excellent collection of free material on object database technology.
- The developerWorks Java technology zone: Hundreds of articles about every aspect of Java programming.
Get products and technologies
- Download db4o: An open source native Java programming and .NET database.
Discuss
- developerWorks blogs: Get involved in the developerWorks community.