When you are designing distributed applications, you must consider availability and performance. A common solution is to include a data store on the client system. Typically, the client will require a lightweight data store as a result of limited resources. This approach poses a challenge for data synchronization between heterogeneous data stores. One resolution to this problem is a Java-based approach using JDBC and SyncML standards for heterogeneous database replication, but first a little background.
Note: This article assumes you are familiar with server-side Java technologies and are comfortable installing software and setting up without explicit instruction.
Replication is the process of duplicating either all or a portion of a database between two environments. To maintain consistency, changes made to the source database are propagated to the replicated database.
Replication may be one-way or two-way. Two-way replication can be much more difficult because changes made to each database may result in conflicting data. When these changes are propagated between the two databases, a strategy is required to reconcile the differences in order to maintain consistency.
As a basic requirement of replication, we need to uniquely identify individual data units to be replicated. For two-way replication, we also need a mapping scheme to identify corresponding units of data between the databases.
Various schemes can be followed based on the replication requirements. For one-way replication, the master database can dictate the ID generation for data units. For two-way replication, a mapping scheme must be defined based on the ID generation schemes of the application. The ID generation and mapping schemes can be as simple as mutually exclusive ranges of numbers for IDs used by each database, where IDs are matched exactly. A more complex example may involve an ID generation service or database-specific schemes, where a mapping of IDs must be maintained between databases.
The next step in the replication process is to identify which data units have been changed. In the case of relational databases, you can use additional fields to record the data unit's state and a timestamp of the state change. Alternatively, you can use triggers to automate parts of the change detection process. Insert/update/delete triggers attached to a table can detect changes made to data units and record the changes in a change log table. The change log table can then be used to determine the changes made to data units at any given time.
The goal of replication is to produce multiple consistent copies of a common data set. To achieve this goal, the changes detected in each database must be exchanged and reconciled.
Here you are faced with a design challenge. You can choose:
- A data format to represent the changes to be exchanged
- A transport mechanism, such as HTTP, FTP, JMS, and so on, based on the application's requirements
- An exchange protocol between the participating entities
- A conflict detection and resolution mechanism between the data units
SyncML defines a synchronization protocol that uses XML messages for add, remove, and modification of the data units. It also allows for the exchange of security information so that the nodes can perform authentication.
Sync4J is an open source Java implementation of SyncML, which can support multiple transport mechanisms. Sync4J also gives a framework for conflict detection and resolution strategies based on the application's requirements.
In this article we'll walk through a simple data replication example. This example demonstrates two-way replication of customer data records between two heterogeneous databases, each containing a customer table of similar structure. The unique customer IDs are used to identify the individual records to be replicated. We have added two columns to the table to record the state and timestamp of the last state change for each record. For this example, the state and timestamp columns must be populated either by the application or manually. As stated earlier, triggers are an appropriate solution for automating the population of change data in a change log table. However, the databases used in this example currently have limited trigger capabilities, so we chose not to use them for this implementation. Also, this example does not include a conflict resolution strategy for two-way replication. Conflicts will cause the replication process to fail.
j-sync4j.zip, the download that accompanies this article contains the SQL scripts to set up the customer tables for the source and target databases. Code for the Sync4J server module and replication client is also included in the zip file.
Figure 1 shows a network diagram depicting the nodes and databases and how they are connected:
Figure 1. Network diagram
We'll configure the server-side environment to run the Sync4J server inside the Tomcat servlet container, and we'll use Derby as a source database.
We'll also configure the client-side environment to run a Sync4J client program using HSQLDB as the target database. (In this case, we're assuming that you've also installed J2SE SDK and Ant.)
To run the example application, you'll need to install three open source components on the server side: Tomcat , the Sync4J server, and the database engine.
Follow these steps to install the three main server components (see Resources for download links):
- Download and extract Tomcat 5.0.29. We'll refer to the Tomcat installation directory as TOMCAT.
- Set the environment variable J2EE_HOME to TOMCAT.
- Download and extract Sync4J Server 4.0.2. We'll refer to the Sync4J server installation directory as SYNC4J.
- Download and install Derby. We'll refer to the Derby installation directory as DERBY.
Next, you need to configure the example code. Click on the Code icon at the top or bottom of this article to download j-sync4j.zip. Extract the file to a directory that contains the client and server code as well as the SQL scripts for this example. We'll refer to this directory as TESTSYNC.
Set up the JDBC driver classpath by copying db2jcc.jar and db2jcc_license_c.jar from DERBY\lib to TOMCAT\common\lib and SYNC4J\lib, respectively.
Next, move the directory TESTSYNC\derby to SYNC4J\default\sql.
The supplied scripts are similar to other database scripts that come with the Sync4J server download, but we have modified them slightly for Derby.
To build the example Sync4J server module, follow these steps:
SYNCSERVER_DIRin TESTSYNC\module\build.bat to point to
- Run TESTSYNC\module\build.bat.
To configure Sync4J, follow these steps:
- Create a database in Derby called sync4jdb. The JDBC settings
described in code at step 3 assume that a database called sync4jdb in DERBY is available with
the username sync4j and the password sync4j.
- Use the IJ tool that comes with Derby to create and set up this
Resources for information about Derby
- Set the properties in
SYNC4J\install.properties, as shown below (replacing DERBY with the actual path in the JDBC settings):
dbms=derby jdbc.classpath=DERBY/lib/db2jcc.jar;DERBY/lib/db2jcc_license_c.jar; jdbc.driver=com.ibm.db2.jcc.DB2Driver jdbc.url=jdbc:derby:net://localhost:1527/"DERBY/sync4jdb" jdbc.user=sync4j jdbc.password=sync4j modules-to-install=foundation-1.0,pdi-1.1,testsync-1.0
Start Derby as a network server by executing DERBY\frameworks\NetworkServer\bin\startNetworkServer.bat.
For more information on starting Derby as a network server, refer to the Derby manuals (see Resources).
Use the IJ tool to execute
TESTSYNC\create_table_server.sql to create the table customer1 in the Derby database
To install the Sync4J server for Tomcat server and set up the Sync4J database, complete these steps:
- From SYNC4J run:
- When asked, press Y to rebuild the database for the testsync-1.0 module.
To start the Tomcat server, run
SYNC4J\bin\sync4j-tomcat.cmd from SYNC4J.
That takes care of the server-side setup; we're ready to install the client
You need to install the database engine, HSQLDB, on the client side. You should then install and configure the example code.
To set up the client application database, follow these steps (see Resources for download links):
- Download and extract the HSQL database. We'll refer to the HSQL
database installation directory as HSQLDB.
- Start HSQL as a network server with a database called test.
- Run TESTSYNC\create_table_client.sql to
create a similar table (customer2) in the HSQL database.
- Copy the HSQL JDBC driver HSQLDB\lib\hsqldb.jar to TESTSYNC\client\lib.
- Set the JDBC properties for the client in
TESTSYNC\client\config\spds\sources\testsync.properties, as shown below:
- Set the user name and password correctly.
Then, run TESTSYNC\client\build.bat to build the client-side application.
Insert a record into
customer1 in the Derby database. For
insert into customer1 (userid, password, cl_op_type, clo_op_time) values ('user1', 'pass1', 'N', CURRENT_TIMESTAMP);
Insert a record into
customer2 in the HSQL database (make sure the userid
has a different value than the record in
customer1). For example:
insert into customer2 (userid, password, cl_op_type, clo_op_time) values ('user2', 'pass2', 'N', CURTIME());
customer1 table should now contain the record inserted into
customer2 should have the record from
In this article, we discussed the main concepts and issues involved in designing a generic database replication solution. The availability of small footprint embedded Java databases offers a great choice when you're designing sophisticated distributed applications, while allowing for the use of heterogeneous databases. A good understanding of the replication process and associated issues helps us improve our design of such systems. We encourage you to extend the example for two-way replication with a conflict resolution strategy, and use automatic recording of record change data.
- SyncML is a common language for synchronizing all devices and applications over any network. Learn more about this standard at the OMA SyncML Web page.
- Get the latest versions of all the software mentioned in this article:
- Sync4J, an open source SyncML implementation
- Tomcat, Servlet container
- Derby, Derby database
- HSQLDB, HSQL database
Cloudscape and Tomcat" (developerWorks, November 2002) by Lance D. Bader provides
details on how to set up Cloudscape and Tomcat.
- Find hundreds more Java technology resources on the developerWorks Java technology zone.
Browse for books on these and other technical topics.
Jayanthi Suryanarayana has worked extensively in application design and development using server-side Java technologies. Her major areas of interest are OO design, system integration, and tool development.
Neil Tunnicliffe has worked for a variety of software and technology companies over the past 10 years. Recently, he has worked for CustomVis, a leading Australian custom laser eye surgery company, where he developed several of the eye tracking systems. He is currently working for Retek, a retail software company, primarily focused on Java enterprise retail software solutions.