Skip to main content

Heterogeneous database replication with SyncML

JDBC and open source technologies simplify generic database replication

Jayanthi Suryanarayana (jaysurya@comcast.net), Senior Software Engineer, Retek
Jayanthi Suryanarayana has worked extensively in application design and development using server-side Java technologies. Her major areas of interest are OO design, system integration, and tool development.
Neil Tunnicliffe (neil.tunnicliffe@retek.com), Software Engineer, Retek
Neil Tunnicliffe has worked for a variety of software and technology companies over the past 10 years. Recently, he has worked for CustomVis, a leading Australian custom laser eye surgery company, where he developed several of the eye tracking systems. He is currently working for Retek, a retail software company, primarily focused on Java enterprise retail software solutions.

Summary:  Availability and performance are primary considerations when you're developing distributed applications. But using data stores to address these concerns can result in problems with data synchronization between heterogeneous data stores. In this article, Jayanthi Suryanarayana and Neil Tunnicliffe offer a solution using JDBC and the SyncML standard to achieve generic database data replication.

Date:  18 Jan 2005
Level:  Intermediate
Activity:  3626 views

When you are designing distributed applications, you must consider availability and performance. A common solution is to include a data store on the client system. Typically, the client will require a lightweight data store as a result of limited resources. This approach poses a challenge for data synchronization between heterogeneous data stores. One resolution to this problem is a Java-based approach using JDBC and SyncML standards for heterogeneous database replication, but first a little background.

Note: This article assumes you are familiar with server-side Java technologies and are comfortable installing software and setting up without explicit instruction.

Replication overview

Replication is the process of duplicating either all or a portion of a database between two environments. To maintain consistency, changes made to the source database are propagated to the replicated database.

Replication may be one-way or two-way. Two-way replication can be much more difficult because changes made to each database may result in conflicting data. When these changes are propagated between the two databases, a strategy is required to reconcile the differences in order to maintain consistency.

ID handling

As a basic requirement of replication, we need to uniquely identify individual data units to be replicated. For two-way replication, we also need a mapping scheme to identify corresponding units of data between the databases.

Various schemes can be followed based on the replication requirements. For one-way replication, the master database can dictate the ID generation for data units. For two-way replication, a mapping scheme must be defined based on the ID generation schemes of the application. The ID generation and mapping schemes can be as simple as mutually exclusive ranges of numbers for IDs used by each database, where IDs are matched exactly. A more complex example may involve an ID generation service or database-specific schemes, where a mapping of IDs must be maintained between databases.

Change detection

The next step in the replication process is to identify which data units have been changed. In the case of relational databases, you can use additional fields to record the data unit's state and a timestamp of the state change. Alternatively, you can use triggers to automate parts of the change detection process. Insert/update/delete triggers attached to a table can detect changes made to data units and record the changes in a change log table. The change log table can then be used to determine the changes made to data units at any given time.

Replication

The goal of replication is to produce multiple consistent copies of a common data set. To achieve this goal, the changes detected in each database must be exchanged and reconciled.

Here you are faced with a design challenge. You can choose:

  • A data format to represent the changes to be exchanged
  • A transport mechanism, such as HTTP, FTP, JMS, and so on, based on the application's requirements
  • An exchange protocol between the participating entities
  • A conflict detection and resolution mechanism between the data units

SyncML defines a synchronization protocol that uses XML messages for add, remove, and modification of the data units. It also allows for the exchange of security information so that the nodes can perform authentication.

Sync4J is an open source Java implementation of SyncML, which can support multiple transport mechanisms. Sync4J also gives a framework for conflict detection and resolution strategies based on the application's requirements.


An example implementation

In this article we'll walk through a simple data replication example. This example demonstrates two-way replication of customer data records between two heterogeneous databases, each containing a customer table of similar structure. The unique customer IDs are used to identify the individual records to be replicated. We have added two columns to the table to record the state and timestamp of the last state change for each record. For this example, the state and timestamp columns must be populated either by the application or manually. As stated earlier, triggers are an appropriate solution for automating the population of change data in a change log table. However, the databases used in this example currently have limited trigger capabilities, so we chose not to use them for this implementation. Also, this example does not include a conflict resolution strategy for two-way replication. Conflicts will cause the replication process to fail.

j-sync4j.zip, the download that accompanies this article contains the SQL scripts to set up the customer tables for the source and target databases. Code for the Sync4J server module and replication client is also included in the zip file.

The environment

Figure 1 shows a network diagram depicting the nodes and databases and how they are connected:


Figure 1. Network diagram
Network diagram

We'll configure the server-side environment to run the Sync4J server inside the Tomcat servlet container, and we'll use Derby as a source database.

We'll also configure the client-side environment to run a Sync4J client program using HSQLDB as the target database. (In this case, we're assuming that you've also installed J2SE SDK and Ant.)


Server-side installation

To run the example application, you'll need to install three open source components on the server side: Tomcat , the Sync4J server, and the database engine.

Install Tomcat, Sync4J, and Derby

Follow these steps to install the three main server components (see Resources for download links):

  1. Download and extract Tomcat 5.0.29. We'll refer to the Tomcat installation directory as TOMCAT.

  2. Set the environment variable J2EE_HOME to TOMCAT.

  3. Download and extract Sync4J Server 4.0.2. We'll refer to the Sync4J server installation directory as SYNC4J.

  4. Download and install Derby. We'll refer to the Derby installation directory as DERBY.

Configure the example application

Next, you need to configure the example code. Click on the Code icon at the top or bottom of this article to download j-sync4j.zip. Extract the file to a directory that contains the client and server code as well as the SQL scripts for this example. We'll refer to this directory as TESTSYNC.

Set the JDBC driver classplath

Set up the JDBC driver classpath by copying db2jcc.jar and db2jcc_license_c.jar from DERBY\lib to TOMCAT\common\lib and SYNC4J\lib, respectively.

Derby installation scripts for the Sync4J server

Next, move the directory TESTSYNC\derby to SYNC4J\default\sql.

The supplied scripts are similar to other database scripts that come with the Sync4J server download, but we have modified them slightly for Derby.

Build the example Sync4J server module

To build the example Sync4J server module, follow these steps:

  1. Set SYNCSERVER_DIR in TESTSYNC\module\build.bat to point to SYNC4J.

  2. Run TESTSYNC\module\build.bat.

Configure the Sync4J server

To configure Sync4J, follow these steps:

  1. Create a database in Derby called sync4jdb. The JDBC settings described in code at step 3 assume that a database called sync4jdb in DERBY is available with the username sync4j and the password sync4j.

  2. Use the IJ tool that comes with Derby to create and set up this database (see Resources for information about Derby tools).

  3. Set the properties in SYNC4J\install.properties, as shown below (replacing DERBY with the actual path in the JDBC settings):

    dbms=derby
    jdbc.classpath=DERBY/lib/db2jcc.jar;DERBY/lib/db2jcc_license_c.jar;
    jdbc.driver=com.ibm.db2.jcc.DB2Driver
    jdbc.url=jdbc:derby:net://localhost:1527/"DERBY/sync4jdb"
    jdbc.user=sync4j
    jdbc.password=sync4j
    modules-to-install=foundation-1.0,pdi-1.1,testsync-1.0
    

Set up the Derby database

Start Derby as a network server by executing DERBY\frameworks\NetworkServer\bin\startNetworkServer.bat.

For more information on starting Derby as a network server, refer to the Derby manuals (see Resources).

Use the IJ tool to execute TESTSYNC\create_table_server.sql to create the table customer1 in the Derby database

Install the Sync4J server and example module with Tomcat

To install the Sync4J server for Tomcat server and set up the Sync4J database, complete these steps:

  1. From SYNC4J run: SYNC4J\bin\install.cmd tomcat.
  2. When asked, press Y to rebuild the database for the testsync-1.0 module.

To start the Tomcat server, run SYNC4J\bin\sync4j-tomcat.cmd from SYNC4J.

That takes care of the server-side setup; we're ready to install the client


Client-side installation

You need to install the database engine, HSQLDB, on the client side. You should then install and configure the example code.

Set up the client application and database

To set up the client application database, follow these steps (see Resources for download links):

  1. Download and extract the HSQL database. We'll refer to the HSQL database installation directory as HSQLDB.

  2. Start HSQL as a network server with a database called test.

  3. Run TESTSYNC\create_table_client.sql to create a similar table (customer2) in the HSQL database.

  4. Copy the HSQL JDBC driver HSQLDB\lib\hsqldb.jar to TESTSYNC\client\lib.

  5. Set the JDBC properties for the client in TESTSYNC\client\config\spds\sources\testsync.properties, as shown below:

    jdbcDriver=org.hsqldb.jdbcDriver
    urlConnection=jdbc:hsqldb:hsql://localhost/test
    

  6. Set the user name and password correctly.

Then, run TESTSYNC\client\build.bat to build the client-side application.

Run the client application

Insert a record into customer1 in the Derby database. For example:

insert into customer1 (userid, password, cl_op_type, clo_op_time) 
values ('user1', 'pass1', 'N', CURRENT_TIMESTAMP);

Insert a record into customer2 in the HSQL database (make sure the userid has a different value than the record in customer1). For example:

insert into customer2 (userid, password, cl_op_type, clo_op_time) 
values ('user2', 'pass2', 'N', CURTIME());

Run TESTSYNC\client\output\testsync.bat.

The customer1 table should now contain the record inserted into customer2, and customer2 should have the record from customer1.


Conclusion

In this article, we discussed the main concepts and issues involved in designing a generic database replication solution. The availability of small footprint embedded Java databases offers a great choice when you're designing sophisticated distributed applications, while allowing for the use of heterogeneous databases. A good understanding of the replication process and associated issues helps us improve our design of such systems. We encourage you to extend the example for two-way replication with a conflict resolution strategy, and use automatic recording of record change data.



Download

DescriptionNameSizeDownload method
j-sync4j.zipj-sync4j.zip658 KB HTTP

Information about download methods


Resources

  • SyncML is a common language for synchronizing all devices and applications over any network. Learn more about this standard at the OMA SyncML Web page.

  • Get the latest versions of all the software mentioned in this article:
    • Sync4J, an open source SyncML implementation
    • Tomcat, Servlet container
    • Derby, Derby database
    • HSQLDB, HSQL database


  • "Integrating Cloudscape and Tomcat" (developerWorks, November 2002) by Lance D. Bader provides details on how to set up Cloudscape and Tomcat.

  • Find hundreds more Java technology resources on the developerWorks Java technology zone.

  • Browse for books on these and other technical topics.

About the authors

Jayanthi Suryanarayana has worked extensively in application design and development using server-side Java technologies. Her major areas of interest are OO design, system integration, and tool development.

Neil Tunnicliffe has worked for a variety of software and technology companies over the past 10 years. Recently, he has worked for CustomVis, a leading Australian custom laser eye surgery company, where he developed several of the eye tracking systems. He is currently working for Retek, a retail software company, primarily focused on Java enterprise retail software solutions.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Java technology, XML
ArticleID=33295
ArticleTitle=Heterogeneous database replication with SyncML
publish-date=01182005
author1-email=jaysurya@comcast.net
author1-email-cc=
author2-email=neil.tunnicliffe@retek.com
author2-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Rate a product. Write a review.

Special offers