IBM Extreme Transaction Processing (XTP) Patterns: Leveraging WebSphere Extreme Scale as an in-line database buffer

Learn how to optimize the performance of an application by leveraging WebSphere® eXtreme Scale as the intermediary between the database and the application. This article provides an overview of the theory and implementation of the write-behind caching solution and JPA loader concepts. It then reviews an example business case coupled with sample code to demonstrate how to deploy these features.


Lan Vuong (lvuong@us.ibm.com), Technical Evangelist, IBM

Lan graduated from Penn State University with a B.S. in computer science and joined IBM as a member of the WebSphere Extended Deployment development team, where she worked until moving into her current role as technical evangelist for XTP.



16 December 2009 (First published 03 June 2009)


Introduction

Applications typically use a data cache to increase performance, especially where the application predominantly reads data. These applications update the database directly when data changes. The problem with this technique is that the response time for updates grows as the load increases. Databases are not good at executing many concurrent transactions that each touch a small number of records; they are much better at executing batched transactions. Eventually, the database saturates its CPU or disks, and from that point the response time rises with the load. Conventional in-memory caches are also limited to storing only what fits in the free memory of a single JVM. When you need to cache more data than this, thrashing occurs: the cache continuously evicts data to make room for other data, the required records must be read again and again, and the cache becomes useless, exposing the database to the full read load.

This paper shows how WebSphere eXtreme Scale lets you use the free memory of a cluster of JVMs as a cache, rather than just the free memory of a single JVM. This technique lets the capacity of the cache scale linearly as you incorporate more JVMs. If these JVMs run on additional physical servers with their own CPUs, memory, and network, then read requests are serviced with linear scalability and constant response time. You can achieve the same improvements for write operations by leveraging eXtreme Scale's write-behind technology. The linear scalability of WebSphere eXtreme Scale makes it ideal for handling extreme transaction processing (XTP) scenarios. XTP is defined by Gartner as:

"an application style aimed at supporting the design, development, deployment, management and maintenance of distributed TP applications characterized by exceptionally demanding performance, scalability, availability, security, manageability and dependability requirements"

In this article, we illustrate how to optimize the performance of an application by leveraging WebSphere eXtreme Scale as the intermediary between the database and the application. WebSphere eXtreme Scale is a highly available, distributed in-memory cache with many advanced features that boost application performance. The write-behind function batches updates to the back-end database asynchronously within a user-configurable interval of time. The obvious advantages of this scenario are fewer database calls, and therefore a reduced transaction load, and faster access to objects in the grid. This scenario also has faster response times than the write-through caching scenario, where an update to the cache results in an immediate update to the database; in the write-behind case, transactions no longer have to wait for the database write operation to finish. Additionally, write-behind protects the application from database failure, because the buffer preserves changes through memory replication until it can propagate them to the database.

With this in-line database buffer, you need a loader to synchronize data between the grid and the back-end database. Any user-written loader will work with write-behind, but in this article we use the built-in JPA loader included with WebSphere eXtreme Scale to illustrate this capability. The Java Persistence API (JPA) specification defines the mapping between Java objects and relational databases. WebSphere eXtreme Scale 6.1.0.3 and later include a built-in JPA loader, which uses this specification to automatically map the cached data to the relational data in the database. You can use a JPA-compliant object-relational mapper such as OpenJPA or Hibernate with this loader.

This article provides an overview of the theory and implementation of the write-behind caching solution and JPA loader concepts. We then review an example business case coupled with sample code to demonstrate how to deploy these features.

Key concepts and configuration

What is a “write-behind” cache?

In a write-behind cache, the cache services all data reads and updates, but unlike a write-through cache, updates are not immediately propagated to the data store. Instead, updates occur in the cache, which tracks the list of dirty updates and periodically flushes the current set of dirty records to the data store. As an additional performance improvement, the cache conflates these dirty records: if the same record is updated, or dirtied, multiple times within the buffering period, only the last update is kept. This technique can significantly improve performance in scenarios where values change very frequently, such as stock prices in financial markets. If a stock price changes 100 times a second, that would normally mean 30 x 100 = 3000 updates to the loader every 30 seconds; conflation reduces that to one update.
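
Conflation amounts to keying the dirty-record buffer by record key, so a later update to a key simply replaces the earlier one. The following minimal sketch illustrates the idea in plain Java; it is not the eXtreme Scale implementation, whose buffer is the replicated queue map described later:

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative conflating write buffer: repeated updates to the same key
// overwrite each other, so only the latest value per key is flushed when
// the write-behind interval expires.
public class ConflatingBuffer<K, V> {
    private final Map<K, V> dirty = new LinkedHashMap<K, V>();

    public synchronized void update(K key, V value) {
        dirty.put(key, value); // conflation: last write wins
    }

    // Called at each write-behind interval: drain the buffer as one batch.
    public synchronized Map<K, V> drain() {
        Map<K, V> batch = new LinkedHashMap<K, V>(dirty);
        dirty.clear();
        return batch;
    }
}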

This list of dirty updates is replicated to ensure that it survives if a JVM exits. You can specify the level of replication; there are two choices: synchronous and asynchronous. Synchronous replication means no data loss when a JVM exits, but it is slower because the primary must wait for the replicas to acknowledge that they have received the change. Asynchronous replication is much faster (typically at least 6x), but changes from the very latest transactions can be lost if a JVM exits before they are replicated.
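
You make this choice in the deployment policy descriptor rather than in objectgrid.xml. As an illustrative sketch (the partition and replica counts here are arbitrary choices, not values from the sample), a deployment policy giving each partition of UserGrid one synchronous and one asynchronous replica might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<deploymentPolicy xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://ibm.com/ws/objectgrid/deploymentPolicy">
    <objectgridDeployment objectgridName="UserGrid">
        <!-- one sync replica (no data loss) plus one async replica -->
        <mapSet name="mapSet" numberOfPartitions="13"
            minSyncReplicas="1" maxSyncReplicas="1" maxAsyncReplicas="1">
            <map ref="Map"/>
        </mapSet>
    </objectgridDeployment>
</deploymentPolicy>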

The dirty record list is written to the data source using a large batch transaction. If the data source is not available, the grid continues processing requests and tries the write again later. The grid can offer constant response times as it scales, because changes are committed to the grid alone; transactions can commit even while the database is down. If a grid JVM fails while dirty records are buffered, the grid automatically fails over and retries the flush on the replica server. It also creates additional replicas after a failure to reduce the risk of losing the dirty record list to a second failure. The main issue with this approach is that the database is not always up to date. That may not be a problem; for example, you can use this style of grid to preprocess large amounts of data at grid speed before writing it to the data source for later processing or reporting.

A write-behind cache is not suitable for every situation. The nature of write-behind means that, for a time, changes which the user sees as committed are not reflected in the database. This delay is called cache write latency or database staleness; the converse delay, between a database change and the cache being updated (or invalidated) to reflect it, is called cache read latency or cache staleness. If all parts of the system access the data through the cache (for example, through a common interface), write-behind is acceptable because the cache always has the correct latest record. A system using write-behind must make all changes through the cache and through no other path.

You can use either a sparse cache or a complete cache with the write-behind feature. A sparse cache stores only a subset of the data and can be populated lazily. Sparse caches are normally accessed by key, since not all of the data is available in the cache, so queries cannot be run against the cache. A complete cache contains all the data but can take a long time to load initially. A third method is a compromise between these two: it preloads the cache with a "hot" subset of the data in a short amount of time and then lazily loads the rest. Such a preloaded subset is typically roughly 20% of the total number of records, yet it fulfills 80% of the requests.

Using WebSphere eXtreme Scale in this manner is typically suited only to scenarios that access partitionable data models using simple CRUD (Create, Read, Update, and Delete) patterns.

Configuring the write-behind function

You enable the write-behind function in the objectgrid.xml configuration file by adding the writeBehind attribute to the backingMap element, as shown below. The value of the attribute uses the syntax [T(time)][;][C(count)], which specifies when the database updates occur. Updates are written to the persistent store when either the specified time in seconds has passed or the number of changes in the queue map has reached the count value.

Listing 1. An example of write-behind configuration
<objectGrid name="UserGrid">
    <backingMap name="Map" pluginCollectionRef="User" lockStrategy="PESSIMISTIC"
        writeBehind="T180;C1000"/>
</objectGrid>
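
Both parts of the writeBehind value are optional, per the [T(time)][;][C(count)] syntax. For example, this variant (illustrative, not from the sample) flushes on time alone, every 60 seconds:

<backingMap name="Map" pluginCollectionRef="User" lockStrategy="PESSIMISTIC"
    writeBehind="T60"/>

With the T180;C1000 setting above, updates are pushed to the database every 180 seconds, or sooner if 1000 changes accumulate in the queue map first.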

What is a JPA loader?

You need loaders to read and write data from the database when using WebSphere eXtreme Scale as an in-memory cache. WebSphere eXtreme Scale 6.1.0.3 and later provide two built-in loaders that interact with JPA providers to map relational data to ObjectGrid maps: the JPALoader and the JPAEntityLoader. The JPALoader is used for caches that store POJOs, and the JPAEntityLoader is used for caches that store ObjectGrid entities.

JPA Loader Configuration

To configure a JPA loader, you need to change the objectgrid.xml file and add a persistence.xml file to the META-INF directory.

You also need to define a transaction callback that receives transaction commit and rollback events and forwards them to the JPA layer. Configure the transaction callback by adding a JPATxCallback bean to the objectGrid definition; its persistenceUnitName property identifies the persistence unit, defined in persistence.xml, that holds the JPA entity metadata. Configure a loader by adding a JPALoader or JPAEntityLoader bean; the entityClassName property is required for the JPA loaders.

Listing 2 shows a sample objectgrid.xml:

Listing 2. Sample objectgrid.xml
<?xml version="1.0" encoding="UTF-8"?>
<objectGridConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://ibm.com/ws/objectgrid/config ../objectGrid.xsd"
 xmlns="http://ibm.com/ws/objectgrid/config">
    <objectGrids>
        <objectGrid name="UserGrid" txTimeout="30">
            <bean id="TransactionCallback" 
              className="com.ibm.websphere.objectgrid.jpa.JPATxCallback">
                <property name="persistenceUnitName" type="java.lang.String" value="userPUDB2"/>
            </bean>
            <backingMap name="Map" pluginCollectionRef="User" lockStrategy="PESSIMISTIC" 
              writeBehind="T180;C1000"/>
        </objectGrid>
    </objectGrids>

    <backingMapPluginCollections>
        <backingMapPluginCollection id="User">
            <bean id="Loader" className="com.ibm.websphere.objectgrid.jpa.JPALoader">
                <property name="entityClassName" type="java.lang.String"
                    value="com.ibm.websphere.sample.xs.inlinebuffer.model.User"/>
            </bean>
        </backingMapPluginCollection>
    </backingMapPluginCollections>
</objectGridConfig>

You configure the JPA provider itself via the persistence.xml file, which should be stored in your application's META-INF folder. This configuration file names a particular JPA provider for the persistence unit, lists the entity classes, and sets provider-specific properties.

Listing 3 shows a sample persistence.xml using the OpenJPA provider:

Listing 3. Sample persistence.xml
<?xml version="1.0"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    version="1.0">
    <persistence-unit name="userPUDB2">
        <provider>org.apache.openjpa.persistence.PersistenceProviderImpl</provider>
        <class>com.ibm.websphere.sample.xs.inlinebuffer.model.User</class>
        <class>com.ibm.websphere.sample.xs.inlinebuffer.model.UserAccount</class>
        <class>com.ibm.websphere.sample.xs.inlinebuffer.model.UserTransaction</class>
        <properties>
            <property name="openjpa.ConnectionProperties"
               value="DriverClassName=com.ibm.db2.jcc.DB2Driver,
              Url=jdbc:db2://localhost:50000/userdb,MaxActive=100,MaxWait=10000,
              Username=db2admin,Password=db2admin"/>
            <property name="openjpa.ConnectionDriverName" 
               value="org.apache.commons.dbcp.BasicDataSource"/>
            <property name="openjpa.jdbc.DBDictionary" value="db2"/>
            <property name="openjpa.Log" value="DefaultLevel=WARN, MetaData=INFO, 
                 Runtime=INFO, Tool=INFO, JDBC=INFO, SQL=WARN, Enhance=INFO"/>
            <property name="openjpa.ConnectionRetainMode" value="always"/>
            <property name="openjpa.jdbc.SynchronizeMappings" 
              value="buildSchema(ForeignKeys=true)"/>
        </properties>
    </persistence-unit>
</persistence>
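
Because the grid only talks to the loader through the JPA interfaces, switching providers is largely a matter of editing persistence.xml. As an illustrative sketch (the provider class and property names below are for a JPA 1.0-era Hibernate configuration and are assumptions, not part of the sample download), the same persistence unit backed by Hibernate might look like this:

<persistence-unit name="userPUDB2">
    <provider>org.hibernate.ejb.HibernatePersistence</provider>
    <class>com.ibm.websphere.sample.xs.inlinebuffer.model.User</class>
    <class>com.ibm.websphere.sample.xs.inlinebuffer.model.UserAccount</class>
    <class>com.ibm.websphere.sample.xs.inlinebuffer.model.UserTransaction</class>
    <properties>
        <property name="hibernate.connection.driver_class" value="com.ibm.db2.jcc.DB2Driver"/>
        <property name="hibernate.connection.url" value="jdbc:db2://localhost:50000/userdb"/>
        <property name="hibernate.connection.username" value="db2admin"/>
        <property name="hibernate.connection.password" value="db2admin"/>
        <property name="hibernate.dialect" value="org.hibernate.dialect.DB2Dialect"/>
    </properties>
</persistence-unit>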

Exploring an example business case

A fictitious online banking Web site with a growing number of users is experiencing slow response times and scalability issues. The bank needs a way to support its clients on the existing hardware. Next, we walk through this use case to see how the write-behind feature can resolve the issue.

Use case: Portal personalization

Instead of pulling user profile information directly from the database, the bank preloads the cache with the profiles from the database, so the cache can service the read requests instead of the database. We have customers loading well over 100 GB of records into the cache for this kind of scenario. In the old system, the bank also wrote profile updates directly to the database. This limited the number of concurrent updates per second that could be handled with an acceptable response time, because the database machine would saturate.

The new system writes profile changes to the grid and then pushes these changes to the database using the write-behind technology. This lets the grid service these operations with the usual grid quality of service and performance, and it completely decouples the single-instance database from the profile read and write operations. The bank can now scale up the profile service simply by adding more JVMs or servers to the grid, and throughput will scale linearly with constant response time. The database is no longer a bottleneck because vastly fewer transactions are sent to the back-end. The quicker responses lead to faster page loads and a better user experience, cost-effective scaling of the profile service, and better availability, because the database is no longer a single point of failure. The grid recovers from typical failures in under a second, and a failure only impacts the subset of the data on that server; the rest of the data remains available.

In this case, we use a DB2® database with the OpenJPA provider. The data model for this scenario is a User entity that contains OneToMany collections of UserAccounts and UserTransactions. Listing 4, taken from the User class, shows these relationships:

Listing 4: Use case entity relationships
@OneToMany(mappedBy = "user", fetch = FetchType.EAGER, cascade = { CascadeType.ALL })
@ElementDependent
private Set<UserAccount> accounts = new HashSet<UserAccount>();
    
@OneToMany(mappedBy = "user", fetch = FetchType.EAGER, cascade = { CascadeType.ALL })
@ElementDependent
private Set<UserTransaction> transactions = new HashSet<UserTransaction>();
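
The mappedBy = "user" attribute implies that UserAccount and UserTransaction each carry the owning side of the relationship: a ManyToOne reference back to the User. As a minimal sketch of how that owning side might be declared (the field and class layout here are inferred from the mappings above, not copied from the sample download):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToOne;

@Entity
public class UserAccount {
    @Id
    private long id;

    // Owning side of the association; this "user" field is what the
    // mappedBy = "user" attribute in the User class refers to.
    @ManyToOne
    private User user;

    // remaining account fields, getters, and setters omitted
}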

Step 1. Populating the database

The sample code includes the class PopulateDB, which loads some user data into the database. The DB2 database connection information is defined in the persistence.xml shown earlier, and the persistence unit name listed there is used to create the JPA EntityManagerFactory. We then create User objects and persist them to the database in batches.

Listing 5 provides a code snippet to show the flow:

Listing 5: Database population example
javax.persistence.EntityManagerFactory emf = null;
synchronized (PopulateDB.class) {
    emf = Persistence.createEntityManagerFactory(puName);
}
javax.persistence.EntityManager em = emf.createEntityManager();

for (int i = start; i < end; i++) {
    // begin a transaction at the start of each batch
    if ((i - start) % BATCH_SIZE == 0) {
        em.getTransaction().begin();
    }

    User user = createUser(i, totalUsers);
    em.persist(user);

    // commit and detach the batch once BATCH_SIZE users are persisted
    if (((i - start) + 1) % BATCH_SIZE == 0) {
        em.getTransaction().commit();
        em.clear();
    }
}
// commit any partial final batch
if (em.getTransaction().isActive()) {
    em.getTransaction().commit();
    em.clear();
}

You would then use the following command to populate the database:

$JAVA_HOME/bin/java -Xms1024M -Xmx1024M 
-verbose:gc -javaagent:$APPLICATION_ROOT/lib/openjpa-1.2.0.jar 
-classpath $TEST_CLASSPATH com.ibm.websphere.sample.xs.inlinebuffer.dbloader.PopulateDB 
-n 1000000

Step 2. Warming the cache

After you load the database, you preload the cache using data grid agents. The records are written to the cache in batches so there are fewer trips between the client and server; you can also use multiple clients to reduce the warm-up time. You can warm up the cache with a "hot" set of data that is a subset of all the records, and lazily load the remaining data. Preloading the cache increases the chances of a cache hit and reduces the need to retrieve data from back-end tiers. For this example, to expedite the execution time, we inserted data matching the database records directly into the cache rather than loading it from the database.

Listing 6 shows the batched inserts into the grid:

Listing 6: Cache preloading example
public void putAll(Map<K,V> batch, BackingMap bmap) throws Exception {
	Map<Integer, Map<K,V>> pmap = convertToPartitionEntryMap(bmap, batch);
	Iterator<Map<K,V>> items = pmap.values().iterator();
	ArrayList<Future<Boolean>> results = new ArrayList<Future<Boolean>>();
	while(items.hasNext()) {
		Map<K,V> perPartitionEntries = items.next();
		// we need one key for partition routing
		// so get the first one
		K key = perPartitionEntries.keySet().iterator().next();
			
		// invoke the agent to add the batch of records to the grid
		InsertAgent<K,V> ia = new InsertAgent<K,V>();
		ia.batch = perPartitionEntries;
		Future<Boolean> fv = threadPool.submit(new 
			InserterThread(bmap.getName(), key, ia));
		results.add(fv);
	}
	Iterator<Future<Boolean>> iter = results.iterator();
	while(iter.hasNext()) {
		Future<Boolean> fv = iter.next();
		Boolean r = fv.get();
		if(r.booleanValue() == false) {
			throw new RuntimeException("Put failed");
		}
	}
}
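
The InsertAgent and InserterThread classes ship with the sample download. As a rough sketch of what such an agent involves (this assumes the standard eXtreme Scale MapGridAgent interface; the sample's actual implementation may differ):

import java.io.Serializable;
import java.util.Map;

import com.ibm.websphere.objectgrid.ObjectMap;
import com.ibm.websphere.objectgrid.Session;
import com.ibm.websphere.objectgrid.datagrid.MapGridAgent;

// Runs on the shard that owns the routing key, inserting a whole batch of
// entries in one server-side transaction instead of one client round trip
// per record.
public class InsertAgent<K, V> implements MapGridAgent, Serializable {
    private static final long serialVersionUID = 1L;

    Map<K, V> batch; // set by the client before the agent is submitted

    public Object process(Session session, ObjectMap map, Object key) {
        try {
            for (Map.Entry<K, V> e : batch.entrySet()) {
                map.insert(e.getKey(), e.getValue());
            }
            return Boolean.TRUE;
        } catch (Exception e) {
            return Boolean.FALSE;
        }
    }

    public Map processAllEntries(Session session, ObjectMap map) {
        throw new UnsupportedOperationException();
    }
}

InserterThread would then route the agent to the partition that owns the chosen key with something like session.getMap(bmap.getName()).getAgentManager().callMapAgent(ia, Collections.singleton(key)), which returns the per-key results as a Map.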

You would use this sample command to preload the cache:

$JAVA_HOME/bin/java -Xms1024M -Xmx1024M
-verbose:gc -javaagent:$APPLICATION_ROOT/lib/openjpa-1.2.0.jar
-classpath $TEST_CLASSPATH com.ibm.websphere.sample.xs.inlinebuffer.ogdriver.ClientDriver
-load -m Map -n 1000000 -g UserGrid -nt 5 -r 1000 -t 200000 -c $CATALOG_ENDPOINTS

Step 3. Generating load on the grid

The sample code includes a client driver that mimics operations on the grid to demonstrate how the write-behind caching function increases performance. The client has several options to tweak the load behavior. The following command generates load against the “UserGrid” grid for 500K records, using 10 threads at a rate of 200 requests per thread.

$JAVA_HOME/bin/java -Xms1024M -Xmx1024M -verbose:gc 
-javaagent:$APPLICATION_ROOT/lib/openjpa-1.2.0.jar 
-classpath $TEST_CLASSPATH com.ibm.websphere.sample.xs.inlinebuffer.ogdriver.ClientDriver 
-m Map -n 500000 -g UserGrid -nt 10 -r 200 -c $CATALOG_ENDPOINTS

All of the available options are documented in the Options class.
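
Each simulated operation is ultimately just an ObjectGrid transaction against the cache. The following minimal sketch (assuming an Integer key and hypothetical connection arguments; this is not the driver's actual code) shows the kind of read/update cycle the driver performs, and why commits are fast: with write-behind enabled, commit returns once the grid has the change, and the database is updated later.

import com.ibm.websphere.objectgrid.ClientClusterContext;
import com.ibm.websphere.objectgrid.ObjectGrid;
import com.ibm.websphere.objectgrid.ObjectGridManager;
import com.ibm.websphere.objectgrid.ObjectGridManagerFactory;
import com.ibm.websphere.objectgrid.ObjectMap;
import com.ibm.websphere.objectgrid.Session;
import com.ibm.websphere.sample.xs.inlinebuffer.model.User;

public class TouchUser {
    public static void main(String[] args) throws Exception {
        String catalogEndpoints = args[0]; // e.g. "host1:2809"
        Integer userId = Integer.valueOf(args[1]);

        // Connect to the catalog service and obtain a client grid reference.
        ObjectGridManager ogm = ObjectGridManagerFactory.getObjectGridManager();
        ClientClusterContext ccc = ogm.connect(catalogEndpoints, null, null);
        ObjectGrid grid = ogm.getObjectGrid(ccc, "UserGrid");

        Session session = grid.getSession();
        ObjectMap map = session.getMap("Map");

        // One read/update transaction; getForUpdate takes the pessimistic
        // lock configured on the backing map.
        session.begin();
        User user = (User) map.getForUpdate(userId);
        // ... mutate the profile here (setters omitted; see the User class)
        map.update(userId, user);
        session.commit();
    }
}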

Step 4. Reviewing the results

Using the write-behind feature can significantly improve performance. We ran the sample code to compare write-through and write-behind configurations in terms of response time and database CPU utilization. We inserted data into the cache that matched the records in the database to avoid the warm-up time and produce a consistent read response time, so that the write response times could be compared. Figures 1 and 2 show read and write response times for both cases. The write-through scenario results in higher response times for updates, whereas the write-behind scenario has update times that are nearly the same as the reads. You could add more JVMs to increase the capacity of the cache without changing the response times, because the database is no longer a bottleneck.

Figure 1. Chart of response times for write-through cache scenario
Figure 2. Chart of response times for write-behind cache scenario

The database CPU utilization charts in Figures 3 and 4 illustrate the reduction in back-end load when using write-behind. Rather than placing a constant load on the back-end as the write-through scenario does, the write-behind case results in low CPU utilization, with load on the back-end only when the buffer interval is reached. You should tune the write-behind configuration to best match your environment with regard to the ratio of write transactions, the frequency of updates to the same record, and the acceptable database update latency.

Figure 3. Chart of database CPU utilization for write-through cache scenario
Figure 4. Chart of database CPU utilization for write-behind cache scenario

Conclusion

This article reviewed write-behind caching, the JPA loader, and batched agent preloading, and showed how you can deploy these WebSphere eXtreme Scale functions together to provide an extreme transaction processing solution. The write-behind caching function reduces back-end load, decreases transaction response time, and isolates the application from back-end failure. These benefits, together with the simplicity of configuration, make write-behind caching a truly powerful feature.


Acknowledgements

Thanks to Billy Newport, Jian Tang, Thuc Nguyen, Tom Alcott, and Art Jolin for their help with this article.


Getting Started with sample code

Contents of download file

  • InLineBuffer.zip

Required Libraries

  • WebSphere eXtreme Scale trial download
  • OpenJPA
  • args4j JAR
  • DB2
  • Apache Commons DBCP JAR

Download

Description: Sample code for this article
Name: InLineBuffer.zip
Size: 65KB
