IBM®
Skip to main content
    Country/region [select]      Terms of use
 
 
    
     Home      Products      Services & solutions      Support & downloads      My account     
 
developerworks > My developerWorks >  Dashboard > WebSphere eXtreme Scale V6.1 User Guide > ... > ObjectGrid high availability > Replication programming
developerWorks
Log In   View a printable version of the current page.
Overview Connect Spaces Forums Wikis
Replication programming
Added by haowang, last edited by Chris.D.Johnson on Nov 06, 2007  (view change)
Labels: 
(None)

Getting Started Examples Reference API documentation

See the WebSphere eXtreme Scale Wiki for links to eXtreme Scale Version 7.0 documentation.
If you log in with your developerWorks ID, you can leave comments and feedback for the development team.

Replication provides fault tolerance for a distributed ObjectGrid and in some cases can increase performance. The following sections describe how to configure and use replication with the ObjectGrid.

Overview

Replication is enabled by associating BackingMaps with MapSets. MapSets are then assigned a number of partitions and a replication policy.

The dynamic deployment topology greatly simplifies MapSet configuration by removing the need to identify each server by name. The MapSet replication configuration simply identifies the number of synchronous and asynchronous replica shards a MapSet should have in addition to the primary shard. For example, if there is to be 1 synchronous and 1 asynchronous replica, all of the BackingMaps assigned to the MapSet will each have a replica shard each distributed automatically within the ObjectGrid's set of available containers.

The static deployment topology also supports replication, but each server member must be explicitly defined in the cluster descriptor XML file. The MapSet is associated with a ReplicationGroup which identifies the server as a primary (read/write server), synchronous or asynchronous replica (read only or used for failover) or stand-by (can be promoted to a replica).

In both deployment topologies, the replication configuration can also enable clients to read data from synchronously replicated servers. This can spread the load for read requests over additional servers in the ObjectGrid.

Regardless of the deployment topology used, replication only has a programming model impact when preloading the BackingMaps.

For details on the various configuration options, see the following topics:

Map preloading

A map can be associated with a Loader. A loader is used to fetch objects when they cannot be found in the map (a cache miss) and also to write changes to a back-end when a transaction commits. Loaders can also be used for preloading data into a map. The preloadMap method of the Loader interface is called when the Java virtual machine (JVM) becomes a primary for the map set. The preloadMap method is not called on replicas or standbys. The preload method attempts to load all the intended referenced data from the back-end into the map using the provided session. The map to use is identified by the BackingMap argument that is passed to the preloadMap method.

void preloadMap(Session session, BackingMap backingMap) throws LoaderException;

Preloading in a partitioned MapSet

Maps can be paritioned into N partitions. Maps can therefore be striped across multiple servers, with each entry identified by a key that is stored only on one of those servers. Very large maps can be held in an ObjectGrid because the application is no longer limited by the heap size of a single JVM to hold all the entries of a Map. Applications that want to preload with the preloadMap method of the Loader interface must identify the subset of the data that it preloads. A fixed number of partitions always exists. You can determine this number by using the following code example:

int numPartitions = backingMap.getPartitionManager().getNumOfPartitions();
int myPartition = backingMap.getPartitionId();

This code example shows how an application can identify the subset of the data to preload from the database. Applications must always use these methods even when the map is not initially partitioned. These methods allow flexibility: if the map is later partitioned by the administrators, then the loader continues to work correctly.

The application must issue queries to retrieve the myPartition subset from the back-end. If a database is used, then it might be easier to have a column with the partition identifier for a given record unless there is some natural query that allows the data in the table to partition easily.

See Writing a loader with a replica preload controller for an example on how to properly implement a Loader for a replicated ObjectGrid.

Performance

The preload implementation copies data from the back-end into the map by storing multiple objects in the map in a single transaction. The number records to store per transaction depends on several factors. After the transaction includes more than blocks of 100 entries, the performance benefit diminishes. The optimal number depends on a number of factors including object complexity and size. Start with 100 entries and then increase the number until no more performance gains are seen. Larger transactions result in better replication performance. Remember, only the primary runs the preload code. The preloaded data is replicated from the primary to any replicas that are online.

Preloading MapSets

If the application uses a MapSet set with multiple maps then each map has its own loader. Each loader has a preload method. Each map is loaded serially by the ObjectGrid. It might be more efficient to preload all the maps by designating a single map as the preloading map. This process is an application convention. For example, two maps, department and employee, might use the department loader to preload both the department and the employee maps. This procedure ensures that, transactionally, if an application wants a department then the employees for that department are in the cache. When the department Loader preloads a department from the back-end, it also fetches the employees for that department. The department object and its associated employee objects are then added to the Map using a single transaction.

Recoverable preloading

Some customers have very large data sets that need caching. Preloading this data can be very time consuming. Sometimes, the preloading must complete before the application can go online. You might want to make preloading recoverable. Suppose there are million records to preload. The primary is preloading them and fails at the 800,000th record. Normally, the replica chosen to be the new primary clears any replicated state and starts from the beginning. ObjectGrid can use a ReplicaPreloadController interface. The loader for the application would also need to implement the ReplicaPreloadController interface. This example adds a single method to the Loader: Status checkPreloadStatus(Session session, BackingMap bmap);
This method is called by the ObjectGrid run time before the preload method of the Loader interface is normally called. The ObjectGrid tests the result of this method (Status) to determine its behavior whenever a replica is promoted to a primary.

Returned status value ObjectGrid behavior in reaction
Status.PRELOADED_ALREADY ObjectGrid does not call the preload method at all because this status value indicates that the map is fully preloaded.
Status.FULL_PRELOAD_NEEDED ObjectGrid clears the map and calls the preload method normally.
Status.PARTIAL_PRELOAD_NEEDED ObjectGrid leaves the map as-is and calls preload. This strategy allows the application loader to continue preloading from that point onwards.

Clearly, while a primary is preloading the map, it must leave some state in a map in the MapSet set that is being replicated so that the replica can figure out what status to return. You can use an extra map named, for example, RecoveryMap. This RecoveryMap map must be part of the same MapSet set that is being preloaded to ensure that the map is replicated consistently with the data being preloaded.
A suggested implementation follows. As the preload commits each block of records, the process also updates a counter or value in the RecoveryMap map as part of that transaction. The preloaded data and the RecoveryMap data are replicated atomically to the replicas. When the replica is promoted to primary, it can now check the RecoveryMap map to see what has happened.

The RecoveryMap map can hold a single entry with the state key. If no object exists for this key then you need a full preload (checkPreloadStatus returns FULL_PRELOAD_NEEDED). If an object exists for this state key then if the value is COMPLETE. The preload completes, and the checkPreloadStatus method returns PRELOADED_ALREADY. Otherwise, the value object indicates where the preload restarts and the checkPreloadStatus method returns PARTIAL_PRELOAD_NEEDED. The loader can store the recovery point in an instance variable for the loader so that when preload is called, the loader knows the starting point. The RecoveryMap map can also hold an entry per map if each map is preloaded independently.

Handling recovery in synchronous replication mode with a Loader

The ObjectGrid run time is designed not to lose committed data when the primary fails. The following section shows the algorithms used. These algorithms apply only when a replication group uses synchronous replication. A loader is optional.

The ObjectGrid run time can be configured to replicate all changes from a primary to the replicas synchronously. When a Java Virtual Machine (JVM) is promoted to be a replica, the primary sends a snapshot of the map to the replica. After the replica processes this snapshot, the primary starts sending all the completed transactions since the generation of the snapshot. Eventually, the replica catches up with the primary. This initial replication processing is asynchronous.

After a replica catches up with the primary, the pair enters peer mode and, finally, synchronous replication begins. From now on, each transaction committed on the primary is sent to all the replicas in peer mode and the primary waits for an acknowledge message. This process slows down the primary when compared with an asynchronous replication scenario because of the latency involved in receiving acknowledge messages. A synchronous commit sequence on the primary looks like the following set of steps:

Step with loader Step without loader
Get locks for entries same
Flush changes to the loader no-op
Save changes to the cache same
Send changes to replicas and wait for acknowledgement same
Commit to the loader through the TransactionCallback plug-in The TransactionCallBack plug-in commit is still called, but typically does not do anything.
Release locks for entries same

Notice that the changes are sent to the replica before they are committed to the loader. To determine when the changes are committed on the replica, revise this sequence:
At initialize time, initialize the tx lists on the primary. Set

CommitedTx = {}, RolledBackTx = {}

During synchronous commit processing, use the following sequence:

Step with loader Step without loader
Get locks for entries same
Flush changes to the loader no-op
Save changes to the cache same
Send changes with a committed transaction, roll back transaction to replica, and wait for acknowledgement same
Clear list of committed transactions and rolled back transactions same
Commit the loader through the TransactionCallBack plug-in TransactionCallBack plug-in commit is still called, but typically does not do anything
If commit succeeds, add the transaction to the committed transactions, otherwise add to the rolled back transactions no-op
Release locks for entries same

For replica processing, use the following sequence:

  • Receive changes
  • Commit all received transactions in the committed transaction list
  • Roll back all received transactions in the rolled back transaction list
  • Start a transaction or session
  • Apply changes to the transaction or session
  • Save the transaction or session to the pending list
  • Send back reply

Notice that on the replica, no loader interactions occur while the replica is in replica mode. The primary must push all changes through the Loader. The replica does not make any changes.
A side effect of this algorithm is that the replica always has the transactions, but they are not committed until the next primary transaction sends the commit status of those transactions. The transactions are then committed or rolled back on the replica. Until then, the transactions are not committed. You can add a timer on the primary that sends the transaction outcome after a small period of time (a few seconds). This timer limits, but does not eliminate, any staleness to that time window. This staleness is only a problem when using replica read mode. Otherwise, the staleness does not have an impact on the application.

When the primary fails, it is likely that a few transactions that were committed or rolled back on the primary, but the message never made it to the replica with these outcomes. When a replica is promoted to the new primary, one of the first actions is to handle this condition. Each pending transaction is reprocessed against the new primary's set of Maps. If there is a Loader then each transaction is given to the Loader. These transactions are applied in strict first in first out (FIFO) order. If a transactions fails, it is ignored. If three transactions are pending, A, B, and C, then A might commit, B might rollback and C might also commit. No one transaction has any impact on the others. Assume that they are independent.

A loader might want to use slightly different logic when it is in failover recovery mode versus normal mode. The loader can easily know when it is in failover recovery mode by implementing the ReplicaPreloadController interface. The checkPreloadStatus method is only called when failover recovery completes. Therefore, if the apply method of the Loader interface is called before the checkPreloadStatus method, then it is a recovery transaction. After the checkPreloadStatus method is called, the failover recovery is complete.

Stateful singletons using replication

WebSphere Extended Deployment added support for singletons in its first release with the WebSphere Partitioning Facility feature. This addition allowed applications to create singletons in a cluster. The ObjectGrid run time enables a similar feature using replicated MapSets. While the ObjectGrid singleton pattern has many advantages, it also has a few disadvantages. The partitioning facility has an event to the application when the singleton or partition is activated locally. This event is communicated using the partitionLoad method of the partition facility. A replicated MapSet also has a singleton; the primary. The application is notified when it becomes the primary by the ReplicaPreloadController#checkPreloadStatus method on the loader. This technique is similar to the partitioning facility, but has the advantage of being portable across different versions of WebSphere Application Server or competitive application servers.

The partitioning facility has a deactivate event, but the ObjectGrid run time does not offer this capability. A primary in the ObjectGrid normally runs until it fails. You cannot be explicitly moved around. This capability is an advantage of the partitioning facility over the ObjectGrid. Here is a table of capabilities:
Table 1.

Capability Partitioning facility ObjectGrid singletons
Singleton start event Yes Yes
Singleton stop event Yes No
Replication of singleton state No Yes
Variable quality of service (QoS) for replication No Yes
Flexible singleton placement Yes No
Can move singleton at runtime Yes No
IIOP routing of work to singleton Yes No
Requires a Java 2 Platform, Enterprise Environment (J2EE) server Yes No
Requires a full version of WebSphere Extended Deployment Yes No
Requires enterprise beans Yes No
Application is portable to other application servers No Yes

IIOP routing is supported in ObjectGrid Version 6.1 dynamic deployment topology

Singleton state

The partitioning facility does not have built-in support for state management.  In previous releases, pplications had to handle the singleton state. Typically, this situation meant that the state was pushed to a database. If the partition failed, then the server that was elected to host and recover the partition needed to retrieve this state from that database. If an application uses the ObjectGrid, then the singleton can keep its state in the map associated with the ReplicaPreloadController controller that is managing the singleton. If the primary or singleton fails, then the replica that is elected as the new primary already has the state locally because of the replication. Use synchronous replication unless data loss is acceptable to the application.

Flexible singleton placement

The partitioning facility uses the high availability manager policy mechanism to determine to host where a partition. These policies can be changed at run time with immediate effect. The ObjectGrid replication group policies are not as flexible as those with the high availability manager and cannot be changed without restarting all the servers. You lose the ability to move around singletons at run time if you are using the ObjectGrid.

Variable quality of service (QoS) replication

The partitioning facility does not offer state management. With ObjectGrid, you can use a variety of replication approaches:

  • No replication
  • Asynchronous replication
  • Synchronous replication

The replication policy of the MapSet set that is associated with the map that you are using for the state determines the policy. Synchronous replication means no data loss, but it is slower. Asynchronous replication is fast, but means one or more transactions that are committed on the primary can be lost if the primary fails.

Load balancing across replicas

The ObjectGrid, unless configured otherwise, sends all read and write requests to the primary server for a given replication group. The primary must service all requests from clients. You might want to allow read requests to be sent to replicas of the primary. Sending read requests to the replicas allows the load of the read requests to be shared by multiple Java Virtual Machines (JVMs). However, using replicas for read requests can result in inconsistent responses.

Load balancing across replicas is typically used only when clients are caching data that is changing all the time or when the clients are using pessimistic locking.

If the data is continually changing and then being invalidated in client near caches, the primary should see a relatively high get request rate from clients as a result. Likewise, in pessimistic locking mode, no local cache exists, so all requests are sent to the primary.

If the data is relatively static or if pessimistic mode is not used, then sending read requests to the replica does not have a big impact on performance. The frequency of get requests from clients with caches that are full of data is not high.

When a client first starts, its near cache is empty. Cache requests to the empty cache are forwarded to the primary. The client cache gets data over time, causing the request load to drop. If a large number of clients start concurrently, then the load might be significant and replica read might be an appropriate performance choice.

Reading from replicas

MapSets can be configured to allow read operations to be routed to replicas by setting the replicaReadEnabled option on the MapSet. This can improve performance by spreading read requests to more JVMs. If not enabled, all read requests such as the ObjectMap.get or the Query.getResultIterator method are routed to the primary.

Reading from replicas and asynchronous replication

If the data in the MapSet does not change often, then reading from replicas and using asynchronous replication is usually a good trade off. This strategy allows get requests from clients to be directed to the data on any replicas that are online. A get request might be sent to a replica that does not have a copy, and the key and value might not have been replicated to the replica at that point. If the data is not on the replica, the get request is redirected to the primary.

If the data changes or if queries are used, then it is very likely that get requests from the replicas return stale data. This might or might not be acceptable to the application. If it is not acceptable, do not enable reads from replicas.

Reading from replicas in synchronous replication mode

Synchronous replication tries to keep the replica exactly the same as the primary. If the primary fails, all the committed data on the primary is guaranteed to be available on all replicas that were in peer mode when the failure occurred. While this is the case when failures occur, allowing reads from replicas exposes some side effects of the algorithms used.

When the primary is about to commit a transaction, a copy of the changes is sent to the replica and the replica commits this transaction in the following two cases:

  • The primary fails
  • The next transaction on the primary is sent

When the primary fails, all pending transactions on the replica are committed.
Pending transactions are committed only when a subsequent transaction is committed on the primary. The primary piggy backs on this replica message the outcome of committing. When the replica receives one of these messages, it commits or rolls back any pending transactions that had outcomes specified in that message.

Pending transactions become visible to read on a replica only when the transaction is committed. If the primary is loaded and has regular modifications, then these pending transactions are committed very quickly. If the modification load on the primary is low, periods occur where pending transactions are not committed, until the next primary modification is made.

The replica for a primary that is taking modifications is normally at least one transaction behind the primary from a read point of view. No data is lost because these transactions are physically on the replica, but they are not committed until the outcome of those pending transactions is sent from the primary. This commit happens when the next read and write transaction runs.

Reading from replicas summary

If read from replica is enabled, then the application must be prepared to tolerate some gets returning stale data. This issue is true whether synchronous or asynchronous replication is being used. Use the following table to determining of replica reading is right for you:

Cache Usage Read on Replica Locking strategy Replica Mode Description
Read only (with read-only maps, consistency intolerant) True None Async (preferred) or sync Provides best possible performance. Data only updated through loader. Grid must be restarted to update data.
Read only (with read/write maps, consistency intolerant) True None Sync Clients must be manually coordinated with grid update windows. When clients are not connected, grid is updated. Once the update is complete, the sync replicas are up to date.
Read mostly (consistency toleration) True None or Optimistic (with custom optimistic callback) Sync Application must be able to tolerate stale data due to consistency problems because of the lack of 2-phase transactions. Data is updated infrequently.
Read/Write Mixture or Write Mostly (consistency toleration) True optimistic (with custom optimistic callback) Sync (read) and async (availability) Application must be able to tolerate stale data due to consistency problems because of the lack of 2-phase transactions. Data is updated frequently.
Write mostly (consistency intolerant) False Pessimistic Sync and/or Aysnc (Ex. 1 sync, N async) Application can't tolerate stale data. Must use partitions to scale, since replicas can't be used.
Wiki Disclaimer and License
© Copyright IBM Corporation 2007,2009. All Rights Reserved.


 
    About IBM Privacy Contact