See the WebSphere eXtreme Scale Wiki for links to eXtreme Scale Version 7.0 documentation.
If you log in
with your developerWorks ID, you can leave comments and feedback for the development team.
With ObjectGrid, an in-memory database or shard can be replicated from one Java virtual machine (JVM) to another. A shard represents a partition that is placed on a container. Multiple shards that represent different partitions can exist on a single container. To peform replication, a primary shard and replica shards exist for each partition. The replica shards are either synchronous (sync) or asynchronous (async). The types and placement of replica shards are determined by ObjectGrid using a deployment policy, which specifies the minimum and maximum number of synchronous and asynchronous shards.
Types of shards for replication
Three types of shards exist for replication:
- Primary
- Synchronous replica
- Asynchronous replica
The primary shard receives all insert, update and remove operations. The primary shard adds and removes replicas, replicates data to the replicas, and manages commits and rollbacks of transactions.
Synchronous replicas maintain the same state as the primary. When a primary replicates data to a synchronous replica, the transaction is not committed until it commits on the synchronous replica.
Asynchronous replicas might or might not be at the same state as the primary. When a primary replicates data to an asynchronous replica, the primary does not wait for the asynchronous replica to commit. An asynchronous replica still maintains the order of the transactions that send from the primary. Transactions are marked before they are sent. If an asynchronous replica receives transaction 2 before transaction 1, the asynchronous replica orders the transactions, and waits until transaction 1 is received and processed before completing transaction 2.
The memory cost of replication
To bring a replica online, a checkpoint map must be created to copy over existing data from the primary. The checkpoint has a small memory cost. However, the main memory cost on the primary is the memory for each entry that is modified after the checkpoint occurs. If the data changes a lot after the checkpoint, then this operation costs an amount of memory that is proportional to the number of modified entries. After the checkpoint is copied to the replica, the changes are merged back in to the checkpoint. The changed entries are not freed until they have been sent to all required replicas.
Minimum synchronous replica shards
When a primary prepares to commit data, it checks how many synchronous replica shards voted to commit the transaction. If the transaction processes normally on the replica, it votes to commit. If something went wrong on the synchronous replica, it votes not to commit. Before a primary commits, the number of synchronous replica shards that are voting to commit must meet the minSyncReplica setting from the deployment policy. When the number of synchronous replica shards that are voting to commit is too low, the primary does not commit the transaction and an error results. This action ensures that the required number of synchronous replicas are available with the correct data. Synchronous replicas that encountered errors reregister to fix their state.
The primary throws a ReplicationVotedToRollbackTransactionException error if too few synchronous replicas voted to commit.
Replication and Loaders
Loaders complicate replication. Normally, a primary shard writes changes synchronously through the Loader to a database. The Loader and database are always in sync. When the primary fails over to a replica shard, the database and Loader might not be in synch. For example:
- The primary can send the transaction to the replica and then fail before committing to the database.
- The primary can commit to the database and then fail before sending to the replica.
Either approach leads to either the replica being one transaction in front of or behind the database. This situation is not acceptable. ObjectGrid uses a special protocol and a contract with the Loader implementation to solve this issue without two phase commit. The protocol follows:
Primary side
- Send the transaction along with the previous transaction outcomes.
- Write to the database and try to commit the transaction.
- If the database commits, then commit on ObjectGrid. If the database does not commit, then roll back the transaction.
- Record the outcome.
Replica side
- Receive a transaction and buffer it.
- For all outcomes, send with the transaction, commit any buffered transactions and discard any rolled back transactions.
Replica side on failover
- For all buffered transactions, provide the transactions to the Loader and the Loader attempts to commit the transactions.
- The Loader needs to be written to make each transaction idempotent.
- If the transaction is already in the database, then the Loader just no-ops the transaction.
- If the transaction is not in the database, then the Loader applies the transaction.
- After all transactions are processed, then the new primary can begin to serve requests.
This protocol ensures that the database is at the same level as the new primary state.
© Copyright IBM Corporation 2007,2009. All Rights Reserved.