IBM®
Skip to main content
    Country/region [select]      Terms of use
 
 
    
     Home      Products      Services & solutions      Support & downloads      My account     
 
developerworks > My developerWorks >  Dashboard > WebSphere eXtreme Scale V6.1 User Guide > ... > ObjectGrid high availability > Replication events
developerWorks
Log In   View a printable version of the current page.
Overview Connect Spaces Forums Wikis
Replication events
Added by kristip, last edited by CarrieMiller on Jun 07, 2007  (view change)
Labels: 
(None)

Getting Started Examples Reference API documentation

See the WebSphere eXtreme Scale Wiki for links to eXtreme Scale Version 7.0 documentation.
If you log in with your developerWorks ID, you can leave comments and feedback for the development team.

Shards go through different states and events to support replication. Primary and replica shards are balanced across containers. Replicas turn into primaries to handle failure events. Replicas monitor the transactional state to maintain accurate data.

Lifecycle events

When primary and replica shards are placed and started, they go through a series of events to bring themselves online and into listening mode.

Primary shard

The catalog service places a primary shard for a partition. The catalog service also does the work of balancing primary shard locations and initiating failover for primary shards.

When a shard becomes a primary shard, it receives a list of replicas from the catalog service. The new primary shard creates a replica group and registers all the replicas.

When the primary is ready, an open for business message displays in the SystemOut.log file for the container on which it is running. The open message, or the CWOBJ1511I message, lists the map name, map set name, and partition number of the primary shard that started.

CWOBJ1511I: mapName:mapSetName:partitionNumber (primary) is open for business.

See How ObjectGrid places shards on a grid for more information on how the catalog service places shards.

Replica shard

Replica shards are mainly controlled by the primary shard unless the replica shard detects a problem. During a normal lifecycle, the primary shard places, registers, and deregisters a replica shard.

When the primary shard initializes a replica shard, a message displays the log that describes where the replica runs to indicate that the replica shard is available. The open message, or the CWOBJ1511I message, lists the map name, map set name, and partition number of the replica shard. This message follows:

CWOBJ1511I: mapName:mapSetName:partitionNumber (synchronous replica) is open for business.

or

CWOBJ1511I: mapName:mapSetName:partitionNumber (asynchronous replica) is open for business.

When the replica shard first starts, it is not yet in peer mode. When a replica shard is in peer mode, it receives data from the primary as data comes into the primary. Before entering peer mode, the replica shard needs a copy of all of the existing data on the primary shard.

To get a copy of the data on the primary shard, both synchronous and asynchronous replica shards perform the same startup sequence. During the registration of the replica shard, the primary shard creates a checkpoint of the current data. The data on the primary shard is in a copy-on-write state. The current data that is going to the replica shard is never modified, but a copy-on-write is performed whenever a new transaction updates records on the primary. The primary shard is able to continue processing without changing the data that is going to the replica shard. The primary shard keeps a list of the changes that were made since the checkpoint.

The checkpoint data is pushed to the replica. After the checkpoint arrives at the replica, the memory for the checkpoint is released. The changes since the checkpoint was created are merged. The list of changes are also pushed to the replica shard. As the changes are pushed across, the memory for the change is released.

After the replica completes the checkpoint phase, it switches into peer mode and begins to receive data as the primary receives the data. At this point, the replica shard starts to behave as either a synchronous or an asynchronous replica shard.

When a replica shard reaches peer mode, it prints a message to the SystemOut.log file for the replica.

CWOBJ1526I: Replica objectGridName:mapsetName:partitionNumber:mapName entering peer mode after X seconds

The time refers to the amount of time that it took the replica shard to get all of its initial data from the primary shard, including the checkpoint data and any additional changes that are made during the checkpoint copy. The time might display as zero or very low if the primary shard does not have any existing data to replicate.

Synchronous replica shard

If the new replica is a synchronous replica shard, the primary shard now starts a request or response whenever a transaction commits on the primary. The primary shard waits until the replica shard responds that it got the data. The synchronous replica shard data remains at the same level as the primary shard data.

Asynchronous replica shard

If the new replica is an asynchronous replica shard, the primary shard sends the data to the replica, but it does not wait for a response. The asynchronous replica orders and applies the data sent from the primary. The asynchronous replica data shard is not guaranteed to remain at the same level as the primary shard data.

Peer mode for all replica shards

When you are in peer mode, after all replica shards receive a transaction change, then the memory for the transaction is released on the primary shard. The replica shards only receive data from the primary shard during transactions that change data such as inserts, updates and removes. They replica shards are not contacted for reading data on the primary shard.

Recovery events

Replication is designed to recover from failure and error events. If a primary shard fails, another replica takes over. If errors are on the replica shards, the replica shard attempts to recover. The catalog service controls the placement and transactions of new primary shards or new replica shards.

Replica shard becomes a primary shard

A replica shard becomes a primary shard for two reasons. Either the primary shard stopped or failed, or a balance decision was made to move the previous primary shard to a new location.

The catalog service selects a new primary shard from the existing synchronous replica shards. The new primary shard registers all of the existing replicas and accepts transactions as the new primary shard. If the existing replica shards have the correct level of data, the current data is preserved as the replica shards register with the new primary shard. If an asynchronous replica shard was behind, it receives a fresh copy of the data and register.




Replica shard recovery

A replica shard is controlled by the primary shard. However, if a replica shard detects a problem, it can trigger a reregister event to correct the state of the data. The replica clears the current data and gets a fresh copy from the primary.

When a replica shard initiates a reregister event, the replica prints a log message.

[CWOBJ1524I:|ObjectGrid core messages#CWOBJ1524I] Replica listener must reregister with the primary.
Bad transaction

If a transaction causes an error on a replica shard during processing, then the replica shard is in an unknown state. The transaction successfully processed on the primary shard, but something went wrong on the replica. To correct this situation, the replica initiates a reregister event. With a new copy of data from the primary, the replica shard can continue. If the same problem reoccurs, the replica shard does not continuously reregister. See Failure events for more details.

Transactions slow or possibly lost

Asynchronous replica shards keep track of the order of transactions and monitor the current rate of transactions. An asynchronous replica falls behind when the number of committed transactions is too far behind the number of pending transactions. This situation could occur if an asynchronous replica is either very slow at processing transactions, or if it is waiting on a specific transaction to arrive or complete. Two stages to detecting this problem occur. First, a warning message is issued to the log:

[CWOBJ1522I:|ObjectGrid core messages#CWOBJ1522] The transaction lag warning, X, was met. Current lag: X. Beginning check for lost transaction. Possible lost transaction is X.

After the warning level is reached, then the replica shard looks for the following situations and prints an additional message if the problem continues and reaches the threshold limit:

  • The replica shard is waiting for a specific transaction.
    [CWOBJ1520I:|ObjectGrid core messages#CWOBJ1520I] The transaction lag threshold, X, was met. Current lag: X. Transaction X may be lost.
  • The replica shard is running slowly, but transaction processing continues.
    [CWOBJ1521I:|ObjectGrid core messages#CWOBJ1521I] The transaction lag threshold, X, was met. Current lag: X.

If either of the previous thresholds are met, the asynchronous replica shard reregisters itself. This action allows the replica to receive a fresh copy of the data. If an asynchronous replica shard continuously reregisters, it might indicate that the machine is under too much load. If the same problem occurs again, the replica shard does not continuously reregister. See Failure events for more details.

If the replica catches up or a possible lost transaction is received, the warnings are reset. If the replica seems to frequently enter the warning phase, it might indicate that the machine is under too much load. The warning levels might need to be adjusted if the replica processing rate is acceptable.

The default value for the warning level is 100 and the default value for the threshold level is 200. The levels refer to the number of transactions that are arriving on the replica shard.

Failure events

A replica can stop replicating data if it encounters error situations to which it cannot recover.

Too many register attempts

If a replica triggers a reregister multiple times without successfully committing data, the replica stops. Stopping prevents a replica from entering an endless reregister loop. By default, a replica shard tries to reregister three times in a row before stopping.

If a replica shard reregisters too many times, it prints the following message to the log.

Replica objectgridName:mapSetName:partitionNumber exceeded the maximum number of times to reregister without successful transactions.

If the replica is unable to recover by reregistering, a pervasive problem might exist with the transactions that are relative to the replica shard. A possible problem could be missing resources on the classpath if an error occurs while inflating the keys or values from the transaction.

Failure while entering peer mode

If a replica attempts to enter peer mode and experiences an error processing the bulk existing data from the primary (the checkpoint data), the replica shuts down. Shutting down prevents a replica from starting with incorrect initial data. Because it recieves the same data from the primary if it reregisters, the replica does not retry.

If a replica shard fails to enter peer mode, it prints the following message to the log:

Replica objectgridName:mapSetName:partitionNumber:mapName failed to enter peer mode after 0.316 seconds.

An additional message displays in the log that explains why the replica failed to enter peer mode. Three situations exist where the replica fails to enter peer mode:

  • An error occurred while the transaction was processing.
    Message: Replica failed to enter peer mode. Reason: A transaction threw an error while copying data from the primary.
  • A timeout occurred during the copy of the initial data from the primary.
    Message: Replica failed to enter peer mode. Reason: Waiting for data copy from the primary to complete timed out. Current timeout (ms): x. Default timeout: 360000 (6 minutes).
  • The data to coordinate entering peer mode is incorrect.
    Message: Replica failed to enter peer mode. Reason: Received incorrect ordering data from the primary, data copy cannot complete.
    If this message occurs, contact IBM support for assistance.
Wiki Disclaimer and License
© Copyright IBM Corporation 2007,2009. All Rights Reserved.


 
    About IBM Privacy Contact