IBM®
Skip to main content
    Country/region [select]      Terms of use
 
 
    
     Home      Products      Services & solutions      Support & downloads      My account     
 
developerworks > Dashboard > WebSphere eXtreme Scale V6.1 User Guide > ... > ObjectGrid high availability > Zone Based Replication
developerWorks
Log In   View a printable version of the current page.
Overview Spaces Forums Blogs Podcasts Wikis Exchange
Zone Based Replication
Added by dcberg, last edited by joshuad on Jul 08, 2008  (view change)
Labels: 
(None)

Getting Started Examples Reference API documentation

If you log in with your developerWorks ID, you can leave comments and feedback for the development team.

ObjectGrid V6.1 fix 3 includes support for shard placement into zones. This function allows more control over how ObjectGrid places shards on a grid. JVMs that host an ObjectGrid server can be tagged with a zone identifier. The deployment file can now include one or more zone rules and these zone rules are associated with a shard type. The best way to explain this is with some examples and once that's covered then we'll explain more of the details.

A JVM can have one ObjectGrid server active on it. That server can in turn host one container per ObjectGrid deployed to that server. A container can host multiple shards from a single ObjectGrid.

This capability is useful to make sure that replicas and primaries are placed in different locations or zones for better high availability. Normally, ObjectGrid will not place a primary and replica shard on JVMs with the same IP address. This simple rule normally avoids us placing them on the same physical box. However, many customers require a more flexible mechanism. For example, a customer may be using two blade chassis and would like the primaries to be striped across both chassis and the replica for each primary be placed on the other chassis from the primary. The chassis name would be the zone name in this case. Alternatively, a customer might name zones after floors in a building and use zones to make sure that primaries and replicas for the same data are on different floors. Buildings and data centers are also possible. We have done testing across data centers using zones as a mechanism to ensure the data is adequately replicated between the data centers. Customers using the HTTP Session Manager for ObjectGrid can also use zones. This allows customers to deploy a single web application across three data centers and ensure that users HTTP session are replicated across data centers so that the sessions can be recovered even if an entire data center fails.

Associating an ObjectGrid server with a zone using non WebSphere XD

If ObjectGrid is used with J2SE or a non WebSphere XD 6.1 based application server then a JVM that is a shard container can be associated with a zone if using the following techniques.

Applications using the startOgServer script

The startOgServer script is used to start an ObjectGrid application when it is not being embedded in an existing server. The "-zone" parameter is used to specify the zone to use for all containers within the server.

Associating XD nodes with zones

Customers using ObjectGrid with XD J2EE/JavaEE applications can leverage the XD node group feature to automatically place cluster member JVMs in specific zones. A JVM can be a member of a single zone. Node groups can be named using the convention "ReplicationZoneZONENAME". Any nodes in such a group are members of the zone "ZONENAME". Care needs to be taken to ensure that these cluster members are not a member of two or more such node groups. A cluster member JVM checks for zone membership at startup only. Adding a new node group or changing the membership will only have an impact on newly started or restarted JVMs.

Zone rules

An ObjectGrid partition has one primary shard and zero or more replica shards. We will use a naming convention for these shards. P means the primary, S means a synchronous replica and A means an asynchronous replica. A zone rule has three components:

  • A rule name
  • A list of zones
  • An inclusive or exclusive flag

A zone rule specifies the possible set of zones that a shard can be placed in. The inclusive flag means that once a shard is place in a zone from the list then all other shards will also be placed in that zone. An exclusive setting means that each shard for a partition will be placed in a different zone in the zone list. Clearly, this means if there are three shards (primary, and two synchronous replicas) then the zone list must have three zones in it.

Each shard can be associated with one zone rule. A zone rule can be shared between two shards. When a rule is shared then the inclusive/exclusive flags extends across shards of all types sharing a single rule.

Examples

Here are a set of examples showing various scenarios and the deployment configuration to realize them.

Striping primaries and replicas across zones

The customer has three blade chassis's. They want primaries distributed across all three chassis's and a single synchronous replica placed in a different chassis than the primary. We will define each chassis as a zone. We will use the chassis names ALPHA, BETA, GAMMA.

<?xml version="1.0" encoding="UTF-8"?>
<deploymentPolicy xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance
xsi:schemaLocation=
"http://ibm.com/ws/objectgrid/deploymentPolicy ../deploymentPolicy.xsd"
    	xmlns="http://ibm.com/ws/objectgrid/deploymentPolicy">
  <objectgridDeployment objectgridName="library">
    <mapSet name="ms1" numberOfPartitions="37" minSyncReplicas="1"
         maxSyncReplicas="1" maxAsyncReplicas="0">
      <map ref="book" />
      <zoneMetadata>
        <shardMapping shard="P" zoneRuleRef="stripeZone"/>
        <shardMapping shard="S" zoneRuleRef="stripeZone"/>
        <zoneRule name ="stripeZone" exclusivePlacement="true" >
          <zone name="ALPHA" />
          <zone name="BETA" />
          <zone name="GAMMA" />
        </zoneRule>
      </zoneMetadata>
    </mapSet>
  </objectgridDeployment>
</deploymentPolicy>

This deployment xml shows a grid called library with a single Map called book. It uses four partitions with a single synchronous replica. The zone metadata clause shows the definition of a single zone rule and the associate of zone rules with shards. The primary and synchronous shards are both associated with the zone rule "stripeZone". The zone rule has all three zones in it and it uses exclusive placement. This means that if the primary for partition 0 is placed in ALPHA then the replica for partition 0 will be placed in either BETA or GAMMA. Similarly primaries for other partitions will be placed in other zones and the replicas will be placed.

Placing a primary and sync replica in one zone but keeping an async in a different zone

Here, we have two building with a high latency connection between them. The customer wants no data loss high availability for all scenarios. However, the performance impact of synchronous replication between buildings leads them to a trade off. They want a primary with synchronous replica in one building and an asynchronous replica in the other building. Normally, the failures are JVM crashes, box failures etc, not large scale issues. This topology lets them survive normal failures with no data loss. The loss of a building is rare enough that some data loss is acceptable in that case. We will make two zones, one for each building. Here is the deployment xml:

<?xml version="1.0" encoding="UTF-8"?>

<deploymentPolicy xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ibm.com/ws/objectgrid/deploymentPolicy ../deploymentPolicy.xsd"
    xmlns="http://ibm.com/ws/objectgrid/deploymentPolicy">

    <objectgridDeployment objectgridName="library">
        <mapSet name="ms1" numberOfPartitions="13" minSyncReplicas="1"
            maxSyncReplicas="1" maxAsyncReplicas="1">
            <map ref="book" />
            <zoneMetadata>
                <shardMapping shard="P" zoneRuleRef="primarySync"/>
                <shardMapping shard="S" zoneRuleRef="primarySync"/>
                <shardMapping shard="A" zoneRuleRef="aysnc"/>
                <zoneRule name ="primarySync" exclusivePlacement="false" >
                    <zone name="BldA" />
                    <zone name="BldB" />
                </zoneRule>
                <zoneRule name="aysnc" exclusivePlacement="true">
                    <zone name="BldA" />
                    <zone name="BldB" />
                </zoneRule>
            </zoneMetadata>
        </mapSet>
    </objectgridDeployment>
</deploymentPolicy>

The primary and synchronous replica share a zone rule ("primarySync") with an exclusive flag setting of false. So, once the primary or sync gets placed in a zone then the other is also placed there. The asynchronous replica uses a second zone rule with the same zones as the primarySync zone rule but it uses exclusivePlacement set to true. This means don't put a shard in a zone with another shard from the same partition. This means the asynchronous replica won't get placed in the same zone that the primary/synchronous replicas were placed in.

Placing all primaries in one zone and all replicas in another zone

Here, we want all primaries in one specific zone and all replicas in a different zone. We will have a primary and a single asynchronous replica. All replicas will be in zone A and primaries in B.

<?xml version="1.0" encoding="UTF-8"?>

<deploymentPolicy xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation=
      "http://ibm.com/ws/objectgrid/deploymentPolicy ../deploymentPolicy.xsd"
    xmlns="http://ibm.com/ws/objectgrid/deploymentPolicy">

    <objectgridDeployment objectgridName="library">
        <mapSet name="ms1" numberOfPartitions="13" minSyncReplicas="0"
            maxSyncReplicas="0" maxAsyncReplicas="1">
            <map ref="book" />
            <zoneMetadata>
                <shardMapping shard="P" zoneRuleRef="primaryRule"/>
                <shardMapping shard="A" zoneRuleRef="replicaRule"/>
                <zoneRule name ="primaryRule">
                    <zone name="A" />
                </zoneRule>
                <zoneRule name="replicaRule">
                    <zone name="B" />
                </zoneRule>
            </zoneMetadata>
        </mapSet>
    </objectgridDeployment>
</deploymentPolicy>

Here, you can see two rules, one for the primaries (P) and another for the replica (A).

Zones over WANs

Customers may want to deploy a single ObjectGrid over multiple buildings or data centers with slower network interconnections. Slower network connections mean lower bandwidth and higher latency connections. The possibility of network partitions also increases in this mode due to network congestion and other factors. ObjectGrid approaches this harsh environment in the following ways.

Limited heart beating between zones.

JVMs grouped together in to core groups do heart beat each other. When the catalog service organizes JVMs in to groups, those groups do not span zones. A leader within that group pushes membership information to the catalog service. The catalog service verifies any reported failures before taking action. It does this by attempting to connect to the suspect JVMs. If the catalog service sees a false failure detection then it takes no action as the core group partition will heal in a short period of time.
The catalog service will also heart beat core group leaders periodically at a slow rate to handle the case of core group isolation.

Catalog service as the grid tie breaker.

The catalog service is the tie breaker for an ObjectGrid grid. It's essential that it acts with a single voice for the grid to execute. The catalog service runs on a fixed set of JVMs replicating data from an elected primary to all the other JVMs in that set. The catalog service should be distributed amongst the physical zones or data centers to lower the probability that it gets isolated from the grid and can survive anticipated failure scenarios.

The catalog service communicates with the container JVMs in the grid using idempotent or recoverable operations. It does this communication using IIOP. All state changes in the catalog service are synchronously replicated amongst the current members hosting the catalog service. This replication only succeeds if the majority of the JVMs accept the change. This means that if the catalog service partitions then only the majority partition can commit changes. The service primary will only send commands to the container JVMs if the state change that creates that command commits. This means a minority partition cannot advance its state or issue commands to containers.

A partitioned catalog service does not stop the grid from functioning. The grid will still accept client requests and execute operations. If there is no majority catalog service partition then failures in the grid will not be recovered until the catalog service regains majority. If recovery is delayed then over time as failures occur then certain partitions will go offline until the catalog service regains majority.

The core group leader JVMs report membership changes to the catalog service. If the catalog service partitions then the service will push back an updated route table for the catalog service. Such a route table from a minority partition will not include the location for a primary. The leader will need to iterate across all the possible catalog service JVMs to try to locate the primary partition. It will need to do this periodically while waiting for the partition to be resolved. Once, it receives a route table with a primary then pending recovery actions to be directed by the majority catalog service primary.

If the core group cannot connect with a catalog service primary for a period of time then either it's physically disconnected from the rest of the grid (possibly with a minority catalog service partition) or the catalog service is stuck in minority partitions. It's impossible to tell the difference. If there is a majority catalog service partition then it may be recovering from the apparent loss of the disconnected core group. This may lead to two primaries for the same partition, the old existing primary and the new primary in the rest of the network. The majority catalog service partition has no way to 'kill' the old primaries given it's in a disconnected network state with the old primaries. When the catalog service recovers and the disconnected core group discovers the new primaries then the catalog service will notice that there are two primaries. It will instruct the previously disconnected core groups to delete all shards and then balancing will occur.

If the catalog service partitions in to two minor partitions or a single surviving minor partition then the customer will need to get involved to help with recovery. A JMX command will be required to specify that a single minority partition be allowed to take action. The customer must ensure that the other minority partitions are terminated.

Summary

Placement zones allow control of how the ObjectGrid lays out primaries and replicas to allow advanced topologies to be realized easily.

Wiki Disclaimer and License
© Copyright IBM Corporation 2007,2008. All Rights Reserved.


    About IBM Privacy Contact