Technical Blog Post
WebSphere eXtreme Scale (WXS) Load Distribution
The following is written from the perspective of using fixed partition placement strategy. Fixed partition is the recommended placement strategy and is in use by almost all WXS environments.
WXS partitions consist of the primary shard and “n” number of replica shards. The replica shards have the same data as the primary shard and ensure that the data is available in case the primary shard is lost. With developmentMode set to false, shards from the same partition will not be placed on container JVMs with the same IP address. This keeps from placing shards for the same partition on containers which reside on the same physical box. This is to help avoid the scenario where the primary and all the replica shards for a partition are lost during a failure event. If all the shards for a partition are lost then the data for that partition is lost. Thus, distributing the shards across not only different container JVMs but different physical boxes is desirable.
Shards are distributed as evenly as possible across containers. However, each container in the environment may not have an identical number of shards. For example, consider an environment with three containers, 11 partitions, and a single replica. This would be 22 total shards which need to be distributed among the three containers. Obviously the shards cannot be distributed evenly since 22 is not divisible by three. Thus, two containers will have 7 shards and one container will have 8 shards.
While the number of shards are always distributed as evenly as possible, this does not mean that the number of primary shards and replica shards are always distributed evenly. If initial placement is controlled, then the primary and replica shards are balanced as evenly as possible at initial placement. If initial placement is not controlled, the balance of replica and primary shards per container JVM becomes more random, possibly even resulting in containers with all primary or all replica shards. Over the life time of the environment, the balance of replica and primary shards can become unbalanced due to container starts/stops or container failover.
When an application requests to store a key/value pair, the WXS client hashes the key which then maps to a partition. This is the partition where the key/value pair will be stored. WXS does not control the key; the key is chosen by the application interfacing with the WXS client APIs. This means WXS does not control the distribution of data. Typically the distribution of the number of entries across the partitions is not a concern. If there is a suitable number of partitions and a adequate variation in the keys, then the entries are generally distributed fairly evenly. That being said WXS cannot guarantee that the entries will be distributed evenly.
As discussed above the entries in the grid are distributed among the partitions based on the key of the key/value pair; WXS does not distribute the entries based on the value or the value's size. Thus, if the data in the environment is not somewhat uniform in size this could cause discrepancies in the sizes of the partitions. For example, if there is a handful of very large objects in a grid which has generally smaller objects, then even with a good distribution of the entries the shards in which these larger objects are stored could be significantly larger than other shards.
Load distribution across container JVMs is not something WXS can guarantee or control. The WXS client sends requests to the primary shard for a partition. The primary shard for a single partition is only located on one container JVM. This means any requests for a given key will always be routed to the same container JVM. Thus, the load on each container is very much dependent on the application's cache requests which is outside of the control of WXS. The fact that the load is based on the application requests can becomes more apparent if there is a “hot key(s)” in the environment. A “hot key” is a key which is operated on at a much higher frequency than other keys. “Hot keys” tend to cause a imbalance in the distribution of requests and CPU usage among the containers.
The shard and data distribution discussed above can also have large impacts on load distribution. WXS distributes the shards so that there is as close to a equal number of shards on each container JVM. While the number of shards are always distributed as evenly as possible this does not mean that the number of primary shards and replica shards are always distributed evenly. If the number primary shards and replica shards are not balanced between the container JVMs then the container JVMs with more primaries will typically have more load as the client transactions are always executed against the primary shard. Also if some partitions have significantly more transactions or more data than other partitions and the primary shards for those partitions end up on the same container JVM then the container JVM would see more load. Lastly, based on shard and data distribution if a container JVM ends up with significantly more data than other container JVMs, this would tend to cause more garbage collection and resource usage which would elevate CPU usage.
Possible tuning if the environment has unbalanced load:
1. Control initial placement. If initial placement is controlled then the number of primary and replica shards will be as even as possible among the container JVMs. If initial placement is not controlled then the primary and replica shards may not be balanced and there may be containers of all primary shards and some of all replica shards.
Controlling Initial Placement:
2. If the primary and replica shards are not evenly distributed among the containers then balance them using the balanceShardTypes command. balanceShardTypes does “move” shards so it is possible that in-flight transactions would fail while the action completes. It is recommended to run the command at off peak times to avoid any such issues.
3. Test increasing the numbers of partitions, shards, and containers. Increasing the number of partitions, shards and containers in the environment may help to distribute the load more evenly. When changing the number of partitions the environment must be completely stopped and restarted this cannot be done via a rolling restart. Be aware of the number of shards for all mapsets and grids. Some environments have numerous grids or mapsets with different numbers of partitions. When in environments such as this one should pay special attention to the distribution of shards and data if there is a CPU or load issue.
4. Check for “hot keys”. If there are known hot keys be sure the primary shards for these keys do not end up on the same container JVM. Reserving the shards can allow for greater placement control at initial placement.
5. Consider implementing the WXS near cache. In general the near cache will lighten the load on the container JVMs as some GETs can be handled locally by the client. The near cache can also be helpful if there is a "hot key". However, there is a possibility of stale data with the near cache which must be considered as well.
6. Consider implementing read from replica. Read from replica makes the most sense in environments which have very few updates to the data. Read from replica allows the client to route some read requests to the replica shard instead of the primary. Read from replica does not guarantee even distribution of read requests between the primary and replica shards this just allows some read requests to be routed to replicas. Read from replica only affects read operations. This does not allow updates, deletes, or inserts on replicas; all of these operations must still be executed on the primary shard. When enabling read from replica there is the possibility of stale data which must be considered as well.
Note, do not enable both the near cache and read from replica with asynchronous replicas in the same environment. This can lead to cases where stale data does not get updated and remains stale.