IBM®
Skip to main content
    Country/region [select]      Terms of use
 
 
    
     Home      Products      Services & solutions      Support & downloads      My account     
 
developerworks > My developerWorks >  Dashboard > WebSphere eXtreme Scale V6.1 User Guide > ... > ObjectGrid overview > ObjectGrid architecture
developerWorks
Log In   View a printable version of the current page.
Overview Connect Spaces Forums Wikis
ObjectGrid architecture
Added by dcberg, last edited by saif.patel@us.ibm.com on Jan 30, 2009  (view change)
Labels: 
(None)

Getting Started Examples Reference API documentation

See the WebSphere eXtreme Scale Wiki for links to eXtreme Scale Version 7.0 documentation.
If you log in with your developerWorks ID, you can leave comments and feedback for the development team.

This topic describes the system architecture, topology and concepts that make the ObjectGrid flexible, scalable, available, and efficient.

ObjectGrid provides for both local, in-memory data caching and distributed coherent client/server data caching. Local and distributed coherent ObjectGrid topologies both provide the same application programming model for interacting in the cache.

Distributed coherent ObjectGrid caches offer increased performance, availability and scalability and can be configured using both static and dynamic topologies. The dynamic deployment topology is new for ObjectGrid v6.1 and allows automatic balancing of ObjectGrid servers. Additional servers can be added to an ObjectGrid without restarting the ObjectGrid. This allows for very simple or small deployments and for very large terabyte sized deployments where thousands of servers are needed. The static deployment topology is also available with all versions of ObjectGrid and uses a declarative approach to defining the topology.

The following topics are discussed in this section:

ObjectGrid

An ObjectGrid is a logical container for application state. It can be physically mapped to a single JVM or a thousand server grid spread over multiple data centers. Each ObjectGrid has one or more maps defined. In a distributed environment, maps are organized into map sets.

Map

A Map is a container for key/value pairs. It allows an application to store a value indexed by a key. Maps support indexes that can be added to index attributes on the value or pieces of the key. These indexes are automatically used by query to determine the most efficient way to execute a query.

Schema

A map set can have a schema associated with it. A schema is the meta data that describes the relationships between each map (when using homogeneous Object types) or entity.

Map Schema

The ObjectGrid can store serializable Java objects in each of the maps using the ObjectMap API. A schema can be defined over the maps to identify the relationship between the objects in the maps when the maps hold objects of a single type. Defining a schema for maps is required to query the contents of the map objects. An ObjectGrid can have multiple map schemas defined.

Entity Schema

The ObjectGrid can also store entities using the EntityManager API. Each entity is associated with a map. The schema for an entity map set is automatically discovered using either an entity descriptor XML fle or annotated Java classes. Each entity has a set of key attributes and set of non-key attributes. An entity can also have relationships to other entities. ObjectGrid supports one to one, one to many, many to one and many to many relationships. Each entity is physically mapped to a single map in the map set. Entities allow applications to easily have complex object graphs that span multiple Maps. A distributed ObjectGrid can have only one entity schema.

Distributed ObjectGrid concepts

Distributed ObjectGrids require minimal additional infrastructure to operate. The minimum infrastructure is some scripts to install, start and stop a J2EE application on a server. ObjectGrid servers are used to store the cached data and clients remotely connect to the ObjectGrid servers.

Dynamic deployment topology

New for ObjectGrid v6.1 is support for plug-n-play type configuration. The dynamic configuration capability of the ObjectGrid makes it easy to add resources to the system. We introduce an ObjectGrid container to host the data and the catalog service as the touch point for the grid. The former is responsible for maintaining the data and the latter is responsible for forwarding requests to the right place on first touch, allocating space in host containers, and managing health and availability of the overall system.

Clients connect to a catalog service, retrieve a description of the ObjectGrid server topology and then communicate directly to each ObjectGrid server as needed. When the server topology changes due to the addition of new servers, or due to the failure of others, the client is automatically routed to the appropriate server that is hosting the data.

In the following diagram, many of the possible deployment combinations are illustrated

  • A catalog service typically exists in its own cluster of JVMs. A single catalog service can be used to manage multiple ObjectGrids.
  • A container can be started in a JVM by itself or can be loaded into an arbitrary JVM with other containers for different ObjectGrids (for example, on an application server JVM)
  • A client can exist in any JVM and talk to one or more ObjectGrids. A client can also exist in the same JVM as a container.

ObjectGrid container

The container is a service that hosts application data for the grid. This data is generally broken into parts, called partitions and hosted across multiple containers. So each container in turn hosts a subset of the complete data. A Java Virtual Machine (JVM) may host one or more containers and each container can host multiple shards.

Map set

A map set is a collection of maps with a common partitioning key. The data within the maps are replicated based on the policy defined on the map set. A map set is only used for distributed ObjectGrid topologies and is not needed for local ObjectGrids.

Partition

A Partition hosts a subset of the data in the grid. Think of this like a drawer in a file cabinet. Say you had employee records in a two drawer file cabinet with A-M in the upper drawer and N-Z in the bottom drawer. In ObjectGrid terms, this file cabinet would be a grid with two partitions: One partition hosts Employees A-M, the other N-Z. Of course, you may need a larger file cabinet to hold all of the files. So you may have a 26 drawer file cabinet one for each letter. This is fine even when you only have two containers as the ObjectGrid will automatically put multiple partitions in a single container and then spread them out as more containers become available.

Plan out your partition strategy carefully
Choose the number of partitions carefully before final deployment as the number of partitions can not be changed dynamically. This is because a hash mechanism is used to locate partitions in the network and there is no way for the ObjectGrid to rehash the entire data set once it has been deployed. It is generally better to over-estimate the number of partitions than it is to under-estimate.

Shard

A shard is an instance of a partition and has one of two roles: primary or replica. The primary shard and its replicas make up the physical manifestation of the partition. This means that they each host the full set of data for the partition redundantly.

Trading performance for availability
In order to increase the availability of the data, or increase persistence guarantees, replicating the data is necessary. However, replication adds cost to the transaction and so trades performance in return for availability. ObjectGrid allows this cost to be finely controlled as it supports both synchronous and asynchronous replication as well as hybrid replication models using both synchronous and asynchronous replication modes.
Primary Shard

A primary shard is the only partition instance that allows transactions to write to the cache.

Replica Shard

A replica shard is a "mirrored" instance of the partition. It receives updates synchronously or asynchronously from the primary shard. The replica shard only allows transactions to read from the cache. Replicas are never hosted in the same container as the primary and are not normally hosted on the same machine as the primary.

Synchronous Replica

A synchronous replica shard receives updates as part of the primary's transaction to guarantee data consistency. A synchronous replica can double the response time as the transaction has to commit on both the primary and the synchronous replica before the transaction is complete.

Asynchronous Replica

An asynchronous replica shard receives updates after the transaction commits to limit impact on performance but introduces the possibility of data loss as the asynchronous replica can be several transactions behind the primary.

Catalog Service

The Catalog responsibilities are broken up into a series of services. Locality is managed by the Location Service; allocation is done through the Placement Service; peer grouping for health monitoring is done by the Core Group Manager; and there is a service that provides access to administration.

The Catalog Service hosts logic that should be idle during steady state and as such has little influence on scalability. It is built to service hundreds of containers becoming available simultaneously. For availability, the catalog service should be configured into a cluster.

Location Service

The Location Service acts as the touch point for both clients looking for the containers that host the application they seek as well as for containers themselves looking to register hosted application with the Placement Service. The Location Service runs in all of the cluster members to scale out this function.

Placement Service

The placement service is the central nervous system for the grid. This service is responsible for allocating individual shards to their host container. It runs as a one-of-N elected service in the cluster so there is always exactly one instance of the service running. If that instance should fail, then another process gets elected and takes over. All state for the catalog service is replicated across all servers hosting the catalog service for redundancy.

Plan out the heap size for Containers
Remember that the Containers will be hosting all of your data and the heap settings should be configured accordingly.
Core Group Manager

The Core Group Manager is a fully automatic service responsible for organizing containers in to small groups of servers that are then automatically loosely federated to make an ObjectGrid. When a container first contacts the catalog service, it waits to be assigned to either a new or existing group of several JVMs. An ObjectGrid consists of many such groups and this grouping is a key scalability enabler. Each group is a group of JVMs that monitor each others availability through heart beating. One of these group members will be elected the leader and have an added responsibility to relay availability information to the catalog service to allow for reacting to failures by re-allocation and route forwarding.

Administration

As the central nervous system of the grid, the catalog service is also the logical entry point for system administration.

Static deployment topology

All versions of ObjectGrid support a static deployment topology, where the ObjectGrid server topology is defined in a cluster descriptor XML file. There is no catalog service needed as each primary, replica and partition is explicitly defined in one central file.

For details on the static deployment topology, see the ObjectGrid v6.0.x Programming Guide.

For details on configuring the static deployment policy for ObjectGrid v6.1, see Configuring for deployment into a static topology.

Choosing a dynamic or static deployment topology

The dynamic deployment topology is the most flexible, as it allows adding and removing ObjectGrid servers to better utilize resources without tearing-down the the entire cache. This is accomplished by using the catalog service which automatically manages the assignment of ObjectGrid servers to the active containers. All dynamic deployment topology clients communicate to the catalog service and ObjectGrid servers through IIOP.

Although a static deployment topology is fixed, it doesn't require a catalog service and allows fixed placement of the ObjectGrid data. The location of each primary, replica and partition is explicitly defined in the cluster deployment XML. All clients communicate to the ObjectGrid servers through TCP/IP using the ObjectGrid.

The dynamic deployment topology will continue to be enhanced to support a variety of deployment scenarios and should be strongly considered over the static deployment topology.

Wiki Disclaimer and License
© Copyright IBM Corporation 2007,2009. All Rights Reserved.


 
    About IBM Privacy Contact