Highly scalable grid-style computing and data processing with the ObjectGrid component of WebSphere Extended Deployment

This hands-on guide shows how the infrastructure created by ObjectGrid, a component of IBM® WebSphere® Extended Deployment, can provide an excellent platform for grid-style computing and data processing. This content is part of the IBM WebSphere Developer Technical Journal.

Jonathan Marshall (marshalj@uk.ibm.com), Senior IT Specialist, IBM, Software Group

Jonathan Marshall is a Senior IT Specialist working for IBM WebSphere Technical Sales in the UK. He has been working with WebSphere Application Server and related products for six years as a technical consultant and has recently been focussing on IBM's Process Integration products.



12 December 2007


Introduction

The ObjectGrid component of IBM WebSphere Extended Deployment is typically thought of as a distributed caching technology with qualities of service that include scalability through cache partitioning, resilience through the use of partition replicas, transactionality, and security. The infrastructure created by these qualities of service can also provide an excellent platform for grid-style computing and data processing.

Suppose you need to store a large amount of data. For scalability reasons, such as allowing for very large data sets, or simply for resilience, you might want to distribute the data across many JVMs. To do this in ObjectGrid, the data is partitioned across the JVMs using a hashing algorithm. If you are working with very large data sets, one of the most expensive parts of computation and processing is not the calculations, but rather the moving (serialization) of data so that it may be processed. It is far more efficient to perform the processing in situ -- where the data is -- and just return whatever results you need. This is known as application and data collocation. Significantly, this also enables the parallelism of computation, providing dramatic performance gains -- which is exactly what ObjectGrid enables you to do with the DataGrid APIs.
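As a rough illustration of hash-based partitioning, the sketch below maps a key to one of a fixed number of partitions. It is not ObjectGrid's actual algorithm, which this article does not describe. The class name HashPartitioningSketch, the method partitionFor, and the use of hashCode() are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: ObjectGrid's real hashing scheme is internal to the product.
final class HashPartitioningSketch {

    // Map a key to a partition index in the range [0, numPartitions).
    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 5;
        // One bucket per partition, standing in for the JVMs that host them.
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String ssn : new String[] {"0", "1", "2", "3", "4", "1234"}) {
            partitions.get(partitionFor(ssn, numPartitions)).add(ssn);
        }
        for (int i = 0; i < numPartitions; i++) {
            System.out.println("partition " + i + " holds keys " + partitions.get(i));
        }
    }
}
```

Because the partition is derived only from the key, any client can work out where an entry lives without asking the other JVMs.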

This article shows you how to set up an in-memory grid of data, and then perform computation and data updates in a distributed and parallel manner across the grid. As part of this exercise, you will also use the EntityManager API, rather than the simple map APIs, to store and retrieve data. (The EntityManager API is not covered in detail here; the aim is to introduce its major benefits and concepts.)

This article assumes a basic understanding of ObjectGrid, and how to configure a basic, distributed environment with a simple application. See Resources for an excellent introduction to the basic concepts behind ObjectGrid.

ObjectGrid can be purchased either as part of the full WebSphere Extended Deployment product or as the component called WebSphere Extended Deployment Data Grid. It can be installed as an integral part of WebSphere Application Server or WebSphere Extended Deployment, used standalone without an application server, or run with any other Java application server.

To execute the steps described in this article, you will need:

  • WebSphere Extended Deployment Data Grid V6.1 (see Resources to obtain a trial version).
  • Any supported JVM at JDK 1.4.2 or higher; Java SDK 1.5.0 was used for this article.

The sample scenario

For the purpose of this exercise, let’s look at a sample subject that is close to all of our hearts: salaries and pay rises. Your task is to perform some analysis on the salaries of a company’s employees and work out the average salary. If the company has had a good year, you also want to be able to process a pay rise across all of the employees. To that end, you will create a repository that stores information about people employed by a company.

You will create an ObjectGrid to store a number of Employee objects, each of which contains information about an employee. Assume that the data you need to hold in memory is too much for a single JVM, so it will be set up as a distributed grid, spread over many JVMs. This is shown in Figure 1.

Figure 1. Overview of Employee ObjectGrid

It would be very costly in terms of network bandwidth and computation (for data serialization) to access all this data from the client for processing and then persist any changes. ObjectGrid enables you to perform computation on each partition of the grid itself in what is called an agent, which is a small piece of code that runs locally to each partition. Because an agent can work directly on the data, this removes all of the cost of serialization and data transfer, and enables you to benefit from parallelising the computation.

This exercise will show you how to create two agents to:

  1. Perform a calculation (calculate the mean average)
  2. Perform an update on all of the data (increase each salary by a percentage).

In Figure 1, the infrastructure code provided by ObjectGrid is shown in purple and the code that you are going to provide through this exercise is shown in white:

  • Data: the Employee Java™ objects, introduced below.
  • Agents: the code that actually performs the in situ computation.
  • Client code: uses the Agent Manager API to control the agent.

Finally, by way of introduction, Listing 1 shows the Employee Java object. Shown are the four fields of information you will be storing in the ObjectGrid, plus some annotations that help describe how the fields are used. For example, @Id marks the key field of the Employee object.

Listing 1. Employee object
@Entity
public class Employee {
  @Id String ssn;
  String firstName;
  String surname;
  int salary;
}

Create the grid and perform data access using the EntityManager API

In this section, you will create the distributed ObjectGrid and describe how to store and read the Employee objects from the grid with the EntityManager API.

A. Create the ObjectGrid

  1. Describe the ObjectGrid

    Start by creating the ObjectGrid you will be using. Listing 2 shows the definition file. This is as simple as it gets and defines your backing map, which is where the objects are stored. The definition file describes two things:

    • The Java objects that are going to be stored (here, the Employee object).
    • The entity metadata description file, which describes the attributes of Java objects.

    This descriptor can also describe various transactional, access, and plug-in settings.

    Listing 2. ObjectGrid definition XML file
    <?xml version="1.0" encoding="UTF-8"?>
    <objectGridConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ibm.com/ws/objectgrid/config ../objectGrid.xsd"
    	xmlns="http://ibm.com/ws/objectgrid/config">
    
    	<objectGrids>
    	    	<objectGrid name="EmployeeStoreObjectGrid" 
    		entityMetadataXMLFile="entity.xml">
    			<backingMap name="Employee"/>
    
    		</objectGrid>
    	</objectGrids>
    
    </objectGridConfig>
  2. Describe the entities

    You will use the EntityManager API to store and query your Employee objects in the ObjectGrid. The EntityManager API is a way to store Java objects in the ObjectGrid without having to write your own code to store the data and retrieve it into the Java objects. (This is comparable to the JPA approach to persistence.) Previously, you would have had to write code to put and get objects from a Java map, but now this is all managed for you. You simply have to describe the Java objects you want to store from your application. These descriptions can be provided through annotations in the code or in an XML descriptor file (the entity schema descriptor).

    • Using annotations

      The Employee object (Listing 1) is nice and simple and only uses two annotations:

      • @Entity: designates that the Java object is an entity
      • @Id: designates a Java variable as the primary key.

      Other annotations can be provided to describe relationships to other objects and other behaviours, but this will be sufficient for our purposes.

    • Using the entity schema descriptor

      If you are just using a local ObjectGrid (that is, the ObjectGrid is in the same process as the application), then you don’t need to provide an entity schema descriptor and can just rely on the annotations. You would programmatically register the entities that you want to use. However, in a distributed environment, you need to tell the ObjectGrid what entities it is to expect. This registers your intent to use the Employee object within the ObjectGrid.

      Listing 3 shows the descriptor, which is in its simplest form, with just the minimum required information. Three pieces of information are provided:

      • Name of the entity
      • Class name of the entity
      • Employee entity as the schema root. This is just needed to tell ObjectGrid that in a complex graph of objects, the Employee object is the parent or root object.
      Listing 3. Entity schema descriptor for the Employee object
      <?xml version="1.0" encoding="UTF-8"?>
      <entity-mappings xmlns="http://ibm.com/ws/projector/config/emd"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        	 xsi:schemaLocation="http://ibm.com/ws/projector/config/emd ./emd.xsd">
      
          <entity class-name="com.ibm.websphere.samples.datagrid.Employee" 
      name="Employee" schemaRoot="true"/>
      
      </entity-mappings>

      The descriptor can also provide the same metadata information about the entities as annotations, such as primary key field, and any mappings between objects. It can be useful to have these descriptions separate from the Java code, but this is a personal preference.

      (The schemaRoot attribute is mandatory in a distributed ObjectGrid so that the grid knows which object to hash in order to determine the partition to put it into. This is more intuitive when thinking about storing a graph of more than one related Java object. Because ObjectGrid only supports single-phase commit transactions, any update can only be performed in a single partition. This requires that all related objects be stored in the same partition. The schemaRoot is, therefore, the root object of the graph.)

  3. Describe the deployment environment

    You have described your ObjectGrid and what you want to put in it. You must now decide how you want it to be run. For example:

    • How many partitions do you need?
    • How many replicas do you need? Are they synchronous or asynchronous?

    For the purpose of this article, you will not create replicas, since there is no need for failover capability. (However, this is a straightforward configuration change, should you wish to see failover in action.)

    For this example, you want five partitions. (This is a somewhat arbitrary number that will enable us to show the calculations being performed in each partition on the grid.) Listing 4 shows the deployment policy you are going to use.

    Listing 4. ObjectGrid deployment policy file
    <?xml version="1.0" encoding="UTF-8"?>
    <deploymentPolicy xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ibm.com/ws/objectgrid/deploymentPolicy
       ../deploymentPolicy.xsd"
    	xmlns="http://ibm.com/ws/objectgrid/deploymentPolicy">
    
    	<objectgridDeployment objectgridName="EmployeeStoreObjectGrid">
     	   	<mapSet name="EMPLOYEE_MAPSET" numberOfPartitions="5" 
    		minSyncReplicas="0" maxSyncReplicas="0" maxAsyncReplicas="0">
             		<map ref="Employee" />
         	  	</mapSet>
    	</objectgridDeployment>
    
    </deploymentPolicy>

    The deployment policy is pretty intuitive. As an overview: the map sets described are a grouping of maps with a common set of clustering and replication criteria. In this scenario, you just have one map, so it is alone in the map set.

    (In reality, the number of partitions should be given some thought. You need to consider how much data you are likely to need and how many servers you would need to store the data. The number of partitions should be at least as big as the total number of servers, but should probably be as large as 2 or 3 times that. This gives greater flexibility for distributing the partitions when failing over (failed over partitions can be well distributed) or adding new capacity (the new server can take a well distributed load).)
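The sizing advice above can be made concrete with a little arithmetic. The sketch below spreads a number of partitions evenly over a number of servers and prints how many each one hosts, before and after one server is lost. The class PartitionSpreadSketch and its round-robin placement are assumptions for illustration only; ObjectGrid's real placement logic is not shown in this article.

```java
import java.util.Arrays;

// Illustrative arithmetic only; this is not how ObjectGrid decides placement.
final class PartitionSpreadSketch {

    // Spread partitions over servers as evenly as possible and
    // return how many partitions each server ends up hosting.
    static int[] spread(int partitions, int servers) {
        int[] perServer = new int[servers];
        for (int p = 0; p < partitions; p++) {
            perServer[p % servers]++;
        }
        return perServer;
    }

    public static void main(String[] args) {
        // Three servers, then the same grid after one server fails.
        System.out.println("3 partitions, 3 servers: " + Arrays.toString(spread(3, 3)));
        System.out.println("3 partitions, 2 servers: " + Arrays.toString(spread(3, 2)));
        System.out.println("9 partitions, 3 servers: " + Arrays.toString(spread(9, 3)));
        System.out.println("9 partitions, 2 servers: " + Arrays.toString(spread(9, 2)));
    }
}
```

With three partitions, losing a server leaves one survivor hosting twice as many partitions as the other. With nine partitions, the survivors host five and four, which is a much more even load.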

Figure 2. Fully configured ObjectGrid

You have now finished describing your ObjectGrid as shown in Figure 2. You have:

  • Configured the ObjectGrid name.
  • Described the entities that are going to be stored in it.
  • Configured how many partitions the ObjectGrid can use.

You are now ready to start populating the grid with data and perform computations on it.

B. Populate the grid with the EntityManager API

Prior to using the ObjectGrid, you need to connect to it in the typical way using the Catalog server’s bootstrap address. Then, within your client application, you can perform a series of updates and retrievals.

  1. Add data using the EntityManager API

    Let’s take a minute to get introduced to the main concepts of the EntityManager API. As shown in Listing 5, this basically involves five steps:

    1. Get an ObjectGrid session, as you would for any map functions.
    2. From the session, get the EntityManager.
    3. Start the transaction scope within the EntityManager.
    4. Create the Java entities (in this case, a single Employee object with some data) and use the EntityManager to persist the objects.
    5. When you have finished creating and editing your Java objects, commit the transaction.
    Listing 5. Using the EntityManager API to store data
    Session session = objectGrid.getSession();
    EntityManager em = session.getEntityManager();
    
    em.getTransaction().begin();
    Employee e = new Employee();
    e.firstName = "Jonathan";
    e.surname = "Marshall";
    e.ssn = "1234";
    e.salary = 100000;
    
    em.persist(e);
    em.getTransaction().commit();

    One of the real benefits of the EntityManager API is that once an object is persisted in the ObjectGrid, any alterations made on the object within a transaction scope are automatically updated in the ObjectGrid at the end of the transaction.

    In the sample code provided with this article, you iterate through and add five Employee objects to the ObjectGrid so that you can see the objects placed in different partitions.

  2. Read data with the EntityManager API

    For reading data from the ObjectGrid, you again use the EntityManager API to get a transaction and to access the data. The simplest option to retrieve an object is by using the find method, such as:

    em.find(Employee.class, "1234");

    This performs a find based on the key field of the Java object designated by the @Id annotation. The EntityManager API also makes it possible to execute queries against the grid to retrieve a number of objects. The query language is somewhat similar to SQL and to the entity queries in JPA.

    To query the ObjectGrid (Listing 6):

    1. Get an ObjectGrid session.
    2. From the session, get the EntityManager.
    3. Start the transaction scope within the EntityManager.
    4. Perform the query using the EntityManager.
    5. Iterate through the results (although there is just one result in this case).
    6. When you have finished reading or editing your Java objects, commit the transaction.
    Listing 6. Using the EntityManager API to query data
    Session session = objectGrid.getSession();
    EntityManager em = session.getEntityManager();
    
    em.getTransaction().begin();
    
    String queryString = "select e from Employee e where e.ssn = '1234'";
    Query query = em.createQuery(queryString);
    Iterator<Employee> results = query.getResultIterator();
    
    while (results.hasNext()) {
    	Employee e = results.next();
    	System.out.println(e.firstName);
    	System.out.println(e.surname);
    	System.out.println(e.ssn);
    	System.out.println(e.salary);
    }
    
    em.getTransaction().commit();

    Until the transaction is finally committed, the objects are in "managed" state. That is, any changes to the Java objects will be picked up and stored by ObjectGrid upon the transaction commit. (In the case of a distributed ObjectGrid, as in this example, the changes will be held in the ObjectGrid client-side near cache.) Outside the transaction, the Java objects are no longer attached to the ObjectGrid and any changes will not be persisted. They are now in a "disconnected" state.

    Interestingly, the query code shown in Listing 6 would only work on a single-partition ObjectGrid. This is because the ObjectGrid only supports one-phase commit transactions (which are much faster than two-phase commit) and therefore a query can only span a single partition. To execute a query across the grid, it must be executed on each partition. The number of partitions can be obtained from the PartitionManager, and the partition you want to query can be selected using:

    query.setPartition(i)

    as shown in Listing 7.

    Listing 7. Performing a query across multiple partitions
    BackingMap bm = objectGrid.getMap("Employee");
    PartitionManager pm = bm.getPartitionManager();
    int numPartitions = pm.getNumOfPartitions();
    
    for (int i = 0; i < numPartitions; ++i) {
    	em.getTransaction().begin();
    
    	Query query = em.createQuery("select e from Employee e");
    	query.setPartition(i);
    	Iterator<Employee> results = query.getResultIterator();
    		
    	//use the results
    
    	em.getTransaction().commit();
    }

    You now have a distributed ObjectGrid and the ability to read and write data using the EntityManager API. This provides the basis for the grid work you want to do.


Perform calculations and updates on the grid

You are now in a position to use the ObjectGrid functionality to perform calculations and updates in situ on the ObjectGrid partitions. This functionality is based on creating and running agents.

An agent is a piece of application logic you provide that is to be collocated with the data in the ObjectGrid. Each agent you define will exist in every partition. Agents are executed in parallel via a call to the Agent Manager within the ObjectGrid client (Figure 1). Within an agent, you can perform any calculation or operation on grid data.

The distributed ObjectGrid needs to know about the agent code that it is going to run. Therefore, the Java classpath for each server needs to include the agent Java classes for server-side processing.

There are two types of grid agent. The significant difference between them is the information they can return to the client:

  • MapGridAgent returns a map of results from a partition. For example, a MapGridAgent could be useful for retrieving a subset of data with some calculation performed on each result.
  • ReduceGridAgent returns a single result from a partition. For example, a ReduceGridAgent could be useful for working out the lowest or highest value for a partition.

A. Performing calculations on the grid

Let's return to our scenario: You want to be able to work out the average salary across the whole employee population. The way that you can do this is to let each partition calculate:

  • Total salary of the employees in that partition.
  • Number of employees in that partition.

This information is then passed back to the client to calculate the average salary across the grid:

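average salary = (sum of the salary totals from all partitions) / (sum of the employee counts from all partitions)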

The calculations are performed in parallel on each partition -- where the data is. When the calculations are completed on the partition, the values are passed back to the client, where the trivial aggregation calculation can be done.
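To see why each partition returns a total and a count rather than its own average, consider the worked example below. The class WeightedAverageSketch and the figures in main are illustrative assumptions, not part of the demo application.

```java
// Illustrative figures only; they are not taken from the demo application.
final class WeightedAverageSketch {

    // Combine per-partition salary totals and head counts into one grid-wide mean.
    static int gridAverage(int[] partitionTotals, int[] partitionCounts) {
        long total = 0;
        long count = 0;
        for (int i = 0; i < partitionTotals.length; i++) {
            total += partitionTotals[i];
            count += partitionCounts[i];
        }
        // An empty grid has no meaningful average; return 0 rather than divide by zero.
        return count == 0 ? 0 : (int) (total / count);
    }

    public static void main(String[] args) {
        // Partition 0 holds two employees, partition 1 holds one.
        int[] totals = {150000, 20000};
        int[] counts = {2, 1};
        System.out.println("Grid average: " + gridAverage(totals, counts));
        // Averaging the two partition means (75000 and 20000) instead
        // would over-weight the smaller partition.
        System.out.println("Mean of partition means: " + (150000 / 2 + 20000 / 1) / 2);
    }
}
```

The two printed values differ, which is why the aggregation step needs the counts as well as the totals.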

You will use the ReduceGridAgent to perform the calculation. This is the agent that “returns a result” from each partition. Let’s take a quick look at the interface, shown in Listing 8, which you need to implement for your agent.

Listing 8. ReduceGridAgent interface
public interface ReduceGridAgent {
	
	/**
	 * Provides the logic to be run on each partition.    
	 * @return Object containing the result of the processing
	 */
	public Object reduce(Session s, ObjectMap map);
	
	/**
	 * Provides the logic to be run on each partition.
	 * In this case, the entries for the partition are constrained 
	 * further by the entity keys in the collection    
	 * @return Object containing the result of the processing
	 */
	public Object reduce(Session s, ObjectMap map, Collection keys);

	/**
	 * Run once for the whole grid. 
	 * It is run after the reduce methods have been run.  
	 * @param A Collection of Java Objects, which is the results from
	 * each reduce method.  
	 */
	public Object reduceResults(Collection results);

}

You are going to provide an implementation for:

  • reduce() to calculate the salary total and employee count for a partition; this is run server-side on each partition.
  • reduceResults() to aggregate the totals and counts returned from the partitions into the overall average; this is run on the client.

You are not going to use the second of the reduce methods in this exercise, but the idea behind this is that you could operate on a subset of a partition, as specified by the key set provided to you in the method. For example, in this scenario you could calculate the average salary of a subset of the data, such as for managers.

See Figure 3 for how this looks in the distributed infrastructure.

Figure 3. Salary average grid agent overview

Finally, in addition to implementing the ReduceGridAgent interface, your agent needs to implement the EntityAgentMixin interface. This is specifically required when storing entities, and simply requires the implementation of a getClassForEntity() method, which returns the Java class type for the entity you’re dealing with.

  1. Reduce agent code

    See Listing 9 for the reduce agent code you are going to run on each partition (Figure 3). The logic is very intuitive:

    1. Query the partition for all employees.
    2. Iterate through them, counting how many there are, and the total salary for the partition.
    Listing 9. Salary average reduce agent code
    public Object reduce(Session s, ObjectMap map) {
    	EntityManager em = s.getEntityManager();
    	Query q = em.createQuery("select e from Employee e");
    	Iterator<Employee> employees = q.getResultIterator();
    	int sum = 0;
    	int count = 0;
    	while (employees.hasNext()) {
    		Employee e = employees.next();
    		sum += e.salary;
    		count++;
    	}
    	WeightedSalary partitionTotal = new WeightedSalary();
    	partitionTotal.employeeCount = count;
    	partitionTotal.salaryTotal = sum;
    	return partitionTotal;
    }
  2. Aggregate the data from across the grid

    To bring together the answers from across the grid, you simply have to iterate through the results provided to you from the partitions. Calculate the total of the salaries across the grid and then divide that by the total number of employees in the grid (Listing 10). This code is run client side.

    Listing 10. Salary average reduce agent aggregation code
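    public Object reduceResults(Collection results) {
    	System.out.println("Calculate salary total across the grid");
    	Iterator<WeightedSalary> salaries = results.iterator();
    	int sum = 0;
    	int count = 0;
    	// Add up the totals and counts returned by each partition.
    	while (salaries.hasNext()) {
    		WeightedSalary sc = salaries.next();
    		sum += sc.salaryTotal;
    		count += sc.employeeCount;
    	}
    	// Guard against an empty grid to avoid dividing by zero.
    	if (count == 0) {
    		return 0;
    	}
    	// Integer division; the int is autoboxed to the Integer the client expects.
    	return sum / count;
    }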

    Your agent is now defined, and you just need to be able to kick it off from your application code (which is shown in Listing 11):

    1. You have a handle to your Employee ObjectMap within the ObjectGrid. From this, you can get an AgentManager, which provides the methods for asking the grid to execute your agent code.
    2. Create an instance of the agent you have just described.
    3. Pass the agent into the callReduceAgent method on the agent manager and use the result that you get back. In this case, the result is an integer containing the average salary across the grid.
    Listing 11. Salary average reduce agent client code
    AgentManager amgr = map.getAgentManager();
    AverageSalaryReduceAgent agent = new AverageSalaryReduceAgent();
    
    Integer aveSalary = (Integer) amgr.callReduceAgent(agent);

    As you can see, this is very easy and you can quickly see how powerful this approach is for operating on data in the grid. This simple example shows how the different aspects of the solution described in Figure 3 fit together:

    • Client initiates the call (from main program).
    • Agent methods are run on each partition (in reduce agent code).
    • Aggregation is performed back on the client again (also in reduce agent code).
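The flow in the list above can be mimicked in plain Java without any ObjectGrid classes. In the sketch below, a thread pool stands in for the grid, each int array stands in for one partition's salaries, and the methods reduce and reduceResults are local stand-ins that only borrow the names of the agent methods. Everything here (ParallelReduceSketch, Partial, averageAcross) is hypothetical and written for a current JDK rather than the Java 5 level used for the article; the salaries are the post-raise values from the sample output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java simulation of the reduce pattern; no ObjectGrid classes are used.
final class ParallelReduceSketch {

    // Per-partition result: salary total and employee count.
    static final class Partial {
        final long salaryTotal;
        final int employeeCount;

        Partial(long salaryTotal, int employeeCount) {
            this.salaryTotal = salaryTotal;
            this.employeeCount = employeeCount;
        }
    }

    // Stands in for reduce(): works only on one partition's data.
    static Partial reduce(int[] salariesInPartition) {
        long total = 0;
        for (int salary : salariesInPartition) {
            total += salary;
        }
        return new Partial(total, salariesInPartition.length);
    }

    // Stands in for reduceResults(): runs once, on the caller.
    static long reduceResults(List<Partial> partials) {
        long total = 0;
        long count = 0;
        for (Partial p : partials) {
            total += p.salaryTotal;
            count += p.employeeCount;
        }
        return count == 0 ? 0 : total / count;
    }

    // Run reduce() on every partition in parallel, then aggregate the partial results.
    static long averageAcross(List<int[]> partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Callable<Partial>> tasks = new ArrayList<>();
            for (final int[] partition : partitions) {
                tasks.add(() -> reduce(partition));
            }
            List<Partial> partials = new ArrayList<>();
            for (Future<Partial> f : pool.invokeAll(tasks)) {
                partials.add(f.get());
            }
            return reduceResults(partials);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<int[]> partitions = Arrays.asList(
                new int[] {110000}, new int[] {55000}, new int[] {22000},
                new int[] {22000}, new int[] {33000});
        System.out.println("Average salary across grid: " + averageAcross(partitions));
    }
}
```

Only the small Partial objects cross from the worker threads to the caller, which mirrors the way an agent returns a compact result instead of shipping the underlying data.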

B. Performing updates on the grid

You have just used the reduction agent to provide an “answer” from the grid. The other type of agent provided by the DataGrid API is the MapGridAgent. This also enables work to be pushed out onto ObjectGrid partitions to operate on all or a subset of the data in the partition. The difference is that the MapGridAgent is designed for returning a result based on a map. This enables you to perform an operation for each object in the partition that you are interested in and return a collection of results back to the client.
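The map pattern can be pictured with the plain-Java simulation below, which produces one result per entry and merges the per-partition maps on the caller. It uses no ObjectGrid classes. The names MapPatternSketch and callOnAll, the 50000 threshold, and the band labels are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the map pattern; no ObjectGrid classes are used.
final class MapPatternSketch {

    // Stands in for process(): derive one result for one entry.
    // The key is accepted to mirror a per-key call, although this toy rule ignores it.
    static String process(String ssn, int salary) {
        return salary >= 50000 ? "upper band" : "lower band";
    }

    // Stands in for processAllEntries(): one result per key in this partition.
    static Map<String, String> processAllEntries(Map<String, Integer> partition) {
        Map<String, String> results = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : partition.entrySet()) {
            results.put(entry.getKey(), process(entry.getKey(), entry.getValue()));
        }
        return results;
    }

    // The caller merges the per-partition maps into a single view.
    static Map<String, String> callOnAll(List<Map<String, Integer>> partitions) {
        Map<String, String> merged = new TreeMap<>();
        for (Map<String, Integer> partition : partitions) {
            merged.putAll(processAllEntries(partition));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> partition0 = new LinkedHashMap<>();
        partition0.put("0", 110000);
        Map<String, Integer> partition1 = new LinkedHashMap<>();
        partition1.put("1", 55000);
        partition1.put("2", 22000);
        System.out.println(callOnAll(Arrays.asList(partition0, partition1)));
    }
}
```

The contrast with the reduce pattern is the shape of the answer: here the caller receives an entry per key rather than a single value per partition.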

In this case, you are actually going to be doing something simpler and using the MapGridAgent to perform an update on the ObjectGrid. This could have been done with either the ReduceGridAgent or the MapGridAgent, since each enables complete access to the given partition for updates. The latter is used here just for demonstration purposes.

Let's briefly introduce the MapGridAgent interface. It is slightly simpler than the ReduceGridAgent, as it only has two methods, both of which run on the server (Listing 12).

Listing 12. MapGridAgent interface
public interface MapGridAgent {
	
	/**
	 * Process a single entity in the partition we are working within 
	 * as designated by the key parameter passed to the method.  
	 * @return Object containing processed data 
	 */
	public Object process(Session s, ObjectMap map, Object key);
	
	/**
	 * Select all or some of the entities for this partition  
	 * @return Map containing results for client
	 */
	public Map processAllEntries(Session s, ObjectMap map);

}

Within this scenario of reviewing and managing employee salaries, assume that the company is issuing a company-wide raise of 10%. This is a straightforward action to process across all of the data in the ObjectGrid. Figure 4 visually shows the order of execution here:

  1. From the client, you issue a call to the agent manager.
  2. This calls the processAllEntries method on each agent.
  3. The processAllEntries method queries to obtain all Employee objects in the partition.
  4. The process method is executed for each Employee object, which updates the Employee salary within the partition.
Figure 4. MapGridAgent overview

Both methods have therefore been implemented on the MapGridAgent interface (Listing 13):

  • The processAllEntries method lets you select all of the entities in the partition, and you operate on each entity through the process method.
  • The process method simply updates the salary of the Employee object specified by the key parameter. Because the Employee object is a managed entity, updating the object automatically updates it in the grid partition.
  • You also need to pass a parameter to this agent to determine what percentage increase to apply to a given employee's salary. This parameter is passed on the object constructor to mandate that it is set.

In the example here, you don’t need a response, so an empty HashMap is returned.

Listing 13. Increase salary map agent code
public class IncreaseSalaryMapAgent implements MapGridAgent, EntityAgentMixin {
	private float increasePercentage;
	
	public IncreaseSalaryMapAgent(float increasePercentage){
		super();
		this.increasePercentage = increasePercentage;
	}

	public Map processAllEntries(Session s, ObjectMap map) {
		EntityManager em = s.getEntityManager();
		Query q = em.createQuery("select p from Employee p");
		Iterator iter = q.getResultIterator();
		while (iter.hasNext()) {
			Employee p = (Employee) iter.next();
			process(s,map,p);
		}
		return new HashMap(); //just empty hashmap
	}

	public Object process(Session s, ObjectMap map, Object key) {
		Employee p = (Employee)key;
		p.salary *= 1 + increasePercentage;
		return p;
	}

	public Class getClassForEntity() {
		return Employee.class;
	}

}
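One detail of the process method above is worth a closer look: the salary field is an int while the increase is a float, so the compound assignment silently narrows the floating-point product back to an int and discards any fractional part. The sample output (100000 becoming 110000) suggests that a fraction such as 0.1 is passed rather than the number 10, although the constructor call is not shown with a literal value. The sketch below isolates that arithmetic; the class SalaryRaiseSketch and both method names are hypothetical, and the rounding variant is offered only as an alternative, not as the article's approach.

```java
// Illustrative only: shows how Java evaluates the compound assignment used in process().
final class SalaryRaiseSketch {

    // Same arithmetic as the agent: int *= float narrows the float product back to int.
    static int raiseTruncating(int salary, float increaseFraction) {
        salary *= 1 + increaseFraction;
        return salary;
    }

    // Alternative that rounds to the nearest whole unit instead of truncating.
    static int raiseRounding(int salary, double increaseFraction) {
        return (int) Math.round(salary * (1.0 + increaseFraction));
    }

    public static void main(String[] args) {
        System.out.println(raiseTruncating(100000, 0.1f)); // 10% raise on the sample salary
        System.out.println(raiseTruncating(1009, 0.1f));   // fractional part is discarded
        System.out.println(raiseRounding(1009, 0.1));      // rounded to the nearest unit instead
    }
}
```

For whole-number salaries like those in the demo the two approaches agree, but they can differ by one unit when the raised value is not a whole number.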

The agents are executed across the grid in exactly the same way as the ReduceGridAgent. You can see this in Listing 14. Once again, you get a handle to the AgentManager to call the map agent. This time, you pass the IncreaseSalaryMapAgent, which has been initialised with the percentage increase to execute on the salary data in the grid.

Listing 14. Increase salary map agent client code
AgentManager amgr = map.getAgentManager();
IncreaseSalaryMapAgent agent = new IncreaseSalaryMapAgent(increase);

Map m = amgr.callMapAgent(agent);

This is the simplest way of kicking off the agent. However, it can also be extended in the same way as the reduction agent by passing a list of entity keys to the callMapAgent method to process. This would bypass the processAllEntries method in the agent, and directly call the process method for the specific entity on the appropriate partition.

You have now finished designing your distributed application and the grid agents, and you are now ready to test it.


Running the demonstration application

  1. Prepare ObjectGrid

    Start up the ObjectGrid, based on the configuration that you worked through earlier. (If you don’t have a copy of ObjectGrid, see Resources to get a trial version.) Download the demo application included with this article. The important directories that you will use here are:

    • DEMO_HOME: refers to the location where you unzip the demonstration application.
    • JAVA_HOME: location of the Java SDK (this must be set as an environment variable).
    • OG_HOME: location of ObjectGrid V6.1.
  2. Start the Catalog server

    From the ObjectGrid bin directory, you can now start the Catalog server and the two ObjectGrid servers. From OG_HOME/bin run:

    startOgServer.bat/.sh catalogServer -listenerHost localhost

  3. Start the ObjectGrid servers

    The start-up scripts below assume that all ObjectGrid servers will be run on the same hardware. Therefore, they don’t need parameters to notify them of the location of the Catalog server.

    From OG_HOME/bin run (each command, all on one line):

    startOgServer.bat/.sh server1
    	-objectgridFile <DEMO_HOME>/META-INF/ObjectGrid_Definition.xml
    	-deploymentPolicyFile <DEMO_HOME>/META-INF/ObjectGrid_Deployment.xml 
    	-jvmArgs -cp <DEMO_HOME>
    
    startOgServer.bat/.sh server2 
    	-objectgridFile <DEMO_HOME>/META-INF/ObjectGrid_Definition.xml 
    	-deploymentPolicyFile <DEMO_HOME>/META-INF/ObjectGrid_Deployment.xml 
    	-jvmArgs -cp <DEMO_HOME>

    You could run this demo with just one ObjectGrid server, but it helps to see the distributed nature of what is happening with two or more ObjectGrid servers. Feel free to start more with different names.

    Wait for each ObjectGrid server to start, confirmed by the message:

    CWOBJ1001I: ObjectGrid Server server1 is ready to process requests.

    Logs for the ObjectGrid servers will be created in a logs directory, located in the directory where you kick off the servers. It is interesting to observe the contents of the SystemOut.log for server1 and server2 whilst running the client application.

  4. Run the client application

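    Start the client code with this command from the DEMO_HOME directory (all on one line). The classpath separator shown is the Windows semicolon; on UNIX-style systems, use a colon instead: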

    java -cp <OG_HOME>/lib/ogclient.jar;. com.ibm.websphere.samples.datagrid.SalaryApp

    Listing 14. Sample output from demonstration application
    Populating grid with Employee entities
    Calculate salary average across the grid
    Average salary across grid: 44000
    Processing salary increase across grid
    Calculate salary average across the grid
    Average salary across grid: 48400
    Reading from partition: 0
    	Jonathan
    	Marshall
    	0
    	110000
    Reading from partition: 1
    	Alan
    	Chambers
    	1
    	55000
    Reading from partition: 2
    	Matt
    	Perrins
    	2
    	22000
    Reading from partition: 3
    	Joe
    	Bloggs
    	3
    	22000
    Reading from partition: 4
    	Fred
    	Smith
    	4
    	33000

    This output shows:

    • The results of running the two agents:
      • The average salary before the salary increase agent runs.
      • The salary increase agent running.
      • The average salary after the increase, which is indeed 10% higher (44000 rising to 48400).
    • A printout of the individual employee records stored in your ObjectGrid. Each individual salary has also increased by 10%, and each Employee object has been read from a different partition, showing the distributed nature of your query.
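The arithmetic in Listing 14 can be checked by hand. The sketch below is an illustration only, not code from the demo: the class name SalaryCheck is hypothetical, it takes the five salaries printed in the listing as the post-increase values, and it assumes the increase is exactly 10% so that the earlier values can be recovered by dividing by 1.10.

```java
import java.util.Arrays;

// Hypothetical check (not part of the demo): confirms that the averages in
// Listing 14 are consistent with a 10% increase on every salary.
public class SalaryCheck {
    // Salaries after the increase, copied from Listing 14.
    static final long[] AFTER = {110000, 55000, 22000, 22000, 33000};

    // Integer average of a set of salaries.
    static long average(long[] salaries) {
        return Arrays.stream(salaries).sum() / salaries.length;
    }

    // Undo a 10% increase: after = before * 110 / 100,
    // so before = after * 100 / 110.
    static long[] beforeIncrease(long[] after) {
        return Arrays.stream(after).map(s -> s * 100 / 110).toArray();
    }

    public static void main(String[] args) {
        long[] before = beforeIncrease(AFTER);
        System.out.println("Average before: " + average(before)); // 44000
        System.out.println("Average after:  " + average(AFTER));  // 48400
    }
}
```

The recovered total is 220000 across five employees (an average of 44000), and the printed total is 242000 (an average of 48400), which matches the two averages reported by the client.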

    If you take a look at the SystemOut.log files of server1 and server2, you should see, across the two files, five sets of messages (one set per partition) that each look similar to this:

    [10/10/07 14:04:10:046 BST] 21722172 
       SystemOut O Calculating salary total for partition: 0
    [10/10/07 14:04:10:468 BST] 40d840d8 
       SystemOut O Processing salary increase for partition: 0

    This log output is generated by the agents as they run, so you should see that the agents are running five times, from five different places. The client can be run a number of times, each time showing a greater average salary as the salary data in the grid is incremented.
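To make the per-partition pattern concrete, here is a minimal plain-Java sketch of the idea. It deliberately does not use the ObjectGrid DataGrid API: the class PartitionedAverage and its Partial, summarize, and reduce members are hypothetical stand-ins, and the partitions are simulated with in-memory lists holding the salaries from Listing 14. In the real grid, the summarizing step runs inside each ObjectGrid server, in parallel, next to the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation (not the ObjectGrid API): each "partition" computes
// a small partial result locally, and only those results are combined.
public class PartitionedAverage {
    // Partial result returned by one partition: a salary total and a count.
    static final class Partial {
        final long total;
        final int count;
        Partial(long total, int count) {
            this.total = total;
            this.count = count;
        }
    }

    // Agent step: runs where the data lives and returns only a summary.
    static Partial summarize(int partitionId, List<Long> salaries) {
        System.out.println("Calculating salary total for partition: " + partitionId);
        long total = 0;
        for (long salary : salaries) {
            total += salary;
        }
        return new Partial(total, salaries.size());
    }

    // Reduce step: combines the partial results into a grid-wide average.
    static long reduce(List<Partial> partials) {
        long total = 0;
        int count = 0;
        for (Partial p : partials) {
            total += p.total;
            count += p.count;
        }
        return count == 0 ? 0 : total / count;
    }

    public static void main(String[] args) {
        // One employee per partition, using the salaries from Listing 14.
        List<List<Long>> partitions = Arrays.asList(
                Arrays.asList(110000L), Arrays.asList(55000L),
                Arrays.asList(22000L), Arrays.asList(22000L),
                Arrays.asList(33000L));
        List<Partial> partials = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            partials.add(summarize(i, partitions.get(i)));
        }
        System.out.println("Average salary across grid: " + reduce(partials)); // 48400
    }
}
```

Returning a total and a count from each partition, rather than a per-partition average, keeps the final result correct even when partitions hold different numbers of employees.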


Conclusion

This article illustrated two powerful functional areas of ObjectGrid: using the EntityManager API for persisting Java objects in the ObjectGrid, and using the DataGrid API to perform grid computation and updates.

With these tools, setting up the basic infrastructure of a distributed data grid and performing grid computations can be straightforward. The simple examples in this article show how ObjectGrid can enable complex scenarios that could otherwise be extraordinarily difficult to implement. Further, this approach has the potential to yield significant performance gains without requiring significant hardware infrastructure.


Acknowledgements

I would like to thank Alan Chambers, Billy Newport, and Chris Johnson for reviewing and providing feedback on this article.


Download

Description    Name                               Size
Code sample    ObjectGrid-computation-demo.zip    16 KB

Resources

Learn

Get products and technologies

Discuss
