Create a Simple Compute Grid Parallel Batch Application

Batch Application using Parallel capabilities, running on a WebSphere Network Deployment Cluster

This tutorial describes how to develop a simple Java batch application using IBM® Rational® Application Developer 8.0.2 as the development environment and WebSphere® Extended Deployment Compute Grid 8.0.0.1 as the runtime environment. In addition, the application uses the Parallel Job Manager facility provided by Compute Grid to execute parallel jobs in a WebSphere Application Server Network Deployment environment. It also examines the WSBatchPackager utility and its use in packaging a batch application from Plain Old Java Object (POJO) classes.

Abhilash Usha (abhilash.u@in.ibm.com), IT Specialist, IBM China

Abhilash Usha is an IT Specialist working for IBM Software Labs, India. He has over 7 years of experience in the IT industry. As part of the WebSphere Infrastructure Team, he gets involved in WebSphere Application Server critsits. His areas of expertise include the design and development of J2EE applications and WebSphere Application Server infrastructure. He currently specializes in projects related to WebSphere eXtreme Scale and WebSphere Extended Deployment Compute Grid.



02 March 2012

Also available in Chinese

Introduction

Batch processing is an important part of business systems and is used in areas such as billing, report generation, and end-of-day settlement. With business systems in use worldwide around the clock, batch windows are getting narrower, which makes an efficient batch processing system a real necessity. WebSphere Extended Deployment Compute Grid (hereafter referred to as Compute Grid) is a complete out-of-the-box batch processing platform that provides an efficient, reliable, scalable, highly available, and secure batch execution environment.

This article is based on WebSphere Compute Grid V8. We use the batch job development feature of Rational Application Developer V8 to construct a simple transactional batch application, and then modify it to use the Parallel Job Manager facility. The article describes, step by step, how to develop a batch application from scratch and how to achieve parallelism in a batch job using the Parallel Job Manager (hereafter called PJM) facility provided by Compute Grid.

About this tutorial

The sample batch application, named EmployeeBatchV8, takes employee data from the EMPLOYEE table, does some processing, and then inserts the updated information into the EMPLOYEEOUTPUT table. There are about 10,000 employee input records, belonging to employees living in different states in the United States, with state abbreviations ranging from AL to WY. Using the Parallel Job Manager facility, the main job is split into subordinate jobs (AL-MO and MT-WY) that are processed separately. We override the Parameterizer system programming interface (SPI) to provide an independent set of inputs to each subordinate job so that they can run in parallel on different grid endpoints (GEE) in a clustered environment. See Figure 1.

Server JVMs hosting the batch application are referred to as grid endpoints.

Figure 1. Application Overview

Objectives

In this tutorial, you'll learn how to:

  • Develop batch applications using Rational Application Developer 8 and the Compute Grid APIs
  • Use the Parallel Job Manager facility of Compute Grid
  • Deploy the application on a WebSphere Application Server Network Deployment cluster and monitor the jobs
  • Use the WSBatchPackager utility to create a batch application from POJO classes

Prerequisites

You should be familiar with developing Java applications using an Eclipse-based IDE and with application deployment in WebSphere Application Server Network Deployment (hereafter Application Server).

System requirements

To run the examples in this tutorial, you need WebSphere Application Server V7.0.0.17 or above (preferably ND) and WebSphere Compute Grid 8.0.0.1 installed in any supported environment. See Figure 2. The environment used for this tutorial is set up as follows:

  • Windows XP machine
  • WebSphere Application Server 7.0.0.19 and Compute Grid 8.0.0.1 installed
  • Network Deployment Manager profile created:
    • profile name: Dmgr01
    • node name: ${shorthostname}CellManager01
    • server: dmgr
  • Managed node profile created (profile name: AppSrv01)
  • Managed node federated into the Network Deployment (ND) cell
  • Cluster created (cluster name: CGCluster) with two servers (server1, server2)
  • One additional server named SchedulerClone created in the same cell to act as the scheduler
  • DB2 UDB V9.7 installed and the Employee database created. Run the DDL file provided as a download with this tutorial to create the EMPLOYEE and EMPLOYEEOUTPUT tables
  • Compute Grid data sources migrated from the default Derby database to DB2
  • XA data source configured in the Application Server console, pointing to the Employee database (JNDI name: jdbc/employeedbxa)
  • Rational Application Developer 8.0.2 or later installed, with the Compute Grid Tools for Modern Batch feature turned on
Figure 2. Sample infrastructure diagram

Creating a simple batch application using Rational Application Developer V8

To create a batch application:

  1. Open up an empty workspace in Rational Application Developer.
  2. Create a new Batch Project named EmployeeSampleV8 and click Finish. Rational Application Developer creates the EJB, Batch, and Enterprise Project (see Figure 3).
Figure 3. Batch Job Creation
Employee Batch Project Creation
  3. Select the EmployeeSampleV8EJB project (see Figure 3) and select New -> Batch Job (see Figure 4).
Figure 4. Batch Job Creation
  4. On the Batch Job Creation dialog (see Figure 5), specify Job Name as Employee and click Next.
Figure 5. Specify Employee Batch Controller
Batch Job Creation
  5. Next, create a job step. On the Batch Step Creation dialog (see Figure 6), specify:
    • Name: CopyEmployeeStep
    • Pattern: Generic
    • EnablePerformanceMeasurement: true
    • debug: true
Figure 6. Batch Step Creation
  6. Click Create in the BATCHRECORDPROCESSOR row under Required Properties (see Figure 6). The Create Class dialog appears (see Figure 7). Create a new class that implements the BatchRecordProcessor interface:
    • Name: EmployeeBatchProcessor
    • Package: com.trial.Employee

    Click Finish. (We provide the implementation for this class later.)

Figure 7. Create the BatchRecordProcessor Class
  7. Click Add in the Checkpoint Algorithm row under Algorithms (see Figure 6) to create a new checkpoint algorithm implementation. In the Checkpoint Algorithm dialog (see Figure 8), specify the following:
    • Name: checkpoint
    • Pattern: Record Based
    • recordcount: 100 (a checkpoint is taken after every 100 records in our sample application; if the job terminates abruptly, we can restart it from the latest checkpoint taken)

    Click Finish.

Figure 8. Select the CheckPoint Algorithm
  8. Click Add in the Result Algorithm row under Algorithms (see Figure 6) to create a new result algorithm implementation. In the Result Algorithm dialog (see Figure 9), specify the following:
    • Name: jobsum
    • Pattern: Job Sum (this is the default result algorithm implementation; it works by adding up the return codes of the individual job steps, so a job sum of zero means that the job has ended properly)

    Click Finish.

Figure 9. Select the Result Algorithm
  9. The completed Batch Job dialog is shown in Figure 10. Click Next to proceed to Input Stream Creation.
Figure 10. Batch Step Creation
  10. To create an input stream, specify the following, as shown in Figure 11:
    • Name: inputStream (do not change this name)
    • Pattern: JDBC Reader
    • ds_jndi_name: jdbc/employeedbxa
    • Pattern Implementation Class (PATTERN_IMPL_CLASS): com.trial.Employee.EmployeeJDBCReader

    Set the value of all the Optional Properties to true. Add a new optional property named STATES_LIST and give it the value AK,AL. Click Next.

Figure 11. Input stream creation
  11. Create a new output stream by specifying the following data (see Figure 12):
    • Name: outputStream
    • Pattern: JDBC Writer
    • PATTERN_IMPL_CLASS: com.trial.Employee.EmployeeJDBCWriter

    Add a new optional property called ds_jndi_name and specify its value as jdbc/employeedbxa. Leave batchInterval empty and set all the other optional properties to true. Click Finish.

Note: The GenericXDBatchStep expects the name to be outputStream. Do not change this default.

Figure 12. Output stream creation
  12. We are almost done. Rational Application Developer automatically creates an Employee.xml file under the xJCL directory. This file is written in an XML-based job control language (hereafter referred to as xJCL) and defines the batch job. Most of the parameters we entered in the preceding dialogs are used to create the xJCL. After the batch application is deployed, we need the xJCL to invoke it. We have created three empty classes for the BatchRecordProcessor, the input stream, and the output stream. In the next section, we implement some of the core methods of these classes.
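To make the recordcount setting chosen in the Checkpoint Algorithm dialog more concrete, here is a minimal plain-Java sketch of what "a checkpoint after every 100 records" implies for a restart. It does not use the Compute Grid API: the class CheckpointSketch and both of its methods are hypothetical names introduced only for this illustration, and the real checkpointing is performed by the batch container.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Compute Grid API.
class CheckpointSketch {

    /** Returns the record counts at which a record-based checkpoint would be taken. */
    static List<Integer> checkpointPositions(int totalRecords, int recordCount) {
        List<Integer> positions = new ArrayList<>();
        for (int processed = 1; processed <= totalRecords; processed++) {
            // A checkpoint is taken each time another recordCount records are done.
            if (processed % recordCount == 0) {
                positions.add(processed);
            }
        }
        return positions;
    }

    /** Returns the record count a restarted job resumes from after failing at failedAt. */
    static int restartPosition(int failedAt, int recordCount) {
        // Work done after the last completed checkpoint has to be redone.
        return (failedAt / recordCount) * recordCount;
    }

    public static void main(String[] args) {
        System.out.println(checkpointPositions(350, 100));
        System.out.println(restartPosition(257, 100));
    }
}
```

With a recordcount of 100, a job that stops after 257 records would resume from record 200 in this model, so at most 99 records are ever reprocessed.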

Components of the Batch Application

We now look in detail at the classes we created and how they fit into the batch architecture. (See the Downloads section for the source code for these files.)

  • EmployeeBatchProcessor
  • EmployeeJDBCReader
  • EmployeeJDBCWriter

A job step in a transactional batch application typically contains an input stream (EmployeeJDBCReader) that supplies input records, a BatchRecordProcessor (EmployeeBatchProcessor) that performs the business logic, and an output stream (EmployeeJDBCWriter) to which the output data is written (see Figure 13).

Figure 13. Components of the Sample Application

In this sample application, we use the generic batch step (GenericXDBatchStep) provided by Compute Grid. The Compute Grid runtime invokes methods defined in GenericXDBatchStep, which in turn invoke methods in EmployeeBatchProcessor (see Figure 14). A generic batch step works with one input stream and one output stream. During each iteration of the batch loop, it reads a single entry from the batch data input stream (EmployeeJDBCReader) and passes it to the BatchRecordProcessor (EmployeeBatchProcessor) for processing.
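The read, process, and write cycle described above can be pictured with a small plain-Java sketch. This is not the GenericXDBatchStep source and does not use any Compute Grid interfaces: the class BatchLoopSketch and its runStep method are hypothetical, with a java.util.Iterator standing in for the input stream, a java.util.function.Function for the record processor, and a List for the output stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the batch loop; not the GenericXDBatchStep implementation.
class BatchLoopSketch {

    /** Reads each record, processes it, writes the result, and returns the record count. */
    static <I, O> int runStep(Iterator<I> inputStream,
                              Function<I, O> processor,
                              List<O> outputStream) {
        int processed = 0;
        while (inputStream.hasNext()) {
            I record = inputStream.next();       // one record per loop iteration
            O result = processor.apply(record);  // business logic for that record
            outputStream.add(result);            // hand the result to the output stream
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("alice", "bob");
        List<String> output = new ArrayList<>();
        int count = runStep(input.iterator(), String::toUpperCase, output);
        System.out.println(count + " records -> " + output);
    }
}
```

In the sample application the three roles are played by EmployeeJDBCReader, EmployeeBatchProcessor, and EmployeeJDBCWriter, with the batch container driving the loop.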

EmployeeJDBCReader

EmployeeJDBCReader implements JDBCReaderPattern and is used by the batch container to fetch the input data. The code snippet in Listing 1 shows how the batch container gets the lookup query for fetching the input records. The batch container invokes the methods written in EmployeeJDBCReader through BDSJDBCReader and supplies the input records one by one to EmployeeBatchProcessor. (Figure 14 shows the interaction of the different components.) See the source code attached to this article for the other method implementations. For more details on the different methods and their usage, see Introduction to batch programming using WebSphere Extended Deployment Compute Grid.

Listing 1. EmployeeJDBCReader
protected static final String SELECT_CLAUSE =
		"SELECT name,address,city,state,zipcode,email,employeeID,"
		+ "phone,annualIncome,lastOfferDate FROM EMPLOYEESCHEMA.EMPLOYEE ";
		.
		.
		.
// Builds the query that the batch container uses to read the input records.
public String getInitialLookupQuery() {
		String query = SELECT_CLAUSE;
		// Restrict the rows to the states in statesList, when one is supplied.
		if ( statesList != null ) {
			query += " WHERE state in ("+statesList+") ";
		}
		query += " ORDER BY EMPLOYEEID";
		return query;
}
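Listing 1 concatenates statesList directly into the IN clause, so the value has to be in a form the database accepts by the time the query is built. How the downloadable sample prepares it is not shown in the snippet. One possible approach, assuming STATES_LIST arrives as plain comma-separated abbreviations such as AK,AL, is sketched below; the class StatesListSketch and the method toSqlInList are hypothetical and are not part of the sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not part of the downloadable sample.
class StatesListSketch {

    /** Turns "AK,AL" into "'AK','AL'", rejecting anything that is not a two-letter code. */
    static String toSqlInList(String commaSeparatedStates) {
        List<String> quoted = new ArrayList<>();
        for (String raw : commaSeparatedStates.split(",")) {
            String state = raw.trim().toUpperCase(Locale.ROOT);
            // Validating the value keeps arbitrary text out of the SQL string.
            if (!state.matches("[A-Z]{2}")) {
                throw new IllegalArgumentException("Not a state abbreviation: " + raw);
            }
            quoted.add("'" + state + "'");
        }
        return String.join(",", quoted);
    }

    public static void main(String[] args) {
        System.out.println(" WHERE state in (" + toSqlInList("AK, AL") + ") ");
    }
}
```

Validating each token before quoting it is a deliberate choice here, because the query is assembled by string concatenation rather than with bind parameters.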

EmployeeBatchProcessor

EmployeeBatchProcessor is the main processing unit in this sample application. It receives each employee record read from the EMPLOYEE table and returns it to the batch container, which persists the data. The code snippet in Listing 2 is the processRecord method of EmployeeBatchProcessor. The grid endpoints invoke the processJobStep method defined in GenericXDBatchStep, which in turn invokes the processRecord method of EmployeeBatchProcessor (see Figure 14). You can override this method and put in your custom logic to be executed per record. In this sample application, we just print the employee name and then return the employee object so that it can be persisted to the output stream.

Listing 2. EmployeeBatchProcessor
public Object processRecord(Object obj) throws Exception
{
        Employee employee = (Employee)obj;
        System.out.println("Employee Name:"+employee.getName());
        return employee;
}

EmployeeJDBCWriter

EmployeeJDBCWriter implements JDBCWriterPattern and is used by the batch container to write the processed record to the output stream. The code snippet in Listing 3 shows the getSQLQuery method, which is used to get the query, and the writeRecord method, which is used to set the values of the prepared statement. The grid endpoints invoke these methods through BDSJDBCWriter. (See Figure 14.)

Listing 3. EmployeeJDBCWriter
	protected String tableName = "EMPLOYEESCHEMA.EMPLOYEEOUTPUT";
	protected String sqlQueryPreTablename = "INSERT INTO ";
	protected String sqlQueryPostTablename = " VALUES (?, ?)";

	// Returns the INSERT statement used to write each output record.
	public String getSQLQuery() {
		String query = this.sqlQueryPreTablename + this.tableName
				+ this.sqlQueryPostTablename;
		System.out.println("Query is " + query);
		return query;
	}

	// Binds the name and state of the Employee record to the prepared statement.
	public PreparedStatement writeRecord(PreparedStatement pstmt, Object record) {
		if (record instanceof Employee) {
			try {
				System.out.println("Writing the Employee record "
						+ "into the output table");
				Employee employeerecord = (Employee) record;
				pstmt.setString(1, employeerecord.getName());
				pstmt.setString(2, employeerecord.getState());
			} catch (SQLException sqle) {
				System.out.println("Exception while making "
						+ "the prepared Statement");
				sqle.printStackTrace();
			}
		} else {
			System.out.println("Record is not an instance of Employee");
		}
		return pstmt;
	}

Figure 14. Batch Internal Architecture

We have completed our first batch application. As a first step, you can install this application in an existing WebSphere Application Server Network Deployment environment and submit the job using the xJCL generated by Rational Application Developer. The sample application selects all employees belonging to the states AK and AL from the EMPLOYEE table and inserts them into the EMPLOYEEOUTPUT table. Details on how to install the application and submit the job are included in the section Deploy and Run the Application. The only missing feature is parallelism, so all we need to do now is add parallel job capability. Compute Grid provides the Parallel Job Manager facility for this, along with SPIs to customize the way parallel jobs are run.

Overriding SPIs

Compute Grid provides a number of System Programming Interfaces (SPIs) that can be used in your batch job to customize your advanced parallel application design:

| SPI | Usage |
| --- | --- |
| Parameterizer | Uses the information passed in the xJCL to help divide the job into subordinate jobs. |
| SubJobAnalyzer | Invoked by the Parallel Job Manager (PJM) to calculate the return code of the job. |
| SubJobCollector | Invoked when a checkpoint is taken, to collect relevant state information about the subordinate job. |
| Synchronization | Invoked by the PJM after all subordinate jobs have reached their final state. |
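As an illustration of the kind of logic a SubJobAnalyzer could hold, the plain-Java sketch below combines the return codes of the subordinate jobs into one overall code. The policy (take the highest code, assuming return codes are non-negative) is an assumption chosen for this example, and the class ReturnCodeSketch is hypothetical: it does not implement the Compute Grid SubJobAnalyzer interface.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical policy class; it does not implement the Compute Grid SubJobAnalyzer SPI.
class ReturnCodeSketch {

    /**
     * Assumed policy: with non-negative return codes, the overall code is zero
     * only if every subordinate job returned zero; otherwise it is the highest code.
     */
    static int overallReturnCode(List<Integer> subJobReturnCodes) {
        int overall = 0;
        for (int rc : subJobReturnCodes) {
            overall = Math.max(overall, rc);
        }
        return overall;
    }

    public static void main(String[] args) {
        System.out.println(overallReturnCode(Arrays.asList(0, 0)));
        System.out.println(overallReturnCode(Arrays.asList(0, 8)));
    }
}
```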

The sample application overrides all of these SPIs. All except Parameterizer simply write to System.out, to help you track the flow of events while running parallel jobs. This sample application has about 10,000 employee records belonging to different states. Using Parameterizer, we divide the total state list by the number of parallel jobs and supply a unique set of states to each subordinate job. See the code snippet in Listing 4. The entire sample code is available as a download.

Listing 4. EmployeeParameterizer
public Parameters parameterize(String logicalJobName, String logicalTXID,
		Properties props) {
	// Number of subordinate jobs, taken from the xJCL (defaults to 1).
	int jobcount = Integer.valueOf(props.getProperty(
			"parallel.jobcount", "1"));
	Parameters parms = new Parameters();
	parms.setSubJobCount(jobcount);
	Properties newprops[] = new Properties[jobcount];
	for (int i = 0; i < jobcount; i++) {
		newprops[i] = new Properties();
		// Each subordinate job gets its own slice of the state list.
		String stateList = splitup(i, jobcount);
		newprops[i].put("STATES_LIST", stateList);
	}
	parms.setSubJobProperties(newprops);
	return parms;
}
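Listing 4 relies on a splitup helper that is not shown in the snippet; the actual version is in the downloadable sample. The sketch below is one hypothetical way such a helper could be written. It assumes the 50 state abbreviations are held in an array ordered by state name (Alabama to Wyoming), an ordering that is consistent with the AL-MO and MT-WY ranges mentioned earlier, and it returns unquoted, comma-separated abbreviations.

```java
import java.util.Arrays;

// Hypothetical sketch of a splitup helper; the sample's own version is in the download.
class StateSplitSketch {

    // Assumption: the 50 state abbreviations, ordered by state name.
    static final String[] STATES = {
        "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
        "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
        "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
        "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
        "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"
    };

    /** Returns the comma-separated slice of STATES for the subordinate job at index (0-based). */
    static String splitup(int index, int jobCount) {
        // Integer arithmetic spreads any remainder across the slices.
        int start = index * STATES.length / jobCount;
        int end = (index + 1) * STATES.length / jobCount;
        return String.join(",", Arrays.copyOfRange(STATES, start, end));
    }

    public static void main(String[] args) {
        System.out.println(splitup(0, 2)); // slice from AL through MO
        System.out.println(splitup(1, 2)); // slice from MT through WY
    }
}
```

Computing both boundaries from the index means every state lands in exactly one slice, even when the job count does not divide 50 evenly.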
Note:
Before importing the SPIs, make sure that the Java build path for Rational Application Developer contains com.ibm.ws.batch.runtime.jar. This is necessary because Rational Application Developer V8 does not yet support the parallel features of WebSphere Compute Grid V8. You can find the JAR file in $WAS_HOME/stacked_products/WCG/plugins.

After importing EmployeeBatchProcessor, EmployeeJDBCReader, Employee, and EmployeeJDBCWriter, your workspace should look something like that shown in Figure 15.

Figure 15. RAD workspace

Altering xJCL to include the SPIs

Add the code shown in Listing 5 (ignoring the first line, which is a pointer) to your xJCL, just after the jndi-name tag in Employee.xml. This tells the Compute Grid runtime how many parallel jobs to run and how to place the subordinate jobs across JVMs.

Listing 5. Employee.xml
<jndi-name>ejb/EmployeeBatchController</jndi-name>
<run instances="multiple" jvm="multiple"><props>
 <prop name="com.ibm.websphere.batch.parallel.parameterizer"
 value="com.trial.Employee.spi.EmployeeParameterizer"/>
 <prop name="com.ibm.websphere.batch.parallel.synchronization" 
 value="com.trial.Employee.spi.EmployeeTXSynchronization"/>
 <prop name="com.ibm.websphere.batch.parallel.subjobanalyzer"
 value="com.trial.Employee.spi.EmployeeSubJobAnalyzer"/>
 <prop name="com.ibm.websphere.batch.parallel.subjobcollector" 
 value="com.trial.Employee.spi.EmployeeSubJobCollector"/>
 <prop name="com.ibm.wsspi.batch.parallel.subjob.name" 
 value="EmployeeSampleSubJob" />
 <prop name="parallel.jobcount" value="2" />
 </props>
 </run>

Also, alter the STATES_LIST property tag (see Listing 6) so that Parameterizer can decide on the state list that each parallel job needs to handle.

Listing 6. Employee.xml
<prop name="STATES_LIST" value="${STATES_LIST}" />
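The ${STATES_LIST} placeholder in Listing 6 is what lets each subordinate job receive its own value from the properties built in the Parameterizer. The substitution itself is performed by the Compute Grid runtime; the plain-Java sketch below only illustrates the idea, and the class SubstitutionSketch and its substitute method are hypothetical.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${name} substitution; the real work is done by the runtime.
class SubstitutionSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} in text with the matching property value, if there is one. */
    static String substitute(String text, Properties values) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.getProperty(m.group(1));
            // Unknown placeholders are left untouched in this sketch.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties subJobProps = new Properties();
        subJobProps.setProperty("STATES_LIST", "AL,AK");
        String line = "<prop name=\"STATES_LIST\" value=\"${STATES_LIST}\" />";
        System.out.println(substitute(line, subJobProps));
    }
}
```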

Add the snippet in Listing 7 (again, ignore the first line, as it is a pointer) under the props tag, next to BATCHRECORDPROCESSOR. These are mandatory fields for the PJM.

Listing 7. Employee.xml
 <prop name="BATCHRECORDPROCESSOR" value="com.trial.Employee.EmployeeBatchProcessor"/>
<prop name="com.ibm.wsspi.batch.parallel.jobname" value="${parallel.jobname}" />
<prop name="com.ibm.wsspi.batch.parallel.logicalTXID" value="${logicalTXID}" />
<prop name="com.ibm.wsspi.batch.parallel.jobmanager" value="${parallel.jobmanager}" />

Export the EAR from the Rational Application Developer workspace for deployment, and save the Employee.xml file for job submission.

WSBatchPackager Utility

As an alternative approach, we can use the WSBatchPackager utility supplied with the product to package the application from POJO classes. It can be found under $WAS_HOME/stacked_products/WCG/bin. The POJO classes used in this application include the SPI files, EmployeeJDBCReader, EmployeeJDBCWriter, Employee, and EmployeeBatchProcessor. You can package these classes into pojoclasses.jar. Listing 8 shows the usage of the WSBatchPackager utility. The utility is a useful alternative in situations where you don't have Rational Application Developer for batch application development. The Enterprise Archive (EAR) file generated by WSBatchPackager is similar to the one we generated from Rational Application Developer.

Listing 8. Usage
WSBatchPackager.bat -appname=EmployeeSampleV8EAR -jarfile=C:\dwarticle\pojoclasses.jar -earfile=EmployeeSampleV8EARPOJO.ear

Deploy and Run the Application

  1. Deploy the Employee batch application to the WebSphere Application Server cluster. Proceed with the defaults, because the resource reference mappings are already provided in the application.
Note:
While deploying the application, make sure that the JDK level is 1.5 or above. Select a JDK compliance level of 1.5 or above under the EJB Deploy Options.
  2. Access the Job Management Console of the SchedulerClone server at http://schedulerhostname:defaulthostport/jmc
  3. Submit the xJCL through the console (see Figure 16).
Figure 16. Job Management Console
  4. Click View Jobs (see Figure 17).
Figure 17. Monitor Jobs
  • Notice that the subordinate jobs are spawned according to the parallel job count we set in the xJCL (parallel.jobcount is 2 in Listing 5) and that they run in parallel in different JVMs, independently of each other (see Figure 17).

Conclusion

WebSphere Compute Grid and Rational Application Developer 8 make the construction of a Compute Grid batch program much simpler. With the PJM facility offered by Compute Grid, we can make better use of resources by executing jobs in parallel across different JVMs, which makes the solution more scalable. With WSBatchPackager, we can also generate batch EARs from POJO classes with ease. This article showed how to use Compute Grid and Rational Application Developer 8 to create your first parallel batch program.

Acknowledgements

The author wants to thank Christopher Vignola, WebSphere Batch Architect, IBM, for reviewing the article and for his valuable suggestions.


Downloads

| Description | Name | Size |
| --- | --- | --- |
| Working sample application | EmployeeSampleV8EAR.zip | 17 KB |
| DDL to create tables in DB2 database | Employeeallrecords.zip | 564 KB |
| Completed xJCL file | Employee.xml | 4 KB |
