Modernized Java-based batch processing in WebSphere Application Server, Part 1
Introducing Modern Batch and the compute-intensive programming model
Batch programs are a traditional and essential component of any enterprise IT landscape. The current development trend for dealing with batch processing is to leverage in-house Java skills for both online and batch programs to ensure:
- Maximum re-use of implementation.
- Easier development and maintenance, as the same sets of tools are used.
- Consistency in enforcement of enterprise standards and quality of service.
IBM has developed solutions that provide a cohesive batch program management paradigm. The Modern Batch feature for IBM WebSphere Application Server (available in WebSphere Application Server V8, WebSphere Application Server V7.0 Feature Pack for Modern Batch, and IBM WebSphere Extended Deployment Compute Grid V8.0) provides a batch middleware framework that offers:
- Container managed execution of batch jobs: Provides the structure and support function that Java batch applications require, and helps you avoid the “custom middleware trap.”
- Job control interface: An XML file that describes the Java class files that are used in a batch step and the steps that are included in the batch job.
- Job checkpoint and restart capability: Ability to create checkpoints on the basis of record count or time. This enables restarting a job from a known checkpoint.
- Common batch data stream (BDS): Contains functions that abstract data into easily accessible record formats so that the batch programming can focus on the business functions rather than basic code that reads and writes the data.
Having such a framework in hand provides a welcome alternative to developing custom batch middleware, and permits developers to focus on achieving core business objectives. With Modern Batch, developing batch applications is reduced to simply writing the business logic for the job. This separation of concerns between the business logic and the “plumbing” code is an important benefit of the batch framework. It enables a more efficient modularization of batch functions, which permits better re-use, and the ability to expose batch as a modular service.
Modern Batch supports two batch programming paradigms:
- Compute-intensive: For simple jobs that perform computationally intensive work and don’t require restart capability.
- Transaction batch: For jobs that need a container-managed checkpoint and a restart mechanism. This enables batch jobs to be restarted from the last checkpoint if interrupted by a planned or unplanned outage.
This article looks at the compute-intensive model and presents a sample implementation using new functionality provided in IBM Rational Application Developer V8.
See Related topics for more information on the importance of a batch platform, details on the Modern Batch middleware framework, and the role of WebSphere Extended Deployment Compute Grid.
Compute-intensive programming model
The compute-intensive programming model consists of these elements:
- Controller bean: A stateless session bean that enables the run time environment to control jobs for the application. The implementation of this stateless session bean (CIControllerBean) is provided by the application server.
- Job Step Implementation class: The job step represents the business logic to be performed by the job. It is represented by an instance of a class that implements the com.ibm.websphere.ci.CIWork interface. The CIWork interface contains the following methods:
- run(): Executed when the compute-intensive job runs.
- getProperties() and setProperties(): Get and set the input values passed as properties from the client.
- release(): Invoked when the client needs to stop the job in the middle of execution.
- isDaemon(): Returns “true” if the work is long-lived rather than short-lived.
- xJCL file: An XML-based configuration file that is submitted to the job scheduler to run. The job scheduler uses information in this file to determine where and when the job runs, and what its inputs and outputs are. The xJCL definition of a job is not part of the batch application.
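As a rough illustration of the job step contract described above, the sketch below mirrors the five CIWork methods with a local stand-in interface so that it compiles without the WebSphere libraries. The names CIWorkLike, CountingWork, and CountingWorkDemo, and the "iterations" property, are assumptions made only for this example; in a real project the class would implement com.ibm.websphere.ci.CIWork instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that mirrors the methods the article lists for
// com.ibm.websphere.ci.CIWork; it is NOT the real interface.
interface CIWorkLike extends Runnable {
    Map<String, String> getProperties();
    void setProperties(Map<String, String> props);
    void release();
    boolean isDaemon();
}

// Illustrative job step: sums the integers 1..iterations.
class CountingWork implements CIWorkLike {
    private Map<String, String> props = new HashMap<>();
    private volatile boolean released = false; // set when the job is cancelled
    private long total = 0;

    @Override public Map<String, String> getProperties() { return props; }

    @Override public void setProperties(Map<String, String> props) {
        this.props = new HashMap<>(props);
    }

    // Business logic; checks the release flag so the job can stop early.
    @Override public void run() {
        int iterations = Integer.parseInt(props.getOrDefault("iterations", "0"));
        for (int i = 1; i <= iterations && !released; i++) {
            total += i;
        }
    }

    @Override public void release() { released = true; }

    // This sketch treats the work as short-lived.
    @Override public boolean isDaemon() { return false; }

    long getTotal() { return total; }
}

class CountingWorkDemo {
    public static void main(String[] args) {
        CountingWork work = new CountingWork();
        Map<String, String> props = new HashMap<>();
        props.put("iterations", "10");
        work.setProperties(props);
        work.run();
        System.out.println("total = " + work.getTotal());
    }
}
```

The volatile flag is one simple way to honor release(): the loop checks it on every iteration, so a cancellation request issued from another thread takes effect without interrupting the worker.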
Figure 1 shows the compute-intensive programming model. (This is a simplified version of the actual programming model, which will be discussed in detail in Part 2.)
Figure 1. Compute-intensive programming model
To develop a compute-intensive job, then, you need to:
- Define the xJCL file.
- Create Java classes that implement the CIWork interface with the business logic to be performed for each job step.
- Package the CIWork appropriately with the stateless session bean pointing to com.ibm.ws.ci.CIControllerBean as the implementation class.
Before building a sample compute-intensive job, it’s important to understand how a compute-intensive application behaves at run time. In summary (see Related topics for more details), the application server uses the xJCL file to find and then invoke the controller bean. The bean reads the xJCL file and, for each job step in the xJCL, the bean:
- Instantiates the application CIWork object, specified by the class name element in the xJCL for the job step, using the no-argument constructor of the CIWork class.
- Invokes the setProperties() method of the CIWork object to pass any properties defined in the xJCL for the job step.
- Looks up the work manager defined in the deployment descriptor of the enterprise bean module, and uses it to asynchronously call the run() method of the CIWork object.
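The three controller steps above can be imitated in plain Java. This is not the CIControllerBean implementation, which the application server provides; it only reproduces the sequence using reflection, with a java.util.concurrent.ExecutorService standing in for the work manager. The JobStep interface, the EchoStep and ControllerSketch classes, and the "message" property are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the parts of CIWork used in this sketch.
interface JobStep extends Runnable {
    void setProperties(Map<String, String> props);
}

// Illustrative job step that records the property it was given.
class EchoStep implements JobStep {
    static volatile String lastMessage; // captured so the demo can inspect it
    private Map<String, String> props = new HashMap<>();

    public EchoStep() { } // no-argument constructor, as the controller requires

    @Override public void setProperties(Map<String, String> props) { this.props = props; }

    @Override public void run() { lastMessage = props.get("message"); }
}

class ControllerSketch {
    // Imitates what the controller does for one job step.
    static void runStep(String className, Map<String, String> stepProps,
                        ExecutorService workManager) throws Exception {
        // 1. Instantiate the class named in the job definition via its no-arg constructor.
        JobStep step = (JobStep) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        // 2. Pass the properties defined for the step.
        step.setProperties(stepProps);
        // 3. Call run() asynchronously; the executor stands in for the work manager.
        Future<?> done = workManager.submit(step);
        done.get(30, TimeUnit.SECONDS); // wait only so the demo can print a result
    }

    public static void main(String[] args) throws Exception {
        ExecutorService workManager = Executors.newSingleThreadExecutor();
        try {
            Map<String, String> props = new HashMap<>();
            props.put("message", "hello from the job step");
            runStep(EchoStep.class.getName(), props, workManager);
            System.out.println(EchoStep.lastMessage);
        } finally {
            workManager.shutdown();
        }
    }
}
```

Because the step is created reflectively from a class name, a job step class without an accessible no-argument constructor would fail at step 1, which is why the programming model requires one.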
With this understanding of the programming model, let’s look at the steps to develop a compute-intensive application.
Sample business scenario
The business in this sample scenario is a financial organization that has many branches in different states. The organization’s clients submit applications to the branches for processing. The example compute-intensive application generates a report that summarizes the number of applications from each state, plus other metrics for the organization.
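The core of such a report is a count of applications grouped by state. The article does not describe the layout of the input data, so the sketch below assumes a hypothetical comma-separated record of the form applicationId,state; the class and method names are likewise illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StateSummary {
    // Counts records per state; assumes each record is "applicationId,state".
    static Map<String, Integer> countByState(List<String> records) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by state for stable output
        for (String record : records) {
            String[] fields = record.split(",");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            counts.merge(fields[1].trim(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("A-1,NY", "A-2,CA", "A-3,NY");
        countByState(sample).forEach(
                (state, n) -> System.out.println(state + ": " + n));
    }
}
```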
To develop this application:
- In Rational Application Developer, create a new batch project named dWSampleBatch by navigating to File > New > Batch Project. Click Finish when done (Figure 2). This also creates an EJB project that holds the stateless session bean and the EAR project.
Figure 2. Create a new batch project
- Now that you have the projects set up, you need to create the job
definition for this batch. Create the batch job definition by
right-clicking on the xJCL folder in
the new batch project you just created, then select New > Batch
Figure 3. Create a new batch job
- Choose Compute Intensive for the Job Type and enter SummaryReportJob for the Job Name (Figure 4). Click Next.
Figure 4. Create the xJCL file
- On the Batch Step Creation panel, enter SingleStep as the name of the step and choose the default pre-defined CI Work for the job step pattern (Figure 5). The CI Work pattern ensures that the Job Step class implements the CIWork interface, as required by the compute-intensive programming model.
Figure 5. Create the batch step
- Now, you need to create the implementation class for the CIWork
interface. Create the SummaryReport class, which will
implement CIWork and have the logic for the business requirement,
which, in this case, is to create the summary report. Click the Create Class button to create the implementation
class (Figure 5). Enter the details as shown in Figure 6 and click on
Figure 6. Create the batch step implementation class
- You will return to the Batch Step Creation panel. The next step is to
create the parameters for the batch job program, SummaryReport.java.
Create the two Required Properties listed below by selecting Add
(for each) and then Finish (Figure 7).
- InputFileLocation: Holds the location of the input file of data to be processed.
- OutputFileLocation: Holds the location of the output summary report file.
Figure 7. Add required properties
- The Required Properties created in the previous step can either be hardcoded with values or they
can be passed through the xJCL file at run time. For this sample, they
will be passed via the xJCL file. To achieve this, open the
SummaryReportJob.xml file under the xJCL folder by double-clicking it.
This will open the XML job definition file in the xJCL editor, as shown in Figure 8.
Figure 8. Editing xJCL file
- You want to pass the values of the file locations through the xJCL at
run time. You can do this using Substitution Properties, which enable
you to create default name-value pairs that can be used in the xJCL. Create the
Substitution Properties by clicking on Add and then choosing
Substitution Properties in the Add Item dialog and click OK
Figure 9. Add substitution properties
- Add the properties listed below in the Substitution Properties dialog
and click on Finish (Figure 10).
- inputFile: assign a default value of
- outputFile: assign a default value of
Figure 10. Add substitution properties
- In the xJCL editor, update the required properties values with the
corresponding substitution properties (Figure 11):
- InputFileLocation to
- OutputFileLocation to
Figure 11. Updating required properties
- You might have noticed that the Rational tooling also generates the EJB
and EAR projects. Review the EJB project to ensure that the resource
reference is correctly set to CIWorkManager. Do this in the
EJB Bindings editor by double-clicking the ibm-ejb-jar-bnd.xml file under
the EJB project (Figure 12). The batch job is now configured.
Figure 12. Validate resource reference
- SummaryReport.java implements the business logic: it reads the data file (InputFile.txt), prepares the report, and writes it to the output file (OutputFile.txt). Update the generated SummaryReport.java file with the version included in the download for this article to implement this business logic. Also, place the InputFile.txt file in the directory chosen for the input file location so that the SummaryReport program can read it. You are now ready to deploy and test the batch application.
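The downloadable SummaryReport.java is the reference for this step and is not reproduced here. Purely as an illustration of the read, summarize, and write flow, the sketch below looks up the two required properties, counts records per state, and writes one line per state. The record layout (applicationId,state), the report format, and the class name SummaryReportSketch are assumptions; a real job step would implement CIWork and receive these properties through setProperties().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SummaryReportSketch {
    // Reads the input file, counts records per state, and writes the report.
    static void writeReport(Map<String, String> props) throws IOException {
        Path input = Paths.get(required(props, "InputFileLocation"));
        Path output = Paths.get(required(props, "OutputFileLocation"));

        Map<String, Integer> counts = new TreeMap<>();
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            String[] fields = line.split(",");
            if (fields.length >= 2) { // assumed layout: applicationId,state
                counts.merge(fields[1].trim(), 1, Integer::sum);
            }
        }

        List<String> report = new ArrayList<>();
        counts.forEach((state, n) -> report.add(state + "=" + n));
        Files.write(output, report, StandardCharsets.UTF_8);
    }

    // Fails fast when a required property is missing or empty.
    private static String required(Map<String, String> props, String name) {
        String value = props.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing required property: " + name);
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        // Temporary files keep the demo independent of any fixed path.
        Path in = Files.createTempFile("InputFile", ".txt");
        Path out = Files.createTempFile("OutputFile", ".txt");
        Files.write(in, Arrays.asList("A-1,NY", "A-2,CA", "A-3,NY"), StandardCharsets.UTF_8);

        Map<String, String> props = new HashMap<>();
        props.put("InputFileLocation", in.toString());
        props.put("OutputFileLocation", out.toString());
        writeReport(props);

        Files.readAllLines(out, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```

Validating the required properties before touching the file system means a misconfigured job fails with a clear message instead of a less obvious I/O error.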
Running the sample
To run the sample from Rational Application Developer:
- Right-click the dWSampleBatchEAR project and select Run
As > Run on Server. Select the server that you want to use and
click Finish (Figure 13).
Figure 13. Run batch application on server
- To submit the xJCL to the server runtime, right-click on the
SummaryReportJob.xml file and choose Run As > Modern Batch Job (Figure 14).
Figure 14. Submit xJCL job
- If security is enabled on this server, check the box and enter
a valid User ID and Password. If you have placed the InputFile.txt
file in a location other than C:\\InputFile.txt, update the
location with the new value (Figure 15). Click Run. The job is submitted to the server runtime, and the job log file opens in the Modern Batch Job Management Console (Figure 16).
Figure 15. Modify substitution properties
Figure 16. Job log
- To view the logs of the jobs that have run, you can access the Modern Batch Job
Management Console by right-clicking the server runtime in the
Server view and choosing Modern Batch Job Management Console,
or by using the URL:
http://<hostname>:<wc_defaulthost port>/jmc/console.jsp. The console is meant for managing batch jobs and has purposely been kept separate from the WebSphere Application Server admin console, because operating a batch environment and managing a middleware infrastructure are two very different tasks. Figure 17 shows the Modern Batch Job Management Console, which provides many capabilities for managing jobs.
Figure 17. Modern Batch Job Management Console
- When the run is complete, the submitted job should produce a summary report as a file at C:\\outputFile.txt, concluding the test.
The Modern Batch feature for WebSphere Application Server provides a robust batch framework that enables you to develop batch programs with minimum effort. As part of WebSphere Application Server, the reliability offered by WebSphere products is built into the solution. It provides a simple Java-based programming model enabling you to leverage your Java skills to build dependable batch programs without the need to reinvent the framework. It also gives IT managers an opportunity to move jobs to a managed WebSphere Application Server environment.
Part 2 will discuss the transaction batch programming model and illustrate it with another working example.
The authors thank Edward McCarthy for reviewing this article and providing invaluable input.
- Feature Pack for Modern Batch Information Center
- Developing batch applications
- Download WebSphere Application Server V7 Feature Pack for Modern Batch
- White papers: IBM Modern Batch - Feature Pack and Compute Grid
- White papers: Beginners Guide to Coding Java Batch Jobs
- Introduction to batch programming using WebSphere Extended Deployment Compute Grid
- Download WebSphere Application Server V8 trial
- WebSphere Extended Deployment Compute Grid product information
- Compute-intensive programming model
- Presentation: IBM WebSphere Application Server Batch Applications