Modernized Java-based batch processing in WebSphere Application Server, Part 2
Transaction batch programming model
Part 1 of this article series described the Modern Batch feature of IBM WebSphere Application Server through the development of a sample batch application using the compute intensive programming model. Part 2 looks at the transaction batch programming model, which provides a powerful job failover model, based on checkpoint and restart semantics.
Figure 1 depicts the various components of a batch application.
Figure 1. Components of a Batch Application
You might notice that this diagram is an extension of Figure 1 from Part 1. This enhanced model introduces these new concepts:
- Batch data stream (BDS): BDS provides an abstraction for the data stream processed by a batch step. The WebSphere Application Server Modern Batch feature provides a BDS framework which includes pre-built code that manages the opening, closing, externalizing, and internalizing of a checkpoint. Available BDS framework patterns are shown in Table 1.
Table 1. Batch Data Stream Frameworks Patterns
| BDS framework pattern | Description |
| --- | --- |
| JDBC | Retrieves/writes data from a database using a JDBC connection. |
| Byte | Reads/writes byte data from a file. |
| Text file | Reads/writes to a text file. |
| JPA | Retrieves/writes data to a database using a Java Persistence API (JPA) connection. |
- Checkpoint algorithm: The batch container calls the checkpoint algorithm periodically to determine if it is time to take a checkpoint. The Modern Batch feature provides two pre-built checkpoint algorithms, one that supports a time-based checkpoint interval, and another that supports a checkpoint interval based on record count. Custom checkpoint algorithms can also be plugged in by writing the implementation.
- Result algorithm: Each batch job step supplies a return code when it is done. The results algorithm has visibility to the return codes from all steps in a batch job and returns a final, overall return code for the job as a whole. Modern Batch provides a pre-built results algorithm that returns the numerically highest step return code as the overall job return code. Custom result algorithms can also be plugged in by writing the implementation.
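The two algorithm concepts above can be illustrated in plain Java. This is only a rough sketch of the decision logic, not the Modern Batch SPI: the `CheckpointPolicy` interface and the `recordBased`, `timeBased`, and `overallReturnCode` names are hypothetical and exist purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

class AlgorithmSketch {

    /** Hypothetical stand-in for a checkpoint algorithm (not the product interface). */
    interface CheckpointPolicy {
        boolean shouldCheckpoint(int recordsSinceCheckpoint, long millisSinceCheckpoint);
    }

    /** Checkpoint once the given number of records has been processed. */
    static CheckpointPolicy recordBased(int recordCount) {
        return (records, millis) -> records >= recordCount;
    }

    /** Checkpoint once the given amount of time has elapsed. */
    static CheckpointPolicy timeBased(long intervalMillis) {
        return (records, millis) -> millis >= intervalMillis;
    }

    /** Overall job return code: the numerically highest step return code (0 if no steps ran). */
    static int overallReturnCode(List<Integer> stepReturnCodes) {
        return stepReturnCodes.stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    public static void main(String[] args) {
        CheckpointPolicy everyTenRecords = recordBased(10);
        System.out.println(everyTenRecords.shouldCheckpoint(9, 0L));   // false
        System.out.println(everyTenRecords.shouldCheckpoint(10, 0L));  // true
        System.out.println(overallReturnCode(Arrays.asList(0, 8, 4))); // 8
    }
}
```

In the real feature, the batch container calls the configured algorithm for you; the sketch only shows the kind of question each algorithm answers.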
Transaction batch model
The controller bean, com.ibm.ws.batch.BatchJobControllerBean, controls the lifecycle of the batch application and is responsible for reading the xJCL file to find and execute these implementation classes:
- Job Step implementation class: contains the business logic for each step.
- Batch Data Stream implementation class: holds the data exchange logic.
- Checkpoint implementation class: determines how often to commit global transactions under which batch steps are invoked.
- Results implementation class: is used to manipulate the return codes of batch jobs.
With this high level understanding of the programming model, let’s develop a sample batch application that follows the transaction batch model using IBM Rational Application Developer V8.0.
Sample business scenario
For this example, consider the case where you need a batch program that scans a file containing records. For each record, you need to perform some processing and finally insert the record into a database. The batch program should also allow for checkpoint and restart capability, which implies that in case the processing was somehow interrupted, further processing should resume from the last saved state and not from the beginning.
For example, say there are 100 records to be processed and a checkpoint is taken after every 10 records. If the processing fails after 23 records, the batch program should have inserted 20 records in the database, and processing should resume from the 21st record.
The transaction batch programming model provides this functionality, taking a checkpoint after every specified number of records, without any additional programming. In the case of an interruption after the 23rd record, the checkpoint and restart capability ensures that processing resumes from the 21st record. Without the checkpoint and restart capability, you would need to either manually rerun the job from the 21st record or start afresh from the beginning.
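The arithmetic of this scenario can be traced with a small simulation. The sketch below is a simplified model and not how the batch container is implemented: an in-memory list stands in for the database, and the class and method names (`CheckpointRestartSketch`, `run`, `nextRecord`, `committedCount`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CheckpointRestartSketch {
    private final List<Integer> committed = new ArrayList<>(); // stands in for the database
    private int lastCheckpoint = 0; // number of the last record covered by a checkpoint

    /**
     * Processes records from the last checkpoint up to total, committing every
     * interval records. A failure is simulated after record failAfter has been
     * processed (0 means no failure). Returns true if the run completes.
     */
    boolean run(int total, int interval, int failAfter) {
        List<Integer> pending = new ArrayList<>();
        for (int record = lastCheckpoint + 1; record <= total; record++) {
            pending.add(record);
            if (pending.size() == interval) {
                committed.addAll(pending); // checkpoint: commit the transaction
                pending.clear();
                lastCheckpoint = record;
            }
            if (record == failAfter) {
                return false; // pending (uncommitted) records are rolled back
            }
        }
        committed.addAll(pending); // final commit at end of data
        lastCheckpoint = total;
        return true;
    }

    int committedCount() { return committed.size(); }

    int nextRecord() { return lastCheckpoint + 1; }

    public static void main(String[] args) {
        CheckpointRestartSketch job = new CheckpointRestartSketch();
        job.run(100, 10, 23);
        System.out.println("Committed after failure: " + job.committedCount()); // 20
        System.out.println("Restart resumes at record: " + job.nextRecord());   // 21
        job.run(100, 10, 0);
        System.out.println("Committed after restart: " + job.committedCount()); // 100
    }
}
```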
In the next sections, you will develop a transaction batch job for this sample scenario using Rational Application Developer V8.0. Later, we’ll discuss the different interfaces that can be used to submit the WebSphere Application Server batch jobs.
As you can see from Figure 1, developing a transactional batch job requires developing:
- Configuration xJCL file
- Job step implementation class
- Batch data stream implementation class
- Checkpoint algorithm implementation class
- Result algorithm implementation class
Modern Batch provides built-in patterns for the various components that can be utilized for rapid development. In this example, you will be using these patterns:
- Generic pattern as the job step pattern for implementing the job. This pattern can be used where you have exactly one input and one output stream.
- Record based pattern as the checkpoint algorithm for specifying the number of iterations of the process job step method before committing the transaction.
- Job sum pattern as the result algorithm for result verification. Job sum returns the highest return code of all job steps.
- Text file reader pattern as the input stream, because the input data is read from a file, and JDBC writer pattern as the output stream, for committing the records to the database. Both are part of the BDS framework.
See Related topics to learn more about the various available built-in patterns.
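To make the shape of the Generic pattern more concrete, here is a minimal sketch of a step that reads from exactly one input stream, applies some processing, and writes to exactly one output stream. The `RecordReader` and `RecordWriter` interfaces and the `process` logic are hypothetical placeholders, not the classes that the tooling generates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

class GenericStepSketch {

    /** Hypothetical input stream: returns the next record, or null at end of data. */
    interface RecordReader {
        String read();
    }

    /** Hypothetical output stream. */
    interface RecordWriter {
        void write(String record);
    }

    /** Placeholder business logic applied to each record. */
    static String process(String record) {
        return record.trim().toUpperCase(Locale.ROOT);
    }

    /** Drives one input stream into one output stream; returns the number of records written. */
    static int runStep(RecordReader in, RecordWriter out) {
        int count = 0;
        String record;
        while ((record = in.read()) != null) {
            out.write(process(record));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // An in-memory list stands in for the text file; another list stands in for the database.
        Iterator<String> lines = Arrays.asList(" alpha", "beta ", "gamma").iterator();
        List<String> sink = new ArrayList<>();
        int written = runStep(() -> lines.hasNext() ? lines.next() : null, sink::add);
        System.out.println(written + " records written: " + sink); // 3 records written: [ALPHA, BETA, GAMMA]
    }
}
```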
To develop this application:
- Create the project.
Start by creating the required projects in Rational Application Developer. Navigate to File > New > Batch Project and create a new batch project named TransactionBatch. Click Finish. This generates the three required projects.
- Create Job Control file.
Create a new xJCL file named TransferRecordsToDB by right-clicking the xJCL folder under the TransactionBatch project and selecting New > Batch Job, as shown in Figure 2. Click Next.
Figure 2. Creating a new batch job
- Create batch job step.
Here, you create the batch job step with its implementation class. Name the job InsertRecords and select Generic as the Pattern. Click Create in the Required Properties section of the dialog (Figure 3).
Figure 3. Create job step
- Create implementation class.
Name the class InsertRecordsToDB and name the package com.ibm.dw.batch.transaction (Figure 4). Click Finish to create the generic pattern implementation class.
Figure 4. Create pattern implementation class
- Add checkpoint algorithm.
In the Algorithm section of the Batch Step Creation dialog (Figure 3), click the Add button next to Checkpoint Algorithm. The Checkpoint Algorithm dialog displays (Figure 5). Name the class RecordCountCheck, select Record Based as the Pattern, and enter the other field values shown in Figure 5. Click Finish to create the checkpoint algorithm implementation based on the Record Based pattern. This ensures that a checkpoint is taken every 10 records.
Figure 5. Create checkpoint algorithm implementation class
- Add result algorithm.
In the Algorithm section of the Batch Step Creation dialog (Figure 3), click the Add button next to Result Algorithm. The Result Algorithm dialog displays (Figure 6). Name the class JobSumResult and select Job Sum for the Pattern. Click Finish to create the result algorithm implementation based on the Job Sum pattern.
Figure 6. Creating Result Algorithm Implementation Class
When all the required implementation classes have been created, the Batch Step Creation dialog should look as it does in Figure 7. Click Next.
Figure 7. Batch Step Creation completion
- Specify input stream.
On the Step Stream dialog (Figure 8), name the input stream TextFileInputStream. Select Text File Reader as the Pattern and provide the location of the input file (in this example, C:\\InputFile.txt) for FILENAME under the Required Properties section. Click Create.
Figure 8. Create text file reader input stream
- Create implementation class for Input stream.
The Create Class dialog displays for creating the implementation class for the Text File Reader pattern of the input stream. Name the class TextFileReader and name the package com.ibm.dw.batch.transaction (Figure 9). Click Finish and then Next.
Figure 9. Create text file reader implementation class
- Specify output stream.
Name the Output Stream JDBCOutputStream and select JDBC Writer for the Pattern. Click Create (Figure 10).
Figure 10. Create JDBC writer output stream
- Create implementation for output stream.
The Create Class dialog displays for creating the implementation class for the output stream JDBC Writer. Name the class JDBCWriter and name the package com.ibm.dw.batch.transaction (Figure 11). Click Finish twice to complete all the required steps of the transaction batch creation process.
Figure 11. Create JDBC writer implementation class
After completing the batch job creation steps, the xJCL Editor for the TransferRecordsToDB should look like that shown in Figure 12.
Figure 12. Project structure
- Replace the implementation classes with the ones included with this article for download, completing the exercise of implementing the sample transaction batch application.
You now have a transaction batch program created using Rational tooling. The sections that follow address other configuration elements required and various interfaces available for running this sample.
Submitting batch jobs
The WebSphere Application Server Modern Batch feature provides several interfaces for submitting jobs:
- Job Management Console, discussed in Part 1.
- EJB API: The EJB interface can be called from within a Java EE container in an enterprise setting, which is beyond the scope of this article, or from a standalone client, which is the approach addressed here. To submit the job via the standalone EJB interface, see Listing 1.
Listing 1. Submitting a job via the standalone EJB interface
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;

// Obtain the naming context
Hashtable<String, String> env = new Hashtable<String, String>();
env.put(Context.INITIAL_CONTEXT_FACTORY,
        "com.ibm.websphere.naming.WsnInitialContextFactory");
env.put(Context.PROVIDER_URL,
        "corbaloc:iiop:" + <HOST_NAME> + ":" + <BOOTSTRAP_PORT> + "/NameServiceCellRoot");
InitialContext ctxt = new InitialContext(env);

// Look up the job scheduler EJB home
JobSchedulerHome jobSchedulerHome = (JobSchedulerHome) ctxt.lookup(
        "nodes/" + <NODE_NAME> + "/servers/" + <SERVER_NAME>
        + "/ejb/com/ibm/websphere/longrun/JobSchedulerHome");

// Create the job scheduler
JobScheduler jobScheduler = jobSchedulerHome.create();

// Submit the job, passing the content of the xJCL file
jobScheduler.submitJob(<xJCL_FILE_CONTENT>);
You can also refer to the sample EJB batch client project included with this article for download.
- Web service interface: The job scheduler web service is available at http://<HOST_NAME>:<DEFAULT_HOST_PORT>/LongRunningJobSchedulerWebSvcRouter/services/JobScheduler. A client can be generated from the WSDL using Rational tooling, and jobs can be submitted using the submitJob() method.
- Command line utility lrcmd is available in the <WAS_INSTALLATION_DIR>\bin directory.
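Listing 1 passes the xJCL to submitJob as a string. One way a standalone client might load that string from the xJCL file is sketched below. The `XjclLoaderSketch` class and `readXjcl` method are hypothetical, UTF-8 encoding is an assumption, and the one-line file written in `main` is only a stand-in, not a complete xJCL document.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class XjclLoaderSketch {

    /** Reads the whole xJCL file into a string (assumes the file is UTF-8 encoded). */
    static String readXjcl(Path xjclFile) throws IOException {
        return new String(Files.readAllBytes(xjclFile), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real xJCL file in this sketch.
        Path sample = Files.createTempFile("TransferRecordsToDB", ".xml");
        try {
            Files.write(sample, "<job name=\"TransferRecordsToDB\"/>".getBytes(StandardCharsets.UTF_8));
            String xjclContent = readXjcl(sample);
            // xjclContent is what would be passed as <xJCL_FILE_CONTENT> in Listing 1.
            System.out.println(xjclContent);
        } finally {
            Files.delete(sample);
        }
    }
}
```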
Running the sample
To run the sample from Rational Application Developer:
Before running the sample application, place the InputFile.txt file in a folder (for example, <PROFILE_HOME>) and update the input location accordingly in the xJCL file. You need tables into which the records are inserted after processing. Use TableSetup.sql to create the required tables in a Derby database; you might need to make changes if you are using another database. Also, create a data source with the JNDI name jdbc/ds_jndi in the WebSphere Application Server administrative console, pointing to the database where the records will be inserted. When you have completed the configuration, deploy the TransactionBatchEAR to the server and submit the job using any of the interfaces described above.
- Verify the checkpoint and restart capability
You’ll notice that the submitted job doesn’t complete successfully on the first run; it throws an exception, as shown in Figure 13. If you investigate further, you’ll find that InputFile.txt has an empty line at the 23rd line, which causes the failure and moves the job from the Executing state to the Restartable state. You’ll see that the job successfully committed the first 20 records to the database (that is, two iterations of 10 records each, as per your job configuration) and rolled back the insertion of the 21st and 22nd records, because the exception occurred while processing the empty line in the third iteration (Figure 13).
Figure 13. Running the sample application
If you restart the job after removing the empty line from the InputFile.txt file, it completes processing of the remaining records, beginning with the 21st record, as shown in Figure 14.
Figure 14. Restarting the job
- Integrate with schedulers
For illustrative purposes, these articles have explained the batch programming models along with their client interfaces, but a batch application would generally be triggered at some predetermined time by an enterprise scheduler such as IBM Tivoli® Workload Scheduler. The EJB or the web services interface can be used to integrate Tivoli Workload Scheduler with the batch application. If you use another scheduler that runs like a cron job, you can use the lrcmd utility instead.
The Modern Batch feature of WebSphere Application Server provides a robust batch programming model that enables you to develop batch programs with minimum effort. Because Modern Batch is a part of WebSphere Application Server, reliability is built into the solution.
This article explained the transaction batch programming model and walked through a sample application that uses it, finishing up our discussion of the different batch programming models. Subsequent articles will look at more advanced features of Modern Batch and show how it can be used in an enterprise setting.
The authors thank Sajan Sankaran and Edward McCarthy for reviewing this article and providing invaluable input.
Related topics
- Part 1: Introducing Modern Batch and the compute-intensive programming model
- Feature Pack for Modern Batch Information Center
- WebSphere Application Server Information Center
- Developing batch applications
- Download WebSphere Application Server V7 Feature Pack for Modern Batch
- White paper: IBM Modern Batch - Feature Pack and Compute Grid
- White paper: Beginners Guide to Coding Java Batch Jobs
- Introduction to batch programming using WebSphere Extended Deployment Compute Grid
- Download WebSphere Application Server V8 trial
- WebSphere Application Server product information
- WebSphere Extended Deployment Compute Grid product information
- Presentation: IBM WebSphere Application Server Batch Applications