Batch applications can encounter errors when reading or writing records, and those errors can terminate the batch job. If the application does not explicitly handle a particular error, the exception is thrown back to the batch container, which rolls back the current checkpoint transaction and ends the job in restartable state. The job can be restarted from the previous checkpoint, but this requires some type of external intervention; typically, a user must log in to the Job Management Console and click Restart. Until then, all records after the errant record remain unprocessed. Furthermore, if the errant record is not corrected, the restarted job will most likely hit the same error, so the remaining records stay unprocessed until the errant record is fixed.
This failure scenario, though long familiar, presents a challenge in creating an efficient and resilient batch environment. One way of handling the failure is to code the resiliency directly into the batch application; the batch application could catch and handle read and write exceptions itself. However, this approach conflicts with the conventional paradigm that separates business logic from infrastructure responsibilities. The batch application should only be concerned with the business logic necessary to process a record; it should not be concerned with building a resilient batch infrastructure. That’s the batch container’s responsibility.
IBM WebSphere Extended Deployment Compute Grid V8 provides a new feature, called skip-record processing, for handling this type of failure scenario.
Skip-record processing enables a batch data stream to skip records that encounter read or write exceptions. You can specify which types of exceptions are "skippable" and how many skips are permitted. The batch container handles the skippable exception and skips the record automatically, without the batch application being aware of it. The batch application moves on to the next record, uninterrupted. If a batch data stream encounters an exception that is not in the skippable list, or if it has reached the skip limit, it throws the exception back to the batch application.
The batch application can listen for skipped records by registering a SkipListener with the batch data stream. The SkipListener callback gives the batch application a hook point that it can use to gain control for every record that is skipped, in case the application wishes to log the failure or take some other action. Registering a SkipListener is recommended for the purpose of logging and auditing skipped records.
This article explains how skip-record processing works and how batch applications can use this new feature. For illustrative purposes, an example application that uses skip-record capabilities, SkipRetrySample, is included with this article.
Configuring and activating a skip-record policy
Skip-record processing is enabled by specifying a non-zero value for the com.ibm.batch.bds.skip.count property. This property is defined in the xJCL within the <props> element of a <bds>.
(Only batch data streams that inherit from com.ibm.websphere.batch.devframework.datastreams.bdsadapter.AbstractBatchDataInputStreamRecordMetrics or com.ibm.websphere.batch.devframework.datastreams.bdsadapter.AbstractBatchDataOutputStreamRecordMetrics support skip-record processing. All batch data streams provided by the WebSphere Compute Grid V8 batch data stream framework inherit from one of these two classes, and therefore all support skip-record processing.)
Each batch data stream has its own skip-record policy configuration. The optional properties below control the behavior of the skip-record policy for a particular batch data stream. Each is defined in the <props> element of a <bds>:
- com.ibm.batch.bds.skip.count: Specifies the number of records that can be skipped. This is also known as the skip limit. Once the skip limit is reached, no further records are skipped. Any further record-access exceptions are percolated to the caller.
- com.ibm.batch.bds.skip.include.exception.class.n: Specifies a list of exceptions that are skippable; n is an integer, starting at 1 and incrementing by 1 for each exception. If no exceptions are specified, then all exceptions are included in the skippable list.
- com.ibm.batch.bds.skip.exclude.exception.class.n: Specifies a list of exceptions that are not skippable. If no exceptions are specified, then no exceptions are excluded from the skippable list (that is, all exceptions are skippable).
Be aware that com.ibm.batch.bds.skip.include.exception.class.n and com.ibm.batch.bds.skip.exclude.exception.class.n are mutually exclusive properties. It doesn’t make sense to define both at the same time: if you define a single exception in the include list, then all other exceptions are excluded; if you define a single exception in the exclude list, then all other exceptions are included.
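The include and exclude rules can be modeled in a few lines of Java. This is only a sketch of the decision described above, not the Compute Grid implementation: the class SkipPolicySketch and its isSkippable method are hypothetical names, and matching on the exact class name is an assumption (the article does not say whether subclasses of a listed exception also match).

```java
import java.util.List;

// Hypothetical model of the include/exclude rules; not the Compute Grid implementation.
class SkipPolicySketch {
    private final List<String> include; // class names from ...skip.include.exception.class.n
    private final List<String> exclude; // class names from ...skip.exclude.exception.class.n

    SkipPolicySketch(List<String> include, List<String> exclude) {
        this.include = include;
        this.exclude = exclude;
    }

    // Assumption: exceptions are matched by exact class name.
    boolean isSkippable(Throwable t) {
        String name = t.getClass().getName();
        if (!include.isEmpty()) {
            return include.contains(name);   // only the listed exceptions are skippable
        }
        if (!exclude.isEmpty()) {
            return !exclude.contains(name);  // everything except the listed exceptions
        }
        return true;                         // neither list defined: all exceptions skippable
    }

    public static void main(String[] args) {
        SkipPolicySketch includeOnly =
            new SkipPolicySketch(List.of("java.lang.NumberFormatException"), List.of());
        System.out.println(includeOnly.isSkippable(new NumberFormatException("BAD")));   // true
        System.out.println(includeOnly.isSkippable(new IllegalStateException("other"))); // false
    }
}
```

Because a non-empty include list already excludes everything else, the sketch never consults the exclude list when an include list is present, which is one way to read "mutually exclusive."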
How the batch container handles skipped records
When a batch data stream encounters a skippable exception:
- The batch data stream consumes the exception and increments the running skip count.
- The skip is logged with this message:
CWLRB5852I: Record skipped by batch data stream inputEmp in job SkipRetrySample:00035 step SkipRetryStep1 due to error java.lang.NumberFormatException: For input string: "BAD"
- The SkipListener callback is invoked, if one is registered.
- When an input record is skipped, the batch data stream immediately attempts to read the next record. Control does not return to the caller until either:
  - A record is read successfully.
  - A non-skippable exception occurs.
  - The skip limit is reached.
- When an output record is skipped, the batch data stream returns to the caller normally.
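The input-side behavior in the list above can be simulated with plain Java. The following is a minimal sketch under stated assumptions, not the product code: SkipLoopSketch is a hypothetical class, records are plain strings parsed as integers, and NumberFormatException stands in for the only skippable exception.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical simulation of the input-side skip loop; not the Compute Grid implementation.
class SkipLoopSketch {
    private final Iterator<String> source;
    private final int skipLimit;   // plays the role of com.ibm.batch.bds.skip.count
    private int skipCount = 0;     // running skip count

    SkipLoopSketch(List<String> lines, int skipLimit) {
        this.source = lines.iterator();
        this.skipLimit = skipLimit;
    }

    int getSkipCount() { return skipCount; }

    // Returns the next parsable value, or null at end of input.
    Integer read() {
        while (source.hasNext()) {
            String line = source.next();
            try {
                return Integer.valueOf(line.trim());  // a record is read successfully
            } catch (NumberFormatException e) {       // the only skippable exception here
                if (skipCount >= skipLimit) {
                    throw e;                          // skip limit reached: percolate to caller
                }
                skipCount++;                          // consume the exception, count the skip
                // a registered SkipListener would be notified at this point
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SkipLoopSketch reader = new SkipLoopSketch(Arrays.asList("10", "BAD", "BAD", "20"), 3);
        System.out.println(reader.read());          // 10
        System.out.println(reader.read());          // 20, after skipping two bad records
        System.out.println(reader.getSkipCount());  // 2
    }
}
```

Any exception other than NumberFormatException is not caught in the loop, so it reaches the caller immediately, which mirrors the non-skippable case.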
Skip-record processing is contained entirely within the batch data stream. The batch application is unaware that records have been skipped, but it can optionally be notified when records are skipped by registering a SkipListener with the batch data stream. SkipListeners are discussed in the next section.
The running skip count is persisted with the batch data stream at checkpoints. Any skips that occur during a checkpoint that eventually rolls back are not included in the count when the job is restarted. The batch data stream resumes with the skip count from the last committed checkpoint.
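To make the checkpoint behavior concrete, here is a small Java model of a skip count that is saved on commit and restored on restart. It is a sketch only: the class and method names are hypothetical, and the real batch data stream persists its state through the batch container rather than in a field.

```java
// Hypothetical model of how a skip count could follow checkpoint commit and rollback.
class SkipCountCheckpointSketch {
    private int runningSkipCount = 0;    // skips seen so far, including uncommitted ones
    private int committedSkipCount = 0;  // value saved at the last committed checkpoint

    void recordSkip() { runningSkipCount++; }

    // Checkpoint commits: persist the running count.
    void commitCheckpoint() { committedSkipCount = runningSkipCount; }

    // Checkpoint rolled back and the job restarts: resume from the persisted count.
    void restartFromLastCheckpoint() { runningSkipCount = committedSkipCount; }

    int getSkipCount() { return runningSkipCount; }

    public static void main(String[] args) {
        SkipCountCheckpointSketch bds = new SkipCountCheckpointSketch();
        bds.recordSkip();
        bds.commitCheckpoint();            // one skip is now persisted
        bds.recordSkip();
        bds.recordSkip();                  // two more skips in the next checkpoint interval
        bds.restartFromLastCheckpoint();   // that interval rolled back
        System.out.println(bds.getSkipCount()); // 1
    }
}
```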
A batch application can register a SkipListener with the batch data stream in order to listen for skipped records. The primary purpose of the SkipListener is to provide the batch application with a mechanism for logging and auditing skipped records.
The batch container logs a message for every skipped record, and also logs a message at the end of the job to indicate the total number of skips. However, implementing a SkipListener is strongly recommended, because the SkipListener can access the state of the batch application and log a detailed account of which records were skipped and why.
The logging of skipped records is crucial for auditing the skips later, after the job has finished. A job that skips some records will still end in the normal "ended" state (as long as the number of skipped records is below the skip limit). Therefore, you should use a SkipListener to make it very apparent to job auditors that records were skipped, and to provide a detailed accounting of those skips so that the errors can be rectified and the records re-processed.
The batch data stream invokes the SkipListener each time a record is skipped. The SkipListener must implement the com.ibm.websphere.batch.SkipListener interface. The interface consists of these methods:
- public void onSkippedRead(Throwable t): Invoked when an input record is skipped. The exception is passed as an argument.
- public void onSkippedWrite(Object record, Throwable t): Invoked when an output record is skipped. The record that failed and the exception are passed as arguments.
The SkipListener is registered with the batch data stream via the AbstractBatchDataStreamRecordMetrics.addSkipListener method.
The batch data stream must be cast to com.ibm.websphere.batch.devframework.datastreams.bdsadapter.AbstractBatchDataStreamRecordMetrics in order to access the addSkipListener method. All batch data streams provided by the WebSphere Compute Grid V8 batch data stream framework support skip-record processing, and so they all implement the AbstractBatchDataStreamRecordMetrics interface.
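As an illustration of the two callbacks, the sketch below implements an auditing listener. So that it compiles without the product libraries, it declares a local stand-in interface with the same two method signatures listed above; in a real application you would implement com.ibm.websphere.batch.SkipListener instead. The AuditingSkipListener class and its audit trail are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SkipListenerSketch {
    // Local stand-in mirroring the two methods of com.ibm.websphere.batch.SkipListener,
    // declared here only so the sketch is self-contained.
    interface SkipListener {
        void onSkippedRead(Throwable t);
        void onSkippedWrite(Object record, Throwable t);
    }

    // Hypothetical listener that keeps an audit trail of every skip.
    static class AuditingSkipListener implements SkipListener {
        private final List<String> auditTrail = new ArrayList<>();

        @Override
        public void onSkippedRead(Throwable t) {
            // The failed input record is not available, only the exception.
            auditTrail.add("READ skipped: " + t);
        }

        @Override
        public void onSkippedWrite(Object record, Throwable t) {
            // For output, the record that failed is passed in and can be logged.
            auditTrail.add("WRITE skipped for record [" + record + "]: " + t);
        }

        List<String> getAuditTrail() { return auditTrail; }
    }

    public static void main(String[] args) {
        AuditingSkipListener listener = new AuditingSkipListener();
        listener.onSkippedRead(new NumberFormatException("For input string: \"BAD\""));
        listener.onSkippedWrite("emp-42", new java.io.IOException("disk full"));
        listener.getAuditTrail().forEach(System.out::println);
    }
}
```

Keeping the audit entries in the listener, rather than only writing them to a trace log, is one way to produce an end-of-job summary of skipped records.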
A sample application that utilizes the various capabilities of skip-record processing is included with this article for demonstration purposes. The SkipRetrySample is a very simple batch application that updates employee records. The records are read from a text file, updated by the application, then written to another text file. The key classes from the application are:
- com.ibm.ws.batch.srs.EmpProcessor: Implements the BatchJobStepInterface.
- com.ibm.ws.batch.srs.bds.EmpFileReader: FileReaderPattern for input BDS.
- com.ibm.ws.batch.srs.bds.EmpFileWriter: FileWriterPattern for output BDS.
- com.ibm.ws.batch.srs.EmpRecord: Encapsulates an employee record.
- com.ibm.ws.batch.srs.MySkipListener: SkipListener implementation.
Simulating record-access errors and using skip-record processing
In order to simulate skippable exceptions, the input data contains a handful of records with bad data. Specifically, the bad records contain text data in a field that is expected to contain only numeric data (instead of a number, they contain the text BAD). This causes a NumberFormatException when the batch application’s FileReaderPattern, EmpFileReader, attempts to read the record and construct an EmpRecord object out of it.
The SkipRetrySample xJCL lists java.lang.NumberFormatException as a skippable exception for the batch data stream. Therefore, the batch data stream will skip these bad records and immediately move on and fetch the next record from the stream.
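The failure itself is easy to reproduce outside the batch container. The sketch below assumes a hypothetical "name,salary" record layout (the sample's actual file format is not shown in this article) and parses the numeric field with Integer.parseInt, which raises the same NumberFormatException seen in the skip message.

```java
// Hypothetical illustration of how a bad numeric field produces a NumberFormatException.
class EmpRecordParseSketch {
    // Assumed layout "name,salary"; the real EmpFileReader format may differ.
    static int parseSalary(String line) {
        String[] fields = line.split(",");
        return Integer.parseInt(fields[1].trim()); // throws NumberFormatException on "BAD"
    }

    public static void main(String[] args) {
        System.out.println(parseSalary("Alice,52000")); // 52000
        try {
            parseSalary("Bob,BAD");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // For input string: "BAD"
        }
    }
}
```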
Configuring and activating the skip-record policy
Listing 1 shows an excerpt from the xJCL showing the skip-record policy definition for the batch data stream named "inputEmp."
Listing 1. xJCL excerpt showing skip-record policy configuration
<batch-data-streams>
  <bds>
    <logical-name>inputEmp</logical-name>
    <props>
      <prop name="PATTERN_IMPL_CLASS" value="com.ibm.ws.batch.srs.bds.EmpFileReader"/>
      <prop name="file.encoding" value="${fileEncoding}"/>
      <prop name="FILENAME" value="${inputEmpFile}" />
      <prop name="com.ibm.batch.bds.skip.count" value="3" />
      <prop name="com.ibm.batch.bds.skip.include.exception.class.1"
            value="java.lang.NumberFormatException" />
    </props>
    <impl-class>
      com.ibm.websphere.batch.devframework.datastreams.patterns.TextFileReader
    </impl-class>
  </bds>
  ...
The skip limit is set to 3, meaning the batch data stream will tolerate up to three skipped records. Only java.lang.NumberFormatException will be skipped, because it is the only class in the include list. If the batch data stream encounters any other type of error, or a fourth NumberFormatException (which would exceed the skip limit), the exception percolates to the caller and the record is not skipped.
When the SkipRetrySample job is executed, the bad records will generate NumberFormatExceptions, which will be detected by the batch data stream. The batch data stream will skip those records and continue processing the input data, enabling the job to complete despite the presence of the errant records.
Adding a SkipListener and auditing skipped records
The SkipRetrySample also illustrates the use of a SkipListener. The SkipListener is registered with the batch data stream in EmpProcessor.createJobStep (Listing 2).
Listing 2. Registering a SkipListener with a batch data stream
public void createJobStep()
{
    ...
    // Open BDS
    _inputEmpBDS = (AbstractBatchDataInputStream) BatchDataStreamMgr.getBatchDataStream(
        "inputEmp", getJobStepID());
    // Add SkipListener
    _mySkipListener = new MySkipListener();
    ((AbstractBatchDataStreamRecordMetrics) _inputEmpBDS).addSkipListener(_mySkipListener);
    ...
}
Whenever an input record is skipped, the batch data stream will call SkipListener.onSkippedRead. The MySkipListener implementation is very simple: it logs the last "clean" record (that is, the last record that was successfully read from the stream) for auditing purposes. After the job completes, you will likely want to know whether or not records were skipped, and if so, which records were bad. Since the actual bad record is not accessible (because it failed to be read), MySkipListener logs the previously loaded record. This will help you quickly locate the failing record. The code excerpt is shown in Listing 3.
Listing 3. Sample SkipListener implementation that logs skip errors
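public void onSkippedRead(Throwable t)
{
    // Log the exception that caused the skip
    logger.log(Level.FINE, "onSkippedRead called for failure", t);
    // {0} is required for java.util.logging to substitute the parameter into the message
    logger.log(Level.FINE, "last clean record: {0}", _lastCleanRecord);
}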
The last clean record is passed to MySkipListener in EmpProcessor.processJobStep whenever a record is read successfully (Listing 4).
Listing 4. EmpProcessor.processJobStep supplies record info to SkipListener
public int processJobStep()
    throws Exception
{
    ...
    if (_inputEmpBDS.hasNext())
    {
        EmpRecord empRecord = (EmpRecord) _inputEmpBDS.read();
        // Remember the last record that was read successfully, for auditing skips
        _mySkipListener.setLastCleanRecord(empRecord);
        ...
    }
    ...
}
Logging the skip-record count with record metrics
The batch data stream framework keeps track of these metrics for each batch data stream:
- skip: the number of skipped records
- rps: the record processing rate, in records per second.
Only batch data streams that inherit from com.ibm.websphere.batch.devframework.datastreams.bdsadapter.AbstractBatchDataInputStreamRecordMetrics or com.ibm.websphere.batch.devframework.datastreams.bdsadapter.AbstractBatchDataOutputStreamRecordMetrics maintain record metrics. Because all batch data stream classes provided by WebSphere Compute Grid inherit from one of these two classes, they all maintain record metrics.
These metrics are reported in messages logged at the end of the job step (Listing 5). Notice that each batch data stream has its own set of metrics.
Listing 5. Record metric messages logged at the end of the job step
CWLRB5844I: [06/11/11 11:37:47:906 EDT] Job Step Batch Data Stream [SkipRetrySample:00035,SkipRetryStep1,outputEmp]: Metric = skip Value = 0
CWLRB5844I: [06/11/11 11:37:47:906 EDT] Job Step Batch Data Stream [SkipRetrySample:00035,SkipRetryStep1,outputEmp]: Metric = rps Value = 82,074
CWLRB5844I: [06/11/11 11:37:47:906 EDT] Job Step Batch Data Stream [SkipRetrySample:00035,SkipRetryStep1,inputEmp]: Metric = skip Value = 2
CWLRB5844I: [06/11/11 11:37:47:906 EDT] Job Step Batch Data Stream [SkipRetrySample:00035,SkipRetryStep1,inputEmp]: Metric = rps Value = 78,265
The batch data stream record metrics are also available to the batch application at run time. They can be retrieved from the JobStepContext via JobStepContext.getRecordMetrics. The SkipRetrySample illustrates how to retrieve the metrics in the EmpProcessor.checkMetrics method (Listing 6).
Listing 6. Retrieving RecordMetrics in the batch application
private void checkMetrics()
{
    ...
    RecordMetrics inputEmpRM = JobStepContextMgr.getContext().getRecordMetrics("inputEmp");
    long skips = inputEmpRM.getMetric(RecordMetrics.MetricName.skip);
    long rps = inputEmpRM.getMetric(RecordMetrics.MetricName.rps);
    logger.fine("RecordMetric data for inputEmp BDS. Number of skipped records = " + skips
        + ". Records/second = " + rps);
}
Record metrics are persisted in the LOCALJOBSTATUS table of the LRSCHED database.
By adding a skip-record policy configuration to your batch application, you can enhance its efficiency by enabling the job to process large data sets to completion – even if the data contains a few bad records that would otherwise interrupt the job, leaving the remaining records unprocessed until the errant record is corrected. A skip-record policy also improves the resiliency of the batch application by responding dynamically to record failures.
| Description | Name | Size | Download method |
|---|---|---|---|
| Code sample | skipsample.zip | 11 KB | HTTP |
Rob Alderman is an Advisory Software Engineer at IBM. He works in the WebSphere development group, specializing in WebSphere Compute Grid and WebSphere Application Server for z/OS. Rob earned a dual BS degree in computer systems engineering and computer science from Rensselaer Polytechnic Institute (RPI) in Troy, NY.




