Skip-record processing enhances the efficiency and resiliency of WebSphere Compute Grid batch applications

IBM® WebSphere® Extended Deployment Compute Grid V8 offers a new feature called skip-record processing for controlling how a batch application responds to error conditions. Skip-record processing enables a batch data stream to automatically skip records that encounter read or write exceptions. This feature is useful when dealing with very large datasets that could contain a few bad records. The batch application can log the skip, discard the bad record, and continue processing the remaining records without interruption, thereby improving the efficiency of the batch environment. The environment is also more resilient, in that a single bad record does not interrupt the entire batch job. This content is part of the IBM WebSphere Developer Technical Journal.

Robert Alderman (ralder@us.ibm.com), Advisory Software Engineer, IBM

Rob Alderman is an Advisory Software Engineer at IBM. He works in the WebSphere development group, specializing in WebSphere Compute Grid and WebSphere Application Server for z/OS. Rob earned a dual BS degree in computer systems engineering and computer science from Rensselaer Polytechnic Institute (RPI) in Troy, NY.



21 September 2011


Introduction

Batch applications can encounter errors when reading or writing records, and such errors can terminate the batch job. If a particular error is not explicitly handled by the application, the exception is thrown back to the batch container, which rolls back the current checkpoint transaction and ends the job in the restartable state. The job can be restarted from the previous checkpoint, but this requires some type of external intervention; typically, a user must log in to the Job Management Console and click Restart. Until then, all records after the errant record remain unprocessed. Furthermore, if the errant record is not corrected, the restarted job will most likely experience the same error, so the remaining records stay unprocessed until the errant record is fixed.

This familiar failure scenario presents a challenge when creating an efficient and resilient batch environment. One way of handling the failure is to code the resiliency directly into the batch application, which could catch and handle read and write exceptions itself. However, this approach conflicts with the conventional paradigm that separates business logic from infrastructure responsibilities. The batch application should be concerned only with the business logic necessary to process a record; building a resilient batch infrastructure is the batch container’s responsibility.

IBM WebSphere Extended Deployment Compute Grid V8 provides a new feature, called skip-record processing, for handling this type of failure scenario.

Skip-record processing enables a batch data stream to skip records that encounter read or write exceptions. You can specify which types of exceptions are "skippable" and how many skips are permitted. The batch container handles the skippable exception and skips the record automatically, without the batch application being aware of it. The batch application simply moves on to the next record, uninterrupted. If a batch data stream encounters an exception that is not in the skippable list, or if it has reached the skip limit, it throws the exception back to the batch application.

The batch application can listen for skipped records by registering a SkipListener with the batch data stream. The SkipListener callback gives the batch application a hook point that it can use to gain control for every record that is skipped, in case the application wishes to log the failure or take some other action. Registering a SkipListener is recommended for the purpose of logging and auditing skipped records.

This article explains how skip-record processing works and how batch applications can use this new feature. For illustrative purposes, an example application that uses skip-record capabilities, SkipRetrySample, is included with this article.


Configuring and activating a skip-record policy

Skip-record processing is enabled by specifying a non-zero value for the com.ibm.batch.bds.skip.count property. This property is defined in the xJCL within the <props> element of a <bds>.

(Only batch data streams that inherit from com.ibm.websphere.batch.devframework.datastreams.bdsadapter.AbstractBatchDataInputStreamRecordMetrics or com.ibm.websphere.batch.devframework.datastreams.bdsadapter.AbstractBatchDataOutputStreamRecordMetrics support skip-record processing. All batch data streams provided by the WebSphere Compute Grid V8 batch data stream framework inherit from one of these two classes, and therefore all of them support skip-record processing.)

Each batch data stream has its own skip-record policy configuration. The optional properties below control the behavior of the skip-record policy for a particular batch data stream. Each is defined in the <props> element of a <bds>:

  • com.ibm.batch.bds.skip.count

    Specifies the number of records that can be skipped. This is also known as the skip limit. Once the skip limit is reached, no further records are skipped. Any further record-access exceptions are percolated to the caller.

  • com.ibm.batch.bds.skip.include.exception.class.n

    Specifies a list of exceptions that are skippable; n is an integer, starting at 1 and incrementing by 1 for each exception. If no exceptions are specified, then all exceptions are included in the skippable list.

  • com.ibm.batch.bds.skip.exclude.exception.class.n

    Specifies a list of exceptions that are not skippable. If no exceptions are specified, then no exceptions are excluded from the skippable list (that is, all exceptions are skippable).

Be aware that com.ibm.batch.bds.skip.include.exception.class.n and com.ibm.batch.bds.skip.exclude.exception.class.n are mutually exclusive properties. It doesn’t make sense to define both at the same time: if you define a single exception in the include list, then all other exceptions are excluded; if you define a single exception in the exclude list, then all other exceptions are included.
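
The decision rule described by these three properties can be sketched in plain Java. This is only an illustration of the rule as stated above, not the container's implementation: the class SkipPolicySketch and its method names are hypothetical, the sketch assumes that exceptions are matched by exact class name, and it assumes the include list wins if both lists are supplied (a configuration the article advises against).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a skip-record policy; not a WebSphere class.
final class SkipPolicySketch {
    private final int skipLimit;        // models com.ibm.batch.bds.skip.count
    private final Set<String> include;  // models ...skip.include.exception.class.n
    private final Set<String> exclude;  // models ...skip.exclude.exception.class.n

    SkipPolicySketch(int skipLimit, Set<String> include, Set<String> exclude) {
        this.skipLimit = skipLimit;
        this.include = include;
        this.exclude = exclude;
    }

    // Assumption: an exception matches a list entry by exact class name.
    boolean isSkippable(Throwable t, int skipsSoFar) {
        if (skipsSoFar >= skipLimit) {
            return false; // skip limit reached; a limit of 0 disables skipping
        }
        String name = t.getClass().getName();
        if (!include.isEmpty()) {
            return include.contains(name); // only listed exceptions are skippable
        }
        return !exclude.contains(name);    // all exceptions except the listed ones
    }

    public static void main(String[] args) {
        Set<String> include =
            new HashSet<>(Arrays.asList("java.lang.NumberFormatException"));
        SkipPolicySketch policy =
            new SkipPolicySketch(3, include, Collections.<String>emptySet());

        // Listed exception, limit not reached: prints true
        System.out.println(policy.isSkippable(new NumberFormatException("BAD"), 0));
        // Exception not in the include list: prints false
        System.out.println(policy.isSkippable(new IllegalStateException("other"), 0));
        // Listed exception, but three skips already used: prints false
        System.out.println(policy.isSkippable(new NumberFormatException("BAD"), 3));
    }
}
```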


How the batch container handles skipped records

When a batch data stream encounters a skippable exception:

  1. The batch data stream consumes the exception and increments the running skip count.
  2. The skip is logged with this message:

    CWLRB5852I: Record skipped by batch data stream inputEmp in job SkipRetrySample:00035 step SkipRetryStep1 due to error java.lang.NumberFormatException: For input string: "BAD"

  3. The SkipListener callback is invoked, if one is registered.
  4. When an input record is skipped, the batch data stream immediately attempts to read the next record. Control does not return to the caller until either:
    • A record is read successfully.
    • A non-skippable exception occurs.
    • The skip limit is reached.
  5. When an output record is skipped, the batch data stream returns to the caller normally.
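
The input-side behavior in the steps above can be approximated with a small, self-contained simulation. It is a sketch under stated assumptions rather than the framework's code: SkippingReaderSketch is a hypothetical class, records are plain strings parsed as integers, and NumberFormatException is the only skippable exception. The read method keeps fetching records until one parses, the input ends, or the skip limit is reached, in which case the exception percolates to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical simulation of an input stream with a skip policy; not WebSphere code.
final class SkippingReaderSketch {
    private final Iterator<String> source;
    private final int skipLimit;
    private int skipCount = 0;
    private final List<String> skipLog = new ArrayList<>();

    SkippingReaderSketch(List<String> lines, int skipLimit) {
        this.source = lines.iterator();
        this.skipLimit = skipLimit;
    }

    int getSkipCount() { return skipCount; }
    List<String> getSkipLog() { return skipLog; }

    // Returns the next record that parses cleanly, or null at end of input.
    Integer read() {
        while (source.hasNext()) {
            String raw = source.next();
            try {
                return Integer.valueOf(raw.trim());
            } catch (NumberFormatException e) {
                if (skipCount >= skipLimit) {
                    throw e; // skip limit reached: percolate to the caller
                }
                skipCount++;                             // step 1: count the skip
                skipLog.add("Record skipped due to " + e); // step 2: log the skip
                // step 3: a registered listener would be notified here
                // step 4: loop around and try the next record immediately
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SkippingReaderSketch reader = new SkippingReaderSketch(
            Arrays.asList("10", "BAD", "20", "BAD", "30"), 3);
        Integer record;
        while ((record = reader.read()) != null) {
            System.out.println("processed " + record);
        }
        System.out.println("skips = " + reader.getSkipCount());
    }
}
```

From the caller's point of view, each call to read either returns a good record or throws; the skipped records never surface, which mirrors the description of skip-record processing being contained within the batch data stream.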

Skip-record processing is contained entirely within the batch data stream. The batch application is unaware that records have been skipped, but it can optionally be notified when records are skipped by registering a SkipListener with the batch data stream. SkipListeners are discussed in the next section.

The running skip count is persisted with the batch data stream at checkpoints. Any skips that occur during a checkpoint that eventually rolls back are not included in the count when the job is restarted. The batch data stream resumes with the skip count from the last committed checkpoint.
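
The interaction between the skip count and checkpoints can be modeled in a few lines. This is a hypothetical model, not the container's persistence logic: SkipCountCheckpointSketch and its methods are invented for illustration, and the sketch assumes that the count is saved only when a checkpoint commits.

```java
// Hypothetical model of skip-count persistence across checkpoints; not WebSphere code.
final class SkipCountCheckpointSketch {
    private int committedSkips = 0; // value saved at the last committed checkpoint
    private int runningSkips = 0;   // value including the current checkpoint interval

    void recordSkip() { runningSkips++; }

    // A successful checkpoint persists the running count.
    void commitCheckpoint() { committedSkips = runningSkips; }

    // A rollback followed by a restart resumes from the persisted count.
    void rollbackAndRestart() { runningSkips = committedSkips; }

    int getRunningSkips() { return runningSkips; }

    public static void main(String[] args) {
        SkipCountCheckpointSketch bds = new SkipCountCheckpointSketch();
        bds.recordSkip();
        bds.commitCheckpoint();   // one skip is now persisted
        bds.recordSkip();
        bds.recordSkip();         // two more skips in an interval that will fail
        bds.rollbackAndRestart(); // those two skips are discarded
        System.out.println("skips after restart = " + bds.getRunningSkips());
    }
}
```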


Registering a SkipListener

A batch application can register a SkipListener with the batch data stream in order to listen for skipped records. The primary purpose of the SkipListener is to provide the batch application with a mechanism for logging and auditing skipped records.

The batch container logs a message for every skipped record, and also logs a message at the end of the job to indicate the total number of skips. However, implementing a SkipListener is strongly recommended, because the SkipListener can access the state of the batch application and log a detailed account of which records were skipped and why.

The logging of skipped records is crucial for auditing the skips later, after the job has finished. A job that skips some records will still end in the normal "ended" state (as long as the number of skipped records is below the skip limit). Therefore, you should use a SkipListener to make it very apparent to job auditors that records were skipped, and to provide a detailed accounting of those skips so that the errors can be rectified and the records re-processed.

The batch data stream invokes the SkipListener each time a record is skipped. The SkipListener must implement the com.ibm.websphere.batch.SkipListener interface. The interface consists of these methods:

  • public void onSkippedRead(Throwable t);

    Invoked when an input record is skipped. The exception is passed as an argument.

  • public void onSkippedWrite(Object record, Throwable t);

    Invoked when an output record is skipped. The record that failed and the exception are passed as arguments.

The SkipListener is registered with the batch data stream via the AbstractBatchDataStreamRecordMetrics.addSkipListener method.

The batch data stream must be cast to com.ibm.websphere.batch.devframework.datastreams.bdsadapter.AbstractBatchDataStreamRecordMetrics in order to access the addSkipListener method. All batch data streams provided by the WebSphere Compute Grid V8 batch data stream framework support skip-record processing, and so they all implement the AbstractBatchDataStreamRecordMetrics interface.
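
As a rough illustration of the two callbacks, the sketch below collects an audit trail for skipped reads and writes. So that it compiles without the product libraries, it declares a local interface that mirrors the two method signatures listed above; in a real application the listener would implement com.ibm.websphere.batch.SkipListener instead. The names SkipListenerSketch, AuditingSkipListener, and getAudit are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SkipListenerSketch {
    // Local stand-in mirroring the two methods of com.ibm.websphere.batch.SkipListener,
    // declared here only so that the sketch compiles on its own.
    interface SkipListener {
        void onSkippedRead(Throwable t);
        void onSkippedWrite(Object record, Throwable t);
    }

    // Hypothetical listener that keeps an in-memory audit trail of skips.
    static final class AuditingSkipListener implements SkipListener {
        private final List<String> audit = new ArrayList<>();
        private Object lastCleanRecord;

        // Called by the application after each successful read.
        void setLastCleanRecord(Object record) { lastCleanRecord = record; }

        List<String> getAudit() { return audit; }

        @Override
        public void onSkippedRead(Throwable t) {
            // The bad input record is not available, so note the last good one.
            audit.add("READ skipped after record [" + lastCleanRecord + "]: " + t);
        }

        @Override
        public void onSkippedWrite(Object record, Throwable t) {
            // For output skips, the failing record itself is passed in.
            audit.add("WRITE skipped for record [" + record + "]: " + t);
        }
    }

    public static void main(String[] args) {
        AuditingSkipListener listener = new AuditingSkipListener();
        listener.setLastCleanRecord("emp-0041");
        listener.onSkippedRead(new NumberFormatException("For input string: \"BAD\""));
        listener.onSkippedWrite("emp-0057", new java.io.IOException("disk full"));
        for (String entry : listener.getAudit()) {
            System.out.println(entry);
        }
    }
}
```

Keeping the audit entries in a list is a simplification; a production listener would more likely write to the job log or to a durable audit store so that the entries survive the job.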


Example application

A sample application that utilizes the various capabilities of skip-record processing is included with this article for demonstration purposes. The SkipRetrySample is a very simple batch application that updates employee records. The records are read from a text file, updated by the application, then written to another text file. The key classes from the application are:

  • com.ibm.ws.batch.srs.EmpProcessor

    Implements the BatchJobStepInterface.

  • com.ibm.ws.batch.srs.bds.EmpFileReader

    FileReaderPattern for input BDS.

  • com.ibm.ws.batch.srs.bds.EmpFileWriter

    FileWriterPattern for output BDS.

  • com.ibm.ws.batch.srs.EmpRecord

    Encapsulates an employee record.

  • com.ibm.ws.batch.srs.MySkipListener

    SkipListener implementation.

Further testing with the sample application

The SkipRetrySample application also contains code that is capable of utilizing the various capabilities of retry-step processing. This code is disabled for the purpose of this article, but it can be easily enabled by adding a retry-step policy to the xJCL. To learn more about retry-step processing, refer to the WebSphere Compute Grid V8 Information Center.

Simulating record-access errors and using skip-record processing

In order to simulate skippable exceptions, the input data contains a handful of records with bad data. Specifically, the bad records contain text data in a field that is expected to contain only numeric data (instead of a number, they contain the text BAD). This causes a NumberFormatException when the batch application’s FileReaderPattern, EmpFileReader, attempts to read the record and construct an EmpRecord object out of it.

The SkipRetrySample xJCL lists java.lang.NumberFormatException as a skippable exception for the batch data stream. Therefore, the batch data stream will skip these bad records and immediately move on and fetch the next record from the stream.

Configuring and activating the skip-record policy

Listing 1 shows an excerpt from the xJCL showing the skip-record policy definition for the batch data stream named "inputEmp."

Listing 1. xJCL excerpt showing skip-record policy configuration
<batch-data-streams>
  <bds>
    <logical-name>inputEmp</logical-name>
    <props>
      <prop name="PATTERN_IMPL_CLASS" value="com.ibm.ws.batch.srs.bds.EmpFileReader"/>
      <prop name="file.encoding" value="${fileEncoding}"/>
      <prop name="FILENAME" value="${inputEmpFile}" />
      <prop name="com.ibm.batch.bds.skip.count" value="3" />
      <prop name="com.ibm.batch.bds.skip.include.exception.class.1"
            value="java.lang.NumberFormatException" />
    </props>
    <impl-class>
      com.ibm.websphere.batch.devframework.datastreams.patterns.TextFileReader
    </impl-class> 
  </bds>
  ...

The skip limit is set to 3, meaning the batch data stream will tolerate up to three skipped records. Only java.lang.NumberFormatException will be skipped, because it is the only class in the include list. If the batch data stream encounters any other type of error, or if it encounters a fourth NumberFormatException (which would exceed the skip limit), then the exception will percolate to the caller and the record will not be skipped.

When the SkipRetrySample job is executed, the bad records will generate NumberFormatExceptions, which will be detected by the batch data stream. The batch data stream will skip those records and continue processing the input data, enabling the job to complete despite the presence of the errant records.

Adding a SkipListener and auditing skipped records

The SkipRetrySample also illustrates the use of a SkipListener. The SkipListener is registered with the batch data stream in EmpProcessor.createJobStep (Listing 2).

Listing 2. Registering a SkipListener with a batch data stream
public void createJobStep()
{
  ...
  // Open BDS
  _inputEmpBDS=(AbstractBatchDataInputStream)BatchDataStreamMgr.getBatchDataStream(
      "inputEmp", getJobStepID());

  // Add SkipListener
  _mySkipListener = new MySkipListener();
  ((AbstractBatchDataStreamRecordMetrics)_inputEmpBDS).addSkipListener(_mySkipListener);
  ...
}

Whenever an input record is skipped, the batch data stream will call SkipListener.onSkippedRead. The MySkipListener implementation is very simple: it logs the last "clean" record (that is, the last record that was successfully read from the stream) for auditing purposes. After the job completes, you will likely want to know whether or not records were skipped, and if so, which records were bad. Since the actual bad record is not accessible (because it failed to be read), MySkipListener logs the previously loaded record. This will help you quickly locate the failing record. The code excerpt is shown in Listing 3.

Listing 3. Sample SkipListener implementation that logs skip errors
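public void onSkippedRead(Throwable t)
{
  // Log the exception that caused the skip
  logger.log(Level.FINE,"onSkippedRead called for failure", t);
  // The {0} placeholder is needed for the record to appear in the message
  logger.log(Level.FINE,"last clean record: {0}", _lastCleanRecord);
}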

The last clean record is passed to MySkipListener in EmpProcessor.processJobStep whenever a record is read successfully (Listing 4).

Listing 4. EmpProcessor.processJobStep supplies record info to SkipListener
public int processJobStep()
  throws Exception
{
  ...
  if (_inputEmpBDS.hasNext())
  {
    EmpRecord empRecord = (EmpRecord)_inputEmpBDS.read();
    _mySkipListener.setLastCleanRecord(empRecord);
    ...
  }
  ...
}

Logging the skip-record count with record metrics

The batch data stream framework keeps track of these metrics for each batch data stream:

  • skip: the number of skipped records
  • rps: the record processing rate, in records per second.

Only batch data streams that inherit from com.ibm.websphere.batch.devframework.datastreams.bdsadapter.AbstractBatchDataInputStreamRecordMetrics or com.ibm.websphere.batch.devframework.datastreams.bdsadapter.AbstractBatchDataOutputStreamRecordMetrics maintain record metrics. Because all batch data stream classes provided by WebSphere Compute Grid inherit from one of these two classes, they all maintain record metrics.

These metrics are reported in messages logged at the end of the job step (Listing 5). Notice that each batch data stream has its own set of metrics.

Listing 5. Record metric messages logged at the end of the job step
CWLRB5844I: [06/11/11 11:37:47:906 EDT] Job Step Batch Data Stream 
[SkipRetrySample:00035,SkipRetryStep1,outputEmp]: Metric = skip  Value = 0
CWLRB5844I: [06/11/11 11:37:47:906 EDT] Job Step Batch Data Stream 
[SkipRetrySample:00035,SkipRetryStep1,outputEmp]: Metric = rps  Value = 82,074
CWLRB5844I: [06/11/11 11:37:47:906 EDT] Job Step Batch Data Stream 
[SkipRetrySample:00035,SkipRetryStep1,inputEmp]: Metric = skip  Value = 2
CWLRB5844I: [06/11/11 11:37:47:906 EDT] Job Step Batch Data Stream 
[SkipRetrySample:00035,SkipRetryStep1,inputEmp]: Metric = rps  Value = 78,265

The batch data stream record metrics are also available to the batch application at run time. They can be retrieved from the JobStepContext via JobStepContext.getRecordMetrics. The SkipRetrySample illustrates how to retrieve the metrics in the EmpProcessor.checkMetrics method (Listing 6).

Listing 6. Retrieving RecordMetrics in the batch application
private void checkMetrics()
{
  ...
  RecordMetrics inputEmpRM = JobStepContextMgr.getContext().getRecordMetrics("inputEmp");
  long skips = inputEmpRM.getMetric(RecordMetrics.MetricName.skip);
  long rps = inputEmpRM.getMetric(RecordMetrics.MetricName.rps);

  logger.fine("RecordMetric data for inputEmp BDS.  Number of skipped records = " + skips
              + ". Records/second = " + rps);
}

Record metrics are persisted in the LOCALJOBSTATUS table of the LRSCHED database.


Conclusion

By adding a skip-record policy configuration to your batch application, you can enhance its efficiency by enabling the job to process large data sets to completion – even if the data contains a few bad records that would otherwise interrupt the job, leaving the remaining records unprocessed until the errant record is corrected. A skip-record policy also improves the resiliency of the batch application by responding dynamically to record failures.


Download

Description    Name              Size
Code sample    skipsample.zip    11 KB
