Batch Data Streams

Batch Data Streams (BDS) are Java™ objects that provide an abstraction for the data stream processed by a batch step.

To create a Batch Data Stream, you create a Java class that implements the interface com.ibm.websphere.batch.BatchDataStream. This abstraction creates a well-defined interface so that the batch job step can be easily changed to use a different resource type. The BDS also ensures that the resource can be recovered to the last checkpoint position after a failure.

A batch step can have one or more BDS objects associated with it. The batch containers make the BDS associated with the batch step available at run time.

One of the key features of a BDS is the ability to convey its current position in the stream to the batch containers, and the capability to move itself to a location in the Batch Data Stream. This feature allows the batch containers to record (in the batch containers database) how much data was processed by a batch step. This information is recorded on every checkpoint. Therefore, if the job is cancelled or fails in a recoverable manner, the batch containers can restart a batch job from a recorded position in the Batch Data Stream.
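The checkpoint mechanism described above can be sketched with a small, self-contained example. The class below is not the real com.ibm.websphere.batch.BatchDataStream interface and its method names are illustrative only; it shows the underlying idea, that a stream can externalize its current position as a string, which the container would persist at each checkpoint, and can reposition itself from that string after a restart.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Conceptual sketch of a checkpoint-aware data stream (NOT the actual
// WebSphere BatchDataStream interface; names here are illustrative).
public class CheckpointableLineStream implements AutoCloseable {
    private final RandomAccessFile file;

    public CheckpointableLineStream(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "r");
    }

    // Read the next record, or null at end of stream.
    public String readRecord() throws IOException {
        return file.readLine();
    }

    // Externalize the current position; the container would persist this
    // string in its checkpoint database on every checkpoint.
    public String externalizeCheckpoint() throws IOException {
        return Long.toString(file.getFilePointer());
    }

    // On restart, reposition the stream from the persisted checkpoint.
    public void internalizeCheckpoint(String checkpoint) throws IOException {
        file.seek(Long.parseLong(checkpoint));
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    // Demonstrates checkpoint-then-restart: reads one record, records the
    // position, then a fresh stream instance resumes from that position.
    public static String demo() {
        try {
            Path data = Files.createTempFile("records", ".txt");
            Files.write(data, "r1\nr2\nr3\n".getBytes(StandardCharsets.UTF_8));
            String first, resumed, checkpoint;
            try (CheckpointableLineStream bds = new CheckpointableLineStream(data)) {
                first = bds.readRecord();
                checkpoint = bds.externalizeCheckpoint(); // container commits this
            }
            // Simulate a restart: a new instance resumes at the checkpoint.
            try (CheckpointableLineStream bds = new CheckpointableLineStream(data)) {
                bds.internalizeCheckpoint(checkpoint);
                resumed = bds.readRecord();
            }
            Files.delete(data);
            return first + "," + resumed;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // r1,r2 -- the restart skips r1
    }
}
```

Because the position survives the stream instance, a job that is cancelled or fails can resume processing at r2 rather than reprocessing the stream from the beginning.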

For more information, see Batch data streams in the WebSphere Application Server product documentation.

Design notes for applications with BDS

This list contains some of the key application design considerations for building a batch application with a BDS.
Using a BDS to read input to a batch job
Processing input data often involves many records, so a BDS for input, which enables checkpoint/restart processing, is generally the best option. The use of a BDS is beneficial when:
  • The batch job is processing many records and must use checkpoint/restart.
  • You want to maintain a level of indirection to an input source.
  • You need the application to open a data resource for input based on a logical name and define the implementation to use in the job definition (xJCL).
  • You want to reuse existing BDS implementations.
  • You want the I/O dependencies of your batch job step to be self-documenting through the Batch Data Stream construct in xJCL.
Note: The input data to a batch job should remain unchanged during the lifetime of the job. If the data changes, restart processing could be affected. For example, if the Batch Data Stream positions itself in the input data by an offset into the file and the file data is changed, the position of the Batch Data Stream after a restart could be incorrect.
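For illustration, a batch data stream declaration in the job definition (xJCL) might look roughly like the following. The element names follow the general shape of xJCL batch data stream declarations, but the class name and property shown here are hypothetical; check the product documentation for the exact schema.

```xml
<!-- Illustrative sketch only: the logical name decouples the application
     from the concrete implementation class, which the container resolves
     and configures at run time. -->
<batch-data-streams>
  <bds>
    <logical-name>inputStream</logical-name>
    <impl-class>com.example.batch.CustomerFileInputStream</impl-class>
    <props>
      <prop name="FILENAME" value="/batch/input/customers.txt"/>
    </props>
  </bds>
</batch-data-streams>
```

Declaring streams this way makes the I/O dependencies of the step visible in the xJCL itself, and lets a different implementation class be substituted without changing application code.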
Using a BDS for output from a batch job

Batch Data Streams can be used for output resources for many of the same reasons as for input. However, in some cases, it is more appropriate not to use Batch Data Streams for output.

For example, you might want your application to call an existing CICS® program that updates a transactional resource, and to use that CICS program without change.

When you do not use Batch Data Streams, no checkpoint data is stored to recover to a particular position in the resource. That might not be an issue for some resource types. For example, updates to records in a VSAM key-sequenced data set (KSDS), where records are updated by primary key, might not need stored checkpoint information. If the batch program fails, the updates to the KSDS records can safely be made again, provided that the position in the input data is recovered.
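The reasoning about keyed updates can be made concrete with a small sketch. A HashMap stands in for the KSDS here (this is an analogy, not VSAM access code): because each update targets a primary key, replaying records that were already processed before a failure simply overwrites them with the same values, so no output-side checkpoint is needed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: keyed updates are idempotent, so re-running them after a
// failure is harmless. A HashMap stands in for a VSAM KSDS.
public class KeyedUpdateDemo {
    // Update by primary key: applying the same update twice gives the
    // same final state, so it is safe to repeat after a restart.
    public static void applyUpdate(Map<String, String> ksds, String key, String value) {
        ksds.put(key, value);
    }

    public static void main(String[] args) {
        Map<String, String> ksds = new HashMap<>();
        List<String[]> input = List.of(
            new String[] {"K1", "v1"},
            new String[] {"K2", "v2"});

        // First run fails after record 0; no output checkpoint was taken.
        applyUpdate(ksds, input.get(0)[0], input.get(0)[1]);

        // Restart replays from the recovered *input* position. Even if
        // record 0 is replayed, the final state is unchanged.
        for (String[] rec : input) {
            applyUpdate(ksds, rec[0], rec[1]);
        }
        System.out.println(ksds.get("K1") + " " + ksds.get("K2")); // v1 v2
    }
}
```

The same argument does not hold for appends or relative updates, which is why position-sensitive output generally does benefit from a BDS.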

Transactional and non-transactional output
Batch Data Streams in CICS can be classified into one of three categories:
  1. Streams that map to CICS-managed transactional resources. These streams access CICS resources through CICS APIs. If records are updated by one of these Batch Data Streams, the updates are transactional and are committed when the batch container checkpoints the application. If the batch job fails and a transaction is backed out, the resources remain in a consistent state.
  2. Streams that directly access resources not managed by CICS, such as partitioned data set members and files on zFS. While these Batch Data Streams can be used in a CICS batch application, updates to records in these resources are not part of a CICS unit of work. Therefore, if the batch application encounters an error and is backed out to its last successful checkpoint, updates to the non-transactional resources are not backed out and the data is left in an inconsistent state. If the batch application is restarted from its last successful checkpoint, the Batch Data Stream might be able to correct the state of the data. For example, when appending new records to the end of a zFS file, the Batch Data Stream could truncate the file to the offset of the last successful checkpoint and then continue with further processing.
  3. User-written streams. If the existing Batch Data Stream implementations do not meet your needs, you can write your own implementations.
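The truncate-to-checkpoint recovery described for non-transactional files in category 2 can be sketched as follows. This is a self-contained illustration using standard Java file I/O, not WebSphere API code: on restart, the output file is cut back to the byte offset recorded at the last successful checkpoint, discarding any partially written records, before appending resumes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of recovery for a non-transactional append-only output file:
// truncate back to the last checkpointed offset, then continue writing.
public class TruncateOnRestart {
    // Restore the file to its checkpointed length before resuming appends.
    public static void repositionToCheckpoint(Path file, long checkpointOffset)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(checkpointOffset);
        }
    }

    // Simulates: checkpointed write, failed partial write, restart, rewrite.
    public static String demo() {
        try {
            Path out = Files.createTempFile("output", ".txt");

            // Record written and covered by the last successful checkpoint.
            Files.write(out, "rec1\n".getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);
            long checkpointOffset = Files.size(out);

            // Data written after the checkpoint, when a failure occurred.
            Files.write(out, "rec2 (partial".getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);

            // Restart: discard everything past the checkpoint, then continue.
            repositionToCheckpoint(out, checkpointOffset);
            Files.write(out, "rec2\n".getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);

            String content = new String(Files.readAllBytes(out),
                    StandardCharsets.UTF_8);
            Files.delete(out);
            return content;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo()); // rec1 and rec2, with the partial write gone
    }
}
```

This works only because the checkpointed offset was persisted transactionally by the container; the file itself carries no record of how far the last committed unit of work reached.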