Solving Business Problems with WebSphere Extended Deployment

Using WebSphere Extended Deployment's Long Running Execution Model to asynchronously access enterprise applications deployed in WebSphere

This article provides an overview of the batch processing capabilities of WebSphere Extended Deployment's Business Grid technology and provides several examples of real business solutions. The Business Grid is a long-running execution environment that allows your business applications to be accessed in a batch-like, asynchronous manner. This article assumes that you have a basic understanding of J2EE™ technology and the role of application servers in enterprise application integration.

Snehal Antani, Consultant, IBM

Snehal is a consultant for IBM Software Services (ISSW), supporting large scale enterprise customers with WebSphere Application Server and WebSphere Extended Deployment projects for distributed and z/OS systems. Snehal is based out of the IBM Poughkeepsie lab and his prior experience includes development for Web services, Application Versioning, Security, EJB Container, and Performance for WebSphere Application Server for z/OS and WebSphere Extended Deployment.



Mohammad Fakhar, Software Engineer, IBM

Mohammad Fakhar is a WebSphere Consultant with IBM Software Services for WebSphere in Dubai. He was part of the WebSphere XD product development team and provides consulting services for the WebSphere Virtual Enterprise product.



28 June 2006

Introduction

J2EE architecture emphasizes centralizing core business logic into reusable applications. These applications become easier to maintain when centralized and deployed to middleware such as WebSphere. Modern middleware addresses many significant problems that plagued traditional IT infrastructures: ensuring that the application is highly available, scalable, and secure, and that data integrity is robust. One key feature that J2EE application servers do not directly address, however, is the ability to access business logic asynchronously in a batch processing environment.

Asynchronous batch processing is an integral part of any complete IT solution and has existed on the mainframe for decades. Banking solutions use batch extensively, for example, to assess interest on accounts or compute credit ratings. As J2EE and open standards become more pervasive in the marketplace, business applications will migrate from native technology to these open standards and run on middleware such as WebSphere. In this evolution, businesses will lose the ability to efficiently access those business applications asynchronously. WebSphere Extended Deployment (hereafter called Extended Deployment) delivers Business Grid technology to help address this problem. Business Grid is the overall execution environment for two types of workloads: Compute Intensive and Batch. While Business Grid encompasses both workload types, this article focuses on batch processing and how it is managed by the Business Grid.

Batch jobs consist of units-of-work that must be executed on one or more records. These records are stored in some type of file system or database and are fed to this unit-of-work in a loop. Distinct tasks of a batch application can be divided into batch steps. The entire batch job is complete when each batch step has been processed. The Business Grid is a long-running execution environment that allows you to access business applications in a batch-like, asynchronous manner. It defines a J2EE-based programming model to which you can write batch applications. We discuss the details of the programming model later, but in general you must implement a processJobStep() method in your Java implementation class. This processJobStep() method corresponds to the unit-of-work previously described. The Extended Deployment long-running execution environment calls this method repeatedly in a loop, as shown in Figure 1.
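
The batch loop can be sketched in plain Java. This is a minimal illustration, not the Extended Deployment API: the BatchStep interface, its return codes, and the BatchLoop class are hypothetical names, and only the method name processJobStep() comes from the programming model described here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a batch step; only the name processJobStep() comes from the article.
interface BatchStep {
    int CONTINUE = 0;  // assumed return code: more records remain
    int COMPLETE = 1;  // assumed return code: the step has finished

    int processJobStep();
}

// A step whose unit-of-work upper-cases one input record per call.
class UpperCaseStep implements BatchStep {
    private final Iterator<String> input;
    final List<String> output = new ArrayList<>();

    UpperCaseStep(List<String> records) {
        this.input = records.iterator();
    }

    @Override
    public int processJobStep() {
        if (!input.hasNext()) {
            return COMPLETE;
        }
        output.add(input.next().toUpperCase(Locale.ROOT));
        return CONTINUE;
    }
}

// Stand-in for the execution environment's batch loop.
class BatchLoop {
    // Calls processJobStep() until the step reports completion; returns the number of calls made.
    static int run(BatchStep step) {
        int calls = 1;
        while (step.processJobStep() == BatchStep.CONTINUE) {
            calls++;
        }
        return calls;
    }
}
```

The application supplies only the unit-of-work; the loop itself belongs to the execution environment, which is what lets the runtime add checkpointing and restart around it.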

For more detailed information on batch jobs, see the WebSphere Extended Deployment Information Center.

Figure 1. The Execution environment invokes the processJobStep() defined on the batch step in a batch loop

Overview of WebSphere Extended Deployment's Business Grid

Two runtime components make up the Business Grid: one Long Running Scheduler (LRS) and one or more Long-Running Execution Environments (LREE), as shown in Figures 2a and 2b. The LRS is primarily a dispatcher:

  • A Web service or IIOP request submits a batch job to the LRS, using metadata described in an XML dialect known as xJCL
  • The LRS uses placement and workload algorithms to determine which LREE should execute this job
  • The LRS dispatches the job to that LREE
  • The LREE executes the job
  • The LRS is notified of state changes to the job itself
  • State changes are persisted in database tables specific to the LRS
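
The dispatch sequence above can be modeled in a few lines of Java. This is a sketch under stated assumptions: the class names are hypothetical, an in-memory map stands in for the LRS database tables, and the placement rule (fewest active jobs) is an assumption made for illustration, since the actual placement and workload algorithms are not described here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an execution environment (LREE) that tracks its active jobs.
class ExecutionEnvironment {
    final String name;
    int activeJobs;

    ExecutionEnvironment(String name, int activeJobs) {
        this.name = name;
        this.activeJobs = activeJobs;
    }
}

// Hypothetical model of the scheduler (LRS) acting as a dispatcher.
class Scheduler {
    private final List<ExecutionEnvironment> environments;
    // Stands in for the LRS database tables that persist job state changes.
    final Map<String, List<String>> jobStates = new LinkedHashMap<>();

    Scheduler(List<ExecutionEnvironment> environments) {
        this.environments = environments;
    }

    // Assumed placement rule: choose the environment with the fewest active jobs.
    ExecutionEnvironment place() {
        ExecutionEnvironment best = environments.get(0);
        for (ExecutionEnvironment candidate : environments) {
            if (candidate.activeJobs < best.activeJobs) {
                best = candidate;
            }
        }
        return best;
    }

    // Records the submission, dispatches the job, and records the resulting state change.
    String submit(String jobId) {
        recordState(jobId, "submitted");
        ExecutionEnvironment target = place();
        target.activeJobs++;
        recordState(jobId, "executing on " + target.name);
        return target.name;
    }

    // Called when the execution environment reports that the job has ended.
    void jobEnded(String jobId, ExecutionEnvironment environment) {
        environment.activeJobs--;
        recordState(jobId, "ended");
    }

    private void recordState(String jobId, String state) {
        jobStates.computeIfAbsent(jobId, k -> new ArrayList<>()).add(state);
    }
}
```
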
Figure 2a. Overview of WebSphere Extended Deployment Business Grid's LRS and LREE components
Figure 2b. End-to-end view of how batch jobs are dispatched to the Business Grid

A typical batch application must address the following questions:

  • What specific steps must be executed to complete this batch job?
  • What is the source of the input data?
  • What is the destination of the output data?
  • How frequently should the state of the overall job be saved (checkpointed) and therefore be restorable?
  • How should the return codes for each job step be handled?
  • Is there a conditional flow between the steps in the job?

The Business Grid defines a programming model that specifies how to form the answers to each of the questions. It manages several key pieces of a batch job:

  • Positioning and repositioning data streams
  • Checkpointing the job at some predefined interval
  • Processing the return codes from steps and/or the overall job
  • Passing each record to the unit-of-work defined in the application
  • Prioritizing the dispatching of jobs based on the service policy applied to them

The Business Grid introduces the concept of a Batch Data Stream (BDS). A BDS is a positionable data source. Some typical BDS types include:

  • Service Data Objects (SDO)
  • MVS datasets (using jZOS, see http://jzos.com/)
  • Relational databases
  • Files
  • Any other positional resources

The batch step processes an input BDS containing the records or pieces of records. The batch step writes the results of that processing to an output BDS. The Business Grid manages the checkpointing of these BDSs so that they can be repositioned at the last checkpoint if the batch job is restarted (after a failure, for example).
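
To make "positionable" concrete, here is a minimal Java sketch of a stream that can externalize its position at a checkpoint and be repositioned on restart. The PositionableStream interface and its method names are hypothetical and chosen for illustration; they are not the product's BDS interface, and an in-memory list stands in for a file or dataset.

```java
import java.util.List;

// Hypothetical positionable stream; the method names are illustrative, not the product API.
interface PositionableStream {
    String checkpointPosition();         // externalize the current position
    void restart(String savedPosition);  // reposition to a previously saved checkpoint
}

// An input stream over an in-memory list of records, positioned by record index.
class ListInputStream implements PositionableStream {
    private final List<String> records;
    private int next = 0;

    ListInputStream(List<String> records) {
        this.records = records;
    }

    boolean hasNext() {
        return next < records.size();
    }

    String read() {
        return records.get(next++);
    }

    @Override
    public String checkpointPosition() {
        return Integer.toString(next);
    }

    @Override
    public void restart(String savedPosition) {
        next = Integer.parseInt(savedPosition);
    }
}
```

If the runtime stores the string returned by checkpointPosition() at each checkpoint, a restarted job can create a fresh stream, call restart() with the stored value, and resume at the first record that was not yet checkpointed.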

Listing 1 shows a BDS declaration within the xJCL.

Listing 1. A BDS declaration within the xJCL
<batch-data-streams>
  <bds>
      <logical-name>myoutput</logical-name>
      <impl-class>com.ibm.websphere.samples.PostingOutputStream</impl-class>
      <props>
          <prop name="FILENAME" value="somefile" />
      </props>
  </bds>
</batch-data-streams>

The Business Grid supports checkpointing of batch jobs, that is, saving the state of the job at some selected interval. This lets you restart the job at its previous checkpoint on any other LREE if necessary. Checkpointing is important in batch processing because it allows you to recover more efficiently from failures, instead of starting the entire job over from the beginning. Business Grid has two predefined checkpoint algorithms: Timer-Based and Record-Based.

  • Timer-Based algorithm: checkpoints occur after some specific amount of time has passed (e.g., every fifteen seconds).
  • Record-Based algorithm: checkpoints occur after some specified number of records has been processed (e.g., every one hundred records).
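
The two policies can be sketched as follows. This is an illustration only: the CheckpointPolicy interface and both class names are hypothetical, and the current time is passed in as a parameter (rather than read from the system clock) so that the behavior is easy to follow and test.

```java
// Hypothetical policy interface, consulted once after each processed record.
interface CheckpointPolicy {
    // Returns true when a checkpoint is due and resets the policy's counter or timer.
    boolean afterRecord(long nowMillis);
}

// Checkpoints after a fixed number of records, for example every one hundred records.
class RecordBasedPolicy implements CheckpointPolicy {
    private final int recordsPerCheckpoint;
    private int recordsSinceCheckpoint = 0;

    RecordBasedPolicy(int recordsPerCheckpoint) {
        this.recordsPerCheckpoint = recordsPerCheckpoint;
    }

    @Override
    public boolean afterRecord(long nowMillis) {
        recordsSinceCheckpoint++;
        if (recordsSinceCheckpoint >= recordsPerCheckpoint) {
            recordsSinceCheckpoint = 0;
            return true;
        }
        return false;
    }
}

// Checkpoints once a fixed amount of time has passed, for example every fifteen seconds.
class TimeBasedPolicy implements CheckpointPolicy {
    private final long intervalMillis;
    private long lastCheckpointMillis;

    TimeBasedPolicy(long intervalMillis, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.lastCheckpointMillis = startMillis;
    }

    @Override
    public boolean afterRecord(long nowMillis) {
        if (nowMillis - lastCheckpointMillis >= intervalMillis) {
            lastCheckpointMillis = nowMillis;
            return true;
        }
        return false;
    }
}
```

A record-based policy bounds the amount of work that can be lost, while a time-based policy bounds how long locks and other resources are held between checkpoints.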

Checkpoint algorithms directly affect the transactional behavior of a batch application. The Business Grid initializes a global transaction at the start of a checkpoint and commits that global transaction once the checkpoint has completed successfully. This allows the runtime to roll back any work that took place within the checkpoint when a failure occurs. Resource considerations, such as database locks, open file descriptors, and memory, must play an integral role in deciding how frequently to checkpoint. Listing 2 shows a checkpoint algorithm within the xJCL.

Listing 2. Checkpoint algorithm declaration within the xJCL
<checkpoint-algorithm name="timebased">
	<classname>com.ibm.wsspi.batch.checkpointalgorithms.timebased</classname>
	<props>
		<prop name="interval" value="15" />
	</props>
</checkpoint-algorithm>

The Business Grid also provides callbacks for results algorithms. Results algorithms are points within the application that can process job step and overall job return codes. These results algorithms examine the return codes and can use any J2EE service provided by the middleware to process them. For example, they can place a message on a JMS queue or invoke a web service to indicate failure or success. Listing 3 shows a results algorithm declaration within the xJCL.

Listing 3. Results algorithm declaration within the xJCL
<results-algorithms>
	<results-algorithm name="jobsum">
		<classname>com.ibm.wsspi.batch.resultsalgorithms.jobsum</classname>
	</results-algorithm>
</results-algorithms>
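
As a rough illustration of what a results algorithm does, the Java sketch below collects step return codes and derives an overall job return code. Everything in it is an assumption made for illustration: the ResultsAlgorithm interface is hypothetical, the rule "the highest step return code wins" is not a description of the jobsum class, and a plain list stands in for the JMS queue or web service that would receive failure notifications.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface; not the product's results algorithm SPI.
interface ResultsAlgorithm {
    void stepEnded(String stepName, int returnCode);
    int jobReturnCode();
}

// Assumed rule for illustration: the job return code is the highest step return code.
class HighestReturnCode implements ResultsAlgorithm {
    private int highest = 0;
    // Stands in for a JMS queue or web service call that reports failures.
    final List<String> notifications = new ArrayList<>();

    @Override
    public void stepEnded(String stepName, int returnCode) {
        highest = Math.max(highest, returnCode);
        if (returnCode != 0) {
            notifications.add(stepName + " ended with return code " + returnCode);
        }
    }

    @Override
    public int jobReturnCode() {
        return highest;
    }
}
```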

Each component of a batch application is defined by metadata in the xJCL XML file. The Extended Deployment Business Grid Programming Guide describes each element of the xJCL and provides samples of how they should be defined. The metadata generally follows the structure shown in Listing 4.

Listing 4. General description of xJCL
<?xml version="1.0" encoding="UTF-8"?>
<job name="PostingsSampleEar" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <jndi-name>ejb/com/ibm/websphere/samples/PostingsJob</jndi-name>
   <step-scheduling-criteria>
      .......................
   </step-scheduling-criteria>
   <checkpoint-algorithm name="timebased">
      ...............
   </checkpoint-algorithm>
   <results-algorithms>
      ...............
   </results-algorithms>
    <job-step name="Step1">
      <jndi-name>ejb/DataCreationBean</jndi-name>
      <checkpoint-algorithm-ref name="timebased" />
      <results-ref name="jobsum" />
      <batch-data-streams>
        <bds>
          <logical-name>myoutput</logical-name>
          <impl-class>com.ibm.websphere.samples.PostingOutputStream</impl-class>
          <props>
            <prop name="FILENAME" value="c:\temp\batchjoboutput\bjo.zout1.out" />
          </props>
        </bds>
      </batch-data-streams>
    </job-step>
</job>

Example LREE interaction flow with a batch step

Figure 3 shows a sequence diagram for how the LREE drives a batch step. If there are multiple steps in the xJCL, the LREE drives this sequence for each step sequentially. Important: The LREE calls all methods in the diagram below under a global transaction context managed by the LREE.

Figure 3. Sequence diagram for how the LREE drives a batch step


Solving problems with Business Grid

In this section, we solve two sample problems using Business Grid technology.

Providing batch access to existing Enterprise Applications

Problem: Several types of applications (retirement modeling, student loan forecasting, etc.) access a financial calculator. This financial calculator application is their "kernel" application, which must be available to numerous other banking applications, including asynchronous batch-type applications that execute tasks such as calculating interest and credit scores.

The data that the financial calculator must process asynchronously resides in EBCDIC in an IMS database on the mainframe. This data must remain on the mainframe to take advantage of the robust security, high-availability, and scalability features of z/OS and WebSphere on z/OS. How do we asynchronously process this native data while reusing our financial calculator? Figure 4 shows a visual representation of the problem.

Figure 4. Problem: A financial calculator module needs to be re-used by multiple applications in an asynchronous manner

Solution: Invoke financial calculator business logic from a batch step and use Batch Data Streams to read native data sets with jZOS APIs. This is shown in Figure 5.

Figure 5. Solution: Use an Extended Deployment batch application to re-use core business logic

The steps the batch application in Figure 5 takes are:

  1. Create two jZOS Batch Data Streams: one for input, one for output.
  2. Define a batch step bean with a processJobStep() method. This bean contains logic to retrieve a record from the input jZOS BDS and to invoke the financial calculator under the covers with the retrieved record.
  3. Repeatedly invoke the processJobStep() of the batch step bean in a batch loop.
  4. Upon each iteration of processJobStep(), the batch step bean:
    1. Creates a Java Object that represents the data retrieved from the input jZOS BDS and passes that object to the financial calculator for processing.
    2. Takes the output of the processing and persists the data to the output MVS dataset via the output jZOS BDS.
  5. Continue iterating until each input record has been processed and its output persisted.
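
The steps above can be sketched in Java as follows. This is a simplified illustration rather than the actual application: in-memory lists stand in for the input and output jZOS Batch Data Streams, the record layout (id,balance,ratePercent) is assumed, and the FinancialCalculator formula (one year of simple interest) is a placeholder for the real kernel logic.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for the shared "kernel" business logic; the formula is illustrative only.
class FinancialCalculator {
    // Returns one year of simple interest, rounded to cents.
    static BigDecimal yearlyInterest(BigDecimal balance, BigDecimal ratePercent) {
        return balance.multiply(ratePercent)
                .divide(new BigDecimal("100"), 2, RoundingMode.HALF_EVEN);
    }
}

// Java object built from one input record (assumed layout: id,balance,ratePercent).
class Account {
    final String id;
    final BigDecimal balance;
    final BigDecimal ratePercent;

    Account(String record) {
        String[] fields = record.split(",");
        this.id = fields[0];
        this.balance = new BigDecimal(fields[1]);
        this.ratePercent = new BigDecimal(fields[2]);
    }
}

// Batch step: read a record, call the calculator, write the result.
class InterestStep {
    private final Iterator<String> input;           // stands in for the input jZOS BDS
    final List<String> output = new ArrayList<>();  // stands in for the output jZOS BDS

    InterestStep(List<String> inputRecords) {
        this.input = inputRecords.iterator();
    }

    // Returns true while more records remain, false once the step is complete.
    boolean processJobStep() {
        if (!input.hasNext()) {
            return false;
        }
        Account account = new Account(input.next());
        BigDecimal interest =
                FinancialCalculator.yearlyInterest(account.balance, account.ratePercent);
        output.add(account.id + "," + interest.toPlainString());
        return true;
    }
}
```

Note that the calculator is called with a Java object and knows nothing about where the record came from; the data-format concerns stay in the stream and the record-to-object mapping.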

Although you could solve this problem without using the Business Grid, the solution is not as efficient or as elegant. A WebSphere MQ solution is shown in Figure 6.

Figure 6. A WebSphere MQ solution for the financial calculator problem

If you opted to use a WebSphere MQ solution, it would take the following steps:

  1. From a native COBOL application:
    1. Dispatch the IMS data to WebSphere using message-driven beans and WebSphere MQ messaging
    2. Convert each record to ASCII XML and place it on the message queue
  2. When the record is on the message queue, the WebSphere framework would:
    1. Notify a message-driven bean within WebSphere
    2. Retrieve that message from the queue with the bean
    3. Pass that message on to the business logic
  3. The business logic would then:
    1. Convert the XML into a Java Object
    2. Pass the Java Object to a kernel bean and perform the financial calculations on it
    3. Convert the Java Object back to XML
    4. Push the XML back onto the message queue
  4. WebSphere would then notify the native application and persist the message back into the IMS database

Why is the Business Grid solution better? It provides:

  • Flexibility to adjust the checkpoint algorithm without modifying any application code. This in turn affects the transaction scope and resource locking schemes.
  • Flexibility to change data sources by defining new Batch Data Streams, which isolates application changes to specific sections of the code, namely the BDS definitions.
  • No need to massage the data format through intermediaries. Business Grid can parse the raw data through the BDS definition and convert it directly to a Java object, as opposed to converting the data to XML, to ASCII, and so on.
  • Ability to recover from the last checkpointed position after system failures, for example, a temporary network or database failure.
  • Ability to temporarily suspend or cancel jobs in case of resource contention. For example, if an unexpected peak in online workload occurs, Business Grid can suspend batch work and use the server for online transaction processing.
  • Ability to assign service policies to jobs, for example, to give higher priority to batch workloads for platinum customers.
  • Simplified system administration of the batch environment, since Business Grid integrates with the WebSphere Admin Console to provide a view of the batch jobs in the system.
  • Ability to quickly create a conditional flow of job steps using xJCL.
  • Ability to use the health management features of WebSphere Extended Deployment to proactively manage failures, such as memory leaks, by enabling automatic restart of jobs after recovering from a server failure.

Batch processing of JMS-based purchase orders

Problem: Imagine that you are designing a batch application to process purchase orders. The batch application receives purchase orders over a WebSphere MQ queue. The application processes the purchase orders in a batch window at night, when online application activity is not at its peak. To get through a day's batch of purchase orders, the application must be extremely optimized and hence must be written by highly skilled programmers who are familiar with all the eccentricities of the underlying technologies and platforms on which the application runs. In worst-case scenarios, some purchase orders miss the batch window and customers have to wait another day for their orders to be processed, which hurts customer satisfaction. Furthermore, analysis of the servers indicates that most machines are not fully utilized during the day, while purchase orders await their turn on the MQ queue.

Solution: Make the purchase order application a WebSphere Extended Deployment batch application and let the Business Grid run the batch job whenever idle servers are available in the WebSphere cell. Figure 7 outlines this solution.

Figure 7. Batch processing of JMS-based purchase orders

The flow in Figure 7 runs from left to right: it starts with the submission of xJCL to the LRS and ends with the LREE driving the Purchase Order batch application to process the records on the MQ queue.

Long Running Scheduler Flow

The steps in the scheduler flow are:

  1. xJCL representing job metadata is stored in the LRS database. LRS has an xJCL Repository feature that allows xJCL to be stored and invoked from the LRS database.
  2. In this scenario, you have two options for triggering a submit request to LRS for the purchase order job:
    1. Use the LRS recurring jobs feature to submit the purchase order job at certain intervals: for example, submit purchase order job every three hours or at 3 pm every day.
    2. Write a custom trigger that monitors the queue for purchase orders. The trigger uses the LRS Web services interface to submit the job when the queue reaches a certain threshold of pending purchase orders: for example, submit a job when at least 10 purchase orders are pending.
  3. Upon receiving the job request, the LRS finds an idle application server in the WebSphere Extended Deployment cell to which to dispatch the request and asynchronously invokes the LREE on that application server to start the batch job.
  4. The LREE invokes the batch application flow and sends status to the LRS about the progress of the job.
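
The custom trigger option can be sketched in Java as follows. The sketch is hypothetical throughout: JobSubmitter is a placeholder for a call to the LRS Web services interface (whose actual operations are not shown in this article), the queue depth is passed in by the caller instead of being read from WebSphere MQ, and the rule that only one job is outstanding at a time is a design assumption.

```java
// Hypothetical hook for submitting a job; a real trigger would call the LRS Web services interface.
interface JobSubmitter {
    void submit(String jobName);
}

// Submits the purchase order job when the number of pending orders reaches a threshold.
class QueueDepthTrigger {
    private final int threshold;
    private final String jobName;
    private final JobSubmitter submitter;
    private boolean jobPending = false;

    QueueDepthTrigger(int threshold, String jobName, JobSubmitter submitter) {
        this.threshold = threshold;
        this.jobName = jobName;
        this.submitter = submitter;
    }

    // Called periodically with the current number of pending purchase orders.
    // Returns true if this call submitted a job.
    boolean poll(int queueDepth) {
        if (jobPending || queueDepth < threshold) {
            return false;
        }
        submitter.submit(jobName);
        jobPending = true;
        return true;
    }

    // Called when the submitted job ends, so that the trigger can fire again.
    void jobEnded() {
        jobPending = false;
    }
}
```

Tracking the outstanding job prevents the trigger from submitting a new job on every poll while the queue is still above the threshold.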

Batch application flow

The steps in the batch application flow are:

  1. The LREE calls the processJobStep method of the batch step.
  2. The batch step uses a JMS-based input Batch Data Stream (BDS) to read records from the purchase order queue. Since processJobStep is called under a global transaction, BDS access to the queue is transactional.
  3. The batch step invokes business logic to process the purchase order(s).
  4. The batch step uses an output JMS BDS to write a confirmation or completed purchase order object to WebSphere MQ. Since processJobStep is called under a global transaction, the BDS access to the queue is transactional.
  5. The processJobStep method returns and the LREE consults the checkpoint policy to determine whether it's time for a checkpoint. If a checkpoint is taken, the current transaction is committed; the JMS provider then permanently removes the processed input messages from the input queue and commits the output messages on the output queue. A new transaction is then started.
  6. processJobStep is called again to process the next record(s).
  7. This cycle continues until all records on the queue are processed. If a failure occurs or if the job is cancelled during processing, it can be restarted later from the last checkpointed position.
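
The interplay between checkpoints and queue transactions can be simulated in plain Java. This is an in-memory sketch, not JMS code: TransactionalQueues imitates the commit and rollback behavior of transactional input and output queues, the record-based checkpoint interval is an assumed policy, and the failAfter parameter exists only to simulate a failure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// In-memory stand-in for transactional JMS input and output queues.
class TransactionalQueues {
    final Deque<String> input = new ArrayDeque<>();
    final List<String> output = new ArrayList<>();
    private final List<String> receivedInTx = new ArrayList<>();
    private final List<String> sentInTx = new ArrayList<>();

    String receive() {
        String message = input.pollFirst();
        if (message != null) {
            receivedInTx.add(message);
        }
        return message;
    }

    void send(String message) {
        sentInTx.add(message);
    }

    // Checkpoint: consumed messages are gone for good and staged output becomes visible.
    void commit() {
        output.addAll(sentInTx);
        receivedInTx.clear();
        sentInTx.clear();
    }

    // Failure: consumed messages return to the input queue in order; staged output is discarded.
    void rollback() {
        for (int i = receivedInTx.size() - 1; i >= 0; i--) {
            input.addFirst(receivedInTx.get(i));
        }
        receivedInTx.clear();
        sentInTx.clear();
    }
}

class PurchaseOrderJob {
    // Processes orders, committing every recordsPerCheckpoint records.
    // failAfter simulates a failure after that many records (negative means no failure).
    // Returns true if the input queue was drained.
    static boolean run(TransactionalQueues queues, int recordsPerCheckpoint, int failAfter) {
        int processed = 0;
        int sinceCheckpoint = 0;
        String order;
        while ((order = queues.receive()) != null) {
            queues.send("confirmed:" + order);
            processed++;
            sinceCheckpoint++;
            if (processed == failAfter) {
                queues.rollback();
                return false;
            }
            if (sinceCheckpoint == recordsPerCheckpoint) {
                queues.commit();
                sinceCheckpoint = 0;
            }
        }
        queues.commit(); // final checkpoint at the end of the step
        return true;
    }
}
```

In this simulation, a run that fails partway through a checkpoint interval leaves the uncommitted orders on the input queue, so running the job again picks up exactly where the last checkpoint left off without duplicating confirmations.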

Advantages of using WebSphere Extended Deployment

In addition to the advantages listed in the previous scenario, WebSphere Extended Deployment has the following advantages:

  1. Faster purchase order processing by utilizing the application placement capabilities of the long running scheduler.
  2. Full utilization of idle servers during the daytime, which means a better return on hardware investment.
  3. Recoverable message processing, since JMS is transaction aware and positional BDSs can be processed from saved checkpoints.
  4. Flexible job scheduling choices: you can use the interval-based scheduling features of LRS or write custom triggers to kick off jobs.
  5. Integrated management of an organization's online and batch applications from one product, which leads to cost savings derived from not having to maintain two separate workload infrastructures and two distinct highly specialized skill sets in the organization.

Conclusion

WebSphere Extended Deployment provides a robust infrastructure and a modern programming model for recoverable batch processing that lets you re-use existing services of an enterprise in a batch application.
