Executing batch programs in parallel with WebSphere Extended Deployment Compute Grid

Batch computing is an integral part of any enterprise IT infrastructure. As an execution framework, IBM® WebSphere® Extended Deployment Compute Grid is responsible for executing the batch workloads with some enterprise qualities of service, including transactional integrity, security, and restart capabilities, to name a few. This article describes a Proxy Service pattern that uses advanced partitioning techniques for the parallel execution of batch workloads across a grid of endpoints. This content is part of the IBM WebSphere Developer Technical Journal.

Snehal Antani (antani@us.ibm.com), ISSW, WebSphere Extended Deployment Technical Lead, IBM

Snehal Antani works for the IBM Software Services (ISSW) Advanced WebSphere group and is the technical lead for WebSphere Extended Deployment. His focus is primarily in the domain of infrastructure design for Service Oriented Architecture (SOA) with WebSphere branded products across all platforms (z/OS and distributed). He comes from a development background, where he helped develop and deliver WebSphere z/OS, WebSphere XD-Distributed, and WebSphere XD-z/OS. Snehal has disclosed several patents and technical publications in the domains of SOA, enterprise application infrastructure, and grid computing. He earned a BS in Computer Science from Purdue University and is completing his MS in Computer Science at Rensselaer Polytechnic Institute (RPI) in Troy, NY. Visit IBM Software Services for more information.



18 July 2007

Introduction

As a batch execution environment, IBM WebSphere Extended Deployment Compute Grid provides container-managed qualities of service to Java batch and grid applications. These qualities of service are provided by leveraging the underlying application server upon which the component is deployed. Batch computing is an integral part of any enterprise IT infrastructure. Batch applications are responsible for business services like calculating credit ratings, applying interest calculations, and so on. As an execution framework, WebSphere Extended Deployment Compute Grid (hereafter referred to as Compute Grid) is responsible for executing batch workloads with enterprise qualities of service, including transactional integrity, security, and restart capabilities, to name a few. So why is this technology called Compute Grid and not Java Batch? This article answers that question by describing the Proxy Service pattern for the parallel execution of batch workloads across a grid of endpoints.


About the pattern

Batch workloads are traditionally data intensive and must perform within defined goals; for example, they must complete within some time window, execute with transactional integrity, allow for restart upon failure, and so on. Applying the principles of grid computing, including the parallel execution of a job across a grid of endpoints, marks the next generation of these enterprise applications.

Suppose a batch application must process 100,000,000 records of data. If each record can be processed independently of the others, then the sequence in which they are processed does not matter. Typically, a batch job will process records sequentially, starting with the first record and cycling through the remainder until the work is complete. The execution time of this workload is a function of the number of records to process, and could be quite long. Enter parallelism. A job exhibiting these characteristics can be broken up into smaller chunks of work, and each chunk can be dispatched across multiple threads of execution within an application server and processed at the same time.
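The chunking described above can be sketched in a few lines of Java. The class and method names below are illustrative, not part of the Compute Grid API; the sketch assumes records are addressable by a contiguous numeric range.

```java
// Illustrative sketch: split a contiguous range of record keys into
// fixed-size partitions that can each be processed independently.
import java.util.ArrayList;
import java.util.List;

public class RangePartitioner {

    /** A half-open record range [start, end). */
    public static class Partition {
        public final long start;
        public final long end;
        public Partition(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Breaks totalRecords into partitions of at most chunkSize records. */
    public static List<Partition> partition(long totalRecords, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Partition> partitions = new ArrayList<Partition>();
        for (long start = 0; start < totalRecords; start += chunkSize) {
            partitions.add(new Partition(start, Math.min(start + chunkSize, totalRecords)));
        }
        return partitions;
    }
}
```

Each resulting range could then be handed to a separate thread of execution, since no partition depends on another.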

How does one leverage the principles of grid computing, the qualities of service provided by the underlying middleware runtime, and the container and execution management features of Compute Grid to parallel-process batch workloads?

The Proxy Service pattern for parallel execution of batch workloads, described in this article, illustrates a technique for partitioning a single batch job and executing the partitions in parallel across some grid of endpoints. Additionally, the pattern utilizes more advanced partitioning techniques, such as endpoint affinity and aggressive caching. The Proxy Service pattern can be used to solve the stated problem, given these assumptions:

  • The input data of the job can be broken down into discrete partitions.
  • Each partition can be executed independently of the others.
  • The output data of the job can be aggregated from discrete partitions of results.

The solution

Compute Grid is composed of two major components:

  • The Long Running Scheduler (LRS) dispatches and monitors the execution of a single job.
  • The Long Running Execution Environment (LREE) executes a single job while ensuring its required qualities of service (transactional integrity, restart capabilities, security, and so on) are achieved. LREEs are multithreaded server processes; a single batch job is dispatched to a single thread within the LREE, so a single LREE can execute many batch jobs concurrently.

The LRS exposes a set of APIs that provide the ability to programmatically submit a job, monitor the status of a job, and perform lifecycle operations on the job (stop, cancel, and restart). These APIs are an integral part of the Proxy Service pattern.
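To make the submit-monitor-lifecycle flow concrete, the sketch below shows how a caller might drive such APIs. The `JobScheduler` interface and its method names are hypothetical stand-ins for the actual Compute Grid scheduler interface, which is not reproduced here; the status strings are likewise assumed.

```java
// Sketch of driving the LRS lifecycle APIs: submit a job, then poll
// its status until it reaches a terminal state. The interface below
// is a hypothetical stand-in, not the real Compute Grid API.
import java.util.Map;

public class LrsClientSketch {

    /** Hypothetical stand-in for the LRS programmatic interface. */
    public interface JobScheduler {
        String submitJob(String jobName, Map<String, String> params);
        String getJobStatus(String jobId);   // e.g. "EXECUTING", "ENDED", "FAILED"
        void cancelJob(String jobId);
    }

    /** Submits a job and polls until it ends, fails, or is interrupted. */
    public static String runToCompletion(JobScheduler lrs, String jobName,
                                         Map<String, String> params,
                                         long pollMillis) {
        String jobId = lrs.submitJob(jobName, params);
        String status = lrs.getJobStatus(jobId);
        while (!"ENDED".equals(status) && !"FAILED".equals(status)) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                lrs.cancelJob(jobId); // best-effort cancel on interruption
                break;
            }
            status = lrs.getJobStatus(jobId);
        }
        return status;
    }
}
```

The proxy service described next repeats this submit-and-poll cycle once per partition.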

Figure 1. Proxy Service pattern with Compute Grid

The Proxy Service pattern is composed of two components:

  • Proxy service

    The principle component in the Proxy Service pattern is the proxy service itself, which serves as a front end to the Compute Grid runtime and exposes logical business functions. Figure 1 depicts this service where the method ExecuteEndOfMonthProcessing() is exposed.

    The purpose of the proxy service is to break a single logical request into several smaller partitions. Figure 2 describes the operations executed by the proxy service, which contains business logic that creates the partitions. Each partition is subsequently submitted to the LRS as a discrete job (and managed thereafter by the LRS as a discrete and independent job). The proxy service monitors each job that it has submitted and, upon completion, notifies the requestor that the task has been completed.

    Figure 2. Operations executed by the proxy service

    There are several Compute Grid features that are also involved in the Proxy Service pattern. One key feature is xJCL variable substitution, which is available as of Compute Grid Version 6.1. xJCL variable substitution enables xJCL templates to be defined and stored in a repository owned and managed by the LRS. Job submitters (in this case, the proxy service) no longer need to generate or directly manipulate xJCL; rather, the submitter needs to know only the name of the job to invoke and the parameters associated with that job. This greatly simplifies the proxy service implementation.

  • Partition Template

    This template is the xJCL template described as part of the variable substitution feature of Compute Grid. Figure 3 illustrates the end-to-end architecture of the Proxy Service pattern. The proxy service divides a single logical transaction into several discrete partitions. For each partition, the partition properties are submitted to the LRS, along with the name of the partition template. The LRS, using the variable substitution feature, creates fully qualified xJCL jobs (called work partitions) and dispatches each one to the grid of LREEs. Each work partition is executed independently within the grid, per the xJCL partition template.

    Figure 3. End-to-end architecture of the Proxy Service pattern

    The proxy service monitors the status of each job that has been submitted. Upon completion of all jobs, the service can notify the requestor of the completion (or failure) of the overall logical job. Figure 4 further describes the monitoring of jobs within this pattern.

    Figure 4. Monitoring of jobs within the Proxy Service pattern
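The end-to-end flow of Figures 2 through 4 can be sketched as follows. All interface, method, template, and property names here (`JobScheduler`, `EndOfMonthTemplate`, `firstRecord`, and so on) are hypothetical stand-ins chosen for illustration; they are not the actual Compute Grid APIs or xJCL variable names.

```java
// Illustrative end-to-end sketch of the proxy service: break one
// logical request into partitions, submit each as a discrete job
// against a named xJCL template, poll until every job finishes,
// then report the overall outcome to the requestor.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProxyServiceSketch {

    /** Hypothetical stand-in for the LRS submission/monitoring API. */
    public interface JobScheduler {
        String submitJob(String templateName, Map<String, String> substitutionProps);
        String getJobStatus(String jobId); // "EXECUTING", "ENDED", or "FAILED"
    }

    private final JobScheduler lrs;

    public ProxyServiceSketch(JobScheduler lrs) {
        this.lrs = lrs;
    }

    /** The logical business operation exposed by the proxy service. */
    public boolean executeEndOfMonthProcessing(long totalRecords, long chunkSize) {
        // 1. Partition: one set of substitution properties per chunk of records.
        List<String> jobIds = new ArrayList<String>();
        for (long start = 0; start < totalRecords; start += chunkSize) {
            Map<String, String> props = new HashMap<String, String>();
            props.put("firstRecord", Long.toString(start));
            props.put("lastRecord", Long.toString(Math.min(start + chunkSize, totalRecords) - 1));
            // 2. Submit each partition to the LRS as a discrete job.
            jobIds.add(lrs.submitJob("EndOfMonthTemplate", props));
        }
        // 3. Monitor: wait for every partition to reach a terminal state.
        boolean allSucceeded = true;
        for (String jobId : jobIds) {
            String status;
            do {
                status = lrs.getJobStatus(jobId);
            } while (!"ENDED".equals(status) && !"FAILED".equals(status));
            allSucceeded &= "ENDED".equals(status);
        }
        // 4. The requestor is notified of the overall outcome.
        return allSucceeded;
    }
}
```

In practice the per-job polling loop would sleep between status checks and the jobs could be monitored concurrently; the sequential loop here keeps the sketch short.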

Several system programming interfaces (SPI) are introduced in Version 6.1 of Compute Grid. These SPIs facilitate the development of plug-in modules that can influence the behavior of the framework, independent of the application itself. One SPI of particular interest is the dispatch confirmer. This SPI plug-in point resides within the LRS and enables the plug-in to override the endpoint-selection decision that has been made (typically based on some round-robin or load-balancing algorithm). The dispatch confirmer, in the context of the Proxy Service pattern, can provide endpoint affinity. Specifically, the plug-in would examine some partition meta-data associated with the work partition to be executed and, based on well-defined rules, route partitions to specific endpoints. Figure 5 depicts the underlying architecture for this extension.

Figure 5. Dispatch confirmer SPI architecture
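A dispatch-confirmer-style plug-in could look like the sketch below. The real Compute Grid SPI signature is not reproduced here; the `DispatchConfirmer` interface and the `dataKey` metadata property are hypothetical stand-ins that capture only the routing decision.

```java
// Sketch of a dispatch-confirmer-style plug-in that overrides the
// scheduler's proposed endpoint to give partitions endpoint affinity:
// all partitions carrying the same data key land on the same endpoint.
import java.util.List;
import java.util.Map;

public class AffinityConfirmerSketch {

    /** Hypothetical routing hook: confirm or replace the proposed endpoint. */
    public interface DispatchConfirmer {
        String confirmEndpoint(String proposedEndpoint,
                               List<String> availableEndpoints,
                               Map<String, String> partitionMetadata);
    }

    /** Routes each partition deterministically by its data key. */
    public static class KeyAffinityConfirmer implements DispatchConfirmer {
        public String confirmEndpoint(String proposedEndpoint,
                                      List<String> availableEndpoints,
                                      Map<String, String> partitionMetadata) {
            String key = partitionMetadata.get("dataKey");
            if (key == null || availableEndpoints.isEmpty()) {
                return proposedEndpoint; // no affinity rule applies
            }
            // Simple well-defined rule: hash the key onto an endpoint, so
            // the same key always maps to the same endpoint.
            int index = Math.floorMod(key.hashCode(), availableEndpoints.size());
            return availableEndpoints.get(index);
        }
    }
}
```

A hash-based rule is the simplest deterministic mapping; a production plug-in might instead consult a routing table or data-placement service.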

Endpoint affinity can have a tremendous effect on the performance of the application. For example, with this affinity there would be a higher cache-hit rate. Additionally, if the runtime has exclusive access to the data to be processed, the data can be preloaded and cached aggressively, where all data reads are executed in memory against the cache while only the data writes are executed against the data source itself. If data exclusivity is not possible, the WebSphere Extended Deployment data grid technology could be used to manage data caching and invalidation. Figure 6 depicts this architecture.

Figure 6. Data caching managed by WebSphere Extended Deployment Data Grid
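The in-memory-reads, write-through scheme described above can be sketched as a small caching wrapper. The `DataSourceGateway` interface is a hypothetical stand-in for whatever backing store (for example, a database table) the endpoint has exclusive access to.

```java
// Sketch of the caching scheme described above: with endpoint affinity
// and exclusive access to the data, reads are served from an in-memory
// cache, while writes go to both the backing data source and the cache.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReadCachedStoreSketch {

    /** Hypothetical backing store, e.g. a database table. */
    public interface DataSourceGateway {
        String read(String key);
        void write(String key, String value);
    }

    private final Map<String, String> cache = new ConcurrentHashMap<String, String>();
    private final DataSourceGateway backing;

    public ReadCachedStoreSketch(DataSourceGateway backing) {
        this.backing = backing;
    }

    /** Reads are served from memory, loading from the store on a miss. */
    public String read(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = backing.read(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    /** Writes go through to the data source and keep the cache current. */
    public void write(String key, String value) {
        backing.write(key, value);
        cache.put(key, value);
    }
}
```

This write-through approach is only safe under the data-exclusivity assumption stated above; if other writers exist, cache invalidation must be delegated to a data grid as the article notes.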

Conclusion

With narrowing batch windows, batch job performance is becoming more crucial than ever before. Advanced parallelism, data and partition affinity, and the other techniques described in this article provide an essential set of tools that can help streamline offline processing. The Proxy Service pattern can be used to execute batch workloads in parallel. A single WebSphere Extended Deployment Compute Grid job executes on a single thread within an endpoint; a single endpoint has many threads, and the grid contains many endpoints. The Proxy Service pattern creates several partitioned jobs from the single job submitted; each job is submitted to the LRS and then sub-dispatched to an available thread on an available endpoint. As each job completes, the proxy service is notified. Upon completion of all jobs, the requestor is notified of the overall success (or failure). If the data and workload exhibit characteristics conducive to aggressive data caching and endpoint affinity, further optimizations can be made to improve performance and throughput.
