Skip to main content

Executing batch programs in parallel with WebSphere Extended Deployment Compute Grid

Snehal Antani (antani@us.ibm.com), ISSW, WebSphere Extended Deployment Technical Lead, IBM
Snehal Antani works for the IBM Software Services (ISSW) Advanced WebSphere group and is the technical lead for WebSphere Extended Deployment. His focus is primarily in the domain of infrastructure design for Service Oriented Architecture (SOA) with WebSphere branded products across all platforms (z/OS and Distributed). He comes from a development background where he helped develop and deliver WebSphere z/OS, WebSphere XD-Distributed, and WebSphere XD-z/OS. Snehal has disclosed several patents and technical publications in the domains of SOA, Enterprise Application Infrastructure, and Grid Computing. He earned a BS in Computer Science from Purdue University and will complete his MS Computer Science degree from Rensselear Polytechnic Institute (RPI) in Troy, NY. Visit IBM Software Services for more information.

Summary:  Batch computing is an integral part of any enterprise IT infrastructure. As an execution framework, IBM® WebSphere® Extended Deployment Compute Grid is responsible for executing the batch workloads with some enterprise qualities of service, including transactional integrity, security, and restart capabilities, to name a few. This article describes a Proxy Service pattern that uses advanced partitioning techniques for the parallel execution of batch workloads across a grid of endpoints. This content is part of the IBM WebSphere Developer Technical Journal.

Date:  18 Jul 2007
Level:  Intermediate
Activity:  1740 views
Comments:  

Introduction

As a batch execution environment, IBM WebSphere Extended Deployment Compute Grid provides container-managed qualities of service to Java batch and grid applications. These qualities of service are provided by way of leveraging the underlying application server upon which the component is deployed. Batch computing is an integral part of any enterprise IT infrastructure. Batch applications are responsible for business services like calculating credit ratings, applying interest calculations, and so on. As an execution framework, WebSphere Extended Deployment Compute Grid (hereafter referred to as Compute Grid) is responsible for executing the batch workloads with some enterprise qualities of service, including transactional integrity, security, and restart capabilities, to name a few. So why is this technology called Compute Grid and not Java Batch? This article answers that question by describing the Proxy Service pattern for the parallel execution of batch workloads across a grid of endpoints.


About the pattern

Batch workloads are traditionally data intensive and must perform within some set defined goals; for example, they must complete within some time window, execute with transactional integrity, allow for restart upon failure, and so on. The principles of grid computing, including the parallel execution of a job across a grid of endpoints, emerge as the next generation of enterprise applications.

Suppose a batch application must process 100,000,000 million records of data. If the data is such that each record can be processed independently of the others, then the sequence in which they are processed does not matter. Typically, a batch job will process records sequentially, starting with the first record and cycling through the remainder until the work is complete. The execution time of this workload will be a function of the number of records to process, and could be quite long. Enter parallelism. A job exhibiting these characteristics can be broken up into smaller chunks of work, and each chunk can be dispatched across multiple threads of execution within an application server and processed at the same time.

How does one leverage the principles of grid computing, the qualities of service provided by the underlying middleware runtime, and the container and execution management features of Compute Grid to parallel-process batch workloads?

The Proxy Service pattern for parallel execution of batch workloads, described in this article, illustrates a technique for partitioning a single batch job and executing the partitions in parallel across some grid of endpoints. Additionally, the pattern utilizes more advanced partitioning techniques, such as endpoint affinity aggressive caching, and others. The proxy service pattern can be used to solve the stated problem, given these assumptions:

  • The input data of the job can be broken down into discrete partitions.
  • Each partition can be executed independently of each other.
  • The output data of the job can be aggregated from discrete partitions of results.

The solution

Compute Grid is composed of two major components:

  • The Long Running Scheduler (LRS) dispatches and monitors the execution of a single job.
  • The Long Running Execution Environment (LREE) executes a single job while ensuring its required qualities of service (transactional integrity, restart capabilities, security, and so on) are achieved. LREEs are multithreaded server processes; a single batch job is dispatched to a single thread within the LREE, therefore a single LREE can execute many batch jobs concurrently.

The LRS exposes a set of APIs that provide the ability to programmatically submit a job, monitor the status of a job, and perform lifecycle operations on the job (stop, cancel, and restart). These APIs are an integral part of the Proxy Service pattern.


Figure 1. Proxy Service pattern with Compute Grid
Figure 1. Proxy Service pattern with Compute Grid

The Proxy Service pattern is composed of two components:

  • Proxy service

    The principle component in the Proxy Service pattern is the proxy service itself, which serves as a front end to the Compute Grid runtime and exposes logical business functions. Figure 1 depicts this service where the method ExecuteEndOfMonthProcessing() is exposed.

    The purpose of the partition application is to break a single logical request into several smaller partitions. Figure 2 describes the operations executed by the proxy service, which contains business logic that can create partitions. Each partition is subsequently submitted to the LRS as a discrete job (and managed thereafter by the LRS as a discrete and independent job). The proxy service monitors each job that it has submitted and, upon completion, notifies the requestor that the task has been completed.



    Figure 2. Operations executed by the proxy service
    Figure 2. Operations executed by the proxy service

    There are several Compute Grid features that are also involved in the proxy service pattern. One key feature is xJCL variable substitution, which is available as of Compute Grid Version 6.1. xJCL variable substitution enables xJCL templates to be defined and stored in a repository owned and managed by the LRS. Job submitters (in this case, the proxy service) no longer need to generate or directly manipulate xJCL; rather, the submitted job needs only know the job name to invoke and the parameters associated with that job. This greatly simplifies the proxy service implementation.

  • Partition Template

    This template is the xJCL template described as part of the variable substitution feature of Compute Grid. Figure 3 illustrates the end-to-end architecture of the Proxy Service pattern. The proxy service divides a single logical transaction into several discrete partitions. For each partition, the partition properties are submitted to the LRS, along with the name of the partition template. The LRS, using the variable substitution feature, creates fully qualified xJCL jobs (called work partitions) and substitutes each job to the grid of LREEs. Each work partition is independently executed per the xJCL partition template within the grid.



    Figure 3. End-to-end architecture of the Proxy Service pattern
    Figure 3. End-to-end architecture of the Proxy Service pattern

    The proxy service monitors the status of each job that has been submitted. Upon completion of all jobs, the service can notify the requestor of the completion (or failure) of the overall logical job. Figure 4 further describes the monitoring of jobs within this pattern.



    Figure 4. Monitoring of jobs within the Proxy Service pattern
    Figure 4. Monitoring of jobs within the Proxy Service pattern

Several system programming interfaces (SPI) are introduced in Version 6.1 of Compute Grid. These SPIs facilitate the development of plug-in modules that can influence the behavior of the framework, independent of the application itself. One SPI of particular interest is the dispatch confirmer. This SPI plug-in point resides within the LRS and enables the plug-in to override the endpoint-selection decision that has been made (typically based on some round-robin or load-balancing algorithm). The dispatch confirmer, in the context of the Proxy Service pattern, can provide endpoint affinity. Specifically, the plug-in would examine some partition meta-data associated with the work partition to be executed and, based on well-defined rules, route partitions to specific endpoints. Figure 5 depicts the underlying architecture for this extension.


Figure 5. Dispatch confirmer SPI architecture
Figure 5. Dispatch confirmer SPI architecture

End-point affinity can have tremendous effects on the performance of the application. For example, with this affinity there would be a higher cache-hit rate. Additionally, if the runtime has exclusive access to the data to be processed, data can be preloaded and cached aggressively where all data reads are executed in memory against the cache, while only the data writes are executed against the data source itself. If data exclusivity is not possible, the WebSphere Extended Deployment data grid technology could be used to manage data caching and invalidation. Figure 6 depicts this architecture.


Figure 6. Data caching manged by WebSphere Extended Deployment Data Grid
Figure 6. Data caching manged by WebSphere Extended Deployment Data Grid

Conclusion

With narrowing batch windows, batch job performance is becoming more crucial than ever before. Advanced parallelism, data and partition affinity, and other techniques described in this article provide an essential set of tools that can help streamline offline processing. The Proxy Service pattern can be used to execute batch workloads in parallel. A single WebSphere Extended Deployment Compute Grid job executes on a single thread within the endpoint. A single endpoint will have many threads, and within the grid there exists many endpoints. The Proxy Service pattern will create several partitions of jobs from the single job submitted; each job will be submitted to the LRS, and then each job is sub-dispatched to an available thread on an available endpoint. As each job completes, the proxy service is notified. Upon completion of all jobs, the requestor is notified of the overall success (or failure). If the data and workload exhibits certain characteristics conducive to aggressive data caching and endpoint affinity, further optimizations can be made to improve performance and throughput.


Resources

About the author

Snehal Antani works for the IBM Software Services (ISSW) Advanced WebSphere group and is the technical lead for WebSphere Extended Deployment. His focus is primarily in the domain of infrastructure design for Service Oriented Architecture (SOA) with WebSphere branded products across all platforms (z/OS and Distributed). He comes from a development background where he helped develop and deliver WebSphere z/OS, WebSphere XD-Distributed, and WebSphere XD-z/OS. Snehal has disclosed several patents and technical publications in the domains of SOA, Enterprise Application Infrastructure, and Grid Computing. He earned a BS in Computer Science from Purdue University and will complete his MS Computer Science degree from Rensselear Polytechnic Institute (RPI) in Troy, NY. Visit IBM Software Services for more information.

Comments



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=WebSphere
ArticleID=240368
ArticleTitle=Executing batch programs in parallel with WebSphere Extended Deployment Compute Grid
publish-date=07182007
author1-email=antani@us.ibm.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers