 | Level: Intermediate Snehal Antani (antani@us.ibm.com), ISSW, WebSphere Extended Deployment Technical Lead, IBM
18 Jul 2007
Batch computing is an integral part of any enterprise IT infrastructure. As an execution framework, IBM® WebSphere® Extended Deployment Compute Grid is responsible for executing the batch workloads with some enterprise qualities of service, including transactional integrity, security, and restart capabilities, to name a few. This article describes a Proxy Service pattern that uses advanced partitioning techniques for the parallel execution of batch workloads across a grid of endpoints.
From IBM WebSphere Developer Technical Journal.
Introduction
As a batch execution environment, IBM WebSphere Extended Deployment Compute Grid provides container-managed qualities of service to Java batch and grid applications. These qualities of service are provided by way of leveraging the underlying application server upon which the component is deployed. Batch computing is an integral part of any enterprise IT infrastructure. Batch applications are responsible for business services like calculating credit ratings, applying interest calculations, and so on. As an execution framework, WebSphere Extended Deployment Compute Grid (hereafter referred to as Compute Grid) is responsible for executing the batch workloads with some enterprise qualities of service, including transactional integrity, security, and restart capabilities, to name a few. So why is this technology called Compute Grid and not Java Batch? This article answers that question by describing the Proxy Service pattern for the parallel execution of batch workloads across a grid of endpoints.
 |
About the pattern
Batch workloads are traditionally data intensive and must perform within some set defined goals; for example, they must complete within some time window, execute with transactional integrity, allow for restart upon failure, and so on. The principles of grid computing, including the parallel execution of a job across a grid of endpoints, emerge as the next generation of enterprise applications.
Suppose a batch application must process 100,000,000 million records of data. If the data is such that each record can be processed independently of the others, then the sequence in which they are processed does not matter. Typically, a batch job will process records sequentially, starting with the first record and cycling through the remainder until the work is complete. The execution time of this workload will be a function of the number of records to process, and could be quite long. Enter parallelism. A job exhibiting these characteristics can be broken up into smaller chunks of work, and each chunk can be dispatched across multiple threads of execution within an application server and processed at the same time.
How does one leverage the principles of grid computing, the qualities of service provided by the underlying middleware runtime, and the container and execution management features of Compute Grid to parallel-process batch workloads?
The Proxy Service pattern for parallel execution of batch workloads, described in this article, illustrates a technique for partitioning a single batch job and executing the partitions in parallel across some grid of endpoints. Additionally, the pattern utilizes more advanced partitioning techniques, such as endpoint affinity aggressive caching, and others. The proxy service pattern can be used to solve the stated problem, given these assumptions:
- The input data of the job can be broken down into discrete partitions.
- Each partition can be executed independently of each other.
- The output data of the job can be aggregated from discrete partitions of results.
The solution
Compute Grid is composed of two major components:
- The Long Running Scheduler (LRS) dispatches and monitors the execution of a single job.
- The Long Running Execution Environment (LREE) executes a single job while ensuring its required qualities of service (transactional integrity, restart capabilities, security, and so on) are achieved. LREEs are multithreaded server processes; a single batch job is dispatched to a single thread within the LREE, therefore a single LREE can execute many batch jobs concurrently.
The LRS exposes a set of APIs that provide the ability to programmatically submit a job, monitor the status of a job, and perform lifecycle operations on the job (stop, cancel, and restart). These APIs are an integral part of the Proxy Service pattern.
Figure 1. Proxy Service pattern with Compute Grid
The Proxy Service pattern is composed of two components:
-
Proxy service
The principle component in the Proxy Service pattern is the proxy service itself, which serves as a front end to the Compute Grid runtime and exposes logical business functions. Figure 1 depicts this service where the method ExecuteEndOfMonthProcessing() is exposed.
The purpose of the partition application is to break a single logical request into several smaller partitions. Figure 2 describes the operations executed by the proxy service, which contains business logic that can create partitions. Each partition is subsequently submitted to the LRS as a discrete job (and managed thereafter by the LRS as a discrete and independent job). The proxy service monitors each job that it has submitted and, upon completion, notifies the requestor that the task has been completed.
Figure 2. Operations executed by the proxy service
There are several Compute Grid features that are also involved in the proxy service pattern. One key feature is xJCL variable substitution, which is available as of Compute Grid Version 6.1. xJCL variable substitution enables xJCL templates to be defined and stored in a repository owned and managed by the LRS. Job submitters (in this case, the proxy service) no longer need to generate or directly manipulate xJCL; rather, the submitted job needs only know the job name to invoke and the parameters associated with that job. This greatly simplifies the proxy service implementation.
-
Partition Template
This template is the xJCL template described as part of the variable substitution feature of Compute Grid. Figure 3 illustrates the end-to-end architecture of the Proxy Service pattern. The proxy service divides a single logical transaction into several discrete partitions. For each partition, the partition properties are submitted to the LRS, along with the name of the partition template. The LRS, using the variable substitution feature, creates fully qualified xJCL jobs (called work partitions) and substitutes each job to the grid of LREEs. Each work partition is independently executed per the xJCL partition template within the grid.
Figure 3. End-to-end architecture of the Proxy Service pattern
The proxy service monitors the status of each job that has been submitted. Upon completion of all jobs, the service can notify the requestor of the completion (or failure) of the overall logical job. Figure 4 further describes the monitoring of jobs within this pattern.
Figure 4. Monitoring of jobs within the Proxy Service pattern
Several system programming interfaces (SPI) are introduced in Version 6.1 of Compute Grid. These SPIs facilitate the development of plug-in modules that can influence the behavior of the framework, independent of the application itself. One SPI of particular interest is the dispatch confirmer. This SPI plug-in point resides within the LRS and enables the plug-in to override the endpoint-selection decision that has been made (typically based on some round-robin or load-balancing algorithm). The dispatch confirmer, in the context of the Proxy Service pattern, can provide endpoint affinity. Specifically, the plug-in would examine some partition meta-data associated with the work partition to be executed and, based on well-defined rules, route partitions to specific endpoints. Figure 5 depicts the underlying architecture for this extension.
Figure 5. Dispatch confirmer SPI architecture
End-point affinity can have tremendous effects on the performance of the application. For example, with this affinity there would be a higher cache-hit rate. Additionally, if the runtime has exclusive access to the data to be processed, data can be preloaded and cached aggressively where all data reads are executed in memory against the cache, while only the data writes are executed against the data source itself. If data exclusivity is not possible, the WebSphere Extended Deployment data grid technology could be used to manage data caching and invalidation. Figure 6 depicts this architecture.
Figure 6. Data caching manged by WebSphere Extended Deployment Data Grid
Conclusion
With narrowing batch windows, batch job performance is becoming more crucial than ever before. Advanced parallelism, data and partition affinity, and other techniques described in this article provide an essential set of tools that can help streamline offline processing. The Proxy Service pattern can be used to execute batch workloads in parallel. A single WebSphere Extended Deployment Compute Grid job executes on a single thread within the endpoint. A single endpoint will have many threads, and within the grid there exists many endpoints. The Proxy Service pattern will create several partitions of jobs from the single job submitted; each job will be submitted to the LRS, and then each job is sub-dispatched to an available thread on an available endpoint. As each job completes, the proxy service is notified. Upon completion of all jobs, the requestor is notified of the overall success (or failure). If the data and workload exhibits certain characteristics conducive to aggressive data caching and endpoint affinity, further optimizations can be made to improve performance and throughput.
Resources
About the author  | |  |
Snehal Antani works for the IBM Software Services (ISSW) Advanced WebSphere group and is the technical lead for WebSphere Extended Deployment. His focus is primarily in the domain of infrastructure design for Service Oriented Architecture (SOA) with WebSphere branded products across all platforms (z/OS and Distributed). He comes from a development background where he helped develop and deliver WebSphere z/OS, WebSphere XD-Distributed, and WebSphere XD-z/OS. Snehal has disclosed several patents and technical publications in the domains of SOA, Enterprise Application Infrastructure, and Grid Computing. He earned a BS in Computer Science from Purdue University and will complete his MS Computer Science degree from Rensselear Polytechnic Institute (RPI) in Troy, NY. Visit IBM Software Services for more information. |
Rate this page
|  |