Orchestrating grid workloads -- neither feast nor famine

Move servers in and out of grids with IBM Tivoli Intelligent Orchestrator


Level: Introductory

Frank De Gilio (degilio@us.ibm.com), Senior complex solutions architect, IBM Design Center for On Demand Business

21 Sep 2004

Grid resource managers manage workload from requesters to the available grid engines. What happens when there's more work than available engines can handle? Traditionally, this condition causes queuing and additional wait times for the user community. What would happen if the grid resource manager could appeal to some outside entity to add engine resources? What if there were multiple grids within an enterprise and this outside entity could determine which grid most needed resources? This article discusses how resources can be managed into and out of a grid environment using an example infrastructure.

Introduction

What's the biggest roadblock to acceptance of an on demand business environment? It isn't technology. The e-business world provides many examples of technology evolving quickly to support new needs. The biggest roadblock is politics. By itself, on demand business is an apolitical model: it looks at what's needed to ensure that all resources are used to the best benefit of the enterprise. But the enterprise is extremely political. Since these two models -- the apolitical on demand business environment and the political enterprise -- are diametrically opposed, we need technology that encourages enterprise "kingdoms" to share their resources. As grid computing moves from purely scientific and mathematical use to a more utility-based model, the technology to leverage the proper use of servers in this environment must be in place.

In this article, I'll draw examples from some work we did in the IBM Design Center for On Demand Business for a financial institution. The models used were based on grid workloads that followed trading examples, but they're representative of basic grid workload models we've seen for a number of different business clients.

For the financial institution, we used IBM Tivoli® Intelligent Orchestrator (TIO) software because it enables an organization to add and remove servers from a processing environment based on the needs of that environment. Traditionally, TIO has been deployed in Web-based environments to ensure the best use of servers throughout multiple tiers. It does this by analyzing each server's CPU use and the rate at which work arrives at the server from the network. If TIO can be adapted to serve grids as well, it becomes a powerful tool for managing servers across multiple heterogeneous environments. Servers become commodities to share across departments, and hoarding of departmental server resources can become a thing of the past. This article describes the methodology used to extend TIO from its traditional Web-based world to one that spans multiple environments.





The business problem

Enterprises constantly struggle to find the best way to manage their hardware, software, and management resources. New applications often drag a new set of servers along with them. To make sure those servers can handle expected demand, capacity planners frequently overestimate the load so that there is room for growth as application usage rises. If the estimate is too low, performance suffers. If the estimate is too high, resources are wasted. Since the typical political climate discourages sharing of resources, the wasted resources are never used. The disturbing thing for CIOs and IT organizations is that such wasted resources are never brought to bear on resource-starved applications. Some users are stuck with poorly performing applications while perfectly useful resources lie idle.





The solution

Several technologies are coming together now to solve this problem. The advent of efficient Web services, the proliferation of J2EE underpinnings for those Web services, and the power of grid computing allow application components to be efficiently deployed within a heterogeneous environment. Applications have become less platform- and infrastructure-dependent and more focused on solving business problems. This paves the way for using grids in a utility model. In environments where multiple grids must contend for the same resources or must share resources with non-grid environments, we need something outside the grid to ensure proper deployment of servers. In our work with the financial institution, we used IBM Tivoli Intelligent Orchestrator (TIO) to fill this need. TIO also provides the ability to track deployments of server environments, which allows a person to keep track of how and for whom a server deployment is carried out. Simply put, TIO provides the framework for removing the political as well as technical barriers that might stand in the way of an enterprise becoming more of an on demand business.

In our work with the financial institution, we combined Tivoli Intelligent Orchestrator with DataSynapse GridServer. Let's look briefly at these products so you can get an idea of how they work together.

About DataSynapse GridServer

DataSynapse GridServer is an application environment for grid computing. It provides a component architecture for distributing compute-based workloads. GridServer delivers adaptive, non-deterministic load balancing, dynamic scheduling, and a job-task paradigm. GridServer has three basic components:

  1. The Director -- This component is the primary contact point for clients of the grid. It knows all the brokers in the environment and which applications those brokers are managing. When a client contacts the Director with work, the Director refers the client to the broker associated with the application.
  2. The Broker -- This component schedules work to engines in the grid. It takes requests from the client and provides job tasks to engines performing the work in this environment.
  3. The Engine -- Engines are the components that perform the required tasks.

When an engine daemon is started on a server, it contacts the broker to report that it is available. At initialization time it picks up any available updates. Once registered with a broker, the daemon is ready to receive work. Engines normally have a home broker to which they are attached, but they can be moved from broker to broker based on the needs of the grid environment. This environment is managed through either a Web interface, which provides granular views and controls of the execution environment, or a Web services interface. Overall, DataSynapse GridServer is a very flexible, lightweight infrastructure for grid computing.
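The relationship between the three components can be sketched in a few lines of Python. This is only an illustrative model of the roles described above, not the GridServer API: the class names, method names, and the `move_engine` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Schedules work to the engines registered with it (hypothetical model)."""
    name: str
    applications: set
    engines: list = field(default_factory=list)

    def register(self, engine_name):
        # An engine daemon announces that it is available for work.
        self.engines.append(engine_name)

    def release(self, engine_name):
        self.engines.remove(engine_name)


@dataclass
class Director:
    """Knows every broker and the applications each one manages."""
    brokers: list = field(default_factory=list)

    def route(self, application):
        # Refer the client to the broker associated with the application.
        for broker in self.brokers:
            if application in broker.applications:
                return broker
        raise LookupError(f"no broker manages {application!r}")


def move_engine(engine_name, source, target):
    # Engines have a home broker but can be moved as grid needs change.
    source.release(engine_name)
    target.register(engine_name)


trading = Broker("trading-broker", {"pricing"})
batch = Broker("batch-broker", {"risk-report"})
director = Director([trading, batch])

batch.register("engine-1")               # engine starts on its home broker
move_engine("engine-1", batch, trading)  # and is later moved to where work is
```

The point of the model is that clients only ever ask the Director where to go, so engines can be shifted between brokers without clients noticing.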

About Tivoli Intelligent Orchestrator

There are two primary components of TIO: the Provisioning Manager and the Orchestrator. These two constructs share a Data Center Model (DCM), which contains all relevant data about the server and network environment. Information about the type of network devices, network connections between them and servers, server details, and relevant software stacks provide a detailed picture of what the data center controls.

Tivoli Provisioning Manager (TPM)

TPM is a framework for creating rules that govern how servers will be configured. Through the TPM tool set, users create workflows that define the steps to set up a server. Workflows use a Java™ technology-like semantic that allows you to tie predefined Java plug-ins together to perform configuration tasks on servers and network appliances within TPM's domain. In addition to the provided Java plug-ins, users can create their own Java plug-ins to perform additional tasks in the workflows. In most cases this is unnecessary since TPM already has plug-ins for communicating with servers, cluster managers (like IBM Direct and Cluster Server Manager), and network appliances.

Since workflows can call other workflows, each workflow can be focused on a particular task. Efficient modularization in workflows can create a very well-structured and reusable deployment infrastructure. Listing 1 shows a simple workflow used for adding a server to a DataSynapse Grid.

Notice that TPM has a sophisticated flow structure that allows for workflows to catch errors in the deployment process and recover from them. While TPM is described here in the TIO context, it is also available separately.
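To make the idea of small, composable workflows with error recovery concrete, here is a minimal Python sketch. It is not TPM's workflow language and it is not a reconstruction of Listing 1. The step names, the `StepFailed` exception, and the rule that servers whose names start with "bad" fail to start are all assumptions made purely for illustration.

```python
class StepFailed(Exception):
    """Raised by a step that could not complete (hypothetical)."""


def install_engine(server, log):
    log.append(f"install engine on {server}")


def start_engine(server, log):
    # Simulated failure so the recovery path can be exercised.
    if server.startswith("bad"):
        raise StepFailed(f"engine daemon did not start on {server}")
    log.append(f"start engine on {server}")


def remove_engine(server, log):
    log.append(f"remove engine from {server}")


def add_server_to_grid(server, log):
    """Top-level workflow: calls smaller workflows and recovers on error."""
    try:
        install_engine(server, log)
        start_engine(server, log)
        return True
    except StepFailed as err:
        # Catch the deployment error and undo the partial installation.
        log.append(f"recover: {err}")
        remove_engine(server, log)
        return False
```

Because each step is its own unit, a step such as `remove_engine` can be reused both for error recovery and for taking a healthy server back out of the grid.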

Tivoli Orchestrator

The Orchestrator component uses TPM to deploy the resources in the DCM so that Service Level Agreements (SLAs) are met. An SLA defines an agreement on operational characteristics between a service provider and a consumer. For example, an SLA could define a response time for a query (all queries to database A will be returned in one second or less). Often SLAs are in place to define all major requirements a consumer has on a provider of service. The Orchestrator gets data from servers and network devices and, using a technology called an Objective Analyzer (OA), determines how applying servers to this environment will affect the SLA for a particular workload.

In addition to understanding the effect of servers on a particular workload, the Orchestrator must determine whether applying servers to that workload is advisable given the needs of the entire data center. Thus, even if a particular workload requires additional resources to run faster, it still might not get them if more important work requires them.

It's important to understand that the Orchestrator doesn't manage workloads within a server's context; rather, it ensures enough servers are available to meet the SLA assigned to the workload in general. Stated another way, the Orchestrator doesn't pick which server should be used to perform a task, but it ensures enough servers are available to service the scheduled work.
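The division of labor described here, where the Orchestrator decides how many servers each workload gets but not which server runs which task, can be sketched as follows. This is a deliberately simple assumption (strict priority order, numeric priorities); the article does not describe the Orchestrator's actual arbitration logic, and the function and parameter names are hypothetical.

```python
def allocate(free_servers, requests):
    """Hand out server counts in priority order.

    requests is a list of (workload, priority, servers_wanted) tuples;
    a larger priority number means more important work (an assumption).
    Returns {workload: servers_granted}. Only counts are decided here;
    placing individual tasks on servers is left to the workload's own
    scheduler (for a grid, its broker).
    """
    grants = {}
    for workload, _priority, wanted in sorted(requests, key=lambda r: -r[1]):
        granted = min(wanted, free_servers)
        grants[workload] = granted
        free_servers -= granted
    return grants
```

For example, with five free servers, a batch workload asking for four, and a higher-priority trading workload asking for three, the trading workload receives its three servers and the batch workload has to make do with the remaining two.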





TIO applied to grid

The OAs shipped with Orchestrator were based on Capacity on Demand models that were well suited to Web workloads but not to grid workloads. Capacity on Demand predicts utilization by tying the CPU data received from the server to the arrival rates received from network appliances. Given the "transactional" request-response nature of the Web, this model characterizes the workload effectively. The chaotic nature of grid workload, combined with a grid's tendency to consume all available server CPU, makes the traditional Capacity on Demand model ineffective in the grid environment.
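A rough sketch shows why this kind of model fits the Web but not the grid. The article does not give the Capacity on Demand formula, so the code below assumes the standard utilization law (utilization equals arrival rate times CPU time per request, divided by the number of servers). The grid function is a deliberate oversimplification, and all names are hypothetical.

```python
import math


def predicted_utilization(arrival_rate, cpu_seconds_per_request, servers):
    """Web tier: utilization rises and falls with the request arrival rate."""
    return arrival_rate * cpu_seconds_per_request / servers


def servers_needed(arrival_rate, cpu_seconds_per_request, target_utilization):
    """Smallest server count that keeps utilization at or below the target."""
    return math.ceil(arrival_rate * cpu_seconds_per_request / target_utilization)


def observed_grid_cpu(queued_tasks):
    """Grid engine (simplified): CPU is pinned at 100% whenever work exists.

    The reading is the same for 1 queued task and for 10,000, so CPU
    utilization alone says nothing about how far behind the grid is.
    """
    return 1.0 if queued_tasks > 0 else 0.0
```

In the Web case, doubling the arrival rate doubles the predicted utilization, which is exactly the signal a capacity model needs. In the grid case the CPU reading is saturated almost immediately, so a different set of inputs is required.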

TIO's Objective Analyzers at work

Each OA builds a "probability of breach surface" which defines the probability of missing (breaching) an SLA. This probability is based on the current workload and the resources available to that workload. If the "probability of breach" is high enough, the Orchestrator asks the OA what effect the addition of servers will cause. It then determines the optimal server allocation to provide to a workload.
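The idea can be illustrated with a toy surface. The shape used here (probability 0 up to 50% load, rising linearly to 1 at 100% load) is an assumption chosen only to keep the arithmetic simple; the article does not describe how TIO's OAs actually compute the surface, and the function names and the 0.1 target are hypothetical.

```python
def breach_probability(demand, servers, capacity_per_server):
    """Toy probability of breaching the SLA for a given demand and server count."""
    if servers == 0:
        return 1.0
    load = demand / (servers * capacity_per_server)
    # 0 at or below 50% load, 1 at or above 100% load, linear in between.
    return min(1.0, max(0.0, (load - 0.5) / 0.5))


def servers_for_target(demand, capacity_per_server, max_servers, target=0.1):
    """Ask 'what if we added servers?' until the breach probability is acceptable."""
    for servers in range(1, max_servers + 1):
        if breach_probability(demand, servers, capacity_per_server) <= target:
            return servers
    return max_servers  # best effort if the target cannot be reached
```

Evaluating `breach_probability` over a grid of demand and server values gives the "surface"; `servers_for_target` is the walk along that surface that the Orchestrator performs when it asks the OA what effect additional servers would have.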

To do this accurately for a grid, we looked at what information DataSynapse GridServer could give an OA to help it determine the workload's resource requirements. After some experimentation, DataSynapse provided the OA with the queue depth, the length of time each unit of work was taking, and a measure of the current workload.
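One plausible way to combine these three measurements is sketched below. The article does not say how the OA actually used them, so the drain-time estimate, the field names, and the `sla_seconds` parameter are all assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class GridMetrics:
    queue_depth: int         # tasks waiting at the broker
    avg_task_seconds: float  # how long each unit of work is taking
    busy_engines: int        # a simple measure of the current workload


def drain_seconds(metrics, engines):
    """Estimated time to work off the queue with the given number of engines."""
    return metrics.queue_depth * metrics.avg_task_seconds / engines


def engines_needed(metrics, sla_seconds):
    """Engines required to drain the current queue within the SLA window."""
    total_work = metrics.queue_depth * metrics.avg_task_seconds
    return max(1, math.ceil(total_work / sla_seconds))


sample = GridMetrics(queue_depth=120, avg_task_seconds=2.0, busy_engines=4)
```

With the sample values, four engines need about a minute to clear the queue, so an SLA of 30 seconds would call for eight engines. Unlike CPU utilization, these inputs keep changing after the engines are saturated.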

Applying OAs to the grid

We also determined that there's a need to accurately categorize the type of grid workload to be managed. For example, the financial institution we were working with had two basic workloads to be modeled: one was very bursty, the other very consistent. The bursty workload characterized an environment where a number of users would dump a set of calculations into the grid with unexpected think times between runs. The consistent workload was characterized by a consistent submission of calculations that could be predicted and managed uniformly. The bursty workload characterized an active user workload, and the consistent workload characterized a batch workload.
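A simple way to tell the two workload types apart is to look at how irregular the gaps between submissions are. The sketch below uses the coefficient of variation of inter-arrival times; this metric and the threshold of 1.0 are assumptions for illustration, not the method used in the engagement.

```python
import statistics


def classify(arrival_times, threshold=1.0):
    """Label a workload 'bursty' or 'consistent' from submission timestamps.

    arrival_times must be sorted, contain at least two entries, and not
    all be identical. The threshold is an arbitrary illustrative cutoff.
    """
    gaps = [later - earlier
            for earlier, later in zip(arrival_times, arrival_times[1:])]
    variation = statistics.pstdev(gaps) / statistics.mean(gaps)
    return "bursty" if variation > threshold else "consistent"


batch_like = [0, 10, 20, 30, 40]                       # evenly spaced submissions
user_like = [0, 1, 2, 3, 100, 101, 102, 103, 300]      # bursts, then think time
```

Evenly spaced batch submissions have no variation in their gaps, while interactive users produce short gaps inside a burst and long, unpredictable gaps between bursts.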

TIO provides the ability to stack OAs, each monitoring a specific aspect of an environment, to build a complex set of rules defining when additional servers are required to keep an SLA from being breached. This way, no single OA becomes too complex, and OAs can be reused in multiple environments with similar characteristics. For the grid, we created an OA responsible for delaying the release of servers from a grid. This ensured that servers were not prematurely removed from the grid that was serving the bursty workload. Even when there was no work in the grid, this OA held on to the servers in case new work was imminent. It also prevented unnecessary oscillation in server allocations to the grid.
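The stacking idea can be sketched as one analyzer wrapping another. This is not TIO's OA interface: the class names, the `recommend` signature, and the hold time are hypothetical, and the demand rule is a placeholder.

```python
class DemandOA:
    """Recommends one engine per `tasks_per_engine` queued tasks (placeholder rule)."""

    def __init__(self, tasks_per_engine):
        self.tasks_per_engine = tasks_per_engine

    def recommend(self, now, queue_depth, current_servers):
        return -(-queue_depth // self.tasks_per_engine)  # ceiling division


class DelayedReleaseOA:
    """Stacked on another OA; delays any reduction in servers."""

    def __init__(self, inner, hold_seconds):
        self.inner = inner
        self.hold_seconds = hold_seconds
        self.last_busy = None  # last time the inner OA wanted all current servers

    def recommend(self, now, queue_depth, current_servers):
        wanted = self.inner.recommend(now, queue_depth, current_servers)
        if wanted >= current_servers:
            self.last_busy = now
            return wanted
        if self.last_busy is not None and now - self.last_busy < self.hold_seconds:
            return current_servers  # hold the servers: new work may be imminent
        return wanted
```

With a hold time of 300 seconds, a grid that goes idle one minute after a burst keeps its servers, and only releases them if the queue is still empty after the hold time has passed. Keeping the delay in its own class means the same wrapper could be stacked on a different demand rule for another bursty environment.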





Grid orchestration -- moving servers in and out of grids

Now that grids are moving into the mainstream of the IT environment, their effective inclusion is dependent on the ability to determine how much resource should be applied against particular workloads. Since the discrete usage of servers in the grid environment can be tracked, owners of these resources can demonstrate their servers' contribution to the enterprise. Thus, the owner can recoup individual expenses or share expenses among the grid community. While traditional CPU scavenging is viable in grid environments, it tends to be more parasitic in nature. The more controlled usage of servers within an orchestrated environment benefits not only the resource users but the resource providers as well. Since the servers' owner benefits from someone else using the resource, he is more motivated to share normally unused available processing cycles.




About the author

Frank De Gilio has been an IBM employee since 1985. He worked in MVS system development, tools and middleware development, and has worked on projects that tied MVS to workstations in client/server and Internet environments. In 1997, he joined the IBM S/390® new technology center, where his experience in UNIX®, MVS, and Microsoft® Windows® application/middleware development was key in showing customers how to use the latest OS/390® technology in Web-enabled environments. Currently in the new IBM Design Center for On Demand Business, he shows customers how the latest on demand technologies can be used to energize their infrastructures.



