IBM WebSphere® Extended Deployment (XD) is composed of three components:
- Operations Optimization, an application virtualization technology with a goals-oriented run time
- Compute Grid, a Java™ batch and compute-intensive execution run time
- Data Grid, an advanced caching infrastructure for extreme transaction processing (XTP) applications
This article focuses on the use of the Operations Optimization component. You learn the key areas you need to consider when deploying a solution into a goals-oriented, production environment that involves shared resources.
Goals-oriented, virtualized run times can deliver significant reductions in a data center’s overall hardware footprint and middleware complexity. However, using these run times requires that you understand some new concepts if you want to effectively adopt the technology.
For example, traditionally, an application infrastructure team knows exactly where an application is running; this is not the case in a goals-oriented runtime environment. An on demand router (ODR) (the key component of XD Operations Optimization) provides a level of application virtualization in which the execution location for an application is loosely defined by an administrator, and specifically decided by autonomics based on service-level agreements, the load of the application servers, and other factors. To better understand the types of virtualization, see the section Understanding virtualization in hardware, software, and WebSphere XD.
Another difference requires that developers build applications to tolerate vertical scalability, which is the notion of starting multiple instances of the same application on a single operating system image. Developers who are not cognizant of this runtime behavior can make mistakes such as writing to a local file expecting to have exclusive access. With vertical scalability, this exclusive access is not given and, as a result, output files can be corrupted as multiple application instances are started.
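One way to avoid this kind of collision is to include an instance-specific identifier in every output file name. The following is a minimal sketch, not an XD API: the class `InstanceScopedFiles` and its parameters are hypothetical, and it assumes the caller can supply a value (such as the server name) that is unique to each application server instance.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not an XD API): derives an output path that is unique
// per application server instance, so co-located instances never share a file.
final class InstanceScopedFiles {
    private InstanceScopedFiles() {}

    // baseDir:    directory shared by all instances on the operating system image
    // appName:    name of the application writing the file
    // instanceId: value unique to the server instance (assumed to be supplied
    //             by the caller, for example the server name)
    static Path outputFile(String baseDir, String appName, String instanceId) {
        if (instanceId == null || instanceId.isEmpty()) {
            throw new IllegalArgumentException("instanceId must identify the server instance");
        }
        // Replace characters that are unsafe in file names.
        String safeId = instanceId.replaceAll("[^A-Za-z0-9_.-]", "_");
        return Paths.get(baseDir, appName + "-" + safeId + ".out");
    }
}
```

With this approach, two instances of an application named `orders` running as `server1` and `server2` write to `orders-server1.out` and `orders-server2.out` rather than competing for a single file.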
The purpose of this article is to raise your awareness of these and other issues that you need to review when you adopt an XD Operations Optimization environment.
Applications deployed in a static cluster environment, such as one provided by WebSphere Network Deployment (WebSphere ND), are somewhat isolated from each other. Applications run on their own dedicated servers, and some basic routing mechanisms distribute the work to them. With this isolation, applications that perform poorly impact themselves primarily; there are typically no side-effects on unrelated applications within the infrastructure. Figure 1 depicts a shared resource run time.
Figure 1. Multiple servers and middleware processes compete for common system resources within a shared-resource environment
In a shared resource environment, this isolation is eliminated. Applications run on virtualized hardware and, at the lowest hardware levels, compete for the same resources such as CPU, memory, and so on. Applications must therefore be prioritized such that when load occurs, shared resources are allocated to applications of higher priority. This model places more responsibility on the application, and a poorly performing application of higher priority can negatively affect all applications of lower priority.
Goals-oriented run times, such as those supported by WebSphere XD Operations Optimization, will do their best to fulfill the defined service policies for their hosted applications. For example, multiple instances of the application might be started across the cluster, or additional CPU cycles might be dynamically provisioned (if the server is running on virtualized hardware such as System z).
These run times will attempt to achieve the defined goals in the order of relative application priority; higher priority applications that are not meeting their goals will receive additional server resources over lower priority applications. It is, therefore, important to ensure the accurate definition of service policies and relative application priorities; applications of lower priority will be negatively affected when higher priority applications are not meeting their goals. The system will take resources away from the lower priority workloads and allocate them to the higher priority ones.
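The effect of priority ordering can be pictured with a deliberately simplified model. The sketch below is not the algorithm used by XD; the class `PriorityAllocationSketch`, the idea of abstract "capacity units", and the convention that a lower number means a higher priority are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only (not XD's algorithm): grants a fixed amount of
// shared capacity to applications strictly in priority order.
final class PriorityAllocationSketch {

    static final class App {
        final String name;
        final int priority; // assumption: lower number = higher priority
        final int demand;   // capacity units the application needs to meet its goal

        App(String name, int priority, int demand) {
            this.name = name;
            this.priority = priority;
            this.demand = demand;
        }
    }

    // Returns the capacity granted to each application, highest priority first.
    static Map<String, Integer> allocate(List<App> apps, int capacity) {
        List<App> ordered = new ArrayList<>(apps);
        ordered.sort(Comparator.comparingInt((App a) -> a.priority));

        Map<String, Integer> granted = new LinkedHashMap<>();
        int remaining = capacity;
        for (App app : ordered) {
            int share = Math.min(app.demand, remaining);
            granted.put(app.name, share);
            remaining -= share;
        }
        return granted;
    }
}
```

In this model, if a priority-1 application demands 8 of 10 available units, a priority-3 application that needs 6 units receives only 2. An inflated demand from the higher priority application (for example, because its goal is unrealistic) directly starves the lower priority one.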
You must set service policies (for example, "90% of requests should complete within 2 seconds") to realistic goals. Although WebSphere XD Operations Optimization will try its best to achieve the stated goals, it cannot improve the inherent performance of the application. For example, if the average response time of an application is 2 seconds, do not set the goal to 1 second. Furthermore, try to ensure that service policies match the specified business requirements (if they are defined). Business requirements needed to define execution metadata often do not exist; therefore, you might need to extract the values from application usage data, which you can collect with some monitoring infrastructure. (This topic is beyond the scope of this article.)
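Before committing to a goal, it can help to check it against response times you have already observed. The following is a minimal sketch under stated assumptions: the class `ServiceGoalCheck` is hypothetical, and the response-time samples are assumed to come from whatever monitoring infrastructure you use.

```java
// Hypothetical helper: checks a percentile-style goal such as
// "90% of requests should complete within 2 seconds" against observed samples.
final class ServiceGoalCheck {
    private ServiceGoalCheck() {}

    // Fraction (0.0 to 1.0) of samples that completed within thresholdMillis.
    static double fractionWithin(long[] responseTimesMillis, long thresholdMillis) {
        if (responseTimesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int within = 0;
        for (long t : responseTimesMillis) {
            if (t <= thresholdMillis) {
                within++;
            }
        }
        return (double) within / responseTimesMillis.length;
    }

    // True if the observed samples already satisfy the proposed goal.
    static boolean goalLooksAchievable(long[] responseTimesMillis,
                                       long thresholdMillis,
                                       double requiredFraction) {
        return fractionWithin(responseTimesMillis, thresholdMillis) >= requiredFraction;
    }
}
```

If nine of ten sampled requests finish within 2000 ms, a "90% within 2 seconds" goal is consistent with the data, whereas a "90% within 1 second" goal is one the run time cannot reach by adding resources alone.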
Applications, especially those with higher priority, must scale. A dynamic environment, such as WebSphere XD or WebSphere on z/OS, will start multiple instances of the application to try to meet specified workload goals. If the application does not scale well, resources will be inefficiently provisioned as these additional application instances are created. If a bottleneck exists in the application (for example, all transactions exclusively access a single row in a table), these additional application server instances might not help the run time meet the stated service goals. Instead, the additional server instances waste system resources which in turn deprive other applications on the system of those resources.
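The cost of such a bottleneck can be estimated with Amdahl's law, a general scalability model that is not specific to XD. If a fraction s of each request must be serialized (for example, time spent holding the lock on a single row), the speedup from n instances is bounded by 1 / (s + (1 - s) / n). The class name `ScalingEstimate` below is hypothetical, and the serialized fraction is something you would have to measure for your own application.

```java
// Amdahl's law: upper bound on speedup when part of the work is serialized.
final class ScalingEstimate {
    private ScalingEstimate() {}

    // serialFraction: portion of each request that cannot run in parallel (0.0 to 1.0)
    // instances:      number of application instances started
    static double speedup(double serialFraction, int instances) {
        if (serialFraction < 0.0 || serialFraction > 1.0 || instances < 1) {
            throw new IllegalArgumentException("serialFraction must be in [0,1] and instances >= 1");
        }
        return 1.0 / (serialFraction + (1.0 - serialFraction) / instances);
    }
}
```

With half of each request serialized, four instances yield at most a 1.6x improvement, and no number of instances can exceed 2x. The resources consumed by the extra instances are therefore largely wasted.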
You also need to ensure that those applications which consume resources are properly charged for that consumption. In a static middleware environment, hardware resources are dedicated to some particular set of applications; therefore, the cost of managing those hardware resources is concrete and easily quantified. In a virtualized environment, in which hardware resources are shared across many applications, quantifying the costs for those resources is more complex. You must determine how to calculate resource usage for each deployed application, and then charge the application owners for that usage. WebSphere XD, WebSphere on z/OS, and other advanced goals-oriented run time environments provide chargeback functions for quantifying the costs of applications.
Finally, a shared-resource middleware infrastructure must be resilient, where resiliency is defined as “the continued availability and performance of a service despite negative changes in its environment”. The failure and recovery of that infrastructure can negatively impact all other co-located applications. For example, resources such as CPU cycles and memory are required to recover failed applications and servers; these resources are taken from workloads of lower priority. Using WebSphere XD Operations Optimization, you can improve the resiliency of the middleware infrastructure by monitoring its health and by using the application versioning facilities.
For example, you can alleviate known health issues among applications, such as memory leaks or runaway CPU usage, by defining health monitoring policies and actions. If there is a memory leak, you can define a health policy that will restart the server if the server's JVM heap size exceeds some threshold. For more information on the topic of SOA resiliency, see the Build a resilient SOA infrastructure article series.
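Health policies in XD are defined administratively rather than in application code, but the condition behind a heap-based policy is easy to picture. The sketch below only illustrates that condition using the standard `java.lang.management` API; the class `HeapThresholdCheck` and the threshold value are hypothetical, and it does not show how XD evaluates or acts on its policies.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustration of a heap-threshold condition (not XD's health policy engine).
final class HeapThresholdCheck {
    private HeapThresholdCheck() {}

    // True if used heap is above thresholdPercent of the maximum heap.
    static boolean exceeds(long usedBytes, long maxBytes, int thresholdPercent) {
        if (maxBytes <= 0) {
            return false; // MemoryUsage reports -1 when the maximum is undefined
        }
        return usedBytes * 100.0 / maxBytes > thresholdPercent;
    }

    // Applies the same condition to the JVM this code is running in.
    static boolean currentJvmExceeds(int thresholdPercent) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return exceeds(heap.getUsed(), heap.getMax(), thresholdPercent);
    }
}
```

A monitoring component that observed `exceeds(...)` returning true for a sustained period could then trigger the corrective action, such as a server restart.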
Before deploying your applications to a goals-oriented, shared resource run time, consider each of the areas summarized below. Each item is described in more detail following this checklist.
- Configure the initial and default WebSphere XD Infrastructure.
- Certification that the overall infrastructure adheres to the non-functional business requirements (high-availability, scalability, security, and so on).
- Performance and linear scalability tests for critical applications have been executed.
- All monitoring data can be collected and the monitoring infrastructure is in place.
- Service level policies have been defined for all applications.
- Chargeback infrastructure has been defined and implemented.
- Deployment processes have been defined.
- Health policy procedures have been defined.
- Application logging infrastructure is enabled.
- The migration strategy for the server infrastructure and the business applications has been defined.
Perform the necessary steps, described in the IBM Redbook Optimizing Operations with WebSphere Extended Deployment V6.1, to install and configure a basic WebSphere XD infrastructure. The essential steps are:
- Install the On-Demand Routers.
- Create the needed nodegroups.
- Create dynamic clusters.
- Install applications to the dynamic clusters.
- Create initial service policies (if needed/known).
- Define initial transaction classes (if needed/known).
- Define initial work classes (if needed/known).
Certify that the overall infrastructure provides the qualities of service specified by your business requirements, including the following elements:
- High availability of:
- The end-to-end architecture.
- Each application. Ensure the application can run on any member in the node group, keeping in mind external dependencies, and so on.
- Components within each of these tiers: web, application, and data.
- Every application, or at least the major ones.
- The On-Demand Router (ODR).
- The dynamic clusters and the node group(s).
- WebSphere components, such as ODRs, the Deployment Manager, and Node Agents.
- Security components, including authentication mechanisms such as LDAP.
- Singleton application components.
- Dependent subsystems such as DB2 and MQ.
Performance and linear scalability tests for the highest priority applications are critical. As you move applications to a shared runtime environment, poorly performing applications will adversely affect applications of lower priority.
- Test performance and scalability of critical applications.
- Test performance and scalability of the fully-configured On-Demand Router.
- Understand the impacts of horizontal and vertical scaling on connection pooling. For example, multiple applications clustered in the same dynamic cluster could over-provision connections based on pool configurations.
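The connection pooling concern comes down to simple arithmetic: each server instance has its own pool, so the worst-case demand on the database grows with every instance that is started. The following sketch uses a hypothetical class `ConnectionDemand`; the node counts, instance counts, and pool sizes are illustrative values, not recommendations.

```java
// Worst-case connection demand when every instance fills its own pool.
final class ConnectionDemand {
    private ConnectionDemand() {}

    // nodes:                  machines (or OS images) in the dynamic cluster
    // maxInstancesPerNode:    maximum instances started per node (vertical scaling)
    // maxPoolSizePerInstance: maximum connections in each instance's pool
    static long worstCase(int nodes, int maxInstancesPerNode, int maxPoolSizePerInstance) {
        return (long) nodes * maxInstancesPerNode * maxPoolSizePerInstance;
    }

    // True if the worst case stays within the connection limit of the database.
    static boolean fitsWithin(long databaseConnectionLimit,
                              int nodes, int maxInstancesPerNode, int maxPoolSizePerInstance) {
        return worstCase(nodes, maxInstancesPerNode, maxPoolSizePerInstance) <= databaseConnectionLimit;
    }
}
```

For example, 4 nodes with up to 3 instances each and a maximum pool size of 50 can open 600 connections, which would exceed a database limit of 500 even though each individual pool looks modest.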
You need to collect application usage data (with a monitoring tool such as ITCAM) for every application to be deployed into the XD run time in order to be sure that service policies are accurately set. This collection must be easily repeatable and adjustable to handle newly deployed applications.
Ensure that service policy data has been defined and configured within the runtime environment. Service policies are especially important because they will impact the runtime behavior of the overall middleware infrastructure when the system is under load.
- Verify the accuracy of the service policies for critical applications, at a minimum.
- Verify the process of adjusting and/or defining new policies is in place.
For more information, see Chapter 1. Service level optimization in IBM Redbook: Best Practices for Implementing WebSphere Extended Deployment.
You need to define the chargeback infrastructure for all applications. You might be using department owned, dedicated hardware for each application. When moving to a shared workload environment, you use chargeback policies for usage of applications to bill the appropriate owning department.
- Define fine-grained XD Transaction and Work Classes for each application to be deployed. Also, define some proper naming convention for these transaction classes.
- Define the mapping of transaction classes to owning departments for all applications.
- Determine if you need to develop tooling to map XD chargeback data to the owning departments.
- Define business policies for applying chargeback.
For more information, see Chapter 2. Application hosting and chargeback in IBM Redbook: Best Practices for Implementing WebSphere Extended Deployment.
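If you do need tooling to map chargeback data to owning departments, its core is a roll-up of usage by transaction class. The sketch below is a minimal illustration: the class `ChargebackRollup`, the transaction class names, and the input shape (a map of transaction class to consumed units) are assumptions and do not reflect the format of the data that XD produces.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical roll-up: sums usage per owning department using a
// transaction-class-to-department mapping.
final class ChargebackRollup {
    private ChargebackRollup() {}

    // usageByTxClass: transaction class name -> consumed units (for example, CPU seconds)
    // deptByTxClass:  transaction class name -> owning department
    static Map<String, Double> byDepartment(Map<String, Double> usageByTxClass,
                                            Map<String, String> deptByTxClass) {
        Map<String, Double> totals = new TreeMap<>();
        for (Map.Entry<String, Double> entry : usageByTxClass.entrySet()) {
            // Usage with no mapping is kept visible instead of being silently dropped.
            String dept = deptByTxClass.getOrDefault(entry.getKey(), "UNMAPPED");
            totals.merge(dept, entry.getValue(), Double::sum);
        }
        return totals;
    }
}
```

A consistent naming convention for transaction classes (for example, a prefix per application) keeps the mapping table small and makes unmapped usage easy to spot.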
When you deploy new applications to the XD run time, you need to do the following (at a minimum):
- Define new service policies, and adjust existing application priorities.
- Define new chargeback policies, or add the new application to existing chargeback policies.
- Ensure continuous availability of the application during rollout. This can be achieved with the Application Editions feature of XD Operations Optimization.
- Create a strategy for rolling back to a working version of the application, in case of failure (validation of the application compared to rolling out previous edition). The Application Editions feature of WebSphere XD Operations Optimization can also help achieve this objective.
The non-functional business requirements will require some level of infrastructure resiliency, which can be achieved with health policies and monitoring delivered by XD Operations Optimization. Health policies should be defined as well as their corresponding actions (such as how the system should react to health alerts). At the very minimum, consider the health of:
- Critical applications.
- Critical components that can impact critical applications.
- The overall middleware infrastructure.
Make sure your logging infrastructure can tolerate vertical scalability, horizontal scalability, and co-location of multiple applications in the same JVM.
- Application logging must tolerate multiple instances using the same logging file system (vertical scalability).
- Logging must tolerate instances across the cluster logging at the same time (horizontal scalability).
- Multiple applications running within the same JVM must log carefully (not arbitrarily printing to system-out, for example) such that the output logs can be easily used for debugging.
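One way to keep shared logs usable is to tag every entry with the server instance and the application that produced it. The following is a minimal sketch using the standard `java.util.logging` API; the class `TaggedLogFormatter` and the way the server and application names are supplied are assumptions, and it is not a description of the logging facilities built into the application server.

```java
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

// Hypothetical formatter: prefixes each log line with the server instance and
// application name so that interleaved output can be separated later.
final class TaggedLogFormatter extends Formatter {
    private final String serverName;
    private final String appName;

    TaggedLogFormatter(String serverName, String appName) {
        this.serverName = serverName;
        this.appName = appName;
    }

    @Override
    public String format(LogRecord record) {
        // formatMessage applies any parameters or localization to the raw message.
        return "[" + serverName + "][" + appName + "] "
                + record.getLevel() + " " + formatMessage(record)
                + System.lineSeparator();
    }
}
```

The formatter would be attached to a handler with `handler.setFormatter(new TaggedLogFormatter("server1", "orders"))`, after which each line begins with a prefix such as `[server1][orders] INFO`.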
Address these areas in your migration strategy:
- Applications. Decide how to manage the continuous availability of the application during production. With the application editions feature, determine:
- Is a group rollout sufficient? Group rollout ensures that application requests are served by some part of the cluster while specific cluster members are in the process of changing application editions.
- Is an atomic rollout required? Atomic rollout forces all cluster members to upgrade to the next version of the application together. This action is essential when some backwards-incompatible change (for example, a new database schema) is introduced.
To improve the stability of the SOA infrastructure, implement aggressive timers to govern the service invocations. This solution is especially appropriate in the area of business applications that perform service invocations over blocking protocols such as RMI/IIOP. In situations where such business applications and the services they invoke cannot be co-located in the same system, you can implement service invocation timers to alleviate the problems that occur when the invoked services are non-responsive (due to an unreliable network, for example). Please see Build a resilient SOA infrastructure, Part 1: Why blocking application server threads can lead to a brittle SOA.
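A service invocation timer can be sketched with the standard `java.util.concurrent` API: the call runs on a separate thread and the caller waits only for a bounded time. This is a minimal illustration, not the approach prescribed by the referenced article; the class `TimedInvoker` is hypothetical, and the executor is assumed to be supplied by the caller (in an application server, threads should normally come from a container-managed pool).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper: bounds how long a caller waits for a blocking service call.
final class TimedInvoker {
    private final ExecutorService executor; // assumed to be provided by the caller
    private final long timeoutMillis;

    TimedInvoker(ExecutorService executor, long timeoutMillis) {
        this.executor = executor;
        this.timeoutMillis = timeoutMillis;
    }

    // Runs the call on the executor and waits at most timeoutMillis for its result.
    <T> T invoke(Callable<T> serviceCall)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = executor.submit(serviceCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Request interruption of the worker so it does not stay blocked indefinitely.
            future.cancel(true);
            throw e;
        }
    }
}
```

The caller decides what a `TimeoutException` means for the business operation, for example returning a degraded response or failing fast. Either way, the calling thread is released instead of remaining blocked on a non-responsive service.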
The author wishes to thank Nitin Gaur, IBM Americas Techworks, for this description of hardware, software, and WebSphere XD virtualization.
This scenario would include technologies such as Solaris Zones and Logical Partitioning (LPARs) on AIX5L. The platforms address the needs of a growing data center and bring forward the efficient use of consolidated and powerful hardware resources. The core principle offered by these technologies is resizing based on capacity. While these technologies have largely addressed the growing needs of a data center by efficiently managing power and space (and hence the costs), at the core of the value proposition lies the appeal of business continuity. The hardware system resources (such as RAM, CPU, network I/O, and so on) can be easily added to or removed from a logically defined partition, based on capacity, without any interruptions such as system restarts (reboot due to resource constraints). Hence, by providing this level of virtualization at the hardware level, the higher-level applications (any application), unaware of lower-level dynamic resource allocation, can operate and use resources provided by the operating system.
This is enabled by products such as VMware, which virtualize the run time and essentially allow for portability and ease of management (with factory-like deployments). So while a "virtual run time" does not provide any resource pooling or optimization features, it can be very instrumental in scenarios that require fast turnaround times, such as product demos, lab environments, and so on. This "rip-and-replace" appeal of VMware products has also found a place in large enterprise deployments, essentially to keep support costs in check. By virtualizing a run time, the virtualized images can run anywhere in the infrastructure on any hardware and software platform, because the virtual run time is defined and is usually consistent across the enterprise.
Note: With the release of technologies such as VMware ESX Server, we have seen developments and offerings by VMware that are creeping into areas of virtualizing the hardware resource pools (as discussed above in Hardware virtualization).
WebSphere XD, on the other hand, plays a role only in virtualizing the application runtime environment. So, in theory, it acts and plays above the hardware and VMware virtualization levels, and allows the user to have a policy-driven approach to "sense and respond" type of management capability. The key factor that separates WebSphere XD from other types of virtualization is resource allocation based on goals-oriented and business policy-driven workloads.
WebSphere XD enables an infrastructure that accommodates traditional J2EE/JEE type transactional requests and non-traditional Java batch/long running applications. The virtualized resource pool enabled by WebSphere XD is primarily used to drive business activity and to prioritize resource allocation to activities of higher economic value to the business. This model differs from the above mentioned scenarios. Scenario 1 (hardware virtualization) is blind to the types of applications running. In scenario 2, only the run time is virtualized to maximize hardware investments and to curb support costs.
Note: So, while there may be overlapping use of the term virtualization, the value that is added differs significantly. In an environment which has a virtualized hardware platform, it is safe to say WebSphere XD will play well, because it will only have visibility to resources available to it (provided by a defined allocation policy).
The adoption of a virtualized, goals-oriented run time such as WebSphere XD's Operations Optimization requires careful planning and preparation. The topics described in this article provide a starting point to help to ensure success.
Blog: News and Thoughts from the WebSphere Extended Deployment Senior Architecture Team
Building resilient SOA infrastructures article series on developerWorks
IBM Redbook: Optimizing Operations with WebSphere Extended Deployment 6.1
IBM Redbook: Best Practices for Implementing WebSphere Extended Deployment. See Chapter 1. Service level optimization and Chapter 2. Application hosting and chargeback.
The role of the ODR, from the WebSphere Extended Deployment InfoCenter
WebSphere Extended Deployment, developerWorks resource page
WebSphere Extended Deployment product documentation
Snehal Antani works for the SOA Technology Practice within IBM Software Services for WebSphere (ISSW) and is the technical lead for IBM WebSphere Extended Deployment. He comes from a development background, working on several products including WebSphere z/OS, WebSphere XD-Distributed, and WebSphere XD-z/OS, and has helped bring to production some of IBM’s largest WebSphere Distributed and z/OS customers around the world. He has disclosed several patents and technical publications in the domains of enterprise application infrastructure and grid computing. He earned a BS in computer science from Purdue University and will complete his MS in computer science from Rensselaer Polytechnic Institute (RPI) in Troy, NY with a thesis in the area of quantifying and improving the resiliency of a middleware infrastructure.