Enterprise hosting centers face growing challenges in meeting Service Level Agreements (SLAs) while carefully managing infrastructure costs. This series explores the need for, and implementation of, autonomic computing methodologies. You'll learn how to optimize resource usage and better control costs by following this team's success. The IBM intranet portal team chose IBM WebSphere Extended Deployment to upgrade IBM's internal enterprise applications infrastructure. This series discusses the architecture, deployment model, and lessons learned from the deployment of WebSphere Extended Deployment and application resiliency design patterns.
If you're a CIO, IT Architect, or IT Infrastructure Manager looking to deliver better resiliency, reduce costs, and increase application availability, then this article is for you. By introducing autonomic computing into your environment and using the features of WebSphere Extended Deployment, you can optimize your resource usage and better control infrastructure costs. You can also improve your ability to meet SLAs.
This article explains the benefits of IBM's recent deployment of WebSphere Extended Deployment into its intranet production environment. It outlines the business problems and solutions that guided the team's selection of products. To better understand this journey into autonomic computing technology, we'll first give an overview of the environment and the business requirements behind the project.
IBM employees use the corporate intranet portal, or set of applications, to work, collaborate, and learn in an On Demand Workplace environment. (See the On Demand Workplace series for more information.)
The intranet portal is based on dynamic infrastructure, and provides profiled delivery of content and access to new tools and applications to help employees manage their workflow, collaborate, and gain knowledge. The intranet portal is a framework where many business-critical applications are aggregated using WebSphere Portal, WebSphere Extended Deployment, Tivoli® products, DB2®, and more.
The IBM intranet infrastructure handles high volumes of traffic, averaging 30 million requests a day, while maintaining sub-second transaction response times for many applications. Enterprise applications within the On Demand Workplace share common services and hosting, with dedicated systems for high-volume and critical applications. Some applications, for example, had a dedicated infrastructure to support the peak load they receive between 10 a.m. and 2 p.m. every day.
Special events, such as executive webcasts or collaborative online jams, drove new capacity requirements. A side effect of deploying applications on dedicated hardware was an unacceptable amount of white space, with low utilization on some systems. And because key services such as the common front-end proxy infrastructure, back-end DB2 resources, and Lightweight Directory Access Protocol (LDAP) directory lookups were shared, problems in any particular service layer affected the overall availability of most applications.
A comprehensive strategy was needed to solve the following business problems.
- Capacity sizing
  - Over-provisioning of infrastructure, indeterminate sizing methods, peak traffic during special events, and costly capacity sizing processes
  - Low resource utilization, with excess white space in CPU and memory
- System administration
  - Manual and error-prone administrative methods for creating additional resources, slow response in procuring additional capacity and deploying applications, and end-to-end monitoring showing SLAs not being met
  - Cascading failures with no application isolation or control, challenging availability expectations
- Cost concerns
  - Hosting costs, maintenance, and support
To solve these problems, the team explored using autonomic computing principles and IBM's new breed of products, such as Virtualization Engine (VE), WebSphere Extended Deployment, Enterprise Workload Manager (eWLM), and Tivoli Intelligent Orchestrator. With the goals of shared infrastructure and dedicated resources at the portfolio level, the team examined the pros and cons of deploying all intranet-related applications into a virtualized environment and sharing resources across all applications. The latest release of WebSphere Extended Deployment would solve many of the business problems.
WebSphere Extended Deployment Version 6.0 enables a Business Grid, a dynamic, goals-directed, high-performance application environment for running mixed application types and workload patterns in WebSphere applications. This technology extends the capabilities of the WebSphere platform, helping you deal with IT scalability and performance challenges.
When deploying WebSphere Extended Deployment, our goal was to solve the problems, using the associated solutions, outlined in Table 1.
Table 1. Problems and solutions
| Problem | Solution |
| --- | --- |
| Lack of application resiliency | Allow intranet applications to recover from unexpected problems, and provide high availability with the WebSphere Extended Deployment On Demand Router (ODR) layer. |
| Inefficiency | Increase server utilization by virtualizing computing resources into pools that are shared among applications and portfolios based on application criticality and on-demand usage. Achieved by combining separate WebSphere clusters into a single large cluster and assigning proper service policies. |
| High cost | Server consolidation and reduced system administration provide cost savings without compromising the capacity of any intranet portfolio application. Consolidating all hosted WebSphere clusters across business organizations into a single entity drives down overall hardware costs by 25-35%. |
| Poor autonomic computing | Provision and deprovision WebSphere Application Server instances for an application based on resource usage and peak demand, letting other critical applications use the virtual resources that WebSphere Extended Deployment frees up. |
| Poor monitoring | Monitor all portfolio applications in a single dashboard, making it easier to see the big picture and pinpoint problems quickly in an environment with numerous applications, leading to higher availability. |
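The provisioning and deprovisioning behavior described above can be pictured as a simple threshold rule. The following Python sketch is illustrative only; the function name, thresholds, and return values are our assumptions, not WebSphere Extended Deployment's actual placement algorithm or API.

```python
# Illustrative sketch of a threshold-based provisioning decision of the kind
# WebSphere Extended Deployment automates. All names and thresholds here are
# hypothetical, chosen only to show the idea.

def provisioning_action(cpu_utilization, instances, min_instances=1,
                        scale_up_at=0.80, scale_down_at=0.30):
    """Decide whether to provision or deprovision a server instance.

    cpu_utilization: average CPU load across the cluster (0.0-1.0).
    instances: number of server instances currently running.
    """
    if cpu_utilization >= scale_up_at:
        return "provision"       # peak load: borrow capacity from the shared pool
    if cpu_utilization <= scale_down_at and instances > min_instances:
        return "deprovision"     # white space: return capacity to the pool
    return "hold"

print(provisioning_action(0.85, 3))  # provision
print(provisioning_action(0.20, 3))  # deprovision
print(provisioning_action(0.50, 3))  # hold
```

In the real product, this decision is driven by service policies and observed demand rather than a single CPU threshold, but the effect — capacity flowing to where it is needed — is the same.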
In the past, the IBM intranet WebSphere infrastructure included multiple dedicated WebSphere clusters. The rationale for dedicated clusters was the criticality of the applications, a need to isolate applications from one another, and the load some applications carried. Dedicated clusters made the cost of the entire portfolio infrastructure very high because of extra maintenance and required hardware. The actual capacity was usually more than the portfolio needed, and considerable white space existed on all the dedicated servers provisioned to support peak traffic requirements. With WebSphere Extended Deployment, the team was assured of resource availability for critical applications in an all-shared hosting environment, along with a reduction in total hardware resources.
A typical WebSphere Extended Deployment architecture places the ODR component in front of the back-end application servers (both WebSphere and non-WebSphere application servers are supported). The ODR weighs each type of URI and calculates how many resources it typically needs for processing. It also detects when an application is consuming more of a given resource than it should, thereby identifying application failures that need corrective action.
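To make the ODR's URI weighing concrete, here is a hypothetical Python model of per-URI cost estimation using an exponentially weighted moving average (EWMA). The class name, smoothing factor, and outlier rule are our assumptions for illustration; the product's actual routing and failure-detection logic is more sophisticated.

```python
# Hypothetical model of how an On Demand Router might weigh URIs: keep an
# exponentially weighted moving average (EWMA) of each URI's service time,
# and flag requests that consume far more than their historical estimate.

class UriCostEstimator:
    def __init__(self, alpha=0.2, outlier_factor=5.0):
        self.alpha = alpha                  # EWMA smoothing factor
        self.outlier_factor = outlier_factor
        self.cost = {}                      # URI -> estimated cost (msec)

    def observe(self, uri, service_ms):
        """Fold one observed service time into the URI's running estimate."""
        old = self.cost.get(uri, service_ms)
        self.cost[uri] = (1 - self.alpha) * old + self.alpha * service_ms

    def estimate(self, uri, default=100.0):
        return self.cost.get(uri, default)

    def looks_unhealthy(self, uri, service_ms):
        # A request far above its historical cost may signal an
        # application failure that needs corrective action.
        return service_ms > self.outlier_factor * self.estimate(uri)

est = UriCostEstimator()
for t in (90, 110, 95, 105):                 # sample service times in msec
    est.observe("/bluepages/search", t)
print(round(est.estimate("/bluepages/search")))        # 96
print(est.looks_unhealthy("/bluepages/search", 2000))  # True
```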
Figure 1 shows our WebSphere Extended Deployment architecture.
Figure 1. WebSphere Extended Deployment Architecture diagram
A typical flow through the architecture in Figure 1:
- Application clients such as Web browsers make calls to URLs.
- WebSphere Caching Proxy server either serves the response from its local cache for static content, or forwards the request to the Web servers or ODRs.
- WebSphere Caching Proxy rules decide whether the request goes to the IBM HTTP Server cluster, or ODR cluster, depending on whether the URL is a static or dynamic request.
- If the request is a dynamic servlet request, the ODR, acting as a proxy server, decides which back-end node's application server should handle it. The decision is based on data the ODR captures regularly about resource utilization on the back-end nodes, and on how much capacity (such as CPU cycles) the URL or transaction might require for processing.
- The run-time reports and charts in the WebSphere Extended Deployment administrative console show extensive information on the run-time status of all back-end nodes and the performance of the ODRs. Performance information at the transaction or URI level helps with problem determination. For example, if a URI that involves LDAP calls slows down considerably, the response times on the console's charts tell developers or administrators that there is probably an LDAP issue at that moment.
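The static-versus-dynamic routing decision in the flow above can be sketched as a simple rule. The suffix list, function, and URLs below are hypothetical stand-ins for the actual WebSphere Caching Proxy rule configuration.

```python
# A simplified model of the request flow above (assumed rules, not the actual
# Caching Proxy configuration): cached static content is served locally,
# other static content goes to the HTTP server cluster, and dynamic servlet
# requests go to the ODR tier.

STATIC_SUFFIXES = (".html", ".css", ".js", ".gif", ".png", ".jpg")

def route(url, cache):
    if url in cache:
        return "cache"                    # served from the Caching Proxy cache
    if url.endswith(STATIC_SUFFIXES):
        return "http-server-cluster"      # static: IBM HTTP Server cluster
    return "odr-cluster"                  # dynamic: On Demand Router tier

cache = {"/w3/banner.gif"}
print(route("/w3/banner.gif", cache))             # cache
print(route("/w3/styles.css", cache))             # http-server-cluster
print(route("/bluepages/servlet/search", cache))  # odr-cluster
```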
Making sure that critical applications had the required resources, when needed, was a high-priority goal for our team. A predictive feature of WebSphere Extended Deployment is the use of service policies. This section briefly describes WebSphere Extended Deployment service policies, and how the policies helped meet the goal.
As mentioned in Business problems, critical applications ran on dedicated infrastructure, and many servers were greatly underutilized during non-peak hours; addressing both is a great benefit of WebSphere Extended Deployment. WebSphere Extended Deployment can govern all the critical applications by providing them the required resources when they're needed. This becomes very important when there is resource contention between heavily used, less-critical applications and very critical applications. When machines are overused, critical applications are serviced, while applications given lesser precedence receive fewer resources.
Service policies are a factor in controlling application placement, which lets administrators prioritize work and define the business or performance goals of applications. Like other autonomic computing designs, WebSphere Extended Deployment includes policy definitions. Architects, working with business teams, define the application's business goals within the service policies by assigning the importance of each application and creating a business goal. Service policies work with the ODR to meet the application's business goals. This information is fed into WebSphere Extended Deployment, which uses it to protect critical application resources in shared environments, allowing the infrastructure to still achieve the desired SLAs.
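A service policy can be thought of as an importance level plus a performance goal. The following sketch models that idea in Python; the class, the numeric importance weights, and the ranking rule are illustrative assumptions, not the product's implementation.

```python
# A hedged sketch of what a service policy captures: an importance level and a
# response-time goal per application. Under contention, more important work
# (and, within a level, work with tighter goals) is served first.

from dataclasses import dataclass

IMPORTANCE = {"platinum": 4, "gold": 3, "silver": 2, "bronze": 1}

@dataclass(frozen=True)
class ServicePolicy:
    name: str
    importance: str          # platinum, gold, silver, or bronze
    goal_ms: int             # target average response time, in msec

def rank_for_resources(policies):
    """Order applications by the priority they'd get under resource contention."""
    return sorted(policies,
                  key=lambda p: (-IMPORTANCE[p.importance], p.goal_ms))

policies = [
    ServicePolicy("reporting", "bronze", 5000),
    ServicePolicy("bluepages-search", "platinum", 500),
    ServicePolicy("news-feed", "silver", 1500),
]
print([p.name for p in rank_for_resources(policies)])
# ['bluepages-search', 'news-feed', 'reporting']
```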
Figure 2 shows the different service policies created for the intranet portal, and their importance levels. During application boarding, each application owner or architect answers questions that help the CIO Technology Team decide which service level is right for a given application. The questions cover the business criticality of the application, the revenue it generates, and technical information such as a list of transactions and the expected average response time of each transaction.
Figure 2. Applications and service level mapping
Figure 3 shows the types of service policies and the ranges of response times for the transactions that fall under a given service level. For example, sample application A, identified as Platinum, could have three transactions, each falling into a different response-time bucket. Transaction 1 URIs are expected to give an average response time of 500 msec, transaction 2 falls into the 1500 msec bucket, and transaction 3 into the 3000 msec bucket.
Figure 3. Service levels and response time bucket mapping
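The bucket mapping just described can be expressed as a small lookup. The URIs below are hypothetical; only the 500/1500/3000 msec goals come from the example above.

```python
# Illustrative mapping in the spirit of Figure 3: each transaction of a
# hypothetical Platinum application A falls into one of three
# response-time buckets.

BUCKETS_MS = (500, 1500, 3000)   # average response-time goals, in msec

app_a_transactions = {            # hypothetical URIs for sample application A
    "/appA/transaction1": 500,
    "/appA/transaction2": 1500,
    "/appA/transaction3": 3000,
}

def bucket_for(expected_ms, buckets=BUCKETS_MS):
    """Pick the tightest bucket that still accommodates the expected time."""
    for goal in buckets:
        if expected_ms <= goal:
            return goal
    return buckets[-1]            # slower than all goals: loosest bucket

print(bucket_for(400))    # 500
print(bucket_for(1200))   # 1500
print(bucket_for(2500))   # 3000
```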
This section briefly introduces WebSphere Extended Deployment health policies. WebSphere Extended Deployment provides a health management and monitoring system. While the WebSphere Extended Deployment environment saves infrastructure costs by increasing server resource utilization, it also encourages shared infrastructure for applications. Some applications occasionally behave oddly and suddenly overuse resources such as CPU or memory; under certain conditions, some applications exhibit memory leaks. WebSphere Extended Deployment's health management protects applications and infrastructure from such common scenarios using health policies.
A health policy configuration defines preventative and detection-based policies to ensure the vitality of your server environment; when a policy is violated, a server restart can be used to flush out the environment. WebSphere Extended Deployment provides four different types of health conditions that can be monitored, including:
- Excessive memory condition
- Excessive response time condition
Depending on the configured reaction mode, WebSphere Extended Deployment can, for example, simply monitor the condition and send event notification e-mail to the developers or administrators. In "supervise" mode, WebSphere Extended Deployment creates corrective actions that must be approved by an administrator. In "automatic" mode, it can take actions automatically, such as restarting the server or taking thread dumps. For IBM intranet applications, the excessive memory usage health policy is configured.
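A minimal model of an excessive-memory health policy with the reaction modes just described might look like this in Python. The threshold, mode names, and return values are assumptions modeled on the description above, not the product's configuration model.

```python
# Hedged sketch of an excessive-memory health policy with three reaction
# modes: monitor (notify only), supervise (propose action for approval),
# and automatic (act without approval). Threshold is hypothetical.

def check_memory(heap_used_pct, mode, threshold=85):
    """Return the action a health controller might take for one server."""
    if heap_used_pct <= threshold:
        return "healthy"
    if mode == "monitor":
        return "notify"              # e-mail developers/administrators only
    if mode == "supervise":
        return "propose-restart"     # corrective action awaits admin approval
    if mode == "automatic":
        return "restart-server"      # restart (or take dumps) automatically
    raise ValueError(f"unknown reaction mode: {mode}")

print(check_memory(70, "automatic"))   # healthy
print(check_memory(92, "monitor"))     # notify
print(check_memory(92, "supervise"))   # propose-restart
print(check_memory(92, "automatic"))   # restart-server
```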
This first part of our series introduced how the IBM intranet portal team strove to achieve autonomic computing in an on demand environment. The article explained the business problems and solutions outlined by the team, and their rationale for choosing WebSphere Extended Deployment to solve specific problems within the intranet framework. We also introduced the high-level architecture and some of the cost-saving features of WebSphere Extended Deployment.
The next installment in our series will go into details about WebSphere deployment tasks, and will provide lessons learned and best practices. A subsequent article will discuss some of the autonomic application resiliency design patterns for application developers. Stay tuned!
The authors would like to thank Brian K. Martin, Anthony R. Tuel, Wolfgang Segmuller, Priyanka Jain, and Keith Smith from the IBM WebSphere Software Group for their help in deploying WebSphere Extended Deployment in the IBM intranet infrastructure.
Mahi R. Inampudi is the lead IT architect for IBM's On Demand Workplace expertise location system (BluePages). Other responsibilities include the architecture and solution design for several of IBM's internal offerings and collaborating with the CIO office and IBM Research helping design applications using the latest SOA methods. Recent interests include leveraging newer technologies, such as WebSphere Extended Deployment, the Rational product suite, and IBM's intraGrid architecture.
Murali Narasimhadevara is a Senior IT architect with the IBM CIO office. Murali is also the Senior Webmaster for the IBM intranet, and has been helping develop it for the past eight years. He has extensive experience in building and managing high volume Web sites, application and Web server administration with a focus in WebSphere, performance/capacity planning, and enterprise application design. His areas of interest are in autonomic and utility computing for managing Web infrastructures.