Application elasticity is a key aspect of a cloud deployment, enabling you to scale in (down) or out (up) the amount of processing available for a given application to match user load. In IBM Workload Deployer (IWD), Virtual Applications can quickly and easily offer elasticity through the use of policies.
Policies in a Virtual Application enable you to dynamically vary the number of instances of your application that are available to process work, with IWD making decisions about when to scale in or out your application based on performance metrics and criteria that you specify. Elasticity introduces some important considerations for your application:
- What happens to my users when they are mid-transaction and the instance they are working with is removed (scaled in)?
- Now that I have a dynamic number of instances, how do my users know which instance to connect to?
To support elasticity, IWD provides two shared services: a proxy service and a caching service. The proxy service lets you assign a virtual hostname to your application (for example, myapp.xyz.com) and manages the onward connections to your variable number of application instances. The caching service provides transparent storage and migration of HttpSession data between instances, so that the user's state is maintained whilst scaling actions take place. (For the proxy service, IWD assumes there is a load balancer within the DMZ which forwards the virtual hostname on to the proxy instances.) To use either shared service, it must first have been started from the Cloud -> Shared Services menu option in the GUI.
Setting up a routing policy is very simple: just click the + icon at the top of your application component (EAR/WAR/EBA) and click Routing Policy (note that you can also add policies at the application level if you prefer). Once the policy has been added, select it and then set its properties in the right-hand pane:



With routing configured, we can now add a scaling policy in the same way. By default, when you add a scaling policy, the Static policy is selected. The Static policy provides a fixed number of instances of your application, which gives you fault tolerance and high availability, but does not provide elasticity.
The first of the elastic policies is CPU Based:

Let's use the settings above to illustrate in detail. From our initial state with one instance, if CPU utilisation exceeds 80% at every sample for 120 seconds, a scale out action is triggered and a new application instance is created. It's important to note that once a scaling action is initiated, all triggers are suspended until the scaling action has fully completed (i.e. the new application instance is in service, or the old one is completely removed). Once the scale out action has completed, we have two application instances, so at each monitoring interval the CPU usage of the instances is averaged and compared against the policy. If the average is consistently below the specified range and there are more instances than the minimum, a scale in action is triggered; if it is consistently above the range and there are fewer instances than the maximum, another scale out action is triggered.
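To make that trigger logic concrete, here is a minimal, illustrative sketch of the kind of decision made at each monitoring interval. This is not IWD's implementation; the scale-in threshold of 20% and the instance limits are assumptions chosen to fit the example above.

```java
// Illustrative sketch only -- not IWD's actual implementation.
// Models the decision made at each monitoring interval for a CPU-based
// scaling policy with an 80% scale-out threshold, an assumed 20% scale-in
// threshold, a 120-second sustain period, and assumed min/max instance counts.
public class CpuScalingSketch {

    static final double SCALE_OUT_THRESHOLD = 80.0; // percent (from the example)
    static final double SCALE_IN_THRESHOLD  = 20.0; // percent (assumed value)
    static final int MIN_INSTANCES = 1;              // assumed
    static final int MAX_INSTANCES = 10;             // assumed
    static final int SUSTAIN_SECONDS = 120;          // from the example

    int secondsAboveRange = 0;
    int secondsBelowRange = 0;
    boolean scalingInProgress = false;

    // Called once per monitoring interval with the CPU utilisation of each instance.
    void onSample(double[] cpuPerInstance, int intervalSeconds) {
        if (scalingInProgress) {
            return; // triggers are suspended until the scaling action completes
        }
        double average = java.util.Arrays.stream(cpuPerInstance).average().orElse(0.0);
        int instances = cpuPerInstance.length;

        if (average > SCALE_OUT_THRESHOLD) {
            secondsAboveRange += intervalSeconds;
            secondsBelowRange = 0;
        } else if (average < SCALE_IN_THRESHOLD) {
            secondsBelowRange += intervalSeconds;
            secondsAboveRange = 0;
        } else {
            secondsAboveRange = 0;
            secondsBelowRange = 0;
        }

        if (secondsAboveRange >= SUSTAIN_SECONDS && instances < MAX_INSTANCES) {
            scalingInProgress = true; // scale out: create a new application instance
        } else if (secondsBelowRange >= SUSTAIN_SECONDS && instances > MIN_INSTANCES) {
            scalingInProgress = true; // scale in: remove an application instance
        }
    }
}
```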
The CPU-based policy is perfect for you if your application is CPU-bound and application performance degrades as CPU utilisation increases.
When checked, the Enable session caching checkbox ensures that your HttpSession data is highly available across all of your application instances. (Note: to use this, every object in your HttpSession must implement java.io.Serializable or java.io.Externalizable.)
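As a minimal sketch of what that requirement looks like in practice (the class and attribute names here are illustrative, not part of IWD), the object placed in the HttpSession simply needs to implement java.io.Serializable, as do any objects it references:

```java
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

// Every object placed in the HttpSession implements Serializable, so the
// caching service can replicate it between application instances.
public class CartServlet extends HttpServlet {

    // The session attribute: Serializable, and holding only Serializable fields.
    public static class ShoppingCart implements Serializable {
        private static final long serialVersionUID = 1L;
        private final List<String> itemIds = new ArrayList<>();
        public void addItem(String itemId) { itemIds.add(itemId); }
        public List<String> getItemIds() { return itemIds; }
    }

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        HttpSession session = request.getSession();
        ShoppingCart cart = (ShoppingCart) session.getAttribute("cart");
        if (cart == null) {
            cart = new ShoppingCart();
            session.setAttribute("cart", cart);
        }
        String item = request.getParameter("item");
        if (item != null) {
            cart.addItem(item);
        }
        response.getWriter().println("Items in cart: " + cart.getItemIds().size());
    }
}
```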
Perhaps by now you're thinking "that sounds great, but what happens to my users when they're midway through a transaction and bound by session affinity to a server which is marked to be removed (scaled in)?". When a scale in action is triggered, the first step is to notify the proxy shared service that the marked server is going to be removed. The proxy then stops sending new traffic to the marked server. If an affinity request for the marked server arrives at the proxy, it is routed to an alternate server. The alternate server will not have the HttpSession for the request, so the session is fetched from the caching shared service. This all happens seamlessly and transparently to the user. Once the proxy has been notified, the WebSphere Application Server instance is stopped via the normal quiesce process, ensuring that in-flight transactions are completed and HttpSessions are replicated to the caching shared service before the server is removed.
The second elastic policy is Response Time Based. This works in the same way as the CPU-based policy, but the metric used is the average of the slowest HTTP response time from each instance of the application. In the screenshot above, instances will be scaled in when the average slowest HTTP response time is consistently less than 1000ms and scaled out when it is consistently more than 5000ms. The HTTP response time is measured in a similar way to the WebSphere Application Server ServiceTime metric: timing starts once the request has arrived at your application, and therefore excludes network/proxy latency. If your application is not CPU-bound, but has predictable response times that degrade as load increases, the Response Time Based scaling policy is perfect for you.
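For a sense of what a ServiceTime-style measurement covers, here is an illustrative servlet filter (not part of IWD; the class name and logging are my own) that times a request from the moment it reaches the application until the response has been generated, which is why network and proxy latency are excluded:

```java
import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Illustrative sketch: measures time spent inside the application for each
// request, broadly analogous to a ServiceTime-style metric. Timing starts
// after the request has already arrived, so network/proxy latency is excluded.
public class ServiceTimeFilter implements Filter {

    @Override
    public void init(FilterConfig config) { /* no initialisation needed */ }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        long start = System.nanoTime();
        try {
            chain.doFilter(request, response);
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // A real application would feed this into a metrics registry;
            // here we simply log it.
            System.out.println("Service time: " + elapsedMs + " ms");
        }
    }

    @Override
    public void destroy() { /* nothing to clean up */ }
}
```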