Service level differentiation in WebSphere Virtual Enterprise
Service level differentiation is the idea that certain requests should be prioritized above other requests in order to guarantee availability of more important applications or to provide faster response times to more important user requests. WebSphere Virtual Enterprise has a number of components that work together to give administrators the ability to use service level differentiation in their application environment by managing the available server capacity based on priorities supplied by the administrator and the dynamic state of the system. This article discusses the Autonomic Request Flow Manager, which can throttle lower priority traffic when resources are limited.
Introduction to the Autonomic Request Flow Manager
The Autonomic Request Flow Manager (ARFM), a core component of WebSphere Virtual Enterprise, provides service level differentiation by making prioritization decisions about which incoming requests to service and when. These decisions are based on dynamic analysis of the incoming traffic, the available server capacity, and the service goals defined by the user.
Note that when referring to server capacity, ARFM considers the entire set of application servers that are capable of serving the request (e.g. a cluster), not a single server (unless only one server is capable of serving a request for the application in question).
Figure 1 summarizes how requests are handled. When requests enter the On Demand Router, ARFM queues them and makes decisions about when to dispatch them to the backend application servers. This queuing process allows higher priority (as defined by the administrator) traffic to be serviced before lower priority traffic if insufficient resources exist to simultaneously serve all requests. The queued traffic sees an increased response time since it spends some amount of time in a queue prior to being serviced.
ARFM enqueues a request based on the service policy associated with it. A service policy is a combination of a request pattern and a desired response time goal for requests matching that pattern. A service policy also includes an importance value, which directly determines the priority given to requests associated with the service policy.
ARFM compares available backend resources to the requests waiting on the queues.
It chooses a request (if any) to dispatch based on available resources. In addition, ARFM can request additional servers to be brought online if it determines that additional capacity would reduce the amount of queuing that is occurring. This is described in more detail in the next section.
See the Resources listed below for additional information on this behavior. A future article will also discuss the Application Placement Controller component, which is responsible for starting and stopping servers based on load requirements reported by ARFM, in more detail.
Figure 1. On Demand Router request flow
How ARFM Manages Capacity
Every control cycle, which is one minute by default, ARFM analyzes the current state of the system by looking at available server capacity, current traffic demands, capacity requirements of that traffic, and expected response times of that traffic. Based on this information, it computes “allocations” for each service policy that are designed to ensure that service goals are met, assuming there is sufficient server capacity to do so.
These allocations take the form of how many concurrent requests of each service policy are allowed through to the application servers at a time. If there are more incoming requests for a particular service policy than can be served by the concurrency allocation for that class, the extra requests are held in the queue until existing requests are completed in order to remain within the concurrency limit, rather than being dispatched off the queue immediately as would normally be the case.
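The gating behavior described above can be pictured with a small sketch. This is only an illustration of a per-policy concurrency limit, not ARFM's implementation; the class name PolicyGate and its methods are hypothetical.

```python
from collections import deque


class PolicyGate:
    """Hypothetical per-policy gate: at most `allocation` requests in flight."""

    def __init__(self, allocation):
        self.allocation = allocation  # concurrency allocation for this policy
        self.in_flight = 0            # requests currently on the app servers
        self.queue = deque()          # requests held back by the limit

    def arrive(self, request):
        """Dispatch immediately if a slot is free; otherwise hold the request."""
        if self.in_flight < self.allocation:
            self.in_flight += 1
            return request
        self.queue.append(request)
        return None

    def complete(self):
        """A request finished: free its slot and dispatch the next queued one."""
        self.in_flight -= 1
        if self.queue:
            self.in_flight += 1
            return self.queue.popleft()
        return None
```

With an allocation of two, a third arrival is held until one of the first two requests completes, at which point it is dispatched in its place.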
When determining the allocation to give a particular service policy, ARFM first determines how much capacity is available on the application servers, and the expected CPU demand resulting from the various service policies (predicted request rate and CPU demand required to serve a request that classifies into a particular service policy, based on historical data).
ARFM then attempts to optimize the allocation for each service policy by predicting how much queuing will occur for that service policy given the predicted (based on historical observation) traffic demand and the time required to service requests for that class of traffic. The difference between the expected service time (how long it typically takes to service a request under normal load conditions) and the service goal defined for that class of traffic is the amount of time that ARFM can queue the request without violating the service goal. That is, time in the queue plus time to actually service the request on the application server should be less than the configured service goal, assuming sufficient capacity exists. Allocations of concurrency are given to each service policy in such a way as to minimize the impact of queuing on the overall responsiveness of the system and to avoid breaching service goals. (If sufficient capacity does not exist, then ARFM will make trade-off decisions about which service goals will be breached and how badly.)
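The queuing headroom described above is simple arithmetic. The following sketch assumes times in milliseconds; the function names are hypothetical and exist only to illustrate the relationship between service goal, service time, and queue time.

```python
def queue_budget(service_goal_ms, expected_service_ms):
    """Time a request can wait in the queue without breaching its goal.

    Returns 0 when the expected service time already meets or exceeds the goal.
    """
    return max(0, service_goal_ms - expected_service_ms)


def meets_goal(queue_ms, service_ms, service_goal_ms):
    """Response time (queue time plus service time) must stay under the goal."""
    return queue_ms + service_ms < service_goal_ms
```

For example, a policy with a 1000 ms goal and a typical service time of 300 ms leaves 700 ms of headroom for queuing, while a policy whose goal is below its typical service time leaves none.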
When profiling traffic, ARFM assumes that the service times recorded when the backend server is not overloaded represent the best expected service time the application can manage. It does not consider contention between requests either of the same or different service policies nor competition for backend resources such as databases.
Let us consider a scenario in which there are two classes of traffic, gold and bronze. In this scenario, gold traffic is more important than bronze traffic, and both types of requests end up driving load against a common backend database. Now consider what happens when the backend database becomes overloaded, but the application server still has available capacity. Even though the response times of gold traffic will increase due to the overload, and might be improved by restricting bronze traffic, bronze traffic will not be queued to relieve database contention which is resulting in higher gold response times. This is because the application server, which is the resource tier that ARFM manages, is not overloaded. ARFM continues to profile the gold traffic and increase the expected service time due to the higher actual service times it is now seeing. From the perspective of ARFM, the application server is not overloaded, so the increased service time seen in the gold traffic represents a fundamental change in how long gold traffic requires to be served. This is a common source of confusion for users who would expect that bronze traffic should be queued in this case, but ARFM has no way of determining the correlation between the increased bronze traffic and the increasing gold service times. ARFM manages the first tier of capacity only and correcting this situation would require insight into the resource utilization at deeper tiers. If the gold and bronze requests were competing for limited CPU cycles at the application tier (i.e. the application server was running at 90+% CPU utilization), then ARFM would queue the bronze traffic as expected.
In fact, there are only two actions that ARFM will take in order to prevent a service goal from being breached. The first is to reduce the time ARFM keeps the request in the queue for traffic in that service policy (thus reducing the overall response time). However, this action can only be taken if ARFM was queuing traffic for the service policy, which was not the case in the previous example. The other action it can take is to suggest to the Application Placement Controller that additional capacity should be brought online by starting additional servers in a dynamic cluster. Again, however, ARFM will only make this decision if it is currently queuing requests, and it determines that additional capacity would allow it to avoid queuing those requests, reducing the response times of that traffic by the amount of time the requests were previously spending in the queue and hopefully preventing a service goal breach in this way.
When traffic is not being queued by ARFM and the application servers have available CPU capacity, ARFM does not take action to meet service goals because, from its perspective, the requests are already being served as quickly as possible. That is, requests are not experiencing increased response time due to queuing or due to a lack of application server capacity (in the example above, the increased response time was due to a lack of database capacity), so queuing or blocking other traffic, or bringing additional capacity online, would not improve the situation since the existing capacity is underutilized.
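The two possible actions, and the condition under which ARFM takes neither, can be summarized as a small decision function. This is a simplified sketch of the reasoning in the preceding paragraphs; the function name, parameters, and action labels are hypothetical.

```python
def arfm_actions(is_queuing, more_capacity_would_reduce_queuing):
    """Return the actions available for a policy at risk of breaching its goal."""
    actions = []
    if not is_queuing:
        # Nothing is waiting in the queue, so from ARFM's perspective the
        # requests are already being served as quickly as possible.
        return actions
    # Action 1: keep requests of this policy in the queue for less time.
    actions.append("reduce_queue_time")
    # Action 2: ask the Application Placement Controller for more servers,
    # but only if extra capacity would actually avoid the queuing.
    if more_capacity_would_reduce_queuing:
        actions.append("request_more_servers")
    return actions
```

In the gold and bronze database scenario, is_queuing is false, so the function returns an empty list: neither action applies.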
One final point about ARFM engagement: the threshold at which ARFM
determines that there is insufficient capacity to service all requests and
that some of them should be queued is the
CPU Overload Protection value (see Figure 2).
By default, this is set to 90%, meaning that ARFM will not queue any
traffic until the application server CPU utilization would otherwise
exceed 90%. Lowering this value will reduce the volume of traffic ARFM
will allow to the backend. In effect, this will cause ARFM to queue more
traffic because there is effectively less backend capacity available to
it. This can be useful if, for example, it is known that database
resources become overloaded when the application server reaches 50%
utilization. The overload protection value could then be set to 50% in
order to effectively protect the database from overload. This provides an
indirect way to use ARFM to manage deeper tier resources. Note that the
current CPU utilization value is estimated by ARFM based on the predicted
characteristics of the requests it has permitted through the system. If
those predictions are incorrect then the actual backend utilization may
slightly breach, or never reach, this goal. Furthermore, as discussed
earlier, ARFM manages sets of equivalent application servers as a single
resource and assumes that load will be divided appropriately between them.
This means that there are conditions under which individual servers might
be driven to higher or lower load levels. This behavior will be discussed
further in a future article, as part of the overview of the Dynamic Work
Load Manager (DWLM) component.
Figure 2. Configuring the ARFM CPU overload protection threshold
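The effect of the CPU overload protection value can be sketched as an admission check against an estimated utilization. This is an illustration only: the function name and the per-request cost figures are assumptions, and ARFM's actual estimation is more involved.

```python
def can_dispatch(admitted_costs, next_cost, overload_threshold=90):
    """True if admitting the next request keeps estimated CPU within the threshold.

    admitted_costs: predicted CPU percentage of each request already admitted.
    next_cost: predicted CPU percentage of the request being considered.
    """
    estimated = sum(admitted_costs) + next_cost
    return estimated <= overload_threshold
```

With the default threshold of 90 and requests predicted at 9% CPU each, the tenth request is admitted and the eleventh is queued. Lowering the threshold to 50 means the sixth request is already queued, which is how a lower value indirectly protects a deeper tier such as a database. Because the estimate is built from predictions, actual utilization can differ from it.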
Defining Service Policies
In WebSphere Virtual Enterprise, Service Policy definitions provide the foundation for how ARFM analyzes and optimizes traffic flow. ARFM makes predictions and calculations based on the observations of the traffic in each service policy. Within a service policy, all requests are considered to be equivalent. This means that ARFM assumes that any two requests that fall into the same service policy have similar response times and CPU requirements.
This assumption leads to a key observation about ARFM behavior and a potential pitfall: ARFM models the traffic flow through the system in order to predict what response times will look like given a set of incoming requests. It bases this on the observed service times of prior requests for the same service policy. This means that it assumes the application server is capable of processing a certain number of requests of a certain type per second, and uses this assumption when determining how many concurrent requests should be allowed. If ARFM has predicted that it can allow a continuous stream of two requests per second (of a particular class of traffic) without resulting in a backup of requests and without overloading the system, unanticipated queuing will occur if one of the requests takes significantly longer than predicted. The available concurrency is effectively reduced to one for a period of time (the other slot of concurrency is occupied by a request that ARFM predicted would return quickly, but has not). This may result in the queuing of additional requests until the long-lived request returns, freeing up the concurrency slot.
To avoid this issue, service policies should be defined such that all traffic categorized to a given service policy has similar typical response times. In some cases this might mean having multiple “gold” service policies, possibly even for requests that are served by the same application (but with differing typical response times).
You configure service policies in two steps. First, you group the associated request URIs into a transaction class (see Figure 3). This is done in the Administrative Console by going to Enterprise Applications-><Application Name>->Service Policies. Next, you create a service policy by selecting Operational Policies->Service Policies->New. Once the new policy is created, you can edit it (via the same Operational Policies->Service Policies panel) to associate the transaction classes with the service policy definition (see Figure 4). The service policy definition also includes the goal and importance that will be applied to the transaction classes (and thus the URIs) associated with it (see Figure 5). For more specific details on service policy creation, see the Resources at the end of this article.
Figure 3. Grouping request URIs into transaction classes
Figure 4. Grouping transaction classes into a service policy
When initially configuring policies, one might intuitively define a single service policy for a given application, based on the importance of that application. However, if two services have very different response times, you should define two different service policies for them. For example, an application that offers both a search service and a timestamp service should have two service policies defined, since search requests presumably take longer than a simple timestamp response. Having two service policies allows ARFM to profile each type of traffic (timestamp requests versus search requests) independently. This improves the accuracy of its predictions about expected service times of each, which in turn improves its traffic shaping decisions.
Similarly, since ARFM assumes that any two requests within a service policy will have similar CPU demands on the application server, traffic patterns which do not fit this expectation can disrupt proper flow control. For example, if ARFM calculates that it can allow 10 requests to be processed on the backend server concurrently with a resulting CPU usage of 90% (9% per request) and one of those requests instead drives 27% CPU, the backend server will be overloaded to 108%. This scenario is even more detrimental to service policy enforcement if ARFM were attempting to divide CPU resources between two service policies and expected each policy to drive 45% of the CPU. In this case, requests from one service policy which demand excessive CPU and lead to overload could result in breached service goals for the other policy or for both policies.
The example of the search request versus timestamp request applies in this situation as well, since search requests are likely to consume more CPU resources than timestamp requests. If they were grouped into the same service policy, ARFM would likely have difficulty making accurate predictions about CPU requirements and would overload or underload the application server depending on the mix of search versus timestamp requests flowing through the system at any particular time.
The preceding examples are exaggerated for clarity. In practice ARFM anticipates and tolerates a certain level of inconsistency between requests. However, if there are known disparities between particular request types, those requests should be divided into independent service policies so that ARFM can manage each of them effectively.
In addition to the classification of traffic and the goal of the service class, service policies also have an importance associated with them. This is effectively a multiplier which is used to determine “how good” or “how bad” the achieved response time is when compared to the service goal. That is, ARFM multiplies the predicted amount by which a goal is going to be missed or met given a particular allocation by the importance weight of that goal. It attempts to optimize the allocation across the service policies such that the total value is minimized. (Expected response times which are lower than the service goal are considered to be negative values for purposes of this calculation). Service goals with an importance level of “discretionary” have no contribution to the relative goodness or badness of a particular set of allocations, meaning that ARFM makes no effort to avoid queuing traffic in that service class and freely does so if this allows it to improve the overall response time (queue time plus service time) of other service policies.
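The importance-weighted comparison can be sketched as a single score per candidate set of allocations. This is a simplified reading of the description above, not ARFM's actual optimizer; the function name, dictionary keys, and numeric importance weights are assumptions, with discretionary modeled as a weight of zero.

```python
def weighted_goal_miss(policies):
    """Sum of importance * (predicted response time - goal) over all policies.

    Lower is better. A policy that beats its goal contributes a negative
    value; a discretionary policy (importance 0) contributes nothing.
    """
    return sum(
        p["importance"] * (p["predicted_response_ms"] - p["goal_ms"])
        for p in policies
    )


# Two hypothetical candidate allocations for the same three policies.
favor_gold = [
    {"importance": 100, "predicted_response_ms": 900, "goal_ms": 1000},
    {"importance": 50, "predicted_response_ms": 1200, "goal_ms": 1000},
    {"importance": 0, "predicted_response_ms": 5000, "goal_ms": 1000},
]
favor_silver = [
    {"importance": 100, "predicted_response_ms": 1200, "goal_ms": 1000},
    {"importance": 50, "predicted_response_ms": 900, "goal_ms": 1000},
    {"importance": 0, "predicted_response_ms": 5000, "goal_ms": 1000},
]
```

Under these assumed numbers the first candidate scores 0 and the second scores 15000, so an optimizer minimizing this value would prefer the allocation that favors the higher-importance policy. The heavily delayed discretionary policy does not affect either score.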
Suppose we have a system with three service policies defined: Gold, Silver, and Bronze. Gold is defined with a very high importance, silver with a medium importance, and bronze is defined as discretionary. When faced with a set of requests which cannot be served immediately (due to a lack of server capacity), ARFM will first queue requests associated with the bronze policy since its importance is discretionary, so increased service time for these requests does not impact the computed goodness of the system. For the silver and gold requests, ARFM will divide the necessary queuing between them in accordance with their importance. This means that both types of requests will spend some time in the queue, but the gold requests will be queued for proportionately less time. If we assign high importance a numeric value of 100 and medium importance a numeric value of 50, then we can say that ARFM will attempt to queue such that 100*(gold queue time)==50*(silver queue time). In other words, silver requests will, on average, spend twice as much time being queued as gold requests, when there are insufficient resources to immediately service the request.
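The proportional split between gold and silver can be expressed as queue times that are inversely proportional to importance. The sketch below uses the illustrative weights from the paragraph above (100 and 50); the function name is hypothetical, and discretionary policies are left out because they carry no weight in this balance.

```python
def queue_time_ratios(importance):
    """Relative queue time per policy, inversely proportional to importance.

    Normalized so that the most important policy has a ratio of 1.0.
    """
    top = max(importance.values())
    return {name: top / weight for name, weight in importance.items()}


ratios = queue_time_ratios({"gold": 100, "silver": 50})
print(ratios)  # {'gold': 1.0, 'silver': 2.0}
```

The result satisfies 100*(gold queue time)==50*(silver queue time): silver requests wait twice as long as gold requests on average.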
Figure 5. Service policy goal configuration
Finally, as discussed earlier, ARFM’s primary capacity management mechanism is to queue incoming requests that can be delayed and still meet their service goal. This means that ARFM requires some wiggle room between the typical service time of a particular class of traffic and the desired service goal. This allows it to understand which requests can be given less allocation (and thus be more susceptible to queuing) and which cannot. If the service goals are configured to be very close to the typical service time of the application, ARFM has a very difficult time making appropriate decisions about which traffic to queue, since any queuing will likely violate the service goal. To create meaningful service differentiation, configure service policies loosely enough so that ARFM can queue requests without violating the service goal when the required backend service time is taken into account. Creating multiple service policies helps with this requirement by allowing appropriate goals to be set for each potential type of request, rather than a single goal for any request to the application.
When to Use ARFM
As should now be clear, the nature of how ARFM evaluates traffic patterns and performs capacity management means that properly configured traffic categorization combined with well defined service policies and appropriate service goals is critical for a successful implementation. Without correctly defined service policies, ARFM can overload or underload the application servers as it makes incorrect assumptions about the nature of the traffic it is processing and the demands that traffic places on the application servers.
Until a system is properly configured with appropriate service policy
definitions, consider disabling the traffic queuing function of ARFM. To
do this, use the disableARFM.py wsadmin script (located in the
<WAS_HOME>/bin directory) to set the corresponding cell-level custom
property to false. There is also a complementary enableARFM.py script.
Note that disabling queuing also disables CPU overload protection, since
ARFM will make no effort to queue traffic under any circumstances.
However, the Application Placement Controller still brings additional
capacity online as needed.
The Autonomic Request Flow Manager is an extremely powerful tool for orchestrating service level differentiation. However, the decisions it makes do not always match the intuitive expectations of a system administrator. Further, ARFM is very sensitive to proper configuration in order to achieve intended results. The main points to keep in mind when configuring the Autonomic Request Flow Manager to implement service level differentiation are:
Regarding ARFM capabilities:
- Does not manage non-CPU application server resource contention between policies (e.g. application server filesystem contention)
- Does not manage backend (e.g. database) resource contention between policies
Regarding ARFM configuration:
- Avoid wide variance in service times for requests in a single policy
- Avoid wide variance in CPU usage by requests in a single service policy
With these items in mind and an understanding of the algorithms at work when ARFM makes resource decisions, you will be able to configure a WebSphere Virtual Enterprise environment which performs service level differentiation without being surprised by the priority decisions that are made.
Service Time – The amount of time required to process a request on an application server. This is measured from the time the request arrives at the application server until the response is generated. It does not include time spent in a queue on the On Demand Router.
Response Time – The total amount of time from when a request is submitted to the On Demand Router until a response is returned. This does include time spent in a queue on the On Demand Router and is the value which is measured against the service policy goal.
Work Class – Configurations associated with each application which categorize requests into specific transaction classes based on URI.
Transaction Class – An object which is used to group a set of work classes together and then associate them with a service policy.
Service Policy – A set of transaction classes associated with a service goal for those classes. All transaction classes in a service policy should have similar response time and CPU consumption patterns.
Service Goal – The response time target which ARFM will attempt to meet for the set of transaction classes associated with a particular service policy.
Resources
- The WebSphere Virtual Enterprise Information Center
- High level discussion of resource provisioning interactions between the Autonomic Request Flow Manager and the Application Placement Controller
- “Defining a Service Policy” in the WebSphere Virtual Enterprise Information Center
- “Configuring the autonomic request flow manager” (including CPU overload protection) in the WebSphere Virtual Enterprise Information Center
- “Overview of application placement” in the WebSphere Virtual Enterprise Information Center