Level: Intermediate

CheKim Chhuor (chhuor@us.ibm.com), System Verification Tester, IBM
25 Jul 2006

You've gathered performance data with the help of the IBM® Enterprise Workload Manager (EWLM); now you're ready to exploit this data by enabling intelligent partition management of your AIX® and Linux® partitions running on IBM System p5 servers. In this first part of a two-part series, you get an introduction to logical partitioning, you're guided through the steps to set up your environment for EWLM partition management, and you learn how to configure partitions.
Logical partitioning (LPAR) is a convenient way to consolidate computing resources on fewer, bigger machines for ease of management and cost savings, while providing isolation and tremendous flexibility in resource allocation. This flexibility enables faster responses to continuously changing business needs. However, resource allocation on p5 servers has so far been a manual process. Even though resources can be reallocated dynamically, the system administrator still has to decide, based on judgment, how to allocate them.
Some users have been exploiting the Partition Load Manager (PLM), a feature of the Advanced POWER™ Virtualization offering, and the Workload Manager (WLM) feature of AIX to automate resource allocation based on resource usage thresholds. Those methods are mostly trial-and-error exercises and do not map clearly to a business need.
With EWLM's partition management feature, you define performance objectives as demanded by your business users. You tell EWLM how important each application is, and EWLM continuously balances resources among the partitions to help your applications meet their performance objectives according to each application's priority. Moreover, EWLM can take action before an application misses its performance goal, so your most important applications can keep complying with the service level agreement without your intervention.
In this article, I'll guide you through the process of setting up your environment for partition management by EWLM, using AIX and Linux platforms as illustrations. You'll first learn the hardware and software requirements, and then you'll move on to how the partitions should be configured in the Hardware Management Console (HMC). In the second part of this series, I'll demonstrate partition management in action: I'll create a domain policy, launch a workload, and then observe the result while the workload struggles to meet its performance goal. Finally, I'll show you some tools and give you some tips for troubleshooting your environment in case you run into problems.
You should be familiar with common EWLM concepts and terms, such as ARM instrumentation, the Managed Server (MS), the Domain Manager (DM), the Control Center, and so on. If these are new to you, read my previous article "Performance monitoring with Enterprise Workload Manager" (see Resources).
All the concepts and procedures discussed in this article also apply to IBM System i5™ servers running i5/OS®, allowing for the particularities of that operating system. Note that even though p5 and i5 servers support dynamic allocation of several types of resources (such as processors, memory, network, and I/O devices), the current version of EWLM manages only processor resources; therefore, it only relieves application bottlenecks caused by processor contention. Applications that suffer from memory or I/O delays will not get any relief from EWLM at this point.
So what's a partition? This article uses the term partition because it is consistent with terminology on the POWER5™ platform. However, other EWLM literature might use the term Virtual Server instead, because it is more generic and also applies to other virtualization environments. Some people prefer the term logical partition, or LPAR, as it is called in the mainframe world. In this article, I'll simply use the terms partition and partition management.
Besides the base requirements of EWLM, which are well documented in the EWLM Information Center (see Resources), partition management has additional prerequisites. For instance, you must have hardware and software that support dynamic logical partitioning (DLPAR) operations, and your partitions must be configured so that EWLM can readjust them without violating the basic platform constraints. The rest of this section looks at these hardware and software requirements.
Servers
Any server based on the IBM POWER5 processor with the Micro-Partitioning feature is supported. This includes existing pSeries® and iSeries™ servers, as well as the new IBM System p5 and i5 servers. Micro-Partitioning is a feature of the Advanced POWER Virtualization offering that can host up to 10 partitions per processor and allocate capacity in increments of 1/100th of a processor, allowing more flexible resource sharing and higher utilization. Because EWLM manages only processor capacity, it doesn't matter whether the partitions use physical adapters or the Virtual I/O Server.
The system firmware should be version SF235 or higher. On the AIX 5.3 platform, use the command lsmcode to find out your current system firmware version. This command is also available on SUSE Linux Enterprise Server 9 (SLES 9) if you install the IBM-provided packages called Service and productivity tools for Linux on POWER (see Resources).
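If you need to check this on many partitions, a few lines of script can compare the reported level against SF235. The sketch below assumes you have already captured the firmware level string (for example, SF240_219) from the lsmcode output; the SFnnn_xxx layout and the helper names are assumptions for illustration, not part of any IBM tool.

```python
import re

# Lowest system firmware release that supports EWLM partition management.
MIN_FIRMWARE_RELEASE = 235


def firmware_release(level: str) -> int:
    """Return the release number from a level string such as 'SF240_219'.

    The 'SF<release>_<service pack>' layout is an assumption for this sketch.
    """
    match = re.search(r"SF(\d+)", level)
    if match is None:
        raise ValueError(f"unrecognized firmware level: {level!r}")
    return int(match.group(1))


def firmware_ok(level: str) -> bool:
    """True if the firmware release is SF235 or later."""
    return firmware_release(level) >= MIN_FIRMWARE_RELEASE


print(firmware_ok("SF240_219"))  # True
print(firmware_ok("SF230_145"))  # False
```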
Hardware Management Console
You obviously need an HMC to create partitions on your server. The screen shots in this article were captured with HMC code version 5.20 build 20060210.1. Slightly older versions can also be used, but some GUI panels might look different.
If you do not use an HMC in your environment but rely on the Integrated Virtualization Manager (IVM) instead (for instance, if you're trying EWLM on a very small server or on a BladeCenter® JS21), EWLM should theoretically be able to manage those partitions too. However, for a production environment, I strongly recommend an HMC-connected server, which is the configuration I tested during development. No testing has been done with an IVM-managed server so far, so I won't elaborate on it any further.
Operating system
On AIX 5.3, you need Maintenance Level 3 or higher. For SLES 9, you must have Service Pack 3 installed. However, one bug fix didn't make it into SP3, so you must also get a later kernel build (2.6.5-7.244 or higher) from the SUSE maintenance Web site.
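The operating system levels can be checked the same way. This sketch assumes you feed it the output of `oslevel -r` on AIX (for example, 5300-03) and of `uname -r` on SLES 9 (for example, 2.6.5-7.244-pseries64); the function names are hypothetical, and the kernel check simply compares the leading numeric fields of the release string.

```python
import re


def aix_level_ok(oslevel_r: str) -> bool:
    """True for AIX 5.3 at Maintenance Level 3 or later, e.g. '5300-03'."""
    base, _, ml = oslevel_r.strip().partition("-")
    return base == "5300" and ml.isdigit() and int(ml) >= 3


def _numeric_fields(release: str) -> tuple:
    # '2.6.5-7.244-pseries64' -> (2, 6, 5, 7, 244); only the first five
    # numbers are kept so digits in the flavor suffix ('pseries64') are ignored.
    return tuple(int(n) for n in re.findall(r"\d+", release)[:5])


def sles9_kernel_ok(uname_r: str, minimum: str = "2.6.5-7.244") -> bool:
    """True if the SLES 9 kernel build is at least the given minimum."""
    return _numeric_fields(uname_r) >= _numeric_fields(minimum)


print(aix_level_ok("5300-03"))                   # True
print(sles9_kernel_ok("2.6.5-7.191-pseries64"))  # False
```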
The Resource Monitoring and Control (RMC) subsystem should be installed and operational. This subsystem is part of Reliable Scalable Cluster Technology (RSCT; see Resources). On AIX, it should already be installed and configured on every HMC-managed partition. On SLES 9, download and install the Service and productivity tools for Linux on POWER packages to make DLPAR operations possible.
To check if this subsystem is running on AIX and SLES 9, enter the following command:
# lssrc -s ctrmc
Subsystem         Group            PID          Status
 ctrmc            rsct             209002       active
It should have an "active" status. Then, to check if it is properly configured, enter this command:
# /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc

Management Domain Status: Management Control Points
  I A  0x870d96b9185bb416  0001  10.1.1.120
The IP address shown should be your HMC's IP address, in this case 10.1.1.120. This is not strictly required for EWLM to work (EWLM can perform partition management and adjust capacity without the RMC daemon running), but it is required for the partition to interact with the HMC. Without it, you won't be able to make a DLPAR change from the HMC, and any change made locally in the partition won't be reflected in the HMC. So for a functional environment, keep the RMC daemon running at all times. If the RMC daemon is temporarily unavailable for some reason, EWLM partition management continues without a problem.
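When you have many partitions to verify, you can script both checks. The following sketch parses the two outputs shown above, assuming the column layout in these samples; it only reads text you pass in, and the function names are hypothetical.

```python
def rmc_active(lssrc_output: str) -> bool:
    """True if the ctrmc line of 'lssrc -s ctrmc' output ends in 'active'."""
    for line in lssrc_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "ctrmc":
            return fields[-1] == "active"
    return False


def management_control_points(status_output: str) -> list:
    """IP addresses listed by 'rmcdomainstatus -s ctrmc'.

    Assumes each management control point line ends with an IPv4 address.
    """
    ips = []
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[-1].count(".") == 3:
            ips.append(fields[-1])
    return ips


lssrc_sample = """Subsystem         Group            PID          Status
 ctrmc            rsct             209002       active"""
status_sample = """Management Domain Status: Management Control Points
  I A  0x870d96b9185bb416  0001  10.1.1.120"""

hmc_ip = "10.1.1.120"
print(rmc_active(lssrc_sample))                            # True
print(hmc_ip in management_control_points(status_sample))  # True
```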
EWLM
For AIX, you can use the generally available version of EWLM V2R1 for partition management. For SLES 9, you must install fixpack 20 on top of that. However, I recommend installing fixpack 30 in both cases. The AIX partitions I use here have a pre-release version of fixpack 30 applied on both the Managed Server (MS) side and the Domain Manager (DM) side. Fixpack 30 includes several bug fixes as well as a nice enhancement to the partition management function, which I'll describe later.
Additionally, the middleware in which your applications run should ideally be ARM-instrumented to take advantage of the intelligent management algorithm. If that is not possible, for instance if you're dealing with a vendor package or batch jobs, you can still benefit from EWLM's partition management by defining a partition class in the domain policy; EWLM then manages each partition as a service class with a performance goal. However, EWLM won't see the individual transactions flowing from node to node, or the proportion of the overall response time spent on each hop.
ARM-instrumented middleware provides these more detailed statistics, so EWLM can make better management decisions.
So, if you can survive this requirements list (and frankly, it is a bit long), please read on. If you're not sure whether you meet all the requirements, it might be a good idea to write a checklist (or use the one provided as a download) and verify every item with your system administrator, if that isn't one of the many hats you wear.
Partitions configuration in the HMC
Now, let's look at how the partitions should be configured for EWLM partition management. Go to your HMC console or use the Web-based System Manager tool (WebSM) to remotely connect to your HMC (as I do here):
- Expand the list of partitions on your server.
- Right-click on the active partition profile and choose Properties.
- Click on the Processors tab.
You should see a window as shown in Figure 1.
Figure 1. Processor settings in the partition profile
Processing mode
There are four groups of settings in the Processors tab, and they're all very important. The first one is the processing mode, which must be set to "Shared" so that the partition uses processor resources from the shared processor pool. EWLM does not manage any partition in "Dedicated" processing mode.
Processing units
The next setting, the most important of all, is the number of processing units assigned to the partition. Start by assigning the desired processing units for the work that this partition normally does, then set a maximum capacity that this partition should not exceed and a minimum capacity below which it should not be pushed. EWLM manages the capacity of the participating partitions within this minimum and maximum range, and the range cannot be changed dynamically: you have to shut down the partition and reactivate it to make any change effective.
You must leave some room for each partition to donate capacity so that others can receive extra capacity when needed. Note that EWLM's management scope is the group level, which I'll explain in more detail later on. EWLM keeps the total entitled capacity of all partitions in a group constant: it never adds capacity to or removes capacity from the group as a whole, but simply takes capacity from one or more partitions in the group and gives it to one or more other partitions in the same group. If no partition is willing to donate anything, then no partition receives anything. A donor cannot go below its minimum capacity, and a receiver cannot go above its maximum capacity, no matter how badly the service class is suffering.
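The group-level rules above can be expressed as a small model. This is a sketch, not EWLM's implementation: it keeps capacity in hundredths of a processing unit (the Micro-Partitioning granularity) so sums stay exact, and the `Partition` class and `shift_capacity` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    minimum: int   # minimum capacity, in hundredths of a PU (50 = 0.50 PU)
    maximum: int   # maximum capacity, same unit
    entitled: int  # current entitled capacity, same unit


def shift_capacity(donors, receiver, offers):
    """Move capacity from donors to one receiver in the same group.

    `offers` maps donor name -> capacity offered. Each donation is clipped so
    the donor stays at or above its minimum, and the total is clipped so the
    receiver stays at or below its maximum. Returns the amount moved; the
    group's total entitled capacity is unchanged.
    """
    headroom = receiver.maximum - receiver.entitled
    moved = 0
    for donor in donors:
        give = min(offers.get(donor.name, 0), donor.entitled - donor.minimum)
        give = max(0, min(give, headroom - moved))
        donor.entitled -= give
        moved += give
    receiver.entitled += moved
    return moved


a = Partition("A", minimum=50, maximum=200, entitled=100)
b = Partition("B", minimum=50, maximum=200, entitled=60)
c = Partition("C", minimum=50, maximum=150, entitled=100)
print(shift_capacity([a, b], c, {"A": 40, "B": 40}))  # 50
print(a.entitled, b.entitled, c.entitled)            # 60 50 150
```

Partition B can give only 10 because it would otherwise drop below its minimum, and C stops at its maximum of 150, so the group total stays at 260 hundredths of a PU.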
The entitled capacity usually corresponds to the desired processing units set in the partition profile. However, when a partition is activated, there might not be enough free capacity in the shared processor pool to give it its desired value. The partition can still be activated, but it is entitled to less than its desired capacity.
For example, my partition hci245 has a desired capacity of 1.0 processing unit (PU). If during activation there is only 0.7 PU available in the pool, it will still be activated, but its entitled capacity will be 0.7 instead of 1.0.
It's important to understand the difference between entitled capacity and desired capacity. When the MS joins the DM, EWLM uses the entitled capacity, regardless of the desired capacity. The entitled capacity never goes below the minimum value or above the maximum value; that constraint is enforced by the POWER Hypervisor.
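The hci245 example can be written out as a small function. This sketch assumes that activation fails when the pool cannot cover even the minimum (returned here as None); the minimum and maximum in the usage lines are hypothetical values, since the article gives only the desired capacity.

```python
def entitled_at_activation(minimum, desired, maximum, free_in_pool):
    """Entitled capacity (in PUs) granted when a partition is activated.

    The partition gets as much of its desired capacity as the shared pool can
    supply, never less than its minimum nor more than its maximum. Returns
    None if the pool cannot supply the minimum (assumed activation failure).
    """
    if free_in_pool < minimum:
        return None
    return max(minimum, min(desired, free_in_pool, maximum))


# hci245: desired 1.0 PU; hypothetical minimum 0.5 and maximum 2.0.
print(entitled_at_activation(0.5, 1.0, 2.0, free_in_pool=0.7))  # 0.7
print(entitled_at_activation(0.5, 1.0, 2.0, free_in_pool=3.0))  # 1.0
```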
Virtual processors
The next setting is the number of virtual processors (VPs). Assigned processing units are easy to understand, but the concept of a virtual processor on a POWER5 server is more complex and harder to explain. Rather than add to the confusion, I'll refer you to an excellent IBM Redbook, "Introduction to Advanced POWER Virtualization on IBM p5 Servers" (see Resources). Section 3.3 provides a detailed explanation in case you're not familiar with this concept.
What matters here is that the minimum number of VPs should be the minimum PU value rounded to a whole number. The desired number of VPs should be a bit larger than the desired number of PUs, and the maximum number of VPs should be somewhat larger than the maximum number of PUs. Again, the minimum and maximum values cannot be changed dynamically.
The desired value is used during partition activation, but as soon as the EWLM MS starts, it adjusts the number of VPs to an optimum value based on the current entitled capacity. For example, if the current entitled capacity is 1.0, the MS sets the VP count to 3. If the capacity later drops to 0.4, it changes the VP count to 1; if the capacity rises to 3, it changes the VP count to 5, and so on.
This lets the partition take advantage of any unused capacity in the shared processor pool. But the MS never goes below the minimum or above the maximum VP value. In fact, you can use the maximum VP setting to put a brake on an uncapped partition's ability to consume everything available in the shared pool.
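To make the clamping behavior concrete, here is a sketch of the VP adjustment. The rule inside `target_virtual_processors` is hypothetical: it was chosen only because it reproduces the three examples above (1.0 to 3, 0.4 to 1, 3 to 5), and EWLM's actual formula is not described in this article. What the sketch does show is that the result is clamped to the profile's minimum and maximum VP values.

```python
import math


def target_virtual_processors(entitled, min_vp, max_vp):
    """Choose a VP count for the current entitled capacity (in PUs).

    Hypothetical rule matching the article's examples, then clamped to the
    [min_vp, max_vp] range from the partition profile.
    """
    if entitled < 1.0:
        vp = 1
    else:
        vp = math.ceil(entitled) + 2
    return max(min_vp, min(vp, max_vp))


for capacity in (1.0, 0.4, 3.0):
    print(capacity, target_virtual_processors(capacity, min_vp=1, max_vp=8))
# 1.0 3
# 0.4 1
# 3.0 5
print(target_virtual_processors(3.0, min_vp=1, max_vp=4))  # 4: capped by max VP
```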
Sharing mode
The last setting in this panel is the sharing mode, which can be "capped" or "uncapped." You can refer to the same IBM Redbook mentioned previously if you're unfamiliar with the sharing mode.
What's important here is that for a capped partition, the Hypervisor never allocates more processing resources than the partition's entitled capacity. Therefore, EWLM takes a more active role in adjusting processor capacity when a service class running on that partition is missing its performance goal.
With an uncapped partition, the Hypervisor automatically provisions more capacity when there is unused capacity in the shared pool. Therefore, EWLM might not have to get involved as much. It steps in only when several uncapped partitions are competing for processing cycles; it then adjusts both the entitled capacity and the uncapped weight to favor the partition that is running more important work or missing its goals more severely than the others.
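The role of the uncapped weight can be illustrated with a simplified model of how spare pool capacity is shared. This is an assumption-laden sketch, not the Hypervisor's actual dispatcher: spare processing units are split in proportion to each uncapped partition's weight, no partition can consume more than one PU per virtual processor, and any share a partition cannot use is redistributed to the others. The function name and input layout are hypothetical.

```python
def share_spare_capacity(spare, partitions):
    """Split unused pool capacity (in PUs) among uncapped partitions.

    `partitions` maps name -> (uncapped_weight, entitled_pu, virtual_procs).
    Returns name -> extra PUs granted. Partitions with weight 0 get nothing.
    """
    extra = {name: 0.0 for name in partitions}
    active = {n for n, (weight, _, _) in partitions.items() if weight > 0}
    while spare > 1e-9 and active:
        total_weight = sum(partitions[n][0] for n in active)
        handed_out = 0.0
        for n in sorted(active):
            weight, entitled, vps = partitions[n]
            room = max(vps - entitled - extra[n], 0.0)  # 1 PU per VP at most
            grant = min(spare * weight / total_weight, room)
            extra[n] += grant
            handed_out += grant
            if grant >= room - 1e-9:
                active.discard(n)  # saturated: leftover goes to the others
        spare -= handed_out
        if handed_out <= 1e-9:
            break
    return extra


pool = {"web": (128, 0.5, 1), "batch": (64, 0.5, 4)}
print(share_spare_capacity(1.0, pool))
```

Here "web" would earn two thirds of the spare capacity by weight, but its single VP limits it to 0.5 extra PU, so the remainder flows to "batch". This illustrates why EWLM adjusts both the weight and the VP count.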
The decision to use capped or uncapped mode depends on the overall workload across the partitions; some workload combinations are better suited for sharing while others are better suited for isolation. If you're testing EWLM partition management at the moment, it might be good to start with capped partitions so you can easily see the effect of EWLM's management. After you're comfortable with that, you can switch to uncapped mode if that's what you ultimately want.
Regardless of the sharing mode, be careful with your software licenses if they are charged by processor. Some software enforces hard limits on the processor count and won't use more processors than the number registered. Because the operating system usually sees more logical processors than there are physical processors, such software might ignore some of them, and you might wonder why adding capacity doesn't increase throughput or decrease response time. Besides, you want to stay within your license terms. Software licensing in a virtualized environment is still an unknown quantity; it will take time for an industry standard practice to emerge.
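Before activating partitions, you could sanity-check each profile's Processors tab against the rules discussed in this section. The sketch below is a hypothetical checker, not an HMC feature. Besides the rules stated above (shared mode, min <= desired <= max, room to donate and receive), it assumes the POWER5 constraint that each virtual processor carries between 0.1 and 1.0 processing units.

```python
from dataclasses import dataclass


@dataclass
class ProcessorProfile:
    mode: str           # "shared" or "dedicated"
    min_pu: float
    desired_pu: float
    max_pu: float
    min_vp: int
    desired_vp: int
    max_vp: int


def check_profile(p: ProcessorProfile) -> list:
    """Return a list of problems; an empty list means the profile looks usable."""
    problems = []
    if p.mode != "shared":
        problems.append("EWLM only manages partitions in shared processing mode")
    if not p.min_pu <= p.desired_pu <= p.max_pu:
        problems.append("processing units must satisfy min <= desired <= max")
    if p.min_pu == p.max_pu:
        problems.append("min == max leaves no room to donate or receive capacity")
    if not p.min_vp <= p.desired_vp <= p.max_vp:
        problems.append("virtual processors must satisfy min <= desired <= max")
    for label, pu, vp in (("min", p.min_pu, p.min_vp),
                          ("desired", p.desired_pu, p.desired_vp),
                          ("max", p.max_pu, p.max_vp)):
        # Assumed constraint: 0.1 PU <= capacity per virtual processor <= 1.0 PU.
        if not (0.1 * vp <= pu + 1e-9 and pu <= vp):
            problems.append(f"{label}: {pu} PU does not fit {vp} virtual processor(s)")
    return problems


good = ProcessorProfile("shared", 0.5, 1.0, 2.0, 1, 2, 4)
print(check_profile(good))  # []
```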
Partition workload group
Now let's jump to the Settings tab in the same window. You need to set a value for the partition workload group as shown in Figure 2.
Figure 2. Partition workload group setting in partition profile
You can use any 16-bit integer value here (up to 32768), but start from the bottom because the higher range is reserved for special purposes. If you have multiple workload groups, the numbers don't have to be contiguous.
There are two things you need to know about the purpose of this workload group number:
- First, the MS participates in partition management only if the partition belongs to a group, that is, if it has a workload group number assigned.
- Second, EWLM manages capacity exchange only among partitions of the same group.
If you don't want any partition management by EWLM on a capable partition, just set its workload group to "None."
Now, here comes the interesting enhancement in fixpack 30. Before this fixpack, if you made a change to the group membership of a partition, even if the partition actually accepted the change, you'd have to restart MS to pick it up. With fixpack 30, the MS dynamically detects group membership changes and reacts to them without restarting. For instance, if a partition is not part of a group and you assign a group number, MS detects that within seconds and activates the partition management function. Alternatively, if a partition belongs to a group and then you set the group to "None," MS notifies DM that it's leaving the group and is no longer part of the partition management by EWLM.
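The fixpack 30 behavior amounts to reacting to a change in the group number. As a sketch, assuming the MS compares the group number it last saw with the current one (None meaning no group), the reaction could be modeled like this. The action names are hypothetical, and treating a move between two groups as a leave followed by a join is an assumption the article doesn't cover.

```python
def membership_actions(old_group, new_group):
    """Actions for the MS when the partition workload group changes.

    Groups are integers, or None when the HMC setting is "None".
    """
    if old_group == new_group:
        return []
    actions = []
    if old_group is not None:
        actions.append(("leave", old_group))  # notify the DM it's leaving
    if new_group is not None:
        actions.append(("join", new_group))   # start partition management
    return actions


print(membership_actions(None, 111))  # [('join', 111)]
print(membership_actions(111, None))  # [('leave', 111)]
```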
You can change the partition workload group dynamically from the HMC as shown in Figure 3 by right-clicking on the partition, selecting Properties, going to the Other tab, then putting a value in or selecting "None" from the drop-down list.
Figure 3. Changing partition workload group dynamically in the partition properties panel
Shared processor pool authority
One last thing and you're done with the HMC. You must allow each partition to use the shared processor pool:
- Right-click on each partition and select Properties.
- Go to the Hardware tab and then Processors and Memory.
- Check the "Allow shared processor pool utilization authority" box, as shown in Figure 4.
Figure 4. Shared pool authority setting in partition properties
That's it for partition configuration. This may seem like a long process, but it'll get faster as you practice it on every partition. The instructions apply to p5 and i5 partitions running AIX, Linux, or i5/OS.
EWLM partition management algorithm
In the second part of this series, you'll see this work: I'll show you partition management in action. But to help you understand what's happening behind the scenes, I'd like to discuss now how it was designed to work.
I want to show you how the different components interact to make this management possible, as well as the several factors that the algorithm engine is considering while making a management decision. Don't worry though: I won't go into the low-level details or the actual algorithm because that is a large dose of math that I don't want to try to digest!
The intention of partition management is to help a service class meet the performance goal set in the domain policy. Concretely, the goal is for transactions to complete within a certain time, expressed either as an average response time (such as transactions completing within 100ms on average) or as a percentile response time (such as 95 percent of the transactions completing within 200ms).
EWLM tries to help a service class that is missing its performance goal or is about to miss it. A transaction can span multiple nodes and platforms, and EWLM keeps detailed statistics on how the transactions (or sub-transactions) actually perform on each of those nodes. EWLM tries to help the nodes that suffer from a processor bottleneck, but it can't make the transactions themselves execute more efficiently; that is only possible by optimizing the environment configuration or by rewriting code.
So the way EWLM makes a contribution is by prioritizing the overall workload across the systems that it manages. In other words, it helps the most important service class achieve its goal by borrowing resources from others that may be idling or doing less important work. Concretely, it means that EWLM takes away processor capacity from some partitions to give it to the one that is missing its goal.
Another way EWLM can help is by influencing how the load balancing device dispatches incoming work to the managed servers so that requests will be sent to the servers perceived as most capable of executing them within the expected response time. But that's the subject of another article.
The management loop
The flow of partition management goes like this: When the MS process is started, it checks to see if the platform it's running on is capable of partition management. If the answer is yes, then the MS sends the platform information to the DM, which keeps track of every virtual group and its members.
A virtual group represents the partitions that are part of the same workload group (the setting in the HMC) on a specific machine. There might be several virtual groups on the same machine, but a virtual group cannot span machine boundaries. A virtual group ID is implemented as the concatenation of the machine type-model, serial number, and workload group number, such as 9117-570107CCEE-111. So even if you set the same workload group number on two different machines, the resulting virtual group IDs differ, and EWLM does not exchange capacity between those partitions. In fact, it is not possible to exchange capacity between partitions on separate machines.
When another partitioning-capable MS joins the DM, it is either added to an existing virtual group or placed in a newly initialized one. The DM always ensures that the total capacity of the group remains constant during capacity shifts. The group capacity changes only when a member joins or leaves.
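The virtual group bookkeeping can be sketched as follows. The ID format simply reproduces the example 9117-570107CCEE-111 (type-model, serial number, and workload group); the class and method names, and the second serial number in the usage lines, are hypothetical.

```python
from collections import defaultdict


def virtual_group_id(type_model, serial, workload_group):
    """Build an ID in the format of the example 9117-570107CCEE-111."""
    return f"{type_model}{serial}-{workload_group}"


class VirtualGroups:
    """Toy DM registry: virtual group ID -> {partition: entitled PUs}."""

    def __init__(self):
        self.groups = defaultdict(dict)

    def join(self, type_model, serial, workload_group, partition, entitled):
        gid = virtual_group_id(type_model, serial, workload_group)
        self.groups[gid][partition] = entitled  # join an existing or new group
        return gid

    def leave(self, gid, partition):
        self.groups[gid].pop(partition, None)
        if not self.groups[gid]:
            del self.groups[gid]  # last member gone: drop the group

    def capacity(self, gid):
        """Group capacity changes only when members join or leave."""
        return sum(self.groups.get(gid, {}).values())


dm = VirtualGroups()
g1 = dm.join("9117-570", "107CCEE", 111, "hci245", 1.0)
g2 = dm.join("9117-570", "10ABCDE", 111, "other", 0.5)  # same group number
print(g1, g2, g1 == g2)  # 9117-570107CCEE-111 9117-57010ABCDE-111 False
```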
It is important to mention that, even though you could create several virtual groups, there is only one shared processor pool on each machine in the current implementation. In other words, the Hypervisor doesn't care about the workload group setting when dispatching processing cycles to partitions. Consequently, if you have several uncapped partitions on your server, they draw resources from the same shared processor pool regardless of the workload group they belong to.
Let me try to summarize all that in the diagram in Figure 5.
Figure 5. The Virtual Group and Virtualization Agent relationship
Global and local PI
I have to introduce the notions of a global and a local performance index (PI). The PI is a numerical representation of how a service class is doing in relation to its performance goal. A PI below 1.0 means the service class is doing better than expected; a PI greater than 1.0 means the service class is missing its goal.
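As a rough sketch of how a PI can be computed for the two goal types mentioned earlier: for an average response time goal, the PI is the actual average divided by the goal; for a percentile goal, this sketch divides the response time reached by the goal percentile of transactions by the goal time. The percentile calculation (nearest rank over raw samples) is a simplification, and EWLM's internal computation may differ.

```python
import math


def pi_average(avg_response_ms, goal_ms):
    """PI for an average response time goal: actual average / goal."""
    return avg_response_ms / goal_ms


def pi_percentile(response_times_ms, percentile, goal_ms):
    """PI for a percentile goal, e.g. 95 percent of transactions within 200ms.

    Takes the nearest-rank percentile of the samples and divides it by the goal.
    """
    ordered = sorted(response_times_ms)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1] / goal_ms


print(pi_average(150, 100))  # 1.5: missing the 100ms average goal
samples = list(range(10, 201, 10))      # 20 transactions, 10ms to 200ms
print(pi_percentile(samples, 95, 200))  # 0.95: 95% finished within 190ms
```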
The global PI is the overall score of all the transactions in the service class, end-to-end, for a given period of time. Those transactions might traverse several nodes every time, distributed across many clusters, and so on. Naturally, some transactions complete faster than others, either because of the nature of the actual work or because of the conditions of the node processing it. Some nodes are more powerful or more idle; others are less powerful or busy with other work, and so on.
So the MS on each node maintains a local PI that indicates how well that node is doing in relation to the performance goal, even if it handles only a portion of the work (like a DB2 hop). If the local PI is bad on a partition, that MS asks for help even if the global PI looks very good. For example, take a cluster of five nodes: four of them are doing very well (local PI = 0.2) and one is doing terribly (local PI = 2.0). Using a simple average for illustration, the global PI is (0.2 * 4 + 2.0) / 5 = 2.8 / 5 = 0.56, which is still very good despite the laggard in the group. The local PI allows EWLM to identify the poor performer in the group and try to help it even if the overall score looks good.
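The five-node example can be reproduced in a few lines. The sketch uses the same simple average as the text (the real global PI is computed from end-to-end transaction data), and the node names are hypothetical.

```python
def summarize(local_pis, goal=1.0):
    """Return (simple-average global PI, nodes whose local PI misses the goal)."""
    global_pi = sum(local_pis.values()) / len(local_pis)
    laggards = [node for node, pi in local_pis.items() if pi > goal]
    return global_pi, laggards


cluster = {"node1": 0.2, "node2": 0.2, "node3": 0.2, "node4": 0.2, "node5": 2.0}
global_pi, laggards = summarize(cluster)
print(round(global_pi, 2), laggards)  # 0.56 ['node5']
```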
Plea and assessment
Now here comes the fun part. What happens when a service class is missing its performance goal? In a simple case, it goes like this: the MS projects whether getting more processing power would improve the situation. If not, because the cause is not a processor bottleneck, the MS does nothing. If so, the MS sends a plea for help to the DM. The DM scans the list of potential candidates, which includes all the managed partitions in the same virtual group, and asks each one to evaluate the impact that giving up some resources would have on its existing workloads. Each MS responds to the assessment request with how much capacity it could give up; sometimes it can't give anything up, because it is running service classes of higher importance and donating resources would hurt them too much.
After receiving all the responses, the DM evaluates whether a net improvement is possible. If so, it tells the donors to reduce their entitled capacity, and after all the donors are done, it tells the receiver to increase its entitled capacity by the sum of all the donations. If the workload scales as projected, the service class should be satisfied with the capacity increase, and everybody will be happy!
Reality is rarely that simple, though. There are several cases where things don't go that smoothly. For instance, if the MS with a failing service class is already at its maximum entitled capacity, it just can't be helped; if a potential donor is already at its minimum capacity, it can't donate anything. Besides those scenarios, the DM might receive several pleas at the same time; in that case, it keeps the highest-priority one and skips the rest.
In some situations, EWLM doesn't know what to do, so it does nothing. For example, when a service class's PI is off by a large amount (such as PI > 100), the MS might project that it needs a very large capacity increase to satisfy the service class: more than what's in the shared processor pool, or more than its maximum entitlement allows. In a case like that, you might see nothing happen at all. These situations are likely caused by a poorly defined performance goal or by a defective system attempting to handle the workload. When the PI is this high, there's no easy way to tell whether the performance goal was badly defined or the system is having trouble serving requests (deadlock, out of memory, system errors, and so on); only the system administrator can sort this out.
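The plea handling just described can be summarized in a sketch. It is a simplification under stated assumptions: capacity is in hundredths of a PU, a lower importance number means a more important service class, each donor's offer is already limited by its minimum capacity and its own workloads, and the DM's net-improvement evaluation is left out. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Plea:
    partition: str
    importance: int  # lower number = more important (assumed convention)
    needed: int      # projected extra capacity, hundredths of a PU
    headroom: int    # receiver's maximum minus its current entitled capacity


def choose_plea(pleas):
    """Of several simultaneous pleas, keep only the highest-priority one."""
    return min(pleas, key=lambda p: p.importance) if pleas else None


def plan_donations(plea, offers, pool_size):
    """Return {donor: amount}, or {} when EWLM would do nothing.

    `offers` maps donor -> capacity it can give up (already limited by its
    minimum capacity and the importance of its own service classes).
    """
    if plea is None or plea.headroom <= 0:
        return {}  # receiver already at its maximum: it can't be helped
    if plea.needed > plea.headroom or plea.needed > pool_size:
        return {}  # implausible projection (e.g. PI > 100): do nothing
    plan, remaining = {}, plea.needed
    for donor, offer in sorted(offers.items(), key=lambda kv: -kv[1]):
        give = min(offer, remaining)
        if give > 0:
            plan[donor] = give
            remaining -= give
        if remaining == 0:
            break
    return plan


plea = choose_plea([Plea("web", 3, 50, 100), Plea("db", 1, 30, 100)])
print(plea.partition, plan_donations(plea, {"x": 20, "y": 25}, pool_size=400))
# db {'y': 25, 'x': 5}
```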
In conclusion
Those are the rules of the game. To recap:
- You've gotten a clear look at the list of required hardware and software.
- I've detailed how to configure partitions in the HMC, discussing:
  - Four important settings: processing mode, processing units, virtual processors, and sharing mode
  - How to set the partition workload group
  - How to assign shared processor pool authority
- I've discussed the EWLM partition management algorithm:
  - Explaining the flow of partition management
  - Demystifying global and local performance indexes
  - Uncovering pleas for help and how they are assessed
In the next article, I'll start the action by examining the topology of my test environment and the workload used. I'll also take a look at the domain policy, then run the workload and observe the partition management actions taken by EWLM.
Download

| Description | Name | Size | Download method |
|---|---|---|---|
| Requirement and configuration checklist | ac-ewlm-lpar1source.zip | 3KB | HTTP |
Resources

Learn
- "Performance monitoring with Enterprise Workload Manager" (developerWorks, March 2006): This article lays the groundwork for the current one and explains how to enable ARM instrumentation for end-to-end performance monitoring in IBM middleware on AIX and Linux.
- IBM EWLM Information Center: This Information Center provides documentation, an overview, and frequently asked questions.
- Reliable Scalable Cluster Technology: This overview explains RSCT.
- "Advanced POWER Virtualization on IBM System p5" (IBM Redbooks, December 2005): This Redbook covers both the Micro-Partitioning technology and the Virtualization Engine's Partition Load Manager quite well.
- "Autonomic load balancing with EWLM" (developerWorks, April 2006): Part one of a three-part series is now available; it focuses on the Cisco Content Switching Module (CSM).
- Advanced POWER Virtualization on System p5: See this site for more information about the Advanced POWER Virtualization feature.
- Specified operating environments: This section of the EWLM Information Center explains software and hardware requirements for each component of EWLM.
- ARM: The Open Group has a section on ARM that shows you how to extend your enterprise management tools directly to applications, creating a comprehensive end-to-end management capability that includes measuring application availability, application performance, application usage, and end-to-end transaction response time.
Get products and technologies
- Service and productivity tools: These tools for SUSE Linux on HMC-managed servers help you manage an HMC-based server.
- fixpack 30: When it's available, pick up a copy of the fixpack for the IBM Virtualization Engine.
- IBM Trade Performance Benchmark: Get a copy of the sample for WebSphere® Application Server V6.0 on HP-UX, IBM AIX, Linux, Microsoft Windows, Solaris (Sun Microsystems), OS/390, and z/OS (V6.0.1).
About the author

CheKim Chhuor currently works at IBM Poughkeepsie on the System Verification Test team. He has many years of Web infrastructure consulting experience, and holds many IBM WebSphere®, DB2®, and e-business certifications. His current focus is on grid and autonomic computing. You can contact him at chhuor@us.ibm.com.