IBM Support

Assigning the appropriate processor entitled capacity

Question & Answer


Question

How can you monitor and allocate sufficient processor entitlement capacity to accommodate your workload in shared-capped or uncapped partitions?

Cause

Running over the capacity entitlement causes performance degradation.

Answer

This document outlines the definition of capacity entitlement, its performance impact on the AIX operating system, and guidance on how to monitor it, as well as how to allocate the appropriate number of processors for a partition's workload.

To effectively monitor capacity entitlement and assign the right amount of processors for a specific partition's workload, it's essential to understand the various types and modes of partitions and how they utilize logical partitioning features in Power Systems.

The following sections will provide an overview of different partition types, modes, and relevant processor parameters, presented in clear, easy-to-understand points with explanations. This will be followed by a detailed discussion on monitoring entitled capacity and assigning the appropriate number of processors for server workloads, complete with examples using various tools.


What is the capacity entitlement?

  • Physical processors are presented to a logical partition's operating systems as virtual processors.
  • Physical processors are virtualized into fractions called processing units; a partition can be assigned as little as 0.1 of a processor, and the allocation can be adjusted in increments of 0.01.
  • The capacity entitlement represents the number of processor cores (processing units) assigned to a partition.
  • To display the assigned capacity entitlement for a shared partition, use the command # lparstat (see also the example after this list).
  • You will see an output similar to this: System configuration: type=Shared mode=Uncapped smt=4 lcpu=48 mem=12288MB psize=76 ent=6.00. In this example, the currently assigned capacity entitlement is 6.00 processing units.
  • This indicates the number of processors that the partition is entitled to use.
  • This number indicates the maximum limit the partition can access from the processor pool if it is capped.
  • The partition can utilize more than the assigned capacity entitlement in uncapped mode.
  • Details about capped and uncapped modes will be discussed later in this document.
  • The number of virtual processors and processing units assigned to a partition can be dynamically modified through the HMC.
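For a quick check from within the partition, the entitled capacity can also be read from the lparstat -i listing. A minimal example (the value shown is illustrative, reusing the entitlement from the example above):

# lparstat -i | grep -i "Entitled Capacity"
Entitled Capacity                          : 6.00 < illustrative value

The same grep may also match the "Entitled Capacity of Pool" line, which reports the capacity of the shared processor pool.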
 


Capacity Entitlement considerations

  • Capacity entitlement should be properly configured for standard production operations and to accommodate workloads during peak times.  
  • Ensuring sufficient capacity entitlement is crucial to prevent any negative impact on operating system performance and processor affinity.  
  • Exceeding the entitled capacity can lead to poor affinity and significant performance degradation, which can affect business operations.  
 


Virtual Processors

  • A virtual processor is a representation of a physical processor core to the operating system of a partition that uses shared processors.
  • It defines how many physical processors the logical partition's workload can be spread across.
  • It represents the upper threshold for the number of physical processors that can be used.
  • Each partition has its own assigned virtual processors.
  • The partition will operate only on the virtual processors required for its workload.
  • Unneeded virtual processors assigned to a partition will be folded away using the processor folding feature.
  • To display the currently assigned virtual processors, use the command # lparstat -i (see the example after this list).
  • Using an HMC, you can change the number of virtual processors and processing units that are assigned to the partition.
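As mentioned in the list above, the virtual processor settings can be checked from the partition itself with lparstat -i; a minimal example (the field labels are the ones lparstat -i prints on current AIX levels):

# lparstat -i | grep -i "Virtual CPUs"

This prints the Minimum, Desired, Online, and Maximum Virtual CPUs lines; the Online value is the number of virtual processors currently assigned to the partition.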
 


Processor affinity

  • The probability that a logical processor is dispatched on the same physical processor it last executed on.  
  • Exceeding entitlement will impact processor affinity.  
  • When operating within entitlement, the thread is generally dispatched each time on the same processor.  
  • The uncapped mode does not prevent issues with processor affinity, as illustrated below.  
  • Even in an uncapped partition, processor affinity is affected if consumption exceeds 100% of the capacity entitlement.  
  • A dedicated partition has optimal processor affinity.  
 


Processor folding

  • Processor folding was introduced in AIX 5.3 TL3.  
  • Each logical partition has a number of virtual processors assigned to it.  
  • The workload of the logical partition will require some of those virtual processors.  
  • Other virtual processors may not be necessary for that workload.  
  • The operating system will fold the idle virtual processors and will operate only on those that are needed.  
  • The parameter `vpm_fold_policy` controls the implementation of the virtual processor management feature for processor folding.  
  • It is recommended to keep processor folding enabled, as it is enabled by default.  
  • If it is disabled, you can enable it using the command `# schedo -p -o vpm_fold_policy=1`.  
  • Processor folding is not supported or recommended on VIOS servers.  
  • Processor folding is disabled by default on VIOS servers.  
  • Use the command `# man schedo` for more information on how to check and set various folding options.  
  • Use the `mpstat` command with the `-s` flag to monitor folded processors (see the example after this list).  
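A short sequence for the checks referenced in the list above (a sketch; a non-zero vpm_fold_policy value indicates that folding is enabled, as discussed earlier):

# schedo -o vpm_fold_policy        < display the current folding policy value
# mpstat -s 1 5                    < per-virtual-processor view, five one-second samples

In the mpstat -s output, virtual processors that have been folded away carry little or no load, while the active virtual processors carry the workload.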
 


Dedicated Partitions

  • A dedicated partition uses a fixed number of whole processors.  
  • It can share its unused processors if it is in donating mode.  
  • In donating mode, its ceded idle processor cycles extend the shared processor pool.  
  • It cannot borrow additional processors when it needs more capacity.  
  • The Dedicated-donating mode can be enabled via HMC.  
  • Enable it under the lpar Processors tab from the 'Processor Sharing' option.  
  • A dedicated partition does not interact with the processor pool.  
  • Capacity entitlement statistics will not be displayed for dedicated partitions.  
  • System, user, wait, and idle consumption averages should be monitored.  
  • A dedicated partition has optimal processor affinity.  
 


Shared Partitions

  • A shared partition uses fractional numbers of processors.  
  • It can share its unused processors with another partition in the processor pool.  
  • It will borrow additional processors if needed (uncapped mode).  
  • It cannot borrow additional processors when in capped mode.  
  • A partition can be assigned to a specific processor pool.  
  • If there are no user-created pools, all partitions will operate in the default processor pool.  
  • Capacity entitlement statistics will be displayed for shared partitions.  
  • System, user, wait, idle consumption averages, and capacity entitlement should be monitored.
 


What is the shared processor pool?

  • The default shared processor pool groups all cores that are not dedicated to specific logical partitions.
  • The default shared processor pool is created by default.
  • Some Power models allow you to use the HMC to configure multiple shared processor pools.
  • It facilitates the sharing of processing capacity among multiple logical partitions.
  • Dedicated processors do not utilize a processor pool.
  • Dedicated donating partitions extend the processor pool with ceded idle processor cycles.
 


Sharing modes Capped/Uncapped

  • Capped mode does not permit the partition to exceed the assigned entitled capacity, even if there are free resources in the processor pool.
  • Uncapped mode allows the logical partition to obtain more processing units if needed, provided that sufficient resources are available.  
  • Uncapped partitions have access to spare processor cycles in the shared processor pool.  
  • In uncapped mode, the capacity entitlement no longer represents the maximum capacity the partition can consume; the upper limit becomes the number of virtual processors, and access to spare cycles is arbitrated by the partition's capacity weight.  
 


Capacity Weight

  • Capacity weight determines the priority of a partition in accessing resources from the processor pool if needed.  
  • Uncapped weight is a value ranging from 0 to 255, with the default uncapped weight set at 128.  
  • Unused capacity is allocated to contending partitions based on their established uncapped weight values.  
  • Capacity weight for critical partitions may be assigned a slightly higher value.  
  • The lower the weight value, the less likely the partition is to be allocated spare processing cycles when it tries to exceed its entitlement.  
  • An uncapped partition with a capacity weight of zero will behave as if it is in capped mode.  
  • Capacity weight can be adjusted dynamically from the HMC as needed.  
  • Capacity weight is not always taken into account; it matters only when uncapped partitions are contending for limited spare capacity, as in the following example.  
If logical partition A has an uncapped weight of 100 and one virtual processor, while logical partition B has an uncapped weight of 200 and one virtual processor, then when spare resources are limited, partition B will receive two additional processing units for every one that partition A receives. If there is enough unused capacity to satisfy both, each partition receives the capacity it needs and the uncapped weights are ignored.
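To make the contended case concrete, here is a hypothetical worked example (the 0.9 processing units of spare capacity is an assumed figure): with uncapped weights of 100 and 200, partition A would receive 0.9 * 100 / (100 + 200) = 0.3 processing units and partition B would receive 0.9 * 200 / (100 + 200) = 0.6 processing units, which preserves the 1:2 ratio described above.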


Logical processor

  • Represents an individual Simultaneous Multi-threading (SMT) thread of a physical processor.  
  • The total number of logical processors (lcpu) equals the number of virtual CPUs (VCPU) multiplied by the SMT mode: lcpu = VCPU * SMT (see the worked example after this list).  
  • When SMT is disabled, a virtual processor core corresponds to one AIX logical processor.  
  • Use the command `# lparstat` to check the number of logical processors.  
  • The number of configured logical CPUs should be assessed against the number of process threads in the run queue.  
  • Some threads in the run queue may be waiting for an available logical CPU.  
  • Additional virtual processors may be necessary if there are threads in the run queue waiting for a logical CPU.  
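As a worked example of the formula above, using the lparstat output shown earlier in this document (smt=4 lcpu=48): lcpu = VCPU * SMT, so 48 = VCPU * 4, which means the partition has 48 / 4 = 12 virtual processors.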
 


Simultaneous Multi-Threading SMT

  • Allows a processor core to run multiple hardware threads concurrently, providing thread-level parallelism.  
  • The `smtctl` command controls the enabling and disabling of processor SMT mode.  
  • To dynamically change the number of SMT threads, for example to 8 on a POWER8 system, use `# smtctl -t 8` and then run `bosboot -a` so the change persists across reboots (see the example after this list).  
  • Use the command `# smtctl` to check the current SMT mode.  
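A minimal sequence for the SMT change described above, assuming a POWER8 partition (use a thread count supported by your processor generation):

# smtctl              < review the current SMT capability and mode
# smtctl -t 8         < switch to 8 SMT threads per core dynamically
# bosboot -a          < rebuild the boot image so the setting persists across reboots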
 


How to benefit from the uncapped feature

It is recommended to assign a partition's entitlement close to its average CPU consumption within a specific time window (ideally 10 minutes during peak production) and let the uncapped feature handle the spikes. The uncapped feature enables the shared partition to access additional fractional cores from the shared processor pool, but the partition should not operate consistently above its currently assigned capacity entitlement.

Important commands:
  • To display the current assigned virtual processors, use the command `# lparstat -i | grep -i "Desired Virtual CPUs"`.  
  • To display the currently assigned capacity entitlement, use the command `# lparstat | awk -F "ent=" '/ent\=/ {print $NF}'`.  
  • To display the logical partition's Processor Capacity Weight, use the command `# lparstat -i | grep -i "Variable Capacity Weight"` (the default is 128).
  • To check the partition type, use the command `# lparstat -i | awk '/Type/{print $NF}'` (the output will be either shared or dedicated).  
  • To check the partition's shared processor mode, use the command `# lparstat -i | awk '/capped/{print $NF}'` (the output will be Capped or Uncapped).  
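The individual checks above can be combined into a single query. A minimal sketch (the field labels are taken from the commands above; verify them against your own lparstat -i output, as labels can vary slightly between AIX levels):

# lparstat -i | egrep "^(Type|Mode|Entitled Capacity|Desired Virtual CPUs|Variable Capacity Weight)"

This prints the partition type, the capped/uncapped mode, the entitled capacity, the desired virtual processors, and the capacity weight in one pass.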
 


How to monitor the available processors in the shared processor pool?

  • Monitoring the available processors in the shared processor pool is crucial.  
  • Having fewer than one spare processor in the pool as a free resource can lead to performance degradation.  
  • Additional spare processors should be added to the processor pool if necessary.  
  • The available processors in the pool are represented in the "app" column of the `lparstat` command output.  
  • By default, the "app" column is not included in the `lparstat` output.  
  • Enabling the "app" column does not affect the system's performance.  
 


How to enable the partition to gather the available processors information "app"?

  1. Log on to HMC
  2. Right-click on the specific LPAR
  3. Properties
  4. Check the box of 'Allow performance information collection'
  • Or use HMC command line $ chsyscfg -m <sys name> -r lpar -i "name=<lpar_name>,allow_perf_collection=1"
  • Note: This will add an additional column in the `lparstat` output called "app," which will display the available processors in the shared processing pool.


Example to check the available processors in the pool:

  • In this example, there is less than one spare processor in the shared processor pool (the minimum observed value in the app column is 0.17).
  • This will have an impact on performance.
  • This indicates that additional spare processors need to be added to the processor pool.
  • Monitor the partition with more snapshots to confirm whether more processors are needed (see the sketch after this list).
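A way to take such measurements yourself over a one-minute window; a sketch that assumes performance information collection is enabled and that app is the 8th field of the lparstat data lines (verify the position against the header line of your own output):

# lparstat 1 60 | tee app.int
# awk '/^..[0-9]/{print $8}' app.int | sort -n | sed -n '1p'
0.17 < minimum spare pool capacity (app) observed; the value here is illustrative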
 



Monitoring the capacity entitlement:
There are various tools available to monitor and address processor capacity entitlement issues. You can also check the capacity entitlement yourself from the command line by obtaining an average value during what you consider peak production times with higher workloads.


Examples using the lparstat command:

  • Use lparstat 1 60 for a one-minute test that takes 60 snapshots, one per second.
  • Or use lparstat 1 120 for a longer (two-minute) measurement period.
  • Use lparstat 2 60 to take 60 snapshots two seconds apart.
  • Use lparstat 10 60 to get sixty samples ten seconds apart.
For more details on the lparstat command, use # man lparstat

Some details about the lparstat output columns:
  • %user indicates the percentage of the entitled processing capacity used while executing at the user level (application).
  • %sys is the percentage of the entitled processing capacity used while executing at the system level (kernel).
  • %idle is the percentage of the entitled processing capacity unused while the partition was idle and did not have any outstanding disk I/O request.
  • %wait indicates the percentage of the entitled processing capacity unused while the partition was idle and had outstanding disk I/O request(s).
  • The physc column shows the number of physical processors consumed, and the %entc column shows the percentage of the entitled capacity consumed; these statistics are displayed only when the partition type is shared.
  • For both dedicated and shared partitions, you still need to review the user, sys, wait, and idle percentages.
  • We recommend calculating an average for each of these over a specific window, as is done for the capacity entitlement below.
  • Usually, the sum of the %user and %sys averages should not be high.
  • For more details about these columns, read the lparstat manual page: # man lparstat
  • In the example output (not reproduced here), the physical processor consumption is high, with %entc values slightly exceeding 100%.
  • That is why we need an average usage figure: to confirm whether these were just spikes (no action needed) or whether the processor entitlement should be increased.
 



How to get the average, maximum, and minimum capacity consumption from the lparstat command with a one-minute test:

We need the average, maximum, and minimum physical processor consumption during this period (one minute) to check the capacity entitlement and to determine the appropriate capacity entitlement the partition should have.

Use the following command to run lparstat for one minute, generating 60 measurements (one per second), and to direct the output to the file physc.int:

# lparstat 1 60 | tee physc.int


Use the following command to get the minimum and maximum consumption during this minute:
# awk '/^..[0-9]/{print $5}' physc.int | sort -n | sed -n '1p;$p'
Use the following command to get the average utilization during the same minute:
# awk '/^..[0-9]/{sum+=$5}END{printf "%.2f\n",sum/60}' physc.int

These statistics give us a clear view of what happened during the one-minute test:
  • The maximum value is the highest physical processor consumption observed during that time; for normal production operation, it is important to keep it below the entitled capacity (under 100% of the entitlement).
  • If the maximum value exceeds 100% of the entitlement, it might be just a spike, or more than a single spike, so you should also look at the minimum value.
  • A minimum value over 100% means that all 60 measurements were over 100%, so it is no longer just a spike but a workload with a capacity entitlement problem.
  • In this case, round the average value up to the next 0.10 to get the appropriate capacity entitlement to assign to this partition, but only if the application, the database, or the operating system itself does not have a specific process hogging the CPU time.
  • The above test is applicable only if you need to monitor the entitled capacity or to determine how much entitled capacity should be assigned for your environment and workload. For any other processor problem, such as a specific process consuming most of the CPU time, gather perfpmr data and upload it to IBM Support for analysis, or check with the application owner on why that process is consuming that much processor time.
  • We recommend using a 10-minute window for more measurements and more accurate results.
  • If a number other than 60 is specified to collect more measurements, make sure to change the divisor in the command that calculates the average accordingly (see also the variant below).
  • If, for example, 'lparstat 1 120' is used, then the average command becomes # awk '/^..[0-9]/{sum+=$5}END{printf "%.2f\n",sum/120}' physc.int
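Alternatively, a variant that does not require the divisor to be updated by hand; this sketch divides by however many data lines were matched:

# awk '/^..[0-9]/{sum+=$5; n++} END{printf "%.2f\n", sum/n}' physc.int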

Example (only the summary commands and their results are shown; the raw lparstat measurements are omitted):

# lparstat 1 60 | tee physc.int

# awk '/^..[0-9]/{print $5}' physc.int | sort -n | sed -n '1p;$p'
9.89 < minimum
9.92 < maximum

# awk '/^..[0-9]/{sum+=$5}END{printf "%.2f\n",sum/60}' physc.int
9.93 < average

# lparstat | awk -F "ent=" '/ent\=/ {print $NF}'
9.00 < current
  • Compare the average value with the capacity entitlement assigned.
  • The average value is 9.93 and the capacity entitlement is 9.00.
  • Round the average value up to the next '0.10' equivalent, which gives 10.00.
  • The entitled capacity should be increased to 10.00 for better operation.
  • If a specific process is eating most of the CPU time, you still need to involve the IBM support team or the application vendor.
  • The appropriate data to check the problem is perfpmr data.
  • This suite of scripts can be downloaded from ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr
  • Download the appropriate version for your version of AIX.
  • Run the perfpmr scripts then upload the data to IBM Support for analysis and possible fixes.
  • The application vendor can also check whether an application problem is causing the high CPU usage.
 



How to get the averages for the different CPU usage modes (%user, %sys, %wait, %idle)?

Using the same physc.int file above, which contains the output from # lparstat 1 60 (a combined variant is shown after this list):

  • %user: Use the command # awk '/^..[0-9]/{sum+=$1}END{printf "%.2f\n",sum/60}' physc.int
  • %sys: Use the command # awk '/^..[0-9]/{sum+=$2}END{printf "%.2f\n",sum/60}' physc.int
  • %wait: Use the command # awk '/^..[0-9]/{sum+=$3}END{printf "%.2f\n",sum/60}' physc.int
  • %idle: Use the command # awk '/^..[0-9]/{sum+=$4}END{printf "%.2f\n",sum/60}' physc.int
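All four averages can also be computed in a single pass; a sketch based on the same column positions used in the commands above:

# awk '/^..[0-9]/{u+=$1; s+=$2; w+=$3; i+=$4; n++} END{printf "user=%.2f sys=%.2f wait=%.2f idle=%.2f\n", u/n, s/n, w/n, i/n}' physc.int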
 



The vmstat command can be used as well to check the capacity entitlement:

Example: # vmstat 1 600

  • With the vmstat command above, we take 600 snapshots (one per second) for more measurements and better results.  
  • Look at the pc column values under the cpu section and compute their average (see the sketch after this list).  
  • The column pc is the number of physical processors used and is displayed only if the partition is running with a shared processor.
  • You won't find that column if this is run on a dedicated partition.
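To compute the pc average from a vmstat capture, as mentioned in the list above, a sketch that assumes pc and ec are the last two columns of the data lines (the default layout on a shared partition; verify against your own header line):

# vmstat 1 600 | tee pc.int
# awk '$1 ~ /^[0-9]+$/ {sum+=$(NF-1); n++} END{printf "%.2f\n", sum/n}' pc.int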
 



The sar command can also be useful for monitoring CPU consumption:

Example: # sar -u 1 10
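sar prints an Average row after the last interval, and on shared partitions the -u report typically also includes physc and %entc columns. A quick way to capture just the averages (a sketch):

# sar -u 1 10 | grep -i Average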




Cheers, Mahmoud M. Elshafey


Document Information

Modified date:
04 November 2024

UID

isg3T1024788