IBM Support

Assigning the appropriate processor entitled capacity

Question & Answer


Question

How do you monitor and set enough processor entitled capacity to fit the workload in shared capped or uncapped partition(s)?

Cause

Running over the capacity entitlement causes performance degradation.

Answer


This document discusses the capacity entitlement definition, its performance impact on the AIX operating system, how to monitor it, and how to set the appropriate amount of processing capacity for a partition's workload.

In order to monitor the capacity entitlement or to assign the appropriate amount of processors for a specific partition to cover its workload, it is important to understand the different partition types and modes and how they benefit from the logical partitioning features of Power systems.

The following sections cover the partition types, modes, and processor-related parameters, with definitions summarized in simple points, followed by a detailed discussion of how to monitor the entitled capacity and how to assign the appropriate amount of processors for the server workload, with examples using some common tools.
 


What is the capacity entitlement?

  • Physical processors are presented to a logical partition's operating systems as virtual processors.
  • Physical processors are virtualized into portions or fractions.
  • Processing capacity is assigned in fractions of a processor: the minimum for a partition is 0.1 of one processor, and it can be fine-tuned in increments of 0.01.
  • The number of cores assigned to a partition is represented by the Capacity Entitlement.
  • To display the assigned capacity entitlement for a shared partition use the command # lparstat|awk -F "ent=" '/ent\=/ {print $NF}'
  • The output will be the number of processors this partition is entitled to use.
  • This is the upper threshold the partition can have from the processor pool (Capped mode).
  • The partition can use more than the assigned capacity entitlement (Uncapped mode).
  • Capped and uncapped modes details will be illustrated later in this document.
  • The number of virtual processors and processing units that are assigned to a partition can be changed through the HMC.
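To make the awk filter above concrete, here is a minimal sketch that extracts the entitlement from a sample lparstat configuration line. The sample line and its values are illustrative; on a live AIX system the input would come from lparstat itself:

```shell
# Sample "System configuration" line as printed by lparstat (values illustrative)
line='System configuration: type=Shared mode=Uncapped smt=4 lcpu=48 mem=65536MB psize=16 ent=9.00'

# Same extraction as the bullet above: split on "ent=" and print the last field
ent=$(printf '%s\n' "$line" | awk -F "ent=" '/ent=/ {print $NF}')
echo "entitled capacity: $ent"
```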
 


Capacity Entitlement considerations

  • Capacity entitlement should be correctly configured for normal production operation and to cover workload during peak time.
  • Having enough capacity entitlement is important so as not to impact operating system performance and processor affinity.
  • Running over the entitled capacity can cause poor affinity and noticeable performance degradation affecting business operations.
 


Virtual Processors

  • A virtual processor is a representation of a physical processor core to the operating system of a partition that uses shared processors.
  • It determines the number of physical processors that the logical partition's workload can spread out across.
  • It represents the upper threshold for the number of physical processors that can be used.
  • We recommend keeping the ratio of virtual processors to entitled capacity no higher than 1.6.
  • Each partition has its own assigned virtual processors.
  • The partition will work only on the virtual processors needed for its workload.
  • Unneeded virtual processors assigned to a partition will fold away through the processor folding feature.
  • To display the current assigned virtual processors use the command # lparstat -i | grep -i "Desired Virtual CPUs"
  • Using an HMC, you can change the number of virtual processors and processing units that are assigned to the partition.
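The 1.6 ratio guideline above can be checked with a small sketch. The sample lparstat -i lines below are illustrative; on a live system you would pipe lparstat -i directly into the same awk filters:

```shell
# Illustrative excerpt of `lparstat -i` output (values are examples only)
sample='Desired Virtual CPUs                       : 12
Desired Capacity                           : 9.00'

vp=$(printf '%s\n' "$sample" | awk -F: '/Desired Virtual CPUs/ {gsub(/ /,"",$2); print $2}')
ec=$(printf '%s\n' "$sample" | awk -F: '/Desired Capacity/ {gsub(/ /,"",$2); print $2}')

# Warn when the virtual processor to entitlement ratio exceeds 1.6
awk -v vp="$vp" -v ec="$ec" 'BEGIN {
    r = vp / ec
    printf "VP=%s EC=%s ratio=%.2f\n", vp, ec, r
    if (r > 1.6) print "WARNING: VP/EC ratio exceeds the recommended 1.6"
}'
```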
 


Processor affinity

  • The probability that a logical processor is dispatched on the same physical processor that it last executed on.
  • Running over the entitlement will impact processor affinity.
  • If you run within the entitlement, threads are generally dispatched on the same physical processor each time.
  • Uncapped mode does not avoid processor affinity problems, as illustrated below.
  • Even in an uncapped partition, processor affinity is affected if consumption exceeds 100% of the capacity entitlement.
  • A dedicated partition has an ideal processor affinity.
 


Processor folding

  • Processor folding was introduced in AIX 5.3 TL3.
  • Each logical partition has a number of virtual processors assigned to it.
  • The logical partition workload will need to use some of those virtual processors.
  • Other processors might not be needed for that workload.
  • The operating system will fold the idle virtual processors and will run only on those that are needed.
  • The parameter vpm_fold_policy controls the application of the virtual processor management feature of processor folding.
  • We recommend keeping processor folding enabled; it is enabled by default.
  • If disabled, it's recommended to enable it using the command # schedo -p -o vpm_fold_policy=1
  • Processor folding is not supported or recommended on VIOS servers.
  • Processor folding is disabled by default on VIOS servers.
  • Use the command # man schedo for more information about how to check and set different folding options.
  • Use the mpstat command with the -s flag to monitor the folded processors.
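A small sketch of checking the folding tunable; the sample line below mimics what `schedo -o vpm_fold_policy` prints, and the value shown is illustrative:

```shell
# Illustrative output line from: schedo -o vpm_fold_policy
sample='vpm_fold_policy = 1'

policy=$(printf '%s\n' "$sample" | awk -F= '{gsub(/ /,"",$2); print $2}')

# The value 1 is the setting this document recommends for shared partitions
if [ "$policy" -eq 1 ]; then
    echo "processor folding is enabled"
else
    echo "processor folding is disabled (enable with: schedo -p -o vpm_fold_policy=1)"
fi
```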
 


Dedicated Partitions

  • A dedicated partition uses a dedicated number of whole processors.
  • It can share its unused processors if it is in donating mode.
  • Its ceded idle cycles extend the shared processor pool when in donating mode.
  • It cannot borrow additional processors when needed.
  • The Dedicated-donating mode can be enabled via HMC.
  • Enable it on the partition's Processors tab, using the 'Processor Sharing' option.
  • A dedicated partition will not work with the processor pool.
  • Capacity entitlement statistics will not be displayed for dedicated partitions.
  • System, user, wait and idle consumption averages should be monitored.
  • A dedicated partition has the ideal processor affinity.
 


Shared Partitions

  • A shared partition uses fractional numbers of processors.
  • It can cede its unused processor capacity to other partitions through the processor pool.
  • It can borrow additional processing capacity when needed (uncapped mode).
  • It cannot borrow additional processing capacity when needed in capped mode.
  • A partition can be assigned to a specific processor pool.
  • If there are no user created pools, all partitions will work in the default processor pool.
  • Capacity entitlement statistics will be displayed for shared partitions.
  • System, user, wait, and idle consumption averages, as well as the capacity entitlement, should be monitored if capped.
  • If uncapped, monitor the capacity entitlement.
 


What is the shared processor pool?

  • Groups all cores that are not dedicated to specific logical partitions (the default shared processor pool).
  • The default shared processor pool is created by default.
  • Some Power models allow you to use the HMC to configure multiple shared processor pools.
  • It allows the sharing of processing capacity among multiple logical partitions.
  • Dedicated processors do not take advantage of a processor pool.
  • Dedicated donating partitions extend the processor pool with ceded idle processor cycles.
 


Sharing modes Capped/Uncapped

  • Capped mode doesn't allow the partition to exceed the assigned entitled capacity even if there are free resources in the processor pool.
  • Uncapped mode lets the logical partition obtain more processing units when needed, provided enough resources are available in the pool.
  • Uncapped partitions have access to spare processor cycles in the shared processor pool.
  • In uncapped mode, capacity entitlement no longer represents the maximum capacity available; consumption beyond it is governed by the partition's capacity weight.
 


Capacity Weight

  • Capacity weight is the partition priority in having resources from the processor pool if needed.
  • Uncapped weight is a number in the range of 0 through 255, the default uncapped weight value is 128.
  • Unused capacity is distributed to contending partitions in proportion to the established value of the uncapped weight.
  • Critical partitions can be given a slightly higher capacity weight.
  • The lower the weight value, the less likely the partition is to be allocated processing cycles when exceeding its entitlement.
  • An uncapped partition with a capacity weight of zero behaves as if it were in capped mode.
  • Capacity weight can be adjusted dynamically from the HMC as needed.
  • Capacity weight is not always considered; it is ignored in some cases.
For example, logical partition A has one virtual processor and an uncapped weight of 100.
Logical partition B also has one virtual processor, but an uncapped weight of 200.
If logical partitions A and B both require additional processing capacity,
and there is not enough physical processor capacity to run both logical partitions,
logical partition B receives two additional processing units for every additional processing
unit that logical partition A receives. If logical partitions A and B both require additional
processing capacity, and there is enough physical processor capacity to run both logical
partitions, logical partition A and B receive an equal amount of unused capacity. In this
situation, their uncapped weights are ignored.
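The weight example above can be verified with a quick calculation: contended spare capacity is split in proportion to the weights, so with one spare processing unit and weights of 100 and 200, partition B receives twice what partition A receives:

```shell
# Proportional split of one spare processing unit between weights 100 and 200
awk 'BEGIN {
    spare = 1.0; wA = 100; wB = 200
    printf "A gets %.2f, B gets %.2f processing units\n",
           spare * wA / (wA + wB), spare * wB / (wA + wB)
}'
# prints: A gets 0.33, B gets 0.67 processing units
```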


Logical processor

  • Represents an individual Simultaneous Multithreading (SMT) thread of a physical processor.
  • Equals the total number of virtual processors assigned multiplied by the SMT level: lcpu = vCPU * SMT
  • One hypervisor virtual processor corresponds to one AIX logical processor when SMT is disabled.
  • Use the command # lparstat to check the number of logical processors.
  • The number of lcpus configured should be evaluated against the number of process threads on the run queue.
  • Some threads on the run queue may be waiting for an available logical cpu.
  • Additional virtual processors might be needed if there are threads on the run queue waiting for lcpu.
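The lcpu formula in the list above is a simple product. For example, a partition with 12 virtual processors in SMT4 mode presents 48 logical processors (values illustrative):

```shell
# lcpu = vCPU * SMT (illustrative values)
vcpu=12
smt=4
lcpu=$((vcpu * smt))
echo "lcpu=$lcpu"    # prints: lcpu=48
```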
 


Simultaneous Multi Threading SMT

  • Allows a single physical processor core to run multiple hardware threads concurrently, providing thread-level parallelism.
  • The smtctl command controls the enabling and disabling of processor SMT mode.
  • When booting a POWER8 logical partition, the default number of SMT threads is 4.
  • To increase the number of SMT threads dynamically on POWER8, use # smtctl -t 8, then run bosboot to make the change persist across reboots.
  • Use the command # smtctl to check the current SMT mode.
 


How to benefit from the uncapped feature

It is recommended to set a partition's entitlement close to its average CPU consumption
within a specific window (10 minutes during peak production time is recommended) and let
the spikes be absorbed by the uncapped feature, which allows the shared partition to obtain
additional fractions of cores from the shared processor pool.

Important commands:
  • To display the current assigned virtual processors use the command # lparstat -i | grep -i "Desired Virtual CPUs"
  • To display the current assigned capacity entitlement use the command # lparstat|awk -F "ent=" '/ent\=/ {print $NF}'
  • To display the logical partition capacity weight use the command # lparstat -i | grep -i "Variable Capacity Weight" (the default is 128).
  • To check the partition type use the command # lparstat -i | awk '/Type/{print $NF}' (the output might be Shared or Dedicated).
  • To check the partition shared processor mode use the command # lparstat -i | awk '/capped/{print $NF}' (the output is Capped or Uncapped).
 


How to monitor the available processors in the shared processor pool?

  • Monitoring the available processors in the shared processor pool is very important.
  • Having less than one spare processor free in the pool will cause performance degradation.
  • Additional spare processors should be added to the processor pool if needed.
  • The available processors in the pool are shown in the "app" column of the lparstat command output.
  • By default the "app" column is not included in lparstat output.
  • Turning "app" on does not impact the system's performance.
 


How to enable the partition to gather the available processors information "app" ?

  1. Log on to HMC
  2. Right click on the specific LPAR
  3. Properties
  4. Check box of 'Allow performance information collection'
  • Or use HMC command line $ chsyscfg -m <sys name> -r lpar -i "name=<lpar_name>,allow_perf_collection=1"


Example of checking the available processors in the pool:
  • In the sampled output, there is less than one spare processor (a minimum "app" value of 0.17) in the shared processor pool.
  • This will have an impact on the performance.
  • This indicates a need for additional spare processors to be added to the processor pool.
  • Monitor the partition with more snapshots to confirm if more processors are needed.
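A sketch of reading the "app" column from an lparstat data line. The sample line below follows the shared-partition column layout (%user %sys %wait %idle physc %entc lbusy app vcsw phint), and all values are illustrative:

```shell
# One illustrative lparstat data line; "app" is the 8th column in this layout
sample='  0.2   0.4   0.0  99.4  0.05   0.6   2.0  0.17   274     0'

app=$(printf '%s\n' "$sample" | awk '{print $8}')
echo "available pool processors (app): $app"

# Flag pools with less than one spare processor, as discussed above
awk -v a="$app" 'BEGIN { if (a < 1.0) print "WARNING: less than one spare processor in the pool" }'
```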
 



Monitoring the capacity entitlement:
There are many tools to monitor and address processor capacity entitlement problems.
To check the capacity entitlement yourself, use the command line to collect an average
entitlement consumption value during what you consider peak production workload time.


Examples using lparstat command:

  • Use lparstat 1 60 for a one-minute test: 60 snapshots taken one second apart.
  • Or use lparstat 1 120 for a longer measurement period.
  • Use lparstat 2 60 to take snapshots two seconds apart.
  • Use lparstat 10 60 to get sixty samples ten seconds apart.
For more details on lparstat command use # man lparstat

The output will be similar to the following (the sample screenshot is not reproduced in this copy):

 
Some details about the above output:
  • %user indicates the percentage of the entitled processing capacity used while executing at the user level (application).
  • %sys is the percentage of the entitled processing capacity used while executing at the system level (kernel).
  • %idle is the percentage of the entitled processing capacity unused while the partition was idle and did not have any outstanding disk I/O request.
  • %wait indicates the percentage of the entitled processing capacity unused while the partition was idle and had outstanding disk I/O request(s).
  • The %entc column indicates the percentage of the entitled capacity consumed; these statistics are displayed only when the partition type is shared.
  • For both dedicated and shared partitions, you still need to review the user, sys, wait, and idle percentages.
  • We recommend computing the average of each within a specific window, as done for the capacity entitlement.
  • Usually the sum of the %user and %sys averages should not be high.
  • For more details about these output fields, read the lparstat manual pages with the command # man lparstat
  • The output shows high physical processor consumption, with %entc values slightly exceeding 100%.
  • That is why an average usage should be obtained, to confirm whether these were just spikes requiring no action or whether the processor entitlement should be increased.
 



How to get the average, maximum, and minimum capacity entitlement
consumption from the lparstat command with a one-minute test:

We need the average, maximum, and minimum processor consumption during
this time (one minute) to check the capacity entitlement and to derive the
appropriate capacity entitlement the partition should have.

Use the following command to run lparstat for one minute, generating 60
measurements one second apart, and direct the output to the file physc.int:

# lparstat 1 60 | tee physc.int


The command displays the snapshots on screen while saving them to the physc.int file.
Use the following command to get the minimum and maximum consumption during this minute:
# awk '/^..[0-9]/{print $5}' physc.int | sort -n | sed -n '1p;$p'
Use the following command to get the average utilization during the same minute:
# awk '/^..[0-9]/{sum+=$5}END{printf "%.2f\n",sum/60}' physc.int

The resulting statistics give a clear view of what happened during the one-minute test:
  • The maximum value is the peak physical processor consumption during that time; for normal production operation, keep it under 100% of the entitlement.
  • If the maximum exceeded 100%, it might be just one spike or several, so also check the minimum value.
  • A minimum value over 100% means all 60 measurements were over 100%, which is no longer a spike but a workload with a capacity entitlement problem.
  • In that case, round the average value up to the next 0.10 increment to get the appropriate capacity entitlement to assign to this partition, but only if no specific process in the application, the database, or the operating system itself is hogging CPU time.
  • The above test is applicable only when you need to monitor the entitled capacity or determine how much entitled capacity your environment and workload require. For other processor problems, such as a specific process consuming most of the CPU time, gather perfpmr data and upload it to IBM Support for analysis, or check with the application owner on why that process is consuming so much processor time.
  • We recommend using a 10-minute window for more measurements and more accurate results.
  • If a number other than 60 is specified to collect more measurements, make sure to change the divisor in the command that computes the average.
  • For example, if 'lparstat 1 120' is used, the command becomes # awk '/^..[0-9]/{sum+=$5}END{printf "%.2f\n",sum/120}' physc.int
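As a self-contained illustration of the pipelines above, the same awk commands can be run against a few sample data lines. Three illustrative snapshots stand in for the 60 real ones, so the divisor in the average is adjusted to match:

```shell
# Three illustrative lparstat data lines (physc is the 5th column)
cat > physc.int <<'EOF'
  2.1   1.0   0.0  96.9  9.89 109.9   4.0  0.40   500     1
  2.3   1.1   0.0  96.6  9.90 110.0   4.1  0.38   510     0
  2.4   1.2   0.0  96.4  9.92 110.2   4.2  0.35   505     2
EOF

# Minimum and maximum physc values
awk '/^..[0-9]/{print $5}' physc.int | sort -n | sed -n '1p;$p'

# Average physc (divide by the number of samples: 3 here instead of 60)
awk '/^..[0-9]/{sum+=$5}END{printf "%.2f\n", sum/3}' physc.int
```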

Example (the individual lparstat measurements are not reproduced here; there should be 60):

# lparstat 1 60 | tee physc.int
# awk '/^..[0-9]/{print $5}' physc.int | sort -n | sed -n '1p;$p'
9.89 < minimum
9.92 < maximum

# awk '/^..[0-9]/{sum+=$5}END{printf "%.2f\n",sum/60}' physc.int
9.91 < average

# lparstat | awk -F "ent=" '/ent\=/ {print $NF}'
9.00 < current
  • Compare the average value with the capacity entitlement assigned.
  • The average value is 9.91 and the capacity entitlement is 9.00.
  • Round the average value up to the next '0.10' increment, which gives 10.00.
  • The entitled capacity should be increased to 10.00 for better operation.
  • If a specific process is eating most of CPU time, you still need to involve IBM support team or the application vendor.
  • The appropriate data for investigating such a problem is perfpmr data.
  • This suite of scripts can be downloaded from ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr
  • Download the appropriate version for your version of AIX.
  • Run the perfpmr scripts then upload the data to IBM Support for analysis and possible fix.
  • The application vendor can also check whether there is an application problem causing high CPU usage.
 



How to get the averages for different CPU modes?

Using the same physc.int file as above, which contains the output from # lparstat 1 60:

  • %user: Use the command # awk '/^..[0-9]/{sum+=$1}END{printf "%.2f\n",sum/60}' physc.int
  • %sys: Use the command # awk '/^..[0-9]/{sum+=$2}END{printf "%.2f\n",sum/60}' physc.int
  • %wait: Use the command # awk '/^..[0-9]/{sum+=$3}END{printf "%.2f\n",sum/60}' physc.int
  • %idle: Use the command # awk '/^..[0-9]/{sum+=$4}END{printf "%.2f\n",sum/60}' physc.int
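A sketch of the %user average (column 1) against a few illustrative lparstat data lines; three sample snapshots stand in for the 60 real ones, so the divisor is adjusted:

```shell
# Illustrative lparstat data lines: %user is column 1
cat > physc.int <<'EOF'
  2.1   1.0   0.0  96.9  9.89 109.9   4.0  0.40   500     1
  2.3   1.1   0.0  96.6  9.90 110.0   4.1  0.38   510     0
  2.4   1.2   0.0  96.4  9.92 110.2   4.2  0.35   505     2
EOF

# Average %user over the 3 sample snapshots (use /60 for a real one-minute run)
awk '/^..[0-9]/{sum+=$1}END{printf "%.2f\n", sum/3}' physc.int
```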
 



The vmstat command can be used as well to check the capacity entitlement:

Example: # vmstat 1 600

  • The vmstat command above takes 600 snapshots, providing more measurements for better results.
  • Look for the pc column values under cpu and compute their average.
  • The pc column is the number of physical processors consumed and is displayed only if the partition is running with shared processors.
  • That column is not present if the command is run on a dedicated partition.
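A sketch of averaging the pc column; the two sample rows below mimic vmstat data lines from a shared partition, on the assumption that pc and ec are the last two cpu columns (all values illustrative):

```shell
# Illustrative vmstat data rows; pc is assumed to be the next-to-last column
cat > vmstat.out <<'EOF'
 2  1 350000 12000   0   0   0   0    0   0 150 3000 800 20  5 70  5  9.89 109.9
 3  1 350100 11900   0   0   0   0    0   0 160 3100 820 21  6 68  5  9.91 110.1
EOF

# Average pc across all sampled rows
awk '{sum += $(NF-1)} END {printf "%.2f\n", sum/NR}' vmstat.out
```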
 



The sar command can also be useful for monitoring CPU consumption:

Example: # sar -u 1 10




Cheers, Mahmoud M. Elshafey

[{"Product":{"code":"SWG10","label":"AIX"},"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Component":"Not Applicable","Platform":[{"code":"PF002","label":"AIX"}],"Version":"6.1;7.1","Edition":"","Line of Business":{"code":"LOB08","label":"Cognitive Systems"}}]

Document Information

Modified date:
19 April 2020

UID

isg3T1024788