Question & Answer
Question
Cause
Answer
- Processor idle percentage is the share of the entitled processing capacity that was unused while the partition was idle and had no outstanding disk I/O requests.
- The lparstat command reports LPAR-related information and utilization statistics. The following output shows the system configuration, the processor %user, %sys, %wait, and %idle percentages, and the capacity entitlement utilization:
System configuration: type=Shared mode=Uncapped smt=4 lcpu=48 mem=98304MB psize=76 ent=6.00
%user  %sys  %wait  %idle physc %entc  lbusy  vcsw phint
----- ----- ------ ------ ----- ----- ------ ----- -----
 18.3  12.3    0.0   69.3  2.95  49.1   19.8  5108     0
 18.4  12.3    0.0   69.3  2.93  48.9   19.6  4573     0
 18.5  12.5    0.0   69.0  2.94  49.1   21.1  8543     0
 18.5  11.3    0.0   70.2  2.95  49.1    3.2  4080     0
 16.8  11.2    0.0   72.0  2.69  44.9   17.2  5142     0
- The physc value indicates the number of physical processors consumed during the interval. These are not necessarily fully busy cores, because a consumed core might be only partially used.
- The output shows an idle percentage of around 70%. This percentage is the processing capacity that is unused at the moment.
- To understand why a high idle percentage can appear while all CPUs are in use, some background on virtual processor management and throughput modes is needed.
- Physical processors are presented to a logical partition's operating system as virtual processors. Each logical partition has a number of assigned virtual processors.
- A virtual processor is a structure that represents a physical processor. It is used for virtualization and sharing purposes, and it saves the state and register contents of that physical processor.
- A logical processor represents an individual simultaneous multithreading (SMT) thread of a virtual processor. SMT gives processors thread-level parallelism at the instruction level.
- The total number of logical processors is the number of assigned virtual processors multiplied by the number of threads in the selected SMT mode.
- Number of lcpus = Number of virtual CPUs * Number of SMT threads per virtual CPU.
- When SMT is disabled, each virtual processor corresponds to one AIX™ logical processor. Use the lparstat command to check the number of logical processors (lcpu).
- AIX™ treats each SMT thread of a virtual processor as an independent logical processor, so a single workload thread is dispatched to one logical processor.
- The supported SMT modes depend on the POWER™ system type. Most current systems use SMT4 or SMT8.
- AIX™ virtual processor management offers a way to influence how instructions are handled and spread across logical processors.
- A logical processor represents a single thread of a virtual processor and gives processors thread-level parallelism at the instruction level.
- A single workload thread uses an individual logical processor of a virtual processor, and the next thread uses another logical processor in a specific order. This order is controlled by the throughput mode.
- The schedo command can be used to change the throughput mode through the vpm_throughput_mode tunable.
- The default throughput mode ("raw throughput") places the first workload thread on the first SMT thread of the first virtual processor, the next workload thread on the first SMT thread of the second virtual processor, and so forth.
- The "scaled throughput" mode places the first workload thread on the first SMT thread of the first virtual processor and the second workload thread on the second SMT thread of the same virtual processor. When the number of workload threads is higher than the vpm_throughput_mode value, the additional workload threads are scheduled on the SMT threads of the next virtual processor.
- For more details about the vpm_throughput_mode tunable, use: # schedo -h vpm_throughput_mode
- To set the throughput mode, use: # schedo -o vpm_throughput_mode=<the desired level of SMT exploitation>
- For example, to change from the default throughput mode (vpm_throughput_mode=0) to scaled throughput mode with SMT exploitation of 2, use this command: # schedo -o vpm_throughput_mode=2
- To check the current throughput mode setting, use the command: # schedo -o vpm_throughput_mode
- The maximum value for vpm_throughput_mode corresponds to the number of hardware threads, and therefore logical processors, that each core of your POWER™ CPU supports.
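The relationship between lcpus, virtual CPUs, and SMT threads can be checked against the lparstat header shown earlier (smt=4, lcpu=48). The following is a minimal sketch, not an IBM tool: the helper names parse_system_configuration and virtual_processors are hypothetical, and it assumes the header reports smt as a number.

```python
def parse_system_configuration(line):
    """Split an lparstat 'System configuration:' line into key=value fields."""
    _, _, rest = line.partition(":")
    fields = {}
    for token in rest.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def virtual_processors(lcpu, smt_threads):
    """Invert: lcpus = virtual CPUs * SMT threads per virtual CPU."""
    if smt_threads <= 0 or lcpu % smt_threads != 0:
        raise ValueError("lcpu must be a positive multiple of the SMT thread count")
    return lcpu // smt_threads


# Header copied from the lparstat output earlier in this document.
header = ("System configuration: type=Shared mode=Uncapped smt=4 "
          "lcpu=48 mem=98304MB psize=76 ent=6.00")

cfg = parse_system_configuration(header)
vcpus = virtual_processors(int(cfg["lcpu"]), int(cfg["smt"]))
print(f"lcpu={cfg['lcpu']} smt={cfg['smt']} -> virtual processors={vcpus}")
```

For this header the sketch yields 12 virtual processors, since 12 * 4 = 48 logical processors.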
Processor idle time percentage represents the unused SMT threads of the virtual processors, or processors that are fully ceded.
Example
. Virtual processors = 8
. SMT = 8
. Hence logical processors = 64
. Throughput mode = Raw (vpm_throughput_mode at default: 0)
. Workload at the moment = 8 workload threads
- The first SMT thread of each virtual CPU is used to handle the 8 workload threads.
- All virtual CPUs become busy with workload because their first logical CPU (SMT thread) is used.
- The remaining 7 logical CPUs (SMT threads) of each virtual CPU are free, and this is reflected as %idle.
- The unused logical CPUs are ready for extra workload threads if needed.
- In this case, all CPUs are used but not fully busy, because they still have free SMT threads.
- Of the 64 logical CPUs, only 8 are used, so 56 are unused.
- The high number of unused logical CPUs is reflected as an idle percentage.
- The lparstat output shows all CPUs in use and, at the same time, a high %idle CPU time.
- Another logical partition cannot schedule work on those free SMT threads while one or more SMT threads of the same virtual processor are in use.
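The counts in this example can be reproduced with a few lines of arithmetic. This is a sketch only: both function names are hypothetical, and the result is a plain count of logical CPUs, not the %idle value that lparstat reports. In the test later in this document, 16 busy logical CPUs out of 64 coincide with about 62% idle rather than 75%.

```python
def logical_cpu_counts(virtual_processors, smt_threads, workload_threads):
    """Count used and unused logical CPUs, one workload thread per logical CPU."""
    total = virtual_processors * smt_threads
    if not 0 <= workload_threads <= total:
        raise ValueError("workload threads must fit on the logical CPUs")
    return {"total": total,
            "used": workload_threads,
            "unused": total - workload_threads}


def busy_virtual_processors_raw(virtual_processors, workload_threads):
    """Raw throughput as described above: one thread per virtual processor first."""
    return min(virtual_processors, workload_threads)


# Values from the example: 8 virtual processors, SMT8, 8 workload threads.
counts = logical_cpu_counts(virtual_processors=8, smt_threads=8, workload_threads=8)
busy = busy_virtual_processors_raw(virtual_processors=8, workload_threads=8)
print(counts)  # {'total': 64, 'used': 8, 'unused': 56}
print(busy)    # 8
```

All 8 virtual processors count as busy even though 56 of the 64 logical CPUs have no work, which is the situation the bullets above describe.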
When a logical partition uses a single logical CPU (SMT thread) of a virtual processor, the other logical CPUs (SMT threads) of that virtual processor remain free and ready for extra workload from the same logical partition. Those free logical CPUs are reflected as %idle CPU time until they get busy, and in the meantime they are not available to other logical partitions.
Setting up the system properly can reduce a high %idle CPU time. Avoid setting the number of virtual processors too high: with fewer virtual processors, the workload is compressed and the %idle CPU time drops. Note that this applies only to some workloads and might not be suitable for other scenarios.
Example (raw throughput mode)
. Virtual processors = 5
. SMT = 4
. Hence logical processors = 20
. Throughput mode = Raw (vpm_throughput_mode at default of 0)
. Workload at the moment = 7 workload threads
        _______   _______   _______   _______   _______
       |       | |       | |       | |       | |       |
SMT1   |Thread1| |Thread2| |Thread3| |Thread4| |Thread5|
SMT2   |Thread6| |Thread7| |Free   | |Free   | |Free   |
SMT3   |Free   | |Free   | |Free   | |Free   | |Free   |
SMT4   |Free   | |Free   | |Free   | |Free   | |Free   |
       |_______| |_______| |_______| |_______| |_______|
Example (scaled throughput mode)
. Virtual processors = 5
. SMT = 4
. Hence logical processors = 20
. Throughput mode = Scaled (vpm_throughput_mode = 4)
. Workload at the moment = 7 workload threads
        _______   _______   _______   _______   _______
       |       | |       | |       | |       | |       |
SMT1   |Thread1| |Thread5| |Free   | |Free   | |Free   |
SMT2   |Thread2| |Thread6| |Free   | |Free   | |Free   |
SMT3   |Thread3| |Thread7| |Free   | |Free   | |Free   |
SMT4   |Thread4| |Free   | |Free   | |Free   | |Free   |
       |_______| |_______| |_______| |_______| |_______|
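The two layouts above can be reproduced with a toy placement model. This is a simplified sketch and not the AIX dispatcher: place_threads, render, and busy_virtual_processors are hypothetical names, threads_per_vp=1 stands in for raw throughput mode, and threads_per_vp=4 stands in for vpm_throughput_mode=4. What happens once every virtual processor holds threads_per_vp threads (the model simply starts a second pass) is an assumption that this document does not confirm.

```python
def place_threads(workload_threads, virtual_processors, smt, threads_per_vp):
    """Return grid[vp][slot]: a 1-based workload thread number, or None if free."""
    if not 1 <= threads_per_vp <= smt:
        raise ValueError("threads_per_vp must be between 1 and the SMT thread count")
    if workload_threads > virtual_processors * smt:
        raise ValueError("more workload threads than logical CPUs")
    grid = [[None] * smt for _ in range(virtual_processors)]
    thread = 1
    # Each pass gives every virtual processor up to threads_per_vp more threads.
    for first_slot in range(0, smt, threads_per_vp):
        for vp in range(virtual_processors):
            for slot in range(first_slot, min(first_slot + threads_per_vp, smt)):
                if thread > workload_threads:
                    return grid
                grid[vp][slot] = thread
                thread += 1
    return grid


def render(grid):
    """Format the grid with one row per SMT thread, like the diagrams above."""
    lines = []
    for slot in range(len(grid[0])):
        cells = []
        for vp in grid:
            label = "Free" if vp[slot] is None else f"Thread{vp[slot]}"
            cells.append(f"|{label:<7}|")
        lines.append(f"SMT{slot + 1}   " + " ".join(cells))
    return "\n".join(lines)


def busy_virtual_processors(grid):
    """Count virtual processors that have at least one SMT thread in use."""
    return sum(1 for vp in grid if any(t is not None for t in vp))


raw = place_threads(workload_threads=7, virtual_processors=5, smt=4, threads_per_vp=1)
scaled = place_threads(workload_threads=7, virtual_processors=5, smt=4, threads_per_vp=4)

print(render(raw))
print("busy virtual processors (raw):", busy_virtual_processors(raw))
print(render(scaled))
print("busy virtual processors (scaled):", busy_virtual_processors(scaled))
```

In this model the same 7 workload threads keep 5 virtual processors busy in raw mode but only 2 in scaled mode, which is why scaled mode leaves more virtual processors available for folding.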
- A high %idle CPU time is normal in some scenarios, especially when raw throughput mode is used with a high number of virtual CPUs.
- Tuning vpm_throughput_mode compresses the workload onto fewer virtual processors, usually at the cost of some extra latency. It reduces the %idle time and allows more processors to be folded.
- While it reduces the number of processors consumed, the application throughput and response time are generally not as good as when vpm_throughput_mode is set to 0.
- When vpm_throughput_mode is set to 0, the application response time and throughput are the best, because the workload is spread across the first SMT thread of each virtual processor first.
- In most situations the default throughput mode (raw throughput) is used, but some workloads might run better under scaled throughput mode.
- It is important to test workloads whenever the throughput mode is changed.
The following test uses a logical partition with these specifications:
. Mode=Donating
. SMT=4
. LCPU=64
. VCPU=16
. Throughput mode=Raw (vpm_throughput_mode=0)
- Before starting the workload, run "lparstat 1 5". The output shows a 100% idle system with low physical consumption (because of the dedicated donating setting).
%user  %sys  %wait  %idle physc  vcsw  %nsp %utcyc
----- ----- ------ ------ ----- ----- ----- ------
  0.0   0.0    0.0  100.0  0.01   901   101   0.00
  0.0   0.0    0.0  100.0  0.01   854   101   0.00
  0.0   0.0    0.0  100.0  0.01   855   101   0.04
  0.0   0.0    0.0  100.0  0.01   845   101   0.00
  0.0   0.0    0.0  100.0  0.23   856   101   0.00
- After the workload starts, its 16 workload threads use 16 logical CPUs (SMT threads).
- With raw throughput, the 16 workload threads use the first logical CPU (SMT thread) of each virtual processor, hence all processors are used.
- The remaining logical CPUs (SMT threads 2 through 4) on every virtual processor are still free, which shows up as around 62% %idle time.
- The following output from "lparstat 1 5" shows close to 16 CPUs consumed (physc) and, at the same time, around 62% %idle CPU time:
%user  %sys  %wait  %idle physc  vcsw  %nsp %utcyc
----- ----- ------ ------ ----- ----- ----- ------
 37.7   0.0    0.0   62.2 15.59   647   101   0.35
 37.7   0.0    0.0   62.3 15.54   686   101   0.37
 37.5   0.0    0.0   62.4 15.91   740   101   0.37
 37.8   0.0    0.0   62.1 14.97   861   101   0.39
 37.7   0.0    0.0   62.3 15.00   824   101   0.39
- The nmon tool can show utilization per logical CPU (SMT thread), with output similar to the following.
- The output shows the 4 logical CPUs of a single virtual processor.
- The first logical CPU (SMT thread) is at 100% utilization, while the other logical CPUs are free and waiting to handle extra workload:
CPU  User%  Sys%  Wait%  Idle%
  0  100.0   0.0    0.0    0.0
  1    0.0   0.0    0.0  100.0
  2    0.0   0.0    0.0  100.0
  3    0.0   0.0    0.0  100.0
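To compare the five lparstat samples taken under load with the 16 configured virtual CPUs, the rows can be averaged. This is a minimal sketch: the sample rows are copied from the loaded test output above, while parse_rows and mean are hypothetical helpers, and the column list is an assumption that matches this particular report layout only.

```python
COLUMNS = ["%user", "%sys", "%wait", "%idle", "physc", "vcsw", "%nsp", "%utcyc"]

# Data rows copied from the "lparstat 1 5" output taken while the workload ran.
SAMPLES = """\
37.7 0.0 0.0 62.2 15.59 647 101 0.35
37.7 0.0 0.0 62.3 15.54 686 101 0.37
37.5 0.0 0.0 62.4 15.91 740 101 0.37
37.8 0.0 0.0 62.1 14.97 861 101 0.39
37.7 0.0 0.0 62.3 15.00 824 101 0.39
"""


def parse_rows(text, columns):
    """Turn whitespace-separated numeric rows into dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(columns):
            continue
        try:
            values = [float(p) for p in parts]
        except ValueError:
            continue  # skip header or separator lines
        rows.append(dict(zip(columns, values)))
    return rows


def mean(rows, column):
    """Arithmetic mean of one column across all parsed rows."""
    return sum(row[column] for row in rows) / len(rows)


rows = parse_rows(SAMPLES, COLUMNS)
avg_idle = mean(rows, "%idle")
avg_physc = mean(rows, "physc")
vcpus = 16  # VCPU value from the test specification

print(f"average %idle = {avg_idle:.1f}")
print(f"average physc = {avg_physc:.2f} of {vcpus} virtual CPUs")
```

The averages come out at roughly 62.3 %idle and 15.40 physical processors consumed, so nearly all 16 virtual CPUs are in use while most SMT threads stay idle.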
Document Information
Modified date:
12 November 2020
UID
ibm16233418