PowerVP’s GUI has offered the “colorized heat” view of utilization since its first release. You can add or change the utilization threshold percentages and the color the GUI shows when utilization falls within each threshold. This view is still available in PowerVP 1.1.3 and provides a nice visual for someone who is actively monitoring the POWER system with the PowerVP GUI.
In release 1.1.3, the PowerVP agents have added the capability to trigger a message when a configured utilization threshold is reached. In the PowerVP documentation, we call it an alert, but in reality it is simply a message that is sent to syslog on AIX and Linux and to the QSYSOPR message queue on IBM i. From there, you can use your own “alerting” software, which you probably already have monitoring other messages in syslog or QSYSOPR, to generate an email, a page, or a text message to your system operators so they can handle the situation.
The PowerVP agent allows you to configure alerts for the frame’s CPU utilization, partition CPU utilization, and for POWER bus utilization. This configuration is done on your system level agent for PowerVP. Before using this feature, we recommend that you spend some time analyzing your system with the PowerVP GUI to determine what your system’s utilization characteristics normally are, which will then provide you with the information you need to set up the thresholds for the alerts.
The configuration consists of a threshold (utilization percent) that will trigger the alert. You also configure an amount of time (duration) during which the utilization has to exceed that threshold before an alert is generated, so minor utilization spikes will not generate unnecessary alerts. In addition, you configure an amount of time to wait before sending another alert (realert) while the threshold is still being exceeded, which keeps your operators from receiving additional alerts while they are still responding to the first one. Optionally, you can also configure a syslog level (for AIX/VIOS and Linux).
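To make the interaction between threshold, duration, and realert concrete, here is a minimal Python sketch of that timing logic. It is not PowerVP's implementation: the `AlertRule` class and `alert_times` function are hypothetical names, and the choice to reset the realert timer once utilization drops back under the threshold is our assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class AlertRule:
    percent: float   # utilization threshold, in percent
    duration: float  # seconds the threshold must be exceeded before the first alert
    realert: float   # seconds to wait before repeating an alert


def alert_times(rule: AlertRule, samples: Iterable[Tuple[float, float]]) -> List[float]:
    """Return the timestamps at which an alert would be sent.

    samples holds (timestamp_seconds, utilization_percent) pairs in time order.
    """
    alerts: List[float] = []
    exceeded_since: Optional[float] = None  # start of the current excursion
    last_alert: Optional[float] = None      # time of the most recent alert

    for now, utilization in samples:
        if utilization <= rule.percent:
            # Back under the threshold: a short spike ends without an alert.
            # Assumption: this also resets the realert timer.
            exceeded_since = None
            last_alert = None
            continue
        if exceeded_since is None:
            exceeded_since = now
        if last_alert is None:
            # First alert only after the excursion has lasted `duration` seconds.
            if now - exceeded_since >= rule.duration:
                alerts.append(now)
                last_alert = now
        elif now - last_alert >= rule.realert:
            # Still over the threshold: repeat only after `realert` seconds.
            alerts.append(now)
            last_alert = now
    return alerts
```

With a rule of 90 percent, a 60 second duration, and a 300 second realert, a 30 second spike produces nothing, while a sustained excursion produces one alert after 60 seconds and a repeat 300 seconds later.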
The PowerVP agent will simply send a message to syslog on AIX and Linux and to the QSYSOPR message queue on IBM i when one of the configured utilization events has occurred. You will need to use another application that monitors QSYSOPR or syslog to generate your own text messages, emails, pages, etc., to inform the system operators about the situation so they can take the necessary action. For example, if you have a partition that occasionally gets into a loop and consumes a large amount of CPU while in the loop, your operators would need to investigate the partition and perform the necessary action to alleviate the situation.
Two CPU utilizations can be monitored: the system CPU utilization and the individual partition CPU utilization. You can also individually monitor the different POWER bus utilizations: the internode (A) bus, the intranode (X) bus, the memory controller bus, and the IO bus.
The configuration is done in the PowerVP agent configuration file, located in /etc/opt/ibm/powervp/powervp.conf on AIX/VIOS and Linux and in /QIBM/UserData/powervp/powervp.conf on IBM i. The directives are:
- UtilizationAlertPartitionCPU percent duration realert level
- UtilizationAlertSystemCPU percent duration realert level
- UtilizationAlertAbus percent duration realert level
- UtilizationAlertXbus percent duration realert level
- UtilizationAlertMCbus percent duration realert level
- UtilizationAlertInputIObus percent duration realert level
- UtilizationAlertOutputIObus percent duration realert level
The percent is the utilization percent threshold at which the agent will start counting time to see if the utilization continues. You can use any value between 1 and 100.
The duration is in seconds and is the amount of time the utilization needs to continue to exceed the percent before a message is generated. This allows you to specify a time which signifies a real problem, not a temporary spike in utilization that is normal.
The realert is in seconds and is the amount of time after a message has been sent before another message will be sent (provided the utilization continues to exceed the threshold). This provides your operators time to respond to the problem without “flooding” them with repeat messages.
The level specifies the severity of the error to be reported. The level is only used on AIX/VIOS and Linux; it will be ignored if provided on IBM i. If you omit it, it defaults to Notice. Valid values for level are the syslog severity levels Emergency, Alert, Critical, Error, Warning, Notice, and Informational. The syslog facility will be daemon.
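If your monitoring software filters syslog by facility and severity, it can help to know how these level names line up with standard syslog values. The following Python sketch uses the numeric severity codes and selector keywords from the syslog standard (daemon is facility 3, and priority is facility times 8 plus severity). The function names are hypothetical, and we assume PowerVP's level names correspond one-to-one to the standard severities.

```python
# Standard syslog severities: level name -> (numeric code, selector keyword).
SEVERITY = {
    "Emergency": (0, "emerg"),
    "Alert": (1, "alert"),
    "Critical": (2, "crit"),
    "Error": (3, "err"),
    "Warning": (4, "warning"),
    "Notice": (5, "notice"),
    "Informational": (6, "info"),
}

DAEMON_FACILITY = 3  # numeric code of the syslog "daemon" facility


def syslog_priority(level: str = "Notice") -> int:
    """Return the syslog PRI value: facility * 8 + severity."""
    code, _keyword = SEVERITY[level]
    return DAEMON_FACILITY * 8 + code


def selector(level: str = "Notice") -> str:
    """Return a facility.severity selector as used in syslog configuration files."""
    return f"daemon.{SEVERITY[level][1]}"
```

For the default level of Notice, this gives a priority value of 29 and the selector `daemon.notice`.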
You can provide configuration for the situations you want to monitor. For situations you don’t want to monitor, do not provide any configuration file directive.
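As an illustration of the directive format, the Python sketch below parses and checks alert directives the way the parameters are described above. The threshold values in `EXAMPLE` are made up for illustration, not recommendations. We assume the fields are separated by whitespace and that level names are spelled exactly as listed. `parse_alert_directives` and `AlertConfig` are hypothetical names, not part of PowerVP.

```python
from typing import Dict, NamedTuple

VALID_LEVELS = ("Emergency", "Alert", "Critical", "Error",
                "Warning", "Notice", "Informational")

DIRECTIVES = (
    "UtilizationAlertPartitionCPU", "UtilizationAlertSystemCPU",
    "UtilizationAlertAbus", "UtilizationAlertXbus", "UtilizationAlertMCbus",
    "UtilizationAlertInputIObus", "UtilizationAlertOutputIObus",
)


class AlertConfig(NamedTuple):
    percent: int   # utilization threshold, 1 to 100
    duration: int  # seconds over the threshold before the first message
    realert: int   # seconds before a repeat message
    level: str     # syslog severity (ignored on IBM i)


def parse_alert_directives(text: str) -> Dict[str, AlertConfig]:
    """Parse 'Directive percent duration realert [level]' lines."""
    configs: Dict[str, AlertConfig] = {}
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0] not in DIRECTIVES:
            continue  # blank line, or a directive this sketch does not handle
        if len(fields) not in (4, 5):
            raise ValueError(f"wrong number of values: {raw!r}")
        percent, duration, realert = (int(value) for value in fields[1:4])
        if not 1 <= percent <= 100:
            raise ValueError(f"percent must be between 1 and 100: {raw!r}")
        level = fields[4] if len(fields) == 5 else "Notice"  # documented default
        if level not in VALID_LEVELS:
            raise ValueError(f"unknown level: {raw!r}")
        configs[fields[0]] = AlertConfig(percent, duration, realert, level)
    return configs


# Illustrative values only: alert on the frame at 90 percent for 5 minutes,
# and on any partition at 95 percent for 10 minutes (level defaults to Notice).
EXAMPLE = """\
UtilizationAlertSystemCPU 90 300 1800 Warning
UtilizationAlertPartitionCPU 95 600 3600
"""
```

Calling `parse_alert_directives(EXAMPLE)` returns entries for the two configured directives only. The bus directives are absent from the example, so in this sketch, as in the agent, they are simply not monitored.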
The UtilizationAlertSystemCPU is used for the CPU utilization of the entire Power system frame. PowerVP sums up the CPU utilization on all of the active cores to determine the frame’s CPU utilization.
The UtilizationAlertPartitionCPU is used for the CPU utilization of the partitions based on their entitled processor capacity. This threshold is probably the most difficult to determine. You may have set up an uncapped partition that runs a single application once a day and during that time, it exceeds its entitled processor capacity, running over 100 percent, while the rest of the day it sits fairly idle. In this case, it would be normal for this partition to exceed 100 percent while the application is running and you wouldn’t want to alert anyone about this normal situation.
The UtilizationAlertAbus and UtilizationAlertXbus are used to monitor the partition affinity of the system. The A busses (AB busses on POWER7) are used for traffic between the nodes of the POWER system. The X busses (WXYZ busses on POWER7) are used for traffic between the chips within a node of the POWER system.
The UtilizationAlertMCbus is used to monitor the memory allocation and memory affinity of the system.
The UtilizationAlertInputIObus and UtilizationAlertOutputIObus are used to separately monitor the incoming and outgoing IO busses for high utilization.
High or imbalanced bus utilization can indicate an affinity issue that you may be able to correct. You can use the Dynamic Platform Optimizer (DPO) tool, which assesses the system affinity and makes changes to improve it.
Here are the messages PowerVP will send to the QSYSOPR message queue on IBM i. The message IDs are:
- SLE0121 for UtilizationAlertSystemCPU
- SLE0122 for UtilizationAlertPartitionCPU
- SLE0123 for UtilizationAlertAbus
- SLE0124 for UtilizationAlertXbus
- SLE0125 for UtilizationAlertMCbus
- SLE0126 for both UtilizationAlertInputIObus and UtilizationAlertOutputIObus
On AIX and Linux (and also on IBM i), the message text starts with a message identifier:
- MSG0107 for UtilizationAlertSystemCPU
- MSG0106 for UtilizationAlertPartitionCPU
- MSG0108 for UtilizationAlertAbus
- MSG0109 for UtilizationAlertXbus
- MSG0110 for UtilizationAlertMCbus
- MSG0111 for both UtilizationAlertInputIObus and UtilizationAlertOutputIObus
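Because the message text starts with one of these identifiers, your own monitoring software can pick PowerVP alerts out of syslog by matching on them. Here is a minimal Python sketch of that idea. The `find_alerts` function is a hypothetical name, and the sample log line in the usage note is invented, since the exact layout of a syslog line depends on your syslog configuration.

```python
import re
from typing import Iterable, Iterator, Tuple

# Message identifier -> directive that produced the message (from the list above).
ALERT_IDS = {
    "MSG0106": "UtilizationAlertPartitionCPU",
    "MSG0107": "UtilizationAlertSystemCPU",
    "MSG0108": "UtilizationAlertAbus",
    "MSG0109": "UtilizationAlertXbus",
    "MSG0110": "UtilizationAlertMCbus",
    "MSG0111": "UtilizationAlertInputIObus or UtilizationAlertOutputIObus",
}

# Match any of the identifiers as a whole word anywhere in the line.
_ID_PATTERN = re.compile(r"\b(" + "|".join(ALERT_IDS) + r")\b")


def find_alerts(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (directive, line) for every line that carries a PowerVP alert ID."""
    for line in lines:
        match = _ID_PATTERN.search(line)
        if match:
            yield ALERT_IDS[match.group(1)], line.rstrip("\n")
```

For example, feeding it a made-up line such as `Dec 19 10:15:02 host1 powervp: MSG0107 (text of the alert)` yields the directive `UtilizationAlertSystemCPU` together with the line, which your software could then forward as an email or a text message.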
In addition to a message in QSYSOPR or to syslog, PowerVP will also log the message in the joblog for the PowerVP agent on IBM i and in the log file, /var/log/powervp.log, on AIX/VIOS and Linux.
19 December 2019