GUI performance monitoring issues

The sensors collect the performance data and pass it to the collector. The collector application, called pmcollector, runs on every GUI node and provides the performance details that are displayed in the GUI. A sensor application runs on every node of the system.
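
You can check the state of the sensor service directly on any node. The following is a minimal check that assumes systemd-managed nodes and that the sensor service uses its default name, pmsensors:
# Check the sensor service on a node
systemctl status pmsensors
If the sensor service is stopped, restarting it with systemctl restart pmsensors is usually enough to resume data collection on that node.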

If the GUI does not display the performance data, the following might be the reasons:

  1. Collectors are not enabled
  2. Sensors are not enabled
  3. NTP failure

Collectors are not enabled

Do the following to verify whether collectors are working properly:
  1. Issue the systemctl status pmcollector command on the GUI node to confirm that the collector is running.
  2. If the collector service is not started already, start the collector on the GUI nodes by issuing the systemctl restart pmcollector command. Depending on the system requirements, the pmcollector service can be configured to run on nodes other than the GUI nodes. Verify the status of the pmcollector service on all nodes where a collector is configured (see the example after this procedure).
  3. If you cannot start the service, check its log file, which is located at /var/log/zimon/ZIMonCollector.log, for details about issues that are related to the collector service status.
  4. Use a sample CLI query to test whether data collection works properly. For example:
    mmperfmon query cpu_user
Note: After migrating from release 4.2.0.x or later to 4.2.1 or later, you might see a pmcollector service critical error on the GUI nodes. In this case, restart the pmcollector service by issuing the systemctl restart pmcollector command on all GUI nodes.
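
To run the same checks on every node where a collector is configured, you can combine the cluster remote shell utility with the commands above. The following is a sketch only; the node names are placeholders, and it assumes that mmdsh can reach the listed nodes:
# Show the performance monitoring configuration, including the collector candidates
mmperfmon config show
# Check the collector service on all configured collector nodes (replace the node list)
mmdsh -N guinode1,guinode2 systemctl status pmcollector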

Sensors are not enabled or not correctly configured

The following table lists sensors that are used to get the performance data for each resource type:
Table 1. Sensors available for each resource type
Resource type               Sensor name           Candidate nodes
--------------------------  --------------------  --------------------------------------
Network                     Network               All
System Resources            CPU                   All
                            Load
                            Memory
NSD Server                  GPFSNSDDisk           NSD Server nodes
IBM Storage Scale Client    GPFSFilesystem        IBM Storage Scale Client nodes
                            GPFSVFS
                            GPFSFilesystemAPI
NFS                         NFSIO                 Protocol nodes running NFS service
SMB                         SMBStats              Protocol nodes running SMB service
                            SMBGlobalStats
CTDB                        CTDBStats             Protocol nodes running SMB service
Object                      SwiftAccount          Protocol nodes running Object service
                            SwiftContainer
                            SwiftObject
                            SwiftProxy
Transparent Cloud Tiering   MCStoreGPFSStats      Cloud gateway nodes
                            MCStoreIcstoreStats
                            MCStoreLWEStats
Capacity                    DiskFree              All nodes
                            GPFSFilesetQuota      Only a single node
                            GPFSDiskCap           Only a single node

The IBM Storage Scale GUI lists all sensors on the Services > Performance Monitoring > Sensors page. You can use this view to enable sensors and to set appropriate periods and restrictions for them. If the configured values differ from the recommended values, those sensors are highlighted with a warning symbol.
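
Sensors can also be enabled and tuned from the CLI with the mmperfmon config update command. The following is a sketch only; the sensor names come from Table 1, the node name is a placeholder, and the recommended period values depend on your release:
# Collect capacity data once a day, restricted to a single node
mmperfmon config update GPFSDiskCap.period=86400
mmperfmon config update GPFSDiskCap.restrict=guinode1
# A period of 0 disables a sensor
mmperfmon config update GPFSFilesetQuota.period=0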

You can also query the data that is displayed in the performance charts through the CLI. For more information on how to query performance data that is displayed in the GUI, see Querying performance data shown in the GUI through CLI.
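
For example, to spot-check that sensor data is reaching the collector for a particular node, a query similar to the following can be used. Treat it as a sketch; the node name is a placeholder, and the option names can vary slightly between releases:
# Show the last 10 one-minute buckets of user CPU time for one node
mmperfmon query cpu_user -N guinode1 -b 60 -n 10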

NTP failure

Performance monitoring fails if the clocks are not properly synchronized across the cluster. Issue the ntpq -c peers command to verify the NTP state.
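
On clusters that use chrony instead of ntpd, ntpq might not be available; chronyc sources provides the equivalent peer information. To compare the clocks of all nodes at a glance, a rough check such as the following can be used (it assumes that mmdsh can reach all nodes):
# Print the epoch time reported by every node; large differences indicate clock skew
mmdsh -N all 'date +%s'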

GUI Dashboard fails to respond

The system log, mgtsrv-system-log, shows an error similar to the following:
2020-12-26T21:12:21 >ZiMONUtility.runQuery:150< FINE: WARNING: PERF_SLOWQUERY -
the following query took at least 10000ms. to execute (executed at 1609045931):
get -j metrics max(df_free) group_by mountPoint  now bucket_size 86400
2020-12-26T21:12:49 >MMEvent.handle:32< FINE: Received mmantras event:
fullMessage:2020.12.26 21.12.49 cliaudit nodeIP:172.16.0.11 GUI-CLI root SYSTEM [EXIT, CHANGE]
'mmaudit all list -Y' RC=1 pid=48922
eventTime:2020.12.26 21.12.49
eventName:cliaudit
nodeIP:172.16.0.11
Event Handler:com.ibm.fscc.gpfs.events.CliAuditEvent
2020-12-26T21:14:24 >ServerCompDAO.getStorageServerForComponent:494< FINE:
No StorageServer found for Component: GenericComponent [clusterId=10525186418337620125,
compType=SERVER, componentId=10, partNumber=5148-21L, serialNumber=78816BA,
name=5148-21L-78816BA, gpfsNodeId=3, displayId=null]
This can occur when a large number of keys are collected unnecessarily. You can check the total number of keys by issuing the following command:
mmperfmon query --list keys | wc -l
To resolve the issue, delete the obsolete or expired keys by issuing the following command:
mmperfmon delete --expiredKeys
If a large number of keys are queued for deletion, the command might not respond. As an alternative, issue the following command:
mmsysmonc -w 3600 perfkeys delete --expiredkeys
The command waits up to one hour for the processing to complete.

If you use Docker or similar software that creates short-lived network devices or mount points, those entities can be excluded from monitoring by using a filter, as shown in the following examples:
mmperfmon config update Network.filter="netdev_name=veth.*|docker.*|flannel.*|cali.*|cbr.*"
mmperfmon config update DiskFree.filter="mountPoint=/var/lib/docker.*|/foo.*"
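After the cleanup and the filter updates, you can confirm the effect by re-running the key count and listing the active filters. The exact layout of the mmperfmon config show output depends on your release:
# Confirm that the number of keys has decreased
mmperfmon query --list keys | wc -l
# Confirm that the filters are now part of the sensor configuration
mmperfmon config show | grep -i filter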