GUI performance monitoring issues

The sensors collect the performance data and pass it to the collector. The collector application, called pmcollector, runs on every GUI node and provides the performance details that are displayed in the GUI. A sensor application runs on every node of the system.
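
You can check the state of the sensor service directly on any node. The following is a minimal check that assumes systemd-managed nodes and that the sensor service uses its default name, pmsensors:
# Check the sensor service on a node
systemctl status pmsensors
If the sensor service is stopped, restarting it with systemctl restart pmsensors is usually enough to resume data collection on that node.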

If the GUI does not display the performance data, the following might be the reasons:

  1. Collectors are not enabled
  2. Sensors are not enabled
  3. NTP failure

Collectors are not enabled

Do the following to verify whether collectors are working properly:
  1. Issue the systemctl status pmcollector command on the GUI node to confirm that the collector is running.
  2. If the collector service is not started already, start the collector on the GUI nodes by issuing the systemctl restart pmcollector command. Depending on the system requirements, the pmcollector service can be configured to run on nodes other than the GUI nodes. Verify the status of the pmcollector service on all nodes where a collector is configured (see the example after this procedure).
  3. If you cannot start the service, check its log file, which is located at /var/log/zimon/ZIMonCollector.log, for details about issues that are related to the collector service status.
  4. Use a sample CLI query to test whether data collection works properly. For example:
    mmperfmon query cpu_user
Note: After migrating from release 4.2.0.x or later to 4.2.1 or later, you might see a pmcollector service critical error on the GUI nodes. In this case, restart the pmcollector service by issuing the systemctl restart pmcollector command on all GUI nodes.
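
To run the same checks on every node where a collector is configured, you can combine the cluster remote shell utility with the commands above. The following is a sketch only; the node names are placeholders, and it assumes that mmdsh can reach the listed nodes:
# Show the performance monitoring configuration, including the collector candidates
mmperfmon config show
# Check the collector service on all configured collector nodes (replace the node list)
mmdsh -N guinode1,guinode2 systemctl status pmcollector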

Sensors are not enabled or not correctly configured

The following table lists sensors that are used to get the performance data for each resource type:
Table 1. Sensors available for each resource type
Resource type               Sensor name           Candidate nodes
--------------------------  --------------------  --------------------------------------
Network                     Network               All
System Resources            CPU                   All
                            Load
                            Memory
NSD Server                  GPFSNSDDisk           NSD Server nodes
IBM Storage Scale Client    GPFSFilesystem        IBM Storage Scale Client nodes
                            GPFSVFS
                            GPFSFilesystemAPI
NFS                         NFSIO                 Protocol nodes running NFS service
SMB                         SMBStats              Protocol nodes running SMB service
                            SMBGlobalStats
CTDB                        CTDBStats             Protocol nodes running SMB service
Object                      SwiftAccount          Protocol nodes running Object service
                            SwiftContainer
                            SwiftObject
                            SwiftProxy
Transparent Cloud Tiering   MCStoreGPFSStats      Cloud gateway nodes
                            MCStoreIcstoreStats
                            MCStoreLWEStats
Capacity                    DiskFree              All nodes
                            GPFSFilesetQuota      Only a single node
                            GPFSDiskCap           Only a single node

The IBM Storage Scale GUI lists all sensors on the Services > Performance Monitoring > Sensors page. You can use this view to enable sensors and to set appropriate periods and restrictions for them. If the configured values differ from the recommended values, those sensors are highlighted with a warning symbol.
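
Sensors can also be enabled and tuned from the CLI with the mmperfmon config update command. The following is a sketch only; the sensor names come from Table 1, the node name is a placeholder, and the recommended period values depend on your release:
# Collect capacity data once a day, restricted to a single node
mmperfmon config update GPFSDiskCap.period=86400
mmperfmon config update GPFSDiskCap.restrict=guinode1
# A period of 0 disables a sensor
mmperfmon config update GPFSFilesetQuota.period=0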

You can also query the data that is displayed in the performance charts through the CLI. For more information on how to query performance data that is displayed in the GUI, see Querying performance data shown in the GUI through CLI.
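
For example, to spot-check that sensor data is reaching the collector for a particular node, a query similar to the following can be used. Treat it as a sketch; the node name is a placeholder, and the option names can vary slightly between releases:
# Show the last 10 one-minute buckets of user CPU time for one node
mmperfmon query cpu_user -N guinode1 -b 60 -n 10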

NTP failure

Performance monitoring fails if the clocks are not properly synchronized across the cluster. Issue the ntpq -c peers command to verify the NTP state.
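
On clusters that use chrony instead of ntpd, ntpq might not be available; chronyc sources provides the equivalent peer information. To compare the clocks of all nodes at a glance, a rough check such as the following can be used (it assumes that mmdsh can reach all nodes):
# Print the epoch time reported by every node; large differences indicate clock skew
mmdsh -N all 'date +%s'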

GUI Dashboard fails to respond

The system log, mgtsrv-system-log, shows an error similar to the following:
2020-12-26T21:12:21 >ZiMONUtility.runQuery:150< FINE: WARNING: PERF_SLOWQUERY -
the following query took at least 10000ms. to execute (executed at 1609045931):
get -j metrics max(df_free) group_by mountPoint  now bucket_size 86400
2020-12-26T21:12:49 >MMEvent.handle:32< FINE: Received mmantras event:
fullMessage:2020.12.26 21.12.49 cliaudit nodeIP:172.16.0.11 GUI-CLI root SYSTEM [EXIT, CHANGE]
'mmaudit all list -Y' RC=1 pid=48922
eventTime:2020.12.26 21.12.49
eventName:cliaudit
nodeIP:172.16.0.11
Event Handler:com.ibm.fscc.gpfs.events.CliAuditEvent
2020-12-26T21:14:24 >ServerCompDAO.getStorageServerForComponent:494< FINE:
No StorageServer found for Component: GenericComponent [clusterId=10525186418337620125,
compType=SERVER, componentId=10, partNumber=5148-21L, serialNumber=78816BA,
name=5148-21L-78816BA, gpfsNodeId=3, displayId=null]
This can occur when a large number of keys are collected unnecessarily. You can check the total number of keys by issuing the following command:
mmperfmon query --list keys | wc -l
To resolve the issue, delete the obsolete or expired keys by issuing the following command:
mmperfmon delete --expiredKeys
If a large number of keys are queued for deletion, the command might not respond. As an alternative, issue the following command:
mmsysmonc -w 3600 perfkeys delete --expiredkeys
The command waits up to one hour for the processing to complete.

If you use Docker or similar software that creates short-lived network devices or mount points, those entities can be excluded from monitoring by using a filter, as shown in the following examples:
mmperfmon config update Network.filter="netdev_name=veth.*|docker.*|flannel.*|cali.*|cbr.*"
mmperfmon config update DiskFree.filter="mountPoint=/var/lib/docker.*|/foo.*"
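After the cleanup and the filter updates, you can confirm the effect by re-running the key count and listing the active filters. The exact layout of the mmperfmon config show output depends on your release:
# Confirm that the number of keys has decreased
mmperfmon query --list keys | wc -l
# Confirm that the filters are now part of the sensor configuration
mmperfmon config show | grep -i filter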