Using the performance monitoring tool

The performance monitoring tool collects metrics from GPFS and the protocols and provides system performance information. The tool is enabled by default and consists of three components: Collectors, Sensors, and Proxies.

Collector

In older versions of IBM Storage Scale, the performance monitoring tool could be configured with only a single collector, which supported up to 150 sensor nodes. The performance monitoring tool can now be configured with multiple collectors to increase scalability and fault tolerance. This configuration is referred to as multi-collector federation.

In a multi-collector federated configuration, the collectors need to be aware of each other. Otherwise, a collector returns only the data that is stored in its own measurement database. When the collectors are aware of their peer collectors, they can collaborate with each other to collate measurement data for a specific measurement query. All collectors that are part of the federation are specified in the peers configuration option in the collector’s configuration file as shown in the following example:
peers = { host = "collector1.mydomain.com" port = "9085" },
        { host = "collector2.mydomain.com" port = "9085" }

The port number is the one specified by the federationport configuration option, typically set to 9085. You can also list the current host so that the same configuration file can be used for all the collectors.
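
Because the same file can be deployed on every collector, a minimal illustrative excerpt of the collector configuration file, /opt/IBM/zimon/ZIMonCollector.cfg, might look like the following. The host names are placeholders and the exact layout can differ between releases:

federationport = "9085"

peers = { host = "collector1.mydomain.com" port = "9085" },
        { host = "collector2.mydomain.com" port = "9085" }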

Note: A Linux® operating system user, scalepm, is added to the host. The pmcollector process runs in the context of this user. However, the scalepm ID does not have the privilege to log in to the system.

When the peers are specified, a query for measurement data can be directed to any of the collectors that are listed in the peers section. The collector that receives the query gathers the relevant data from all collectors and assembles the response. Hence, clients need to contact only a single collector, instead of all of them, to get all the measurements that are available in the system.
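
For example, a query like the following, issued against any one of the federated collectors, returns the requested metric for the whole federation. The metric name and the option are illustrative; see the mmperfmon command reference for the exact query syntax:

mmperfmon query cpu_user -n 10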

To distribute the measurement data that is reported by sensors over multiple collectors, multiple collectors can be specified when the sensors are configured.

If multiple collectors are specified, each sensor picks one of them to report its measurement data to. The sensors use stable hashes to pick the collector so that the sensor-collector mapping changes as little as possible when collectors are added or removed.

Additionally, sensors and collectors can be configured for high availability. In this setting, sensors report their measurement data to more than one collector so that the failure of a single collector does not lead to any data loss. For instance, if the collector redundancy is increased to two, every sensor reports to two collectors. As a side effect of increasing the redundancy to two, the bandwidth that is used for reporting measurement data is doubled. The collector redundancy must be configured before the sensor configuration is stored in IBM Storage Scale, by changing the colRedundancy option in /opt/IBM/zimon/ZIMonSensors.cfg.
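
A minimal sketch of the relevant part of /opt/IBM/zimon/ZIMonSensors.cfg, assuming two collectors and a redundancy of two, might look like the following. The host names are placeholders, the port shown is the typical default data port, and the exact syntax can vary between releases:

colRedundancy = 2

collectors = {
    host = "collector1.mydomain.com"
    port = "4739"
},
{
    host = "collector2.mydomain.com"
    port = "4739"
}

With a configuration like this sketch, every sensor would send its measurement data to both of the listed collectors.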

Sensor

A sensor is a component that collects performance data from a node. Typically, multiple sensors run on every node from which metrics need to be collected. By default, the sensors are started on every node.

Sensors identify the collector from the information present in the sensor configuration. The sensor configuration is managed by IBM Storage Scale and can be retrieved and changed by using the mmperfmon command. A copy is stored in /opt/IBM/zimon/ZIMonSensors.cfg. However, this copy must not be edited by users.
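
For example, you can display the active sensor configuration and change a sensor attribute with commands similar to the following; the sensor name and the period value are illustrative:

mmperfmon config show
mmperfmon config update GPFSDiskCap.period=86400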

Proxy

A proxy is run for each of the protocols to collect the metrics for that protocol.

By default, the NFS and SMB proxies are started automatically with those protocols. They do not need to be started or stopped manually. However, to retrieve metrics for SMB, NFS, or Object, these protocols must be active on the specific node.
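
For example, you can check which protocol services are active on the protocol nodes with a command similar to the following; the -a option lists the services on all protocol nodes:

mmces service list -a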

For more information, see the Enabling protocol metrics topic.

For information on enabling Transparent cloud tiering metrics, see Integrating Cloud services metrics with the performance monitoring tool.

Important: When the performance monitoring tool is used, ensure that the clocks of all of the nodes in the cluster are synchronized. The Network Time Protocol (NTP) must be configured on all nodes.
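For example, on nodes that use chrony for time synchronization, you can verify the NTP status with a command such as the following; on systems that use ntpd, ntpstat is a common alternative:

chronyc tracking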
Note: Performance monitoring that is driven by the IBM Storage Scale internal monitoring tool and performance monitoring that users perform by using the mmpmon command might affect each other.