Using the performance monitoring tool

The performance monitoring tool collects metrics from GPFS and protocols and provides performance information.

The performance monitoring system is started by default and consists of three parts: Collectors, Sensors, and Proxies.

Collector

In previous releases of IBM Spectrum Scale, the performance monitoring tool could be configured with a single collector only; a single collector can easily support up to 150 sensor nodes. From version 4.2, the performance monitoring tool can be configured with multiple collectors to increase scalability and fault tolerance. This configuration is referred to as federation.

In a multi-collector federated configuration, the collectors need to be aware of each other, otherwise a collector would only return the data stored in its own measurement database. Once the collectors are aware of their peer collectors, they can collaborate with each other to collate measurement data for a given measurement query. All collectors that are part of the federation are specified in the peers configuration option in the collector’s configuration file as shown in the following example:
peers = { host = "collector1.mydomain.com" port = "9085" },
        { host = "collector2.mydomain.com" port = "9085" }

The port number is the one specified by the federationport configuration option, typically set to 9085. You can also list the current host so that the same configuration file can be used for all the collector machines.

Once the peers have been specified, a query for measurement data can be directed to any of the collectors listed in the peers section, and that collector assembles a response from the relevant data held by all collectors. Hence, clients need to contact only a single collector, instead of all of them, to get all the measurements available in the system.

To distribute the measurement data reported by sensors over multiple collectors, specify multiple collectors when configuring the sensors.

If multiple collectors are specified, each sensor picks one to report its measurement data to. The sensors use stable hashes to pick the collector, so that the sensor-collector mapping changes as little as possible when collectors are added or removed.
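The stable-hashing idea can be illustrated with a short sketch. This is not the tool's actual implementation; it is a minimal rendezvous-style hash in Python, with hypothetical node and collector names, that shows why adding a collector remaps only a small fraction of sensors:

```python
import hashlib

def pick_collectors(sensor_node, collectors, redundancy=1):
    """Rank collectors by a stable hash of (sensor, collector) and
    return the top `redundancy` entries.  Because each collector's
    score for a given sensor never changes, adding or removing one
    collector only remaps the sensors that hashed to it."""
    def score(collector):
        key = f"{sensor_node}:{collector}".encode()
        return hashlib.sha256(key).hexdigest()
    return sorted(collectors, key=score)[:redundancy]

# A sensor keeps its collector unless the newly added collector
# happens to score better for that sensor.
collectors = ["collector1", "collector2", "collector3"]
before = pick_collectors("node7", collectors)
after = pick_collectors("node7", collectors + ["collector4"])
```

With this scheme, `after` is either the same collector as `before` or the newly added one; no sensor is shuffled between two pre-existing collectors.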

Additionally, sensors and collectors can be configured for high availability. In this setting, sensors report their measurement data to more than one collector, so that the failure of a single collector does not lead to any data loss. For instance, if the collector redundancy is increased to two, every sensor reports to two collectors. As a side effect, the bandwidth consumed for reporting measurement data doubles. The collector redundancy has to be configured before the sensor configuration is stored in IBM Spectrum Scale, by changing the colRedundancy option in /opt/IBM/zimon/ZIMonSensors.cfg.
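For example, a redundancy of two could be set in the configuration template before the sensor configuration is stored in IBM Spectrum Scale. The following is an illustrative excerpt only; the other options that appear in a real ZIMonSensors.cfg file are omitted:

```
# Excerpt from /opt/IBM/zimon/ZIMonSensors.cfg (illustrative)
# With colRedundancy = 2, every sensor reports its measurement
# data to two collectors.
colRedundancy = 2
```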

Sensor

A sensor is a component that collects performance data from a node. Typically, multiple sensors run on each node from which metrics are to be collected. By default, the sensors are started on every node.

Sensors identify the collector from the information present in the sensor configuration. The sensor configuration is managed by IBM Spectrum Scale, and can be retrieved and changed using the mmperfmon command. A copy is stored in /opt/IBM/zimon/ZIMonSensors.cfg. However, this copy must not be edited by users.
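As a sketch of retrieving the managed configuration with the mmperfmon command rather than reading the local copy (consult the mmperfmon man page for the exact subcommands available in your release):

```shell
# Display the sensor configuration managed by IBM Spectrum Scale
mmperfmon config show
```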

Proxy

A proxy is run for each of the protocols to collect the metrics for that protocol.

By default, the NFS and SMB proxies are started automatically with those protocols and do not need to be started or stopped manually. However, to retrieve metrics for NFS, SMB, or Object, the corresponding protocol has to be active on the specific node.

For information on enabling Object metrics, see the Enabling protocol metrics topic.

For information on enabling Transparent cloud tiering metrics, see Integrating Cloud services metrics with the performance monitoring tool.

Important: When using the performance monitoring tool, ensure that the clocks of all of the nodes in the cluster are synchronized. It is recommended that Network Time Protocol (NTP) be configured on all nodes.