Introduction to performance monitoring

IBM Storage Scale provides functionality to monitor and maintain the performance of a cluster and its nodes.

IBM Storage Scale comes with a scalable performance monitoring solution that is built out of sensors and proxies, which read the individual data, and a collector database, which stores the data for later use. Sensors, collectors, and proxies, along with performance metrics, form the basic components of the Performance Monitoring tool. The performance metrics are counters that monitor the working of a system on which IBM Storage Scale is deployed. The tool can be configured to have several sensors that collect performance data from the nodes in a cluster, and it scales up to many nodes. IBM Storage Scale provides more than 50 performance sensors that track more than a thousand performance metrics and collect capacity and usage information. For more information, see Using the performance monitoring tool.

The sensors first receive the data for one point in time and then parse it into a format that the collector understands. The sensors then send the data directly to the collector. Users and other applications run queries against the collector to view and further process the time series data. A single collector can easily support up to 150 sensor nodes. To increase scalability and fault tolerance, the Performance Monitoring tool can be configured with multiple collectors; this configuration is referred to as federation. In a multi-collector federated configuration, collectors must be aware of their peers to work properly, so all collectors that are part of the federation must be specified in the peer configuration option in each collector’s configuration file.
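The sizing guideline and the peer requirement above can be illustrated with a small Python sketch. It treats the figure of 150 sensor nodes per collector from the text as a planning guideline, and it assumes a `peers` stanza with `host` and `port` entries and port 9085; the stanza layout and the port are assumptions to verify against the collector configuration file on your system, and the function names are hypothetical.

```python
import math
from typing import List

# Planning guideline taken from the text: one collector supports up to 150 sensor nodes.
NODES_PER_COLLECTOR = 150
# Assumed federation port; verify it against your own collector configuration.
DEFAULT_PEER_PORT = 9085


def min_collectors(sensor_nodes: int, per_collector: int = NODES_PER_COLLECTOR) -> int:
    """Return the smallest collector count that keeps each collector within the guideline."""
    if sensor_nodes < 0 or per_collector <= 0:
        raise ValueError("sensor_nodes must be >= 0 and per_collector must be > 0")
    return max(1, math.ceil(sensor_nodes / per_collector))


def render_peers(hosts: List[str], port: int = DEFAULT_PEER_PORT) -> str:
    """Render an assumed 'peers' stanza that lists every collector in the federation."""
    if not hosts:
        raise ValueError("a federation needs at least one collector host")
    entries = ['{\n    host = "%s"\n    port = "%d"\n}' % (host, port) for host in hosts]
    return "peers = " + ", ".join(entries)


if __name__ == "__main__":
    print(min_collectors(400))  # 400 / 150 rounds up to 3 collectors
    print(render_peers(["collector1.example.com", "collector2.example.com"]))
```

Rounding up keeps every collector at or below the guideline; if fault tolerance is also a goal, you may plan for more collectors than this minimum.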

You can use the mmperfmon command to configure the Performance Monitoring tool and its components, set up metrics, run queries, and compare node metrics. For more information, see mmperfmon command.
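As an illustration of running a query from a script, the following Python sketch builds and runs an `mmperfmon query` command line. It assumes the usual installation path /usr/lpp/mmfs/bin and the `-n` (number of buckets), `-b` (bucket size in seconds), and `-N` (node) options; check these options against the mmperfmon command reference for your release before relying on them. `build_query` and `run_query` are hypothetical helper names.

```python
import shlex
import subprocess
from typing import List, Optional

# Assumed installation path of the mmperfmon command.
MMPERFMON = "/usr/lpp/mmfs/bin/mmperfmon"


def build_query(metric: str, buckets: int = 10, bucket_size: int = 60,
                node: Optional[str] = None) -> List[str]:
    """Build the argument list for an mmperfmon query (option names are assumptions)."""
    if buckets <= 0 or bucket_size <= 0:
        raise ValueError("buckets and bucket_size must be positive")
    argv = [MMPERFMON, "query", metric, "-n", str(buckets), "-b", str(bucket_size)]
    if node is not None:
        argv += ["-N", node]  # restrict the query to one node
    return argv


def run_query(argv: List[str]) -> str:
    """Run the query and return its text output; needs a node with mmperfmon installed."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works on any machine.
    print(shlex.join(build_query("cpu_user", buckets=5, bucket_size=60)))
```

Passing an argument list to subprocess.run instead of a single shell string avoids quoting problems with metric names and node names.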

The Performance Monitoring tool monitors the system on a per-component basis. A list of all supported components and metrics can be found in the List of performance metrics section.

The file system and disk components can each have several items of their kind in the system, which are called entities. The monitoring of such a component is broken down into the monitoring of its individual entities, that is, of the individual file systems or disks.
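The component and entity breakdown can be pictured through the keys that identify each time series. The Python sketch below assumes a pipe-separated key layout of node, component (sensor), zero or more entity fields, and metric name, for example node1|GPFSFilesystem|gpfs0|gpfs_fs_bytes_read; this layout and the sample keys are illustrative assumptions, not a specification of the key format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class MetricKey:
    node: str
    component: str
    entity: Tuple[str, ...]  # empty for components without sub-items, such as CPU
    metric: str


def parse_key(key: str) -> MetricKey:
    """Split an assumed 'node|component|entity...|metric' key into its parts."""
    parts = key.split("|")
    if len(parts) < 3:
        raise ValueError(f"unexpected key layout: {key!r}")
    return MetricKey(parts[0], parts[1], tuple(parts[2:-1]), parts[-1])


def entities_by_component(keys: Iterable[str]) -> Dict[str, Set[Tuple[str, ...]]]:
    """Group the distinct entities (for example, file systems) under each component."""
    groups: Dict[str, Set[Tuple[str, ...]]] = defaultdict(set)
    for key in keys:
        parsed = parse_key(key)
        if parsed.entity:
            groups[parsed.component].add(parsed.entity)
    return dict(groups)


if __name__ == "__main__":
    sample = [  # hypothetical keys for illustration only
        "node1|GPFSFilesystem|gpfs0|gpfs_fs_bytes_read",
        "node1|GPFSFilesystem|gpfs1|gpfs_fs_bytes_read",
        "node1|CPU|cpu_user",
    ]
    print(entities_by_component(sample))
```

In this model the file system component resolves to two entities, while the CPU component has none, which mirrors the per-entity breakdown described above.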

All the information that is collected by the Performance Monitoring tool can be viewed by using one of the following options:

IBM Storage Scale GUI: A fully customizable dashboard, plus predefined performance charts in the detail views.
mmperfmon query: A CLI command to query performance data.
Grafana bridge: Makes the performance data available to Grafana, an open source monitoring dashboard.
REST API to query performance data: APIs that user-defined dashboard software can use to query and chart the performance data.
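For the REST API option, a client sends an HTTPS request to the GUI node and receives the data as JSON. The following Python sketch only builds such a request; it assumes an endpoint path of /scalemgmt/v2/perfmon/data, a `query` parameter that holds a query string such as "metrics avg(cpu_user) last 30 bucket_size 1", and HTTP basic authentication. Treat the path, the query syntax, and the host name as assumptions to check against the IBM Storage Scale REST API reference.

```python
import base64
import urllib.parse
import urllib.request

# Assumed REST endpoint for performance data on the GUI node.
PERFMON_PATH = "/scalemgmt/v2/perfmon/data"


def perfmon_url(gui_host: str, query: str, port: int = 443) -> str:
    """Build the request URL; spaces in the query are encoded as %20."""
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    return f"https://{gui_host}:{port}{PERFMON_PATH}?{params}"


def perfmon_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Attach JSON and basic-authentication headers to a GET request."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Accept": "application/json", "Authorization": f"Basic {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    url = perfmon_url("gui.example.com", "metrics avg(cpu_user) last 30 bucket_size 1")
    print(url)
    # To send it: urllib.request.urlopen(perfmon_request(url, "user", "password"))
```

Building the URL with urlencode rather than by string concatenation keeps spaces and parentheses in the query correctly escaped.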

All the performance metrics data that is collected by the Performance Monitoring tool can be monitored by using threshold monitoring. Threshold monitoring is a service that helps users identify performance issues by defining threshold rules for selected performance metrics. IBM Storage Scale users can set user-defined thresholds on any performance metric. For more information, see Threshold monitoring for system health.
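Conceptually, a threshold rule compares recent values of a metric against a warning level and an error level. The following Python sketch is a toy model of that idea only, not the IBM Storage Scale threshold implementation; the `ThresholdRule` fields, the averaging window, and the state names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ThresholdRule:
    metric: str
    warn_level: float
    error_level: float
    window: int = 3  # hypothetical: number of most recent samples to average


def evaluate(rule: ThresholdRule, samples: Sequence[float]) -> str:
    """Return 'no_data', 'healthy', 'warning', or 'error' for the recent average."""
    if rule.window <= 0:
        raise ValueError("window must be positive")
    recent = list(samples)[-rule.window:]
    if not recent:
        return "no_data"
    average = sum(recent) / len(recent)
    if average >= rule.error_level:
        return "error"
    if average >= rule.warn_level:
        return "warning"
    return "healthy"


if __name__ == "__main__":
    rule = ThresholdRule("cpu_user", warn_level=80.0, error_level=90.0)
    print(evaluate(rule, [40.0, 85.0, 88.0, 91.0]))  # average of 85, 88, 91 is 88.0 -> warning
```

Averaging over a window rather than reacting to a single sample keeps one short spike from changing the state; the actual service may evaluate rules differently.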

While troubleshooting, you can also find more detailed information about performance monitoring in the Performance Monitoring tool logs, which are located in the /var/log/zimon directory on each node that is configured for performance monitoring. For more information, see Performance monitoring tool logs.
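When troubleshooting, it can help to look at the end of each log file first. The Python sketch below collects the last lines of every *.log file in a directory; the default directory /var/log/zimon comes from the text, while the *.log file pattern and the helper name `tail_logs` are assumptions (for example, rotated log files with other suffixes would not match).

```python
from collections import deque
from pathlib import Path
from typing import Dict, List, Union

# Log directory named in the text; the file pattern below is an assumption.
LOG_DIR = Path("/var/log/zimon")


def tail_logs(log_dir: Union[str, Path] = LOG_DIR, lines: int = 20,
              pattern: str = "*.log") -> Dict[str, List[str]]:
    """Return the last `lines` lines of each matching file, keyed by file name."""
    if lines <= 0:
        raise ValueError("lines must be positive")
    tails: Dict[str, List[str]] = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        # deque with maxlen keeps only the final lines without loading the whole file.
        with path.open(errors="replace") as handle:
            tails[path.name] = [line.rstrip("\n") for line in deque(handle, maxlen=lines)]
    return tails


if __name__ == "__main__":
    for name, tail in tail_logs(lines=5).items():
        print(f"== {name}")
        print("\n".join(tail))
```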