Configuring the performance monitoring tool
The performance monitoring tool, consisting of the collector, sensors, and proxies, is part of the IBM Spectrum Scale™ distribution. The tool is installed with the GPFS™ core packages on all nodes. The tool packages are small: approximately 300 KB for the sensors and 500 KB for the collector.
For information on the ports used by the performance monitoring tool, see Firewall recommendations for the Performance Monitoring tool.
Configuring sensors
The install toolkit sets up a default set of sensors to monitor on each node. You can modify the sensors on each individual node.
The sensor configuration file, ZIMonSensors.cfg, is located on each node in the /opt/IBM/zimon folder. The file lists all groups of sensors. It includes the parameter settings of the sensors, such as the reporting frequency, and controls which sensors are active within the cluster. For example:
sensors =
{
name = "CPU"
period = 1
},
{ name = "Load"
period = 1
},
{
name = "Memory"
period = 1
},
{
name = "Network"
period = 1
filter = "eth*"
# filters are currently ignored.
},
{
name = "Netstat"
period = 1
},
Starting with version 4.2 of the performance monitoring tool, sensors can be configured on nodes that are part of an IBM Spectrum Scale cluster through an IBM Spectrum Scale based configuration mechanism. However, this requires IBM Spectrum Scale 4.2 to be installed on all nodes where a sensor is running and where the sensors are to be configured. It also requires the entire cluster to run at least IBM Spectrum Scale 4.1.1 and the execution of the mmchconfig release=LATEST command.
prompt# mmperfmon config generate \
--collectors collector1.domain.com,collector2.domain.com,…
prompt# mmchnode --perfmon -N nodeclass1,nodeclass2,…
prompt# mmperfmon config update param1=value1 param2=value2 …
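For example, to change the reporting period of a single sensor under this configuration mechanism, mmperfmon config update accepts settings in the SensorName.attribute=value form; the sensor name Network and the value 10 below are illustrative:
prompt# mmperfmon config update Network.period=10
Consult the mmperfmon command reference for the full set of supported parameters.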
- The CTDBDBStats.cfg file is referenced in:
{ name = "CTDBDBStats" period = 1 type = "Generic" },
- The CTDBStats.cfg file is referenced in:
{ name = "CTDBStats" period = 1 type = "Generic" },
- The NFSIO.cfg file is referenced in:
{ name = "NFSIO" period = 1 type = "Generic" }, # NFS Ganesha statistics
- The SMBGlobalStats.cfg file is referenced in:
{ name = "SMBGlobalStats" period = 1 type = "Generic" },
- The SMBStats.cfg file is referenced in:
{ name = "SMBStats" period = 1 type = "Generic" },
The period specifies the interval, in seconds, at which a sensor group gathers data. A period of 0 disables the sensor group, and a period of 1 runs it every second. Specify a higher value to decrease the frequency at which data is collected.
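For example, to reduce the Network sensor to one sample every ten seconds and disable the Netstat sensor entirely, the corresponding entries in the sensor configuration could be changed as follows (values chosen for illustration):
{
name = "Network"
period = 10
},
{
name = "Netstat"
period = 0 # disabled
},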
The file also contains the host name of the node where the collector is running, to which the sensors report their data.
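That section typically looks like the following; the host name is illustrative, and 4739 is the default port on which the collector listens for sensor data:
collectors = {
host = "collector1.domain.com"
port = "4739"
}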
To enable VFS sensors, use the mmfsadm vfsstats enable command on the node.
The sensors can be stopped (deactivated) using the systemctl stop pmsensors command.
The sensors can be started (activated) using the systemctl start pmsensors command.
Configuring the collector
The configuration file of the collector, ZIMonCollector.cfg, is located in the /opt/IBM/zimon/ folder.
The most important configuration options are the domains and the peers configuration options. All other configuration options are best left at their defaults and are explained within the default configuration file shipped with ZIMon.
Metric Domain Configuration
The domains configuration indicates the number of metrics to be collected, how long they must be retained, and in what granularity. Multiple domains might be specified. If data no longer fits into the current domain, it is spilled over into the next domain and resampled.
domains = {
# this is the raw domain, aggregation factor for the raw domain is always 0
aggregation = 0
ram = "500m" # amount of RAM to be used
duration = "12h"
filesize = "1g" # maximum file size
files = 16 # number of files.
}
, {
# this is the second domain that aggregates to 60 seconds
aggregation = 60
ram = "500m" # amount of RAM to be used
duration = "4w"
filesize = "500m" # maximum file size
files = 4 # number of files.
}
, {
# this is the third domain that aggregates to 30*60 seconds == 30 minutes
aggregation = 30
ram = "500m" # amount of RAM to be used
duration = "1y"
filesize = "500m" # maximum file size
files = 4 # number of files.
}
The configuration file lists several data domains. At least one domain must be present, and the first domain represents the raw data collection as the data is collected by sensors. The aggregation parameter for this first domain must be set to 0.
- The duration parameter indicates how long until the collected metrics are pushed into the next (coarser-grained) domain. If this option is left out, no limit on the duration is imposed.
- The ram parameter indicates the amount of RAM to be allocated for the domain. Once that amount of RAM has been filled up, collected metrics will be pushed into the next (coarser-grained) domain. If this option is left out, no limit on the amount of RAM available will be imposed.
- The filesize and files parameters indicate how much space is allocated on disk for a given domain. While storing metrics in memory, a persistence mechanism also stores the metrics on disk in files of size filesize. Once the number of files is reached and a new file is to be allocated, the oldest file is removed from the disk. The persistent storage must be at least as large as the amount of main memory allocated for a domain, because when the collector is restarted, the in-memory database is recreated from these files. Queries can also be served from these files if they are executed in archive mode (-a).
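The disk-versus-memory constraint can be checked with a short calculation. The following sketch is illustrative and not part of the tool; it converts ZIMon-style size strings and verifies that a domain's files on disk can hold its in-memory database:

```python
# Sketch: verify that a domain's on-disk persistence (filesize * files)
# can hold at least the domain's in-memory database (ram), as required
# for restart recovery. Sizes follow the config convention "500m", "1g".

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}

def to_bytes(size: str) -> int:
    """Convert a size string such as '500m' or '1g' to bytes."""
    return int(size[:-1]) * UNITS[size[-1].lower()]

def persistence_ok(ram: str, filesize: str, files: int) -> bool:
    """True if the files on disk can persist the whole in-memory database."""
    return to_bytes(filesize) * files >= to_bytes(ram)

# The domains from the example configuration above:
print(persistence_ok("500m", "1g", 16))    # raw domain -> True
print(persistence_ok("500m", "500m", 4))   # second domain -> True
```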
The collector collects the metrics from the sensors. For instance, in a 5 node cluster where only the load values (load1, load5, load15) are reported, the collector will have to maintain 15 metrics (3 metrics times 5 nodes). Depending on the number of metrics collected, the collector requires a different amount of main memory to store the collected metrics in memory. Assuming 500000 metrics are collected, here are two configurations and the amount of data required to store the database. Depending on the amount of data to be collected, 500000 metrics corresponds to about 1000 nodes.
Configuration 1 (4GB of RAM). Domain one configured at 1 second granularity for a period of 6 hours, domain 2 configured at 30 seconds granularity for the next 2 days, domain 3 configured at 15 minutes granularity for the next 2 weeks and domain 4 configured at 6 hour granularity for the next 2 months.
Configuration 2 (16GB of RAM). Domain one configured at 1 second granularity for a period of 1 day, domain 2 configured at 30 sec granularity for the next week, domain 3 configured at 15 minute granularity for the next 2 months and domain 4 configured at 6 hour granularity for the next year.
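The retention arithmetic behind these sizings can be reproduced with a short calculation. The sketch below counts stored values per metric for Configuration 1; the per-value memory cost inside the collector is internal to the tool and is not modeled here:

```python
# Stored values per metric in each domain = retention / granularity.
# Domains taken from Configuration 1 (4 GB of RAM) in the text.

HOUR, DAY, WEEK = 3600, 24 * 3600, 7 * 24 * 3600

# (granularity in seconds, retention in seconds) per domain
config1 = [
    (1, 6 * HOUR),         # domain 1: 1 s granularity for 6 hours
    (30, 2 * DAY),         # domain 2: 30 s granularity for 2 days
    (15 * 60, 2 * WEEK),   # domain 3: 15 min granularity for 2 weeks
    (6 * HOUR, 60 * DAY),  # domain 4: 6 h granularity for ~2 months
]

samples_per_metric = sum(ret // gran for gran, ret in config1)
print(samples_per_metric)        # 28944 stored values per metric

total_values = samples_per_metric * 500_000  # 500000 metrics ~ 1000 nodes
print(total_values)
```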
The collectors can be stopped (deactivated) using the systemctl stop pmcollector command.
The collectors can be started (activated) using the systemctl start pmcollector command.
Manual configuration method
When upgrading the performance monitoring tool, note how the previous version was configured and whether the configuration mechanism is to be changed. Before IBM Spectrum Scale 4.2, the tool was configured using a file-based mechanism in which you manually edited the configuration files and propagated them to the requisite nodes. If the configuration mechanism is to be changed, verify that the installed versions of both IBM Spectrum Scale and the performance monitoring tool support the new configuration mechanism. If this is not the case, or if you want to use the manual, file-based configuration method, a few steps are required when you install IBM Spectrum Scale 4.2. None of the nodes in the cluster should be designated perfmon nodes. If nodes in the cluster are designated as perfmon nodes, you must run the mmchnode --noperfmon -N all command.
Then delete the centrally stored configuration information by issuing the mmperfmon config delete --all command.
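Together, the steps to switch to the manual, file-based method form the following command sequence (mmchnode --noperfmon removes the perfmon designation from all nodes):
prompt# mmchnode --noperfmon -N all
prompt# mmperfmon config delete --all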
The /opt/IBM/zimon/ZIMonSensors.cfg file is then maintained manually and the tools using it can overwrite it with a new version at any time. This mode is useful if sensors are to be installed on non-Spectrum Scale nodes or if you want to have a cluster with multiple levels of IBM Spectrum Scale running.
Enabling Object metrics
At the time of installation, the Object metrics proxy is configured to start by default on an Object protocol node.
The Object metrics proxy server is controlled by the corresponding service script called pmswiftd, located in the /etc/rc.d/init.d/ directory. You can start and stop it using the systemctl start pmswiftd and systemctl stop pmswiftd commands respectively.
In case of a system restart, the Object metrics proxy server restarts automatically, whereas the Object metrics proxy client is triggered by the Performance Monitoring tool. In case of a failover, the server starts automatically if it failed gracefully; otherwise, it must be started manually.