Configuring multiple collectors
A performance monitoring tool installation can use a single collector, or it can use multiple collectors to increase the scalability or the fault tolerance of the performance monitoring system. The multi-collector configuration is referred to as federation. A single collector can easily support up to 150 sensor nodes.
In a federated configuration, the collectors must know about each other; otherwise, a collector returns only the data that is stored in its own measurement database. After the collectors know their peers, they collaborate to gather the data for a given measurement query. All collectors that are part of the federation are specified in the peers configuration option in the collector's configuration file, as shown in the following example:
peers = {
host = "collector1.mydomain.com"
port = "9085"
}, {
host = "collector2.mydomain.com"
port = "9085"
}
The port number is the one specified by the federationport configuration option, typically set to 9085. It is acceptable to list the current host so that the same configuration file can be used for all the collector machines.
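Because every collector can share the same peers list, the option can be produced from one list of host names. The following Python snippet is a minimal sketch of that idea, not part of the performance monitoring tool: the function name render_peers is hypothetical, and the output simply mirrors the layout of the example above.

```python
def render_peers(hosts, port=9085):
    """Render a peers option in the same layout as the example above.

    `hosts` is a list of collector host names; `port` is assumed to be
    the value of the federationport option (9085 in the example).
    """
    entries = []
    for host in hosts:
        entries.append('{\nhost = "%s"\nport = "%d"\n}' % (host, port))
    return "peers = " + ", ".join(entries)


# The same text can be placed in the configuration file of every collector,
# because listing the current host is acceptable.
peers_text = render_peers(["collector1.mydomain.com", "collector2.mydomain.com"])
print(peers_text)
```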
After the peers are specified, a query for measurement data can be directed to any of the collectors listed in the peers section. The collectors then collect and assemble a response that is based on all relevant data from all collectors. Hence, clients need to contact only a single collector to get all the measurements available in the system.
To distribute the measurement data reported by sensors over multiple collectors, multiple collectors can be specified when automatically configuring the sensors, as shown in the following sample:
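The effect of federation on a query can be illustrated with a small in-memory model. This is a conceptual sketch only: the Collector class, its methods, and the metric name cpu_user are hypothetical and do not reflect the actual collector protocol or storage format.

```python
class Collector:
    """Toy model of a collector with a local measurement store and peers."""

    def __init__(self, name):
        self.name = name
        self.local = {}   # metric name -> list of (timestamp, value)
        self.peers = []   # all Collector objects in the federation

    def store(self, metric, timestamp, value):
        self.local.setdefault(metric, []).append((timestamp, value))

    def query_local(self, metric):
        # Without federation, this is all a collector could return.
        return list(self.local.get(metric, []))

    def query(self, metric):
        # Federated query: merge local rows with rows from every peer.
        # A set removes duplicates if two collectors hold the same sample.
        rows = set(self.query_local(metric))
        for peer in self.peers:
            if peer is not self:   # the peers list may include this host
                rows.update(peer.query_local(metric))
        return sorted(rows)


c1, c2 = Collector("collector1"), Collector("collector2")
for c in (c1, c2):
    c.peers = [c1, c2]   # same peers list on every collector

c1.store("cpu_user", 1, 10.0)   # sample reported to collector1
c2.store("cpu_user", 2, 12.5)   # sample reported to collector2

result_via_c1 = c1.query("cpu_user")
result_via_c2 = c2.query("cpu_user")
print(result_via_c1 == result_via_c2)
```

In this model, a client gets the same merged result regardless of which collector it contacts, while query_local on its own would return only part of the data.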
prompt# mmperfmon config generate \
--collectors collector1.domain.com,collector2.domain.com,…
If multiple collectors are specified, then the federation between these collectors is configured automatically. The peers section in those collectors' configuration files, /opt/IBM/zimon/ZIMonCollector.cfg, is also updated. Each sensor picks one of the collectors to report its measurement data to. The sensors use stable hashes to pick the collector, so that the sensor-collector relationship changes little when new collectors are added or when a collector is removed.
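The value of a stable hash can be seen in a short simulation. The text does not say which hashing scheme the sensors use, so the following sketch substitutes rendezvous (highest random weight) hashing as one well-known stable scheme. The function names, node names, and collector names are hypothetical.

```python
import hashlib


def score(sensor, collector):
    """Deterministic pseudo-random weight for a (sensor, collector) pair."""
    digest = hashlib.sha256(f"{sensor}|{collector}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_collector(sensor, collectors):
    """Rendezvous hashing: the collector with the highest weight wins."""
    return max(collectors, key=lambda c: score(sensor, c))


sensors = [f"node{i}" for i in range(1000)]
before = ["collector1", "collector2", "collector3"]
after = before + ["collector4"]   # one collector is added

# Sensors whose collector changes after the new collector joins.
moved = [s for s in sensors
         if pick_collector(s, before) != pick_collector(s, after)]

print(f"{len(moved)} of {len(sensors)} sensors changed collector")
```

With this scheme, a sensor changes its collector only if the new collector wins for that sensor, so every moved sensor lands on collector4 and the remaining assignments are untouched. A naive scheme such as hash modulo the number of collectors would instead reshuffle most sensors.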
Additionally, sensors and collectors can be configured for high availability. In this setting, sensors report their measurement data to more than one collector so that the failure of a single collector does not lead to data loss. For instance, if the collector redundancy is increased to two, then every sensor reports each metric to two collectors. As a side effect of increasing the redundancy to two, the bandwidth that is used for reporting measurement data doubles. The collector redundancy must be configured before the sensor configuration is stored in GPFS, by changing the colRedundancy option in /opt/IBM/zimon/defaults/ZIMonSensors.cfg as explained in the Configuring the sensor section.
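The trade-off between redundancy and bandwidth can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: the redundancy parameter stands in for colRedundancy, the top-ranked selection by hash weight is an assumed scheme, and all names are hypothetical.

```python
import hashlib


def score(sensor, collector):
    """Deterministic pseudo-random weight for a (sensor, collector) pair."""
    digest = hashlib.sha256(f"{sensor}|{collector}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_collectors(sensor, collectors, redundancy=1):
    """Return the `redundancy` highest-ranked collectors for a sensor."""
    ranked = sorted(collectors, key=lambda c: score(sensor, c), reverse=True)
    return ranked[:redundancy]


collectors = ["collector1", "collector2", "collector3"]
sensors = [f"node{i}" for i in range(150)]

# Redundancy of two: every sensor reports to two distinct collectors.
assignment = {s: pick_collectors(s, collectors, redundancy=2) for s in sensors}

# Simulate the loss of one collector and look for sensors with no target left.
failed = "collector2"
orphaned = [s for s, targets in assignment.items()
            if all(t == failed for t in targets)]

# Reports sent per measurement interval, with redundancy one and two.
reports_r1 = len(sensors) * 1
reports_r2 = sum(len(targets) for targets in assignment.values())

print(len(orphaned), reports_r1, reports_r2)
```

Because each sensor has two distinct targets, the failure of any single collector leaves no sensor without a collector, at the cost of sending every report twice.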