Configuring multiple collectors

A performance monitoring tool installation can have a single collector, or it can consist of multiple collectors to increase the scalability or the fault tolerance of the performance monitoring system. The multi-collector configuration is referred to as “federation”.

Note: For federation to work, all collectors must have the same version number.

In a multi-collector federated configuration, the collectors need to know about each other; otherwise, a collector would only return the data stored in its own measurement database. Once the collectors know about their peers, they collaborate with each other to collect data for a given measurement query. All collectors that are part of the federation are specified in the peers configuration option in each collector’s configuration file, as shown below:

peers = {
        host = "collector1.mydomain.com"
        port = "9085"
}, {
        host = "collector2.mydomain.com"
        port = "9085"
}

The port number is the one specified by the federationport configuration option, typically set to 9085. It is acceptable to list the current host as well so that the same configuration file can be used for all the collector machines.
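Because the same peers list can be used on every collector machine, the stanza can be generated once from a list of host names. The following is a minimal sketch of that idea, not part of the product: render_peers is a hypothetical helper, and the default port of 9085 is taken from the typical federationport value mentioned above.

```python
def render_peers(hosts, port=9085):
    """Render a peers stanza that lists every collector in the federation.

    Hypothetical helper for illustration only. The local host may be
    included in `hosts`, so the same output can be used on all collectors.
    """
    entries = []
    for host in hosts:
        entries.append(
            "{\n"
            f'        host = "{host}"\n'
            f'        port = "{port}"\n'
            "}"
        )
    return "peers = " + ", ".join(entries)


if __name__ == "__main__":
    print(render_peers(["collector1.mydomain.com", "collector2.mydomain.com"]))
```

The output has the same shape as the sample above and would still need to be placed into each collector's configuration file by hand or by a deployment tool.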

Once the peers have been specified, a query for measurement data can be directed to any of the collectors listed in the peers section, and that collector will gather the relevant data from all collectors and assemble the response. Hence, clients only need to contact a single collector to get all the measurements available in the system.

To distribute the measurement data reported by sensors over multiple collectors, multiple collectors may be specified when automatically configuring the sensors, as shown in the following sample:

prompt# mmperfmon config generate \
--collectors collector1.domain.com,collector2.domain.com,…

If multiple collectors are specified, each sensor picks one of them to report its measurement data to. The sensors use stable hashes to pick the collector, so that most sensor-collector assignments stay the same when a collector is added or removed.
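The documentation does not say which stable hashing scheme the sensors use. As an illustration of the general idea only, the sketch below uses rendezvous (highest-random-weight) hashing, which is an assumption and not necessarily the tool's actual algorithm. The function names and the sensor and collector names are hypothetical.

```python
import hashlib
from typing import Sequence


def _score(sensor: str, collector: str) -> int:
    """Deterministic weight for a (sensor, collector) pair."""
    digest = hashlib.sha256(f"{sensor}|{collector}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_collector(sensor: str, collectors: Sequence[str]) -> str:
    """Return the collector with the highest weight for this sensor."""
    if not collectors:
        raise ValueError("at least one collector is required")
    return max(collectors, key=lambda c: _score(sensor, c))


if __name__ == "__main__":
    sensors = [f"node{i}" for i in range(1000)]
    before = ["collector1", "collector2", "collector3"]
    after = before + ["collector4"]
    # A sensor only moves if the new collector now has the highest weight.
    moved = sum(pick_collector(s, before) != pick_collector(s, after) for s in sensors)
    print(f"{moved} of {len(sensors)} sensors changed collector")
```

With this scheme, adding a collector only moves the sensors that now prefer the new collector, and removing a collector only moves the sensors that were assigned to it. All other assignments stay where they were.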

Additionally, sensors and collectors can be configured for high availability. To maintain high availability, each metric should be sent to two collectors in case one of them becomes unavailable. In this setting, sensors report their measurement data to more than one collector, so that the failure of a single collector does not lead to any data loss. For instance, if the collector redundancy is increased to two, every sensor reports to two collectors. As a side effect of increasing the redundancy to two, the bandwidth consumed for reporting measurement data doubles. The collector redundancy must be configured before the sensor configuration is stored in GPFS™, by changing the colRedundancy option in /opt/IBM/zimon/defaults/ZIMonSensors.cfg as explained in the Configuring the sensor section.
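To illustrate what a redundancy of two means for a single sensor, the following sketch ranks the collectors by a deterministic hash weight and reports to the top entries. This is an assumption made for illustration: the source does not describe how the redundant collectors are chosen, and pick_collectors and its redundancy parameter are hypothetical names that only mirror the role of the colRedundancy option.

```python
import hashlib
from typing import List, Sequence


def _score(sensor: str, collector: str) -> int:
    """Deterministic weight for a (sensor, collector) pair."""
    digest = hashlib.sha256(f"{sensor}|{collector}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_collectors(sensor: str, collectors: Sequence[str], redundancy: int = 2) -> List[str]:
    """Return the `redundancy` highest-weighted collectors for this sensor."""
    if redundancy < 1:
        raise ValueError("redundancy must be at least 1")
    if redundancy > len(collectors):
        raise ValueError("redundancy cannot exceed the number of collectors")
    ranked = sorted(collectors, key=lambda c: _score(sensor, c), reverse=True)
    return ranked[:redundancy]


if __name__ == "__main__":
    collectors = ["collector1", "collector2", "collector3"]
    targets = pick_collectors("node17", collectors, redundancy=2)
    print("node17 reports to:", targets)

    # Simulate the loss of one of the two targets: the other still holds the data.
    failed = targets[0]
    survivors = [c for c in targets if c != failed]
    print("after losing", failed, "data is still sent to:", survivors)
```

Each measurement is sent once per selected collector, which is why the reporting bandwidth grows in proportion to the redundancy.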