Configuring the collector

This section describes how to configure the collector of the performance monitoring tool.

The most important configuration options are domains and peers. All other options are best left at their defaults; they are explained in the default configuration file shipped with ZIMon.

The configuration file of the collector, ZIMonCollector.cfg, is located in the /opt/IBM/zimon/ folder.

Metric Domain Configuration

The domains configuration specifies how many metrics are collected, how long they are retained, and at what granularity. Multiple domains can be specified. When data no longer fits into the current domain, it is spilled over into the next domain and resampled.

A simple configuration is:
domains = {
# this is the raw domain, aggregation factor for the raw domain is always 0
aggregation = 0
ram = "500m" # amount of RAM to be used
duration = "12h"
filesize = "1g" # maximum file size
files = 16 # number of files.
}
,
{
# this is the second domain that aggregates to 60 seconds
aggregation = 60
ram = "500m" # amount of RAM to be used
duration = "4w"
filesize = "500m" # maximum file size
files = 4 # number of files.
}
,
{
# this is the third domain that aggregates to 30*60 seconds == 30 minutes
aggregation = 30
ram = "500m" # amount of RAM to be used
duration = "1y"
filesize = "500m" # maximum file size
files = 4 # number of files.
}
The configuration file lists several data domains. At least one domain must be present and the first domain represents the raw data collection as the data is collected by sensors. The aggregation parameter for this first domain must be set to 0.
Each domain specifies the following parameters:
  • The duration parameter indicates the time period until the collected metrics are pushed into the next (coarser-grained) domain. If this option is left out, no limit on the duration is imposed. Permitted units are seconds, hours, days, weeks, months and years { s, h, d, w, m, y }.
  • The ram parameter indicates the amount of RAM to be allocated for the domain. Once that amount of RAM is filled up, collected metrics are pushed into the next (coarser-grained) domain. If this option is left out, no limit on the amount of RAM available is imposed.
  • The filesize and files parameters indicate how much space is allocated on disk for a given domain. While metrics are stored in memory, a persistence mechanism also stores them on disk in files of size filesize. Once the number of files given by files is reached and a new file is to be allocated, the oldest file is removed from the disk. The persistent storage must be at least as large as the amount of main memory allocated for a domain because, when the collector is restarted, the in-memory database is re-created from these files.

    If both the ram and the duration parameters are specified, both constraints are active at the same time. As soon as one of the constraints is hit, the collected metrics are pushed into the next (coarser-grained) domain.
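The persistent-storage rule above (files times filesize must be at least as large as ram) can be checked with a short script before a configuration is deployed. The following is a minimal sketch, not part of ZIMon: the helper names parse_size and disk_covers_ram are hypothetical, and it assumes that the "m" and "g" suffixes denote binary multiples (MiB and GiB), which the text above does not state.

```python
# Hypothetical helpers for illustration only; not part of ZIMon.
# Assumption: "m" and "g" are binary multiples (MiB, GiB).
UNITS = {"m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as "500m" or "1g" into bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # plain number: treated as bytes (assumption)


def disk_covers_ram(domain):
    """Return True if files * filesize is at least as large as ram."""
    disk_bytes = domain["files"] * parse_size(domain["filesize"])
    return disk_bytes >= parse_size(domain["ram"])


# Values taken from the first two domains of the sample configuration.
raw_domain = {"ram": "500m", "filesize": "1g", "files": 16}
second_domain = {"ram": "500m", "filesize": "500m", "files": 4}
# A deliberately undersized domain (made-up values) to show a failing check.
too_small = {"ram": "500m", "filesize": "100m", "files": 4}

for name, domain in [("raw", raw_domain),
                     ("second", second_domain),
                     ("too small", too_small)]:
    print(name, disk_covers_ram(domain))
```

Both domains from the sample configuration pass the check; the undersized example fails because 4 files of 100m cannot hold 500m of in-memory data.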

The aggregation value, which is used for the second and following domains, indicates the resampling to be performed. Once data is spilled into this domain, the data is resampled to be no better than indicated by the aggregation factor. The value for the second domain is in seconds, the value for domain n (n>2) is the value of domain n-1 multiplied by the aggregation value of domain n.
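The multiplication rule for aggregation values can be made concrete with a small calculation. The following sketch uses a hypothetical function name, effective_granularities, and takes the list of aggregation values in domain order. It reports None for the raw domain, on the assumption that the raw sampling interval is set by the sensors rather than by this file.

```python
def effective_granularities(aggregations):
    """Return the effective sampling interval, in seconds, for each domain.

    The first entry must be 0 (raw domain) and maps to None, because the
    raw interval is not defined by the aggregation value.
    """
    if not aggregations or aggregations[0] != 0:
        raise ValueError("the first domain must have aggregation = 0")
    result = [None]
    previous = None
    for factor in aggregations[1:]:
        # Second domain: the value is taken directly in seconds.
        # Later domains: previous domain's interval times this factor.
        previous = factor if previous is None else previous * factor
        result.append(previous)
    return result


# Aggregation values from the sample configuration: 0, 60, 30.
print(effective_granularities([0, 60, 30]))  # [None, 60, 1800]
```

For the sample configuration this gives 60 seconds for the second domain and 60 * 30 = 1800 seconds (30 minutes) for the third, matching the comments in the configuration file.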
CAUTION:
Changing the domain ram and duration parameters after data collection has started might lead to the loss of data that is already collected. It is therefore recommended to carefully estimate the collector size based on the monitored installation, and to set these parameters accordingly from the start.

The collector collects the metrics from the sensors. For example, in a five-node cluster where only the load values (load1, load5, load15) are reported, the collector maintains 15 metrics (3 metrics times 5 nodes). The amount of main memory that the collector needs depends on the number of metrics collected. Assuming 500000 metrics are collected, the following configurations are possible. Depending on the amount of data collected per node, 500000 metrics correspond to about 1000 nodes.
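The metric counts in this paragraph follow from simple multiplication. The sketch below restates them, assuming that every node reports the same set of metrics; the function name total_metrics is hypothetical.

```python
def total_metrics(metrics_per_node, nodes):
    """Number of metrics the collector maintains if all nodes report alike."""
    return metrics_per_node * nodes


# Five nodes reporting load1, load5 and load15.
print(total_metrics(3, 5))  # 15

# 500000 metrics spread over about 1000 nodes is roughly 500 per node.
print(500_000 // 1000)  # 500
```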

Configuration 1 (4 GB of RAM). Domain 1 configured at 1-second granularity for a period of 6 hours, domain 2 configured at 30-second granularity for the next 2 days, domain 3 configured at 15-minute granularity for the next 2 weeks, and domain 4 configured at 6-hour granularity for the next 2 months.

Configuration 2 (16 GB of RAM). Domain 1 configured at 1-second granularity for a period of 1 day, domain 2 configured at 30-second granularity for the next week, domain 3 configured at 15-minute granularity for the next 2 months, and domain 4 configured at 6-hour granularity for the next year.

Note: The figures above give only the memory required for the in-memory database. They do not include the indices necessary for the persistent storage or the memory needed by the collector program itself.

The collectors can be stopped (deactivated) using the systemctl stop pmcollector command.

The collectors can be started (activated) using the systemctl start pmcollector command.