Planning for PowerVC monitoring

As you plan your IBM® Power® Virtualization Center environment monitoring requirements, make sure that you adhere to the recommended minimum hardware resources to achieve the best possible performance for a default installation. However, there are additional aspects you might want to consider, such as data retention and backup space requirements.

Data retention

By default, the PowerVC monitoring components are configured to hold a maximum of 10 GB of data per host. That amount is further reduced on a multi-node installation by the replication factor for OpenSearch data. When there is more than one node, the replication factor is set to 2 by default. That provides (n-1) resilience, which means the cluster still has all data if one of the nodes goes down.
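As a rough sizing sketch, assuming the default replication factor of 2 means two copies of every piece of data are kept across the cluster, the unique data a multi-node cluster can hold is the raw per-host allowance times the node count, divided by the number of copies:

```shell
# Hypothetical sizing sketch: 3 controller nodes, 10 GB of monitoring
# data allowed per host, and 2 copies of each piece of data
# (replication factor 2). Values are illustrative, not prescriptive.
NODES=3
PER_HOST_GB=10
COPIES=2

# Raw capacity across the cluster.
RAW_GB=$((NODES * PER_HOST_GB))

# Unique (deduplicated) data the cluster can actually hold.
UNIQUE_GB=$((RAW_GB / COPIES))

echo "Raw: ${RAW_GB} GB, unique: ${UNIQUE_GB} GB"
```

With these example numbers, 30 GB of raw capacity holds only 15 GB of unique log data, which is why replication must be factored into retention planning.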

This amount of disk space might not be adequate to store the desired time window of log entries. The volume of log entries is not static; it depends on the PowerVC features you are using and the amount of resources that PowerVC manages. On production systems, it is not unusual for log data to grow at a rate of multiple GBs per day.

Consequently, especially in production environments, you might want to adjust these default retention values and provide more storage space for OpenSearch to use.

The most important retention settings are located in the /opt/ibm/powervc-opsmgr/ansible/monitoring/vars/curator.yml file. You are free to change these values, but be aware that they are applied only after a --config or a --reset CLI command.

The following variables control the retention of data in OpenSearch:
curator_filter_type
This is the type of criteria rule used to manage data pruning. Either age or space criteria are supported. If the age criteria is used, then curator_prune_days must also be set. If the space criteria is used, then curator_disk_space must also be set. The default value for this setting is space.
curator_prune_days
Specifies the number of days of data that must be retained. Any data older than this is pruned (removed) from the database. The default is 10 days. Take special care when you use the age criteria, because the amount of space that log data uses on any given day is not necessarily consistent over time. In this case, always allow plenty of disk space on your /usr file system for the database to store data.

curator_disk_space
Specifies the maximum storage space, in GB, that the database can use to store daily log data. The default value is 10 GB. The granularity of pruned data is a whole day's worth. That means if your environment produces 5 GB of data on the first day, 3 GB on the second day, and 3 GB on the third day, the first day of log data (5 GB, the oldest data) is pruned because the sum of all days exceeds your limit of 10 GB. Data consumption is checked on an hourly basis, so it might take up to an hour until the excess data is recognized and pruned by the system. For this reason, it is good practice to allocate at least an extra hour's worth of data space on your /usr file system.
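Taken together, a curator.yml that switches from the default space criteria to age-based retention might look like the following sketch. The variable names are the ones described above; the exact layout and any additional variables in the file may differ in your installation.

```yaml
# /opt/ibm/powervc-opsmgr/ansible/monitoring/vars/curator.yml (sketch)

# Prune by age instead of the default space criteria.
curator_filter_type: age

# Keep 10 days of log data; anything older is pruned.
curator_prune_days: 10

# Consulted only when curator_filter_type is space.
curator_disk_space: 10
```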

These settings must be properly set before you install PowerVC (if the monitoring variable in inventory is set to True), or later, before you install the monitoring components by using the powervc-opsmgr monitoring --install CLI option (if the monitoring variable was not set during the initial PowerVC installation). As previously stated, you can also use the other powervc-opsmgr monitoring CLI options, --config or --reset, to update the cluster after an installation. Beware: a reset causes all your previously collected data to be flushed and lost.
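The CLI options named in this section can be sketched as follows; verify the exact syntax against the powervc-opsmgr help output for your release.

```shell
# Install the monitoring components if they were not installed
# during the initial PowerVC installation:
powervc-opsmgr monitoring --install

# Re-apply configuration after editing curator.yml, keeping existing data:
powervc-opsmgr monitoring --config

# Reset the cluster. WARNING: this flushes all previously collected data:
powervc-opsmgr monitoring --reset
```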

Also, any changes to the above settings must be accompanied by sufficient available storage in the /usr file system.

Backup storage requirements

Backup storage can be either on a file system or on a separate disk. For multi-node installations, only the primary / bootstrap host set in inventory holds the backup data (that storage is mounted via NFS to the other controller nodes in the cluster). The main difference from a single-node installation is that the amount of backup storage you must plan for is equal to the total amount of data stored by the database across all hosts, not only the data on the primary / bootstrap host.

In the single-node scenario, this is easy to calculate, since the amount of storage you need is equal to the maximum storage allowed by the data retention criteria.

However, in the multi-node scenario you must multiply that amount by the number of monitoring controller nodes you have in your cluster.

For example, if you have a 3-node PowerVC controller cluster, the data retention criteria is space, and curator_disk_space is set to 100 GB, then the total amount of space you must allocate and reserve for a backup on the primary / bootstrap host is 300 GB (three times 100 GB).
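The multiplication above can be sketched as simple arithmetic, using the example node count and retention limit from this section:

```shell
# Backup sizing sketch for a multi-node cluster: the backup space to
# reserve on the primary / bootstrap host is the per-node retention
# limit times the number of monitoring controller nodes.
NODES=3
CURATOR_DISK_SPACE_GB=100   # per-node retention limit (curator_disk_space)

BACKUP_GB=$((NODES * CURATOR_DISK_SPACE_GB))
echo "Reserve ${BACKUP_GB} GB for backups on the primary / bootstrap host"
```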

The backup settings are located in the /opt/ibm/powervc-opsmgr/ansible/monitoring/vars/elastic.yml file. Again, you are free to change these settings after an installation, but they take effect only after a --config or a --reset CLI command.

The following variables control the setup and allocation of backup space for OpenSearch:
monitor_bak_id
This is an identifier for your backup. A directory with the same name as the value of this variable is created under the location that the monitor_bak_path variable specifies (see below). By default, its value is set to latest. So, for example, if none of the default values were altered before installation, the backed-up files would be located under /backup/latest in the file system.
monitor_bak_path
Specifies the file system location where backups are stored. The default is to store backups under the /backup directory in the root file system.
monitor_bak_disk
If this is set to yes, then a new storage device is allocated to host the backup data (the device must already be visible to the operating system before the PowerVC or monitoring installation). In addition, if this is set to yes, then monitor_bak_fdev and monitor_bak_fsys must also be set. The default setting is no, which means the backup reuses the disk where the root file system (/) is allocated.
monitor_bak_fdev
This is the device corresponding to the additional disk that you want to use to hold the backup storage. By default, it points to the second SATA drive as recognized by the operating system (sdb) and its first partition (sdb1). You can change this setting to reflect the specific drive and partition you want to use.
monitor_bak_fsys
This is the type of file system you want to create on the disk or partition that is specified by monitor_bak_fdev. The PowerVC monitoring installation formats the partition by using the file system type that you chose. This value must be a valid type of file system that your operating system supports. By default, this is set to use the xfs filesystem type.
monitor_bak_cidr
This is the CIDR prefix length of the network where your cluster is installed. It is used to mount the NFS file system across all nodes in the cluster. By default, this is set to 22. Ensure that it matches your cluster's network CIDR value before you install PowerVC monitoring; otherwise, the backup and restore CLI commands will not work. It can also be altered after installation and then applied by using the CLI's --config or --reset options.
monitor_bak_zips
Controls whether the backup is compressed after it completes. By default, this is set to yes, and a zip file is created under the location that the monitor_bak_path variable sets.
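As an illustration, an elastic.yml fragment that places backups on a dedicated disk might look like the sketch below. The values are the defaults described above, except monitor_bak_disk, which is shown as yes to illustrate a dedicated-disk setup; the exact file layout may differ in your installation.

```yaml
# /opt/ibm/powervc-opsmgr/ansible/monitoring/vars/elastic.yml (sketch)
monitor_bak_id: latest     # backup directory name under monitor_bak_path
monitor_bak_path: /backup  # file system location for backups
monitor_bak_disk: yes      # use a dedicated, already-visible disk
monitor_bak_fdev: sdb1     # device partition to hold the backup data
monitor_bak_fsys: xfs      # file system type to create on that partition
monitor_bak_cidr: 22       # prefix length of the cluster network
monitor_bak_zips: yes      # compress the completed backup into a zip file
```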

There are also other variables in the backup.yaml file. Do not alter their values unless you are an experienced Ansible user, and keep in mind that changes to these values are currently not supported.

Additional considerations

Initially, it is recommended that you monitor data consumption to make sure your requirements are met and that the database retains the amount of log data you need to adequately troubleshoot your environment.
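One hypothetical way to watch that consumption, assuming OpenSearch answers on the default local port and the _cat API is enabled in your deployment (adjust host, port, and credentials for your cluster):

```shell
# Per-index disk usage, largest first (hypothetical endpoint; adjust
# host, port, TLS, and credentials to match your cluster).
curl -s 'http://localhost:9200/_cat/indices?v&h=index,store.size&s=store.size:desc'

# Disk headroom on the file system that holds the database
# (per this section, the /usr file system):
df -h /usr
```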

Later on, if your requirements change and you need to store more data, or your day-to-day operations produce more data, you can add storage to meet your log data retention goals, then update the yaml file configuration and apply it by using the CLI commands.

The same applies to backup space. Always plan to adjust your backup storage space on the primary / bootstrap host accordingly. It must be able to hold all the data held by the database on all nodes in the cluster in a multi-node installation, or the same amount in a single-node installation.