Automatic assignment of single node sensors

The GPFSFilesetQuota, GPFSFileset, GPFSPool, and GPFSDiskCap sensors must be restricted to run only on a single node in the cluster. The system automatically selects a suitable node within the cluster by assigning the restricted value @CLUSTER_PERF_SENSOR to the restrict field of these sensors. For example, you can apply this configuration for the GPFSDiskCap sensor by using the mmperfmon config update GPFSDiskCap.restrict=@CLUSTER_PERF_SENSOR command.
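The following sketch shows how such an update might be applied and verified. The sensor name is taken from the example above; mmperfmon config show is used here only to confirm the resulting configuration.

  # Restrict the GPFSDiskCap sensor to the automatically selected node
  mmperfmon config update GPFSDiskCap.restrict=@CLUSTER_PERF_SENSOR

  # Verify the resulting sensor configuration
  mmperfmon config show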

The node for CLUSTER_PERF_SENSOR is selected automatically based on the following criteria:

  • The node has the PERFMON designation.
  • The PERFMON component of the node is HEALTHY.
  • The GPFS component of the node is HEALTHY.
Note: You can use the mmhealth node show command to find out whether the PERFMON and GPFS components of a node are in a HEALTHY state.
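For example, a health check on a candidate node might look like the following. Limiting the output to a single component is shown only as an illustration; running mmhealth node show without arguments lists all components of the node.

  # List the health state of all components on the local node
  mmhealth node show

  # Optionally limit the output to a single component, for example PERFMON
  mmhealth node show PERFMON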

By default, this node is selected from all nodes in the cluster. A GUI node is always preferred. However, if you want to restrict the pool of nodes from which the sensor node is chosen, then you can create a node class, CLUSTER_PERF_SENSOR_CANDIDATES, by using the mmcrnodeclass command. After the CLUSTER_PERF_SENSOR_CANDIDATES node class is created, only the nodes in this class are selected.
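A minimal sketch of creating and verifying such a node class follows; the node names are placeholders, not nodes from your cluster.

  # Create the candidate node class; the node names are placeholders
  mmcrnodeclass CLUSTER_PERF_SENSOR_CANDIDATES -N node1.cluster.com,node2.cluster.com

  # Confirm the members of the new node class
  mmlsnodeclass CLUSTER_PERF_SENSOR_CANDIDATES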

If the selected node is in the DEGRADED state, then the system automatically reconfigures the CLUSTER_PERF_SENSOR to another node that is in the HEALTHY state and triggers a restart of the performance monitoring service on the previously and currently selected nodes.

You can view the currently selected node in the cluster by using the mmccr vget CLUSTER_PERF_SENSOR command. If the mmhealth node eventlog command is run on the DEGRADED and HEALTHY nodes, then it lists the singleton_sensor_off and singleton_sensor_on events respectively.
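A minimal check of the current assignment and the related events might look like the following; no particular output format is assumed.

  # Show which node currently holds the CLUSTER_PERF_SENSOR assignment
  mmccr vget CLUSTER_PERF_SENSOR

  # On the previously and newly selected nodes, look for the
  # singleton_sensor_off and singleton_sensor_on events
  mmhealth node eventlog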

If the automatic reconfiguration of the CLUSTER_PERF_SENSOR happens frequently, then the restart of the sensors is triggered more often than their configured period value, which can increase the system load and degrade overall performance.

Note:

The GPFSDiskCap sensor is I/O intensive because it queries the available file space on all GPFS file systems. On large clusters with many file systems, frequent restarts of the GPFSDiskCap sensor can negatively impact system performance. The GPFSDiskCap sensor has a performance impact similar to that of the mmdf command.

Therefore, it is recommended to use a dedicated node name, instead of @CLUSTER_PERF_SENSOR, in the restrict field of a single node sensor until the automatically selected node stabilizes in the HEALTHY state.

For example, the GPFSDiskCap sensor is configured with the @CLUSTER_PERF_SENSOR variable in the restrict field, as shown in the following configuration:

  • name = GPFSDiskCap
  • period = 86400
  • restrict = @CLUSTER_PERF_SENSOR

If the selected node changes frequently, the resulting sensor restarts can impact system performance and cause system load issues. You can avoid this issue by specifying a dedicated node name, as shown in the following configuration, by using the mmperfmon config update GPFSDiskCap.restrict=abc.cluster.node.com command.

  • name = GPFSDiskCap
  • period = 86400
  • restrict = abc.cluster.node.com
Note: If you manually configure the restrict field of the capacity sensors, then you must ensure that all the file systems on the specified node are mounted. This is done so that the file system-related data, like capacity, can be recorded.
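One way to verify this is to list the mounts on the specified node; mmlsmount and mmmount are shown here only as an illustration, the file system name is a placeholder, and the node name is the example value used above.

  # Check which file systems are mounted, and on which nodes
  mmlsmount all -L

  # If a file system is not mounted on the specified node, mount it there
  mmmount FileSystemName -N abc.cluster.node.com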

A newly installed cluster has @CLUSTER_PERF_SENSOR as the default value in the restrict fields of the GPFSFilesetQuota, GPFSFileset, GPFSPool, and GPFSDiskCap sensors.

An updated cluster that was installed before IBM Storage Scale 5.0.5 might not be configured to use this feature automatically and must be reconfigured by the administrator. You can use the mmperfmon config update SensorName.restrict=@CLUSTER_PERF_SENSOR command, where SensorName is one of GPFSFilesetQuota, GPFSFileset, GPFSPool, or GPFSDiskCap, to update the configuration.
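A sketch of how an administrator might apply this update to all four single node sensors is shown below; the order of the commands is not significant.

  # Reconfigure each of the single node sensors to use the automatic assignment
  mmperfmon config update GPFSFilesetQuota.restrict=@CLUSTER_PERF_SENSOR
  mmperfmon config update GPFSFileset.restrict=@CLUSTER_PERF_SENSOR
  mmperfmon config update GPFSPool.restrict=@CLUSTER_PERF_SENSOR
  mmperfmon config update GPFSDiskCap.restrict=@CLUSTER_PERF_SENSOR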

CAUTION:
The sensor update works only when the cluster has a minReleaseLevel of 5.0.1-0 or higher. If you have 4.2.3-x or 5.0.0-x nodes in your cluster, this feature does not work.