Use case 5: Identify the ACTIVE PERFORMANCE MONITOR node

This use case shows how to identify the node that currently holds the ACTIVE PERFORMANCE MONITOR role.

To see the pmcollector node that is granted the ACTIVE PERFORMANCE MONITOR role, use the following command:
[root@gpfsgui-21 ~]# mmhealth thresholds list
The system displays output similar to the following:

active_thresholds_monitor: gpfsgui-22.novalocal
### Threshold Rules ###
rule_name             metric                error  warn    direction  filterBy  groupBy             sensitivity
-----------------------------------------------------------------------------------------------------------------
InodeCapUtil_Rule     Fileset_inode         90.0   80.0    high                 gpfs_cluster_name,  300
                                                                                gpfs_fs_name,
                                                                                gpfs_fset_name      
DataCapUtil_Rule      DataPool_capUtil      90.0   80.0    high                 gpfs_cluster_name,  300
                                                                                gpfs_fs_name,
                                                                                gpfs_diskpool_name  
MemFree_Rule          mem_memfree           50000  100000  low                  node                300                              
SMBConnPerNode_Rule   connect_count         3000   None    high                 node                300                              
SMBConnTotal_Rule     connect_count         20000  None    high                                     300                              
MetaDataCapUtil_Rule  MetaDataPool_capUtil  90.0   80.0    high                 gpfs_cluster_name,  300
                                                                                gpfs_fs_name,
                                                                                gpfs_diskpool_name  
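
For scripting, the node name can be pulled out of this output. The following is a minimal sketch, assuming the first line of the output has the `active_thresholds_monitor: <node>` form shown above. The function name `active_monitor_from_output` is hypothetical and not part of the product.

```shell
# Hypothetical helper: reads `mmhealth thresholds list` output on stdin
# and prints the node named on the active_thresholds_monitor line.
active_monitor_from_output() {
  # Split on the colon plus any following spaces; stop at the first match.
  awk -F': *' '/^active_thresholds_monitor:/ { print $2; exit }'
}
```

A possible invocation is `mmhealth thresholds list | active_monitor_from_output`. If the output format differs on your release, the pattern needs to be adjusted.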
The information about the ACTIVE PERFORMANCE MONITOR node is also included in the THRESHOLD service health state.
[root@gpfsgui-21 ~]# mmhealth node show threshold -N all
The health status of active_thresh_monitor is shown as a subprocess of the THRESHOLD service on the node that has the ACTIVE PERFORMANCE MONITOR role.

Node name:      gpfsgui-21.novalocal

Component        Status        Status Change     Reasons
--------------------------------------------------------
THRESHOLD        HEALTHY       1 hour ago        -
  MemFree_Rule   HEALTHY       58 min. ago       -

There are no active error events for the component THRESHOLD on this node (gpfsgui-21.novalocal).

Node name:      gpfsgui-22.novalocal

Component                 Status        Status Change     Reasons
-----------------------------------------------------------------
THRESHOLD                 HEALTHY       1 hour ago        -
  MemFree_Rule            HEALTHY       1 hour ago        -
  SMBConnTotal_Rule       HEALTHY       23 min. ago       -
  active_thresh_monitor   HEALTHY       1 hour ago        -

There are no active error events for the component THRESHOLD on this node (gpfsgui-22.novalocal).

Node name:      gpfsgui-23.novalocal

Component               Status        Status Change     Reasons
---------------------------------------------------------------
THRESHOLD               HEALTHY       1 hour ago        -
  MemFree_Rule          HEALTHY       58 min. ago       -
  SMBConnPerNode_Rule   HEALTHY       33 min. ago       -
  SMBConnTotal_Rule     HEALTHY       33 min. ago       -

There are no active error events for the component THRESHOLD on this node (gpfsgui-23.novalocal).

Node name:      gpfsgui-24.novalocal

Component               Status        Status Change     Reasons
---------------------------------------------------------------
THRESHOLD               HEALTHY       1 hour ago        -
  MemFree_Rule          HEALTHY       1 hour ago        -
  SMBConnPerNode_Rule   HEALTHY       23 min. ago       -
  

There are no active error events for the component THRESHOLD on this node (gpfsgui-24.novalocal).

Node name:      gpfsgui-25.novalocal

Component               Status        Status Change     Reasons
---------------------------------------------------------------
THRESHOLD               HEALTHY       1 hour ago        -
  MemFree_Rule          HEALTHY       58 min. ago       -
  SMBConnPerNode_Rule   HEALTHY       23 min. ago       -
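
The per-node listing can also be scanned for the node that reports the active_thresh_monitor subprocess. The following is a rough sketch, assuming the `Node name:` header and the row layout shown above. The function name `node_with_active_monitor` is hypothetical.

```shell
# Hypothetical helper: reads `mmhealth node show threshold -N all` output
# on stdin and prints "<node> <status>" for each node that lists the
# active_thresh_monitor subprocess.
node_with_active_monitor() {
  awk '
    /^Node name:/           { node = $3 }       # remember the current node
    /active_thresh_monitor/ { print node, $2 }  # $2 is the Status column
  '
}
```

A possible invocation is `mmhealth node show threshold -N all | node_with_active_monitor`.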

If the ACTIVE PERFORMANCE MONITOR node loses its connection or becomes unresponsive, another pmcollector node takes over the ACTIVE PERFORMANCE MONITOR role. From then on, the new ACTIVE PERFORMANCE MONITOR node also reports the status of all cluster-wide thresholds. In the following example, the pmcollector service is stopped on gpfsgui-22 and gpfsgui-21 takes over the role:

[root@gpfsgui-22 ~]# systemctl status pmcollector
 pmcollector.service - zimon collector daemon
   Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Tue 2019-03-05 15:41:52 CET; 28min ago
  Process: 1233 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R 
           /var/run/perfmon (code=exited, status=0/SUCCESS)
 Main PID: 1233 (code=exited, status=0/SUCCESS)

Mar 05 14:27:24 gpfsgui-22.novalocal systemd[1]: Started zimon collector daemon.
Mar 05 14:27:24 gpfsgui-22.novalocal systemd[1]: Starting zimon collector daemon...
Mar 05 15:41:50 gpfsgui-22.novalocal systemd[1]: Stopping zimon collector daemon...
Mar 05 15:41:52 gpfsgui-22.novalocal systemd[1]: Stopped zimon collector daemon.

[root@gpfsgui-21 ~]# mmhealth thresholds list

active_thresholds_monitor: gpfsgui-21.novalocal
### Threshold Rules ###
rule_name             metric                error  warn    direction  filterBy  groupBy             sensitivity
-----------------------------------------------------------------------------------------------------------------
InodeCapUtil_Rule     Fileset_inode         90.0   80.0    high                 gpfs_cluster_name,
                                                                                gpfs_fs_name,
                                                                                gpfs_fset_name      300
DataCapUtil_Rule      DataPool_capUtil      90.0   80.0    high                 gpfs_cluster_name,
                                                                                gpfs_fs_name,
                                                                                gpfs_diskpool_name  300
MemFree_Rule          mem_memfree           50000  100000  low                  node                300                               
SMBConnPerNode_Rule   connect_count         3000   None    high                 node                300                               
SMBConnTotal_Rule     connect_count         20000  None    high                                     300                               
MetaDataCapUtil_Rule  MetaDataPool_capUtil  90.0   80.0    high                 gpfs_cluster_name,
                                                                                gpfs_fs_name,
                                                                                gpfs_diskpool_name  300

[root@gpfsgui-21 ~]# mmhealth node show threshold -N all

Node name:      gpfsgui-21.novalocal

Component                 Status        Status Change     Reasons
-----------------------------------------------------------------
THRESHOLD                 HEALTHY       1 hour ago        -
  MemFree_Rule            HEALTHY       1 hour ago        -
  SMBConnTotal_Rule       HEALTHY       30 min. ago       -
  active_thresh_monitor   HEALTHY       30 min. ago       -

There are no active error events for the component THRESHOLD on this node (gpfsgui-21.novalocal).

Node name:      gpfsgui-22.novalocal

Component                 Status        Status Change     Reasons
-----------------------------------------------------------------
THRESHOLD                 HEALTHY       1 hour ago        -
  MemFree_Rule            HEALTHY       1 hour ago        -

There are no active error events for the component THRESHOLD on this node (gpfsgui-22.novalocal).

Node name:      gpfsgui-23.novalocal

Component               Status        Status Change     Reasons
---------------------------------------------------------------
THRESHOLD               HEALTHY       1 hour ago        -
  MemFree_Rule          HEALTHY       1 hour ago        -
  SMBConnPerNode_Rule   HEALTHY       1 hour ago        -

There are no active error events for the component THRESHOLD on this node (gpfsgui-23.novalocal).

Node name:      gpfsgui-24.novalocal

Component               Status        Status Change     Reasons
---------------------------------------------------------------
THRESHOLD               HEALTHY       1 hour ago        -
  MemFree_Rule          HEALTHY       1 hour ago        -
  SMBConnPerNode_Rule   HEALTHY       1 hour ago        -

There are no active error events for the component THRESHOLD on this node (gpfsgui-24.novalocal).

Node name:      gpfsgui-25.novalocal

Component               Status        Status Change     Reasons
---------------------------------------------------------------
THRESHOLD               HEALTHY       1 hour ago        -
  MemFree_Rule          HEALTHY       1 hour ago        -
  SMBConnPerNode_Rule   HEALTHY       1 hour ago        -

There are no active error events for the component THRESHOLD on this node (gpfsgui-25.novalocal).



The ACTIVE PERFORMANCE MONITOR switchover triggers a new event entry in the system health event log:

2019-03-05 15:42:02.844214 CET        thresh_monitor_set_active INFO       The thresholds monitoring 
                                                                           process is running
                                                                           in ACTIVE state on the local node
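
To find out when a node took over the role, the event log can be filtered for this event name. The following is a small sketch, assuming the log is printed one entry per line with the timestamp in the first three columns, as in the entry above. The function name `monitor_takeover_times` is hypothetical, and feeding it from `mmhealth node eventlog` is an assumption about where the log is read from.

```shell
# Hypothetical helper: reads system health event log lines on stdin and
# prints the date, time, and time zone of each thresh_monitor_set_active event.
monitor_takeover_times() {
  awk '/thresh_monitor_set_active/ { print $1, $2, $3 }'
}
```

A possible invocation on the node of interest is `mmhealth node eventlog | monitor_takeover_times`.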