Use case 5: Identify the ACTIVE PERFORMANCE MONITOR node
This section describes the threshold use case for identifying the ACTIVE PERFORMANCE MONITOR node.
To see the pmcollector node that is granted the ACTIVE PERFORMANCE MONITOR role, use the following command:
[root@gpfsgui-21 ~]# mmhealth thresholds list
The system displays output similar to the following:
active_thresholds_monitor: gpfsgui-22.novalocal
### Threshold Rules ###
rule_name             metric                error  warn    direction  filterBy  groupBy             sensitivity
-----------------------------------------------------------------------------------------------------------------
InodeCapUtil_Rule     Fileset_inode         90.0   80.0    high                 gpfs_cluster_name,  300
                                                                                gpfs_fs_name,
                                                                                gpfs_fset_name
DataCapUtil_Rule      DataPool_capUtil      90.0   80.0    high                 gpfs_cluster_name,  300
                                                                                gpfs_fs_name,
                                                                                gpfs_diskpool_name
MemFree_Rule          mem_memfree           50000  100000  low                  node                300
SMBConnPerNode_Rule   connect_count         3000   None    high                 node                300
SMBConnTotal_Rule     connect_count         20000  None    high                                     300
MetaDataCapUtil_Rule  MetaDataPool_capUtil  90.0   80.0    high                 gpfs_cluster_name,  300
                                                                                gpfs_fs_name,
                                                                                gpfs_diskpool_name
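When the output is captured by a script, the node that holds the role can be read from the active_thresholds_monitor line. The following Python sketch assumes that the output of mmhealth thresholds list was saved as text; the function name active_monitor is hypothetical and is not part of any IBM tooling.

```python
import re
from typing import Optional

# First lines of the sample output shown in this section.
SAMPLE = """\
active_thresholds_monitor: gpfsgui-22.novalocal
### Threshold Rules ###
"""


def active_monitor(output: str) -> Optional[str]:
    """Return the node named on the active_thresholds_monitor line, or None."""
    match = re.search(
        r"^\s*active_thresholds_monitor:\s*(\S+)", output, re.MULTILINE
    )
    return match.group(1) if match else None


if __name__ == "__main__":
    print(active_monitor(SAMPLE))
```

The function returns None when the line is missing, so a caller can tell "no monitor reported" apart from a node name.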
The information about the ACTIVE PERFORMANCE MONITOR node is also included in the THRESHOLD service health state.
[root@gpfsgui-21 ~]# mmhealth node show threshold -N all
On the node that holds the ACTIVE PERFORMANCE MONITOR role, the health status of active_thresh_monitor is shown as a subprocess of the THRESHOLD service.
Node name: gpfsgui-21.novalocal
Component                Status    Status Change   Reasons
----------------------------------------------------------
THRESHOLD                HEALTHY   1 hour ago      -
  MemFree_Rule           HEALTHY   58 min. ago     -
There are no active error events for the component THRESHOLD on this node (gpfsgui-21.novalocal).

Node name: gpfsgui-22.novalocal
Component                Status    Status Change   Reasons
----------------------------------------------------------
THRESHOLD                HEALTHY   1 hour ago      -
  MemFree_Rule           HEALTHY   1 hour ago      -
  SMBConnTotal_Rule      HEALTHY   23 min. ago     -
  active_thresh_monitor  HEALTHY   1 hour ago      -
There are no active error events for the component THRESHOLD on this node (gpfsgui-22.novalocal).

Node name: gpfsgui-23.novalocal
Component                Status    Status Change   Reasons
----------------------------------------------------------
THRESHOLD                HEALTHY   1 hour ago      -
  MemFree_Rule           HEALTHY   58 min. ago     -
  SMBConnPerNode_Rule    HEALTHY   33 min. ago     -
  SMBConnTotal_Rule      HEALTHY   33 min. ago     -
There are no active error events for the component THRESHOLD on this node (gpfsgui-23.novalocal).

Node name: gpfsgui-24.novalocal
Component                Status    Status Change   Reasons
----------------------------------------------------------
THRESHOLD                HEALTHY   1 hour ago      -
  MemFree_Rule           HEALTHY   1 hour ago      -
  SMBConnPerNode_Rule    HEALTHY   23 min. ago     -
There are no active error events for the component THRESHOLD on this node (gpfsgui-24.novalocal).

Node name: gpfsgui-25.novalocal
Component                Status    Status Change   Reasons
----------------------------------------------------------
THRESHOLD                HEALTHY   1 hour ago      -
  MemFree_Rule           HEALTHY   58 min. ago     -
  SMBConnPerNode_Rule    HEALTHY   23 min. ago     -
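The per-node output can also be scanned for the node that lists the active_thresh_monitor subprocess. The following Python sketch assumes that the output of mmhealth node show threshold -N all was saved as text and that each block starts with a "Node name:" line, as in the sample above. The function names components_by_node and monitor_nodes are hypothetical.

```python
from typing import Dict, List

# Abridged from the sample output shown in this section.
SAMPLE = """\
Node name: gpfsgui-21.novalocal
Component Status Status Change Reasons
--------------------------------------------------------
THRESHOLD HEALTHY 1 hour ago -
MemFree_Rule HEALTHY 58 min. ago -
There are no active error events for the component THRESHOLD on this node (gpfsgui-21.novalocal).
Node name: gpfsgui-22.novalocal
Component Status Status Change Reasons
-----------------------------------------------------------------
THRESHOLD HEALTHY 1 hour ago -
MemFree_Rule HEALTHY 1 hour ago -
SMBConnTotal_Rule HEALTHY 23 min. ago -
active_thresh_monitor HEALTHY 1 hour ago -
"""

# Lines that begin with these strings are headers, separators, or summaries.
SKIP_PREFIXES = ("Component", "-", "There are")


def components_by_node(output: str) -> Dict[str, List[str]]:
    """Map each node name to the component names listed in its block."""
    nodes: Dict[str, List[str]] = {}
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Node name:"):
            current = stripped.split(":", 1)[1].strip()
            nodes[current] = []
        elif current and stripped and not stripped.startswith(SKIP_PREFIXES):
            # The component name is the first whitespace-separated field.
            nodes[current].append(stripped.split()[0])
    return nodes


def monitor_nodes(output: str) -> List[str]:
    """Return the nodes whose block lists active_thresh_monitor."""
    return [
        node
        for node, components in components_by_node(output).items()
        if "active_thresh_monitor" in components
    ]


if __name__ == "__main__":
    print(monitor_nodes(SAMPLE))
```

Splitting on whitespace keeps the sketch independent of the column widths, which differ between the node blocks.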
If the ACTIVE PERFORMANCE MONITOR node loses the connection or is unresponsive, another pmcollector node takes over the role of the ACTIVE PERFORMANCE MONITOR node. After a new pmcollector takes over the ACTIVE PERFORMANCE MONITOR role, the status of all the cluster-wide thresholds is also reported by the new ACTIVE PERFORMANCE MONITOR node.
In the following example, the pmcollector service on gpfsgui-22 is stopped and gpfsgui-21 takes over the role:
[root@gpfsgui-22 ~]# systemctl status pmcollector
pmcollector.service - zimon collector daemon
Loaded: loaded (/usr/lib/systemd/system/pmcollector.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Tue 2019-03-05 15:41:52 CET; 28min ago
Process: 1233 ExecStart=/opt/IBM/zimon/sbin/pmcollector -C /opt/IBM/zimon/ZIMonCollector.cfg -R /var/run/perfmon (code=exited, status=0/SUCCESS)
Main PID: 1233 (code=exited, status=0/SUCCESS)
Mar 05 14:27:24 gpfsgui-22.novalocal systemd[1]: Started zimon collector daemon.
Mar 05 14:27:24 gpfsgui-22.novalocal systemd[1]: Starting zimon collector daemon...
Mar 05 15:41:50 gpfsgui-22.novalocal systemd[1]: Stopping zimon collector daemon...
Mar 05 15:41:52 gpfsgui-22.novalocal systemd[1]: Stopped zimon collector daemon.
[root@gpfsgui-21 ~]# mmhealth thresholds list
active_thresholds_monitor: gpfsgui-21.novalocal
### Threshold Rules ###
rule_name             metric                error  warn    direction  filterBy  groupBy             sensitivity
-----------------------------------------------------------------------------------------------------------------
InodeCapUtil_Rule     Fileset_inode         90.0   80.0    high                 gpfs_cluster_name,
                                                                                gpfs_fs_name,
                                                                                gpfs_fset_name      300
DataCapUtil_Rule      DataPool_capUtil      90.0   80.0    high                 gpfs_cluster_name,
                                                                                gpfs_fs_name,
                                                                                gpfs_diskpool_name  300
MemFree_Rule          mem_memfree           50000  100000  low                  node                300
SMBConnPerNode_Rule   connect_count         3000   None    high                 node                300
SMBConnTotal_Rule     connect_count         20000  None    high                                     300
MetaDataCapUtil_Rule  MetaDataPool_capUtil  90.0   80.0    high                 gpfs_cluster_name,
                                                                                gpfs_fs_name,
                                                                                gpfs_diskpool_name  300
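A takeover can be confirmed by comparing the active_thresholds_monitor line from two captures of mmhealth thresholds list, one taken before and one after the collector was stopped. The following Python sketch works on saved text only; the names monitor_from and describe_takeover are hypothetical.

```python
from typing import Optional

PREFIX = "active_thresholds_monitor:"

# First lines of the two sample outputs shown in this section.
BEFORE = "active_thresholds_monitor: gpfsgui-22.novalocal\n### Threshold Rules ###\n"
AFTER = "active_thresholds_monitor: gpfsgui-21.novalocal\n### Threshold Rules ###\n"


def monitor_from(output: str) -> Optional[str]:
    """Return the node on the active_thresholds_monitor line, or None."""
    for line in output.splitlines():
        if line.strip().startswith(PREFIX):
            return line.split(":", 1)[1].strip() or None
    return None


def describe_takeover(before: str, after: str) -> str:
    """Summarize whether the role moved between two captured outputs."""
    old, new = monitor_from(before), monitor_from(after)
    if old is None or new is None:
        return "active monitor not found in one of the outputs"
    if old == new:
        return f"no change: {old}"
    return f"role moved from {old} to {new}"


if __name__ == "__main__":
    print(describe_takeover(BEFORE, AFTER))
```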
[root@gpfsgui-21 ~]# mmhealth node show threshold -N all
Node name: gpfsgui-21.novalocal
Component                Status    Status Change   Reasons
----------------------------------------------------------
THRESHOLD                HEALTHY   1 hour ago      -
  MemFree_Rule           HEALTHY   1 hour ago      -
  SMBConnTotal_Rule      HEALTHY   30 min. ago     -
  active_thresh_monitor  HEALTHY   30 min. ago     -
There are no active error events for the component THRESHOLD on this node (gpfsgui-21.novalocal).

Node name: gpfsgui-22.novalocal
Component                Status    Status Change   Reasons
----------------------------------------------------------
THRESHOLD                HEALTHY   1 hour ago      -
  MemFree_Rule           HEALTHY   1 hour ago      -
There are no active error events for the component THRESHOLD on this node (gpfsgui-22.novalocal).

Node name: gpfsgui-23.novalocal
Component                Status    Status Change   Reasons
----------------------------------------------------------
THRESHOLD                HEALTHY   1 hour ago      -
  MemFree_Rule           HEALTHY   1 hour ago      -
  SMBConnPerNode_Rule    HEALTHY   1 hour ago      -
There are no active error events for the component THRESHOLD on this node (gpfsgui-23.novalocal).

Node name: gpfsgui-24.novalocal
Component                Status    Status Change   Reasons
----------------------------------------------------------
THRESHOLD                HEALTHY   1 hour ago      -
  MemFree_Rule           HEALTHY   1 hour ago      -
  SMBConnPerNode_Rule    HEALTHY   1 hour ago      -
There are no active error events for the component THRESHOLD on this node (gpfsgui-24.novalocal).

Node name: gpfsgui-25.novalocal
Component                Status    Status Change   Reasons
----------------------------------------------------------
THRESHOLD                HEALTHY   1 hour ago      -
  MemFree_Rule           HEALTHY   1 hour ago      -
  SMBConnPerNode_Rule    HEALTHY   1 hour ago      -
There are no active error events for the component THRESHOLD on this node (gpfsgui-25.novalocal).
The ACTIVE PERFORMANCE MONITOR switchover triggers a new event entry in the system health event log:
2019-03-05 15:42:02.844214 CET thresh_monitor_set_active INFO The thresholds monitoring process is running in ACTIVE state on the local node
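An event log entry of this shape can be split into its fields for further processing. The following Python sketch assumes the field order seen in the entry above (date, time, time zone, event name, severity, message) and that the entry was saved as a single line of text. The HealthEvent type and parse_event function are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple

# The event entry shown in this section, as a single line.
SAMPLE = (
    "2019-03-05 15:42:02.844214 CET thresh_monitor_set_active INFO "
    "The thresholds monitoring process is running in ACTIVE state on the local node"
)


class HealthEvent(NamedTuple):
    timestamp: datetime  # naive timestamp; the zone is kept separately
    timezone: str
    name: str
    severity: str
    message: str


def parse_event(line: str) -> HealthEvent:
    """Split one event entry into its fields (assumed field order)."""
    date, time, tz, name, severity, message = line.split(None, 5)
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f")
    return HealthEvent(timestamp, tz, name, severity, message.strip())


if __name__ == "__main__":
    event = parse_event(SAMPLE)
    print(event.name, event.severity, event.timestamp.isoformat())
```

The time zone abbreviation is stored as a plain string because abbreviations such as CET are not parsed reliably by strptime across platforms.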