Use case 6: Observe the memory usage with MemFree_Rule

This section describes how to observe memory usage with the MemFree_Rule threshold rule.

The default MemFree_Rule observes the estimated available memory in relation to the total memory on each cluster node. Run the following command to display all active threshold rules:
[root@fscc-p8-23-c ~]# mmhealth thresholds list
The system displays output similar to the following:

active_thresholds_monitor: fscc-p8-23-c.mainz.de.ibm.com
### Threshold Rules ###
rule_name                metric                   error  warn  direction  filterBy  groupBy                                            sensitivity
------------------------------------------------------------------------------------------------------------------------------------------------------
InodeCapUtil_Rule        Fileset_inode            90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name      300
DataCapUtil_Rule         DataPool_capUtil         90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
MemFree_Rule             MemoryAvailable_percent  None   5.0   low                  node                                               300-min
diskIOreadresponseTime   DiskIoLatency_read       250    100   None                 node, diskdev_name                                 300
SMBConnPerNode_Rule      connect_count            3000   None  high                 node                                               300
diskIOwriteresponseTime  DiskIoLatency_write      250    100   None                 node, diskdev_name                                 300
SMBConnTotal_Rule        connect_count            20000  None  high                                                                    300
MetaDataCapUtil_Rule     MetaDataPool_capUtil     90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
The MemFree_Rule raises a warning if the smallest value within the sensitivity period is lower than the warn threshold. By default, the warn threshold is set to 5%. All threshold events can be reviewed in the output of the mmhealth node show threshold command:
[root@fscc-p8-23-c ~]# mmhealth node show threshold
The system displays output similar to the following:
Node name:      fscc-p8-23-c.mainz.de.ibm.com

Component                   Status        Status Change     Reasons
----------------------------------------------------------------------------------------------
THRESHOLD                   DEGRADED      5 min. ago        thresholds_warn(MemFree_Rule)
  MemFree_Rule              DEGRADED      Now               thresholds_warn(MemFree_Rule)
  active_thresh_monitor     HEALTHY       10 min. ago       -
  diskIOreadresponseTime    HEALTHY       10 min. ago       -
  diskIOwriteresponseTime   HEALTHY       Now               -


Event               Parameter        Severity    Active Since      Event Message
-------------------------------------------------------------------------------------------------------------------
thresholds_warn     MemFree_Rule     WARNING     Now               The value of MemoryAvailable_percent 
                                                                   for the component(s) MemFree_Rule/fscc-p8-23-c
                                                                   exceeded threshold warning level 6.0
                                                                   defined in MemFree_Rule.
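The warning condition of MemFree_Rule compares the smallest value in the sensitivity period with the warn boundary. It can be sketched in Python as follows. This is an illustrative simplification, not the IBM Storage Scale implementation. The function name, the list-of-percentages input, and the NO_DATA result are hypothetical.

```python
def memfree_state(samples_percent, warn=5.0):
    """Evaluate one sensitivity period of a MemFree_Rule-like check.

    samples_percent: MemoryAvailable_percent data points collected within the
        sensitivity period (hypothetical input format).
    warn: warn boundary in percent; 5.0 is the documented default.
    """
    if not samples_percent:
        return "NO_DATA"  # hypothetical state: nothing to evaluate
    # With the "-min" sensitivity suffix, the smallest data point is compared,
    # so a single short dip is enough to trigger the warning.
    smallest = min(samples_percent)
    return "WARNING" if smallest < warn else "HEALTHY"


print(memfree_state([12.0, 9.5, 4.4, 11.0]))  # WARNING (the average is about 9.2)
print(memfree_state([12.0, 9.5, 6.1, 11.0]))  # HEALTHY
```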
All threshold events that have been raised so far can also be reviewed by running the following command:
[root@fscc-p8-23-c ~]# mmhealth node eventlog | grep thresholds 
The system displays output similar to the following:

 ...
2019-08-30 12:40:49.102217 CEST       thresh_monitor_set_active INFO       The thresholds monitoring process is 
                                                                           running in ACTIVE state on the local node
2019-08-30 12:41:04.092083 CEST       thresholds_new_rule       INFO       Rule diskIOreadresponseTime was added
2019-08-30 12:41:04.127695 CEST       thresholds_new_rule       INFO       Rule SMBConnTotal_Rule was added
2019-08-30 12:41:04.147223 CEST       thresholds_new_rule       INFO       Rule diskIOwriteresponseTime was added
2019-08-30 12:41:19.117875 CEST       thresholds_new_rule       INFO       Rule MemFree_Rule was added
2019-08-30 13:16:04.804887 CEST       thresholds_normal         INFO       The value of DiskIoLatency_read defined in 
                                                                           diskIOreadresponseTime for component 
                                                                           diskIOreadresponseTime/fscc-p8-23-c/sda1
                                                                           reached a normal level.
2019-08-30 13:16:04.831206 CEST       thresholds_normal         INFO       The value of DiskIoLatency_read defined in
                                                                           diskIOreadresponseTime for component
                                                                           diskIOreadresponseTime/fscc-p8-23-c/sda2 
                                                                           reached a normal level.
2019-08-30 13:21:05.203115 CEST       thresholds_normal         INFO       The value of DiskIoLatency_read defined in
                                                                           diskIOreadresponseTime for component 
                                                                           diskIOreadresponseTime/fscc-p8-23-c/sdc 
                                                                           reached a normal level.
2019-08-30 13:21:05.227137 CEST       thresholds_normal         INFO       The value of DiskIoLatency_read defined in
                                                                           diskIOreadresponseTime for component 
                                                                           diskIOreadresponseTime/fscc-p8-23-c/sdd 
                                                                           reached a normal level.
2019-08-30 13:21:05.242787 CEST       thresholds_normal         INFO       The value of DiskIoLatency_read defined in
                                                                           diskIOreadresponseTime for component 
                                                                           diskIOreadresponseTime/fscc-p8-23-c/sde 
                                                                           reached a normal level.
2019-08-30 13:41:06.809589 CEST       thresholds_removed        INFO       The value of DiskIoLatency_read for the component(s)
                                                                           diskIOreadresponseTime/fscc-p8-23-c/sda1
                                                                           defined in diskIOreadresponseTime was removed.
2019-08-30 13:41:06.902566 CEST       thresholds_removed        INFO       The value of DiskIoLatency_read for the component(s) 
                                                                           diskIOreadresponseTime/fscc-p8-23-c/sda2 
                                                                           defined in diskIOreadresponseTime was removed.
2019-08-30 15:24:43.224013 CEST       thresholds_warn           WARNING    The value of MemoryAvailable_percent for the component(s)
                                                                           MemFree_Rule/fscc-p8-23-c exceeded threshold warning level
                                                                           6.0 defined in MemFree_Rule.
2019-08-30 15:24:58.243273 CEST       thresholds_normal         INFO       The value of DiskIoLatency_write defined in 
                                                                           diskIOwriteresponseTime for component 
                                                                           diskIOwriteresponseTime/fscc-p8-23-c/sda3 
                                                                           reached a normal level.
2019-08-30 15:24:58.289469 CEST       thresholds_normal         INFO       The value of DiskIoLatency_write defined in 
                                                                           diskIOwriteresponseTime for component 
                                                                           diskIOwriteresponseTime/fscc-p8-23-c/sda 
                                                                           reached a normal level.
2019-08-30 15:29:43.648830 CEST       thresholds_normal         INFO       The value of MemoryAvailable_percent defined
                                                                           in MemFree_Rule for component MemFree_Rule/fscc-p8-23-c 
                                                                           reached a normal level.
...
For more details about the raised events, view the mmsysmonitor log in the /var/adm/ras directory:
[root@fscc-p8-23-c ~]# cat /var/adm/ras/mmsysmonitor.fscc-p8-23-c.log |grep MemoryAvailable_percent
2019-08-30_14:39:06.681+0200: [I] ET_threshold    Event=thresholds_normal identifier=MemFree_Rule/fscc-p8-23-c 
arg0= arg1=MemoryAvailable_percent arg2=MemFree_Rule
2019-08-30_15:24:43.252+0200: [I] ET_threshold    Event=thresholds_warn identifier=MemFree_Rule/fscc-p8-23-c 
arg0=6.0 arg1=MemoryAvailable_percent arg2=MemFree_Rule
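When many events are logged, a short script can split the ET_threshold entries into their key=value fields. The following Python sketch makes two assumptions. First, each entry is a single line in the log file, because the entries above appear wrapped only for display. Second, the timestamp is the first whitespace-separated token. parse_threshold_event is a hypothetical helper, not part of IBM Storage Scale.

```python
import re

# key=value pairs such as "Event=thresholds_warn" or "arg0=6.0".
# The value may be empty, as in "arg0=" of a thresholds_normal entry.
KV_PATTERN = re.compile(r"(\w+)=(\S*)")


def parse_threshold_event(line):
    """Return the fields of one ET_threshold entry of mmsysmonitor.<node>.log."""
    fields = dict(KV_PATTERN.findall(line))
    # The timestamp is the first token and ends with a colon.
    fields["timestamp"] = line.split(maxsplit=1)[0].rstrip(":")
    return fields


entry = ("2019-08-30_15:24:43.252+0200: [I] ET_threshold    "
         "Event=thresholds_warn identifier=MemFree_Rule/fscc-p8-23-c "
         "arg0=6.0 arg1=MemoryAvailable_percent arg2=MemFree_Rule")
event = parse_threshold_event(entry)
print(event["Event"], event["identifier"], event["arg0"])
# thresholds_warn MemFree_Rule/fscc-p8-23-c 6.0
```

In the entries above, arg0 of the thresholds_warn event matches the warning level in the event message (6.0). In both entries, arg1 and arg2 name the metric and the rule.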
If the mmsysmonitor log level is set to DEBUG and buffering of debug messages is turned off, the log file also contains all messages about the threshold rule evaluation process, as the following example shows:

[root@fscc-p8-23-c ~]# cat /var/adm/ras/mmsysmonitor.fscc-p8-23-c.log |grep MemoryAvailable_percent
2019-08-30_15:24:42.656+0200: [D] Thread-2096     doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15, 
keys=(fscc-p8-23-b|Memory|mem_memtotal, fscc-p8-23-b|Memory|mem_memfree, fscc-p8-23-b|Memory|mem_buffers, 
fscc-p8-23-b|Memory|mem_cached, fscc-p8-23-b|Memory|mem_memtotal, fscc-p8-23-b|Memory|mem_memtotal, 
fscc-p8-23-b|Memory|mem_memfree, fscc-p8-23-b|Memory|mem_buffers, fscc-p8-23-b|Memory|mem_cached), 
column=0) value 29.5395588982 ls=0 cs=0 - ThresholdStateFilter.doFilter:164
2019-08-30_15:24:42.658+0200: [D] Thread-2096     doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15, 
keys=(fscc-p8-23-c|Memory|mem_memtotal, fscc-p8-23-c|Memory|mem_memfree, fscc-p8-23-c|Memory|mem_buffers, 
fscc-p8-23-c|Memory|mem_cached, fscc-p8-23-c|Memory|mem_memtotal, fscc-p8-23-c|Memory|mem_memtotal, 
fscc-p8-23-c|Memory|mem_memfree, fscc-p8-23-c|Memory|mem_buffers, fscc-p8-23-c|Memory|mem_cached), 
column=1) value 4.36088625518 ls=0 cs=0 - ThresholdStateFilter.doFilter:164
2019-08-30_15:24:42.660+0200: [D] Thread-2096     doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15, 
keys=(fscc-p8-23-d|Memory|mem_memtotal, fscc-p8-23-d|Memory|mem_memfree, fscc-p8-23-d|Memory|mem_buffers, 
fscc-p8-23-d|Memory|mem_cached, fscc-p8-23-d|Memory|mem_memtotal, fscc-p8-23-d|Memory|mem_memtotal, 
fscc-p8-23-d|Memory|mem_memfree, fscc-p8-23-d|Memory|mem_buffers, fscc-p8-23-d|Memory|mem_cached), 
column=2) value 29.6084011311 ls=0 cs=0 - ThresholdStateFilter.doFilter:164
2019-08-30_15:24:42.661+0200: [D] Thread-2096     doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15, 
keys=(fscc-p8-23-e|Memory|mem_memtotal, fscc-p8-23-e|Memory|mem_memfree, fscc-p8-23-e|Memory|mem_buffers, 
fscc-p8-23-e|Memory|mem_cached, fscc-p8-23-e|Memory|mem_memtotal, fscc-p8-23-e|Memory|mem_memtotal, 
fscc-p8-23-e|Memory|mem_memfree, fscc-p8-23-e|Memory|mem_buffers, fscc-p8-23-e|Memory|mem_cached), 
column=3) value 10.5771109742 ls=0 cs=0 - ThresholdStateFilter.doFilter:164
2019-08-30_15:24:42.663+0200: [D] Thread-2096     doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15, 
keys=(fscc-p8-23-f|Memory|mem_memtotal, fscc-p8-23-f|Memory|mem_memfree, fscc-p8-23-f|Memory|mem_buffers, 
fscc-p8-23-f|Memory|mem_cached, fscc-p8-23-f|Memory|mem_memtotal, fscc-p8-23-f|Memory|mem_memtotal, 
fscc-p8-23-f|Memory|mem_memfree, fscc-p8-23-f|Memory|mem_buffers, fscc-p8-23-f|Memory|mem_cached), 
column=4) value 11.7536187253 ls=0 cs=0 - ThresholdStateFilter.doFilter:164
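The rule is grouped by node (groupBy node), so each node is evaluated separately. In the entries above, only fscc-p8-23-c reports a value below 5%. The following Python sketch extracts the node name and the value from such doFilter entries and lists the nodes below a warn boundary. The regular expressions assume the entry layout shown above, with each entry joined into one string. nodes_below is a hypothetical helper.

```python
import re

# Assumed layout: "keys=(<node>|Memory|mem_memtotal, ...), column=N) value <float> ..."
NODE_RE = re.compile(r"keys=\(([^|]+)\|Memory\|")
VALUE_RE = re.compile(r"\bvalue ([0-9.]+)")


def nodes_below(entries, warn):
    """Return (node, value) pairs whose MemoryAvailable_percent is below warn."""
    result = []
    for entry in entries:
        node = NODE_RE.search(entry).group(1)
        value = float(VALUE_RE.search(entry).group(1))
        if value < warn:
            result.append((node, round(value, 2)))
    return result


# Abbreviated copies of the five entries above ("..." stands for the omitted keys).
entries = [
    "keys=(fscc-p8-23-b|Memory|mem_memtotal, ...), column=0) value 29.5395588982 ls=0",
    "keys=(fscc-p8-23-c|Memory|mem_memtotal, ...), column=1) value 4.36088625518 ls=0",
    "keys=(fscc-p8-23-d|Memory|mem_memtotal, ...), column=2) value 29.6084011311 ls=0",
    "keys=(fscc-p8-23-e|Memory|mem_memtotal, ...), column=3) value 10.5771109742 ls=0",
    "keys=(fscc-p8-23-f|Memory|mem_memtotal, ...), column=4) value 11.7536187253 ls=0",
]
print(nodes_below(entries, warn=5.0))  # [('fscc-p8-23-c', 4.36)]
```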

The available memory on the node is estimated from the free memory, buffers, and cached memory values (mem_memfree, mem_buffers, and mem_cached). The performance monitoring tool derives these values from the /proc/meminfo file. The following queries show how the available memory percentage depends on the sample interval (bucket_size). The larger the bucket_size, the more the metric values are smoothed.

[root@fscc-p8-23-c ~]# date; echo "get metrics mem_memfree,mem_buffers,mem_cached,mem_memtotal from node=fscc-p8-23-c last 10 bucket_size 300" | /opt/IBM/zimon/zc localhost
Fri 30 Aug 15:26:31 CEST 2019
1:      fscc-p8-23-c|Memory|mem_memfree
2:      fscc-p8-23-c|Memory|mem_buffers
3:      fscc-p8-23-c|Memory|mem_cached
4:      fscc-p8-23-c|Memory|mem_memtotal
Row     Timestamp               mem_memfree     mem_buffers     mem_cached      mem_memtotal
1       2019-08-30 14:40:00     4582652 0       225716  7819328
2       2019-08-30 14:45:00     4003383 0       307715  7819328
3       2019-08-30 14:50:00     3341518 0       344366  7819328
4       2019-08-30 14:55:00     3145560 0       359768  7819328
5       2019-08-30 15:00:00     1968256 0       378541  7819328
6       2019-08-30 15:05:00     710601  0       290097  7819328
7       2019-08-30 15:10:00     418108  0       157229  7819328
8       2019-08-30 15:15:00     365706  0       129333  7819328
9       2019-08-30 15:20:00     293367  0       146004  7819328  --> 5.6%
10      2019-08-30 15:25:00     1637727 3976    1665790 7819328
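The percentage annotations can be recomputed from the query output. The following Python sketch rests on four assumptions:

- The estimate is (mem_memfree + mem_buffers + mem_cached) divided by mem_memtotal or 40 GB, whichever is lower. The 40 GB cap comes from the rule's user action text.
- All values are in KiB, as in /proc/meminfo.
- 40 GB means 40 GiB.
- available_percent is a hypothetical helper.

For row 9, the sketch reproduces the annotated 5.6%. For the 15:20:32 sample of the 1-second query further below, it yields about 4.3609, which matches the value logged for fscc-p8-23-c in the debug output above.

```python
# Assumptions: values in KiB (as in /proc/meminfo); "40 GB" read as 40 GiB.
CAP_KIB = 40 * 1024 * 1024


def available_percent(memfree, buffers, cached, memtotal):
    """Estimated available memory in percent of min(total RAM, 40 GB)."""
    denominator = min(memtotal, CAP_KIB)
    return 100.0 * (memfree + buffers + cached) / denominator


# Row 9 of the 300-second query (2019-08-30 15:20:00)
print(f"{available_percent(293367, 0, 146004, 7819328):.1f}%")  # 5.6%
# 15:20:32 sample of the 1-second query
print(available_percent(270400, 0, 70592, 7819328))  # about 4.3609
```

On this node, mem_memtotal is about 7.5 GiB and therefore below the cap, so the 40 GB limit does not change the result.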


[root@fscc-p8-23-c ~]# date; echo "get metrics mem_memfree,mem_buffers,mem_cached,mem_memtotal from node=fscc-p8-23-c last 10 bucket_size 60" | /opt/IBM/zimon/zc localhost
Fri 30 Aug 15:26:07 CEST 2019
1:      fscc-p8-23-c|Memory|mem_memfree
2:      fscc-p8-23-c|Memory|mem_buffers
3:      fscc-p8-23-c|Memory|mem_cached
4:      fscc-p8-23-c|Memory|mem_memtotal
Row     Timestamp               mem_memfree     mem_buffers     mem_cached      mem_memtotal
1       2019-08-30 15:17:00     310581  0       161529  7819328
2       2019-08-30 15:18:00     283733  0       116588  7819328
3       2019-08-30 15:19:00     265449  0       112012  7819328
4       2019-08-30 15:20:00     263733  0       102635  7819328  --> 4.7%
5       2019-08-30 15:21:00     251716  0       71268   7819328
6       2019-08-30 15:22:00     3222924 0       61258   7819328
7       2019-08-30 15:23:00     2576786 0       1164106 7819328
8       2019-08-30 15:24:00     1693056 6842    2877966 7819328
9       2019-08-30 15:25:00     2171244 7872    2341481 7819328
10      2019-08-30 15:26:00     2100834 7872    2358798 7819328
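The queries in this section differ only in their last and bucket_size values. Each query returns the last N buckets, so the 300-second query covers 50 minutes, and the 60-second and 1-second queries each cover 10 minutes. The following Python sketch builds the same query text and sends it to zc on standard input, as the echo pipeline does. The helper names are hypothetical, and run_query works only on a node where the zc client is installed at the path used above.

```python
import subprocess

ZC = "/opt/IBM/zimon/zc"  # query client path used in the examples above
METRICS = "mem_memfree,mem_buffers,mem_cached,mem_memtotal"


def memory_query(node, last, bucket_size):
    """Build the query text used in the examples above."""
    return f"get metrics {METRICS} from node={node} last {last} bucket_size {bucket_size}"


def run_query(query, host="localhost"):
    """Equivalent of: echo "<query>" | /opt/IBM/zimon/zc localhost"""
    result = subprocess.run([ZC, host], input=query + "\n",
                            capture_output=True, text=True, check=True)
    return result.stdout


for last, bucket in [(10, 300), (10, 60), (600, 1)]:
    # last * bucket_size is the time span that the query covers.
    print(f"{memory_query('fscc-p8-23-c', last, bucket)}  ({last * bucket // 60} min)")

# On a node with zc installed:
# print(run_query(memory_query("fscc-p8-23-c", 10, 60)))
```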

[root@fscc-p8-23-c ~]# date; echo "get metrics mem_memfree,mem_buffers,mem_cached,mem_memtotal from node=fscc-p8-23-c last 600 bucket_size 1" | /opt/IBM/zimon/zc localhost
Fri 30 Aug 15:27:02 CEST 2019
1:      fscc-p8-23-c|Memory|mem_memfree
2:      fscc-p8-23-c|Memory|mem_buffers
3:      fscc-p8-23-c|Memory|mem_cached
4:      fscc-p8-23-c|Memory|mem_memtotal
Row     Timestamp               mem_memfree     mem_buffers     mem_cached      mem_memtotal
...
365    2019-08-30 15:20:17     600064  0       116864  7819328
366    2019-08-30 15:20:18     557760  0       117568  7819328
367    2019-08-30 15:20:19     550336  0       117888  7819328
368    2019-08-30 15:20:20     533312  0       117312  7819328
369    2019-08-30 15:20:21     508096  0       117632  7819328
370    2019-08-30 15:20:22     450816  0       116736  7819328
371    2019-08-30 15:20:23     414272  0       118976  7819328
372    2019-08-30 15:20:24     400384  0       119680  7819328
373    2019-08-30 15:20:25     372096  0       119680  7819328
374    2019-08-30 15:20:26     344448  0       123392  7819328
375    2019-08-30 15:20:27     289664  0       131840  7819328
376    2019-08-30 15:20:28     254848  0       131840  7819328
377    2019-08-30 15:20:29     252416  0       127680  7819328  --> 4.9%
378    2019-08-30 15:20:30     244224  0       125504  7819328  --> 4.7%
379    2019-08-30 15:20:31     271872  0       105920  7819328
380    2019-08-30 15:20:32     270400  0       70592   7819328  --> 4.4%
381    2019-08-30 15:20:33     229312  0       111360  7819328
382    2019-08-30 15:20:34     null    null    null    null
383    2019-08-30 15:20:35     null    null    null    null
...

527    2019-08-30 15:22:59     3186880 0       1902976 7819328
528    2019-08-30 15:23:00     3045760 0       1957952 7819328
529    2019-08-30 15:23:01     2939456 0       2012224 7819328
530    2019-08-30 15:23:02     2858560 0       2038272 7819328
531    2019-08-30 15:23:03     2835840 0       2038144 7819328
532    2019-08-30 15:23:04     2774656 0       2059968 7819328
533    2019-08-30 15:23:05     2718720 0       2089920 7819328
534    2019-08-30 15:23:06     2681280 0       2095168 7819328
535    2019-08-30 15:23:07     2650688 0       2136896 7819328
536    2019-08-30 15:23:08     2531200 1216    2224832 7819328
537    2019-08-30 15:23:09     2407744 7872    2298880 7819328
538    2019-08-30 15:23:10     2326976 7872    2341440 7819328
539    2019-08-30 15:23:11     2258240 7872    2410624 7819328
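The 1-second samples listed above show the difference between averaging and taking the minimum. The following Python sketch uses only rows 365 to 381 (15:20:17 to 15:20:33), not a full 300-second sensitivity window. It assumes the same available-memory estimate as the percentage annotations: (mem_memfree + mem_buffers + mem_cached) / mem_memtotal.

```python
# Rows 365 to 381 of the 1-second query above (2019-08-30 15:20:17 to 15:20:33):
# (mem_memfree, mem_buffers, mem_cached). mem_memtotal is 7819328 in every row.
MEMTOTAL = 7819328
SAMPLES = [
    (600064, 0, 116864), (557760, 0, 117568), (550336, 0, 117888),
    (533312, 0, 117312), (508096, 0, 117632), (450816, 0, 116736),
    (414272, 0, 118976), (400384, 0, 119680), (372096, 0, 119680),
    (344448, 0, 123392), (289664, 0, 131840), (254848, 0, 131840),
    (252416, 0, 127680), (244224, 0, 125504), (271872, 0, 105920),
    (270400, 0, 70592), (229312, 0, 111360),
]


def percent(memfree, buffers, cached, memtotal=MEMTOTAL):
    """Assumed estimate: (free + buffers + cached) / total, in percent."""
    return 100.0 * (memfree + buffers + cached) / memtotal


values = [percent(*row) for row in SAMPLES]
average = sum(values) / len(values)  # what averaging the samples would report
smallest = min(values)               # what the "-min" sensitivity suffix uses

print(f"average: {average:.2f}%")   # average: 6.42%
print(f"minimum: {smallest:.2f}%")  # minimum: 4.36%
```

Over these 17 samples, the average stays above both the default 5.0 boundary and the 6.0 level reported in the event log. The smallest value is below both.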
The -min suffix in the sensitivity parameter of the rule prevents the metric values from being averaged. Of all the data points that a metrics sensor returns for the specified sensitivity interval, only the smallest value is used in the threshold evaluation. Use the following command to display all parameter settings of the default MemFree_Rule:

[root@fscc-p8-23-c ~]# mmhealth thresholds list -v
The system displays output similar to the following:

###   MemFree_Rule details  ###
attribute         value
------------------------------------------------------------------------------
rule_name         MemFree_Rule
frequency         300
tags              thresholds
user_action_warn  The estimated available memory is less than 5%,
                  calculated to the total RAM or 40 GB, whichever is lower.
                  The system performance and stability might be affected.
                  For more information see the IBM Storage Scale
                  performance tuning guidelines.
user_action_error None
priority          2
downsamplOp       None
type              measurement
metric            MemoryAvailable_percent
metricOp          noOperation
bucket_size       300
computation       None
duration          None
filterBy
groupBy           node
error             None
warn              6.0
direction         low
hysteresis        0.0
sensitivity       300-min
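To reuse these settings in scripts, the listing can be turned into a dictionary. The following Python sketch parses an abbreviated copy of the output above. parse_rule_details and sensitivity_window are hypothetical helpers. The parsing rules are assumptions about the layout, not a documented format:

- The attribute name and the value are separated by whitespace.
- Indented lines continue the previous value.

```python
SAMPLE = """\
###   MemFree_Rule details  ###
attribute         value
------------------------------------------------------------------------------
rule_name         MemFree_Rule
user_action_warn  The estimated available memory is less than 5%,
                  calculated to the total RAM or 40 GB, whichever is lower.
filterBy
groupBy           node
warn              6.0
direction         low
sensitivity       300-min
"""


def parse_rule_details(text):
    """Parse 'attribute value' lines of 'mmhealth thresholds list -v' output.

    Assumptions: attribute names contain no whitespace, indented lines continue
    the previous value, and '###', header, and separator lines are skipped.
    """
    details = {}
    last = None
    for line in text.splitlines():
        if not line.strip() or line.startswith(("###", "---", "attribute")):
            continue
        if line[0].isspace() and last is not None:
            # Continuation line, for example the wrapped user_action_warn text.
            details[last] += " " + line.strip()
            continue
        parts = line.split(None, 1)
        last = parts[0]
        details[last] = parts[1].strip() if len(parts) > 1 else ""
    return details


def sensitivity_window(details):
    """Split a sensitivity such as '300-min' into (seconds, reduction).

    '-min' selects the smallest data point; reading a bare number as
    averaging is an assumption based on the description above.
    """
    seconds, _, suffix = details["sensitivity"].partition("-")
    return int(seconds), suffix or "average"


rule = parse_rule_details(SAMPLE)
print(rule["warn"], rule["direction"], sensitivity_window(rule))
# 6.0 low (300, 'min')
```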