Use case 6: Observe the memory usage with MemFree_Rule
This section describes the threshold use case to observe the memory usage with MemFree_Rule.
The default MemFree_Rule observes the estimated available memory in relation to the total memory
allocation on each cluster node. Run the following command to display all the active threshold
rules:
[root@fscc-p8-23-c ~]# mmhealth thresholds list
The system displays output similar to the following:
active_thresholds_monitor: fscc-p8-23-c.mainz.de.ibm.com
### Threshold Rules ###
rule_name                metric                   error  warn  direction  filterBy  groupBy                                            sensitivity
------------------------------------------------------------------------------------------------------------------------------------------------------
InodeCapUtil_Rule        Fileset_inode            90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name      300m
DataCapUtil_Rule         DataPool_capUtil         90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
MemFree_Rule             MemoryAvailable_percent  None   5.0   low                  node                                               300-min
diskIOreadresponseTime   DiskIoLatency_read       250    100   None                 node, diskdev_name                                 300
SMBConnPerNode_Rule      connect_count            3000   None  high                 node                                               300
diskIOwriteresponseTime  DiskIoLatency_write      250    100   None                 node, diskdev_name                                 300
SMBConnTotal_Rule        connect_count            20000  None  high                                                                    300
MetaDataCapUtil_Rule     MetaDataPool_capUtil     90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
The MemFree_Rule raises a warning if the smallest value within the sensitivity period is lower
than the threshold warn boundary. The threshold warn boundary is set to 5% by default. You can
review all threshold events in the output of the mmhealth node show threshold command:
[root@fscc-p8-23-c ~]# mmhealth node show threshold
The system displays output similar to the following:
Node name: fscc-p8-23-c.mainz.de.ibm.com
Component                  Status    Status Change  Reasons
----------------------------------------------------------------------------------------------
THRESHOLD                  DEGRADED  5 min. ago     thresholds_warn(MemFree_Rule)
  MemFree_Rule             DEGRADED  Now            thresholds_warn(MemFree_Rule)
  active_thresh_monitor    HEALTHY   10 min. ago    -
  diskIOreadresponseTime   HEALTHY   10 min. ago    -
  diskIOwriteresponseTime  HEALTHY   Now            -

Event            Parameter     Severity  Active Since  Event Message
-------------------------------------------------------------------------------------------------------------------
thresholds_warn  MemFree_Rule  WARNING   Now           The value of MemoryAvailable_percent
                                                       for the component(s) MemFree_Rule/fscc-p8-23-c
                                                       exceeded threshold warning level 5.0
                                                       defined in MemFree_Rule.
All threshold events that were raised so far can also be reviewed by running the following
command:
[root@fscc-p8-23-c ~]# mmhealth node eventlog | grep thresholds
The system displays output similar to the following:
...
2019-08-30 12:40:49.102217 CEST thresh_monitor_set_active INFO The thresholds monitoring process is
running in ACTIVE state on the local node
2019-08-30 12:41:04.092083 CEST thresholds_new_rule INFO Rule diskIOreadresponseTime was added
2019-08-30 12:41:04.127695 CEST thresholds_new_rule INFO Rule SMBConnTotal_Rule was added
2019-08-30 12:41:04.147223 CEST thresholds_new_rule INFO Rule diskIOwriteresponseTime was added
2019-08-30 12:41:19.117875 CEST thresholds_new_rule INFO Rule MemFree_Rule was added
2019-08-30 13:16:04.804887 CEST thresholds_normal INFO The value of DiskIoLatency_read defined in
diskIOreadresponseTime for component
diskIOreadresponseTime/fscc-p8-23-c/sda1
reached a normal level.
2019-08-30 13:16:04.831206 CEST thresholds_normal INFO The value of DiskIoLatency_read defined in
diskIOreadresponseTime for component
diskIOreadresponseTime/fscc-p8-23-c/sda2
reached a normal level.
2019-08-30 13:21:05.203115 CEST thresholds_normal INFO The value of DiskIoLatency_read defined in
diskIOreadresponseTime for component
diskIOreadresponseTime/fscc-p8-23-c/sdc
reached a normal level.
2019-08-30 13:21:05.227137 CEST thresholds_normal INFO The value of DiskIoLatency_read defined in
diskIOreadresponseTime for component
diskIOreadresponseTime/fscc-p8-23-c/sdd
reached a normal level.
2019-08-30 13:21:05.242787 CEST thresholds_normal INFO The value of DiskIoLatency_read defined in
diskIOreadresponseTime for component
diskIOreadresponseTime/fscc-p8-23-c/sde
reached a normal level.
2019-08-30 13:41:06.809589 CEST thresholds_removed INFO The value of DiskIoLatency_read for the component(s)
diskIOreadresponseTime/fscc-p8-23-c/sda1
defined in diskIOreadresponseTime was removed.
2019-08-30 13:41:06.902566 CEST thresholds_removed INFO The value of DiskIoLatency_read for the component(s)
diskIOreadresponseTime/fscc-p8-23-c/sda2
defined in diskIOreadresponseTime was removed.
2019-08-30 15:24:43.224013 CEST thresholds_warn WARNING The value of MemoryAvailable_percent for the component(s)
MemFree_Rule/fscc-p8-23-c exceeded threshold warning level
6.0 defined in MemFree_Rule.
2019-08-30 15:24:58.243273 CEST thresholds_normal INFO The value of DiskIoLatency_write defined in
diskIOwriteresponseTime for component
diskIOwriteresponseTime/fscc-p8-23-c/sda3
reached a normal level.
2019-08-30 15:24:58.289469 CEST thresholds_normal INFO The value of DiskIoLatency_write defined in
diskIOwriteresponseTime for component
diskIOwriteresponseTime/fscc-p8-23-c/sda
reached a normal level.
2019-08-30 15:29:43.648830 CEST thresholds_normal INFO The value of MemoryAvailable_percent defined
in MemFree_Rule for component MemFree_Rule/fscc-p8-23-c
reached a normal level.
...
You can view the mmsysmonitor log located in the /var/adm/ras directory for more specific details
about the events that are raised.
[root@fscc-p8-23-c ~]# cat /var/adm/ras/mmsysmonitor.fscc-p8-23-c.log |grep MemoryAvailable_percent
2019-08-30_14:39:06.681+0200: [I] ET_threshold Event=thresholds_normal identifier=MemFree_Rule/fscc-p8-23-c
arg0= arg1=MemoryAvailable_percent arg2=MemFree_Rule
2019-08-30_15:24:43.252+0200: [I] ET_threshold Event=thresholds_warn identifier=MemFree_Rule/fscc-p8-23-c
arg0=6.0 arg1=MemoryAvailable_percent arg2=MemFree_Rule
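The ET_threshold entries shown above consist of a timestamp followed by key=value fields, so they can be filtered with a short script. The following Python sketch is illustrative only: the sample lines are the two entries above with their wrapped lines joined, the helper name parse_event is hypothetical, and reading arg0 as the warning level is an assumption based on the matching event message.

```python
import re

# The two log entries shown above, with the wrapped lines joined.
LOG_LINES = [
    "2019-08-30_14:39:06.681+0200: [I] ET_threshold Event=thresholds_normal "
    "identifier=MemFree_Rule/fscc-p8-23-c arg0= arg1=MemoryAvailable_percent arg2=MemFree_Rule",
    "2019-08-30_15:24:43.252+0200: [I] ET_threshold Event=thresholds_warn "
    "identifier=MemFree_Rule/fscc-p8-23-c arg0=6.0 arg1=MemoryAvailable_percent arg2=MemFree_Rule",
]

# key=value pairs; the value can be empty, as in "arg0=".
PAIR = re.compile(r"(\w+)=(\S*)")


def parse_event(line):
    """Return the timestamp and the key=value fields of one ET_threshold entry."""
    fields = dict(PAIR.findall(line))
    fields["timestamp"] = line.split(": ", 1)[0]
    return fields


events = [parse_event(line) for line in LOG_LINES]
warnings = [e for e in events if e["Event"] == "thresholds_warn"]

for e in warnings:
    # Assumption: arg0 carries the warn level that was crossed.
    print(e["timestamp"], e["identifier"], "warn level:", e["arg0"])
```

To apply the same filter to a real log, replace LOG_LINES with the lines read from the mmsysmonitor log file, after joining any wrapped lines.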
If the mmsysmonitor log level is set to DEBUG and the buffering of the debug messages is turned
off, the log file includes all the messages about the threshold rule evaluation process.
[root@fscc-p8-23-c ~]# cat /var/adm/ras/mmsysmonitor.fscc-p8-23-c.log |grep MemoryAvailable_percent
2019-08-30_15:24:42.656+0200: [D] Thread-2096 doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15,
keys=(fscc-p8-23-b|Memory|mem_memtotal, fscc-p8-23-b|Memory|mem_memfree, fscc-p8-23-b|Memory|mem_buffers,
fscc-p8-23-b|Memory|mem_cached, fscc-p8-23-b|Memory|mem_memtotal, fscc-p8-23-b|Memory|mem_memtotal,
fscc-p8-23-b|Memory|mem_memfree, fscc-p8-23-b|Memory|mem_buffers, fscc-p8-23-b|Memory|mem_cached),
column=0) value 29.5395588982 ls=0 cs=0 - ThresholdStateFilter.doFilter:164
2019-08-30_15:24:42.658+0200: [D] Thread-2096 doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15,
keys=(fscc-p8-23-c|Memory|mem_memtotal, fscc-p8-23-c|Memory|mem_memfree, fscc-p8-23-c|Memory|mem_buffers,
fscc-p8-23-c|Memory|mem_cached, fscc-p8-23-c|Memory|mem_memtotal, fscc-p8-23-c|Memory|mem_memtotal,
fscc-p8-23-c|Memory|mem_memfree, fscc-p8-23-c|Memory|mem_buffers, fscc-p8-23-c|Memory|mem_cached),
column=1) value 4.36088625518 ls=0 cs=0 - ThresholdStateFilter.doFilter:164
2019-08-30_15:24:42.660+0200: [D] Thread-2096 doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15,
keys=(fscc-p8-23-d|Memory|mem_memtotal, fscc-p8-23-d|Memory|mem_memfree, fscc-p8-23-d|Memory|mem_buffers,
fscc-p8-23-d|Memory|mem_cached, fscc-p8-23-d|Memory|mem_memtotal, fscc-p8-23-d|Memory|mem_memtotal,
fscc-p8-23-d|Memory|mem_memfree, fscc-p8-23-d|Memory|mem_buffers, fscc-p8-23-d|Memory|mem_cached),
column=2) value 29.6084011311 ls=0 cs=0 - ThresholdStateFilter.doFilter:164
2019-08-30_15:24:42.661+0200: [D] Thread-2096 doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15,
keys=(fscc-p8-23-e|Memory|mem_memtotal, fscc-p8-23-e|Memory|mem_memfree, fscc-p8-23-e|Memory|mem_buffers,
fscc-p8-23-e|Memory|mem_cached, fscc-p8-23-e|Memory|mem_memtotal, fscc-p8-23-e|Memory|mem_memtotal,
fscc-p8-23-e|Memory|mem_memfree, fscc-p8-23-e|Memory|mem_buffers, fscc-p8-23-e|Memory|mem_cached),
column=3) value 10.5771109742 ls=0 cs=0 - ThresholdStateFilter.doFilter:164
2019-08-30_15:24:42.663+0200: [D] Thread-2096 doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15,
keys=(fscc-p8-23-f|Memory|mem_memtotal, fscc-p8-23-f|Memory|mem_memfree, fscc-p8-23-f|Memory|mem_buffers,
fscc-p8-23-f|Memory|mem_cached, fscc-p8-23-f|Memory|mem_memtotal, fscc-p8-23-f|Memory|mem_memtotal,
fscc-p8-23-f|Memory|mem_memfree, fscc-p8-23-f|Memory|mem_buffers, fscc-p8-23-f|Memory|mem_cached),
column=4) value 11.7536187253 ls=0 cs=0 - ThresholdStateFilter.doFilter:164
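Each debug entry above reports the evaluated MemoryAvailable_percent value for one node. As a rough illustration, the following Python sketch extracts the node name and value from each entry and lists the nodes that are below the warn boundary. The sample lines are abbreviated copies of the entries above (the key list is cut after the first key, "..." marks the omitted part, and values are written with a decimal point). The helper name node_values and the constant WARN_LEVEL are hypothetical.

```python
import re

WARN_LEVEL = 5.0  # default warn boundary of MemFree_Rule, in percent

# Abbreviated copies of the debug entries; "..." marks the shortened key list.
DEBUG_LINES = [
    "doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15, "
    "keys=(fscc-p8-23-b|Memory|mem_memtotal, ...), column=0) value 29.5395588982 ls=0 cs=0",
    "doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15, "
    "keys=(fscc-p8-23-c|Memory|mem_memtotal, ...), column=1) value 4.36088625518 ls=0 cs=0",
    "doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15, "
    "keys=(fscc-p8-23-d|Memory|mem_memtotal, ...), column=2) value 29.6084011311 ls=0 cs=0",
    "doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15, "
    "keys=(fscc-p8-23-e|Memory|mem_memtotal, ...), column=3) value 10.5771109742 ls=0 cs=0",
    "doFilter _ColumnInfo(name='MemoryAvailable_percent', semType=15, "
    "keys=(fscc-p8-23-f|Memory|mem_memtotal, ...), column=4) value 11.7536187253 ls=0 cs=0",
]

# The node name is the first part of the first key; the value follows ") value ".
ENTRY = re.compile(r"keys=\(([^|]+)\|.*\) value ([0-9.]+)")


def node_values(lines):
    """Map each node name to the value reported in its debug entry."""
    result = {}
    for line in lines:
        match = ENTRY.search(line)
        if match:
            result[match.group(1)] = float(match.group(2))
    return result


values = node_values(DEBUG_LINES)
below_warn = sorted(node for node, value in values.items() if value < WARN_LEVEL)
print(below_warn)
```

With these sample values, only fscc-p8-23-c is below the 5% boundary, which is consistent with the thresholds_warn event raised for that node.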
The estimate of the available memory on a node is based on the free memory, buffer, and cached
memory values. The performance monitoring tool returns these values, which are derived from the
/proc/meminfo file. The following queries show how the available memory percentage value depends
on the sample interval. The larger the bucket_size, the more the metric values are smoothed.
[root@fscc-p8-23-c ~]# date; echo "get metrics mem_memfree,mem_buffers,mem_cached,
mem_memtotal from node=fscc-p8-23-c last 10 bucket_size 300 " | /opt/IBM/zimon/zc localhost
Fri 30 Aug 15:26:31 CEST 2019
1: fscc-p8-23-c|Memory|mem_memfree
2: fscc-p8-23-c|Memory|mem_buffers
3: fscc-p8-23-c|Memory|mem_cached
4: fscc-p8-23-c|Memory|mem_memtotal
Row Timestamp mem_memfree mem_buffers mem_cached mem_memtotal
1 2019-08-30 14:40:00 4582652 0 225716 7819328
2 2019-08-30 14:45:00 4003383 0 307715 7819328
3 2019-08-30 14:50:00 3341518 0 344366 7819328
4 2019-08-30 14:55:00 3145560 0 359768 7819328
5 2019-08-30 15:00:00 1968256 0 378541 7819328
6 2019-08-30 15:05:00 710601 0 290097 7819328
7 2019-08-30 15:10:00 418108 0 157229 7819328
8 2019-08-30 15:15:00 365706 0 129333 7819328
9 2019-08-30 15:20:00 293367 0 146004 7819328 --> 5,6%
10 2019-08-30 15:25:00 1637727 3976 1665790 7819328
[root@fscc-p8-23-c ~]# date; echo "get metrics mem_memfree,mem_buffers,mem_cached,
mem_memtotal from node=fscc-p8-23-c last 10 bucket_size 60 " | /opt/IBM/zimon/zc localhost
Fri 30 Aug 15:26:07 CEST 2019
1: fscc-p8-23-c|Memory|mem_memfree
2: fscc-p8-23-c|Memory|mem_buffers
3: fscc-p8-23-c|Memory|mem_cached
4: fscc-p8-23-c|Memory|mem_memtotal
Row Timestamp mem_memfree mem_buffers mem_cached mem_memtotal
1 2019-08-30 15:17:00 310581 0 161529 7819328
2 2019-08-30 15:18:00 283733 0 116588 7819328
3 2019-08-30 15:19:00 265449 0 112012 7819328
4 2019-08-30 15:20:00 263733 0 102635 7819328 --> 4,7%
5 2019-08-30 15:21:00 251716 0 71268 7819328
6 2019-08-30 15:22:00 3222924 0 61258 7819328
7 2019-08-30 15:23:00 2576786 0 1164106 7819328
8 2019-08-30 15:24:00 1693056 6842 2877966 7819328
9 2019-08-30 15:25:00 2171244 7872 2341481 7819328
10 2019-08-30 15:26:00 2100834 7872 2358798 7819328
.
.
[root@fscc-p8-23-c ~]# date; echo "get metrics mem_memfree,mem_buffers,mem_cached,
mem_memtotal from node=fscc-p8-23-c last 600 bucket_size 1 " | /opt/IBM/zimon/zc localhost
Fri 30 Aug 15:27:02 CEST 2019
1: fscc-p8-23-c|Memory|mem_memfree
2: fscc-p8-23-c|Memory|mem_buffers
3: fscc-p8-23-c|Memory|mem_cached
4: fscc-p8-23-c|Memory|mem_memtotal
Row Timestamp mem_memfree mem_buffers mem_cached mem_memtotal
...
365 2019-08-30 15:20:17 600064 0 116864 7819328
366 2019-08-30 15:20:18 557760 0 117568 7819328
367 2019-08-30 15:20:19 550336 0 117888 7819328
368 2019-08-30 15:20:20 533312 0 117312 7819328
369 2019-08-30 15:20:21 508096 0 117632 7819328
370 2019-08-30 15:20:22 450816 0 116736 7819328
371 2019-08-30 15:20:23 414272 0 118976 7819328
372 2019-08-30 15:20:24 400384 0 119680 7819328
373 2019-08-30 15:20:25 372096 0 119680 7819328
374 2019-08-30 15:20:26 344448 0 123392 7819328
375 2019-08-30 15:20:27 289664 0 131840 7819328
376 2019-08-30 15:20:28 254848 0 131840 7819328
377 2019-08-30 15:20:29 252416 0 127680 7819328 --> 4,7%
378 2019-08-30 15:20:30 244224 0 125504 7819328 --> 4,8%
379 2019-08-30 15:20:31 271872 0 105920 7819328
380 2019-08-30 15:20:32 270400 0 70592 7819328 ---> 4,3%
381 2019-08-30 15:20:33 229312 0 111360 7819328
382 2019-08-30 15:20:34 null null null null
383 2019-08-30 15:20:35 null null null null
...
527 2019-08-30 15:22:59 3186880 0 1902976 7819328
528 2019-08-30 15:23:00 3045760 0 1957952 7819328
529 2019-08-30 15:23:01 2939456 0 2012224 7819328
530 2019-08-30 15:23:02 2858560 0 2038272 7819328
531 2019-08-30 15:23:03 2835840 0 2038144 7819328
532 2019-08-30 15:23:04 2774656 0 2059968 7819328
533 2019-08-30 15:23:05 2718720 0 2089920 7819328
534 2019-08-30 15:23:06 2681280 0 2095168 7819328
535 2019-08-30 15:23:07 2650688 0 2136896 7819328
536 2019-08-30 15:23:08 2531200 1216 2224832 7819328
537 2019-08-30 15:23:09 2407744 7872 2298880 7819328
538 2019-08-30 15:23:10 2326976 7872 2341440 7819328
539 2019-08-30 15:23:11 2258240 7872 2410624 7819328
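The percentages annotated in the query output above can be reproduced from the raw counters. The following Python sketch assumes that the estimate is the sum of mem_memfree, mem_buffers, and mem_cached divided by mem_memtotal, which is consistent with the keys listed in the debug log. The function name available_percent is hypothetical, and the 40 GB cap that the rule description mentions is not modeled because this node has less total memory.

```python
def available_percent(memfree, buffers, cached, memtotal):
    """Estimated available memory as a percentage of the total memory."""
    return 100.0 * (memfree + buffers + cached) / memtotal


# (label, mem_memfree, mem_buffers, mem_cached, mem_memtotal), copied from the
# annotated rows of the three queries above.
SAMPLES = [
    ("bucket_size 300, 15:20:00", 293367, 0, 146004, 7819328),
    ("bucket_size 60,  15:20:00", 263733, 0, 102635, 7819328),
    ("bucket_size 1,   15:20:32", 270400, 0, 70592, 7819328),
]

for label, *counters in SAMPLES:
    print(f"{label}: {available_percent(*counters):.2f}%")
```

The three rows evaluate to about 5.62%, 4.69%, and 4.36%. The last value agrees with the value 4.36088625518 that the debug log reports for fscc-p8-23-c, which supports the assumed formula for this node.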
The suffix -min in the rule sensitivity parameter prevents the averaging of the metric values. Of
all the data points returned by a metrics sensor for a specified sensitivity interval, the smallest
value is used in the threshold evaluation process. Use the following command to get all parameter
settings of the default MemFree_Rule:
[root@fscc-p8-23-c ~]# mmhealth thresholds list -v
The system displays output similar to the following:
### MemFree_Rule details ###
attribute          value
------------------------------------------------------------------------------
rule_name          MemFree_Rule
frequency          300
tags               thresholds
user_action_warn   The estimated available memory is less than 5%,
                   calculated to the total RAM or 40 GB, whichever is lower.
                   The system performance and stability might be affected.
                   For more information see the IBM Storage Scale
                   performance tuning guidelines.
user_action_error  None
priority           2
downsamplOp        None
type               measurement
metric             MemoryAvailable_percent
metricOp           noOperation
bucket_size        300
computation        None
duration           None
filterBy
groupBy            node
error              None
warn               6.0
direction          low
hysteresis         0.0
sensitivity        300-min
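The effect of the -min suffix can be illustrated with the 60-second samples from 15:20 to 15:24 in the bucket_size 60 query earlier in this section. The following Python sketch is not the monitor's implementation. It assumes that the available percentage is (mem_memfree + mem_buffers + mem_cached) / mem_memtotal and that a rule with direction low triggers when the aggregated value falls below the warn level. The names WINDOW, WARN_LEVEL, and breaches_warn are hypothetical, and the exact window that the monitor evaluated is not known from the output.

```python
from statistics import mean

WARN_LEVEL = 5.0  # warn boundary in percent; adjust to the value set in the rule

# (mem_memfree, mem_buffers, mem_cached, mem_memtotal) for 15:20 to 15:24,
# copied from the bucket_size 60 query output.
WINDOW = [
    (263733, 0, 102635, 7819328),
    (251716, 0, 71268, 7819328),
    (3222924, 0, 61258, 7819328),
    (2576786, 0, 1164106, 7819328),
    (1693056, 6842, 2877966, 7819328),
]


def available_percent(memfree, buffers, cached, memtotal):
    """Estimated available memory as a percentage of the total memory."""
    return 100.0 * (memfree + buffers + cached) / memtotal


def breaches_warn(percentages, aggregate, warn_level=WARN_LEVEL):
    """For direction low, the rule triggers when the aggregate is below warn."""
    return aggregate(percentages) < warn_level


percentages = [available_percent(*row) for row in WINDOW]
print("min :", round(min(percentages), 2), breaches_warn(percentages, min))
print("mean:", round(mean(percentages), 2), breaches_warn(percentages, mean))
```

In this window the smallest value is roughly 4.1%, while the average is above 30%. An averaged evaluation would therefore miss the short memory shortage that the smallest-value evaluation reports.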