Use case 2: Observe the file system capacity usage by using default threshold rules

This use case demonstrates how to use the mmhealth thresholds list command to monitor a file system capacity event by using the default threshold rules.

Because the file system capacity-related thresholds, such as DataCapUtil_Rule, MetaDataCapUtil_Rule, and InodeCapUtil_Rule, are not node-specific, they are reported on the node that has the active threshold monitor role.

  1. Issue the following command to view the node that has the active threshold monitor role and to verify that the predefined threshold rules DataCapUtil_Rule, MetaDataCapUtil_Rule, and InodeCapUtil_Rule are enabled in the cluster.
    mmhealth thresholds list
    The command shows output similar to the following:
    active_thresholds_monitor: scale-12.vmlocal
    
    ### Threshold Rules ###
    rule_name             metric                    error  warn  direction  filterBy  groupBy                                            sensitivity
    ----------------------------------------------------------------------------------------------------------------------------------------------------
    MemFree_Rule          MemoryAvailable_percent   None   5.0   low                  node                                               300-min
    DataCapUtil_Rule      DataPool_capUtil          90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
    MetaDataCapUtil_Rule  MetaDataPool_capUtil      90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
    InodeCapUtil_Rule     Fileset_inode             90.0   80.0  high                 gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name      300
    SMBConnPerNode_Rule   current_connections       3000   None  high                 node                                               300
    SMBConnTotal_Rule     current_connections       20000  None  high                                                                    300
    AFMInQueue_Rule       AFMInQueueMemory_percent  90.0   80.0  high                 node                                               300
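    If you are interested only in the capacity-related rules and the active threshold monitor node, you can filter the same output with standard shell tools. The following is a minimal sketch that reuses only the mmhealth thresholds list command shown above; the grep patterns are illustrative and assume the default rule names.
    # Show which node currently holds the active threshold monitor role
    mmhealth thresholds list | grep active_thresholds_monitor
    # Show only the default capacity-related rules
    mmhealth thresholds list | grep CapUtil_Rule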
  2. Use ssh to switch to the node that has the active threshold monitor role:
    [root@scale-12 ~]# ssh scale-12.vmlocal
  3. Issue the following command to review file system events:
    [root@scale-12 ~]# mmhealth node show filesystem -v
    The command gives output similar to the following:
    Node name:      scale-11.vmlocal
    
    Component         Status        Status Change            Reasons & Notices
    --------------------------------------------------------------------------
    FILESYSTEM        HEALTHY       2022-12-07 21:12:03      -
      cesSharedRoot   HEALTHY       2022-12-07 10:38:55      -
      localFS         DEGRADED      2022-12-07 21:12:03      pool-metadata_high_warn
      remote-fs       HEALTHY       2022-12-15 14:23:24      -
    Event                       Parameter         Severity    Active Since             Event Message
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    ...
    
    inode_normal                cesSharedRoot     INFO        2022-12-07 10:39:25      The inode usage of fileset root in file system cesSharedRoot reached a normal level.
    inode_normal                localFS           INFO        2022-12-07 21:34:03      The inode usage of fileset myFset1 in file system localFS reached a normal level.
    inode_normal                localFS           INFO        2022-12-07 21:34:03      The inode usage of fileset root in file system localFS reached a normal level.
    ...
    
    pool-data_normal            cesSharedRoot     INFO        2022-12-07 10:38:55      The pool data of file system cesSharedRoot has reached a normal data level.
    pool-data_normal            cesSharedRoot     INFO        2022-12-07 10:38:55      The pool system of file system cesSharedRoot has reached a normal data level.
    pool-data_normal            localFS           INFO        2022-12-07 21:34:03      The pool system of file system localFS has reached a normal data level.
    pool-metadata_normal        cesSharedRoot     INFO        2022-12-07 10:38:55      The pool data of file system cesSharedRoot has reached a normal metadata level.
    pool-metadata_normal        cesSharedRoot     INFO        2022-12-07 10:38:55      The pool system of file system cesSharedRoot has reached a normal metadata level.
    pool-metadata_high_warn     localFS           WARNING     2022-12-07 21:34:03      The pool system of file system localFS has reached a warning level for metadata. 80.0
    As you can see in the preceding output, everything looks healthy except for the "pool-metadata_high_warn" event.
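    To surface only the problem entries in a long verbose listing, you can filter the output of the same command. This is a convenience sketch that uses standard grep to match the severity strings shown in the output above; it is not a dedicated mmhealth option.
    # Show only degraded components and warning or error events
    mmhealth node show filesystem -v | grep -Ei 'degraded|warning|error'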
  4. Issue the following command to get the "pool-metadata_high_warn" warning details:
    [root@scale-12 ~]# mmhealth event show pool-metadata_high_warn
    The command shows warning details similar to the following:
    
    Event Name:              pool-metadata_high_warn
    Description:             The pool has reached a warning level.
    Cause:                   The pool has reached a warning level.
    User Action:             Add more capacity to pool or move data to different pool or delete data and/or snapshots.
    Severity:                WARNING
    State:                   DEGRADED
    Tip: See File system events for a complete list of all possible file system events.
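    Before you act on the suggested user action (adding capacity, moving data, or deleting data or snapshots), you might first review the metadata usage and the existing snapshots of the affected file system. The following sketch uses the standard mmdf and mmlssnapshot commands with the localFS file system from this example; adjust the file system name to your environment.
    # Review free and used capacity, including metadata, per disk and per pool
    mmdf localFS
    # List existing snapshots, which can hold data and metadata capacity
    mmlssnapshot localFS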
  5. Compare the metadata capacity values reported by the MetaDataCapUtil_Rule for the system pool of the localFS file system with the mmlspool command output.
    [root@scale-11 ~]# mmlspool localFS
    The command shows the storage pools in the file system at '/gpfs/localFS', similar to the following:
    
    Name      Id   BlkSize Data Meta Total Data in (KB)   Free Data in (KB)   Total Meta in (KB)    Free Meta in (KB)
    system    0    4 MB    yes  yes       16777216        13320192 ( 79%)       16777216            2515582 ( 15%)

    In the preceding output, you can see that the system pool has only 15% of its metadata capacity free.
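    To relate these numbers to the MetaDataCapUtil_Rule thresholds (warn 80.0, error 90.0), you can approximate the metadata utilization from the mmlspool values. The following sketch assumes that the utilization is approximately (total - free) / total; the performance monitor computes the actual MetaDataPool_capUtil metric.
    # Approximate the metadata utilization of the system pool from the values shown above
    awk 'BEGIN { total = 16777216; free = 2515582; printf("metadata utilization ~ %.1f%%\n", (total - free) / total * 100) }'
    # Prints about 85%, which is above the 80.0 warning threshold but below the 90.0 error
    # threshold, and therefore matches the pool-metadata_high_warn WARNING event.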