mmhealth command

Monitors health status of nodes.

Synopsis

mmhealth node show [ GPFS | NETWORK [UserDefinedSubComponent]
                   | FILESYSTEM [UserDefinedSubComponent] | DISK | CES | AUTH | AUTH_OBJ
                   | BLOCK | CESNETWORK | NFS | OBJECT | SMB | CLOUDGATEWAY | GUI
                   | PERFMON ] [-N {Node[,Node...] | NodeFile | NodeClass}]
                   [--verbose] [--unhealthy]

or

mmhealth node eventlog [--hour | --day | --week | --month] [--verbose]
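
As an illustration of how the `mmhealth node show` synopsis fits together, the following Python sketch assembles an argument list from its options. The helper name `build_node_show_args` and its parameters are hypothetical and are not part of IBM Spectrum Scale; the component names are copied from the synopsis, and only the Node[,Node...] form of -N is covered.

```python
# Component names copied from the Synopsis; everything else is illustrative.
COMPONENTS = {
    "GPFS", "NETWORK", "FILESYSTEM", "DISK", "CES", "AUTH", "AUTH_OBJ",
    "BLOCK", "CESNETWORK", "NFS", "OBJECT", "SMB", "CLOUDGATEWAY", "GUI",
    "PERFMON",
}


def build_node_show_args(component=None, subcomponent=None, nodes=None,
                         verbose=False, unhealthy=False):
    """Return an argv list for `mmhealth node show` (hypothetical helper)."""
    args = ["mmhealth", "node", "show"]
    if component is not None:
        # Compare case-insensitively; the name is passed through as given.
        if component.upper() not in COMPONENTS:
            raise ValueError(f"unknown component: {component}")
        args.append(component)
    if subcomponent is not None:
        # The Synopsis allows a subcomponent only after NETWORK or FILESYSTEM.
        if component is None or component.upper() not in ("NETWORK", "FILESYSTEM"):
            raise ValueError("subcomponent requires NETWORK or FILESYSTEM")
        args.append(subcomponent)
    if nodes:
        # -N takes a comma-separated node list.
        args += ["-N", ",".join(nodes)]
    if verbose:
        args.append("--verbose")
    if unhealthy:
        args.append("--unhealthy")
    return args
```

For example, `build_node_show_args("FILESYSTEM", "gpfs0", verbose=True)` yields the argument list for `mmhealth node show FILESYSTEM gpfs0 --verbose`.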

Availability

Available with IBM Spectrum Scale™ Express Edition or higher.

Description

Use the mmhealth command to monitor the health of the node and services hosted on the node in IBM Spectrum Scale.

By using this command, the IBM Spectrum Scale administrator can monitor the health of each node and the services hosted on that node. The command also shows the events that are responsible for the unhealthy status of those services. This data can help in monitoring and analyzing the reasons for the unhealthy status of the node. The mmhealth command therefore acts as a problem determination tool that identifies which services of the node are unhealthy and which events are responsible for their unhealthy status.

For more information about the system monitoring feature, see IBM Spectrum Scale: Administration Guide.

Parameters

node
Displays the health status at the node level.
show
Displays the health status of the node or, if one of the following components is specified, of that component:
GPFS™ | NETWORK | FILESYSTEM | DISK | CES | AUTH | AUTH_OBJ | BLOCK | CESNETWORK | NFS | OBJECT | SMB | CLOUDGATEWAY | GUI | PERFMON
Displays the detailed health status of the specified component.
UserDefinedSubComponent
Displays services that are named by the customer, categorized by one of the other hosted services. For example, a file system named gpfs0 is a subcomponent of file system.
-N
Allows the system to make remote calls to the other nodes in the cluster for:
Node[,Node...]
Specifies the node or list of nodes that must be monitored for the health status.
NodeFile
Specifies a file that contains a list of node descriptors, one per line, for the nodes to be monitored for health status.
NodeClass
Specifies a node class that must be monitored for the health status.
--verbose
Shows the detailed health status of a node, including its subcomponents.
--unhealthy
Displays the unhealthy components only.
eventlog
Shows the event history for a specified period of time. If no time period is specified, it displays all the events by default:
[--hour | --day | --week | --month]
Displays the event history for the specified time period.
[--verbose]
Displays additional information about each event in the eventlog, such as the component name and event ID.
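
To illustrate what --verbose adds to each eventlog entry, the following Python sketch splits one eventlog line into fields. It assumes the column layout of the sample output in the Examples section (timestamp, time zone, event name, severity, details, plus component and event ID with --verbose); the actual layout might differ between releases, and the function name `parse_event_line` is hypothetical.

```python
def parse_event_line(line, verbose=False):
    """Split one `mmhealth node eventlog` line into a dict (illustrative only)."""
    line = line.strip()
    if verbose:
        # Assumed layout: date time tz component event_name event_id severity details
        date, clock, tz, component, name, event_id, severity, details = line.split(None, 7)
    else:
        # Assumed layout: date time tz event_name severity details
        date, clock, tz, name, severity, details = line.split(None, 5)
        component = event_id = None
    return {
        "timestamp": f"{date} {clock} {tz}",
        "component": component,
        "event": name,
        "event_id": event_id,
        "severity": severity,
        "details": details,
    }
```

The details column is kept as one string because it can contain spaces.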

Exit status

0
Successful completion.
nonzero
A failure has occurred.
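
A script that calls mmhealth can rely on this exit status convention. The following Python sketch is one possible wrapper, not an IBM-provided interface: the names `run_command` and `main` are hypothetical, and the executable path is assumed from the Location section.

```python
import subprocess
import sys

# Path assumed from the Location section of this page.
MMHEALTH = "/usr/lpp/mmfs/bin/mmhealth"


def run_command(argv):
    """Run argv; return (succeeded, stdout). Only exit status 0 counts as success."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode == 0, result.stdout


def main():
    # Must be run with root authority on a cluster node (see Security).
    ok, output = run_command([MMHEALTH, "node", "show"])
    if not ok:
        sys.exit("mmhealth returned a nonzero exit status")
    print(output)
```

Calling `main()` on a cluster node prints the health report, or exits with an error message when mmhealth returns a nonzero status.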

Security

You must have root authority to run the mmhealth command.

The node on which the command is issued must be able to execute remote shell commands on any other node in the cluster without the use of a password and without producing any extraneous messages. See the information about the requirements for administering a GPFS system in the IBM Spectrum Scale: Administration Guide.

Examples

  1. To show the health status of the current node, issue this command:
    mmhealth node show
    The system displays output similar to this:
    Node name:      test_node
    Node status:    HEALTHY
    Status Change:  39 min. ago
    
    Component          Status        Reasons
    -----------------------------------------
    GPFS               HEALTHY       -
    NETWORK            HEALTHY       -
    FILESYSTEM         HEALTHY       -
    DISK               HEALTHY       -
    CES                HEALTHY       -
    PERFMON            HEALTHY       -
  2. To view the health status of a specific node, issue this command:
    mmhealth node show -N test_node2
    The system displays output similar to this:
    Node name:      test_node2
    Node status:    CHECKING
    Status Change:  Now
    
    Component       Status        Status Change    Reasons
    -------------------------------------------------------------------
    GPFS            CHECKING      Now              -
    NETWORK         HEALTHY       Now              -
    FILESYSTEM      CHECKING      Now              -
    DISK            CHECKING      Now              -
    CES             CHECKING      Now              -
    PERFMON         HEALTHY       Now              -
  3. To view the health status of all the nodes, issue this command:
    mmhealth node show -N all
    The system displays output similar to this:
    Node name:    test_node
    Node status:  DEGRADED
    
    Component           Status        Status Change    Reasons
    ------------------------------------------------------------
    GPFS                HEALTHY       Now              -
    CES                 FAILED        Now              smbd_down
    FileSystem          HEALTHY       Now              -
    
    Node name:            test_node2
    Node status:          HEALTHY
    
    Component           Status        Status Change    Reasons
    ------------------------------------------------------------
    GPFS                HEALTHY       Now              -
    CES                 HEALTHY       Now              -
    FileSystem          HEALTHY       Now              -
  4. To view the detailed health status of a component and its subcomponents, issue this command:
    mmhealth node show ces
    The system displays output similar to this:
    Node name:    test_node
    
    Component           Status        Reasons
    -----------------------------------------------
    CES                 FAILED        smbd_down
        AUTH               HEALTHY       -
        AUTH_OBJ           HEALTHY       -
        NFS                HEALTHY       -
        OBJ                HEALTHY       -
        SMB                FAILED        smbd_down
    
    Event              Parameter       Severity       Description
    -----------------------------------------------------------------
    smbd_down          SMB             ERROR          SMBD process not running
  5. To view the health status of only unhealthy components, issue this command:
    mmhealth node show --unhealthy
    The system displays output similar to this:
    Node name:    test_node
    Node status:  DEGRADED
    
    Component           Status        Reasons
    -----------------------------------------------
    CES                 FAILED        smbd_down
  6. To view the detailed health status of a node, including the subcomponents of each component, issue this command:
    mmhealth node show --verbose
    The system displays output similar to this:
    Node name:    test_node
    Node status:  DEGRADED
    
    Component           Status        Reasons
    -----------------------------------------------
    GPFS                HEALTHY       -
    CES                 FAILED        smbd_down
        AUTH               HEALTHY       -
        AUTH_OBJ           HEALTHY       -
        NFS                HEALTHY       -
        OBJ                HEALTHY       -
        SMB                FAILED        smbd_down
    
    FILESYSTEM          HEALTHY       -
        gpfs0              HEALTHY       -
        FSII               HEALTHY       -
  7. To view the eventlog history of the node for the last hour, issue this command:
    mmhealth node eventlog --hour
    The system displays output similar to this:
    Timestamp                             Event Name                Severity   Details
    2016-04-07 10:12:24.394569 CEST       quorum_warn               WARNING    GPFS quorum monitoring returned unknown result
    2016-04-07 10:12:39.366279 CEST       quorum_warn               WARNING    GPFS quorum monitoring returned unknown result
    2016-04-07 10:12:54.356577 CEST       quorum_warn               WARNING    GPFS quorum monitoring returned unknown result
  8. To view the detailed eventlog history of the node for the last hour, including the component name and event ID, issue this command:
    mmhealth node eventlog --hour --verbose
    The system displays output similar to this:
    Timestamp                             Component     Event Name                Event ID Severity   Details
    2016-04-07 10:12:54.356577 CEST        gpfs          quorum_warn               999291   WARNING   GPFS quorum monitoring returned unknown result
    2016-04-07 10:13:09.359602 CEST        gpfs          quorum_warn               999291   WARNING   GPFS quorum monitoring returned unknown result
    2016-04-07 10:13:24.425680 CEST        gpfs          quorum_warn               999291   WARNING   GPFS quorum monitoring returned unknown result
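
The `mmhealth node show` output in examples 1 and 5 can be post-processed by a script. The following Python sketch is a minimal parser that assumes the layout of that sample output for a single node (header lines, a dashed rule, then one row per component); the real output might differ, and the function names are hypothetical. It reads only the first table, so the event table of example 4 and the multi-node output of example 3 are not handled.

```python
# Sample text copied from example 5 above.
SAMPLE = """\
Node name:    test_node
Node status:  DEGRADED

Component           Status        Reasons
-----------------------------------------------
CES                 FAILED        smbd_down
"""


def parse_node_show(text):
    """Parse single-node `mmhealth node show` output into a dict (illustrative only)."""
    result = {"node": None, "status": None, "components": {}}
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Node name:"):
            result["node"] = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("Node status:"):
            result["status"] = stripped.split(":", 1)[1].strip()
        elif stripped and set(stripped) == {"-"}:
            in_table = True  # the dashed rule marks the start of the rows
        elif in_table:
            if not stripped:
                break  # a blank line ends the first table
            fields = stripped.split()
            if len(fields) >= 2:
                # First column is the component, second is its status.
                result["components"][fields[0]] = fields[1]
    return result


def unhealthy_components(result):
    """Return the names of components whose status is not HEALTHY."""
    return [name for name, status in result["components"].items()
            if status != "HEALTHY"]
```

With the sample text, `unhealthy_components(parse_node_show(SAMPLE))` returns `["CES"]`.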

Location

/usr/lpp/mmfs/bin