mmdiag command

Displays diagnostic information about the internal GPFS state on the current node.

Synopsis

mmdiag [--afm [fileset={all|device[:filesetName]}|gw] [-Y]]
        [--all [-Y]] [--version [-Y]] [--waiters [-Y]] [--deadlock [-Y]] [--threads [-Y]]
        [--lroc [-Y]] [--memory [-Y]] [--network [-Y]] [--config [-Y]] [--trace [-Y]] 
        [--iohist [verbose] [-Y]] [--tokenmgr [-Y]] [--commands [-Y]] 
        [--dmapi [session|event|token|disposition|all]]
        [--rpc [node[=name]|size|message|all|nn{S|s|M|m|H|h|D|d}] [-Y]]
        [--stats [-Y]] [--nsd [all] [-Y]] [--nsdDiskAccessConfig [-Y]] [--eventproducer [-Y]]
        [--gds [-Y]] [--pagepool [-Y]] [--verbs [-Y]]

Availability

Available on all IBM Storage Scale editions.

Description

Use the mmdiag command to query various aspects of the GPFS internal state for troubleshooting and tuning purposes. The mmdiag command displays information about the state of GPFS on the node where it is executed. The command obtains the required information by querying the GPFS daemon process (mmfsd), and thus functions only when the GPFS daemon is running.

Results

The mmdiag command displays the requested information and returns 0 if successful.

Parameters

--afm
Displays status and statistics of linked AFM and AFM DR filesets that are assigned to the gateway node. Accepts the following options:
Note: If you do not specify any option, status and statistics of all filesets are displayed.
fileset=all
Displays status and statistics of all active filesets.
fileset=device
Displays status and statistics of all active filesets on a specified device.
fileset=device:filesetName
Displays status and statistics of a specified fileset on the specified device.
gw
Displays gateway statistics like queue length and memory.
--all
Displays all available information. This option is the same as specifying all of the mmdiag parameters.
--commands
Displays all the commands currently running on the local node.
--config
Displays the GPFS configuration parameter names and their current active values in the mmfsd process running on the node where the command is executed. The output of this option differs from the output of the mmlsconfig command in the following ways:
  • The mmlsconfig command displays the information for documented configuration parameters, while the mmdiag --config command displays all configuration parameters, including those that are undocumented.
  • The mmlsconfig command displays the currently configured value for parameters, which might differ from the values shown by the mmdiag command. The mmlsconfig values are the values that would be in effect if GPFS was restarted.
    Note: All configuration parameter values are initialized when the mmfsd process is started. For parameters that have been changed from their default settings, a value is read from a local configuration file. Some parameters can be changed dynamically, without restarting the mmfsd process. For more information, see mmchconfig command. The following special characters might be prefixed to the output:
    !
    Denotes parameters whose value has been changed from the default value; the change becomes effective after a restart of the mmfsd process.
    *
    Denotes parameters that are initialized to the default value, but whose value in the currently running mmfsd process was changed through the execution of the mmchconfig command with either the -i or the -I option.
    #
    Denotes parameters that were initialized from a value stored in the local configuration file, but whose value in the currently running mmfsd process was changed through the execution of the mmchconfig command with either the -i or the -I option.
    .
    Denotes parameters whose value was changed implicitly as a consequence of an explicit change made to another configuration parameter.
    An example of this command output is shown in the Examples section.
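The prefix markers described above can be picked out mechanically. The following illustrative Python sketch (not part of the product) scans sample lines taken from the mmdiag --config example output on this page and collects the parameters flagged with a marker character:

```python
# Sketch: extracting flagged parameters from `mmdiag --config` output.
# The sample lines below are copied from the example output in this page;
# a "!" prefix marks a parameter changed from its default value, and
# "*", "#", and "." mark the other cases described above.
sample = """\
   aclHashSpaceSize 2000
 ! ccrEnabled 1
 ! cipherList AUTHONLY
   clusterManagerSelection PreferManager
"""

changed = []
for line in sample.splitlines():
    stripped = line.strip()
    # Any of the four documented marker characters means a non-default
    # or dynamically changed value.
    if stripped and stripped[0] in "!*#.":
        changed.append(stripped[1:].strip())

print(changed)   # -> ['ccrEnabled 1', 'cipherList AUTHONLY']
```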
--deadlock
Displays the longest waiters that exceed the deadlock detection thresholds.
If a deadlock situation occurs, administrators can use this information from all nodes in a cluster to help decide how to break up the deadlock.
--dmapi
Displays various DMAPI information. If no other options are specified, summary information is displayed for sessions, pending events, cached tokens, stripe groups, and events that are waiting for reply. The --dmapi parameter accepts the following options:
session
Displays a list of sessions.
event
Displays a list of pending events.
token
Displays a list of cached tokens, stripe groups, and events that are waiting for reply.
disposition
Displays the DMAPI disposition for events.
all
Displays all of the session, event, token, and disposition information with more details.
Note: -Y is not supported with the --dmapi option.
--eventproducer
Displays statistics for file audit logging and clustered watch folder producers. The statistics include counts of how many messages have been sent, how many messages have been delivered (the target sink has acknowledged that the message has been received), how many messages the producer failed to deliver, the number of bytes sent, a breakdown of the types of messages that were sent and delivered, and information on the status of the producer. For more information about the producer state and state changes, use the mmhealth command.
--gds
Displays the GPUDirect Storage (GDS) restriction counters. Each counter counts the number of GDS operations that were returned to CUDA to be retried in compatibility mode because a specific limitation or error condition was encountered. Retrying GDS requests in compatibility mode results in a significant performance drop. If one or more counters are increasing at a high rate, the root cause must be investigated and the required actions taken to avoid the GDS limitation or error conditions. For more information on restriction counters, see the IBM Storage Scale Troubleshooting Guide.
--iohist [verbose]
Displays recent I/O history. The information about I/O requests recently submitted by GPFS code is shown here. It can provide insight into various aspects of GPFS I/O, such as the type of data or metadata being read or written, the distribution of I/O sizes, and I/O completion times for individual I/Os. This information can be useful in performance tuning and troubleshooting.
verbose
Displays additional columns of information info1, info2, context, and thread. The contents of the columns are as follows:
info1, info2
The contents of columns info1 and info2 depend on the buffer type. The buffer type is displayed in the Buf type column of the command output:
Table 1. Contents of columns info1 and info2 depending on the value in column Buf type
Buf type (Buffer type)                      info1                          info2
data                                        The inode number of the file   The block number of the file
metadata                                    The inode number of the file   (For internal use by IBM®)
LLIndBlock                                  The inode number of the file   (For internal use by IBM)
inode                                       (For internal use by IBM)      The inode number of the file
Other types (diskDesc, sgDesc, and so on)   (For internal use by IBM)      (For internal use by IBM)
context
The I/O context that started this I/O.
thread
The name of the thread that started this I/O.

The node that the command is issued from determines the I/O completion time that is shown.

If the command is issued from a Network Shared Disk (NSD) server node, the command shows the time that is taken to complete or serve the read or write I/O operations that are sent from the client node. This refers to the latency of the operations that are completed on the disk by the NSD server.

If the command is issued on an NSD client node that does not have local access to the disk, the command shows the complete time taken by the read or write I/O operations that the client node requested. This includes both the latency of the I/O request to the NSD server and the latency of the I/O operations that are completed on the disk by the NSD server.

If the Type of the I/O is "lrc", the I/O was made to an LROC device. In the RW column, the value LR indicates a read from the LROC device, while LS indicates a write to the LROC device. For more information, see Local read-only cache.

--lroc
Displays status and statistics for local read-only cache (LROC) devices. The statistics displayed are relevant only to the node where the command is issued from. This parameter is valid for x86_64, PPC64, and PPC64LE Linux® nodes.
--memory
Displays information about mmfsd memory usage. Several distinct memory regions are allocated and used by mmfsd, and it can be important to know the memory usage situation for each one.
Heap memory that is allocated by mmfsd
This area is managed by the OS and is not associated with a preset limit that is enforced by GPFS.
Memory pools 1 and 2
Both of these pools refer to a single memory area, also known as the shared segment. It is used to cache various kinds of internal GPFS metadata and for many other internal uses. This memory area is allocated by a special, platform-specific mechanism and is shared between user space and kernel code. The preset limit on the maximum shared segment size, current usage, and some prior usage information are shown here.
Memory pool 3
This area is also known as the token manager pool. This memory area is used to store the token state on token manager servers. The preset limit on the maximum memory pool size, current usage, and some prior-usage information are shown here.

This information can be useful when you are troubleshooting ENOMEM errors that are returned by GPFS to a user application and memory allocation failures reported in a GPFS log file.

--network
Displays information about mmfsd network connections and pending Remote Procedure Calls (RPCs). Basic information and statistics about all existing mmfsd network connections to other nodes is displayed, including information about broken connections. If any RPCs are pending (that is, sent but not yet replied to), the information about each one is shown, including the list of RPC destinations and the status of the request for each destination. This information can be helpful in following a multinode chain of dependencies during a deadlock or performance-problem troubleshooting.
--nsd [all]
Displays status and queue statistics for NSD queues that contain pending requests.
all
Displays status and queue statistics for all NSD queues.
--nsdDiskAccessConfig
Displays the currently active configuration for the nsdDiskAccessDistance parameter in the mmfsd process that is running on the node on which the command is issued. The output might be different from the output of the mmlsconfig command for this parameter if the configuration changes are yet to be applied to the mmfsd daemon. Restarting GPFS applies the configuration changes.
--pagepool
Displays information about the GPFS pagepool. The output displays whether the dynamic pagepool is enabled and the current size of the pagepool.
When the dynamic pagepool is enabled, the output also displays the following information:
  • The minimum allowed size of the dynamic pagepool.
  • The smallest size encountered (low watermark) for the dynamic pagepool since the current instance of the GPFS daemon was started.
  • The maximum allowed size of the dynamic pagepool.
  • The largest size encountered (high watermark) for the dynamic pagepool since the current instance of the GPFS daemon was started.
--rpc
Displays RPC performance statistics. The --rpc parameter accepts the following options:
node[=name]
Displays all per node statistics (channel wait, send time TCP, send time verbs, receive time TCP, latency TCP, latency verbs, and latency mixed). If name is specified, all per node statistics for just the specified node are displayed.
size
Displays per size range statistics.
message
Displays per message type RPC execution time.
all
Displays everything.
nn{S|s|M|m|H|h|D|d}
Displays per node RPC latency statistics for the latest number of intervals, which are specified by nn, for the interval specified by one of the following characters:
S|s
Displays second intervals only.
M|m
Displays first the second intervals since the last-minute boundary followed by minute intervals.
H|h
Displays first the second and minute intervals since their last minute and hour boundary followed by hour intervals.
D|d
Displays first the second, minute, and hour intervals since their last minute, hour, and day boundary followed by day intervals.
Averages are displayed as a number of milliseconds with three decimal places (one-microsecond granularity).
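The nn{S|s|M|m|H|h|D|d} argument is a count followed by a unit letter. The following illustrative Python sketch (not part of the product) shows one way such a specification could be decomposed into its count and unit:

```python
# Sketch: interpreting the nn{S|s|M|m|H|h|D|d} argument accepted by
# --rpc, e.g. "10s" means the latest 10 one-second intervals.
import re

def parse_interval_spec(spec):
    """Split an interval specification into (count, unit name)."""
    m = re.fullmatch(r"(\d+)([SsMmHhDd])", spec)
    if m is None:
        raise ValueError(f"invalid interval specification: {spec!r}")
    units = {"s": "second", "m": "minute", "h": "hour", "d": "day"}
    return int(m.group(1)), units[m.group(2).lower()]

print(parse_interval_spec("10s"))   # -> (10, 'second')
print(parse_interval_spec("5M"))    # -> (5, 'minute')
```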
--stats
Displays some general GPFS statistics.
GPFS uses a diverse array of objects to maintain the file system state and cache various types of metadata. The statistics about some of the more important object types are shown here.
OpenFile
This object is needed to access an inode. The target maximum number of cached OpenFile objects is governed by the maxFilesToCache configuration parameter. Note that more OpenFile objects can be cached, depending on the workload.
CompactOpenFile
These objects contain an abbreviated form of an OpenFile, and are collectively known as stat cache. The target maximum number of cached CompactOpenFile objects is governed by the maxStatCache parameter of the mmchconfig command.
OpenInstance
This object is created for each open file instance (file or directory that is opened by a distinct process).
BufferDesc
This object is used to manage buffers in the GPFS page pool.
indBlockDesc
This object is used to cache indirect block data.

All of these objects use the shared segment memory. For each object type, a preset target exists, which is derived from configuration parameters and the memory available in the shared segment. The information about current object usage can be helpful in performance tuning.

--threads
Displays mmfsd thread statistics and the list of active threads. For each thread, its type and kernel thread ID are shown. All non-idle mmfsd threads are shown. For those threads that are currently waiting for an event, the wait reason and wait time in seconds are shown. This information provides more detail than the data displayed by mmdiag --waiters.
--tokenmgr
Displays information about token management. For each mounted GPFS file system, one or more token manager nodes are appointed. The first token manager is always collocated with the file system manager, while other token managers can be appointed from the pool of nodes with the manager designation. The information that is shown here includes the list of currently appointed token manager nodes and, if the current node is serving as a token manager, some statistics about prior token transactions.
--trace
Displays current trace status and trace levels. During GPFS troubleshooting, it is often necessary to use the trace subsystem to obtain the debug data necessary to understand the problem. See Trace facility. It is important to have trace levels set correctly, per instructions provided by the IBM Support Center. The information that is shown here makes it possible to check the state of tracing and to see the trace levels currently in effect.
--verbs
Displays information about the VERBS RDMA subsystem.

The output displays:

  • Whether VERBS RDMA is active.
  • Status for each enabled RDMA port, including information such as:
    • Current port state
    • Interface ID
    • LID
    • Network interface name
    • Link layer
    • If an RDMA port entered fatal error state
--version
Displays information about the GPFS build currently running on this node. This information helps in troubleshooting installation problems. The information that is displayed here can be more comprehensive than the version information that is available from the OS package management infrastructure, in particular when an e-fix is installed.
--waiters
Displays mmfsd threads that are waiting for events. This information can be helpful in troubleshooting deadlocks and performance problems. For each thread, the thread name, wait time in seconds, and wait reason are typically shown. Only non-idle threads that are currently waiting for some event to occur are displayed. Note that only mmfsd threads are shown; any application I/O threads that might be waiting in GPFS kernel code would not be present here.
-Y
Displays the command output in a parseable format with a colon (:) as a field delimiter. Each column is described by a header.
Note: Fields that have a colon (:) are encoded to prevent confusion. For the set of characters that might be encoded, see the command documentation of mmclidecode. Use the mmclidecode command to decode the field.
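Because each -Y data row follows the field order given by its HEADER row, the output can be parsed without hard-coding column positions. The following illustrative Python sketch (not part of the product) uses the mmdiag --pagepool -Y sample lines from the Examples section; note that fields containing a colon would additionally need decoding with mmclidecode, which this sketch does not attempt:

```python
# Sketch: mapping a `mmdiag ... -Y` data row onto the field names from
# its HEADER row. Sample lines are taken from the `mmdiag --pagepool -Y`
# example in this page.
header_line = ("mmdiag:pagepool:HEADER:version:reserved:reserved:"
               "dynamicPagepool:minimumSize:currentSize:maximumSize:"
               "physicalMemorySize:lowWatermarkSize:highWatermarkSize")
data_line = ("mmdiag:pagepool:0:1:::1:6714884096:64424509440:"
             "64634224640:134298615808:53720645632:64424509440")

def parse_y_row(header, data):
    """Pair HEADER field names with the values in a data row."""
    names = header.split(":")
    values = data.split(":")
    # Skip the leading "command:section:HEADER" identification fields.
    return dict(zip(names[3:], values[3:]))

fields = parse_y_row(header_line, data_line)
print(fields["currentSize"])      # -> 64424509440 (bytes)
print(fields["dynamicPagepool"])  # -> 1 (enabled)
```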

Exit status

0
Successful completion.
nonzero
A failure has occurred.

Security

You must have root authority to run the mmdiag command.

Examples

  1. To display a list of waiters, enter the following command:
    mmdiag --waiters
    The command displays output like the following example:
    === mmdiag: waiters ===
    0x11DA520 waiting 0.001147000 seconds, InodePrefetchWorker:
     for I/O completion
    0x2AAAAAB02830 waiting 0.002152000 seconds, InodePrefetchWorker:
     for I/O completion
    0x2AAAAB103990 waiting 0.000593000 seconds, InodePrefetchWorker:
     for I/O completion
    0x11F51E0 waiting 0.000612000 seconds, InodePrefetchWorker:
     for I/O completion
    0x11EDE60 waiting 0.005736500 seconds, InodePrefetchWorker:
     on ThMutex 0x100073ABC8 (0xFFFFC2000073ABC8) 
     (CacheReplacementListMutex)

    In this example, all waiters have a short wait duration and represent a typical snapshot of normal GPFS operation.

  2. To display information about memory use, enter the mmdiag --memory command. The command displays output like the following example:
    mmfsd heap size: 1503232 bytes
    
    current mmfsd heap bytes in use: 1919624 total 1867672 payload
    
    Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
             128 bytes in use
       557721725 hard limit on memory usage
         1048576 bytes committed to regions
               1 allocations
               1 frees
               0 allocation failures
    
    
    Statistics for MemoryPool id 2 ("Shared Segment")
         8355904 bytes in use
       557721725 hard limit on memory usage
         8785920 bytes committed to regions
         1297534 allocations
         1296595 frees
               0 allocation failures
    
    
    Statistics for MemoryPool id 3 ("Token Manager")
          496184 bytes in use
       510027355 hard limit on memory usage
          524288 bytes committed to regions
            1309 allocations
             130 frees
               0 allocation failures

    In this example, a typical memory usage picture is shown. None of the memory pools are close to being full, and no prior allocation failures occurred.
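The "not close to being full" observation can be quantified by comparing bytes in use against the hard limit for each pool. The following illustrative Python sketch (not part of the product) uses the figures from the example output above:

```python
# Sketch: pool utilization from the `mmdiag --memory` example above,
# computed as bytes in use versus the hard limit on memory usage.
pools = {
    "Shared Segment (EPHEMERAL)": (128, 557721725),
    "Shared Segment": (8355904, 557721725),
    "Token Manager": (496184, 510027355),
}

for name, (in_use, limit) in pools.items():
    pct = 100.0 * in_use / limit
    print(f"{name}: {pct:.2f}% of hard limit in use")
```

All three pools are well under 2% of their limits, which matches the interpretation given in the example.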

  3. To display information about the network, enter the mmdiag --network command. The command displays information like the following example:
    === mmdiag: network ===
    
    Pending messages:
      (none)
    Inter-node communication configuration:
      tscTcpPort      1191
      my address      9.114.53.217/25 (eth2) <c0n2>
      my addr list    9.114.53.217/25 (eth2)
      my node number  4
    TCP Connections between nodes:
      Device null:
        hostname                       node     destination     status     err  sock  sent(MB)  recvd(MB)  ostype
        c941f1n05.pok.stglabs.ibm.com  <c0n1>   9.114.78.25     broken     233  -1    0         0          Linux/L
      Device eth2:
        hostname                       node     destination     status     err  sock  sent(MB)  recvd(MB)  ostype
        c941f3n03.pok.stglabs.ibm.com  <c0n0>   9.114.78.43     connected  0    61    0         0          Linux/L
        c870f4ap06                     <c0n3>   9.114.53.218    connected  0    64    0         0          Linux/B
    Connection details:
      <c0n1> 9.114.78.25/0 (c941f1n05.pok.stglabs.ibm.com)
        connection info:
          retry(success): 0(0)
      <c0n0> 9.114.78.43/0 (c941f3n03.pok.stglabs.ibm.com)
        connection info:
          retry(success): 0(0)
          tcp connection state: established     tcp congestion state: open
        packet statistics:
          lost: 0     unacknowledged: 0
          retrans: 0     unrecovered retrans: 0
        network speed(µs):
          rtt(round trip time): 456     medium deviation of rtt: 127
        pending data statistics(byte):
          read/write calls pending: 0
          GPFS Send-Queue: 0     GPFS Recv-Queue: 0
          Socket Send-Queue: 0     Socket Recv-Queue: 0
      <c0n3> 9.114.53.218/0 (c870f4ap06)
        connection info:
          retry(success): 0(0)
          tcp connection state: established     tcp congestion state: open
        packet statistics:
          lost: 0     unacknowledged: 0
          retrans: 0     unrecovered retrans: 0
        network speed(µs):
          rtt(round trip time): 8813     medium deviation of rtt: 13754
        pending data statistics(byte):
          read/write calls pending: 0
          GPFS Send-Queue: 0     GPFS Recv-Queue: 0
          Socket Send-Queue: 0     Socket Recv-Queue: 0
    Device details:
      devicename     speed     mtu       duplex    rx_dropped rx_errors tx_dropped tx_errors
      eth2           1000      1500      full      0          0         0          0
    diag verbs: VERBS RDMA class not initialized
    
  4. To display information about status and statistics of all AFM and AFM DR relationships, enter the mmdiag --afm command. The command displays output similar to the following example:
    === mmdiag: afm ===
     AFM Gateway: p7fbn10 Active
    
     AFM-Cache:  adrFset-4 (/gpfs/fs1/adrFset-4) in Device: fs1
        Mode: primary
        Home: p7fbn09 (nfs://p7fbn09/gpfs/fs1/adrFset-4)
        Fileset Status: Linked
             Handler-state: Mounted
             Cache-state: PrimInitInProg
             Q-state: Normal Q-length: 12126378 Q-executed: 40570
     AFM-Cache:  adrFset-5 (/gpfs/fs1/adrFset-5) in Device: fs1
        Mode: primary
        Home: p7fbn09 (nfs://p7fbn09/gpfs/fs1/adrFset-5)
        Fileset Status: Linked
             Handler-state: Mounted
             Cache-state: PrimInitInProg
             Q-state: Normal Q-length: 6164585 Q-executed: 7113648
     AFM-Cache:  adrFset-10 (/gpfs/fs1/adrFset-10) in Device: fs1
        Mode: primary
        Home: p7fbn09 (nfs://p7fbn09/gpfs/fs1/adrFset-10)
        Fileset Status: Linked
             Handler-state: Mounted
             Cache-state: PrimInitInProg
             Q-state: Normal Q-length: 16239687 Q-executed: 2415474
  5. To display gateway statistics, enter the mmdiag --afm gw command. The command displays output similar to the following example:
    === mmdiag: afm ===
     AFM Gateway: p7fbn10 Active
    
    QLen: 33165776 QMem: 12560682162 SoftQMem: 12884901888 HardQMem 32212254720
    Ping thread: Started
  6. To display LROC statistics, enter the mmdiag --lroc command. The command displays output similar to the following example:
    === mmdiag: lroc ===
    LROC Device(s): '090BD5CD603456D2#/dev/nvme1n1;090BD5CD60354659#/dev/nvme0n1;' status Running
    Cache inodes 1 dirs 1 data 1  Config: maxFile -1 stubFile -1
    Max capacity: 3051313 MB, currently in use: 3559 MB
    Statistics starting from: Tue Feb 23 11:42:32 2021
    
          Inode objects stored 312454 (1220 MB) recalled 157366 (614 MB) = 50.36 %
          Inode objects queried 0 (0 MB) = 0.00 % invalidated 157460 (615 MB)
          Inode objects failed to store 6 failed to recall 0 failed to query 0 failed to inval 0
    
          Directory objects stored 84 (2 MB) recalled 979 (226 MB) = 1165.48 %
          Directory objects queried 0 (0 MB) = 0.00 % invalidated 80 (6 MB)
          Directory objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 inval no recall 0
    
          Data objects stored 57412 (188807 MB) recalled 10150641 (40597918 MB) = 17680.35 %
          Data objects queried 1 (0 MB) = 100.00 % invalidated 57612 (228070 MB)
          Data objects failed to store 30 failed to recall 407 failed to query 0 failed to inval 0 inval no recall 54307
      
      agent inserts=1074934, reads=162501285
            response times (usec):
            insert min/max/avg=3/8030/121
            read   min/max/avg=1/66717/2919
    
      ssd   writeIOs=380060, writePages=48671744
            readIOs=10345039194, readPages=10124700092
            response times (usec):
            write  min/max/avg=192/11985/233
            read   min/max/avg=13/49152/225
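The percentages in this output appear to be the recalled object count expressed as a percentage of the stored object count, which is why values above 100% can occur. The following illustrative Python sketch (not part of the product) reproduces two figures from the example above:

```python
# Sketch: how the recall percentages in the `mmdiag --lroc` example
# above appear to be derived (recalled count / stored count * 100).
def recall_pct(stored, recalled):
    """Recalled objects as a percentage of stored objects."""
    return round(100.0 * recalled / stored, 2)

print(recall_pct(312454, 157366))   # inode objects     -> 50.36
print(recall_pct(84, 979))          # directory objects -> 1165.48
```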
  7. To display the configuration information, run the mmdiag --config command. The command displays output similar to the following example:
    # mmdiag --config
    
    === mmdiag: config ===
       aclHashSpaceSize 2000
       afmHashVersion 2
       afmMaxWorkerThreads 1024
       aioWorkerThreads 256
       allowDeleteAclOnChmod 1
       allToAllConnection no
       allToAllPerPingDelay 60.000000 milliseconds
       allToAllRandomDelayLimit 60
       appendShipEnabled 0
       assertOnStructureError 0
       atimeDeferredSeconds 86400
     ! ccrEnabled 1
     ! cipherList AUTHONLY
     ! clusterId 10784156943329000315
       clusterManagerSelection PreferManager
     ! clusterName scale-cluster-1.openstacklocal
       ...
    
  8. To display the GDS restriction counters, run the mmdiag --gds command, as shown in the following example:
    # mmdiag --gds
    
    === mmdiag: gds ===
    
    GPU Direct Storage restriction counters:
      file less than 4k           0
      sparse file                 0
      snapshot file               0
      clone file                  0
      encrypted file              0
      memory mapped file          0
      compressed file             0
      dioWanted fail              0
      nsdServerDownlevel          0
      nsdServerGdsRead            0
      RDMA target port is down    0
      RDMA initiator port is down 0
      RDMA work request errors    0
    
  9. To display information about the GPFS pagepool, run the mmdiag --pagepool -Y command. The command displays output similar to the following example:
    # mmdiag --pagepool -Y
    mmdiag:pagepool:HEADER:version:reserved:reserved:dynamicPagepool:minimumSize:currentSize:maximumSize:physicalMemorySize:lowWatermarkSize:highWatermarkSize
    mmdiag:pagepool:0:1:::1:6714884096:64424509440:64634224640:134298615808:53720645632:64424509440
  10. To display information when the dynamic pagepool is disabled (dynamicPagepoolEnabled=no), enter the mmdiag --pagepool command. The command displays output similar to the following example:
    # mmdiag --pagepool
    
    === mmdiag: pagepool ===
    Dynamic pagepool: disabled
    Current pagepool size: 64424509440 Bytes (62914560 KiB, 61440 MiB, 60 GiB)
  11. To display information when the dynamic pagepool is enabled (dynamicPagepoolEnabled=yes), enter the mmdiag --pagepool command. The command displays output similar to the following example:
    # mmdiag --pagepool
    
    === mmdiag: pagepool ===
    Dynamic pagepool: enabled
    Minimum pagepool size: 6714884096 Bytes (6557504 KiB, 6403 MiB, 6 GiB)
    Low watermark:         53720645632 Bytes (52461568 KiB, 51232 MiB, 50 GiB)
    Current pagepool size: 64424509440 Bytes (62914560 KiB, 61440 MiB, 60 GiB)
    High watermark:        64424509440 Bytes (62914560 KiB, 61440 MiB, 60 GiB)
    Maximum pagepool size: 64634224640 Bytes (63119360 KiB, 61640 MiB, 60 GiB)
    Physical memory  size: 134298615808 Bytes (131150992 KiB, 128077 MiB, 125 GiB)
  12. To display VERBS RDMA status information, run the mmdiag --verbs command. The command displays output similar to the following example:
    # mmdiag --verbs
    
    === mmdiag: verbs ===
    verbsRdmaStarted: yes
    verbsPort: mlx5_0/1/0/0
      state          : IBV_PORT_ACTIVE
      interface ID   : 0xb8cef603004455f0
      lid            : 13
      interface name : ib0
      link layer     : INFINIBAND
      fatal error    : no
    verbsPort: mlx5_1/1/0/0
      state          : IBV_PORT_ACTIVE
      interface ID   : 0xb8cef603004455f1
      lid            : 19
      interface name : ib1
      link layer     : INFINIBAND
      fatal error    : no

Location

/usr/lpp/mmfs/bin