mmdiag command
Displays diagnostic information about the internal GPFS state on the current node.
Synopsis
mmdiag [--afm [fileset={all|device[:filesetName]}|gw] [-Y]]
[--all [-Y]] [--version [-Y]] [--waiters [-Y]] [--deadlock [-Y]] [--threads [-Y]]
[--lroc [-Y]] [--memory [-Y]] [--network [-Y]] [--config [-Y]] [--trace [-Y]]
[--iohist [verbose] [-Y]] [--tokenmgr [-Y]] [--commands [-Y]]
[--dmapi [session|event|token|disposition|all]]
[--rpc [node[=name]|size|message|all|nn{S|s|M|m|H|h|D|d}] [-Y]]
[--stats [-Y]] [--nsd [all] [-Y]] [--nsdDiskAccessConfig [-Y]] [--eventproducer [-Y]]
[--gds [-Y]]
[--pagepool [-Y]]
[--verbs [-Y]]
Availability
Available on all IBM Storage Scale editions.
Description
Use the mmdiag command to query various aspects of the GPFS internal state for troubleshooting and tuning purposes. The mmdiag command displays information about the state of GPFS on the node where it is executed. The command obtains the required information by querying the GPFS daemon process (mmfsd), and thus functions only when the GPFS daemon is running.
Results
The mmdiag command displays the requested information and returns 0 if successful.
Parameters
- --afm
- Displays status and statistics of linked AFM and AFM DR filesets that are assigned to the gateway node. Accepts the following options.
Note: If you do not specify any option, status and statistics of all filesets are displayed.
- fileset=all
- Displays status and statistics of all active filesets.
- fileset=device
- Displays status and statistics of all active filesets on a specified device.
- fileset=device:filesetName
- Displays status and statistics of a specified fileset on the specified device.
- gw
- Displays gateway statistics like queue length and memory.
- --all
- Displays all available information. This option is the same as specifying all of the mmdiag parameters.
- --commands
- Displays all the commands currently running on the local node.
- --config
- Displays the GPFS configuration parameter names and their current active values in the mmfsd process running on the node where the command is executed. The output of this option differs from the output of the mmlsconfig command in the following ways:
- The mmlsconfig command displays the information for documented configuration parameters, while the mmdiag --config command displays all configuration parameters, including those that are undocumented.
- The mmlsconfig command displays the currently configured value for parameters, which might differ from the values shown by the mmdiag command. The mmlsconfig values are the values that would be in effect if GPFS were restarted.
Note: All configuration parameter values are initialized when the mmfsd process is started. For parameters that have been changed from their default settings, a value is read from a local configuration file. Some parameters can be changed dynamically, without restarting the mmfsd process. For more information, see the mmchconfig command. The following special characters might be prefixed to the output:
- !
- Denotes the parameters whose value has been changed from the default value and which take effect only after a restart of the mmfsd process.
- *
- Denotes the parameters which are initialized to the default value, but whose value in the currently running mmfsd process has changed through the execution of the mmchconfig command with either the -i or the -I option.
- #
- Denotes the parameters which were initialized from a value stored in the local configuration file, but whose value in the currently running mmfsd process was changed through the execution of the mmchconfig command with either the -i or the -I option.
- .
- Denotes the parameters whose value was changed implicitly as a consequence of an explicit change made to another configuration parameter.
- --deadlock
- Displays the longest waiters that exceed the deadlock detection thresholds.
- --dmapi
- Displays various DMAPI information. If no other options are specified, summary information is
displayed for sessions, pending events, cached tokens, stripe groups, and events that are waiting
for reply. The --dmapi parameter accepts the following options:
- session
- Displays a list of sessions.
- event
- Displays a list of pending events.
- token
- Displays a list of cached tokens, stripe groups, and events that are waiting for reply.
- disposition
- Displays the DMAPI disposition for events.
- all
- Displays all of the session, event, token, and disposition information with more details.
Note: -Y is not supported with the --dmapi option.
- --eventproducer
- Displays statistics for file audit logging and clustered watch folder producers. The statistics include counts of how many messages have been sent, how many messages have been delivered (the target sink has acknowledged that the message has been received), how many messages the producer failed to deliver, the number of bytes sent, a breakdown of the types of messages that were sent and delivered, and information on the status of the producer. For more information about the producer state and state changes, use the mmhealth command.
- --gds
- Displays the GPUDirect Storage (GDS) restriction counters. Each counter counts the number of GDS operations that are returned to CUDA to be retried in compatibility mode because a specific limitation or error condition was encountered. Retrying GDS requests in compatibility mode results in a significant performance drop. If one or more counters are increasing at a high rate, investigate the root cause and take the required actions to avoid the GDS limitation or error condition. For more information on restriction counters, see the IBM Storage Scale Troubleshooting Guide.
- --iohist [verbose]
- Displays recent I/O history, that is, information about I/O requests that were recently submitted by GPFS code. It can provide insight into various aspects of GPFS I/O, such as the type of data or metadata being read or written, the distribution of I/O sizes, and the completion times of individual I/Os. This information can be useful in performance tuning and troubleshooting.
- verbose
- Displays the additional columns info1, info2, context, and thread. The contents of the columns are as follows:
- info1, info2
- The contents of columns info1 and info2 depend on the buffer type. The buffer type is displayed in the Buf type column of the command output:
Table 1. Contents of columns info1 and info2 depending on the value in column Buf type
Buf type (Buffer type)                             info1                          info2
data                                               The inode number of the file   The block number of the file
metadata                                           The inode number of the file   (For internal use by IBM®)
LLIndBlock                                         The inode number of the file   (For internal use by IBM)
inode                                              (For internal use by IBM)      The inode number of the file
Other types, such as diskDesc, sgDesc, and others  (For internal use by IBM)      (For internal use by IBM)
- context
- The I/O context that started this I/O.
- thread
- The name of the thread that started this I/O.
The node that the command is issued from determines the I/O completion time that is shown.
If the command is issued from a Network Shared Disk (NSD) server node, the command shows the time that is taken to complete or serve the read or write I/O operations that are sent from the client node. This refers to the latency of the operations that are completed on the disk by the NSD server.
If the command is issued on an NSD client node that does not have local access to the disk, the command shows the complete time (requested by the client node) that is taken by the read or write I/O operations to complete. This refers to the latency of I/O request to the NSD server and the latency of I/O operations that are completed on the disk by the NSD server.
If the Type of the I/O is "lrc", the I/O was made to an LROC device. Under the RW column, the value LR indicates a read from the LROC device, while LS indicates a write to the LROC device. For more information, see Local read-only cache.
- --lroc
- Displays status and statistics for local read-only cache (LROC) devices. The statistics displayed are relevant only to the node where the command is issued from. This parameter is valid for x86_64, PPC64, and PPC64LE Linux® nodes.
- --memory
- Displays information about mmfsd memory usage. Several distinct memory
regions are allocated and used by mmfsd, and it can be important to know
the memory usage situation for each one.
- Heap memory that is allocated by mmfsd
- This area is managed by the OS and is not associated with a preset limit that is enforced by GPFS.
- Memory pools 1 and 2
- Both of these pools refer to a single memory area, also known as the shared segment. It is used to cache various kinds of internal GPFS metadata and for many other internal uses. This memory area is allocated by a special, platform-specific mechanism and is shared between user space and kernel code. The preset limit on the maximum shared segment size, current usage, and some prior usage information are shown here.
- Memory pool 3
- This area is also known as the token manager pool. This memory area is used to store the token state on token manager servers. The preset limit on the maximum memory pool size, current usage, and some prior-usage information are shown here.
This information can be useful when you are troubleshooting ENOMEM errors that are returned by GPFS to a user application and memory allocation failures reported in a GPFS log file.
- --network
- Displays information about mmfsd network connections and pending Remote Procedure Calls (RPCs). Basic information and statistics about all existing mmfsd network connections to other nodes is displayed, including information about broken connections. If any RPCs are pending (that is, sent but not yet replied to), the information about each one is shown, including the list of RPC destinations and the status of the request for each destination. This information can be helpful in following a multinode chain of dependencies during a deadlock or performance-problem troubleshooting.
- --nsd [all]
- Displays status and queue statistics for NSD queues that contain pending requests.
- all
- Displays status and queue statistics for all NSD queues.
- --nsdDiskAccessConfig
- Displays the currently active configuration for the nsdDiskAccessDistance parameter in the mmfsd process that is running on the node on which the command is issued. The output might be different from the output of the mmlsconfig command for this parameter if the configuration changes are yet to be applied to the mmfsd daemon. Restarting GPFS applies the configuration changes.
- --pagepool
- Displays information about the GPFS pagepool. The output displays whether the dynamic pagepool is enabled and the current size of the pagepool.
When the dynamic pagepool is enabled, the output also displays the following information:
- The minimum allowed size of the dynamic pagepool.
- The smallest encountered size (low watermark) of the dynamic pagepool since the current instance of the GPFS daemon was started.
- The maximum allowed size of the dynamic pagepool.
- The largest encountered size (high watermark) of the dynamic pagepool since the current instance of the GPFS daemon was started.
- --rpc
- Displays RPC performance statistics. The --rpc parameter accepts the
following options:
- node[=name]
- Displays all per node statistics (channel wait, send time TCP, send time verbs, receive time TCP, latency TCP, latency verbs, and latency mixed). If name is specified, all per node statistics for just the specified node are displayed.
- size
- Displays per size range statistics.
- message
- Displays per message type RPC execution time.
- all
- Displays everything.
- nn{S|s|M|m|H|h|D|d}
- Displays per node RPC latency statistics for the latest number of intervals, which are specified
by nn, for the interval specified by one of the following characters:
- S|s
- Displays second intervals only.
- M|m
- Displays first the second intervals since the last minute boundary, followed by minute intervals.
- H|h
- Displays first the second and minute intervals since their last minute and hour boundary followed by hour intervals.
- D|d
- Displays first the second, minute, and hour intervals since their last minute, hour, and day boundary followed by day intervals.
- --stats
- Displays some general GPFS statistics. GPFS uses a diverse array of objects to maintain the file system state and cache various types of metadata. The statistics about some of the more important object types are shown here.
- OpenFile
- This object is needed to access an inode. The target maximum number of cached OpenFile objects is governed by the maxFilesToCache configuration parameter. Note that more OpenFile objects can be cached, depending on the workload.
- CompactOpenFile
- These objects contain an abbreviated form of an OpenFile, and are collectively known as stat cache. The target maximum number of cached CompactOpenFile objects is governed by the maxStatCache parameter of the mmchconfig command.
- OpenInstance
- This object is created for each open file instance (file or directory that is opened by a distinct process).
- BufferDesc
- This object is used to manage buffers in the GPFS page pool.
- indBlockDesc
- This object is used to cache indirect block data.
All of these objects use the shared segment memory. For each object type, a preset target exists, which is derived from configuration parameters and the memory available in the shared segment. The information about current object usage can be helpful in performance tuning.
- --threads
- Displays mmfsd thread statistics and the list of active threads. For each thread, its type and kernel thread ID are shown. All non-idle mmfsd threads are shown. For those threads that are currently waiting for an event, the wait reason and wait time in seconds are shown. This information provides more detail than the data displayed by mmdiag --waiters.
- --tokenmgr
- Displays information about token management. For each mounted GPFS file system, one or more token manager nodes are appointed. The first token manager is always collocated with the file system manager, while other token managers can be appointed from the pool of nodes with the manager designation. The information that is shown here includes the list of currently appointed token manager nodes and, if the current node is serving as a token manager, some statistics about prior token transactions.
- --trace
- Displays current trace status and trace levels. During GPFS troubleshooting, it is often necessary to use the trace subsystem to obtain the debug data necessary to understand the problem. See Trace facility. It is important to have trace levels set correctly, per instructions provided by the IBM Support Center. The information that is shown here makes it possible to check the state of tracing and to see the trace levels currently in effect.
- --verbs
- Displays information about the VERBS RDMA subsystem.
The output displays:
- Whether VERBS RDMA is active.
- Status for each enabled RDMA port, including information such as:
- Current port state
- Interface ID
- LID
- Network interface name
- Link layer
- Whether an RDMA port entered a fatal error state
- --version
- Displays information about the GPFS build currently running on this node. This information helps in troubleshooting installation problems. The information that is displayed here can be more comprehensive than the version information that is available from the OS package management infrastructure, in particular when an e-fix is installed.
- --waiters
- Displays mmfsd threads that are waiting for events. This information can be helpful in troubleshooting deadlocks and performance problems. For each thread, the thread name, wait time in seconds, and wait reason are typically shown. Only non-idle threads that are currently waiting for some event to occur are displayed. Note that only mmfsd threads are shown; any application I/O threads that might be waiting in GPFS kernel code would not be present here.
- -Y
- Displays the command output in a parseable format with a colon (:) as a field delimiter. Each column is described by a header.
Note: Fields that contain a colon (:) are encoded to prevent confusion. For the set of characters that might be encoded, see the mmclidecode command documentation. Use the mmclidecode command to decode the field.
Exit status
- 0
- Successful completion.
- nonzero
- A failure has occurred.
Security
You must have root authority to run the mmdiag command.
Examples
- To display a list of waiters, enter the following command:
mmdiag --waiters
The command displays output like the following example:
=== mmdiag: waiters ===
0x11DA520 waiting 0.001147000 seconds, InodePrefetchWorker: for I/O completion
0x2AAAAAB02830 waiting 0.002152000 seconds, InodePrefetchWorker: for I/O completion
0x2AAAAB103990 waiting 0.000593000 seconds, InodePrefetchWorker: for I/O completion
0x11F51E0 waiting 0.000612000 seconds, InodePrefetchWorker: for I/O completion
0x11EDE60 waiting 0.005736500 seconds, InodePrefetchWorker: on ThMutex 0x100073ABC8 (0xFFFFC2000073ABC8) (CacheReplacementListMutex)
In this example, all waiters have a short wait duration and represent a typical snapshot of normal GPFS operation.
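Each waiter line above follows the fixed shape "address waiting N seconds, reason". As a minimal sketch (not part of the product), the following Python snippet parses lines of that shape, using the sample output from this example, and reports the longest waiter:

```python
import re

# Sample waiter lines taken from the mmdiag --waiters example above.
sample = """\
0x11DA520 waiting 0.001147000 seconds, InodePrefetchWorker: for I/O completion
0x2AAAAAB02830 waiting 0.002152000 seconds, InodePrefetchWorker: for I/O completion
0x2AAAAB103990 waiting 0.000593000 seconds, InodePrefetchWorker: for I/O completion
0x11F51E0 waiting 0.000612000 seconds, InodePrefetchWorker: for I/O completion
0x11EDE60 waiting 0.005736500 seconds, InodePrefetchWorker: on ThMutex 0x100073ABC8 (0xFFFFC2000073ABC8) (CacheReplacementListMutex)
"""

# Pattern assumes the "address waiting N seconds, reason" layout shown above.
pattern = re.compile(r"^(0x[0-9A-Fa-f]+) waiting ([0-9.]+) seconds, (.*)$")

waiters = []
for line in sample.splitlines():
    m = pattern.match(line)
    if m:
        waiters.append((float(m.group(2)), m.group(1), m.group(3)))

# Sorting by wait time surfaces the longest waiter first.
longest = max(waiters)
print("longest wait:", longest[0], "seconds in", longest[2])
```

Waits of a few milliseconds, as here, are normal; waiters of many seconds are worth investigating (see the --deadlock option).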
- To display information about memory use, enter the mmdiag --memory command.
The command displays output like the following example:
mmfsd heap size: 1503232 bytes
current mmfsd heap bytes in use: 1919624 total 1867672 payload
Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
   128 bytes in use
   557721725 hard limit on memory usage
   1048576 bytes committed to regions
   1 allocations
   1 frees
   0 allocation failures
Statistics for MemoryPool id 2 ("Shared Segment")
   8355904 bytes in use
   557721725 hard limit on memory usage
   8785920 bytes committed to regions
   1297534 allocations
   1296595 frees
   0 allocation failures
Statistics for MemoryPool id 3 ("Token Manager")
   496184 bytes in use
   510027355 hard limit on memory usage
   524288 bytes committed to regions
   1309 allocations
   130 frees
   0 allocation failures
In this example, a typical memory usage picture is shown. None of the memory pools are close to being full, and no prior allocation failures occurred.
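A quick way to judge how close a pool is to being full is to compare "bytes in use" against the hard limit. The following sketch does that arithmetic for MemoryPool id 2 ("Shared Segment"), using the numbers from the example above:

```python
# Numbers from the "Shared Segment" pool (MemoryPool id 2) in the
# mmdiag --memory example above.
bytes_in_use = 8355904
hard_limit = 557721725

# Utilization as a percentage of the hard limit.
utilization = bytes_in_use / hard_limit * 100
print(f"Shared Segment utilization: {utilization:.2f}%")
```

The result here is about 1.5%, far from the limit. Sustained utilization near 100%, or a nonzero "allocation failures" count, would point at shared-segment memory pressure.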
- To display information about the network, enter the mmdiag --network command.
The command displays information like the following
example:
=== mmdiag: network ===
Pending messages:
  (none)
Inter-node communication configuration:
  tscTcpPort      1191
  my address      9.114.53.217/25 (eth2) <c0n2>
  my addr list    9.114.53.217/25 (eth2)
  my node number  4
TCP Connections between nodes:
  Device null:
    hostname                        node    destination   status     err  sock  sent(MB)  recvd(MB)  ostype
    c941f1n05.pok.stglabs.ibm.com   <c0n1>  9.114.78.25   broken     233  -1    0         0          Linux/L
  Device eth2:
    hostname                        node    destination   status     err  sock  sent(MB)  recvd(MB)  ostype
    c941f3n03.pok.stglabs.ibm.com   <c0n0>  9.114.78.43   connected  0    61    0         0          Linux/L
    c870f4ap06                      <c0n3>  9.114.53.218  connected  0    64    0         0          Linux/B
Connection details:
  <c0n1> 9.114.78.25/0 (c941f1n05.pok.stglabs.ibm.com)
    connection info: retry(success): 0(0)
  <c0n0> 9.114.78.43/0 (c941f3n03.pok.stglabs.ibm.com)
    connection info: retry(success): 0(0)
    tcp connection state: established
    tcp congestion state: open
    packet statistics:
      lost: 0
      unacknowledged: 0
      retrans: 0
      unrecovered retrans: 0
    network speed(µs):
      rtt(round trip time): 456
      medium deviation of rtt: 127
    pending data statistics(byte):
      read/write calls pending: 0
      GPFS Send-Queue: 0
      GPFS Recv-Queue: 0
      Socket Send-Queue: 0
      Socket Recv-Queue: 0
  <c0n3> 9.114.53.218/0 (c870f4ap06)
    connection info: retry(success): 0(0)
    tcp connection state: established
    tcp congestion state: open
    packet statistics:
      lost: 0
      unacknowledged: 0
      retrans: 0
      unrecovered retrans: 0
    network speed(µs):
      rtt(round trip time): 8813
      medium deviation of rtt: 13754
    pending data statistics(byte):
      read/write calls pending: 0
      GPFS Send-Queue: 0
      GPFS Recv-Queue: 0
      Socket Send-Queue: 0
      Socket Recv-Queue: 0
Device details:
  devicename   speed  mtu   duplex  rx_dropped  rx_errors  tx_dropped  tx_errors
  eth2         1000   1500  full    0           0           0           0
diag verbs: VERBS RDMA class not initialized
- To display information about status and statistics of all AFM and AFM DR relationships, enter
the mmdiag --afm command. The command displays output similar to the following
example:
=== mmdiag: afm ===
AFM Gateway: p7fbn10 Active
AFM-Cache: adrFset-4 (/gpfs/fs1/adrFset-4) in Device: fs1 Mode: primary
Home: p7fbn09 (nfs://p7fbn09/gpfs/fs1/adrFset-4)
Fileset Status: Linked Handler-state: Mounted Cache-state: PrimInitInProg
Q-state: Normal Q-length: 12126378 Q-executed: 40570
AFM-Cache: adrFset-5 (/gpfs/fs1/adrFset-5) in Device: fs1 Mode: primary
Home: p7fbn09 (nfs://p7fbn09/gpfs/fs1/adrFset-5)
Fileset Status: Linked Handler-state: Mounted Cache-state: PrimInitInProg
Q-state: Normal Q-length: 6164585 Q-executed: 7113648
AFM-Cache: adrFset-10 (/gpfs/fs1/adrFset-10) in Device: fs1 Mode: primary
Home: p7fbn09 (nfs://p7fbn09/gpfs/fs1/adrFset-10)
Fileset Status: Linked Handler-state: Mounted Cache-state: PrimInitInProg
Q-state: Normal Q-length: 16239687 Q-executed: 2415474
- To display gateway statistics, enter the mmdiag --afm gw command. The command
displays output similar to the following example:
=== mmdiag: afm ===
AFM Gateway: p7fbn10 Active
QLen: 33165776 QMem: 12560682162 SoftQMem: 12884901888 HardQMem 32212254720
Ping thread: Started
- To display LROC statistics, enter the mmdiag --lroc command. The command
displays output similar to the following example:
=== mmdiag: lroc ===
LROC Device(s): '090BD5CD603456D2#/dev/nvme1n1;090BD5CD60354659#/dev/nvme0n1;' status Running
Cache inodes 1 dirs 1 data 1 Config: maxFile -1 stubFile -1
Max capacity: 3051313 MB, currently in use: 3559 MB
Statistics starting from: Tue Feb 23 11:42:32 2021

Inode objects stored 312454 (1220 MB) recalled 157366 (614 MB) = 50.36 %
Inode objects queried 0 (0 MB) = 0.00 % invalidated 157460 (615 MB)
Inode objects failed to store 6 failed to recall 0 failed to query 0 failed to inval 0

Directory objects stored 84 (2 MB) recalled 979 (226 MB) = 1165.48 %
Directory objects queried 0 (0 MB) = 0.00 % invalidated 80 (6 MB)
Directory objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0 inval no recall 0

Data objects stored 57412 (188807 MB) recalled 10150641 (40597918 MB) = 17680.35 %
Data objects queried 1 (0 MB) = 100.00 % invalidated 57612 (228070 MB)
Data objects failed to store 30 failed to recall 407 failed to query 0 failed to inval 0 inval no recall 54307

agent inserts=1074934, reads=162501285
  response times (usec): insert min/max/avg=3/8030/121 read min/max/avg=1/66717/2919
ssd writeIOs=380060, writePages=48671744 readIOs=10345039194, readPages=10124700092
  response times (usec): write min/max/avg=192/11985/233 read min/max/avg=13/49152/225
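The percentage printed after "recalled" in the example above is the number of recalled objects as a share of stored objects, which is consistent with all three object classes in the sample (50.36 %, 1165.48 %, 17680.35 %). The following sketch verifies that reading for the inode-object line:

```python
# Inode-object counts from the mmdiag --lroc example above:
# "Inode objects stored 312454 ... recalled 157366 ... = 50.36 %"
stored = 312454
recalled = 157366

# Recalled objects as a percentage of stored objects.
ratio = recalled / stored * 100
print(f"{ratio:.2f} %")  # 50.36 %
```

A ratio well above 100 % (as in the data-object line) simply means cached objects were recalled many times each, which is the desired behavior for a read cache.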
- To display the configuration information, run the mmdiag --config
command. The command displays output similar to the following
example:
# mmdiag --config
=== mmdiag: config ===
   aclHashSpaceSize 2000
   afmHashVersion 2
   afmMaxWorkerThreads 1024
   aioWorkerThreads 256
   allowDeleteAclOnChmod 1
   allToAllConnection no
   allToAllPerPingDelay 60.000000 milliseconds
   allToAllRandomDelayLimit 60
   appendShipEnabled 0
   assertOnStructureError 0
   atimeDeferredSeconds 86400
 ! ccrEnabled 1
 ! cipherList AUTHONLY
 ! clusterId 10784156943329000315
   clusterManagerSelection PreferManager
 ! clusterName scale-cluster-1.openstacklocal
...
- To display the GDS restriction counters, run the mmdiag
--gds command, as shown in the following example:
# mmdiag --gds
=== mmdiag: gds ===
GPU Direct Storage restriction counters:
  file less than 4k 0
  sparse file 0
  snapshot file 0
  clone file 0
  encrypted file 0
  memory mapped file 0
  compressed file 0
  dioWanted fail 0
  nsdServerDownlevel 0
  nsdServerGdsRead 0
  RDMA target port is down 0
  RDMA initiator port is down 0
  RDMA work request errors 0
- To display information about the GPFS pagepool, run the mmdiag --pagepool -Y command. The command displays output similar to the following example:
# mmdiag --pagepool -Y
mmdiag:pagepool:HEADER:version:reserved:reserved:dynamicPagepool:minimumSize:currentSize:maximumSize:physicalMemorySize:lowWatermarkSize:highWatermarkSize
mmdiag:pagepool:0:1:::1:6714884096:64424509440:64634224640:134298615808:53720645632:64424509440
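The -Y format pairs a HEADER row with data rows, field by field. As a minimal sketch (assuming, as in the sample above, that the first three columns identify the command, the section, and the HEADER marker or row identifier, with named fields starting at column 3), the following Python snippet turns the two rows into a field-name-to-value mapping:

```python
# Header and data rows taken from the mmdiag --pagepool -Y example above.
header_line = ("mmdiag:pagepool:HEADER:version:reserved:reserved:dynamicPagepool:"
               "minimumSize:currentSize:maximumSize:physicalMemorySize:"
               "lowWatermarkSize:highWatermarkSize")
data_line = ("mmdiag:pagepool:0:1:::1:6714884096:64424509440:64634224640:"
             "134298615808:53720645632:64424509440")

def parse_y(header: str, data: str) -> dict:
    # Columns 0-2 identify the command, the section, and the HEADER
    # marker (or row id in data rows); named fields begin at column 3.
    names = header.split(":")[3:]
    values = data.split(":")[3:]
    return dict(zip(names, values))

fields = parse_y(header_line, data_line)
print(fields["currentSize"])      # 64424509440
print(fields["dynamicPagepool"])  # 1
```

Note that fields containing a literal colon are encoded in -Y output; decode them with the mmclidecode command rather than in scripts like this one.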
- To display information when the dynamic pagepool is disabled (dynamicPagepoolEnabled=no), run the mmdiag --pagepool command. The command displays output similar to the following example:
# mmdiag --pagepool
=== mmdiag: pagepool ===
Dynamic pagepool: disabled
Current pagepool size: 64424509440 Bytes (62914560 KiB, 61440 MiB, 60 GiB)
- To display information when the dynamic pagepool is enabled (dynamicPagepoolEnabled=yes), run the mmdiag --pagepool command. The command displays output similar to the following example:
# mmdiag --pagepool
=== mmdiag: pagepool ===
Dynamic pagepool: enabled
Minimum pagepool size: 6714884096 Bytes (6557504 KiB, 6403 MiB, 6 GiB)
Low watermark: 53720645632 Bytes (52461568 KiB, 51232 MiB, 50 GiB)
Current pagepool size: 64424509440 Bytes (62914560 KiB, 61440 MiB, 60 GiB)
High watermark: 64424509440 Bytes (62914560 KiB, 61440 MiB, 60 GiB)
Maximum pagepool size: 64634224640 Bytes (63119360 KiB, 61640 MiB, 60 GiB)
Physical memory size: 134298615808 Bytes (131150992 KiB, 128077 MiB, 125 GiB)
- To display VERBS RDMA status information, run the mmdiag --verbs command. The
command displays output similar to the following example:
# mmdiag --verbs
=== mmdiag: verbs ===
verbsRdmaStarted: yes
verbsPort: mlx5_0/1/0/0
  state          : IBV_PORT_ACTIVE
  interface ID   : 0xb8cef603004455f0
  lid            : 13
  interface name : ib0
  link layer     : INFINIBAND
  fatal error    : no
verbsPort: mlx5_1/1/0/0
  state          : IBV_PORT_ACTIVE
  interface ID   : 0xb8cef603004455f1
  lid            : 19
  interface name : ib1
  link layer     : INFINIBAND
  fatal error    : no
Location
/usr/lpp/mmfs/bin