Monitoring NVMe Devices

Use the mmlsnvmestatus command to monitor the health of the NVMe drives in your system. The command can report on all devices or on a single device identified by its serial number.

The mmlsnvmestatus command flags any NVMe device whose link status does not match its link capabilities (speed and width). It also flags any device whose LBA format is not one of the designated “best” formats for that device.

This example shows the output of the command on a 4-server system:

mmlsnvmestatus all
                                             Optimal     Optimal      needs
 node       NVMe device    serial number     Link State  LBA Formats service
 ------     -----------    -------------     ----------  -----------  ------
 node1      /dev/nvme0     57L0A03LTZ5D      NO          YES          NO
 node1      /dev/nvme1     57L0A03KTZ5D      YES         YES          NO
 node2      /dev/nvme0     57M0A01GTZ5D      YES         NO           NO
 node2      /dev/nvme1     57M0A01JTZ5D      YES         YES          NO
 node3      /dev/nvme0     57M0A00UTZ5D      YES         YES          NO
 node3      /dev/nvme1     57M0A00KTZ5D      YES         YES          NO
 node4      /dev/nvme0     57M0A019TZ5D      YES         YES          NO
 node4      /dev/nvme1     57M0A00QTZ5D      YES         YES          NO

You can pass the --not-ok flag to return only devices whose Link State or LBA Format is not optimal. For example:

mmlsnvmestatus all --not-ok
                                             Optimal     Optimal      needs
 node       NVMe device    serial number     Link State  LBA Formats service
 ------     -----------    -------------     ----------  -----------  ------
 node1      /dev/nvme0     57L0A03LTZ5D      NO          YES          NO
 node2      /dev/nvme0     57M0A01GTZ5D      YES         NO           NO
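
For unattended monitoring, the --not-ok output can be reduced to a count of flagged devices. The following is a minimal sketch, not part of the product: the function name count_flagged is hypothetical, and it assumes the layout shown above, where device rows follow a separator line of dashes. If the command prints nothing when no device is flagged, the count is 0.

```shell
#!/bin/sh
# Hypothetical helper: count the device rows in mmlsnvmestatus output read
# from stdin. Assumes device rows follow a separator line made only of
# dashes and spaces, as in the examples above.
count_flagged() {
    awk '
        seen && NF > 0 { n++ }          # non-empty row after the separator
        /^[ -]*-[ -]*$/ { seen = 1 }    # separator line: only dashes and spaces
        END { print n + 0 }             # prints 0 when no rows were seen
    '
}

# Usage sketch (assumes mmlsnvmestatus is on PATH on this node):
#   flagged=$(mmlsnvmestatus all --not-ok | count_flagged)
#   [ "$flagged" -eq 0 ] || echo "WARNING: $flagged NVMe device(s) not optimal" >&2
```

A wrapper like this could be run from a scheduler, with the warning routed to whatever alerting mechanism the site already uses.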

In this example, /dev/nvme0 on node1 has an "Optimal Link State" value of "NO". A likely cause is that the device is not seated properly in its PCIe slot. For more detail, compare the LnkCap and LnkSta lines in the lspci output for this device. /dev/nvme0 on node2 has an "Optimal LBA Formats" value of "NO". You can view the available formats and the format currently in use with the nvme id-ns command for that device.
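
The two follow-up checks can be sketched in shell. This is an illustration under assumptions, not a documented procedure: the helper name link_summary is hypothetical, the controller is assumed to be nvme0 with namespace /dev/nvme0n1, and the PCI address is resolved through sysfs. Both lspci -vv and nvme id-ns typically require root privileges.

```shell
#!/bin/sh
# Hypothetical helper: reduce `lspci -vv` output (stdin) to the speed and
# width reported on the LnkCap (capability) and LnkSta (status) lines.
# LnkCap2/LnkSta2 lines are ignored because the pattern requires the colon
# to follow the name directly.
link_summary() {
    sed -n -E 's/^[[:space:]]*(LnkCap|LnkSta):.*Speed ([^ ,]+).*Width (x[0-9]+).*/\1 \2 \3/p'
}

# Usage sketch (run as root on the node that owns the device):
#   ctrl=nvme0                                   # assumed controller name
#   bdf=$(basename "$(readlink -f /sys/class/nvme/$ctrl/device)")
#   lspci -vv -s "$bdf" | link_summary           # the two lines should agree
#   nvme id-ns "/dev/${ctrl}n1" -H | grep '^LBA Format'
#   # In nvme-cli human-readable output, the current format is marked "(in use)".
```

If the LnkSta speed or width is lower than the LnkCap value, the link has trained below what the device supports, which is consistent with a seating problem.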