Monitoring NVMe Devices
You can monitor the health of any NVMe drives in your system using the mmlsnvmestatus command. You can monitor the status of all devices or a specific device, specified by serial number.
The mmlsnvmestatus command flags any device whose link status does not match its link capabilities (speed and width). It also flags any device whose LBA format is not one of the designated “best” formats for that device.
This example shows the output of the command on a 4-server system:
mmlsnvmestatus all
                                    Optimal     Optimal      needs
node    NVMe device  serial number  Link State  LBA Formats  service
------  -----------  -------------  ----------  -----------  -------
node1   /dev/nvme0   57L0A03LTZ5D   NO          YES          NO
node1   /dev/nvme1   57L0A03KTZ5D   YES         YES          NO
node2   /dev/nvme0   57M0A01GTZ5D   YES         NO           NO
node2   /dev/nvme1   57M0A01JTZ5D   YES         YES          NO
node3   /dev/nvme0   57M0A00UTZ5D   YES         YES          NO
node3   /dev/nvme1   57M0A00KTZ5D   YES         YES          NO
node4   /dev/nvme0   57M0A019TZ5D   YES         YES          NO
node4   /dev/nvme1   57M0A00QTZ5D   YES         YES          NO
You can pass the --not-ok flag to return only devices whose Link State or LBA Format is not optimal. For example:
mmlsnvmestatus all --not-ok
                                    Optimal     Optimal      needs
node    NVMe device  serial number  Link State  LBA Formats  service
------  -----------  -------------  ----------  -----------  -------
node1   /dev/nvme0   57L0A03LTZ5D   NO          YES          NO
node2   /dev/nvme0   57M0A01GTZ5D   YES         NO           NO
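For scripted checks, the listing can be reduced to one line per affected device. The following is a minimal sketch, not part of the product: nvme_issues is a hypothetical helper name, and it assumes that data rows keep the six whitespace-separated columns shown in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product): reads mmlsnvmestatus
# output on stdin and prints one line per device whose "Optimal Link State"
# or "Optimal LBA Formats" column is NO.
# Assumption: data rows have six whitespace-separated fields and the second
# field is a /dev/ path, as in the sample output.
nvme_issues() {
  awk '
    NF == 6 && $2 ~ /^\/dev\// {
      problems = ""
      if ($4 == "NO") problems = "link"
      if ($5 == "NO") problems = (problems == "" ? "lba-format" : problems ",lba-format")
      if (problems != "") print $1, $2, $3, problems
    }
  '
}

# Usage on a cluster node:
#   mmlsnvmestatus all | nvme_issues
```

Header and separator lines are skipped because they do not have a /dev/ path in the second field.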
In this example, the NVMe device on node1 has an "Optimal Link State" value of "NO". This is likely because the device is not seated properly in its PCIe slot. You can see more details by comparing the LnkCap and LnkSta output of the lspci command for this device. The NVMe device on node2 has an "Optimal LBA Formats" value of "NO". You can view the available format values and the value currently in use with the nvme id-ns command for the NVMe device.
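The two follow-up checks can be sketched as small filters over the output of lspci and nvme id-ns. This is an illustrative sketch under stated assumptions: link_check and lba_in_use are hypothetical helper names, the lspci input is assumed to cover a single device, and the PCI address and namespace path in the usage comments are placeholders for your own values.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci -vv` output for ONE device on stdin and
# reports whether the negotiated link (LnkSta) matches the capability (LnkCap).
link_check() {
  awk '
    # Return the text matched by re, minus its first "skip" characters.
    function field(line, re, skip) {
      if (match(line, re)) return substr(line, RSTART + skip, RLENGTH - skip)
      return "?"
    }
    /LnkCap:/ { cap_speed = field($0, "Speed [0-9.]+GT/s", 6); cap_width = field($0, "Width x[0-9]+", 6) }
    /LnkSta:/ { sta_speed = field($0, "Speed [0-9.]+GT/s", 6); sta_width = field($0, "Width x[0-9]+", 6) }
    END {
      if (cap_speed == "" || sta_speed == "") {
        print "unknown: no LnkCap/LnkSta lines found"
        exit
      }
      status = (cap_speed == sta_speed && cap_width == sta_width) ? "optimal" : "degraded"
      print status ": capable " cap_speed " " cap_width ", running " sta_speed " " sta_width
    }
  '
}

# Hypothetical helper: keeps only the LBA format line that nvme id-ns
# marks as "(in use)".
lba_in_use() {
  grep -F '(in use)'
}

# Usage (placeholders; typically requires root):
#   lspci -vv -s <pci-address> | link_check
#   nvme id-ns /dev/nvme0n1 | lba_in_use
```

A "degraded" result with a narrower width or lower speed than the capability is consistent with a device that is not fully seated, but it does not by itself identify the cause.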