Monitoring the endurance of SSD Devices
You can monitor the endurance of the SSDs in your system by using the mmhealth command.
An SSD or physical disk has a finite lifetime based on the number of drive writes per day. The
SSD endurance is a number between 0 and 255. The ssd-endurance-percentage value
indicates the percentage of the drive's life that has been used. The value 0 indicates that full life
remains, and 100 indicates that the drive is at or past its end of life. When the endurance number
exceeds 100, the mmhealth command displays an ssd_endurance_warn warning with the physical disk
name and the recovery group name, and reports the health state of the drive as DEGRADED. The
drive must be replaced when the value exceeds 100.
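The value ranges described above can be sketched as a small helper. This is an illustration only, not part of mmhealth: the function name classify_endurance and the labels "ok", "end_of_life", and "replace" are hypothetical, and the mapping simply restates the text (0 means full life remains, 100 means at or past end of life, above 100 means the drive must be replaced).

```python
def classify_endurance(value: int) -> str:
    """Map an ssd-endurance-percentage value to a coarse, hypothetical label.

    The labels are assumptions made for illustration; only the numeric
    boundaries (0, 100, 255) come from the documentation text.
    """
    if not 0 <= value <= 255:
        # The documented range of the endurance number is 0 to 255.
        raise ValueError(f"endurance value out of range: {value}")
    if value > 100:
        # Above 100 the drive must be replaced and a warning is raised.
        return "replace"
    if value == 100:
        # Exactly 100 means the drive is at or past its end of life.
        return "end_of_life"
    return "ok"


if __name__ == "__main__":
    for sample in (0, 57, 100, 101):
        print(sample, classify_endurance(sample))
```

A monitoring script could use such a helper to decide when to schedule a replacement before the warning is raised, assuming it already has the endurance value from some other source.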
If the endurance number exceeds 100, the system gives an output similar to the following:
[root@client21 ~]# mmhealth node show NATIVE_RAID
Node name: client21.sonasad.almaden.ibm.com
Component        Status      Status Change    Reasons
----------------------------------------------------------------------------------------------------------------
NATIVE_RAID      DEGRADED    Now              ssd_endurance_warn(rg1/n001p013)
ARRAY            HEALTHY     Now              -
NVME             HEALTHY     1 hour ago       -
PHYSICALDISK     DEGRADED    Now              ssd_endurance_warn(rg1/n001p013)
RECOVERYGROUP    HEALTHY     Now              -
VIRTUALDISK      HEALTHY     Now              -
After the issue is resolved, the system gives an output similar to the following:
[root@client21 ~]# mmhealth node show NATIVE_RAID
Node name: client21.sonasad.almaden.ibm.com
Component        Status      Status Change    Reasons
--------------------------------------------------------------------
NATIVE_RAID      HEALTHY     Now              -
ARRAY            HEALTHY     Now              -
NVME             HEALTHY     1 hour ago       -
PHYSICALDISK     HEALTHY     Now              -
RECOVERYGROUP    HEALTHY     Now              -
VIRTUALDISK      HEALTHY     Now              -
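To act on this output automatically, the captured text can be scanned for ssd_endurance_warn events. The following is a minimal sketch, not an official tool. It assumes that the event argument has the form shown in the sample output and that the part before the slash is the recovery group and the part after it is the physical disk; the function name find_endurance_warnings and the SAMPLE text are made up for illustration.

```python
import re

# Assumption: events appear as ssd_endurance_warn(<recovery group>/<pdisk>),
# as in the sample output in this topic.
EVENT_RE = re.compile(r"ssd_endurance_warn\(([^/()\s]+)/([^()\s]+)\)")

# Sample text modeled on the documented output (whitespace may differ).
SAMPLE = """\
Component        Status      Status Change    Reasons
NATIVE_RAID      DEGRADED    Now              ssd_endurance_warn(rg1/n001p013)
ARRAY            HEALTHY     Now              -
PHYSICALDISK     DEGRADED    Now              ssd_endurance_warn(rg1/n001p013)
"""


def find_endurance_warnings(output: str) -> list[tuple[str, str]]:
    """Return unique (recovery_group, pdisk) pairs, in order of first appearance."""
    found: list[tuple[str, str]] = []
    for pair in EVENT_RE.findall(output):
        # The same event is listed on several component rows; keep it once.
        if pair not in found:
            found.append(pair)
    return found


if __name__ == "__main__":
    for rg, pdisk in find_endurance_warnings(SAMPLE):
        print(f"replace candidate: recovery group {rg}, pdisk {pdisk}")
```

On a live node, the text could instead be captured from the mmhealth node show NATIVE_RAID command shown above, for example with subprocess.run. An empty result corresponds to the healthy output.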