Flashes (Alerts)
Abstract
Failure analysis and repair actions for ESS5000 NVDIMM errors
Content
- During IPL, SRC BC23352C is posted, calling out either all four NVDIMMs or the two NVDIMMs behind the same processor at mandatory priority. The system may still IPL with the NVDIMMs deconfigured.
OR - The log tip pdisks in the ESS Recovery Group of the corresponding ESS 5000 I/O Server node may go into the missing state, and the Recovery Group may fail to start. The log tip devices may contain key metadata of the Recovery Group. If sufficient replicas of that metadata cannot be accessed, the Recovery Group containing the user disks for the file system cannot be activated.
Hex Words 6-9: 00000002 0003001E 00000000 00000000
Fri Sep 25 12:08:12.155 2020 dsk2c-io2 ST [W] Log tip DA NVR of RG rg_dsk2c-io2: insufficient spare space to complete rebalance. Unavailable disks in this DA may cause performance degradation.
Fri Sep 25 12:08:12.149 2020 dsk2c-io2 ST [I] Start rebalance of DA NVR in RG rg_dsk2c-io2.
Fri Sep 25 12:08:12.148 2020 dsk2c-io2 ST [D] Pdisk n003v002 of RG rg_dsk2c-io2 state changed from missing/00048.0c0 to missing/undrainable/00048.0d0.
Fri Sep 25 12:08:12.148 2020 dsk2c-io2 ST [W] Log tip DA NVR of RG rg_dsk2c-io2: insufficient spare space to complete rebuild. Unavailable disks in this DA may cause performance degradation.
Fri Sep 25 12:08:12.129 2020 dsk2c-io2 ST [I] Finished repairing RGD/VCD in RG rg_dsk2c-io2.
Fri Sep 25 12:08:12.011 2020 dsk2c-io2 ST [I] Start repairing RGD/VCD in RG rg_dsk2c-io2.
Fri Sep 25 12:08:11.689 2020 dsk2c-io2 ST [D] Pdisk n003v002 of RG rg_dsk2c-io2 state changed from diagnosing/00020.0c0 to missing/00048.0c0.
2020-11-11_11:33:55.926+0100: [I] Beginning log tip recovery for LG root of RG rg_nsdibm13g.
2020-11-11_11:33:55.929+0100: [E] Unable to read logTip vdisk rg_nsdibm13g_logtip track 1 due to fatal pdisk IO errors!
2020-11-11_11:33:55.929+0100: [E] Unable to read logTip vdisk rg_nsdibm13g_logtip track 3 due to fatal pdisk IO errors!
2020-11-11_11:33:55.929+0100: [E] Unable to read logTip vdisk rg_nsdibm13g_logtip track 2 due to fatal pdisk IO errors!
2020-11-11_11:33:55.938+0100: [E] Unable to read logTip vdisk rg_nsdibm13g_logtip track 0 due to fatal pdisk IO errors!
2020-11-11_11:33:55.938+0100: [E] Beginning to resign log group root in recovery group rg_nsdibm13g due to "recovery
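The log tip read failures shown above follow a regular pattern, so a saved log excerpt can be scanned for them. The following is a minimal sketch, not an IBM-provided tool: the function name `failed_logtip_tracks` is hypothetical, and the regular expression assumes the message wording is exactly as in the excerpt above.

```python
import re
from collections import defaultdict

# Assumption: the message text matches the excerpt shown in this document.
LOGTIP_ERROR = re.compile(
    r"\[E\] Unable to read logTip vdisk (\S+) track (\d+) "
    r"due to fatal pdisk IO errors"
)


def failed_logtip_tracks(log_text):
    """Return {logtip vdisk name: sorted list of unreadable track numbers}."""
    tracks = defaultdict(set)
    for match in LOGTIP_ERROR.finditer(log_text):
        tracks[match.group(1)].add(int(match.group(2)))
    return {vdisk: sorted(nums) for vdisk, nums in tracks.items()}


# Sample taken from the log excerpt in this document.
SAMPLE_LOG = """\
2020-11-11_11:33:55.926+0100: [I] Beginning log tip recovery for LG root of RG rg_nsdibm13g.
2020-11-11_11:33:55.929+0100: [E] Unable to read logTip vdisk rg_nsdibm13g_logtip track 1 due to fatal pdisk IO errors!
2020-11-11_11:33:55.929+0100: [E] Unable to read logTip vdisk rg_nsdibm13g_logtip track 3 due to fatal pdisk IO errors!
2020-11-11_11:33:55.929+0100: [E] Unable to read logTip vdisk rg_nsdibm13g_logtip track 2 due to fatal pdisk IO errors!
2020-11-11_11:33:55.938+0100: [E] Unable to read logTip vdisk rg_nsdibm13g_logtip track 0 due to fatal pdisk IO errors!
"""

if __name__ == "__main__":
    for vdisk, nums in failed_logtip_tracks(SAMPLE_LOG).items():
        print(f"{vdisk}: unreadable tracks {nums}")
```

A vdisk that appears in the result with every one of its tracks unreadable corresponds to the second symptom described above, where the Recovery Group cannot be activated.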
mmvdisk pdisk list --rg all --da NVR --not-ok
                              declustered
recovery group  pdisk         array        paths  capacity  free space  FRU (type)       state
--------------  ------------  -----------  -----  --------  ----------  ---------------  -----
ess5k_7894DBA   n001v002      NVR              0    31 GiB      31 GiB  34GB NVRAM       missing/undrainable
ess5k_7894E4A   n001v001      NVR              0    31 GiB      31 GiB  34GB NVRAM       missing/undrainable
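When the listing above is captured to a text file, the affected log tip pdisks can be extracted with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `missing_nvr_pdisks` is hypothetical, and the parsing assumes the column order shown above (recovery group first, pdisk second, declustered array third, state last).

```python
def missing_nvr_pdisks(listing):
    """Return (recovery group, pdisk, state) for NVR pdisks in a missing state.

    Assumption: each data row is whitespace-separated with the recovery group,
    pdisk name, and declustered array as the first three fields and the pdisk
    state as the last field, as in the sample output in this document.
    """
    results = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip header, separator, and blank lines: only data rows have
        # "NVR" in the declustered array column.
        if len(fields) < 4 or fields[2] != "NVR":
            continue
        state = fields[-1]
        if "missing" in state.split("/"):
            results.append((fields[0], fields[1], state))
    return results


# Sample rows copied from the output shown in this document.
SAMPLE_LISTING = """\
recovery group pdisk array paths capacity free space FRU (type) state
-------------- ------------ ----------- ----- -------- ---------- --------------- -----
ess5k_7894DBA n001v002 NVR 0 31 GiB 31 GiB 34GB NVRAM missing/undrainable
ess5k_7894E4A n001v001 NVR 0 31 GiB 31 GiB 34GB NVRAM missing/undrainable
"""

if __name__ == "__main__":
    for rg, pdisk, state in missing_nvr_pdisks(SAMPLE_LISTING):
        print(f"{rg}/{pdisk}: {state}")
```

The `recovery group/pdisk` form printed here matches the identifiers that `mmhealth` reports for the same pdisks.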
mmhealth node show NATIVE_RAID PHYSICALDISK
ess5k_7894E4A/e3s105 HEALTHY 3 days ago -
ess5k_7894E4A/n001v001 DEGRADED 3 days ago gnr_pdisk_missing(ess5k_7894E4A/n001v001)
ess5k_7894E4A/n002v001 HEALTHY 3 days ago -
Event Parameter Severity Active Since Event Message
--------------------------------------------------------------------------------------------------------------------------------
gnr_pdisk_missing ess5k_7894DBA/n001v002 WARNING 3 days ago GNR pdisk ess5k_7894DBA/n001v002 is missing
gnr_pdisk_replaceable ess5k_7894E4A/e3s005 ERROR 3 days ago GNR pdisk ess5k_7894E4A/e3s005 is replaceable
gnr_pdisk_missing ess5k_7894E4A/n001v001 WARNING 3 days ago GNR pdisk ess5k_7894E4A/n001v001 is missing
If you cannot apply the above PTF level, contact IBM Service.
Updating the NVDIMM firmware takes approximately eight minutes per NVDIMM on ESS I/O servers 5105-22E. An NVDIMM firmware update, if required, occurs only on the first system boot after a system firmware update or an NVDIMM replacement. Each server has four NVDIMMs, so the system boot might take up to an extra 32 minutes in these cases.
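The boot-time estimate is simple arithmetic. It can be sketched as follows for planning a maintenance window, using the approximate figures stated above. The function name `extra_boot_minutes` is hypothetical, and the result is an estimate, not a guaranteed duration.

```python
MINUTES_PER_NVDIMM = 8   # approximate per-NVDIMM update time stated above
NVDIMMS_PER_SERVER = 4   # NVDIMMs in one I/O server, as stated above


def extra_boot_minutes(nvdimms_needing_update):
    """Estimate extra boot time when some NVDIMMs need a firmware update."""
    if not 0 <= nvdimms_needing_update <= NVDIMMS_PER_SERVER:
        raise ValueError("a server has between 0 and 4 NVDIMMs to update")
    return nvdimms_needing_update * MINUTES_PER_NVDIMM


if __name__ == "__main__":
    print(extra_boot_minutes(1))  # 8: a single replaced NVDIMM
    print(extra_boot_minutes(4))  # 32: all four NVDIMMs updated
```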
Note:
If the error recurs against the same NVDIMMs within a few days or weeks after you perform the procedures in the workaround, do not repeat the procedure or replace the NVDIMM. Contact IBM Service instead.
Document Information
Modified date:
21 May 2021
UID
ibm16450863