IBM Support

Host disk warnings

Troubleshooting


Problem

PureData System for Analytics: NPS reports host disk warnings; however, when the host RAID controller log is checked, all disks report Online.

Symptom

Description HW ID Location              Role   State
----------- ----- --------------------- ------ -------
HostDisk    1222  rack1.host1.hostDisk7 Spare  Warning
HostDisk    1229  rack1.host2.hostDisk1 Active Warning
HostDisk    1230  rack1.host2.hostDisk2 Active Warning
HostDisk    1255  rack1.host2.hostDisk3 Active Warning

Cause

This is due to "Powersave" being enabled on the host disks.

Environment

TwinFin hosts with MegaRaid adapters

Diagnosing The Problem

In some older configurations, host disks are in a Warning state, but the DSA output shows all disks reporting "OK".

Description HW ID Location              Role   State
----------- ----- --------------------- ------ -------
HostDisk    1222  rack1.host1.hostDisk7 Spare  Warning
HostDisk    1229  rack1.host2.hostDisk1 Active Warning
HostDisk    1230  rack1.host2.hostDisk2 Active Warning
HostDisk    1255  rack1.host2.hostDisk3 Active Warning
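To pull only the affected locations out of a listing like the one above, a small filter can help. This is a minimal sketch that assumes the listing is available as plain text in the column order shown; the function name `hostdisks_in_warning` is hypothetical and not part of any NPS tooling.

```shell
# Hypothetical helper: print the Location and Role of every HostDisk row
# whose State column is "Warning". Reads the hardware listing on stdin.
hostdisks_in_warning() {
    awk '$1 == "HostDisk" && $NF == "Warning" { print $3, $4 }'
}
```

For example, `hostdisks_in_warning < hw_listing.txt`, where `hw_listing.txt` is a hypothetical file holding the saved listing.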

In the MegaCLI pdlist output, the "Other Error" count has incremented while the "Media Error" count stays at zero.

Media Error Count: 0
Other Error Count: 61
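The pattern described above (Other errors present, Media errors absent) can be checked across all drives with a short filter. This is a sketch only: the function name `powersave_suspects` is hypothetical, and it assumes each drive's section of the pdlist output has a "Slot Number:" line ahead of its error counters.

```shell
# Hypothetical helper: read MegaCLI pdlist output on stdin and report drives
# whose "Other Error Count" is non-zero while "Media Error Count" is zero.
# Assumption: a "Slot Number:" line precedes the counters for each drive.
powersave_suspects() {
    awk -F': *' '
        /^Slot Number/       { slot = $2 }
        /^Media Error Count/ { media = $2 + 0 }
        /^Other Error Count/ {
            other = $2 + 0
            if (media == 0 && other > 0)
                print "slot " slot ": media=" media " other=" other
        }
    '
}
```

A drive that shows up here is only a candidate; the event log still needs to confirm that powersave transitions are the source of the errors.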

When a disk goes into powersave and spins down, a check condition is reported for that disk every time the controller has to wake up the disk for selection. These check conditions increment the "Other" error count. You can check for the corresponding powersave events in the adpevtlog. You should see a sequence of events similar to the following:

Power state change on PD 09(e0xfc/s6) from ON(0) to POWERSAVE(1) <----Disk goes to sleep, aka spindown/powersave
Unexpected sense: PD 09(e0xfc/s6) Path 500000e117505132, CDB: 00 00 00 00 00 00, Sense: 2/04/02 <----Check condition reporting "LUN not ready, INIT CMD REQUIRED"
Power state change on PD 09(e0xfc/s6) from POWERSAVE(1) to TRANSITION(ff) <----Disk spin-up requested
Unexpected sense: PD 09(e0xfc/s6) Path 500000e117505132, CDB: 00 00 00 00 00 00, Sense: 2/04/11 <----Check condition reporting "LUN not ready, SPIN-UP REQUIRED"
Unexpected sense: PD 09(e0xfc/s6) Path 500000e117505132, CDB: 00 00 00 00 00 00, Sense: 2/04/01 <----Check condition reporting "LUN in process of becoming ready"
Power state change on PD 09(e0xfc/s6) from TRANSITION(ff) to ON(0) <----Disk is now ready
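To see how often each drive spins down, the event log can be summarized per physical drive. The sketch below assumes the log has been saved as text with lines in the form shown above; the function names `count_spindowns` and `decode_sense` are hypothetical, and the sense descriptions are limited to the three annotated in this note.

```shell
# Hypothetical helper: count spin-down events per physical drive in an
# adapter event log read from stdin.
count_spindowns() {
    awk '
        /Power state change on PD/ && /to POWERSAVE\(1\)/ {
            # The token after "PD" identifies the drive, e.g. 09(e0xfc/s6)
            for (i = 1; i < NF; i++)
                if ($i == "PD") { n[$(i + 1)]++; break }
        }
        END { for (pd in n) print pd, n[pd] }
    ' | sort
}

# Translate the sense triplets annotated above into their descriptions.
decode_sense() {
    case "$1" in
        2/04/02) echo "LUN not ready, INIT CMD REQUIRED" ;;
        2/04/11) echo "LUN not ready, SPIN-UP REQUIRED" ;;
        2/04/01) echo "LUN in process of becoming ready" ;;
        *)       echo "not covered in this note" ;;
    esac
}
```

For example, `count_spindowns < adpevtlog.txt` prints one line per drive with its spin-down count, where `adpevtlog.txt` is a hypothetical saved copy of the log.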

Resolving The Problem

This issue is likely to be seen on any M5000 series controller with firmware below level 12.12.0-0085. If the controller is below that level, you will need to manually disable powersave by running the MegaCLI utility with the arguments below. Firmware 12.12.0-0085 disables powersave for all disk states.
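Once the controller's firmware level is known, the comparison against 12.12.0-0085 can be scripted. This is a sketch under assumptions: the function name `fw_below` is hypothetical, it expects version strings made of numeric fields separated by "." or "-", and how the current level is obtained is left to the reader.

```shell
# Hypothetical helper: succeed (exit 0) when version $1 is lower than version $2.
# Fields are split on "." and "-" and compared numerically, left to right.
fw_below() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, /[.-]/)
        nb = split(b, y, /[.-]/)
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1   # equal versions are not "below"
    }'
}
```

For example, `fw_below "$current" 12.12.0-0085 && echo "manual powersave disable needed"`, where `$current` holds the level read from the controller.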

NOTE: You MUST run these for the Configured Drives and the Hot Spares. Running them for the Unconfigured Drives is optional, since in theory the host should never have an unconfigured drive.

These ./MegaCli64 arguments will display the current settings:

Configured Drives:
-AdpGetProp -DefaultLdPSPolicy -aALL
-LDGetProp -PSPolicy -LALL -aALL

Unconfigured Drives:
-AdpGetProp -EnblSpinDownUnConfigDrvs -aALL

Hot Spares:
-AdpGetProp -DsblSpinDownHSP -aALL
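The argument lists above can be turned into complete command lines with a small helper. This is a sketch: the function name `powersave_query_cmds` and the `MEGACLI` variable are hypothetical, and the default path simply follows the "./MegaCli64" form used in this note.

```shell
# Path to the utility; "./MegaCli64" follows this note, adjust as needed.
MEGACLI="${MEGACLI:-./MegaCli64}"

# Print one complete command line per powersave query listed above.
powersave_query_cmds() {
    for args in \
        "-AdpGetProp -DefaultLdPSPolicy -aALL" \
        "-LDGetProp -PSPolicy -LALL -aALL" \
        "-AdpGetProp -EnblSpinDownUnConfigDrvs -aALL" \
        "-AdpGetProp -DsblSpinDownHSP -aALL"
    do
        printf '%s %s\n' "$MEGACLI" "$args"
    done
}
```

Printing the commands first makes them easy to review; piping the output to `sh` from the directory that contains MegaCli64 would then run them.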


These ./MegaCli64 arguments will disable power save:

Configured Drives:
-AdpSetProp -DefaultLdPSPolicy -None -aALL
-LDSetPowerPolicy -None -LALL -aALL

Unconfigured Drives:
-AdpSetProp -EnblSpinDownUnConfigDrvs -0 -aALL

Hot Spares:
-AdpSetProp -DsblSpinDownHSP -0 -aALL
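Because these commands change controller settings, it can be useful to preview them before running anything. The wrapper below is a sketch only: `run_megacli`, `disable_powersave`, `MEGACLI`, and `DRY_RUN` are hypothetical names, and the argument values are the ones given in this note, so confirm their meaning against the MegaCLI help output for your version before executing.

```shell
MEGACLI="${MEGACLI:-./MegaCli64}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run_megacli() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$MEGACLI $*"
    else
        "$MEGACLI" "$@"
    fi
}

disable_powersave() {
    # Configured drives (required)
    run_megacli -AdpSetProp -DefaultLdPSPolicy -None -aALL
    run_megacli -LDSetPowerPolicy -None -LALL -aALL
    # Hot spares (required)
    run_megacli -AdpSetProp -DsblSpinDownHSP -0 -aALL
    # Unconfigured drives (optional)
    run_megacli -AdpSetProp -EnblSpinDownUnConfigDrvs -0 -aALL
}
```

Calling `disable_powersave` with the default `DRY_RUN=1` prints the four command lines; setting `DRY_RUN=0` executes them against all adapters.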


NOTE: To clear the warnings for the host disks, you will need to reboot the host. The reboot re-initializes the host RAID controller and zeroes out the Other Error count.

Document Information

More support for:
IBM PureData System

Software version:
1.0.0

Document number:
254765

Modified date:
17 October 2019

UID

swg21687996