IBM Support

PureData System for Analytics: Host Disk 'Failed Down'

Troubleshooting


Problem

When you log in to the system, nzhw -issues shows output like the following:

[nz@nzhost ~]$ nzhw -issues
Description   HW ID Location                   Role   State
------------- ----- -------------------------- ------ -------
HostDisk      1080  rack1.host2.hostDisk1      Failed Down
SASController 1085  rack1.host2.SASController0 Active Warning

Symptom


The SASController on the same host as the failed disk also shows a Warning state.
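
On a system with many hardware issues, the two related rows can be picked out with a small filter. The following is a minimal sketch, not part of the official procedure: the function name failed_host_disks is hypothetical, and it assumes the column layout shown in the Problem section (Description, HW ID, Location, Role, State) with no spaces inside the Location value.

```shell
# Hypothetical helper: reads `nzhw -issues` output on stdin and prints only
# HostDisk rows whose Role is Failed and SASController rows whose State is
# Warning. Assumes whitespace-separated columns as in the sample output.
failed_host_disks() {
  awk '($1 == "HostDisk" && $4 == "Failed") ||
       ($1 == "SASController" && $5 == "Warning") {
         print $1, $2, $3, $4, $5
       }'
}

# Usage (as the nz user):
#   nzhw -issues | failed_host_disks
```

If the filter prints a HostDisk row and a SASController row for the same host, the output matches the symptom described here.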

Cause

A host disk has failed.

Diagnosing The Problem

As the Linux 'root' user, verify that the drive has actually failed by running:

/opt/MegaRAID/MegaCli/MegaCli64 pdlist a0 | more

  • The output is long. Look through it for a drive whose Firmware state is Unconfigured(bad), as in this example:
Enclosure Device ID: 252
Slot Number: 1
Enclosure position: N/A
Device Id: 16
WWN: 5000CCA00AE01D6B
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 136.731 GB [0x11176d60 Sectors]
Non Coerced Size: 136.231 GB [0x11076d60 Sectors]
Coerced Size: 135.972 GB [0x10ff2000 Sectors]
Emulated Drive: No
Firmware state: Unconfigured(bad)
Device Firmware Level: C610
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000cca00ae01d69
SAS Address(1): 0x0
Connected Port Number: 6(path0)
Inquiry Data: IBM-ESXSCBRCA146C3ETS0 NC610PCYZ7XAECCXSA610
IBM FRU/CRU: 42D0422
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: Foreign
Foreign Secure: Drive is not secured by a foreign lock key
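
Rather than paging through the full listing, the relevant fields can be extracted with a short filter. This is a sketch under stated assumptions, not an IBM-provided tool: the function name list_bad_slots is hypothetical, and it assumes each drive's "Slot Number:" line appears before its "Firmware state:" line, as in the sample output above.

```shell
# Hypothetical helper: reads MegaCli `pdlist` output on stdin and prints the
# slot number of every drive whose firmware state is Unconfigured(bad).
list_bad_slots() {
  awk -F': *' '
    /^Slot Number:/    { slot = $2 }   # remember the slot of the current drive
    /^Firmware state:/ {
      if ($2 ~ /^Unconfigured\(bad\)/) print "slot " slot ": " $2
    }
  '
}

# Usage (as root), with the same command as above:
#   /opt/MegaRAID/MegaCli/MegaCli64 pdlist a0 | list_bad_slots
```

No output means no drive on that adapter reported Unconfigured(bad); in that case review the full listing by hand before concluding that the disk is healthy.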

Resolving The Problem

  1. Determine the system type: Skimmer (100), TwinFin (N1001), Striper (N2001/N2002), or Cruiser (C1000).
  2. Get the Replacement Procedures Guide for the corresponding machine type from the Netezza Documentation Page.
  3. From the Replacement Procedures Guide, get the correct FRU number that you will need:
    • *** Do NOT trust the FRU number in the MegaCli output (the "IBM FRU/CRU" field), as this is a generic FRU and may not be correct ***
    • The FRU number is listed at the beginning of the "Replacing a Host Disk Drive" chapter.
  4. The replacement requires NO outage. The system must be online during the replacement.
  5. Ask the customer when they would like to perform the host disk drive replacement.
    • *** Please note that we need AT LEAST 4 hours' notice for US customers to order the part and get an SSR on-site ***
  6. Once you have the date and time for the replacement, fill out the TSS Ticket (Netezza Work Flow) Insert and ask for the SSR to come on-site with the needed FRU number. (Specify the quantity if you need more than one.)
  7. Generate and requeue a secondary to NZOPS,387 so that they may assign an SSR and open a service request for the host disk drive replacement.
  8. Once an SSR has been assigned, call the SSR and confirm that they have the correct FRU.
  9. Follow the steps in the Replacement Procedures Guide for the host disk drive replacement.

Document Information

Modified date:
17 October 2019

UID

swg21693785