Question & Answer
Question
What steps are needed to prepare a host for repair of a disk hardware issue?
Cause
You notice a red error indicator on a host in the vCenter web interface. The indicator signals a disk problem on one host in a vCenter Server instance that uses vSAN storage. The VMware version is 6.5 or 6.7, and the vSAN storage is backed by SSDs.
In an SSH session on that host, the output of the command "/opt/lsi/storcli/storcli /c0 show" lists one physical disk with the state "UBad" or "Failed" instead of "Onln" (online). In the sample output that follows, the disk in slot 8:7 is in the "Failed" state.
PD LIST :
=======
----------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp
----------------------------------------------------------------------------------
8:0 9 Onln 0 931.0 GB SATA HDD N N 512B ST1000NM0033-9ZM173 U
8:1 12 Onln 0 931.0 GB SATA HDD N N 512B ST1000NM0033-9ZM173 U
8:2 16 Onln 1 893.75 GB SATA SSD Y N 512B Micron_5100_MTFDDAK960TCC U
8:3 18 Onln 2 1.745 TB SATA SSD Y N 512B Micron_5100_MTFDDAK1T9TCC U
8:4 19 Onln 3 1.745 TB SATA SSD Y N 512B Micron_5100_MTFDDAK1T9TCC U
8:5 17 Onln 4 893.75 GB SATA SSD Y N 512B Micron_5100_MTFDDAK960TCC U
8:6 13 Onln 5 1.745 TB SATA SSD Y N 512B Micron_5100_MTFDDAK1T9TCC U
8:7 15 Failed 6 1.745 TB SATA SSD Y N 512B Micron_5100_MTFDDAK1T9TCC U
----------------------------------------------------------------------------------
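On a host with many drives, the bad disk can be picked out of the PD LIST programmatically. The following is a minimal sketch, not part of the original procedure: the function name `find_unhealthy_disks` is hypothetical, and the sketch assumes that every data row starts with `EID:Slt DID State` as in the sample output above. It reports any state other than "Onln", so states that this article does not cover are also flagged for inspection.

```python
import re

# Matches the start of a PD LIST data row: "EID:Slt DID State ..."
ROW = re.compile(r"^(\d+):(\d+)\s+(\d+)\s+(\S+)\s")


def find_unhealthy_disks(storcli_output):
    """Return (enclosure, slot, device_id, state) for each disk that is not 'Onln'."""
    unhealthy = []
    for line in storcli_output.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, separator, or unrelated line
        enclosure, slot, device_id, state = match.groups()
        if state != "Onln":
            unhealthy.append((int(enclosure), int(slot), int(device_id), state))
    return unhealthy


# Two rows copied from the sample output in this article.
SAMPLE = """\
8:6 13 Onln 5 1.745 TB SATA SSD Y N 512B Micron_5100_MTFDDAK1T9TCC U
8:7 15 Failed 6 1.745 TB SATA SSD Y N 512B Micron_5100_MTFDDAK1T9TCC U
"""

print(find_unhealthy_disks(SAMPLE))  # [(8, 7, 15, 'Failed')]
```

The command output can be saved to a file on the host and passed to the function as a string.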
Answer
To address a disk problem for an IBM Cloud for VMware Solutions instance that includes vSAN, complete these steps by using the details that follow:
- Put the host into maintenance mode.
- Remove the disk group for the disk in error. This step applies only if the disk in error is a vSAN disk. If the failed drive is HDD0, which is the RAID1 drive that hosts the operating system, skip this step because it does not apply.
- Create a support case to engage the IBM Cloud for VMware Solutions team to complete the disk repair or replacement and the reconfiguration.
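The branching in the list above can be sketched as a small helper. This is an illustration only: the function name `plan_repair_steps` is hypothetical, and it assumes, as in the sample output in this article, that the "Med" column distinguishes the two cases (SSD for a vSAN disk, HDD for the RAID1 operating system drive). Confirm that assumption against your own host before relying on it.

```python
def plan_repair_steps(failed_media):
    """Return the ordered preparation steps for a failed drive.

    failed_media is the 'Med' column value from the storcli PD LIST.
    Assumption: "SSD" means a vSAN disk and "HDD" means the RAID1
    operating system drive, as in this article's sample output.
    """
    steps = ["Put the host into maintenance mode."]
    if failed_media.upper() == "SSD":
        # Only vSAN disks belong to a disk group that must be removed.
        steps.append("Remove the disk group for the disk in error.")
    steps.append("Create a support case for the IBM Cloud for VMware Solutions team.")
    return steps


for number, step in enumerate(plan_repair_steps("SSD"), start=1):
    print(f"{number}. {step}")
```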
Here are the detailed steps:
1. Put the host into maintenance mode.
- If Zerto is installed, refer to the article "How to Place a Host with an Associated VRA into Maintenance Mode"
- Follow the instructions in the VMware Docs article titled "Place a host in maintenance mode"
2. Remove the disk group for the drives that need to be replaced.
- Select the Storage tab in the vCenter Web client Navigator
- Expand the cluster for the host with the disk issue, and select the vsanDatastore in that cluster
- Click the Configure tab on the right
- Select Device Backing
- Select the Disk Group below the host that contains the SSD drive in error
- Click the Remove Disk Group icon (third from the left), and wait for that task to complete
- After you remove the Disk Group, errors that indicate vSAN irregularities might appear on the Cluster tab. These errors are resolved by the disk repair.
- More information on removing the disk group is available in the article "VMware Virtual SAN Operations: Replacing Disk Devices" on the VMware Blogs site. Refer particularly to the section "Flash Device Decommission Procedure from the vSphere Web Client."
3. Go to the Support Center in the IBM Cloud console to create a support case.
- Your case is routed to the IBM Cloud for VMware Solutions team, which completes the disk repair or replacement and the reconfiguration.
- Consider authorizing a cold swap in the text that you add to the case details. A cold swap simplifies the repair because the support teams can power down the host as part of the disk repair.
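The case details from step 3 can be assembled from the values gathered earlier. The following is a minimal sketch under stated assumptions: the function `draft_case_details`, its parameters, and the host name `esxi-host-01.example.com` are hypothetical, and the wording is only a suggestion. IBM does not prescribe a format for the case text in this article.

```python
def draft_case_details(host, slot, state, model, disk_group_removed, authorize_cold_swap):
    """Build a plain-text description for the support case."""
    lines = [
        f"Host: {host}",
        f"Disk in error: slot {slot}, state {state}, model {model}",
        "The host is in maintenance mode.",
    ]
    if disk_group_removed:
        # Applies to vSAN disks only, not to the RAID1 operating system drive.
        lines.append("The vSAN disk group that contained the disk has been removed.")
    if authorize_cold_swap:
        lines.append("A cold swap is authorized: the host can be powered down for the repair.")
    else:
        lines.append("A cold swap is not authorized.")
    return "\n".join(lines)


# Slot, state, and model come from the sample output; the host name is made up.
print(draft_case_details(
    host="esxi-host-01.example.com",
    slot="8:7",
    state="Failed",
    model="Micron_5100_MTFDDAK1T9TCC",
    disk_group_removed=True,
    authorize_cold_swap=True,
))
```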
Document Information
Modified date:
30 July 2020
UID
ibm16254778