Replacing the failed OSDs

With cluster-manager level of access on the dashboard, you can replace failed OSDs in an IBM Storage Ceph cluster. A key benefit of doing this from the dashboard is that the OSD IDs can be preserved when the failed OSDs are replaced.

Before you begin

Make sure that the following prerequisites are in place:
  • A running IBM Storage Ceph cluster.

  • At least cluster-manager level of access to the Ceph Dashboard.

  • At least one of the OSDs is down.
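
If command-line access to a node with the admin keyring is available, the last prerequisite can also be confirmed outside the dashboard. The following is a minimal sketch and not part of the dashboard procedure. The function name show_down_osds is hypothetical, and the sketch assumes that the ceph CLI is on the PATH, for example inside a cephadm shell.

```shell
# Hypothetical helper: report cluster health and list only the OSDs that are down.
# Assumes the ceph CLI is available, for example inside `cephadm shell`.
show_down_osds() {
    # Overall health, including the reason for any HEALTH_WARN state
    ceph health detail
    # CRUSH tree filtered to the OSDs that are in the down state
    ceph osd tree down
}
```

Calling show_down_osds prints the health detail followed by the filtered OSD tree. An empty tree suggests that no OSD is currently down.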

Procedure

  1. From the dashboard, identify the failed OSDs.
    The failed OSDs can be identified in any of the following ways:
    • Dashboard AlertManager pop-up notifications.
    • Dashboard landing page showing HEALTH_WARN status.
    • Dashboard landing page showing failed OSDs.
    • Dashboard OSD page showing failed OSDs.

      If one of the OSDs is down, you can also check for blinking LED lights on the physical drive.

    Figure 1 shows one down and one out OSD on the Ceph Dashboard landing page.
    Figure 1. Health status of OSDs on dashboard

    Screen showing health status of OSDs from the main dashboard.
  2. From Cluster > OSDs on the OSDs List table, select the out and down OSD.
    1. Click Flags from the action drop-down, select No Up in the Individual OSD Flags form, and click Update.
    2. Click Delete from the action drop-down. In the Delete OSD dialog, select Preserve OSD ID(s) for replacement and Yes, I am sure, and then click Delete OSD.
    3. Wait until the status of the OSD changes to out and destroyed.
  3. Optional: To change the No Up flag for the entire cluster, from the Cluster-wide configuration menu, select Flags.
    1. In the Cluster-wide configuration form, select No Up and click Update.
  4. Optional: If the OSDs are down due to a hard disk failure, replace the physical drive.
    • If the drive is hot-swappable, replace the failed drive with a new one.
    • If the drive is not hot-swappable and the host contains multiple OSDs, you might have to shut down the whole host and replace the physical drive. Consider preventing the cluster from backfilling. For more information, see Stopping and Starting Rebalancing.
    • When the drive appears under the /dev/ directory, make a note of the drive path.
    • If you want to add the OSD manually, find the OSD drive and format the disk.
    • If the new disk has data, zap the disk:
      ceph orch device zap HOST_NAME PATH --force
      For example:
      ceph orch device zap ceph-adm2 /dev/sdc --force
  5. From Cluster > OSDs on the OSDs List table, click Create.
  6. In the Advanced Mode section of the Create OSDs form, add a primary device.
    1. In the Primary devices dialog, select a Hostname filter.
    2. Select a device type from the list.
      Note: You have to select the Hostname first and then at least one filter to add the devices.

      For example, from the Hostname list, select Type and then hdd.

    3. From the device list, select Vendor, and then select ATA.
    4. Click Add.
    Figure 2 shows how to select the hdd type in the Primary devices filter.
    Figure 2. Using Primary devices filter

    This screen shows the Primary devices table. The filter is set to show 10 hostnames at a time and the Type is being changed from Any to hdd. Within the Primary devices table, the ceph-adm2 hostname is displayed with the HDD type.
  7. In the Create OSDs form, click Preview.
  8. In the OSD Creation Preview dialog, click Create.
    A notification confirms that the OSD was created successfully, and the OSD status changes to out and down.
  9. Select the newly created OSD that has out and down status.
    1. From the action drop-down, click Mark In.
    2. In the Mark OSD in notification, click Mark In.
      The OSD status changes to in.
    3. Click Flags from the action drop-down.
    4. Clear the No Up selection and click Update.
  10. Optional: If you previously set the No Up flag in the cluster-wide configuration, select Flags from the Cluster-wide configuration menu.
    1. In the Cluster-wide OSDs Flags form, clear the No Up selection and click Update.
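
For reference, the flag and removal steps of this procedure have command-line counterparts. The following is a minimal sketch under stated assumptions, not a replacement for the dashboard workflow. The function names are hypothetical, the OSD ID is supplied by the caller, and the mapping of the Preserve OSD ID(s) for replacement option to the --replace option of ceph orch osd rm is an assumption based on how the orchestrator handles replacements.

```shell
# Hypothetical helpers that mirror the dashboard steps. Assumes the ceph CLI is available.

# Step 2: set the noup flag on one OSD, then remove it while keeping its ID reserved.
destroy_osd_keep_id() {
    osd_id="$1"
    ceph osd add-noup "osd.${osd_id}"
    # --replace marks the OSD as destroyed instead of deleting its ID (assumed
    # counterpart of the "Preserve OSD ID(s) for replacement" option)
    ceph orch osd rm "${osd_id}" --replace
    # Show the progress of pending OSD removals
    ceph orch osd rm status
}

# Step 9: mark the re-created OSD in and clear its individual noup flag.
restore_osd() {
    osd_id="$1"
    ceph osd in "${osd_id}"
    ceph osd rm-noup "osd.${osd_id}"
}

# Steps 3 and 10 (optional): set or clear the noup flag for the entire cluster.
set_cluster_noup()   { ceph osd set noup; }
clear_cluster_noup() { ceph osd unset noup; }
```

For example, destroy_osd_keep_id 3 would be run before the physical drive is replaced, and restore_osd 3 after the new OSD is created. The ID 3 is illustrative only.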

What to do next

Verify that the destroyed OSD is re-created on the device and that its OSD ID is preserved.
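
This check can also be sketched from the command line. The function name verify_replaced_osd is hypothetical, the OSD ID is supplied by the caller, and the sketch assumes that the ceph CLI is available.

```shell
# Hypothetical helper: inspect a replaced OSD by its original ID.
verify_replaced_osd() {
    osd_id="$1"
    # The OSD is expected to appear under its original ID and no longer as destroyed
    ceph osd tree
    # Host and device details that the cluster records for this OSD
    ceph osd metadata "${osd_id}"
}
```

In the ceph osd tree output, look for the original ID on the expected host. The metadata output can be compared against the drive path that was noted during the replacement.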