Replacing the OSDs

When disks fail, you can replace the physical storage device and reuse the same OSD ID to avoid having to reconfigure the CRUSH map.

You can replace OSDs in the cluster while preserving the OSD ID by using the ceph orch osd rm command with the --replace option.
Note: To replace a single OSD, see Deploying Ceph OSDs on specific devices and hosts. To deploy OSDs on all available devices, see Deploying Ceph OSDs on all available devices.
The OSD is not permanently removed from the CRUSH hierarchy, but is assigned the destroyed flag. This flag is used to determine which OSD IDs can be reused in the next OSD deployment.

If you use an OSD specification for deployment, the newly added disk is assigned the OSD ID of the disk it replaces.
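
As a rough illustration of how the destroyed flag can be spotted, the following sketch filters the plain-text output of ceph osd tree for OSDs whose status is destroyed. The helper name list_destroyed_osds is hypothetical, and the sketch assumes that the status appears in the column immediately after the OSD name, which might differ between releases.

```shell
# Hypothetical helper: print the names of OSDs marked "destroyed".
# Reads plain `ceph osd tree` output on stdin.
# Assumption: the STATUS value directly follows the osd.N name column.
list_destroyed_osds() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "destroyed")
        print $i
  }'
}

# Intended use inside the cephadm shell:
#   ceph osd tree | list_destroyed_osds
```

OSDs listed by this helper are the ones whose IDs are candidates for reuse in the next deployment.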

Prerequisites

  • A running IBM Storage Ceph cluster.

  • Hosts are added to the cluster.

  • Monitor, Manager, and OSD daemons are deployed on the storage cluster.

Procedure

  1. Log in to the cephadm shell:

    Example

    [root@host01 ~]# cephadm shell
  2. Check the device and the node from which the OSD has to be replaced:

    Example

    [ceph: root@host01 /]# ceph osd tree
  3. Set the OSD service specification to the unmanaged state:
    Note: Set the unmanaged parameter to true in the OSD service specification before removal. If you skip this step, cephadm may automatically redeploy OSDs, causing conflicts between orchestrator-driven deployment and OSD removal.

    Example

    service_type: osd
    service_id: <service_id>
    unmanaged: true
  4. Apply the updated specification:

    ceph orch apply -i <osds.yaml>
  5. Run the ceph orch ls command to confirm that the OSD service is set to the unmanaged state.
  6. Replace the OSD:
    Important: If the storage cluster reports HEALTH_WARN or other errors, investigate and fix them before replacing the OSD to avoid data loss.

    Syntax

    ceph orch osd rm OSD_ID --replace

    Example

    [ceph: root@host01 /]# ceph orch osd rm 0 --replace
  7. Check the status of the OSD replacement:

    Example

    [ceph: root@host01 /]# ceph orch osd rm status
  8. After you replace the disk, set the OSD service specification back to the managed state so that cephadm can resume orchestration:
    1. Update the OSD service specification.
      unmanaged: false
    2. Apply the updated specification. This step allows cephadm to redeploy the OSD on the new device while reusing the existing OSD ID.
      ceph orch apply -i <osds.yaml>
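
The unmanaged toggle in steps 3 and 8 can be scripted rather than edited by hand. The following is a minimal sketch, not a supported tool: the helper name set_unmanaged is hypothetical, and it assumes that the file holds a single service specification with unmanaged as a top-level key.

```shell
# Hypothetical helper: set the top-level "unmanaged" key in a spec file.
#   $1 = path to the service specification file
#   $2 = true or false
# Assumption: the file contains exactly one service specification.
set_unmanaged() {
  spec_file=$1
  value=$2
  case "$value" in
    true|false) ;;
    *) echo "value must be true or false" >&2; return 1 ;;
  esac
  tmp=$(mktemp) || return 1
  # Replace an existing key, or append the key if it is missing.
  awk -v v="$value" '
    /^unmanaged:/ { print "unmanaged: " v; found = 1; next }
    { print }
    END { if (!found) print "unmanaged: " v }
  ' "$spec_file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back in place so the file keeps its ownership and permissions.
  cat "$tmp" > "$spec_file" && rm -f "$tmp"
}

# Intended use inside the cephadm shell, before and after the replacement:
#   set_unmanaged osds.yaml true  && ceph orch apply -i osds.yaml
#   set_unmanaged osds.yaml false && ceph orch apply -i osds.yaml
```

Keeping the rest of the specification untouched means that the same file can be reapplied in step 8 with only the unmanaged value changed.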

Verification

  • Verify the details of the devices and the nodes from which the Ceph OSDs are replaced:

    Example

    [ceph: root@host01 /]# ceph osd tree

    An OSD with the same ID as the one you replaced is running on the same host.
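
The verification can also be checked from a script. The following sketch prints the status that ceph osd tree reports for one OSD ID. The helper name osd_status is hypothetical, and the sketch assumes that the status value directly follows the OSD name in the plain-text output.

```shell
# Hypothetical helper: print the status reported for a single OSD.
#   $1 = numeric OSD ID, for example 0
# Reads plain `ceph osd tree` output on stdin.
# Assumption: the STATUS value directly follows the osd.N name column.
osd_status() {
  awk -v name="osd.$1" '{
    for (i = 1; i < NF; i++)
      if ($i == name) { print $(i + 1); exit }
  }'
}

# Intended use inside the cephadm shell:
#   ceph osd tree | osd_status 0
```

A status of up for the replaced ID suggests that the new device was deployed with the reused OSD ID, while destroyed suggests that the redeployment has not happened yet.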