Simulating a node failure
To simulate a hard node failure, power off the node and reinstall the operating system.
Prerequisites
-
A running IBM Storage Ceph cluster.
-
Root-level access to all nodes on the storage cluster.
Procedure
-
Check the storage cluster’s capacity to understand the impact of removing the node:
Example
[ceph: root@host01 /]# ceph df
[ceph: root@host01 /]# rados df
[ceph: root@host01 /]# ceph osd df
-
Optionally, prevent the cluster from marking the node's OSDs out, which would trigger rebalancing, and pause scrubbing:
Example
[ceph: root@host01 /]# ceph osd set noout
[ceph: root@host01 /]# ceph osd set noscrub
[ceph: root@host01 /]# ceph osd set nodeep-scrub
-
Shut down the node.
-
If you are changing the host name, remove the node from the CRUSH map:
Example
[ceph: root@host01 /]# ceph osd crush rm host03
-
Check the status of the storage cluster:
Example
[ceph: root@host01 /]# ceph -s
-
Reinstall the operating system on the node.
-
Add the new node:
-
For more information, see Adding hosts.
-
Optionally, if you set the flags earlier, clear them so that the cluster can again mark OSDs out and resume scrubbing:
Example
[ceph: root@host01 /]# ceph osd unset noout
[ceph: root@host01 /]# ceph osd unset noscrub
[ceph: root@host01 /]# ceph osd unset nodeep-scrub
-
Check Ceph’s health:
Example
[ceph: root@host01 /]# ceph -s
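The flag handling and the capacity check in this procedure could be wrapped in small shell helpers, sketched below. This is only an illustrative sketch, not part of the product: the names `run`, `maintenance_flags`, `projected_usage_pct`, and the `DRY_RUN` variable are hypothetical, and the script assumes that the `ceph` command is on the path, for example inside a `cephadm shell` session. The capacity estimate is a rough integer calculation that assumes the used data is spread evenly over the remaining raw capacity. Take its inputs from the `ceph df` and `ceph osd df` output, all in the same unit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the procedure above; none of these names come from Ceph.

# The three flags that the procedure sets before shutdown and unsets afterwards.
FLAGS="noout noscrub nodeep-scrub"

# Run a command, or only print it when DRY_RUN=1 (useful for rehearsing the steps).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# maintenance_flags set|unset
# Applies or clears each flag in turn with "ceph osd set" or "ceph osd unset".
maintenance_flags() {
  action="$1"
  case "$action" in
    set|unset) ;;
    *) echo "usage: maintenance_flags set|unset" >&2; return 2 ;;
  esac
  for flag in $FLAGS; do
    run ceph osd "$action" "$flag" || return 1
  done
}

# projected_usage_pct USED TOTAL NODE
# Prints the estimated raw utilization (integer percent, rounded down) after
# the NODE capacity is removed from TOTAL, with USED data left unchanged.
projected_usage_pct() {
  used="$1"; total="$2"; node="$3"
  remaining=$(( total - node ))
  if [ "$remaining" -le 0 ]; then
    echo "remaining capacity must be positive" >&2
    return 1
  fi
  echo $(( used * 100 / remaining ))
}
```

For example, `DRY_RUN=1 maintenance_flags set` prints the three `ceph osd set` commands without running them, and `projected_usage_pct 600 1000 250` prints `80`. A projected value close to the cluster's full ratio suggests that the cluster might not absorb the loss of the node.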