Recommendations for adding or removing nodes
Understand these recommendations for adding or removing nodes.
IBM recommends adding or removing one OSD at a time within a node and allowing the storage cluster to recover before proceeding to the next OSD. This minimizes the impact on storage cluster performance. Note that if a node fails, you might need to replace the entire node at once rather than one OSD at a time.
To remove an OSD:
- See Removing the OSD daemons.
To add an OSD:
When adding or removing Ceph OSD nodes, consider that other ongoing processes also affect storage cluster performance. To reduce the impact on client I/O, IBM recommends the following:
Calculate capacity
Before removing a Ceph OSD node, ensure that the storage cluster can backfill the contents of all its OSDs without reaching the full ratio. Reaching the full ratio causes the storage cluster to refuse write operations.
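The capacity check reduces to simple arithmetic: the raw data currently stored must fit on the raw capacity that remains after the node is gone. The following is a minimal sketch written as a shell function, using awk for the floating-point division. It assumes that data redistributes evenly across the remaining OSDs (real clusters only approximate this) and that your CRUSH failure-domain rules still allow every placement group to be placed after the removal. The function name fits_after_removal and its argument order are hypothetical. The raw figures could come, for example, from `ceph df` and `ceph osd df tree`, and the configured ratios appear in `ceph osd dump` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate raw utilization after removing one node.
# All capacities are raw (including replication overhead) and in one unit.
#   $1 raw_used  - raw capacity currently used in the whole cluster
#   $2 raw_total - raw capacity of the whole cluster
#   $3 node_raw  - raw capacity of the node to be removed
#   $4 ratio     - threshold to stay below, e.g. the cluster's full_ratio
# Prints "OK <projected>" or "FULL <projected>" and exits 0 only when OK.
# Assumes data spreads evenly over the remaining OSDs.
fits_after_removal() {
  awk -v used="$1" -v total="$2" -v node="$3" -v ratio="$4" 'BEGIN {
    remaining = total - node
    if (remaining <= 0) { print "FULL n/a"; exit 1 }
    # The used raw data is re-replicated onto the remaining capacity.
    projected = used / remaining
    verdict = (projected < ratio) ? "OK" : "FULL"
    printf "%s %.3f\n", verdict, projected
    exit (verdict == "OK" ? 0 : 1)
  }'
}
```

For example, `fits_after_removal 60 100 20 0.95` prints `OK 0.750`: 60 units of used raw capacity spread over the 80 units that remain. Passing the backfillfull ratio instead of the full ratio gives a more conservative check, because OSDs above the backfillfull ratio refuse backfill.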
Temporarily disable scrubbing
Scrubbing is essential to ensuring the durability of the storage cluster’s data; however, it is resource intensive. Before adding or removing a Ceph OSD node, disable scrubbing and deep-scrubbing and let the current scrubbing operations complete before proceeding.
ceph osd set noscrub
ceph osd set nodeep-scrub
Once you have added or removed a Ceph OSD node and the storage cluster has returned to an active+clean state, unset the noscrub and nodeep-scrub settings.
ceph osd unset noscrub
ceph osd unset nodeep-scrub
Limit backfill and recovery
If your data durability is adequate, operating in a degraded state for a period is acceptable. For example, you can operate the storage cluster with osd_pool_default_size = 3 and osd_pool_default_min_size = 2; such pools keep serving client I/O while one replica of a placement group is unavailable. You can tune the storage cluster for the fastest possible recovery time, but doing so significantly affects Ceph client I/O performance. To maintain the highest Ceph client I/O performance, limit the backfill and recovery operations and allow them to take longer.
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
You can also consider setting sleep and delay parameters, such as osd_recovery_sleep.
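The three backfill and recovery limits in this section are written in configuration-file form. A minimal sketch of applying them at runtime through the centralized configuration database with `ceph config set`, and of removing the overrides afterward with `ceph config rm`, might look like the following. The function names and the CEPH variable, which lets you substitute `echo` for a dry run, are hypothetical. Depending on the Ceph release and the OSD operation scheduler in use (for example, mClock), some of these options might be ignored or overridden, so verify the effective values for your release.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for throttling backfill and recovery on all OSDs.
# Set CEPH=echo to print the arguments instead of running them (dry run).

limit_recovery() {
  local ceph_cmd=${CEPH:-ceph}
  "$ceph_cmd" config set osd osd_max_backfills 1
  "$ceph_cmd" config set osd osd_recovery_max_active 1
  "$ceph_cmd" config set osd osd_recovery_op_priority 1
  # osd_recovery_sleep can be set the same way; choose a value for your hardware.
}

# Remove the overrides so the options revert to values from other sources,
# such as the configuration file or the built-in defaults.
restore_recovery_defaults() {
  local ceph_cmd=${CEPH:-ceph} opt
  for opt in osd_max_backfills osd_recovery_max_active osd_recovery_op_priority; do
    "$ceph_cmd" config rm osd "$opt"
  done
}
```

For example, `CEPH=echo limit_recovery` prints `config set osd osd_max_backfills 1` and two similar lines without contacting the cluster. Run restore_recovery_defaults only after the storage cluster has returned to active+clean.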
Increase the number of placement groups
Finally, if you are expanding the storage cluster, you might need to increase the number of placement groups. If so, IBM recommends increasing it incrementally, because increasing the number of placement groups by a large amount at once causes considerable performance degradation.
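One way to make incremental increases is to step pg_num through a series of values and let the storage cluster return to active+clean between steps. The following sketch doubles pg_num at each step and caps the last step at the target, so a value that starts as a power of two stays one. The doubling policy and the pg_num_schedule name are assumptions for illustration, not an IBM-prescribed schedule; recent Ceph releases might also throttle the resulting data movement on their own, so check the behavior of your release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an incremental pg_num schedule from CURRENT to
# TARGET. Each step doubles the previous value; the final step is capped at
# TARGET. Returns 1 for invalid input (CURRENT <= 0 or TARGET < CURRENT).
pg_num_schedule() {
  local current=$1 target=$2 next
  if (( current <= 0 || target < current )); then
    return 1
  fi
  next=$current
  while (( next < target )); do
    next=$(( next * 2 ))
    if (( next > target )); then
      next=$target
    fi
    echo "$next"
  done
}

# Example use ("mypool" is a placeholder pool name):
#   for n in $(pg_num_schedule 64 512); do
#     ceph osd pool set mypool pg_num "$n"
#     # On older releases, set pgp_num to the same value as well.
#     # Wait for the cluster to return to active+clean before the next step.
#   done
```

With a current value of 64 and a target of 512, the schedule is 128, 256, 512, so each step at most doubles the number of placement groups.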