Troubleshooting
Problem
- How to change the failure domain of existing RADOS pools (replicated and erasure-coded)
- How to create a pool specifically on SSD OSDs for faster data reads/writes, while keeping other pools on HDDs?
Environment
- IBM Storage Ceph 6.x
- IBM Storage Ceph 7.x
- IBM Storage Ceph 8.x
- IBM Storage Ceph 9.x
Resolving The Problem
For Replicated Pools:
- Method 1: To change the failure domain for multiple replicated pools that share a common CRUSH rule, edit the CRUSH map to change the failure domain of the existing replicated rule.
a. Collect the CRUSH map from the cluster and decompile it into a text file:
$ ceph osd getcrushmap -o /tmp/cm.bin
$ crushtool -d /tmp/cm.bin -o /tmp/cm.bin.ascii
b. Carefully edit the rules section as required with your favorite editor. In this example, we change the failure domain from host to rack:
# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
to
# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type rack
    step emit
}
c. Compile the adjusted CRUSH map and inject it back into the cluster. Keep the previous CRUSH map files around for a time in case you need to revert:
$ crushtool -c /tmp/cm.bin.ascii -o /tmp/cm_updated.bin
$ ceph osd setcrushmap -i /tmp/cm_updated.bin
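The host-to-rack edit in step b can also be scripted. A minimal sketch, run against an illustrative copy of the decompiled rule (in practice the file comes from crushtool -d in step a; the path and rule body here are examples only):

```shell
# Write an illustrative decompiled rule; on a real cluster this file is
# produced by "crushtool -d /tmp/cm.bin -o /tmp/cm.bin.ascii" (step a).
cat > /tmp/cm.bin.ascii <<'EOF'
rule replicated_rule {
    id 0
    type replicated
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
EOF

# Rewrite the chooseleaf failure domain from host to rack
sed -i 's/step chooseleaf firstn 0 type host/step chooseleaf firstn 0 type rack/' /tmp/cm.bin.ascii

# Confirm the change before compiling with crushtool -c
grep 'chooseleaf' /tmp/cm.bin.ascii
```

The grep should now show "step chooseleaf firstn 0 type rack"; always review the whole edited file before injecting it with ceph osd setcrushmap.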
- Method 2: To change the failure domain for a specific replicated pool, create a new replicated rule specifying the new failure domain and device class. Every CRUSH rule should specify an OSD device class; to create a pool that lives only on SSD OSDs, for example, specify ssd as the class. In the example above, you would amend "step take default" to "step take default class ssd" or "step take default class hdd".
a. Create a new replicated rule:
$ ceph osd crush rule create-replicated <name> <root> <failure-domain> <class>
b. Look up the name of the new CRUSH rule, then assign it to the pool:
$ ceph osd crush dump
$ ceph osd pool set <pool-name> crush_rule <crush-rule-name>
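In step b, the new rule's name appears in the "rules" array of the `ceph osd crush dump` JSON. A sketch of pulling the names out with standard text tools, using a hypothetical, heavily trimmed dump (the rule names here are examples only):

```shell
# Hypothetical, trimmed "ceph osd crush dump" output; a real dump
# contains many more fields per rule.
cat > /tmp/crush_dump.json <<'EOF'
{
    "rules": [
        { "rule_id": 0, "rule_name": "replicated_rule" },
        { "rule_id": 1, "rule_name": "replicated_ssd_rack" }
    ]
}
EOF

# List every rule name; pick the one you just created for "ceph osd pool set"
grep -o '"rule_name": "[^"]*"' /tmp/crush_dump.json | cut -d'"' -f4
```

This prints one rule name per line (here, replicated_rule and replicated_ssd_rack).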
NOTE: When the CRUSH map is updated, the cluster will rebalance to converge on the new placement policy.
NOTE: Method 2 above is preferred, as it does not require hand-editing the CRUSH map.
For Erasure-coded Pools:
NOTE: CRUSH-related settings in an EC profile, such as the failure domain and device class, are read from the profile only when its CRUSH rule is created. Modifying the EC profile of an existing pool therefore has no effect on that pool; use the following procedure instead to change the failure domain and device class.
- Method 1: To change the failure domain for multiple erasure-coded pools that share a common CRUSH rule, edit the CRUSH map to change the failure domain of the existing EC rule.
a. Collect the CRUSH map from the cluster and decompile it into a text file:
$ ceph osd getcrushmap -o /tmp/cm.bin
$ crushtool -d /tmp/cm.bin -o /tmp/cm.bin.ascii
b. Edit the rules section as required. In this example, we change the failure domain from host to rack:
# rules
rule erasure-code {
    id 3
    type erasure
    min_size 3
    max_size 4
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step chooseleaf indep 0 type host
    step emit
}
to
# rules
rule erasure-code {
    id 3
    type erasure
    min_size 3
    max_size 4
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step chooseleaf indep 0 type rack
    step emit
}
c. Compile the adjusted CRUSH map and inject it back into the cluster:
$ crushtool -c /tmp/cm.bin.ascii -o /tmp/cm_updated.bin
$ ceph osd setcrushmap -i /tmp/cm_updated.bin
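As with replicated rules, the edit in step b can be scripted; note that EC rules select chunks with "chooseleaf indep" rather than "chooseleaf firstn". A sketch against an illustrative copy of the rule (in practice the file comes from crushtool -d in step a):

```shell
# Illustrative decompiled EC rule; on a real cluster this text comes
# from "crushtool -d" and may contain additional steps.
cat > /tmp/cm.bin.ascii <<'EOF'
rule erasure-code {
    id 3
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step chooseleaf indep 0 type host
    step emit
}
EOF

# EC rules use "chooseleaf indep"; swap the failure domain host -> rack
sed -i 's/step chooseleaf indep 0 type host/step chooseleaf indep 0 type rack/' /tmp/cm.bin.ascii

# Verify before compiling and injecting the map
grep 'chooseleaf indep' /tmp/cm.bin.ascii
```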
- Method 2: To change the failure domain for a specific EC pool, create a new EC profile that specifies the new failure domain, then create a new EC rule from it. Do not change k, m, or any attribute other than the failure domain and device class.
a. Create a new EC profile:
$ ceph osd erasure-code-profile set prof2 k=<int> m=<int> crush-failure-domain=<failure-domain> crush-device-class=<class>
b. Create a new CRUSH rule from the new EC profile, then assign the rule to the pool:
$ ceph osd crush rule create-erasure <new-crush-rule-name> <profile-name>
$ ceph osd pool set <pool-name> crush_rule <new-crush-rule-name>
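Before assigning the new rule, check that the cluster has enough buckets of the new failure-domain type: an EC pool places each of its k+m chunks in a distinct bucket, so moving the failure domain from host to rack requires at least k+m racks. A quick sketch with illustrative profile values (on a real cluster, read k and m with `ceph osd erasure-code-profile get <profile-name>`):

```shell
# Illustrative k/m values; take the real ones from
# "ceph osd erasure-code-profile get <profile-name>".
k=4
m=2

# Each of the k+m chunks lands in a separate failure-domain bucket.
echo "need at least $((k + m)) buckets of the new failure-domain type"
```

With k=4 and m=2 this reports 6, i.e. the cluster would need at least six racks before the rule can place all chunks.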
NOTE: As the CRUSH map gets updated, the cluster will rebalance.
NOTE: Where possible, prefer the CLI (Method 2) over hand-editing the CRUSH map (Method 1). CRUSH map changes are subtle and can lead to cluster issues; do not hesitate to raise a case with IBM Support to explore your goals and validate your planned actions.
Document Location
Worldwide
Document Information
Modified date:
17 September 2025
UID
ibm17230573