Enabling three availability zones on the pool
Use this information to enable and integrate three availability zones within a generalized stretch cluster configuration.
Before you begin
- Root-level access to the nodes.
- The CRUSH location is set to the hosts.
Procedure
- Get the most recent CRUSH map and decompile the map into a text file.
  ceph osd getcrushmap > COMPILED_CRUSHMAP_FILENAME
  crushtool -d COMPILED_CRUSHMAP_FILENAME -o DECOMPILED_CRUSHMAP_FILENAME
  For example,
  [ceph: root@host01 /]# ceph osd getcrushmap > crush.map.bin
  [ceph: root@host01 /]# crushtool -d crush.map.bin -o crush.map.txt
- Add the new CRUSH rule into the decompiled CRUSH map file from step 1. In this example, the rule name is 3az_rule.
  rule 3az_rule {
      id 1
      type replicated
      step take default
      step choose firstn 3 type datacenter
      step chooseleaf firstn 2 type host
      step emit
  }
  With this rule, the placement groups are replicated with two copies in each of the three data centers.
- Inject the CRUSH map to make the rule available to the cluster.
  crushtool -c DECOMPILED_CRUSHMAP_FILENAME -o COMPILED_CRUSHMAP_FILENAME
  ceph osd setcrushmap -i COMPILED_CRUSHMAP_FILENAME
  For example,
  [ceph: root@host01 /]# crushtool -c crush.map.txt -o crush2.map.bin
  [ceph: root@host01 /]# ceph osd setcrushmap -i crush2.map.bin
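The compiled map can also be dry-run offline with the test mode of crushtool, which does not touch the cluster. The following is a minimal sketch, not part of the documented procedure: check_rule_mappings is a hypothetical helper name, and the example values assume rule ID 1 and six replicas (three data centers with two hosts each), as in the rule from step 2.

```shell
# Hypothetical helper: report inputs for which a rule cannot place all replicas.
# Usage: check_rule_mappings MAP_FILE RULE_ID NUM_REP
check_rule_mappings() {
    local map_file=$1 rule_id=$2 num_rep=$3 bad
    # --show-bad-mappings prints one line per sampled input that maps to
    # fewer than NUM_REP OSDs; no output means every sample mapped fully.
    bad=$(crushtool -i "$map_file" --test --rule "$rule_id" \
            --num-rep "$num_rep" --show-bad-mappings) || return 2
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    echo "rule $rule_id: no bad mappings for num-rep $num_rep"
}

# Example call (assumes the file name from the example above):
#   check_rule_mappings crush2.map.bin 1 6
```

A nonzero return status suggests that the CRUSH hierarchy does not have enough data centers or hosts for the rule to place all six copies.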
  You can verify that the rule was injected successfully by using the following steps.
  - List the rules on the cluster.
    ceph osd crush rule ls
    For example,
    [ceph: root@host01 /]# ceph osd crush rule ls
    replicated_rule
    ec86_pool
    3az_rule
  - Dump the CRUSH rule.
    ceph osd crush rule dump CRUSH_RULE
    For example,
    [ceph: root@host01 /]# ceph osd crush rule dump 3az_rule
    {
        "rule_id": 1,
        "rule_name": "3az_rule",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "choose_firstn",
                "num": 3,
                "type": "datacenter"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 2,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
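For scripted checks, the rule listing can be tested for an exact name match. This is a small sketch only; rule_exists is a hypothetical helper name, and it assumes that ceph osd crush rule ls prints one rule name per line.

```shell
# Hypothetical helper: succeed only if a CRUSH rule with this exact name exists.
# Usage: rule_exists RULE_NAME
rule_exists() {
    # -F: fixed string, -x: match the whole line, -q: no output
    ceph osd crush rule ls | grep -Fxq -- "$1"
}

# Example call:
#   rule_exists 3az_rule && echo "3az_rule is available"
```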
- Set the MON election strategy to connectivity.
  ceph mon set election_strategy connectivity
  When updated successfully, the election_strategy is updated to 3. The default election_strategy is 1.
- Optional: Verify the election strategy that was set in step 4.
  ceph mon dump
  Check that all mon daemons are in the output and that the correct CRUSH locations are added.
  [ceph: root@host01 /]# ceph mon dump
  epoch 19
  fsid b556497a-693a-11ef-b9d1-fa163e841fd7
  last_changed 2024-09-03T12:47:08.419495+0000
  created 2024-09-02T14:50:51.490781+0000
  min_mon_release 19 (squid)
  election_strategy: 3
  0: [v2:10.0.67.43:3300/0,v1:10.0.67.43:6789/0] mon.host01-installer; crush_location {datacenter=DC1}
  1: [v2:10.0.67.20:3300/0,v1:10.0.67.20:6789/0] mon.host02; crush_location {datacenter=DC1}
  2: [v2:10.0.64.242:3300/0,v1:10.0.64.242:6789/0] mon.host03; crush_location {datacenter=DC1}
  3: [v2:10.0.66.17:3300/0,v1:10.0.66.17:6789/0] mon.host06; crush_location {datacenter=DC2}
  4: [v2:10.0.66.228:3300/0,v1:10.0.66.228:6789/0] mon.host09; crush_location {datacenter=DC3}
  5: [v2:10.0.65.125:3300/0,v1:10.0.65.125:6789/0] mon.host05; crush_location {datacenter=DC2}
  6: [v2:10.0.66.252:3300/0,v1:10.0.66.252:6789/0] mon.host07; crush_location {datacenter=DC3}
  7: [v2:10.0.64.145:3300/0,v1:10.0.64.145:6789/0] mon.host08; crush_location {datacenter=DC3}
  8: [v2:10.0.64.125:3300/0,v1:10.0.64.125:6789/0] mon.host04; crush_location {datacenter=DC2}
  dumped monmap epoch 19
- Set the pool to associate with three availability zone stretch clusters. For more information about available pool values, see Pool values.
  ceph osd pool stretch set POOL_NAME PEERING_CRUSH_BUCKET_COUNT PEERING_CRUSH_BUCKET_TARGET PEERING_CRUSH_BUCKET_BARRIER CRUSH_RULE SIZE MIN_SIZE [--yes-i-really-mean-it]
  Replace the variables as follows:
- POOL_NAME
- The name of the pool. The pool must already exist; this command does not create a new pool.
- PEERING_CRUSH_BUCKET_COUNT
- This value is used along with peering_crush_bucket_barrier to determine whether the set of OSDs in the chosen acting set can peer with each other, based on the number of distinct buckets in the acting set.
- PEERING_CRUSH_BUCKET_TARGET
- This value is used along with peering_crush_bucket_barrier and size to calculate the value bucket_max, which limits the number of OSDs in the same bucket that can be chosen for the acting set of a PG.
- PEERING_CRUSH_BUCKET_BARRIER
- The type of bucket a pool is stretched across. For example, rack, row, or datacenter.
- CRUSH_RULE
- The CRUSH rule to use for the stretch pool. The type of pool must match the type of crush_rule (replicated or erasure).
- SIZE
- The number of replicas for objects in the stretch pool.
- MIN_SIZE
- The minimum number of replicas required for I/O in the stretch pool.
Important: The --yes-i-really-mean-it flag is required when setting the PEERING_CRUSH_BUCKET_COUNT and PEERING_CRUSH_BUCKET_TARGET to be more than the number of buckets in the CRUSH map. Use the optional flag to confirm that you want to bypass the safety checks and set the values for a stretch pool.
For example,
[ceph: root@host01 /]# ceph osd pool stretch set pool01 2 3 datacenter 3az_rule 6 3
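To judge whether the bucket count and target exceed the number of buckets in the CRUSH map, it can help to count the buckets of the barrier type first. The sketch below is an illustration only: count_buckets is a hypothetical helper, and it assumes the usual ceph osd tree layout in which bucket rows have an empty CLASS column, so the bucket type is the third whitespace-separated field.

```shell
# Hypothetical helper: count CRUSH buckets of one type in "ceph osd tree"
# output read from standard input.
# Usage: ceph osd tree | count_buckets BUCKET_TYPE
count_buckets() {
    # Bucket rows look like "ID WEIGHT TYPE NAME" (assumed layout), so the
    # type is field 3. "n + 0" prints 0 when nothing matched.
    awk -v t="$1" '$3 == t { n++ } END { print n + 0 }'
}

# Example call:
#   ceph osd tree | count_buckets datacenter
```

In the example cluster, which has the data centers DC1, DC2, and DC3, a count of 3 would be expected, so a bucket count of 2 and a target of 3 stay within the number of available buckets.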
Note: To revert a pool to a nonstretched cluster, use the ceph osd pool stretch unset POOL_NAME command. Using this command does not unset the crush_rule, size, and min_size values. If needed, reset these values manually.
A success message is emitted that the pool stretch values were set correctly.
- Optional: Verify the pools associated with the stretch clusters by using the ceph osd pool stretch show command.
  For example,
  [ceph: root@host01 /]# ceph osd pool stretch show pool01
  pool: pool01
  pool_id: 1
  is_stretch_pool: 1
  peering_crush_bucket_count: 2
  peering_crush_bucket_target: 3
  peering_crush_bucket_barrier: 8
  crush_rule: 3az_rule
  size: 6
  min_size: 3
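For scripted verification, a single value can be pulled out of the show output. This is a minimal sketch under the assumption that the command prints one "key: value" pair per line; stretch_field is a hypothetical helper name, not a Ceph command.

```shell
# Hypothetical helper: print the value of one key from
# "ceph osd pool stretch show" output read from standard input.
# Usage: ceph osd pool stretch show POOL_NAME | stretch_field KEY
stretch_field() {
    # Strip leading whitespace, split on ": ", and print the value
    # of the line whose key matches exactly.
    awk -F': *' -v key="$1" '{ sub(/^[ \t]+/, "") } $1 == key { print $2 }'
}

# Example call:
#   ceph osd pool stretch show pool01 | stretch_field size
```

In the example output, peering_crush_bucket_barrier is reported as the numeric CRUSH type ID; in the default CRUSH type list, 8 corresponds to datacenter.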