Enabling three availability zones on the pool

Use this information to enable and integrate three availability zones within a generalized stretch cluster configuration.

Before you begin

Make sure that the following prerequisites are in place:
  • Root-level access to the nodes.
  • The CRUSH location is set on the hosts.

Procedure

  1. Get the most recent CRUSH map and decompile the map into a text file.
    ceph osd getcrushmap > COMPILED_CRUSHMAP_FILENAME
    crushtool -d COMPILED_CRUSHMAP_FILENAME -o DECOMPILED_CRUSHMAP_FILENAME
    For example,
    [ceph: root@host01 /]# ceph osd getcrushmap > crush.map.bin
    [ceph: root@host01 /]# crushtool -d crush.map.bin -o crush.map.txt
  2. Add the new CRUSH rule into the decompiled CRUSH map file from step 1.
    In this example, the rule name is 3az_rule.
    rule 3az_rule {
             id 1
             type replicated
             step take default
             step choose firstn 3 type datacenter
             step chooseleaf firstn 2 type host
             step emit
     }

    With this rule, the placement groups will be replicated with two copies in each of the three data centers.
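    The replica count that this rule yields follows from simple arithmetic: 3 data centers x 2 hosts per data center = 6 copies. A minimal shell sketch of that check, assuming a hypothetical helper named rule_replica_count (not a Ceph command) that is given the text of a single rule and handles positive firstn counts only:

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): read the text of ONE decompiled
# CRUSH rule on stdin and multiply the positive "firstn" counts of its
# choose and chooseleaf steps. Counts of 0 or below depend on the pool
# size and are ignored by this sketch.
rule_replica_count() {
    awk '
        $1 == "step" && ($2 == "choose" || $2 == "chooseleaf") && $3 == "firstn" && $4 > 0 {
            product = (seen ? product : 1) * $4
            seen = 1
        }
        END { print (seen ? product : 0) }
    '
}

# Demo with the steps of 3az_rule from this step.
printf '%s\n' \
    'step take default' \
    'step choose firstn 3 type datacenter' \
    'step chooseleaf firstn 2 type host' \
    'step emit' | rule_replica_count
```

    For 3az_rule the sketch prints 6, which is the pool size that the rule can satisfy.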

  3. Inject the CRUSH map to make the rule available to the cluster.
    crushtool -c DECOMPILED_CRUSHMAP_FILENAME -o COMPILED_CRUSHMAP_FILENAME
    ceph osd setcrushmap -i COMPILED_CRUSHMAP_FILENAME
    For example,
    [ceph: root@host01 /]# crushtool -c crush.map.txt -o crush2.map.bin
    [ceph: root@host01 /]# ceph osd setcrushmap -i crush2.map.bin
    You can verify that the rule was injected successfully by using the following steps.
    1. List the rules on the cluster.
      ceph osd crush rule ls
      For example,
      [ceph: root@host01 /]# ceph osd crush rule ls
      replicated_rule
      ec86_pool
      3az_rule
    2. Dump the CRUSH rule.
      ceph osd crush rule dump CRUSH_RULE
      For example,
      [ceph: root@host01 /]# ceph osd crush rule dump 3az_rule
      {
          "rule_id": 1,
          "rule_name": "3az_rule",
          "type": 1,
          "steps": [
              {
                  "op": "take",
                  "item": -1,
                  "item_name": "default"
              },
              {
                  "op": "choose_firstn",
                  "num": 3,
                  "type": "datacenter"
              },
              {
                  "op": "chooseleaf_firstn",
                  "num": 2,
                  "type": "host"
              },
              {
                  "op": "emit"
              }
          ]
      }
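    The rule listing can also be checked in a script instead of by eye. The following is a minimal sketch, assuming a hypothetical helper named rule_in_list (not a Ceph command) that reads the output of ceph osd crush rule ls on standard input:

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): succeed when the rule name given
# as $1 appears as a whole line in "ceph osd crush rule ls" output on stdin.
rule_in_list() {
    # Strip leading and trailing blanks, then compare each line exactly.
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -Fxq -- "$1"
}

# Demo with the rule list shown in the example above.
if printf '%s\n' replicated_rule ec86_pool 3az_rule | rule_in_list 3az_rule; then
    echo "3az_rule is present"
else
    echo "3az_rule is missing"
fi
```

    On a live cluster the same check could be run as ceph osd crush rule ls | rule_in_list 3az_rule. The exact-line match avoids false positives from rule names that merely contain the string.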
  4. Set the MON election strategy to connectivity.
    ceph mon set election_strategy connectivity
    After a successful update, election_strategy is set to 3. The default election_strategy is 1.
  5. Optional: Verify the election strategy that was set in step 4.
    ceph mon dump
    Check that all mon daemons are in the output and that the correct CRUSH locations are added.
    [ceph: root@host01 /]# ceph mon dump
    epoch 19
    fsid b556497a-693a-11ef-b9d1-fa163e841fd7
    last_changed 2024-09-03T12:47:08.419495+0000
    created 2024-09-02T14:50:51.490781+0000
    min_mon_release 19 (squid)
    election_strategy: 3
    0: [v2:10.0.67.43:3300/0,v1:10.0.67.43:6789/0] mon.host01-installer; crush_location {datacenter=DC1}
    1: [v2:10.0.67.20:3300/0,v1:10.0.67.20:6789/0] mon.host02; crush_location {datacenter=DC1}
    2: [v2:10.0.64.242:3300/0,v1:10.0.64.242:6789/0] mon.host03; crush_location {datacenter=DC1}
    3: [v2:10.0.66.17:3300/0,v1:10.0.66.17:6789/0] mon.host06; crush_location {datacenter=DC2}
    4: [v2:10.0.66.228:3300/0,v1:10.0.66.228:6789/0] mon.host09; crush_location {datacenter=DC3}
    5: [v2:10.0.65.125:3300/0,v1:10.0.65.125:6789/0] mon.host05; crush_location {datacenter=DC2}
    6: [v2:10.0.66.252:3300/0,v1:10.0.66.252:6789/0] mon.host07; crush_location {datacenter=DC3}
    7: [v2:10.0.64.145:3300/0,v1:10.0.64.145:6789/0] mon.host08; crush_location {datacenter=DC3}
    8: [v2:10.0.64.125:3300/0,v1:10.0.64.125:6789/0] mon.host04; crush_location {datacenter=DC2}
    dumped monmap epoch 19
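    With many monitors, the two checks in this step (the election strategy and the CRUSH location of each monitor) are easier to confirm from a summary. The following is a minimal sketch, assuming hypothetical helpers named mon_election_strategy and mons_per_datacenter (not Ceph commands) that read ceph mon dump text on standard input in the format shown above:

```shell
#!/bin/sh
# Hypothetical helpers (not part of Ceph) that read "ceph mon dump" text on stdin.

# Print the numeric election strategy (3 after step 4, per this procedure).
mon_election_strategy() {
    awk '$1 == "election_strategy:" { print $2 }'
}

# Print "DATACENTER COUNT" for every datacenter named in a crush_location.
mons_per_datacenter() {
    awk '
        match($0, /datacenter=[^},]+/) {
            # Skip the 11 characters of "datacenter=" to keep only the name.
            dc = substr($0, RSTART + 11, RLENGTH - 11)
            count[dc]++
        }
        END { for (dc in count) print dc, count[dc] }
    ' | sort
}

# Demo with lines taken from the example output above.
sample='election_strategy: 3
0: [v2:10.0.67.43:3300/0,v1:10.0.67.43:6789/0] mon.host01-installer; crush_location {datacenter=DC1}
3: [v2:10.0.66.17:3300/0,v1:10.0.66.17:6789/0] mon.host06; crush_location {datacenter=DC2}
4: [v2:10.0.66.228:3300/0,v1:10.0.66.228:6789/0] mon.host09; crush_location {datacenter=DC3}'
printf '%s\n' "$sample" | mon_election_strategy
printf '%s\n' "$sample" | mons_per_datacenter
```

    On a live cluster the input would come from ceph mon dump piped into either helper. A datacenter that is missing from the summary, or an uneven count, is a prompt to recheck the monitor CRUSH locations.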
  6. Associate the pool with the three availability zone stretch cluster.
    For more information about available pool values, see Pool values.
    ceph osd pool stretch set POOL_NAME PEERING_CRUSH_BUCKET_COUNT PEERING_CRUSH_BUCKET_TARGET PEERING_CRUSH_BUCKET_BARRIER CRUSH_RULE SIZE MIN_SIZE [--yes-i-really-mean-it]
    Replace the variables as follows:
    POOL_NAME
    The name of the pool. It must be an existing pool; this command does not create a new pool.
    PEERING_CRUSH_BUCKET_COUNT
    This value is used along with peering_crush_bucket_barrier to determine whether the OSDs in the chosen acting set can peer with each other, based on the number of distinct buckets in the acting set.
    PEERING_CRUSH_BUCKET_TARGET
    This value is used along with peering_crush_bucket_barrier and size to calculate bucket_max, which limits the number of OSDs from the same bucket that can be chosen for the acting set of a PG.
    PEERING_CRUSH_BUCKET_BARRIER
    The type of bucket a pool is stretched across. For example, rack, row, or datacenter.
    CRUSH_RULE
    The CRUSH rule to use for the stretch pool. The type of the pool must match the type of the CRUSH rule (replicated or erasure).
    SIZE
    The number of replicas for objects in the stretch pool.
    MIN_SIZE
    The minimum number of replicas required for I/O in the stretch pool.
    Important: The --yes-i-really-mean-it flag is required when setting the PEERING_CRUSH_BUCKET_COUNT and PEERING_CRUSH_BUCKET_TARGET to be more than the number of buckets in the CRUSH map. Use the optional flag to confirm that you want to bypass the safety checks and set the values for a stretch pool.
    For example,
    [ceph: root@host01 /]# ceph osd pool stretch set pool01 2 3 datacenter 3az_rule 6 3
    A success message is emitted when the pool stretch values are set correctly.
    Note: To revert a pool to a nonstretched cluster, use the ceph osd pool stretch unset POOL_NAME command. This command does not unset the crush_rule, size, and min_size values. If needed, reset these values manually.
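    How PEERING_CRUSH_BUCKET_TARGET and SIZE combine into bucket_max can be illustrated with a small calculation. The sketch below assumes that bucket_max is SIZE divided by PEERING_CRUSH_BUCKET_TARGET, rounded up. That formula is an assumption drawn from upstream Ceph behavior and is not stated in this procedure, so treat the result as illustrative. The helper name bucket_max is hypothetical and is not a Ceph command.

```shell
#!/bin/sh
# Hypothetical helper (not a Ceph command). ASSUMPTION: bucket_max is
# SIZE / PEERING_CRUSH_BUCKET_TARGET rounded up to the next integer.
# Usage: bucket_max SIZE TARGET
bucket_max() {
    size=$1
    target=$2
    if [ "$target" -le 0 ]; then
        echo "target must be a positive integer" >&2
        return 1
    fi
    # Integer ceiling division.
    echo $(( (size + target - 1) / target ))
}

# Demo with the example values: size 6 and a bucket target of 3.
bucket_max 6 3
```

    Under that assumption, the example values give 2, meaning that no more than two OSDs from the same datacenter would be chosen for a PG's acting set. This lines up with the two hosts per datacenter that 3az_rule selects.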
  7. Optional: Verify the pools associated with the stretch clusters by using the ceph osd pool stretch show command.
    For example,
    [ceph: root@host01 /]# ceph osd pool stretch show pool01
    pool: pool01
    pool_id: 1
    is_stretch_pool: 1
    peering_crush_bucket_count: 2
    peering_crush_bucket_target: 3
    peering_crush_bucket_barrier: 8
    crush_rule: 3az_rule
    size: 6
    min_size: 3
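    The fields in this output can be compared against the values passed in step 6. Note that in the example output peering_crush_bucket_barrier appears as the number 8, where the command passed datacenter. The following is a minimal sketch, assuming a hypothetical helper named stretch_field (not a Ceph command) that reads the show output on standard input:

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): print the value of field $1 from
# "ceph osd pool stretch show" text read on stdin.
stretch_field() {
    awk -v key="$1" '
        { sub(/^[ \t]+/, "") }   # ignore any leading indentation
        index($0, key ": ") == 1 { print substr($0, length(key) + 3) }
    '
}

# Demo with values copied from the example output above.
sample='pool: pool01
is_stretch_pool: 1
crush_rule: 3az_rule
size: 6
min_size: 3'
for field in is_stretch_pool crush_rule size min_size; do
    printf '%s=%s\n' "$field" "$(printf '%s\n' "$sample" | stretch_field "$field")"
done
```

    The helper matches the field name only at the start of a line, so asking for size does not also return min_size.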