Configuring IBM Storage Ceph stretch mode

Once the IBM Storage Ceph cluster is fully deployed using cephadm, use this information to configure stretch cluster mode. Stretch mode is designed to handle the two-site case.

Procedure

  1. Check the current election strategy being used by the monitors with the ceph mon dump command.
    By default in a Ceph cluster, the election strategy is set to 1, which is the classic mode. The connectivity strategy, which is set in the next step, corresponds to the value 3.
    ceph mon dump | grep election_strategy
    Example output:
    dumped monmap epoch 9
    election_strategy: 1
  2. Change the monitor election to connectivity.
    ceph mon set election_strategy connectivity
  3. Rerun the ceph mon dump command to verify the election_strategy value.
    ceph mon dump | grep election_strategy
    Example output:
    dumped monmap epoch 10
    election_strategy: 3

    For more information about the different election strategies, see Administering > Operations > Management of monitors > Configuring monitor election strategy within the IBM Storage Ceph documentation.

  4. Set the location for all Ceph monitors.
    ceph mon set_location ceph1 datacenter=DC1
    ceph mon set_location ceph2 datacenter=DC1
    ceph mon set_location ceph4 datacenter=DC2
    ceph mon set_location ceph5 datacenter=DC2
    ceph mon set_location ceph7 datacenter=DC3
  5. Verify that each monitor has its appropriate location.
    ceph mon dump
    Example output:
    epoch 17
    fsid dd77f050-9afe-11ec-a56c-029f8148ea14
    last_changed 2022-03-04T07:17:26.913330+0000
    created 2022-03-03T14:33:22.957190+0000
    min_mon_release 16 (pacific)
    election_strategy: 3
    0: [v2:10.0.143.78:3300/0,v1:10.0.143.78:6789/0] mon.ceph1; crush_location {datacenter=DC1}
    1: [v2:10.0.155.185:3300/0,v1:10.0.155.185:6789/0] mon.ceph4; crush_location {datacenter=DC2}
    2: [v2:10.0.139.88:3300/0,v1:10.0.139.88:6789/0] mon.ceph5; crush_location {datacenter=DC2}
    3: [v2:10.0.150.221:3300/0,v1:10.0.150.221:6789/0] mon.ceph7; crush_location {datacenter=DC3}
    4: [v2:10.0.155.35:3300/0,v1:10.0.155.35:6789/0] mon.ceph2; crush_location {datacenter=DC1}
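    The remaining steps assume that the OSD hosts are already placed under matching datacenter buckets (DC1 and DC2) in the CRUSH hierarchy. A minimal way to confirm this, assuming the bucket names used in this example, is to inspect the CRUSH tree:
    ceph osd tree | grep -E 'datacenter|host'
    Each host should appear under the expected datacenter bucket.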
  6. Install the ceph-base RPM package in order to use the crushtool command, which is needed to create a CRUSH rule that makes use of this OSD CRUSH topology.
    dnf -y install ceph-base

    For more information about CRUSH rulesets, see Concepts > Architecture > Core Ceph components > Ceph CRUSH ruleset within the IBM Storage Ceph documentation.
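    Optionally, confirm that the package installed correctly and that the crushtool binary is available before continuing:
    rpm -q ceph-base
    command -v crushtool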

  7. Get the compiled CRUSH map from the cluster.
    ceph osd getcrushmap > /etc/ceph/crushmap.bin
  8. Decompile the CRUSH map and convert it to a text file so that you can edit it.
    crushtool -d /etc/ceph/crushmap.bin -o /etc/ceph/crushmap.txt
  9. Edit the text file /etc/ceph/crushmap.txt and add the following rule at the end of the file.
    vim /etc/ceph/crushmap.txt
    rule stretch_rule {
            id 1
            type replicated
            min_size 1
            max_size 10
            step take default
            step choose firstn 0 type datacenter
            step chooseleaf firstn 2 type host
            step emit
    }
    # end crush map
    This example is applicable when there are active applications in both OpenShift Container Platform clusters.
    Note: The rule id has to be unique. In this example, there is only one other CRUSH rule, with id 0, so id 1 is used here. If your deployment has more rules, use the next free ID (a quick way to list the existing rule IDs is shown after the field descriptions below).
    The CRUSH rule declared contains the following information:
    Rule name
    A unique name that identifies the rule. In this example, stretch_rule.
    id
    A unique whole number for identifying the rule. In this example, 1.
    type
    Indicates whether the rule is for a replicated or an erasure-coded pool. In this example, replicated.
    min_size
    If a pool makes fewer replicas than this number, CRUSH will not select this rule. In this example, 1.
    max_size
    If a pool makes more replicas than this number, CRUSH will not select this rule. In this example, 10.
    step take default
    Takes the root bucket called default, and begins iterating down the tree.
    step choose firstn 0 type datacenter
    Selects the datacenter bucket, and goes into its subtrees.
    step chooseleaf firstn 2 type host
    Selects the number of buckets of the given type. In this case, it is two different hosts located in the datacenter it entered at the previous level.
    step emit
    Outputs the current value and empties the stack. Typically used at the end of a rule, but may also be used to pick from different trees in the same rule.
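    If you are unsure which rule IDs are already in use, you can list the existing rules and their IDs before choosing a new one. A quick check, assuming the default JSON output of the dump command:
    ceph osd crush rule dump | grep -E '"rule_id"|"rule_name"'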
  10. Compile the new CRUSH map from the file /etc/ceph/crushmap.txt and convert it to a binary file called /etc/ceph/crushmap2.bin.
    crushtool -c /etc/ceph/crushmap.txt -o /etc/ceph/crushmap2.bin
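    Optionally, you can sanity-check the new rule with crushtool before injecting the map into the cluster. This is a minimal test, assuming rule id 1 as used in this example; with four replicas (two copies per data center, which is the layout stretch mode uses), the OSDs selected for each input should be spread across both data centers:
    crushtool -i /etc/ceph/crushmap2.bin --test --rule 1 --num-rep 4 --show-mappings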
  11. Inject the newly created CRUSH map back into the cluster.
    ceph osd setcrushmap -i /etc/ceph/crushmap2.bin
    Example output: 17
    Note: The number 17 is a counter; it increases (18, 19, and so on) with each change that is made to the CRUSH map.
  12. Verify that the stretch_rule CRUSH rule that was created is now available for use.
    ceph osd crush rule ls
    Example output:
    replicated_rule
    stretch_rule
  13. Enable the stretch cluster mode.
    ceph mon enable_stretch_mode ceph7 stretch_rule datacenter
    In this example, ceph7 is the arbiter node, stretch_rule is the CRUSH rule that was created earlier, and datacenter is the dividing bucket.
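    You can confirm that stretch mode is active by inspecting the monitor map and the OSD map. The exact field names can vary slightly between releases, but you should see entries that indicate stretch mode is enabled and that ceph7 is the tiebreaker monitor:
    ceph mon dump | grep -iE 'stretch|tiebreaker'
    ceph osd dump | grep stretch_mode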
  14. Verify that all the pools in the Ceph cluster are using the stretch_rule CRUSH rule that was created.
    for pool in $(rados lspools); do echo -n "Pool: ${pool}; "; ceph osd pool get ${pool} crush_rule; done
    Example output:
    Pool: device_health_metrics; crush_rule: stretch_rule
    Pool: cephfs.cephfs.meta; crush_rule: stretch_rule
    Pool: cephfs.cephfs.data; crush_rule: stretch_rule
    Pool: .rgw.root; crush_rule: stretch_rule
    Pool: default.rgw.log; crush_rule: stretch_rule
    Pool: default.rgw.control; crush_rule: stretch_rule
    Pool: default.rgw.meta; crush_rule: stretch_rule
    Pool: rbdpool; crush_rule: stretch_rule
    This indicates that a working IBM Storage Ceph stretched cluster with arbiter mode is now available.
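    When stretch mode is enabled, replicated pools are expected to run with size 4 and min_size 2, so that two copies of the data are kept in each data center. You can spot-check this on any pool, for example rbdpool from the output above:
    ceph osd pool get rbdpool size
    ceph osd pool get rbdpool min_size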