Configuring IBM Storage Ceph stretch mode
Once the IBM Storage Ceph cluster is fully deployed using cephadm, use this information to configure the stretch cluster mode. The new stretch mode is designed to handle the two-site case.
Procedure
- Check the current election strategy being used by the monitors with the ceph mon dump command. By default, the election strategy in a Ceph cluster is set to classic.
ceph mon dump | grep election_strategy
Example output:
dumped monmap epoch 9
election_strategy: 1
- Change the monitor election strategy to connectivity.
ceph mon set election_strategy connectivity
- Rerun the ceph mon dump command to verify the election_strategy value.
ceph mon dump | grep election_strategy
Example output:
dumped monmap epoch 10
election_strategy: 3
To know more about the different election strategies, see Administering > Operations > Management of monitors > Configuring monitor election strategy within the IBM Storage Ceph documentation.
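With the connectivity strategy, each monitor rates the connections to its peers and those scores influence which monitor wins an election. As an optional check, the scores can be inspected through a monitor's admin socket; a minimal sketch, assuming the monitor mon.ceph1 from this example and admin-socket access on its host (for example, from a cephadm shell on that node):
ceph daemon mon.ceph1 connection scores dump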
- Set the location for all Ceph monitors.
ceph mon set_location ceph1 datacenter=DC1
ceph mon set_location ceph2 datacenter=DC1
ceph mon set_location ceph4 datacenter=DC2
ceph mon set_location ceph5 datacenter=DC2
ceph mon set_location ceph7 datacenter=DC3
- Verify that each monitor has its appropriate location.
ceph mon dump
Example output:
epoch 17
fsid dd77f050-9afe-11ec-a56c-029f8148ea14
last_changed 2022-03-04T07:17:26.913330+0000
created 2022-03-03T14:33:22.957190+0000
min_mon_release 16 (pacific)
election_strategy: 3
0: [v2:10.0.143.78:3300/0,v1:10.0.143.78:6789/0] mon.ceph1; crush_location {datacenter=DC1}
1: [v2:10.0.155.185:3300/0,v1:10.0.155.185:6789/0] mon.ceph4; crush_location {datacenter=DC2}
2: [v2:10.0.139.88:3300/0,v1:10.0.139.88:6789/0] mon.ceph5; crush_location {datacenter=DC2}
3: [v2:10.0.150.221:3300/0,v1:10.0.150.221:6789/0] mon.ceph7; crush_location {datacenter=DC3}
4: [v2:10.0.155.35:3300/0,v1:10.0.155.35:6789/0] mon.ceph2; crush_location {datacenter=DC1}
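The stretch rule created in the following steps places data through datacenter buckets, so the OSD hosts must already sit under DC1 and DC2 in the CRUSH map, for example, placed there by the OSD service specification used during deployment. If they are not, they can be moved with the ceph osd crush commands; a minimal sketch, assuming host buckets named after the nodes:
ceph osd crush add-bucket DC1 datacenter
ceph osd crush add-bucket DC2 datacenter
ceph osd crush move DC1 root=default
ceph osd crush move DC2 root=default
ceph osd crush move ceph1 datacenter=DC1
ceph osd crush move ceph4 datacenter=DC2
The resulting hierarchy can be reviewed with the ceph osd tree command before continuing.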
- To create a CRUSH rule that makes use of this OSD CRUSH topology, install the ceph-base RPM package so that the crushtool command is available.
dnf -y install ceph-base
To know more about CRUSH rulesets, see Concepts > Architecture > Core Ceph components > Ceph CRUSH ruleset within the IBM Storage Ceph documentation.
- Get the compiled CRUSH map from the cluster.
ceph osd getcrushmap > /etc/ceph/crushmap.bin
- Decompile the CRUSH map and convert it to a text file so that it can be edited.
crushtool -d /etc/ceph/crushmap.bin -o /etc/ceph/crushmap.txt
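Before adding the new rule, it can help to confirm that the datacenter buckets the rule will reference are present in the decompiled map. A quick check, assuming the bucket names used in this example:
grep -E '^(datacenter|root)' /etc/ceph/crushmap.txt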
- Edit the text file /etc/ceph/crushmap.txt and add the following rule at the end of the file.
vim /etc/ceph/crushmap.txt
rule stretch_rule {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type datacenter
        step chooseleaf firstn 2 type host
        step emit
}
# end crush map
This example is applicable for active applications in both OpenShift Container Platform clusters.
Note: The rule id has to be unique. In this example, there is only one other CRUSH rule, with id 0, hence we are using id 1. If your deployment has more rules created, then use the next free ID.
The CRUSH rule declared contains the following information:
- Rule name: A unique name for identifying the rule. In this example, stretch_rule.
- id: A unique whole number for identifying the rule. In this example, 1.
- type: The type of replication, either replicated or erasure-coded. In this example, replicated.
- min_size: If a pool makes fewer replicas than this number, CRUSH will not select this rule. In this example, 1.
- max_size: If a pool makes more replicas than this number, CRUSH will not select this rule. In this example, 10.
- step take default: Takes the root bucket called default, and begins iterating down the tree.
- step choose firstn 0 type datacenter: Selects the datacenter buckets and descends into their subtrees.
- step chooseleaf firstn 2 type host: Selects the number of buckets of the given type; in this case, two different hosts located in the datacenter entered at the previous level.
- step emit: Outputs the current value and empties the stack. Typically used at the end of a rule, but may also be used to pick from different trees in the same rule.
- Compile the new CRUSH map from the file /etc/ceph/crushmap.txt and
convert it to a binary file called /etc/ceph/crushmap2.bin.
crushtool -c /etc/ceph/crushmap.txt -o /etc/ceph/crushmap2.bin
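Optionally, the new rule can be dry-run with crushtool before it is injected, to see how placement groups would map across the datacenters. A minimal sketch, assuming rule id 1 and four replicas (the pool size that stretch mode uses), limited to a few sample inputs:
crushtool -i /etc/ceph/crushmap2.bin --test --rule 1 --num-rep 4 --min-x 0 --max-x 5 --show-mappings
Each mapping line is expected to list four OSDs, two from each datacenter.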
- Inject the newly created CRUSH map back into the cluster.
ceph osd setcrushmap -i /etc/ceph/crushmap2.bin
Example output:
17
Note: The number 17 is a counter and it will increase (18, 19, and so on) depending on the changes made to the CRUSH map.
- Verify that the stretched rule created is now available for use.
ceph osd crush rule ls
Example output:
replicated_rule
stretch_rule
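Optionally, the rule can also be dumped in JSON form to inspect how the cluster stores it:
ceph osd crush rule dump stretch_rule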
- Enable the stretch cluster mode.
ceph mon enable_stretch_mode ceph7 stretch_rule datacenter
In this example, ceph7 is the arbiter node, stretch_rule is the CRUSH rule that was created in the previous step, and datacenter is the dividing bucket.
- Verify that all the pools are using the stretch_rule CRUSH rule that was created as part of the Ceph cluster.
for pool in $(rados lspools);do echo -n "Pool: ${pool}; ";ceph osd pool get ${pool} crush_rule;done
Example output:
Pool: device_health_metrics; crush_rule: stretch_rule
Pool: cephfs.cephfs.meta; crush_rule: stretch_rule
Pool: cephfs.cephfs.data; crush_rule: stretch_rule
Pool: .rgw.root; crush_rule: stretch_rule
Pool: default.rgw.log; crush_rule: stretch_rule
Pool: default.rgw.control; crush_rule: stretch_rule
Pool: default.rgw.meta; crush_rule: stretch_rule
Pool: rbdpool; crush_rule: stretch_rule
This indicates that a working IBM Storage Ceph stretched cluster with arbiter mode is now available.
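As an additional check (the exact field names can vary between releases), the stretch-mode flags recorded in the OSD map and the overall cluster health can be reviewed:
ceph osd dump | grep -i stretch
ceph -s
The cluster is expected to report HEALTH_OK with all five monitors in quorum.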