Erasure code profiles

An erasure code profile defines an erasure-coded pool. Ceph uses the profile when it creates the pool and the associated CRUSH rule.

Ceph creates a default erasure code profile when initializing a cluster. This default profile defines k=2 and m=2, meaning Ceph spreads the object data over four OSDs (k+m=4) and can lose two of those OSDs without losing data, which matches the redundancy of a three-way replicated pool while consuming less raw storage. EC 2+2 requires a minimum deployment footprint of 4 nodes (5 nodes recommended) and can cope with the temporary loss of 1 OSD node.

To display the default profile, use the following command:

$ ceph osd erasure-code-profile get default
k=2
m=2
plugin=jerasure
technique=reed_sol_van
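
If no profile is named when you create an erasure-coded pool, Ceph applies this default profile. A minimal illustration (the pool name and placement-group counts here are arbitrary):

$ ceph osd pool create ecpool-default 32 32 erasure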

You can create a new profile to improve redundancy without increasing raw storage requirements. For instance, a profile with k=8 and m=4 can sustain the loss of four (m=4) OSDs by distributing an object across 12 (k+m=12) OSDs. Ceph divides the object into 8 data chunks and computes 4 coding chunks for recovery. For example, if the object size is 8 MB, each data chunk is 1 MB, and each coding chunk is the same size as a data chunk, also 1 MB. The object is not lost even if four OSDs fail simultaneously.
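
The chunk arithmetic is easy to verify in the shell. This back-of-the-envelope check uses only illustrative variable names, not Ceph commands:

$ k=8; m=4; object_mb=8
$ echo "data chunk size: $(( object_mb / k )) MB"
data chunk size: 1 MB
$ echo "raw space consumed: $(( object_mb * (k + m) / k )) MB"
raw space consumed: 12 MB

With a Reed-Solomon code such as the default jerasure reed_sol_van technique, any 8 of the 12 chunks are sufficient to reconstruct the object.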

The most important parameters of the profile are k, m, and crush-failure-domain, because they define the storage overhead and the data durability.
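Before setting these parameters, you can list the profiles that already exist on the cluster; on a freshly initialized cluster, only the default profile is present:

$ ceph osd erasure-code-profile ls
default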

Important: Choose the profile carefully, because you cannot change it after you create the pool. To modify a profile, you must create a new pool with a different profile and migrate the objects from the old pool to the new pool.
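One possible migration path, sketched here with illustrative pool and profile names, is to create the new pool and copy the objects with rados cppool while client traffic is stopped. Note that cppool does not preserve snapshots and cannot copy objects with omap data into an erasure-coded pool:

$ ceph osd erasure-code-profile set newprofile k=8 m=4 crush-failure-domain=host
$ ceph osd pool create newpool 32 32 erasure newprofile
$ rados cppool oldpool newpool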

For instance, if the desired architecture must sustain the loss of two racks with a storage overhead of 50% (m/k = 2/4), the following profile can be defined:

$ ceph osd erasure-code-profile set myprofile \
   k=4 \
   m=2 \
   crush-failure-domain=rack
$ ceph osd pool create ecpool 12 12 erasure myprofile
$ echo ABCDEFGHIJKL | rados --pool ecpool put NYAN -
$ rados --pool ecpool get NYAN -
ABCDEFGHIJKL
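
You can verify the placement. The first command prints the stored profile; the second reports the placement group and the acting set of six OSDs that hold the NYAN chunks (the OSD IDs vary per cluster):

$ ceph osd erasure-code-profile get myprofile
$ ceph osd map ecpool NYAN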

The primary OSD divides the NYAN object into four (k=4) data chunks and creates two additional coding chunks (m=2). The value of m defines how many OSDs can be lost simultaneously without losing any data. The crush-failure-domain=rack parameter creates a CRUSH rule that ensures no two chunks are stored in the same rack. Figure 1 depicts the erasure code breakdown.

Figure 1. Erasure code setup

Table 1 lists the supported erasure coding profiles.

Table 1. Supported erasure coding profiles
k  m  Minimum number of nodes  Minimum number of OSDs  Redundancy                                      Storage overhead (raw/usable)
2  2  4                        4 (1 per node)          Loss of up to 1 node + 1 OSD, or up to 2 OSDs   200%
2  2  5 or more                5 (1 per node)          Loss of up to 2 nodes or 2 OSDs                 200%
4  2  7 or more                7 (1 per node)          Loss of up to 2 nodes or 2 OSDs                 150%
8  3  12 or more               12 (1 per node)         Loss of up to 3 nodes or 3 OSDs                 137.5%
8  4  13 or more               13 (1 per node)         Loss of up to 4 nodes or 4 OSDs                 150%
8  6  4 or more                16 (4 per node)         Loss of up to 1 node + 2 OSDs, or up to 6 OSDs  175%
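
For example, to implement the 4+2 profile from the third row of Table 1 with hosts as the failure domain (the profile and pool names here are illustrative):

$ ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
$ ceph osd pool create ec42pool 32 32 erasure ec42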