Replication
Like Ceph clients, Ceph OSDs can contact Ceph monitors to retrieve the latest copy of the cluster map. Ceph OSDs also use the CRUSH algorithm, but they use it to compute where to store replicas of objects. In a typical write scenario, a Ceph client uses the CRUSH algorithm to compute the placement group ID and the primary OSD in the acting set for an object. When the client writes the object to the primary OSD, the primary OSD finds the number of replicas that it should store; this value comes from the osd_pool_default_size setting. The primary OSD then takes the object ID, pool name, and cluster map and uses the CRUSH algorithm to calculate the IDs of the secondary OSDs in the acting set. The primary OSD writes the object to the secondary OSDs. When the primary OSD receives acknowledgments from the secondary OSDs and completes its own write operation, it acknowledges a successful write operation to the Ceph client.
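To make this sequence concrete, here is a minimal Python sketch of that primary-copy write flow. The hash and placement functions are toy stand-ins for illustration only (Ceph itself uses the rjenkins hash and the full CRUSH algorithm against the cluster map's hierarchy), and the OSD names, PG count, and pool size are hypothetical values.

```python
# A toy simulation of the primary-copy write flow described above.
import hashlib

OSDS = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]  # hypothetical cluster map
PG_NUM = 8       # placement groups in the (hypothetical) pool
POOL_SIZE = 3    # replicas to store, cf. osd_pool_default_size

def pg_for_object(object_id: str) -> int:
    """Hash the object ID onto a placement group (stand-in for Ceph's hash)."""
    digest = hashlib.md5(object_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % PG_NUM

def acting_set(pg_id: int) -> list[str]:
    """Map a PG to POOL_SIZE distinct OSDs. A toy stand-in for CRUSH, which
    instead walks the cluster map's buckets and failure domains."""
    return [OSDS[(pg_id + i) % len(OSDS)] for i in range(POOL_SIZE)]

def client_write(object_id: str, data: bytes) -> None:
    pg_id = pg_for_object(object_id)
    primary, *secondaries = acting_set(pg_id)
    print(f"client -> {primary}: write {object_id} (pg {pg_id})")
    # The primary writes locally and replicates to each secondary; it
    # acknowledges the client only after every replica is durable.
    for osd in secondaries:
        print(f"{primary} -> {osd}: replicate {object_id}")
        print(f"{osd} -> {primary}: ack")
    print(f"{primary} -> client: write complete")

client_write("rbd_data.0001", b"hello")
```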
Data copies
In a replicated storage pool, Ceph needs multiple copies of an object so that it can operate in a degraded state. Ideally, a Ceph storage cluster enables a client to read and write data even if one of the OSDs in an acting set fails. For this reason, Ceph defaults to making three copies of an object, with a minimum of two clean copies required for write operations. Even if two OSDs fail, Ceph still preserves the data; however, it interrupts write operations until recovery restores enough replicas.
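The sketch below illustrates how these two copy counts govern I/O, assuming the default replicated-pool settings of three copies (osd_pool_default_size) and a minimum of two (osd_pool_default_min_size); the state descriptions are simplified versions of Ceph's placement group states.

```python
# How replica counts gate I/O in a replicated pool (simplified).
SIZE = 3      # copies Ceph tries to keep, cf. osd_pool_default_size
MIN_SIZE = 2  # clean copies required to serve I/O, cf. osd_pool_default_min_size

def pool_state(up_replicas: int) -> str:
    if up_replicas >= SIZE:
        return "clean: I/O proceeds normally"
    if up_replicas >= MIN_SIZE:
        return "degraded: I/O continues while Ceph re-replicates"
    return "I/O blocked: data is preserved, but writes wait for recovery"

for failed in range(SIZE + 1):
    print(f"{failed} OSD(s) failed -> {pool_state(SIZE - failed)}")
```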
In an erasure-coded pool, Ceph instead stores chunks of an object across multiple OSDs so that it can operate in a degraded state. Supported k+m values include:

- k=8 m=3
- k=8 m=4
- k=4 m=2
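In a k+m profile, an object is split into k data chunks plus m coding chunks, and any k of the k+m chunks suffice to reconstruct it, so the pool tolerates m OSD failures at a storage overhead of (k+m)/k. The following sketch computes these figures for the profiles listed above; the comparison with 3-way replication is included for illustration.

```python
# Chunk count, fault tolerance, and storage overhead per erasure-code profile.
PROFILES = [(8, 3), (8, 4), (4, 2)]  # (k data chunks, m coding chunks)

print(f"{'profile':>10} {'chunks':>7} {'failures tolerated':>19} {'overhead':>9}")
for k, m in PROFILES:
    print(f"{f'k={k} m={m}':>10} {k + m:>7} {m:>19} {f'{(k + m) / k:.2f}x':>9}")
# For comparison: 3-way replication tolerates 2 failures at 3.00x overhead.
```

This is why erasure coding trades recovery cost for capacity: k=8 m=3 stores the same data at 1.38x overhead versus replication's 3.00x, while still surviving three OSD failures.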