Cluster configuration repository

The cluster configuration repository (CCR) of IBM Storage Scale is a fault-tolerant configuration store that is used by nearly all IBM Storage Scale components, including GPFS, the GUI, system health, and Cluster Export Services (CES). It is not meant to be used directly by the customer. It offers an interface for storing configuration files and flat key-value pairs.

The CCR state consists of files and directories under /var/mmfs/ccr. The consistency of the CCR state replicas is maintained by using a majority consensus algorithm. That is, the CCR state from a majority of quorum nodes or tiebreaker disks must be available for the CCR to function properly. For example, a cluster with just one or two quorum nodes and no tiebreaker disks has zero fault tolerance. If the CCR state is unavailable on one of two quorum nodes, the cluster reports a quorum loss. If the CCR state is unavailable on both quorum nodes, the cluster becomes inoperable. You can increase fault tolerance by assigning more quorum nodes or tiebreaker disks.
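The majority rule above can be made concrete with a small arithmetic sketch (not IBM code; a simplified model that counts only quorum nodes): with n replicas, a majority is n // 2 + 1, so the CCR tolerates the loss of n minus that majority.

```python
# Sketch: fault tolerance of a majority-consensus store over n quorum nodes.
# Simplified model based on the description above, not the CCR implementation.

def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1

def fault_tolerance(n: int) -> int:
    """Number of replicas that can be lost while a majority remains."""
    return n - majority(n)

# One or two quorum nodes: zero fault tolerance, as stated above.
print(fault_tolerance(1), fault_tolerance(2))  # 0 0
# Three quorum nodes: one node can be lost and the CCR stays functional.
print(fault_tolerance(3))  # 1
# Five quorum nodes: two can be lost.
print(fault_tolerance(5))  # 2
```

This also shows why adding a quorum node to a two-node cluster (or adding a tiebreaker disk) is what creates fault tolerance: it is the first configuration where a majority survives a single failure.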

Tiebreaker disks can be configured while GPFS is up and running. Increasing the fault tolerance by using up to eight quorum nodes and up to three tiebreaker disks helps keep the CCR state available. That means the CCR remains fully functional as long as a majority of quorum nodes or tiebreaker disks is available.
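As a sketch of how this is typically done (the NSD and node names here are placeholders, not values from this document), quorum nodes are designated with mmchnode and tiebreaker disks with mmchconfig:

```shell
# Designate an additional quorum node (placeholder node name).
mmchnode --quorum -N node3

# Configure up to three tiebreaker disks (placeholder NSD names);
# this can be done while GPFS is up and running.
mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"

# Verify the resulting configuration.
mmlsconfig tiebreakerDisks
```

These commands require a GPFS/IBM Storage Scale cluster and administrative privileges; consult the command reference for the exact options supported by your release.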

Common CCR functions

The CCR has the following functions:
  • Provides a PUT and GET interface for storing configuration files and flat key-value pairs redundantly across all quorum nodes. The CCR uses a Paxos-based algorithm to keep the stored data consistent among the quorum nodes; that is, it ensures that all quorum nodes agree on the most recent version of each configuration file or key-value pair.
  • Updates the CCR configuration when the number of quorum nodes or tiebreaker disks changes.
  • Creates a CCR backup file and reinitializes all quorum nodes from a CCR backup by using the mmsdrrestore -F <CCR_BACKUP_FILE> -a command.
  • Elects the cluster manager by running the Paxos protocol among the available quorum nodes.

    If the cluster is configured with tiebreaker disks and the network between the current cluster manager and the remaining quorum nodes is not working, then one of the quorum nodes writes so-called challenges to the tiebreaker disks. The current cluster manager must answer within a specific time limit to retain its role; otherwise, the challenger node becomes the new cluster manager.

  • Provides monitoring and debugging commands that can be used for analysis when the CCR is not functional. For more information about how to use these commands, see CCR issues.
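The tiebreaker-disk challenge described above can be sketched as a small simulation. This is illustrative only: the class and function names, the timeout value, and the single-disk model are assumptions, not the actual CCR implementation.

```python
# Illustrative sketch of the challenge protocol: a challenger writes a
# challenge record to the tiebreaker disk; a live cluster manager must
# answer it within a time limit to keep its role, otherwise the
# challenger takes over. All names and timings are hypothetical.

import time

CHALLENGE_TIMEOUT = 0.1  # seconds; the real limit is cluster-defined


class TiebreakerDisk:
    """Stands in for the shared disk area that holds challenge records."""

    def __init__(self):
        self.challenge = None  # (challenger, issued_at)
        self.response = None   # the manager's answer to the last challenge


def run_challenge(disk, challenger, manager_alive):
    # The challenger writes its challenge to the tiebreaker disk.
    disk.challenge = (challenger, time.monotonic())

    if manager_alive:
        # A reachable manager answers in time and retains its role.
        disk.response = disk.challenge
        return "manager keeps role"

    # Otherwise the challenger waits out the time limit; with no
    # matching answer on the disk, it becomes the new cluster manager.
    time.sleep(CHALLENGE_TIMEOUT)
    if disk.response != disk.challenge:
        return f"{challenger} becomes cluster manager"


disk = TiebreakerDisk()
print(run_challenge(disk, "quorum-node-2", manager_alive=True))
print(run_challenge(disk, "quorum-node-2", manager_alive=False))
```

The design point the sketch captures is that the tiebreaker disk acts as a neutral third party: both sides communicate only through writes to shared storage, so the takeover decision does not depend on the broken network path between the nodes.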