Backup and restore

Backup and restore is an essential consideration for production use.

At a minimum, you must back up the etcd database, which is critical to the cluster.

  • The etcd database is the key-value store for RHOCP, which persists the state of all resource objects.
  • Do not take an etcd backup before the first certificate rotation completes, which occurs 24 hours after installation. Otherwise, the backup contains expired certificates.
  • Take etcd backups during non-peak usage hours, because taking a backup is a blocking action.
  • Monitor the etcd metrics in Prometheus and defragment etcd when needed, before etcd raises a cluster-wide alarm that puts the cluster into maintenance mode.
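
The timing and defragmentation guidance above can be turned into simple pre-checks. The following Python sketch is illustrative only and is not part of RHOCP. The function names, the off-peak window, and the defragmentation threshold are hypothetical and must be adapted to your environment. The two byte counts are assumed to come from the etcd metrics etcd_mvcc_db_total_size_in_bytes and etcd_mvcc_db_total_size_in_use_in_bytes as collected by Prometheus. The rotation check uses elapsed time since installation only as an approximation.

```python
from datetime import datetime, time, timedelta

# Assumed values for illustration; tune them for your environment.
CERT_ROTATION_DELAY = timedelta(hours=24)  # first rotation, per the guidance above
OFF_PEAK_START = time(1, 0)                # hypothetical quiet window start
OFF_PEAK_END = time(4, 0)                  # hypothetical quiet window end
DEFRAG_THRESHOLD = 0.45                    # hypothetical share of reclaimable space


def backup_allowed(installed_at: datetime, now: datetime) -> bool:
    """Return True if at least 24 hours have passed since installation
    and the current time falls inside the off-peak window."""
    rotation_done = now - installed_at >= CERT_ROTATION_DELAY
    off_peak = OFF_PEAK_START <= now.time() < OFF_PEAK_END
    return rotation_done and off_peak


def fragmentation_ratio(db_total_bytes: int, db_in_use_bytes: int) -> float:
    """Return the share of the etcd database file that is allocated but unused."""
    if db_total_bytes <= 0:
        return 0.0
    return (db_total_bytes - db_in_use_bytes) / db_total_bytes


def should_defragment(db_total_bytes: int, db_in_use_bytes: int) -> bool:
    """Return True if the unused share reaches the assumed threshold."""
    return fragmentation_ratio(db_total_bytes, db_in_use_bytes) >= DEFRAG_THRESHOLD


if __name__ == "__main__":
    installed = datetime(2024, 1, 1, 12, 0)
    print(backup_allowed(installed, datetime(2024, 1, 3, 2, 0)))
    print(should_defragment(1_000_000_000, 400_000_000))
```

A scheduled job could call backup_allowed before it starts a backup and should_defragment after each metrics scrape. The actual backup and defragmentation procedures are described in the Red Hat documentation.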

For details, see Control plane backup and restore (Red Hat documentation).

Beyond the etcd database, also plan a backup strategy for your virtualization layer, for example, z/VM or KVM.

The presentation Backup Strategies for z/VM and Linux provides helpful background information.