Backup and restore
Backup and restore is an essential consideration for production use.
Above all, you must back up the etcd database.
- The etcd database is the key-value store for RHOCP, which persists the state of all resource objects.
- Do not take an etcd backup before the first certificate rotation completes, which occurs 24 hours after installation. Otherwise, the backup contains expired certificates.
- Take etcd backups during non-peak usage hours, because taking a backup is a blocking action.
- Monitor the etcd metrics in Prometheus and defragment the etcd database when needed, before etcd raises a cluster-wide alarm that puts the cluster into maintenance mode.
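The two timing rules above (wait for the first certificate rotation, avoid peak hours) can be expressed as a pre-flight check. The following Python sketch is illustrative only: the function names `backup_allowed` and `build_backup_command`, the peak window of 08:00 to 18:00 UTC, and the default backup directory are assumptions. The command it builds follows the commonly documented pattern of running `/usr/local/bin/cluster-backup.sh` on a control plane node through `oc debug`; verify the exact invocation against the Red Hat documentation for your release. The sketch only builds the command and does not run it.

```python
from datetime import datetime, timedelta, timezone

# The first certificate rotation completes 24 hours after installation.
CERT_ROTATION_DELAY = timedelta(hours=24)

# Hypothetical peak window in UTC hours (start inclusive, end exclusive).
PEAK_START_HOUR = 8
PEAK_END_HOUR = 18


def backup_allowed(installed_at: datetime, now: datetime) -> bool:
    """Return True if an etcd backup may be taken at `now`.

    Both datetimes must be timezone-aware. A backup is refused before the
    first certificate rotation and during the (assumed) peak window.
    """
    if now - installed_at < CERT_ROTATION_DELAY:
        # The backup would contain certificates that are about to expire.
        return False
    hour = now.astimezone(timezone.utc).hour
    return not (PEAK_START_HOUR <= hour < PEAK_END_HOUR)


def build_backup_command(node: str,
                         backup_dir: str = "/home/core/assets/backup") -> list[str]:
    """Build (but do not run) the oc command for the etcd backup script.

    The node name and backup directory are placeholders for illustration.
    """
    return [
        "oc", "debug", f"node/{node}", "--",
        "chroot", "/host",
        "/usr/local/bin/cluster-backup.sh", backup_dir,
    ]
```

A scheduler could, for example, pass the list returned by `build_backup_command(...)` to `subprocess.run` only after `backup_allowed(...)` returns `True`. Keeping the check separate from the command makes the timing policy easy to test without touching a cluster.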
For details, see Control plane backup and restore in the Red Hat documentation.
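The defragmentation guidance above can be sketched as a decision function, assuming you already read two etcd gauges from Prometheus: `etcd_mvcc_db_total_size_in_bytes` (size of the database file) and `etcd_mvcc_db_total_size_in_use_in_bytes` (space actually in use). The difference between them is space that defragmentation can reclaim. The function name `needs_defrag`, the thresholds, and the 8 GiB backend quota below are hypothetical values chosen for illustration, not Red Hat recommendations. The defragmentation itself is performed separately, one etcd member at a time, for example with `etcdctl defrag` as described in the Red Hat documentation.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Hypothetical thresholds; tune them for your cluster.
FRAGMENTATION_THRESHOLD = 0.45   # defragment when >= 45% of the file is free space
MIN_DB_SIZE_BYTES = 100 * MIB    # ignore small databases
QUOTA_WARN_RATIO = 0.8           # act before the file reaches 80% of the quota
DEFAULT_QUOTA_BYTES = 8 * GIB    # assumed etcd backend quota


def needs_defrag(total_bytes: int, in_use_bytes: int,
                 quota_bytes: int = DEFAULT_QUOTA_BYTES) -> bool:
    """Decide whether to defragment etcd.

    total_bytes:  value of etcd_mvcc_db_total_size_in_bytes
    in_use_bytes: value of etcd_mvcc_db_total_size_in_use_in_bytes
    quota_bytes:  backend quota; exceeding it triggers the cluster-wide alarm
    """
    if total_bytes < MIN_DB_SIZE_BYTES:
        return False
    free_ratio = (total_bytes - in_use_bytes) / total_bytes
    near_quota = total_bytes >= QUOTA_WARN_RATIO * quota_bytes
    # Defragmenting only helps if there is free space to reclaim.
    return free_ratio >= FRAGMENTATION_THRESHOLD or (
        near_quota and in_use_bytes < total_bytes)
```

The quota check exists because the alarm is driven by the size of the database file, not by the space in use, so a file that is close to the quota is worth compacting even when its fragmentation ratio is modest.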
Beyond the etcd database, plan backup strategies for your virtualization layer, for example, z/VM or KVM.
The presentation Backup Strategies for z/VM and Linux provides helpful background information.