Backing up and restoring the Red Hat Advanced Cluster Management for Kubernetes observability service

The Red Hat Advanced Cluster Management for Kubernetes observability service stores monitoring metric data in two locations:

  1. An S3-compatible storage bucket is used for long-term data storage. The S3 bucket contains all metric data that is sent for storage, except for the most recent two hours of data. To back up and restore the S3 storage data, use the procedures that you might already have for handling S3 data.

  2. New metric data is stored in Kubernetes persistent volumes. The most recent 24 hours of metric data (four days for Red Hat Advanced Cluster Management for Kubernetes observability version 2.2 and earlier) is kept in the persistent volumes, and new metric data is forwarded to the S3 storage bucket every two hours. You can use an open-source solution, such as Velero, to back up and restore this data. To determine the persistent volume claims of the thanos-receive-default stateful set that need to be included in your backup and restore strategy, run the following command:

    oc get pvc -n open-cluster-management-observability | grep thanos-receive-default
    

    In addition to backing up the appropriate metric storage volumes, it is also important to back up the following Red Hat Advanced Cluster Management for Kubernetes observability service configuration information:

    • The tenant ID configuration maps. To determine the configuration maps that need to be backed up, run the following command:

      oc get configmap -n open-cluster-management-observability | grep thanos-receive-controller-tenants
      
    • The S3 storage configuration secret. To determine the secret that needs to be backed up, run the following command:

      oc get secret -n open-cluster-management-observability | grep thanos-object-storage
      
    • The data retention configuration. To determine the object that needs to be backed up, run the following command:

      oc get MultiClusterObservability -n open-cluster-management-observability
      
    • If custom metric recording rules have been defined, the Thanos rule configuration and persistent volumes should also be backed up. To determine the configuration information and persistent volumes that need to be backed up, run the following commands:

      oc get pvc -n open-cluster-management-observability | grep thanos-rule
      
      oc get configmap -n open-cluster-management-observability | grep thanos-ruler-custom
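
    The configuration resources listed above can be saved as YAML files with oc. The following is a minimal sketch, assuming that oc is logged in to the hub cluster. The BACKUP_DIR variable and the export_matching and backup_observability_config functions are hypothetical names chosen for this example. The sketch reuses the same name patterns as the commands above, stops if an oc call fails, and skips a pattern that matches nothing, such as thanos-ruler-custom when no custom recording rules are defined.

```shell
#!/usr/bin/env bash
# Minimal sketch: export the observability configuration resources to YAML files.
# Assumptions: oc is logged in to the hub cluster; BACKUP_DIR, export_matching,
# and backup_observability_config are hypothetical names for this example.
set -euo pipefail

NAMESPACE="open-cluster-management-observability"
BACKUP_DIR="${BACKUP_DIR:-./acm-observability-config-backup}"

# export_matching KIND PATTERN
# Saves each KIND object in NAMESPACE whose name matches PATTERN to its own file.
export_matching() {
  local kind="$1" pattern="$2" names obj
  # Assigned separately from "local" so that an oc failure stops the script
  names=$(oc get "$kind" -n "$NAMESPACE" -o name)
  # "|| true" tolerates only grep finding no match (for example, no custom rules)
  for obj in $(grep -- "$pattern" <<<"$names" || true); do
    # obj looks like "configmap/<name>"; replace "/" with "_" for the file name
    oc get "$obj" -n "$NAMESPACE" -o yaml > "$BACKUP_DIR/${obj//\//_}.yaml"
  done
}

backup_observability_config() {
  mkdir -p "$BACKUP_DIR"
  export_matching configmap thanos-receive-controller-tenants   # tenant ID config maps
  export_matching secret thanos-object-storage                  # S3 storage configuration
  export_matching multiclusterobservability .                   # data retention settings
  export_matching configmap thanos-ruler-custom                 # custom recording rules, if any
}
```

    To run it, append a call to backup_observability_config at the end of the script and run the script with bash. The exported secret contains the S3 credentials, so store the files securely. The exported YAML also contains cluster-generated metadata, such as resourceVersion and uid, that you might need to remove before you apply the files to another cluster.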
      

    Notes:

    • The metric data that is stored in the Kubernetes persistent volumes (thanos-receive-default) is transient in nature. This means that if the metric data is not restored to the persistent volumes, the persistent volumes are rebuilt in a clean state. In this recovery scenario, about two hours of the most recent metric data is lost. If you can accept this scenario, it is not necessary to back up the Kubernetes persistent volumes that are associated with the thanos-receive-default pods.

    • If the backup of the S3 bucket is already being handled by your storage provider, then it is only necessary to back up the Red Hat Advanced Cluster Management for Kubernetes observability configuration information.

    • This backup procedure covers only the monitoring metric data that is stored in the observability service. If the observability service is also used to monitor cluster health, you might need to back up additional Kubernetes storage volumes and resources. For example, if you customized the observability Grafana dashboards and alerting rules, back up the storage volumes and configuration maps that are associated with those components.

      To determine the resources that need to be backed up for your use of the observability service, see Customizing observability or contact Red Hat customer support.
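
Velero is mentioned above as one option for backing up the persistent volumes. The following sketch assumes that Velero is already installed on the hub cluster, with a backup storage location and a volume snapshot or file-system backup method configured so that persistent volume data is captured. The BACKUP_NAME variable and the backup_observability and restore_observability functions are hypothetical names. The sketch backs up all namespaced resources in open-cluster-management-observability, which include the thanos-receive-default and thanos-rule persistent volume claims and the configuration maps and secret listed earlier. A namespace filter does not select cluster-scoped resources, so if the MultiClusterObservability resource is cluster-scoped in your environment, keep a separate copy of it.

```shell
#!/usr/bin/env bash
# Minimal sketch: back up and restore the observability namespace with the Velero CLI.
# Assumptions: Velero is installed on the hub cluster with a backup storage location
# and a volume snapshot or file-system backup method configured. BACKUP_NAME,
# backup_observability, and restore_observability are hypothetical names.
set -euo pipefail

NAMESPACE="open-cluster-management-observability"
BACKUP_NAME="${BACKUP_NAME:-acm-observability-$(date +%Y%m%d%H%M)}"

backup_observability() {
  # Backs up all namespaced resources in NAMESPACE, including the
  # thanos-receive-default and thanos-rule persistent volume claims
  velero backup create "$BACKUP_NAME" --include-namespaces "$NAMESPACE" --wait
  # Lists the backed-up resources so that you can check what was captured
  velero backup describe "$BACKUP_NAME" --details
}

restore_observability() {
  # Recreates the backed-up resources; Velero generates the restore name
  velero restore create --from-backup "$BACKUP_NAME" --wait
}
```

Call backup_observability to create the backup and print its details, and call restore_observability with the same BACKUP_NAME to restore it. Both velero commands use the --wait flag so that they return only after the operation completes. If you accept losing about two hours of recent metric data, as described in the notes, you might choose a narrower backup that leaves out the thanos-receive-default volumes.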