Backing up a multi-node cluster

Use the icic-opsmgr backup command to back up your essential IBM® Cloud Infrastructure Center data. You can then restore your system to a working state in the event of data corruption or data loss.

Notes:

  • If the multi-node cluster was set up by a non-root user, run the command with sudo: sudo icic-opsmgr backup.

The icic-opsmgr backup command is available on any of the multi-node cluster management nodes. For syntax and options, run icic-opsmgr backup --help. Usually, you need to run the command on the DC management node. (You can get the DC node information by using the command pcs cluster status on a management node.)
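
The DC node lookup can be scripted. The following is a minimal sketch, assuming that the pcs cluster status output contains a Pacemaker-style line of the form "Current DC: <node> ..."; the function name current_dc is hypothetical and is not part of the product.

```shell
#!/bin/sh
# Print the DC node name found in `pcs cluster status` output read from stdin.
# Assumption: the output has a line such as
#   * Current DC: mgmt1 (version ...) - partition with quorum
current_dc() {
    # Keep only the first word after "Current DC:"; print the first match.
    sed -n 's/^.*Current DC:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Example use on a management node (sudo might be needed):
#   pcs cluster status | current_dc
```

If the function prints nothing, the output format differs from the assumed one; in that case, read the pcs cluster status output directly.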

The following data is backed up:

  • The IBM Cloud Infrastructure Center databases, such as the nova database where information about your registered hosts is stored
  • The IBM Cloud Infrastructure Center configuration data, such as /etc/nova
  • SSH private keys that are provided by the administrator
  • Data in Swift Object Storage, such as Glance images, volume backups, or aggregated Gnocchi metering data
  • SSH key pairs generated for software-defined storage

The following data is NOT backed up:

  • The virtual server instances created by the IBM® Cloud Infrastructure Center
  • The OpenStack policy.yaml files
  • The version.properties file in folder /opt/ibm/icic/
  • The certificate files or directory of the LDAP server

Complete the following steps to back up the IBM Cloud Infrastructure Center data:

  1. Open a command-line interface to the operating system on the multi-node cluster DC management node on which the IBM Cloud Infrastructure Center is installed.

  2. Run the icic-opsmgr backup command with any needed options. The following example runs the command and backs up data from the multi-node cluster management nodes and all compute nodes to a nondefault mounted file system's target directory:

    icic-opsmgr backup -c clustername -p /tmp/backups
    

    Before you run the backup command, make sure that the target directory /tmp/backups has enough free disk space. To get the name of the multi-node cluster, run icic-opsmgr inventory -l.

    When the backup operation completes, the new archive file icic_backup.tar.gz is stored in a dynamically created <timestamp> subdirectory of the target directory, for example, /tmp/backups/20200208210343451220/icic_backup.tar.gz. The backup files for the compute nodes are stored in the same folder, /tmp/backups/20200208210343451220/. The success.txt file is generated if the backup completes without errors. For example:

    [root@sample-dc-mgmt 20230521T005706697084]# ll
    total 1192
    -rw-r--r--. 1 root root 122305 May 21 00:59 hostname2_icic_backup_20230521015900644885.tar.gz
    -rw-r--r--. 1 root root 937626 May 21 00:58 icic_backup.tar.gz
    -rw-r--r--. 1 root root 156708 May 21 00:59 hostname1_icic_backup_20230521015900823624.tar.gz
    -rw-------. 1 root root      0 May 21 00:59 success.txt
    

    The failure.txt file is generated if the backup has an error. Follow the instructions in failure.txt to obtain the complete set of backup files. For example:

    [root@sample-dc-mgmt 20230516T014138268825]# ls -l
    total 767740
    -rw-------. 1 root root       253 May 16 01:44 failure.txt
    -rw-r--r--. 1 root root 785997667 May 16 01:43 icic_backup.tar.gz
    -rw-r--r--. 1 root root    155684 May 16 01:44 hostname1_icic_backup_20230516024402492807.tar.gz
    [root@sample-dc-mgmt 20230516T014138268825]# cat failure.txt
    
    Failed to get backup files from hostname2. You need to run `icic-opsmgr backup -c sample_cluster -r hostname2 -p /root/AA_plus_backup/20230516T014138268825` manually after fixing the backup errors in /opt/ibm/icic/log/icic_backup_20230516014356164790.log. Thanks
    

    The following example uses the ignore-swift-data (-i) option to skip the backup of swift data. (Images are stored in swift object storage if you adopted swift as the glance backend storage when you set up the multi-node cluster environment.)

    icic-opsmgr backup -c clustername -p /tmp/backups -i
    

    This backup with the ignore-swift-data (-i) option is called a lightweight backup because the size of the backup file is only a few megabytes and the backup takes little time.

    The following example runs the command that uses the remote-hosts (-r) option to back up the data on all compute nodes.

    icic-opsmgr backup -c clustername -p /tmp/backups -r all
    

    The following example uses the remote-hosts (-r) option to back up the data on specific compute nodes. To get the list of available host names, run icic-services remote list on the management node.

    icic-opsmgr backup -c clustername -p /tmp/backups -r hostname1,hostname2,...,hostnameN
    

    When you use the remote-hosts (-r) option, the new archive files, for example hostname1_icic_backup_20230604234816709821.tar.gz, hostname2_icic_backup_20230604234816709856.tar.gz, and hostnameN_icic_backup_20230604234816709931.tar.gz, are stored directly in the backup directory /tmp/backups.
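
The free-space check and the success.txt or failure.txt check described in the steps above can be sketched as two small shell functions. This is an illustration only: the function names has_free_space and backup_outcome are hypothetical, and the amount of free space that a backup needs is an assumption that you must size for your own environment.

```shell
#!/bin/sh
# Return 0 if the file system that holds directory $1 has at least $2 KiB free.
has_free_space() (
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kib=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kib" ] && [ "$avail_kib" -ge "$2" ]
)

# Print "success", "failure", or "unknown" for one <timestamp> backup directory.
backup_outcome() {
    if [ -e "$1/success.txt" ]; then
        echo success
    elif [ -e "$1/failure.txt" ]; then
        echo failure
    else
        echo unknown
    fi
}
```

For example, has_free_space /tmp/backups 10485760 succeeds only if about 10 GiB is free (a hypothetical threshold), and backup_outcome /tmp/backups/<timestamp> reports how a finished backup ended.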

The following table shows the CLI of the backup service in the IBM Cloud Infrastructure Center multi-node cluster environment:

Command                                                 Description
icic-opsmgr backup -c clustername                       Back up the management nodes and all compute nodes, and store the backup files in the default /var/opt/ibm/icic/backups folder
icic-opsmgr backup -c clustername -p backup_folder      Back up to a specific target directory, which can be a mounted file path
icic-opsmgr backup -c clustername -r host1,host2,hostn  Back up only the specified compute nodes
icic-opsmgr backup -c clustername -r all                Back up all compute nodes
icic-opsmgr backup -c clustername -i                    Run a lightweight backup without image data; typically used for the periodic backups that are described in Running periodic backups

Note: If an error occurs while you run icic-opsmgr backup, check the icic-opsmgr backup logs in /opt/ibm/icic/log/backup. The log file contains the output of the backup command, and its path has the form /opt/ibm/icic/log/backup/icic-opsmgr_<clustername>_backup_<timestamp>/stdout. If the backup of a compute node fails, also check failure.txt.
Note: Do not forcibly terminate a backup operation. Doing so might cause a resource leak.
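
To find the most recent backup log quickly, a helper such as the following might be useful. It is a sketch that assumes the log layout named in the note above (one icic-opsmgr_<clustername>_backup_<timestamp> directory per run, each with a stdout file); the function name latest_backup_log is hypothetical.

```shell
#!/bin/sh
# Print the path of the newest stdout log for cluster $1.
# $2 optionally overrides the log root (default: /opt/ibm/icic/log/backup).
latest_backup_log() (
    cluster=$1
    log_root=${2:-/opt/ibm/icic/log/backup}
    newest=
    for dir in "$log_root"/icic-opsmgr_"$cluster"_backup_*/; do
        # Skip the unexpanded pattern and directories without a stdout file.
        [ -f "${dir}stdout" ] || continue
        # -nt compares modification times (supported by bash, dash, and ksh).
        if [ -z "$newest" ] || [ "${dir}stdout" -nt "$newest" ]; then
            newest=${dir}stdout
        fi
    done
    # Fail with a nonzero status if no log was found.
    [ -n "$newest" ] && printf '%s\n' "$newest"
)
```

For example, less "$(latest_backup_log sample_cluster)" opens the log of the latest run for a cluster named sample_cluster.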

Back up your data after you manually configure OVN or MacVTap on a KVM compute node, and again after you manually remove the OVN or MacVTap configuration from a KVM compute node.

Consider backing up your IBM Cloud Infrastructure Center data regularly, for example, daily, as part of a broader system backup and recovery strategy. To back up your data regularly, use your operating system's scheduling tool to run the icic-opsmgr backup command on a schedule. (Target off-peak intervals when few or no other operations are running in the IBM Cloud Infrastructure Center.)

Running periodic backups

Complete the following steps to schedule periodic backups. These backups run while the IBM Cloud Infrastructure Center is running.

  1. Create a shell script to run the backup.

    • The shell script needs the following command to regularly back up the management nodes and all compute nodes: icic-opsmgr backup -c <clustername> -p <backup_dir>. The IBM Cloud Infrastructure Center services are NOT stopped during the backup. The command runs successfully only when the Galera cluster is active.
    • The shell script can determine whether the backup succeeded by checking for the success.txt or failure.txt file in the backup's <timestamp> subdirectory.
    • Because the backup runs while the management nodes are active, it is recommended that you run the backup when there is minimal activity in the IBM Cloud Infrastructure Center.
    • Use a directory on a mounted Network File System (NFS) share as the backup directory, if one is available. Otherwise, have the script back up to a directory on the local file system, and then use sftp or scp to transfer the backup to a remote system. Keeping the backup on a remote system ensures that the backup file remains available.
    • Each backup that you run on the IBM Cloud Infrastructure Center creates a new backup file in the backup_dir directory that is specified, so the used disk space increases as more backups occur. To avoid using too much disk space, you can choose to have the script clean up older backups from the directory.
    Note: The backup file contains confidential IBM Cloud Infrastructure Center data and the customer's private data. You need to ensure the security of the backup files, especially those that are stored on a mounted Network File System (NFS) share or a remote system.
  2. Use cron to schedule a task on the DC management node to run the shell script at the interval of your choosing. You can find information about how to use cron jobs on the internet.

    Note: The interval can vary based on how often the data is updated in the swift cluster and how often new virtual machines are deployed or deleted in the IBM Cloud Infrastructure Center. Scheduling weekly tasks can be sufficient, but in other cases you might want to run the backup hourly. For example, you might use the following schedule:
    • Run icic-opsmgr backup -c <clustername> -p <backup_dir> every week to back up the system data with the swift data. It is a complete backup.
    • Run icic-opsmgr backup -c <clustername> -p <backup_dir> -i every day to back up the system data without the swift data. It is faster and saves much disk space.
  3. Schedule a mixed backup plan

    The following backup result is an example for your reference.

    Date        Command                                  Status  Start Time  End Time
    2023-08-27  icic-opsmgr backup -c clustername -i -p  ok      12:00 pm    00:03 am
    2023-08-27  icic-opsmgr backup -c clustername -p     ok      01:00 am    01:10 am
    2023-08-27  icic-opsmgr backup -c clustername -i -p  failed  12:00 pm    00:01 am
    2023-08-27  icic-opsmgr backup -c clustername -i -p  ok      01:00 am    01:03 am
    2023-08-27  icic-opsmgr backup -c clustername -i -p  ok      02:00 am    02:03 am
    2023-08-27  icic-opsmgr backup -c clustername -i -p  ok      03:00 am    03:03 am
    2023-08-27  icic-opsmgr backup -c clustername -i -p  ok      04:00 am    04:03 am
    2023-08-27  icic-opsmgr backup -c clustername -i -p  ok      05:00 am    05:03 am
    2023-08-27  icic-opsmgr backup -c clustername -i -p  ok      06:00 am    06:03 am
    2023-08-27  icic-opsmgr backup -c clustername -i -p  ok      start time  end time
    2023-08-28  icic-opsmgr backup -c clustername -p     failed  01:00 am    01:01 am

    The normal backup with image data occupies a large amount of disk space, but it is a complete backup. The lightweight backup takes less disk space and is faster, so it can run at short intervals: daily, hourly, or even every half hour. The image data is not expected to change frequently. You can also trigger a backup manually when a new image is uploaded to the IBM Cloud Infrastructure Center.

    Note: If you have NOT adopted swift as the glance backend storage when setting up the multi-node cluster environment, and you have configured filesystem_store_datadir in /etc/glance/glance-api.conf to point to a mounted NFS server or IBM Storage Spectrum instead of the default folder /var/lib/glance/images/, you can run icic-opsmgr backup -c clustername -p hourly in the backup plan above, and you do not need to run the lightweight backup. For details, refer to Config image filesystem_store_datadir.
    Note: When planning for recovery, you need to know how to achieve the two key objectives: RTO (Recovery Time Objective) and RPO (Recovery Point Objective). For details, refer to Disaster Recovery in Wikipedia.

    If you adopt the mixed backup plan in the example above, the RPO of the data (excluding image data) is 1 hour, and the RPO of the image data is 24 hours. The RTO depends on the size of the image data and the number of compute nodes; recovery takes from a few minutes to tens of minutes.
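
A scheduled backup script along the lines described in this section might combine a full or lightweight run with cleanup of older backups. The following is a minimal sketch, not a supported tool: the function names, the retention period, and the script path in the usage note are hypothetical, and the cleanup assumes that each backup is kept in its own first-level subdirectory of the backup directory.

```shell
#!/bin/sh
# Run one backup. $1 = mode (full|light), $2 = cluster name, $3 = backup directory.
# "light" adds -i so that swift data is skipped; any other mode runs a complete backup.
run_backup() {
    if [ "$1" = light ]; then
        icic-opsmgr backup -c "$2" -p "$3" -i
    else
        icic-opsmgr backup -c "$2" -p "$3"
    fi
}

# Delete first-level subdirectories of $1 that were last modified more than $2 days ago.
# -mindepth and -maxdepth are extensions available in GNU, BSD, and BusyBox find.
prune_old_backups() {
    [ -d "$1" ] || return 1
    find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2" -exec rm -rf {} +
}

# Example body of a scheduled script (all values are placeholders):
#   run_backup "$1" sample_cluster /mnt/backups && prune_old_backups /mnt/backups 14
```

If these functions are saved in a script such as /usr/local/bin/icic_backup.sh (a hypothetical path) whose last line is the example body shown in the comment, crontab entries for a complete backup daily at 01:00 and a lightweight backup in every other hour could look like this:

    0 1 * * *      /usr/local/bin/icic_backup.sh full
    0 0,2-23 * * * /usr/local/bin/icic_backup.sh light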