Restoring the management database in a Kubernetes environment

The management database can be restored only as a complete restoration; partial restorations are not supported.

Before you begin

Before you begin restoring a management database, ensure that your deployment and your process meet these requirements:
  • If you have to perform a restore, you must complete the restoration of the Management Service first, and then immediately restore the Developer Portal. The backups of the Management subsystem and the Developer Portal must be taken at the same time to ensure that the Portal sites are consistent with the Management database.
  • Restoring the Management Service requires database downtime and is a destructive process that deletes current data and copies backup data. During the restoration process, external traffic must be stopped.
  • In a Disaster Recovery scenario, do not log in to the administration UI or attempt to configure or change any settings prior to restoring the backup. Restore the backup immediately after installing the subsystem.

  • To restore the management database, you must use the original project directory that was created with apicup during the initial product installation. You cannot restore the database without the initial project directory because it contains pertinent information about the cluster. The endpoints and certificates cannot change; the same endpoints and certificates will be used in the restored system. Note that successful restoration depends on use of a single apicup project for all subsystems, even those in a different cluster. Multiple projects will result in multiple certificate chains which will not match.

  • Map the DNS entries from the source cluster to the corresponding IP addresses on the target cluster. Record the DNS entries for each endpoint before you start the restore; see the example after this list.
  • When restoring the management database, the endpoints on the new cluster (the target of the restoration) must be the same as those on the old cluster (the source of the backup). This includes all of the API Connect endpoints: api-manager-ui, cloud-admin-ui, consumer-api, and platform-api; api-gateway and api-gw-service; analytics-ingestion and analytics-client; and portal-admin and portal-www.
  • When restoring, the Gateway and all deployed subsystems (Management, Analytics, and Developer Portal) must be at the same version level.
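
The following sketch shows one way to record the configured endpoints and their DNS resolution before you start the restore. It assumes that apicup subsys get prints the configured settings (including endpoints) for a subsystem; the subsystem names and host names shown here are placeholders, so substitute the values from your own project.

  # run from the original apicup project directory
  for SUB in mgmt analyt ptl gwy; do
    apicup subsys get $SUB > endpoints-$SUB.txt
  done

  # record the current DNS resolution of each endpoint on the source cluster
  for HOST in api-manager-ui.example.com cloud-admin-ui.example.com \
              consumer-api.example.com platform-api.example.com; do
    nslookup $HOST >> endpoint-dns-before-restore.txt
  done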

About this task

Follow the procedure on this page to restore your management database. You must complete the prerequisite steps before beginning the restore.

Note that in a disaster recovery scenario you must first re-establish the management subsystem. The procedure includes an optional first step for disaster recovery.

If you encounter errors, see Troubleshooting restoration of management database. Note that the troubleshooting page includes Overview of restore process for management database.

Procedure

  1. If the restoration is for a disaster recovery scenario, complete this step to first install a new Management subsystem. If the restoration is not for a disaster recovery (meaning that you have a running Management subsystem to use for restoration), skip this step and go directly to Step 2.
    1. Copy the project folder that corresponds to the backup files into a new location.
    2. If, for the previous installation, you redirected the configuration to an optional output folder by using the apicup subsys install mgmt --out=mgmt-out command (as explained in Installing the Management subsystem into a Kubernetes environment), delete all output folders before starting the new installation.
    3. (Optional) If the image registry location or the secret has changed, update the image registry and registry-secret for each subsystem, as follows:
      • apicup subsys set <SUB_SYS> registry <REPOSITORY> 

        Specifies the location where you are running the image registry (for example, Artifactory). <SUB_SYS> is the name of the subsystem; <REPOSITORY> is the URL of the repository.

      • apicup subsys set <SUB_SYS> registry-secret <SECRET>

        Specifies the Kubernetes secret that contains the Docker credentials for authenticating with the image registry. The value for registry-secret must match the name of the secret that was created with the kubectl create secret command. <SUB_SYS> is the name of the subsystem; <SECRET> is the name of the registry secret. An example of creating this secret is shown below.
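
      If you need to re-create the registry secret on the target cluster, the following sketch shows a typical kubectl create secret docker-registry command. The secret name, registry URL, credentials, and namespace are placeholders for illustration; the secret name must match the value that you set for registry-secret.

        kubectl create secret docker-registry <SECRET> \
          --docker-server=<REPOSITORY> \
          --docker-username=<USERNAME> \
          --docker-password=<PASSWORD> \
          --docker-email=<EMAIL> \
          --namespace=<NAMESPACE>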

    4. Perform a fresh installation of the management subsystem using the following command:
      apicup subsys install <SUB_SYS>
      Important: Do not make any other changes in the project directory, and do not run any other apicup operations, before proceeding to the next step.

  2. Verify that your deployment meets the prerequisites for restoring a management database:
    1. Verify that you are about to restore onto a deployment that has the same number of Cassandra pods as the deployment where the backup was created. For example, if the management service backup deployment had 3 Cassandra pods, the restore deployment must have 3 Cassandra pods. You cannot restore onto a deployment with fewer Cassandra pods because the Cassandra data is sharded across the pods. To successfully restore, first create the matching number of Cassandra pods.

      To view the number of pods, enter the kubectl get pods command; see the example after this step.

    2. Ensure that all Cassandra pods are online and running normally.
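
    For example, assuming that the Cassandra pod names contain apiconnect-cc (as in the sample output later in this topic), you can list and count the Cassandra pods as follows; adjust the namespace for your deployment:

      kubectl get pods -n <NAMESPACE> | grep apiconnect-cc

      # the count must match the number of Cassandra pods in the source deployment
      kubectl get pods -n <NAMESPACE> | grep apiconnect-cc | wc -l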
  3. Restoration of a management subsystem requires access to backup tar files. Obtain your backup files, as follows:
    1. Enter apicup subsys exec <MANAGEMENT_SUBSYS_NAME> list-backups. The output lists the current backups in your namespace, with the backup ID, timestamp, and status:
      Cluster               Namespace     ID                    Timestamp                                 Status             
      rf0c7310d07-apiconnect-cc   e2edemo       1537987501522014136   2018-09-26 18:45:01.522014136 +0000 UTC   Complete
      rf0c7310d07-apiconnect-cc   e2edemo       1537920006787257385   2018-09-26 00:00:06.787257385 +0000 UTC   Complete

      The backup files are stored at the location specified by the cassandra-backup-path parameter.

      Examine the backup filename to identify the backup ID for use with the restore command.

      Table 1. Backup file naming convention

        Backup file name                              Backup ID (for use with the restore command)
        <backupId>-<pod index>-<# of pods>.tar.gz     <backupId>
        Example: 1534954510365016356-0-3.tar.gz       Example: 1534954510365016356
      Note: Disaster recovery: If you are restoring onto a new deployment, the list-backups command does not return any backups. Find the backup ID by going to your backup host and looking for the backup files from previous deployments, as shown in the sketch below.
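      The following sketch shows one way to list the backup files on the backup host; the user, host, and directory are placeholders, and the directory is the value of your cassandra-backup-path parameter:

        # list the backup files on the backup host
        ssh <BACKUP_USER>@<BACKUP_HOST> 'ls -l <cassandra-backup-path>'

        # each file is named <backupId>-<pod index>-<# of pods>.tar.gz;
        # the leading number is the backup ID to use with the restore command
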
    2. Confirm that you have a backup file for each Cassandra pod in your cluster. If you have n pods, you must have the same number of backup files.
    3. Optional but recommended: Verify the integrity of the backup tar files; a sketch that checks every file for a backup ID follows the single-file example below.
      Ensure that the tar files are not corrupt. For example, on Linux:
      tar -tzf <backup_file>
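
      A sketch that combines the per-pod file-count check and the integrity check for a single backup ID might look like the following; the backup ID and working directory are placeholders:

      BACKUP_ID=<backupId>

      # there must be one file per Cassandra pod for this backup ID
      ls ${BACKUP_ID}-*.tar.gz

      # verify that each archive can be read without errors
      for f in ${BACKUP_ID}-*.tar.gz; do
        tar -tzf "$f" > /dev/null && echo "$f OK" || echo "$f corrupt"
      done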
    4. Optional but recommended: Determine whether your environment has sufficient space to perform the restore. You need enough free space such that the size of the backup file, when multiplied by four, does not exceed 85% of the available free space.

      For example, on Linux, you can use the following steps:

      1. Exec a bash terminal on the Cassandra pod:
        kubectl exec -it <Cassandra-pod> -- bash
      2. Paste the following script to calculate space available for restore:
        # current available space
        avail=$(df /var/db/ | awk 'NR==2{print $4}')

        # space occupied by /var/db/data/
        db_data=$(du -s /var/db/data/ | awk '{print $1}')

        # estimated space available after cleanup (avail + db_data), converted to bytes, with a 15% buffer
        total_space_avail=$(((($avail + $db_data) * 1024) * 85 / 100))
        echo $total_space_avail

        The value produced by the script is in bytes. Calculate it for every Cassandra pod and compare it against 4x, where x is the size of the corresponding backup tar file; the available space must be at least 4x. A comparison sketch is shown at the end of this step.

        Each backup file is in format <backup-id>-<ordinal-of-cassandra pod>-<number-of-cassandra-pods in cluster>.tar.gz

        In the example script above, if the backup tar size of <backup-id>-0-3.tar.gz is 15*1024*1024 bytes, the value of $total_space_avail in the Cassandra cc-0 pod must be at least 60*1024*1024 bytes.

        If free space is insufficient, create additional free space before starting the restore.
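
        To compare the available space against the 4x requirement, you can use a sketch like the following on the machine that holds the backup files; the file name is an example only:

        # size in bytes of the backup tar file for this pod
        backup_size=$(stat -c %s <backup-id>-0-3.tar.gz)

        # the space available in the corresponding pod must be at least four times this size;
        # compare the result against the $total_space_avail value reported inside that pod
        echo $((backup_size * 4))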
  4. Restore the management database by entering:
    apicup subsys exec <MANAGEMENT_SUBSYS_NAME> restore <backupID>

    The restore command restores data from all of the backup files that share the specified backupID. The number of running management database (Cassandra) pods must match the number of backup files for that backupID.

  5. Verify that the restore process completed successfully.

    Ensure that the restore job is marked as completed. Note, however, that the restore job can be marked as complete while the Cassandra restore is still in progress.

    The best way to confirm that the Cassandra restore is complete is to review the ClusterRestoreStatus field in the CassandraCluster (cc) custom resource. When ClusterRestoreStatus is completed, the Cassandra database is successfully restored.

    Example command flow to verify the restore:
    1. Examine the restore job and pod:
      # kubectl get jobs | grep restore				
      restore-plnfs   0/1           71s        71s
      
      # kubectl get pods | grep restore								
      restore-plnfs-hr2rh                      0/2     Init:0/2    0    54s	
      
    2. Because the status of the restore pod is Init:0/2, the first init container (the restore container) is still executing. In this case, watch the ClusterRestoreStatus field in the CassandraCluster (cc) custom resource to see the current status of the Cassandra restore process.
      kubectl get cc -o yaml | grep -A 1 ClusterRestoreStatus						
      ClusterRestoreStatus: Running Retrieve checks on backup file 1583268283534609276-1-3.tar.gz  for pod rdd94fb4a21-apiconnect-cc-1
      
      kubectl get cc -o yaml | grep -A 1 ClusterRestoreStatus						
      ClusterRestoreStatus: Running Retrieve checks on backup file 1583268283534609276-2-3.tar.gz  for pod rdd94fb4a21-apiconnect-cc-2
      
      kubectl get cc -o yaml | grep -A 1 ClusterRestoreStatus				                    	
      ClusterRestoreStatus: Restore prelim checks passed for rdd94fb4a21-apiconnect-cc-2
      
    3. When the restore process is complete, examine the restore pod and job status, and make sure that ClusterRestoreStatus is marked as completed. A polling sketch is included at the end of this step.
      kubectl get jobs | grep restore					      
      restore-plnfs                                 1/1           10m        7h17m
      
      kubectl get cc -o yaml | grep -A 1 ClusterRestoreStatus
      ClusterRestoreStatus: completed	
      

      If you encounter errors, see Troubleshooting restoration of management database.
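
    If you prefer to wait on the restore status from a script rather than re-running the command manually, a minimal polling sketch based on the same kubectl command is shown here:

      # poll the CassandraCluster (cc) custom resource until the restore reports completed
      until kubectl get cc -o yaml | grep -q 'ClusterRestoreStatus: completed'; do
        echo "Waiting for the Cassandra restore to complete..."
        sleep 30
      done
      echo "Cassandra restore completed"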

  6. Version 2018.4.1.9 iFix1.0 and later: After the restore completes, verify that no tasks are stuck in the task queue. Complete the following steps:
    1. Download the apicops utility from https://github.com/ibm-apiconnect/apicops/releases.
    2. Run the following command to remove any pending tasks:
      $ apicops task-queue:fix-stuck-tasks
    3. Run the following command to verify that the returned list (task queue) is empty.
      $ apicops task-queue:list-stuck-tasks