Recovering the management subsystem from SFTP backups

You can recover the management subsystem from SFTP backups after a disaster event.

Before you begin

To successfully recover the management subsystem, you must have previously completed the steps in Preparing the management subsystem for disaster recovery.
Important: Successful disaster recovery depends on recovery of both the Management subsystem and the Developer Portal subsystem. You must complete the preparation steps for both subsystems. If you have to perform a restore, restore the Management subsystem first, and then immediately restore the Developer Portal. The Management and Portal backups must be taken at the same time, to ensure that the Portal sites are consistent with the Management database.

About this task

To recover from a disaster event, you must create a new IBM API Connect installation with a running IBM API Connect Operator.

Note: Limitation for backups created on Version 10.0.2
  • When you restore a Version 10.0.2 backup onto a new Version 10.0.2 deployment, the restore might fail if the subsystem CR name exceeds 15 characters. This limitation applies only to restoring onto Version 10.0.2.
  • Restoring a Version 10.0.2 backup onto a Version 10.0.3.0 or later deployment is supported when the subsystem name exceeds 15 characters, as long as the correct spec.originalUID is specified upon restore. See Step 3.6.

Procedure

  1. Determine which backup to restore from.

    View the SFTP backups available on your remote storage site:

    -rw-r--r--    1 root     root     13092333 Aug 26 08:56 20200826-154646F.tgz
    -rw-r--r--    1 root     root     18703758 Aug 26 09:10 20200826-160010F.tgz
    -rw-r--r--    1 root     root     24318561 Aug 26 09:21 20200826-161301F.tgz
    

    Note the backup ID of the backup that you want to restore; you will need it later in the procedure. Each file name contains the date, time, and type of the backup.

    The format of the backup ID is YYYYMMDD-HHMMSS<F|I>. For example, to use the Aug 26, 09:21 backup, its backup ID is 20200826-161301F.
    1. Incremental backups are denoted with a suffix I on the ID. Ensure that the prior backup of each incremental backup is also present in storage. You can check this by examining the ID <prior-backup-id>_<backup-id>.
    2. Full backups are denoted with a suffix F on the ID.
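
The ID format described above can be checked mechanically before you continue. The following is a minimal sketch, not part of the product: backup_type is a hypothetical helper name, and the pattern assumes exactly the YYYYMMDD-HHMMSS<F|I> layout given in this step.

```shell
# Classify a backup ID of the form YYYYMMDD-HHMMSS<F|I>.
# Prints "full" or "incremental"; returns non-zero for any other string.
# backup_type is a hypothetical helper, not an API Connect command.
backup_type() {
  case "$1" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9]F)
      echo full ;;
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9]I)
      echo incremental ;;
    *)
      return 1 ;;
  esac
}

# Example with the ID taken from the listing above.
backup_type 20200826-161301F   # prints: full
```

A non-zero return for a mistyped ID is a cheap way to catch copy errors before the ID is used later in the procedure.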
  2. Make sure you know the management database cluster name.

    You can get this name from the original management subsystem CR. You made note of this name in Step 2.d in Preparing the management subsystem for disaster recovery.

    If you are not able to recover the original management subsystem CR, you can also recover the Management database cluster name and siteName by examining the SFTP backup tar:

    1. Download or move the SFTP backup tar file and decompress (untar) it.
    2. Open <management-subsystem-name>-<siteName>-postgres-backrest-shared-repo/backup/db/<backup-id>/pg_data/postgresql.conf.gz which contains the management subsystem name and siteName. For example:
      # Do not edit this file manually!
      # It will be overwritten by Patroni!
      include 'postgresql.base.conf'
      
      archive_command = 'source /opt/cpm/bin/pgbackrest/pgbackrest-set-env.sh && pgbackrest archive-push "%p"'
      archive_mode = 'True'
      archive_timeout = '60'
      autovacuum_vacuum_cost_limit = '1000'
      autovacuum_vacuum_scale_factor = '0.01'
      cluster_name = 'm1-f785a3e3-postgres'

    In this example:

    • cluster_name has both the management subsystem name and siteName
    • m1 - Management subsystem name
    • f785a3e3 - site name
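
The lookup above can be scripted. The sketch below uses hypothetical helper names (cluster_name_from_conf, parse_cluster_name) and assumes that cluster_name always has the form <management-subsystem-name>-<siteName>-postgres and that the siteName contains no hyphen, as in the example.

```shell
# Read postgresql.conf text on stdin and print the cluster_name value.
# cluster_name_from_conf is a hypothetical helper name.
cluster_name_from_conf() {
  sed -n "s/^ *cluster_name = '\(.*\)'\$/\1/p"
}

# Split <subsystem-name>-<siteName>-postgres into its two parts.
# Assumption: the siteName itself contains no hyphen.
# The function body runs in a subshell so its variables do not leak.
parse_cluster_name() (
  base=${1%-postgres}   # drop the trailing "-postgres"
  site=${base##*-}      # text after the last remaining hyphen
  name=${base%-*}       # everything before that hyphen
  echo "$name $site"
)

# Example with the value shown above.
parse_cluster_name m1-f785a3e3-postgres   # prints: m1 f785a3e3

# With the decompressed backup, the two helpers could be chained like this:
#   gzip -dc <path-to>/pg_data/postgresql.conf.gz | cluster_name_from_conf
```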
  3. Before installing the replacement management subsystem CR:
    1. Apply the YAML file that contains the Management Database Encryption Secret into the cluster. For example, where encryption-bin-secret.yaml is the local YAML file containing the backed-up encryption secret:
      kubectl create -f encryption-bin-secret.yaml -n <namespace>

      Replace <namespace> with the namespace being used for the management subsystem installation.

      This command re-creates the original Management Database encryption secret on the cluster. The secret keeps its original name.

    2. Add the following encryptionSecret subsection to the spec of the Management CR. For example, if management-enc-key is the name of the newly created secret on the cluster containing the original Management Database encryption secret from the previous step:
      encryptionSecret:
        secretName: management-enc-key
    3. Apply each of the saved YAML files that contain the Management Client Application Credential Secrets into the cluster by using the following command:
      kubectl create -f <secret_name>.yaml -n <namespace>

      where <secret_name>.yaml is the local YAML file containing one of the backed-up Credential Secrets.

      Repeat this for each of the backed-up Credential Secrets. These are the secrets you saved in Step 2.b in Preparing the management subsystem for disaster recovery.

      These commands re-create the original Management Client Application Credential Secrets on the cluster. Each secret keeps its original name.

    4. Add the following customApplicationCredentials subsection to the spec subsection of the Management CR:
      customApplicationCredentials:
      - name: atm-cred
        secretName: management-atm-cred
      - name: ccli-cred
        secretName: management-ccli-cred
      - name: cli-cred
        secretName: management-cli-cred
      - name: cui-cred
        secretName: management-cui-cred
      - name: dsgr-cred
        secretName: management-dsgr-cred
      - name: juhu-cred
        secretName: management-juhu-cred
      - name: ui-cred
        secretName: management-ui-cred
      

      For each credential listed, set secretName to the name of the corresponding secret that you re-created in Step 3.3.

    5. Add the siteName property to the spec of the Management CR.

      For example, if a2a5e6e2 is the original siteName that was noted after the installation of the original Management Subsystem:

      siteName: a2a5e6e2
    6. Version 10.0.3.0 or later: Add the originalUID property to the spec of the Management CR.

      When recreating a system, to restore a backup into it, you must specify the same spec.originalUID in the CR as was present in the system that was backed up. If the spec.originalUID in the new CR for recreating the system does not match the spec.originalUID that was present in the system that was backed up, the restore will fail.

      
      spec:
        originalUID: "fa0f6f49-b931-4472-b84d-0922a9a92dfd"
      
      Note:
      • For Version 10.0.3.0 or later, if you do not specify spec.originalUID in the new CR, the operator automatically sets the Management CR value of spec.originalUID to match the new CR value metadata.uid. In this case, the restore will fail because the spec.originalUID in the saved (backed-up) CR does not match spec.originalUID in the new CR.
      • The originalUID is essential only when the subsystem CR name exceeds 15 characters, or when the API Connect Cluster CR name exceeds 10 characters. Recommended practice is to include the originalUID for Management with all backups.
      • See also Step 2.d in Preparing the management subsystem for disaster recovery.
    7. Verify that the name of the management subsystem in the CR matches the original management subsystem name, as described in Step 2.d in Preparing the management subsystem for disaster recovery.
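
Before the CR is installed, it can be worth confirming that every secret referenced in the CR exists in the namespace. The sketch below is an illustration only: check_secrets is a hypothetical helper, the secret names in the usage comment are the example names from this step, and the only real command used is kubectl get secret.

```shell
# Report which of the named secrets are missing from a namespace.
# Usage: check_secrets <namespace> <secret-name>...
# Returns 0 if all secrets exist, 1 if at least one is missing.
# check_secrets is a hypothetical helper, not part of API Connect.
check_secrets() {
  namespace=$1
  shift
  missing=0
  for name in "$@"; do
    # kubectl exits non-zero when the secret cannot be retrieved.
    if ! kubectl get secret "$name" -n "$namespace" >/dev/null 2>&1; then
      echo "missing secret: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call, using the example secret names from this step:
#   check_secrets <namespace> management-enc-key management-atm-cred \
#     management-ccli-cred management-cli-cred management-cui-cred \
#     management-dsgr-cred management-juhu-cred management-ui-cred
```

A secret that is reported missing can then be re-created from its saved YAML file as described in this step.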
  4. Install the Management subsystem CR with the values obtained in Step 2.d in Preparing the management subsystem for disaster recovery.
    Important: The hostnames of the endpoints cannot be changed, and must remain the same in the Management CR YAML file used for installation now as they were for the original installation.

    To review installation of the management subsystem, see Installing the Management subsystem cluster.

  5. Get a list of available backups and confirm that the backup ID noted in Step 1 is in the backup list.

    The API Connect Operator automatically reads backups from the configured remote storage and populates the list of available backups that you can restore from.

    $ kubectl get mgmtb
    NAME                STATUS     ID                 CLUSTER                SUBSYSTEM   TYPE   CR TYPE   AGE
    mgmt-backup-4z87f   Complete   20200606-145011F   m1-82b290a2-postgres   m1          full   record    11m
    mgmt-backup-6bms2   Complete   20200606-144315F   m1-82b290a2-postgres   m1          full   record    11m
    
  6. Perform a Management Restore using the name of the backup that has the ID you want to restore. For example, as shown in Step 5, for ID 20200606-145011F the backup name is mgmt-backup-4z87f.

    For more information on restoring the management subsystem, see Restoring the management subsystem (v10.0.1.1 or later).
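
Mapping a backup ID to its backup name can be done by filtering the kubectl get mgmtb output. The following sketch assumes the column layout shown in Step 5 (NAME first, ID third); backup_name_for_id is a hypothetical helper name.

```shell
# Read `kubectl get mgmtb` output on stdin and print the NAME of the
# row whose ID column (third field) equals the given backup ID.
# Assumes the column order shown in Step 5; the header row is skipped.
backup_name_for_id() {
  awk -v id="$1" 'NR > 1 && $3 == id { print $1 }'
}

# Example call against a live cluster:
#   kubectl get mgmtb -n <namespace> | backup_name_for_id 20200606-145011F
```

With the listing shown in Step 5, this would print mgmt-backup-4z87f for the ID 20200606-145011F. Empty output means the ID is not in the list of available backups.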

  7. Use the following command to check the status of the restore:
    kubectl get mgmtr -n <namespace>

    After the Management Restore completes and the database is running again, the data from the original Management subsystem is restored onto the new Management subsystem. Manual and scheduled backups should work as normal again.

What to do next

Now complete the recovery steps for the Developer Portal subsystem on Kubernetes. See Recovering the Developer Portal after a disaster.