Recovering the management subsystem from SFTP backups after a disaster event on VMware (10.0.1.1-eus)

Recover the management subsystem from SFTP backups after a disaster event on 10.0.1.1-eus

Before you begin

To successfully recover the management subsystem, you must have previously completed the steps in Preparing the management subsystem for disaster recovery on VMware using SFTP backups (10.0.1.1-eus).

You must use the same project directory that you used for your original deployment so that the configuration and secret information is transferred to the replacement deployment.

Important: Successful disaster recovery depends on recovering both the Management subsystem and the Developer Portal subsystem. You must complete the preparation steps for both subsystems to achieve disaster recovery. If you have to perform a restore, you must first complete the restoration of the Management subsystem, and then immediately restore the Developer Portal. The Management and Developer Portal backups must therefore be taken at the same time, so that the Portal sites are consistent with the Management database.

Procedure

  1. Determine which backup to restore from.

    View the SFTP backups that are available on your remote storage site. For example:

    -rw-r--r--    1 root     root     13092333 Aug 26 08:56 20200826-154646F.tgz
    -rw-r--r--    1 root     root     18703758 Aug 26 09:10 20200826-160010F.tgz
    -rw-r--r--    1 root     root     24318561 Aug 26 09:21 20200826-161301F.tgz
    

    Note the backup ID of the backup that you want to restore; you use it later in this procedure. Each filename contains the date, time, and type of the stored backup.

    The format of the backup ID is YYYYMMDD-HHMMSS<F|I>. For example, to use the Aug 26, 09:21 backup, use the backup ID 20200826-161301F.
    1. Incremental backups are denoted by the suffix I on the ID. Ensure that the prior backup of each incremental backup is also present in storage. You can check this by examining the ID, which has the form <prior-backup-id>_<backup-id>.
    2. Full backups are denoted by the suffix F on the ID.
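
    The naming rules above can be checked with a small Bash sketch. It assumes that each backup file is named <backup-id>.tgz and that an incremental ID has the form <prior-backup-id>_<backup-id>, as described in this step. The functions classify_backup and check_chain are hypothetical helpers, not part of apicup, and check_chain must be run against a directory that holds copies of the backup files.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helpers. Assumes backup files are named <backup-id>.tgz and
    # incremental IDs look like <prior-backup-id>_<backup-id>.

    # classify_backup FILE: print "full <id>" or "incremental <id> <prior-id>".
    classify_backup() {
      local id stamp='[0-9]{8}-[0-9]{6}'
      id="$(basename "$1" .tgz)"   # drop the directory and the .tgz extension
      if [[ "$id" =~ ^${stamp}F$ ]]; then
        echo "full $id"
      elif [[ "$id" =~ ^(${stamp}[FI])_(${stamp}I)$ ]]; then
        echo "incremental $id ${BASH_REMATCH[1]}"
      else
        echo "unrecognized backup file: $1" >&2
        return 1
      fi
    }

    # check_chain DIR: report incremental backups whose prior backup file is missing.
    check_chain() {
      local dir="$1" f info prior status=0
      for f in "$dir"/*.tgz; do
        [[ -e "$f" ]] || continue                 # glob matched nothing
        info="$(classify_backup "$f")" || continue
        prior="$(awk '{print $3}' <<< "$info")"   # empty for full backups
        if [[ -n "$prior" ]] && ! compgen -G "$dir/*${prior}.tgz" > /dev/null; then
          echo "missing prior backup $prior for $(basename "$f")"
          status=1
        fi
      done
      return "$status"
    }
    ```

    For example, classify_backup 20200826-161301F.tgz prints "full 20200826-161301F", and check_chain returns a non-zero status if any incremental backup in the directory lacks its prior backup.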
  2. Make sure you know the management database cluster name.

    You can get this name from the original management subsystem CR. You made note of this name in Step 2.e in Preparing the management subsystem for disaster recovery on VMware using SFTP backups (10.0.1.1-eus).

    If you cannot recover the original management subsystem CR, you can also recover the Management database cluster name and siteName by examining the SFTP backup tar file:

    1. Download or move the SFTP backup tar file and decompress (untar) it.
    2. Open the compressed file <management-subsystem-name>-<siteName>-postgres-backrest-shared-repo/backup/db/<backup-id>/pg_data/postgresql.conf.gz. Its cluster_name entry contains the management subsystem name and siteName. For example:
      # Do not edit this file manually!
      # It will be overwritten by Patroni!
      include 'postgresql.base.conf'
      
      archive_command = 'source /opt/cpm/bin/pgbackrest/pgbackrest-set-env.sh && pgbackrest archive-push "%p"'
      archive_mode = 'True'
      archive_timeout = '60'
      autovacuum_vacuum_cost_limit = '1000'
      autovacuum_vacuum_scale_factor = '0.01'
      cluster_name = 'm1-f785a3e3-postgres'

    In this example:

    • cluster_name contains both the management subsystem name and the siteName, in the form <management-subsystem-name>-<siteName>-postgres
    • m1 is the Management subsystem name
    • f785a3e3 is the siteName
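
    The lookup can also be scripted instead of reading the file by hand. This Bash sketch is illustrative only: read_cluster_name and parse_cluster_name are hypothetical helper names, and it assumes that cluster_name always has the form <management-subsystem-name>-<siteName>-postgres, as in the example, and that the siteName itself contains no hyphen.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helpers; assumes cluster_name = '<subsystem>-<siteName>-postgres'.

    # read_cluster_name CONF_GZ: print the cluster_name value from postgresql.conf.gz.
    read_cluster_name() {
      gzip -dc "$1" |
        sed -n "s/^[[:space:]]*cluster_name[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p"
    }

    # parse_cluster_name VALUE: print "<subsystem-name> <siteName>".
    parse_cluster_name() {
      local value
      value="$(printf '%s' "$1" | tr -d "'")"   # remove any single quotes
      value="${value%-postgres}"                # strip the trailing -postgres
      # Everything before the last hyphen is the subsystem name; the rest is siteName.
      printf '%s %s\n' "${value%-*}" "${value##*-}"
    }
    ```

    For example, after you extract the tar file, parse_cluster_name "$(read_cluster_name *-postgres-backrest-shared-repo/backup/db/<backup-id>/pg_data/postgresql.conf.gz)" prints "m1 f785a3e3" for the example file above.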
  3. From your new project directory, install the Management subsystem as described in Deploying the Management subsystem.
    Important:
    • The endpoint hostnames cannot be changed. They must be the same in the new installation as in the original installation.
    • The Management subsystem name must remain the same in the new installation as the original.
  4. Before you restore onto the new management subsystem, complete the following steps:
    1. SSH onto your Management appliance and remove the Management subsystem if it is already present.
      kubectl get managementcluster
      NAME             READY   STATUS    VERSION    RECONCILED VERSION   AGE
      management       16/16   Running   10.0.1.1   10.0.1.1-706         5h28m
      
      kubectl delete managementcluster management
      managementcluster.management.apiconnect.ibm.com "management" deleted
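
      The check-then-delete sequence above can be scripted so that it does nothing when no ManagementCluster exists and waits until the deletion finishes. This is a sketch only: remove_mgmt_cluster is a hypothetical helper, and falling back to the default namespace is an assumption; pass the namespace that your management subsystem actually uses.

      ```shell
      #!/usr/bin/env bash
      # Hypothetical helper. NAMESPACE defaults to "default" (an assumption).
      # remove_mgmt_cluster NAME [NAMESPACE]
      remove_mgmt_cluster() {
        local name="$1" ns="${2:-default}"
        # Skip the delete when the ManagementCluster CR is not present.
        if ! kubectl get managementcluster "$name" -n "$ns" > /dev/null 2>&1; then
          echo "managementcluster $name not present; nothing to delete"
          return 0
        fi
        kubectl delete managementcluster "$name" -n "$ns" || return 1
        # Block until Kubernetes reports the CR gone (up to 10 minutes).
        kubectl wait --for=delete "managementcluster/$name" -n "$ns" --timeout=600s
      }
      ```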
    2. Apply the YAML file that contains the Management Database Encryption Secret to the cluster. For example, where encryption-bin-secret.yaml is the local YAML file that contains the backed-up encryption secret:
      kubectl create -f encryption-bin-secret.yaml -n <namespace>

      Replace <namespace> with the namespace being used for the management subsystem installation.

      This command re-creates the original Management Database Encryption Secret on the cluster, with the same name as the original secret.

    3. For each of the saved YAML files that contain the Management Client Application Credential Secrets, apply the file to the cluster with the following command:
      kubectl create -f <secret_name>.yaml -n <namespace>

      where <secret_name> is the local YAML file containing one of the backed-up Credential Secrets.

      Repeat this for each of the backed-up Credential Secrets. These are the secrets you saved in Step 2.b of Preparing the management subsystem for disaster recovery using SFTP backups.

      These commands re-create the original Management Client Application Credential Secrets on the cluster. Each Secret has the same name as the original.
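
      Because the command is repeated once per Credential Secret, a loop helps avoid missing a file. In this Bash sketch, apply_credential_secrets is a hypothetical helper; it runs the same kubectl create command for each file and stops at the first failure.

      ```shell
      #!/usr/bin/env bash
      # Hypothetical helper: apply each backed-up Credential Secret YAML file.
      # apply_credential_secrets NAMESPACE FILE...
      apply_credential_secrets() {
        local ns="$1" f
        shift
        for f in "$@"; do
          echo "Creating secret from $f"
          # Stop at the first failure so that a missing secret is noticed.
          kubectl create -f "$f" -n "$ns" || return 1
        done
      }
      ```

      For example, apply_credential_secrets <namespace> saved-secrets/*-cred.yaml, where saved-secrets is a hypothetical directory holding the files that you saved.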

    4. In your project directory, create the management-extra-values.yaml file for the Management subsystem.
      • Add the encryptionSecret subsection to the spec. Set its secretName to the name of the secret that you re-created in Step 4.b, which contains the original Management Database Encryption Secret.
      • Add the siteName property to the spec. In this example, 82b290a2 is the original siteName that you noted after installing the original Management subsystem.
      • Add the customApplicationCredentials subsection to the spec. For each credential, set secretName to the name of the corresponding Credential Secret that you re-created in Step 4.c.
      For example:
      spec:
        customApplicationCredentials:
        - name: atm-cred
          secretName: management-atm-cred
        - name: ccli-cred
          secretName: management-ccli-cred
        - name: cli-cred
          secretName: management-cli-cred
        - name: cui-cred
          secretName: management-cui-cred
        - name: dsgr-cred
          secretName: management-dsgr-cred
        - name: juhu-cred
          secretName: management-juhu-cred
        - name: ui-cred
          secretName: management-ui-cred
        encryptionSecret:
          secretName: management-enc-key
        siteName: 82b290a2
    5. Set the management-extra-values.yaml file as the extra-values file for your Management subsystem:
      apicup subsys set [SUBSYS_NAME] extra-values-file management-extra-values.yaml
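
    The management-extra-values.yaml file from sub-step 4 can be generated rather than typed. This Bash sketch assumes that every re-created Credential Secret is named <prefix>-<credential-name>, as in the example (for example, management-atm-cred). write_extra_values is a hypothetical helper, so check the generated file against the names of the Secrets that you actually re-created.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper; assumes Credential Secrets are named <prefix>-<cred>.
    # write_extra_values SITE_NAME ENC_SECRET PREFIX CRED_NAME...
    write_extra_values() {
      local site="$1" enc="$2" prefix="$3" cred
      shift 3
      echo "spec:"
      echo "  customApplicationCredentials:"
      for cred in "$@"; do
        echo "  - name: $cred"
        echo "    secretName: $prefix-$cred"
      done
      echo "  encryptionSecret:"
      echo "    secretName: $enc"
      echo "  siteName: $site"
    }
    ```

    For example, the following command reproduces the file shown in sub-step 4:
      write_extra_values 82b290a2 management-enc-key management atm-cred ccli-cred cli-cred cui-cred dsgr-cred juhu-cred ui-cred > management-extra-values.yaml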
  5. Prepare your backup configuration before the fresh installation.

    You can configure the following settings with apicup subsys set [SUBSYS_NAME] [setting]=[value]:

    database-backup-auth-pass
    database-backup-auth-user
    database-backup-host
    database-backup-path
    database-backup-port          22              (default) 
    database-backup-protocol      sftp            (default) 
    database-backup-retries       0               (default)
    database-backup-schedule      0 0 * * *       (default)
    You can view all the management subsystem settings with the command:
    apicup subsys get [SUBSYS_NAME] --validate

    To review backup configuration, see Configuring backup settings during initial installation of the management subsystem (10.0.1.1-eus or greater).
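
    The settings can be applied in one pass. In this Bash sketch, set_backup_settings is a hypothetical wrapper around the apicup subsys set [SUBSYS_NAME] [setting]=[value] and apicup subsys get [SUBSYS_NAME] --validate commands shown in this step; it ignores arguments that are not database-backup-* settings.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical wrapper around the apicup commands shown in this step.
    # set_backup_settings SUBSYS_NAME SETTING=VALUE...
    set_backup_settings() {
      local subsys="$1" kv
      shift
      for kv in "$@"; do
        case "$kv" in
          database-backup-*=*) ;;   # expected backup setting
          *) echo "skipping unexpected setting: $kv" >&2; continue ;;
        esac
        apicup subsys set "$subsys" "$kv" || return 1
      done
      # Show the resulting configuration and validate it.
      apicup subsys get "$subsys" --validate
    }
    ```

    For example, set_backup_settings [SUBSYS_NAME] database-backup-host=sftp.example.com database-backup-path=/backups/apic database-backup-auth-user=backupuser, where the host, path, and user values are placeholders.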

  6. Install the Management subsystem using apicup with the flag --skip-health-check:
    apicup subsys install [SUBSYS_NAME] --skip-health-check
  7. When your Management subsystem is ready, confirm that the backup ID that you noted in Step 1 is present on the SFTP server.
  8. After a few moments, confirm that there is a ManagementBackup with a CR type of record, and that its backup ID matches the backup ID that you noted in Step 1.
    You can list the management backups using:
    apicup subsys list-backups [SUBSYS_NAME]

    For example:

    NAME                STATUS   ID                 CLUSTER                        SUBSYSTEM   TYPE   CR TYPE   AGE
    mgmt-backup-8hqqg   Ready    20200826-161301F   management-82b290a2-postgres   management  full   record    40s
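
    The backup CR name that Step 9 needs can be picked out of the list-backups output. This awk-based sketch assumes the column layout shown in the example output (NAME, STATUS, ID, CLUSTER, SUBSYSTEM, TYPE, CR TYPE, AGE); backup_cr_for_id is a hypothetical helper name.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper; assumes the list-backups column layout shown above.
    # backup_cr_for_id BACKUP_ID < list-backups-output
    backup_cr_for_id() {
      awk -v id="$1" '
        # Column 2 is STATUS, 3 is ID, 7 is CR TYPE; the header row never matches.
        $2 == "Ready" && $3 == id && $7 == "record" { print $1; found = 1 }
        END { exit (found ? 0 : 1) }
      '
    }
    ```

    For example, apicup subsys list-backups [SUBSYS_NAME] | backup_cr_for_id 20200826-161301F prints mgmt-backup-8hqqg for the example output above, and exits with a non-zero status if no matching record exists.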
    
  9. Restore the Management subsystem by using the name of the backup CR that has the ID that you want to restore.

    For example, for ID 20200826-161301F the backup CR name is mgmt-backup-8hqqg.

    For instructions on how to restore, see Restoring the management subsystem.

    When the restore has completed and the database is running again, the data from the old Management subsystem is restored onto the new Management subsystem. Manual and scheduled backups should then run as normal.

What to do next

Next, complete the recovery steps for the Developer Portal subsystem on VMware. See Recovering the Developer Portal subsystem on VMware.