Recovering from a disaster on Cloud Pak for Integration

Install and configure API Connect, then restore the backed up data to recover the deployment after a disaster.

Before you begin

To successfully recover API Connect from a disaster, you must have previously completed the steps to prepare the Management, Portal, and Analytics subsystems for disaster recovery. Make sure that you can access the backups and the saved deployment information; you cannot recover your deployment without them.

Attention: Successful disaster recovery depends on recovery of both the Management subsystem and the Portal subsystem. You must complete preparation steps for both subsystems in order to achieve disaster recovery. If you have to perform a restore, you must complete the restoration of the Management subsystem first, and then immediately restore the Portal subsystem.

About this task

To recover from a disaster event, recreate your API Connect Cluster CR and add recovery details as explained in the following procedure. Use the file containing the original CR (which you saved during the preparation task) as a reference.

Note: Your recovery installation of Cloud Pak for Integration must include the Platform Navigator if your original deployment used it. In addition, the CR must use the same settings as in the original deployment.

Procedure

  1. Prepare your Cloud Pak for Integration environment for deploying API Connect as explained in the appropriate version of the Cloud Pak for Integration documentation.

    After you prepare your environment, return to this procedure to configure the installation CR, deploy API Connect, and then restore data to the API Connect subsystems.

    Important: Do not install API Connect until you have configured the CR as explained in the following steps.
  2. Apply the Management and Portal encryption secrets to the cluster where you will install API Connect.
    1. Apply the YAML file that contains the Management database encryption secret into the cluster.
      Run the following command, using the filename of the encryption secret that you saved when you prepared the Management subsystem for disaster recovery:
      oc apply -f mgmt-enc-key.yaml
    2. Apply the YAML file that contains the Portal encryption secret into the cluster.
      Run the following command, using the filename of the encryption secret that you saved when you prepared the Portal subsystem for disaster recovery:
      oc apply -f portal-enc-key.yaml
  3. Apply the YAML file that contains the CP4I credentials secret into the cluster.

    Run the following command, using the filename of the secret that you saved when you prepared the Management subsystem for disaster recovery:

    oc apply -f <cp4i_creds_secret>.yaml -n <APIC_namespace>
  4. Set up the installation CR for deploying API Connect.

    In this step, you perform the initial steps for deploying API Connect in Cloud Pak for Integration, but you do not begin the actual installation until step 15. Instead, you will complete a series of steps to configure the CR with deployment settings and additional recovery settings.

    1. Log in to the IBM Cloud Pak for Integration Platform Navigator.
    2. On the home page, click Create capability.
    3. Select the API Connect tile and click Next.
    4. On the Create an instance of API Connect cluster page, select the deployment type and click Next.

      Choose the same deployment type that you used when you originally deployed API Connect.

    5. On the deployment settings page, click the YAML tab to edit the installation CR in YAML format.
    6. Copy the content of the saved CR into the YAML tab, replacing the default CR.

      Keep the YAML tab open. In the following steps, you will configure additional deployment and recovery settings in the new API Connect Cluster installation CR before you install API Connect.

  5. Important: Make sure that your new version of the API Connect Cluster installation CR uses the same value for the name setting as the original CR.
  6. Verify the backup and restore settings for each subsystem.
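    The backup settings are in the databaseBackup section of spec.management, the portalBackup section of spec.portal, and the databaseBackup section of spec.analytics, and they must point to the locations that hold the backups that you created during the preparation tasks. A minimal sketch of the sections to check (the hosts, paths, and secret names are illustrative placeholders):

    spec:
      management:
        databaseBackup:
          protocol: objstore
          host: <your_backup_host>
          path: <your_management_backup_path>
          credentials: mgmt-backup-secret
      portal:
        portalBackup:
          protocol: objstore
          host: <your_backup_host>
          path: <your_portal_backup_path>
          credentials: portal-backup-secret
      analytics:
        databaseBackup:
          host: <your_backup_host>
          path: <your_analytics_backup_path>
          credentials: analytics-backup-secret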
  7. Generate the Kubernetes secret for each subsystem's backups by running the appropriate command:

    For each secret, use the same name that you used in the original deployment (you can see the name in the credentials setting of each subsystem's backup section).

    • Management subsystem:
      • S3:
        oc create secret generic mgmt-backup-secret --from-literal=username='<Your_access_key-or-user_name>' \
        --from-literal=password='<Your_access_key_secret-or-password>' -n <APIC_namespace>
      • SFTP with username and password:
        oc create secret generic mgmt-backup-secret --from-literal=username='<Your_user_name>' \
        --from-literal=password='<Your_password>' -n <APIC_namespace>
      • SFTP with username and SSH key:
        oc create secret generic mgmt-backup-secret --from-literal=username='<Your_user_name>' \
        --from-file=ssh-privatekey='<Your_private_key_file>' -n <APIC_namespace>
    • Portal subsystem:
      • S3:
        oc create secret generic portal-backup-secret --from-literal=username='<Your_access_key-or-user_name>' \
        --from-literal=password='<Your_access_key_secret-or-password>' -n <APIC_namespace>
      • SFTP with username and password:
        oc create secret generic portal-backup-secret --from-literal=username='<Your_user_name>' \
        --from-literal=password='<Your_password>' -n <APIC_namespace>
      • SFTP with username and SSH key:
        oc create secret generic portal-backup-secret --from-literal=username='<Your_user_name>' \
        --from-file=ssh-privatekey='<Your_private_key_file>' -n <APIC_namespace>
    • Analytics:
      oc create secret generic analytics-backup-secret --from-literal=username='<Your_Access_Key>' --from-literal=password='<Your_access_key_secret>' -n <APIC_namespace>
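
    If you want to confirm that the secrets exist with the expected names before you continue, you can list them; the secret names here match the examples above:

    oc get secret mgmt-backup-secret portal-backup-secret analytics-backup-secret -n <APIC_namespace>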
  8. Add the saved Management and Portal encryption secrets to the installation CR.

    Add each secret to the appropriate subsystem section (spec.management or spec.portal) of the CR, as shown in the following example:

    spec:
      management:
        encryptionSecret:
          secretName: mgmt-enc-key
      portal:
        encryptionSecret:
          secretName: portal-enc-key
  9. Apply each of the Management client application credential secrets to the cluster.
    Run the following command to apply each secret, using the filename of each client application credential secret that you saved when you prepared the Management subsystem for disaster recovery:
    oc apply -f <secret_name>.yaml

    For example, a deployment might use the following names for the secrets:

    atmCredentialSecret: management-atm-cred
    consumerToolkitCredentialSecret: management-ccli-cred
    consumerUICredentialSecret: management-cui-cred
    designerCredentialSecret: management-dsgr-cred
    juhuCredentialSecret: management-juhu-cred
    toolkitCredentialSecret: management-cli-cred
    uiCredentialSecret: management-ui-cred
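
    If you saved all of the credential secrets as individual YAML files that are named after the secrets, as in the example above, you can apply them in one pass. This is only a convenience sketch, and the filename pattern is an assumption:

    for f in management-*-cred.yaml; do oc apply -f "$f" -n <APIC_namespace>; done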
  10. Add each of the credential secrets to the spec.management section of the installation CR.

    For example:

    spec:
      management:
        customApplicationCredentials:
        - name: atm-cred
          secretName: management-atm-cred
        - name: ccli-cred
          secretName: management-ccli-cred
        - name: cui-cred
          secretName: management-cui-cred
        - name: dsgr-cred
          secretName: management-dsgr-cred
        - name: juhu-cred
          secretName: management-juhu-cred
        - name: cli-cred
          secretName: management-cli-cred
        - name: ui-cred
          secretName: management-ui-cred
     
  11. Add the siteName property to the spec.management and spec.portal sections of the installation CR.

    For example, if a2a5e6e2 is the original siteName of the Management subsystem, and 890772e3 is the original siteName of the Portal subsystem, the CR looks like the following example:

    spec:
      management:
        siteName: "a2a5e6e2"
      portal:
        siteName: "890772e3"
  12. If your original CR name was 10 characters or longer, you must add the originalUID or metadata.uid value from the original Management and Portal subsystems into the appropriate sections of the installation CR.
    Important: If your original CR name was 10 characters or longer and you do not add this value, the deployments and routes that are generated during installation will include a different UID, which will cause the recovery process to fail.

    The IDs were created with the original deployment of the Management and Portal subsystems, and the recovered subsystems must continue to use the same IDs. You can locate the ID in the Management (apic-cluster-name-mgt.yaml) or Portal (apic-cluster-name-ptl.yaml) CR that you saved while preparing for disaster recovery. (Do not use the metadata.uid setting from the API Connect Cluster CR).

    The CR setting that contains the ID depends on the version of API Connect that you are recovering:

    • Version 10.0.1.4-ifix1-eus or later: spec.originalUID

    • Version 10.0.1.2-ifix2-eus or earlier: metadata.uid

    Locate each of the original settings in the appropriate subsystem CR, and copy the values to the corresponding sections of the new API Connect Cluster CR. For example, if you originally deployed version 10.0.1.2-ifix2-eus, the ID is in the metadata.uid setting. If you are deploying Version 10.0.1.5-eus as part of disaster recovery, copy that original value and use it for the originalUID setting in the new CR. For example, in 10.0.1.5-eus:

    spec:
      management:
        originalUID: "fa0f6f49-b931-4472-b84d-0922a9a92dfd"
      portal:
        originalUID: "447ea4b3-9514-4a84-a34b-ce0b349838da"
  13. If you installed API Connect using an S3 provider for the Management subsystem backups, add the following annotation to the apiconnectcluster CR.

    If you specified SFTP or local as the backup, skip this step.

    • Using the Platform UI:
      1. Edit the API Management instance and click the YAML tab to edit the CR.
      2. Add the following statement to the metadata.annotations section:
        apiconnect-operator/deployment-mode: disasterRecovery

        For example:

        metadata:
          annotations:
            apiconnect-operator/deployment-mode: disasterRecovery
    • Using the CLI:
      1. Get the name of the CR by running the following command:
        oc get apiconnectcluster -o name -n <APIC_namespace>

        The response looks like the following example:

        apiconnectcluster.apiconnect.ibm.com/instance_name
      2. Add the annotation to the metadata.annotations section of the CR by running the following command:
        oc annotate apiconnectcluster/instance_name apiconnect-operator/deployment-mode="disasterRecovery" -n <APIC_namespace>

        The response looks like the following example:

        apiconnectcluster.apiconnect.ibm.com/instance_name annotated
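
        If you want to confirm that the annotation is now present on the CR, you can display its annotations, for example:

        oc get apiconnectcluster/instance_name -o jsonpath='{.metadata.annotations}' -n <APIC_namespace>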
  14. Verify that you correctly added all of the settings to the installation CR.
    For example, the completed CR might look like the following example:
    apiVersion: apiconnect.ibm.com/v1beta1
    kind: APIConnectCluster
    metadata:
      namespace: apiconnect
      name: apis-minimum
      labels:
        app.kubernetes.io/instance: apiconnect
        app.kubernetes.io/managed-by: ibm-apiconnect
        app.kubernetes.io/name: apiconnect-minimum
    spec:
      license:
        accept: true
        license: L-RJON-C2YLGB
        metric: VIRTUAL_PROCESSOR_CORE
        use: nonproduction
      management:
        originalUID: "fa0f6f49-b931-4472-b84d-0922a9a92dfd"
        encryptionSecret:
          secretName: apis-minim-faaa44bf-enc-key
        siteName: "faaa44bf"
        customApplicationCredentials:
          - name: atm-cred
            secretName: apis-minim-faaa44bf-atm-cred
          - name: ccli-cred
            secretName: apis-minim-faaa44bf-ccli-cred
          - name: cli-cred
            secretName: apis-minim-faaa44bf-cli-cred
          - name: cui-cred
            secretName: apis-minim-faaa44bf-cui-cred
          - name: dsgr-cred
            secretName: apis-minim-faaa44bf-dsgr-cred
          - name: juhu-cred
            secretName: apis-minim-faaa44bf-juhu-cred
          - name: ui-cred
            secretName: apis-minim-faaa44bf-ui-cred
        analytics:
          client: {}
          ingestion: {}
        apiManagerEndpoint: {}
        cloudManagerEndpoint: {}
        consumerAPIEndpoint: {}
        platformAPIEndpoint: {}
        portal:
          admin: {}
        databaseBackup:
          path: ocp-dr-mgmt
          host: s3.eu-gb.cloud-object-storage.appdomain.cloud/eu-standard
          s3provider: ibm
          schedule: 0 1 * * *
          protocol: objstore
          credentials: mgmt-backup-secret
      analytics:
        storage:
          enabled: true
          type: unique
        databaseBackup:
          chunkSize: 1GB
          credentials: a7s-backup-secret
          host: s3.eu-gb.cloud-object-storage.appdomain.cloud/eu-standard
          path: ocp-dr-a7s
          schedule: 0 1 * * *
      storageClassName: rook-ceph-block
      profile: n1xc7.m48
      portal:
        adminClientSubjectDN: ""
        portalAdminEndpoint: {}
        portalUIEndpoint: {}
        originalUID: "447ea4b3-9514-4a84-a34b-ce0b349838da"
        encryptionSecret:
          secretName: apis-minim-913edd20-enc-key
        siteName: "913edd20"
        portalBackup:
          credentials: portal-backup-secret
          host: s3.eu-gb.cloud-object-storage.appdomain.cloud/eu-standard
          path: ocp-dr-portal
          protocol: objstore
          schedule: 0 1 * * *
      version: 10.0.1.5-eus
    Important: If you are restoring your deployment in a different data center, the endpoints that are used in your original deployment must be the same in your new data center. The Platform UI in Cloud Pak for Integration autogenerates the endpoints if they are left empty in the CR, therefore ensure that you explicitly set the endpoints in the CR to match what was set in your original data center. For example:
    apiManagerEndpoint:
      annotations:
        cert-manager.io/issuer: prod-ingress-issuer
        haproxy.router.openshift.io/timeout: 240s
      hosts:
        - name: prod-api-manager.example.com
          secretName: prod-be787dd3-api-manager
  15. On the deployment page, install API Connect by clicking Create.
  16. Wait for all subsystems to be created.

    When the Management subsystem is installed, you might see backup job pods and stanza-create job pods in Error state; for example:

    m1-82b290a2-postgres-stanza-create-4zcgz                    0/1     Error       0          35m
    m1-82b290a2-postgres-full-sch-backup-2g9hm                  0/1     Error       0          20m

    This is expected behavior, for the following reasons:

    • The stanza-create job normally expects buckets or subdirectories within buckets to be empty. However, since you configured the Management subsystem with your already-populated S3 bucket (where your backups exist), the job will enter the Error state.
    • Any scheduled or manual backups will enter the Error state. Although you configured the Management subsystem with your already-populated S3 bucket, the new database isn't yet configured to write backups into remote storage.
    • The configurator job will fail because the CP4I credentials secret that you manually restored does not match the value in the Management database. As a result, the state of the new cluster is not "Ready" and will show "6/7". This error will be resolved when you restore the Management subsystem from the backup that you prepared earlier.

    The errors will not prevent a successful restore, so continue to the next step.
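
    To watch the subsystems come up and to observe the expected error states that are described above, you can check the pods and the overall cluster status, for example:

    oc get pods -n <APIC_namespace>
    oc get apiconnectcluster -n <APIC_namespace>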

  17. Restore the subsystems in the following sequence:
    Important: Make sure that you restore each subsystem from the backup that you created when you prepared that subsystem for disaster recovery; locate the correct backup by its name and ID (the sketch after these sub-steps shows one way to list the available backups).
    1. Restore the Management subsystem.

      When you perform a restore, you must complete the restoration of the Management subsystem first. Verify that the restoration completed successfully and that the Management subsystem is Ready. When the Management subsystem is healthy, proceed to the next step and restore the Portal subsystem.

    2. Restore the Developer Portal subsystem.

      Use the restore type all to ensure that you restore the complete subsystem and all Portal sites.

    3. Restore the Analytics subsystem.
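
    Before you create each restore request, you can confirm which backups are visible to the cluster by listing the backup resources. This sketch assumes the standard API Connect backup resource kinds; see the restore documentation for your version for the full restore procedure:

    oc get managementbackup -n <APIC_namespace>
    oc get portalbackup -n <APIC_namespace>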
  18. Force the configurator to run again.

    The cluster is still not ready after the restore at this stage because the configurator job has not yet completed successfully (see step 16). To force the configurator to run again, delete the associated job so that a new pod starts:

    1. Run the following command to get the list of jobs:
      oc get jobs -n <APIC_namespace>
    2. Run the following command to determine the name of your API Connect instance:
      oc get apiconnectcluster -n <APIC_namespace>
    3. Run the following command to delete the configurator job:
      oc -n <APIC_namespace> delete job <instance_name>-configurator
  19. Update the Management OIDC credentials as explained in Configuring the OIDC credentials on Cloud Pak for Integration.
  20. Verify that the recovery was successful:
    1. Ensure that you can log in to the Cloud Manager UI.
    2. Verify that your provider organizations exist.
    3. Ensure that you can log in to each Developer portal.
    4. Ensure that the Analytics dashboard contains all of the analytics data that you preserved.
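
    In addition to these checks, you can confirm from the command line that the API Connect cluster reports the Ready status again (rather than the 6/7 state that is described in step 16), for example:

    oc get apiconnectcluster -n <APIC_namespace>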
  21. After the successful recovery, remove the annotation that you added to the apiconnectcluster CR in step 13.

    If you skipped step 13, then skip this step as well.

    • Using the Platform UI:
      1. Edit the API Management instance and click the YAML tab to edit the CR.
      2. Delete the following statement from the metadata.annotations section:
        apiconnect-operator/deployment-mode: disasterRecovery
    • Using the CLI:
      1. Get the name of the CR by running the following command:
        oc get apiconnectcluster -o name -n <APIC_namespace>

        The response looks like the following example:

        apiconnectcluster.apiconnect.ibm.com/instance_name
      2. Remove the annotation by running the following command, making sure to include the trailing "-" on deployment-mode- to indicate the removal:
        oc -n <APIC_namespace> annotate apiconnectcluster/instance_name apiconnect-operator/deployment-mode-

        The response looks like the following example:

        apiconnectcluster.apiconnect.ibm.com/instance_name annotated