Recovering from a disaster on Cloud Pak for Integration

Install and configure API Connect, then restore the backed up data to recover the deployment after a disaster.

Before you begin

To successfully recover API Connect from a disaster, you must have previously completed the steps to prepare the Management, Portal, and Analytics subsystems for disaster recovery. Make sure that you can access the backups and the saved deployment information; you cannot recover your deployment without them.

Attention: Successful disaster recovery depends on recovery of both the Management subsystem and the Portal subsystem. You must complete preparation steps for both subsystems in order to achieve disaster recovery. If you have to perform a restore, you must complete the restoration of the Management subsystem first, and then immediately restore the Portal subsystem.

About this task

To recover from a disaster event, recreate your API Connect Cluster CR and add recovery details as explained in the following procedure. Use the file containing the original CR (which you saved during the preparation task) as a reference.

Note: Your recovery installation of Cloud Pak for Integration must include the Platform Navigator if your original deployment used it. In addition, the CR must use the same settings as in the original deployment.

Procedure

  1. Prepare your Cloud Pak for Integration environment for deploying API Connect as explained in the appropriate version of the Cloud Pak for Integration documentation.

    After you prepare your environment, return to this procedure to configure the installation CR, deploy API Connect, and then restore data to the API Connect subsystems.

    Important: Do not install API Connect until you have configured the CR as explained in the following steps.
  2. Apply the Management and Portal encryption secrets to the cluster where you will install API Connect.
    1. Apply the YAML file that contains the Management database encryption secret into the cluster.
      Run the following command, using the filename of the encryption secret that you saved when you prepared the Management subsystem for disaster recovery:
      oc apply -f mgmt-enc-key.yaml
    2. Apply the YAML file that contains the Portal encryption secret into the cluster.
      Run the following command, using the filename of the encryption secret that you saved when you prepared the Portal subsystem for disaster recovery:
      oc apply -f portal-enc-key.yaml
  3. Apply the YAML file that contains the CP4I credentials secret into the cluster.

    Run the following command, using the filename of the secret that you saved when you prepared the Management subsystem for disaster recovery:

    oc apply -f <cp4i_creds_secret>.yaml -n <APIC_namespace>
  4. Set up the installation CR for deploying API Connect.

    In this step, you perform the initial steps for deploying API Connect in Cloud Pak for Integration, but you do not begin the actual installation until step 15. Instead, you will complete a series of steps to configure the CR with deployment settings and additional recovery settings.

    1. Log in to the IBM Cloud Pak for Integration Platform Navigator.
    2. On the home page, click Create capability.
    3. Select the API Connect tile and click Next.
    4. On the Create an instance of API Connect cluster page, select the deployment type and click Next.

      Choose the same deployment type that you used when you originally deployed API Connect.

    5. On the deployment settings page, click the YAML tab to edit the installation CR in YAML format.
    6. Copy the content of the saved CR into the YAML tab, replacing the default CR.

      Keep the YAML tab open. In the following steps, you will configure additional deployment and recovery settings in the new API Connect Cluster installation CR before you install API Connect.

  5. Important: Make sure that your new version of the API Connect Cluster installation CR uses the same value for the name setting as the original CR.
  6. Verify the backup and restore settings for each subsystem.
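    The backup settings are in the databaseBackup section of spec.management, the portalBackup section of spec.portal, and the databaseBackup section of spec.analytics, and they must point to the locations that hold the backups that you created during the preparation tasks. A minimal sketch of the sections to check (the hosts, paths, and secret names are illustrative placeholders):

    spec:
      management:
        databaseBackup:
          protocol: objstore
          host: <your_backup_host>
          path: <your_management_backup_path>
          credentials: mgmt-backup-secret
      portal:
        portalBackup:
          protocol: objstore
          host: <your_backup_host>
          path: <your_portal_backup_path>
          credentials: portal-backup-secret
      analytics:
        databaseBackup:
          host: <your_backup_host>
          path: <your_analytics_backup_path>
          credentials: analytics-backup-secret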
  7. Generate the Kubernetes secret for each subsystem's backups by running the appropriate command:

    For each secret, use the same name that you used in the original deployment (you can see the name in the credentials setting of each subsystem's backup section).

    • Management subsystem:
      • S3:
        oc create secret generic mgmt-backup-secret --from-literal=username='<Your_access_key-or-user_name>' \
        --from-literal=password='<Your_access_key_secret-or-password>' -n <APIC_namespace>
      • SFTP with username and password:
        oc create secret generic mgmt-backup-secret --from-literal=username='<Your_user_name>' \
        --from-literal=password='<Your_password>' -n <APIC_namespace>
      • SFTP with username and SSH key:
        oc create secret generic mgmt-backup-secret --from-literal=username='<Your_user_name>' \
        --from-file=ssh-privatekey='<Your_private_key_file>' -n <APIC_namespace>
    • Portal subsystem:
      • S3:
        oc create secret generic portal-backup-secret --from-literal=username='<Your_access_key-or-user_name>' \
        --from-literal=password='<Your_access_key_secret-or-password>' -n <APIC_namespace>
      • SFTP with username and password:
        oc create secret generic portal-backup-secret --from-literal=username='<Your_user_name>' \
        --from-literal=password='<Your_password>' -n <APIC_namespace>
      • SFTP with username and SSH key:
        oc create secret generic portal-backup-secret --from-literal=username='<Your_user_name>' \
        --from-file=ssh-privatekey='<Your_private_key_file>' -n <APIC_namespace>
    • Analytics:
      oc create secret generic analytics-backup-secret --from-literal=username='<Your_Access_Key>' --from-literal=password='<Your_access_key_secret>' -n <APIC_namespace>
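
    If you want to confirm that the secrets exist with the expected names before you continue, you can list them; the secret names here match the examples above:

    oc get secret mgmt-backup-secret portal-backup-secret analytics-backup-secret -n <APIC_namespace>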
  8. Add the saved Management and Portal encryption secrets to the installation CR.

    Add each secret to the appropriate subsystem section (spec.management or spec.portal) of the CR, as shown in the following example:

    spec:
      management:
        encryptionSecret:
          secretName: mgmt-enc-key
      portal:
        encryptionSecret:
          secretName: portal-enc-key
  9. Apply each of the Management client application credential secrets to the cluster.
    Run the following command to apply each secret, using the filename of each client application credential secret that you saved when you prepared the Management subsystem for disaster recovery:
    oc apply -f <secret_name>.yaml

    For example, a deployment might use the following names for the secrets:

    atmCredentialSecret: management-atm-cred
    consumerToolkitCredentialSecret: management-ccli-cred
    consumerUICredentialSecret: management-cui-cred
    designerCredentialSecret: management-dsgr-cred
    juhuCredentialSecret: management-juhu-cred
    toolkitCredentialSecret: management-cli-cred
    uiCredentialSecret: management-ui-cred
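
    If you saved all of the credential secrets as individual YAML files that are named after the secrets, as in the example above, you can apply them in one pass. This is only a convenience sketch, and the filename pattern is an assumption:

    for f in management-*-cred.yaml; do oc apply -f "$f" -n <APIC_namespace>; done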
  10. Add each of the credential secrets to the spec.management section of the installation CR.

    For example:

    spec:
      management:
        customApplicationCredentials:
        - name: atm-cred
          secretName: management-atm-cred
        - name: ccli-cred
          secretName: management-ccli-cred
        - name: cui-cred
          secretName: management-cui-cred
        - name: dsgr-cred
          secretName: management-dsgr-cred
        - name: juhu-cred
          secretName: management-juhu-cred
        - name: cli-cred
          secretName: management-cli-cred
        - name: ui-cred
          secretName: management-ui-cred
     
  11. Add the siteName property to the spec.management and spec.portal sections of the installation CR.

    For example, if a2a5e6e2 is the original siteName of the Management subsystem, and 890772e3 is the original siteName of the Portal subsystem, the CR looks like the following example:

    spec:
      management:
        siteName: "a2a5e6e2"
      portal:
        siteName: "890772e3"
  12. If your original CR name was 10 characters or longer, you must add the originalUID or metadata.uid value from the original Management and Portal subsystems into the appropriate sections of the installation CR.
    Important: If your original CR name was 10 characters or longer and you do not add this value, the deployments and routes that are generated during installation will include a different UID, which will cause the recovery process to fail.

    The IDs were created with the original deployment of the Management and Portal subsystems, and the recovered subsystems must continue to use the same IDs. You can locate the ID in the Management (apic-cluster-name-mgt.yaml) or Portal (apic-cluster-name-ptl.yaml) CR that you saved while preparing for disaster recovery. (Do not use the metadata.uid setting from the API Connect Cluster CR).

    The CR setting that contains the ID depends on the version of API Connect that you are recovering:

    • Version 10.0.1.4-ifix1-eus or later: spec.originalUID

    • Version 10.0.1.2-ifix2-eus or earlier: metadata.uid

    Locate each of the original settings in the appropriate subsystem CR, and copy the values to the corresponding sections of the new API Connect Cluster CR. For example, if you originally deployed version 10.0.1.2-ifix2-eus, the ID is in the metadata.uid setting. If you are deploying Version 10.0.1.5-eus as part of disaster recovery, copy that original value and use it for the originalUID setting in the new CR. For example, in 10.0.1.5-eus:

    spec:
      management:
        originalUID: "fa0f6f49-b931-4472-b84d-0922a9a92dfd"
      portal:
        originalUID: "447ea4b3-9514-4a84-a34b-ce0b349838da"
  13. If you installed API Connect using an S3 provider for the Management subsystem backups, add the following annotation to the apiconnectcluster CR.

    If you specified SFTP or local as the backup, skip this step.

    • Using the Platform UI:
      1. Edit the API Management instance and click the YAML tab to edit the CR.
      2. Add the following statement to the metadata.annotations section:
        apiconnect-operator/deployment-mode: disasterRecovery

        For example:

        metadata:
          annotations:
            apiconnect-operator/deployment-mode: disasterRecovery
    • Using the CLI:
      1. Get the name of the CR by running the following command:
        oc get apiconnectcluster -o name -n <APIC_namespace>

        The response looks like the following example:

        apiconnectcluster.apiconnect.ibm.com/instance_name
      2. Add the annotation to the metadata.annotations section of the CR by running the following command:
        oc annotate apiconnectcluster/instance_name apiconnect-operator/deployment-mode="disasterRecovery" -n <APIC_namespace>

        The response looks like the following example:

        apiconnectcluster.apiconnect.ibm.com/instance_name annotated
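
        If you want to confirm that the annotation is now present on the CR, you can display its annotations, for example:

        oc get apiconnectcluster/instance_name -o jsonpath='{.metadata.annotations}' -n <APIC_namespace>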
  14. Verify that you correctly added all of the settings to the installation CR.
    For example, the completed CR might look like the following example:
    apiVersion: apiconnect.ibm.com/v1beta1
    kind: APIConnectCluster
    metadata:
      namespace: apiconnect
      name: apis-minimum
      labels:
        app.kubernetes.io/instance: apiconnect
        app.kubernetes.io/managed-by: ibm-apiconnect
        app.kubernetes.io/name: apiconnect-minimum
    spec:
      license:
        accept: true
        license: L-RJON-C2YLGB
        metric: VIRTUAL_PROCESSOR_CORE
        use: nonproduction
      management:
        originalUID: "fa0f6f49-b931-4472-b84d-0922a9a92dfd"
        encryptionSecret:
          secretName: apis-minim-faaa44bf-enc-key
        siteName: "faaa44bf"
        customApplicationCredentials:
          - name: atm-cred
            secretName: apis-minim-faaa44bf-atm-cred
          - name: ccli-cred
            secretName: apis-minim-faaa44bf-ccli-cred
          - name: cli-cred
            secretName: apis-minim-faaa44bf-cli-cred
          - name: cui-cred
            secretName: apis-minim-faaa44bf-cui-cred
          - name: dsgr-cred
            secretName: apis-minim-faaa44bf-dsgr-cred
          - name: juhu-cred
            secretName: apis-minim-faaa44bf-juhu-cred
          - name: ui-cred
            secretName: apis-minim-faaa44bf-ui-cred
        analytics:
          client: {}
          ingestion: {}
        apiManagerEndpoint: {}
        cloudManagerEndpoint: {}
        consumerAPIEndpoint: {}
        platformAPIEndpoint: {}
        portal:
          admin: {}
        databaseBackup:
          path: ocp-dr-mgmt
          host: s3.eu-gb.cloud-object-storage.appdomain.cloud/eu-standard
          s3provider: ibm
          schedule: 0 1 * * *
          protocol: objstore
          credentials: mgmt-backup-secret
      analytics:
        storage:
          enabled: true
          type: unique
        databaseBackup:
          chunkSize: 1GB
          credentials: a7s-backup-secret
          host: s3.eu-gb.cloud-object-storage.appdomain.cloud/eu-standard
          path: ocp-dr-a7s
          schedule: 0 1 * * *
      storageClassName: rook-ceph-block
      profile: n1xc7.m48
      portal:
        adminClientSubjectDN: ""
        portalAdminEndpoint: {}
        portalUIEndpoint: {}
        originalUID: "447ea4b3-9514-4a84-a34b-ce0b349838da"
        encryptionSecret:
          secretName: apis-minim-913edd20-enc-key
        siteName: "913edd20"
        portalBackup:
          credentials: portal-backup-secret
          host: s3.eu-gb.cloud-object-storage.appdomain.cloud/eu-standard
          path: ocp-dr-portal
          protocol: objstore
          schedule: 0 1 * * *
      version: 10.0.1.5-eus
    Important: If you are restoring your deployment in a different data center, the endpoints that are used in your original deployment must be the same in your new data center. The Platform UI in Cloud Pak for Integration autogenerates the endpoints if they are left empty in the CR, therefore ensure that you explicitly set the endpoints in the CR to match what was set in your original data center. For example:
    apiManagerEndpoint:
      annotations:
        cert-manager.io/issuer: prod-ingress-issuer
        haproxy.router.openshift.io/timeout: 240s
      hosts:
        - name: prod-api-manager.example.com
          secretName: prod-be787dd3-api-manager
  15. On the deployment page, install API Connect by clicking Create.
  16. Wait for all subsystems to be created.

    When the Management subsystem is installed, you might see backup job pods and stanza-create job pods in Error state; for example:

    m1-82b290a2-postgres-stanza-create-4zcgz                    0/1     Error       0          35m
    m1-82b290a2-postgres-full-sch-backup-2g9hm                  0/1     Error       0          20m

    This is expected behavior, for the following reasons:

    • The stanza-create job normally expects buckets or subdirectories within buckets to be empty. However, since you configured the Management subsystem with your already-populated S3 bucket (where your backups exist), the job will enter the Error state.
    • Any scheduled or manual backups will enter the Error state. Although you configured the Management subsystem with your already-populated S3 bucket, the new database isn't yet configured to write backups into remote storage.
    • The configurator job will fail because the CP4I credentials secret that you manually restored does not match the value in the Management database. As a result, the state of the new cluster is not "Ready" and will show "6/7". This error will be resolved when you restore the Management subsystem from the backup that you prepared earlier.

    The errors will not prevent a successful restore, so continue to the next step.
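
    To watch the subsystems come up and to observe the expected error states that are described above, you can check the pods and the overall cluster status, for example:

    oc get pods -n <APIC_namespace>
    oc get apiconnectcluster -n <APIC_namespace>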

  17. Restore the subsystems in the following sequence:
    Important: Make sure that you restore each subsystem from the backup that you created when you prepared that subsystem for disaster recovery; locate the correct backup by its name and ID (the sketch after these sub-steps shows one way to list the available backups).
    1. Restore the Management subsystem.

      When you perform a restore, you must complete the restoration of the Management subsystem first. Verify that the restoration completed successfully and that the Management subsystem is Ready. When the Management subsystem is healthy, proceed to the next step and restore the Portal subsystem.

    2. Restore the Developer Portal subsystem.

      Use the restore type all to ensure that you restore the complete subsystem and all Portal sites.

    3. Restore the Analytics subsystem.
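
    Before you create each restore request, you can confirm which backups are visible to the cluster by listing the backup resources. This sketch assumes the standard API Connect backup resource kinds; see the restore documentation for your version for the full restore procedure:

    oc get managementbackup -n <APIC_namespace>
    oc get portalbackup -n <APIC_namespace>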
  18. Force the configurator to run again.

    The cluster is still not ready after the restore at this stage because the configurator job has not yet completed successfully (see step 16). To force the configurator to run again, delete the associated job so that a new pod starts:

    1. Run the following command to get the list of jobs:
      oc get jobs -n <APIC_namespace>
    2. Run the following command to determine the name of your API Connect instance:
      oc get apiconnectcluster -n <APIC_namespace>
    3. Run the following command to delete the configurator job:
      oc -n <APIC_namespace> delete job <instance_name>-configurator
  19. Update the Management OIDC credentials as explained in Configuring the OIDC credentials on Cloud Pak for Integration.
  20. Verify that the recovery was successful:
    1. Ensure that you can log in to the Cloud Manager UI.
    2. Verify that your provider organizations exist.
    3. Ensure that you can log in to each Developer portal.
    4. Ensure that the Analytics dashboard contains all of the analytics data that you preserved.
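
    In addition to these checks, you can confirm from the command line that the API Connect cluster reports the Ready status again (rather than the 6/7 state that is described in step 16), for example:

    oc get apiconnectcluster -n <APIC_namespace>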
  21. After the successful recovery, remove the annotation that you added to the apiconnectcluster CR in step 13.

    If you skipped step 13, then skip this step as well.

    • Using the Platform UI:
      1. Edit the API Management instance and click the YAML tab to edit the CR.
      2. Delete the following statement from the metadata.annotations section:
        apiconnect-operator/deployment-mode: disasterRecovery
    • Using the CLI:
      1. Get the name of the CR by running the following command:
        oc get apiconnectcluster -o name -n <APIC_namespace>

        The response looks like the following example:

        apiconnectcluster.apiconnect.ibm.com/instance_name
      2. Remove the annotation by running the following command, making sure to include the trailing "-" on deployment-mode- to indicate the removal:
        oc -n <APIC_namespace> annotate apiconnectcluster/instance_name apiconnect-operator/deployment-mode-

        The response looks like the following example:

        apiconnectcluster.apiconnect.ibm.com/instance_name annotated