Recovering from a disaster on Cloud Pak for Integration
Install and configure API Connect, then restore the backed up data to recover the deployment after a disaster.
Before you begin
To successfully recover API Connect from a disaster, you must have previously completed the steps to prepare the Management, Portal, and Analytics subsystems for disaster recovery. Make sure that you can access the backups and deployment information, because you cannot recover your deployment without them.
About this task
To recover from a disaster event, recreate your API Connect Cluster CR and add recovery details as explained in the following procedure. Use the file containing the original CR (which you saved during the preparation task) as a reference.
Procedure
- Prepare your Cloud Pak for Integration environment for deploying API Connect as explained in the appropriate version of the Cloud Pak for Integration documentation.
  After you prepare your environment, return to this procedure to configure the installation CR, deploy API Connect, and then restore data to the API Connect subsystems.
  Important: Do not install API Connect until you have configured the CR as explained in the following steps.
- Apply the Management and Portal encryption secrets to the cluster where you will install API Connect.
- Apply the YAML file that contains the Management database encryption secret into the cluster.
Run the following command, using the filename of the encryption secret that you saved when you prepared the Management subsystem for disaster recovery:
oc apply -f mgmt-enc-key.yaml
- Apply the YAML file that contains the Portal encryption secret into the cluster.
Run the following command, using the filename of the encryption secret that you saved when you prepared the Portal subsystem for disaster recovery:
oc apply -f portal-enc-key.yaml
- Apply the YAML file that contains the CP4I credentials secret into the cluster.
Run the following command, using the filename of the secret that you saved when you prepared the Management subsystem for disaster recovery:
oc apply -f <cp4i_creds_secret>.yaml -n <APIC_namespace>
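If you prefer, the saved secret files can be applied in one pass. The following is a dry-run sketch: the filenames and the apiconnect namespace are assumptions, so substitute the names you saved during preparation, and remove the echo to actually apply the secrets.

```shell
# Dry-run helper: prints the oc apply command for each saved secret file.
# Filenames and namespace are assumptions from the preparation task;
# remove the echo to run the commands for real.
apply_saved_secrets() {
  ns="$1"; shift
  for f in "$@"; do
    echo "oc apply -f $f -n $ns"
  done
}

apply_saved_secrets apiconnect mgmt-enc-key.yaml portal-enc-key.yaml cp4i-creds-secret.yaml
```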
- Set up the installation CR for deploying API Connect.
  In this step, you begin deploying API Connect in Cloud Pak for Integration, but you do not start the actual installation until you click Create in a later step. Instead, you complete a series of steps to configure the CR with deployment settings and additional recovery settings.
- Log in to the IBM Cloud Pak Platform UI.
- On the home page, click Create capability.
- Select the API Connect tile and click Next.
- On the Create an instance of API Connect cluster page, select the deployment type and click Next.
  Choose the same deployment type that you used when you originally deployed API Connect.
- On the deployment settings page, click the YAML tab to edit the installation CR in YAML format.
- Copy the content of the saved CR into the YAML tab, replacing the default CR.
Keep the YAML tab open. In the following steps, you will configure additional deployment and recovery settings in the new API Connect Cluster installation CR before you install API Connect.
- Important: Make sure that your new version of the API Connect Cluster installation CR uses the same value for the name setting as the original CR.
- Verify the backup and restore settings for each subsystem.
  Remember to create the backup secret for each subsystem as well, using the same name that you used in the original deployment (you can see the name in the credentials setting of each subsystem's backup section). The backup and restore settings, including the secrets, must exactly match the settings that you created in the original deployment. See the following topics for instructions on configuring backup and restore settings:
  - Configuring backup settings for a fresh install of the Management subsystem on OpenShift or Cloud Pak for Integration
  - Configuring backups for Developer Portal on OpenShift and Cloud Pak for Integration
  - Configuring backup settings for Analytics on OpenShift and Cloud Pak for Integration
  Note: During the disaster recovery process, the S3 configuration details of the old Management subsystem are used, but the old Management subsystem must be offline. The old subsystem must be offline because two Management subsystems cannot simultaneously use the same S3 bucket name in their database backup configurations.
- Generate the Kubernetes secret for each subsystem's backups by running the appropriate command.
  For each secret, use the same name that you used in the original deployment (you can see the name in the credentials setting of each subsystem's backup section).
  - Management subsystem:
    - S3:
      oc create secret generic mgmt-backup-secret --from-literal=username='<Your_access_key-or-user_name>' --from-literal=password='<Your_access_key_secret-or-password>' -n <APIC_namespace>
    - SFTP with username and password:
      oc create secret generic mgmt-backup-secret --from-literal=username='<Your_user_name>' --from-literal=password='<Your_password>' -n <APIC_namespace>
    - SFTP with username and SSH key:
      oc create secret generic mgmt-backup-secret --from-literal=username='<Your_user_name>' --from-file=ssh-privatekey='<Your_private_key_file>' -n <APIC_namespace>
  - Portal subsystem:
    - S3:
      oc create secret generic portal-backup-secret --from-literal=username='<Your_access_key-or-user_name>' --from-literal=password='<Your_access_key_secret-or-password>' -n <APIC_namespace>
    - SFTP with username and password:
      oc create secret generic portal-backup-secret --from-literal=username='<Your_user_name>' --from-literal=password='<Your_password>' -n <APIC_namespace>
    - SFTP with username and SSH key:
      oc create secret generic portal-backup-secret --from-literal=username='<Your_user_name>' --from-file=ssh-privatekey='<Your_private_key_file>' -n <APIC_namespace>
  - Analytics subsystem:
    oc create secret generic analytics-backup-secret --from-literal=access_key='<Your_Access_Key>' --from-literal=secret_key='<Your_access_key_secret>' -n <APIC_namespace>
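Because a mismatch between a credentials value and the actual Secret name is easy to miss, it can help to list every credentials value in the saved CR and compare each one against the secrets you just created. The following is a grep-based sketch; the saved CR filename is an assumption, and the approach relies on the one-value-per-line layout of the CR YAML.

```shell
# List every backup 'credentials:' value found in a saved API Connect CR,
# so each one can be checked against the Secrets created above.
list_backup_credentials() {
  grep -E '^[[:space:]]*credentials:' "$1" | awk '{print $2}' | sort -u
}

# Example (filename is an assumption):
#   list_backup_credentials saved-apiconnect-cr.yaml
```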
- Add the saved Management and Portal encryption secrets to the installation CR.
  Add each secret to the appropriate spec.subsystem section of the CR, as shown in the following example:
  spec:
    management:
      encryptionSecret:
        secretName: mgmt-enc-key
    portal:
      encryptionSecret:
        secretName: portal-enc-key
- Apply each of the Management client application credential secrets to the cluster.
  Run the following command to apply each secret, using the filename of each client application credential secret that you saved when you prepared the Management subsystem for disaster recovery:
  oc apply -f <secretName>.yaml
  For example, a deployment might use the following names for the secrets:
  atmCredentialSecret: management-atm-cred
  consumerToolkitCredentialSecret: management-ccli-cred
  consumerUICredentialSecret: management-cui-cred
  designerCredentialSecret: management-dsgr-cred
  governanceCredentialSecret: management-governance-cred
  juhuCredentialSecret: management-juhu-cred
  toolkitCredentialSecret: management-cli-cred
  uiCredentialSecret: management-ui-cred
- Add each of the credential secrets to the spec.management section of the installation CR.
  For example:
  spec:
    management:
      customApplicationCredentials:
      - name: atm-cred
        secretName: management-atm-cred
      - name: ccli-cred
        secretName: management-ccli-cred
      - name: cui-cred
        secretName: management-cui-cred
      - name: dsgr-cred
        secretName: management-dsgr-cred
      - name: governance-cred
        secretName: management-governance-cred
      - name: juhu-cred
        secretName: management-juhu-cred
      - name: cli-cred
        secretName: management-cli-cred
      - name: ui-cred
        secretName: management-ui-cred
- Add the siteName property to the spec.management and spec.portal sections of the installation CR.
  For example, if a2a5e6e2 is the original siteName of the Management subsystem, and 890772e3 is the original siteName of the Portal subsystem, the CR looks like the following example:
  spec:
    management:
      siteName: "a2a5e6e2"
    portal:
      siteName: "890772e3"
- If your original CR name was 10 characters or longer, you must add the originalUID or metadata.uid value from the original Management and Portal subsystems into the appropriate sections of the installation CR.
  Important: If your original CR name was 10 characters or longer and you do not add this value, the deployments and routes that are generated during installation will include a different UID, which will cause the recovery process to fail.
  The IDs were created with the original deployment of the Management and Portal subsystems, and the recovered subsystems must continue to use the same IDs. You can locate the ID in the Management (apic-cluster-name-mgt.yaml) or Portal (apic-cluster-name-ptl.yaml) CR that you saved while preparing for disaster recovery. (Do not use the metadata.uid setting from the API Connect Cluster CR.)
  The CR setting that contains the ID depends on the version of API Connect that you are recovering:
  - Version 10.0.3 or later: spec.originalUID
  - Version 10.0.2 or earlier: metadata.uid
  Locate each of the original settings in the appropriate subsystem CR, and copy them to the corresponding sections of the new API Connect Cluster CR using the same key and value.
  For example, if you originally deployed Version 10.0.2, the setting name was uid. If you are deploying Version 10.0.3 as part of disaster recovery, copy the original value and use it for the originalUID setting in the new CR. For example, in 10.0.3:
  spec:
    management:
      originalUID: "fa0f6f49-b931-4472-b84d-0922a9a92dfd"
    portal:
      originalUID: "447ea4b3-9514-4a84-a34b-ce0b349838da"
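To avoid retyping the UID, you can extract it from the saved subsystem CR file. The following is a sketch that assumes the simple one-value-per-line layout of the saved YAML; a YAML-aware tool such as yq would be more robust.

```shell
# Print the original UID from a saved subsystem CR: matches spec.originalUID
# (10.0.3 or later) or the first uid: field (10.0.2 or earlier) and strips
# any surrounding quotes.
get_original_uid() {
  grep -E '^[[:space:]]*(originalUID|uid):' "$1" | head -n 1 \
    | awk '{print $2}' | tr -d '"'
}

# Example (filename from the preparation task):
#   get_original_uid apic-cluster-name-mgt.yaml
```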
- If you installed API Connect using an S3 provider for the Management subsystem backups, add the following annotation to the apiconnectcluster CR.
  If you specified SFTP or local as the backup, skip this step.
  - Using the Platform UI:
    - Edit the API Management instance and click the YAML tab to edit the CR.
    - Add the following statement to the spec.metadata.annotations section:
      apiconnect-operator/deployment-mode: disasterRecovery
      For example:
      metadata:
        annotations:
          apiconnect-operator/deployment-mode: disasterRecovery
  - Using the CLI:
    - Get the name of the CR by running the following command:
      oc get apiconnectcluster -o name -n <APIC_namespace>
      The response looks like the following example:
      apiconnectcluster.apiconnect.domain/instance_name
    - Add the annotation by running the following command:
      oc annotate apiconnectcluster/instance_name apiconnect-operator/deployment-mode="disasterRecovery" -n <APIC_namespace>
      The response looks like the following example:
      apiconnectcluster.apiconnect.domain/instance_name annotated
- Verify that you correctly added all of the settings to the installation CR.
  For example, the completed CR might look like the following example:
  apiVersion: apiconnect.ibm.com/v1beta1
  kind: APIConnectCluster
  metadata:
    namespace: apiconnect
    name: apis-minimum
    labels:
      app.kubernetes.io/instance: apiconnect
      app.kubernetes.io/managed-by: ibm-apiconnect
      app.kubernetes.io/name: apiconnect-minimum
  spec:
    license:
      accept: true
      license: L-RJON-C2YLGB
      metric: VIRTUAL_PROCESSOR_CORE
      use: nonproduction
    management:
      originalUID: "fa0f6f49-b931-4472-b84d-0922a9a92dfd"
      encryptionSecret:
        secretName: apis-minim-faaa44bf-enc-key
      siteName: "faaa44bf"
      customApplicationCredentials:
      - name: atm-cred
        secretName: apis-minim-faaa44bf-atm-cred
      - name: ccli-cred
        secretName: apis-minim-faaa44bf-ccli-cred
      - name: cli-cred
        secretName: apis-minim-faaa44bf-cli-cred
      - name: cui-cred
        secretName: apis-minim-faaa44bf-cui-cred
      - name: dsgr-cred
        secretName: apis-minim-faaa44bf-dsgr-cred
      - name: governance-cred
        secretName: apis-minim-faaa44bf-governance-cred
      - name: juhu-cred
        secretName: apis-minim-faaa44bf-juhu-cred
      - name: ui-cred
        secretName: apis-minim-faaa44bf-ui-cred
      analytics:
        client: {}
        ingestion: {}
      apiManagerEndpoint: {}
      cloudManagerEndpoint: {}
      consumerAPIEndpoint: {}
      platformAPIEndpoint: {}
      portal:
        admin: {}
      databaseBackup:
        path: ocp-dr-mgmt
        host: s3.eu-gb.cloud-object-storage.appdomain.cloud/eu-standard
        s3provider: ibm
        schedule: 0 1 * * *
        protocol: objstore
        credentials: mgmt-backup-secret
    analytics:
      storage:
        enabled: true
        type: unique
      databaseBackup:
        chunkSize: 1GB
        credentials: a7s-backup-secret
        host: s3.eu-gb.cloud-object-storage.appdomain.cloud/eu-standard
        path: ocp-dr-a7s
        schedule: 0 1 * * *
    storageClassName: rook-ceph-block
    profile: n1xc7.m48
    portal:
      adminClientSubjectDN: ""
      portalAdminEndpoint: {}
      portalUIEndpoint: {}
      originalUID: "447ea4b3-9514-4a84-a34b-ce0b349838da"
      encryptionSecret:
        secretName: apis-minim-913edd20-enc-key
      siteName: "913edd20"
      portalBackup:
        credentials: portal-backup-secret
        host: s3.eu-gb.cloud-object-storage.appdomain.cloud/eu-standard
        path: ocp-dr-portal
        protocol: objstore
        schedule: 0 1 * * *
    version: 10.0.3
  Important: If you are restoring your deployment in a different data center, the endpoints that are used in your original deployment must be the same in your new data center. The Platform UI in Cloud Pak for Integration autogenerates the endpoints if they are left empty in the CR, so ensure that you explicitly set the endpoints in the CR to match what was set in your original data center. For example:
  apiManagerEndpoint:
    annotations:
      cert-manager.io/issuer: prod-ingress-issuer
      haproxy.router.openshift.io/timeout: 240s
    hosts:
    - name: prod-api-manager.example.com
      secretName: prod-be787dd3-api-manager
- On the deployment page, install API Connect by clicking Create.
- Wait for all subsystems to be created.
  When the Management subsystem is installed, you might see backup job pods and stanza-create job pods in the Error state; for example:
  m1-82b290a2-postgres-stanza-create-4zcgz 0/1 Error 0 35m
  m1-82b290a2-postgres-full-sch-backup-2g9hm 0/1 Error 0 20m
  This is expected behavior, for the following reasons:
  - The stanza-create job normally expects buckets, or subdirectories within buckets, to be empty. However, because you configured the Management subsystem with your already-populated S3 bucket (where your backups exist), the job enters the Error state.
  - Any scheduled or manual backups enter the Error state. Although you configured the Management subsystem with your already-populated S3 bucket, the new database is not yet configured to write backups into remote storage.
  - The configurator job fails because the CP4I credentials secret that you manually restored does not match the value in the Management database. As a result, the new cluster does not reach the "Ready" state and shows "6/7". This error is resolved when you restore the Management subsystem from the backup that you prepared earlier.
  The errors will not prevent a successful restore, so continue to the next step.
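To confirm that the only failing pods are the expected backup and stanza-create jobs, you can filter the pod listing on the STATUS column. The following is a sketch that assumes the five-column `oc get pods` output format shown above.

```shell
# Print the names of pods whose STATUS column (field 3) is Error, as in
# the 'oc get pods' output format shown above.
error_pods() {
  awk '$3 == "Error" {print $1}'
}

# Example:
#   oc get pods -n <APIC_namespace> | error_pods
```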
- Restore the subsystems in the following sequence:
  Important: Make sure that you restore using the backup (locate the one with the correct backup name and ID) that you created when you prepared each subsystem for disaster recovery.
  - Restore the Management subsystem.
    When you perform a restore, you must complete the restoration of the Management subsystem first. Verify that the restoration completed successfully and that the Management subsystem is Ready. When the Management subsystem is healthy, proceed to the next step and restore the Portal subsystem.
  - Restore the Developer Portal subsystem.
    Use the restore type all to ensure that you restore the complete subsystem and all Portal sites.
  - Restore the Analytics subsystem.
- Force the configurator to run again.
  The cluster is still not ready after the restore because the configurator job has not yet completed successfully (see the earlier note about the "6/7" state). To make the configurator run again, delete the associated job so that a new pod starts:
  - Run the following command to get the list of jobs:
    oc get jobs -n <APIC_namespace>
  - Run the following command to determine the name of your API Connect instance:
    oc get apiconnectcluster -n <APIC_namespace>
  - Run the following command to delete the configurator job:
    oc -n <APIC_namespace> delete job <instance_name>-configurator
- Version 10.0.2 or earlier: Update the Management OIDC credentials as explained in Configuring the OIDC credentials on Cloud Pak for Integration.
- Verify that the recovery was successful:
- Ensure that you can log in to the Cloud Manager UI.
- Verify that your provider organizations exist.
- Ensure that you can log in to each Developer portal.
- Ensure that the Analytics dashboard contains all of the analytics data that you preserved.
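A quick scripted check of overall readiness can complement the manual checks above: once the configurator succeeds, the READY column of `oc get apiconnectcluster` should no longer show a partial count such as 6/7. The following is a sketch that assumes READY is the second column of the output.

```shell
# Exit 0 only if the READY column of the first instance row shows a full
# count (for example 7/7 rather than 6/7).
is_fully_ready() {
  awk 'NR == 2 { split($2, r, "/"); ok = (r[1] == r[2]) } END { exit ok ? 0 : 1 }'
}

# Example:
#   oc get apiconnectcluster -n <APIC_namespace> | is_fully_ready && echo "fully ready"
```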
- After the successful recovery, remove the annotation that you added to the apiconnectcluster CR before installation.
  If you did not add the annotation (because you use SFTP or local backups), skip this step as well.
  - Using the Platform UI:
    - Edit the API Management instance and click the YAML tab to edit the CR.
    - Delete the following statement from the spec.metadata.annotations section:
      apiconnect-operator/deployment-mode: disasterRecovery
  - Using the CLI:
    - Get the name of the CR by running the following command:
      oc get apiconnectcluster -o name -n <APIC_namespace>
      The response looks like the following example:
      apiconnectcluster.apiconnect.domain/instance_name
    - Remove the annotation by running the following command, making sure to include the trailing "-" on deployment-mode- to indicate the removal:
      oc -n <APIC_namespace> annotate apiconnectcluster/instance_name apiconnect-operator/deployment-mode-
      The response looks like the following example:
      apiconnectcluster.apiconnect.domain/instance_name annotated