Upgrading a 2DCDR deployment on Cloud Pak for Integration from V10.0.5

How to upgrade from an API Connect V10.0.5 two data center disaster recovery (DR) deployment on Cloud Pak for Integration.

To upgrade API Connect in a 2DCDR deployment on Cloud Pak for Integration, follow the Cloud Pak for Integration upgrade instructions in Upgrading on OpenShift and Cloud Pak for Integration, and apply the extra considerations and steps documented in this topic.

Before you begin

Verify that API Connect is running in both data centers and that replication is working. See Verifying replication between data centers.
Check that you have recent backups of the management and portal subsystems. See Backup and restore requirements for a 2DCDR deployment.
If you have a top-level CR deployment, and your portal and management warm-standby subsystems are not in the same data center, then complete an operational failover of either the management or portal subsystems so that they are both warm-standby in the same data center. See Failing over to the warm-standby.

Key points applicable to both management and portal subsystems

Your API Connect deployments must be upgraded to the same API Connect release, down to the interim fix level.
Both data centers must be upgraded in the same maintenance window.
Upgrade steps might result in an update to the ingress-ca X.509 certificate. Extra steps must be taken at various points in the upgrade process to ensure that the ingress-ca secret in both data centers contains the same ingress-ca X.509 certificate.

V10.0.5 to V10.0.8 2DCDR upgrade steps for Cloud Pak for Integration

In Cloud Pak for Integration, the apiconnectcluster CR (also known as the top-level CR) manages the management and portal subsystems. When the apiconnectcluster CR is updated with the new API Connect version, all the subsystems are updated automatically. Follow the upgrade process for Cloud Pak for Integration that is described in Upgrading on OpenShift and Cloud Pak for Integration, and include the following steps.

Verify that both data centers have the same 2DCDR mode for both management and portal subsystems. For example, in DC1, both management and portal subsystems are active, and in DC2, both are warm-standby. If necessary, complete a failover of one of the subsystems so that management and portal have the same mode in the same data center. See Failing over to the warm-standby.
Verify that both data centers have the same ingress-ca X.509 certificate in their <apic instance-name>-ingress-ca secret. Run the following command in both data centers and check that the output is the same:
```
openssl x509 -noout -fingerprint -sha256 -in <(oc get secret <apic instance-name>-ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d)
```
If the openssl command is not available, run only the oc part instead, which produces a larger output:
```
oc get secret <apic instance-name>-ingress-ca -n <namespace> -o yaml | grep "^  tls.crt:" | awk '{print $2}' | base64 -d
```
If the outputs are different, synchronize the certificates by following the steps in Synchronizing the ingress-ca X.509 certificate across data centers.
Start the upgrade of your warm-standby data center.
Follow the upgrade documentation for the API Management capability in https://www.ibm.com/docs/en/cloud-paks/cp-integration. Stop at the point after the API Connect and DataPower operators are updated but before the operands are updated.
Repeat step 2 to verify that the ingress-ca X.509 certificates are still the same in both data centers.
Complete the upgrade of your warm-standby data center, updating the operands as described in the Cloud Pak for Integration documentation.
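
Because the ingress-ca check from step 2 is repeated several times during the upgrade, it can help to script it. The following is a minimal sketch, not part of the product documentation. It assumes that one workstation can reach both clusters through two kubeconfig contexts, named by the hypothetical variables DC1_CONTEXT and DC2_CONTEXT, and that both data centers use the same namespace and instance name. It reads the certificate with a jsonpath expression rather than the grep and awk pipeline shown in step 2.

```shell
# Sketch only. Assumptions not taken from the source: the kubeconfig context
# names in DC1_CONTEXT and DC2_CONTEXT, and that both data centers use the
# same namespace and instance name.

# Print the SHA-256 fingerprint of the ingress-ca certificate in one cluster.
# Usage: ingress_ca_fingerprint <context> <namespace> <apic instance-name>
ingress_ca_fingerprint() {
  oc --context "$1" get secret "$3-ingress-ca" -n "$2" \
      -o jsonpath='{.data.tls\.crt}' |
    base64 -d |
    openssl x509 -noout -fingerprint -sha256
}

# Succeed only when both fingerprints are non-empty and identical.
fingerprints_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage: check_ingress_ca <namespace> <apic instance-name>
# Returns 0 on match, 1 on mismatch, 2 if a fingerprint could not be read.
check_ingress_ca() {
  fp_dc1=$(ingress_ca_fingerprint "$DC1_CONTEXT" "$1" "$2") || return 2
  fp_dc2=$(ingress_ca_fingerprint "$DC2_CONTEXT" "$1" "$2") || return 2
  if fingerprints_match "$fp_dc1" "$fp_dc2"; then
    echo "ingress-ca certificates match: $fp_dc1"
  else
    echo "ingress-ca certificates differ" >&2
    echo "  DC1: $fp_dc1" >&2
    echo "  DC2: $fp_dc2" >&2
    return 1
  fi
}
```

After you set DC1_CONTEXT and DC2_CONTEXT, call check_ingress_ca <namespace> <apic instance-name> at each point where the steps ask you to repeat step 2. A nonzero return code means that the certificates differ or could not be read, so do not continue until they are synchronized.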

Check the state of your APIConnectCluster CR:

Check the cluster status:

```
oc get apiconnectcluster <apic CR name> -o json | jq -r '.status.conditions[] | select(.status=="True")'
```

returns:

```
{
  "lastTransitionTime": "2024-05-17T21:05:15Z",
  "message": "APIC instance being upgraded. Not all services are ready, pending services: management",
  "reason": "Upgrading",
  "status": "True",
  "type": "Pending"
}
```

Check the management subsystem status:

```
oc get mgmt <mgmt CR name> -o json | jq -r '.status.conditions[] | select(.status=="True")'
```

returns:

```
{
  "lastTransitionTime": "2024-05-21T16:05:34Z",
  "message": "HA status Error - see HAStatus in CR for details",
  "reason": "na",
  "status": "True",
  "type": "Blocked"
}
```

Check the management subsystem 2DCDR status:

```
oc get mgmt <mgmt CR name> -o json | jq -r '.status.haStatus[] | select(.status=="True")'
```

returns:

```
{
  "lastTransitionTime": "2024-05-21T16:05:34Z",
  "message": "Remote HAMode is Empty (Not received from peer). Expected it to be in either active or setup complete phase",
  "reason": "na",
  "status": "True",
  "type": "Error"
}
```
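
The three status checks above can be collected into one helper so that they are easy to rerun while the upgrade progresses. The following is a minimal sketch under stated assumptions: it uses the jsonpath output of oc instead of jq, prints only the type, reason, and message fields, and relies on the current project in the same way as the commands above. The function names are hypothetical.

```shell
# Sketch only. Uses oc's jsonpath output instead of jq and relies on the
# current project, like the commands shown earlier in this section.

# Print type, reason, and message (tab-separated) for each entry whose status
# is "True" in the given status list.
# Usage: true_conditions <resource kind> <CR name> <conditions|haStatus>
true_conditions() {
  oc get "$1" "$2" -o jsonpath='{range .status.'"$3"'[?(@.status=="True")]}{.type}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}

# Run the three checks from this section in order.
# Usage: show_2dcdr_status <apic CR name> <mgmt CR name>
show_2dcdr_status() {
  echo "== apiconnectcluster conditions =="
  true_conditions apiconnectcluster "$1" conditions
  echo "== management conditions =="
  true_conditions mgmt "$2" conditions
  echo "== management 2DCDR status =="
  true_conditions mgmt "$2" haStatus
}
```

Call show_2dcdr_status <apic CR name> <mgmt CR name> in the data center that you are checking and compare the printed type and message values with the example output shown above.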

Repeat step 2 to verify that the ingress-ca X.509 certificates are still the same in both data centers.
Start the upgrade of your active data center by following the upgrade documentation for the API Management capability in https://www.ibm.com/docs/en/cloud-paks/cp-integration. Stop at the point after the API Connect and DataPower operators are updated but before the operands are updated.
Repeat step 2 to verify that the ingress-ca X.509 certificates are still the same in both data centers.
Complete the upgrade of the active data center, updating the operands as described in the Cloud Pak for Integration documentation.

If the management CR has successfully reconciled to the new version:

```
NAME              READY   STATUS    VERSION         RECONCILED VERSION   MESSAGE                                                               AGE
production-mgmt   6/6     Running   10.0.8.1-1087   10.0.8.1-1087        Management is ready. HA status Ready - see HAStatus in CR for details 46h
```

but the portal CR spec has not been updated to the new version:

```
NAME             READY   STATUS    VERSION         RECONCILED VERSION   MESSAGE                              AGE
production-ptl   7/7     Warning   10.0.5.7        10.0.5.7-300         Full file synchronization running    46h
```

then you must manually set the spec.version property in the portal CR to the new version to proceed with the upgrade. This issue can occur because of incompatibilities in the file synchronization specification between the old and new portal web (www) pods.
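
One way to set spec.version without opening an editor is a JSON merge patch. The following is a minimal sketch, not a documented procedure. It assumes that the portal CR can be addressed with the resource name portalcluster, and it leaves the version value to you because the exact string depends on your target release. The function names are hypothetical.

```shell
# Sketch only. Assumptions: the portal CR is addressable as "portalcluster",
# and <new version> is the spec.version value that your target release uses.

# Build the JSON merge patch that sets spec.version.
# Usage: portal_version_patch <new version>
portal_version_patch() {
  printf '{"spec":{"version":"%s"}}' "$1"
}

# Apply the patch to the portal CR.
# Usage: set_portal_version <ptl CR name> <namespace> <new version>
set_portal_version() {
  oc patch portalcluster "$1" -n "$2" --type merge -p "$(portal_version_patch "$3")"
}
```

A merge patch changes only the spec.version field and leaves the rest of the CR spec untouched. After you apply it, list the portal CR again and confirm that the VERSION and RECONCILED VERSION columns move to the new release.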

2DCDR installations on Cloud Pak for Integration before V10.0.7 required the creation of a custom endpoint for the management subsystem user-facing certificates. The custom endpoint is not required in V10.0.7 and later, so you can update your management subsystem endpoints.
In both data centers, edit your top-level CR:
```
oc edit apiconnectcluster
```
Add the following endpoint sections:
- spec.management.cloudManagerEndpoint
- spec.management.apiManagerEndpoint
- spec.management.platformAPIEndpoint
- spec.management.consumerAPIEndpoint
For example:
```
    cloudManagerEndpoint:
      annotations:
        cert-manager.io/issuer: <APIConnectCluster CR name>-ingress-issuer
      hosts:
      - name: <cloudManagerEndpoint>
        secretName: <secret name>
```
where:
- <APIConnectCluster CR name> is the name of your APIConnectCluster CR. Run oc get apiconnectcluster to see this name.
- <cloudManagerEndpoint> is the new endpoint that you want to use for the Cloud Manager UI.
- <secret name> is the name of the secret that stores the TLS certificate for the endpoint. Set this value as follows according to the endpoint:
  - platformAPIEndpoint: <APIConnectCluster CR name>-mgmt-platform-api
  - consumerAPIEndpoint: <APIConnectCluster CR name>-mgmt-consumer-api
  - cloudManagerEndpoint: <APIConnectCluster CR name>-mgmt-admin
  - apiManagerEndpoint: <APIConnectCluster CR name>-mgmt-api-manager
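
The example above shows only cloudManagerEndpoint. As an illustration, the following sketch prints all four endpoint sections with the secret names from the list above, ready to paste under spec.management in the top-level CR. It assumes that the other three endpoints use the same structure as the cloudManagerEndpoint example. The function names and the host names that you pass in are hypothetical.

```shell
# Sketch only. Assumption: every endpoint section has the same layout as the
# cloudManagerEndpoint example in this topic.

# Print one endpoint section.
# Usage: endpoint_section <endpoint key> <APIConnectCluster CR name> <host> <secret suffix>
endpoint_section() {
  cat <<EOF
    $1:
      annotations:
        cert-manager.io/issuer: $2-ingress-issuer
      hosts:
      - name: $3
        secretName: $2-$4
EOF
}

# Print all four management endpoint sections.
# Usage: management_endpoints <CR name> <cloud manager host> <api manager host> \
#                             <platform api host> <consumer api host>
management_endpoints() {
  endpoint_section cloudManagerEndpoint "$1" "$2" mgmt-admin
  endpoint_section apiManagerEndpoint   "$1" "$3" mgmt-api-manager
  endpoint_section platformAPIEndpoint  "$1" "$4" mgmt-platform-api
  endpoint_section consumerAPIEndpoint  "$1" "$5" mgmt-consumer-api
}
```

Review the generated YAML before you add it to the CR with oc edit apiconnectcluster, and make the same change in both data centers.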

How to upgrade when one data center is down

If API Connect is still running on the failed data center, follow the steps described earlier in this topic to upgrade both data centers before you bring the failed data center back online.

If the failed data center is expected to be down for a long time, you can convert the active data center to a stand-alone data center by following these steps: Removing a two data center deployment, but note the following points:

Ensure that the network links to the failed data center are removed.
Ensure that the failed data center is set to warm-standby in the multiSiteHA section. Do not proceed to the next step until the data center completes the transition to warm-standby. View the status of the management and portal CRs to confirm that HA Mode reports passive.
Remove the multiSiteHA section from the failed data center, and verify that the failed data center resets itself to become an empty stand-alone API Connect deployment (all data is deleted).
Before you restore the network links between the data centers, do the following operations:
- Upgrade API Connect on the failed data center to the same version as the active data center.
- Add the multiSiteHA sections to both data centers, setting the failed data center to be warm-standby.
  Important: Do not set the failed data center to be active in the multiSiteHA section, because doing so overwrites the data on your working data center with the empty database of your failed data center.