Upgrading to 10.0.1.8-eus online

Perform an online upgrade to API Connect 10.0.1.8-eus on OpenShift.

Before you begin

  • If you are upgrading an air-gapped (disconnected from the internet) installation, see Air-gapped upgrade to 10.0.1.8-eus.
  • Review the supported upgrade paths and upgrade requirements in Upgrade considerations on OpenShift.

    If you plan to upgrade to the latest version of 10.0.1.x-eus, your API Connect deployment must be upgraded to 10.0.1.7-eus or 10.0.1.8-eus first. If your deployment is already at 10.0.1.7-eus, you can skip this task and proceed directly to Upgrading to the latest 10.0.1.x-eus online.

    Restriction: Cloud Pak for Integration 2020.4 is now End of Support and the API Management component cannot be upgraded to a version later than API Connect 10.0.1.7-eus.
  • The Gateway subsystem remains available during the upgrade of the Management, Portal, and Analytics subsystems.

About this task

When upgrading from a version prior to 10.0.1.7-eus, keep the following considerations in mind:
  • You must upgrade the API Connect deployment before upgrading OpenShift from 4.6 to 4.10.

    The step to upgrade OpenShift appears at the end of the upgrade procedure.

  • You must upgrade operators in the specified sequence to ensure that dependencies are satisfied. In addition, the Cloud Pak common services operator and the API Connect operator must be upgraded in tandem (as close in time as possible) to ensure success.
  • Upgrading the Cloud Pak common services operator can take as long as an hour, and the new certificate manager is not available until that upgrade is complete. After you upgrade operators, it's important to wait for the certificate manager update to complete before proceeding to the next step.

Procedure

  1. Ensure that you have completed all of the steps in Preparing to upgrade on OpenShift.

    Do not attempt an upgrade until you have reviewed the considerations and prepared your deployment.

  2. Use the OCP Operator Hub to update the operator channels, which upgrades the operators.

    When you update operators, the behavior depends on whether you enabled automatic or manual subscriptions for the operator channel:

    • If you enabled Automatic subscriptions, the operator is upgraded to the new version automatically if needed.
    • If you enabled Manual subscriptions, and the operator channel is already at the required version, the OpenShift UI (OLM) notifies you that an upgrade is available. You must manually approve the upgrade; this can also be done from the command line, as shown in the sketch that follows.
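
    With Manual subscriptions, the pending install plan can be reviewed and approved from the command line; a minimal sketch (the openshift-operators namespace is an assumption; adjust it to wherever your operators are installed):

      # List install plans and check for one awaiting approval:
      oc get installplan -n openshift-operators
      # Approve a specific plan so that OLM proceeds with the upgrade:
      oc patch installplan <install_plan_name> -n openshift-operators --type merge -p '{"spec":{"approved":true}}'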
    1. If your DataPower Operator channel is not at v1.2-eus, update it to v1.2-eus now, and then wait for the operator to update, for the pods to restart, and for a Ready status.
      Known issues:

      When upgrading the DataPower Operator, you might encounter the following known problems:

      • The DataPower Operator gets stuck after updating the channel from v1.1 to v1.2 (this can happen if OpenShift attaches an old install plan to the DataPower Operator). To work around this problem, delete the install plan that is attached to the subscription, delete the DataPower Operator, and then re-install the DataPower Operator. A command sketch follows this list.
      • Messages appear in the log on the datapower-operator pod indicating that the pod is waiting for the lock to be removed:
        {"level":"info","ts":"2021-03-08T19:29:53.432Z","logger":"leader","msg":"Not the leader. Waiting."}
        {"level":"info","ts":"2021-03-08T19:29:57.971Z","logger":"leader","msg":"Leader pod has been deleted, waiting for garbage collection to remove the lock."}
        If you see these messages, the DataPower Operator cannot be upgraded until you resolve the problem as explained in the DataPower Operator documentation.
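
      For the first known issue (the stuck install plan), the workaround can be driven from the CLI; a sketch, assuming the operator namespace is <namespace> and the subscription is named datapower-operator:

        # Find the install plan that is attached to the DataPower subscription:
        oc get subscription datapower-operator -n <namespace> -o jsonpath='{.status.installPlanRef.name}'
        # Delete that install plan, then remove the operator subscription:
        oc delete installplan <old_install_plan_name> -n <namespace>
        oc delete subscription datapower-operator -n <namespace>
        # Re-install the DataPower Operator from the Operator Hub on the v1.2-eus channel.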
    2. If needed, update the Cloud Pak common services operator channel to v3, and update the operator to 3.19.

      The certificate manager is included with common services; when you upgrade common services, the certificate manager is also upgraded.

    3. Update the API Connect operator channel to v2.1.8-eus.
      Known issues:
      • The certificate manager is upgraded in 10.0.1.8-eus, and you might encounter an upgrade error if the CRD for the new certificate manager is not found. For information on the error messages that indicate this problem, and steps to resolve it, see Upgrade error when the CRD for the new certificate manager is not found in the Troubleshooting installation and upgrade on OpenShift topic.
      • A null value for the backrestStorageType property in the pgcluster CR causes an error during the operator upgrade from 10.0.1.6-ifix1-eus or earlier. For information on the error messages that indicate this problem, and steps to resolve it, see Operator upgrade fails with error from API Connect operator and Postgres operator in the Troubleshooting installation and upgrade on OpenShift topic.
  3. Verify that the cert-manager was upgraded by running the following command:
    oc get csv -n ibm-common-services | grep ibm-cert-manager-operator

    The response shows that ibm-cert-manager-operator.v3.21.x is deployed, and the PHASE column shows "Succeeded".

    Attention: The Cloud Pak common services upgrade can take as long as an hour, and the new version of cert-manager will not be available until the upgrade is complete. Do not proceed until cert-manager upgrade is complete.
  4. Check for certificate errors, and then recreate issuers and certificates if needed.

    The upgrade from cert-manager 0.10.1 might cause some errors during the API Connect operator upgrade. Complete the following steps to check for certificate errors and correct them.

    1. Check the new API Connect operator's log for an error similar to the following example:
      {"level":"error","ts":1634966113.8442025,"logger":"controllers.AnalyticsCluster","msg":"Failed to set owner reference on certificate request","analyticscluster":"apic/<instance-name>-a7s","certificate":"<instance-name>-a7s-ca","error":"Object apic/<instance-name>-a7s-ca is already owned by another Certificate controller <instance-name>-a7s-ca",
      

      To correct this problem, delete all issuers and certificates generated with certmanager.k8s.io/v1alpha1. For certificates used by route objects, you must also delete the route and secret objects.

    2. Run the following commands to delete the issuers and certificates that were generated with certmanager.k8s.io/v1alpha1:
      oc delete issuers.certmanager.k8s.io <instance-name>-self-signed <instance-name>-ingress-issuer  <instance-name>-mgmt-ca <instance-name>-a7s-ca <instance-name>-ptl-ca
      oc delete certs.certmanager.k8s.io <instance-name>-ingress-ca <instance-name>-mgmt-ca <instance-name>-ptl-ca <instance-name>-a7s-ca

      In the examples, <instance-name> is the instance name of the top-level apiconnectcluster.

      When you delete the issuers and certificates, the new certificate manager generates replacements; this might take a few minutes.

    3. Verify that the new CA certs are refreshed and ready.

      Run the following command to verify the certificates:

      oc get certs <instance-name>-ingress-ca <instance-name>-mgmt-ca <instance-name>-ptl-ca <instance-name>-a7s-ca
      

      The CA certs are ready when the AGE column shows that they were newly created and the READY column shows True.

    4. Delete the remaining old certificates, routes, and secret objects corresponding to those routes.

      Run the following commands:

      oc get certs.certmanager.k8s.io | awk '/<instance-name>/{print $1}'  | xargs oc delete certs.certmanager.k8s.io
      oc delete certs.certmanager.k8s.io postgres-operator
      oc get routes --no-headers -o custom-columns=":metadata.name" | grep ^<instance-name>- | xargs oc delete routes
      Note: The following command deletes the secrets for the routes. Do not delete any other secrets.
      oc get routes --no-headers -o custom-columns=":metadata.name" | grep ^<instance-name>- | xargs oc delete secrets
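
      Because these pipelines delete whatever matches the grep pattern, you can preview the match list first (a precaution, not a required step):

      # Show the route names that the delete pipelines above will act on:
      oc get routes --no-headers -o custom-columns=":metadata.name" | grep ^<instance-name>-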
    5. Verify that no old issuers or certificates from your top-level instance remain.

      Run the following commands:

      oc get issuers.certmanager.k8s.io | grep <instance-name>
      oc get certs.certmanager.k8s.io | grep <instance-name>
      

      Both commands should report that no resources were found.

  5. Use the latest version of apicops to validate the certificates.
    1. Run the following command:
      apicops upgrade:stale-certs -n <APIC_namespace>
    2. Delete any stale certificates that are managed by cert-manager.
      If a certificate failed the validation and it is managed by cert-manager, you can delete the stale certificate secret, and let cert-manager regenerate it. Run the following command:
      kubectl delete secret <stale-secret> -n <APIC_namespace>
    3. Restart the corresponding pod so that it picks up the new secret.
      To determine which pod to restart, identify the pods that mount the regenerated secret; a sketch for locating them follows this step.

    For information on the apicops tool, see The API Connect operations tool: apicops.
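
    One way to locate the pods that mount a regenerated secret (an assumption, not from the source; requires the jq utility):

      # List the pods in the namespace that mount the given secret as a volume:
      oc get pods -n <APIC_namespace> -o json | \
        jq -r '.items[] | select(.spec.volumes[]?.secret?.secretName == "<stale-secret>") | .metadata.name'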

  6. Run the following command to delete the Postgres pods, which refreshes the new certificate:
    oc get pod -n <namespace> --no-headers=true | grep postgres | grep -v backup | awk '{print $1}' | xargs oc delete pod -n <namespace>
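
    After the deletion, you can confirm that the Postgres pods are re-created and return to Running (a simple check, not from the source):

    oc get pod -n <namespace> | grep postgres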
  7. Delete the portal-www, portal-db, and portal-nginx pods to ensure that they use the new secrets.

    If you have the Developer Portal deployed, the portal-www, portal-db, and portal-nginx pods might need to be deleted to ensure that they pick up the newly generated secrets when they restart. If the pods are not showing as "ready" in a timely manner, delete all the pods at the same time (this causes downtime).

    Run the following commands to get the name of the portal CR and delete the pods:

    oc project <APIC_namespace>
    oc get ptl
    oc delete po -l app.kubernetes.io/instance=<name_of_portal_CR>
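
    To watch the portal pods come back to ready, you can reuse the same label selector (a convenience, not a required step):

    oc get pods -l app.kubernetes.io/instance=<name_of_portal_CR> -w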
    
  8. Renew the internal certificates for the analytics subsystem.

    If you see analytics pods in the CrashLoopBackOff state, then renew the internal certificates for the analytics subsystem and force a restart of the pods.

    1. Switch to the project/namespace where analytics is deployed and run the following command to get the name of the analytics CR (AnalyticsCluster):
      oc project <APIC_namespace>
      oc get a7s

      You need the CR name for the remaining steps.

    2. Renew the internal certificates (CA, client, and server) by running the following commands:
      oc get certificate <name_of_analytics_CR>-ca -o=jsonpath='{.spec.secretName}' | xargs oc delete secret
      oc get certificate <name_of_analytics_CR>-client -o=jsonpath='{.spec.secretName}' | xargs oc delete secret
      oc get certificate <name_of_analytics_CR>-server -o=jsonpath='{.spec.secretName}' | xargs oc delete secret
    3. Force a restart of all analytics pods by running the following command:
      oc delete po -l app.kubernetes.io/instance=<name_of_analytics_CR>
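
      After the restart, you can confirm that the renewed certificates are ready again (a verification sketch that reuses the certificate names from the previous step):

      # All three should show READY as True once cert-manager has re-issued the secrets:
      oc get certificate <name_of_analytics_CR>-ca <name_of_analytics_CR>-client <name_of_analytics_CR>-server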
      
  9. Ensure that the operators and operands are healthy before proceeding.
    • Operators: The OpenShift web console indicates that all operators are in Succeeded state without any warnings.

    • Operands:
      • To verify whether operands are healthy, run the following command:

        oc get apic

        Check the status of the apiconnectcluster custom resource. The CR will not report as ready until you complete some additional steps in this procedure.

      • In Cloud Pak for Integration, wait until the API Connect capability shows READY (green check) in Platform Navigator.
        Known issue: Status toggles between Ready and Warning

        There is a known issue where the API Connect operator toggles the overall status of the API Connect deployment in Platform Navigator between Ready and Warning. Look at the full list of conditions: when Ready is True, you can proceed to the next step even if Warning is also True.
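
        To inspect the full list of conditions from the command line, a sketch (the jsonpath is an assumption about the CR status layout):

        oc get apiconnectcluster <instance-name> -n <APIC_namespace> -o jsonpath='{.status.conditions}'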

  10. Update the operand version:
    • OpenShift:
      1. Edit the top-level apiconnectcluster CR by running the following command:
        oc -n <APIC_namespace> edit apiconnectcluster
      2. Change the version setting to 10.0.1.8-eus.

      3. In the spec.gateway section, delete the template override section, if it exists. You cannot run an upgrade if the CR contains an override.

      4. Save and close the CR.

    • Cloud Pak for Integration:
      1. In Platform Navigator, click the Runtimes tab.

      2. Click the Menu icon at the end of the current row, and then click Change version.

      3. Click Select a new channel or version, and then select 10.0.1.8-eus in the Version field.

        Selecting the new channel ensures that both DataPower Gateway and API Connect are upgraded.

      4. Click Save to save your selections and start the upgrade.

        In the runtimes table, the Status column for the runtime displays the "Upgrading" message. The upgrade is complete when the Status is "Ready" and the Version displays the new version number.

  11. Verify that the upgraded subsystems report as Running.

    Run the following command:

    oc get apic --all-namespaces

    The Management, Analytics, and Portal subsystems should report as Running. The Gateway subsystem will not be running until you complete the next step to correct peering issues.

    Example response:

    NAME                                                READY   STATUS    VERSION              RECONCILED VERSION      AGE
    analyticscluster.analytics.apiconnect.ibm.com/analytics      8/8     Running   10.0.1.8-eus   10.0.1.8-eus-1074   121m
    
    NAME                                     PHASE     READY   SUMMARY                           VERSION    AGE
    datapowerservice.datapower.ibm.com/gw1   Running   True    StatefulSet replicas ready: 1/1   10.0.1.8-eus   100m
    
    NAME                                     PHASE     LAST EVENT   WORK PENDING   WORK IN-PROGRESS   AGE
    datapowermonitor.datapower.ibm.com/gw1   Running                false          false              100m
    
    NAME                                            READY   STATUS    VERSION              RECONCILED VERSION      AGE
    gatewaycluster.gateway.apiconnect.ibm.com/gw1   2/2     Running   10.0.1.8-eus   10.0.1.8-eus-1074  100m
    
    NAME                                                 READY   STATUS    VERSION              RECONCILED VERSION      AGE
    managementcluster.management.apiconnect.ibm.com/m1   16/16   Running   10.0.1.8-eus   10.0.1.8-eus-1074   162m
    
    
    NAME                                             READY   STATUS    VERSION              RECONCILED VERSION      AGE
    portalcluster.portal.apiconnect.ibm.com/portal   3/3     Running   10.0.1.8-eus   10.0.1.8-eus-1074   139m
  12. After the operand upgrade, scale the Gateway pods down, and back up, to correct peering issues caused by the ingress issuer change.
    1. Scale down the Gateway firmware containers by editing the top-level API Connect CR and setting the replicaCount to 0.
      • OpenShift:
        1. Run the following command to edit the CR:
          oc -n <APIC_namespace> edit apiconnectcluster
        2. Set the replicaCount to 0 (you might have to add the setting):
          ...
          spec:
            gateway:
              replicaCount: 0
          ...
      • Cloud Pak for Integration:
        1. In Platform Navigator, edit the instance and enable the Advanced settings.
        2. In the Gateway subsystem section, set the Advance Replica count field to 0.
    2. Wait for Gateway firmware pods to scale down and terminate.

      Do not proceed until the pods have terminated.
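
      One way to watch the gateway pods terminate (this assumes the pod names begin with your gateway CR name, for example gw1):

        # Repeat until no gateway pods remain:
        oc get pods -n <namespace> | grep <gateway_CR_name>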

    3. Scale up the Gateway firmware containers back to the original value.
      • OpenShift:
        1. Run the following command to edit the apiconnectcluster CR:
          oc -n <APIC_namespace> edit apiconnectcluster
        2. Set the replicaCount to its original value, or remove the setting:
          ...
          spec:
            gateway:
          ...
      • Cloud Pak for Integration:
        1. In Platform Navigator, edit the instance and enable the Advanced settings.
        2. In the Gateway subsystem section, set the Advance Replica count field to its original value, or clear the field.
    4. Run the following command and verify that all subsystems (including Gateway) now report the STATUS as Running and the RECONCILED VERSION as 10.0.1.8-eus:
      oc get apic --all-namespaces

      For example:

      NAME                                                      READY   STATUS    VERSION              RECONCILED VERSION      AGE
      analyticscluster.analytics.apiconnect.ibm.com/analytics   8/8     Running   10.0.1.8-eus   10.0.1.8-eus-5352   121m
      
      NAME                                     PHASE     READY   SUMMARY                           VERSION    AGE
      datapowerservice.datapower.ibm.com/gw1   Running   True    StatefulSet replicas ready: 1/1   10.0.1.8-eus   100m
      
      NAME                                     PHASE     LAST EVENT   WORK PENDING   WORK IN-PROGRESS   AGE
      datapowermonitor.datapower.ibm.com/gw1   Running                false          false              100m
      
      NAME                                            READY   STATUS    VERSION              RECONCILED VERSION      AGE
      gatewaycluster.gateway.apiconnect.ibm.com/gw1   2/2     Running   10.0.1.8-eus   10.0.1.8-eus-5352  100m
      
      NAME                                                 READY   STATUS    VERSION              RECONCILED VERSION      AGE
      managementcluster.management.apiconnect.ibm.com/m1   16/16   Running   10.0.1.8-eus   10.0.1.8-eus-5352   162m
      
      
      NAME                                             READY   STATUS    VERSION              RECONCILED VERSION      AGE
      portalcluster.portal.apiconnect.ibm.com/portal   3/3     Running   10.0.1.8-eus   10.0.1.8-eus-5352   139m

      If the Gateway pods appear to be out-of-sync with the Management subsystem after upgrading, see Gateway pods not in sync with Management after upgrade.

  13. Upgrade the OpenShift cluster to OpenShift 4.10.

    Upgrading OpenShift requires that you proceed through each minor release instead of upgrading directly to 4.10. For more information, see the Red Hat OpenShift documentation. In the "Documentation" banner, select the version of OpenShift that you want to upgrade to, and then expand the "Updating clusters" section in the navigation list.
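
    As an illustration only (confirm each step against the Red Hat documentation for your exact versions), one minor-release hop can be driven from the CLI like this:

      # Switch the cluster to the next minor channel (4.6 to 4.7 shown as an example):
      oc patch clusterversion version --type merge -p '{"spec":{"channel":"stable-4.7"}}'
      # Start the update to the latest release in that channel:
      oc adm upgrade --to-latest=true
      # Watch progress; wait for the update to complete before moving to the next channel:
      oc get clusterversion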

What to do next

After the upgrade to 10.0.1.8-eus is complete, upgrade to the latest version of 10.0.1.x-eus as explained in Upgrading to the latest 10.0.1.x-eus online.