Upgrading to 10.0.1.8-eus online
Perform an online upgrade to API Connect 10.0.1.8-eus on OpenShift or IBM Cloud Pak for Integration.
Before you begin
- If you are upgrading an air-gapped (disconnected from the internet) installation, see Air-gapped upgrade to 10.0.1.8-eus.
- Review the supported upgrade paths and upgrade requirements in Upgrade considerations on OpenShift and Cloud Pak for Integration.
If you plan to upgrade to the latest version of 10.0.1.x-eus, your API Connect deployment must be upgraded to 10.0.1.7-eus or 10.0.1.8-eus first. If your deployment is already at 10.0.1.7-eus, you can skip this task and proceed directly to Upgrading to the latest 10.0.1.x-eus online.
Restriction: Cloud Pak for Integration 2020.4 is now End of Support, and the API Management component cannot be upgraded to a version later than API Connect 10.0.1.7-eus.
- The Gateway subsystem remains available during the upgrade of the Management, Portal, and Analytics subsystems.
About this task
- You must upgrade the API Connect deployment before upgrading OpenShift from 4.6 to 4.10.
The step to upgrade OpenShift appears at the end of the upgrade procedure.
- You must upgrade operators in the specified sequence to ensure that dependencies are satisfied. In addition, the Cloud Pak common services operator and the API Connect operator must be upgraded in tandem (as close in time as possible) to ensure success.
- Upgrading the Cloud Pak common services operator can take as long as an hour, and the new certificate manager is not available until that upgrade is complete. After you upgrade operators, it's important to wait for the certificate manager update to complete before proceeding to the next step.
Procedure
-
Ensure that you have completed all of the steps in Preparing to upgrade on OpenShift and Cloud Pak for Integration.
Do not attempt an upgrade until you have reviewed the considerations and prepared your deployment.
-
Use the OCP Operator Hub to update the operator channels, which upgrades the operators.
When you update operators, the behavior depends on whether you enabled automatic or manual subscriptions for the operator channel:
- If you enabled Automatic subscriptions, the operator upgrades automatically if needed.
- If you enabled Manual subscriptions and the operator channel is already at the required version, the OpenShift UI (OLM) notifies you that an upgrade is available. You must manually approve the upgrade; see the sketch after this list.
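If you prefer to approve a pending upgrade from the command line instead of the OpenShift UI, the following is a minimal sketch; the namespace and install plan name are placeholders that you must replace with values from your cluster:
oc get installplan -n <operator_namespace>    # look for an install plan with APPROVED = false
oc patch installplan <install_plan_name> -n <operator_namespace> --type merge -p '{"spec":{"approved":true}}'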
- If your DataPower Operator channel is not at v1.2-eus, update it to v1.2-eus now, and then wait for the operator to update, for the pods to restart, and for a Ready status.
Known issues: When upgrading the DataPower Operator, you might encounter the following known problems:
- The DataPower Operator gets stuck after updating the channel from v1.1 to v1.2 (this can happen if OpenShift attaches an old install plan to the DataPower Operator). To work around this problem, delete the install plan that is attached to the subscription, delete the DataPower Operator, and then re-install the DataPower Operator; see the sketch after these known issues.
- Messages appear in the log on the datapower-operator pod, indicating that the pod is waiting for the lock to be removed:
{"level":"info","ts":"2021-03-08T19:29:53.432Z","logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":"2021-03-08T19:29:57.971Z","logger":"leader","msg":"Leader pod has been deleted, waiting for garbage collection to remove the lock."}
If you see these messages, the DataPower Operator cannot be upgraded until you resolve the problem as explained in the DataPower Operator documentation.
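The following commands are a minimal sketch of the install plan workaround, assuming the DataPower Operator is installed in the <operator_namespace> namespace; the resource names are placeholders that you must look up in your cluster:
oc get subscription -n <operator_namespace>                 # find the subscription and its attached install plan
oc get installplan -n <operator_namespace>
oc delete installplan <stale_install_plan> -n <operator_namespace>
oc get csv -n <operator_namespace>                          # find the DataPower Operator CSV
oc delete csv <datapower_operator_csv> -n <operator_namespace>
After the deletions complete, re-install the DataPower Operator from the Operator Hub.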
- If needed, update the Cloud Pak common services operator channel to v3, and update the operator to 3.19.
-
Update the API Connect operator channel to v3.1.
Known issues:
- The certificate manager was upgraded in 10.0.1.8-eus, and you might encounter an upgrade error if the CRD for the new certificate manager is not found. For information on the error messages that indicate this problem, and steps to resolve it, see Upgrade error when the CRD for the new certificate manager is not found in the Troubleshooting installation and upgrade on OpenShift topic.
- A null value for the backrestStorageType property in the pgcluster CR causes an error during the operator upgrade from 10.0.1.6-ifix1-eus or earlier. For information on the error messages that indicate this problem, and steps to resolve it, see Operator upgrade fails with error from API Connect operator and Postgres operator in the Troubleshooting installation and upgrade on OpenShift topic.
- Verify that the cert-manager was upgraded.
Attention: The Cloud Pak common services upgrade can take as long as an hour, and the new version of cert-manager will not be available until the upgrade is complete. Do not proceed until the cert-manager upgrade is complete.
- Check for certificate errors, and then recreate issuers and certificates if needed.
The upgrade from cert-manager 0.10.1 might cause some errors during the API Connect operator upgrade. Complete the following steps to check for certificate errors and correct them.
- Check the new API Connect operator's log for an error similar to the following example:
{"level":"error","ts":1634966113.8442025,"logger":"controllers.AnalyticsCluster","msg":"Failed to set owner reference on certificate request","analyticscluster":"apic/<instance-name>-a7s","certificate":"<instance-name>-a7s-ca","error":"Object apic/<instance-name>-a7s-ca is already owned by another Certificate controller <instance-name>-a7s-ca",
To correct this problem, delete all issuers and certificates generated with certmanager.k8s.io/v1alpha1. For certificates used by route objects, you must also delete the route and secret objects.
- Run the following commands to delete the issuers and certificates that were generated with certmanager.k8s.io/v1alpha1:
oc delete issuers.certmanager.k8s.io <instance-name>-self-signed <instance-name>-ingress-issuer <instance-name>-mgmt-ca <instance-name>-a7s-ca <instance-name>-ptl-ca
oc delete certs.certmanager.k8s.io <instance-name>-ingress-ca <instance-name>-mgmt-ca <instance-name>-ptl-ca <instance-name>-a7s-ca
In the examples, <instance-name> is the instance name of the top-level apiconnectcluster.
When you delete the issuers and certificates, the new certificate manager generates replacements; this might take a few minutes.
- Verify that the new CA certs are refreshed and ready.
Run the following command to verify the certificates:
oc get certs <instance-name>-ingress-ca <instance-name>-mgmt-ca <instance-name>-ptl-ca <instance-name>-a7s-ca
The CA certs are ready when the AGE is "new" and the READY column shows True.
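If you prefer to block until the certificates report ready, the following is a minimal sketch using oc wait, assuming the certificates expose the standard cert-manager Ready condition:
oc wait --for=condition=Ready certificate/<instance-name>-ingress-ca certificate/<instance-name>-mgmt-ca certificate/<instance-name>-ptl-ca certificate/<instance-name>-a7s-ca --timeout=300s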
- Delete the remaining old certificates, routes, and secret objects corresponding to those routes.
Run the following commands:
oc get certs.certmanager.k8s.io | awk '/<instance-name>/{print $1}' | xargs oc delete certs.certmanager.k8s.io
oc delete certs.certmanager.k8s.io postgres-operator
oc get routes --no-headers -o custom-columns=":metadata.name" | grep ^<instance-name>- | xargs oc delete routes
Note: The following command deletes the secrets for the routes. Do not delete any other secrets.
oc get routes --no-headers -o custom-columns=":metadata.name" | grep ^<instance-name>- | xargs oc delete secrets
- Verify that no old issuers or certificates from your top-level instance remain.
Run the following commands:
oc get issuers.certmanager.k8s.io | grep <instance-name>
oc get certs.certmanager.k8s.io | grep <instance-name>
Both commands should report that no resources were found.
- Use the latest version of apicops to validate the certificates.
- Run the following command:
apicops upgrade:stale-certs -n <APIC_namespace>
- Delete any stale certificates that are managed by cert-manager. If a certificate failed the validation and it is managed by cert-manager, you can delete the stale certificate secret and let cert-manager regenerate it. Run the following command:
kubectl delete secret <stale-secret> -n <APIC_namespace>
- Restart the corresponding pod so that it can pick up the new secret. To determine which pod to restart, see the related topics for each subsystem.
For information on the apicops tool, see The API Connect operations tool: apicops.
- Run the following command to delete the Postgres pods, which refreshes the new certificate:
oc get pod -n <namespace> --no-headers=true | grep postgres | grep -v backup | awk '{print $1}' | xargs oc delete pod -n <namespace>
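After the delete, you can confirm that replacement Postgres pods come up; a minimal sketch:
oc get pod -n <namespace> | grep postgres   # re-run until all Postgres pods report Running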
- Delete the portal-www, portal-db, and portal-nginx pods to ensure that they use the new secrets.
If you have the Developer Portal deployed, the portal-www, portal-db, and portal-nginx pods might require deleting to ensure that they pick up the newly generated secrets when restarted. If the pods are not showing as "ready" in a timely manner, delete all the pods at the same time (this will cause down time).
Run the following commands to get the name of the portal CR and delete the pods:
oc project <APIC_namespace>
oc get ptl
oc delete po -l app.kubernetes.io/instance=<name_of_portal_CR>
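To wait for the portal pods to come back up with the new secrets, the following is a minimal sketch; adjust the timeout to suit your environment:
oc wait --for=condition=Ready po -l app.kubernetes.io/instance=<name_of_portal_CR> --timeout=10m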
- Renew the internal certificates for the analytics subsystem.
If you see analytics pods in the CrashLoopBackOff state, renew the internal certificates for the analytics subsystem and force a restart of the pods.
- Switch to the project/namespace where analytics is deployed, and run the following command to get the name of the analytics CR (AnalyticsCluster):
oc project <APIC_namespace>
oc get a7s
You need the CR name for the remaining steps.
- Renew the internal certificates (CA, client, and server) by running the following commands:
oc get certificate <name_of_analytics_CR>-ca -o=jsonpath='{.spec.secretName}' | xargs oc delete secret
oc get certificate <name_of_analytics_CR>-client -o=jsonpath='{.spec.secretName}' | xargs oc delete secret
oc get certificate <name_of_analytics_CR>-server -o=jsonpath='{.spec.secretName}' | xargs oc delete secret
- Force a restart of all analytics pods by running the following command:
oc delete po -l app.kubernetes.io/instance=<name_of_analytics_CR>
- Ensure that the operators and operands are healthy before proceeding.
- Operators: The OpenShift web console indicates that all operators are in the Succeeded state without any warnings.
- Operands:
- To verify whether operands are healthy, run the following command:
oc get apic
Check the status of the apiconnectcluster custom resource. The CR will not report as ready until you complete some additional steps in this procedure.
- In Cloud Pak for Integration, wait until the API Connect capability shows READY (green check) in Platform Navigator.
Known issue: Status toggles between Ready and Warning
There is a known issue where the API Connect operator toggles the overall status of the API Connect deployment in Platform Navigator between Ready and Warning. Look at the full list of conditions; when Ready is True, you can proceed to the next step even if Warning is also true.
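To inspect the full list of conditions from the command line, the following is a minimal sketch using a JSONPath query; the instance name and namespace are placeholders:
oc get apiconnectcluster <instance-name> -n <APIC_namespace> -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'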
- Update the operand version:
- OpenShift:
- Edit the top-level apiconnectcluster CR by running the following command:
oc -n <APIC_namespace> edit apiconnectcluster
- Change the version setting to 10.0.1.8-eus.
- In the spec.gateway section, delete the template override section, if it exists. You cannot run an upgrade if the CR contains an override.
- Save and close the CR.
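If you prefer a non-interactive edit, the following is a minimal sketch that sets the version with a merge patch; note that it does not remove a template override, so check for one separately:
oc -n <APIC_namespace> patch apiconnectcluster <instance-name> --type merge -p '{"spec":{"version":"10.0.1.8-eus"}}'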
- Cloud Pak for Integration:
- In Platform Navigator, click the Runtimes tab.
- Click the menu icon at the end of the current row, and then click Change version.
- Click Select a new channel or version, and then select 10.0.1.8-eus in the Version field.
Selecting the new channel ensures that both DataPower Gateway and API Connect are upgraded.
- Click Save to save your selections and start the upgrade.
In the runtimes table, the Status column for the runtime displays the "Upgrading" message. The upgrade is complete when the Status is "Ready" and the Version displays the new version number.
- Verify that the upgraded subsystems report as Running.
Run the following command:
oc get apic --all-namespaces
The Management, Analytics, and Portal subsystems should report as Running. The Gateway subsystem will not be running until you complete the next step to correct peering issues.
Example response:
NAME                                                      READY   STATUS    VERSION        RECONCILED VERSION   AGE
analyticscluster.analytics.apiconnect.ibm.com/analytics   8/8     Running   10.0.1.8-eus   10.0.1.8-eus-1074    121m

NAME                                     PHASE     READY   SUMMARY                           VERSION        AGE
datapowerservice.datapower.ibm.com/gw1   Running   True    StatefulSet replicas ready: 1/1   10.0.1.8-eus   100m

NAME                                     PHASE     LAST EVENT   WORK PENDING   WORK IN-PROGRESS   AGE
datapowermonitor.datapower.ibm.com/gw1   Running                false          false              100m

NAME                                            READY   STATUS    VERSION        RECONCILED VERSION   AGE
gatewaycluster.gateway.apiconnect.ibm.com/gw1   2/2     Running   10.0.1.8-eus   10.0.1.8-eus-1074    100m

NAME                                                 READY   STATUS    VERSION        RECONCILED VERSION   AGE
managementcluster.management.apiconnect.ibm.com/m1   16/16   Running   10.0.1.8-eus   10.0.1.8-eus-1074    162m

NAME                                             READY   STATUS    VERSION        RECONCILED VERSION   AGE
portalcluster.portal.apiconnect.ibm.com/portal   3/3     Running   10.0.1.8-eus   10.0.1.8-eus-1074    139m
- After the operand upgrade, scale the Gateway pods down, and back up, to correct peering issues caused by the ingress issuer change.
- Scale down the Gateway firmware containers by editing the top-level API Connect CR and setting the replicaCount to 0.
.- OpenShift:
- Run the following command to edit the CR:
oc -n <APIC_namespace> edit apiconnectcluster
- Set the replicaCount to 0 (you might have to add the setting):
...
spec:
  gateway:
    replicaCount: 0
...
- Cloud Pak for Integration:
- In Platform Navigator, edit the instance and enable the Advanced settings.
- In the Gateway subsystem section, set the Advance Replica count field to 0.
- Wait for Gateway firmware pods to scale down and terminate.
Do not proceed until the pods have terminated.
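A minimal sketch for watching the scale-down, assuming that the Gateway firmware pod names include the name of your gateway CR:
oc get po -n <APIC_namespace> -w | grep <gateway-cr-name>   # wait until no Gateway firmware pods remain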
- Scale up the Gateway firmware containers back to the original value.
- OpenShift:
- Run the following command to edit the apiconnectcluster CR:
oc -n <APIC_namespace> edit apiconnectcluster
- Set the replicaCount to its original value, or remove the setting:
...
spec:
  gateway:
...
- Cloud Pak for Integration:
- In the Platform UI, edit the instance and enable the Advanced settings.
- In the Gateway subsystem section, set the Advance Replica count field to its original value, or clear the field.
- Run the following command and verify that all subsystems (including Gateway) now report the STATUS as Running and the RECONCILED VERSION as 10.0.1.8-eus:
oc get apic --all-namespaces
For example:
NAME                                                      READY   STATUS    VERSION        RECONCILED VERSION   AGE
analyticscluster.analytics.apiconnect.ibm.com/analytics   8/8     Running   10.0.1.8-eus   10.0.1.8-eus-5352    121m

NAME                                     PHASE     READY   SUMMARY                           VERSION        AGE
datapowerservice.datapower.ibm.com/gw1   Running   True    StatefulSet replicas ready: 1/1   10.0.1.8-eus   100m

NAME                                     PHASE     LAST EVENT   WORK PENDING   WORK IN-PROGRESS   AGE
datapowermonitor.datapower.ibm.com/gw1   Running                false          false              100m

NAME                                            READY   STATUS    VERSION        RECONCILED VERSION   AGE
gatewaycluster.gateway.apiconnect.ibm.com/gw1   2/2     Running   10.0.1.8-eus   10.0.1.8-eus-5352    100m

NAME                                                 READY   STATUS    VERSION        RECONCILED VERSION   AGE
managementcluster.management.apiconnect.ibm.com/m1   16/16   Running   10.0.1.8-eus   10.0.1.8-eus-5352    162m

NAME                                             READY   STATUS    VERSION        RECONCILED VERSION   AGE
portalcluster.portal.apiconnect.ibm.com/portal   3/3     Running   10.0.1.8-eus   10.0.1.8-eus-5352    139m
If the Gateway pods appear to be out-of-sync with the Management subsystem after upgrading, see Gateway pods not in sync with Management after upgrade.
- Upgrade the OpenShift cluster to OpenShift 4.10.
Upgrading OpenShift requires that you proceed through each minor release instead of upgrading directly to 4.10. For more information, see the Red Hat OpenShift documentation. In the "Documentation" banner, select the version of OpenShift that you want to upgrade to, and then expand the "Updating clusters" section in the navigation list.
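The following is a minimal sketch of one minor-version hop from the command line, assuming the stable channels; repeat the channel change and update for each minor release (4.7, 4.8, 4.9, and 4.10):
oc get clusterversion                     # check the current cluster version
oc patch clusterversion version --type merge -p '{"spec":{"channel":"stable-4.7"}}'
oc adm upgrade                            # list the updates available in the new channel
oc adm upgrade --to-latest=true           # update to the latest version in the channel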
What to do next
After the upgrade to 10.0.1.8-eus is complete, upgrade to the latest version.