Troubleshooting upgrades on Kubernetes
Review the following troubleshooting guidance if you encounter a problem during an API Connect upgrade on Kubernetes.
Subsystem is stuck in Pending state with a reason of PreUpgradeCheckInProgress
Before the subsystem microservices are upgraded, the operator triggers a set of pre-upgrade checks that must pass for the upgrade to proceed. If one or more of the checks fail, the subsystem status remains in the Pending state with a reason of PreUpgradeCheckInProgress. Check the status condition of the subsystem CR to confirm that the pre-upgrade check failed. The status.PreUpgradeCheck property contains a summary of the failed checks. Full logs for the checks can be viewed in the ConfigMap referenced in the status.PreUpgradeCheck property. The pre-upgrade checks retry automatically until they pass. If you are unable to rectify the problem that causes a check to fail, open an IBM support case.
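The inspection steps described above can be sketched as two small shell helpers. This is a minimal sketch, not an official procedure: the function names are hypothetical, and the kind, CR name, namespace, and ConfigMap name are all placeholders that you supply. The ConfigMap name must be copied from the status.PreUpgradeCheck property of your own CR.

```shell
# Sketch only: function names are hypothetical; all arguments are placeholders.

# Print the full subsystem CR so that the status conditions and the
# status.PreUpgradeCheck property can be read from the output.
show_subsystem_status() {
  kind="$1"; name="$2"; namespace="$3"
  kubectl get "$kind" "$name" -n "$namespace" -o yaml
}

# Print the ConfigMap that holds the full pre-upgrade check logs.
# Copy the ConfigMap name from status.PreUpgradeCheck in the CR.
show_preupgrade_logs() {
  configmap="$1"; namespace="$2"
  kubectl get configmap "$configmap" -n "$namespace" -o yaml
}
```

For example, for a management subsystem CR named management, you might run show_subsystem_status managementcluster management <namespace>, and then pass the ConfigMap name that you find there to show_preupgrade_logs.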
License webhook error
admission webhook "vmanagementcluster.kb.io" denied the request:
ManagementCluster.management.apiconnect.ibm.com "management" is invalid:
spec.license.license: Invalid value: "L-RJON-BYGHM4":
License L-RJON-BYGHM4 is invalid for the chosen version 10.0.8.1.
Please refer license document https://ibm.biz/apiclicenses
To resolve the error, see API Connect licenses for the list of available license IDs and select the appropriate license ID for your deployment. Update the spec.license.license value in the CR with the new license ID, and then save and apply your changes again.
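As an alternative to editing the CR file by hand, the license value can be changed with a merge patch. The following is a minimal sketch that assumes the CR is a ManagementCluster named management, as in the error message above. The function name is hypothetical, and the license ID and namespace are placeholders that you supply.

```shell
# Sketch only: assumes a ManagementCluster CR named "management", as in the
# error message. The license ID and namespace are placeholders.
set_management_license() {
  license_id="$1"; namespace="$2"
  # A JSON merge patch that changes only spec.license.license.
  kubectl patch managementcluster management -n "$namespace" --type merge \
    -p "{\"spec\":{\"license\":{\"license\":\"$license_id\"}}}"
}
```

Call it as set_management_license <license_id> <namespace>, using a license ID that is valid for the version you are upgrading to.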
Taskmanager error syncing management and gateway
The taskmanager pods log the following error message. It starts 15 minutes after the upgrade and repeats every 15 minutes for any stuck task.
TASK: Stale claimed task set to errored state:
To resolve the problem, restart the management-natscluster pods (for example, management-natscluster-1) by deleting them:
kubectl -n <namespace> delete pod management-natscluster-1 management-natscluster-2 management-natscluster-3
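To confirm whether the error is still being logged, you can count its occurrences in recent taskmanager output. The helper below is a minimal sketch: the function name is hypothetical, and the taskmanager pod name in the usage comment is a placeholder for a pod name from your own deployment.

```shell
# Sketch only: counts "stale claimed task" errors in log text read from
# standard input, and prints 0 when there are none.
count_stale_task_errors() {
  grep -c 'TASK: Stale claimed task set to errored state' || true
}

# Hypothetical usage (replace the placeholders with your own values):
#   kubectl -n <namespace> logs --since=15m <taskmanager_pod> | count_stale_task_errors
```

Because the message repeats every 15 minutes for any stuck task, a count of 0 over a window longer than 15 minutes suggests that no tasks are stuck.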
DataPower operator fails to start
The DataPower operator pod can fail to be scheduled, with an error reporting that no nodes match pod topology spread constraints (missing required label). For example:
0/15 nodes are available: 12 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
You can work around the issue by editing the DataPower operator deployment and reapplying it, as follows:
- Delete the DataPower operator deployment, if it is already deployed:
  kubectl delete -f ibm-datapower.yaml -n <namespace>
- Open ibm-datapower.yaml, and locate the topologySpreadConstraints: section. For example:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: zone
    whenUnsatisfiable: DoNotSchedule
- Replace the values for topologyKey: and whenUnsatisfiable: with the corrected values that are shown in the following example:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
- Save ibm-datapower.yaml and deploy the file to the cluster:
  kubectl apply -f ibm-datapower.yaml -n <namespace>
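The edit to ibm-datapower.yaml can also be scripted. The following is a minimal sketch that assumes the file contains the exact topologyKey: zone and whenUnsatisfiable: DoNotSchedule lines shown above, with no trailing whitespace. The function name and the output file name are hypothetical. Review the result before you apply it.

```shell
# Sketch only: rewrites the two topology spread constraint values on
# standard input and writes the result to standard output.
fix_topology_constraints() {
  sed -e 's|topologyKey: zone$|topologyKey: topology.kubernetes.io/zone|' \
      -e 's|whenUnsatisfiable: DoNotSchedule$|whenUnsatisfiable: ScheduleAnyway|'
}

# Hypothetical usage:
#   fix_topology_constraints < ibm-datapower.yaml > ibm-datapower.fixed.yaml
# Optionally check which nodes carry the zone label:
#   kubectl get nodes -L topology.kubernetes.io/zone
```

Writing to a separate file keeps the original available for comparison. With ScheduleAnyway, the scheduler treats the constraint as a preference rather than a hard requirement.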
Unexpected behavior in Cloud Manager and API Manager UIs after upgrade
Stale browser cache issues can cause problems after an upgrade. To remedy this problem, clear your browser cache, and open a new browser window.
Portal sites failed to be upgraded successfully
If one or more Portal sites fail to upgrade, check the portal-www pod admin container logs to see what prevented the site upgrade from completing.
To trigger the site upgrade again, exec into the portal-www pod admin container and run the following command:
upgrade_devportal -s <site_uuid> -p <platform>
- <site_uuid> can be obtained by running the command:
list_sites
- <platform> can be obtained by running the command:
list_platforms
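The retry can also be run from outside the pod in a single step. The following is a minimal sketch. It assumes that the admin container is named admin and that upgrade_devportal is on that container's default PATH; neither is confirmed here. The function name is hypothetical, and the pod name, namespace, site UUID, and platform are placeholders.

```shell
# Sketch only: assumes the container is named "admin" and that
# upgrade_devportal is on its PATH. All arguments are placeholders.
retry_site_upgrade() {
  pod="$1"; namespace="$2"; site_uuid="$3"; platform="$4"
  kubectl exec "$pod" -n "$namespace" -c admin -- \
    upgrade_devportal -s "$site_uuid" -p "$platform"
}
```

Call it as retry_site_upgrade <portal_www_pod> <namespace> <site_uuid> <platform>, with the last two values taken from list_sites and list_platforms.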