IBM Support

After an upgrade of the Db2U operator, the Db2 service may fail to start, causing the db2u pod to restart in a loop

Troubleshooting


Problem

This issue is seen only in MPP Db2 deployments in a Cloud Pak for Data (CP4D) environment after the Db2U operator is upgraded to the s1158-cn level.
The readiness and liveness probes of the db2u container never return a healthy status, so Red Hat OpenShift keeps restarting the db2u pods.
The db2cluster YAML shows that the upgrade completed successfully. However, the issue starts once the upgrade changes take effect, that is, when the Db2U operator or the worker nodes are restarted.
Snippet from the db2cluster YAML:
status:
  conditions:
  - lastTransitionTime: "2023-06-02T21:40:28Z"
    status: OK
    type: FormationStatus
  maintenanceState: None
  state: Ready
  version: s11.5.8.0-cn2
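The check below is a minimal sketch of how the status fields in the snippet above can be read to confirm that the upgrade was recorded as complete. It assumes the status block has already been exported as JSON (for example with `oc get <kind> <name> -o json`, where `<kind>` and `<name>` are placeholders) and loaded into a dict; the function name `upgrade_reported_ok` is hypothetical.

```python
"""Sketch: check whether a db2cluster status block reports a completed upgrade.

Assumption: the status has been exported as JSON and loaded into a dict.
Field names are taken from the status snippet in this document.
"""
import json


def upgrade_reported_ok(status: dict, expected_version: str) -> bool:
    # The FormationStatus condition must report OK.
    formation_ok = any(
        c.get("type") == "FormationStatus" and c.get("status") == "OK"
        for c in status.get("conditions", [])
    )
    # The cluster must be Ready, out of maintenance, and at the target version.
    return (
        formation_ok
        and status.get("state") == "Ready"
        and status.get("maintenanceState") == "None"
        and status.get("version") == expected_version
    )


# The status block from the snippet, rewritten as JSON.
snippet = json.loads("""
{
  "conditions": [
    {"lastTransitionTime": "2023-06-02T21:40:28Z",
     "status": "OK", "type": "FormationStatus"}
  ],
  "maintenanceState": "None",
  "state": "Ready",
  "version": "s11.5.8.0-cn2"
}
""")

print(upgrade_reported_ok(snippet, "s11.5.8.0-cn2"))  # True
```

A passing check only confirms that the upgrade was recorded. As described above, the problem appears later, once the Db2U operator or worker nodes restart, so this status alone does not rule the issue out.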

Symptom

The db2u pods are killed and redeployed every 15 minutes. The pod logs show the following errors:
{"level":"error","component":"server","subComponent":"GetDb2Health","caller":"[13]:db2.go:258:GetDb2Health()","timestamp":"2023-06-02T00:25:36Z","message":"1 MLNs found in MLN dist file, but only 0 db2sysc processes match"}
{"level":"error","component":"server","subComponent":"GetDb2Health","caller":"[13]:db2.go:258:GetDb2Health()","timestamp":"2023-06-02T00:26:06Z","message":"1 MLNs found in MLN dist file, but only 0 db2sysc processes match"}
{"level":"error","component":"server","subComponent":"GetDb2Health","caller":"[13]:db2.go:258:GetDb2Health()","timestamp":"2023-06-02T00:26:36Z","message":"1 MLNs found in MLN dist file, but only 0 db2sysc processes match"}
{"level":"error","component":"server","subComponent":"GetDb2Health","caller":"[13]:db2.go:258:GetDb2Health()","timestamp":"2023-06-02T00:26:36Z","message":"1 MLNs found in MLN dist file, but only 0 db2sysc processes match"}
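To scan a longer pod log for these health-check errors, the lines can be filtered and the two counts in the message extracted. This is a minimal sketch, assuming each log line is one JSON object shaped like the errors above and that the message keeps the exact wording shown; the function `health_mismatches` is hypothetical.

```python
"""Sketch: pull GetDb2Health mismatches out of db2u pod log lines.

Assumption: each log line is one JSON object like the errors in this
document, with a message of the form
"<N> MLNs found in MLN dist file, but only <M> db2sysc processes match".
"""
import json
import re
from typing import Iterable, List, Tuple

MISMATCH = re.compile(
    r"(\d+) MLNs found in MLN dist file, but only (\d+) db2sysc processes match"
)


def health_mismatches(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (timestamp, MLNs listed, matching db2sysc processes) per error."""
    found = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON output
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed JSON
        if record.get("level") != "error" or record.get("subComponent") != "GetDb2Health":
            continue
        match = MISMATCH.search(record.get("message", ""))
        if match:
            found.append(
                (record.get("timestamp", ""), int(match.group(1)), int(match.group(2)))
            )
    return found


sample = [
    '{"level":"error","component":"server","subComponent":"GetDb2Health",'
    '"caller":"[13]:db2.go:258:GetDb2Health()","timestamp":"2023-06-02T00:25:36Z",'
    '"message":"1 MLNs found in MLN dist file, but only 0 db2sysc processes match"}',
    "plain text line that is not JSON",
]

for ts, mlns, procs in health_mismatches(sample):
    print(f"{ts}: {mlns} MLN(s) listed, {procs} matching db2sysc process(es)")
```

The same function can be applied to a saved copy of the pod log (for example, the output of `oc logs` redirected to a file) to see how long the mismatch has persisted.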


The events of the db2u pod report that the readiness and liveness probes failed:
Events:
  Type     Reason          Age               From               Message
  ----     ------          ----              ----               -------
  Normal   Scheduled       70s               default-scheduler  Successfully assigned db2namespace/c-db2namespace-db2-db2u-1 to ip-10-0-236-93.us-east-2.compute.internal
  Normal   AddedInterface  66s               multus             Add eth0 [10.128.8.20/23] from openshift-sdn
  Normal   Pulled          65s               kubelet            Container image "icr.io/db2u/db2u.tools@sha256:4515fe819f9812fabf5b0e65acbbf65c2923ac89baa48976f903d695c4129030" already present on machine
  Normal   Created         65s               kubelet            Created container init-labels
  Normal   Started         65s               kubelet            Started container init-labels
  Normal   Pulled          64s               kubelet            Container image "icr.io/db2u/db2u.tools@sha256:4515fe819f9812fabf5b0e65acbbf65c2923ac89baa48976f903d695c4129030" already present on machine
  Normal   Created         64s               kubelet            Created container init-kernel
  Normal   Started         64s               kubelet            Started container init-kernel
  Normal   Pulled          63s               kubelet            Container image "icr.io/db2u/db2u@sha256:2532bee039ea3e1d825cfc0ac226fddf88a1192ae85b98cadcbdad5e75fd138b" already present on machine
  Normal   Created         63s               kubelet            Created container db2u
  Normal   Started         63s               kubelet            Started container db2u
  Warning  Unhealthy       7s (x2 over 27s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy       7s                kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
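The probe failures in the events above can be picked out programmatically. This is a minimal sketch, assuming the Events table uses two or more spaces between columns (as in the listing above) and that each probe message ends in `statuscode: <code>`; the function names are hypothetical. The success rule follows the Kubernetes HTTP probe definition: a status code of at least 200 and below 400 counts as success, so the 500 returned here is always a failure.

```python
"""Sketch: find failed HTTP probe events in a pod's Events listing.

Assumptions: columns are separated by two or more spaces, and probe
messages end in "statuscode: <code>" as in the listing in this document.
"""
import re
from typing import List, Tuple

PROBE = re.compile(
    r"(Readiness|Liveness) probe failed: HTTP probe failed with statuscode: (\d+)"
)


def http_probe_succeeds(code: int) -> bool:
    # Kubernetes treats 200 <= code < 400 as a successful HTTP probe.
    return 200 <= code < 400


def failed_probes(events_text: str) -> List[Tuple[str, str, int]]:
    """Return (probe kind, age, HTTP status) for each Unhealthy event."""
    results = []
    for line in events_text.splitlines():
        # Type, Reason, Age, From, Message
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) != 5 or cols[0] != "Warning" or cols[1] != "Unhealthy":
            continue
        match = PROBE.search(cols[4])
        if match and not http_probe_succeeds(int(match.group(2))):
            results.append((match.group(1), cols[2], int(match.group(2))))
    return results


events = """\
  Normal   Started         63s               kubelet            Started container db2u
  Warning  Unhealthy       7s (x2 over 27s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy       7s                kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
"""

for kind, age, code in failed_probes(events):
    print(f"{kind} probe failed ({age}) with HTTP {code}")
```

A failing readiness probe only keeps the pod out of service endpoints. It is the failing liveness probe, once it reaches the probe's configured failureThreshold, that makes the kubelet restart the container, which matches the restart loop described in this document.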
 

Document Location

Worldwide

Line of Business: Data Platform
Business Unit: IBM Software
Product: IBM Cloud Pak for Data Db2
Category: Administration > Upgrade
Case Number: TS013203142
Platform: Platform Independent
Version: All Versions


This document contains the abstract of a technical article that is available to authorized users after they log in. Users who log in without the required authorization for this document are given instructions on what to do next.

Document Information

Modified date:
12 June 2023

UID

ibm17003285