Pods restarted regularly every 10 hours
Pods which have certificate secrets mounted are being restarted about every 10 hours in foundational services 3.13.
Symptoms
Pods which have certificate secrets mounted are restarted about every 10 hours. When listing pods with oc get pods, the Age column will never exceed about 12 hours.
Checking the YAML of the restarted pod(s), there is a label named certmanager.k8s.io/time-restarted, and the value of this label matches the time when the pod was created.
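To check for this symptom without opening each pod's YAML, the label can be queried directly. The following is a minimal sketch, assuming oc is already logged in to the cluster; the function name list_restarted_pods and the namespace argument are hypothetical and not part of the product.

```shell
#!/bin/sh
# Sketch only: list pods that carry the restart label described above.
# Assumption: `oc` is logged in; the namespace passed in is your own choice.

RESTART_LABEL="certmanager.k8s.io/time-restarted"

# -l <key> selects pods that have the label at all;
# -L <key> prints the label value as an extra column next to AGE.
list_restarted_pods() {
  oc get pods -n "$1" -l "$RESTART_LABEL" -L "$RESTART_LABEL"
}
```

For example, list_restarted_pods my-namespace prints the matching pods with their Age and label value side by side, which makes the repeating restart pattern easy to spot.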
Cause
The ibm-cert-manager-operator pod automatically restarts pods whenever the Certificate they use is renewed. However, in foundational services 3.13, there is a bug that causes the operator to incorrectly restart pods even when the Certificate has NOT been renewed.
Resolving the problem
This issue is fixed in foundational services 3.14, so upgrading is the permanent solution. If upgrading foundational services as a whole is not possible, patching the ibm-cert-manager-operator CSV with the image from foundational services 3.14 is an option. This upgrades only ibm-cert-manager-operator and leaves all other foundational services untouched.
Patching CSV
- Run the following command:
  oc edit csv ibm-cert-manager-operator.v3.13.0
- Change the operator image value to
  quay.io/opencloudio/ibm-cert-manager-operator@sha256:bcf43bd31ed39ba8a8f559e503c0ec2db62459a7466248917c258d9a990c5d17
Notes:
- The registry quay.io/opencloudio may have to be changed depending on what registry you used for foundational services.
- In an air-gapped scenario, the operator image and the operand images must be mirrored first.
- The following are the operand images:
  quay.io/opencloudio/icp-cert-manager-controller@sha256:b4ab5ef86d492b6f5caa6e2676b095ab7683ade4c42660f36ec3d43616558f3b
  quay.io/opencloudio/icp-cert-manager-webhook@sha256:a3a4c2982f018ae500274e154ff00a39b72501c0f4d6f5293401f0dc9e16a915
  quay.io/opencloudio/icp-cert-manager-cainjector@sha256:66447d5997e0ee3d7ed8226cfda1774c6a412e9a749515b1f6bf1fd6ac5726f8
  quay.io/opencloudio/icp-cert-manager-acmesolver@sha256:9eabb93d88dc158c4892e298eec1da262515c9359ef70c21a91f4260b1aaf37f
  quay.io/opencloudio/icp-configmap-watcher@sha256:ffa55e50f834d1ab79686832679ad9e97c2a529dad6a262dac6fd60467cadf19
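When a different registry is in use, every image reference above needs the same prefix substitution. The following sketch prints the old and new reference for each image. It only rewrites strings and does not mirror anything; the function names and the example registry registry.example.com/cs are hypothetical.

```shell
#!/bin/sh
# Sketch only: rewrite the quay.io/opencloudio prefix of the images listed
# above to another registry. Function names here are illustrative.

DEFAULT_REGISTRY="quay.io/opencloudio"

# Operator image plus the five operand images from this document.
IMAGES="
quay.io/opencloudio/ibm-cert-manager-operator@sha256:bcf43bd31ed39ba8a8f559e503c0ec2db62459a7466248917c258d9a990c5d17
quay.io/opencloudio/icp-cert-manager-controller@sha256:b4ab5ef86d492b6f5caa6e2676b095ab7683ade4c42660f36ec3d43616558f3b
quay.io/opencloudio/icp-cert-manager-webhook@sha256:a3a4c2982f018ae500274e154ff00a39b72501c0f4d6f5293401f0dc9e16a915
quay.io/opencloudio/icp-cert-manager-cainjector@sha256:66447d5997e0ee3d7ed8226cfda1774c6a412e9a749515b1f6bf1fd6ac5726f8
quay.io/opencloudio/icp-cert-manager-acmesolver@sha256:9eabb93d88dc158c4892e298eec1da262515c9359ef70c21a91f4260b1aaf37f
quay.io/opencloudio/icp-configmap-watcher@sha256:ffa55e50f834d1ab79686832679ad9e97c2a529dad6a262dac6fd60467cadf19
"

# rewrite_registry IMAGE NEW_REGISTRY
# Replaces the default registry prefix; other references are left unchanged.
rewrite_registry() {
  case "$1" in
    "$DEFAULT_REGISTRY"/*)
      printf '%s/%s\n' "$2" "${1#"$DEFAULT_REGISTRY"/}"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# print_image_mappings NEW_REGISTRY
# Prints one "source -> target" line per image.
print_image_mappings() {
  for image in $IMAGES; do
    printf '%s -> %s\n' "$image" "$(rewrite_registry "$image" "$1")"
  done
}
```

Running print_image_mappings registry.example.com/cs gives a list that can be fed to whichever mirroring tool your environment uses. The digests stay the same, so the rewritten operator reference can be used as the image value in the CSV once mirroring is complete.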
If patching the CSV is also not an option, then the temporary workaround is to scale down the ibm-cert-manager-operator:
oc scale --replicas=0 deployment/ibm-cert-manager-operator
Note that this stops all of these pod restarts, including the legitimate ones after a Certificate has actually been renewed. Use this method ONLY as a short-term, temporary workaround.
Another issue with this method is that any new Certificates created using the v1alpha1 version will not be converted to v1 Certificates, which means no certificate secret will be generated for them.
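Because the scale-down is temporary, the operator has to be scaled back up once the upgrade or the CSV patch is in place. The following sketch wraps both directions in one helper. It assumes the original replica count was 1, which should be verified for your installation; the function name set_operator_replicas is hypothetical.

```shell
#!/bin/sh
# Sketch only: scale the operator down for the workaround and back up later.
# Assumption: the deployment normally runs with 1 replica.

OPERATOR_DEPLOYMENT="deployment/ibm-cert-manager-operator"

# set_operator_replicas COUNT
# Rejects anything that is not a non-negative integer before calling oc.
set_operator_replicas() {
  case "$1" in
    ''|*[!0-9]*)
      echo "replica count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  oc scale --replicas="$1" "$OPERATOR_DEPLOYMENT"
}
```

set_operator_replicas 0 applies the workaround, and set_operator_replicas 1 restores the operator so that Certificate renewals and v1alpha1 conversions are handled again.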