Problem when you install two different cert-managers
The cert-manager that foundational services installs is based on the CNCF cert-manager, which was previously known as the Jetstack cert-manager.
Symptoms
- Certificates never become ready.
- The cert-manager-controller pod logs error messages that indicate that one or more Certificates have more than one CertificateRequest. cert-manager creates a CertificateRequest object whenever a Certificate object is created, but there should not be more than one CertificateRequest per Certificate.
- Multiple cert-managers are installed on one cluster. To check whether there are multiple cert-managers on the cluster, run the following command:
oc get pods -A | grep cert-manager
If you have foundational services cert-manager installed, then the output should resemble the following:
ibm-common-services cert-manager-cainjector-xxx-xxx
ibm-common-services cert-manager-controller-xxx-xxx
ibm-common-services cert-manager-webhook-xxx-xxx
ibm-common-services ibm-cert-manager-operator-xxx-xxx
If you have the CNCF cert-manager installed, then the output should resemble the following:
cert-manager cert-manager-cainjector-xxx-xxx
cert-manager cert-manager-xxx-xxx
cert-manager cert-manager-webhook-xxx-xxx
If you see both sets of pods, then there are multiple cert-manager instances installed.
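The comparison of the two pod listings can also be scripted. The following is a minimal sketch rather than an official check. It assumes that every cert-manager instance runs a pod whose name starts with cert-manager-webhook-, as in both listings above. The function name cert_manager_instances is hypothetical.

```shell
# cert_manager_instances: read `oc get pods -A` output on stdin and print each
# namespace that contains a cert-manager webhook pod, one namespace per line.
# Assumption: every instance runs a pod named cert-manager-webhook-<suffix>.
cert_manager_instances() {
  awk '$2 ~ /^cert-manager-webhook-/ { print $1 }' | sort -u
}

# Usage (requires cluster access):
#   oc get pods -A | cert_manager_instances
```

If the function prints more than one namespace, more than one instance is probably installed.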
Cause
The CNCF cert-manager has a limitation where only one instance of it can properly run on a cluster. If there is more than one, then there can be unexpected behavior, such as Certificates never becoming ready.
Because foundational services installs a cert-manager that is based on the CNCF cert-manager, it has the same limitation.
Resolving the problem
Depending on the situation, there are several methods to resolve the problem, but all of them involve uninstalling one of the cert-managers.
Before attempting any method, create a backup of the Issuers, ClusterIssuers, and Certificate objects. For example:
oc get -A -o yaml issuers > issuers.yaml
None of the methods should remove any of these objects. However, if the Custom Resource Definitions (CRDs) for these objects are unintentionally removed, then the objects are removed along with them. Do not remove any CRDs.
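The example command covers only Issuers. A minimal sketch that repeats it for all three kinds might look like the following. The function name backup_cert_objects and the one-file-per-kind layout are assumptions made for illustration, and the resource names follow the short form that the example command uses.

```shell
# backup_cert_objects: save every Issuer, ClusterIssuer, and Certificate on the
# cluster to <dir>/<kind>.yaml. The directory must already exist and defaults
# to the current directory.
backup_cert_objects() {
  dir="${1:-.}"
  for kind in issuers clusterissuers certificates; do
    # -A has no effect on the cluster-scoped ClusterIssuers, so one form fits all.
    oc get "$kind" -A -o yaml > "$dir/$kind.yaml" || return 1
  done
}

# Usage (requires cluster access):
#   mkdir -p ./cert-backup && backup_cert_objects ./cert-backup
```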
Method 1: Uninstall foundational services cert-manager operands
See the following prerequisite:
- Foundational services version 3.19 – 4.0.0.
If you want to use the CNCF cert-manager to manage certificates, either because it was installed on the cluster first or due to a requirement, then follow this method:
- Delete the CertManager object by running the following command:
oc delete certmanagers.operator.ibm.com default
a. Optional: If the deletion gets stuck, you can force the deletion by editing the YAML and setting the value of the finalizer to null:
oc edit certmanagers.operator.ibm.com default
- Restart the ibm-cert-manager-operator pod. To get the pod name, run the following command:
oc get pods -n ibm-common-services -l "app.kubernetes.io/managed-by=ibm-cert-manager-operator"
- Verify that the foundational services cert-manager pods are no longer running (except for ibm-cert-manager-operator):
oc get pods -n ibm-common-services | grep cert-manager
Only the operator pod should be listed:
ibm-cert-manager-operator-xxxx
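The three steps of Method 1 can be sketched as non-interactive helpers. This is an illustration under assumptions, not the documented procedure. The function names are hypothetical, the patch is a substitute for the interactive oc edit in the optional step, and the restart assumes that the operator pod is owned by a Deployment that recreates it after deletion.

```shell
# Optional step: clear the finalizers with a merge patch instead of `oc edit`.
clear_certmanager_finalizers() {
  oc patch certmanagers.operator.ibm.com default --type=merge \
    -p '{"metadata":{"finalizers":null}}'
}

# Restart step: delete the pod that the label selector from the text matches.
# Assumption: a Deployment owns the pod and recreates it.
restart_cert_manager_operator() {
  oc delete pod -n ibm-common-services \
    -l "app.kubernetes.io/managed-by=ibm-cert-manager-operator"
}

# Verify step: read `oc get pods` output on stdin and print every cert-manager
# pod other than the operator. No output means that the operands are gone.
leftover_cert_manager_pods() {
  awk '$1 ~ /cert-manager/ && $1 !~ /^ibm-cert-manager-operator-/ { print $1 }'
}

# Usage (requires cluster access):
#   oc get pods -n ibm-common-services | leftover_cert_manager_pods
```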
Method 2: Uninstall CNCF cert-manager
This method will vary depending on how the CNCF cert-manager was installed. The most important thing is that the CRDs are NOT removed. There are generally three ways that the CNCF cert-manager could have been installed:
- kubectl apply
- Helm
- OperatorHub
Uninstalling kubectl apply
The CNCF instructions for uninstalling an installation that was made with kubectl apply are to run a delete against the same YAML file that you used to install it. However, this also removes the CRDs, which must be avoided. There are therefore two options:
- Edit the YAML file first and remove the CustomResourceDefinition documents from it. Then delete the remaining resources that the file defines by running:
oc delete -f <file.yaml>
- Foundational services provides a YAML file for deleting cert-manager that already has the CustomResourceDefinition documents removed:
oc delete -f https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/cert-manager.yaml
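For the first option, removing the CustomResourceDefinition documents by hand is error-prone in a large manifest. The following is a minimal sketch of a filter that does it, assuming that the documents in the manifest are separated by lines that contain only --- and that each CRD declares kind: CustomResourceDefinition at the top level. The function name strip_crds and the file names in the usage comment are hypothetical.

```shell
# strip_crds: read a multi-document YAML manifest on stdin and write it to
# stdout without the documents whose top-level kind is CustomResourceDefinition.
strip_crds() {
  awk '
    # Print the buffered document unless it is a CRD, then reset the buffer.
    function emit() {
      if (buf != "" && !is_crd) printf "---\n%s", buf
      buf = ""; is_crd = 0
    }
    /^---[ \t]*$/ { emit(); next }
    # Only an unindented kind line counts; nested kind fields are indented.
    /^kind:[ \t]*CustomResourceDefinition[ \t]*$/ { is_crd = 1 }
    { buf = buf $0 "\n" }
    END { emit() }
  '
}

# Usage:
#   strip_crds < cert-manager.yaml > cert-manager-no-crds.yaml
#   grep -c CustomResourceDefinition cert-manager-no-crds.yaml   # expect 0
#   oc delete -f cert-manager-no-crds.yaml
```

Review the filtered file before you delete anything with it, because the filter only matches the simple layout described above.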
Uninstalling Helm
Follow the CNCF instructions. However, skip the part where the CustomResourceDefinitions (CRDs) are deleted.
Uninstalling Operator
If using the Red Hat OpenShift Container Platform console:
- Navigate to Installed Operators
- Find the cert-manager operator (NOT the IBM Cert Manager)
- Click the three dots and click on Uninstall operator
If using the CLI:
- Get the subscription:
oc get sub -n openshift-operators | grep cert-manager
- Delete the subscription:
oc delete sub -n openshift-operators cert-manager
- Get the CSV:
oc get csv -n openshift-operators | grep cert-manager
- Delete the CSV:
oc delete csv -n openshift-operators cert-manager.v1.x.x
Note that the names of the subscription and the CSV might vary depending on how you installed the operator.
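Because the CSV name varies, it can be read from the subscription instead of being found with grep. The following sketch assumes that the subscription records the installed CSV in its status.installedCSV field, as Operator Lifecycle Manager subscriptions do. The function name is hypothetical, and the defaults are the names used in the commands above.

```shell
# uninstall_subscribed_operator: delete a Subscription and the CSV it installed.
# Arguments (both optional): subscription name, namespace.
uninstall_subscribed_operator() {
  sub="${1:-cert-manager}"
  ns="${2:-openshift-operators}"
  # Read the CSV name first, before the Subscription that records it is deleted.
  csv="$(oc get sub "$sub" -n "$ns" -o jsonpath='{.status.installedCSV}')" || return 1
  oc delete sub "$sub" -n "$ns" || return 1
  if [ -n "$csv" ]; then
    oc delete csv "$csv" -n "$ns"
  fi
}

# Usage (requires cluster access):
#   uninstall_subscribed_operator cert-manager openshift-operators
```

Neither command touches the CRDs, which is consistent with the requirement to leave them in place.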