Problem when you install two different cert-managers

The cert-manager that is installed by foundational services is based on the CNCF cert-manager. This was previously known as the Jetstack cert-manager.

Symptoms

  • Certificates never become ready

  • In the cert-manager-controller pod logs, error messages indicate that more than one CertificateRequest exists for one or more Certificates. cert-manager creates a CertificateRequest object whenever a Certificate object is created, but there should not be more than one per Certificate. If there is more than one, an error message is logged.

  • Multiple cert-managers are installed on one cluster. To check if there are multiple cert-managers on the cluster, run the following:

    oc get pods -A | grep cert-manager
    

If you have foundational services cert-manager installed, then the output should resemble the following:

ibm-common-services    cert-manager-cainjector-xxx-xxx
ibm-common-services    cert-manager-controller-xxx-xxx
ibm-common-services    cert-manager-webhook-xxx-xxx
ibm-common-services    ibm-cert-manager-operator-xxx-xxx

If you have the CNCF cert-manager installed, then the output should resemble the following:

cert-manager    cert-manager-cainjector-xxx-xxx
cert-manager    cert-manager-xxx-xxx
cert-manager    cert-manager-webhook-xxx-xxx

If you see both sets of pods, then there are multiple cert-manager instances installed.
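The check above can be sketched as a small filter. This is only a heuristic under stated assumptions: it treats every namespace that contains a pod with "cert-manager" in its name as one installation, and the function names cert_manager_namespaces and check_multiple_cert_managers are hypothetical helpers, not part of any product.

```shell
# Sketch: list the namespaces that contain cert-manager pods.
# Reads `oc get pods -A` style output on stdin (namespace in column 1,
# pod name in column 2) and prints each matching namespace once.
cert_manager_namespaces() {
  awk '$2 ~ /cert-manager/ { print $1 }' | sort -u
}

# Reports whether more than one namespace hosts cert-manager pods.
# Returns 1 when multiple installations appear to be present.
check_multiple_cert_managers() {
  count=$(cert_manager_namespaces | awk 'END { print NR }')
  if [ "$count" -gt 1 ]; then
    echo "multiple cert-manager installations detected"
    return 1
  fi
  echo "at most one cert-manager installation detected"
}
```

A possible invocation is `oc get pods -A | check_multiple_cert_managers`. Review the pod list manually if the result is unexpected, because the match is based on pod names only.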

Cause

The CNCF cert-manager has a limitation where only one instance of it can properly run on a cluster. If there is more than one, then there can be unexpected behavior, such as Certificates never becoming ready.

Because foundational services installs a cert-manager that is based on the CNCF cert-manager, it has the same limitation.

Resolving the problem

Depending on the situation, there are several methods to resolve the problem, but all of them involve uninstalling one of the cert-managers.

Before attempting any method, create a backup of the Issuer, ClusterIssuer, and Certificate objects. For example, to back up the Issuers:

oc get -A -o yaml issuers > issuers.yaml

None of the methods should remove any of these objects. However, if the Custom Resource Definitions (CRDs) for these objects are unintentionally removed, then the objects are also removed. Do not remove any CRDs.
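To cover all three kinds of objects, the backup command can be repeated for each resource type. The following is a minimal sketch, assuming that the short resource names issuers, clusterissuers, and certificates resolve to the cert-manager resources on your cluster. The function name and the BACKUP_DIR variable are hypothetical.

```shell
# Sketch: back up each cert-manager object kind to its own YAML file.
# BACKUP_DIR is a hypothetical variable; it defaults to the current directory.
backup_cert_manager_objects() {
  dir="${BACKUP_DIR:-.}"
  for kind in issuers clusterissuers certificates; do
    # Same command form as the example above, one file per kind.
    oc get -A -o yaml "$kind" > "$dir/$kind.yaml" || return 1
  done
}
```

Run `backup_cert_manager_objects` while logged in to the cluster, then confirm that each file contains the objects you expect before you continue.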

Method 1: Uninstall foundational services cert-manager operands

Prerequisite:

  • Foundational services version 3.19 or later

If you want to use the CNCF cert-manager to manage certificates, either because it was installed on the cluster first or due to a requirement, then follow this method:

  1. Follow the instructions in Control installation of Certificate manager operands

  2. Delete the CertManager object by running the following command:

    oc delete certmanagers.operator.ibm.com default
    
    1. Optional: If the deletion gets stuck, you can force the deletion by editing the YAML and setting the value of the finalizer to null.

      oc edit certmanagers.operator.ibm.com default
      
  3. Restart the ibm-cert-manager-operator pod. To get the pod name, run the following command:

    oc get pods -n ibm-common-services -l "app.kubernetes.io/managed-by=ibm-cert-manager-operator"
    
  4. Verify that the foundational services cert-manager pods are no longer running (except for ibm-cert-manager-operator):

    oc get pods -n ibm-common-services | grep cert-manager
    
    ibm-cert-manager-operator-xxxx
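
Steps 2 to 4 can be sketched as shell functions. This is an illustration, not a supported script. It assumes that the label from step 3 selects the operator pod, that deleting the pod is an acceptable way to restart it, and that a 60-second timeout is a reasonable sign of a stuck deletion. The function names and the NS variable are hypothetical.

```shell
NS=ibm-common-services   # namespace used in the commands above

# Step 2: delete the CertManager object. If the delete does not finish
# within the timeout, clear the finalizers so the deletion can complete.
delete_certmanager_cr() {
  if ! oc delete certmanagers.operator.ibm.com default --timeout=60s; then
    oc patch certmanagers.operator.ibm.com default \
      --type=merge -p '{"metadata":{"finalizers":null}}'
  fi
}

# Step 3: restart the operator pod by deleting it (assumption: its
# Deployment re-creates it).
restart_operator_pod() {
  oc delete pod -n "$NS" -l "app.kubernetes.io/managed-by=ibm-cert-manager-operator"
}

# Step 4: print cert-manager pods other than the operator.
# Empty output means that the operands are gone.
remaining_operand_pods() {
  oc get pods -n "$NS" --no-headers |
    awk '$1 ~ /cert-manager/ && $1 !~ /^ibm-cert-manager-operator/ { print $1 }'
}
```

The patch in delete_certmanager_cr has the same effect as the manual edit in the optional sub-step: it sets the finalizers of the object to null.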
    

Method 2: Uninstall CNCF cert-manager

This method will vary depending on how the CNCF cert-manager was installed. The most important thing is that the CRDs are NOT removed. There are generally three ways that the CNCF cert-manager could have been installed:

  1. kubectl apply
  2. Helm
  3. OperatorHub

Uninstalling a kubectl apply installation

The CNCF instructions for uninstalling via the YAML file are simply to delete the resources that are defined in the YAML file that you used to install it. However, this also removes the CRDs, which must be avoided. Hence, there are two options:

  1. Edit the YAML file first, and remove the CustomResourceDefinition YAMLs inside of it. After removing the CustomResourceDefinition YAMLs from the file, delete the remaining resources by running: oc delete -f <file.yaml>

  2. Foundational services provides a YAML file that you can use to delete cert-manager, which already has the CustomResourceDefinition YAMLs removed from it: oc delete -f https://raw.githubusercontent.com/IBM/ibm-common-service-operator/scripts/cert-manager.yaml
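
For the first option, the manual edit can be approximated with a filter. The following is a sketch under stated assumptions: the manifest separates documents with lines that contain only `---`, and each CRD document has a top-level `kind: CustomResourceDefinition` line that starts in the first column. The function name strip_crds is hypothetical.

```shell
# Sketch: copy a multi-document YAML manifest from stdin to stdout,
# dropping every document whose top-level kind is CustomResourceDefinition.
strip_crds() {
  awk '
    # Print the buffered document unless it was marked as a CRD.
    function emit_doc() {
      if (doc != "" && !is_crd) printf "---\n%s", doc
      doc = ""; is_crd = 0
    }
    /^---[ \t]*$/ { emit_doc(); next }
    /^kind:[ \t]*CustomResourceDefinition[ \t]*$/ { is_crd = 1 }
    { doc = doc $0 "\n" }
    END { emit_doc() }
  '
}
```

For example, `strip_crds < file.yaml > file-no-crds.yaml` writes a filtered copy. Before you run oc delete -f against the copy, check it with `grep '^kind:' file-no-crds.yaml` to confirm that no CustomResourceDefinition entries remain.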

Uninstalling a Helm installation

Follow the CNCF instructions. However, skip the part where the CustomResourceDefinitions (CRDs) are deleted.

Uninstalling an OperatorHub installation

If using the Red Hat OpenShift Container Platform console:

  1. Navigate to Installed Operators
  2. Find the cert-manager operator (NOT the IBM Cert Manager)
  3. Click the three dots on the right and click "Uninstall operator"

If using the CLI:

  1. Get the subscription:

    oc get sub -n openshift-operators | grep cert-manager
    
  2. Delete the subscription:

    oc delete sub -n openshift-operators cert-manager
    
  3. Get the CSV:

    oc get csv -n openshift-operators | grep cert-manager
    
  4. Delete the CSV:

    oc delete csv -n openshift-operators cert-manager.v1.x.x
    

Note that the names of the subscriptions and CSVs may vary depending on how you installed the operator.
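
Because the names vary, it can help to preview the delete commands before you run them. The following is a minimal sketch, assuming that the IBM operator resources contain "ibm-cert-manager" in their names and that the CNCF operator is installed in the openshift-operators namespace, as in the steps above. The function name print_delete_commands is hypothetical, and the function only prints commands; it does not delete anything.

```shell
# Sketch: read resource names on stdin (one per line, as printed by
# `oc get ... -o name`), keep cert-manager entries, skip IBM entries,
# and print the corresponding delete commands for review.
print_delete_commands() {
  grep 'cert-manager' | grep -v 'ibm-cert-manager' | while read -r name; do
    echo "oc delete -n openshift-operators $name"
  done
}
```

A possible invocation is `oc get sub,csv -n openshift-operators -o name | print_delete_commands`. Review the printed commands, then run the ones that match the CNCF cert-manager operator, subscription first.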

Troubleshooting cert-manager

If cert-manager does not refresh the mutating or validating webhook configuration after an upgrade or reinstallation, see Cert Manager fails to call webhook.