Changes to the CA certificate and key do not automatically rotate Kafka leaf certificates
- You are looking to manually renew your IBM Automation foundation AutomationBase CA certificate and key.
- You are unable to connect to Kafka after you adjust the certificates that you provide to the AutomationBase CR or after you attempt to manually rotate the CA certificates and keys by following this documentation.
- It has been a year since Kafka was installed and the Kafka leaf certificates have expired.
For example, you see the following error from the Go Kafka client after you force a renewal of the CA certificate and key for the AutomationBase instance by deleting the secret:
failed to create protocol: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
Cause
This issue occurs because the IBM Events Operator, which manages the Kafka instance, does not automatically rotate the leaf certificates for the cluster when it is provided with a custom CA. The Kafka cluster is provided with a custom CA so that a common CA can be used for all IBM Automation foundation components.
Resolving the problem
Before you proceed, take a copy of the following secrets:
iaf-system-cluster-ca-cert
iaf-system-cluster-ca
iaf-system-cluster-operator-certs
iaf-system-zookeeper-nodes
iaf-system-kafka-brokers
iaf-system-entity-operator-certs
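The backup of the secrets listed above can be scripted. The following is a minimal sketch, assuming kubectl access to the cluster (oc accepts the same arguments). The helper name backup_kafka_secrets, the NAMESPACE variable, and the output directory are illustrative assumptions and are not part of the product.

```shell
# Sketch: save a YAML copy of each secret before changing anything.
# Assumption: all six secrets live in the namespace of your AutomationBase instance.

KAFKA_SECRETS="iaf-system-cluster-ca-cert
iaf-system-cluster-ca
iaf-system-cluster-operator-certs
iaf-system-zookeeper-nodes
iaf-system-kafka-brokers
iaf-system-entity-operator-certs"

# backup_kafka_secrets <namespace> <output_dir>  (hypothetical helper name)
backup_kafka_secrets() {
  local namespace="$1" out_dir="$2" secret
  mkdir -p "$out_dir" || return 1
  for secret in $KAFKA_SECRETS; do
    # Write each secret to its own file and stop at the first failure.
    kubectl get secret "$secret" -n "$namespace" -o yaml > "$out_dir/$secret.yaml" || return 1
  done
}

# Usage (not run automatically):
#   backup_kafka_secrets "$NAMESPACE" ./kafka-secret-backup
```

Keeping one file per secret makes it possible to restore a single secret with kubectl apply -f if a later step goes wrong.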
If you need to renew the CA, or the CA and the key, as a part of this process, follow these steps:
- Read the documentation for renewing certificates here.
- Determine the CA certificates for Kafka that you are going to renew from the above documentation.
- Follow the documentation to renew the CA for Kafka and any other leaf certificates for components underneath that CA.
If you are using v1.0 or v1.1 of AutomationBase, then:
- Uninstall the IBM Automation foundation Operator.
- Edit the secret iaf-system-cluster-ca-cert and add a copy of the old ca.crt file in PEM format as ca-<expiry_date>.crt, where <expiry_date> is the certificate expiry date in the format YEAR-MONTH-DAYTHOUR-MINUTE-SECONDS.
  - The expiry date can be retrieved by using the openssl command:
    openssl x509 -enddate -noout -in <path_to_ca>
    For example, ca-2018-09-27T17-32-00Z.crt.
- Follow the steps for v1.2+ AutomationBase below.
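Deriving the key name from the openssl output by hand is error-prone, so it can help to script it. The following is a sketch only: it assumes GNU date (for the -d option) and follows the example name above, including its trailing Z. The helper names ca_backup_key and add_old_ca_to_secret are hypothetical, and the patch step assumes the old CA file is already PEM encoded.

```shell
# ca_backup_key <notAfter line>  (hypothetical helper name)
# Converts the line printed by `openssl x509 -enddate -noout`, for example
# "notAfter=Sep 27 17:32:00 2018 GMT", into a key name in the documented format.
ca_backup_key() {
  local end_date="${1#notAfter=}"   # strip the "notAfter=" prefix
  date -u -d "$end_date" +"ca-%Y-%m-%dT%H-%M-%SZ.crt"
}

# add_old_ca_to_secret <namespace> <path_to_old_ca_pem>  (hypothetical helper name)
# Adds the old CA to iaf-system-cluster-ca-cert under the derived key name.
add_old_ca_to_secret() {
  local namespace="$1" ca_file="$2" end_date key encoded
  end_date="$(openssl x509 -enddate -noout -in "$ca_file")" || return 1
  key="$(ca_backup_key "$end_date")" || return 1
  # Secret data values must be base64 encoded on a single line.
  encoded="$(base64 < "$ca_file" | tr -d '\n')"
  kubectl patch secret iaf-system-cluster-ca-cert -n "$namespace" --type merge \
    -p "{\"data\":{\"$key\":\"$encoded\"}}"
}

# Usage (not run automatically):
#   add_old_ca_to_secret "$NAMESPACE" <path_to_ca>
```

A merge patch adds the new key without touching the existing ca.crt entry, which is why it is used here instead of replacing the whole secret.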
If you are using v1.2+ of AutomationBase, then:
- Ensure that the secret iaf-system-cluster-ca-cert contains a copy of the new CA certificate in the field ca.crt and a copy of the old CA in ca-<expiry_date>.crt.
- Restart all the Zookeeper pods iaf-system-zookeeper-* one at a time. Wait for each pod to become ready after it is restarted.
- Restart all the Kafka pods iaf-system-kafka-* one at a time. Wait for each pod to become ready after it is restarted.
- Restart the entity Operator pod iaf-system-entity-operator-*. Wait for the pod to become ready after it is restarted.
- Restart the Apicurio pod iaf-system-apicurio-*, if you have Apicurio installed. Wait for the pod to become ready after it is restarted.
- Follow the steps below to renew the leaf certificates for Kafka.
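The "one at a time" restarts above can be sketched as a small loop. This is an illustration under stated assumptions: the helper names are hypothetical, and the approach only suits pods that come back with the same name after deletion (as StatefulSet pods do). The ordinal pod names in the usage comment are assumed; list the real names with kubectl get pods first.

```shell
# restart_pod_and_wait <namespace> <pod_name>  (hypothetical helper name)
restart_pod_and_wait() {
  local namespace="$1" pod="$2" attempts=0
  # Delete the pod; by default kubectl waits until the old pod is gone.
  kubectl delete pod "$pod" -n "$namespace" || return 1
  # Poll until a replacement pod with the same name exists (up to about 5 minutes).
  until kubectl get pod "$pod" -n "$namespace" >/dev/null 2>&1; do
    attempts=$((attempts + 1))
    [ "$attempts" -ge 60 ] && return 1
    sleep 5
  done
  # Block until the replacement reports the Ready condition.
  kubectl wait --for=condition=Ready "pod/$pod" -n "$namespace" --timeout=600s
}

# restart_pods_one_at_a_time <namespace> <pod_name>...  (hypothetical helper name)
restart_pods_one_at_a_time() {
  local namespace="$1" pod
  shift
  for pod in "$@"; do
    # Do not touch the next pod until the current one is ready again.
    restart_pod_and_wait "$namespace" "$pod" || return 1
  done
}

# Usage (not run automatically; pod names are assumed):
#   restart_pods_one_at_a_time "$NAMESPACE" iaf-system-zookeeper-0 iaf-system-zookeeper-1 iaf-system-zookeeper-2
```

Pods whose replacement receives a new generated name cannot be waited on by name this way; for those, delete the pod and watch kubectl get pods until the new pod is ready.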
To renew the leaf certificates for Kafka, follow these steps:
You can start here if you are satisfied with the current state of your CA certificates and keys, and they are not expired.
- Delete the secret iaf-system-cluster-operator-certs.
- Wait for the secret iaf-system-cluster-operator-certs to be recreated. This could take a few minutes. If this is taking too long, see the Note.
- Delete the secret iaf-system-zookeeper-nodes.
- Wait for the secret iaf-system-zookeeper-nodes to be recreated. This could take a few minutes. If this is taking too long, see the Note.
- Restart all the Zookeeper pods iaf-system-zookeeper-* one at a time. Wait for each pod to become ready after it is restarted.
- Delete the secret iaf-system-kafka-brokers.
- Wait for the secret iaf-system-kafka-brokers to be recreated. This could take a few minutes. If this is taking too long, see the Note.
- Wait for all the Kafka pods to restart. This occurs automatically and may take a few minutes.
- Delete the secret iaf-system-entity-operator-certs.
- Wait for the secret iaf-system-entity-operator-certs to be recreated. This could take a few minutes. If this is taking too long, see the Note.
- Delete the secret iaf-system-cluster-ca-cert.
- If you are following the v1.0/v1.1 steps, re-install the IBM Automation foundation Operator now.
- Restart the entity Operator pod iaf-system-entity-operator-*. Wait for the pod to become ready after it is restarted.
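Each "delete the secret, then wait for it to be recreated" pair in the steps above follows the same pattern, which can be sketched as one helper. The function name recreate_secret, the polling interval, and the default attempt limit are assumptions chosen for illustration.

```shell
# recreate_secret <namespace> <secret_name> [max_attempts]  (hypothetical helper name)
# Deletes a secret and polls until the operator has recreated it.
recreate_secret() {
  local namespace="$1" secret="$2" max_attempts="${3:-60}" attempts=0
  kubectl delete secret "$secret" -n "$namespace" || return 1
  # `kubectl get secret` fails while the secret is absent and succeeds once it exists.
  until kubectl get secret "$secret" -n "$namespace" >/dev/null 2>&1; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge "$max_attempts" ]; then
      echo "Timed out waiting for secret $secret; see the Note." >&2
      return 1
    fi
    sleep 10
  done
  echo "Secret $secret has been recreated."
}

# Usage (not run automatically), one documented step at a time:
#   recreate_secret "$NAMESPACE" iaf-system-cluster-operator-certs
#   recreate_secret "$NAMESPACE" iaf-system-zookeeper-nodes
```

The helper covers only the delete-and-wait pairs. The pod restarts and the final deletion of iaf-system-cluster-ca-cert still need to be performed in the order listed.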
Note: If a step is taking too long, you can restart the ibm-events-operator-* pod in the ibm-common-services namespace.
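Because the operator pod name ends in a generated suffix, the restart described in the Note needs a lookup first. The following sketch assumes that deleting the pod is an acceptable way to restart it and that its controller creates a replacement; the helper name restart_events_operator is hypothetical.

```shell
# restart_events_operator  (hypothetical helper name)
restart_events_operator() {
  local pods
  # `-o name` prints entries as "pod/<name>"; keep only the events operator pods.
  pods="$(kubectl get pods -n ibm-common-services -o name | grep '^pod/ibm-events-operator-')" || {
    echo "No ibm-events-operator pod found in ibm-common-services." >&2
    return 1
  }
  # $pods is left unquoted on purpose so that each pod becomes its own argument.
  kubectl delete -n ibm-common-services $pods
}

# Usage (not run automatically):
#   restart_events_operator
```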
You will now be able to connect to the Kafka instance successfully.