Upgrading Events Operator in foundational services version 3.17.x and earlier or 3.23.0 and later
Operator upgrade fails when upgrading directly from 3.17 to 3.23.0
Foundational services version 3.23.0 introduces Events Operator 4.5.0, which is based on Kafka version 3.3.1. Events Operator 4.5.0 uses enhanced inter-broker communication with separated control-plane and data-plane listeners. Brokers must be able to communicate through these listeners to maintain a quorum during the rolling upgrade, and not all prior versions are compatible with the new listeners. Therefore, some prior versions can be upgraded directly to foundational services version 3.23.0, while other versions require a multi-step upgrade process.
The following are the recommended upgrade paths based on the code stream currently deployed:
- When upgrading from the foundational services CD code stream (foundational services 3.18.x–3.22.x):
  - Direct upgrade to foundational services 3.23.0 is possible in a single step.
- When upgrading from the foundational services LTSR code stream (foundational services 3.19.x):
  - Direct upgrade to foundational services 3.23.0 is possible in a single step.
- When upgrading from older versions (foundational services version 3.17.x or earlier; May 2022 release):
  - A two-step upgrade is required:
    - First, upgrade to an intermediate version (foundational services 3.18.x–3.22.x).
    - Then, upgrade from the intermediate version to foundational services 3.23.0.
Following the appropriate path allows the pods to upgrade successfully, without the cluster falling into a degraded state that impairs performance and risks data loss if further failures occur.
Important: It is strongly recommended that you follow one of the upgrade paths above, dependent on the version of foundational services currently deployed.
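The upgrade paths above can be expressed as a small lookup. The following is a minimal sketch, not part of the product: the function name `upgrade_path` and its output strings (`two-step`, `direct`, `none`, `unknown`) are hypothetical, and it assumes that the installed version is a 3.x release given as `major.minor[.patch]`.

```shell
# Hypothetical helper (not part of the product): map a foundational services
# version string to the upgrade path described above.
upgrade_path() {
  version=$1
  case "$version" in
    *.*) ;;                             # need at least major.minor
    *) echo "unknown"; return 1 ;;
  esac
  major=${version%%.*}                  # text before the first dot
  rest=${version#*.}                    # text after the first dot
  minor=${rest%%.*}                     # text before the next dot, if any
  for part in "$major" "$minor"; do
    case "$part" in
      ''|*[!0-9]*) echo "unknown"; return 1 ;;
    esac
  done
  if [ "$major" -ne 3 ]; then
    echo "unknown"; return 1            # assumption: only 3.x is covered here
  elif [ "$minor" -le 17 ]; then
    echo "two-step"                     # 3.17.x or earlier
  elif [ "$minor" -le 22 ]; then
    echo "direct"                       # 3.18.x to 3.22.x, including LTSR 3.19.x
  else
    echo "none"                         # already at 3.23.0 or later
  fi
}
```

For example, `upgrade_path 3.17.2` prints `two-step`, and `upgrade_path 3.19.4` prints `direct`.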
Procedure
Attempting to upgrade from foundational services 3.17.x or earlier (May 2022 release) to 3.23.x in a single step might leave one or more Kafka pods in a hung state, because the old pods are not aware of the separated listeners and are unable to form a quorum with the new pods. The result is a Kafka cluster with degraded performance that is, in certain cases, at risk of data loss should further failures occur before the upgrade completes.
To determine if any pods are hung, you can inspect the Kafka custom resource by running:
oc get kafka my-kafka -n my-ns -o yaml
Note: Substitute your Kafka cluster name and namespace where appropriate. The status section in the returned YAML document shows the overall state of the cluster. In cases where the rolling upgrade has failed, it looks similar to:
status:
  conditions:
  - lastTransitionTime: "2023-01-12T10:35:30.462360234Z"
    message: Pod my-kafka-kafka-1 is currently not rollable
    reason: UnforceableProblem # Or 'ForceableProblem'
    status: "True"
    type: NotReady
  observedGeneration: 2
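Rather than reading the full YAML, the same check can be narrowed to the condition reasons. The following is a minimal sketch under the assumption that the `oc` CLI is logged in to the cluster; the function names `kafka_condition_reasons` and `upgrade_is_stuck` are hypothetical and chosen for illustration only.

```shell
# Hypothetical helper: print the reasons from the status conditions of a
# named Kafka custom resource.
#   $1 = Kafka cluster name, $2 = namespace
kafka_condition_reasons() {
  oc get kafka "$1" -n "$2" -o jsonpath='{.status.conditions[*].reason}'
}

# Hypothetical helper: succeed (exit status 0) if any condition reports
# UnforceableProblem or ForceableProblem, as in the failed-upgrade status
# shown above.
upgrade_is_stuck() {
  case "$(kafka_condition_reasons "$1" "$2")" in
    *UnforceableProblem*|*ForceableProblem*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A possible invocation is `upgrade_is_stuck my-kafka my-ns && echo "rolling upgrade is blocked"`, substituting your own Kafka cluster name and namespace.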
The following script will attempt to remediate failed direct upgrades from foundational services version 3.17.x to foundational services version 3.23.0.
Important: Run the following script during a period when message traffic and producer or consumer activity are low.
#!/bin/bash
### Script to bounce the Kafka pods if a Kafka pod is running
### with an older version - #ControlPlaneListener issue
FG_CYAN='\033[0;36m'
FG_PURPLE='\033[0;35m'
FG_DEF="\033[39m"
FG_RED='\033[0;31m'
FG_GREEN='\033[0;32m'
RESTART_REQUIRED="false"
script_name=$(basename "$0")
## Check whether the user has supplied the `namespace` argument to the script.
## If not, show the help text and exit with a non-zero status.
if [ -z "$1" ]
then
    echo -e "${FG_RED}namespace is not supplied ${FG_DEF}"
    echo -e "${FG_CYAN}usage: $script_name <namespace>, example: $script_name myproject ${FG_DEF}"
    exit 1
fi
### Accept namespace as an input parameter
namespace=$1
## Check the Kafka CR status to see whether a Kafka pod is still running
## an old version after the upgrade. If so, set RESTART_REQUIRED="true".
cr_status_reason=$(oc get kafkas.ibmevents.ibm.com -n "${namespace}" -o jsonpath='{.items[0].status.conditions[*].reason}')
if [[ "$cr_status_reason" == *"UnforceableProblem"* || "$cr_status_reason" == *"ForceableProblem"* ]]; then
    RESTART_REQUIRED="true"
    echo -e "${FG_RED} One or more Kafka pods in the ${namespace} namespace are running an old version of Kafka and need a restart. ${FG_DEF}"
else
    echo -e "${FG_GREEN} All the Kafka pods in the ${namespace} namespace are running the latest version of Kafka...things are looking good ${FG_DEF}"
    exit 0
fi
## Events Operator versions 3.15.0 and earlier ship Kafka
## 2.6.0, 2.6.1, and 2.7.0. These are the versions from which
## the update to the latest version (3.3.1) causes UnforceableProblem.
kafka_versions=("2.6.0.jar" "2.6.1.jar" "2.7.0.jar")
## If the Kafka CR status shows the `UnforceableProblem` message,
## loop through the Kafka pods and check whether each one is running
## any of these versions (2.6.0, 2.6.1, 2.7.0). If so,
## bounce the pod to progress the upgrade.
if [ "$RESTART_REQUIRED" = "true" ]; then
    echo -e "${FG_CYAN}Checking Kafka version for namespace $namespace ${FG_DEF}"
    for i in $(oc get pods -l app.kubernetes.io/name=kafka -o name -n "$namespace")
    do
        echo -e "${FG_PURPLE}Kafka Pod $i ${FG_DEF}"
        version=$(oc -n "$namespace" rsh "$i" ls libs | grep kafka_)
        echo "$version"
        for item in "${kafka_versions[@]}"; do
            if [[ "${version#*-}" == *$item* ]]; then
                oc delete "$i" -n "$namespace"
                count=0
                while :
                do
                    ## Query pod phases in the same namespace as the deleted pod
                    PODS=$(oc get pods -l app.kubernetes.io/name=kafka -n "$namespace" -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.status.phase}{"\n"}{end}')
                    if [[ "Running" == "$(echo "$PODS" | grep kafka | awk '{print $2}' | uniq)" ]]; then
                        echo -e "${FG_CYAN}Kafka Pod is successfully bounced!!!! ${FG_DEF}"
                        break
                    else
                        ((count+=1))
                        if (( count <= 24 )); then
                            echo -e "${FG_PURPLE}Waiting for the Kafka pod to roll. Recheck in 10 seconds ${FG_DEF}"
                            sleep 10
                        else
                            echo -e "${FG_RED}Pods taking too long. Giving up.${FG_DEF}"
                            exit 1
                        fi
                    fi
                done
            fi
        done
    done
fi
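Before running the script, it can be useful to see which pods it would bounce. The following is a minimal dry-run sketch that reuses the same pod label and `ls libs` check as the script, but only prints the affected pods and deletes nothing. The function names `is_affected_kafka_jar` and `list_affected_pods` are hypothetical, and the sketch assumes that the Kafka jar in each pod is named in the form `kafka_<scala-version>-<kafka-version>.jar`.

```shell
# Hypothetical helper: succeed if a jar listing contains a Kafka jar from
# one of the affected releases (2.6.0, 2.6.1, 2.7.0).
#   $1 = output of `ls libs | grep kafka_` from a Kafka pod
is_affected_kafka_jar() {
  case "$1" in
    *kafka_*-2.6.0.jar*|*kafka_*-2.6.1.jar*|*kafka_*-2.7.0.jar*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical dry run: print the Kafka pods that still run an affected
# Kafka version. Nothing is deleted.
#   $1 = namespace
list_affected_pods() {
  for pod in $(oc get pods -l app.kubernetes.io/name=kafka -o name -n "$1"); do
    # `|| true` keeps an empty grep result from aborting under `set -e`
    jars=$(oc -n "$1" rsh "$pod" ls libs | grep kafka_ || true)
    if is_affected_kafka_jar "$jars"; then
      echo "$pod"
    fi
  done
}
```

Running `list_affected_pods my-ns` with your own namespace prints one `pod/<name>` line per affected pod, and prints nothing when no pod matches.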