Considerations for performing your own rolling update of a Native HA queue manager
Any update to the IBM® MQ version or Pod specification for a Native HA queue manager requires you to perform a rolling update of the queue manager instances. The IBM MQ Operator handles this for you automatically, but if you are building your own deployment code, then there are some important considerations.
In Kubernetes, StatefulSet resources are used to manage ordered start-up and rolling updates. Part of the start-up procedure is to start each Pod individually, wait for it to become ready, and then move on to the next Pod. This does not work for Native HA, because all Pods need to be started together so that they can run a leader election. Therefore, the .spec.podManagementPolicy field on the StatefulSet needs to be set to Parallel.
However, this also means that all Pods are updated in parallel, which would take down all instances at once. For this reason, the StatefulSet should also use the OnDelete update strategy, so that Pods are restarted only when they are deleted by your own update code.
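For example, the relevant parts of a StatefulSet manifest might look like the following sketch. The names and image reference are illustrative, not the IBM MQ Operator's actual configuration:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nativeha-qm                # illustrative name
spec:
  replicas: 3                      # Native HA uses three instances
  podManagementPolicy: Parallel    # start all Pods together so leader election can run
  updateStrategy:
    type: OnDelete                 # no built-in rolling updates; Pods restart only when deleted
  selector:
    matchLabels:
      app: nativeha-qm
  template:
    metadata:
      labels:
        app: nativeha-qm
    spec:
      containers:
        - name: qmgr
          image: registry.example.com/mq:9.4.0   # illustrative image reference
```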
Disabling the built-in StatefulSet rolling update behavior drives a need for custom rolling update code, which should consider the following:
- General rolling update procedure
- Minimizing down time by updating Pods in the best order
- Handling changes in cluster state
- Handling errors
- Handling timing problems
General rolling update procedure
The rolling update code should wait for each instance to show a status of REPLICA from dspmq. This means that the instance has performed some level of start-up (for example, the container is started and MQ processes are running), but it has not necessarily managed to talk to the other instances yet. For example: Pod A is restarted, and as soon as it is in REPLICA state, Pod B is restarted. When Pod B starts with the new configuration, it should be able to talk to Pod A and form quorum, and either A or B then becomes the new active instance.
As part of this, it is useful to have a delay after each Pod has
reached the REPLICA state, to allow for it to connect to its peers and
establish quorum.
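As an illustration, the following Go sketch polls dspmq inside a Pod by shelling out to kubectl exec, and pauses after the REPLICA state is reached. The exact parsing of the dspmq output and the length of the delay are assumptions to adapt to your environment:

```go
// Package rollingupdate contains sketches of custom rolling update logic for
// a Native HA queue manager. Illustrative only; this is not the IBM MQ
// Operator's implementation.
package rollingupdate

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// waitForReplica polls dspmq in the given Pod until the instance reports
// REPLICA status (an instance that has already won the election reports a
// running status instead, which is also acceptable), then pauses so the
// instance can connect to its peers and establish quorum.
func waitForReplica(namespace, pod, qmName string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		out, err := exec.Command("kubectl", "exec", "-n", namespace, pod,
			"--", "dspmq", "-n", "-m", qmName).CombinedOutput()
		status := strings.ToUpper(string(out))
		if err == nil && (strings.Contains(status, "REPLICA") ||
			strings.Contains(status, "RUNNING")) {
			// Delay before the next Pod is restarted, to give this instance
			// time to re-establish quorum with its peers.
			time.Sleep(30 * time.Second)
			return nil
		}
		time.Sleep(5 * time.Second)
	}
	return fmt.Errorf("timed out waiting for %s to reach REPLICA state", pod)
}
```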
Minimizing down time by updating Pods in the best order
The rolling update code should delete Pods one at a time, starting with Pods which are in a known error state, followed by any Pods that have not successfully started. The active queue manager Pod should generally be updated last.
It is also important to pause the deletion of Pods if the last update resulted in a Pod going into a known error state. This prevents a broken update from being rolled out across all Pods. For example, this can happen if the Pod is updated to use a new container image which is not accessible (for example, because the image name contains a typo).
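A minimal sketch of this ordering and pause logic, using client-go types, might look like the following. The isActive parameter is a hypothetical helper (for example, one that runs dspmq in the Pod and checks for the active role):

```go
package rollingupdate

import (
	"sort"

	corev1 "k8s.io/api/core/v1"
)

// updatePriority returns a lower value for Pods that should be deleted sooner.
func updatePriority(pod corev1.Pod, isActive func(corev1.Pod) bool) int {
	switch {
	case pod.Status.Phase == corev1.PodFailed:
		return 0 // known error state: restart first
	case pod.Status.Phase != corev1.PodRunning:
		return 1 // not successfully started yet
	case isActive(pod):
		return 3 // active instance: update last to avoid needless failovers
	default:
		return 2 // healthy replica instances in between
	}
}

// orderForUpdate sorts Pods into the order in which they should be deleted.
func orderForUpdate(pods []corev1.Pod, isActive func(corev1.Pod) bool) []corev1.Pod {
	sort.SliceStable(pods, func(i, j int) bool {
		return updatePriority(pods[i], isActive) < updatePriority(pods[j], isActive)
	})
	return pods
}

// shouldPause reports whether the roll-out should stop because a Pod that has
// already been updated is in a known error state, such as a container image
// that cannot be pulled.
func shouldPause(pods []corev1.Pod, updateRevision string) bool {
	for _, pod := range pods {
		if pod.Labels["controller-revision-hash"] != updateRevision {
			continue // not yet updated; ignore
		}
		for _, cs := range pod.Status.ContainerStatuses {
			if w := cs.State.Waiting; w != nil &&
				(w.Reason == "ImagePullBackOff" || w.Reason == "CrashLoopBackOff") {
				return true
			}
		}
	}
	return false
}
```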
Handling changes in cluster state
The rolling update code needs to react appropriately to real-time changes in cluster state. For example, one of the queue manager's Pods might be evicted because of a Node reboot or Node pressure. It is possible that an evicted Pod is not immediately rescheduled if the cluster is busy, in which case the rolling update code needs to wait until that Pod is scheduled and running again before it restarts any other Pods.
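For example, a guard along these lines (assuming the three Pods share a label selector) could block the update until enough instances are running that quorum survives the next deletion:

```go
package rollingupdate

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForStableCluster blocks until all three Native HA Pods exist and at
// least two are running, so that deleting the next Pod cannot remove quorum.
func waitForStableCluster(ctx context.Context, c kubernetes.Interface,
	namespace, selector string) error {
	for {
		pods, err := c.CoreV1().Pods(namespace).List(ctx,
			metav1.ListOptions{LabelSelector: selector})
		if err == nil {
			running := 0
			for _, pod := range pods.Items {
				if pod.Status.Phase == corev1.PodRunning {
					running++
				}
			}
			if len(pods.Items) == 3 && running >= 2 {
				return nil
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("cluster did not stabilize: %w", ctx.Err())
		case <-time.After(10 * time.Second):
		}
	}
}
```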
Handling errors
The rolling update code needs to be robust to failures when calling the Kubernetes API, and to other unexpected cluster behaviour.
In addition, the rolling update code itself needs to be tolerant of being restarted: a rolling update can be long-running, and the code might be restarted part way through. Progress should therefore be derived from the observed cluster state, rather than held in memory, so that a restarted updater can resume safely.
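As a sketch, transient API failures might be retried with exponential backoff; which errors count as transient is an assumption to tune for your cluster:

```go
package rollingupdate

import (
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// withRetry retries fn on transient Kubernetes API errors (server timeouts,
// throttling, unavailability), doubling the delay between attempts.
// Non-transient errors are surfaced immediately.
func withRetry(maxAttempts int, fn func() error) error {
	delay := time.Second
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		if !apierrors.IsServerTimeout(err) && !apierrors.IsTooManyRequests(err) &&
			!apierrors.IsServiceUnavailable(err) {
			return err
		}
		time.Sleep(delay)
		delay *= 2
	}
	return err
}
```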
Handling timing problems
The rolling update code needs to check the update revision of each Pod, so that it can be sure that the Pod has actually restarted. This avoids timing problems where a Pod may indicate that it is "Started" when, in fact, the old Pod has not yet terminated.
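In Kubernetes, the StatefulSet controller stamps each Pod with a controller-revision-hash label, which can be compared with the StatefulSet's status.updateRevision to confirm that the Pod was recreated at the new revision. A sketch:

```go
package rollingupdate

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// podAtLatestRevision reports whether the named Pod exists and was created
// from the StatefulSet's current update revision. A Pod still carrying the
// old revision label has not yet been replaced, even if it reports as
// started.
func podAtLatestRevision(ctx context.Context, c kubernetes.Interface,
	namespace, stsName, podName string) (bool, error) {
	sts, err := c.AppsV1().StatefulSets(namespace).Get(ctx, stsName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	pod, err := c.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	return pod.Labels["controller-revision-hash"] == sts.Status.UpdateRevision, nil
}
```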