ESS 6.1.1.1 introduced a feature that allows more granular control over how an online upgrade is performed. essrun is an Ansible wrapper that lets users run a variety of deployment operations while hiding much of the complexity of system bring-up. essrun can perform an online upgrade, typically by upgrading half of the specified building blocks at a time while always maintaining quorum. In response to recent issues with online upgrade, a new flag called --serial was introduced to give the user more control and additional safety measures over the online update process.
Problem Summary:
An issue with the --serial option has the potential to cause a loss of quorum on a running cluster.
Example:
essrun -N ess3200-1a,ess3200-1b,ess3200-2a,ess3200-2b update --serial 1
TASK [/opt/ibm/ess/deploy/ansible/roles/updatenode : Autoload off]
*************************************************************************
changed: [ess3200-1a]
Thursday 08 July 2021 17:00:12 +0000 (0:00:03.378) 2:38:32.129 *********
TASK [/opt/ibm/ess/deploy/ansible/roles/updatenode : Obtain GPFS status]
*************************************************************************
ok: [ess3200-1a] =>
msg: GPFS is active on node. Shutting Down now.
ok: [ess3200-1b] =>
msg: GPFS is active on node. Shutting Down now.
ok: [ess3200-2a] =>
msg: GPFS is active on node. Shutting Down now.
ok: [ess3200-2b] =>
msg: GPFS is active on node. Shutting Down now.
Thursday 08 July 2021 17:00:13 +0000 (0:00:00.619) 2:38:32.749 *********
Note: In the output above, GPFS is reported as shutting down on all four nodes in the same task even though --serial 1 was specified. Quorum loss was observed in this scenario because essrun does not currently have the logic needed to ensure that 50%+1 of the quorum nodes remain available.
The root cause is an incorrect comparison in essrun involving the number of nodes in the online upgrade.
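The quorum condition described above (50%+1 of the quorum nodes must stay available) can be illustrated with a minimal Python sketch. This is not essrun's code or its fix. The function name batch_keeps_quorum, the node name ems1, and the assumption that all five listed nodes are quorum nodes are hypothetical and used only for illustration.

```python
def batch_keeps_quorum(quorum_nodes, active_nodes, batch):
    """Return True if a majority (50%+1) of the quorum nodes would
    still be active after every node in `batch` is shut down.

    Hypothetical helper for illustration only; not part of essrun.
    """
    quorum = set(quorum_nodes)
    # Quorum nodes that are up now and are not in the batch going down.
    remaining = (set(active_nodes) & quorum) - set(batch)
    # Majority threshold: more than half of all quorum nodes.
    needed = len(quorum) // 2 + 1
    return len(remaining) >= needed


# Assumed cluster layout (hypothetical): one management node plus the
# four I/O nodes from the example, all designated as quorum nodes.
quorum_nodes = ["ems1", "ess3200-1a", "ess3200-1b", "ess3200-2a", "ess3200-2b"]
active_nodes = list(quorum_nodes)

# One node at a time leaves 4 of 5 quorum nodes active.
one_node_ok = batch_keeps_quorum(quorum_nodes, active_nodes, ["ess3200-1a"])

# All four I/O nodes at once leaves 1 of 5 quorum nodes active.
all_four_ok = batch_keeps_quorum(
    quorum_nodes,
    active_nodes,
    ["ess3200-1a", "ess3200-1b", "ess3200-2a", "ess3200-2b"],
)
```

Under these assumptions, shutting down a single node passes the check, while shutting down all four I/O nodes at once, as in the log above, fails it. The actual quorum node designation on a given cluster may differ and should be confirmed before any update.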
Recommendations:
Customers are advised to perform the online update as documented in the Quick Deployment Guide.
Note: Do not use the --serial flag introduced in ESS 6.1.1.1. Development intends to restore support for this option in an upcoming ESS release.
This notice will be updated when the IBM ESS version containing the fix is generally available.