Performing an offline upgrade or excluding nodes from an upgrade by using the installation toolkit

Starting with the IBM Spectrum Scale 5.0.2 release, the installation toolkit can tolerate nodes that are in an offline state and still upgrade them, and it can exclude some nodes from the upgrade.

Note: Online upgrades can be done only from release 4.2.x to release 5.0.x. However, you can use an offline upgrade to upgrade directly from release 4.1.x to release 5.0.x.
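The release rule in this note can be encoded as a small lookup, which may help in scripted prechecks. The function below is hypothetical (it is not part of the installation toolkit) and encodes only what the note states for a target release of 5.0.x; any other source release is reported as not covered.

```shell
#!/bin/sh
# Hypothetical helper (not part of the installation toolkit): given the
# currently installed release, report what the note allows for an
# upgrade to release 5.0.x.
upgrade_types_to_5_0() {
  case "$1" in
    4.2 | 4.2.*) echo "online upgrade supported" ;;
    4.1 | 4.1.*) echo "offline upgrade required" ;;
    *)           echo "not covered by this note" ;;
  esac
}
```

For example, upgrade_types_to_5_0 4.1.1.0 prints "offline upgrade required".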

Upgrade when nodes are unhealthy

By using the IBM Spectrum Scale offline upgrade, you can upgrade your cluster even if one or more nodes are unhealthy. A node is unhealthy when its services are down but it is still reachable through ping.

When you designate a node as offline in the upgrade configuration, the installation toolkit upgrades all installed packages on that node during the upgrade run, but it does not attempt to stop or restart the respective services. You must manually restart the previously offline services: use the mmces service start command for protocol components and the mmstartup command for the GPFS daemon.

For example, suppose that you want to upgrade a 5-node cluster whose nodes are node1, node2, node3, node4, and node5 (protocol node).

During the upgrade precheck, you notice the following issues:
  • node3 is reachable but NFS is down
  • node5 is reachable but SMB is down
  • node2 is reachable but all services including GPFS are down
Here, you can do the following:
  • Designate node3 as offline. During and after the upgrade, the installation toolkit does not restart any services on this node, including NFS, but it upgrades all installed packages on node3, including NFS, GPFS, SMB, OBJ, and others.
  • Designate node5 as offline. During and after the upgrade, the installation toolkit does not restart any services on this node, including SMB, but it upgrades all installed packages on node5, including SMB, GPFS, NFS, OBJ, and others.
  • Designate node2 as offline. All installed packages are upgraded, but the installation toolkit does not attempt to restart any services.
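Applying the three designations from this example takes one command, because -N accepts a comma-separated node list. The sketch below is a dry run by default: it prints each command and executes it only when DRY_RUN=0. The DRY_RUN variable and the function names are hypothetical, and the sketch assumes that you run it from the directory that contains the extracted ./spectrumscale toolkit.

```shell
#!/bin/sh
# Dry-run wrapper: print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Join the arguments with commas, the list form that -N accepts.
join_nodes() {
  (IFS=,; echo "$*")
}

# Hypothetical helper: designate the given nodes as offline, then show
# the resulting upgrade configuration for review.
designate_offline() {
  run ./spectrumscale upgrade config offline -N "$(join_nodes "$@")" || return 1
  run ./spectrumscale upgrade config list
}
```

For example, designate_offline node2 node3 node5 prints the offline command with -N node2,node3,node5, followed by the config list command.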
Note:
  • If you designate all nodes in a cluster as offline, a full offline upgrade is performed: all installed packages are upgraded on all nodes, and no services are started or stopped.
  • If you designate as offline a node that is already excluded, the exclude designation of the node is cleared and the offline designation is added. For example,
    ./spectrumscale upgrade config offline -N vm1
    [ INFO ] The node vm1.ibm.com was added in excluded list previously. Clearing this from excluded list.
    [ INFO ] Adding vm1.ibm.com as smb offline
  • After an offline upgrade, you must manually start all unhealthy services (by using mmces service start for protocol components and mmstartup for GPFS).

Designating nodes as offline in the upgrade configuration

  • To designate a node as offline, issue this command:
    ./spectrumscale upgrade config offline -N nodename
    An offline upgrade is performed on this node, which means that all installed packages are upgraded without any services being restarted.
    Important: Before you designate a node as offline, ensure that none of its components are active. If the node is a protocol node, it must also be suspended.
    • To check the status of the GPFS daemon, issue the mmgetstate command.
    • To stop the GPFS daemon, issue the mmshutdown command.
    • To check the status of protocol components, issue the mmces service list command.
    • To suspend the protocol node and stop the protocol services, issue the mmces node suspend --stop command.
      If you are upgrading from IBM Spectrum Scale version 5.0.2.0 or earlier, issue the following commands to suspend the protocol node and stop the protocol services, where Protocol stands for each enabled protocol service, such as NFS, SMB, or OBJ:
      mmces node suspend
      mmces service stop Protocol
  • To designate all nodes as offline and do a full offline upgrade across the cluster, issue this command:
    ./spectrumscale upgrade config offline -N node1,node2,node3,...,noden
    All installed packages are upgraded on all the nodes in the cluster, but no services are restarted on any of the nodes.
  • Clearing the offline designations
    • To clear all offline designations from a specific node, issue this command:
      ./spectrumscale upgrade config offline -N nodename --clear
    • To clear all the offline designations from all the nodes, issue this command:
      ./spectrumscale upgrade config offline --clear
  • To clear both the offline and exclude configurations, issue this command:
    ./spectrumscale upgrade config clear
  • To view all configurations that are done for offline upgrade, issue this command:
    ./spectrumscale upgrade config list
    The list includes the nodes that are excluded and the nodes whose components are designated as offline. The upgrade run is performed based on this configuration. For example,
    ./spectrumscale upgrade config list
    [ INFO ] GPFS Node SMB NFS OBJ GPFS
    [ INFO ]
    [ INFO ] Phase1: Non Protocol Nodes Upgrade
    [ INFO ] nsd001st001 - - -
    [ INFO ] nsd002st001 - - -
    [ INFO ] nsd003st001 - - -
    [ INFO ] nsd004st001 - - -
    [ INFO ]
    [ INFO ] Phase2: Protocol Nodes Upgrade
    [ INFO ] prt002st001
    [ INFO ] prt003st001
    [ INFO ] prt004st001
    [ INFO ] prt006st001
    [ INFO ] prt008st001
    [ INFO ] prt009st001
    [ INFO ] prt011st001
    [ INFO ]
    [ INFO ] Excluded Nodes : prt007st001,prt001st001,prt010st001,prt005st001
    [ INFO ]
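The preparation that the Important note requires, followed by the offline designation, can be chained for one node as in the sketch below. It is a dry run by default (set DRY_RUN=0 to execute), and the function name is hypothetical. It assumes that /usr/lpp/mmfs/bin is on PATH, that it runs from the directory containing ./spectrumscale, and that you are upgrading from a release later than 5.0.2.0 so that mmces node suspend --stop is available. It also assumes that the -N option of mmces service list and mmces node suspend selects the target node; verify this against the mmces documentation for your release.

```shell
#!/bin/sh
# Dry-run wrapper: print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical helper: quiesce one node, then designate it as offline.
#   $1 = node name
#   $2 = "yes" if the node is a protocol (CES) node; defaults to "no"
prepare_offline_node() {
  node=$1
  is_protocol=${2:-no}
  run mmgetstate -N "$node"                  # show the GPFS daemon state
  if [ "$is_protocol" = "yes" ]; then
    run mmces service list -N "$node"        # show the protocol services
    run mmces node suspend -N "$node" --stop || return 1
  fi
  run mmshutdown -N "$node" || return 1      # stop the GPFS daemon
  run ./spectrumscale upgrade config offline -N "$node"
}
```

For example, prepare_offline_node node5 yes suspends node5 and stops its protocol services before stopping GPFS and designating it as offline. Suspending before mmshutdown is a choice made here on the assumption that the mmces commands need the GPFS daemon to be running. The status commands are informational only; the sketch does not parse their output.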

Upgrade when nodes are not reachable

You can exclude one or more nodes from the current upgrade run if the nodes are unreachable or if you want to upgrade them later. The installation toolkit performs no action on an excluded node during the upgrade.
Note:
  • It is not recommended to exclude only a subset of the protocol nodes. For example, if you have 3 protocol nodes, exclude all 3 together rather than only 1 or 2 of them. If you exclude only a subset, the installation toolkit issues a warning. For example,
    ./spectrumscale upgrade config exclude -N vm1
    [ INFO ] Adding node vm1.ibm.com in excluded list.
    [ WARN ] Protocol nodes should all be upgraded together if possible, since mixed versions of the code are not allowed in CES components (SMB/OBJ). You may add the remaining protocol node(s) : vm2.ibm.com in the excluded list or clear node(s): vm1.ibm.com with the ./spectrumscale config exclude --clear option so that no protocol nodes are excluded.
  • Ensure that at least one admin node remains in the non-excluded list; that is, do not exclude all admin nodes. For example, if the cluster that you want to upgrade has 3 admin nodes, you can exclude at most 2 of them. If you have only one admin node, do not exclude it.
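The two rules above can be checked before you run the exclude command. The following is a hypothetical pre-check (not part of the installation toolkit, which itself only warns about the protocol-node rule, as the log above shows); it takes comma-separated node lists and reports which rule an exclude list would break.

```shell
#!/bin/sh
# Hypothetical pre-check applying the two exclude rules above.
#   $1 = nodes to exclude, $2 = all protocol nodes, $3 = all admin nodes
# All arguments are comma-separated node lists. Prints one line per
# problem and returns 1 if any problem was found.

# Succeeds if node $1 appears in comma-separated list $2.
in_list() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the number of entries in comma-separated list $1.
count_items() {
  old_ifs=$IFS
  IFS=,
  set -- $1
  IFS=$old_ifs
  echo "$#"
}

# Prints how many entries of list $1 also appear in list $2.
count_common() {
  n=0
  old_ifs=$IFS
  IFS=,
  for item in $1; do
    if in_list "$item" "$2"; then n=$((n + 1)); fi
  done
  IFS=$old_ifs
  echo "$n"
}

check_exclude() {
  excluded=$1 protocol=$2 admin=$3 status=0
  p_total=$(count_items "$protocol")
  p_excl=$(count_common "$protocol" "$excluded")
  if [ "$p_excl" -gt 0 ] && [ "$p_excl" -lt "$p_total" ]; then
    echo "WARN: $p_excl of $p_total protocol nodes excluded; exclude all or none"
    status=1
  fi
  a_total=$(count_items "$admin")
  a_excl=$(count_common "$admin" "$excluded")
  if [ "$a_excl" -ge "$a_total" ]; then
    echo "ERROR: no admin node left in the non-excluded list"
    status=1
  fi
  return "$status"
}
```

For example, check_exclude vm1 vm1,vm2 adm1,adm2 reports that only 1 of 2 protocol nodes would be excluded, which matches the situation in the WARN message above.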

Excluding nodes from the upgrade configuration

  • To exclude one or more nodes from the upgrade configuration, issue this command:
    ./spectrumscale upgrade config exclude -N node1,node2
    This ensures that the installation toolkit does not perform any action on node1 and node2 during upgrade.
  • Clearing the exclude designations
    • To clear the exclude configuration from specific nodes, issue this command:
      ./spectrumscale upgrade config exclude -N node1,node2 --clear
      Note: It is not recommended to clear the exclude designation from only a subset of the excluded protocol nodes.
    • To clear the exclude configuration from all nodes, issue this command:
      ./spectrumscale upgrade config exclude --clear

Upgrading the excluded nodes or offline designated nodes

  1. Ensure that the nodes on which you want to perform offline upgrade are reachable through ping commands.
  2. For nodes that are designated as excluded, clear the exclude designation of the nodes in the cluster definition file by using this command:
    ./spectrumscale upgrade config exclude -N node1,node2 --clear
  3. Designate the nodes that were earlier excluded as offline if the required services are not running on them, by using this command:
    ./spectrumscale upgrade config offline -N node1,node2
  4. Run the upgrade procedure on the offline designated nodes by using this command:
    ./spectrumscale upgrade run
    The installation toolkit upgrades the packages and restarts the services only for an online upgrade. For an offline upgrade, the installation toolkit only upgrades the packages that are currently installed on the offline designated nodes.
  5. After the upgrade procedure is completed, do the following:
    • Restart the GPFS daemon by using the mmstartup command on each offline designated node.
    • If the object protocol is configured, perform the post-upgrade object configuration from one of the protocol nodes by using the following command:
      mmobj config manage --version-sync
    • Resume the protocol node and restart the protocol services by using the mmces node resume --start command for every offline designated node that is a protocol node.
      If you are upgrading from IBM Spectrum Scale version 5.0.2.0 or earlier, issue the following commands to resume the protocol node and start the protocol services, where Protocol stands for each enabled protocol service, such as NFS, SMB, or OBJ:
      mmces node resume
      mmces service start Protocol
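Steps 1 through 5 can be strung together as in the following dry-run sketch (set DRY_RUN=0 to execute). The function name and its arguments are hypothetical. It assumes that all listed nodes were previously excluded and have their services stopped, that /usr/lpp/mmfs/bin is on PATH, that it runs from the directory containing ./spectrumscale on one of the protocol nodes (so that mmobj config manage --version-sync runs there), and that you are upgrading from a release later than 5.0.2.0 so that mmces node resume --start is available.

```shell
#!/bin/sh
# Dry-run wrapper: print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical helper covering steps 1 to 5 above.
#   $1 = comma-separated nodes to upgrade (previously excluded, services down)
#   $2 = comma-separated subset of $1 that are protocol nodes (may be empty)
#   $3 = "yes" if the object protocol is configured
upgrade_deferred_nodes() {
  nodes=$1 proto_nodes=$2 object=${3:-no}
  # Step 1: every node must answer ping.
  for n in $(echo "$nodes" | tr ',' ' '); do
    run ping -c 1 "$n" || return 1
  done
  # Steps 2 and 3: move the nodes from the exclude list to the offline list.
  run ./spectrumscale upgrade config exclude -N "$nodes" --clear || return 1
  run ./spectrumscale upgrade config offline -N "$nodes" || return 1
  # Step 4: packages are upgraded; no services are restarted on these nodes.
  run ./spectrumscale upgrade run || return 1
  # Step 5: start GPFS, sync object configuration, resume protocol nodes.
  run mmstartup -N "$nodes" || return 1
  if [ "$object" = "yes" ]; then
    run mmobj config manage --version-sync || return 1
  fi
  if [ -n "$proto_nodes" ]; then
    run mmces node resume -N "$proto_nodes" --start
  fi
}
```

For example, upgrade_deferred_nodes node1,node2 node2 yes prints the eight commands in the order of the steps above, ending with the resume command for node2.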

Populating cluster configuration when nodes are designated as offline in the upgrade configuration

You cannot use the config populate functionality if one or more nodes in the cluster are designated as offline in the upgrade configuration. If you plan to designate one or more nodes as offline, use the following steps to populate the cluster configuration:
  1. Extract the installation image that you want to use for doing the upgrade.
  2. Populate the cluster definition file by using the ./spectrumscale config populate command, by copying the old cluster definition file, or by creating a new cluster definition file with the ./spectrumscale command.
  3. Shut down GPFS on the node(s) that you plan to designate as offline.
  4. Use the ./spectrumscale upgrade config command to designate the node(s) as offline in the upgrade configuration.
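Steps 2 to 4 can be scripted as below, after the image from step 1 has been extracted. This dry-run sketch (set DRY_RUN=0 to execute) makes two assumptions that you should check: that ./spectrumscale config populate takes -N followed by a node that is already in the cluster (confirm with ./spectrumscale config populate -h), and that "shut down" in step 3 means stopping the GPFS daemon with mmshutdown, because the nodes must remain reachable for their packages to be upgraded. The function name is hypothetical.

```shell
#!/bin/sh
# Dry-run wrapper: print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical helper for steps 2 to 4 (populate first, while every
# node is still up, then stop GPFS and designate the nodes as offline).
#   $1 = a node that is currently part of the cluster
#   $2 = comma-separated nodes to designate as offline
populate_then_offline() {
  run ./spectrumscale config populate -N "$1" || return 1
  run mmshutdown -N "$2" || return 1
  run ./spectrumscale upgrade config offline -N "$2"
}
```

The order matters: populate runs before any node is designated as offline, which is exactly the restriction described above.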

Limitations

  • You cannot exclude a node on which the SMB service is running. This restriction does not apply to NFS or object. For example,
    ./spectrumscale upgrade config exclude -N vm1
    [ FATAL ] In order to exclude a protocol node running SMB from the current upgrade, the SMB service must first be stopped on that node. Please stop SMB using the mmces command and retry.
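Following the advice in the FATAL message, a node that runs SMB can be excluded by stopping SMB on it first, as in this dry-run sketch (set DRY_RUN=0 to execute). The function name is hypothetical, and /usr/lpp/mmfs/bin is assumed to be on PATH. The earlier recommendation against excluding only a subset of the protocol nodes still applies.

```shell
#!/bin/sh
# Dry-run wrapper: print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical helper: stop SMB on node $1, then exclude it from the upgrade.
exclude_smb_node() {
  run mmces service stop SMB -N "$1" || return 1
  run ./spectrumscale upgrade config exclude -N "$1"
}
```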