Upgrading AFM and AFM DR

Consider the following while upgrading Active File Management (AFM) or Active File Management - DR (AFM DR).

Before upgrading to a newer version of IBM Spectrum Scale™, consider the version from which you are upgrading. IBM Spectrum Scale supports a limited form of backward compatibility between two adjacent releases; hence, coexistence and compatibility measures are required. For details, see IBM Spectrum Scale supported upgrade paths. Limited backward compatibility allows you to temporarily operate with some IBM Spectrum Scale nodes running the newer version and some nodes running an earlier version. Within a cluster, this enables you to perform a rolling upgrade to the new IBM Spectrum Scale version, provided that an upgrade from your current version to the newer version is supported.

In AFM and multi-cluster environments, individual clusters can be upgraded on different schedules. Access to the file system data can be preserved even though some of the clusters might still be running an earlier version. The home cluster and the cache cluster must be upgraded independently of each other.

During a regular upgrade, the IBM Spectrum Scale service is interrupted. For a regular upgrade, you must shut down the cluster and suspend the application workload of the cluster. During a rolling upgrade, IBM Spectrum Scale service is not interrupted. In a rolling upgrade, the system is upgraded node-by-node or failure group-by-failure group. During the upgrade, IBM Spectrum Scale runs on a subset of nodes. You can also perform offline upgrades, if you can shut down the entire cluster. An offline upgrade is similar to the online upgrade procedure. As the entire cluster is offline, it is possible to upgrade to the latest code level instead of upgrading to an intermediate level, as might be required during an online upgrade.

Before you consider a rolling upgrade of the home or cache, ensure that:

the cluster is healthy and operational.
IBM Spectrum Scale is running on all nodes defined in the cluster definition file.
all protocols defined on the protocol nodes are running, if the protocols were initially enabled.
adequate storage is provisioned at the cache during the home upgrade, if the cache is used for storage management by using quotas.
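
The second condition in the list can be checked from the command line. The following is a minimal sketch, not an IBM-provided tool. It assumes that mmgetstate -a prints a header row, a dashed separator line, and then one row per node with the node number, node name, and GPFS state in that order. The function name inactive_nodes is hypothetical.

```shell
#!/bin/sh
# Sketch: list the nodes whose GPFS state is not "active".
# Assumed input layout (human-readable `mmgetstate -a` output):
#   header row, one dashed separator line, then rows of
#   <node number> <node name> <GPFS state>

# Reads mmgetstate-style text on stdin and prints the name of every
# node whose state column is anything other than "active".
inactive_nodes() {
    awk '
        NF == 1 && $1 ~ /^-+$/ { body = 1; next }   # rows follow the dashes
        body && NF >= 3 && $3 != "active" { print $2 }
    '
}

# On a real cluster (GPFS commands on PATH, sufficient privileges):
#   mmgetstate -a | inactive_nodes
# Empty output means that every listed node reported "active".
```

If the function prints any node name, resolve the state of that node before you start a rolling upgrade.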

When you are ready to upgrade, see Completing the upgrade to a new level of IBM Spectrum Scale. For information about upgrade support for protocols and performance monitoring, see Online upgrade support for protocols and performance monitoring.

Cache cluster - In a multiple-gateway environment, gateway nodes can be upgraded one by one. In this case, the filesets that are associated with the gateway node being upgraded are transferred to another gateway node. Any write-class operation then triggers the recovery feature, which builds the queue on the newly associated gateway node so that operations continue to be processed to home. Thus, the cache is not disconnected from home, but some performance degradation can occur because another gateway node takes over the connection for the filesets that were previously hosted on the node being upgraded. On heavily loaded systems, transferring the filesets to another gateway node might have a performance impact. It is advised to schedule such upgrades for a time when the load on the system or the amount of data transfer is minimal.

Parallel data transfers enabled with GW mappings - In a multiple-gateway environment where parallel data transfers are enabled with multiple gateways, upgrading any of the mapped nodes causes data transfer to use the normal data transfer path.

Note: The fileset-to-gateway relation or mapping might remain intact after the upgrade, depending on the afmHashing version in use.

Home cluster - Cluster Export Services (CES) provides highly available file and object services to an IBM Spectrum Scale cluster by using Network File System (NFS), Object, or Server Message Block (SMB) protocols. In a CES environment, the exports at home can be accessed from the cache by using the CES IP addresses. These IP addresses can be realigned to other protocol nodes when the CES node that currently holds a CES IP address is shut down for an upgrade. The IP addresses are realigned according to the CES IP address distribution policies. The cache might see a short disruption at the time of the CES failover at home, but cache filesets continue to operate.

In environments where home is a non-CES cluster or an NFS server, the cache is disconnected from home until the upgrade of the NFS home server is complete. In disconnected mode, the cache queues application operations. After home becomes available, these operations are pushed to home.

Note: mmclone is not supported on the cache for AFM or on the primary for AFM DR. Clones created at home or at the secondary are replicated as different files. While upgrading to IBM Spectrum Scale 4.2.2 or later, the cache cluster must be upgraded before you upgrade the home cluster.

Using stop and start replication to upgrade AFM and AFM DR

You can use the replication stop and start method to assist an upgrade. Replication must be stopped on all filesets of the gateway node that you want to upgrade. After stopping replication, you can shut down the gateway node for the upgrade. Local data availability is not affected because the filesets can be accessed from other nodes. However, replication activity stops during the upgrade. You must ensure that replication is restarted after the gateway node is upgraded.

Complete the following steps:

Prepare to upgrade to IBM Spectrum Scale 5.0.2.
Run mmafmctl <fsname> getstate | grep <GatewayNode> to identify the filesets that belong to the gateway node you want to upgrade.
Before the upgrade, check whether any messages are left in the queue. Wait until all pending messages are completed.
Run mmafmctl <fsname> stop -j <filesetname> to stop replication of the filesets.
Before upgrading the gateway node, run mmafmctl <fsname> getstate to verify that replication has stopped.
Unmount the fileset and shut down GPFS™ on the gateway node.
Upgrade the gateway node.
Run mmstartup to start the gateway node daemon.
Check the node state by running mmgetstate.
After the node is active, mount the file systems and run mmafmctl <fsname> start -j <filesetname>.

Recovery is initiated and replication starts.
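
The fileset identification, queue check, and stop and start steps of this procedure can be scripted. The following is a minimal sketch under stated assumptions, not an IBM-provided script. It assumes that mmafmctl <fsname> getstate prints the columns Fileset Name, Fileset Target, Cache State, Gateway Node, Queue Length, and Queue numExec in that order, with no spaces inside a value. The function names and the names fs1, gw1, and gw1.filesets are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumed `mmafmctl <fsname> getstate` columns, in order:
#   Fileset Name, Fileset Target, Cache State, Gateway Node,
#   Queue Length, Queue numExec
# with no embedded spaces inside a value.

# Print the filesets served by gateway node $1 (getstate text on stdin).
filesets_on_gateway() {
    awk -v gw="$1" '$4 == gw { print $1 }'
}

# Exit 0 only if no fileset on gateway node $1 has queued messages.
queue_drained() {
    awk -v gw="$1" '
        $4 == gw && $5 + 0 != 0 { pending = 1 }
        END { exit pending + 0 }
    '
}

# Run "mmafmctl <fsname> <stop|start> -j <fileset>" for each fileset
# name read from stdin. $1 = file system name, $2 = stop or start.
for_each_fileset() {
    while read -r fset; do
        # </dev/null keeps the command from consuming the fileset list.
        mmafmctl "$1" "$2" -j "$fset" </dev/null || return 1
    done
}

# Example with hypothetical names fs1 (file system) and gw1 (gateway node):
#   mmafmctl fs1 getstate | filesets_on_gateway gw1 > gw1.filesets
#   until mmafmctl fs1 getstate | queue_drained gw1; do sleep 30; done
#   for_each_fileset fs1 stop < gw1.filesets
#   mmafmctl fs1 getstate          # confirm that replication has stopped
#   ... unmount, shut down GPFS, upgrade gw1, mmstartup, mmgetstate ...
#   for_each_fileset fs1 start < gw1.filesets
```

Saving the fileset list before replication is stopped means that the start step does not depend on what getstate reports for the gateway node after the upgrade.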

Note: Synchronization of data between home and cache stops during the upgrade process. Recovery is run after you start replication to ensure that data at home and cache is synchronized.