Restarting a large IBM Storage Scale cluster
A cluster might have to be restarted because of an OS upgrade. On large FPO clusters, auto recovery must be disabled before IBM Storage Scale is restarted.
- Ensure that the status of all disks is Ready and the availability is Up by running the mmlsdisk <fsName> -L command.
- Verify that the kernel and Linux® distribution versions planned for the upgrade are compatible with IBM Storage Scale by reviewing the IBM Storage Scale FAQ in IBM® Documentation.
- Disable auto recovery for disk failure.
When IBM Storage Scale is shut down, some nodes might stop sooner than others. The disks on those faster nodes then go down while the rest of the cluster is still active, which might trigger auto recovery. To avoid this, temporarily disable auto recovery.
Run the mmchconfig restripeOnDiskFailure=no -i command to disable auto recovery for disk failure. With the -i option, the change takes effect immediately and is permanent. The restripeOnDiskFailure parameter is a cluster-wide configuration, so the change must be synchronized across the cluster. In small clusters with fewer than 30 nodes, IBM Storage Scale synchronizes the configuration quickly. In large clusters with hundreds of nodes, the synchronization takes longer.
After disabling auto recovery, check for auto recovery in the file system manager by running the following commands:
- If there are multiple file systems in the cluster, run the mmlsmgr command to find the file system manager node of each file system.
- Log in to the file system manager node of the file system and run the ps -elf | grep -e tschdisk -e tsrestripefs command. If any of these processes are running, wait for them to complete.
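The wait for running recovery processes can be scripted. The following is a minimal sketch, not part of the product: the function names and the POLL_SECONDS variable are hypothetical, and the script is assumed to run on the file system manager node after auto recovery has been disabled.

```shell
#!/usr/bin/env bash
# Sketch: wait until no disk recovery process is running on this node.
# Function names and POLL_SECONDS are illustrative assumptions.

# Returns 0 if a tschdisk or tsrestripefs process is running on this node.
recovery_running() {
    # The [t] bracket form keeps grep from matching its own command line.
    ps -elf | grep -q -e '[t]schdisk' -e '[t]srestripefs'
}

# Polls until the recovery processes have finished.
wait_for_recovery() {
    local poll="${POLL_SECONDS:-30}"
    while recovery_running; do
        echo "Recovery still in progress; checking again in ${poll}s"
        sleep "$poll"
    done
    echo "No tschdisk or tsrestripefs process is running"
}
```

After sourcing the file on the file system manager node, call wait_for_recovery before continuing with the next step.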
- Stop all applications that are using the IBM Storage Scale file system. To check for open files in the file system, run the lsof or the fuser command.
For example, to check whether any processes are using the IBM Storage Scale file system, run one of the following commands: lsof +f -- /dev/name_of_SpectrumScale_filesystem or fuser -m /mount_point_of_SpectrumScale_filesystem
- Unmount the IBM Storage Scale file system on all nodes for this upgrade cycle by running the following command: mmumount <fsName> -a
To confirm that the file system has been unmounted on all related nodes, run the following command: mmlsmount <fsName> -L
- To disable the Automatic mount option, run the following command: mmchfs <fsName> -A no
Note: In a large cluster, some nodes might take a while to start. If the -A option is not set to no, unnecessary disk I/O might cause some disks from slow nodes to be marked as nonfunctional.
- Shut down IBM Storage Scale on the nodes by running the following command: mmshutdown -N <nodeList>
To confirm that IBM Storage Scale has stopped on these nodes, run the following command: mmgetstate -a
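The unmount, automatic mount, and shutdown steps can be chained in one place. The following is a sketch under the assumption that applications have already been stopped and auto recovery is disabled. The function name stop_scale_for_maintenance is hypothetical; the commands and options are the ones named in the steps above.

```shell
#!/usr/bin/env bash
# Sketch of the pre-maintenance shutdown sequence for one file system.
# Arguments: the file system name, then the node list for mmshutdown -N.
stop_scale_for_maintenance() {
    local fs="$1" nodes="$2"
    mmumount "$fs" -a      || return 1   # unmount on all nodes
    mmlsmount "$fs" -L                   # list any nodes that still mount it
    mmchfs "$fs" -A no     || return 1   # disable automatic mount
    mmshutdown -N "$nodes" || return 1   # stop IBM Storage Scale on the nodes
    mmgetstate -a                        # show the resulting node states
}
```

A call such as stop_scale_for_maintenance <fsName> <nodeList> stops at the first failing command, so a failed unmount does not proceed to the shutdown. The output of mmlsmount and mmgetstate still has to be reviewed by the operator.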
- Upgrade IBM Storage Scale or perform the maintenance procedure on the whole cluster.
- Start the IBM Storage Scale cluster.
After everything has been installed and the portability layer has been built, start IBM Storage Scale by running the following command: mmstartup -N <nodeList>
To confirm that IBM Storage Scale is active on the upgraded nodes, run the following command: mmgetstate -a
- When IBM Storage Scale is active on all nodes, check the state of all disks by running the following command: mmlsdisk <fsName> -e
If some disks in the file system do not have the Up availability and the Ready status, run the mmchdisk <fsName> start -a command to start them. Then run the mmchdisk <fsName> resume -a command so that the suspended and to-be-emptied disks become available.
- When all the disks in the file system are functioning, mount the file system by running the following command: mmmount <fsName> -N <nodeList>
Confirm that the IBM Storage Scale file system has mounted by running the following command: mmlsmount <fsName> -L
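The disk check and mount steps can be combined in a small script. The following is a sketch only. The function names are hypothetical, and the check assumes that mmlsdisk <fsName> -e prints a line containing "All disks up and ready" when no disk needs attention; verify that message on your release before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: bring disks back and mount the file system after startup.

# Returns 0 when mmlsdisk -e reports that every disk is up and ready.
# Assumption: the command prints "All disks up and ready" in that case.
disks_all_ready() {
    mmlsdisk "$1" -e 2>&1 | grep -q 'All disks up and ready'
}

# Arguments: the file system name, then the node list for mmmount -N.
bring_up_and_mount() {
    local fs="$1" nodes="$2"
    if ! disks_all_ready "$fs"; then
        mmchdisk "$fs" start -a  || return 1   # start disks that are down
        mmchdisk "$fs" resume -a || return 1   # resume suspended disks
        if ! disks_all_ready "$fs"; then
            echo "Disks are still not ready on $fs" >&2
            return 1
        fi
    fi
    mmmount "$fs" -N "$nodes" || return 1
    mmlsmount "$fs" -L                         # show where it is mounted
}
```

The function refuses to mount while any disk is still reported as not ready, which mirrors the order of the steps above.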
- To enable auto recovery for disk failure, run the following command: mmchconfig restripeOnDiskFailure=yes -i
Ensure that you use the -i option so that this change takes effect immediately and permanently.
- To enable the Automatic mount option, run the following command: mmchfs <fsName> -A yes
- If you upgraded the IBM Storage Scale version in the upgrade step earlier in this procedure, upgrade the IBM Storage Scale cluster version and file system version.
If all applications run without any issues, run the mmchconfig release=LATEST command to upgrade the cluster version to the latest. Then, to enable backward-compatible format changes in the file system, run the mmchfs <fsName> -V compat command.
Note: After running the mmchconfig release=LATEST command, you cannot revert the cluster release version to an older version. After running the mmchfs -V compat command, you cannot revert the file system version to an older version. For a major IBM Storage Scale upgrade, verify the compatibility between the different IBM Storage Scale major versions before running the mmchfs -V full command, either by checking the IBM Storage Scale FAQ in IBM Documentation or by contacting scale@us.ibm.com. For information about specific file system format and function changes, see File system format changes between versions of IBM Storage Scale.
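The closing steps can be sketched as one function that separates the reversible settings from the irreversible version changes. The function name finish_maintenance and its "commit" argument are hypothetical; the commands are the ones named in the steps above.

```shell
#!/usr/bin/env bash
# Sketch: re-enable the settings that were disabled before maintenance.
# Pass "commit" as the second argument only after the applications have
# been verified, because the two commands in that branch cannot be undone.
finish_maintenance() {
    local fs="$1" mode="${2:-keep}"
    mmchconfig restripeOnDiskFailure=yes -i || return 1   # auto recovery on
    mmchfs "$fs" -A yes                     || return 1   # automatic mount on
    if [ "$mode" = commit ]; then
        mmchconfig release=LATEST           || return 1   # cluster version
        mmchfs "$fs" -V compat              || return 1   # file system format
    fi
}
```

Calling finish_maintenance <fsName> restores auto recovery and automatic mount only. Calling finish_maintenance <fsName> commit also raises the cluster and file system versions, which is why that path is opt-in.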