File system forced unmount

Several indications point to a forced unmount of your file system, and there are various courses of action that you can take to correct the problem.

Those indications are:
  • Forced unmount messages in the GPFS™ log.
  • Your application no longer has access to data.
  • Your application is getting ESTALE or ENOENT return codes.
  • Multiple unsuccessful attempts to appoint a file system manager may cause the cluster manager to unmount the file system everywhere.

    Such situations involve the failure of paths to disk resources from many, if not all, nodes. The underlying problem may be at the disk subsystem level, or lower. The error logs for each node that unsuccessfully attempted to appoint a file system manager will contain records of a file system unmount whose error is either coded 212 or occurred during an attempt to assume management of the file system. Note that these errors apply to a specific file system, although a failure in shared disk communication paths can cause multiple file systems to unmount.

  • File system unmounts with an error indicating too many disks are unavailable.

    The mmlsmount -L command can be used to determine which nodes currently have a given file system mounted.
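
As a rough illustration of checking the first indication and then running mmlsmount, the following shell sketch searches a GPFS log for lines that mention an unmount and lists the nodes that have the file system mounted. The log path /var/adm/ras/mmfs.log.latest is the usual default but may differ on your system, the search pattern is an assumption rather than an official message format, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: look for forced-unmount evidence, then list mounting nodes.

# Usual GPFS log location (assumption); override with GPFS_LOG if needed.
GPFS_LOG="${GPFS_LOG:-/var/adm/ras/mmfs.log.latest}"

# Print log lines that mention an unmount, in any letter case.
# This also matches MMFS_SYSTEM_UNMOUNT. The pattern is an assumption,
# not an official message format.
find_unmount_messages() {
    log_file="${1:-$GPFS_LOG}"
    grep -i 'unmount' "$log_file"
}

# Show which nodes currently have the given file system mounted.
show_mounts() {
    device="$1"
    mmlsmount "$device" -L
}
```

For example, call find_unmount_messages with no argument to scan the default log, then show_mounts Device, where Device is the name of the affected file system.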

If your file system has been forced to unmount, follow these steps:
  1. If a single disk fails and you have not specified multiple failure groups and replication of metadata, GPFS cannot continue because it cannot write logs or other critical metadata. If you have specified multiple failure groups and replication of metadata, the failure of multiple disks in different failure groups puts you in the same position. In either situation, GPFS forcibly unmounts the file system. The error log contains records indicating exactly which access failed, along with an MMFS_SYSTEM_UNMOUNT record indicating the forced unmount. To recover, take the needed actions to restore disk access, then issue the mmchdisk command for the disks that the mmlsdisk command shows as down.
  2. Internal errors in processing data on a single file system may cause loss of file system access. These errors may clear if you issue the umount command and then remount the file system, but they should still be reported as problems to the IBM® Support Center.
  3. If an MMFS_QUOTA error log entry containing "Error writing quota file..." is generated, the quota manager continues operation if the next write for the user, group, or fileset is successful. If not, further allocations to the file system will fail. Check the error code in the log and make sure that the disks containing the quota file are accessible. Run the mmcheckquota command. For more information, see The mmcheckquota command.
    If the file system must be repaired without quotas:
    1. Disable quota management by issuing the command:
      mmchfs Device -Q no
    2. Issue the mmmount command for the file system.
    3. Make any necessary repairs and install the backup quota files.
    4. Issue the mmumount -a command for the file system.
    5. Restore quota management by issuing the mmchfs Device -Q yes command.
    6. Run the mmcheckquota command with the -u, -g, and -j options. For more information, see The mmcheckquota command.
    7. Issue the mmmount command for the file system.
  4. If errors indicate that too many disks are unavailable, see Additional failure group considerations.
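
The disk recovery in step 1 and the quota repair sequence in step 3 might be scripted along the following lines. This is a sketch under stated assumptions, not a supported procedure: the function names and the DRY_RUN switch are hypothetical, mmchdisk start -a (start all disks) is only one way of issuing mmchdisk and is assumed to suit your case, the backup quota file names are placeholders supplied by the caller, and invoking mmcheckquota once per option is an assumption about how the -u, -g, and -j options are passed. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the recovery commands unless DRY_RUN=0 is set.

# Execute a command only when DRY_RUN=0; otherwise just print it.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Step 1: once disk access is restored, show disk states, then start disks.
restart_disks() {
    device="$1"
    run mmlsdisk "$device" &&
        run mmchdisk "$device" start -a
}

# Step 3, substeps 1-2: disable quota management and mount the file system.
quota_repair_begin() {
    device="$1"
    run mmchfs "$device" -Q no &&
        run mmmount "$device"
}

# Step 3, substeps 4-7: call this after the repairs are made and the
# backup quota files are installed. The three file arguments are
# caller-supplied placeholders for the user, group, and fileset files.
quota_repair_finish() {
    device="$1"; user_file="$2"; group_file="$3"; fileset_file="$4"
    run mmumount "$device" -a &&
        run mmchfs "$device" -Q yes &&
        run mmcheckquota -u "$user_file" "$device" &&
        run mmcheckquota -g "$group_file" "$device" &&
        run mmcheckquota -j "$fileset_file" "$device" &&
        run mmmount "$device"
}
```

Splitting the quota sequence into two functions leaves room for the manual repair in substep 3. Running quota_repair_begin Device with the default dry-run setting prints the two commands it would issue, which makes it easy to review the plan before setting DRY_RUN=0.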