GPFS daemon went down
There are a number of conditions that can cause the GPFS daemon to exit.
These are all conditions where the GPFS internal checking has determined that continued operation would be dangerous to the consistency of your data. Some of these conditions are errors within GPFS processing but most represent a failure of the surrounding environment.
In most cases, the daemon exits and restarts after recovery. If it is not safe to simply force the unmounted file systems to recover, the GPFS daemon exits.
- Applications running at the time of the failure see either ENODEV or ESTALE errors. The ENODEV errors are generated by the operating system until the daemon has restarted. The ESTALE error is generated by GPFS as soon as it restarts.
When quorum is lost, applications with open files receive an ESTALE error return code until the files are closed and reopened. New file open operations fail until quorum is restored and the file system is remounted. Applications that access these files before GPFS returns may receive an ENODEV return code from the operating system.
- The GPFS log contains the message:
  6027-650 [X] The mmfs daemon is shutting down abnormally.
Most GPFS daemon down error messages are in the mmfs.log.previous log for the instance that failed. If the daemon restarted, it generates a new mmfs.log.latest. Begin problem determination for these errors by examining the operating system error log.
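As a first check, the previous daemon log can be searched for the abnormal-shutdown message. The following is a minimal sketch, not an IBM-supplied tool. The function name `find_abnormal_shutdown` is hypothetical, and the default path /var/adm/ras/mmfs.log.previous is an assumption about where the GPFS logs are kept on the node; pass a different path if your installation differs.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that carry message 6027-650.
# Assumption: the log is /var/adm/ras/mmfs.log.previous unless a path is given.
find_abnormal_shutdown() {
    log="${1:-/var/adm/ras/mmfs.log.previous}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # Match either the message number or the message text.
    # -n prints line numbers so the surrounding entries can be inspected.
    grep -n -e '6027-650' -e 'mmfs daemon is shutting down abnormally' "$log"
}
```

The function returns 0 if the message is found, 1 if it is not, and 2 if the log cannot be read. Run it with no argument to use the assumed default path, or pass the path of the log to search.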
If an existing quorum is lost, GPFS stops all processing within the cluster to protect the integrity of your data. GPFS attempts to rebuild a quorum of nodes and remounts the file system if automatic mounts are specified.
- Open requests are rejected with no such file or no such directory errors.
When quorum has been lost, requests are rejected until the node has rejoined a valid quorum and mounted its file systems. If messages indicate lack of quorum, follow the procedures in GPFS daemon does not come up.
- Removing the setuid bit from the permission bits of any of the following IBM Storage Scale commands can produce errors for non-root users:
- mmdf
- mmgetacl
- mmlsdisk
- mmlsfs
- mmlsmgr
- mmlspolicy
- mmlsquota
- mmlssnapshot
- mmputacl
- mmsnapdir
- mmsnaplatest
If non-root users see the following message, check how permissions are set on the GPFS system-level versions of these commands (prefixed by ts):
6027-1209 GPFS is down on this node.
If the setuid bit is removed from the permission bits of any of the following system-level commands, non-root users cannot execute the command, and to those users the node appears to be down:
- tsdf
- tslsdisk
- tslsfs
- tslsmgr
- tslspolicy
- tslsquota
- tslssnapshot
- tssnapdir
- tssnaplatest
These are found in the /usr/lpp/mmfs/bin directory.
Note: The mode bits for all listed commands are 4555 or -r-sr-xr-x. To restore the default (shipped) permission, enter:
chmod 4555 tscommand
Attention: Only administration-level versions of GPFS commands (prefixed by mm) should be executed. Executing system-level commands (prefixed by ts) directly produces unexpected results.
- For all other errors, follow the procedures in Additional information to collect for GPFS daemon crashes, and then contact the IBM® Support Center.
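The setuid check on the system-level commands described above can be sketched as a small shell function. This is an illustrative sketch only: the function name `list_missing_setuid` is hypothetical, and the optional directory argument is an addition for convenience. The command names and the /usr/lpp/mmfs/bin directory are taken from the lists above. The function only inspects permission bits and never executes a ts command.

```shell
#!/bin/sh
# Hypothetical helper: list system-level commands whose setuid bit is missing.
# The default directory is /usr/lpp/mmfs/bin; another directory can be passed.
list_missing_setuid() {
    bindir="${1:-/usr/lpp/mmfs/bin}"
    for cmd in tsdf tslsdisk tslsfs tslsmgr tslspolicy tslsquota \
               tslssnapshot tssnapdir tssnaplatest; do
        path="$bindir/$cmd"
        # Skip commands that are not present in the directory.
        [ -e "$path" ] || continue
        # test -u is true only when the setuid bit is set.
        [ -u "$path" ] || echo "$cmd"
    done
}
```

Each name that is printed is a command whose shipped mode may need to be restored with the chmod command shown in the note above. No output means every listed command that is present still has its setuid bit.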