  • Latest Post - ‏2008-08-29T18:36:24Z by dlmcnabb
SystemAdmin

Pinned topic mmfsck errors

2008-08-28T18:18:29Z
Hi,

We had a problem where a GPFS file system suddenly went down, with messages similar to these on the NSDs:

Wed Aug 27 17:01:21 2008: File system eliza12 unmounted because it does not have a manager.
Wed Aug 27 17:01:21 2008: Log recovery failed.
Wed Aug 27 17:01:21 2008: File System eliza12 unmounted by the system with return code 212 reason code 225
Wed Aug 27 17:01:21 2008: The current file system manager failed and no new manager will be appointed.

Aug 27 17:01:21 pc1420 mmfs: Error=MMFS_SYSTEM_UNMOUNT, ID=0xC954F85D, Tag=8994479: Unrecoverable file system operation error. Status code 212. Volume eliza12

On the manager node we see:
Wed Aug 27 17:01:21 2008: Recovery Log I/O Failed, Unmounting file system eliza12
Wed Aug 27 17:01:21 2008: Inconsistency in file system metadata.
Wed Aug 27 17:01:21 2008: File System eliza12 unmounted by the system with return code 234 reason code 225
Wed Aug 27 17:01:21 2008: Inconsistency in file system metadata.

The file system would not mount, so we ran mmfsck, but partway through the run it fails with errors such as:

486087 0 0 0 0 1 1 0x40000000 InodeUnavail
486088 0 0 0 0 1 1 0x40000000 InodeUnavail
486089 0 0 0 0 1 1 0x40000000 InodeUnavail
486090 0 0 0 0 1 1 0x40000000 InodeUnavail
486091 0 0 0 0 1 1 0x40000000 InodeUnavail

Limit of 5460 problems exceeded.
Re-execute command to find additional problems.

File system check has ended prematurely.
File system contains unrepaired damage.
Exit status 666:4:10.
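
Because mmfsck stops once it hits its problem limit, the raw listing is long and repetitive. A minimal sketch for condensing a saved copy of that output is below. The column meanings are not stated in this thread, so the sketch assumes only that the first field is a numeric identifier (possibly the inode number) and the last field is the problem label; `summarize_problems` is a hypothetical helper name, not part of GPFS.

```python
def summarize_problems(lines):
    """Condense mmfsck problem lines into {label: (count, lowest_id, highest_id)}.

    Assumption: a problem line starts with a decimal number and ends with an
    alphabetic label such as "InodeUnavail". All other lines are ignored.
    """
    summary = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, banners, and status messages.
        if len(fields) < 2 or not fields[0].isdigit() or not fields[-1].isalpha():
            continue
        number, label = int(fields[0]), fields[-1]
        count, low, high = summary.get(label, (0, number, number))
        summary[label] = (count + 1, min(low, number), max(high, number))
    return summary
```

Feeding it the lines of a saved mmfsck log shows, for each problem label, how many entries were reported and the range of identifiers they cover, which makes it easier to see whether the reported problems are clustered or scattered.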

The actual storage (fibre channel/SATA) shows no errors at all, and we don't see any connectivity errors on the nodes that the storage is hooked up to.

Any ideas on what we can try next? We have already rerun the command several times and it always fails in the same way.

We are running GPFS 3.1.0-8 on x86-64.

Thanks,

Jay
  • dlmcnabb

    Re: mmfsck errors

    ‏2008-08-28T20:06:04Z  
    What kind of controller do you have on these disks? DS4K?
  • SystemAdmin

    Re: mmfsck errors

    ‏2008-08-28T20:20:30Z  
    Hi Dan,

    No, we have a mixture of 24-bay and 16-bay storage units from Infortrend, Inc., who also make the F/C controllers in them.

    Jay
  • dlmcnabb

    Re: mmfsck errors

    ‏2008-08-29T18:36:24Z  
    You should open a PMR through IBM service.

    My first recommendation is not to run mmfsck until you can determine whether this is permanent corruption on disk or just bad data coming from the controller. (I have seen the latter over the last few weeks.)

    Have you looked in the system error logs for problems reported by GPFS, or for other problems? On AIX, gather all the "errpt -a" output; on Linux, gather the /var/log/messages* files.

    If there are FSSTRUCT errors being reported, it might help to see whether all the errors are coming from one particular disk.
    Use /usr/lpp/mmfs/samples/debugtools/fsstructlx.awk on /var/log/messages* files to decode the FSSTRUCT sense data.
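
As a quick first pass before decoding the sense data, the mmfs entries in the Linux logs can be counted per node and per error name. This is only a rough sketch: it assumes the syslog line layout shown earlier in this thread ("... pc1420 mmfs: Error=MMFS_SYSTEM_UNMOUNT, ID=..., Tag=...:"), the function names are made up for illustration, and it does not decode FSSTRUCT sense data or identify disks, which is what fsstructlx.awk is for.

```python
import glob
import re
from collections import Counter

# Assumed line layout, taken from the sample in this thread:
# "Aug 27 17:01:21 pc1420 mmfs: Error=MMFS_SYSTEM_UNMOUNT, ID=0xC954F85D, Tag=8994479: ..."
MMFS_LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+mmfs(?:\[\d+\])?:\s+Error=(?P<error>\w+),"
)

def tally_mmfs_errors(lines):
    """Count mmfs error-log entries by (host, error name)."""
    counts = Counter()
    for line in lines:
        match = MMFS_LINE.match(line)
        if match:
            counts[(match.group("host"), match.group("error"))] += 1
    return counts

def read_lines(pattern):
    """Yield lines from every uncompressed file matching a glob pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            yield from handle
```

For example, `tally_mmfs_errors(read_lines("/var/log/messages*"))` run on each node gives a count per error name. Rotated logs that have been compressed would need to be unpacked first.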