Topic
  • 7 replies
  • Latest Post - 2012-12-11T16:05:36Z by Just_Being_Frank
Just_Being_Frank
5 Posts

Pinned topic NSD in ready / unrecovered state, mmfsck runs clean

2012-08-13T17:58:13Z
At some point, not sure when, one of the two NSDs for device gpfs1 got into this condition.

#lxvcm1-1> mmlsdisk gpfs1 -L
disk      driver  sector  failure  holds     holds
name      type    size    group    metadata  data   status  availability  disk id  storage pool  remarks
gpfs1nsd  nsd     512     4001     yes       yes    ready   unrecovered   1        system        desc

From a previous support call, we were told that running an mmfsck would most likely clear this up.

Over the last weekend we had a chance to take down all 5 nodes in our GPFS replication setup (we use two nodes as the main write-only nodes and two as read-only replicas, and we have tiebreaker disks and a tiebreaker site node as well).

Ran mmfsck on our 2.3 TB filesystem and it completed cleanly after about 25-30 minutes, indicating no issues at all.
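
For reference, the rough sequence we followed over the weekend was something like this (from memory, so the exact options may have differed slightly):

#lxvcm1-1> mmumount gpfs1 -a     (unmount the filesystem on all nodes)
#lxvcm1-1> mmshutdown -a         (stop GPFS everywhere; we also rebooted each node)
#lxvcm1-1> mmstartup -a          (bring the daemons back up, filesystem still unmounted)
#lxvcm1-1> mmfsck gpfs1          (the offline check that came back clean)
#lxvcm1-1> mmmount gpfs1 -a      (remount on all nodes)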

We brought everything back online, and the disk still shows the same availability of unrecovered. Trying to start it just fails with the same error as before:

#lxvcm1-1> mmchdisk gpfs1 start -d gpfs1nsd
Scanning file system metadata, phase 1 ...
Error migrating log.
Inconsistency in file system metadata.
Initial disk state was updated successfully, but another error may have changed the state again.
mmchdisk: Command failed. Examine previous error messages to determine cause.

Nothing useful in the errpt output for us to go on either. The DS5000 unit and the SAN show no issues at all; everything is green.

I'm not a seasoned GPFS pro, just a learning and willing student... Any ideas on what to check next, or any commands that would shed more light on potential issues?
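
On the GPFS side I've mostly just been looking at the mmlsdisk output like the listing above; would double-checking the NSD-to-device mapping tell us anything here, e.g.:

#lxvcm1-1> mmlsnsd -m     (shows how each NSD maps to a local device on the NSD server nodes)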

I've looked through the forum, but it's hard to find a clear starting point with so much information. I'm still going through the docs to learn more as well.

Thanks!

Frank
Updated on 2012-12-11T16:05:36Z by Just_Being_Frank
  • dlmcnabb
    1012 Posts

    Re: NSD in ready / unrecovered state, mmfsck runs clean

    2012-08-13T20:46:44Z
    You need to run "mmchdisk $fsname start -a" to move unrecovered disks from unrecovered to up.
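
    For your filesystem that would be something like:

    mmchdisk gpfs1 start -a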
  • Just_Being_Frank
    5 Posts

    Re: NSD in ready / unrecovered state, mmfsck runs clean

    2012-08-13T21:09:06Z
    • dlmcnabb
    • 2012-08-13T20:46:44Z
    You need to run "mmchdisk $fsname start -a" to move unrecovered disks from unrecovered to up.
    I get the same error if I run that or the command I mentioned.

    Scanning file system metadata, phase 1 ...
    Error migrating log.
    Inconsistency in file system metadata.
    Initial disk state was updated successfully, but another error may have changed the state again.
    mmchdisk: Command failed. Examine previous error messages to determine cause.

    So either one, mmchdisk gpfs1 start -a or mmchdisk gpfs1 start -d gpfs1nsd, returns the same error. And again, mmfsck has run clean on the filesystem and there are no errors in the error log at all...?

    I'm stumped; it seems something is inconsistent in the metadata, or this is possibly a bug? From what I can tell it hasn't prevented us from using the filesystem or the disk... we hoped that when we shut down GPFS, rebooted all the nodes, ran the mmfsck, etc., it would go away, but no luck.
  • dlmcnabb
    1012 Posts

    Re: NSD in ready / unrecovered state, mmfsck runs clean

    2012-08-14T06:07:47Z
    • Just_Being_Frank
    • 2012-08-13T21:09:06Z
    I get the same error if I run that or the command I mentioned.

    Scanning file system metadata, phase 1 ...
    Error migrating log.
    Inconsistency in file system metadata.
    Initial disk state was updated successfully, but another error may have changed the state again.
    mmchdisk: Command failed. Examine previous error messages to determine cause.

    So either one, mmchdisk gpfs1 start -a or mmchdisk gpfs1 start -d gpfs1nsd, returns the same error. And again, mmfsck has run clean on the filesystem and there are no errors in the error log at all...?

    I'm stumped; it seems something is inconsistent in the metadata, or this is possibly a bug? From what I can tell it hasn't prevented us from using the filesystem or the disk... we hoped that when we shut down GPFS, rebooted all the nodes, ran the mmfsck, etc., it would go away, but no luck.
    There have been a few log migration fixes recently. Please upgrade to latest service level for your release.
  • FelipeKnop
    33 Posts

    Re: NSD in ready / unrecovered state, mmfsck runs clean

    2012-08-15T15:41:30Z
    • dlmcnabb
    • 2012-08-14T06:07:47Z
    There have been a few log migration fixes recently. Please upgrade to latest service level for your release.
    This seems to match a problem which has been fixed in 3.4.0.15.
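
    To confirm the level currently installed before planning the upgrade, the usual checks (depending on platform) are something like:

    lslpp -l gpfs.base       (AIX)
    rpm -qa | grep gpfs      (Linux)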

    Felipe
  • Just_Being_Frank
    5 Posts

    Re: NSD in ready / unrecovered state, mmfsck runs clean

    2012-08-15T17:01:27Z
    • dlmcnabb
    • 2012-08-14T06:07:47Z
    There have been a few log migration fixes recently. Please upgrade to latest service level for your release.
    Thanks! We will try the upgrade on our next maintenance window in Sept.
  • Just_Being_Frank
    5 Posts

    Re: NSD in ready / unrecovered state, mmfsck runs clean

    2012-12-11T16:05:36Z
    The answer to the issue was to upgrade to fix level 15 (3.4.0.15) or above. We finally got the upgrade to fix level 17 done this past weekend and were able to resolve the unrecovered disk problem, which turned out to be only a bug in the metadata logs; there was never a real issue with the data or metadata.

    However, keep in mind that the cluster will have trouble with the filesystem if problems with other disks in the filesystem reach a quorum majority! The filesystem will then become unusable. In our case there were only two disks, and when the other disk had an issue, it took the filesystem down.
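
    (For anyone who hits this later: the disks that hold a copy of the file system descriptor, and therefore count toward that majority, are flagged with "desc" in the remarks column of mmlsdisk, as in the listing in my first post:)

    #lxvcm1-1> mmlsdisk gpfs1 -L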

    Thanks for the info, Dan McNabb and Dr. Felipe Knop!!