Topic
  • 12 replies
  • Latest Post - 2014-01-18T12:21:52Z by chr78
chr78
138 Posts

Pinned topic mmadddisk fails with err: 34

2013-12-15T12:48:43Z

I am trying to understand why mmadddisk fails when adding 6 NSDs for a single new storage pool. It fails with:

Command: err 34: tsadddisk ...

Numerical result out of range

 

cheers.
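For reference, the two output lines above describe a single error: on Linux, errno 34 is ERANGE, and "Numerical result out of range" is the C library's message text for it. Below is a minimal Python sketch for translating such a bare number into its symbolic name and message using the standard errno and os modules. The helper name describe_errno is hypothetical. Errno numbering and wording are platform-specific, so the sketch reports the values for whichever machine it runs on.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'SYMBOL: message' for a numeric errno, e.g. the 'err 34' from tsadddisk."""
    # errno.errorcode maps numbers to symbolic names on the current platform.
    symbol = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the C library's message text for the same number.
    return f"{symbol}: {os.strerror(code)}"


if __name__ == "__main__":
    # On Linux this prints "ERANGE: Numerical result out of range".
    print(describe_errno(34))
```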

  • ezhong
    33 Posts

    Re: mmadddisk fails with err: 34

    2013-12-17T03:41:57Z

    If you can recreate the problem, please open a PMR and work with IBM Service to collect GPFS debug data covering the failure event. IBM Service can then analyze that data to determine why the mmadddisk command failed in your case.

  • yuri
    282 Posts

    Re: mmadddisk fails with err: 34

    2013-12-17T17:17:35Z

    When reporting a problem, it's always a good idea to supply up front what you already know will be asked for: debug data. ERANGE (error 34) is far too common an error for the bare code to pinpoint the cause. Would you care to supply the descriptor file used with mmadddisk, the current file system layout (mmlsdisk and mmdf output), and the mmadddisk console output?

    yuri
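Below is a minimal Python sketch for gathering the data requested above. It assumes the file system device name is known and that the GPFS commands are on PATH. The helper names, output file names, and directory layout are hypothetical; only mmlsdisk and mmdf come from the thread. The command runner is a parameter so the commands can be swapped out or dry-run. The mmadddisk console output still has to be captured from the failing run itself, so the sketch deliberately does not rerun mmadddisk.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List


def debug_commands(fs_name: str) -> Dict[str, List[str]]:
    """Map output file names to the layout commands requested in the thread."""
    return {
        "mmlsdisk.txt": ["mmlsdisk", fs_name],
        "mmdf.txt": ["mmdf", fs_name],
    }


def run_command(cmd: List[str]) -> str:
    """Run one command and return its stdout followed by its stderr."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr


def collect(fs_name: str, out_dir: Path, descriptor: Path,
            runner: Callable[[List[str]], str] = run_command) -> List[Path]:
    """Save each command's output plus a copy of the descriptor file into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, cmd in debug_commands(fs_name).items():
        path = out_dir / filename
        path.write_text(runner(cmd))
        written.append(path)
    # Include the exact descriptor (or stanza) file that was passed to mmadddisk.
    copy = out_dir / descriptor.name
    copy.write_text(descriptor.read_text())
    written.append(copy)
    return written
```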

  • chr78
    138 Posts

    Re: mmadddisk fails with err: 34

    2013-12-17T20:22:48Z

    Hi Yuri,

    you'll find the output attached. I also tried the very same thing using the new stanza format.

    cheers.

    Attachments
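For the stanza-format attempt, here is a minimal Python sketch that writes one %nsd stanza per NSD, all assigned to a single new pool. The NSD names, pool name, and failure group are hypothetical, and placing all six NSDs in one failure group is a simplification. The sketch also enforces the rule that only the system pool can hold metadata, so disks in a new pool cannot be metadataOnly or dataAndMetadata.

```python
from typing import List

VALID_USAGE = {"dataOnly", "metadataOnly", "dataAndMetadata", "descOnly"}


def nsd_stanzas(nsds: List[str], pool: str, failure_group: int,
                usage: str = "dataOnly") -> str:
    """Build %nsd stanzas that put every listed NSD into one storage pool."""
    if usage not in VALID_USAGE:
        raise ValueError(f"unknown usage: {usage}")
    # Metadata may only live in the system pool.
    if pool != "system" and usage in {"metadataOnly", "dataAndMetadata"}:
        raise ValueError("only the system pool can hold metadata")
    blocks = []
    for name in nsds:
        blocks.append(
            "%nsd:\n"
            f"  nsd={name}\n"
            f"  usage={usage}\n"
            f"  failureGroup={failure_group}\n"
            f"  pool={pool}\n"
        )
    # Separate consecutive stanzas with a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Six hypothetical NSD names, matching the "6 nsds for a single new pool" case.
    names = [f"newnsd{i}" for i in range(1, 7)]
    print(nsd_stanzas(names, pool="pool2", failure_group=10))
```

Save the output to a file and pass it to mmadddisk with -F (for example, mmadddisk fsname -F newpool.stanza).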

  • yuri
    282 Posts

    Re: mmadddisk fails with err: 34

    2013-12-17T23:14:28Z

    What about the cluster config? What version of the code is running, and on which nodes? In particular, what level is running on the file system manager node? Is there anything else in the fsmgr mmfs.log besides the tsadddisk error code?

    yuri

  • chr78
    138 Posts

    Re: mmadddisk fails with err: 34

    2013-12-18T08:32:23Z

    The cluster (about 100 nodes) is at the 3.5.0.7 config level and runs the 3.5.0.10 code; the nodes with the new disks attached are already at 3.5.0.15. Both the cluster manager and the file system manager run on the node that runs mmadddisk. There is nothing further in the logs. I can easily create a new file system as long as I remove the pool name and the disk usage from the descriptor file.

    I will have a full downtime soon, and will completely upgrade to 3.5.0.15 then. If things don't change, I'll open a PMR.

    thanks again.

  • yuri
    282 Posts

    Re: mmadddisk fails with err: 34

    2013-12-18T17:17:59Z

    Something clearly isn't working correctly: your descriptor file looks OK to me, but evidently it's not being interpreted correctly. A PMR would indeed be appropriate. We'd need to see a trace from fsmgr covering the failing command execution.

    It's still not clear to me what level of code is running on the fsmgr node.  For troubleshooting purposes, one thing you can try is moving fsmgr temporarily to a node running a different service level of GPFS.

    yuri

  • chr78
    138 Posts

    Re: mmadddisk fails with err: 34

    2013-12-18T17:30:13Z

    I actually first tried with fsmgr on the .15 release and then moved it to a node with the .10 release. The problem occurs in both cases, with the same error. Nevertheless, let's see what happens next weekend, once the whole cluster is on the .15 release.

    cheers.

  • chr78
    138 Posts

    Re: mmadddisk fails with err: 34

    2013-12-22T12:46:11Z

    Even after bringing the whole cluster to the .15 release, mmadddisk failed. I tried to run mmfsck on the file system, but it sat stuck at 69% for more than 3 hours, so I terminated it. Because additional disk space was urgently needed, I tried adding the new disks to the system pool instead, and that succeeded.

    Experimenting with some spare disks in the cluster, every attempt to add disks to a new pool fails.

    I'll collect debug data and open a PMR, but most likely not until next year.

    cheers.

  • GongWei@CN
    37 Posts

    Re: mmadddisk fails with err: 34

    2014-01-03T14:18:31Z

    It looks like the file system was upgraded from an old version (10.01, i.e. 3.2.1.5), and the fastea feature is not enabled. I'm not sure whether that is related.

  • chr78
    138 Posts

    Re: mmadddisk fails with err: 34

    2014-01-07T09:46:57Z

    I had a similar thought when I had the downtime, but enabling fast EAs didn't improve the situation.

    cheers.

  • yuri
    282 Posts

    Re: mmadddisk fails with err: 34

    2014-01-17T18:27:47Z

    My L2 colleague has pointed out that these symptoms were seen in another PMR. The problem occurs on file systems that were created with an earlier version of GPFS and then migrated to 3.5. The GPFS code gets confused about the settings of some FPO-related parameters, because those are not explicitly set in this scenario (it's a bug). Explicitly setting those parameters effectively fixes the problem; the --block-group-factor setting in particular can be problematic. You can check it with "mmlsfs fsname --snc". If the "--block-group-factor" field is zero, run "mmchfs fsname --block-group-factor 1". mmadddisk should work afterwards.

    yuri
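Below is a minimal Python sketch of the check-then-fix step described above. It assumes mmlsfs prints one flag per row, with the flag name in the first column and its value in the second. That layout is an assumption, not something taken from this thread. The sketch only returns the mmchfs command quoted above for review; it does not run anything.

```python
from typing import List, Optional


def flag_value(mmlsfs_output: str, flag: str) -> Optional[str]:
    """Return the value column for `flag` in mmlsfs-style tabular output, or None."""
    for line in mmlsfs_output.splitlines():
        fields = line.split()
        # Assumed row layout: "<flag> <value> <description...>".
        if len(fields) >= 2 and fields[0] == flag:
            return fields[1]
    return None


def block_group_fix(fs_name: str, mmlsfs_output: str) -> Optional[List[str]]:
    """Return the workaround's mmchfs command if --block-group-factor is 0, else None."""
    value = flag_value(mmlsfs_output, "--block-group-factor")
    if value == "0":
        return ["mmchfs", fs_name, "--block-group-factor", "1"]
    # Already non-zero, or the flag was not found: nothing to suggest.
    return None
```

Keeping the parsing separate from execution means the suggested command can be inspected before anyone runs it against a production file system.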

  • chr78
    138 Posts

    Re: mmadddisk fails with err: 34

    2014-01-18T12:21:52Z

    thanks, Yuri - you're great as usual - that did the trick.

    cheers.