Amy.S
14 Posts

Pinned topic Node down when disk down?

‏2013-12-10T18:34:34Z |

I have a configuration with 2 nodes and a single disk attached directly to each node. After rebooting node mygpfs2, ./mmgetstate showed the node as down, and ./mmstartup would not start it. Judging from the log files under /var/adm/ras/, this seems related to a previously created NSD whose server was the mygpfs2 node. Does this mean that a node with a singly attached disk that is down will itself remain down? When I deleted the NSD and ran mmstartup on mygpfs2, the node did start.

Module                  Size  Used by
mmfs26               1806448  0
mmfslinux             319559  1 mmfs26
tracedev               43757  2 mmfs26,mmfslinux
Mon Dec  9 17:35:30.561 2013: mmfsd initializing. {Version: 3.5.0.13   Built: Sep 30 2013 15:06:09} ...
Mon Dec  9 17:35:31.417 2013: The defined server mygpfs2 for NSD mynsd2 couldn't be resolved.
Mon Dec  9 17:35:31.426 2013: mmfsd is shutting down.
Mon Dec  9 17:35:31.427 2013: Reason for shutdown: Could not initialize network shared disks
Mon Dec  9 17:35:31 EST 2013: mmcommon mmfsdown invoked.  Subsystem: mmfs Status: active

Could you please help me understand why resolving the NSD's server failed, and what this has to do with starting the node?

 

thanks,

Amy

 

  • yuri
    210 Posts

    Re: Node down when disk down?

    ‏2013-12-11T00:12:08Z  

    In this case mmfsd refuses to start because of an apparent DNS config problem.  GPFS hostnames must be static, so if you're using something like DDNS, that won't work well.
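A quick way to check the name-resolution side of this is to try resolving each cluster hostname on every node. The following is only a sketch of that check, not anything GPFS itself runs: the helper name `unresolved_hosts` is made up for illustration, and it only tests what the operating system's resolver returns.

```python
import socket

def unresolved_hosts(hostnames, resolver=socket.gethostbyname):
    """Return the hostnames that the given resolver cannot resolve.

    `resolver` defaults to the system resolver; it is a parameter only so
    the check can be exercised without touching the network.
    """
    failed = []
    for name in hostnames:
        try:
            resolver(name)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError
            failed.append(name)
    return failed
```

For example, calling `unresolved_hosts(["mygpfs1", "mygpfs2"])` on each node (mygpfs1 is an assumed name for the other node) should return an empty list if both names resolve there.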

    More generally, this isn't a fault-tolerant config.  The default GPFS quorum model is "majority quorum": with N quorum nodes defined, N/2 + 1 of them (rounding N/2 down) must be up to reach quorum.  With 2 nodes, this means both nodes must be up, and if one node goes down, quorum is lost.  In order to have a fault-tolerant two-node cluster, one must use "minority quorum" (what the GPFS documentation calls node quorum with tiebreaker disks), i.e. define a tiebreaker disk, which must be visible (as a local block device) on both nodes.  This can't be done if each disk is only visible to one node.
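The majority-quorum arithmetic above can be worked through in a few lines. This is just a sketch of the N/2 + 1 rule as stated in the post, with integer division assumed; the function names are made up for illustration and are not GPFS commands or APIs.

```python
def majority_quorum(quorum_nodes: int) -> int:
    """Quorum nodes that must be up under the N/2 + 1 rule (N/2 rounded down)."""
    if quorum_nodes < 1:
        raise ValueError("need at least one quorum node")
    return quorum_nodes // 2 + 1

def tolerated_failures(quorum_nodes: int) -> int:
    """How many quorum nodes can be down while quorum is still held."""
    return quorum_nodes - majority_quorum(quorum_nodes)

# Print the rule for small clusters: nodes, needed for quorum, failures tolerated.
for n in (1, 2, 3, 4, 5):
    print(n, majority_quorum(n), tolerated_failures(n))
```

With 2 quorum nodes the rule requires both, so no failure is tolerated; 3 quorum nodes is the smallest count that survives the loss of one node without a tiebreaker disk.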

    yuri

  • Amy.S
    14 Posts

    Re: Node down when disk down?

    ‏2013-12-11T13:59:09Z  

    Thank you Yuri. I was trying to explore various GPFS functionality using my toy cluster, but I suppose a more realistic setup would be beneficial for my purposes as well.