Topic
  • 10 replies
  • Latest Post - 2013-02-11T22:38:43Z by truongv
SystemAdmin
2092 Posts

Pinned topic mmcrfs is failing with "No such device", and "Error accessing disks"

2013-01-30T19:15:47Z
Hey all, I'm pretty new to GPFS, so bear with me.

I'm trying to set up a 2-node cluster (both quorum). Both machines have the OS installed on sda and a free disk on sdb, so pretty simple, right? I've been using guides such as this one, http://www.ibm.com/developerworks/wikis/display/hpccentral/gpfs+quick+start+guide+for+linux, and everything goes well right up until I need to run mmcrfs. Both machines have passwordless ssh access set up, and I've even tried chmod'ing the block device under /dev (I don't know if this was a good idea or not...)

Anyway, here is all the output that might help. I've been at this for about 2 days with no luck, so anything is a huge help. I hope it's just some simple issue.

Output of mmcrfs

mmcrfs gpfs1 -F diskdef.txt -A yes -T /gpfs
Unable to open disk 'gpfs4nsd' on node ITE5.
No such device
Error accessing disks.
mmcrfs: tscrfs failed.  Cannot create gpfs1
mmcrfs: Command failed.  Examine previous error messages to determine cause.


Output of mmlsnsd -M

 Disk name    NSD volume ID      Device         Node name                Remarks
---------------------------------------------------------------------------------------
 gpfs3nsd     C0A87A1A5106F8BB   /dev/sdb       ITE5                     server node
 gpfs4nsd     000000005106F8A6   /dev/sdb       ITE6                     server node


Output of tspreparedisk -S from both nodes

C0A87A1A5106F8BB /dev/sdb generic tspreparedisk:0::::0:0::
000000005106F8A6 /dev/sdb generic tspreparedisk:0::::0:0::


Output of diskdef.txt

# /dev/sdb:ITE5::::
gpfs3nsd:::dataAndMetadata:4001::system
# /dev/sdb:ITE6::::
gpfs4nsd:::dataAndMetadata:4002::system


That's all I can think of. If anything else is needed please let me know.
Updated on 2013-02-11T22:38:43Z by truongv
  • SystemAdmin
    2092 Posts

    Re: mmcrfs is failing with "No such device", and "Error accessing disks"

    2013-01-30T19:22:01Z
    Is GPFS actually up on both nodes, per mmgetstate -a?

    Whenever something goes wrong with GPFS, the first place you need to look is the GPFS log, /var/adm/ras/mmfs.log.latest, on all nodes involved. Attach the logs here when asking for help.

    yuri
  • SystemAdmin
    2092 Posts

    Re: mmcrfs is failing with "No such device", and "Error accessing disks"

    2013-01-30T19:26:34Z
    Hi there, here is the info you need.

    Output of mmgetstate -a
    
     Node number  Node name        GPFS state
    ------------------------------------------
           1      ITE5             active
           2      ITE6             active
    


    Most recent entries in mmfs.log.latest
    
    Wed Jan 30 14:09:38.778 2013: Node 192.168.122.26 (ITE5) appointed as manager for gpfs1.
    Wed Jan 30 14:09:38.779 2013: Command: tscrfs /dev/gpfs1 -F /var/mmfs/tmp/tsddFile.mmcrfs.4134 -I 16384 -i 512 -M 2 -n 32 -R 2 -w 0
    Wed Jan 30 14:09:39.621 2013: Command: err 19: tscrfs /dev/gpfs1 -F /var/mmfs/tmp/tsddFile.mmcrfs.4134 -I 16384 -i 512 -M 2 -n 32 -R 2 -w 0
    Wed Jan 30 14:09:39.622 2013: No such device
    Wed Jan 30 14:09:39.625 2013: Node 192.168.122.26 (ITE5) resigned as manager for gpfs1.
    Wed Jan 30 14:09:39.626 2013: File system has been deleted.
    
  • SystemAdmin
    2092 Posts

    Re: mmcrfs is failing with "No such device", and "Error accessing disks"

    2013-01-30T19:32:35Z
    Sorry, overlooked this in the original post: it looks like you don't have NSD servers defined for either NSD, which GPFS takes to mean that both disks are visible directly on both nodes (as would be the case with a SAN). Use mmchnsd or mmdelnsd/mmcrnsd to re-define the NSDs with a primary server.

    yuri
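
    For reference, the mmdelnsd/mmcrnsd route would look roughly like the sketch below, using the NSD names from this thread (it assumes diskdef.txt has been edited so each descriptor names its NSD server):

    # delete the server-less NSDs (they must not belong to a file system yet)
    mmdelnsd "gpfs3nsd;gpfs4nsd"
    # re-create them from descriptors whose second field names an NSD server
    mmcrnsd -F diskdef.txt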
  • SystemAdmin
    2092 Posts

    Re: mmcrfs is failing with "No such device", and "Error accessing disks"

    2013-01-30T19:37:57Z
    I'm not quite sure what I mean.. here is some output from mmlscluster; it says ITE5 is the primary node, with ITE6 being a secondary node. Is this different from a primary NSD server? If I do indeed have a primary and secondary set up, is something maybe wrong in diskdef.txt?
    
    GPFS cluster information
    ========================
      GPFS cluster name:         ITE5
      GPFS cluster id:           13882480104816360238
      GPFS UID domain:           ITE5
      Remote shell command:      /usr/bin/ssh
      Remote file copy command:  /usr/bin/scp

    GPFS cluster configuration servers:
    -----------------------------------
      Primary server:    ITE5
      Secondary server:  ITE6

     Node  Daemon node name            IP address       Admin node name             Designation
    -----------------------------------------------------------------------------------------------
       1   ITE5                        192.168.122.26   ITE5                        quorum
       2   ITE6                        192.168.122.25   ITE6                        quorum
    
  • SystemAdmin
    2092 Posts

    Re: mmcrfs is failing with "No such device", and "Error accessing disks"

    2013-01-30T20:01:35Z
    Primary/secondary configuration server nodes listed by mmlscluster are entirely unrelated to NSD server definitions. You need to specify an NSD server when creating an NSD for a disk device that is visible from some but not all nodes, using the second field of the NSD descriptor. Please see the 'mmcrnsd' man page.

    yuri
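
    For GPFS 3.4, the mmcrnsd disk descriptor is colon-separated, roughly DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName:StoragePool. A sketch of a diskdef.txt naming the servers for this cluster (values taken from this thread; no backup servers, since each disk is local to one node):

    # DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName:StoragePool
    /dev/sdb:ITE5::dataAndMetadata:4001:gpfs3nsd:system
    /dev/sdb:ITE6::dataAndMetadata:4002:gpfs4nsd:system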
  • truongv
    81 Posts

    Re: mmcrfs is failing with "No such device", and "Error accessing disks"

    2013-01-30T20:02:33Z
    What Yuri means is that your mmlsnsd output should look similar to this:
    
    File system   Disk name    NSD servers
    ---------------------------------------------------------------------------
    gpfs1         gpfs3nsd     ITE5
    gpfs1         gpfs4nsd     ITE6
    

    You can use the mmchnsd command to change the NSD servers:
    
    mmchnsd "gpfs3nsd:ITE5;gpfs4nsd:ITE6"
    
  • SystemAdmin
    2092 Posts

    Re: mmcrfs is failing with "No such device", and "Error accessing disks"

    2013-01-30T20:22:57Z
    Thanks for your help everyone, but I still seem to be hitting the same issue. After running
    
    mmchnsd "gpfs3nsd:ITE5;gpfs4nsd:ITE6"
    

    as truongv suggested, I tried to recreate the filesystem, but I am receiving the same error. My first thought was that the diskdef.txt file might be wrong (since some config was changed), but even with the NSD servers in diskdef.txt it's still trying to open gpfs4nsd on ITE5, which is obviously impossible.

    Here is the output of mmlsnsd after the NSDs have been given appropriate servers:
    
    File system   Disk name    NSD servers
    ---------------------------------------------------------------------------
    (free disk)   gpfs3nsd     ITE5
    (free disk)   gpfs4nsd     ITE6
    


    Here is once again the error I'm getting, followed by the most recent entry in mmfs.log.latest
    
    mmcrfs gpfs1 -F diskdef.txt -A yes -T /gpfs
    Unable to open disk 'gpfs4nsd' on node ITE5.
    No such device
    Error accessing disks.
    mmcrfs: tscrfs failed.  Cannot create gpfs1
    mmcrfs: Command failed.  Examine previous error messages to determine cause.
    


    
    Wed Jan 30 15:21:22.489 2013: Node 192.168.122.26 (ITE5) appointed as manager for gpfs1.
    Wed Jan 30 15:21:22.490 2013: Command: tscrfs /dev/gpfs1 -F /var/mmfs/tmp/tsddFile.mmcrfs.6607 -I 16384 -i 512 -M 2 -n 32 -R 2 -w 0
    Wed Jan 30 15:21:23.329 2013: Command: err 19: tscrfs /dev/gpfs1 -F /var/mmfs/tmp/tsddFile.mmcrfs.6607 -I 16384 -i 512 -M 2 -n 32 -R 2 -w 0
    Wed Jan 30 15:21:23.330 2013: No such device
    Wed Jan 30 15:21:23.333 2013: Node 192.168.122.26 (ITE5) resigned as manager for gpfs1.
    Wed Jan 30 15:21:23.334 2013: File system has been deleted.
    


    What I think is weird is that the log seems to be complaining that no such device "gpfs1" exists, when the man page for mmcrfs specifically says that the device must not already exist under /dev. Maybe I'm reading it wrong.
  • truongv
    81 Posts

    Re: mmcrfs is failing with "No such device", and "Error accessing disks"

    2013-01-30T22:34:15Z
    I would try:
    • dd-read the raw device /dev/sdb to make sure you can see the disk (see the sketch below)
    • create the filesystem on just one disk and see which one is good/bad
    • run each command on the respective NSD server to see if it makes any difference

    Do you have a /var/mmfs/etc/nsddevices user exit script?
    Can you show the output of mmdevdiscover on both nodes? Also the /var/mmfs/gen/mmsdrfs file.
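
    A minimal sketch of that dd check, assuming /dev/sdb as in this thread (the second 512-byte sector is where the NSD descriptor is written, as noted later in this thread):

    # on each node: confirm the raw device is readable at all
    dd if=/dev/sdb of=/dev/null bs=1M count=16
    # dump the second sector to look for the NSD id GPFS wrote there
    dd if=/dev/sdb bs=512 skip=1 count=1 | od -c | head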
  • SystemAdmin
    2092 Posts

    Re: mmcrfs is failing with "No such device", and "Error accessing disks"

    2013-02-11T16:29:07Z
    Hey there, sorry for the super late response. I was sick recently.

    I managed to create a filesystem on one node. I then tried adding the disk from the other node to the filesystem, but it's failing in a similar way.

    
    Unable to open disk 'gpfs4nsd' on node ITE5.
    No such device
    Error processing disks.
    mmadddisk: tsadddisk failed.
    Verifying file system configuration information ...
    mmadddisk: Propagating the cluster configuration data to all affected nodes.  This is an asynchronous process.
    mmadddisk: Command failed.  Examine previous error messages to determine cause.
    


    What I find weird is that the diskdef.txt file doesn't say gpfs4nsd is even on ITE5; i.e., it says:
    
    gpfs4nsd:ITE6::dataAndMetadata:4002::system
    


    If I try running the disk add command from the other server it fails to obtain a lock from ITE5.

    I can dd-read the raw devices; in fact, I've verified that the GPFS tag is written to the second sector (like it says in the documentation).

    There is no nsddevices exit script.
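
    (Context for that question: a /var/mmfs/etc/nsddevices user exit, when present, tells GPFS which block devices to probe for NSDs. A minimal sketch, assuming the /dev/sdb device from this thread:)

    #!/bin/ksh
    # /var/mmfs/etc/nsddevices (sketch): list "device deviceType" pairs to probe
    echo "sdb generic"
    # exit non-zero so GPFS also continues with its built-in device discovery
    exit 1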

    mmdevdiscover on ITE5
    
    sdb generic
    sda generic
    sda1 generic
    


    mmdevdiscover on ITE6
    
    sda generic
    sda1 generic
    sdb generic
    


    I don't know if it's still helpful, but here are the contents of the mmsdrfs file:
    
    %%9999%%:00_VERSION_LINE::1210:3:13::lc:ITE5:ITE6:4:/usr/bin/ssh:/usr/bin/scp:13882480104816360238:lc2:1359067957::ITE5:0:0:0:0::::central:0.0:
    %%home%%:03_COMMENT::1:
    %%home%%:03_COMMENT::2:    This is a machine generated file.  Do not edit!
    %%home%%:03_COMMENT::3:
    %%home%%:03_COMMENT::4:2013.01.24.17.52.37:2:1:mmcrcluster -N ITE5:quorum,ITE6:quorum -p ITE5 -s ITE6 -r /usr/bin/ssh -R /usr/bin/scp
    %%home%%:03_COMMENT::5:2013.01.24.17.54.15:3:1:mmchlicense server --accept -N ITE5,ITE6
    %%home%%:03_COMMENT::6:2013.01.24.18.13.26:4:1:mmcrnsd -F /root/diskdef.txt
    %%home%%:03_COMMENT::7:2013.01.28.16.49.40:5:1:mmdelnsd gpfs1nsd
    %%home%%:03_COMMENT::8:2013.01.28.16.49.56:6:1:mmdelnsd gpfs2nsd
    %%home%%:03_COMMENT::9:2013.01.28.17.16.30:7:1:mmcrnsd -F diskdef.txt
    %%home%%:03_COMMENT::10:2013.01.30.15.08.00:8:1:mmchnsd gpfs3nsd:ITE5;gpfs4nsd:ITE6
    %%home%%:03_COMMENT::11:2013.01.30.18.16.56:9:1:mmcrfs gpfs1 -F diskdef.txt -A yes -T /gpfs
    %%home%%:03_COMMENT::12:2013.01.30.20.37.01:10:1:mmadddisk gpfs1 -F diskdef.txt:pre-commit
    %%home%%:03_COMMENT::13:2013.01.30.20.37.11:11:1:mmadddisk gpfs1 -F diskdef.txt:ts_failed
    %%home%%:03_COMMENT::14:2013.02.11.11.21.57:12:1:mmadddisk gpfs1 -F diskdef.txt:pre-commit
    %%home%%:03_COMMENT::15:2013.02.11.11.22.09:13:1:mmadddisk gpfs1 -F diskdef.txt:ts_failed
    %%home%%:10_NODESET_HDR:::2:TCP::1191::::1214:1214:L::::::::::::::
    %%home%%:20_MEMBER_NODE::1:1:ITE5:192.168.122.26:ITE5:client::::::ITE5:ITE5:1214:3.4.0.11:Linux:Q::::::server::
    %%home%%:20_MEMBER_NODE::2:2:ITE6:192.168.122.25:ITE6:client::::::ITE6:ITE6:1214:3.4.0.11:Linux:Q::::::server::
    %%home%%:70_MMFSCFG::1:#   :::::::::::::::::::::::
    %%home%%:70_MMFSCFG::2:#   WARNING:   This is a machine generated file.  Do not edit!      ::::::::::::::::::::::
    %%home%%:70_MMFSCFG::3:#   Use the mmchconfig command to change configuration parameters.  :::::::::::::::::::::::
    %%home%%:70_MMFSCFG::4:#   :::::::::::::::::::::::
    %%home%%:70_MMFSCFG::5:clusterName ITE5:::::::::::::::::::::::
    %%home%%:70_MMFSCFG::6:clusterId 13882480104816360238:::::::::::::::::::::::
    %%home%%:70_MMFSCFG::7:autoload no:::::::::::::::::::::::
    %%home%%:70_MMFSCFG::8:minReleaseLevel 1210 3.4.0.7:::::::::::::::::::::::
    %%home%%:70_MMFSCFG::9:dmapiFileHandleSize 32:::::::::::::::::::::::
    %%home%%:30_SG_HEADR:gpfs1::150:no:::0::::no:::::::::::::::
    %%home%%:40_SG_ETCFS:gpfs1:1:%2Fgpfs:
    %%home%%:40_SG_ETCFS:gpfs1:2:   dev             = /dev/gpfs1
    %%home%%:40_SG_ETCFS:gpfs1:3:   vfs             = mmfs
    %%home%%:40_SG_ETCFS:gpfs1:4:   nodename        = -
    %%home%%:40_SG_ETCFS:gpfs1:5:   mount           = mmfs
    %%home%%:40_SG_ETCFS:gpfs1:6:   type            = mmfs
    %%home%%:40_SG_ETCFS:gpfs1:7:   account         = false
    %%home%%:50_SG_MOUNT:gpfs1::rw:mtime:atime:::::::::::::::::::::
    %%home%%:60_SG_DISKS:gpfs1:1:gpfs3nsd:976773168:4001:dataAndMetadata:C0A87A1A5106F8BB:nsd:ITE5::other::generic:cmd::::ready::system:ITE5:::::
    ~%BBBB%%:60_SG_DISKS:~/~:0:gpfs4nsd:976773168:4002:dataAndMetadata:000000005106F8A6:nsd:ITE6::other::generic:cmd::::::system:ITE6:::::
    
  • truongv
    81 Posts

    Re: mmcrfs is failing with "No such device", and "Error accessing disks"

    2013-02-11T22:38:43Z
    The NSD id of gpfs4nsd doesn't look right. The first part should contain an IP address instead of 0s.
    
    gpfs4nsd     000000005106F8A6   /dev/sdb       ITE6                     server node
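
    (For comparison, the first four bytes of gpfs3nsd's healthy id decode to the creating node's IP address, which is what's missing above; a quick check with the hex from this thread:)

    # C0 A8 7A 1A -> 192.168.122.26, i.e. ITE5; gpfs4nsd's id starts with 00000000
    printf '%d.%d.%d.%d\n' 0xC0 0xA8 0x7A 0x1A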
    

    However, this doesn't explain why ITE5 can't access ITE6's NSD disk, gpfs4nsd. I suggest you start everything from scratch, but follow everything by the book this time. Make sure your hostnames resolve; check /etc/hosts or DNS on all nodes. Since you couldn't obtain a lock, make sure you don't have any issues with ssh/rsh.
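
    A quick sanity pass for those last two items, assuming the node names and /usr/bin/ssh remote shell shown in mmlscluster above:

    # run on each node: both hostnames must resolve to the expected addresses
    getent hosts ITE5 ITE6
    # passwordless ssh must work in both directions, with no prompts
    ssh ITE5 date
    ssh ITE6 date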