Theeraph

Failed to read a file system descriptor - input/output error

2013-05-10T09:01:50Z

Hi,
.
A Linux customer had a power failure on their GPFS nodes, and in the end they had to reinstall two nodes (the NSD server nodes, nsd1 and nsd2).
.
[root@nsd1 ~]# rpm -qa|grep gpfs
gpfs.base-3.4.0-11.x86_64
gpfs.src-3.4.0-0.noarch
gpfs.msg.en_US-3.4.0-11.noarch
gpfs.docs-3.4.0-11.noarch
gpfs.libsrc-3.4.0-0.noarch
gpfs.gplbin-2.6.32-131.0.15.el6.x86_64-3.4.0-11.x86_64
gpfs.gpl-3.4.0-11.noarch
[root@nsd1 ~]#
[root@nsd1 ~]# mmgetstate -La

 Node number  Node name       Quorum  Nodes up  Total nodes  GPFS state  Remarks
------------------------------------------------------------------------------------
       1      nsd1               2        3         12       active      quorum node
       2      nsd2               2        3         12       active      quorum node
       5      sut-kvm01          2        3         12       active      quorum node
       6      sut-kvm02          2        3         12       active
       7      sut-kvm03          2        3         12       active
       8      sut-kvm04          2        3         12       active
       9      sut-kvm05          2        3         12       active
      10      sut-kvm06          2        3         12       active
      11      sut-kvm07          2        3         12       active
      12      sut-kvm08          2        3         12       active
      14      suthpcc1           2        3         12       active
      15      se                 0        0         12       unknown
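For anyone who wants to check this kind of table mechanically, here is a minimal sketch that parses `mmgetstate -La` output and lists the nodes that are not active. It assumes the column layout shown above; the function name `parse_mmgetstate` and the shortened `SAMPLE` text are illustrative and not part of GPFS.

```python
SAMPLE = """\
 Node number  Node name       Quorum  Nodes up  Total nodes  GPFS state  Remarks
------------------------------------------------------------------------------------
       1      nsd1               2        3         12       active      quorum node
       6      sut-kvm02          2        3         12       active
      15      se                 0        0         12       unknown
"""


def parse_mmgetstate(text):
    """Parse the table printed by `mmgetstate -La` into a list of dicts."""
    rows = []
    for line in text.splitlines():
        # Split into at most 7 fields so a multi-word remark stays in one piece.
        fields = line.split(None, 6)
        # Data rows start with the numeric node number; headers and rules do not.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        rows.append({
            "number": int(fields[0]),
            "name": fields[1],
            "quorum": int(fields[2]),
            "nodes_up": int(fields[3]),
            "total_nodes": int(fields[4]),
            "state": fields[5],
            "remarks": fields[6].strip() if len(fields) > 6 else "",
        })
    return rows


if __name__ == "__main__":
    for row in parse_mmgetstate(SAMPLE):
        if row["state"] != "active":
            print(row["name"], row["state"])
```

In the table above, only node `se` is in a state other than `active`.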
.
They are not able to mount the file system. Most commands we run give this message:
.
[root@nsd1 ~]# mmlsnsd -f sut

 File system   Disk name    NSD servers
---------------------------------------------------------------------------
 sut           nsd_sut01    nsd1,nsd2
 sut           nsd_sut02    nsd2,nsd1
 sut           nsd_sut03    nsd1,nsd2
 sut           nsd_sut04    nsd2,nsd1
[root@nsd1 ~]# mmlsdisk sut
Failed to read a file system descriptor.
Input/output error
mmlsdisk: Command failed.  Examine previous error messages to determine cause.
[root@nsd1 ~]# mmdf sut
Failed to read a file system descriptor.
Input/output error
mmdf: Command failed.  Examine previous error messages to determine cause.
[root@nsd1 ~]# mmlsfs sut
Failed to read a file system descriptor.
Input/output error
mmlsfs: Command failed.  Examine previous error messages to determine cause.
.
I searched the problem database and found two low-level commands that read the descriptor from the disk:
.
A. mmfsadm test readdesc <nsdname> (read descriptor)
.
[root@nsd1 ~]# mmfsadm test readdesc nsd_sut01
reading descriptor from nsd_sut01 ...
error 19 trying to read descriptor.
[root@nsd1 ~]# mmfsadm test readdesc nsd_sut02
reading descriptor from nsd_sut02 ...
error 19 trying to read descriptor.
[root@nsd1 ~]# mmfsadm test readdesc nsd_sut03
reading descriptor from nsd_sut03 ...
error 19 trying to read descriptor.
[root@nsd1 ~]# mmfsadm test readdesc nsd_sut04
reading descriptor from nsd_sut04 ...
error 19 trying to read descriptor.
.
(In AIX errno 19 is ENODEV, and Linux uses the same number for ENODEV, "No such device".)
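A quick way to look up an errno number on the node itself is Python's standard `errno` module. This is only a sketch that translates the number; it assumes the "error 19" printed by `mmfsadm test readdesc` is an ordinary operating-system errno, which the output does not state explicitly.

```python
import errno
import os

code = 19  # the value printed by "mmfsadm test readdesc" above

# errno.errorcode maps the numeric value to its symbolic name on this platform.
name = errno.errorcode.get(code, "unknown")

# os.strerror returns the platform's human-readable message for the value.
print(code, name, os.strerror(code))
```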
.
B. mmfsadm test readdescraw /dev/xxx (read raw descriptor - sda to sdd map to nsd_sut01 to nsd_sut04)
.
[root@nsd1 ~]# mmfsadm test readdescraw /dev/sda
No NSD descriptor in sector 2 of /dev/sda
No Disk descriptor in sector 1 of /dev/sda
No FS descriptor in sector 8 of /dev/sda
[root@nsd1 ~]# mmfsadm test readdescraw /dev/sdb
No NSD descriptor in sector 2 of /dev/sdb
No Disk descriptor in sector 1 of /dev/sdb
No FS descriptor in sector 8 of /dev/sdb
[root@nsd1 ~]# mmfsadm test readdescraw /dev/sdc
No NSD descriptor in sector 2 of /dev/sdc
No Disk descriptor in sector 1 of /dev/sdc
No FS descriptor in sector 8 of /dev/sdc
[root@nsd1 ~]# mmfsadm test readdescraw /dev/sdd
No NSD descriptor in sector 2 of /dev/sdd
No Disk descriptor in sector 1 of /dev/sdd
No FS descriptor in sector 8 of /dev/sdd
.
This does not look good, so I tried to export and re-import the file systems:
.
[root@nsd1 ~]# mmexportfs all -o /var/exportfs_all.out

mmexportfs: Processing file system alice ...

mmexportfs: Processing file system sut ...

mmexportfs: Processing file system vm_images ...
mmexportfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
.
[root@nsd1 ~]# mmlsnsd
mmlsnsd: No disks were found.

[root@nsd1 ~]# mmlsconfig
Configuration data for cluster sutcluster.nsd1:
-----------------------------------------------
myNodeConfigNumber 1
clusterName sutcluster.nsd1
clusterId 13882346956505965919
autoload yes
uidDomain sutdomain
minReleaseLevel 3.4.0.7
dmapiFileHandleSize 32
pagepool 16g
maxMBpS 6144
prefetchThreads 400
worker1Threads 1000
maxFilesToCache 2000
maxblocksize 4m
adminMode central

File systems in cluster sutcluster.nsd1:
----------------------------------------
(none)
.
[root@nsd1 gen]# mmimportfs all -i /var/exportfs_all.out

mmimportfs: Processing file system alice ...
mmimportfs: Processing disk nsd_alice01
mmimportfs: Processing disk nsd_alice02
mmimportfs: Processing disk nsd_alice03
mmimportfs: Processing disk nsd_alice04
mmimportfs: Processing disk nsd_alice05
mmimportfs: Processing disk nsd_alice06
mmimportfs: Processing disk nsd_alice07
mmimportfs: Processing disk nsd_alice08
mmimportfs: Processing disk nsd_alice09
mmimportfs: Processing disk nsd_alice10
mmimportfs: Processing disk nsd_alice11
mmimportfs: Processing disk nsd_alice12
mmimportfs: Processing disk nsd_alice13
mmimportfs: Processing disk nsd_alice14
mmimportfs: Processing disk nsd_alice15
mmimportfs: Processing disk nsd_alice16
mmimportfs: Processing disk nsd_alice17
mmimportfs: Processing disk nsd_alice18
mmimportfs: Processing disk nsd_alice19
mmimportfs: Processing disk nsd_alice20

mmimportfs: Processing file system sut ...
mmimportfs: Processing disk nsd_sut01
mmimportfs: Processing disk nsd_sut02
mmimportfs: Processing disk nsd_sut03
mmimportfs: Processing disk nsd_sut04

mmimportfs: Processing file system vm_images ...
mmimportfs: Processing disk nsd_vmNew01
mmimportfs: Processing disk nsd_vmNew02

mmimportfs: Committing the changes ...

mmimportfs: The following file systems were successfully imported:
        alice
        sut
        vm_images
mmimportfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
[root@nsd1 gen]#
[root@nsd1 gen]# mmlsconfig
Configuration data for cluster sutcluster.nsd1:
-----------------------------------------------
myNodeConfigNumber 1
clusterName sutcluster.nsd1
clusterId 13882346956505965919
autoload yes
uidDomain sutdomain
minReleaseLevel 3.4.0.7
dmapiFileHandleSize 32
pagepool 16g
maxMBpS 6144
prefetchThreads 400
worker1Threads 1000
maxFilesToCache 2000
maxblocksize 4m
adminMode central

File systems in cluster sutcluster.nsd1:
----------------------------------------
/dev/alice
/dev/sut
/dev/vm_images
.
[root@nsd1 gen]# mmlsdisk sut
Failed to read a file system descriptor.
Input/output error
mmlsdisk: Command failed.  Examine previous error messages to determine cause.
[root@nsd1 gen]#
.
Why does GPFS still complain about the file system descriptor? (I had just exported the file system and imported it back!)
.
Please advise on how to solve this problem.
.
Thank you very much,
Theeraphong
 

  • ScaleoutSean

    Re: Failed to read a file system descriptor - input/output error

    2013-05-23T05:50:07Z, in response to Theeraph

    I don't know why, but it sounds to me like none of the FS descriptors are accessible.

    I wouldn't expect that re-importing the filesystem (GPFS or any other) would fix this.

    I'd contact support and ask them whether it's possible to recover from this.