I have a problem with failover of a remotely mounted filesystem. In the following environment, I took down the network of the NSD server that Node A-1 was connected to, using "ifconfig en? down". When I first ran this test, failover worked perfectly: access moved to another node in Cluster B and I/O continued without error. But after several runs, the test suddenly failed and has not succeeded since. The filesystem fs1 was still alive, but for some reason failover no longer worked. I had to reboot all nodes in Cluster A and Cluster B to get back to the initial state.
I would appreciate suggestions from any GPFS guru ...
(No SAN access path to nsd)
Remotely mounted filesystem fs1 via network
Node B-1: NSD server (for nsd1)
Node B-2: NSD server (for nsd1)
Node B-3: NSD server (for nsd1)
Filesystem fs1 consists of nsd1.
All nodes are AIX7.1 + GPFS 18.104.22.168
(mmfs.log of Node A-1)
Fri Dec 14 14:50:08.639 2012: Recovering nodes in cluster node_b: 192.168.93.31 (in cluster_b)
Fri Dec 14 14:50:08.649 2012: Disk lease period expired in cluster cluster_b. Attempting to reacquire lease.
Fri Dec 14 14:50:13.641 2012: Disk lease reacquired in cluster cluster_b.
Fri Dec 14 14:50:13.670 2012: Disk failure. Volume remote_fs1. rc = 5. Physical volume nsd1.
Fri Dec 14 14:50:13.774 2012: File System remote_fs1 unmounted by the system with return code 5 reason code 0
Fri Dec 14 14:50:13.775 2012: I/O error
Fri Dec 14 14:50:13 JST 2012: mmcommon preunmount invoked. File system: fs1 Reason: SGPanic
Failover problem of remote mount filesystem on GPFS3.5
Updated on 2012-12-15T00:12:56Z by SystemAdmin
dlmcnabb
Re: Failover problem of remote mount filesystem on GPFS3.5 (2012-12-14T22:07:54Z)
If you kill an NSD server (or a disk on an NSD server) and then revive it, there is no automatic failback. You need to run
mmnsddiscover -N all -a (or -d "disk1;disk2;...")
The "-N all" option splits the discovery into two steps: first it tells all the NSD servers to rediscover which disks they have direct access to, and then it tells all client nodes to re-evaluate which NSD server to use.
Note: there was a bug in mmnsddiscover in 22.214.171.124/16/17 and 126.96.36.199/4 causing some disks to lose their names in GPFS structures.
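As a rough sketch of the two forms of the command mentioned above, the following wrapper either prints or runs them. The function names, the run helper and the DRY_RUN switch are made up for illustration and are not part of GPFS; only the mmnsddiscover options (-a, -d, -N) come from this thread. It defaults to dry-run so it can be tried on a node without GPFS installed.

```shell
#!/bin/sh
# Sketch only: rediscover NSD access paths after an NSD server is revived.
# Assumption: DRY_RUN=1 (the default here) prints the command instead of
# running it. Set DRY_RUN=0 on a real GPFS node to execute it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
# Note that the printed form does not show the quoting of the arguments.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rediscover all disks (-a) on all nodes (-N all).
rediscover_all() {
    run mmnsddiscover -a -N all
}

# Rediscover only the named disks. $1 is a semicolon-separated list,
# for example "nsd1" or "disk1;disk2".
rediscover_disks() {
    run mmnsddiscover -d "$1" -N all
}

rediscover_all
rediscover_disks "nsd1"
```

For the environment in the original post, where fs1 consists only of nsd1, the second form limited to that disk would probably be enough, but that is an assumption to verify on the real cluster.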
SystemAdmin
Re: Failover problem of remote mount filesystem on GPFS3.5 (2012-12-15T00:12:56Z), in reply to dlmcnabb
It looks like "mmnsddiscover" recovered the NSD servers. I'll run more tests to make sure my environment is properly configured ...