  • Latest Post - ‏2013-06-19T14:21:35Z by renarg
renarg
119 Posts

Log Recovery failed

2013-06-18T20:29:09Z

Hello all,

We are in some trouble with a DMAPI-enabled filesystem because log recovery failed. What happened was the relocation of one half of a three-site cluster.

The configuration seems to be two quorum nodes plus one quorum buster node with the descOnly configuration. After the restart of the relocated node A, we had a disk-down situation on failure group 2 caused by uncoordinated maintenance on that controller. The second problem came after the relocation was finished: a spanning tree problem in the cluster network.

MC Murphy is great;-).

So now, after running mmchdisk gpfst02 start -d "NSD_A;NSD_B;NSD_DESC", we got an error saying that too many disks are unavailable. Here is the question: we see "Log recovery failed". Why? We had the descriptor disk available, as well as all disks on one site of the two failure groups. Is there a way to force the log recovery? Or must we wait until the failing disks are back up and then run the start command?

We are at level 3.5.0.10 on Linux RHEL 6.2.

The error itself looks like this:

Recovery Log IO failed, Unmounting filesystem gpfst02

Too many disks are unavailable

return code 218 reason 225

Log recovery failed.

 

  • dlmcnabb
    1012 Posts

    Re: Log Recovery failed

    ‏2013-06-19T05:46:52Z  

    There must be some other disk that is down, or one of these disks cannot be found by the mmchdisk start code. What does mmlsdisk say is the status of all the disks?

  • renarg
    119 Posts

    Re: Log Recovery failed

    ‏2013-06-19T06:22:36Z  

    Hello Dan,

    Here is the output:

    > mmlsdisk gpfst02 -L
    disk         driver   sector     failure holds    holds                                    storage
    name         type       size       group metadata data  status        availability disk id pool         remarks
    ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------- ------------ ---------
    d01DataT018  nsd         512           1 Yes      Yes   ready         down               1 system
    d01DataT118  nsd         512           0 Yes      Yes   ready         up                 2 system        desc
    d01DataT016  nsd         512           1 Yes      Yes   ready         down               3 system
    d01DataT116  nsd         512           0 Yes      Yes   ready         unrecovered        4 system        desc
    d01Desc0002  nsd         512           2 No       No    ready         up                 5 system        desc
    Number of quorum disks: 3
    Read quorum value:      2
    Write quorum value:     2

    Do you have any hints on how to get past this unrecovered disk? Could it be that some data or metadata is not replicated correctly?

    Thanks Renar
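
For readers triaging a similar situation, the mmlsdisk listing above can be summarized with a few lines of code. The following is a minimal Python sketch, not an IBM tool: it assumes the whitespace-separated column layout shown in this post (failure group in the fourth column, availability in the eighth, remarks from the eleventh on), and the function name `parse_disks` is made up for illustration.

```python
# Rows copied from the mmlsdisk output in this post (header lines omitted).
SAMPLE = """\
d01DataT018  nsd         512           1 Yes      Yes   ready         down               1 system
d01DataT118  nsd         512           0 Yes      Yes   ready         up                 2 system        desc
d01DataT016  nsd         512           1 Yes      Yes   ready         down               3 system
d01DataT116  nsd         512           0 Yes      Yes   ready         unrecovered        4 system        desc
d01Desc0002  nsd         512           2 No       No    ready         up                 5 system        desc
"""

READ_QUORUM = 2  # "Read quorum value" reported in the same output


def parse_disks(text):
    """Parse disk rows, assuming the column order shown in the post."""
    disks = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separators and summary lines.
        if len(fields) < 10 or fields[1] != "nsd":
            continue
        disks.append({
            "name": fields[0],
            "failure_group": int(fields[3]),
            "availability": fields[7],
            "has_desc": "desc" in fields[10:],  # remarks column is optional
        })
    return disks


disks = parse_disks(SAMPLE)

# Disks that are not usable right now (down or unrecovered).
not_up = [d["name"] for d in disks if d["availability"] != "up"]

# Failure groups that contain at least one unusable disk.
affected_groups = sorted({d["failure_group"] for d in disks
                          if d["availability"] != "up"})

# Descriptor replicas that are currently reachable.
desc_up = sum(1 for d in disks
              if d["has_desc"] and d["availability"] == "up")

print("not up:", not_up)
print("affected failure groups:", affected_groups)
print("descriptor disks up:", desc_up, "of read quorum", READ_QUORUM)
```

For the listing in this post, the sketch reports three disks that are not up, spread over failure groups 0 and 1, while two of the three descriptor disks are up, which equals the read quorum value of 2. It only summarizes the listing and does not by itself explain the log recovery failure.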

  • renarg
    119 Posts

    Re: Log Recovery failed

    ‏2013-06-19T14:21:35Z  

     

    Hello Dan,

    After failure group 2 became available again, we were able to start the disks successfully. My question now is: how can we analyze such situations when we get a log recovery error, and are there known bugs at this code level? All the information from mmlsfs and mmdf suggests that all data and metadata were replicated. We also had another installation problem with the GPL layer: we forgot to uninstall the old gpfs.gplbin-2.6.32-220.el6.x86_64-3.5.0-07.x86_64 after building gpfs.gplbin-2.6.32-220.el6.x86_64-3.5.0-10.x86_64. Could this problem have had an influence here? I think it is not very clearly documented that the old GPL layer and the remaining links must be removed.

    Thanks Renar
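
A leftover GPL layer package like the one described above can be spotted by comparing the installed package names. The following is a minimal Python sketch under stated assumptions: the package names follow the pattern seen in this post (kernel release, then GPFS version and release, then architecture), the list of installed names is obtained separately (for example from `rpm -qa`), and the helper name `stale_gplbin` is hypothetical. It says nothing about whether such a leftover package contributed to the log recovery failure.

```python
import re
from collections import defaultdict

# Pattern inferred from the two package names quoted in the post:
#   gpfs.gplbin-<kernel release>-<gpfs version>-<release>.<arch>
PATTERN = re.compile(
    r"^gpfs\.gplbin-(?P<kernel>.+)-(?P<version>\d+\.\d+\.\d+)"
    r"-(?P<release>\d+)\.(?P<arch>\w+)$"
)


def stale_gplbin(packages):
    """Return gplbin packages that are older than the newest one per kernel."""
    by_kernel = defaultdict(list)
    for name in packages:
        m = PATTERN.match(name)
        if m is None:
            continue  # not a gplbin package, or an unexpected name format
        level = tuple(int(p) for p in m["version"].split("."))
        level += (int(m["release"]),)
        by_kernel[m["kernel"]].append((level, name))

    stale = []
    for entries in by_kernel.values():
        entries.sort()  # oldest level first
        stale.extend(name for _, name in entries[:-1])
    return stale


# The two package names mentioned in the post.
installed = [
    "gpfs.gplbin-2.6.32-220.el6.x86_64-3.5.0-10.x86_64",
    "gpfs.gplbin-2.6.32-220.el6.x86_64-3.5.0-07.x86_64",
]

leftovers = stale_gplbin(installed)
print(leftovers)
```

With the two names from the post as input, the older 3.5.0-07 package is the one reported as a leftover.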