Recently we lost a 600GB metadataOnly disk (a JBOD) which, along with 11 others, provides the metadata for one of our filesystems. This disk was in the 500 failure group.
Am I correct that if we suffer one additional disk failure in the other failure group, we will lose metadata? If so, would it be gone forever, or is there a process to recover it? (I'm guessing not.)
If the above is true, then I've dramatically missed something and need to begin redesigning our filesystem. It seems far too likely that we could lose a second drive in the second failure group before we successfully finish running mmrestripefs FS -r (which, on this filesystem, takes about 36 hours).
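To put a rough number on that worry, here is a minimal sketch of the arithmetic, not a measurement. It assumes independent failures at a constant rate. The 3% annualized failure rate and the count of 6 disks in the other failure group are hypothetical placeholders, and `p_any_failure` is just an illustrative helper name.

```python
import math

HOURS_PER_YEAR = 24 * 365


def p_any_failure(n_disks: int, afr: float, window_hours: float) -> float:
    """Probability that at least one of n_disks fails within window_hours.

    Assumes independent failures with a constant hazard rate derived from
    the annualized failure rate (afr). Both assumptions are simplifications.
    """
    # Convert the annual failure probability into a per-hour hazard rate.
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    # Probability that a single disk survives the whole window.
    p_one_survives = math.exp(-rate_per_hour * window_hours)
    # At least one failure = 1 - (all disks survive).
    return 1.0 - p_one_survives ** n_disks


if __name__ == "__main__":
    # Hypothetical inputs: 6 disks in the other failure group, 3% AFR,
    # and the 36-hour restripe window mentioned above.
    print(f"{p_any_failure(6, 0.03, 36):.6f}")
```

Correlated failures (same batch, same enclosure, extra load during the restripe) are not modelled here, so the real exposure could be higher than this figure suggests.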
the risk of metadata loss with 2 failure groups and metadataonly drives
Updated on 2013-02-20T03:15:34Z by botemout
dlmcnabb 120000P4JT 1012 Posts
Re: the risk of metadata loss with 2 failure groups and metadataonly drives
2013-02-14T06:54:16Z
This is the accepted answer.
You can keep running if more disks fail in FG 500, but as soon as you lose a disk in a different metadata FG, the filesystem will unmount and not be allowed to mount again. The problem is that there will be some piece of metadata that has a replica on both of the down disks, so there will be no available replica.
The only recovery at that point is to mount the filesystem in restricted mode (-o rs) and just read out what can be found.
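The reasoning above can be illustrated with a toy model. This is only a sketch of two-way replication across failure groups, not how GPFS actually places metadata. The disk names, the second failure group number (501), and the `lost_blocks` helper are all hypothetical.

```python
def lost_blocks(placement, down_disks):
    """Return the blocks whose every replica sits on a down disk."""
    down = set(down_disks)
    return [blk for blk, disks in placement.items() if set(disks) <= down]


# Hypothetical layout: two metadata disks in each of two failure groups.
failure_group = {"md1": 500, "md2": 500, "md3": 501, "md4": 501}

# Each block has two replicas, one in each failure group.
placement = {
    "blk0": ("md1", "md3"),
    "blk1": ("md1", "md4"),
    "blk2": ("md2", "md3"),
    "blk3": ("md2", "md4"),
}

if __name__ == "__main__":
    # Both failures inside FG 500: every block still has its FG 501 replica.
    print(lost_blocks(placement, ["md1", "md2"]))
    # One failure in each FG: any block replicated on exactly that pair is gone.
    print(lost_blocks(placement, ["md1", "md3"]))
```

In this model, any number of failures within a single failure group leaves one replica of every block intact, while one failure in each group loses whatever happened to be replicated on that particular pair of disks.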
I don't understand why mmrestripefs -r (after suspending the dead disk, or just running mmdeldisk) takes 36 hours. It only has to find metadata that is unreplicated because of the lost disk and re-replicate just those objects on other disks. So at most it has to read and write one disk's worth of metadata. (An mmrestripefs -b would take a lot longer.)
botemout 2700038J8Q 70 Posts
Re: the risk of metadata loss with 2 failure groups and metadataonly drives
2013-02-20T03:15:34Z
I've decided that the risk of two drives failing in different failure groups is too great, so I will be RAIDing these drives.
The mmrestripefs that I did took about 23 hours. Unfortunately, this particular filesystem doesn't have the metadata disks in a separate storage pool (which would have let me run it with -P).