I'm trying to understand what exactly mmfsck does, with no luck so far. Questions go first, explanation follows.
1) What does IndblockBad translate to, i.e. what is "bad"? The block this inode resides in? Inode data? My metadata are replicated - which copy would it be then? The list of blocks it points to? A checksum? On what? I can't figure out what triggers this message.
2) Some files are detected as corrupted by mmfsck, some are not, and I'm trying to figure out why some files slip through. What is the detection logic here, i.e. what checks does mmfsck actually perform?
3) How do I verify what gets corrupted - the list of blocks an inode points to, or the content of those blocks? Is there a way to capture something like a full snapshot of the inodes, so I can do a comparison when I discover a corrupted file later?
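To make question 3 more concrete, here is a minimal sketch of the kind of baseline I have in mind. It only covers file content (per-block checksums), not the inode's list of block addresses, which is the part I don't know how to capture. The function names and the 1 MiB block size are my own assumptions; the real block size should be whatever `mmlsfs <device> -B` reports.

```python
import hashlib
import os

# Assumed GPFS block size; replace with the value reported for the filesystem.
BLOCK_SIZE = 1024 * 1024


def block_digests(path, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per block_size chunk of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests


def build_manifest(root, block_size=BLOCK_SIZE):
    """Map each regular file under root to its size, mtime and block digests."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            st = os.stat(full)
            manifest[os.path.relpath(full, root)] = {
                "size": st.st_size,
                "mtime": st.st_mtime,
                "blocks": block_digests(full, block_size),
            }
    return manifest


def changed_blocks(old_entry, new_entry):
    """Return the indices of blocks whose digest differs between two entries."""
    old, new = old_entry["blocks"], new_entry["blocks"]
    longest = max(len(old), len(new))
    return [
        i for i in range(longest)
        if i >= len(old) or i >= len(new) or old[i] != new[i]
    ]
```

The idea would be to store the manifest (e.g. as JSON) while the files are known to be good, rebuild it later, and treat any block that changed while size and mtime stayed the same as suspect.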
Longer story - my scenario is that fsck reports issues like the one below:
InodeProblemList: 1 entries
iNum snapId status keep delete noScan new error
It all begins when a user reports a garbled file (the backup copy confirms it). mmfsck is run on the mounted filesystem and finds no errors related to the reported file - but it does find others, like the one above, and they point to other corrupted files. So I'm left with files that are scrambled without mmfsck reporting them, files that are scrambled and that GPFS somehow detects, and files becoming corrupt at a rate of 1-2 a week.
In case it helps: my metadata are replicated (i.e. original and one replica), data are not. Corruption is block based, i.e. a GPFS block is either fully fine or every single byte in it differs from the backup copy. Corrupted blocks are spread over tens of LUNs. Files can be read with no errors, but they differ from the backup copy. GPFS is 3.4.0-15, OS is Linux. The files that get corrupted are not recent ones; the majority of them are >4 months old.
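For reference, the block-level comparison against the backup copy can be sketched roughly as below. This is an illustration of the approach rather than the exact tool used; `compare_blocks` is a hypothetical name and the 1 MiB block size is an assumption that has to match the filesystem's real block size.

```python
# Assumed GPFS block size; replace with the value reported for the filesystem.
BLOCK_SIZE = 1024 * 1024


def compare_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Compare two files block by block.

    Returns a list of (block_index, differing_bytes, block_length) tuples,
    one for every block that is not identical in both files.
    """
    results = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break
            if a != b:
                # Count mismatching byte positions; bytes present in only
                # one of the two files count as differing too.
                differing = sum(1 for x, y in zip(a, b) if x != y)
                differing += abs(len(a) - len(b))
                results.append((index, differing, max(len(a), len(b))))
            index += 1
    return results
```

A block where `differing_bytes` equals `block_length` is what I call fully corrupted above; anything in between would point to a different failure pattern.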
Any pointers are warmly welcome. I'm happy to post any data requested.