Example of using the gpfs.readReplicaRule string
The following example shows how a gpfs.readReplicaRule string is evaluated when a file is read. Assume that the rule is set on the file with the following command:
mmchattr --set-attr gpfs.readReplicaRule="b=1:2 r=1,x; b=3 r=1; b=3 r=0; d=3,1" <filename>
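To make the structure of this rule string concrete, here is a minimal Python sketch that splits it into block_sub_rules and a file_sub_rule. The grammar is inferred from this single example only: sub-rules are separated by ";", a block_sub_rule is b=<block> or b=<first>:<last> (an inclusive range) followed by r=<comma-separated replica indexes, or x>, and a file_sub_rule is d=<comma-separated disk numbers>. The names ReadReplicaRule and parse_rule are hypothetical, and the real GPFS parser may accept forms that this sketch rejects.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# A replica index is an int (0 = first replica) or "x" (last valid replica).
ReplicaIndex = Union[int, str]


@dataclass
class ReadReplicaRule:
    # Each block_sub_rule: (first_block, last_block, replica indexes to try in order).
    block_rules: List[Tuple[int, int, List[ReplicaIndex]]] = field(default_factory=list)
    # Disks excluded by the file_sub_rule "d=...".
    excluded_disks: List[int] = field(default_factory=list)


def parse_rule(text: str) -> ReadReplicaRule:
    """Parse a rule string such as "b=1:2 r=1,x; b=3 r=1; d=3,1" (hypothetical helper)."""
    rule = ReadReplicaRule()
    for sub in text.split(";"):
        tokens = sub.split()
        if not tokens:
            continue  # this sketch tolerates empty sub-rules, e.g. after a trailing ";"
        kv = dict(tok.split("=", 1) for tok in tokens)
        if "b" in kv:
            # "b=3" is a single block; "b=1:2" is an inclusive range of blocks.
            first, _, last = kv["b"].partition(":")
            indexes = [i if i == "x" else int(i) for i in kv["r"].split(",")]
            rule.block_rules.append((int(first), int(last or first), indexes))
        elif "d" in kv:
            rule.excluded_disks.extend(int(d) for d in kv["d"].split(","))
        else:
            raise ValueError(f"unrecognized sub-rule: {sub.strip()!r}")
    return rule


if __name__ == "__main__":
    print(parse_rule("b=1:2 r=1,x; b=3 r=1; b=3 r=0; d=3,1"))
```

The block_sub_rules are kept in their original order because, as the walkthrough for this example shows, only the first matching one is applied.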
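The following data block disk address (DA) list is for a file with 2 DataReplicas and 3 MaxDataReplicas. In each DA, the number before the colon is the disk number. Each block occupies three consecutive entries, one per possible replica, so entries 0 to 2 belong to block 0, entries 3 to 5 to block 1, and so on: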
0: 1:3750002688 1: 2:4904091648 2: (null)
3: 3:4860305408 4: 1:3750002688 5: (null)
6: 2:3750002688 7: (null) 8: (null)
9: 3:3750002688 10: (null) 11: (null)
12: (null) 13: (null) 14: (null)
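The list can be read as three consecutive entries (MaxDataReplicas) per block. A minimal Python sketch that regroups it into per-block replica sets by slicing it into MaxDataReplicas-sized chunks and dropping the (null) entries; the result matches the replica sets quoted for each block in this example. The slicing rule is inferred from this example, and DA_LIST and replica_sets are illustrative names, not a GPFS interface.

```python
from typing import List, Optional

MAX_DATA_REPLICAS = 3  # MaxDataReplicas of the example file

# Disk address list from the example, in list order; None stands for "(null)".
DA_LIST: List[Optional[str]] = [
    "1:3750002688", "2:4904091648", None,   # entries 0-2
    "3:4860305408", "1:3750002688", None,   # entries 3-5
    "2:3750002688", None, None,             # entries 6-8
    "3:3750002688", None, None,             # entries 9-11
    None, None, None,                       # entries 12-14
]


def replica_sets(das: List[Optional[str]], max_replicas: int) -> List[List[str]]:
    """Split the flat list into max_replicas-sized slices, one per block,
    keeping only the non-null DAs of each slice."""
    return [
        [da for da in das[start:start + max_replicas] if da is not None]
        for start in range(0, len(das), max_replicas)
    ]


if __name__ == "__main__":
    for block, replicas in enumerate(replica_sets(DA_LIST, MAX_DATA_REPLICAS)):
        print(f"block {block}: {replicas or 'hole'}")
```

An empty set, as for block 4, corresponds to a hole.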
For each block that is read, the rule selects the replica to read from as follows:
block 0: The initial replica set is (1:3750002688, 2:4904091648).
No block_sub_rule matches block 0, so the file_sub_rule “d=3,1” is applied.
It excludes disks 3 and 1, which leaves the replica set as (2:4904091648).
The block is therefore read from DA 2:4904091648.
block 1: The initial replica set is (3:4860305408, 1:3750002688).
The matching block_sub_rule “b=1:2 r=1,x” picks replica index 1 (replica indexes
start at 0), which is DA 1:3750002688. Because a block_sub_rule matches, the
file_sub_rule “d=3,1” is not applied, even though it excludes disk 1. If GPFS fails
to read from this DA, it does not try the other replica; it returns a read error.
block 2: The initial replica set is (2:3750002688).
The matching block_sub_rule “b=1:2 r=1,x” first tries replica index 1. Because the set
has no such replica, the next replica index in the rule, ‘x’, is evaluated. ‘x’ selects
the last valid replica in the replica set, which is DA 2:3750002688.
block 3: The initial replica set is (3:3750002688).
There is a matching block_sub_rule “b=3 r=1” which picks replica index 1
from the set. Since there is no such replica, a read error is returned.
Note that the block_sub_rule “b=3 r=0” is not applied, because only the first matching
block_sub_rule is applied.
block 4: This block is a hole (all of its DAs are null), so no replica rule is
evaluated and the read returns a block of zeros.
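The walkthrough can be reproduced with a small Python model of the selection logic. It is a sketch that encodes only the behavior shown in this example: the first block_sub_rule whose block range contains the block is the only one applied; its replica indexes are tried in order, with x meaning the last valid replica; if no listed index names an existing replica, the result is a read error; and when no block_sub_rule matches, replicas on disks listed in the file_sub_rule are dropped and the first remaining replica is read. Treating an empty set after exclusion as a read error is an assumption, and the model only chooses a replica; it does not simulate I/O failures such as the one described for block 1. The names pick_replica, READ_ERROR, and HOLE are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

ReplicaIndex = Union[int, str]                    # int (0-based) or "x"
BlockRule = Tuple[int, int, List[ReplicaIndex]]   # (first block, last block, indexes)

READ_ERROR = "read error"
HOLE = "hole (zeros)"


def disk_of(da: str) -> int:
    """Return the disk number, that is, the part of a DA before the colon."""
    return int(da.split(":", 1)[0])


def pick_replica(block: int, replicas: Sequence[str],
                 block_rules: Sequence[BlockRule],
                 excluded_disks: Sequence[int]) -> str:
    """Return the DA chosen for `block`, READ_ERROR, or HOLE."""
    if not replicas:
        return HOLE  # no replicas at all: no rule is evaluated
    for first, last, indexes in block_rules:
        if first <= block <= last:
            # Only the first matching block_sub_rule is applied.
            for idx in indexes:
                if idx == "x":
                    return replicas[-1]          # last valid replica
                if 0 <= idx < len(replicas):
                    return replicas[idx]
            return READ_ERROR                    # no listed index exists
    # No block_sub_rule matched: drop replicas on excluded disks, then take
    # the first remaining one. Empty result -> READ_ERROR is an assumption.
    remaining = [da for da in replicas if disk_of(da) not in excluded_disks]
    return remaining[0] if remaining else READ_ERROR


if __name__ == "__main__":
    rules: List[BlockRule] = [(1, 2, [1, "x"]), (3, 3, [1]), (3, 3, [0])]
    excluded = [3, 1]
    blocks = [
        ["1:3750002688", "2:4904091648"],
        ["3:4860305408", "1:3750002688"],
        ["2:3750002688"],
        ["3:3750002688"],
        [],
    ]
    for block, replicas in enumerate(blocks):
        print(block, pick_replica(block, replicas, rules, excluded))
```

Running it prints 2:4904091648 for block 0, 1:3750002688 for block 1, 2:3750002688 for block 2, a read error for block 3, and a hole for block 4, which agrees with the walkthrough.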
If you have permission to read a file, you can check whether the gpfs.readReplicaRule extended attribute is set in the file by one of the following methods:
mmlsattr --get-attr gpfs.readReplicaRule <FilePath>
mmlsattr -n gpfs.readReplicaRule <FilePath>
getfattr --absolute-names -n gpfs.readReplicaRule <FilePath>
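Because getfattr can read the attribute, a script on Linux can read it through the same extended-attribute interface with Python's os.getxattr. This is a minimal sketch that assumes the value is text and that an unset attribute is reported with ENODATA, as Linux file systems conventionally do; get_read_replica_rule is a hypothetical name. Other errors are re-raised, for example when the file is on a file system that does not support this attribute name.

```python
import errno
import os
from typing import Optional

ATTR_NAME = "gpfs.readReplicaRule"


def get_read_replica_rule(path: str) -> Optional[str]:
    """Return the file's gpfs.readReplicaRule value, or None if it is not set.

    Linux only (os.getxattr). Assumes an unset attribute is reported as ENODATA;
    any other OSError is re-raised to the caller.
    """
    try:
        value = os.getxattr(path, ATTR_NAME)
    except OSError as exc:
        if exc.errno == errno.ENODATA:
            return None
        raise
    return value.decode()


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(path, get_read_replica_rule(path))
```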
Alternatively, a policy rule such as the following example can be used to show the gpfs.readReplicaRule extended attribute:
DEFINE(DISPLAY_NULL,[COALESCE($1,'_NULL_')])
RULE EXTERNAL LIST 'files' EXEC ''
RULE LIST 'files' DIRECTORIES_PLUS SHOW(DISPLAY_NULL(XATTR('ATTR')))
Apply the policy rule with a command such as the following, in which the -M option substitutes the attribute name for ATTR in the rule text:
mmapplypolicy <dir> -P <policyRuleFile> -I defer -f /tmp -L 0 -M ATTR="gpfs.readReplicaRule"
To query the attribute by inode number instead of by file path, run the following command:
mmlsattr --get-attr gpfs.readReplicaRule --inode-number <SnapPath/InodeNumber>
To verify the effects of a gpfs.readReplicaRule string, you can dump the distribution of disk numbers for each replica of each block of the file by using any of the following commands:
mmlsattr -D <FilePath>
mmlsattr --dump-data-block-disk-numbers <FilePath>
Running one of those commands provides output similar to the following example:
Block Index   Replica 0   Replica 1   Replica 2
-----------   ---------   ---------   ---------
0             *1          2           -
1             3           *1          -
2             *2          -           -
3             3           -           -          e
4             -           -           -
A disk number that is prefixed with an asterisk (*) marks the data block replica that is read for that block. By default, the first valid data block replica is returned on read. An e at the end of a row indicates that the gpfs.readReplicaRule selected an invalid replica, so a read of that block returns an error. You can change which data block replica is read with the readReplicaPolicy global configuration option, the diskReadExclusionList global configuration option, or the gpfs.readReplicaRule extended attribute. Dumping the disk numbers is therefore a good way to check the effects of these settings.
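When many blocks must be checked, it can help to post-process the dump. A minimal Python sketch that parses output in the format shown above and returns, for each block, the disk number marked with * and whether the row carries the e flag. It splits rows on whitespace and assumes the layout of this sample; real mmlsattr output may differ in spacing or columns, and parse_dump is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

SAMPLE = """\
Block Index Replica 0 Replica 1 Replica 2
----------- --------- --------- ---------
0 *1 2 -
1 3 *1 -
2 *2 - -
3 3 - - e
4 - - -
"""


def parse_dump(text: str) -> Dict[int, Tuple[Optional[int], bool]]:
    """Map block index -> (disk number that is read, error flag).

    The disk is None when no replica is marked with '*' (a hole or an error row).
    """
    result: Dict[int, Tuple[Optional[int], bool]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the header, the dashed line, and blank lines
        block, cells = int(fields[0]), fields[1:]
        error = cells[-1] == "e"
        if error:
            cells = cells[:-1]
        disk = next((int(c[1:]) for c in cells if c.startswith("*")), None)
        result[block] = (disk, error)
    return result


if __name__ == "__main__":
    for block, (disk, error) in parse_dump(SAMPLE).items():
        print(block, "ERROR" if error else disk)
```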
To dump the disk distribution by inode number instead of by file name, run either of the following commands:
mmlsattr -D --inode-number <SnapPath/InodeNumber>
mmlsattr --dump-data-block-disk-numbers --inode-number <SnapPath/InodeNumber>
After the correct gpfs.readReplicaRule string is determined and the file data is verified to be valid, you can repair the data block replica mismatches in the file with the following command:
mmrestripefile -c <Filename>
After the repair is completed, you can delete the gpfs.readReplicaRule extended attribute. Alternatively, you can defer repair of the file and continue to do read and write operations on the file with the gpfs.readReplicaRule extended attribute present.
To summarize, repair a file that has mismatched data block replicas as follows:
1. Identify the mismatched data block replicas in the file.
2. Ensure that the readReplicaRuleEnabled global configuration option is set to yes.
3. Write the gpfs.readReplicaRule extended attribute to select a replica index for each data block with mismatched replicas.
4. Verify with the mmlsattr -D command that the gpfs.readReplicaRule extended attribute selects the replicas as expected.
5. Validate the file by processing it through its associated application. Note: The validation process should involve only reads of the file. Any write to a block with mismatched replicas overwrites all of its replicas. If the replica that is selected by the gpfs.readReplicaRule extended attribute is incorrect, writing to the block while that bad replica is selected permanently corrupts the block.
6. If the file validation fails, repeat steps 3, 4, and 5 with a different replica index for the problem data blocks.
7. After the file passes validation, repair the data block replica mismatches in the file with the mmrestripefile -c command.
8. Delete the gpfs.readReplicaRule extended attribute.
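For the step that writes the gpfs.readReplicaRule extended attribute, a rule string can be generated from a mapping of block index to chosen replica index. This minimal Python sketch assumes only the b=<block> or b=<first>:<last> r=<index> syntax and the "; " separator that appear in the earlier mmchattr example, and it merges consecutive blocks that use the same index into one range. The function names and the file path are hypothetical.

```python
import shlex
from typing import Dict, List


def build_rule(choices: Dict[int, int]) -> str:
    """Build a rule string that reads replica choices[block] for each block."""
    runs: List[List[int]] = []  # each run is [first_block, last_block, replica_index]
    for block in sorted(choices):
        idx = choices[block]
        if runs and runs[-1][1] == block - 1 and runs[-1][2] == idx:
            runs[-1][1] = block  # extend the current run of consecutive blocks
        else:
            runs.append([block, block, idx])
    return "; ".join(
        f"b={first} r={idx}" if first == last else f"b={first}:{last} r={idx}"
        for first, last, idx in runs
    )


def mmchattr_command(path: str, rule: str) -> str:
    """Return a shell-quoted mmchattr command line that sets the rule."""
    return shlex.join(["mmchattr", "--set-attr", f"gpfs.readReplicaRule={rule}", path])


if __name__ == "__main__":
    rule = build_rule({1: 1, 2: 1, 3: 0})  # hypothetical per-block choices
    print(rule)
    print(mmchattr_command("/gpfs/fs1/datafile", rule))  # hypothetical path
```

If validation fails, changing the index for the problem blocks in the mapping and regenerating the string keeps the rest of the rule unchanged.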