Check the data locality
This topic lists the steps to check the data locality for IBM Spectrum Scale™ 4.2.2.0 and later.
For the available options, refer to the output of /usr/lpp/mmfs/samples/fpo/mmgetlocation. To get the block location of a file, run /usr/lpp/mmfs/samples/fpo/mmgetlocation -f <absolute-file-path>. To get the block location summary of a directory, run /usr/lpp/mmfs/samples/fpo/mmgetlocation -d <absolute-dir-path>.
The following is an example of the output:
# /usr/lpp/mmfs/samples/fpo/mmgetlocation -f /sncfs/file1G
[FILE INFO]
------------------------------------------------------------------------
blockSize 1024 KB
blockGroupFactor 128
metadataBlockSize 131072K
writeAffinityDepth 1
flags:
data replication: 2 max 2
storage pool name: fpodata
metadata replication: 2 max 2
Chunk 0 (offset 0) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ]
...
Chunk 7 (offset 939524096) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ]
[SUMMARY INFO]
----------------------------------------------------------------------------------------------------------
Replica num Nodename TotalChunkst
Replica 1 : c8f2n04: Total : 8
Replica 2 : c8f2n05: Total : 8
[root@c8f2n04 fpo]#
The summary at the end of the output shows that, for the file /sncfs/file1G, all 8 chunks of the first replica are located on the node c8f2n04 and all 8 chunks of the second replica are located on the node c8f2n05.
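The per-node totals in the summary can also be recomputed from the chunk lines, which may be useful when checking many files. The following is a minimal Python sketch, assuming the `Chunk N (offset X) is located at disks: [ disk node ] ...` line format shown in the example above. The function name `count_chunks_per_replica` is illustrative and is not part of IBM Spectrum Scale, and the output format might differ between releases.

```python
import re
from collections import Counter

# Matches lines such as:
#   Chunk 0 (offset 0) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ ... ]
CHUNK_RE = re.compile(
    r"^Chunk\s+(\d+)\s+\(offset\s+(\d+)\)\s+is located at disks:\s*(.*)$"
)
# Each replica appears as "[ <disk-name> <node-name> ]" (assumed format).
REPLICA_RE = re.compile(r"\[\s*(\S+)\s+(\S+)\s*\]")


def count_chunks_per_replica(output: str) -> dict:
    """Return {replica_number: Counter({node_name: chunk_count})}."""
    totals = {}
    for line in output.splitlines():
        match = CHUNK_RE.match(line.strip())
        if not match:
            continue  # skip headers, separators, and summary lines
        replicas = REPLICA_RE.findall(match.group(3))
        # The position of the bracket pair on the line is the replica number.
        for replica_number, (_disk, node) in enumerate(replicas, start=1):
            totals.setdefault(replica_number, Counter())[node] += 1
    return totals


# Only the two chunk lines that the example output shows in full.
sample = """\
Chunk 0 (offset 0) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ]
Chunk 7 (offset 939524096) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ]
"""

for replica_number, nodes in sorted(count_chunks_per_replica(sample).items()):
    for node, count in nodes.items():
        print(f"Replica {replica_number}: {node}: Total: {count}")
```

If a replica shows more than one node name, its chunks are spread across nodes and the file is not fully local to a single node for that replica.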
For IBM Spectrum Scale versions earlier than 4.2.2.0, perform the following steps to get the block location of files:
cd /usr/lpp/mmfs/samples/fpo/
g++ -g -DGPFS_SNC_FILEMAP -o tsGetDataBlk -I/usr/lpp/mmfs/include/ tsGetDataBlk.C -L/usr/lpp/mmfs/lib/ -lgpfs
./tsGetDataBlk <filename> -s 0 -f <data-pool-block-size * blockGroupFactor> -r 3
Check the output of the program tsGetDataBlk:
[root@gpfstest2 sncfs]# /usr/lpp/mmfs/samples/fpo/tsGetDataBlk /sncfs/test -r 3
File length: 1073741824, Block Size: 2097152
Parameters: startoffset:0, skipfactor: META_BLOCK, length: 1073741824, replicas 3
numReplicasReturned: 3, numBlksReturned: 4, META_BLOCK size: 268435456
Block 0 (offset 0) is located at disks: 2 4 6
Block 1 (offset 268435456) is located at disks: 2 4 6
Block 2 (offset 536870912) is located at disks: 2 4 6
Block 3 (offset 805306368) is located at disks: 2 4 6
In the above example, the block size of the data pool is 2 MB and the blockGroupFactor of the data pool is 128, so the META_BLOCK (or chunk) size is 2 MB * 128 = 256 MB. Each output line represents one chunk. For example, the three replicas of Block 0 are located on the disks with disk IDs 2, 4, and 6.
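The chunk-size arithmetic can be checked with a few lines of Python. This is a small sketch only; the helper names `chunk_size` and `chunk_offsets` are illustrative, and the input values are taken from the tsGetDataBlk example above.

```python
MIB = 1024 * 1024


def chunk_size(block_size_bytes: int, block_group_factor: int) -> int:
    """Chunk (META_BLOCK) size = data pool block size * blockGroupFactor."""
    return block_size_bytes * block_group_factor


def chunk_offsets(file_length: int, chunk_bytes: int) -> list:
    """Starting byte offset of every chunk in a file of the given length."""
    return list(range(0, file_length, chunk_bytes))


# Values from the example: 2 MB blocks, blockGroupFactor 128, 1 GB file.
size = chunk_size(2 * MIB, 128)
print(size)                             # 268435456
print(chunk_offsets(1073741824, size))  # [0, 268435456, 536870912, 805306368]
```

The four offsets match the Block 0 to Block 3 lines in the output, and `size` is the value that `<data-pool-block-size * blockGroupFactor>` stands for in the tsGetDataBlk command line.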
To find the nodes on which the three replicas of Block 0 are located, check the mapping between disk IDs and nodes:
[root@gpfstest2 sncfs]# mmlsdisk sncfs -L
disk driver sector failure holds holds avail- storage
name type size group metadata data status ability disk id pool remarks
------------ -------- ------ ----------- -------- ----- ------- --------- ------- --------- ---------
node1_sdb nsd 512 1 Yes No ready up 1 system desc
node1_sdc nsd 512 1,0,1 No Yes ready up 2 datapool
node2_sda nsd 512 1 Yes No ready up 3 system
node2_sdb nsd 512 2,0,1 No Yes ready up 4 datapool
node6_sdb nsd 512 2 Yes No ready up 5 system desc
node6_sdc nsd 512 3,0,1 No Yes ready up 6 datapool
node7_sdb nsd 512 2 Yes No ready up 7 system
node7_sdd nsd 512 4,0,2 No Yes ready up 8 datapool
node11_sdb nsd 512 3 Yes No ready up 9 system desc
node11_sdd nsd 512 1,1,1 No Yes ready up 10 datapool desc
node9_sdb nsd 512 3 Yes No ready up 11 system
node9_sdd nsd 512 2,1,1 No Yes ready up 12 datapool
node10_sdc nsd 512 4 Yes No ready up 13 system desc
node10_sdf nsd 512 3,1,1 No Yes ready up 14 datapool
node12_sda nsd 512 4 Yes No ready up 15 system
node12_sdb nsd 512 4,1,2 No Yes ready up 16 datapool
[root@gpfstest2 sncfs]# mmlsnsd
File system Disk name NSD servers
---------------------------------------------------------------------------
sncfs node1_sdb gpfstest1.cn.ibm.com
sncfs node1_sdc gpfstest1.cn.ibm.com
sncfs node2_sda gpfstest2.cn.ibm.com
sncfs node2_sdb gpfstest2.cn.ibm.com
sncfs node6_sdb gpfstest6.cn.ibm.com
sncfs node6_sdc gpfstest6.cn.ibm.com
sncfs node7_sdb gpfstest7.cn.ibm.com
sncfs node7_sdd gpfstest7.cn.ibm.com
sncfs node11_sdb gpfstest11.cn.ibm.com
sncfs node11_sdd gpfstest11.cn.ibm.com
sncfs node9_sdb gpfstest9.cn.ibm.com
sncfs node9_sdd gpfstest9.cn.ibm.com
sncfs node10_sdc gpfstest10.cn.ibm.com
sncfs node10_sdf gpfstest10.cn.ibm.com
sncfs node12_sda gpfstest12.cn.ibm.com
sncfs node12_sdb gpfstest12.cn.ibm.com
The three replicas of Block 0 are located on disk ID 2 (NSD name node1_sdc, node gpfstest1.cn.ibm.com), disk ID 4 (NSD name node2_sdb, node gpfstest2.cn.ibm.com), and disk ID 6 (NSD name node6_sdc, node gpfstest6.cn.ibm.com). Check each block of the file to see whether it is located correctly. If any blocks are not located correctly, fix the data locality.
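The lookup from disk ID to NSD name to node can be scripted by joining the two command outputs. The following Python sketch assumes the whitespace-separated column layouts shown in the mmlsdisk and mmlsnsd examples above (disk ID in the ninth column of each mmlsdisk data row; file system, disk name, and NSD servers in mmlsnsd data rows). The function names are illustrative and are not part of IBM Spectrum Scale, and the sample text contains only the rows needed for Block 0.

```python
def disk_ids_to_names(mmlsdisk_output: str) -> dict:
    """Map disk ID -> NSD name from the data rows of `mmlsdisk <fs> -L`."""
    mapping = {}
    for line in mmlsdisk_output.splitlines():
        fields = line.split()
        # Data rows: name, driver type, sector size, failure group, holds
        # metadata, holds data, status, availability, disk id, storage pool.
        if len(fields) >= 10 and fields[1] == "nsd" and fields[8].isdigit():
            mapping[int(fields[8])] = fields[0]
    return mapping


def nsd_servers(mmlsnsd_output: str) -> dict:
    """Map NSD name -> NSD server from the data rows of `mmlsnsd`."""
    mapping = {}
    for line in mmlsnsd_output.splitlines():
        fields = line.split()
        # Data rows have exactly three fields; header and separator do not.
        if len(fields) == 3:
            mapping[fields[1]] = fields[2]
    return mapping


def nodes_for_block(disk_ids, id_to_name, name_to_server) -> list:
    """Resolve the disk IDs of one block to the nodes that serve them."""
    return [name_to_server[id_to_name[disk_id]] for disk_id in disk_ids]


mmlsdisk_sample = """\
node1_sdc nsd 512 1,0,1 No Yes ready up 2 datapool
node2_sdb nsd 512 2,0,1 No Yes ready up 4 datapool
node6_sdc nsd 512 3,0,1 No Yes ready up 6 datapool
"""

mmlsnsd_sample = """\
File system Disk name NSD servers
---------------------------------------------------------------------------
sncfs node1_sdc gpfstest1.cn.ibm.com
sncfs node2_sdb gpfstest2.cn.ibm.com
sncfs node6_sdc gpfstest6.cn.ibm.com
"""

id_to_name = disk_ids_to_names(mmlsdisk_sample)
name_to_server = nsd_servers(mmlsnsd_sample)

# "Block 0 (offset 0) is located at disks: 2 4 6"
print(nodes_for_block([2, 4, 6], id_to_name, name_to_server))
```

Repeating `nodes_for_block` for every block line of the tsGetDataBlk output gives the node list per chunk, which can then be compared against the nodes where the data is expected to reside.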