Check the data locality

This topic lists the steps to check the data locality for IBM Spectrum Scale™ 4.2.2.0 and later.

You can use /usr/lpp/mmfs/samples/fpo/mmgetlocation to query the block location of a file.
Note: /usr/lpp/mmfs/samples/fpo/mmgetlocation is based on /usr/lpp/mmfs/samples/fpo/tsGetDataBlk. Ensure that GCC is installed from your Linux distribution before you invoke /usr/lpp/mmfs/samples/fpo/mmgetlocation.

Refer to the output of /usr/lpp/mmfs/samples/fpo/mmgetlocation for the available options. Run /usr/lpp/mmfs/samples/fpo/mmgetlocation -f <absolute-file-path> to get the block location of <absolute-file-path>. Run /usr/lpp/mmfs/samples/fpo/mmgetlocation -d <absolute-dir-path> to get the block location summary of <absolute-dir-path>.

The following is an example of the output:

# /usr/lpp/mmfs/samples/fpo/mmgetlocation -f /sncfs/file1G

[FILE INFO]
------------------------------------------------------------------------

blockSize 1024 KB
blockGroupFactor 128
metadataBlockSize 131072K
writeAffinityDepth 1
flags: 
data replication: 2 max 2
storage pool name: fpodata
metadata replication: 2 max 2

Chunk 0 (offset 0) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ] 
...
Chunk 7 (offset 939524096) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ] 

[SUMMARY INFO]
----------------------------------------------------------------------------------------------------------
Replica num Nodename TotalChunkst

Replica 1 : c8f2n04: Total : 8
Replica 2 : c8f2n05: Total : 8
[root@c8f2n04 fpo]# 

The summary at the end of the output shows that, for the file /sncfs/file1G, the 8 chunks of the first replica are located on node c8f2n04 and the 8 chunks of the second replica are located on node c8f2n05.
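
The per-node totals in the summary can also be recomputed from the chunk lines, which is handy when you want to post-process the output in a script. The following is a minimal sketch, assuming the chunk lines have exactly the format shown in the example above; the function name chunks_per_node is hypothetical and is not part of mmgetlocation. The sample text contains only the two chunk lines that appear in the example.

```python
import re
from collections import Counter

# Matches lines such as:
# Chunk 0 (offset 0) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ]
CHUNK_RE = re.compile(r"^Chunk\s+(\d+)\s+\(offset\s+(\d+)\)\s+is located at disks:(.*)$")
# Each replica is printed as "[ <disk-name> <node-name> ]".
REPLICA_RE = re.compile(r"\[\s*(\S+)\s+(\S+)\s*\]")


def chunks_per_node(output):
    """Return a list with one Counter per replica position (node name -> chunk count)."""
    per_replica = []
    for line in output.splitlines():
        match = CHUNK_RE.match(line.strip())
        if not match:
            continue  # skip headers, separators, and summary lines
        for index, (_disk, node) in enumerate(REPLICA_RE.findall(match.group(3))):
            while len(per_replica) <= index:
                per_replica.append(Counter())
            per_replica[index][node] += 1
    return per_replica


# Only the two chunk lines shown in the example output (assumed format).
sample = """\
Chunk 0 (offset 0) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ]
Chunk 7 (offset 939524096) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ]
"""

for number, counts in enumerate(chunks_per_node(sample), start=1):
    print(f"Replica {number}: {dict(counts)}")
```

If every chunk of a replica maps to the same node, the Counter for that replica position has a single key, which is the situation shown in the summary above.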

For IBM Spectrum Scale releases earlier than 4.2.2.0, perform the following steps to get the block location of files:

cd /usr/lpp/mmfs/samples/fpo/
g++ -g -DGPFS_SNC_FILEMAP -o tsGetDataBlk -I/usr/lpp/mmfs/include/ tsGetDataBlk.C -L/usr/lpp/mmfs/lib/ -lgpfs
./tsGetDataBlk <filename>  -s 0 -f <data-pool-block-size * blockGroupFactor> -r 3
Check the output of the program tsGetDataBlk:
[root@gpfstest2 sncfs]# /usr/lpp/mmfs/samples/fpo/tsGetDataBlk /sncfs/test -r 3
File length: 1073741824, Block Size: 2097152
Parameters: startoffset:0, skipfactor: META_BLOCK, length: 1073741824, replicas 3
numReplicasReturned: 3, numBlksReturned: 4, META_BLOCK size: 268435456
Block 0 (offset 0) is located at disks:  2   4   6 
Block 1 (offset 268435456) is located at disks:  2   4   6 
Block 2 (offset 536870912) is located at disks:  2   4   6 
Block 3 (offset 805306368) is located at disks:  2   4   6 

In the above example, the block size of the data pool is 2 MB and the blockGroupFactor of the data pool is 128, so the META_BLOCK (or chunk) size is 2 MB * 128 = 256 MB. Each output line represents one chunk. For example, Block 0 is located on the disks with disk IDs 2, 4, and 6, one disk for each of the 3 replicas.
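
The chunk-size arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names are hypothetical, and the input values are the ones printed in the example output (block size 2097152 bytes, blockGroupFactor 128, file length 1073741824 bytes).

```python
def chunk_size(block_size_bytes, block_group_factor):
    """META_BLOCK (chunk) size = data pool block size * blockGroupFactor."""
    return block_size_bytes * block_group_factor


def chunk_offsets(file_length_bytes, chunk_bytes):
    """Start offset of every chunk that a file of the given length spans."""
    count = -(-file_length_bytes // chunk_bytes)  # ceiling division
    return [index * chunk_bytes for index in range(count)]


size = chunk_size(2097152, 128)
print(size)                             # 268435456 bytes, that is 256 MB
print(chunk_offsets(1073741824, size))  # [0, 268435456, 536870912, 805306368]
```

The four offsets match the Block 0 to Block 3 lines in the tsGetDataBlk output, and the chunk size is the value to pass with the -f option in the command shown earlier.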

To find the nodes on which the three replicas of Block 0 are located, check the mapping between disk IDs and nodes by using mmlsdisk (the 9th column is the disk ID of the NSD) and mmlsnsd:
[root@gpfstest2 sncfs]# mmlsdisk sncfs -L
disk         driver   sector     failure holds    holds            avail-            storage
name         type       size       group metadata data     status  ability   disk id pool       remarks   
------------ -------- ------ ----------- -------- -----    ------- --------- ------- ---------  ---------
node1_sdb    nsd         512           1    Yes      No    ready      up             1 system      desc
node1_sdc    nsd         512       1,0,1    No       Yes   ready      up             2 datapool      
node2_sda    nsd         512           1    Yes      No    ready      up             3 system        
node2_sdb    nsd         512       2,0,1    No       Yes   ready      up             4 datapool      
node6_sdb    nsd         512           2    Yes      No    ready      up             5 system      desc
node6_sdc    nsd         512       3,0,1    No       Yes   ready      up             6 datapool      
node7_sdb    nsd         512           2    Yes      No    ready      up             7 system        
node7_sdd    nsd         512       4,0,2    No       Yes   ready      up             8 datapool      
node11_sdb   nsd         512           3    Yes      No    ready      up             9 system      desc
node11_sdd   nsd         512       1,1,1    No       Yes   ready      up            10 datapool    desc
node9_sdb    nsd         512           3    Yes      No    ready      up            11 system
node9_sdd    nsd         512       2,1,1    No       Yes   ready      up            12 datapool
node10_sdc   nsd         512           4    Yes      No    ready      up            13 system      desc
node10_sdf   nsd         512       3,1,1    No       Yes   ready      up            14 datapool
node12_sda   nsd         512           4    Yes      No    ready      up            15 system
node12_sdb   nsd         512       4,1,2    No       Yes   ready      up            16 datapool
[root@gpfstest2 sncfs]# mmlsnsd
 File system   Disk name    NSD servers                                    
---------------------------------------------------------------------------
 sncfs         node1_sdb    gpfstest1.cn.ibm.com     
 sncfs         node1_sdc    gpfstest1.cn.ibm.com     
 sncfs         node2_sda    gpfstest2.cn.ibm.com     
 sncfs         node2_sdb    gpfstest2.cn.ibm.com     
 sncfs         node6_sdb    gpfstest6.cn.ibm.com     
 sncfs         node6_sdc    gpfstest6.cn.ibm.com     
 sncfs         node7_sdb    gpfstest7.cn.ibm.com     
 sncfs         node7_sdd    gpfstest7.cn.ibm.com     
 sncfs         node11_sdb   gpfstest11.cn.ibm.com    
 sncfs         node11_sdd   gpfstest11.cn.ibm.com    
 sncfs         node9_sdb    gpfstest9.cn.ibm.com     
 sncfs         node9_sdd    gpfstest9.cn.ibm.com     
 sncfs         node10_sdc   gpfstest10.cn.ibm.com    
 sncfs         node10_sdf   gpfstest10.cn.ibm.com    
 sncfs         node12_sda   gpfstest12.cn.ibm.com    
 sncfs         node12_sdb   gpfstest12.cn.ibm.com  

The three replicas of Block 0 are located on disk ID 2 (NSD name node1_sdc, node name gpfstest1.cn.ibm.com), disk ID 4 (NSD name node2_sdb, node name gpfstest2.cn.ibm.com), and disk ID 6 (NSD name node6_sdc, node name gpfstest6.cn.ibm.com). Check each block of the file to see whether it is located correctly. If any block is not located correctly, fix the data locality.
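
Doing this lookup by hand for every block is tedious, so the mapping can be scripted. The following is a rough sketch, assuming the three command outputs have the column layout shown in the examples above (disk ID in the 9th column of mmlsdisk -L, one NSD per row with a single-word file system name in mmlsnsd). All function names are hypothetical, and the sample text is an excerpt of the rows shown above.

```python
import re

BLOCK_RE = re.compile(r"^Block\s+(\d+)\s+\(offset\s+\d+\)\s+is located at disks:\s*(.*)$")


def disk_id_to_nsd(mmlsdisk_output):
    """Map disk ID (9th column) to NSD name (1st column)."""
    mapping = {}
    for line in mmlsdisk_output.splitlines():
        cols = line.split()
        # Data rows have driver type "nsd" in the 2nd column (assumed layout).
        if len(cols) >= 10 and cols[1] == "nsd" and cols[8].isdigit():
            mapping[int(cols[8])] = cols[0]
    return mapping


def nsd_to_node(mmlsnsd_output):
    """Map NSD name (2nd column) to NSD server (3rd column)."""
    mapping = {}
    for line in mmlsnsd_output.splitlines():
        cols = line.split()
        # Header and separator rows do not split into exactly 3 fields.
        if len(cols) == 3:
            mapping[cols[1]] = cols[2]
    return mapping


def block_nodes(ts_output, id_to_nsd, nsd_node):
    """Map block number to the list of nodes that hold its replicas."""
    result = {}
    for line in ts_output.splitlines():
        match = BLOCK_RE.match(line.strip())
        if match:
            disk_ids = [int(token) for token in match.group(2).split()]
            result[int(match.group(1))] = [nsd_node[id_to_nsd[d]] for d in disk_ids]
    return result


mmlsdisk_sample = """\
node1_sdc    nsd         512       1,0,1    No       Yes   ready      up             2 datapool
node2_sdb    nsd         512       2,0,1    No       Yes   ready      up             4 datapool
node6_sdc    nsd         512       3,0,1    No       Yes   ready      up             6 datapool
"""
mmlsnsd_sample = """\
 sncfs         node1_sdc    gpfstest1.cn.ibm.com
 sncfs         node2_sdb    gpfstest2.cn.ibm.com
 sncfs         node6_sdc    gpfstest6.cn.ibm.com
"""
ts_sample = "Block 0 (offset 0) is located at disks:  2   4   6 \n"

nodes = block_nodes(ts_sample, disk_id_to_nsd(mmlsdisk_sample), nsd_to_node(mmlsnsd_sample))
print(nodes[0])
```

For the sample rows, the script prints the same three node names that are listed above for Block 0. A disk ID or NSD name that is missing from the parsed tables raises a KeyError, which is a deliberate choice in this sketch so that incomplete input is noticed rather than silently skipped.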