Check the data locality

This topic lists the steps to check the data locality for IBM Storage Scale.

Use the method that matches your IBM Storage Scale release to check the data locality:
  • For IBM Storage Scale 4.2.2.0 and earlier, run /usr/lpp/mmfs/samples/fpo/tsGetDataBlk.
  • For IBM Storage Scale 4.2.2.x, run /usr/lpp/mmfs/samples/fpo/mmgetlocation.
  • For IBM Storage Scale 4.2.3, mmgetlocation supports the -Y option.
Note: /usr/lpp/mmfs/samples/fpo/mmgetlocation is based on /usr/lpp/mmfs/samples/fpo/tsGetDataBlk. Ensure that GNU GCC is installed from your Linux® distribution before you invoke /usr/lpp/mmfs/samples/fpo/mmgetlocation.

You can use /usr/lpp/mmfs/samples/fpo/mmgetlocation to query the block location of a file.

For the supported options, refer to the output of /usr/lpp/mmfs/samples/fpo/mmgetlocation. To get the block location of a file, run /usr/lpp/mmfs/samples/fpo/mmgetlocation -f <absolute-file-path>. To get the block location summary of a directory, run /usr/lpp/mmfs/samples/fpo/mmgetlocation -d <absolute-dir-path>.

For IBM Storage Scale 4.2.2.x, run /usr/lpp/mmfs/samples/fpo/mmgetlocation.

The following is a sample output:

# /usr/lpp/mmfs/samples/fpo/mmgetlocation -f /sncfs/file1G

[FILE INFO]
------------------------------------------------------------------------

blockSize 1024 KB
blockGroupFactor 128
metadataBlockSize 131072K
writeAffinityDepth 1
flags: 
data replication: 2 max 2
storage pool name: fpodata
metadata replication: 2 max 2

Chunk 0 (offset 0) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ] 
...
Chunk 7 (offset 939524096) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ] 

[SUMMARY INFO]
----------------------------------------------------------------------------------------------------------
Replica num Nodename TotalChunkst

Replica 1 : c8f2n04: Total : 8
Replica 2 : c8f2n05: Total : 8
[root@c8f2n04 fpo]# 

The summary at the end of the output shows that, for the file /sncfs/file1G, all 8 chunks of the first replica are located on node c8f2n04 and all 8 chunks of the second replica are located on node c8f2n05.
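The per-node totals in the summary can be cross-checked against the individual chunk lines. The following is a minimal sketch, not part of the product samples. It assumes that the chunk lines have exactly the format shown in the sample output, with each replica written as [ <disk> <node> ]. The function name count_chunks_per_node is hypothetical.

```shell
#!/bin/sh
# Count, per replica position and node, how many chunk lines name that node.
# Reads "mmgetlocation -f" output on stdin; lines that are not chunk lines
# (headers, "...", the summary) are ignored.
count_chunks_per_node() {
    awk '
        /^Chunk [0-9]+ \(offset [0-9]+\) is located at disks:/ {
            replica = 0
            # Each replica appears as: [ <disk> <node> ]
            for (i = 1; i <= NF; i++) {
                if ($i == "[") {
                    replica++
                    node = $(i + 2)
                    count[replica " " node]++
                }
            }
        }
        END {
            for (key in count) {
                split(key, parts, " ")
                printf "Replica %s : %s : %d\n", parts[1], parts[2], count[key]
            }
        }
    ' | sort
}

# Demo input: the two chunk lines that are visible in the sample output.
sample_output='Chunk 0 (offset 0) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ]
Chunk 7 (offset 939524096) is located at disks: [ data_c8f2n04_sdg c8f2n04 ] [ data_c8f2n05_sdf c8f2n05 ]'

printf '%s\n' "$sample_output" | count_chunks_per_node
```

With the two demo lines, the sketch reports 2 chunks for c8f2n04 as replica 1 and 2 chunks for c8f2n05 as replica 2. On a cluster, the output of mmgetlocation -f for a file can be piped into the function instead of the demo text.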

For IBM Storage Scale 4.2.2.0 and earlier, perform the following steps to get the block location of files.

cd /usr/lpp/mmfs/samples/fpo/
g++ -g -DGPFS_SNC_FILEMAP -o tsGetDataBlk -I/usr/lpp/mmfs/include/ tsGetDataBlk.C -L/usr/lpp/mmfs/lib/ -lgpfs
./tsGetDataBlk <filename> -s 0 -f <data-pool-block-size * blockGroupFactor> -r 3
Check the output of the tsGetDataBlk program:
[root@gpfstest2 sncfs]# /usr/lpp/mmfs/samples/fpo/tsGetDataBlk /sncfs/test -r 3
File length: 1073741824, Block Size: 2097152
Parameters: startoffset:0, skipfactor: META_BLOCK, length: 1073741824, replicas 3
numReplicasReturned: 3, numBlksReturned: 4, META_BLOCK size: 268435456
Block 0 (offset 0) is located at disks:  2   4   6 
Block 1 (offset 268435456) is located at disks:  2   4   6 
Block 2 (offset 536870912) is located at disks:  2   4   6 
Block 3 (offset 805306368) is located at disks:  2   4   6 

In this example, the block size of the data pool is 2 MB and the blockGroupFactor of the data pool is 128, so the META_BLOCK (or chunk) size is 2 MB * 128 = 256 MB. Each output line represents one chunk. For example, the three replicas of Block 0 are located on the disks with disk IDs 2, 4, and 6.
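The chunk arithmetic can be reproduced with shell integer arithmetic. This is only an illustrative sketch. The values are copied from the sample tsGetDataBlk output, and the variable names are hypothetical.

```shell
#!/bin/sh
# Values taken from the sample tsGetDataBlk output.
block_size=2097152          # data pool block size in bytes (2 MB)
block_group_factor=128      # blockGroupFactor of the data pool
file_length=1073741824      # file length in bytes

# META_BLOCK (chunk) size = block size * blockGroupFactor
chunk_size=$((block_size * block_group_factor))

# Round up so that a partially filled last chunk is also counted.
num_chunks=$(( (file_length + chunk_size - 1) / chunk_size ))

echo "META_BLOCK size: $chunk_size"
echo "Number of chunks: $num_chunks"

# Print the starting offset of every chunk.
i=0
while [ "$i" -lt "$num_chunks" ]; do
    echo "Block $i (offset $((i * chunk_size)))"
    i=$((i + 1))
done
```

The computed chunk size (268435456), the chunk count (4), and the offsets match the META_BLOCK size, numBlksReturned, and Block lines in the sample output.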

To find the nodes on which the three replicas of Block 0 are located, check the mapping between disk IDs and nodes by using mmlsdisk (the 9th column is the disk ID of the NSD) and mmlsnsd:
[root@gpfstest2 sncfs]# mmlsdisk sncfs -L
disk         driver   sector     failure holds    holds            avail-            storage
name         type       size       group metadata data     status  ability   disk id pool       remarks   
------------ -------- ------ ----------- -------- -----    ------- --------- ------- ---------  ---------
node1_sdb    nsd         512           1    Yes      No    ready      up             1 system      desc
node1_sdc    nsd         512       1,0,1    No       Yes   ready      up             2 datapool      
node2_sda    nsd         512           1    Yes      No    ready      up             3 system        
node2_sdb    nsd         512       2,0,1    No       Yes   ready      up             4 datapool      
node6_sdb    nsd         512           2    Yes      No    ready      up             5 system      desc
node6_sdc    nsd         512       3,0,1    No       Yes   ready      up             6 datapool      
node7_sdb    nsd         512           2    Yes      No    ready      up             7 system        
node7_sdd    nsd         512       4,0,2    No       Yes   ready      up             8 datapool      
node11_sdb   nsd        512           3     Yes      No    ready      up             9 system      desc
node11_sdd   nsd        512       1,1,1     No       Yes   ready      up            10 datapool    desc
node9_sdb    nsd         512           3    Yes      No    ready      up            11 system        
node9_sdd    nsd         512       2,1,1    No       Yes   ready      up            12 datapool      
node10_sdc   nsd        512           4     Yes      No    ready      up            13 system      desc
node10_sdf   nsd         512       3,1,1    No       Yes   ready      up            14 datapool      
node12_sda   nsd        512           4     Yes      No    ready      up            15 system        
node12_sdb   nsd        512       4,1,2     No       Yes   ready      up            16 datapool      
[root@gpfstest2 sncfs]# mmlsnsd
 File system   Disk name    NSD servers                                    
---------------------------------------------------------------------------
 sncfs         node1_sdb    gpfstest1.cn.ibm.com     
 sncfs         node1_sdc    gpfstest1.cn.ibm.com     
 sncfs         node2_sda    gpfstest2.cn.ibm.com     
 sncfs         node2_sdb    gpfstest2.cn.ibm.com     
 sncfs         node6_sdb    gpfstest6.cn.ibm.com     
 sncfs         node6_sdc    gpfstest6.cn.ibm.com     
 sncfs         node7_sdb    gpfstest7.cn.ibm.com     
 sncfs         node7_sdd    gpfstest7.cn.ibm.com     
 sncfs         node11_sdb   gpfstest11.cn.ibm.com    
 sncfs         node11_sdd   gpfstest11.cn.ibm.com    
 sncfs         node9_sdb    gpfstest9.cn.ibm.com     
 sncfs         node9_sdd    gpfstest9.cn.ibm.com     
 sncfs         node10_sdc   gpfstest10.cn.ibm.com    
 sncfs         node10_sdf   gpfstest10.cn.ibm.com    
 sncfs         node12_sda   gpfstest12.cn.ibm.com    
 sncfs         node12_sdb   gpfstest12.cn.ibm.com  

The three replicas of Block 0 are located on disk ID 2 (NSD name node1_sdc, node name gpfstest1.cn.ibm.com), disk ID 4 (NSD name node2_sdb, node name gpfstest2.cn.ibm.com), and disk ID 6 (NSD name node6_sdc, node name gpfstest6.cn.ibm.com). Check each block of the file to see whether the blocks are located correctly. If the blocks are not located correctly, fix the data locality.
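The manual lookup from disk ID to NSD name to node can be scripted when many blocks must be checked. The following is a hedged sketch, not an IBM-provided tool. It assumes that the mmlsdisk -L and mmlsnsd outputs were saved to text files and that their columns are laid out as in the samples (disk ID in column 9 of mmlsdisk -L, NSD server in column 3 of mmlsnsd). The function name map_disk_ids and the file names are hypothetical.

```shell
#!/bin/sh
# Usage: map_disk_ids <saved "mmlsdisk -L" output> <saved "mmlsnsd" output> <disk id>...
map_disk_ids() {
    lsdisk_file=$1
    lsnsd_file=$2
    shift 2
    for id in "$@"; do
        # In "mmlsdisk -L" data rows, column 1 is the NSD name and column 9 the disk ID.
        nsd=$(awk -v id="$id" '$2 == "nsd" && $9 == id { print $1 }' "$lsdisk_file")
        if [ -z "$nsd" ]; then
            echo "disk id $id -> not found"
            continue
        fi
        # In "mmlsnsd" data rows, column 2 is the NSD name and column 3 the NSD server.
        node=$(awk -v nsd="$nsd" '$2 == nsd { print $3 }' "$lsnsd_file")
        echo "disk id $id -> $nsd -> $node"
    done
}

# Demo with rows copied from the sample outputs.
tmpdir=$(mktemp -d)
cat > "$tmpdir/lsdisk.txt" <<'EOF'
node1_sdc    nsd         512       1,0,1    No       Yes   ready      up             2 datapool
node2_sdb    nsd         512       2,0,1    No       Yes   ready      up             4 datapool
node6_sdc    nsd         512       3,0,1    No       Yes   ready      up             6 datapool
EOF
cat > "$tmpdir/lsnsd.txt" <<'EOF'
 sncfs         node1_sdc    gpfstest1.cn.ibm.com
 sncfs         node2_sdb    gpfstest2.cn.ibm.com
 sncfs         node6_sdc    gpfstest6.cn.ibm.com
EOF

result=$(map_disk_ids "$tmpdir/lsdisk.txt" "$tmpdir/lsnsd.txt" 2 4 6)
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

For the demo rows, the sketch maps disk IDs 2, 4, and 6 to gpfstest1.cn.ibm.com, gpfstest2.cn.ibm.com, and gpfstest6.cn.ibm.com. On a cluster, the two input files can be produced by redirecting the output of mmlsdisk sncfs -L and mmlsnsd.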