Configure storage type data replication
To get the file system data replica values, run the mmlsfs <fsName> -r -R command and review the output. The -r value is the default number of data replicas, and the -R value is the maximum number of data replicas.
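If the replica values need to be picked up by a script, the command output can be scanned for the two flags. The following is a minimal Python sketch, not an official tool. The helper name parse_replica_flags is hypothetical, and the sample text is illustrative only: it assumes each data line starts with the flag followed by its value, which may not match the exact layout printed on your cluster.

```python
def parse_replica_flags(mmlsfs_output: str) -> dict:
    """Extract the -r and -R values from mmlsfs-style text (hypothetical helper)."""
    values = {}
    for line in mmlsfs_output.splitlines():
        fields = line.split()
        # Assumption: a data line starts with the flag, followed by its numeric value.
        if len(fields) >= 2 and fields[0] in ("-r", "-R") and fields[1].isdigit():
            values[fields[0]] = int(fields[1])
    return values


# Illustrative text only, not captured from a real cluster.
SAMPLE_OUTPUT = """\
flag                value                    description
------------------- ------------------------ -----------------------------------
 -r                 1                        Default number of data replicas
 -R                 2                        Maximum number of data replicas
"""

replicas = parse_replica_flags(SAMPLE_OUTPUT)
print("default data replicas:", replicas.get("-r"))
print("maximum data replicas:", replicas.get("-R"))
```

In practice the text would come from running the command itself, for example through subprocess.run, rather than from a hard-coded string.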
The following table lists the recommended combinations of dfs.replication, gpfs.replica.enforced, and the file system data replica values for each storage mode.
Storage mode | dfs.replication | gpfs.replica.enforced | File system data replica | Comments |
---|---|---|---|---|
#1 FPO (gpfs.storage.type=local) | 3 | gpfs or dfs | -r = 3, -R = 3 | Other combinations are not recommended. |
#2 IBM Storage Scale System (gpfs.storage.type=shared) | 1 | dfs | -r = 1, -R = 2 or -r = 1, -R = 3 | Follows the HDFS protocol, but a job fails if one DataNode (DN) goes down after getBlockLocation returns. Potential issue: does not take advantage of the fact that every DN can access the blocks. With this configuration, you must use the mmlsattr command to check the file replication value. If the file replication value that is set is less than the dfs.replication value, the HDFS interface cannot be used to check it, because in shared storage mode the NameNode returns at least the dfs.replication value. |
#3 IBM Storage Scale System (gpfs.storage.type=shared) | 2 or 3 | gpfs | -r = 1, -R = 2 or -r = 1, -R = 3 | Follows the HDFS protocol (returns 2 or 3 DNs) but does not match the real storage usage at the GPFS level. A job does not fail if one DN goes down after getBlockLocation returns. Potential risk: upper-layer applications calculate disk space consumption as replication * file size and therefore assume that a file takes more storage space than it actually does. HDFS Transparency still uses the actual disk space correctly. |
#4 IBM Storage Scale System (gpfs.storage.type=shared) | 1 | gpfs | -r = 1, -R = 2 or -r = 1, -R = 3 | Do not use if the application needs to set the replication value through the HDFS protocol. |
#5 IBM Storage Scale System (gpfs.storage.type=shared) | 2 or 3 | dfs | -r = 1, -R = 2 or -r = 1, -R = 3 | All data is stored with 2 or 3 replicas, which does not take advantage of IBM Storage Scale System or SAN storage. With this configuration, you must use the mmlsattr command to check the file replication value. If the file replication value that is set is less than the dfs.replication value, the HDFS interface cannot be used to check it, because in shared storage mode the NameNode returns at least the dfs.replication value. |
- The dfs.replication parameter is defined in the hdfs-site.xml file. The gpfs.storage.type and gpfs.replica.enforced parameters are defined in the gpfs-site.xml file.
- Starting from HDFS Transparency version 3.1.1-1, the default value for dfs.replication is 3 in hdfs-site.xml and gpfs.replica.enforced is gpfs in gpfs-site.xml.
- The dfs.replication value should be less than or equal to the number of DataNodes.
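The notes above can be turned into a quick sanity check. The following is a minimal Python sketch under stated assumptions: read_properties and check_replication are hypothetical helper names, the XML snippets are illustrative stand-ins for hdfs-site.xml and gpfs-site.xml, and the DataNode count is supplied by the caller. The fallback values used when a property is missing are the defaults stated for HDFS Transparency 3.1.1-1 and later.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} from Hadoop-style <configuration> XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def check_replication(hdfs_site, gpfs_site, datanode_count):
    """Return a list of warnings; an empty list means these checks found nothing."""
    warnings = []
    # Defaults as stated for HDFS Transparency 3.1.1-1 and later.
    dfs_replication = int(hdfs_site.get("dfs.replication", "3"))
    enforced = gpfs_site.get("gpfs.replica.enforced", "gpfs")
    storage_type = gpfs_site.get("gpfs.storage.type")

    if dfs_replication > datanode_count:
        warnings.append(
            f"dfs.replication={dfs_replication} exceeds DataNode count {datanode_count}"
        )
    if enforced not in ("gpfs", "dfs"):
        warnings.append(f"unexpected gpfs.replica.enforced value: {enforced}")
    # Table row #1: FPO is recommended only with dfs.replication = 3.
    if storage_type == "local" and dfs_replication != 3:
        warnings.append("FPO (gpfs.storage.type=local) is recommended with dfs.replication=3")
    return warnings


# Illustrative stand-ins for the two configuration files.
HDFS_SITE = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>"""

GPFS_SITE = """<configuration>
  <property><name>gpfs.storage.type</name><value>shared</value></property>
  <property><name>gpfs.replica.enforced</name><value>gpfs</value></property>
</configuration>"""

hdfs_props = read_properties(HDFS_SITE)
gpfs_props = read_properties(GPFS_SITE)
for message in check_replication(hdfs_props, gpfs_props, datanode_count=2):
    print("WARNING:", message)
```

The sketch covers only the DataNode-count rule and the FPO row of the table. The file system -r and -R values are not checked here and still need to be reviewed with mmlsfs.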