The difference between HDFS Transparency and native HDFS

The following configuration properties behave differently in HDFS Transparency on IBM Storage Scale than in native HDFS.

Property name | Value | New definition or limitation
dfs.storage.policy.enabled | True/false | Not supported by HDFS Transparency. This means that storage policy commands such as hdfs storagepolicies and API calls such as fs.setStoragePolicy are not supported.
dfs.permissions.enabled | True/false | For the HDFS protocol, the permission check is always performed.
dfs.namenode.acls.enabled | True/false | For native HDFS, the NameNode manages all metadata, including the ACL information, and HDFS can use this information to turn ACL checking on or off. For IBM Storage Scale, however, the HDFS protocol does not store the metadata. When ACL checking is on, the ACL is set and stored in the IBM Storage Scale file system. If the administrator turns ACL checking off, the ACL entries that were set earlier are still stored in IBM Storage Scale and remain effective. This will be improved in the next release.
dfs.blocksize | Long integer | Must be an integer multiple of the IBM Storage Scale file system block size (mmlsfs -B). The maximum value is 1024 * file-system-data-block-size (mmlsfs -B).
dfs.namenode.fs-limits.max-xattrs-per-inode | INT | Does not apply to HDFS Transparency.
dfs.namenode.fs-limits.max-xattr-size | INT | Does not apply to HDFS Transparency.
dfs.namenode.fs-limits.max-component-length | Not checked | Does not apply to HDFS Transparency; the file name length is controlled by IBM Storage Scale. Refer to the IBM Storage Scale FAQ for the file name length limit (255 UTF-8 characters).
Native HDFS caching | Not supported | IBM Storage Scale has its own caching mechanism.
NFS Gateway | Not supported | IBM Storage Scale provides a POSIX interface, and using the IBM Storage Scale protocols can give you better performance and scaling.
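
The dfs.blocksize rule in the table above is plain arithmetic, so a candidate value can be checked before it goes into the configuration. The following is a minimal sketch, not part of HDFS Transparency: the function name check_dfs_blocksize is hypothetical, and the file system block size is passed in by hand as the byte value reported by mmlsfs -B.

```shell
# Hypothetical helper (not shipped with HDFS Transparency).
# $1: candidate dfs.blocksize in bytes
# $2: IBM Storage Scale file system block size in bytes (from mmlsfs -B)
check_dfs_blocksize() {
    dfs_bs=$1
    fs_bs=$2
    if [ "$dfs_bs" -le 0 ] || [ "$fs_bs" -le 0 ]; then
        echo "invalid: sizes must be positive"
        return 1
    fi
    # Rule 1: must be an integer multiple of the file system block size.
    if [ $((dfs_bs % fs_bs)) -ne 0 ]; then
        echo "invalid: not a multiple of $fs_bs"
        return 1
    fi
    # Rule 2: must not exceed 1024 * file system block size.
    if [ "$dfs_bs" -gt $((1024 * fs_bs)) ]; then
        echo "invalid: larger than $((1024 * fs_bs))"
        return 1
    fi
    echo "ok"
}

# Example: a 128 MiB HDFS block size on a file system with 4 MiB blocks.
check_dfs_blocksize 134217728 4194304
```

The function only prints a verdict and sets its exit status; it does not read or change any cluster configuration.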

Functional limitations
  • The maximum number of extended attributes (EAs) is limited by IBM Storage Scale, and the total size of the EA key and value must be less than the metadata block size in IBM Storage Scale.
  • The EA operation on snapshots is not supported.
  • Raw namespace is not implemented because it is not used internally.
  • If gpfs.replica.enforced is set to gpfs, the Hadoop shell command hadoop dfs -setrep does not take effect, and hadoop dfs -setrep -w hangs and does not exit. In addition, if a file is smaller than the inode size (4 KB per inode by default), IBM Storage Scale stores the file as data-in-inode. For such small files, the replication factor of the file is that of the metadata, not that of the data.
  • HDFS Transparency NameNode does not provide safemode because it is stateless.
  • HDFS Transparency NameNode does not need a secondary NameNode, unlike native HDFS, because it is stateless.
  • The maximum replication factor for IBM Storage Scale is 3.
  • hdfs fsck does not work against HDFS Transparency. Instead, run mmfsck.
  • IBM Storage Scale has no limit on the number of ACL entries (the maximum number of entries is bounded only by Int32).
  • distcp --diff is not supported on snapshots.
  • A + character in a file name is not supported with the hftp:// scheme. With other schemes, + in a file name works.
  • In HDFS, files can only be appended. If a file is uploaded to the same location with the same file name to overwrite an existing file, HDFS can detect this from the inode change. However, IBM Storage Scale supports the POSIX interface and other protocol interfaces (for example, NFS and SMB), so a file can be changed through a non-HDFS interface. Therefore, files loaded for Hadoop services to process must not be modified until processing completes. Otherwise, the service or job fails.
  • To view a list of HDFS 3.1.1 community fixes that are not yet supported by HDFS Transparency, see HDFS 3.1.1 community fixes.
  • GPFS quota is not supported in HDFS Transparency.
  • The -px option for the hadoop fs -cp -px command is not supported when SELinux is enabled. This is because HDFS cannot handle the system extended attribute (xattr) operation.
  • hadoop namenode -recover is not supported.
  • hdfs haadmin -refreshNodes is not supported.
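
The hftp:// restriction on + in file names, listed above, can be caught before a job is submitted. The following is a minimal sketch that encodes only that one rule. The function name check_plus_in_name and the host names in the examples are hypothetical.

```shell
# Hypothetical pre-flight check (not part of HDFS Transparency).
# $1: the path or URI that a job is about to use.
# Reports "unsupported" only for the case named in the list above:
# a '+' in the name combined with the hftp:// scheme.
check_plus_in_name() {
    case $1 in
        hftp://*+*)
            echo "unsupported"
            return 1
            ;;
        *)
            echo "ok"
            ;;
    esac
}

# Example: '+' is fine when the scheme is not hftp://.
check_plus_in_name "hdfs://mycluster/data/a+b.csv"
```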

hdfs haadmin

For CES HDFS integration, starting with IBM Storage Scale version 5.0.4.2, the hdfs haadmin command has the following new options:
  • -checkHealth -scale
  • -transitionToActive/-transitionToStandby

The -scale option, used with -checkHealth, retrieves the health and service state of the NameNodes on IBM Storage Scale.

The -transitionToActive and -transitionToStandby options, used with -scale, change the state of the local NameNode to active or standby.

Usage:
haadmin [-ns <nameserviceId>]
 [-checkHealth -scale [-all] [-Y]] # For Spectrum Scale usage only
 [-transitionToActive [--forceactive] -scale] # For Spectrum Scale usage only
 [-transitionToStandby -scale] # For Spectrum Scale usage only
 [-transitionToActive [--forceactive] <serviceId>]

Examples:

On the NameNode, check the NameNode status by running the following command:
# /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -checkHealth -scale -all
On the first NameNode, transition it to ACTIVE by running the following command:
# /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -transitionToActive --forceactive -scale
On the second NameNode, transition it to ACTIVE by running the following command:
# /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -transitionToActive --forceactive -scale
On the second NameNode, transition it to STANDBY by running the following command:
# /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -transitionToStandby -scale
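
When these commands are used often, it can help to keep them behind one small wrapper so that the -scale flag is not forgotten. The following is a minimal sketch under stated assumptions: the function name scale_haadmin_cmd and its action keywords (health, active, standby) are hypothetical, and the function only prints the command line from the examples above instead of running it.

```shell
# Path to the hdfs binary, taken from the examples above.
HDFS_BIN=/usr/lpp/mmfs/hadoop/bin/hdfs

# Hypothetical helper: prints (does not execute) the haadmin command line
# for one of the Scale-specific actions shown in the examples.
scale_haadmin_cmd() {
    case $1 in
        health)
            echo "$HDFS_BIN haadmin -checkHealth -scale -all"
            ;;
        active)
            echo "$HDFS_BIN haadmin -transitionToActive --forceactive -scale"
            ;;
        standby)
            echo "$HDFS_BIN haadmin -transitionToStandby -scale"
            ;;
        *)
            echo "usage: scale_haadmin_cmd health|active|standby" >&2
            return 2
            ;;
    esac
}

# Dry run: show the health-check command without executing it.
scale_haadmin_cmd health
```

Printing rather than executing keeps the sketch safe to try on any host. The printed line is meant to be reviewed and then run on the NameNode whose state is to be checked or changed.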