Hadoop distcp support
The hadoop distcp command is used for data migration from HDFS to the IBM Storage Scale file system and between two IBM Storage Scale file systems.
No additional configuration changes are required. The hadoop distcp command is supported in HDFS Transparency 2.7.0-2 (gpfs.hdfs-protocol-2.7.0-2) and later.
hadoop distcp hdfs://nn1_host:8020/source/dir hdfs://nn2_host:8020/target/dir
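As an optional illustration (not part of the original example), the standard distcp options can be used in the same way; for instance, -update copies only files that are missing or changed on the target, and -m limits the number of map tasks. The host names and paths below are placeholders:
hadoop distcp -update -m 20 hdfs://nn1_host:8020/source/dir hdfs://nn2_host:8020/target/dir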
Known issues and workarounds
Issue 1: Permission is denied when the hadoop distcp command is run with root credentials.
org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE,
inode="/user/root/.staging":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
Workaround: Configure root as a super user by adding it to gpfs.supergroup in gpfs-site.xml, or run the hadoop distcp command with super user credentials. This workaround applies only to HDFS Transparency 2.7.3-x. From HDFS Transparency 3.0.x, the gpfs.supergroup configuration parameter has been removed from HDFS Transparency.
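For illustration only, a gpfs-site.xml fragment might look like the following sketch. It assumes that gpfs.supergroup accepts a comma-separated list of group names; the value shown is a placeholder and should match the super group already configured for your cluster.
<property>
  <name>gpfs.supergroup</name>
  <!-- Placeholder value: list the super group(s) plus root, as appropriate for your cluster -->
  <value>hadoop,root</value>
</property>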
Issue 2: Access time exception while copying files from IBM Storage Scale to HDFS with the -p option
[hdfs@c8f2n03 conf]$ hadoop distcp -overwrite -p
hdfs://c16f1n03.gpfs.net:8020/testc16f1n03/
hdfs://c8f2n03.gpfs.net:8020/testc8f2n03
Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Access time for HDFS is not configured. Set the dfs.namenode.accesstime.precision configuration parameter at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:101)
Workaround: Change the dfs.namenode.accesstime.precision value from 0 to a value such as 3600000 (1 hour) in hdfs-site.xml for the HDFS cluster.
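For reference, the corresponding hdfs-site.xml entry would look like the following sketch (the value is in milliseconds):
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <!-- 3600000 ms = 1 hour -->
  <value>3600000</value>
</property>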
Issue 3: The distcp command fails when the source directory is the root directory.
[hdfs@c16f1n03 root]$ hadoop distcp hdfs://c16f1n03.gpfs.net:8020/
hdfs://c8f2n03.gpfs.net:8020/test5
16/03/03 22:27:34 ERROR tools.DistCp: Exception encountered
java.lang.NullPointerException
at org.apache.hadoop.tools.util.DistCpUtils.getRelativePath(DistCpUtils.java:144)
at org.apache.hadoop.tools.SimpleCopyListing.writeToFileListing(SimpleCopyListing.java:353)
Workaround: Specify at least one directory or file under the source path instead of the root directory.
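For example, the following sketch copies a specific directory rather than the file system root; the /testc16f1n03 source path is reused from Issue 2 purely as an illustration:
hadoop distcp hdfs://c16f1n03.gpfs.net:8020/testc16f1n03 hdfs://c8f2n03.gpfs.net:8020/test5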