Hadoop distcp support
The hadoop distcp command is used for data migration from HDFS to the IBM Spectrum Scale™ file system and between two IBM Spectrum Scale file systems.
No additional configuration changes are required. The hadoop distcp command is supported in HDFS transparency 2.7.0-2 (gpfs.hdfs-protocol-2.7.0-2) and later.
hadoop distcp hdfs://nn1_host:8020/source/dir hdfs://nn2_host:8020/target/dir
Known issues and workarounds
Issue 1: Permission is denied when the hadoop distcp command is run with the root credentials.
org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE,
inode="/user/root/.staging":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
Workaround: Add the super user account to gpfs.supergroup in gpfs-site.xml to configure root as a super user, or run the hadoop distcp command with the super user credentials.
Issue 2: An access time exception occurs when files are copied from IBM Spectrum Scale to HDFS with the -p option.
[hdfs@c8f2n03 conf]$ hadoop distcp -overwrite -p hdfs://c16f1n03.gpfs.net:8020/testc16f1n03/
hdfs://c8f2n03.gpfs.net:8020/testc8f2n03
Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Access time for HDFS is not configured. Set the dfs.namenode.accesstime.precision configuration parameter.
at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:101)
Workaround: Change the dfs.namenode.accesstime.precision value from 0 to a value such as 3600000 (1 hour) in hdfs-site.xml for the HDFS cluster.
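As a minimal sketch of the hdfs-site.xml change for Issue 2, the property could look like the fragment below. The property name and the value 3600000 come from the workaround text. The placement inside the existing configuration element and the comments are illustrative assumptions, not a copy of any shipped configuration.

```xml
<!-- hdfs-site.xml on the HDFS cluster (the distcp target in the example for Issue 2) -->
<!-- Place this property inside the existing <configuration> element. -->
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <!-- Value is in milliseconds: 3600000 ms = 1 hour. A value of 0 disables access times. -->
  <value>3600000</value>
</property>
```

The NameNode usually has to be restarted before a changed hdfs-site.xml value takes effect. Check the procedure for your distribution before you rerun the hadoop distcp command with the -p option.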
Issue 3: The distcp command fails when the source directory is the root directory.
[hdfs@c16f1n03 root]$ hadoop distcp hdfs://c16f1n03.gpfs.net:8020/ hdfs://c8f2n03.gpfs.net:8020/test5
16/03/03 22:27:34 ERROR tools.DistCp: Exception encountered
java.lang.NullPointerException
at org.apache.hadoop.tools.util.DistCpUtils.getRelativePath(DistCpUtils.java:144)
at org.apache.hadoop.tools.SimpleCopyListing.writeToFileListing(SimpleCopyListing.java:353)
Workaround: Specify at least one directory or file under the root directory as the source, instead of the root directory itself.