Overview

IBM Storage® Scale HDFS Transparency, also known as HDFS Protocol, offers a set of interfaces that allow applications to use HDFS clients to access IBM Storage Scale through HDFS RPC requests.

For more information about HDFS Transparency, see IBM Storage Scale support for Hadoop.

Currently, if jobs running on a native HDFS cluster need to access data in IBM Storage Scale, the options are Hadoop distcp or Hadoop Storage Tiering mode with native HDFS federation.

Using Hadoop distcp requires the data to be copied between native HDFS and the IBM Storage Scale HDFS Transparency cluster before it can be accessed. As a result, two copies of the same data consume storage space. For more information, see Hadoop distcp support.
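The copy step that distcp performs can be sketched as a small Python wrapper around the `hadoop distcp` command. This is a minimal sketch, not an IBM-provided tool: the helper names `build_distcp_command` and `run_distcp` and the NameNode host names and paths are hypothetical placeholders, and actually running the copy assumes a configured Hadoop client on `PATH`. The `-update` option is a standard distcp flag that copies only files that are missing or differ at the target; it does not avoid the second copy of the data described above.

```python
import shlex
import subprocess


def build_distcp_command(src_uri: str, dst_uri: str, update: bool = True) -> list[str]:
    """Build a `hadoop distcp` argument list that copies src_uri to dst_uri.

    Hypothetical example: src_uri points at the native HDFS NameNode and
    dst_uri at the IBM Storage Scale HDFS Transparency NameNode.
    """
    cmd = ["hadoop", "distcp"]
    if update:
        # -update skips files that already exist at the target with the same
        # size and checksum, so repeated runs copy only new or changed files.
        cmd.append("-update")
    cmd.extend([src_uri, dst_uri])
    return cmd


def run_distcp(cmd: list[str]) -> int:
    """Run the copy and return the exit code (needs a Hadoop client on PATH)."""
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Placeholder host names; replace with the real NameNode addresses.
    example = build_distcp_command(
        "hdfs://native-nn.example.com:8020/data/sales",
        "hdfs://transparency-nn.example.com:8020/data/sales",
    )
    # Print the command instead of running it, so the sketch is safe to try.
    print(shlex.join(example))
```

Because the copy must finish before jobs read the data from IBM Storage Scale, this approach suits batch migration rather than real-time access.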

If you use Hadoop Storage Tiering mode with native HDFS federation to federate native HDFS and IBM Storage Scale HDFS Transparency, jobs running on the native HDFS cluster can read data from and write data to IBM Storage Scale in real time, and only one copy of the data exists. However, the ViewFs scheme used for federation with native HDFS is not certified by the Hive community, and HDFS federation is not supported by Hortonworks HDP 2.6.

Note: Hadoop Storage Tiering mode with native HDFS federation is not supported in HDFS Transparency.