Overview
All data transmission and metadata operations in HDFS are performed through the RPC mechanism and are processed by the NameNode and the DataNode services within HDFS.
The IBM Storage® Scale HDFS protocol implementation integrates both the NameNode and the DataNode services and responds to requests as if it were HDFS. The advantages of HDFS Transparency are as follows:
- HDFS-compliant APIs and shell-interface commands.
- Application client isolation from storage. An application client can access data in the IBM Storage Scale file system without a GPFS™ client installed.
- Improved security management by Kerberos authentication and encryption for RPCs.
- Simplified file system monitoring through Hadoop Metrics2 integration.
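As an illustration of the HDFS-compliant shell interface, the following minimal sketch builds and optionally runs a standard `hdfs dfs -ls` command from Python. It assumes that the `hdfs` command-line client is installed on the DFS client node. The host name `namenode.example.com`, the port `8020`, and the helper names `build_ls_command` and `list_directory` are hypothetical and are used for illustration only; substitute the values for your own cluster.

```python
import shutil
import subprocess

# Assumption: hypothetical NameNode host and RPC port; replace with your own.
DEFAULT_FS = "hdfs://namenode.example.com:8020"


def build_ls_command(path, default_fs=DEFAULT_FS):
    """Build an `hdfs dfs -ls` command line.

    `-fs` is a Hadoop generic option, so it is placed before `-ls`.
    """
    return ["hdfs", "dfs", "-fs", default_fs, "-ls", path]


def list_directory(path, default_fs=DEFAULT_FS):
    """Run the listing and return its output, or None if `hdfs` is not on PATH."""
    if shutil.which("hdfs") is None:
        return None
    result = subprocess.run(
        build_ls_command(path, default_fs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command that would be issued for the root directory.
    print(" ".join(build_ls_command("/")))
```

Because the request is an ordinary HDFS RPC call, the same command is expected to work whether the target is native HDFS or an HDFS Transparency NameNode, and the client node does not need a GPFS client.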
In the following sections, a DFS client is a node on which the HDFS client package is installed. A Hadoop node is a node on which any Hadoop-based component (such as Hive, HBase, Pig, or Ranger) is installed. A Hadoop service is a Hadoop-based application or component. An HDFS Transparency node is a node that runs the HDFS Transparency NameNode or DataNode.
Integration of the Cluster Export Services (CES) protocol and the deployment toolkit with HDFS Transparency is supported starting with HDFS Transparency 3.1.1 and IBM Storage Scale 5.0.4.2. For more information, see HDFS Transparency overview.
For information about downloading the HDFS Transparency package, see HDFS Transparency download.