HDFS transparency

IBM Spectrum Scale HDFS transparency offers a set of interfaces that allow applications to use the HDFS client to access IBM Spectrum Scale through HDFS RPC requests.

In HDFS, all data transmission and metadata operations go through RPC and are processed by the NameNode and DataNode services. The IBM Spectrum® Scale HDFS transparency implementation integrates both the NameNode and the DataNode services and responds to these requests.
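Because the service answers standard HDFS RPC requests, an unmodified Hadoop client should be able to address it like any other NameNode. The following is a minimal sketch, not an IBM-provided tool: it builds and optionally runs the standard `hdfs dfs -ls` command against an explicit `hdfs://` URI. The host name `namenode.example.com` is hypothetical, and port 8020 is only the common Hadoop default for NameNode RPC; your cluster's `fs.defaultFS` setting is the authoritative value.

```python
import shutil
import subprocess


def build_ls_command(namenode_host, path, port=8020):
    """Return the argv list for `hdfs dfs -ls` against an explicit NameNode URI.

    namenode_host and port are assumptions supplied by the caller; they are
    not values defined by the HDFS transparency documentation.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    return ["hdfs", "dfs", "-ls", f"hdfs://{namenode_host}:{port}{path}"]


def list_directory(namenode_host, path, port=8020):
    """Run the listing if the Hadoop CLI is on PATH; return None otherwise."""
    if shutil.which("hdfs") is None:
        return None
    result = subprocess.run(
        build_ls_command(namenode_host, path, port),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the client reports a failure
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical host name, used purely for illustration.
    print(" ".join(build_ls_command("namenode.example.com", "/")))
```

The command is the same one you would run against a native HDFS cluster; only the NameNode address in the URI differs.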

In IBM Spectrum Scale 5.0.4.2 or later, CES can control the HDFS NameNode service and provide high availability through the CES failover capability. In this configuration, CES nodes in the cluster are used as NameNodes. Alternatively, you can configure your cluster to run the non-CES-based HDFS service.

Note: If CES-based HDFS is configured in the cluster, the GUI displays the HDFS Transparency page on the Services page. If the non-CES-based HDFS service is enabled instead, the GUI displays the Hadoop Connector page on the Services page in place of the HDFS Transparency page.

You can monitor the following details of the HDFS transparency service from the GUI:

  • NameNode: Lists the CES nodes that serve the metadata of the HDFS service. You can also see the status of the CES service and the IP address assignment on these nodes. The CES IP address is assigned to only one NameNode per HDFS cluster.
  • DataNode: Provides the details of the nodes on which the HDFS file system is stored.
  • Events: Lists the events that are reported against the HDFS transparency service.