Hadoop Storage Tiering mode without native HDFS federation

This topic shows how to architect and configure a Hadoop Storage Tiering solution, together with a suite of test cases that were executed against this configuration.

The Hadoop Storage Tiering with IBM Storage® Scale architecture is shown in Figure 1 and Figure 2:

Figure 1. Hadoop Storage Tiering with IBM Storage Scale without an HDP cluster on the remote side (single HDP cluster)

Figure 2. Hadoop Storage Tiering with IBM Storage Scale with an HDP cluster on the remote side (two HDP clusters)

The Hadoop Storage Tiering architecture consists of a native HDFS cluster (the local cluster), shown on the left-hand side, and an IBM Storage Scale HDFS Transparency cluster (the remote cluster), shown on the right-hand side. Jobs running on the native HDFS cluster can access data in either the native HDFS or the IBM Storage Scale HDFS Transparency cluster. The cluster that is used depends on the input or output data path of the job, or on the metadata path. For example, a Hive job takes the data location from the Hive metadata path.
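The path-based selection described above can be sketched in a few lines of Python. This is only an illustration, not part of the product: the NameNode host names and the `resolve_cluster` helper are hypothetical, and the sketch assumes that each cluster is identified by the authority (host and port) of an `hdfs://` URI, with unqualified paths falling back to the default file system of the local cluster (`fs.defaultFS` in Hadoop).

```python
from urllib.parse import urlparse

# Hypothetical NameNode addresses, used for illustration only.
LOCAL_FS = "hdfs://local-nn.example.com:8020"    # native HDFS (local cluster)
REMOTE_FS = "hdfs://remote-nn.example.com:8020"  # HDFS Transparency (remote cluster)


def resolve_cluster(path, default_fs=LOCAL_FS, remote_fs=REMOTE_FS):
    """Return "local" or "remote" depending on which cluster the path names."""
    local_authority = urlparse(default_fs).netloc
    remote_authority = urlparse(remote_fs).netloc

    # A path without a host:port part is resolved against the default file system.
    authority = urlparse(path).netloc or local_authority

    if authority == local_authority:
        return "local"
    if authority == remote_authority:
        return "remote"
    raise ValueError(f"unknown file system authority: {authority}")


if __name__ == "__main__":
    for p in ("/user/hive/warehouse/t1",
              "hdfs://remote-nn.example.com:8020/tmp/input"):
        print(p, "->", resolve_cluster(p))
```

In this sketch, a job reads from the remote cluster simply by using a fully qualified path for its input or output. The same idea applies to a Hive table whose location in the metadata is a fully qualified URI.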

Note: A Hadoop cluster deployed on the IBM Storage Scale HDFS Transparency cluster side is not required for the Hadoop Storage Tiering with IBM Storage Scale solution. It is included to show that a Hadoop cluster can access data in the IBM Storage Scale file system through either HDFS or POSIX.
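To illustrate what dual HDFS and POSIX access means in practice, the following sketch maps an HDFS path to the POSIX path of the same file. It assumes that the HDFS namespace root corresponds to a data directory under the file system mount point. The mount point `/gpfs/fs1`, the data directory `hadoop`, and the `hdfs_to_posix` helper are hypothetical values chosen for the example, so check the actual layout of your cluster.

```python
import posixpath


def hdfs_to_posix(hdfs_path, mount_point="/gpfs/fs1", data_dir="hadoop"):
    """Map an absolute HDFS path to the assumed POSIX location of the same file.

    mount_point and data_dir are hypothetical defaults, not product settings.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    # Normalize the path and drop leading slashes so that join() keeps the prefix.
    relative = posixpath.normpath(hdfs_path).lstrip("/")
    root = posixpath.join(mount_point, data_dir)
    return posixpath.join(root, relative) if relative else root


if __name__ == "__main__":
    print(hdfs_to_posix("/user/hive/warehouse/t1"))
```

Under these assumptions, a file written through HDFS at `/user/hive/warehouse/t1` would be visible to POSIX applications under the mount point, with no copy step in between.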

The configuration described in this documentation was set up without the HDP components on the remote cluster.

The following software versions were used for testing:

Table 1. Software stack and version details for Hadoop Storage Tiering test
Clusters	Stack	Version
HDP cluster	Ambari	2.6.1.0
	HDP	2.6.4.0
	HDP-Utils	1.1.0.22
IBM Storage Scale and HDFS Transparency cluster	IBM Storage Scale	5.0.0
	HDFS Transparency	2.7.3-2
	IBM Storage Scale Ambari management pack	2.4.2.4