Hadoop Storage Tiering mode without native HDFS federation

This topic describes how to architect and configure a Hadoop Storage Tiering solution and presents a suite of test cases that were run against that configuration.

The Hadoop Storage Tiering with IBM Storage Scale architecture is shown in Figure 1 and Figure 2:
Figure 1. Hadoop Storage Tiering with IBM Storage Scale without an HDP cluster on the IBM Storage Scale side (single HDP cluster)
Figure 2. Hadoop Storage Tiering with IBM Storage Scale with HDP clusters (two HDP clusters)

In the Hadoop Storage Tiering architecture, the native HDFS cluster (local cluster) is shown on the left-hand side and the IBM Storage Scale HDFS Transparency cluster (remote cluster) is shown on the right-hand side. Jobs running on the native HDFS cluster can access data on either the native HDFS or the IBM Storage Scale HDFS Transparency cluster, depending on the input or output data path, or on the metadata path. For example, a Hive job locates its data through the Hive metadata path.
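The path-based routing described above can be sketched as follows. This is a minimal illustration, not part of the product: it assumes that the native HDFS cluster is the client's fs.defaultFS and that data on the remote cluster is addressed with a fully qualified hdfs:// URI. The NameNode host names, ports, and the serving_cluster helper are hypothetical placeholders.

```python
from urllib.parse import urlparse

# Hypothetical endpoints; replace them with your real NameNode addresses.
LOCAL_FS = "hdfs://native-nn.example.com:8020"   # native HDFS, used as fs.defaultFS
REMOTE_FS = "hdfs://scale-nn.example.com:8020"   # IBM Storage Scale HDFS Transparency

CLUSTERS = {
    LOCAL_FS: "local native HDFS",
    REMOTE_FS: "remote HDFS Transparency",
}


def serving_cluster(path: str, default_fs: str = LOCAL_FS) -> str:
    """Return which cluster serves an absolute Hadoop path (hypothetical helper).

    A path without a scheme and authority (for example /tmp/out) is qualified
    against fs.defaultFS, so it stays on the local cluster. A fully qualified
    URI is routed to the cluster named by its scheme and authority.
    """
    parsed = urlparse(path)
    if not parsed.scheme:
        # Relative paths would resolve against the user's working directory,
        # which this sketch does not model, so only absolute paths are accepted.
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute path, got {path!r}")
        fs = default_fs
    else:
        fs = f"{parsed.scheme}://{parsed.netloc}"
    try:
        return CLUSTERS[fs]
    except KeyError:
        raise ValueError(f"unknown file system {fs!r}") from None


if __name__ == "__main__":
    # A job that reads its input from the remote cluster and writes locally.
    job_input = REMOTE_FS + "/data/sales/2018"
    job_output = "/tmp/sales-report"
    print("input :", serving_cluster(job_input))
    print("output:", serving_cluster(job_output))
```

For a Hive job, the same rule applies to the table location recorded in the Hive metastore, which is typically stored as a fully qualified URI and therefore points the job at the cluster that holds the table data.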

Note: A Hadoop cluster deployed on the IBM Storage Scale HDFS Transparency cluster side is not required for the Hadoop Storage Tiering with IBM Storage Scale solution. It is shown only to demonstrate that a Hadoop cluster can access data in the IBM Storage Scale file system through either HDFS or POSIX.
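To illustrate this dual access, the sketch below maps an HDFS path served by HDFS Transparency to the POSIX path of the same file in the IBM Storage Scale file system. It assumes that HDFS Transparency exports the directory formed by its gpfs.mnt.dir and gpfs.data.dir settings as the HDFS root; the values /gpfs/fs1 and hadoop-data, and the hdfs_to_posix helper, are hypothetical examples, so check the settings in gpfs-site.xml on your own cluster.

```python
import posixpath

# Hypothetical values; read the real ones from HDFS Transparency's gpfs-site.xml.
GPFS_MNT_DIR = "/gpfs/fs1"      # gpfs.mnt.dir: IBM Storage Scale mount point
GPFS_DATA_DIR = "hadoop-data"   # gpfs.data.dir: directory exported as the HDFS root


def hdfs_to_posix(hdfs_path: str,
                  mnt_dir: str = GPFS_MNT_DIR,
                  data_dir: str = GPFS_DATA_DIR) -> str:
    """Map an absolute HDFS path to its POSIX path on IBM Storage Scale."""
    if not hdfs_path.startswith("/"):
        raise ValueError(f"expected an absolute HDFS path, got {hdfs_path!r}")
    # normpath collapses "." and ".." so the result cannot escape the HDFS root.
    rel = posixpath.normpath(hdfs_path).lstrip("/")
    # strip("/") keeps join() from discarding the mount point.
    return posixpath.normpath(posixpath.join(mnt_dir, data_dir.strip("/"), rel))


if __name__ == "__main__":
    print(hdfs_to_posix("/user/hive/warehouse"))
    # /gpfs/fs1/hadoop-data/user/hive/warehouse
```

A POSIX application on an IBM Storage Scale node could then read the file at the returned path, while Hadoop jobs continue to use the HDFS path.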

The configuration documented here was set up without HDP components on the remote cluster.

This document used the following software versions for testing:
Cluster                                        Component                                 Version
HDP cluster                                    Ambari                                    2.6.1.0
                                               HDP                                       2.6.4.0
                                               HDP-Utils                                 1.1.0.22
IBM Storage Scale & HDFS Transparency cluster  IBM Storage Scale                         5.0.0
                                               HDFS Transparency                         2.7.3-2
                                               IBM Storage Scale Ambari management pack  2.4.2.4