This topic describes how to architect and configure a Hadoop Storage Tiering solution, together with a suite of test cases that were executed against this configuration.
The Hadoop Storage Tiering with IBM Storage® Scale architecture is shown in Figure 1 and Figure 2:
Figure 1. Hadoop Storage Tiering with IBM Storage Scale without HDP cluster
Figure 2. Hadoop Storage Tiering with IBM Storage Scale with HDP clusters
The Hadoop Storage Tiering architecture consists of a native HDFS cluster (local cluster), shown on the left-hand side, and an IBM Storage Scale HDFS Transparency cluster (remote cluster), shown on the right-hand side. Jobs running on the native HDFS cluster can access data from the native HDFS or from the IBM Storage Scale HDFS Transparency cluster, depending on the input or output data path or on the metadata path. For example, a Hive job accesses data according to the path recorded in the Hive metadata.
Note: A Hadoop cluster deployed on the IBM Storage Scale HDFS Transparency cluster side is not a requirement for the Hadoop Storage Tiering with IBM Storage Scale solution. It is included in the figure to show that a Hadoop cluster can access data via HDFS or POSIX from the IBM Storage Scale file system.
The configuration described in this documentation was set up without the HDP components on the remote cluster.
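The path-based access described above can be illustrated with a minimal Python sketch. It is not part of the product; it only models how a fully qualified `hdfs://` URI selects a cluster by its NameNode authority, while an unqualified path falls back to the default file system of the cluster where the job runs. The host names, port, and the `target_cluster` helper are hypothetical placeholders chosen for illustration.

```python
from urllib.parse import urlparse

# Hypothetical NameNode endpoints (assumptions, not values from this document).
LOCAL_FS = "hdfs://local-nn.example.com:8020"    # native HDFS (local cluster)
REMOTE_FS = "hdfs://remote-nn.example.com:8020"  # HDFS Transparency (remote cluster)

# Map each NameNode authority (host:port) to a human-readable cluster label.
CLUSTERS = {
    urlparse(LOCAL_FS).netloc: "local native HDFS",
    urlparse(REMOTE_FS).netloc: "remote IBM Storage Scale HDFS Transparency",
}


def target_cluster(path, default_fs=LOCAL_FS):
    """Return the label of the cluster that a job input/output path refers to."""
    parsed = urlparse(path)
    # A path without an authority resolves against the default file system.
    netloc = parsed.netloc or urlparse(default_fs).netloc
    return CLUSTERS.get(netloc, "unknown cluster")


if __name__ == "__main__":
    for p in (
        "/user/hive/warehouse/sales",
        "hdfs://remote-nn.example.com:8020/tiering/archive/sales",
    ):
        print(p, "->", target_cluster(p))
```

In this sketch, a job on the local cluster that reads `/user/hive/warehouse/sales` stays on the native HDFS, whereas the same job given the fully qualified remote URI reads from the IBM Storage Scale side. A Hive table whose location in the metadata is such a remote URI would be routed the same way.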
This document used the following software versions for testing:
Table 1. Software stack and version details for Hadoop Storage Tiering test