Multiple Hadoop clusters over the same file system

By using HDFS transparency, you can configure multiple Hadoop clusters over the same IBM Storage Scale file system. For each Hadoop cluster, you need one HDFS transparency cluster to provide the file system service.

Figure 1. Two Hadoop Clusters over the same IBM Storage Scale file system
You can configure Node1 to Node6 as one IBM Storage Scale cluster (FPO or shared storage mode), and then configure Node1 to Node3 as one HDFS transparency cluster and Node4 to Node6 as another. HDFS transparency cluster1 and HDFS transparency cluster2 use different configurations, which you set in /usr/lpp/mmfs/hadoop/etc/hadoop/gpfs-site.xml (for HDFS Transparency 2.7.3-x) or /var/mmfs/hadoop/etc/hadoop/gpfs-site.xml (for HDFS Transparency 3.0.x):
  1. On one node of HDFS transparency cluster1, change gpfs-site.xml so that the data is stored under /<gpfs-mount-point>/<hadoop1> (set gpfs.data.dir=hadoop1 in gpfs-site.xml).
  2. Run mmhadoopctl connector syncconf /usr/lpp/mmfs/hadoop/etc/hadoop (for HDFS Transparency 2.7.3-x) or mmhadoopctl connector syncconf /var/mmfs/hadoop/etc/hadoop (for HDFS Transparency 3.0.x) to synchronize the gpfs-site.xml from Step 1 to all other nodes in HDFS transparency cluster1.
  3. On one node of HDFS transparency cluster2, change gpfs-site.xml so that the data is stored under /<gpfs-mount-point>/<hadoop2> (set gpfs.data.dir=hadoop2 in gpfs-site.xml).
  4. Run mmhadoopctl connector syncconf /usr/lpp/mmfs/hadoop/etc/hadoop (for HDFS Transparency 2.7.3-x) or mmhadoopctl connector syncconf /var/mmfs/hadoop/etc/hadoop (for HDFS Transparency 3.0.x) to synchronize the gpfs-site.xml from Step 3 to all other nodes in HDFS transparency cluster2.
  5. Restart the HDFS transparency services on both HDFS transparency clusters so that the new gpfs-site.xml settings take effect.
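Steps 1 and 3 differ only in the value of a single property. The following Python sketch shows one way to make that edit programmatically. It assumes that gpfs-site.xml uses the standard Hadoop <configuration>/<property>/<name>/<value> layout; the helper names config_dir and set_property are hypothetical and are not part of HDFS Transparency. Because it uses xml.etree.ElementTree, any XML comments in the file are dropped when it is rewritten, so keep a backup or edit the file by hand if you rely on them.

```python
import xml.etree.ElementTree as ET

# Configuration directory per HDFS Transparency release line, as listed in the steps above.
CONFIG_DIRS = {
    "2.7.3": "/usr/lpp/mmfs/hadoop/etc/hadoop",
    "3.0": "/var/mmfs/hadoop/etc/hadoop",
}


def config_dir(version: str) -> str:
    """Return the gpfs-site.xml directory for a version string such as '2.7.3-1' or '3.0.x'."""
    for prefix, path in CONFIG_DIRS.items():
        if version.startswith(prefix):
            return path
    raise ValueError(f"unsupported HDFS Transparency version: {version}")


def set_property(xml_path: str, name: str, value: str) -> None:
    """Set (or add) a <property> in a Hadoop-style configuration XML file.

    Hypothetical helper: XML comments are not preserved when the file is rewritten.
    """
    tree = ET.parse(xml_path)
    root = tree.getroot()  # the <configuration> element
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_elem = prop.find("value")
            if value_elem is None:
                value_elem = ET.SubElement(prop, "value")
            value_elem.text = value
            break
    else:
        # No existing property with this name: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    tree.write(xml_path, encoding="utf-8", xml_declaration=True)
```

For example, on one node of HDFS transparency cluster1 running HDFS Transparency 3.0.x you could call set_property(config_dir("3.0.x") + "/gpfs-site.xml", "gpfs.data.dir", "hadoop1"), and on one node of cluster2 the same call with "hadoop2", then run the mmhadoopctl connector syncconf command for each cluster as described in Steps 2 and 4.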