Rack locality support for shared storage

Starting with HDFS Transparency 2.7.2-0, rack locality is supported for shared storage, including IBM Storage Scale System.

If your cluster meets the following conditions, you can enable this feature:
  • There is more than one rack in the IBM Storage Scale cluster.
  • Each rack has its own ToR (Top of Rack) Ethernet switch, and the racks are connected by rack-to-rack switches.

Otherwise, enabling this feature will not benefit your Hadoop applications. The key advantage of this feature is that it reduces network traffic over the rack-to-rack Ethernet switches by enabling as many map/reduce tasks as possible to read data from the local rack.

The typical topology is shown in the following figure:
Figure 1. Topology of rack awareness locality for shared storage

For IBM Storage Scale over shared storage or IBM Storage Scale System, there is no data locality at the file system level. The maximum IBM Storage Scale file system block size is 16 MB, whereas dfs.blocksize at the Hadoop level is 128 MB by default. Each Hadoop-level block is therefore split into multiple 16 MB blocks stored in the IBM Storage Scale file system. After this feature is enabled, HDFS Transparency considers the locations of those 8 blocks (16 MB * 8 = 128 MB), including replicas (for example, if your file system uses replication factor 2), and returns to the application the hostname of a node in the rack that holds most of the data from those blocks, so that the application reads most of the data from the local rack and rack-to-rack switch traffic is reduced. If there is more than one HDFS Transparency DataNode in the selected rack, HDFS Transparency randomly returns one of them as the DataNode of the block location for that replica.
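
The rack-selection step can be pictured with a short Python sketch. This is not the HDFS Transparency implementation; the function name, inputs, and example values below are hypothetical and only illustrate how the rack holding most of a 128 MB Hadoop block can be chosen and one of its DataNodes returned.

    import random
    from collections import Counter

    def pick_block_location(subblock_racks, datanodes_by_rack):
        """Return a DataNode hostname for one 128 MB Hadoop-level block.

        subblock_racks: rack of each 16 MB file-system block (replicas included),
                        for example ["/dc1/rack1"] * 6 + ["/dc1/rack2"] * 2.
        datanodes_by_rack: mapping of rack name to HDFS Transparency DataNodes.
        """
        # Count how many of the underlying 16 MB blocks each rack holds.
        best_rack, _ = Counter(subblock_racks).most_common(1)[0]
        # Any DataNode in the winning rack can serve the block; pick one randomly,
        # as HDFS Transparency does when a rack has several DataNodes.
        return random.choice(datanodes_by_rack[best_rack])

    # Example: 6 of the 8 sub-blocks are in rack1, so a rack1 DataNode is returned.
    print(pick_block_location(
        ["/dc1/rack1"] * 6 + ["/dc1/rack2"] * 2,
        {"/dc1/rack1": ["dn1", "dn2"], "/dc1/rack2": ["dn3", "dn4"]},
    ))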

Enabling rack-awareness locality for shared storage

  1. Select the HDFS Transparency nodes from the Hadoop nodes in Figure 1. You can select all of the Hadoop nodes as HDFS Transparency nodes, or only a subset of them.

    All of the selected HDFS Transparency nodes must have IBM Storage Scale installed and must be able to mount the file system locally. Select at least one Hadoop node from each rack for HDFS Transparency.

    Select all Hadoop Yarn Node Managers as HDFS Transparency nodes to avoid data transfer delays from the HDFS Transparency nodes to the Yarn Node Manager nodes for Map/Reduce jobs.

  2. On the HDFS Transparency NameNode, modify the /usr/lpp/mmfs/hadoop/etc/hadoop/core-site.xml (for HDFS Transparency 2.7.x) or /var/mmfs/hadoop/etc/hadoop/core-site.xml (for HDFS Transparency 3.0.x):
      <property>
        <name>net.topology.table.file.name</name>
        <value>/usr/lpp/mmfs/hadoop/etc/hadoop/topology.data</value>
      </property>
      <property>
        <name>net.topology.node.switch.mapping.impl</name>
        <value>org.apache.hadoop.net.TableMapping</value>
      </property>
    
  3. On the HDFS Transparency NameNode, create the topology in /usr/lpp/mmfs/hadoop/etc/hadoop/topology.data (for HDFS Transparency 2.7.x) or /var/mmfs/hadoop/etc/hadoop/topology.data (for HDFS Transparency 3.0.x):
    # vim topology.data

    192.0.2.0    /dc1/rack1
    192.0.2.1    /dc1/rack1
    192.0.2.2    /dc1/rack1
    192.0.2.3    /dc1/rack1
    192.0.2.4    /dc1/rack2
    192.0.2.5    /dc1/rack2
    192.0.2.6    /dc1/rack2
    192.0.2.7    /dc1/rack2
    
    Note: The topology.data file uses IP addresses. To configure two IP addresses, see the Dual network interfaces section. The IP addresses here must be the IP addresses used for Yarn services and the IBM Storage Scale NSD server.

    You must also specify the IP addresses of the IBM Storage Scale NSD servers. For Figure 1, specify the IP address and corresponding rack information for NSD Servers 1/2/3/4/5/6. A small sketch that checks the topology file against this format is shown after this procedure.

  4. On the HDFS Transparency NameNode, modify the /usr/lpp/mmfs/hadoop/etc/hadoop/gpfs-site.xml (for HDFS Transparency 2.7.x) or /var/mmfs/hadoop/etc/hadoop/gpfs-site.xml (for HDFS Transparency 3.0.x):
      <property>
        <name>gpfs.storage.type</name>
        <value>rackaware</value>
      </property>
    
  5. On the HDFS Transparency NameNode, run the mmhadoopctl connector syncconf /usr/lpp/mmfs/hadoop/etc/hadoop command (for HDFS Transparency 2.7.x) or the mmhadoopctl connector syncconf /var/mmfs/hadoop/etc/hadoop command (for HDFS Transparency 3.0.x) to synchronize the configuration to all the HDFS Transparency nodes.
    Note: If you have HDP with Ambari Mpack 2.4.2.1 and later, the connector syncconf cannot be executed. Ambari manages the configuration syncing through the database.
  6. (Optional) To configure a multi-cluster setup between the IBM Storage Scale NSD servers and an IBM Storage Scale HDFS Transparency cluster, you must configure password-less ssh access from the HDFS Transparency NameNode to at least one of the contact nodes in the remote cluster. In HDFS Transparency 2.7.3-2, only root password-less ssh access is supported. Starting from 2.7.3-3, non-root password-less ssh access is also supported.
    If password-less ssh access cannot be set up, starting from HDFS Transparency 2.7.3-2 you can set gpfs.remotecluster.autorefresh to false in /usr/lpp/mmfs/hadoop/etc/hadoop/gpfs-site.xml. This prevents HDFS Transparency from automatically accessing the remote cluster to retrieve information.
    1. If you are using Ambari, add the gpfs.remotecluster.autorefresh=false field in IBM Spectrum Scale service > Configs tab > Advanced > Custom gpfs-site.
    2. Stop and start all the services.
    3. Manually generate the mapping files and copy them to all the HDFS Transparency nodes. For more information, see option 3 under the Passwordless ssh access section.
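
The following Python sketch, referenced in step 3, is a hypothetical helper and is not shipped with HDFS Transparency. It reads a topology file in the two-column format that org.apache.hadoop.net.TableMapping expects and reports hosts that have no rack entry; such hosts are placed in the default rack by Hadoop, which defeats rack-aware placement.

    def load_topology(path):
        """Return {ip: rack} from a two-column topology.data file."""
        mapping = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                ip, rack = line.split()
                mapping[ip] = rack
        return mapping

    def check_hosts(mapping, required_ips):
        """Print the rack of each required IP, or warn when it is missing."""
        for ip in required_ips:
            rack = mapping.get(ip)
            if rack is None:
                print(f"WARNING: {ip} has no rack entry in topology.data")
            else:
                print(f"{ip} -> {rack}")

    if __name__ == "__main__":
        topo = load_topology("/usr/lpp/mmfs/hadoop/etc/hadoop/topology.data")
        # Replace with the IP addresses of your HDFS Transparency nodes,
        # Yarn services, and IBM Storage Scale NSD servers.
        check_hosts(topo, ["192.0.2.0", "192.0.2.4"])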