Tuning for IBM Storage Scale over shared storage or IBM Storage Scale System

If you deploy the Hadoop cluster through Ambari (for both HortonWorks HDP and IBM® BigInsights® IOP), Ambari performs some default tuning based on your cluster.

The following table lists the most important configurations to check when running HDFS Transparency over IBM Storage Scale System or shared storage:
Table 1. Tuning configurations for HDFS Transparency over IBM Storage Scale System or shared storage

dfs.replication
    Default value: 1
    Recommended value: 3 (at least greater than 1)
    Comments: Also set gpfs.storage.type to shared. For more information, see Configure storage type data replication.

dfs.blocksize
    Default value: 134217728
    Recommended value: 536870912

io.file.buffer.size
    Default value: 4096 (bytes)
    Recommended value: The data block size of your file system, or an integral multiple of it, but <= 1 MB
    Comments: The maximum value should be <= 1 MB. If the value is too high, more JVM GC operations will occur.

dfs.datanode.handler.count
    Default value: 10
    Recommended value: Refer to the comments
    Comments: Calculate this from the number of Hadoop nodes and HDFS Transparency DataNodes: (40 * HadoopNodes) / TransparencyDataNodeNumber.

dfs.namenode.handler.count
    Default value: 10
    Recommended value: Refer to the comments
    Comments: This depends on the resources of the NameNode and the number of Hadoop nodes. If you take the IBM Storage Scale Ambari integration, 100 * log_e(DataNodeNumber) is used to calculate the value of dfs.namenode.handler.count. If you do not take the IBM Storage Scale Ambari integration, you could take 400 for ~10 Hadoop nodes, and 800 or higher for ~20 Hadoop nodes.

dfs.ls.limit
    Default value: 1000
    Recommended value: 100000

dfs.client.read.shortcircuit.streams.cache.size
    Default value: 4096
    Recommended value: Refer to the comments
    Comments: Set it to the data block size of the IBM Storage Scale file system.

dfs.datanode.transferTo.allowed
    Default value: true
    Recommended value: false
    Comments: If this is true, the I/O is performed as 4 KB mmap() operations for GPFS.
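
As a worked example of the handler-count formulas above, assume a hypothetical cluster of 10 Hadoop nodes and 4 HDFS Transparency DataNodes (substitute your own node counts):

    dfs.datanode.handler.count = (40 * 10) / 4 = 100
    dfs.namenode.handler.count = 100 * log_e(4) ≈ 139 (with the IBM Storage Scale Ambari integration), or ~400 for ~10 Hadoop nodes without it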

The above tuning should also be done for the Hadoop HDFS client. If you take HortonWorks HDP, change the above configurations in the Ambari GUI. After these changes, restart all services and ensure that the changes are synced into /etc/hadoop/conf/hdfs-site.xml and into /usr/lpp/mmfs/hadoop/etc/hadoop/hdfs-site.xml (for HDFS Transparency 2.7.x) or /var/mmfs/hadoop/etc/hadoop/hdfs-site.xml (for HDFS Transparency 3.0.x). If you take open source Apache Hadoop, update these configurations for the Hadoop clients ($HADOOP_HOME/etc/hadoop/hdfs-site.xml) and use /usr/lpp/mmfs/bin/mmhadoopctl to sync your changes to the HDFS Transparency configuration on all HDFS Transparency nodes.
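
For illustration only, a minimal hdfs-site.xml fragment with the recommended values might look like the following. It assumes the hypothetical cluster from the earlier example (10 Hadoop nodes, 4 HDFS Transparency DataNodes, not managed by Ambari) and a 1 MiB IBM Storage Scale file system block size; adjust every value to your own environment.

<!-- Illustrative values only; assumes 10 Hadoop nodes, 4 Transparency DataNodes,
     1 MiB file system block size, no IBM Storage Scale Ambari integration -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.blocksize</name>
  <value>536870912</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>1048576</value>  <!-- file system block size, capped at 1 MB -->
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>100</value>  <!-- (40 * 10) / 4 -->
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>400</value>  <!-- ~400 for ~10 Hadoop nodes without Ambari integration -->
</property>
<property>
  <name>dfs.ls.limit</name>
  <value>100000</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
  <value>1048576</value>  <!-- file system block size -->
</property>
<property>
  <name>dfs.datanode.transferTo.allowed</name>
  <value>false</value>
</property>

For open source Apache Hadoop, the changes can then be propagated to the HDFS Transparency nodes with mmhadoopctl, for example /usr/lpp/mmfs/bin/mmhadoopctl connector syncconf /usr/lpp/mmfs/hadoop/etc/hadoop; the exact subcommand and configuration directory depend on your HDFS Transparency version, so check its documentation.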