Configuration and tuning of Hadoop workloads

All configuration options listed in this section apply only to Hadoop-like applications such as Hadoop and Spark:
Configuration                   Default Value   Recommended   Comment
disableInodeUpdateOnFdatasync   No              Yes
dataDiskCacheProtectionMethod   0               2             Change this to 2 if you turn on dataOnly disk write cache (without battery protection).
Note: If the cluster is not dedicated to Hadoop workloads, keep the default values for the configurations above.
For Hadoop-like workloads, a single JVM process can open many files and create many threads. Therefore, raise the ulimit values:
vim /etc/security/limits.conf
# add the following lines at the end of /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
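
Changes to /etc/security/limits.conf apply only to sessions started after the edit, so it can help to confirm what a process actually sees. The following is a minimal sketch in Python, assuming a Linux host. The helper names meets and check_limits are illustrative only and are not part of Hadoop or any other tool; the 65536 threshold is taken from the lines above.

```python
import resource

REQUIRED = 65536  # value used in the limits.conf lines above


def meets(value, required=REQUIRED):
    """Return True if a limit is unlimited or at least `required`."""
    return value == resource.RLIM_INFINITY or value >= required


def check_limits(required=REQUIRED):
    """Report the soft/hard nofile and nproc limits seen by this process."""
    limits = {
        "nofile": resource.RLIMIT_NOFILE,
        "nproc": resource.RLIMIT_NPROC,
    }
    report = {}
    for name, res in limits.items():
        soft, hard = resource.getrlimit(res)
        report[name] = {
            "soft": soft,
            "hard": hard,
            "ok": meets(soft, required) and meets(hard, required),
        }
    return report


if __name__ == "__main__":
    for name, info in check_limits().items():
        status = "OK" if info["ok"] else "too low"
        print(f"{name}: soft={info['soft']} hard={info['hard']} -> {status}")
```

Run the script from a new login session as the user that starts the Hadoop or Spark processes, because the limits are inherited from that session.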

kernel.pid_max

Usually, the default value is 32K (32768). If you see the error "allocate memory" or "unable to create new native thread", try increasing kernel.pid_max: add kernel.pid_max=99999 at the end of /etc/sysctl.conf and then run sysctl -p to apply the change.
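
To see whether the running kernel already uses the larger value, the current setting can be read from procfs. This is a minimal sketch, assuming a Linux host that exposes /proc/sys/kernel/pid_max. The helper names parse_pid_max and needs_increase are illustrative only, and the 99999 target is the value suggested above.

```python
from pathlib import Path

PID_MAX_PATH = Path("/proc/sys/kernel/pid_max")  # standard Linux procfs location
TARGET = 99999  # value suggested in the text above


def parse_pid_max(text):
    """Convert the contents of the pid_max file to an integer."""
    return int(text.strip())


def needs_increase(current, target=TARGET):
    """Return True if the running kernel value is below the target."""
    return current < target


if __name__ == "__main__":
    current = parse_pid_max(PID_MAX_PATH.read_text())
    if needs_increase(current):
        print(f"kernel.pid_max is {current}, which is below {TARGET}")
    else:
        print(f"kernel.pid_max is {current}, no change needed")
```

Running the script again after sysctl -p shows whether the new value is active.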