Configuration and tuning of Hadoop workloads
The configuration options listed in this section apply only to Hadoop and Hadoop-like workloads such as Spark:
Configuration | Default Value | Recommended | Comment |
---|---|---|---|
disableInodeUpdateOnFdatasync | No | Yes | |
dataDiskCacheProtectionMethod | 0 | 2 | Change this to 2 if you turn on the dataOnly disk write cache (without battery protection). |
Note: If the cluster is not dedicated to Hadoop workloads, keep the default values for the configurations above.
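The table and the note together form a small decision rule. The Python sketch below only composes a command line for review; it assumes these are IBM Spectrum Scale (GPFS) attributes set with the `mmchconfig` command in its `attribute=value[,attribute=value]` form, and that boolean values are spelled in lowercase (`yes`). The function names `recommended_settings` and `build_mmchconfig` and their parameters are hypothetical. Check the `mmchconfig` documentation for whether each change takes effect immediately or requires restarting the daemon before running the printed command.

```python
import shlex
from typing import Dict, List, Optional


def recommended_settings(write_cache_unprotected: bool) -> Dict[str, str]:
    """Recommended values from the table (hypothetical helper)."""
    # Assumption: mmchconfig spells boolean values as lowercase yes/no.
    settings = {"disableInodeUpdateOnFdatasync": "yes"}
    # Per the table, switch to 2 only when the dataOnly disk write cache
    # is turned on without battery protection.
    if write_cache_unprotected:
        settings["dataDiskCacheProtectionMethod"] = "2"
    return settings


def build_mmchconfig(dedicated_to_hadoop: bool,
                     write_cache_unprotected: bool) -> Optional[List[str]]:
    """Return an mmchconfig argument list, or None to keep the defaults."""
    if not dedicated_to_hadoop:
        # Shared clusters keep the default values (see the note above).
        return None
    settings = recommended_settings(write_cache_unprotected)
    pairs = ",".join(f"{name}={value}" for name, value in settings.items())
    return ["mmchconfig", pairs]


if __name__ == "__main__":
    cmd = build_mmchconfig(dedicated_to_hadoop=True, write_cache_unprotected=True)
    # Print the command for review instead of running it.
    print(shlex.join(cmd) if cmd else "keep the default values")
```

Keeping the dedicated-cluster check inside the helper means a shared cluster never gets a proposed change.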
For Hadoop-like workloads, one JVM process can open a large number of files and start many threads, which count against the nproc limit. Therefore, raise the ulimit values. Open /etc/security/limits.conf:

```
vim /etc/security/limits.conf
```

Add the following lines at the end of /etc/security/limits.conf:

```
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
```
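Because /etc/security/limits.conf is typically applied by the pam_limits module when a session starts, the new values only appear in sessions opened after the edit. The Python sketch below reads the soft and hard limits of the current process with the standard `resource` module (Unix only) and reports whether each meets the 65536 target; run it from a fresh login as the user that starts the Hadoop or Spark services. The names `TARGET`, `meets_target`, and `check_limits` are illustrative, not part of any tool.

```python
import resource
from typing import Dict

TARGET = 65536  # value written to limits.conf above


def meets_target(value: int, target: int = TARGET) -> bool:
    """True if a limit is unlimited or at least `target`."""
    return value == resource.RLIM_INFINITY or value >= target


def check_limits() -> Dict[str, bool]:
    """Check the soft and hard nofile/nproc limits of this process."""
    results = {}
    for name, limit in (("nofile", resource.RLIMIT_NOFILE),
                        ("nproc", resource.RLIMIT_NPROC)):
        soft, hard = resource.getrlimit(limit)
        results[name] = meets_target(soft) and meets_target(hard)
    return results


if __name__ == "__main__":
    for name, ok in check_limits().items():
        print(f"{name}: {'OK' if ok else 'below ' + str(TARGET)}")
```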
kernel.pid_max
Usually, the default value is 32K (32768). If you see errors such as "allocate memory" or "unable to create new native thread", try increasing kernel.pid_max: add kernel.pid_max=99999 at the end of /etc/sysctl.conf and then run sysctl -p to apply the change.
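The pid_max change can be checked from a script before editing anything. The Python sketch below reads the running value from /proc/sys/kernel/pid_max (Linux only) and builds proposed /etc/sysctl.conf text that ends with a single kernel.pid_max=99999 line, dropping any earlier uncommented kernel.pid_max setting. Since the goal is to increase the limit, it only proposes a change when the current value is below 99999, and it prints rather than writes the file. The names `current_pid_max`, `needs_increase`, and `ensure_pid_max` are hypothetical; after saving the file yourself, run sysctl -p as described above.

```python
from pathlib import Path

RECOMMENDED_PID_MAX = 99999
PID_MAX_PATH = "/proc/sys/kernel/pid_max"
SYSCTL_CONF = "/etc/sysctl.conf"


def current_pid_max(path: str = PID_MAX_PATH) -> int:
    """Read the running kernel.pid_max value."""
    return int(Path(path).read_text().strip())


def needs_increase(current: int, target: int = RECOMMENDED_PID_MAX) -> bool:
    """Only raise the limit; never lower a value that is already larger."""
    return current < target


def ensure_pid_max(conf_text: str, value: int = RECOMMENDED_PID_MAX) -> str:
    """Return sysctl.conf text with one kernel.pid_max line at the end."""
    # Drop existing uncommented kernel.pid_max lines; comments are kept as-is.
    kept = [line for line in conf_text.splitlines()
            if line.split("=", 1)[0].strip() != "kernel.pid_max"]
    kept.append(f"kernel.pid_max={value}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    current = current_pid_max()
    if needs_increase(current):
        conf = Path(SYSCTL_CONF)
        text = conf.read_text() if conf.exists() else ""
        print(f"kernel.pid_max is {current}; proposed {SYSCTL_CONF}:")
        print(ensure_pid_max(text), end="")
    else:
        print(f"kernel.pid_max is {current}; no change needed")
```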