Common Tuning

Tuning for Yarn comprises two parts: tuning MapReduce2 and tuning Yarn.

Follow the configurations in Table 1 to tune MapReduce2 and the configurations in the Yarn tuning table to tune Yarn.

Table 1. Tuning MapReduce2
Configurations	Comments
mapreduce.map.memory.mb	Default: 1024 MB. Recommended: 2048 MB.
mapreduce.reduce.memory.mb	Default: 1024 MB. Recommended: 4096 MB or larger. Choose the value according to yarn.nodemanager.resource.memory-mb and the number of tasks that run at the same time on one node. For example, if you configure 100 GB for yarn.nodemanager.resource.memory-mb and you have
mapreduce.map.java.opts	Recommended: 75% to 80% of mapreduce.map.memory.mb.
mapreduce.reduce.java.opts	Recommended: 75% to 80% of mapreduce.reduce.memory.mb.
mapreduce.job.reduce.slowstart.completedmaps	Default: 0.05. Different Yarn jobs can use different values for this configuration. If a job needs a different value, specify it when you submit the job.
mapreduce.map.cpu.vcores	1
mapreduce.reduce.cpu.vcores	1
mapreduce.reduce.shuffle.parallelcopies	Default: 5. Recommended: 30 or more.
mapreduce.tasktracker.http.threads	Default: 40. If your cluster has more than 40 nodes, increase this value so that the reduce tasks on each host have at least one thread for shuffle data copy.
yarn.app.mapreduce.am.job.task.listener.thread-count	Default: 30. If you run jobs on a larger cluster (for example, more than 20 nodes with 16 logical processors per node), try increasing this value.
mapreduce.task.io.sort.mb	Default: 100 MB. Recommended: 70% of the heap size set in mapreduce.map.java.opts.
mapreduce.map.sort.spill.percent	Default: 0.80 (80%). Keep the default value.
mapreduce.client.submit.file.replication	Default: 10. Set it to the default replication factor of your IBM Storage® Scale file system (check this by running `mmlsfs <your-fs-name> -r`).
mapreduce.task.timeout	Default: 300000 ms. Change it to 600000 ms if you are running a benchmark.
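
The percentage-based recommendations in Table 1 can be worked out with a little arithmetic. The following Python sketch is illustrative only: the function name `derive_settings` and its parameters are hypothetical, it assumes the 80% option for the heap size, and it assumes that the java.opts values are expressed as a JVM `-Xmx` heap flag in megabytes.

```python
# Sketch: derive MapReduce2 memory settings from the container sizes.
# Assumptions (not from Table 1): the helper name, the 80% heap choice,
# and writing java.opts as a JVM -Xmx flag in megabytes.

def derive_settings(map_mb=2048, reduce_mb=4096,
                    heap_fraction=0.8, sort_fraction=0.7):
    # Heap is a fraction (75% to 80%) of the container memory.
    map_heap_mb = int(map_mb * heap_fraction)
    reduce_heap_mb = int(reduce_mb * heap_fraction)
    return {
        "mapreduce.map.memory.mb": str(map_mb),
        "mapreduce.reduce.memory.mb": str(reduce_mb),
        "mapreduce.map.java.opts": f"-Xmx{map_heap_mb}m",
        "mapreduce.reduce.java.opts": f"-Xmx{reduce_heap_mb}m",
        # Sort buffer is 70% of the map task heap.
        "mapreduce.task.io.sort.mb": str(int(map_heap_mb * sort_fraction)),
    }

settings = derive_settings()
for name, value in settings.items():
    print(f"{name}={value}")
```

With the recommended container sizes of 2048 MB and 4096 MB, this gives a map heap of 1638 MB, a reduce heap of 3276 MB, and a sort buffer of 1146 MB. Pass `heap_fraction=0.75` to use the more conservative option.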

The following configurations are not used by Yarn and you do not need to change them:

mapreduce.jobtracker.handler.count
mapreduce.cluster.local.dir
mapreduce.cluster.temp.dir
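
If you want to confirm whether an existing configuration file still sets any of these properties, a small script can list them. The following Python sketch is an illustration under assumptions: the helper name `find_unused` and the sample XML (including the path `/tmp/mr-local`) are hypothetical, and it assumes the standard Hadoop `<configuration>/<property>/<name>` file layout.

```python
import xml.etree.ElementTree as ET

# Properties that the list above says Yarn does not use.
UNUSED = {
    "mapreduce.jobtracker.handler.count",
    "mapreduce.cluster.local.dir",
    "mapreduce.cluster.temp.dir",
}

def find_unused(xml_text):
    """Return the unused property names that appear in a Hadoop config XML string."""
    root = ET.fromstring(xml_text)
    names = [p.findtext("name", default="").strip() for p in root.iter("property")]
    return sorted(n for n in names if n in UNUSED)

# Hypothetical sample content standing in for a real mapred-site.xml.
SAMPLE = """<configuration>
  <property><name>mapreduce.map.memory.mb</name><value>2048</value></property>
  <property><name>mapreduce.cluster.local.dir</name><value>/tmp/mr-local</value></property>
</configuration>"""

found = find_unused(SAMPLE)
print(found)
```

To check a real file, read its contents into a string and pass that string to `find_unused` instead of `SAMPLE`.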