Common Tuning

Tuning for Yarn comprises two parts: tuning MapReduce2 and tuning Yarn.

Follow the configurations in Table 1 to tune MapReduce2, and the corresponding Yarn table to tune Yarn.

Table 1. Tuning MapReduce2
Configurations Comments
mapreduce.map.memory.mb Default: 1024 MB

Recommended: 2048 MB

mapreduce.reduce.memory.mb Default: 1024 MB

Recommended: 4096 MB or larger

Choose the value according to yarn.nodemanager.resource.memory-mb and the number of concurrent tasks on one node. For example, if you configure 100 GB for yarn.nodemanager.resource.memory-mb and you have

mapreduce.map.java.opts Recommended: 75% to 80% of mapreduce.map.memory.mb
mapreduce.reduce.java.opts Recommended: 75% to 80% of mapreduce.reduce.memory.mb
mapreduce.job.reduce.slowstart.completedmaps Default: 0.05

The best value for this configuration differs from one Yarn job to another. If a job needs a value other than the cluster setting, specify it when you submit the job.

mapreduce.map.cpu.vcores 1
mapreduce.reduce.cpu.vcores 1
mapreduce.reduce.shuffle.parallelcopies Default: 5

Recommended: 30 or more

mapreduce.tasktracker.http.threads Default: 40

If your cluster has more than 40 nodes, you could increase this value so that the reduce task on each host has at least one thread for copying shuffle data.

yarn.app.mapreduce.am.job.task.listener.thread-count Default: 30

If the job runs on a larger cluster (for example, more than 20 nodes with 16 logical processors per node), try increasing this value.

mapreduce.task.io.sort.mb Default: 100 MB

Recommended: 70% of the heap size set in mapreduce.map.java.opts

mapreduce.map.sort.spill.percent Default: 0.80 (80%)

Keep the default value.

mapreduce.client.submit.file.replication Default: 10

Set it to the default number of data replicas of your IBM Storage Scale file system (check this with mmlsfs <your-fs-name> -r).

mapreduce.task.timeout Default: 300000 ms

Change it to 600000 ms if you are running a benchmark.
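
The percentage rules in Table 1 can be worked through with a short calculation. The following Python sketch is only an illustration, not part of the product: the function name derive_mapreduce2_settings and its parameters are hypothetical, the 80% and 70% factors come from the recommendations in the table, expressing the heap as a -Xmx option is an assumption about how the java.opts values are written, and the 2047 MB cap reflects the largest sort buffer that the Hadoop map task accepts.

```python
def derive_mapreduce2_settings(map_memory_mb=2048, reduce_memory_mb=4096,
                               heap_percent=80, sort_percent=70):
    """Derive heap and sort-buffer settings from the container sizes (MB).

    Hypothetical helper: it only applies the percentage rules from Table 1.
    """
    # JVM heap is sized at 75% to 80% of the container memory.
    map_heap_mb = map_memory_mb * heap_percent // 100
    reduce_heap_mb = reduce_memory_mb * heap_percent // 100

    # The sort buffer is sized at 70% of the map heap, capped at 2047 MB.
    sort_mb = min(map_heap_mb * sort_percent // 100, 2047)

    return {
        "mapreduce.map.memory.mb": str(map_memory_mb),
        "mapreduce.reduce.memory.mb": str(reduce_memory_mb),
        # Assumption: the heap size is passed as a -Xmx JVM option.
        "mapreduce.map.java.opts": f"-Xmx{map_heap_mb}m",
        "mapreduce.reduce.java.opts": f"-Xmx{reduce_heap_mb}m",
        "mapreduce.task.io.sort.mb": str(sort_mb),
    }


if __name__ == "__main__":
    for name, value in derive_mapreduce2_settings().items():
        print(f"{name}={value}")
```

With the recommended container sizes of 2048 MB and 4096 MB, this gives a map heap of 1638 MB, a reduce heap of 3276 MB, and a sort buffer of 1146 MB. Treat these numbers as starting points and validate them against your own workload.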

The following configurations are not used by Yarn, so you do not need to change them:
mapreduce.jobtracker.handler.count
mapreduce.cluster.local.dir
mapreduce.cluster.temp.dir
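
Job-specific values from Table 1, such as mapreduce.job.reduce.slowstart.completedmaps or mapreduce.task.timeout, can be supplied when a job is submitted instead of being changed cluster-wide. The following Python sketch assembles such a command line. It assumes that the job's main class parses the Hadoop generic options (for example, through ToolRunner) so that -D property=value is honored. The function name build_submit_command, the jar name, the class name, the paths, and the value 0.8 are hypothetical placeholders.

```python
import shlex


def build_submit_command(jar, main_class, overrides, job_args=()):
    """Build a 'hadoop jar' argument list with per-job -D overrides.

    Hypothetical helper: the overrides take effect only if the job
    parses Hadoop generic options.
    """
    cmd = ["hadoop", "jar", jar, main_class]
    # Generic options must come before the job's own arguments.
    for name, value in overrides.items():
        cmd += ["-D", f"{name}={value}"]
    cmd += list(job_args)
    return cmd


if __name__ == "__main__":
    command = build_submit_command(
        "my-job.jar",            # hypothetical jar
        "com.example.MyJob",     # hypothetical main class
        {
            "mapreduce.job.reduce.slowstart.completedmaps": 0.8,
            "mapreduce.task.timeout": 600000,  # milliseconds
        },
        ["/input", "/output"],   # hypothetical job arguments
    )
    # Print a shell-safe version of the command for review before running it.
    print(" ".join(shlex.quote(part) for part in command))
```

Printing the command instead of running it keeps the sketch free of side effects. Review the output, then run it from a shell or pass the list to subprocess.run.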