Common Tuning
Tuning for Yarn consists of two parts: tuning MapReduce2 and tuning Yarn itself.
Follow the configurations in the table below to tune MapReduce2 and those in Table 1 to tune Yarn.
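As a minimal sketch of how the settings in the table below are applied, the container and heap sizes could be set in mapred-site.xml as follows. The values mirror the recommendations in the table and the 80% heap rule; adjust them to your own node sizes:

```xml
<!-- mapred-site.xml (sketch): starting points taken from the table below -->
<configuration>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <!-- roughly 80% of mapreduce.map.memory.mb -->
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1638m</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
  </property>
  <property>
    <!-- roughly 80% of mapreduce.reduce.memory.mb -->
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3276m</value>
  </property>
</configuration>
```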
Configurations | Comments |
---|---|
mapreduce.map.memory.mb | Default: 1024 MB. Recommended: 2048 MB. |
mapreduce.reduce.memory.mb | Default: 1024 MB. Recommended: 4096 MB or larger. Choose the value based on yarn.nodemanager.resource.memory-mb and the number of concurrent tasks on one node. For example, if you configure 100 GB for yarn.nodemanager.resource.memory-mb, divide it by the number of concurrent tasks on the node to bound the memory each task can take. |
mapreduce.map.java.opts | Recommended: 75% to 80% of mapreduce.map.memory.mb, set as the JVM -Xmx heap size. |
mapreduce.reduce.java.opts | Recommended: 75% to 80% of mapreduce.reduce.memory.mb, set as the JVM -Xmx heap size. |
mapreduce.job.reduce.slowstart.completedmaps | Default: 0.05. Different Yarn jobs can benefit from different values; specify the value at job submission time if your job needs something other than the cluster default (see the submission example after the table). |
mapreduce.map.cpu.vcores | 1 |
mapreduce.reduce.cpu.vcores | 1 |
mapreduce.reduce.shuffle.parallelcopies | Default: 5. Recommended: 30 or more. |
mapreduce.tasktracker.http.threads | Default: 40. If your cluster has more than 40 nodes, increase this value so that the reduce tasks on each host have at least one thread for copying shuffle data. |
yarn.app.mapreduce.am.job.task.listener.thread-count | Default: 30. If you run jobs on a larger cluster (for example, more than 20 nodes with 16 logical processors per node), try increasing this value. |
mapreduce.task.io.sort.mb | Default: 100 MB. Recommended: 70% of the mapreduce.map.java.opts heap size. |
mapreduce.map.sort.spill.percent | Default: 0.80 (80%). Keep the default value; do not change it. |
mapreduce.client.submit.file.replication | Default: 10. Change it to match the default data replication of your IBM Storage Scale file system (for example, check this with the mmlsfs command; see the sketch after the table). |
mapreduce.task.timeout | Default: 300000 ms. Increase it to 600000 ms if you are running benchmarks. |
mapreduce.jobtracker.handler.count | |
mapreduce.cluster.local.dir | |
mapreduce.cluster.temp.dir | |
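Per-job values such as mapreduce.job.reduce.slowstart.completedmaps can be overridden at submission time through the generic -D option, as noted in the table. Below is a hedged sketch: the example jar, the input and output paths, the 0.95 value, and the gpfs0 device name are illustrative assumptions, not values from this document:

```bash
# Override slowstart and shuffle parallelism for this job only;
# cluster-wide defaults in mapred-site.xml stay untouched.
hadoop jar hadoop-mapreduce-examples.jar wordcount \
  -D mapreduce.job.reduce.slowstart.completedmaps=0.95 \
  -D mapreduce.reduce.shuffle.parallelcopies=30 \
  /input /output

# Check the default data replication of the file system (here gpfs0)
# before setting mapreduce.client.submit.file.replication.
mmlsfs gpfs0 -r
```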