Maximal Map and Reduce Task Calculation

This topic describes how to calculate the maximum number of map and reduce tasks that can run concurrently in one wave.

Value                       Calculation
MaxMapTaskPerWave_mem       (yarn.nodemanager.resource.memory-mb * YarnNodeManagerNumber - yarn.app.mapreduce.am.resource.mb) / mapreduce.map.memory.mb
MaxMapTaskPerWave_vcore     (yarn.nodemanager.resource.cpu-vcores * YarnNodeManagerNumber - yarn.app.mapreduce.am.resource.cpu-vcores) / yarn.scheduler.minimum-allocation-vcores
TotalMapTaskPerWave         Equal to MaxMapTaskPerWave_mem by default
MaxReduceTaskPerNode_mem    (yarn.nodemanager.resource.memory-mb * YarnNodeManagerNumber - yarn.app.mapreduce.am.resource.mb) / mapreduce.reduce.memory.mb
MaxReduceTaskPerNode_vcore  (yarn.nodemanager.resource.cpu-vcores * YarnNodeManagerNumber - yarn.app.mapreduce.am.resource.cpu-vcores) / yarn.scheduler.minimum-allocation-vcores
TotalReduceTaskPerWave      Equal to MaxReduceTaskPerNode_mem by default
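
As a rough illustration, the calculations above can be sketched in Python. The function name, parameter names, and cluster figures below are hypothetical and chosen only for this example. Rounding down to a whole number of tasks is also an assumption, since the formulas themselves use plain division.

```python
def max_tasks_per_wave(node_memory_mb, node_vcores, node_count,
                       am_memory_mb, am_vcores,
                       task_memory_mb, min_alloc_vcores):
    """Return (memory-bound, vcore-bound) task counts for one wave.

    Hypothetical helper: each parameter stands in for one property
    from the table (for example, node_memory_mb plays the role of
    yarn.nodemanager.resource.memory-mb).
    """
    # Cluster memory minus the Application Master, divided by per-task memory.
    by_mem = (node_memory_mb * node_count - am_memory_mb) // task_memory_mb
    # Cluster vcores minus the Application Master, divided by the minimum
    # vcore allocation. Floor division is an assumption of this sketch.
    by_vcore = (node_vcores * node_count - am_vcores) // min_alloc_vcores
    return by_mem, by_vcore


# Made-up cluster: 4 NodeManagers, each with 65536 MB and 32 vcores.
map_mem, map_vcore = max_tasks_per_wave(
    node_memory_mb=65536, node_vcores=32, node_count=4,
    am_memory_mb=1536, am_vcores=1,
    task_memory_mb=2048, min_alloc_vcores=1)

# Same cluster, with reduce tasks that request 4096 MB each.
reduce_mem, reduce_vcore = max_tasks_per_wave(
    node_memory_mb=65536, node_vcores=32, node_count=4,
    am_memory_mb=1536, am_vcores=1,
    task_memory_mb=4096, min_alloc_vcores=1)

print(map_mem, map_vcore)        # 127 127
print(reduce_mem, reduce_vcore)  # 63 127
```

With these example figures the memory-bound and vcore-bound map limits happen to coincide, while the reduce limit is set by memory.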

If yarn.scheduler.capacity.resource-calculator is left at its default, only memory is considered when allocating containers, so MaxMapTaskPerWave_mem is the limit that takes effect. In this situation, if MaxMapTaskPerWave_vcore is more than twice MaxMapTaskPerWave_mem, plenty of CPU capacity is still idle, and you can raise MaxMapTaskPerWave_mem by either increasing yarn.nodemanager.resource.memory-mb or decreasing mapreduce.map.memory.mb. However, if MaxMapTaskPerWave_vcore is smaller than MaxMapTaskPerWave_mem, more than one task will run on the same logical processor, which may add context-switching overhead.
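
The comparison described above can be expressed as a small check. This is only a sketch: the function name and the returned labels are hypothetical, and the "balanced" case is an assumption covering everything between the two situations the text describes.

```python
def diagnose_wave_limits(by_mem, by_vcore):
    """Classify a cluster from its memory-bound and vcore-bound task limits.

    Hypothetical helper; the labels are illustrative, not YARN terminology.
    """
    if by_vcore > 2 * by_mem:
        # CPU is largely idle: consider more node memory or less memory per task.
        return "cpu-headroom"
    if by_vcore < by_mem:
        # More tasks than vcores: expect extra context switching.
        return "cpu-oversubscribed"
    # Assumed label for the range the text does not single out.
    return "balanced"


print(diagnose_wave_limits(by_mem=60, by_vcore=127))   # cpu-headroom
print(diagnose_wave_limits(by_mem=127, by_vcore=64))   # cpu-oversubscribed
print(diagnose_wave_limits(by_mem=127, by_vcore=127))  # balanced
```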

The same reasoning applies to MaxReduceTaskPerNode_mem and MaxReduceTaskPerNode_vcore.

TotalMapTaskPerWave is the total number of map tasks that can run concurrently in one wave. TotalReduceTaskPerWave is the total number of reduce tasks that can run concurrently in one wave.