Maximal Map and Reduce Task Calculation
This topic describes how the maximal map and reduce task counts are calculated.
Value | Calculation |
---|---|
MaxMapTaskPerWave_mem | (yarn.nodemanager.resource.memory-mb * YarnNodeManagerNumber - yarn.app.mapreduce.am.resource.mb)/mapreduce.map.memory.mb |
MaxMapTaskPerWave_vcore | (yarn.nodemanager.resource.cpu-vcores * YarnNodeManagerNumber - yarn.app.mapreduce.am.resource.cpu-vcores)/yarn.scheduler.minimum-allocation-vcores |
TotalMapTaskPerWave | Equal to MaxMapTaskPerWave_mem by default |
MaxReduceTaskPerNode_mem | (yarn.nodemanager.resource.memory-mb * YarnNodeManagerNumber - yarn.app.mapreduce.am.resource.mb)/mapreduce.reduce.memory.mb |
MaxReduceTaskPerNode_vcore | (yarn.nodemanager.resource.cpu-vcores * YarnNodeManagerNumber - yarn.app.mapreduce.am.resource.cpu-vcores)/yarn.scheduler.minimum-allocation-vcores |
TotalReduceTaskPerWave | Equal to MaxReduceTaskPerNode_mem by default |
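As a rough illustration, the sketch below applies these formulas in Python. The node count (YarnNodeManagerNumber) and every property value shown are hypothetical assumptions for illustration only; substitute the actual values from your yarn-site.xml and mapred-site.xml.

```python
# Sketch: compute maximal map/reduce tasks per wave from the formulas above.
# All property values and the node count below are assumed examples.
yarn_nodemanager_resource_memory_mb = 16384      # yarn.nodemanager.resource.memory-mb (assumed)
yarn_nodemanager_resource_cpu_vcores = 8         # yarn.nodemanager.resource.cpu-vcores (assumed)
yarn_scheduler_minimum_allocation_vcores = 1     # yarn.scheduler.minimum-allocation-vcores (assumed)
yarn_app_mapreduce_am_resource_mb = 1536         # yarn.app.mapreduce.am.resource.mb (assumed)
yarn_app_mapreduce_am_resource_cpu_vcores = 1    # yarn.app.mapreduce.am.resource.cpu-vcores (assumed)
mapreduce_map_memory_mb = 2048                   # mapreduce.map.memory.mb (assumed)
mapreduce_reduce_memory_mb = 4096                # mapreduce.reduce.memory.mb (assumed)
yarn_node_manager_number = 10                    # YarnNodeManagerNumber: NodeManager count (assumed)

# Memory-bound and vcore-bound map task counts per wave.
max_map_task_per_wave_mem = (
    yarn_nodemanager_resource_memory_mb * yarn_node_manager_number
    - yarn_app_mapreduce_am_resource_mb
) // mapreduce_map_memory_mb
max_map_task_per_wave_vcore = (
    yarn_nodemanager_resource_cpu_vcores * yarn_node_manager_number
    - yarn_app_mapreduce_am_resource_cpu_vcores
) // yarn_scheduler_minimum_allocation_vcores

# Memory-bound and vcore-bound reduce task counts per wave.
max_reduce_task_per_node_mem = (
    yarn_nodemanager_resource_memory_mb * yarn_node_manager_number
    - yarn_app_mapreduce_am_resource_mb
) // mapreduce_reduce_memory_mb
max_reduce_task_per_node_vcore = (
    yarn_nodemanager_resource_cpu_vcores * yarn_node_manager_number
    - yarn_app_mapreduce_am_resource_cpu_vcores
) // yarn_scheduler_minimum_allocation_vcores

print("MaxMapTaskPerWave_mem =", max_map_task_per_wave_mem)
print("MaxMapTaskPerWave_vcore =", max_map_task_per_wave_vcore)
print("MaxReduceTaskPerNode_mem =", max_reduce_task_per_node_mem)
print("MaxReduceTaskPerNode_vcore =", max_reduce_task_per_node_vcore)
```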
If yarn.scheduler.capacity.resource-calculator is not changed from its default, MaxMapTaskPerWave_mem takes effect, because the default resource calculator considers only memory. In this situation, if MaxMapTaskPerWave_vcore is more than twice MaxMapTaskPerWave_mem, you still have plenty of CPU resources and can increase MaxMapTaskPerWave_mem by either increasing yarn.nodemanager.resource.memory-mb or decreasing mapreduce.map.memory.mb. However, if MaxMapTaskPerWave_vcore is smaller than MaxMapTaskPerWave_mem, more than one task will run on the same logical processor, which might incur additional context-switching cost.
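A minimal sketch of this comparison follows; the two per-wave counts are assumed example values, not measured results.

```python
# Tuning heuristic sketch; both counts below are assumed examples.
max_map_task_per_wave_mem = 30     # memory-bound map tasks per wave (assumed)
max_map_task_per_wave_vcore = 79   # vcore-bound map tasks per wave (assumed)

if max_map_task_per_wave_vcore > 2 * max_map_task_per_wave_mem:
    # Plenty of spare CPU: memory is the bottleneck.
    print("Consider increasing yarn.nodemanager.resource.memory-mb "
          "or decreasing mapreduce.map.memory.mb")
elif max_map_task_per_wave_vcore < max_map_task_per_wave_mem:
    # More tasks than logical processors: expect context-switching overhead.
    print("More than one task may share a logical processor")
else:
    print("Memory and CPU capacity are roughly balanced")
```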
The same reasoning applies to MaxReduceTaskPerNode_mem and MaxReduceTaskPerNode_vcore.
TotalMapTaskPerWave is the total number of map tasks that can run concurrently in one wave, and TotalReduceTaskPerWave is the total number of reduce tasks that can run concurrently in one wave.