Enabling concurrent file-to-file merge using the mrsh utility
Within the MapReduce framework, you can tune the merge sort in the reduce phase to improve performance by enabling concurrent file-to-file merge with the mrsh utility. By default, concurrent file-to-file merge is disabled.
About this task
Note: Concurrent file-to-file merge and hash-based aggregation cannot coexist.
If you enable concurrent file-to-file merge, ensure that hash-based aggregation is disabled.
Procedure
Add the pmr.reduce.f2f.factor option to your job submission command:
$ mrsh jar jarfile [classname] -Dpmr.reduce.f2f.factor=value [args]
where value is a number greater than 1 and less than or equal to 2 (that is, 1 < value <= 2).
For example:
$ mrsh jar $PMR_HOME/version/os_type/samples/hadoop-examples-1.1.1.jar wordcount -Dpmr.reduce.f2f.factor=1.5 hdfs://host_name:9000/input hdfs://host_name:9000/output
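Because the factor must be greater than 1 and no larger than 2, it can help to check the value before submitting the job. The following is a minimal sketch, not part of the product: the helper names valid_f2f_factor and f2f_option are hypothetical, and the check assumes the value is written as a plain decimal number.

```shell
# Hypothetical helper (not part of mrsh): succeeds only if $1 is a
# plain decimal number in the interval (1, 2].
valid_f2f_factor() {
  # Reject empty input, non-numeric characters, multiple dots, or a lone dot.
  case "$1" in
    ''|*[!0-9.]*|*.*.*|.) return 1 ;;
  esac
  # awk performs the floating-point comparison; exit status 0 means in range.
  awk -v v="$1" 'BEGIN { exit !(v > 1 && v <= 2) }'
}

# Hypothetical helper: prints the -D option for a valid factor,
# or reports an error on stderr and returns non-zero.
f2f_option() {
  if valid_f2f_factor "$1"; then
    printf '%s\n' "-Dpmr.reduce.f2f.factor=$1"
  else
    echo "pmr.reduce.f2f.factor must be greater than 1 and at most 2: '$1'" >&2
    return 1
  fi
}
```

With these helpers defined, a submission could be guarded as follows, so that mrsh runs only when the factor is in range:

opt=$(f2f_option 1.5) && mrsh jar jarfile [classname] "$opt" [args]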