Enabling concurrent file-to-file merge using the mrsh utility

Within the MapReduce framework, you can improve performance by tuning the merge-sort that runs during the reduce phase. To do this, enable concurrent file-to-file merge with the mrsh utility. By default, concurrent file-to-file merge is disabled.

About this task

Note: Concurrent merge and hash-based aggregation cannot coexist. If you enable concurrent file-to-file merge, ensure that hash-based aggregation is disabled.

Procedure

Add the pmr.reduce.f2f.factor option to your job submission command:

$ mrsh jar jarfile [classname] -Dpmr.reduce.f2f.factor=value [args]

where value is a number greater than 1 and less than or equal to 2, that is, in the range (1, 2].

For example:

$ mrsh jar $PMR_HOME/version/os_type/samples/hadoop-examples-1.1.1.jar wordcount -Dpmr.reduce.f2f.factor=1.5 hdfs://host_name:9000/input hdfs://host_name:9000/output
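If you submit jobs from scripts, you might want to catch an out-of-range factor before the job is submitted. The following is a minimal sketch of a POSIX shell wrapper that checks the value against the documented range (1, 2] and then calls mrsh with the option in the position shown in the command syntax. It is not part of mrsh: the function names is_valid_f2f_factor and submit_with_f2f_factor are hypothetical, and the wrapper assumes that you always pass a class name, as in the wordcount example.

```shell
#!/bin/sh
# Hypothetical helper (not part of mrsh): succeed only if $1 is a plain
# decimal number in the range (1, 2] accepted by pmr.reduce.f2f.factor.
is_valid_f2f_factor() {
    printf '%s\n' "$1" | awk '
        # Reject anything that is not a plain decimal number.
        $0 !~ /^[0-9]+(\.[0-9]+)?$/ { exit 1 }
        {
            v = $0 + 0                      # force numeric comparison
            if (v > 1 && v <= 2) exit 0     # greater than 1, at most 2
            exit 1
        }'
}

# Hypothetical wrapper: validate the factor, then submit the job with
#   mrsh jar jarfile classname -Dpmr.reduce.f2f.factor=value [args]
# Usage: submit_with_f2f_factor FACTOR JARFILE CLASSNAME [ARGS...]
submit_with_f2f_factor() {
    if [ "$#" -lt 3 ]; then
        echo "usage: submit_with_f2f_factor FACTOR JARFILE CLASSNAME [ARGS...]" >&2
        return 2
    fi
    if ! is_valid_f2f_factor "$1"; then
        echo "pmr.reduce.f2f.factor must be > 1 and <= 2, got: $1" >&2
        return 1
    fi
    factor=$1
    jar=$2
    class=$3
    shift 3
    mrsh jar "$jar" "$class" -Dpmr.reduce.f2f.factor="$factor" "$@"
}
```

After sourcing the script, the wordcount example could be submitted as:

submit_with_f2f_factor 1.5 "$PMR_HOME/version/os_type/samples/hadoop-examples-1.1.1.jar" wordcount hdfs://host_name:9000/input hdfs://host_name:9000/output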