Hash-based aggregation for key sorting
Within the MapReduce framework, key sorting occurs during the spill and merge phase of a map task and during the fetch-merge and sort phase of a reduce task. In the map phase, all <key, value> pairs are compared first by partition ID (an integer), then by key (of the type specified by the user). In the reduce phase, the reducer always sorts data by key, even when the sort is unnecessary. When a strict order between keys is not needed, especially inside partitions, you can avoid this key sorting on mappers and reducers by configuring hash table-based aggregation. This configuration also improves the performance of certain types of queries.
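The difference between the two approaches can be illustrated with a minimal Python sketch. It is not the framework's implementation: the partition_id function, the sample pairs, and the use of summation as the aggregation are all assumptions made for illustration. The first function orders pairs by partition ID and then by key before combining them; the second accumulates values in a hash table and never compares keys.

```python
from collections import defaultdict
from itertools import groupby


def partition_id(key, num_partitions):
    # Hypothetical, deterministic stand-in for a partitioner.
    return sum(key.encode("utf-8")) % num_partitions


def sort_based_aggregate(pairs, num_partitions):
    """Sort by (partition ID, key), then sum the values of adjacent equal keys."""
    def order(kv):
        return (partition_id(kv[0], num_partitions), kv[0])

    result = {}
    for (pid, key), group in groupby(sorted(pairs, key=order), key=order):
        result.setdefault(pid, {})[key] = sum(value for _, value in group)
    return result


def hash_based_aggregate(pairs, num_partitions):
    """Accumulate values per key in a hash table; no ordering between keys."""
    result = defaultdict(lambda: defaultdict(int))
    for key, value in pairs:
        result[partition_id(key, num_partitions)][key] += value
    return {pid: dict(keys) for pid, keys in result.items()}


# Sample data (assumed): word-count style pairs spread over two partitions.
pairs = [("pear", 1), ("apple", 2), ("pear", 3), ("fig", 4), ("apple", 5)]

# Both approaches yield the same per-partition totals; only the key order
# inside each partition is left unspecified by the hash-based version.
assert sort_based_aggregate(pairs, 2) == hash_based_aggregate(pairs, 2)
```

The sort-based version performs O(n log n) comparisons, while the hash-based version makes a single pass over the pairs. This is why skipping the sort can help when the job does not depend on key order.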
Note: Hash-based aggregation and concurrent file-to-file merge cannot
coexist. If you enable hash-based aggregation, ensure that concurrent
merge is disabled.