The Partitioner class

Partitioner controls the partitioning of intermediate data. It assigns a partition number to each intermediate <key, value> pair. The framework uses this number to redistribute intermediate records before the reduce step starts: records with the same partition number are stored on the same dataslice. To control this assignment, you can define your own Partitioner class that overrides the getPartition(KEY key, VALUE value, int numPartitions) method. The framework calculates the value of numPartitions, which corresponds to the number of available NPS dataslices that can run map/reduce tasks.
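A minimal sketch of a custom partitioner follows. The real base class and its package come from the NPS MapReduce library and are not named here, so the sketch declares a hypothetical stand-in abstract class `Partitioner<KEY, VALUE>` that uses the getPartition signature described above. The subclass `FirstCharPartitioner` and its String/Integer key and value types are also illustrative assumptions. The sketch further assumes, as in similar map/reduce frameworks, that getPartition should return a value from 0 to numPartitions - 1.

```java
// Hypothetical stand-in for the framework's Partitioner base class; the real
// class and its package come from the NPS MapReduce library.
abstract class Partitioner<KEY, VALUE> {
    // Returns the partition number for one intermediate <key, value> pair.
    public abstract int getPartition(KEY key, VALUE value, int numPartitions);
}

// Hypothetical example: keys that share a first character (ignoring case)
// get the same partition number, so their records end up on the same dataslice.
class FirstCharPartitioner extends Partitioner<String, Integer> {
    @Override
    public int getPartition(String key, Integer value, int numPartitions) {
        if (key == null || key.isEmpty()) {
            return 0; // route null or empty keys to a fixed partition
        }
        char first = Character.toLowerCase(key.charAt(0));
        // floorMod keeps the result in the range [0, numPartitions).
        return Math.floorMod(first, numPartitions);
    }
}
```

With this partitioner, "apple" and "Avocado" receive the same partition number regardless of their values. Note that routing by a single character yields only a few distinct partition numbers, so data can be distributed unevenly across dataslices; a real partitioner should balance grouping needs against skew.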

When no partitioner is defined for a job, data is redistributed according to the values produced by the NPS built-in hash function applied to the intermediate keys.
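To illustrate the principle behind the default behavior, the following sketch derives a partition number from a hash of the key alone. Java's hashCode() is used purely as a stand-in: the actual NPS built-in hash function is a different function and is not reproduced here. The names `KeyHashRouting` and `partitionFor` are hypothetical.

```java
// Illustration only: Java's hashCode() stands in for the NPS built-in hash,
// which is a different function. The partition depends on the key alone,
// so records with equal keys always get the same partition number.
final class KeyHashRouting {
    private KeyHashRouting() {}

    static int partitionFor(Object key, int numPartitions) {
        // Treat a null key as hash 0.
        int h = (key == null) ? 0 : key.hashCode();
        // hashCode() can be negative; floorMod maps it into [0, numPartitions).
        return Math.floorMod(h, numPartitions);
    }
}
```

Because the value is ignored, every record for a given key is routed to one dataslice, which is what lets a reduce task see all values for that key.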