Records are partitioned by taking the value of a key column modulo the number of partitions. This method is similar to hash by field, but involves simpler computation.
In data mining, data is often arranged in buckets; that is, each record has a tag containing its bucket number. You can use the modulus partitioner to partition the records according to this number. The modulus partitioner assigns each record of an input data set to a partition of its output data set, as determined by a specified key field in the input data set. This field can be the tag field.
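The partition number for each record is calculated as follows, where fieldname is the value of the key field in that record:
partition_number = fieldname mod number_of_partitions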
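The formula can be sketched in Python as follows. This is a minimal illustration, not the product's implementation; the function name `modulus_partition` is hypothetical, and the sketch assumes the key value is a non-negative integer.

```python
def modulus_partition(key_value: int, number_of_partitions: int) -> int:
    """Return the partition number for a record whose key field holds key_value.

    Hypothetical helper illustrating:
        partition_number = fieldname mod number_of_partitions
    Assumes key_value is a non-negative integer and number_of_partitions >= 1.
    """
    if number_of_partitions < 1:
        raise ValueError("number_of_partitions must be at least 1")
    return key_value % number_of_partitions


# A record tagged with bucket 64123, partitioned four ways:
print(modulus_partition(64123, 4))  # 3
```

Because only a single integer remainder is computed per record, there is no hashing step, which is why the method is cheaper than hash by field.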
The example input data set has the following columns:
Column name | SQL type |
---|---|
bucket | Integer |
date | Date |
The bucket column is specified as the key field on which the modulus operation is calculated. The input data set contains the following records:
bucket | date |
---|---|
64123 | 1960-03-30 |
61821 | 1960-06-27 |
44919 | 1961-06-18 |
22677 | 1960-09-24 |
90746 | 1961-09-15 |
21870 | 1960-01-01 |
87702 | 1960-12-22 |
4705 | 1961-12-13 |
47330 | 1961-03-21 |
88193 | 1962-03-12 |
With four partitions, the records are distributed as follows:
| Partition 0 | Partition 1 | Partition 2 | Partition 3 |
|---|---|---|---|
| | 61821 1960-06-27 | 21870 1960-01-01 | 64123 1960-03-30 |
| | 22677 1960-09-24 | 87702 1960-12-22 | 44919 1961-06-18 |
| | 4705 1961-12-13 | 47330 1961-03-21 | |
| | 88193 1962-03-12 | 90746 1961-09-15 | |
None of the bucket values is evenly divisible by 4 (every remainder is 1, 2, or 3), so no records are written to Partition 0.
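The example can be reproduced with a short Python sketch that applies the modulus formula to each record and groups the results. The helper name `partition_records` and the constant `NUMBER_OF_PARTITIONS` are hypothetical; the sketch simply assumes four partitions, as in the example.

```python
from datetime import date

# Example records from the input data set: (bucket, date)
records = [
    (64123, date(1960, 3, 30)),
    (61821, date(1960, 6, 27)),
    (44919, date(1961, 6, 18)),
    (22677, date(1960, 9, 24)),
    (90746, date(1961, 9, 15)),
    (21870, date(1960, 1, 1)),
    (87702, date(1960, 12, 22)),
    (4705, date(1961, 12, 13)),
    (47330, date(1961, 3, 21)),
    (88193, date(1962, 3, 12)),
]

NUMBER_OF_PARTITIONS = 4  # assumed, matching the example


def partition_records(rows, number_of_partitions):
    """Group rows by bucket mod number_of_partitions, keeping input order within each partition."""
    partitions = {p: [] for p in range(number_of_partitions)}
    for bucket, day in rows:
        partitions[bucket % number_of_partitions].append((bucket, day))
    return partitions


partitions = partition_records(records, NUMBER_OF_PARTITIONS)
for number, rows in partitions.items():
    print(f"Partition {number}:", [bucket for bucket, _ in rows])
```

The sketch appends records in input order, so it lists 90746 first in Partition 2; the set of records in each partition matches the table, and Partition 0 stays empty.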