Criteria for selecting distribution keys

Use the following rules when you select one or more columns, unique or non-unique, as the distribution key for a table:
  • Use columns for the distribution key that distribute table rows evenly across the data slices. The more distinct values a column has, the more evenly its rows are distributed.
  • Choose the distribution key columns based on the selection set that you use most frequently to retrieve rows from the table.
  • Select as few columns as possible for the distribution key, so that the key stays general enough to match more of your selections.
  • Base the column selection on equality searches (equality join conditions). If both tables are distributed on the columns used in the equality condition, matching rows reside on the same data slice, so the system can perform the join locally.
  • Do not use boolean keys, for example, True/False, 1/0, or M/F. The system distributes all rows with the same hash value to the same data slice, so a two-valued key would divide the table across at most two data slices.
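The effect of these rules can be illustrated with a small simulation. The following is a minimal sketch, not the system's actual implementation: it assumes a hypothetical system with 8 data slices and uses CRC-32 as a stand-in for the real distribution hash. The tables and columns (customer_id, is_active, orders, order_id) are hypothetical. The sketch shows that a high-cardinality key spreads rows over every data slice while a boolean key uses at most two, and that an equality join is fully co-located only when both tables are distributed on the join column.

```python
import zlib
from collections import Counter

NUM_SLICES = 8  # hypothetical number of data slices


def slice_for(key, num_slices=NUM_SLICES):
    """Map a distribution-key tuple to a data slice.

    CRC-32 is an assumed stand-in for the system's real hash function.
    """
    return zlib.crc32(repr(key).encode("utf-8")) % num_slices


def distribute(rows, key_columns, num_slices=NUM_SLICES):
    """Count how many rows land on each data slice when distributing on key_columns."""
    counts = Counter()
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        counts[slice_for(key, num_slices)] += 1
    return counts


# Hypothetical customer table: a unique id and a boolean flag.
customers = [{"customer_id": i, "is_active": i % 2 == 0} for i in range(10_000)]

by_id = distribute(customers, ["customer_id"])   # many distinct values
by_flag = distribute(customers, ["is_active"])   # only two distinct values

print("slices used with customer_id:", len(by_id))
print("slices used with is_active:  ", len(by_flag))

# Hypothetical orders table that joins to customers on customer_id.
orders = [{"order_id": 1_000_000 + n, "customer_id": n % 10_000} for n in range(50_000)]


def colocated_fraction(orders, order_key):
    """Fraction of orders stored on the same slice as their matching customer.

    Customers are assumed to be distributed on customer_id; orders are
    distributed on order_key. Rows that are not co-located would have to be
    moved between slices before the join.
    """
    same = sum(
        slice_for((o[order_key],)) == slice_for((o["customer_id"],))
        for o in orders
    )
    return same / len(orders)


print("co-located, orders on customer_id:", colocated_fraction(orders, "customer_id"))
print("co-located, orders on order_id:   ", colocated_fraction(orders, "order_id"))
```

Distributing both tables on the equality column makes every matching pair share a slice. Distributing orders on order_id leaves only a small fraction co-located by chance, so most rows would need to be redistributed for the join.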