Select a distribution key

When you choose the columns for a table's distribution key, pick columns that distribute the rows uniformly across the data slices and give optimal access to the data.

Consider the following factors:
  • The more distinct values the distribution key has, the more evenly the rows can be distributed.
  • The system distributes rows with the same distribution key value to the same data slice.
  • Parallel processing is more efficient when you distribute table rows evenly across the data slices.
  • Tables that are used together should use the same columns for their distribution key. For example, in an order system application, use the customer ID as the distribution key for both the customer table and the order table.
  • If a particular key is used mostly in equijoin clauses, that key is a good choice for the distribution key, because rows with matching key values are then already on the same data slice.
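
The factors above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the slice count, the sample order rows, and the helper names (slice_for, rows_per_slice, skew) are hypothetical, and SHA-256 stands in for the system's internal hash function, which is not described here. It compares a key with many distinct values (customer_id) against a key with only three (status), and shows that equal key values in two tables map to the same data slice.

```python
import hashlib
from collections import Counter

NUM_SLICES = 8  # hypothetical number of data slices


def slice_for(key, num_slices=NUM_SLICES):
    """Map a distribution key value to a data slice.

    SHA-256 is a stand-in here; the real system uses its own hash function.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def rows_per_slice(keys, num_slices=NUM_SLICES):
    """Count how many rows land on each data slice."""
    counts = Counter(slice_for(k, num_slices) for k in keys)
    return [counts.get(s, 0) for s in range(num_slices)]


def skew(counts):
    """Ratio of the busiest slice to the average slice; 1.0 is perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


# Made-up sample data: 10,000 order rows, 2,500 customers, 3 status values.
orders = [
    {"customer_id": i % 2500, "status": ("NEW", "PAID", "SHIPPED")[i % 3]}
    for i in range(10_000)
]

by_customer = rows_per_slice(o["customer_id"] for o in orders)
by_status = rows_per_slice(o["status"] for o in orders)

print("customer_id:", by_customer, "skew=%.2f" % skew(by_customer))
print("status:     ", by_status, "skew=%.2f" % skew(by_status))

# Rows with the same key value always map to the same slice, so a customer
# row and that customer's order rows are co-located when both tables are
# distributed on customer_id.
customer_row = {"customer_id": 42}
order_row = {"customer_id": 42, "status": "NEW"}
same_slice = slice_for(customer_row["customer_id"]) == slice_for(order_row["customer_id"])
print("co-located:", same_slice)
```

In this sketch, the status key can occupy at most three of the eight slices, so most slices stay empty while a few carry all the rows. The customer_id key spreads rows across every slice much more evenly, which is the situation in which parallel processing works best.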