Specify distribution keys

In Netezza Performance Server for Cloud Pak for Data and Netezza Performance Server for Cloud Pak for Data System, the table distribution key is used to determine how to distribute (or stripe) the table data across all active data slices in the system. The Netezza Performance Server system requires that all tables have a distribution method, either hash or random.

When you use the commands CREATE TABLE or CREATE TABLE AS, you can either specify the method or allow the Netezza Performance Server to select one.

  • With the DISTRIBUTE ON (hash) command, you can specify up to four columns as the distribution key.
  • If there is no obvious group of columns that can be combined as the distribution key, you can specify random distribution. Random distribution means that the Netezza Performance Server distributes the data randomly across the data slices.
    Random distribution results in the following:
    • Reducing skew when you are loading data.
    • Eliminating the need to pick a distribution key when you are loading a large database that has many tables with few rows. In such cases, picking a good distribution key might have little performance benefit, but it gains the advantage of a dispersed distribution of data.
    • Allowing you to verify a good distribution key by first loading the data randomly, then by using the GENERATE STATISTICS command, and running selects on the database columns to get the min/max and counts. With this information, you can better choose which columns to use for the distribution key.
    • If you do not specify a distribution when you create a table, the system chooses a distribution key and there is no way to control that choice.