Choose a distribution key for a subset table

When you run a query, the results flow from the data slices to the SPUs, then to the host, and finally to the application. A query can also create a table instead of returning results to the application. If you create a subset or summary table this way, the subset table inherits the distribution key of the parent table, and the subset records are created and stored locally on each data slice.

For example, suppose that you have a large table with many records and columns, and you want to create a summary table from it that contains just one day of data and only some of the original columns. If the new table uses the same distribution key as the original table, the new records reside on the same data slices as the original table records. The system has no need to send the records to the host, which would consume transmission time and host processing power. Instead, the SPUs create the records locally: each SPU reads from its data slices and writes back to the same data slices. Creating the table this way is much more efficient than routing the records through the host, because each SPU communicates only with its own data slices.
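The scenario above can be sketched in SQL as follows. This is a minimal illustration, not a tuned design: the table names (sales_history, sales_one_day), the column names, and the date value are all hypothetical. The sketch assumes that the parent table is distributed on customer_id, and it keeps that column in the select list so that the key column exists in the new table.

```sql
-- Hypothetical parent table, distributed on customer_id.
CREATE TABLE sales_history (
    customer_id  INTEGER,
    sale_date    DATE,
    store_id     INTEGER,
    amount       NUMERIC(12,2),
    notes        VARCHAR(200)
) DISTRIBUTE ON (customer_id);

-- Subset table: one day of data and only some of the columns.
-- No DISTRIBUTE ON clause is given, so the new table is left to
-- inherit the distribution key of the parent table.
CREATE TABLE sales_one_day AS
SELECT customer_id, sale_date, amount
FROM sales_history
WHERE sale_date = '2024-01-15';
```

Because the second statement omits the DISTRIBUTE ON clause, it relies on the inheritance behavior that this topic describes rather than restating the key.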

Using the same distribution key causes the system to create the new table locally on each data slice, reading from the original table and writing to the new table.

The syntax for creating a table from a query is as follows:

CREATE [ TEMPORARY | TEMP ] TABLE table_name [ (column [, ...] ) ]
AS select_clause [ DISTRIBUTE ON ( column [, ...] ) ];

When you create a subset table or temp table, do not specify a new distribution key or distribution method. Instead, allow the new table to inherit the distribution key of the parent table. This avoids the extra data redistribution that can occur when the specified key does not match the inherited key.
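The following sketch contrasts the two choices. The names are hypothetical: it assumes a parent table called sales_history that is distributed on customer_id and that also has store_id, sale_date, and amount columns. The comments describe the behavior that this topic leads you to expect; they are not measured results.

```sql
-- Recommended: omit DISTRIBUTE ON so that the temp table inherits
-- the key of the parent table (customer_id in this sketch). Each
-- data slice can then write its own rows locally.
CREATE TEMP TABLE recent_sales AS
SELECT customer_id, store_id, amount
FROM sales_history
WHERE sale_date >= '2024-01-01';

-- Not recommended for a subset table: the specified key (store_id)
-- does not match the inherited key, so rows might have to move to
-- other data slices before they are stored.
CREATE TEMP TABLE recent_sales_by_store AS
SELECT customer_id, store_id, amount
FROM sales_history
WHERE sale_date >= '2024-01-01'
DISTRIBUTE ON (store_id);
```

Both statements select the same rows. They differ only in where those rows are stored.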