Data skew

System performance depends directly on how uniformly the user data is distributed across the data slices in the system. When you create a table and load data into it, the rows of the table should be distributed evenly among all the data slices. If some data slices hold more rows of a table than others, those data slices and the SPUs that manage them need more time and resources to complete their work, and they become a performance bottleneck for your queries. Uneven distribution of data is called skew. An optimal table distribution has no skew.
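As a rough illustration of how skew might be quantified, the following Python sketch compares the fullest data slice against the average. It assumes that you already have the row count of one table on each data slice. The function name `skew_ratio` and the sample counts are hypothetical and are not part of the product. A ratio of 1.0 indicates a uniform distribution. Larger values mean that one data slice holds proportionally more rows than the average and is likely to finish its work last.

```python
from statistics import mean


def skew_ratio(rows_per_slice):
    """Return (largest slice row count) / (average slice row count).

    rows_per_slice: list of row counts, one entry per data slice.
    A result of 1.0 means the rows are spread evenly (no skew).
    This metric is an illustrative assumption, not a product-defined value.
    """
    if not rows_per_slice:
        raise ValueError("at least one data slice is required")
    avg = mean(rows_per_slice)
    if avg == 0:
        # An empty table has no rows to distribute, so treat it as unskewed.
        return 1.0
    return max(rows_per_slice) / avg


# Hypothetical row counts for a 1,000,000-row table on four data slices.
uniform = [250_000, 250_000, 250_000, 250_000]
skewed = [700_000, 100_000, 100_000, 100_000]

print(skew_ratio(uniform))
print(skew_ratio(skewed))
```

In the skewed sample, one data slice holds 700,000 of the 1,000,000 rows, so that slice and the SPU that manages it process far more data than the other three.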

Important: If you configure the system to use random chunk distribution, tables that are created with DISTRIBUTE ON RANDOM are intentionally skewed to one or a small number of extents to reduce the allocated space. These chunked tables are usually very small tables that fit within one or a few extents, and therefore they are less affected by skew than larger tables with millions of rows.

Skew can happen while you are distributing or loading the data into the following types of tables:

Base tables: Database administrators define the tables within databases for the user data.
Intra-session tables: Applications or SQL users create temporary tables during a session.