Data skew
System performance depends directly on how uniformly the user data is distributed across all of the data slices in the system. When you create a table and load data into it, the rows of the table should be distributed evenly among all the data slices. If some data slices hold more rows of a table than others, those data slices and the SPUs that manage them need more time and resources to complete their work. Because a query finishes only when its slowest data slice finishes, the overloaded data slices and their SPUs become a performance bottleneck for your queries. Uneven distribution of data is called skew. An optimal table distribution has no skew.
Skew can affect the following types of tables:
- Base tables: Database administrators define these tables within databases to hold the user data.
- Intra-session tables: Applications or SQL users create these temporary tables during a session.
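To make the idea of skew concrete, the following minimal Python sketch summarizes how evenly the rows of one table are spread across data slices. It assumes that you have already obtained a row count for each data slice by some means. The function name `skew_report`, the input layout, and the max-to-average ratio are illustrative assumptions, not a metric or an interface defined by the system.

```python
from statistics import mean


def skew_report(rows_per_slice):
    """Summarize how unevenly rows are spread across data slices.

    rows_per_slice maps a data slice ID to the number of rows of one
    table stored on that slice (hypothetical input, gathered elsewhere).
    """
    if not rows_per_slice:
        raise ValueError("need at least one data slice")

    counts = list(rows_per_slice.values())
    avg = mean(counts)
    largest = max(counts)
    smallest = min(counts)

    # Illustrative metric (an assumption, not the system's definition):
    # ratio of the fullest slice to the average slice.
    # 1.0 means the rows are perfectly even; larger values mean more skew.
    max_to_avg = largest / avg if avg else 1.0

    return {
        "min": smallest,
        "max": largest,
        "avg": avg,
        "max_to_avg": max_to_avg,
    }


# Four data slices with the same number of rows: no skew.
even = skew_report({1: 1000, 2: 1000, 3: 1000, 4: 1000})

# The same 4000 rows with most of them on one data slice: heavy skew.
skewed = skew_report({1: 3400, 2: 200, 3: 200, 4: 200})

print(even["max_to_avg"], skewed["max_to_avg"])
```

In the second case, data slice 1 holds 3.4 times the average number of rows, so it and the SPU that manages it have roughly that much more of the table to process than an evenly loaded data slice would.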