db_skew script
The db_skew script exports the details of database skew for all the tables in a specified database. This script can be helpful in identifying storage problems that you encounter while the data is at rest on disk.
The performance of the system is influenced by distribution of the user data across all of the partitions in the system. When a user creates a table and then loads the data into the system, ideally the data should be distributed evenly among all partitions. When some partitions have more data than others, the partitions with more data work slower and need more resources to complete their jobs. Such uneven distribution of data is called skew, and it might become a performance bottleneck for users' queries.
The db_skew command is part of the Db2 support tools. To run it, a database name is required following -d | -db. By default, all the tables with storage usage skew more than 100 MB are be listed. You can also specify minimum and maximum values to filter out specific tables. Arguments like -sort name, -sort size, -sort skew help to sort the output.
Syntax
db_skew -d|-db <dbname> [amount] [-verbose] [-sort <name>|<skew>|<size>]
Parameters
- amount
- Specifies the amount of table skew. Any table with this amount of skew, or more, is listed in
the output.
The default value is 100, as in 100 MB. If the space usage for a particular table varies by more than 100 MB, from the data slice with the least amount of data to the data slice with the greatest amount of data, that table is listed in the output.
Specifying a value of 0 results in all non-empty tables in the database being listed.
- -verbose
- For each of the skewed tables, includes information about the distribution keys in the output.
- -sort name|size|skew
- Sorts output. Options are as follows:
- name
- Sorts the output by the database and table names (the Database and Table columns in the output). This is the default behavior.
- size
- Sorts the output by the table size (the Total MB column in the output).
- skew
- Sorts the output by the amount of skew (the Min-Max SKEW (MB) column in the output).
Output
$ db_skew -d BLUDB 0
Table Skew That Is > 0 MB
===========================
Schema | Table | Total MB | Minimum | Average | Maximum | Min-Max SKEW (MB)| SKEW Ratio | Partitions
----------+-----------------------------+----------+---------+---------+---------+------------------+--------------+--------
TEST_01 | CTAS_EXAMPLE | 536 | 0 | 6 | 536 | 536 | .00 - 92.00 | 7
TEST_01 | BALANCED_DISTRIBUTION | 6,176 | 67 | 67 | 67 | 0 | 1.00 - 1.00 | 18
TEST_01 | EMPTY_TABLE | 0 | 0 | 0 | 0 | 0 | .00 - .00 | 12
TEST_02 | SAMPLE_TABLE | 13,717 | 134 | 149 | 201 | 67 | .90 - 1.35 | 3
TEST_02 | TOTALLY_SKEWED_DISTRIBUTION | 536 | 0 | 6 | 536 | 536 | .00 - 92.00 | 2
TEST_03 | MY_TEMP1 | 536 | 0 | 6 | 536 | 536 | .00 - 92.00 | 8
================================================================================
The Min-Max SKEW (MB) is the difference between the maximum and minimum skew value expressed in MB.
The skew ratio (the value of the SKEW Ratio column) is expressed as a range. At the low end, it might be .00, which indicates that one or more partitions are storing no data for the table. At the high end, the value might match the number of partitions on the box. For example, for a database which has 32 partitions, the maximum value would be 32.00, and it would indicate that one partition contains all data for the table, that is, the table is totally skewed. A table that is using the same amount of storage across all data slices (that is, there is no discernible skew) has a ratio of 1.00 - 1.00.
All the storage sizes are rounded to the nearest MB for display purposes. Values of less than .5 are rounded down, and values of .5 or more are rounded up. Therefore, a value of 0 does not necessarily mean that the table is empty. It does mean that the storage value is less than 0.5 MB.