Best practices for table partitioning in a warehouse environment (Updated September 2019)
Table or Range partitioning is a powerful feature of Db2 that facilitates good database design principles that can help lead to easier maintenance operations, increased data availability and more optimized queries.
Note: Table partitioning is a feature of row store tables only.
But why is table partitioning so good in a warehousing environment? Here are some reasons:
- Range-specific maintenance operations
- Where data partitions (ranges of data within a table) are placed in individual table spaces, maintenance operations can be targeted at active data only.
- Many Db2 commands, for example REORG and BACKUP, can be specified to execute against specific table spaces or data partitions. This can significantly reduce maintenance times.
- Data lifecycle management
- Table partitioning large fact tables by date means that older data in a table can be detached as an online operation. This can help eliminate the need to perform costly delete statements.
- In addition, as data ages, it can be moved as an online operation to less costly storage. In Db2 V11.5 this multi-temperature data management is facilitated by the new storage groups feature.
- Partition elimination
- Range Partitioning benefits queries where the query spans one or a subset of the range partitioning key. The Db2 optimizer can then eliminate entire data partitions from the query, and this reduction in rows/read (I/O) can help increase query performance.
- Local indexes
- Local indexes can help to significantly reduce index maintenance and increase query performance where significant sorting is not required.
- In addition, local indexes can be placed in separate table spaces which provides more flexibility in building a backup schedule and a recovery strategy. For example, in a restore scenario you have the choice between restore or rebuild of indexes.
- Backup perfomance
- Backup performance can be improved by backing up just those table spaces (table ranges) that are active.
- By balancing the average size of your table spaces, parallelism within the BACKUP operations can also increase, helping to reduce the elapsed time of your backup operations.