Clustered base tables

A clustered base table (CBT) is a user table that contains data that is organized by using one to four organizing keys. An organizing key is a column of the table that you specify for clustering the table records; the organizing keys are used to group records within the table and save them in the same or nearby extents. Netezza Performance Server also creates zone maps for the organizing columns to accelerate the performance of queries on that table that restrict using the organizing keys.

The following figure shows a simple model of a table, such as a transaction table. In its unorganized form, the data is organized by the date and time that each transaction occurred, and the color indicates a unique transaction. If your queries on the table most often query by date/time restrictions, those queries run well because the date/time organization matches the common restrictions of the queries.

However, if most queries restrict on transaction type, you can increase query performance by organizing the records by transaction type. Queries that restrict on transaction type will have improved performance because the records are organized and grouped by the key restriction; the query can obtain the relevant records more quickly, whereas they would have to scan much more of the table in the date/time organization to find the relevant transactions. By organizing the data in the table so that commonly filtered data is located in the same or nearby disk extents, your queries can take advantage of zone maps to eliminate unnecessary disk scans to find the relevant records.

Figure 1. Organizing tables with CBTs

CBTs are most often used for large fact or event tables that can have millions or billions of rows. If the table does not have a record organization that matches the types of queries that run against it, scanning the records of such a large table requires a lengthy processing time as full disk scans can be needed to gather the relevant records. By reorganizing the table to match your queries against it, you can group the records to take advantage of zone maps and improve performance.

CBTs offer several benefits:

  • CBTs support “multi-dimension” lookups where you can organize records by one, two, three, or four lookup keys. In the example that is shown in the previous figure, if your queries commonly restrict on transaction type and store ID, you can organize records by using both of those keys to improve query performance.
  • CBTs improve query performance by adding more zone maps for a table because the organizing key columns are also zone mapped (if the organizing column data type supports zone maps).
  • CBTs increase the supported data types for zone-mapped columns to improve performance for queries that restrict along multiple dimensions.
  • CBTs incrementally organize data within your user tables in situations where data cannot easily be accumulated in staging areas for pre-ordering before insertions/loads. CBTs can help you to eliminate or reduce pre-sorting of new table records before a load/insert operation.
  • CBTs save disk space. Unlike indexes, materialized views and other auxiliary data structures, CBTs do not replicate the base table data and do not allocate more data structures.