Explain information for column-organized tables
Explain information is captured to support column-organized tables. You can use this information to determine how your application performs when it uses this functionality.
The CTQ plan operator represents the transition between column-organized data processing and row-organized data processing.
- Set the EXPLAIN mode on by using the CURRENT EXPLAIN MODE special
register as follows:
db2 SET CURRENT EXPLAIN MODE YES
- Issue your query against column-organized tables.
- Issue the db2exfmt command to format the contents
of the explain tables and obtain the access plan. The following example
shows you how to use this command against the SAMPLE database:
db2exfmt -d sample -1 -o output.exfmt
Improving the performance of queries on column-organized tables by using indexes
Some queries that have selective search conditions might run faster when accessing column-organized tables if indexes are used. Unique indexes are implicitly created to support enforced primary and unique key constraints. Indexes can also be explicitly created by using the CREATE INDEX statement. In previous versions of Db2®, unique indexes were used for select, update, or delete operations that affect only one row in a column-organized table. An example is shown in the following index access plan:
Rows
RETURN
( 1)
Cost
I/O
|
1
CTQ
( 2)
41.3466
6
|
1
NLJOIN
( 3)
41.3449
6
/----+-----\
1 1
CTQ TBSCAN
( 4) ( 6)
6.91242 34.4325
1 5
| |
1 98168
IXSCAN CO-TABLE: VICCHANG
( 5) /BIC/SZCCUST
6.91242 Q1
1
|
98168
INDEX: VICCHANG
/BIC/SZCCUST~0
Q1
This plan is equivalent to a FETCH-IXSCAN combination that is used to access row-organized data. For index access to column-organized data, row-organized data processing retrieves the rowid from the index by using IXSCAN(5) and passes it to column-organized data processing using CTQ(4). CTQ(4) represents a column-organized table queue that passes data from row-organized data processing to column-organized data processing. TBSCAN(6) locates the columns that are identified by the rowid. TBSCAN(6) might apply additional predicates if necessary, or reapply the IXSCAN predicates in some situations. Specifically, if the table is being accessed under the UR isolation level, or the access is in support of an update or delete operation, the TBSCAN needs to apply only those predicates that were not already applied by the IXSCAN. Otherwise, the TBSCAN needs to reapply all of the IXSCAN predicates. NLJOIN(3) represents the process of retrieving the rowid from row-organized data processing and passing it to the column-organized TBSCAN.
- The result of the index access is joined to another column-organized table
- The index scan returns at most one row
- Joins
- Aggregation
- Removal of duplicate rows
- Sorting by using column-organized processing
The explain representation for a column-organized FETCH is similar to that of a row-organized FETCH, except for arguments that are not applicable to column-organized processing. The example that is shown previously for a query that uses an index in previous versions of Db2 would appear as the following when a FETCH operator is used:
Rows
RETURN
( 1)
Cost
I/O
|
1
FETCH
( 2)
17.0787
1
/----+----\
1 98168
IXSCAN CO-TABLE: VICCHANG
( 3) /BIC/SZCCUST
6.91242 Q1
1
|
98168
INDEX: VICCHANG
/BIC/SZCCUST~0
Q1
- Jump scans
- Deferred fetch index plans (index ANDing, ORing, and list prefetch)
- Star join and zigzag join
- Scan Sharing
Intra-partition parallel index scans are not supported for column-organized tables.
Update and delete operations that use an index scan on a column-organized table are not supported by the FETCH operator. Update and delete operations that affect only a single row are supported by using either index-only access or the nested-loop fetch approach.
Common table expression
A common table expression defines the result of a table that you can specify in the FROM clause of an SQL Statement. Statements with common table expressions against column-organized tables can have more efficient execution plans.
WITH cse(c1,c2) AS (SELECT t1.c1, MAX(c2) FROM t1 GROUP BY t1.c1)
SELECT a.c2 FROM cse a, cse b
WHERE a.c1 = b.c2;
The
following sample execution plan correspond to that
query. Rows
RETURN
( 1)
Cost
I/O
|
25
LTQ
( 2)
2078.07
30
|
25
CTQ
( 3)
2074
30
|
25
^HSJOIN
( 4)
2073.79
30
/-+--\
25 25
TBSCAN TBSCAN
( 5) ( 9)
1036.85 1036.85
15 15
| |
25 25
TEMP TEMP
( 6) ( 6)
1033.24 1033.24
15 15
|
25
GRPBY
( 7)
1032.45
15
|
50000
TBSCAN
( 8)
792.149
15
|
50000
CO-TABLE: URSU
T1
Q1
The
execution plan includes the TEMP(6) operator, which materializes the results of the common table
expression during column-organized data
processing. Operators TBSCAN(5) and TBSCAN(9) scan the output of the TEMP(6) operator and send the
data to the HSJN(4) operator. Afterward, the CTQ(3) operator sends the results of the join operation
from column-organized
data processing to row-organized data processing.Column-organized sorts
SELECT * FROM tc1 ORDER BY c1,c2
SELECT rank() OVER (PARTITION BY c1 ORDER BY c2) AS rnk, c2 FROM tc1
A column-organized sort is typically executed in parallel using multiple database agents and can use different methods to distribute the data among the agents, depending on the semantics of the SQL statement. The type of parallel sorting method is indicated by the SORTTYPE argument of the SORT operator along with the sort key columns and the sort partitioning columns. The SORTTYPE argument can have the values GLOBAL, PARTITIONED, or MERGE for a column-organized SORT.
SELECT * FROM tc1 ORDER BY c1, c2
Rows
RETURN
( 1)
Cost
I/O
|
1000
LMTQ
( 2)
351.462
10
|
1000
CTQ
( 3)
289.471
10
|
1000
TBSCAN
( 4)
288.271
10
|
1000
SORT
( 5)
278.209
10
|
1000
TBSCAN
( 6)
85.5805
10
|
1000
CO-TABLE: DB2USER
TC1
Q1
Arguments:
---------
SORTKEY : (Sort Key column)
1: Q1.C1(A)
2: Q1.C2(A)
SORTTYPE: (Intra-Partition parallelism sort type)
GLOBAL
SELECT
c1,
c2,
c3,
MAX(c1) OVER (PARTITION BY c2),
MAX(c1) OVER (PARTITION BY c2, c3)
FROM
tc1
ORDER BY
c2
Rows
RETURN
( 1)
Cost
I/O
|
1000
LMTQ
( 2)
466.38
10
|
1000
CTQ
( 3)
411.188
10
|
1000
TBSCAN
( 4)
409.588
10
|
1000
SORT
( 5)
399.527
10
|
1000
TBSCAN
( 6)
288.271
10
|
1000
SORT
( 7)
278.209
10
|
1000
TBSCAN
( 8)
85.5805
10
|
1000
CO-TABLE: DB2USER
TC1
Q1
Arguments:
---------
PARTCOLS: (Table partitioning columns)
1: Q2.C2
SORTKEY : (Sort Key column)
1: Q2.C2(A)
2: Q2.C3(A)
SORTTYPE: (Intra-Partition parallelism sort type)
PARTITIONED
Arguments:
---------
SORTKEY : (Sort Key column)
1: Q3.C2(A)
SORTTYPE: (Intra-Partition parallelism sort type)
MERGE
SELECT
c1,
c2,
c3,
MAX(c1) OVER (PARTITION BY c2),
MAX(c1) OVER (PARTITION BY c2, c3)
FROM
tc1
Rows
RETURN
( 1)
Cost
I/O
|
1000
LTQ
( 2)
421.526
10
|
1000
CTQ
( 3)
383.134
10
|
1000
TBSCAN
( 4)
288.271
10
|
1000
SORT
( 5)
278.209
10
|
1000
TBSCAN
( 6)
85.5805
10
|
1000
CO-TABLE: DB2USER
TC1
Q1
Arguments:
---------
PARTCOLS: (Table partitioning columns)
1: Q2.C2
SORTKEY : (Sort Key column)
1: Q2.C2(R)
2: Q2.C3(A)
SORTTYPE: (Intra-Partition parallelism sort type)
PARTITIONED
Agent 1 | Agent 1 | Agent 1 | Agent 1 |
---|---|---|---|
C2 | C3 | C2 | C3 |
5 | 1 | 8 | 6 |
5 | 1 | 8 | 6 |
5 | 2 | 8 | 7 |
5 | 2 | 8 | 7 |
1 | 4 | 2 | 1 |
1 | 5 | 2 | 2 |
Since the window specification for both OLAP functions is PARTITION BY rather than ORDER BY, and the outer sub-select doesn't have an ORDER BY clause, strict order on C2 and C3 does not need to be produced by the SORT. Distinct values of C2 just need to be processed by the same database agent. For example, all rows with values C2 = 5 must be processed by the same agent in order to properly determine the maximum value of C1 for that group of values. Limiting the sorting to values of C3 within distinct values of C2 reduces the memory required to perform the sort. Performance might also be improved because partitions can be emitted out of sequence, avoiding the synchronization overhead to emit them in order.
Since the data is not strictly ordered on C2, the order indicator is set to "R" indicating that it is randomly ordered.
A single SORT operation can provide the partitioning needed for multiple OLAP functions if the partitioning specifications are identical or are proper subsets of each other. When this is the case, the SORT will be partitioned on the smallest set of columns that is a proper subset of all other OLAP partitioning specifications.