Column group statistics
If your query has more than one join predicate joining two tables, the Db2® optimizer calculates how selective each of the predicates is before choosing a plan for executing the query.
This query returns the names and raw material quality of all products. There are two join predicates:
SELECT PRODUCT.NAME, RAWMATERIAL.QUALITY FROM PRODUCT, RAWMATERIAL WHERE PRODUCT.COLOR = RAWMATERIAL.COLOR AND PRODUCT.ELASTICITY = RAWMATERIAL.ELASTICITY
The optimizer assumes that the two predicates are independent, which means that all variations of elasticity occur for each color. It then estimates the overall selectivity of the pair of predicates by using catalog statistics information for each table based on the number of levels of elasticity and the number of different colors. Based on this estimate, it might choose, for example, a nested loop join in preference to a merge join, or the reverse.
PRODUCT.COLOR = RAWMATERIAL.COLOR PRODUCT.ELASTICITY = RAWMATERIAL.ELASTICITY
However, these two predicates might not be independent. For example, highly elastic materials might be available in only a few colors, and the very inelastic materials might be available in a few other colors that are different from the elastic ones. In that case, the combined selectivity of the predicates eliminates fewer rows and the query returns more rows. Without this information, the optimizer might no longer choose the best plan.
The optimizer uses these statistics to detect cases of correlation and to dynamically adjust the combined selectivities of correlated predicates, thus obtaining a more accurate estimate of the join size and cost.
RUNSTATS ON TABLE PRODUCT ON COLUMNS ((COLOR, ELASTICITY))
When a query groups data by using keywords such as GROUP BY or DISTINCT, column group statistics also enable the optimizer to compute the number of distinct groupings.
Without any index or column group statistics, the optimizer estimates the number of groupings (and, in this case, the number of rows returned) as the product of the number of distinct values in DEPTNO, MGR, and YEAR_HIRED. This estimate assumes that the grouping key columns are independent. However, this assumption could be incorrect if each manager manages exactly one department. Moreover, it is unlikely that each department hires employees every year. Thus, the product of distinct values of DEPTNO, MGR, and YEAR_HIRED could be an overestimate of the actual number of distinct groups.
SELECT DEPTNO, YEARS, AVG(SALARY) FROM EMPLOYEE GROUP BY DEPTNO, MGR, YEAR_HIRED
In addition to JOIN predicate correlation, the optimizer manages correlation with simple equality predicates, such as:
RUNSTATS ON TABLE EMPLOYEE ON COLUMNS ((DEPTNO, MGR, YEAR_HIRED))
In this example, predicates on the DEPTNO column in the EMPLOYEE table are likely to be independent of predicates on the YEAR column. However, the predicates on DEPTNO and MGR are not independent, because each department would usually be managed by one manager at a time. The optimizer uses statistical information about columns to determine the combined number of distinct values and then adjusts the cardinality estimate to account for correlation between columns.
DEPTNO = 'Sales' AND MGR = 'John'
Column group statistics can also be used on statistical views. The column group statistics help adjust the skewed statistics in the statistical view when there is more than one strong correlation in the queries. The optimizer can use these statistics to obtain better cardinality estimates which might result in better access plans.