Statistics used from expression columns in statistical views

The query optimizer can use statistics from the expression columns in statistical views to obtain accurate cardinality estimates, which results in better access plans.

For more information about statistical views, see Statistical views.

Important: Hadoop and HBase tables use the ANALYZE command to gather statistics, and the RUNSTATS command is used to collect statistics for statistical views. Running the RUNSTATS command against Hadoop and HBase tables is not recommended, because doing so clears any previously collected statistics.

Consider a query that uses the UCASE scalar function:

SELECT * FROM t1, t2 WHERE t2.c2 = UCASE(t1.c1)

The query optimizer can use the statistics from a statistical view for these types of queries to obtain better access plans.

The earlier example query would benefit from a statistical view that includes the UCASE scalar function on column c1. The following example creates a view that includes the UCASE scalar function on column c1.

    CREATE VIEW sv AS (SELECT UCASE(c1) AS c1 FROM t1)
    ALTER VIEW sv ENABLE QUERY OPTIMIZATION
    RUNSTATS ON TABLE dba.sv WITH DISTRIBUTION

In another example, consider a query that uses the references to ARRAY elements or ROW fields for columns in a Hadoop table:


CREATE HADOOP TABLE t1 (
  c1 INT, c2 INT ARRAY[32])
;
CREATE VIEW sv_t1 (a) 
  AS (SELECT c2[1] FROM t1)
;
ALTER VIEW sv_t1 
  ENABLE QUERY OPTIMIZATION
;
RUNSTATS ON VIEW sv_t1 WITH DISTRIBUTION;

A query that references the ARRAY element expression c2[1] in a predicate can then make use of the statistics that are generated by this statistical view.

To obtain statistics for these types of queries, one side of the predicate must be an expression that matches an expression in the statistical view column definition exactly.

Here are some examples where the query optimizer does not use the statistics from a statistical view:

One side of the predicate in the query is an expression that is matched with more than one expression column in a statistical view:
```
create view SV14(c1, c2) as (select c1+c2, c1*c2 from t1 where c1 > 3);
alter view SV14 enable query optimization;
runstats on table schema.sv1 with distribution;
select * from t1 where (c1+c2) + (c1*c2) > 5 and c1 > 3;
```
Here the expression (c1+c2) + (c1*c2) matched to two columns in view SV14. The statistics of view SV14 for this expression are not used.
One side of the predicate in the query is an expression that is partially matched with an expression column in a statistical view:
```
create view SV15(c1, c2) as (select c1+c2, c1*c2 from t1 where c1 > 3);
alter view SV15 enable query optimization;
runstats on table schema.SV15 with distribution;
select * from t1 where (c1+c2) + 10 > 5 and c1 > 3;
```
Here the expression (c1+c2) + 10 is partially matched to c1+c2 in view SV15. The statistics of view SV15 for this expression are not used.

One side of the predicate in the query is indirectly matched to an expression column in a statistical view:

create view SV16(c1, c2) as (select c1+c2, c1*c2 from t1 where c1 > 3);
alter view SV16 enable query optimization;
runstats on table schema.SV16 with distribution;
select * from t3 left join table (select ta.c1 from t2 left join table 
  (select c1+c2,c3 from t1 where c1 > 3) as ta(c1,c3) on t2.c1 = ta.c3) as 
   tb(c1) on t3.c1= TB.C1;

Here the column TB.C1 indirectly matches the expression c1+c2 in view SV16. The statistics of view SV16 for this expression are not used.