# Statistics used from expression columns in statistical views

The query optimizer can use statistics from the expression columns in statistical views to obtain accurate cardinality estimates, which results in better access plans.

For more information about statistical views, see Statistical views.

Important: Hadoop and HBase tables use the ANALYZE command to gather statistics, and the
RUNSTATS command is used to collect statistics for statistical views. Running the RUNSTATS command
against Hadoop and HBase tables is not recommended, because doing so clears any previously collected
statistics.

Consider a query that uses the UCASE scalar
function:

`SELECT * FROM t1, t2 WHERE t2.c2 = UCASE(t1.c1)`

The query optimizer can use the statistics from a statistical view for these types of queries to obtain better access plans.

The earlier example query would benefit from a statistical view
that includes the UCASE scalar function on column c1. The following example
creates a view that includes the UCASE scalar function on column c1.

```
CREATE VIEW sv AS (SELECT UCASE(c1) AS c1 FROM t1)
ALTER VIEW sv ENABLE QUERY OPTIMIZATION
RUNSTATS ON TABLE dba.sv WITH DISTRIBUTION
```

In another example, consider a query that uses the references to ARRAY
elements or ROW fields for columns in a Hadoop
table:

```
CREATE HADOOP TABLE t1 (
c1 INT, c2 INT ARRAY[32])
;
CREATE VIEW sv_t1 (a)
AS (SELECT c2[1] FROM t1)
;
ALTER VIEW sv_t1
ENABLE QUERY OPTIMIZATION
;
RUNSTATS ON VIEW sv_t1 WITH DISTRIBUTION;
```

A
query that references the ARRAY element expression c2[1] in a predicate can
then make use of the statistics that are generated by this statistical view.To obtain statistics for these types of queries, one side of the predicate must be an expression that matches an expression in the statistical view column definition exactly.

Here are some examples where the query optimizer does not use the
statistics from a statistical view:

- One side of the predicate in the query is an expression that is
matched with more than one expression column in a statistical view:

Here the expression`create view SV14(c1, c2) as (select c1+c2, c1*c2 from t1 where c1 > 3); alter view SV14 enable query optimization; runstats on table schema.sv1 with distribution; select * from t1 where (c1+c2) + (c1*c2) > 5 and c1 > 3;`

`(c1+c2) + (c1*c2)`

matched to two columns in view SV14. The statistics of view SV14 for this expression are not used. - One side of the predicate in the query is an expression that is
partially matched with an expression column in a statistical view:

Here the expression`create view SV15(c1, c2) as (select c1+c2, c1*c2 from t1 where c1 > 3); alter view SV15 enable query optimization; runstats on table schema.SV15 with distribution; select * from t1 where (c1+c2) + 10 > 5 and c1 > 3;`

`(c1+c2) + 10`

is partially matched to`c1+c2`

in view SV15. The statistics of view SV15 for this expression are not used. - One side of the predicate in the query is indirectly matched to
an expression column in a statistical view:

Here the column TB.C1 indirectly matches the expression`create view SV16(c1, c2) as (select c1+c2, c1*c2 from t1 where c1 > 3); alter view SV16 enable query optimization; runstats on table schema.SV16 with distribution; select * from t3 left join table (select ta.c1 from t2 left join table (select c1+c2,c3 from t1 where c1 > 3) as ta(c1,c3) on t2.c1 = ta.c3) as tb(c1) on t3.c1= TB.C1;`

`c1+c2`

in view SV16. The statistics of view SV16 for this expression are not used.