Index and table statistics
Database optimizers rely on "relatively" up-to-date table and index statistics to generate optimal access plans.
Oracle does not need the statistics to be absolutely correct or current, just relatively correct and representative. As a result, you don't have to gather statistics every day for every table especially if your database is already large (in the terabyte range).
Starting in Oracle10g, Oracle by default automatically gathered statistics during its maintenance window for tables that underwent sufficient changes. Oracle bypasses statistics generation for tables that have not changed significantly.
Volatile tables
The following tables change significantly during the day and are not candidates for automatic statistics gathering:
- YFS_TASK_Q
- YFS_TASK
- YFS_EXPORT
- YFS_IMPORT
For example, the YFS_TASK_Q table represents task that are in different state of processing. That table grows and shrinks throughout the day. At night, when order processing has completed, the table will have few in-progress records. When automatic statistics gathering run during the maintenance window, the statistics will incorrectly present this table as a small table.
We recommend either one of the two options below for these tables:
- Delete statistics for these table and then lock down the statistics.
- Manually collect statistics during the day when the table is large and then lock down the statistics.
In the first option, in the absence of statistics, Oracle will assume a large table. The commands for the first options are:
exec dbms_stats.delete_table_stats(<schema owner>,'YFS_TASK_Q')
exec dbms_stats.lock_table_stats(<schema owner>,'YFS_TASK_Q')
The commands for the second option are:
exec dbms_stats.gather_table_stats (ownname => 'YANTRA', -
tabname=>'YFS_TASK_Q', -
estimate_percent => dbms_stats.auto_sample_size)
exec dbms_stats.lock_table_stats(<schema owner>,'YFS_TASK_Q')
Skewed columns and histograms
As part of generating the statistics, Oracle generates histograms for skewed columns.
Skewed columns are columns that have a non-uniform distribution of values. For example, the enterprise_key column in the YFS_ORDER_HEADER table may be made up of a few values where one value may be more prevalent. In contrast, columns such as the order_no is more uniformly distributed.
Given basic statistics such as number of rows and the number of distinct column values, Oracle tends to choose a full table scan when faced with a query, such as the one below, against columns with high skew and/or low cardinality:
select *
from yfs_order_header
where derived_from_order_header_key = '2011012412213801928344';
Queries such as this can result in table scans even if the columns are indexed. The example above was from an actual case (see below).
From the optimizer's perspective, the queries against these columns either return a small or a very large result set. To err on the side of caution, the optimizer generally chooses a table scan over an index range scan.
You can get the optimizer to choose a more optimal access plan by providing additional statistics in the form of histograms.
By default, Oracle creates histograms as part of the statistics generation. You can verify if a column has histograms by issuing the following command:
select table_name,column_name,histogram
from user_tab_columns
TABLE_NAME COLUMN_NAME HISTOGRAM
YFS_ORDER_LINE CHAINED_FROM_ORDER_LINE_KEY NONE
YFS_ORDER_LINE CHAINED_FROM_ORDER_HEADER_KEY NONE
YFS_ORDER_LINE DERIVED_FROM_ORDER_LINE_KEY FREQUENCY
YFS_ORDER_LINE DERIVED_FROM_ORDER_HEADER_KEY FREQUENCY
In the example above, Oracle created histograms for the two DERIVED_FROM columns but not the CHAINED_FROM columns. To manually create the histograms for the CHAINED_FROM columns, issue the following command:
exec dbms_stats.gather_table_stats (ownname => 'YANTRA', -
tabname=>'YFS_ORDER_LINE', -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt=>'for columns size auto CHAINED_FROM_ORDER_LINE_KEY,
CHAINED_FROM_ORDER_HEADER_KEY');
When you rerun the histogram query, you should now get:
TABLE_NAME COLUMN_NAME HISTOGRAM
YFS_ORDER_LINE CHAINED_FROM_ORDER_LINE_KEY FREQUENCY
YFS_ORDER_LINE CHAINED_FROM_ORDER_HEADER_KEY FREQUENCY
YFS_ORDER_LINE DERIVED_FROM_ORDER_LINE_KEY FREQUENCY
YFS_ORDER_LINE DERIVED_FROM_ORDER_HEADER_KEY FREQUENCY
In the example above, the method_opt
with
the auto
parameter lets Oracle decide whether histograms
are to be created based on the column's data distribution and the
way the columns are being used by the application.
Identifying skewed columns
The following query helps you identify columns with skewed data distribution:
select ui.table_name,ui.index_name, column_name, column_position, num_rows,
distinct_keys as dist_keys
from user_indexes ui, user_ind_columns uic
where ui.table_name = uic.table_name and
ui.index_name = uic.index_name and
ui.num_rows > 0 and
ui.distinct_keys/ui.num_rows < 0.1
order by table_name, index_name, column_position
TABLE_NAME INDEX_NAME COLUMN_NAME NUM_ DIST
ROWS _KEYS
YFS_ORDER_LINE YFS_ORDER_LINE_I3 CHAINED_FROM_ORDER_HEADER_KEY 6552586 1
YFS_ORDER_LINE YFS_ORDER_LINE_I4 DERIVED_FROM_ORDER_HEADER_KEY 6357590 1
YFS_ORDER_LINE YFS_ORDER_LINE_I5 DERIVED_FROM_ORDER_LINE_KEY 6624191 1
YFS_ORDER_LINE YFS_ORDER_LINE_I6 CHAINED_FROM_ORDER_LINE_KEY 6534969 1
YFS_ORDER_LINE YFS_ORDER_LINE_I7 DEPENDENT_ON_LINE_KEY 6457481 1
In the example above, the customer does not use derived or chained orders.
You should ensure that histograms are added to indexed columns if the absence of histograms causes Oracle to choose an inefficient plan.