Index and table statistics

Database optimizers rely on "relatively" up-to-date table and index statistics to generate optimal access plans.

Oracle does not need the statistics to be absolutely correct or current, just relatively correct and representative. As a result, you don't have to gather statistics every day for every table especially if your database is already large (in the terabyte range).

Starting in Oracle10g, Oracle by default automatically gathered statistics during its maintenance window for tables that underwent sufficient changes. Oracle bypasses statistics generation for tables that have not changed significantly.

Volatile tables

The following tables change significantly during the day and are not candidates for automatic statistics gathering:

YFS_TASK_Q
YFS_TASK
YFS_EXPORT
YFS_IMPORT

For example, the YFS_TASK_Q table represents task that are in different state of processing. That table grows and shrinks throughout the day. At night, when order processing has completed, the table will have few in-progress records. When automatic statistics gathering run during the maintenance window, the statistics will incorrectly present this table as a small table.

We recommend either one of the two options below for these tables:

Delete statistics for these table and then lock down the statistics.
Manually collect statistics during the day when the table is large and then lock down the statistics.

In the first option, in the absence of statistics, Oracle will assume a large table. The commands for the first options are:


   exec dbms_stats.delete_table_stats(<schema owner>,'YFS_TASK_Q')
   
   exec dbms_stats.lock_table_stats(<schema owner>,'YFS_TASK_Q')

The commands for the second option are:


   exec dbms_stats.gather_table_stats (ownname => 'YANTRA', -
   tabname=>'YFS_TASK_Q', -
   estimate_percent => dbms_stats.auto_sample_size)
   
   exec dbms_stats.lock_table_stats(<schema owner>,'YFS_TASK_Q')

Skewed columns and histograms

As part of generating the statistics, Oracle generates histograms for skewed columns.

Skewed columns are columns that have a non-uniform distribution of values. For example, the enterprise_key column in the YFS_ORDER_HEADER table may be made up of a few values where one value may be more prevalent. In contrast, columns such as the order_no is more uniformly distributed.

Given basic statistics such as number of rows and the number of distinct column values, Oracle tends to choose a full table scan when faced with a query, such as the one below, against columns with high skew and/or low cardinality:


   select * 
   from yfs_order_header
   where derived_from_order_header_key = '2011012412213801928344';

Queries such as this can result in table scans even if the columns are indexed. The example above was from an actual case (see below).

Note: Customer reported that the Order Detail screen took over 4 minutes. The query that checks if the order is a derived order resulted in a table scan of the YFS_ORDER_HEADER table. When customer ran dbms_stats to create histograms, Order Detail screen dropped to 1 second.

From the optimizer's perspective, the queries against these columns either return a small or a very large result set. To err on the side of caution, the optimizer generally chooses a table scan over an index range scan.

You can get the optimizer to choose a more optimal access plan by providing additional statistics in the form of histograms.

By default, Oracle creates histograms as part of the statistics generation. You can verify if a column has histograms by issuing the following command:


   select table_name,column_name,histogram 
   from user_tab_columns
   
   TABLE_NAME      COLUMN_NAME                     HISTOGRAM
   YFS_ORDER_LINE  CHAINED_FROM_ORDER_LINE_KEY     NONE
   YFS_ORDER_LINE  CHAINED_FROM_ORDER_HEADER_KEY   NONE
   YFS_ORDER_LINE  DERIVED_FROM_ORDER_LINE_KEY     FREQUENCY
   YFS_ORDER_LINE  DERIVED_FROM_ORDER_HEADER_KEY   FREQUENCY

In the example above, Oracle created histograms for the two DERIVED_FROM columns but not the CHAINED_FROM columns. To manually create the histograms for the CHAINED_FROM columns, issue the following command:


   exec dbms_stats.gather_table_stats (ownname => 'YANTRA', -
   tabname=>'YFS_ORDER_LINE', -
   estimate_percent => dbms_stats.auto_sample_size, -
   method_opt=>'for columns size auto CHAINED_FROM_ORDER_LINE_KEY, 
CHAINED_FROM_ORDER_HEADER_KEY');

When you rerun the histogram query, you should now get:


   
   TABLE_NAME      COLUMN_NAME                     HISTOGRAM
   YFS_ORDER_LINE  CHAINED_FROM_ORDER_LINE_KEY     FREQUENCY
   YFS_ORDER_LINE  CHAINED_FROM_ORDER_HEADER_KEY   FREQUENCY
   YFS_ORDER_LINE  DERIVED_FROM_ORDER_LINE_KEY     FREQUENCY
   YFS_ORDER_LINE  DERIVED_FROM_ORDER_HEADER_KEY   FREQUENCY

In the example above, the method_opt with the auto parameter lets Oracle decide whether histograms are to be created based on the column's data distribution and the way the columns are being used by the application.

Identifying skewed columns

The following query helps you identify columns with skewed data distribution:


   select ui.table_name,ui.index_name, column_name, column_position, num_rows, 
distinct_keys as dist_keys
   from user_indexes ui, user_ind_columns uic
   where ui.table_name = uic.table_name and 
       ui.index_name = uic.index_name and 
       ui.num_rows > 0 and 
       ui.distinct_keys/ui.num_rows < 0.1
   order by table_name, index_name, column_position
   
TABLE_NAME      INDEX_NAME         COLUMN_NAME                    NUM_      DIST
                                                                  ROWS     _KEYS
YFS_ORDER_LINE  YFS_ORDER_LINE_I3  CHAINED_FROM_ORDER_HEADER_KEY  6552586     1
YFS_ORDER_LINE  YFS_ORDER_LINE_I4  DERIVED_FROM_ORDER_HEADER_KEY  6357590     1
YFS_ORDER_LINE  YFS_ORDER_LINE_I5  DERIVED_FROM_ORDER_LINE_KEY    6624191     1
YFS_ORDER_LINE  YFS_ORDER_LINE_I6  CHAINED_FROM_ORDER_LINE_KEY    6534969     1
YFS_ORDER_LINE  YFS_ORDER_LINE_I7  DEPENDENT_ON_LINE_KEY          6457481     1

In the example above, the customer does not use derived or chained orders.

You should ensure that histograms are added to indexed columns if the absence of histograms causes Oracle to choose an inefficient plan.