Methods of retrieving nickname statistics
You can choose which method to use for statistics collection, and you can limit collection to specific columns and indexes. The catalog-based method is faster. By contrast, the data-based method provides more up-to-date statistics, but takes longer to run.
- Catalog-based method
The catalog-based method copies statistics from the data source catalog tables to the federated catalog table. Only those statistics that can be semantically mapped to federated statistics are copied. The nickname statistics are therefore only as accurate and up to date as the information currently in the catalog at the remote source. If the statistics at the remote source are out of date, the nickname statistics collected are also out of date. When you use the catalog-based method, ensure that statistics on the remote source are current.
Because statistics are copied from the remote source catalog to the catalog on the federated server, the catalog-based method of statistics collection is generally very fast.
- Data-based method
The data-based method does not depend on the statistics at the remote source. This method generates its own statistics empirically through the results of the queries that it issues against the nicknames. With this method, the statistics that are collected accurately reflect the remote data.
The data-based method can be slow if the row size of the nicknames involved is large. The queries typically involve large sorts and aggregates. For this reason, choose data-based statistics collection only if satisfactory statistics cannot be obtained by the catalog-based method.
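The idea of deriving statistics from query results can be illustrated with a minimal sketch. It is not the federated server's actual implementation. It assumes an in-memory SQLite table standing in for a nickname, and the function name collect_column_stats, the table staff, and the statistic names card, distinct, low, and high are all hypothetical. The aggregate queries it issues also show why this approach can be expensive: each one scans the data and computes distinct counts.

```python
import sqlite3


def collect_column_stats(conn, table, columns):
    """Derive simple statistics by querying the data itself (illustrative only)."""
    cur = conn.cursor()
    # Table cardinality requires a scan of the table.
    card = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    stats = {"card": card, "columns": {}}
    for col in columns:
        # One aggregate query per column: distinct count, lowest and highest value.
        distinct, low, high = cur.execute(
            f'SELECT COUNT(DISTINCT "{col}"), MIN("{col}"), MAX("{col}") '
            f'FROM "{table}"'
        ).fetchone()
        stats["columns"][col] = {"distinct": distinct, "low": low, "high": high}
    return stats


# Hypothetical sample data standing in for the remote table behind a nickname.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER, dept INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, 10, 100.0), (2, 10, 200.0), (3, 20, 150.0), (4, 20, 150.0)],
)

stats = collect_column_stats(conn, "staff", ["dept", "salary"])
print(stats)
```

Because the numbers come from the data rather than from a remote catalog, they reflect the current contents of the table regardless of when statistics were last gathered at the source.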
If you want to improve the performance of the data-based method at the expense of the quality of the statistics gathered, limit statistics collection to the columns and indexes for which the benefit is greatest. These include columns that are used in predicates, as join keys, or in grouping operations, and columns that are part of one or more indexes.
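As a rough illustration of this selection rule, the following sketch picks the columns worth collecting from a workload description. The function name columns_worth_collecting, its parameters, and the sample column and index names are hypothetical and are not part of any product interface. The sketch only shows the set logic of the rule.

```python
def columns_worth_collecting(all_columns, predicate_cols, join_cols,
                             group_cols, indexes):
    """Return the columns used in predicates, joins, grouping, or any index.

    `indexes` maps a (hypothetical) index name to the list of its key columns.
    The result keeps the order of `all_columns`.
    """
    wanted = set(predicate_cols) | set(join_cols) | set(group_cols)
    for key_columns in indexes.values():
        wanted |= set(key_columns)
    return [col for col in all_columns if col in wanted]


# Hypothetical nickname with five columns and one index.
selected = columns_worth_collecting(
    all_columns=["id", "name", "dept", "salary", "hired"],
    predicate_cols=["salary"],
    join_cols=["dept"],
    group_cols=["dept"],
    indexes={"staff_pk": ["id"]},
)
print(selected)  # ['id', 'dept', 'salary']
```

In this example, name and hired are skipped, so the data-based method would issue fewer aggregate queries for the nickname.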
With the catalog-based method, you generally do not need to limit statistics collection to specific columns or indexes, because the overhead of this collection method is very low.