Overview of optimizing remote data sources with InfoSphere Warehouse

You can use InfoSphere™ Warehouse to optimize a federated star schema or snowflake schema and achieve significant improvements in query performance.

In a federated system, the fact and dimension tables might reside on one remote server or be spread across several remote servers. Before you can use InfoSphere Warehouse, you must represent the star schema or snowflake schema locally: every fact table and dimension table must exist in some form on your local server, either as a nickname that refers to the remote table, as a replicated copy of the remote table, or as a combination of nicknames and replicated tables.
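For example, if the remote tables are on a DB2 data source, the local representation might be built with nicknames. The following is only a sketch; the server definition, authorization values, and all schema, table, and server names (SALES_SRV, MART, STAR, and so on) are hypothetical, and the exact server options depend on your data source:

```sql
-- Register the remote DB2 server as a federated data source
-- (SALES_SRV and all object names below are hypothetical)
CREATE SERVER SALES_SRV TYPE DB2/UDB VERSION '9.5' WRAPPER DRDA
   AUTHORIZATION "remoteuser" PASSWORD "remotepw"
   OPTIONS (DBNAME 'SALESDB');

-- Create nicknames so the remote fact and dimension tables
-- appear as local objects in the star schema
CREATE NICKNAME MART.FACT_SALES  FOR SALES_SRV.STAR.FACT_SALES;
CREATE NICKNAME MART.DIM_TIME    FOR SALES_SRV.STAR.DIM_TIME;
CREATE NICKNAME MART.DIM_PRODUCT FOR SALES_SRV.STAR.DIM_PRODUCT;
```

With the nicknames in place, the remote fact and dimension tables can be referenced in local queries and used as the basis for a cube model.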

After the star schema or snowflake schema exists on your local server, you can use InfoSphere Warehouse to build a cube model based on it, and then optimize that cube model with the Optimization Advisor. If some of your tables are represented by nicknames, allowing the Optimization Advisor wizard to use data sampling produces much better summary table recommendations, although generating the recommendations takes longer. The Optimization Advisor cannot use data sampling in every situation, but allow it whenever possible.

You can create the recommended summary tables on the local server, and a query that is directed to tables in the remote star or snowflake schema can be answered in one of the following three ways:

Query routes to a local summary table

If the query can be answered by the data in the summary table, the DB2 optimizer will route the query directly to the local summary table, and will not need to interact with the remote tables at all. This routing scenario provides the most significant performance improvement.
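As an illustration, a summary table of the kind the Optimization Advisor might recommend can be defined as a materialized query table (MQT) over the nicknames. The table, column, and schema names here are hypothetical, and the exact MQT options that permit routing over nicknames vary by DB2 version and configuration, so treat this as a sketch:

```sql
-- Hypothetical summary table aggregating the remote fact data
CREATE TABLE MART.SALES_SUMMARY AS (
   SELECT T.YEAR, P.PRODUCT_LINE, SUM(F.AMOUNT) AS TOTAL_AMOUNT
   FROM   MART.FACT_SALES F, MART.DIM_TIME T, MART.DIM_PRODUCT P
   WHERE  F.TIME_ID = T.TIME_ID AND F.PRODUCT_ID = P.PRODUCT_ID
   GROUP  BY T.YEAR, P.PRODUCT_LINE
) DATA INITIALLY DEFERRED REFRESH DEFERRED;

-- Allow the optimizer to consider deferred-refresh MQTs
SET CURRENT REFRESH AGE ANY;

-- After the summary table is populated, a query such as this can be
-- answered entirely from MART.SALES_SUMMARY, so the optimizer routes
-- it to the local table without contacting the remote server
SELECT T.YEAR, SUM(F.AMOUNT)
FROM   MART.FACT_SALES F, MART.DIM_TIME T
WHERE  F.TIME_ID = T.TIME_ID
GROUP  BY T.YEAR;
```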

Often, a query cannot be answered by only the summary table, but can be answered by joining the summary table with one or more dimension tables. In this case, you might consider collocating some or all of your dimension tables (but not your fact table) on the federated server so that the DB2 optimizer can complete the joins on local tables.
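One way to collocate a dimension table, sketched here with hypothetical object names, is to keep a local replicated copy of the remote table alongside the summary table:

```sql
-- Hypothetical local copy of a remote dimension table, so that joins
-- between the summary table and this dimension stay on the local server
CREATE TABLE MART.DIM_STORE_LOCAL LIKE MART.DIM_STORE;
INSERT INTO MART.DIM_STORE_LOCAL SELECT * FROM MART.DIM_STORE;
```

The local copy must be refreshed when the remote dimension data changes; how often depends on how volatile the dimension is.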

Query is pushed down to the remote data source

If the query cannot be answered by the data in the summary table and any collocated dimension tables, the DB2 optimizer will attempt to push the query down to the remote server. The remote server completes the query, and returns the result set to the local server.

If the result set is small, you can still achieve significant performance improvements. This method is possible if all of the tables in the remote star schema or snowflake schema exist on only one remote server.

Query is run locally on data pulled up from remote data sources

If the query cannot be answered by the data in the summary table and any collocated dimension tables and cannot be pushed down to the remote data source, the tables that are needed to satisfy the query are pulled up and copied from the remote server to the local server, and the query is completed locally. For example, if the star schema is located on a remote zSeries® server and the query uses a function that is available in DB2 but not in DB2 for z/OS, the query cannot be completed on the remote server.

This routing scenario might not provide performance improvement if the fact and dimension tables needed to answer the query are large.


