Global analysis of federated database queries

The Db2® explain utility, which you can start by invoking the db2expln command, shows the access plan that is generated by the remote optimizer for those data sources that are supported by the remote explain function. The execution location for each operator is included in the command output.

You can also find the remote SQL statement that was generated for each data source in the SHIP or RETURN operator, depending on the type of query. By examining the details for each operator, you can see the number of rows that were estimated by the Db2 optimizer as input to and output from each operator.

Understanding Db2 optimization decisions

Consider the following key questions when you investigate ways to increase performance:
  • Why isn't a join between two nicknames of the same data source being evaluated remotely?
    Areas to examine include:
    • Join operations. Can the remote data source support them?
    • Join predicates. Can the join predicate be evaluated at the remote data source? If the answer is no, examine the join predicate.
  • Why isn't the GROUP BY operator being evaluated remotely?

    Examine the operator syntax, and verify that the operator can be evaluated at the remote data source.

  • Why is the statement not being completely evaluated by the remote data source?

    The Db2 optimizer performs cost-based optimization. Even if pushdown analysis indicates that every operator can be evaluated at the remote data source, the optimizer relies on its cost estimate to generate a global optimization plan. There are a great many factors that can contribute to that plan. For example, even though the remote data source can process every operation in the original query, its processing speed might be much slower than the processing speed of the Db2 server, and it might turn out to be more beneficial to perform the operations at the Db2 server instead. If results are not satisfactory, verify your server statistics in the SYSCAT.SERVEROPTIONS catalog view.

  • Why does a plan that is generated by the optimizer, and that is completely evaluated at a remote data source, perform much more poorly than the original query executed directly at the remote data source?
    Areas to examine include:
    • The remote SQL statement that is generated by the Db2 optimizer. Ensure that this statement is identical to the original query. Check for changes in predicate order. A good query optimizer should not be sensitive to the order of predicates in a query. The optimizer at the remote data source might generate a different plan, based on the order of input predicates. Consider either modifying the order of predicates in the input to the Db2 server, or contacting the service organization of the remote data source for assistance.

      You can also check for predicate replacements. A good query optimizer should not be sensitive to equivalent predicate replacements. The optimizer at the remote data source might generate a different plan, based on the input predicates. For example, some optimizers cannot generate transitive closure statements for predicates.

    • Additional functions. Does the remote SQL statement contain functions that are not present in the original query? Some of these functions might be used to convert data types; be sure to verify that they are necessary.