Pushdown analysis

Pushdown analysis tells the query optimizer if a remote data source can perform an operation. An operation can be a function, such as relational operator, system or user functions, or an SQL operator (GROUP BY, ORDER BY, and so on). The optimizer then makes a cost-based decision about whether or not to push down the operator. Even if pushdown analysis determines that a particular operation can be performed at the remote source, the optimizer might decide to execute it locally at the federated server, if doing so appears to consume fewer resources.

Pushdown analysis is performed on relational data sources. Nonrelational sources use the request-reply-compensate protocol.

Functions that cannot be pushed-down, can significantly impact query performance. Consider the effect of forcing a selective predicate to be evaluated locally instead of at the remote data source. This approach could require the federated server to retrieve the entire table from the remote data source, and then filter the table locally using the predicate. If your network is constrained—and the table is large—query performance could suffer.

Operators that are not pushed-down can also significantly impact query performance. For example, having a GROUP BY operator aggregate remote data locally could, once again, require the federated server to retrieve the entire table from the remote data source.

For example, suppose that the nickname EMP references the table EMPLOYEE. This table has 10,000 rows. One column contains the zip codes, and one column contains the salary for each employee. The following query is sent to the federated server to count the number of employees per city who earn greater than 50,000 that live in a particular ZIP code range:

 
SELECT CITY, COUNT(*) FROM EMP
	WHERE ZIP BETWEEN 'CA1' AND 'CA5' AND SALARY > 50000
	GROUP BY CITY;

When the SQL compiler receives this statement, it considers several possibilities:

  • The collating sequences of the data source and the federated server are the same. It is likely that both predicates will be pushed down, because they are likely to reduce the size of the intermediate result set sent from the data source to the federated server. It is usually more efficient to filter and group results at the data source instead of copying the entire table to the federated server and performing the operations locally. Pushdown analysis determines if operations can be performed at the data source. Since the collating sequences are the same, the predicates and the GROUP BY operation can take place at the data source.
  • The collating sequences are the same, and the query optimizer knows that the federated server is very fast. It is possible that the query optimizer will decide that performing the GROUP BY operation locally is the best (least cost) approach. The predicates will be pushed-down to the data source for evaluation. This is an example of pushdown analysis combined with global optimization.
  • The collating sequences are not the same. Most likely, the SALARY predicate will be pushed down to the data source, because numeric columns are sorted the same, regardless of collating sequence. However, the predicate on ZIP will not be pushed down because it is order-dependant on a character column. The GROUP BY will not be pushed down unless the predicates on both ZIP and SALARY are also pushed down.

The SQL compiler will consider the available access plans, and then choose the plan that is the most efficient.

In general, the goal is to ensure that the query optimizer considers pushing down the functions and operators to the data sources for evaluation. Many factors can affect whether a function or an SQL operator is evaluated at a remote data source. The key factors which influence the query optimizer are: server characteristics, nickname characteristics, and query characteristics.