AQP example

Here is an example query with an explanation of how AQP could work.

SELECT * from t1, t2, t3, t4 
WHERE t1.c1=t2.c1 AND t1.c2=t3.c2 
AND t1.c3 = CURRENT DATE - t4.c3 
AND t1.c5 < 50 AND t2.c6 > 40 
AND t3.c7 < 100 AND t4.c8 - t4.c9 < 5

The WHERE clause of the preceding query contains a predicate, t1.c3 = CURRENT DATE - t4.c3, that is difficult to estimate. The estimation difficulty is due to the derivation applied to column t4.c3 and the derivation involving columns t4.c8 and t4.c9. For the purposes of this example, the predicate t1.c3 = CURRENT DATE - t4.c3 actually eliminates all or nearly all records in the join.

Due to characteristics of the columns involved in that predicate, the statistical estimate has many rows returned from the join. The optimizer selects join order t1, t3, t2, t4 based on the following record count estimates.

  • Join t1 to t3 produces 33,000,000 rows.
  • Join t1, t3 result to t2 produces 1,300,000 rows.
  • Join t1, t3, t2 result to t4 (final result set) produces 5 million rows.

The join order is reasonable assuming that the final result set actually produces 5 million rows, but the estimate is incorrect. The query performs poorly since tables t1, t3, t2 are joined first, producing 1,300,000 rows. These rows are all rejected by table t4 and the t1.c3 = CURRENT DATE - t4.c3 predicate (join starvation).

AQP identifies t4 as the forced primary table. The optimizer would choose t1 as the second table in the join order since there are no join conditions between t4 and t2 or t3. Since the join condition between tables t4 and t1 selects few rows, this plan is likely many orders of magnitude faster than the original plan.