Compiler rewrite example: Operation movement

During query rewrite, the optimizer can move the point at which the DISTINCT operation is performed, to reduce the cost of this operation. In some cases, the DISTINCT operation can be removed completely.

For example, if the EMPNO column of the EMPLOYEE table were defined as the primary key, the following query:
   select distinct empno, firstnme, lastname
     from employee
could be rewritten by removing the DISTINCT clause:
   select empno, firstnme, lastname
     from employee

In this example, because the primary key is being selected, the compiler knows that each returned row is unique. In this case, the DISTINCT keyword is redundant. If the query is not rewritten, the optimizer must build a plan with necessary processing (such as a sort) to ensure that the column values are unique.

Example - General predicate pushdown

Altering the level at which a predicate is normally applied can result in improved performance. For example, the following view provides a list of all employees in department D11:
   create view d11_employee
     (empno, firstnme, lastname, phoneno, salary, bonus, comm) as
       select empno, firstnme, lastname, phoneno, salary, bonus, comm
         from employee
         where workdept = 'D11'
The following query against this view is not as efficient as it could be:
   select firstnme, phoneno
     from d11_employee
     where lastname = 'BROWN'
During query rewrite, the compiler pushes the lastname = 'BROWN' predicate down into the D11_EMPLOYEE view. This allows the predicate to be applied sooner and potentially more efficiently. The actual query that could be executed in this example is as follows:
   select firstnme, phoneno
     from employee
     where
       lastname = 'BROWN' and
       workdept = 'D11'

Predicate pushdown is not limited to views. Other situations in which predicates can be pushed down include UNION, GROUP BY, and derived tables (nested table expressions or common table expressions).

Example - Decorrelation

In a partitioned database environment, the compiler can rewrite the following query, which is designed to find all of the employees who are working on programming projects and who are underpaid.
   select p.projno, e.empno, e.lastname, e.firstname,
       e.salary+e.bonus+e.comm as compensation
     from employee e, project p
     where
       p.empno = e.empno and
       p.projname like '%PROGRAMMING%' and
       e.salary+e.bonus+e.comm <
         (select avg(e1.salary+e1.bonus+e1.comm)
           from employee e1, project p1
           where
             p1.projname like '%PROGRAMMING%' and
             p1.projno = p.projno and
             e1.empno = p1.empno)

Because this query is correlated, and because both PROJECT and EMPLOYEE are unlikely to be partitioned on PROJNO, the broadcasting of each project to each database partition is possible. In addition, the subquery would have to be evaluated many times.

The compiler can rewrite the query as follows:
  • Determine the distinct list of employees working on programming projects and call it DIST_PROJS. It must be distinct to ensure that aggregation is done only once for each project:
       with dist_projs(projno, empno) as
         (select distinct projno, empno
           from project p1
           where p1.projname like '%PROGRAMMING%')
  • Join DIST_PROJS with the EMPLOYEE table to get the average compensation per project, AVG_PER_PROJ:
       avg_per_proj(projno, avg_comp) as
         (select p2.projno, avg(e1.salary+e1.bonus+e1.comm)
           from employee e1, dist_projs p2
           where e1.empno = p2.empno
           group by p2.projno)
  • The rewritten query is:
       select p.projno, e.empno, e.lastname, e.firstname,
           e.salary+e.bonus+e.comm as compensation
         from project p, employee e, avg_per_prog a
         where
           p.empno = e.empno and
           p.projname like '%PROGRAMMING%' and
           p.projno = a.projno and
           e.salary+e.bonus+e.comm < a.avg_comp

This query computes the avg_comp per project (avg_per_proj). The result can then be broadcast to all database partitions that contain the EMPLOYEE table.