Compiler rewrite example: Operation merging

A SELECT statement that uses views can restrict the join order of the table and can also introduce redundant joining of tables. If the views are merged during query rewrite, these restrictions can be lifted.

Suppose you have access to the following two views that are based on the EMPLOYEE table: one showing employees that have a high level of education and the other showing employees that earn more than $35,000 per year:
   create view emp_education (empno, firstnme, lastname, edlevel) as
     select empno, firstnme, lastname, edlevel
       from employee
       where edlevel > 17

   create view emp_salaries (empno, firstname, lastname, salary) as
     select empno, firstnme, lastname, salary
       from employee
       where salary > 35000
The following user-written query lists those employees who have a high level of education and who earn more than $35,000 per year:
   select e1.empno, e1.firstnme, e1.lastname, e1.edlevel, e2.salary
     from emp_education e1, emp_salaries  e2
     where e1.empno = e2.empno
During query rewrite, these two views could be merged to create the following query:
   select e1.empno, e1.firstnme, e1.lastname, e1.edlevel, e2.salary
     from employee e1, employee e2
     where
       e1.empno = e2.empno and
       e1.edlevel > 17 and
       e2.salary  > 35000

By merging the SELECT statements from the two views with the user-written SELECT statement, the optimizer can consider more choices when selecting an access plan. In addition, if the two views that have been merged use the same base table, additional rewriting might be performed.

Example - Subquery to join transformations

The SQL and XQuery compiler will take a query containing a subquery, such as:
   select empno, firstnme, lastname, phoneno
     from employee
     where workdept in
       (select deptno
         from department
         where deptname = 'OPERATIONS')
and convert it to a join query of the form:
   select distinct empno, firstnme, lastname, phoneno
     from employee emp, department dept
     where
       emp.workdept = dept.deptno and
       dept.deptname = 'OPERATIONS'

A join is generally much more efficient to execute than a subquery.

Example - Redundant join elimination

Queries sometimes have unnecessary joins.
   select e1.empno, e1.firstnme, e1.lastname, e1.edlevel, e2.salary
     from employee e1,
          employee e2
    where e1.empno = e2.empno
      and e1.edlevel > 17
      and e2.salary  > 35000
The SQL and XQuery compiler can eliminate the join and simplify the query to:
   select empno, firstnme, lastname, edlevel, salary
     from employee
     where
       edlevel > 17 and
       salary  > 35000
The following example assumes that a referential constraint exists between the EMPLOYEE and DEPARTMENT tables on the department number. First, a view is created.
   create view peplview as
     select firstnme, lastname, salary, deptno, deptname, mgrno
       from employee e department d
       where e.workdept = d.deptno
Then a query such as the following:
   select lastname, salary
     from peplview
becomes:
   select lastname, salary
     from employee
     where workdept not null

Note that in this situation, even if you know that the query can be rewritten, you might not be able to do so because you do not have access to the underlying tables. You might only have access to the view (shown previously). Therefore, this type of optimization has to be performed by the database manager.

Redundancy in referential integrity joins likely occurs when:
  • Views are defined with joins
  • Queries are automatically generated

Example - Shared aggregation

Using multiple functions within a query can generate several calculations that take time. Reducing the number of required calculations improves the plan. The compiler takes a query that uses multiple functions, such as the following:
   select sum(salary+bonus+comm) as osum,
       avg(salary+bonus+comm) as oavg,
       count(*) as ocount
     from employee
and transforms it:
   select osum, osum/ocount ocount
     from (
       select sum(salary+bonus+comm) as osum,
           count(*) as ocount
         from employee
     ) as shared_agg
This rewrite halves the required number of sums and counts.