Joins

A join is the process of combining data from two or more tables based on some common domain of information. Rows from one table are paired with rows from another table when information in the corresponding rows match on the basis of the joining criterion (the join predicate).

For example, consider the following two tables:
TABLE1 TABLE2
PROJ PROJ_ID PROJ_ID NAME
A 1 1 Sam
B 2 3 Joe
C 3 4 Mary
D 4 1 Sue
2 Mike
To join TABLE1 and TABLE2, such that the PROJ_ID columns have the same values, use the following SQL statement:
   select proj, x.proj_id, name
     from table1 x, table2 y
     where x.proj_id = y.proj_id
In this case, the appropriate join predicate is: where x.proj_id = y.proj_id.
The query yields the following result set:
PROJ PROJ_ID NAME
A 1 Sam
A 1 Sue
B 2 Mike
C 3 Joe
D 4 Mary
Depending on the nature of any join predicates, as well as any costs determined on the basis of table and index statistics, the optimizer chooses one of the following join methods:
  • Nested-loop join
  • Merge join
  • Hash join

When two tables are joined, one table is selected as the outer table and the other table is regarded as the inner table of the join. The outer table is accessed first and is scanned only once. Whether the inner table is scanned multiple times depends on the type of join and the indexes that are available. Even if a query joins more than two tables, the optimizer joins only two tables at a time. If necessary, temporary tables are created to hold intermediate results.

You can provide explicit join operators, such as INNER or LEFT OUTER JOIN, to determine how tables are used in the join. Before you alter a query in this way, however, you should allow the optimizer to determine how to join the tables, and then analyze query performance to decide whether to add join operators.