Hierarchical queries

A hierarchical query is a form of recursive query that retrieves a hierarchy, such as a bill of materials, from relational data by using a CONNECT BY clause.

Enablement

You enable hierarchical query support by setting the DB2_COMPATIBILITY_VECTOR registry variable to hexadecimal value 0x08 (bit position 4), and then stop and restart the instance to have the new setting take effect.
db2set DB2_COMPATIBILITY_VECTOR=08
db2stop
db2start
To take full advantage of the Db2® compatibility features for Oracle applications, the recommended setting for the DB2_COMPATIBILITY_VECTOR is ORA, which sets all of the compatibility bits.

You can then use CONNECT BY syntax, including pseudocolumns, unary operators, and the SYS_CONNECT_BY_PATH scalar function.

A hierarchical query contains a CONNECT BY clause that defines the join conditions between parent and child elements. Connect-by recursion uses the same subquery for the seed (START WITH clause) and the recursive step (CONNECT BY clause). This combination provides a concise method of representing recursions such as bills-of-material, reports-to-chains, or email threads.

Connect-by recursion returns an error if a cycle occurs. A cycle occurs when a row produces itself, either directly or indirectly. By using the optional CONNECT BY NOCYCLE clause, you can direct the recursion to ignore the duplicated row, thus avoiding both the cycle and the error. Hierarchical queries or connect-by recursion differs fromDb2 recursion. For more information about the differences, see Port CONNECT BY to DB2®.

hierarchical-query-clause

A subselect that includes a hierarchical-query-clause is called a hierarchical query.

Read syntax diagramSkip visual syntax diagramstart-with-clause connect-by-clause
start-with-clause
Read syntax diagramSkip visual syntax diagramSTART WITHsearch-condition
connect-by-clause
Read syntax diagramSkip visual syntax diagramCONNECT BYNOCYCLE search-condition
start-with-clause
START WITH denotes the seed of the recursion. The start-with-clause specifies the intermediate result table H1 for the hierarchical query. Table H1 consists of those rows of R for which the search-condition is true. If you do not specify the start-with-clause, H1 is the entire intermediate result table R. The rules for the search-condition within the start-with-clause are the same as those within the WHERE clause.
connect-by-clause
CONNECT BY describes the recursive step. The connect-by-clause produces the intermediate result table Hn+1 from Hn by joining Hn with R, using the search condition. If you specify the NOCYCLE keyword, the repeated row is not included in the intermediate result table Hn+1. An error is not returned. The rules for the search-condition within the connect-by-clause are the same as those within the WHERE clause, except that OLAP specifications cannot be specified (SQLSTATE 42903).

After a first intermediate result table H1 is established, subsequent intermediate result tables H2, H3, and so forth are generated. The subsequently created intermediate result tables are generated by joining Hn with table R using the connect-by-clause as a join condition to produce Hn+1. R is the result of the FROM clause of the subselect and any join predicates in the WHERE clause. The process stops when Hn+1 yields an empty result table. The result table H of the hierarchical-query-clause is the result as if UNION ALL were applied for every intermediate result table..

You can use the unary operator PRIOR to distinguish column references to Hn, the previous recursive step or parent, from column references to R. Consider the following example:
   CONNECT BY MGRID = PRIOR EMPID
MGRID is resolved with R, and EMPID is resolved within the columns of the previous intermediate result table Hn.

Rules

  • If the intermediate result table Hn+1 would return a row from R for a hierarchical path that is the same as a row from R that is already in that hierarchical path, an error is returned (SQLSTATE 560CO).
  • If the NOCYCLE keyword is specified, an error is not returned, but the repeated row is not included in the intermediate result table Hn+1.
  • A maximum of 64 levels of recursion is supported (SQLSTATE 54066).
  • A subselect that is a hierarchical query returns the intermediate result set in a partial order, unless you destroy that order by using an explicit ORDER BY clause, a GROUP BY or HAVING clause, or a DISTINCT keyword in the select list. The partial order returns rows such that rows produced in Hn+1 for a particular hierarchy immediately follow the row in Hn that produced them. You can use the ORDER SIBLINGS BY clause to enforce order within a set of rows produced by the same parent.
  • A hierarchical query is not supported for a materialized query table (SQLSTATE 428EC).
  • You cannot use the CONNECT BY clause with XML functions or XQuery (SQLSTATE 428H4).
  • You cannot specify a NEXT VALUE expression for a sequence in the following places (SQLSTATE 428F9):
    • The parameter list of the CONNECT_BY_ROOT operator or a SYS_CONNECT_BY_PATH function
    • START WITH and CONNECT BY clauses

Notes

  • Hierarchical query support affects the subselect in the following ways:
    • The clauses of the subselect are processed in the following sequence:
      1. FROM clause
      2. hierarchical-query-clause
      3. WHERE clause
      4. GROUP BY clause
      5. HAVING clause
      6. SELECT clause
      7. ORDER BY clause
      8. FETCH FIRST clause
    • Special rules apply to the order of processing the predicates in the WHERE clause. The search-condition is factored into predicates along with its AND conditions (conjunctions). If a predicate is an implicit join predicate (that is, it references more than one table in the FROM clause), the predicate is applied before the hierarchical-query-clause is applied. Any predicate referencing at most one table in the FROM clause is applied to the intermediate result table of the hierarchical-query-clause.

      If you write a hierarchical query involving joins, use explicit joined tables with an ON clause to avoid confusion about the application of WHERE clause predicates.

    • You can specify the ORDER SIBLINGS BY clause. This clause specifies that the ordering applies only to siblings within the hierarchies.
  • A pseudocolumn is a qualified or unqualified identifier that has meaning in a specific context and shares the same namespace as columns and variables. If an unqualified identifier does not identify a column or a variable, the identifier is checked to see whether it identifies a pseudocolumn.

    LEVEL is a pseudocolumn for use in hierarchical queries. The LEVEL pseudocolumn returns the recursive step in the hierarchy at which a row was produced. All rows that are produced by the START WITH clause return the value 1. Rows that are produced by applying the first iteration of the CONNECT BY clause return 2, and so on. The data type of the column is INTEGER NOT NULL.

    You must specify LEVEL in the context of a hierarchical query. You cannot specify LEVEL in the START WITH clause, as an argument of the CONNECT_BY_ROOT operator, or as an argument of the SYS_CONNECT_BY_PATH function (SQLSTATE 428H4).

  • Unary operators that support hierarchical queries are CONNECT_BY_ROOT and PRIOR.
  • A functions that supports hierarchical queries is the SYS_CONNECT_BY_PATH scalar function.

Examples

  • The following reports-to-chain example illustrates connect-by recursion. The example is based on a table named MY_EMP, which is created and populated with data as follows:
    CREATE TABLE MY_EMP(
      EMPID  INTEGER NOT NULL PRIMARY KEY,
      NAME   VARCHAR(10),
      SALARY DECIMAL(9, 2),
      MGRID  INTEGER);
    
    INSERT INTO MY_EMP VALUES ( 1, 'Jones',    30000, 10);
    INSERT INTO MY_EMP VALUES ( 2, 'Hall',     35000, 10);
    INSERT INTO MY_EMP VALUES ( 3, 'Kim',      40000, 10);
    INSERT INTO MY_EMP VALUES ( 4, 'Lindsay',  38000, 10);
    INSERT INTO MY_EMP VALUES ( 5, 'McKeough', 42000, 11);
    INSERT INTO MY_EMP VALUES ( 6, 'Barnes',   41000, 11);
    INSERT INTO MY_EMP VALUES ( 7, 'O''Neil',  36000, 12);
    INSERT INTO MY_EMP VALUES ( 8, 'Smith',    34000, 12);
    INSERT INTO MY_EMP VALUES ( 9, 'Shoeman',  33000, 12);
    INSERT INTO MY_EMP VALUES (10, 'Monroe',   50000, 15);
    INSERT INTO MY_EMP VALUES (11, 'Zander',   52000, 16);
    INSERT INTO MY_EMP VALUES (12, 'Henry',    51000, 16);
    INSERT INTO MY_EMP VALUES (13, 'Aaron',    54000, 15);
    INSERT INTO MY_EMP VALUES (14, 'Scott',    53000, 16);
    INSERT INTO MY_EMP VALUES (15, 'Mills',    70000, 17);
    INSERT INTO MY_EMP VALUES (16, 'Goyal',    80000, 17);
    INSERT INTO MY_EMP VALUES (17, 'Urbassek', 95000, NULL);
    The following query returns all employees working for Goyal, as well as some additional information, such as the reports-to-chain:
      1 SELECT NAME,
      2        LEVEL,
      3        SALARY,
      4        CONNECT_BY_ROOT NAME AS ROOT,
      5        SUBSTR(SYS_CONNECT_BY_PATH(NAME, ':'), 1, 25) AS CHAIN
      6   FROM MY_EMP
      7   START WITH NAME = 'Goyal'
      8   CONNECT BY PRIOR EMPID = MGRID
      9   ORDER SIBLINGS BY SALARY;
      NAME       LEVEL       SALARY      ROOT  CHAIN
      ---------- ----------- ----------- ----- ---------------
      Goyal                1    80000.00 Goyal :Goyal
      Henry                2    51000.00 Goyal :Goyal:Henry
      Shoeman              3    33000.00 Goyal :Goyal:Henry:Shoeman
      Smith                3    34000.00 Goyal :Goyal:Henry:Smith
      O'Neil               3    36000.00 Goyal :Goyal:Henry:O'Neil
      Zander               2    52000.00 Goyal :Goyal:Zander
      Barnes               3    41000.00 Goyal :Goyal:Zander:Barnes
      McKeough             3    42000.00 Goyal :Goyal:Zander:McKeough
      Scott                2    53000.00 Goyal :Goyal:Scott

    Lines 7 and 8 comprise the core of the recursion: The optional START WITH clause describes the WHERE clause that is to be used on the source table to seed the recursion. In this case, only the row for employee Goyal is selected. If the START WITH clause is omitted, the entire source table is used to seed the recursion. The CONNECT BY clause describes how, given the existing rows, the next set of rows is to be found. The unary operator PRIOR is used to distinguish values in the previous step from those in the current step. PRIOR identifies EMPID as the employee ID of the previous recursive step, and MGRID as originating from the current recursive step.

    The LEVEL pseudocolumn in line 2 indicates the current level of recursion.

    CONNECT_BY_ROOT is a unary operator that always returns the value of its argument as it was during the first recursive step; that is, the values that are returned by an explicit or implicit START WITH clause.

    SYS_CONNECT_BY_PATH() is a binary function that prepends the second argument to the first and then appends the result to the value that it produced in the previous recursive step. The arguments must be character types.

    Unless explicitly overridden, connect-by recursion returns a result set in a partial order; that is, the rows that are produced by a recursive step always follow the row that produced them. Siblings at the same level of recursion have no specific order. The ORDER SIBLINGS BY clause in line 9 defines an order for these siblings, which further refines the partial order, potentially into a total order.

  • Return the organizational structure of the DEPARTMENT table. Use the level of the department to visualize the hierarchy.
       SELECT LEVEL, CAST(SPACE((LEVEL - 1) * 4) || '/' || DEPTNAME
           AS VARCHAR(40)) AS DEPTNAME
         FROM DEPARTMENT
         START WITH DEPTNO = 'A00'
         CONNECT BY NOCYCLE PRIOR DEPTNO = ADMRDEPT
    The query returns:
    LEVEL       DEPTNAME                                
    ----------- ----------------------------------------
              1 /SPIFFY COMPUTER SERVICE DIV.       
              2     /PLANNING                       
              2     /INFORMATION CENTER             
              2     /DEVELOPMENT CENTER             
              3         /MANUFACTURING SYSTEMS      
              3         /ADMINISTRATION SYSTEMS     
              2     /SUPPORT SERVICES               
              3         /OPERATIONS                 
              3         /SOFTWARE SUPPORT           
              3         /BRANCH OFFICE F2           
              3         /BRANCH OFFICE G2           
              3         /BRANCH OFFICE H2           
              3         /BRANCH OFFICE I2           
              3         /BRANCH OFFICE J2