Using recursive queries

Some applications work with data that is recursive in nature. To query this type of data, you can use a hierarchical query or a recursive common table expression.

One example of recursive data is a Bill of Materials (BOM) application that works with the expansion of parts and its component subparts. For example, a chair might be made of a seat unit and a leg assembly. The seat unit might consist of a seat and two arms. Each of these parts can be further broken down into its subparts until there is a list of all the parts needed to build a chair.

DB2® for i provides two ways of defining a recursive query. The first one is called a hierarchical query which uses the CONNECT BY clause to define how a parent row is to be associated with its child rows. The second method is to use a recursive common table expression. This uses a common table expression to define the first, or seed, rows and then uses a UNION to define how the child rows are determined.

Each of these methods of defining a recursive query has advantages and disadvantages. The CONNECT BY syntax is much simpler to understand, but has fewer ways to derive the data in its query. CONNECT BY can be specified in any subselect anywhere in a query. A recursive common table expression has more options for how the union is defined to generate the child rows.

There are a couple of behavioral differences between a connect by recursive query and a recursive common table expression query. First, they differ in how they handle cyclic data. This difference is discussed in the examples. Second, connect by allows a sort among siblings. This is also shown in the examples. Finally, the two implementations differ in how the data is put on a queue that is used to implement the recursion. By default a recursive common table expression's data tends to come out in breadth first order, first in first out. With connect by, the order is designed to come out depth first. This means that rows in a recursive step immediately follow their parent row. The recursive common table expression syntax gives you a choice of depth or breadth first hierarchical order by adding the SEARCH clause. The connect by syntax is always depth first.

In the trip planner examples for these recursive methods, airline flights and train connections are used to find transportation paths between cities. The following table definitions and data are used in the examples.

CREATE TABLE FLIGHTS (DEPARTURE CHAR(20),
                      ARRIVAL CHAR(20), 
                      CARRIER CHAR(15),
                      FLIGHT_NUMBER CHAR(5), 
                      PRICE INT);


INSERT INTO FLIGHTS VALUES('New York', 'Paris', 'Atlantic', '234', 400);
INSERT INTO FLIGHTS VALUES('Chicago', 'Miami', 'NA Air', '2334', 300);
INSERT INTO FLIGHTS VALUES('New York', 'London', 'Atlantic', '5473', 350);
INSERT INTO FLIGHTS VALUES('London', 'Athens'  , 'Mediterranean', '247', 340);
INSERT INTO FLIGHTS VALUES('Athens', 'Nicosia' , 'Mediterranean', '2356', 280); 
INSERT INTO FLIGHTS VALUES('Paris', 'Madrid' , 'Euro Air',  '3256', 380);
INSERT INTO FLIGHTS VALUES('Paris', 'Cairo' , 'Euro Air', '63', 480);
INSERT INTO FLIGHTS VALUES('Chicago', 'Frankfurt', 'Atlantic', '37', 480);
INSERT INTO FLIGHTS VALUES('Frankfurt', 'Moscow', 'Asia Air', '2337', 580);
INSERT INTO FLIGHTS VALUES('Frankfurt', 'Beijing', 'Asia Air',  '77', 480); 
INSERT INTO FLIGHTS VALUES('Moscow', 'Tokyo', 'Asia Air', '437', 680);
INSERT INTO FLIGHTS VALUES('Frankfurt', 'Vienna', 'Euro Air', '59', 200);
INSERT INTO FLIGHTS VALUES('Paris', 'Rome', 'Euro Air', '534', 340);
INSERT INTO FLIGHTS VALUES('Miami', 'Lima', 'SA Air', '5234', 530);
INSERT INTO FLIGHTS VALUES('New York', 'Los Angeles', 'NA Air', '84', 330);
INSERT INTO FLIGHTS VALUES('Los Angeles', 'Tokyo', 'Pacific Air', '824', 530);
INSERT INTO FLIGHTS VALUES('Tokyo', 'Hawaii', 'Asia Air', '94', 330);
INSERT INTO FLIGHTS VALUES('Washington', 'Toronto', 'NA Air', '104', 250);

CREATE TABLE TRAINS(DEPARTURE CHAR(20), 
                    ARRIVAL CHAR(20),
                    RAILLINE CHAR(15), 
                    TRAIN CHAR(5), 
                    PRICE INT); 

INSERT INTO TRAINS VALUES('Chicago', 'Washington', 'UsTrack', '323', 90;
INSERT INTO TRAINS VALUES('Madrid', 'Barcelona', 'EuroTrack', '5234', 60);
INSERT INTO TRAINS VALUES('Washington' , 'Boston' , 'UsTrack', '232', 50);

CREATE TABLE FLIGHTSTATS(FLIGHT# CHAR(5), 
                         ON_TIME_PERCENT DECIMAL(5,2), 
                         CANCEL_PERCENT DECIMAL(5,2)); 

INSERT INTO FLIGHTSTATS VALUES('234', 85.0, 0.20);
INSERT INTO FLIGHTSTATS VALUES('2334',  92.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('5473', 86.2, 0.10);
INSERT INTO FLIGHTSTATS VALUES('247',  91.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('2356', 91.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('3256', 92.0 , 0.10);
INSERT INTO FLIGHTSTATS VALUES('63', 90.5 , 0.10);
INSERT INTO FLIGHTSTATS VALUES('37', 87.0 , 0.20);
INSERT INTO FLIGHTSTATS VALUES('2337', 80.0, 0.20);
INSERT INTO FLIGHTSTATS VALUES('77', 86.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('437', 81.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('59',  85.0, 01.0);
INSERT INTO FLIGHTSTATS VALUES('534', 87.0 , 01.0);
INSERT INTO FLIGHTSTATS VALUES('5234', 88.0, 0.20);
INSERT INTO FLIGHTSTATS VALUES('84', 88.0, 0.1);
INSERT INTO FLIGHTSTATS VALUES('824', 93.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('94', 92.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('104', 93.0, 0.10);

Using CONNECT BY hierarchical queries

Suppose you want to find out what cities you can fly to if you start in Chicago, and how many separate flights it will take to get there. The following query shows you that information.

SELECT CONNECT_BY_ROOT departure AS origin, departure, arrival, LEVEL AS flight_count
    FROM  flights
    START WITH departure = 'Chicago'
    CONNECT BY PRIOR arrival = departure

This query returns the following information.

Table 1. Results of the previous query
ORIGIN	DEPARTURE	ARRIVAL	FLIGHT_COUNT
Chicago	Chicago	Miami	1
Chicago	Miami	Lima	2
Chicago	Chicago	Frankfurt	1
Chicago	Frankfurt	Vienna	2
Chicago	Frankfurt	Beijing	2
Chicago	Frankfurt	Moscow	2
Chicago	Moscow	Tokyo	3
Chicago	Tokyo	Hawaii	4

There are several parts to this hierarchical query. There is an initial selection which defines the initial seed for the recursion. In this case, it is the rows from the flights table that START WITH a departure from ‘Chicago'. The CONNECT BY clause is used to define how the rows that have already been generated are to be 'connected' to generate more rows for subsequent iterations of the query. The PRIOR unary operator tells DB2 how to select a new row based on the results of the previous row. The recursive join column (typically one column but could have several) selected by the result of the START WITH clause is referenced by the PRIOR keyword. This means that the previous row's ARRIVAL city becomes the new row's PRIOR value for the DEPARTURE city. This is encapsulated in the clause CONNECT BY PRIOR arrival = departure.

There are two other connect by features illustrated in this example query. The unary operator CONNECT_BY_ROOT is used to define a fixed expression value that is determined in the initialization step and is the same for all the generated recursive result rows. Typically, it is your starting value for that particular iteration as you might have multiple START WITH values. In this query, it defines in the result set the ORIGIN of the different destination options from Chicago. If the START WITH clause selected multiple cities, ORIGIN would indicate which city a row used as its start value.

LEVEL is one of three pseudo columns available when using connect by recursion. The value of LEVEL reflects the recursion level of the current row. In this example, LEVEL also reflects the number of flights it would take to get from the city of ORIGIN (Chicago) to the different ARRIVAL cities.

A hierarchical query is run just like its equivalent recursive common table expression query and generates the same result set. See Using recursive common table expressions and recursive views. The only difference is in the order of the returned rows. The connect by query returns the rows in depth first order; every row of the result set immediately follows its parent row. The recursive common table expression query returns the rows in breadth first order; all the rows for one level are returned, then all the rows that were generated from the previous level are returned.

Example: Two tables used for recursion using CONNECT BY

Now, suppose you start in Chicago but want to add in transportation options by rail in addition to flights and you want to know which cities you can get to and how many connections it would take.

The following connect by query returns that information. Note that in the corresponding recursive common table expression example, Example: Two tables used for recursion using recursive common table expressions, we can also distinguish between the number of rail vs number of airline connections and sum the ongoing ticket cost of the connections to that destination. Those calculations are examples of derivations allowed using the more complex but more flexible recursive common table expression syntax. That capability is not available when using the connect by syntax.

SELECT CONNECT_BY_ROOT departure AS departure, arrival, LEVEL - 1 connections
  FROM
      ( SELECT departure, arrival FROM flights
           UNION
        SELECT departure, arrival FROM trains) t
  START WITH departure = 'Chicago'
  CONNECT BY PRIOR arrival = departure;

This query returns the following information.

Table 2. Results of the previous query
DEPARTURE	ARRIVAL	CONNECTIONS
Chicago	Miami	0
Chicago	Lima	1
Chicago	Frankfurt	0
Chicago	Vienna	1
Chicago	Beijing	1
Chicago	Moscow	1
Chicago	Tokyo	2
Chicago	Hawaii	3
Chicago	Washington	0
Chicago	Boston	1
Chicago	Toronto	1

In this example, there are two data sources feeding the recursion, a list of flights and a list of trains. In the final results, you see how many connections are needed to travel between the cities.

Example: Sibling ordering using CONNECT BY

One of the drawbacks of recursive common table expressions is that you cannot order the results among siblings based on a particular column value. You can do this with connect by. For example, if you want to output destinations from New York but you also want to order your hierarchical data among siblings by a certain value, such as the cost of a ticket to that destination, you can do that by specifying the ORDER SIBLINGS BY clause.

SELECT CONNECT_BY_ROOT departure AS origin, departure, arrival, 
      LEVEL level, price ticket_price
  FROM  flights
  START WITH departure = 'New York'
  CONNECT BY PRIOR arrival = departure
  ORDER SIBLINGS BY price ASC

This query returns the following information.

Table 3. Results of the previous query
ORIGIN	DEPARTURE	ARRIVAL	LEVEL	TICKET_PRICE
New York	New York	LA	1	330
New York	LA	Tokyo	2	530
New York	Tokyo	Hawaii	3	330
New York	New York	London	1	350
New York	London	Athens	2	340
New York	Athens	Nicosia	3	280
New York	New York	Paris	1	400
New York	Paris	Rome	2	340
New York	Paris	Madrid	2	380
New York	Paris	Cairo	2	480

The result table shows all the destinations possible from the origin city of New York. All sibling destinations (those destinations that originate from the same departure city) are output sorted by ticket price. For example, the destinations from Paris are Rome, Madrid and Cairo; they are output ordered by ascending ticket price. Note that the output shows New York to LA as the first destination directly from New York because it has a less expensive ticket price (330) than did the direct connects to London or Paris which are 350 and 400 respectively.

Example: Cyclic data checks using CONNECT BY

The key to any recursive process, whether is it a recursive program or a recursive query, is that the recursion must be finite. If not, you will get into a never ending loop. CONNECT BY is unlike recursive common table expressions in that it always checks for infinite recursion and terminates that cycle automatically so you never have to worry about a runaway query.

By default, if connect by encounters cyclic data, it will issue an SQL error, SQ20451: Cycle detected in hierarchical query. This error causes termination of the query so no results are returned.

If you want results back and just want the infinite cycle to stop, you can specify the NOCYCLE keyword on the CONNECT BY clause. This means no error will be issued for cyclic data.

Using the NOCYCLE option along with the CONNECT_BY_ISCYCLE pseudo column is a way you can find cyclic data and correct the data if desired.

Inserting the following row into the FLIGHTS table results in potentially infinite recursion since Paris goes to Cairo and Cairo goes to Paris.

INSERT INTO flights VALUES ('Cairo', 'Paris', 'Atlantic', '1134', 440);

The following query illustrates the tolerance of the cyclic data by specifying NOCYCLE. In addition, the CONNECT_BY_ISCYCLE pseudo column is used to identify cyclic rows and the function SYS_CONNECT_BY_PATH is used to build an Itinerary string of all the connection cities leading up to the destination. SYS_CONNECT_BY_PATH is implemented as a CLOB data type so you have a large result column to reflect deep recursions.

SELECT CONNECT_BY_ROOT departure AS origin, arrival,
     SYS_CONNECT_BY_PATH(TRIM(arrival), ' : ') itinerary, CONNECT_BY_ISCYCLE cyclic
  FROM flights	
  START WITH departure = 'New York'
  CONNECT BY NOCYCLE PRIOR arrival = departure;

This query returns the following information.

Table 4. Results of the previous query
ORIGIN	ARRIVAL	ITINERARY	CYCLIC
New York	Paris	: Paris	0
New York	Rome	: Paris : Rome	0
New York	Cairo	: Paris : Cairo	0
New York	Paris	: Paris : Cairo : Paris	1
New York	Madrid	: Paris : Madrid	0
New York	London	: London	0
New York	Athens	: London : Athens	0
New York	Nicosia	: London : Athens : Nicosia	0
New York	LA	: LA	0
New York	Tokyo	: LA : Tokyo	0
New York	Hawaii	: LA : Tokyo : Hawaii	0

Note that the result set row that reflects cyclic data often depends on where you start in the cycle with the START WITH clause.

Example: Pseudo column CONNECT_BY_ISLEAF in CONNECT BY

There may be times when processing recursive data that you may want to know which rows result in no further recursion. In other words, which rows are leaf rows or have no children in the hierarchy.

In the following query, you can find out which destinations are final destinations; in other words, which destinations have no outbound flights. The CONNECT_BY_ISLEAF pseudo column will be 0 if it is not a leaf and 1 if it is. You can also specify CONNECT_BY_ISLEAF in a WHERE predicate to see only leaf rows.

SELECT CONNECT_BY_ROOT departure AS origin, arrival,
     SYS_CONNECT_BY_PATH(TRIM(arrival), ' : ') itinerary, CONNECT_BY_ISLEAF leaf
  FROM flights	
  START WITH departure = 'New York'
  CONNECT BY PRIOR arrival = departure;

This query returns the following information.

Table 5. Results of the previous query
ORIGIN	ARRIVAL	ITINERARY	LEAF
New York	Paris	: Paris	0
New York	Rome	: Paris : Rome	1
New York	Cairo	: Paris : Cairo	1
New York	Madrid	: Paris : Madrid	1
New York	London	: London	0
New York	Athens	: London : Athens	0
New York	Nicosia	: London : Athens : Nicosia	1
New York	LA	: LA	0
New York	Tokyo	: LA : Tokyo	0
New York	Hawaii	: LA : Tokyo : Hawaii	1

Example: Join predicates and where clause selection with CONNECT BY

Often times the hierarchical nature of your data is reflected in one table but you need to join those results to additional tables to fully determine the output of the row.

In a connect by query you can use any type of join supported by DB2 for i including INNER JOIN, LEFT OUTER JOIN, and LEFT EXCEPTION JOIN. When you explicitly use a JOIN clause, the predicate specified in the ON clause is applied first, before the connect by operation, and any WHERE clause in the connect by query is applied after the recursion. The WHERE selection is applied after the connect by so that the recursive process results don't end too soon.

In the following query, you are looking for all the flight connections starting in New York that have an ON_TIME_PERCENT greater than 90%.

SELECT CONNECT_BY_ROOT departure AS origin, departure, arrival,
      flight_number, on_time_Percent AS onTime
  FROM flights INNER JOIN flightstats ON flight_number = flight#	
  WHERE on_time_percent > 90
  START WITH departure = 'New York'
  CONNECT BY PRIOR arrival = departure;

This query returns the following information.

Table 6. Results of the previous query
ORIGIN	DEPARTURE	ARRIVAL	FLIGHT#	ONTIME
New York	Paris	Cairo	63	90.50
New York	Paris	Madrid	3256	92.00
New York	London	Athens	247	91.00
New York	Athens	Nicosia	2356	91.00
New York	LA	Tokyo	824	93.00
New York	Tokyo	Hawaii	94	92.00

This query can also be expressed without using the JOIN syntax. The query optimizer will pull out of the WHERE clause those predicates that are join predicates to be processed first and leave any remaining WHERE predicates to be evaluated after the recursion.

SELECT CONNECT_BY_ROOT departure AS origin, departure, arrival, flight_number, on_time_percent AS onTime
  FROM flights, flightstats  
  WHERE  flight_number = flight# AND on_time_percent > 90
  START WITH departure = 'New York'
  CONNECT BY PRIOR arrival = departure;

In this second example, if the WHERE predicates are more complex, you may need to aid the optimizer by explicitly pulling out the JOIN predicates between the flights and flightstats tables and using both an ON clause and a WHERE clause.

If you want additional search conditions to be applied as part of the recursion process, for example you never want to take a flight with an on time percentage of less than 90%, you can also control the join results by putting the join in a derived table with a join predicate and a WHERE clause.

SELECT CONNECT_BY_ROOT departure AS origin, departure, arrival, flight_number, on_time_percent AS onTime
  FROM (SELECT departure, arrival, flight_number, on_time_percent
             FROM flights, flightstats  
             WHERE flight_number = flight# AND on_time_percent > 90) t1
  START WITH departure='New York'
  CONNECT BY PRIOR arrival = departure;

Another option is to put the selection predicates in the START WITH and CONNECT BY clauses.

SELECT CONNECT_BY_ROOT departure AS origin, departure, arrival, flight_number, on_time_percent AS onTime
  FROM flights, flightstats  
  WHERE flight_number = flight# 
  START WITH departure = 'New York' AND on_time_percent > 90
  CONNECT BY PRIOR arrival = departure AND on_time_percent > 90

In this case, you would be out of luck as there are no direct flights out of New York with a greater than 90% on time statistic. Since there is nothing to seed the recursion, no rows are returned from the query.

Using recursive common table expressions and recursive views

Suppose you want to find out what cities you can fly to if you start in Chicago, and how many separate flights it will take to get there. The following query shows you that information.

WITH destinations (origin, departure, arrival, flight_count) AS    
    (SELECT a.departure, a.departure, a.arrival, 1 
            FROM flights a
            WHERE a.departure = 'Chicago'
     UNION ALL
     SELECT r.origin, b.departure, b.arrival, r.flight_count + 1 
            FROM destinations r, flights b
            WHERE r.arrival = b.departure)
SELECT origin, departure, arrival, flight_count 
    FROM destinations

This query returns the following information.

Table 7. Results of the previous query
ORIGIN	DEPARTURE	ARRIVAL	FLIGHT_COUNT
Chicago	Chicago	Miami	1
Chicago	Chicago	Frankfurt	1
Chicago	Miami	Lima	2
Chicago	Frankfurt	Moscow	2
Chicago	Frankfurt	Beijing	2
Chicago	Frankfurt	Vienna	2
Chicago	Moscow	Tokyo	3
Chicago	Tokyo	Hawaii	4

This recursive query is written in two parts. The first part of the common table expression is called the intialization fullselect. It selects the first rows for the result set of the common table expression. In this example, it selects the two rows in the flights table that get you directly to another location from Chicago. It also initializes the number of flight legs to one for each row it selects.

The second part of the recursive query joins the rows from the current result set of the common table expression with other rows from the original table. It is called the iterative fullselect. This is where the recursion is introduced. Notice that the rows that have already been selected for the result set are referenced by using the name of the common table expression as the table name and the common table expression result column names as the column names.

In this recursive part of the query, any rows from the original table that you can get to from each of the previously selected arrival cities are selected. A previously selected row's arrival city becomes the new departure city. Each row from this recursive select increments the flight count to the destination by one more flight. As these new rows are added to the common table expression result set, they are also fed into the iterative fullselect to generate more result set rows. In the data for the final result, you can see that the total number of flights is actually the total number of recursive joins (plus 1) it took to get to that arrival city.

A recursive view looks very similar to a recursive common table expression. You can write the previous recursive common table expression as a recursive view like this:

CREATE VIEW destinations (origin, departure, arrival, flight_count) AS    
     SELECT departure, departure, arrival, 1 
            FROM flights 
            WHERE departure = 'Chicago'
     UNION ALL
     SELECT r.origin, b.departure, b.arrival, r.flight_count + 1 
            FROM destinations r, flights b
            WHERE r.arrival = b.departure)

The iterative fullselect part of this view definition refers to the view itself. Selection from this view returns the same rows as you get from the previous recursive common table expression. For comparison, note that connect by recursion is allowed anywhere a SELECT is allowed, so it can easily be included in a view definition.

Example: Two starting cities using recursive common table expressions

Suppose you are willing to fly from either Chicago or New York, and you want to know where you could go and how much it would cost.

WITH destinations (departure, arrival, connections, cost) AS
    (SELECT a.departure, a.arrival, 0, price
            FROM flights a 
            WHERE a.departure = 'Chicago' OR
                  a.departure = 'New York'
     UNION ALL 
     SELECT r.departure, b.arrival, r.connections + 1,
                  r.cost + b.price 
            FROM destinations r, flights b 
            WHERE r.arrival = b.departure) 
SELECT departure, arrival, connections, cost 
       FROM destinations

This query returns the following information.

Table 8. Results of the previous query
DEPARTURE	ARRIVAL	CONNECTIONS	COST
Chicago	Miami	0	300
Chicago	Frankfurt	0	480
New York	Paris	0	400
New York	London	0	350
New York	Los Angeles	0	330
Chicago	Lima	1	830
Chicago	Moscow	1	1,060
Chicago	Beijing	1	960
Chicago	Vienna	1	680
New York	Madrid	1	780
New York	Cairo	1	880
New York	Rome	1	740
New York	Athens	1	690
New York	Tokyo	1	860
Chicago	Tokyo	2	1,740
New York	Nicosia	2	970
New York	Hawaii	2	1,190
Chicago	Hawaii	3	2,070

For each returned row, the results show the starting departure city and the final destination city. It counts the number of connections needed rather than the total number of flight and adds up the total cost for all the flights.

Example: Two tables used for recursion using recursive common table expressions

Now, suppose you start in Chicago but add in transportation by railway in addition to the airline flights, and you want to know which cities you can go to.

The following query returns that information:

WITH destinations (departure, arrival, connections, flights, trains, cost) AS
  (SELECT f.departure, f.arrival, 0, 1, 0, price 
          FROM  flights f 
          WHERE f.departure = 'Chicago' 
   UNION ALL
   SELECT t.departure, t.arrival, 0, 0, 1, price 
          FROM trains t 
          WHERE t.departure = 'Chicago'
   UNION ALL
   SELECT r.departure, b.arrival, r.connections + 1 , r.flights + 1, r.trains, 
             r.cost + b.price
          FROM  destinations r, flights b  
          WHERE r.arrival = b.departure
   UNION ALL
   SELECT r.departure, c.arrival, r.connections + 1 ,
             r.flights, r.trains + 1, r.cost + c.price  
          FROM destinations r, trains c 
          WHERE r.arrival = c.departure) 
SELECT departure, arrival, connections, flights, trains, cost 
       FROM destinations

This query returns the following information.

Table 9. Results of the previous query
DEPARTURE	ARRIVAL	CONNECTIONS	FLIGHTS	TRAINS	COST
Chicago	Miami	0	1	0	300
Chicago	Frankfurt	0	1	0	480
Chicago	Washington	0	0	1	90
Chicago	Lima	1	2	0	830
Chicago	Moscow	1	2	0	1,060
Chicago	Beijing	1	2	0	960
Chicago	Vienna	1	2	0	680
Chicago	Toronto	1	1	1	340
Chicago	Boston	1	0	2	140
Chicago	Tokyo	2	3	0	1,740
Chicago	Hawaii	3	4	0	2,070

In this example, there are two parts of the common table expression that provide initialization values to the query: one for flights and one for trains. For each of the result rows, there are two recursive references to get from the previous arrival location to the next possible destination: one for continuing by air, the other for continuing by train. In the final results, you would see how many connections are needed and how many airline or train trips can be taken.

Example: DEPTH FIRST and BREADTH FIRST options for recursive common table expressions

The two examples here show the difference in the result set row order based on whether the recursion is processed depth first or breadth first.

Note: The search clause is not supported directly for recursive views. You can define a view that contains a recursive common table expression to get this function.

The option to determine the result using breadth first or depth first is a recursive relationship sort based on the recursive join column specified for the SEARCH BY clause. When the recursion is handled breadth first, all children are processed first, then all grandchildren, then all great grandchildren. When the recursion is handled depth first, the full recursive ancestry chain of one child is processed before going to the next child.

In both of these cases, you specify an extra column name that is used by the recursive process to keep track of the depth first or breadth first ordering. This column must be used in the ORDER BY clause of the outer query to get the rows back in the specified order. If this column is not used in the ORDER BY, the DEPTH FIRST or BREADTH FIRST processing option is ignored.

The selection of which column to use for the SEARCH BY column is important. To have any meaning in the result, it must be the column that is used in the iterative fullselect to join from the initialization fullselect. In this example, ARRIVAL is the column to use.

The following query returns that information:

WITH destinations (departure, arrival, connections, cost) AS
    (SELECT f.departure, f.arrival, 0, price       
            FROM flights f 
            WHERE f.departure = 'Chicago'
     UNION ALL 
     SELECT r.departure, b.arrival, r.connections + 1,
                r.cost + b.price 
            FROM destinations r, flights b 
            WHERE r.arrival = b.departure) 
    SEARCH DEPTH FIRST BY arrival SET ordcol 
SELECT * 
   FROM destinations 
   ORDER BY ordcol

This query returns the following information.

Table 10. Results of the previous query
DEPARTURE	ARRIVAL	CONNECTIONS	COST
Chicago	Miami	0	300
Chicago	Lima	1	830
Chicago	Frankfurt	0	480
Chicago	Moscow	1	1,060
Chicago	Tokyo	2	1,740
Chicago	Hawaii	3	2,070
Chicago	Beijing	1	960
Chicago	Vienna	1	680

In this result data, you can see that all destinations that are generated from the Chicago-to-Miami row are listed before the destinations from the Chicago-to-Frankfort row.

Next, you can run the same query but request the result to be ordered breadth first.

WITH destinations (departure, arrival, connections, cost) AS 
    (SELECT f.departure, f.arrival, 0, price
            FROM flights f 
            WHERE f.departure = 'Chicago' 
     UNION ALL 
     SELECT r.departure, b.arrival, r.connections + 1,
                r.cost + b.price 
            FROM destinations r, flights b 
            WHERE r.arrival = b.departure) 
    SEARCH BREADTH FIRST BY arrival SET ordcol 
SELECT * 
  FROM destinations 
  ORDER BY ordcol

This query returns the following information.

Table 11. Results of the previous query
DEPARTURE	ARRIVAL	CONNECTIONS	COST
Chicago	Miami	0	300
Chicago	Frankfurt	0	480
Chicago	Lima	1	830
Chicago	Moscow	1	1,060
Chicago	Beijing	1	960
Chicago	Vienna	1	680
Chicago	Tokyo	2	1,740
Chicago	Hawaii	3	2,070

In this result data, you can see that all the direct connections from Chicago are listed before the connecting flights. The data is identical to the results from the previous query, but in a breadth first order. As you can see, there is no ordering done based on any values of the column used for depth or breadth first processing. To get ordering, the ORDER SIBLINGS BY construct available with the CONNECT BY form of recursion can be used.

Example: Cyclic data using recursive common table expressions

The key to any recursive process, whether it is a recursive programming algorithm or querying recursive data, is that the recursion must be finite. If not, you will get into a never ending loop. The CYCLE option allows you to safeguard against cyclic data. Not only will it terminate repeating cycles but it also allows you to optionally output a cycle mark indicator that may lead you to find cyclic data.

Note: The cycle clause is not supported directly for recursive views. You can define a view that contains a recursive common table expression to get this function.

For a final example, suppose we have a cycle in the data. By adding one more row to the table, there is now a flight from Cairo to Paris and one from Paris to Cairo. Without accounting for possible cyclic data like this, it is quite easy to generate a query that will go into an infinite loop processing the data.

The following query returns that information:

INSERT INTO FLIGHTS VALUES('Cairo', 'Paris', 'Euro Air', '1134', 440)


WITH destinations (departure, arrival, connections, cost, itinerary) AS
    (SELECT f.departure, f.arrival, 1, price, 
                CAST(f.departure CONCAT f.arrival AS VARCHAR(2000))
            FROM flights f  
            WHERE f.departure = 'New York'
     UNION ALL 
     SELECT r.departure, b.arrival, r.connections + 1 ,
                r.cost + b.price, CAST(r.itinerary CONCAT b.arrival AS VARCHAR(2000))
            FROM destinations r, flights b 
            WHERE r.arrival = b.departure) 
    CYCLE arrival SET cyclic_data TO '1' DEFAULT '0' 
SELECT departure, arrival, itinerary, cyclic_data  
      FROM destinations

This query returns the following information.

Table 12. Results of the previous query
DEPARTURE	ARRIVAL	ITINERARY	CYCLIC_DATA
New York	Paris	New York Paris	0
New York	London	New York London	0
New York	Los Angeles	New York Los Angeles	0
New York	Madrid	New York Paris Madrid	0
New York	Cairo	New York Paris Cairo	0
New York	Rome	New York Paris Rome	0
New York	Athens	New York London Athens	0
New York	Tokyo	New York Los Angeles Tokyo	0
New York	Paris	New York Paris Cairo Paris	1
New York	Nicosia	New York London Athens Nicosia	0
New York	Hawaii	New York Los Angeles Tokyo Hawaii	0

In this example, the ARRIVAL column is defined in the CYCLE clause as the column to use for detecting a cycle in the data. When a cycle is found, a special column, CYCLIC_DATA in this case, is set to the character value of '1' for the cycling row in the result set. All other rows will contain the default value of '0'. When a cycle on the ARRIVAL column is found, processing will not proceed any further in the data so the infinite loop will not happen. To see if your data actually has a cyclic reference, the CYCLIC_DATA column can be referenced in the outer query. You can choose to exclude cyclic rows by adding a predicate: WHERE CYCLIC_DATA = 0.