Using recursive queries
Some applications work with data that is recursive in nature. To query this type of data, you can use a hierarchical query or a recursive common table expression.
One example of recursive data is a Bill of Materials (BOM) application that works with the expansion of parts and its component subparts. For example, a chair might be made of a seat unit and a leg assembly. The seat unit might consist of a seat and two arms. Each of these parts can be further broken down into its subparts until there is a list of all the parts needed to build a chair.
Each of these methods of defining a recursive query has advantages and disadvantages. The CONNECT BY syntax is much simpler to understand, but has fewer ways to derive the data in its query. CONNECT BY can be specified in any subselect anywhere in a query. A recursive common table expression has more options for how the union is defined to generate the child rows.
There are a couple of behavioral differences between a connect by recursive query and a recursive common table expression query. First, they differ in how they handle cyclic data. This difference is discussed in the examples. Second, connect by allows a sort among siblings. This is also shown in the examples. Finally, the two implementations differ in how the data is put on a queue that is used to implement the recursion. By default a recursive common table expression's data tends to come out in breadth first order, first in first out. With connect by, the order is designed to come out depth first. This means that rows in a recursive step immediately follow their parent row. The recursive common table expression syntax gives you a choice of depth or breadth first hierarchical order by adding the SEARCH clause. The connect by syntax is always depth first.
CREATE TABLE FLIGHTS (DEPARTURE CHAR(20),
ARRIVAL CHAR(20),
CARRIER CHAR(15),
FLIGHT_NUMBER CHAR(5),
PRICE INT);
INSERT INTO FLIGHTS VALUES('New York', 'Paris', 'Atlantic', '234', 400);
INSERT INTO FLIGHTS VALUES('Chicago', 'Miami', 'NA Air', '2334', 300);
INSERT INTO FLIGHTS VALUES('New York', 'London', 'Atlantic', '5473', 350);
INSERT INTO FLIGHTS VALUES('London', 'Athens' , 'Mediterranean', '247', 340);
INSERT INTO FLIGHTS VALUES('Athens', 'Nicosia' , 'Mediterranean', '2356', 280);
INSERT INTO FLIGHTS VALUES('Paris', 'Madrid' , 'Euro Air', '3256', 380);
INSERT INTO FLIGHTS VALUES('Paris', 'Cairo' , 'Euro Air', '63', 480);
INSERT INTO FLIGHTS VALUES('Chicago', 'Frankfurt', 'Atlantic', '37', 480);
INSERT INTO FLIGHTS VALUES('Frankfurt', 'Moscow', 'Asia Air', '2337', 580);
INSERT INTO FLIGHTS VALUES('Frankfurt', 'Beijing', 'Asia Air', '77', 480);
INSERT INTO FLIGHTS VALUES('Moscow', 'Tokyo', 'Asia Air', '437', 680);
INSERT INTO FLIGHTS VALUES('Frankfurt', 'Vienna', 'Euro Air', '59', 200);
INSERT INTO FLIGHTS VALUES('Paris', 'Rome', 'Euro Air', '534', 340);
INSERT INTO FLIGHTS VALUES('Miami', 'Lima', 'SA Air', '5234', 530);
INSERT INTO FLIGHTS VALUES('New York', 'Los Angeles', 'NA Air', '84', 330);
INSERT INTO FLIGHTS VALUES('Los Angeles', 'Tokyo', 'Pacific Air', '824', 530);
INSERT INTO FLIGHTS VALUES('Tokyo', 'Hawaii', 'Asia Air', '94', 330);
INSERT INTO FLIGHTS VALUES('Washington', 'Toronto', 'NA Air', '104', 250);
CREATE TABLE TRAINS(DEPARTURE CHAR(20),
ARRIVAL CHAR(20),
RAILLINE CHAR(15),
TRAIN CHAR(5),
PRICE INT);
INSERT INTO TRAINS VALUES('Chicago', 'Washington', 'UsTrack', '323', 90;
INSERT INTO TRAINS VALUES('Madrid', 'Barcelona', 'EuroTrack', '5234', 60);
INSERT INTO TRAINS VALUES('Washington' , 'Boston' , 'UsTrack', '232', 50);
CREATE TABLE FLIGHTSTATS(FLIGHT# CHAR(5),
ON_TIME_PERCENT DECIMAL(5,2),
CANCEL_PERCENT DECIMAL(5,2));
INSERT INTO FLIGHTSTATS VALUES('234', 85.0, 0.20);
INSERT INTO FLIGHTSTATS VALUES('2334', 92.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('5473', 86.2, 0.10);
INSERT INTO FLIGHTSTATS VALUES('247', 91.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('2356', 91.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('3256', 92.0 , 0.10);
INSERT INTO FLIGHTSTATS VALUES('63', 90.5 , 0.10);
INSERT INTO FLIGHTSTATS VALUES('37', 87.0 , 0.20);
INSERT INTO FLIGHTSTATS VALUES('2337', 80.0, 0.20);
INSERT INTO FLIGHTSTATS VALUES('77', 86.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('437', 81.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('59', 85.0, 01.0);
INSERT INTO FLIGHTSTATS VALUES('534', 87.0 , 01.0);
INSERT INTO FLIGHTSTATS VALUES('5234', 88.0, 0.20);
INSERT INTO FLIGHTSTATS VALUES('84', 88.0, 0.1);
INSERT INTO FLIGHTSTATS VALUES('824', 93.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('94', 92.0, 0.10);
INSERT INTO FLIGHTSTATS VALUES('104', 93.0, 0.10);
Using CONNECT BY hierarchical queries
SELECT CONNECT_BY_ROOT departure AS origin, departure, arrival, LEVEL AS flight_count
FROM flights
START WITH departure = 'Chicago'
CONNECT BY PRIOR arrival = departure
This query returns the following information.
ORIGIN | DEPARTURE | ARRIVAL | FLIGHT_COUNT |
---|---|---|---|
Chicago | Chicago | Miami | 1 |
Chicago | Miami | Lima | 2 |
Chicago | Chicago | Frankfurt | 1 |
Chicago | Frankfurt | Vienna | 2 |
Chicago | Frankfurt | Beijing | 2 |
Chicago | Frankfurt | Moscow | 2 |
Chicago | Moscow | Tokyo | 3 |
Chicago | Tokyo | Hawaii | 4 |
There are several parts to this hierarchical query. There is an initial selection which defines the initial seed for the recursion. In this case, it is the rows from the flights table that START WITH a departure from ‘Chicago'. The CONNECT BY clause is used to define how the rows that have already been generated are to be 'connected' to generate more rows for subsequent iterations of the query. The PRIOR unary operator tells DB2 how to select a new row based on the results of the previous row. The recursive join column (typically one column but could have several) selected by the result of the START WITH clause is referenced by the PRIOR keyword. This means that the previous row's ARRIVAL city becomes the new row's PRIOR value for the DEPARTURE city. This is encapsulated in the clause CONNECT BY PRIOR arrival = departure.
There are two other connect by features illustrated in this example query. The unary operator CONNECT_BY_ROOT is used to define a fixed expression value that is determined in the initialization step and is the same for all the generated recursive result rows. Typically, it is your starting value for that particular iteration as you might have multiple START WITH values. In this query, it defines in the result set the ORIGIN of the different destination options from Chicago. If the START WITH clause selected multiple cities, ORIGIN would indicate which city a row used as its start value.
LEVEL is one of three pseudo columns available when using connect by recursion. The value of LEVEL reflects the recursion level of the current row. In this example, LEVEL also reflects the number of flights it would take to get from the city of ORIGIN (Chicago) to the different ARRIVAL cities.
A hierarchical query is run just like its equivalent recursive common table expression query and generates the same result set. See Using recursive common table expressions and recursive views. The only difference is in the order of the returned rows. The connect by query returns the rows in depth first order; every row of the result set immediately follows its parent row. The recursive common table expression query returns the rows in breadth first order; all the rows for one level are returned, then all the rows that were generated from the previous level are returned.
Example: Two tables used for recursion using CONNECT BY
Now, suppose you start in Chicago but want to add in transportation options by rail in addition to flights and you want to know which cities you can get to and how many connections it would take.
The following connect by query returns that information. Note that in the corresponding recursive common table expression example, Example: Two tables used for recursion using recursive common table expressions, we can also distinguish between the number of rail vs number of airline connections and sum the ongoing ticket cost of the connections to that destination. Those calculations are examples of derivations allowed using the more complex but more flexible recursive common table expression syntax. That capability is not available when using the connect by syntax.
SELECT CONNECT_BY_ROOT departure AS departure, arrival, LEVEL - 1 connections
FROM
( SELECT departure, arrival FROM flights
UNION
SELECT departure, arrival FROM trains) t
START WITH departure = 'Chicago'
CONNECT BY PRIOR arrival = departure;
This query returns the following information.
DEPARTURE | ARRIVAL | CONNECTIONS |
---|---|---|
Chicago | Miami | 0 |
Chicago | Lima | 1 |
Chicago | Frankfurt | 0 |
Chicago | Vienna | 1 |
Chicago | Beijing | 1 |
Chicago | Moscow | 1 |
Chicago | Tokyo | 2 |
Chicago | Hawaii | 3 |
Chicago | Washington | 0 |
Chicago | Boston | 1 |
Chicago | Toronto | 1 |
Example: Sibling ordering using CONNECT BY
One of the drawbacks of recursive common table expressions is that you cannot order the results among siblings based on a particular column value. You can do this with connect by. For example, if you want to output destinations from New York but you also want to order your hierarchical data among siblings by a certain value, such as the cost of a ticket to that destination, you can do that by specifying the ORDER SIBLINGS BY clause.SELECT CONNECT_BY_ROOT departure AS origin, departure, arrival,
LEVEL level, price ticket_price
FROM flights
START WITH departure = 'New York'
CONNECT BY PRIOR arrival = departure
ORDER SIBLINGS BY price ASC
This query returns the following information.
ORIGIN | DEPARTURE | ARRIVAL | LEVEL | TICKET_PRICE |
---|---|---|---|---|
New York | New York | LA | 1 | 330 |
New York | LA | Tokyo | 2 | 530 |
New York | Tokyo | Hawaii | 3 | 330 |
New York | New York | London | 1 | 350 |
New York | London | Athens | 2 | 340 |
New York | Athens | Nicosia | 3 | 280 |
New York | New York | Paris | 1 | 400 |
New York | Paris | Rome | 2 | 340 |
New York | Paris | Madrid | 2 | 380 |
New York | Paris | Cairo | 2 | 480 |
Example: Cyclic data checks using CONNECT BY
The key to any recursive process, whether is it a recursive program or a recursive query, is that the recursion must be finite. If not, you will get into a never ending loop. CONNECT BY is unlike recursive common table expressions in that it always checks for infinite recursion and terminates that cycle automatically so you never have to worry about a runaway query.By default, if connect by encounters cyclic data, it will issue an SQL error, SQ20451: Cycle detected in hierarchical query. This error causes termination of the query so no results are returned.
If you want results back and just want the infinite cycle to stop, you can specify the NOCYCLE keyword on the CONNECT BY clause. This means no error will be issued for cyclic data.
Using the NOCYCLE option along with the CONNECT_BY_ISCYCLE pseudo column is a way you can find cyclic data and correct the data if desired.
INSERT INTO flights VALUES ('Cairo', 'Paris', 'Atlantic', '1134', 440);
The following query illustrates the tolerance of the cyclic data by specifying NOCYCLE. In addition, the CONNECT_BY_ISCYCLE pseudo column is used to identify cyclic rows and the function SYS_CONNECT_BY_PATH is used to build an Itinerary string of all the connection cities leading up to the destination. SYS_CONNECT_BY_PATH is implemented as a CLOB data type so you have a large result column to reflect deep recursions.
SELECT CONNECT_BY_ROOT departure AS origin, arrival,
SYS_CONNECT_BY_PATH(TRIM(arrival), ' : ') itinerary, CONNECT_BY_ISCYCLE cyclic
FROM flights
START WITH departure = 'New York'
CONNECT BY NOCYCLE PRIOR arrival = departure;
This query returns the following information.
ORIGIN | ARRIVAL | ITINERARY | CYCLIC |
---|---|---|---|
New York | Paris | : Paris | 0 |
New York | Rome | : Paris : Rome | 0 |
New York | Cairo | : Paris : Cairo | 0 |
New York | Paris | : Paris : Cairo : Paris | 1 |
New York | Madrid | : Paris : Madrid | 0 |
New York | London | : London | 0 |
New York | Athens | : London : Athens | 0 |
New York | Nicosia | : London : Athens : Nicosia | 0 |
New York | LA | : LA | 0 |
New York | Tokyo | : LA : Tokyo | 0 |
New York | Hawaii | : LA : Tokyo : Hawaii | 0 |
Example: Pseudo column CONNECT_BY_ISLEAF in CONNECT BY
There may be times when processing recursive data that you may want to know which rows result in no further recursion. In other words, which rows are leaf rows or have no children in the hierarchy.In the following query, you can find out which destinations are final destinations; in other words, which destinations have no outbound flights. The CONNECT_BY_ISLEAF pseudo column will be 0 if it is not a leaf and 1 if it is. You can also specify CONNECT_BY_ISLEAF in a WHERE predicate to see only leaf rows.
SELECT CONNECT_BY_ROOT departure AS origin, arrival,
SYS_CONNECT_BY_PATH(TRIM(arrival), ' : ') itinerary, CONNECT_BY_ISLEAF leaf
FROM flights
START WITH departure = 'New York'
CONNECT BY PRIOR arrival = departure;
This query returns the following information.
ORIGIN | ARRIVAL | ITINERARY | LEAF |
---|---|---|---|
New York | Paris | : Paris | 0 |
New York | Rome | : Paris : Rome | 1 |
New York | Cairo | : Paris : Cairo | 1 |
New York | Madrid | : Paris : Madrid | 1 |
New York | London | : London | 0 |
New York | Athens | : London : Athens | 0 |
New York | Nicosia | : London : Athens : Nicosia | 1 |
New York | LA | : LA | 0 |
New York | Tokyo | : LA : Tokyo | 0 |
New York | Hawaii | : LA : Tokyo : Hawaii | 1 |
Example: Join predicates and where clause selection with CONNECT BY
Often times the hierarchical nature of your data is reflected in one table but you need to join those results to additional tables to fully determine the output of the row.In a connect by query you can use any type of join supported by DB2 for i including INNER JOIN, LEFT OUTER JOIN, and LEFT EXCEPTION JOIN. When you explicitly use a JOIN clause, the predicate specified in the ON clause is applied first, before the connect by operation, and any WHERE clause in the connect by query is applied after the recursion. The WHERE selection is applied after the connect by so that the recursive process results don't end too soon.
In the following query, you are looking for all the flight connections starting in New York that have an ON_TIME_PERCENT greater than 90%.
SELECT CONNECT_BY_ROOT departure AS origin, departure, arrival,
flight_number, on_time_Percent AS onTime
FROM flights INNER JOIN flightstats ON flight_number = flight#
WHERE on_time_percent > 90
START WITH departure = 'New York'
CONNECT BY PRIOR arrival = departure;
This query returns the following information.
ORIGIN | DEPARTURE | ARRIVAL | FLIGHT# | ONTIME |
---|---|---|---|---|
New York | Paris | Cairo | 63 | 90.50 |
New York | Paris | Madrid | 3256 | 92.00 |
New York | London | Athens | 247 | 91.00 |
New York | Athens | Nicosia | 2356 | 91.00 |
New York | LA | Tokyo | 824 | 93.00 |
New York | Tokyo | Hawaii | 94 | 92.00 |
SELECT CONNECT_BY_ROOT departure AS origin, departure, arrival, flight_number, on_time_percent AS onTime
FROM flights, flightstats
WHERE flight_number = flight# AND on_time_percent > 90
START WITH departure = 'New York'
CONNECT BY PRIOR arrival = departure;
In this second example, if the WHERE predicates are more complex, you may need to aid the optimizer by explicitly pulling out the JOIN predicates between the flights and flightstats tables and using both an ON clause and a WHERE clause.
If you want additional search conditions to be applied as part of the recursion process, for example you never want to take a flight with an on time percentage of less than 90%, you can also control the join results by putting the join in a derived table with a join predicate and a WHERE clause.
SELECT CONNECT_BY_ROOT departure AS origin, departure, arrival, flight_number, on_time_percent AS onTime
FROM (SELECT departure, arrival, flight_number, on_time_percent
FROM flights, flightstats
WHERE flight_number = flight# AND on_time_percent > 90) t1
START WITH departure='New York'
CONNECT BY PRIOR arrival = departure;
Another option is to put the selection predicates in the START WITH and CONNECT BY clauses.
SELECT CONNECT_BY_ROOT departure AS origin, departure, arrival, flight_number, on_time_percent AS onTime
FROM flights, flightstats
WHERE flight_number = flight#
START WITH departure = 'New York' AND on_time_percent > 90
CONNECT BY PRIOR arrival = departure AND on_time_percent > 90
In
this case, you would be out of luck as there are no direct flights
out of New York with a greater than 90% on time statistic. Since
there is nothing to seed the recursion, no rows are returned from
the query.Using recursive common table expressions and recursive views
WITH destinations (origin, departure, arrival, flight_count) AS
(SELECT a.departure, a.departure, a.arrival, 1
FROM flights a
WHERE a.departure = 'Chicago'
UNION ALL
SELECT r.origin, b.departure, b.arrival, r.flight_count + 1
FROM destinations r, flights b
WHERE r.arrival = b.departure)
SELECT origin, departure, arrival, flight_count
FROM destinations
This query returns the following information.
ORIGIN | DEPARTURE | ARRIVAL | FLIGHT_COUNT |
---|---|---|---|
Chicago | Chicago | Miami | 1 |
Chicago | Chicago | Frankfurt | 1 |
Chicago | Miami | Lima | 2 |
Chicago | Frankfurt | Moscow | 2 |
Chicago | Frankfurt | Beijing | 2 |
Chicago | Frankfurt | Vienna | 2 |
Chicago | Moscow | Tokyo | 3 |
Chicago | Tokyo | Hawaii | 4 |
This recursive query is written in two parts. The first part of the common table expression is called the intialization fullselect. It selects the first rows for the result set of the common table expression. In this example, it selects the two rows in the flights table that get you directly to another location from Chicago. It also initializes the number of flight legs to one for each row it selects.
The second part of the recursive query joins the rows from the current result set of the common table expression with other rows from the original table. It is called the iterative fullselect. This is where the recursion is introduced. Notice that the rows that have already been selected for the result set are referenced by using the name of the common table expression as the table name and the common table expression result column names as the column names.
In this recursive part of the query, any rows from the original table that you can get to from each of the previously selected arrival cities are selected. A previously selected row's arrival city becomes the new departure city. Each row from this recursive select increments the flight count to the destination by one more flight. As these new rows are added to the common table expression result set, they are also fed into the iterative fullselect to generate more result set rows. In the data for the final result, you can see that the total number of flights is actually the total number of recursive joins (plus 1) it took to get to that arrival city.
CREATE VIEW destinations (origin, departure, arrival, flight_count) AS
SELECT departure, departure, arrival, 1
FROM flights
WHERE departure = 'Chicago'
UNION ALL
SELECT r.origin, b.departure, b.arrival, r.flight_count + 1
FROM destinations r, flights b
WHERE r.arrival = b.departure)
The iterative fullselect part of this view definition refers to the view itself. Selection from this view returns the same rows as you get from the previous recursive common table expression. For comparison, note that connect by recursion is allowed anywhere a SELECT is allowed, so it can easily be included in a view definition.
Example: Two starting cities using recursive common table expressions
Suppose you are willing to fly from either Chicago or New York, and you want to know where you could go and how much it would cost.
WITH destinations (departure, arrival, connections, cost) AS
(SELECT a.departure, a.arrival, 0, price
FROM flights a
WHERE a.departure = 'Chicago' OR
a.departure = 'New York'
UNION ALL
SELECT r.departure, b.arrival, r.connections + 1,
r.cost + b.price
FROM destinations r, flights b
WHERE r.arrival = b.departure)
SELECT departure, arrival, connections, cost
FROM destinations
This query returns the following information.
DEPARTURE | ARRIVAL | CONNECTIONS | COST |
---|---|---|---|
Chicago | Miami | 0 | 300 |
Chicago | Frankfurt | 0 | 480 |
New York | Paris | 0 | 400 |
New York | London | 0 | 350 |
New York | Los Angeles | 0 | 330 |
Chicago | Lima | 1 | 830 |
Chicago | Moscow | 1 | 1,060 |
Chicago | Beijing | 1 | 960 |
Chicago | Vienna | 1 | 680 |
New York | Madrid | 1 | 780 |
New York | Cairo | 1 | 880 |
New York | Rome | 1 | 740 |
New York | Athens | 1 | 690 |
New York | Tokyo | 1 | 860 |
Chicago | Tokyo | 2 | 1,740 |
New York | Nicosia | 2 | 970 |
New York | Hawaii | 2 | 1,190 |
Chicago | Hawaii | 3 | 2,070 |
For each returned row, the results show the starting departure city and the final destination city. It counts the number of connections needed rather than the total number of flight and adds up the total cost for all the flights.
Example: Two tables used for recursion using recursive common table expressions
Now, suppose you start in Chicago but add in transportation by railway in addition to the airline flights, and you want to know which cities you can go to.
The following query returns that information:
WITH destinations (departure, arrival, connections, flights, trains, cost) AS
(SELECT f.departure, f.arrival, 0, 1, 0, price
FROM flights f
WHERE f.departure = 'Chicago'
UNION ALL
SELECT t.departure, t.arrival, 0, 0, 1, price
FROM trains t
WHERE t.departure = 'Chicago'
UNION ALL
SELECT r.departure, b.arrival, r.connections + 1 , r.flights + 1, r.trains,
r.cost + b.price
FROM destinations r, flights b
WHERE r.arrival = b.departure
UNION ALL
SELECT r.departure, c.arrival, r.connections + 1 ,
r.flights, r.trains + 1, r.cost + c.price
FROM destinations r, trains c
WHERE r.arrival = c.departure)
SELECT departure, arrival, connections, flights, trains, cost
FROM destinations
This query returns the following information.
DEPARTURE | ARRIVAL | CONNECTIONS | FLIGHTS | TRAINS | COST |
---|---|---|---|---|---|
Chicago | Miami | 0 | 1 | 0 | 300 |
Chicago | Frankfurt | 0 | 1 | 0 | 480 |
Chicago | Washington | 0 | 0 | 1 | 90 |
Chicago | Lima | 1 | 2 | 0 | 830 |
Chicago | Moscow | 1 | 2 | 0 | 1,060 |
Chicago | Beijing | 1 | 2 | 0 | 960 |
Chicago | Vienna | 1 | 2 | 0 | 680 |
Chicago | Toronto | 1 | 1 | 1 | 340 |
Chicago | Boston | 1 | 0 | 2 | 140 |
Chicago | Tokyo | 2 | 3 | 0 | 1,740 |
Chicago | Hawaii | 3 | 4 | 0 | 2,070 |
In this example, there are two parts of the common table expression that provide initialization values to the query: one for flights and one for trains. For each of the result rows, there are two recursive references to get from the previous arrival location to the next possible destination: one for continuing by air, the other for continuing by train. In the final results, you would see how many connections are needed and how many airline or train trips can be taken.
Example: DEPTH FIRST and BREADTH FIRST options for recursive common table expressions
The two examples here show the difference in the result set row order based on whether the recursion is processed depth first or breadth first.
The option to determine the result using breadth first or depth first is a recursive relationship sort based on the recursive join column specified for the SEARCH BY clause. When the recursion is handled breadth first, all children are processed first, then all grandchildren, then all great grandchildren. When the recursion is handled depth first, the full recursive ancestry chain of one child is processed before going to the next child.
In both of these cases, you specify an extra column name that is used by the recursive process to keep track of the depth first or breadth first ordering. This column must be used in the ORDER BY clause of the outer query to get the rows back in the specified order. If this column is not used in the ORDER BY, the DEPTH FIRST or BREADTH FIRST processing option is ignored.
The selection of which column to use for the SEARCH BY column is important. To have any meaning in the result, it must be the column that is used in the iterative fullselect to join from the initialization fullselect. In this example, ARRIVAL is the column to use.
The following query returns that information:
WITH destinations (departure, arrival, connections, cost) AS
(SELECT f.departure, f.arrival, 0, price
FROM flights f
WHERE f.departure = 'Chicago'
UNION ALL
SELECT r.departure, b.arrival, r.connections + 1,
r.cost + b.price
FROM destinations r, flights b
WHERE r.arrival = b.departure)
SEARCH DEPTH FIRST BY arrival SET ordcol
SELECT *
FROM destinations
ORDER BY ordcol
This query returns the following information.
DEPARTURE | ARRIVAL | CONNECTIONS | COST |
---|---|---|---|
Chicago | Miami | 0 | 300 |
Chicago | Lima | 1 | 830 |
Chicago | Frankfurt | 0 | 480 |
Chicago | Moscow | 1 | 1,060 |
Chicago | Tokyo | 2 | 1,740 |
Chicago | Hawaii | 3 | 2,070 |
Chicago | Beijing | 1 | 960 |
Chicago | Vienna | 1 | 680 |
In this result data, you can see that all destinations that are generated from the Chicago-to-Miami row are listed before the destinations from the Chicago-to-Frankfort row.
Next, you can run the same query but request the result to be ordered breadth first.
WITH destinations (departure, arrival, connections, cost) AS
(SELECT f.departure, f.arrival, 0, price
FROM flights f
WHERE f.departure = 'Chicago'
UNION ALL
SELECT r.departure, b.arrival, r.connections + 1,
r.cost + b.price
FROM destinations r, flights b
WHERE r.arrival = b.departure)
SEARCH BREADTH FIRST BY arrival SET ordcol
SELECT *
FROM destinations
ORDER BY ordcol
This query returns the following information.
DEPARTURE | ARRIVAL | CONNECTIONS | COST |
---|---|---|---|
Chicago | Miami | 0 | 300 |
Chicago | Frankfurt | 0 | 480 |
Chicago | Lima | 1 | 830 |
Chicago | Moscow | 1 | 1,060 |
Chicago | Beijing | 1 | 960 |
Chicago | Vienna | 1 | 680 |
Chicago | Tokyo | 2 | 1,740 |
Chicago | Hawaii | 3 | 2,070 |
In this result data, you can see that all the direct connections from Chicago are listed before the connecting flights. The data is identical to the results from the previous query, but in a breadth first order. As you can see, there is no ordering done based on any values of the column used for depth or breadth first processing. To get ordering, the ORDER SIBLINGS BY construct available with the CONNECT BY form of recursion can be used.
Example: Cyclic data using recursive common table expressions
The key to any recursive process, whether it is a recursive programming algorithm or querying recursive data, is that the recursion must be finite. If not, you will get into a never ending loop. The CYCLE option allows you to safeguard against cyclic data. Not only will it terminate repeating cycles but it also allows you to optionally output a cycle mark indicator that may lead you to find cyclic data.
For a final example, suppose we have a cycle in the data. By adding one more row to the table, there is now a flight from Cairo to Paris and one from Paris to Cairo. Without accounting for possible cyclic data like this, it is quite easy to generate a query that will go into an infinite loop processing the data.
The following query returns that information:
INSERT INTO FLIGHTS VALUES('Cairo', 'Paris', 'Euro Air', '1134', 440)
WITH destinations (departure, arrival, connections, cost, itinerary) AS
(SELECT f.departure, f.arrival, 1, price,
CAST(f.departure CONCAT f.arrival AS VARCHAR(2000))
FROM flights f
WHERE f.departure = 'New York'
UNION ALL
SELECT r.departure, b.arrival, r.connections + 1 ,
r.cost + b.price, CAST(r.itinerary CONCAT b.arrival AS VARCHAR(2000))
FROM destinations r, flights b
WHERE r.arrival = b.departure)
CYCLE arrival SET cyclic_data TO '1' DEFAULT '0'
SELECT departure, arrival, itinerary, cyclic_data
FROM destinations
This query returns the following information.
DEPARTURE | ARRIVAL | ITINERARY | CYCLIC_DATA |
---|---|---|---|
New York | Paris | New York Paris | 0 |
New York | London | New York London | 0 |
New York | Los Angeles | New York Los Angeles | 0 |
New York | Madrid | New York Paris Madrid | 0 |
New York | Cairo | New York Paris Cairo | 0 |
New York | Rome | New York Paris Rome | 0 |
New York | Athens | New York London Athens | 0 |
New York | Tokyo | New York Los Angeles Tokyo | 0 |
New York | Paris | New York Paris Cairo Paris | 1 |
New York | Nicosia | New York London Athens Nicosia | 0 |
New York | Hawaii | New York Los Angeles Tokyo Hawaii | 0 |
In this example, the ARRIVAL column is defined in the CYCLE clause as the column to use for detecting a cycle in the data. When a cycle is found, a special column, CYCLIC_DATA in this case, is set to the character value of '1' for the cycling row in the result set. All other rows will contain the default value of '0'. When a cycle on the ARRIVAL column is found, processing will not proceed any further in the data so the infinite loop will not happen. To see if your data actually has a cyclic reference, the CYCLIC_DATA column can be referenced in the outer query. You can choose to exclude cyclic rows by adding a predicate: WHERE CYCLIC_DATA = 0.