Tips for using SQL to query foreign key relationships

This article discusses how to write SQL queries to find foreign key relationships in IBM® DB2® for Linux®, UNIX®, and Windows® database. Given a table with a primary key, you will learn how to return the children and descendant tables, as well as the referential integrity (RI) relationship paths from the parent table to these children and descendants. Also, you will see how the query can be modified to return the results for all tables in the database.

Petrus Chan (petrus@ca.ibm.com), Advisory Software Developer, IBM

Author photo for Petrus ChanPetrus Chan has worked in IBM DB2 for Linux, UNIX, and Window's SQL Compiler for 15 years. Petrus specializes in DB2 active data features such as constraints, triggers, and materialized query tables, and holds four data systems and methodologies patents.



10 January 2013

Also available in Chinese Russian

Introduction

When there are a lot of foreign key constraints in a database, you may find it difficult to visualize the foreign key relationships among tables. This article discusses how to write SQL queries to find the foreign key relationships in DB2 for Linux, UNIX, and Windows.

The following variations will be discussed.

  • Given a foreign key parent table, return the RI (Referential Integrity) children and descendants, as well as the RI relationship paths from the parent table to these children and descendants.
  • Modify the query provided to return results for all tables in the database.

Sample schema

The sample schema shown in Listing 1 will be used for the examples shown in this article.

Listing 1. Sample schema
set schema newton;
                
create table grandparent (i1 int not null primary key, i2 int, i3 int);
create table parent (i1 int not null primary key, i2 int);
create table parent2 (i1 int  not null primary key, i2 int);
create table child  (i1 int not null primary key, i2 int, i3 int);
create table grandchild  (i1 int not null primary key, i2 int, i3 int);
                
alter table parent add constraint fkp1 foreign key (i2) references grandparent;
alter table parent2 add constraint fkp2 foreign key (i2) references grandparent;
alter table child add constraint fk1 foreign key (i2) references parent;
alter table child add constraint fk2 foreign key (i3) references parent2;
alter table grandchild add constraint fk3 foreign key (i2) references child;
alter table grandchild add constraint fk4 foreign key (i3) references parent2;
                
create table gp (i1 int not null, i2 int not null, i3 int, primary key (i1, i2));
create table p1 (i1 int not null primary key, i2 int, i3 int);
create table c11 (i1 int not null primary key, i2 int);
create table c12 (i1 int not null primary key, i2 int);
                
alter table p1 add constraint fkp1 foreign key (i2, i3) references gp;
alter table c11 add constraint fkc11 foreign key (i2) references p1;
alter table c12 add constraint fkc12 foreign key (i2) references p1;
alter table gp add constraint fkgp1 foreign key (i2) references c12;
                
create table self (i1 int not null primary key, i2 int);
alter table self add constraint fk_self foreign key (i2) references self;

How to display all of the RI constraints

In the simplest form, you can obtain a listing of all the foreign key constraints by querying the catalog view SYSCAT.REFERENCES.

SELECT * FROM SYSCAT.REFERENCES

The results can be joined with SYSCAT.KEYCOLUSE to find the foreign key columns.

To produce a comma-separated list of foreign key columns used in RI constraints, you can use the LISTAGG() aggregate function when joining SYSCAT.REFERENCES with SYSCAT.KEYCOLUSE, as shown in Listing 2.

Listing 2. LISTAGG() aggregate function
select  substr(R.reftabschema,1,12) as P_Schema, substr(R.reftabname,1,12) as PARENT,
        substr(R.tabschema,1,12) as C_Schema, substr (R.tabname,1,12) as CHILD,
        substr(R.constname,1,12) as CONSTNAME, 
        substr(LISTAGG(C.colname,', ') WITHIN GROUP (ORDER BY C.colname),1,20) as FKCOLS 
from syscat.references R, syscat.keycoluse C 
where R.constname = C.constname and R.tabschema = C.tabschema and R.tabname = C.tabname
group by R.reftabschema, R.reftabname, R.tabschema, R.tabname, R.constname;

The results will look similar to what is shown in Listing 3.

Listing 3. LISTAGG() aggregate function output
P_SCHEMA     PARENT       C_SCHEMA     CHILD        CONSTNAME    FKCOLS              
------------ ------------ ------------ ------------ ------------ ----------------
NEWTON       PARENT       NEWTON       CHILD        FK1          I2                  
NEWTON       PARENT2      NEWTON       CHILD        FK2          I3                  
NEWTON       CHILD        NEWTON       GRANDCHILD   FK3          I2                  
NEWTON       PARENT2      NEWTON       GRANDCHILD   FK4          I3                  
NEWTON       P1           NEWTON       C11          FKC11        I2                  
NEWTON       P1           NEWTON       C12          FKC12        I2                  
NEWTON       C12          NEWTON       GP           FKGP1        I2                  
NEWTON       GP           NEWTON       P1           FKP1         I2  , I3           
NEWTON       GRANDPARENT  NEWTON       PARENT       FKP1         I2                  
NEWTON       GRANDPARENT  NEWTON       PARENT2      FKP2         I2                  
NEWTON       SELF         NEWTON       SELF         FK_SELF      I2

However, in a complex database, the relationship between a parent table and its non-immediate descendants will not be obvious from the results of the simple query described previously.


Displaying all of the foreign key children and descendants of a given table

You could write a recursive query to traverse the RI relationships. However, if you are using DB2 for Linux, UNIX, and Windows, Version 9.7 or later, a hierarchical query is a better alternative to traverse the RI relationships in SYSCAT.REFERENCES recursively. See the Resources section for more information.

Use of hierarchical queries in DB2 requires the setting of the DB2_COMPATIBILITY_VECTOR:

db2set DB2_COMPATIBILITY_VECTOR=08
db2stop
db2start

Hierarchical queries allow the specification of the START WITH and CONNECT BY clauses.

For example, the query (Query 1) shown in Listing 4 returns all RI descendants and the unique path to them from the root table NEWTON.GRANDPARENT.

Listing 4. Query 1
with 
root_parents (root_parent_schema, root_parent_name) AS
 (select * from table(values ('NEWTON', 'GRANDPARENT')))
select 
 substr(connect_by_root reftabname,1,11) as root,
 substr (level, 1,5) as lvl,
--  substr(reftabschema,1,6) as parent_schema,
 substr(reftabname,1,11) as parent, 
--  substr(tabschema,1,6) as child_schema,
 substr(tabname,1,10) as child, 
 substr(constname,1,5) as cnst, 
 substr(sys_connect_by_path(reftabname, '->') || '->' || 
 substr(tabname,1,20),1,42) as chain
from syscat.references
start with (reftabschema,reftabname) in (select root_parent_schema, 
                                        root_parent_name from root_parents)
connect by nocycle prior  tabname = reftabname 
                      and tabschema = reftabschema;

Note: The schema name has been removed from the output of all queries for better formatting.

Query 1 will return the results shown in Listing 5.

Listing 5. Query 1 output
ROOT        LVL   PARENT      CHILD      CNST  CHAIN                                     
----------- ----- ----------- ---------- ----- ------------------------------------------
GRANDPARENT 1     GRANDPARENT PARENT     FKP1  ->GRANDPARENT->PARENT                     
GRANDPARENT 2     PARENT      CHILD      FK1   ->GRANDPARENT->PARENT->CHILD              
GRANDPARENT 3     CHILD       GRANDCHILD FK3   ->GRANDPARENT->PARENT->CHILD->GRANDCHILD  
GRANDPARENT 1     GRANDPARENT PARENT2    FKP2  ->GRANDPARENT->PARENT2                    
GRANDPARENT 2     PARENT2     CHILD      FK2   ->GRANDPARENT->PARENT2->CHILD             
GRANDPARENT 3     CHILD       GRANDCHILD FK3   ->GRANDPARENT->PARENT2->CHILD->GRANDCHILD 
GRANDPARENT 2     PARENT2     GRANDCHILD FK4   ->GRANDPARENT->PARENT2->GRANDCHILD

Note: The query will display all of the unique paths from the given root table to each of its descendant tables. If there is a diamond shape RI path, both paths will be shown in the result. As shown in the previous example, both of the following paths will be shown.

  • GRANDPARENT->PARENT->CHILD->GRANDCHILD
  • GRANDPARENT->PARENT2->CHILD->GRANDCHILD

The following are several advantages of using hierarchical queries over traditional recursive queries.

  • The recursion is naturally handled by the START WITH and CONNECT BY clauses without the need to write a recursive query.
  • The pseudocolumn LEVEL automatically returns the level of the table from root parent.
  • The scalar function sys_connect_by_path() builds a string representing a path from the root to a node in hierarchical queries. In the previous example, the function is used in the CHAIN column to build the RI relationship chain from the root table.
  • If there is any RI cycle, for example, GP → P1 → C12 → GP, the NOCYCLE clause allows you to direct the recursion to ignore the duplicated rows in the cycle.

Displaying the foreign key children and descendants relationship for all tables with no RI cycles in the database

If there are no RI cycles in the database, you can modify the common table expression root_parents to include all the tables with at least one child but no parent, as shown in Query 2 in Listing 6.

Listing 6. Query 2
with 
root_parents (root_parent_schema, root_parent_name) AS
 (select tabschema, tabname
    from syscat.tables
    where parents = 0 and children > 0)
select 
  substr(connect_by_root reftabname,1,11) as root,
  substr (level, 1,5) as lvl,
--  substr(reftabschema,1,6) as parent_schema,
  substr(reftabname,1,11) as parent, 
--  substr(tabschema,1,6) as child_schema,
  substr(tabname,1,10) as child, 
  substr(constname,1,5) as cnst, 
  substr(sys_connect_by_path(reftabname, '->') || '->' || 
  substr(tabname,1,20),1,42) as chain
from syscat.references
start with (reftabschema,reftabname) in (select root_parent_schema, 
                                        root_parent_name from root_parents)
connect by prior  tabname = reftabname and tabschema = reftabschema;

However, Query 2 will not include tables that are in an RI cycle, because such tables will have parents > 0 and children > 0.


Displaying the foreign key children and descendant relationships which may have RI cycles

With user input of root tables in RI cycles

From a business logic point of view, some tables in an RI cycle will be a better candidate as a root parent table in the RI cycle. However, it will not be easy for a database to determine which table in an RI cycle should be the root parent table.

If there are not too many RI cycles in the system, you can write a semi-automatic query to manually specify one table per each RI cycle as the root parent table.

In Query 3, shown in Listing 7, by adding a VALUES clause in the UNION ALL operations in the common table expression root_parents to manually specify the root table in each RI cycle, the query will return all the RI children and descendants in the database.

Listing 7. Query 3
with 
root_parents (root_parent_schema, root_parent_name) AS
 (select tabschema, tabname
    from syscat.tables
    where parents = 0 and children > 0
  UNION ALL
  select * from table(values ('NEWTON', 'GP'), ('NEWTON', 'SELF')))
select 
  substr(connect_by_root reftabname,1,11) as root,
  substr (level, 1,3) as lvl,
--  substr(reftabschema,1,6) as parent_schema,
  substr(reftabname,1,11) as parent, 
--  substr(tabschema,1,6) as child_schema,
  substr(tabname,1,10) as child, 
  substr(constname,1,7) as cnstnam, 
  substr(sys_connect_by_path(reftabname, '->') || '->' || 
  substr(tabname,1,20),1,42) as chain
from syscat.references
start with (reftabschema,reftabname) in (select root_parent_schema, 
                                        root_parent_name from root_parents)
connect by NOCYCLE prior  tabname = reftabname and tabschema = reftabschema;

The result will look similar to what is shown in Listing 8.

Listing 8. Query 3 output
ROOT        LVL PARENT      CHILD      CNSTNAM CHAIN                                     
----------- --- ----------- ---------- ------- ------------------------------------------
SELF        1   SELF        SELF       FK_SELF ->SELF->SELF                              
GRANDPARENT 1   GRANDPARENT PARENT     FKP1    ->GRANDPARENT->PARENT                     
GRANDPARENT 2   PARENT      CHILD      FK1     ->GRANDPARENT->PARENT->CHILD              
GRANDPARENT 3   CHILD       GRANDCHILD FK3     ->GRANDPARENT->PARENT->CHILD->GRANDCHILD  
GRANDPARENT 1   GRANDPARENT PARENT2    FKP2    ->GRANDPARENT->PARENT2                    
GRANDPARENT 2   PARENT2     CHILD      FK2     ->GRANDPARENT->PARENT2->CHILD             
GRANDPARENT 3   CHILD       GRANDCHILD FK3     ->GRANDPARENT->PARENT2->CHILD->GRANDCHILD 
GRANDPARENT 2   PARENT2     GRANDCHILD FK4     ->GRANDPARENT->PARENT2->GRANDCHILD        
GP          1   GP          P1         FKP1    ->GP->P1                                  
GP          2   P1          C11        FKC11   ->GP->P1->C11                             
GP          2   P1          C12        FKC12   ->GP->P1->C12                             
GP          3   C12         GP         FKGP1   ->GP->P1->C12->GP

Without user input of root tables in RI cycles

If you prefer full automation and do not care which table in an RI cycle is selected as the root table, as long as one representative from each RI cycle is included, the following procedure will store one representative table from each RI cycle in a temporary table called CYCLEROOTS. At the end, the procedure uses the table names stored in CYCLEROOTS, shown previously in Listing 8, to display all RI chains in the database, as shown in Listing 9.

Listing 9. Query 3 to display all RI chains
-- If needed, create the user temporary tablespace for the temporary table.
CREATE BUFFERPOOL BUFFERPOOL4K  PAGESIZE 4K;
CREATE USER TEMPORARY TABLESPACE STMPTSP4 PAGESIZE 4K  BUFFERPOOL BUFFERPOOL4K;
                
-- create the temporary table to store one representative from each RI cycle as root
create GLOBAL TEMPORARY TABLE SESSION.CYCLEROOTS (SCHEMANAME VARCHAR(128),  
                                                  TABNAME VARCHAR(128));
                
--#SET TERMINATOR @
-- procedure to display RI chains in the database
CREATE PROCEDURE newton.FIND_RI_CHAINS ()
DETERMINISTIC 
NO EXTERNAL ACTION
DYNAMIC RESULT SETS 1
BEGIN
    DECLARE CYCLESCHEMA VARCHAR(128);
    DECLARE CYCLETABLE VARCHAR(128);
    DECLARE ROWS_FETCHED BIGINT;
    DECLARE C_CYCLETABLES CURSOR;
                
    -- This query will return the final result set after
    -- temporary table SESSION.CYCLEROOTS has been populated.
    DECLARE C_RESULTS CURSOR WITH RETURN TO CLIENT FOR
    WITH 
    ROOT_PARENTS (ROOT_PARENT_SCHEMA, ROOT_PARENT_NAME) AS
        (SELECT TABSCHEMA, TABNAME
            FROM SYSCAT.TABLES
            WHERE PARENTS = 0 AND CHILDREN > 0
        UNION ALL
        SELECT * FROM SESSION.CYCLEROOTS),
    HIERARCHY (ROOT, LEVEL, REFTABSCHEMA, REFTABNAME, TABSCHEMA, TABNAME, 
               CONSTNAME, CHAIN) AS
        (SELECT
            CONNECT_BY_ROOT REFTABNAME,
            LEVEL, 
            REFTABSCHEMA AS P_SCHEMA,
            REFTABNAME AS PARENT, 
            TABSCHEMA AS C_SCHEMA,
            TABNAME AS CHILD, 
            CONSTNAME AS CNSTNAM, 
            SUBSTR(SYS_CONNECT_BY_PATH(REFTABNAME, '->') || '->' || 
            SUBSTR(TABNAME,1,20),1,42) AS CHAIN
        FROM SYSCAT.REFERENCES
        START WITH (REFTABSCHEMA,REFTABNAME) IN (SELECT ROOT_PARENT_SCHEMA,
                                                 ROOT_PARENT_NAME
                                                 FROM ROOT_PARENTS)
        CONNECT BY NOCYCLE PRIOR  TABNAME = REFTABNAME AND TABSCHEMA =
         REFTABSCHEMA)
    SELECT
        SUBSTR(H.root,1,11) AS ROOT,
        CAST (H.LEVEL AS CHAR(2)) as LVL,
--       SUBSTR(H.REFTABSCHEMA,1,6) as P_SCHEMA,
        SUBSTR(H.REFTABNAME,1,11) as PARENT, 
--       SUBSTR(H.TABSCHEMA,1,6) as C_SCHEMA,
        SUBSTR(H.TABNAME,1,10) as CHILD, 
        SUBSTR(H.CONSTNAME,1,7) as CNSTNAM, 
        SUBSTR(H.CHAIN,1,42) as CHAIN
        FROM HIERARCHY H;

        -- initialize temporary table
        DELETE FROM SESSION.CYCLEROOTS;

        -- this query will return the remaining tables that are in RI cycles
        SET C_CYCLETABLES = CURSOR FOR
            WITH 
                ROOT_PARENTS (ROOT_PARENT_SCHEMA, ROOT_PARENT_NAME) AS
                  (SELECT TABSCHEMA, TABNAME
                    FROM SYSCAT.TABLES
                    WHERE PARENTS = 0 AND CHILDREN > 0
                   UNION ALL
                   SELECT * FROM SESSION.CYCLEROOTS),
                HIERARCHY (ROOT, LEVEL, REFTABSCHEMA, REFTABNAME, TABSCHEMA, 
                           TABNAME, CONSTNAME, CHAIN) AS
                (SELECT CONNECT_BY_ROOT REFTABNAME AS ROOT, LEVEL, 
                    REFTABSCHEMA AS PARENT_SCHEMA,
                    REFTABNAME AS PARENT, 
                    TABSCHEMA AS CHILD_SCHEMA,
                    TABNAME AS CHILD, 
                    CONSTNAME AS CONSTNAME, 
                    SUBSTR(SYS_CONNECT_BY_PATH(REFTABNAME, '->') || '->' || 
                    SUBSTR(TABNAME,1,20),1,50) AS CHAIN
                FROM SYSCAT.REFERENCES
                START WITH (REFTABSCHEMA,REFTABNAME) IN (SELECT ROOT_PARENT_SCHEMA,
                                                ROOT_PARENT_NAME FROM ROOT_PARENTS)
                CONNECT BY NOCYCLE PRIOR  TABNAME = REFTABNAME AND 
                                          TABSCHEMA =  REFTABSCHEMA)
            SELECT TABSCHEMA, TABNAME
                FROM SYSCAT.TABLES
                WHERE CHILDREN > 0
            EXCEPT
                SELECT REFTABSCHEMA, REFTABNAME  FROM HIERARCHY H;

        OPEN C_CYCLETABLES;

        -- Just select the first table as a root table from the remaining tables that are 
        -- in RI cycles
        FETCH C_CYCLETABLES INTO CYCLESCHEMA, CYCLETABLE;

        SET ROWS_FETCHED = CURSOR_ROWCOUNT(C_CYCLETABLES);

        -- Keep looping until the result set from C_CYCLETABLES is empty.
        WHILE (ROWS_FETCHED > 0) DO

            -- insert the select representative into temporary table SESSION.CYCLEROOTS 
            INSERT INTO SESSION.CYCLEROOTS VALUES (CYCLESCHEMA, CYCLETABLE);

            CLOSE C_CYCLETABLES;

            -- restart the cursor.  The result set will be different from the 
            -- the previous iterations of the loop.  All the tables that
            -- are in the same RI cycle as the newly added representative
            -- will not show up in the result set from the next iteration of the
            -- loop.
            OPEN C_CYCLETABLES;

            FETCH FROM C_CYCLETABLES INTO CYCLESCHEMA, CYCLETABLE;

            SET ROWS_FETCHED = CURSOR_ROWCOUNT(C_CYCLETABLES);

        END WHILE;

        CLOSE C_CYCLETABLES;

        -- When the loop exits, one representative from each RI cycle will have
        -- been added to the temporary table SESSION.CYCLEROOTS.  We will now 
        -- open C_RESULTS using the populated temporary table SESSION.CYCLEROOTS.

        OPEN C_RESULTS;

END@
--#SET TERMINATOR ;

CALL newton.FIND_RI_CHAINS();

DROP TABLE  SESSION.CYCLEROOTS;

The result will look similar to what is shown in Listing 10.

Listing 10. Query 3 to display all RI chains output
Result set 1
--------------
                
ROOT       LVL PARENT      CHILD      CNSTNAM CHAIN                                     
---------- --- ----------- ---------- ------- ------------------------------------------
SELF        1  SELF        SELF       FK_SELF ->SELF->SELF                              
GRANDPARENT 1  GRANDPARENT PARENT     FKP1    ->GRANDPARENT->PARENT                     
GRANDPARENT 2  PARENT      CHILD      FK1     ->GRANDPARENT->PARENT->CHILD              
GRANDPARENT 3  CHILD       GRANDCHILD FK3     ->GRANDPARENT->PARENT->CHILD->GRANDCHILD  
GRANDPARENT 1  GRANDPARENT PARENT2    FKP2    ->GRANDPARENT->PARENT2                    
GRANDPARENT 2  PARENT2     CHILD      FK2     ->GRANDPARENT->PARENT2->CHILD             
GRANDPARENT 3  CHILD       GRANDCHILD FK3     ->GRANDPARENT->PARENT2->CHILD->GRANDCHILD 
GRANDPARENT 2  PARENT2     GRANDCHILD FK4     ->GRANDPARENT->PARENT2->GRANDCHILD        
C12         1  C12         GP         FKGP1   ->C12->GP                                 
C12         2  GP          P1         FKP1    ->C12->GP->P1                             
C12         3  P1          C11        FKC11   ->C12->GP->P1->C11                        
C12         3  P1          C12        FKC12   ->C12->GP->P1->C12   
                
12 record(s) selected.
                
Return Status = 0

Note that table C12 has been selected as the representative of the RI cycle involving GP->P1->C12->GP.


Conclusion

This article showed how to write a hierarchy query to find the foreign key relationships in the database. In particular, it has given an example of the query to find the foreign key children and descendant tables of a given table. Examples of queries and procedures to find all the foreign key relationships of all tables in the database, both with and without user input of the root tables in RI cycles, have also been given.

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

  • Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=854198
ArticleTitle=Tips for using SQL to query foreign key relationships
publish-date=01102013