Skip to main content

Updatable UNION ALL Views in DB2 Universal Database Version 8

Photo: Serge Rielau
Serge Rielau started his life in DB2 six years ago through a year-long internship at the Almaden Research Center where he worked on inline SQL PL and structured types. In the following five years he has been responsible for various SQL enhancements; for example, typed views and generated columns. As a technical manager in the SQL Query Compiler he now drives language changes into DB2 as well as the SQL standard. He can be reached through comp.databases.ibm-db2 or at srielau@ca.ibm.com .

Summary:  Partitioning UNION ALL views are commonly used to simplify data maintenance and to overcome scalability limits of single tables. This article describes how DB2 Universal Database Version 8 for Linux, UNIX and Windows enables you to insert, update, and delete through views defined with UNION ALL. It explains in detail the patented algorithm used for insert processing.

Date:  17 Sep 2002
Level:  Introductory
Activity:  1357 views
Comments:  

Introduction

Partitioning UNION ALL views are commonly used to simplify data maintenance and to overcome scalability limits of single tables. This article describes how DB2® Universal DatabaseTM Version 8 for Linux, UNIX® and Windows® enables you to insert, update, and delete through views defined with UNION ALL. It explains in detail the patented algorithm used for insert processing.


The many uses of UNION ALL views

Support for views based on UNION ALL has been available in DB2 since Version 2. Since then, such views have become fairly common. Most uses of UNION ALL views fall into one of the following categories:

  • Unification of different, but semantically related tables.

    These kind of constructions are often found as relational mappings of object-oriented hierarchies and represent a horizontal partitioning of the data; that is, each specific "class" is represented by one table.

    A typical example would be a hierarchy composed of the classes person, employee, and manager in which person, employee and manager are each separate relational tables. To create a view of all persons, a UNION ALL view can be used to unify the tables.

  • Unification of like tables.

    Constructions of UNION ALL over like tables are often used to hide a range partitioning of the overall data set. Each partition is a separate table, which simplifies maintenance, increases size limits, and, in some cases can even improve query performance against the overall data.

    A typical example is to partition a year's worth of sales data into four tables, each containing the data for one quarter.

  • Integration of federated data sources.

    Sometimes data that must be unified is spread across local and remote databases. In this case UNION ALL is the only option to obtain a transparent view of all the data other than replicating the data locally.

    In DB2 remote objects are represented as nicknames and views can be constructed to relate them to local tables or other nicknames. See, for example, Building Federated Systems with Relational Connect and Database Views (PDF format).


Hands-on example

As a running example, let's unify like tables as discussed above.

  1. We need to first create one of the tables:
     
    	CREATE TABLE Q1(product_no INT, sales INT, date DATE);

  2. Now derive the other quarterly tables from it:
     
    	CREATE TABLE Q2 LIKE Q1; 
    	CREATE TABLE Q3 LIKE Q1; 
    	CREATE TABLE Q4 LIKE Q1;

  3. Each table must only hold data for its quarter. Table Q1 holds January (1) through March (3), table Q2 holds April (4) through June (6), and so on. Let's add the appropriate constraints.
     
    	ALTER TABLE Q1 ADD CONSTRAINT Q1_CHK_DATE 
       		CHECK (MONTH(date) IN (1, 2, 3)); 
    	ALTER TABLE Q2 ADD CONSTRAINT Q2_CHK_DATE 
       		CHECK (MONTH(date) IN (4, 5, 6)); 
    	ALTER TABLE Q3 ADD CONSTRAINT Q3_CHK_DATE 
       		CHECK (MONTH(date) IN (7, 8, 9)); 
    	ALTER TABLE Q4 ADD CONSTRAINT Q4_CHK_DATE 
       		CHECK (MONTH(date) IN (10, 11, 12));

  4. We are now ready to feed some "sales data" to the tables:
     
    	INSERT INTO Q1 VALUES (5,   6, '2001-01-02'), 
                          	      (8, 100, '2001-02-28'); 
    	INSERT INTO Q2 VALUES (3,  10, '2001-04-11'), 
                                  (5,  15, '2001-05-19'); 
    	INSERT INTO Q3 VALUES (1,  12, '2001-08-27'); 
    	INSERT INTO Q4 VALUES (3,  14, '2001-12-29'), 
                                  (2,  21, '2001-12-12');

  5. Finally, create the view for the full year's data by unifying all quarters:
     
    	CREATE VIEW FY AS 
      		SELECT product_no, sales, date FROM Q1 
    	UNION ALL 
      		SELECT product_no, sales, date FROM Q2 
    	UNION ALL 
      		SELECT product_no, sales, date FROM Q3 
    	UNION ALL 
      		SELECT product_no, sales, date FROM Q4; 
    

    Here is what the content of the view looks like:

     
    	SELECT * FROM FY ORDER BY date, product_no; 
    	PRODUCT_NO  SALES       DATE       
    	----------- ----------- ---------- 
              	5           6 	01/02/2001 
              	8         100 	02/28/2001 
              	3          10 	04/11/2001 
              	5          15 	05/19/2001 
              	1          12 	08/27/2001 
              	2          21 	12/12/2001 
              	3          14 	12/29/2001 
     	 7 record(s) selected.

Because UNION ALL views are so common, DB2 has specific rules targeted to optimize queries on union all to improve performance. For details, refer to Partitioning in DB2 Using UNION ALL views (PDF format).

For example, the following statement selects only from the Q1 and Q2 tables because DB2's theorem prover can use the check constraints on the tables to prove Q3 and Q4 are not affected:

 
	SELECT sum(sales) AS total FROM FY  
	WHERE MONTH(date) BETWEEN 3 AND 5; 
	TOTAL           
	----------- 
         	25 
         	 
	1 record(s) selected. 

Updating and deleting from the FY view is straightforward because DB2 knows from which table the row to be updated or deleted originated and therefore can modify it easily.

The following updates a row in Q3:

 
	UPDATE FY SET sales = 20 
	WHERE product_no = 1 AND date = '2001-08-27'; 

That same row can also be deleted from Q3:

 
	DELETE FROM FY 
	WHERE product_no = 1 AND date = '2001-08-27'; 

To reinsert the deleted row:

 
	INSERT INTO FY VALUES (1, 20, '2001-08-27');


Inserts in a nutshell

Hang on! At the beginning of the article we inserted into the base tables. Why? Before DB2 Version 8, it was not possible to insert through a UNION ALL view. Thus, the application needed to be aware of the makeup of the view and insert into the appropriate base table.

New and very neat logic now enables DB2 to accept inserts against a UNION ALL view:

 
	INSERT INTO FY VALUES (1, 20, '2001-06-03'), 
                              (2, 30, '2001-03-21'), 
                              (2, 25, '2001-08-30'); 
 

Semantically, DB2 attempts to insert any given row into each of the tables underlying the UNION ALL view. It then counts how many tables accepted the row:

  • If exactly one table accepts the row, the insert is accepted.
  • If no table accepts the row, a "no target" error is raised.
  • If more than one table accepts the row, then an "ambiguous target" error is raised.

In the view above, there would always be exactly one table because the constraints on each quarterly table cover all of the valid month values.

Consider what would happen if Q1_CHK_DATE had a typo in it which would enable April data, not January data, into the Q1 base table:

 
	ALTER TABLE Q1 DROP CONSTRAINT Q1_CHK_DATE; 
	 
	ALTER TABLE Q1 ADD CONSTRAINT Q1_CHK_DATE  
   		CHECK (MONTH(date) IN (4, 2, 3)); 

Now it is not possible to insert January data because there is no target table that can accept the data.

 
	INSERT INTO FY VALUES (5, 35, '2001-01-14'); 
 
	SQL20154N  The requested insert operation into view "SRIELAU.FY"  
	is not allowed because no target table can be determined for a row.   
	Reason code = "1".  SQLSTATE=23513 

Inserting April data, on the other hand, causes a conflict because both Q1 and Q2 could accept the row.

 
	INSERT INTO FY VALUES (3, 30, '2001-04-21'); 
	 
	SQL20154N  The requested insert operation into view "SRIELAU.FY" 
	is not allowed because no target table can be determined for a row.  
	Reason code = "2".  SQLSTATE=23513 

Note that the defined constraints didn't include explicit handling of a null date. Because constraints only reject what is false, all tables would permit a row with a null date, thereby generating an "ambiguous target" error.

While in the cases above the problem is clearly a bad constraint and therefore improper partitioning, not all UNION ALL views are partitioning, and thus it is important that DB2 not be picky about the makeup of the constraints. What matters is not a partitioning of the base table's data, but the partitioning of the input data given the constraints of the base tables.

But how does insert into a UNION ALL view really work? Isn't it expensive to insert into each table? What about a system with multiple database partitions? Can this feature scale?


Looking inside

Let's take a detailed look inside the works for inserting through a UNION ALL works. To aid in understanding, each step is explained through SQL. However DB2's internal expressiveness is a superset of what SQL can express, so the translation can only be approximate.

But before we start, let's add some interesting logic by declaring that Q1 knows product number 9 as 6. A simple BEFORE trigger will do the trick:

 
	CREATE TRIGGER I_Q1_PNO NO CASCADE BEFORE INSERT ON Q1 
  		REFERENCING NEW AS n FOR EACH ROW MODE DB2SQL 
  		SET n.product_no =  
  	        CASE WHEN n.product_no = 9 THEN 6  
                     ELSE n.product_no END;

Further we define that null dates are interpreted to mean the current date. Here, too, we use a BEFORE trigger:

 
	CREATE TRIGGER I_Q1_NULL NO CASCADE BEFORE INSERT ON Q1 
  		REFERENCING NEW AS n FOR EACH ROW MODE DB2SQL 
  		SET n.date  
    		= COALESCE(n.date, CURRENT DATE); 
    		 
	CREATE TRIGGER I_Q2_NULL NO CASCADE BEFORE INSERT ON Q2 
  		REFERENCING NEW AS n FOR EACH ROW MODE DB2SQL 
  		SET n.date  
    		= COALESCE(n.date, CURRENT DATE); 
    		 
	CREATE TRIGGER I_Q3_NULL NO CASCADE BEFORE INSERT ON Q3 
  		REFERENCING NEW AS n FOR EACH ROW MODE DB2SQL 
  		SET n.date  
    		= COALESCE(n.date, CURRENT DATE); 
    		 
	CREATE TRIGGER I_Q4_NULL NO CASCADE BEFORE INSERT ON Q4 
  		REFERENCING NEW AS n FOR EACH ROW MODE DB2SQL 
  		SET n.date  
   		= COALESCE(n.date, CURRENT DATE);

Now assume that we want to insert the following three rows into FY:

 
INSERT INTO FY VALUES (7, 20, NULL), 
                      (9, 30, '2001-03-21'), 
                      (9, 25, '2001-04-30'); 
Input: 
PRODUCT_NO SALES DATE 
---------- ----- ---------- 
         7    20          - 
         9    30 2001-03-21 
         9    25 2001-04-30 

Now, let's see what DB2 does.

  1. First, the algorithm (shown in Figure 1) identifies each row uniquely:
     
    TABLE(SELECT ROWNUMBER() OVER() AS RN,  
                 product_no, sales, date  
          FROM Input) AS Step1 
     
    Step1: 
    RN PRODUCT_NO SALES DATE 
    -- ---------- ----- ---------- 
     1          7    20          - 
     2          9    30 2001-03-21 
     3          9    25 2001-04-30

  2. The DB2 compiler builds an in-memory table, which contains all the possible target table identifiers plus one row which we call the representative row (or, lovingly, "caboose" because it always trails the regular table identifiers).
     
    TABLE(VALUES ('Q1') 
                 ('Q2') 
                 ('Q3') 
                 ('Q4') 
                 ('ZZ')  /* Caboose */ 
         ) AS Step2(table_id) 
     
    Step2: 
    TABLE_ID 
    -------- 
          Q1 
          Q2 
          Q3 
          Q4 
          ZZ

    Note that the Caboose is always there.


    Figure 1. DB2's algorithm for inserting through UNION ALL views
    DB2's algorithm for inserting through UNION ALL views
  3. Now we build a cross-product of the input and the in-memory table by using a nested loop join, which preserves the order of the table and gives us nice groups:
     
    TABLE(SELECT rn, table_id, product_no, sales, date  
          FROM Step1 INNER JOIN Step2) AS Step3 
     
    Step3: 
    RN TABLE_ID PRODUCT_NO SALES DATE 
    -- -------- ---------- ----- ---------- 
     1       Q1          7    20          - 
     1       Q2          7    20          - 
     1       Q3          7    20          - 
     1       Q4          7    20          - 
     1       ZZ          7    20          - 
     
     2       Q1          9    30 2001-03-21 
     2       Q2          9    30 2001-03-21 
     2       Q3          9    30 2001-03-21 
     2       Q4          9    30 2001-03-21 
     2       ZZ          9    30 2001-03-21 
     
     3       Q1          9    25 2001-04-30 
     3       Q2          9    25 2001-04-30 
     3       Q3          9    25 2001-04-30 
     3       Q4          9    25 2001-04-30 
     3       ZZ          9    25 2001-04-30

    Why did we replicate the rows? Very simply, it's because SQL requires that BEFORE triggers fire before check constraints are executed. Because BEFORE triggers on different tables can perform different modifications on the new transition variables, dedicated copies must be created for each potential target table.

  4. In our example, after triggers I_Q1_PNO, and I_Q1_NULL through I_Q4_NULL have fired, the intermediate result looks like this (assuming the CURRENT DATE is January 17, 2002):
     
    Step4: 
    RN TABLE_ID PRODUCT_NO SALES DATE 
    -- -------- ---------- ----- ---------- 
     1       Q1          7    20 2002-01-17 
     1       Q2          7    20 2002-01-17 
     1       Q3          7    20 2002-01-17 
     1       Q4          7    20 2002-01-17 
     1       ZZ          7    20          - 
     
     2       Q1          6    30 2001-03-21 
     2       Q2          9    30 2001-03-21 
     2       Q3          9    30 2001-03-21 
     2       Q4          9    30 2001-03-21 
     2       ZZ          9    30 2001-03-21 
     
     3       Q1          9    25 2001-04-30 
     3       Q2          9    25 2001-04-30 
     3       Q3          9    25 2001-04-30 
     3       Q4          9    25 2001-04-30 
     3       ZZ          9    25 2001-04-30

    Note: Because of the way the semantics are defined, before triggers for each table have to fire even if a given row will ultimately not be placed into the subject table of the trigger. Before triggers with side-effects such as sending e-mail or executing a NEXTVAL sequence-expression will thus fire more often than was perhaps anticipated by the definer of the trigger.

  5. Now that the before triggers have been taken into account, the check constraints are used to filter out all violating rows. To keep things interesting, let's assume we are still using the faulty check constraint of Q1_CHK_DATE that uses the months (4, 2, 3) instead of (1, 2, 3):
     
    TABLE(SELECT rn, table_id, product_no, ... 
          FROM Step4  
          WHERE (table_id = 'Q1'  
                 AND month(date) IN (4, 2, 3))  
             OR (table_id = 'Q2'  
                 AND month(date) IN (4, 5, 6)) 
             OR (table_id = 'Q3'  
                 AND month(date) IN (7, 8, 9))  
             OR (table_id = 'Q4'  
                 AND month(date) IN (10, 11, 12)) 
             OR (table_id = 'ZZ')) 
    AS Step5 
     
    Step5: 
    RN TABLE_ID PRODUCT_NO SALES DATE 
    -- -------- ---------- ----- ----------  
     1       ZZ          7    20          - 
     
     2       Q1          6    30 2001-03-21 
     2       ZZ          9    30 2001-03-21 
     
     3       Q1          9    25 2001-04-30 
     3       Q2          9    25 2001-04-30 
     3       ZZ          9    25 2001-04-30 
    

    Because the Caboose has no check constraints, it never gets filtered. So we end up with:

    • Only the caboose for the first row (the missing January).
    • One regular row (for Q1 plus the caboose that was left for the March row.
    • And April survived for Q1, Q2 and, of course, the caboose.
  6. All that's left to do now is to number each group:
     
    TABLE(SELECT rn, table_id, 
                 rownumber() over(partition by rn) as rank, 
                 product_no, ... 
          FROM Step5) AS Step6 
     
    Step6: 
    RN TABLE_ID RANK PRODUCT_NO SALES DATE 
    -- -------- ---- ---------- ----- ---------- 
     1       ZZ    1          7    20 2001-01-03 
     
     2       Q1    1          9    30 2001-03-21 
     2       ZZ    2          9    30 2001-03-21 
     
     3       Q1    1          9    25 2001-04-30 
     3       Q2    2          9    25 2001-04-30 
     3       ZZ    3          9    25 2001-04-30

  7. Now a simple test on the rank of the caboose tells us what to do:
    • 'ZZ' and 1 means the row was filtered out by the constraints of all tables. This is a "no target" error.
    • 'ZZ' and 2 means the constraints of exactly one table accepted the row. This is success.
    • 'ZZ' and anything greater than 2 means the constraints of more than one table accepted the row. This is an "ambiguous target" error.
     
    TABLE(SELECT rn, ....  
          FROM Step6 
          WHERE 1 = CASE WHEN table_id <> 'ZZ' THEN 1  
                         WHEN rank = 2 THEN 1 
                         WHEN rank = 1  
                           THEN raise_error('no target') 
                     ELSE raise_error('ambiguous target') 
                    END 
    AS Step7

  8. As required for atomicity, encountering one of the two errors rolls back the whole statement and thus all rows are rejected.

Following this test, the caboose rows can be filtered out and the added columns (except for the target table identifiers) can be projected out. Then the inserts occur to the respective tables using the table identifiers.

Now what's the meat of this? Why is this patented algorithm so good?

  • The in-memory table of the target tables is sorted by the SQL compiler when the statement is prepared and can be replicated on each processing node.
  • The numbering of the input can be done without a temporary table and can be local for the processing node.
  • The result of the nested loop join will preserve the input data (outer) as groups with the caboose being last in each group without the need for sorts or temp tables. The nested loop needs to process only data that is local to the node where it executes.
  • Before triggers are supported.
  • Check constraints are processed before the insert really happens, and no static analysis of the partitioning is required.
  • The final numbering in groups (ranking) does not need a sort and can be done locally on each processing node.

In essence, this means that, if in our example each row that passed into the algorithm originated from a different database partition, each partition can execute the complete algorithm in isolation and would not need to communicate with another node before sending the row to be inserted into the target table. This final communication can be hash-directed. Thus the algorithm provides no bottleneck for scalability.

Given that DB2 Version 8 also introduces fully parallelized and pipelined BEFORE trigger processing for common triggers, this algorithm simply rocks :-).


Potential improvements

What happens when we try to fix a typo in the specified date, which acts as a "partitioning key," after it has been inserted? This can cause a constraint violation error because DB2 can only update the row within the table.

             
	UPDATE FY  
	SET date = '2001-07-11' 
	WHERE product_no = 3 AND date = '2001-04-11'; 
 
	SQL0545N  The requested operation is not allowed because a row does  
	not satisfy the check constraint "SRIELAU.Q2.Q2_CHK_DATE".  
	SQLSTATE=23513 

Instead, the row needs to be deleted and reinserted:

 
	DELETE FROM FY  
	WHERE product_no = 3 AND date = '2001-04-11'; 
	 
	INSERT INTO FY VALUES (3, 10, '2001-07-11'); 

It seems that although DB2 Version 8 has taken a big step forward with updatability of UNION ALL views, we can envision further improvements:

  • Just as DB2 introduced updates to the partitioning key for tables in a multiple database partition group in DB2 Version 7, it may be expected that someday UNION ALL views will support updates that cause rows to migrate from one base table to another.

    For example, the following statement would cause a row to be moved from the Q2 table to the Q3 table. This means the row will be deleted from Q2 and then the updated row will be inserted into FY using the insert algorithm described in this article.

     
    	UPDATE FY  
    	SET date = '2001-07-11' 
    	WHERE product_no = 3 AND date = '2001-04-11';

  • Furthermore, it would be desirable to generalize insert through UNION ALL to exploit predicates in the view's body to dispatch the rows in addition or instead of check constraints. The view above would then look like this:
     
    CREATE VIEW FY AS 
      SELECT product_no, sales, date FROM Q1 
        WHERE month(date) IN (1, 2, 3) 
    UNION ALL 
      SELECT product_no, sales, date FROM Q2  
        WHERE month(date) IN (4, 5, 6) 
    UNION ALL 
      SELECT product_no, sales, date FROM Q3 
        WHERE month(date) IN (7, 8, 9) 
    UNION ALL 
      SELECT product_no, sales, date FROM Q4  
        WHERE month(date) IN (10, 11, 12) 
    WITH CASCADED CHECK OPTIONS; 
    

    Note that the check option clause above ensures that every row that is inserted into a view can also be selected back using the same view. Hence these kinds of views are called symmetric. If one accepts the semantics of insert through UNION ALL views as described in this article, here is a conceivable refinement:

    A row can be inserted into a UNION ALL view if the row is accepted by exactly one branch of the UNION ALL view.


Conclusion

In this paper we discussed the cases in which UNION ALL views are useful and what it means to update a view which is based on multiple base tables. A scalable algorithm was presented that addresses inserts into UNION ALL views. Insertable UNION ALL views are supported in DB2 UDB V8 for Linux, UNIX, and Windows to help you preserve full encapsulation of the underlying data for all SQL data manipulation statements.


About the author

Photo: Serge Rielau

Serge Rielau started his life in DB2 six years ago through a year-long internship at the Almaden Research Center where he worked on inline SQL PL and structured types. In the following five years he has been responsible for various SQL enhancements; for example, typed views and generated columns. As a technical manager in the SQL Query Compiler he now drives language changes into DB2 as well as the SQL standard. He can be reached through comp.databases.ibm-db2 or at srielau@ca.ibm.com .

Comments



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=13829
ArticleTitle=Updatable UNION ALL Views in DB2 Universal Database Version 8
publish-date=09172002
author1-email=srielau@ca.ibm.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers