Partitioning UNION ALL views are commonly used to simplify data maintenance and to overcome scalability limits of single tables. This article describes how DB2® Universal DatabaseTM Version 8 for Linux, UNIX® and Windows® enables you to insert, update, and delete through views defined with UNION ALL. It explains in detail the patented algorithm used for insert processing.
The many uses of UNION ALL views
Support for views based on UNION ALL has been available in DB2 since Version 2. Since then, such views have become fairly common. Most uses of UNION ALL views fall into one of the following categories:
- Unification of different, but semantically related tables.
These kind of constructions are often found as relational mappings of object-oriented hierarchies and represent a horizontal partitioning of the data; that is, each specific "class" is represented by one table.
A typical example would be a hierarchy composed of the classes person, employee, and manager in which person, employee and manager are each separate relational tables. To create a view of all persons, a UNION ALL view can be used to unify the tables.
- Unification of like tables.
Constructions of UNION ALL over like tables are often used to hide a range partitioning of the overall data set. Each partition is a separate table, which simplifies maintenance, increases size limits, and, in some cases can even improve query performance against the overall data.
A typical example is to partition a year's worth of sales data into four tables, each containing the data for one quarter.
- Integration of federated data sources.
Sometimes data that must be unified is spread across local and remote databases. In this case UNION ALL is the only option to obtain a transparent view of all the data other than replicating the data locally.
In DB2 remote objects are represented as nicknames and views can be constructed to relate them to local tables or other nicknames. See, for example, Building Federated Systems with Relational Connect and Database Views (PDF format).
As a running example, let's unify like tables as discussed above.
- We need to first create one of the tables:
CREATE TABLE Q1(product_no INT, sales INT, date DATE);
- Now derive the other quarterly tables from it:
CREATE TABLE Q2 LIKE Q1; CREATE TABLE Q3 LIKE Q1; CREATE TABLE Q4 LIKE Q1;
- Each table must only hold data for its quarter. Table Q1 holds January (1) through March (3), table Q2 holds April (4) through June (6), and so on. Let's add the appropriate constraints.
ALTER TABLE Q1 ADD CONSTRAINT Q1_CHK_DATE CHECK (MONTH(date) IN (1, 2, 3)); ALTER TABLE Q2 ADD CONSTRAINT Q2_CHK_DATE CHECK (MONTH(date) IN (4, 5, 6)); ALTER TABLE Q3 ADD CONSTRAINT Q3_CHK_DATE CHECK (MONTH(date) IN (7, 8, 9)); ALTER TABLE Q4 ADD CONSTRAINT Q4_CHK_DATE CHECK (MONTH(date) IN (10, 11, 12));
- We are now ready to feed some "sales data" to the tables:
INSERT INTO Q1 VALUES (5, 6, '2001-01-02'), (8, 100, '2001-02-28'); INSERT INTO Q2 VALUES (3, 10, '2001-04-11'), (5, 15, '2001-05-19'); INSERT INTO Q3 VALUES (1, 12, '2001-08-27'); INSERT INTO Q4 VALUES (3, 14, '2001-12-29'), (2, 21, '2001-12-12');
- Finally, create the view for the full year's data by unifying all quarters:
CREATE VIEW FY AS SELECT product_no, sales, date FROM Q1 UNION ALL SELECT product_no, sales, date FROM Q2 UNION ALL SELECT product_no, sales, date FROM Q3 UNION ALL SELECT product_no, sales, date FROM Q4;
Here is what the content of the view looks like:
SELECT * FROM FY ORDER BY date, product_no; PRODUCT_NO SALES DATE ----------- ----------- ---------- 5 6 01/02/2001 8 100 02/28/2001 3 10 04/11/2001 5 15 05/19/2001 1 12 08/27/2001 2 21 12/12/2001 3 14 12/29/2001 7 record(s) selected.
Because UNION ALL views are so common, DB2 has specific rules targeted to optimize queries on union all to improve performance. For details, refer to Partitioning in DB2 Using UNION ALL views (PDF format).
For example, the following statement selects only from the Q1 and Q2 tables because DB2's theorem prover can use the check constraints on the tables to prove Q3 and Q4 are not affected:
SELECT sum(sales) AS total FROM FY
WHERE MONTH(date) BETWEEN 3 AND 5;
TOTAL
-----------
25
1 record(s) selected.
|
Updating and deleting from the FY view is straightforward because DB2 knows from which table the row to be updated or deleted originated and therefore can modify it easily.
The following updates a row in Q3:
UPDATE FY SET sales = 20 WHERE product_no = 1 AND date = '2001-08-27'; |
That same row can also be deleted from Q3:
DELETE FROM FY WHERE product_no = 1 AND date = '2001-08-27'; |
To reinsert the deleted row:
INSERT INTO FY VALUES (1, 20, '2001-08-27'); |
Hang on! At the beginning of the article we inserted into the base tables. Why? Before DB2 Version 8, it was not possible to insert through a UNION ALL view. Thus, the application needed to be aware of the makeup of the view and insert into the appropriate base table.
New and very neat logic now enables DB2 to accept inserts against a UNION ALL view:
INSERT INTO FY VALUES (1, 20, '2001-06-03'),
(2, 30, '2001-03-21'),
(2, 25, '2001-08-30');
|
Semantically, DB2 attempts to insert any given row into each of the tables underlying the UNION ALL view. It then counts how many tables accepted the row:
- If exactly one table accepts the row, the insert is accepted.
- If no table accepts the row, a "no target" error is raised.
- If more than one table accepts the row, then an "ambiguous target" error is raised.
In the view above, there would always be exactly one table because the constraints on each quarterly table cover all of the valid month values.
Consider what would happen if Q1_CHK_DATE had a typo in it which would enable April data, not January data, into the Q1 base table:
ALTER TABLE Q1 DROP CONSTRAINT Q1_CHK_DATE; ALTER TABLE Q1 ADD CONSTRAINT Q1_CHK_DATE CHECK (MONTH(date) IN (4, 2, 3)); |
Now it is not possible to insert January data because there is no target table that can accept the data.
INSERT INTO FY VALUES (5, 35, '2001-01-14'); SQL20154N The requested insert operation into view "SRIELAU.FY" is not allowed because no target table can be determined for a row. Reason code = "1". SQLSTATE=23513 |
Inserting April data, on the other hand, causes a conflict because both Q1 and Q2 could accept the row.
INSERT INTO FY VALUES (3, 30, '2001-04-21'); SQL20154N The requested insert operation into view "SRIELAU.FY" is not allowed because no target table can be determined for a row. Reason code = "2". SQLSTATE=23513 |
Note that the defined constraints didn't include explicit handling of a null date. Because constraints only reject what is false, all tables would permit a row with a null date, thereby generating an "ambiguous target" error.
While in the cases above the problem is clearly a bad constraint and therefore improper partitioning, not all UNION ALL views are partitioning, and thus it is important that DB2 not be picky about the makeup of the constraints. What matters is not a partitioning of the base table's data, but the partitioning of the input data given the constraints of the base tables.
But how does insert into a UNION ALL view really work? Isn't it expensive to insert into each table? What about a system with multiple database partitions? Can this feature scale?
Let's take a detailed look inside the works for inserting through a UNION ALL works. To aid in understanding, each step is explained through SQL. However DB2's internal expressiveness is a superset of what SQL can express, so the translation can only be approximate.
But before we start, let's add some interesting logic by declaring that Q1 knows product number 9 as 6. A simple BEFORE trigger will do the trick:
CREATE TRIGGER I_Q1_PNO NO CASCADE BEFORE INSERT ON Q1
REFERENCING NEW AS n FOR EACH ROW MODE DB2SQL
SET n.product_no =
CASE WHEN n.product_no = 9 THEN 6
ELSE n.product_no END; |
Further we define that null dates are interpreted to mean the current date. Here, too, we use a BEFORE trigger:
CREATE TRIGGER I_Q1_NULL NO CASCADE BEFORE INSERT ON Q1
REFERENCING NEW AS n FOR EACH ROW MODE DB2SQL
SET n.date
= COALESCE(n.date, CURRENT DATE);
CREATE TRIGGER I_Q2_NULL NO CASCADE BEFORE INSERT ON Q2
REFERENCING NEW AS n FOR EACH ROW MODE DB2SQL
SET n.date
= COALESCE(n.date, CURRENT DATE);
CREATE TRIGGER I_Q3_NULL NO CASCADE BEFORE INSERT ON Q3
REFERENCING NEW AS n FOR EACH ROW MODE DB2SQL
SET n.date
= COALESCE(n.date, CURRENT DATE);
CREATE TRIGGER I_Q4_NULL NO CASCADE BEFORE INSERT ON Q4
REFERENCING NEW AS n FOR EACH ROW MODE DB2SQL
SET n.date
= COALESCE(n.date, CURRENT DATE); |
Now assume that we want to insert the following three rows into FY:
INSERT INTO FY VALUES (7, 20, NULL),
(9, 30, '2001-03-21'),
(9, 25, '2001-04-30');
Input:
PRODUCT_NO SALES DATE
---------- ----- ----------
7 20 -
9 30 2001-03-21
9 25 2001-04-30
|
Now, let's see what DB2 does.
- First, the algorithm (shown in Figure 1) identifies each row uniquely:
TABLE(SELECT ROWNUMBER() OVER() AS RN, product_no, sales, date FROM Input) AS Step1 Step1: RN PRODUCT_NO SALES DATE -- ---------- ----- ---------- 1 7 20 - 2 9 30 2001-03-21 3 9 25 2001-04-30 - The DB2 compiler builds an in-memory table, which contains all the possible target table identifiers plus one row which we call the representative row (or, lovingly, "caboose" because it always trails the regular table identifiers).
TABLE(VALUES ('Q1') ('Q2') ('Q3') ('Q4') ('ZZ') /* Caboose */ ) AS Step2(table_id) Step2: TABLE_ID -------- Q1 Q2 Q3 Q4 ZZ
Note that the Caboose is always there.
Figure 1. DB2's algorithm for inserting through UNION ALL views
- Now we build a cross-product of the input and the in-memory table by using a nested loop join, which preserves the order of the table and gives us nice groups:
TABLE(SELECT rn, table_id, product_no, sales, date FROM Step1 INNER JOIN Step2) AS Step3 Step3: RN TABLE_ID PRODUCT_NO SALES DATE -- -------- ---------- ----- ---------- 1 Q1 7 20 - 1 Q2 7 20 - 1 Q3 7 20 - 1 Q4 7 20 - 1 ZZ 7 20 - 2 Q1 9 30 2001-03-21 2 Q2 9 30 2001-03-21 2 Q3 9 30 2001-03-21 2 Q4 9 30 2001-03-21 2 ZZ 9 30 2001-03-21 3 Q1 9 25 2001-04-30 3 Q2 9 25 2001-04-30 3 Q3 9 25 2001-04-30 3 Q4 9 25 2001-04-30 3 ZZ 9 25 2001-04-30
Why did we replicate the rows? Very simply, it's because SQL requires that BEFORE triggers fire before check constraints are executed. Because BEFORE triggers on different tables can perform different modifications on the new transition variables, dedicated copies must be created for each potential target table.
- In our example, after triggers I_Q1_PNO, and I_Q1_NULL through I_Q4_NULL have fired, the intermediate result looks like this (assuming the CURRENT DATE is January 17, 2002):
Step4: RN TABLE_ID PRODUCT_NO SALES DATE -- -------- ---------- ----- ---------- 1 Q1 7 20 2002-01-17 1 Q2 7 20 2002-01-17 1 Q3 7 20 2002-01-17 1 Q4 7 20 2002-01-17 1 ZZ 7 20 - 2 Q1 6 30 2001-03-21 2 Q2 9 30 2001-03-21 2 Q3 9 30 2001-03-21 2 Q4 9 30 2001-03-21 2 ZZ 9 30 2001-03-21 3 Q1 9 25 2001-04-30 3 Q2 9 25 2001-04-30 3 Q3 9 25 2001-04-30 3 Q4 9 25 2001-04-30 3 ZZ 9 25 2001-04-30
Note: Because of the way the semantics are defined, before triggers for each table have to fire even if a given row will ultimately not be placed into the subject table of the trigger. Before triggers with side-effects such as sending e-mail or executing a NEXTVAL sequence-expression will thus fire more often than was perhaps anticipated by the definer of the trigger.
- Now that the before triggers have been taken into account, the check constraints are used to filter out all violating rows. To keep things interesting, let's assume we are still using the faulty check constraint of Q1_CHK_DATE that uses the months (4, 2, 3) instead of (1, 2, 3):
TABLE(SELECT rn, table_id, product_no, ... FROM Step4 WHERE (table_id = 'Q1' AND month(date) IN (4, 2, 3)) OR (table_id = 'Q2' AND month(date) IN (4, 5, 6)) OR (table_id = 'Q3' AND month(date) IN (7, 8, 9)) OR (table_id = 'Q4' AND month(date) IN (10, 11, 12)) OR (table_id = 'ZZ')) AS Step5 Step5: RN TABLE_ID PRODUCT_NO SALES DATE -- -------- ---------- ----- ---------- 1 ZZ 7 20 - 2 Q1 6 30 2001-03-21 2 ZZ 9 30 2001-03-21 3 Q1 9 25 2001-04-30 3 Q2 9 25 2001-04-30 3 ZZ 9 25 2001-04-30
Because the Caboose has no check constraints, it never gets filtered. So we end up with:
- Only the caboose for the first row (the missing January).
- One regular row (for Q1 plus the caboose that was left for the March row.
- And April survived for Q1, Q2 and, of course, the caboose.
- All that's left to do now is to number each group:
TABLE(SELECT rn, table_id, rownumber() over(partition by rn) as rank, product_no, ... FROM Step5) AS Step6 Step6: RN TABLE_ID RANK PRODUCT_NO SALES DATE -- -------- ---- ---------- ----- ---------- 1 ZZ 1 7 20 2001-01-03 2 Q1 1 9 30 2001-03-21 2 ZZ 2 9 30 2001-03-21 3 Q1 1 9 25 2001-04-30 3 Q2 2 9 25 2001-04-30 3 ZZ 3 9 25 2001-04-30
- Now a simple test on the rank of the caboose tells us what to do:
- 'ZZ' and 1 means the row was filtered out by the constraints of all tables. This is a "no target" error.
- 'ZZ' and 2 means the constraints of exactly one table accepted the row. This is success.
- 'ZZ' and anything greater than 2 means the constraints of more than one table accepted the row. This is an "ambiguous target" error.
TABLE(SELECT rn, .... FROM Step6 WHERE 1 = CASE WHEN table_id <> 'ZZ' THEN 1 WHEN rank = 2 THEN 1 WHEN rank = 1 THEN raise_error('no target') ELSE raise_error('ambiguous target') END AS Step7 - As required for atomicity, encountering one of the two errors rolls back the whole statement and thus all rows are rejected.
Following this test, the caboose rows can be filtered out and the added columns (except for the target table identifiers) can be projected out. Then the inserts occur to the respective tables using the table identifiers.
Now what's the meat of this? Why is this patented algorithm so good?
- The in-memory table of the target tables is sorted by the SQL compiler when the statement is prepared and can be replicated on each processing node.
- The numbering of the input can be done without a temporary table and can be local for the processing node.
- The result of the nested loop join will preserve the input data (outer) as groups with the caboose being last in each group without the need for sorts or temp tables. The nested loop needs to process only data that is local to the node where it executes.
- Before triggers are supported.
- Check constraints are processed before the insert really happens, and no static analysis of the partitioning is required.
- The final numbering in groups (ranking) does not need a sort and can be done locally on each processing node.
In essence, this means that, if in our example each row that passed into the algorithm originated from a different database partition, each partition can execute the complete algorithm in isolation and would not need to communicate with another node before sending the row to be inserted into the target table. This final communication can be hash-directed. Thus the algorithm provides no bottleneck for scalability.
Given that DB2 Version 8 also introduces fully parallelized and pipelined BEFORE trigger processing for common triggers, this algorithm simply rocks :-).
What happens when we try to fix a typo in the specified date, which acts as a "partitioning key," after it has been inserted? This can cause a constraint violation error because DB2 can only update the row within the table.
UPDATE FY SET date = '2001-07-11' WHERE product_no = 3 AND date = '2001-04-11'; SQL0545N The requested operation is not allowed because a row does not satisfy the check constraint "SRIELAU.Q2.Q2_CHK_DATE". SQLSTATE=23513 |
Instead, the row needs to be deleted and reinserted:
DELETE FROM FY WHERE product_no = 3 AND date = '2001-04-11'; INSERT INTO FY VALUES (3, 10, '2001-07-11'); |
It seems that although DB2 Version 8 has taken a big step forward with updatability of UNION ALL views, we can envision further improvements:
- Just as DB2 introduced updates to the partitioning key for tables in a multiple database partition group in DB2 Version 7, it may be expected that someday UNION ALL views will support updates that cause rows to migrate from one base table to another.
For example, the following statement would cause a row to be moved from the Q2 table to the Q3 table. This means the row will be deleted from Q2 and then the updated row will be inserted into FY using the insert algorithm described in this article.
UPDATE FY SET date = '2001-07-11' WHERE product_no = 3 AND date = '2001-04-11';
- Furthermore, it would be desirable to generalize insert through UNION ALL to exploit predicates in the view's body to dispatch the rows in addition or instead of check constraints. The view above would then look like this:
CREATE VIEW FY AS SELECT product_no, sales, date FROM Q1 WHERE month(date) IN (1, 2, 3) UNION ALL SELECT product_no, sales, date FROM Q2 WHERE month(date) IN (4, 5, 6) UNION ALL SELECT product_no, sales, date FROM Q3 WHERE month(date) IN (7, 8, 9) UNION ALL SELECT product_no, sales, date FROM Q4 WHERE month(date) IN (10, 11, 12) WITH CASCADED CHECK OPTIONS;
Note that the check option clause above ensures that every row that is inserted into a view can also be selected back using the same view. Hence these kinds of views are called symmetric. If one accepts the semantics of insert through UNION ALL views as described in this article, here is a conceivable refinement:
A row can be inserted into a UNION ALL view if the row is accepted by exactly one branch of the UNION ALL view.
In this paper we discussed the cases in which UNION ALL views are useful and what it means to update a view which is based on multiple base tables. A scalable algorithm was presented that addresses inserts into UNION ALL views. Insertable UNION ALL views are supported in DB2 UDB V8 for Linux, UNIX, and Windows to help you preserve full encapsulation of the underlying data for all SQL data manipulation statements.

Serge Rielau started his life in DB2 six years ago through a year-long internship at the Almaden Research Center where he worked on inline SQL PL and structured types. In the following five years he has been responsible for various SQL enhancements; for example, typed views and generated columns. As a technical manager in the SQL Query Compiler he now drives language changes into DB2 as well as the SQL standard. He can be reached through comp.databases.ibm-db2 or at srielau@ca.ibm.com .





