Use the MERGE statement in Informix Dynamic Server

This article discusses the merge statement of IBM Informix™ Dynamic Server, which follows the SQL 2003 standard. The merge command is especially useful in data warehouse (ETL) environments where transaction tables are merged into bigger warehouse tables.

Share:

Ravi Pachipala (rpachipa@us.ibm.com), Advisory Software Engineer, IBM

Ravi Pachipala works in the security and SQL components of the Informix server. Besides the Merge feature, Ravi has also worked on LBAC and password API changes for connection manager. He is currently actively developing security and SQL features in the server.



Neeraj Kapoor (niraj@us.ibm.com), Advisory Software Engineer, IBM

Neeraj Kapoor works in the SQL component of the Informix Dynamic Server group. Neeraj has more than 10 years of experience in developing features in SQL and query processing components of Informix Dynamic Server, Informix XPS, and DB2 for Linux, UNIX, and Windows database servers.



Manoj Ghogale (mghogale@in.ibm.com), Staff Software Engineer, IBM

Manoj Ghogale works in Quality Assurance development of the Informix Dynamic Server group. Manoj is currently working on automation framework.



10 February 2011

Also available in Chinese

Introduction

The merge statement is used to update, insert, or delete specific target table rows based on the condition specified in the on clause. The merge statement implemented in IBM Informix Dynamic Server follows the SQL 2003 standard. Merge is especially useful in data warehouse (ETL) environments where transaction tables are merged with bigger warehouse tables. Merge also replaces the client-side logic known as upsert that performs conditional updates and inserts. The merge statement improves usage and performance because the entire operation is performed from the server with a single statement.


Introducing user scenarios

The scenarios in this article illustrate situations where you might use the merge statement.

Scenario 1

Consider a situation where data from your transactional system must be moved into the warehouse. The schema of a data warehouse is not identical to the transactional schema. This requires that the data from the transactional system be transformed before being loading into the warehouse table. New records from the transactional system must be inserted into the warehouse table. Existing records are updated after going through the correct transformation.

What is the best possible way to accomplish this task? Many ETL tools exist that extract and transform data before loading into the warehouse. However, the transformation and cleaning process can be accomplished more easily with the merge SQL statement.

For example, assume you have a table called customer_trans in your transactional database. When this table is loaded into the target warehouse table customer, which is in the stores_demo database, the target table might have more or fewer columns than the customer_trans table. Even if the columns match, there could be some semantic differences between identical column names and types. Listing 1 shows an example merge statement for this scenario.

Listing 1. An example of a merge statement
MERGE INTO customer t
USING customer_trans s
ON s.customer_num=t.customer_num
WHEN MATCHED THEN
UPDATE SET t.address1 = s.address1 AND t.address2 = s.address2 AND 
t.city=s.city AND t.zipcode=s.zipcode AND t.phone=s.phone
WHEN NOT MATCHED THEN
INSERT (t.fname, t.lname, t.company, t.address1, t.address2, t.city, 
t.zipcode, t.phone) VALUES
 (s.fname, s.lname, s.company, s.address1, s.address2, s.city, 
 s.zipcode, s.phone);

Scenario 2

Consider another situation where, during application development, you want to update rows of a particular table if the condition is satisfied and you want to insert rows into the same table when the condition is not met. You can implement this kind of logic in the client application or as a stored procedure. Implementing the logic in the client application has a negative impact on performance. Implementing the logic in a stored procedure has a development and maintenance cost, because stored-procedure language support varies among different vendors. Using the merge statement in this scenario simplifies development and improves performance.

From an SQL perspective, the merge statement is a DML SQL statement with the same semantics as other DML statements, including insert, update, and delete. Although the merge statement performs conditional inserts, updates, and deletes, it is atomic in nature, which means either the entire merge operation is applied or none is applied, when transaction logging is enabled. There are a few restrictions with the usage of a merge statement, which are outlined in this article.

Conceptually, the merge function joins source table rows with target table rows by applying the on clause filter. All the joined rows are updated or deleted on the target. The source table rows that are not joined with the target are inserted into the target. Also, in a merge statement, a source table need not be a table. It can be a complex query or a view, as described in this article.


Understanding the syntax

Listing 2 shows a typical usage example of the merge statement using the stores_demo database. Use the transaction table customer_trans to merge rows into the fact table customer, updating the customers’ phone numbers if there are already records in the customer table, else insert the new customer record into the customer table.

Listing 2. Merge usage
CREATE TABLE customer_trans(
        customer_num            serial(101),
        fname                   char(15),
        lname                   char(15),
        company                 char(20),
        address1                char(20),
        address2                char(20),
        city                    char(15),
        state                   char(2),
        zipcode                 char(5),
        phone                   char(18)
        );


INSERT INTO customer_trans SELECT * FROM CUSTOMER WHERE 
customer_num = 101;

UPDATE customer_trans SET phone = '408-222-2345' WHERE 
customer_num = 101;

INSERT INTO customer_trans VALUES (129,'Mike','Plaza',
'Sporting Goods','430 Walnut Dr',NULL, 'Fremont','CA','94538',
'510-236-8892');

MERGE INTO customer as C
USING customer_trans as CT
ON C.customer_num = CT.customer_num
WHEN MATCHED THEN UPDATE SET C.phone = CT.phone
WHEN NOT MATCHED THEN INSERT VALUES (CT.customer_num, CT.fname, 
CT.lname, CT.company, CT.address1, CT.address2, CT.city, 
CT.state, CT.zipcode, CT.phone)

Listing 2 merges two rows, one record is updated, and one record is inserted. You can confirm that with the select statement in Listing 3.

Listing 3. Select statement usage
SELECT * FROM customer WHERE customer_num IN (101, 129);

customer_num  101
fname         Ludwig
lname         Pauli
company       All Sports Supplies
address1      213 Erstwild Court
address2
city          Sunnyvale
state         CA
zipcode       94086
phone         408-222-2345

customer_num  129
fname         Mike
lname         Plaza
company       Sporting Goods
address1      430 Walnut Dr
address2
city          Fremont
state         CA
zipcode       94538
phone         510-236-8892

2 row(s) retrieved.

Listing 4 shows the usage of the merge statement where the matched rows are deleted from the target table and the non-matched rows are inserted. The delete and update commands are mutually exclusive for the when matched clause, which means that only one of them can be specified in a merge statement at a time.

Listing 4. Merge delete
MERGE INTO customer as C
USING customer_trans as CT
ON C.customer_num != CT.customer_num
WHEN MATCHED THEN  DELETE
WHEN NOT MATCHED THEN INSERT VALUES (CT.customer_num, CT.fname, 
CT.lname, CT.company, CT.address1, CT.address2, CT.city, 
CT.state, CT.zipcode, CT.phone);

Listing 5 shows other usages of the merge statement where the WHEN MATCHED and WHEN NOT MATCHED clauses are optionally used. When one of these clauses is not specified, that part of the merge is ignored. In the first merge statement, all the non-matching rows would be ignored and matched rows would be updated in the sale table. Similarly, in the second merge statement, only non-matched rows would be inserted into the customer table, and all the matched rows would be ignored.

Listing 5. Exclusive update, insert, or delete
MERGE INTO customer AS C
USING customer_trans AS CT
ON C.customer_num = CT.customer_num
WHEN MATCHED THEN UPDATE SET C.phone = CT.phone;

MERGE INTO customer as C
USING customer_trans as CT
ON C.customer_num = CT.customer_num
WHEN NOT MATCHED THEN INSERT VALUES (CT.customer_num, CT.fname, 
CT.lname, CT.company, CT.address1, CT.address2, CT.city, CT.state,
CT.zipcode, CT.phone);

The Informix implementations of the merge statement allow the flexibility of ordering the WHEN MATCHED and WHEN NOT MATCHED clauses, as shown in Listing 6.

Listing 6. Flexible ordering of update, insert, and delete clauses
MERGE INTO customer as C
USING customer_trans as CT
ON C.customer_num = CT.customer_num
WHEN NOT MATCHED THEN INSERT VALUES (CT.customer_num, CT.fname, 
CT.lname, CT.company, CT.address1, CT.address2, CT.city, CT.state,
CT.zipcode, CT.phone)
WHEN MATCHED THEN UPDATE SET C.phone = CT.phone;

Learning merge rules

The merge statement enforces the following consistency checks that are based on relational principles.

  • No row in the target table should be updated or deleted more than once.
  • Inserted rows cannot be updated or deleted.

It is important to realize that the merge statement imposes certain inherent control over the data that is being merged. If a row has already been updated or deleted, then this row cannot be joined again with any source table row for another update or delete again. If this situation cannot be avoided, Informix throws a -26095 error message. For example, if a duplicate source table row matches exactly with the same target table row that has been updated, you will see the error message. Similarly, inserted rows in the target table should not be again considered to join with source table rows.

Figure 1 shows an example where the source and target table rows are merged. When Informix merges source table customer_trans into target table customer, the first row in the target table gets qualified for update because it matches a source row with the same value 10 in column customer_num. There is another row in the source table with a value of 10 for the customer_num column, which joins with the same row in the target table. Because the target row has already been updated once, Informix throws the -26095 error when this situation occurs.

Figure 1. Merge rules
Customer_trans rows with 10 value map to Customer_num row with 10, plus code in Listing 7

In a merge statement, the update operation obeys update statement rules and the insert operation obeys insert statement rules.


Understanding the internal processing

The merge statement first does a join (outer join or inner join, depending on the merge statement) between source and target table rows. The join condition is the on clause filter specified in the merge statement.

It is easier to describe this in detail using the explain output shown in Listing 7. Sequential scan is done on the source table, and index look-up is done on the target table. Also, the nested loop join method is used to join the source and target tables. Note that the join being performed is an outer join. Both the matched and non-matched rows are fed into the upsert iterator. The upsert iterator is the one that performs the actual updates or inserts and deletes.

Listing 7 also shows that temporary files are used to store the intermediate join result. In certain cases, as determined by the optimizer, the intermediate temp table is not required, which is reflected in the plan.

Listing 7. Sample explain output
QUERY: (OPTIMIZATION TIMESTAMP: 05-05-2010 16:32:39)
------
MERGE INTO customer as C
USING customer_trans as CT
ON C.customer_num = CT.customer_num
WHEN MATCHED THEN UPDATE SET C.phone = CT.phone
WHEN NOT MATCHED THEN INSERT VALUES (CT.customer_num, CT.fname, 
CT.lname, CT.company, CT.address1, CT.address2, CT.city, 
CT.state, CT.zipcode, CT.phone)


Estimated Cost: 4
Estimated # of Rows Returned: 1
Temporary Files Required For: MERGE

  1) informix.ct: SEQUENTIAL SCAN

  2) informix.c: INDEX PATH

    (1) Index Name: rcpachip. 100_1
        Index Keys: customer_num   (Serial, fragments: ALL)
        Lower Index Filter: 
informix.c.customer_num = informix.ct.customer_num NESTED LOOP JOIN

For the merge statement that contains both insert and update clauses, the outer join is performed between source and target tables with the target table null extended. Then the upsert iterator updates the rows that have joined and inserts the rows that are null extended into target table based on insert clause specification.

For the merge statement that contains only the update clause or the delete clause, the inner join is performed between source and target table. Matched rows in the target table are updated or deleted based on update or delete clause specifications, as shown in Figure 2.

Figure 2. Processing of the merge statements
Target and source tables feed into Join, which feeds into Upsert, and cycles through again

The on clause condition is applied entirely at the join level. The filters in the on clause can contain complex queries or simple expressions.

Under certain conditions, internal temporary tables are created during the execution of the merge statement, as shown in Listing 7.

The merge statement can also be embedded inside a stored procedure or inside a C or Java® UDR. The procedure can be invoked to perform the merge. The merge statement can also be executed on an updateable secondary node in a MACH-11 cluster environment.

For different isolation levels, the merge statement follows the same semantics as an update or insert statement. Update locks are held on the target table rows. These locks are converted to exclusive locks while performing the actual update. When an error occurs during merge processing, the entire transaction is rolled back.


Exploring constraints and violations

Merge statements follow the constraints on target or source tables. The behavior is similar to insert, update, or delete statements. Therefore a merge into a target table with a foreign key defined to another table’s primary key and having no corresponding data yields the error 691: Missing key in referenced table for referential constraint. If the target table has violations defined, the violation table and diagnostic table hold the non-conforming rows that fail to satisfy constraints. The tables also hold the unique indexes during insert or update operations on the target table caused by the merge statement, as shown in Listing 8.

Listing 8. Merge statements with constraints and violations
     1  DATABASE stores_demo;
     2  CREATE TABLE stock_br1
     3          (
     4          stock_num               smallint,
     5          manu_code               char(3),
     6          description             char(15),
     7          unit_price              money(6),
     8          unit                    char(4),
     9          unit_descr              char(15),
    10         primary key (stock_num, manu_code)
    11           );
    12
    13  ALTER TABLE stock ADD CONSTRAINT CHECK (unit_price > 1) 
    14         CONSTRAINT  chk_uprice FILTERING;
    15
    16  START VIOLATIONS TABLE FOR stock;
    17
    18  INSERT INTO stock_br1 VALUES(414,"HRO","football shoes",
    19                          '$1',"each","case");
    20
    21  MERGE INTO stock T
    22  USING (SELECT * FROM stock_br1) AS S
    23  ON(T.stock_num=S.stock_num) 
    24  WHEN NOT MATCHED THEN
    25  INSERT(T.stock_num,T.manu_code,T.description,T.unit_price,
    26         T.unit,T.unit_descr)
    27  VALUES(S.stock_num,S.manu_code,S.description,S.unit_price,
    28         S.unit,S.unit_descr);
    29
    30  SELECT * FROM stock_vio;
    31  SELECT * FROM stock_dia;
    32
    33  UPDATE stock_br1 SET unit_price='$100' WHERE unit_price='$1';
    34
    35  MERGE INTO stock T
    36  USING (SELECT * FROM stock_br1) AS S
    37  ON(T.stock_num=S.stock_num)
    38  WHEN NOT MATCHED THEN
    39  INSERT(T.stock_num,T.manu_code,T.description,T.unit_price,
    40         T.unit,T.unit_descr)
    41  VALUES(S.stock_num,S.manu_code,S.description,S.unit_price,
    42         S.unit,S.unit_descr);
    43
    44	SELECT * FROM stock WHERE stock_num=414;
    45	STOP VIOLATIONS TABLE FOR stock;

The code in Listing 8 creates an identical table stock_br1 as stock. The target table stock is altered and a constraint is added to the unit_price. The violation and diagnostic table is started. When you try to merge at line 21, you get the result 0 rows merged because the check constraint is violated. This is recorded in the violation and diagnostic tables. The merge becomes successful at line 35, because you don’t violate any constraint there.


Defining triggers

Multiple triggers can be defined on source and target tables. But during the merge operation, only insert, update, and delete triggers are actionable. The select triggers are ignored. The instead of trigger on the target view returns a 26096 error.

Because merge is a combination of insert, update, and delete statements, the insert, update, and delete triggers are activated before and after events. The for-each-row trigger is activated for each row that is inserted, updated, or deleted. Therefore during merge-insert operation, only for-each-row insert triggers on the target table are activated. Similarly, during merge-update and merge-delete operations, only for-each-row update or for-each-row delete triggers on the target table are activated. Listing 9 show the merge statement with triggers.

Listing 9. Merge with triggers
 DATABASE stores_demo;
 CREATE TABLE stock_br1
          (
          stock_num               smallint,
          manu_code               char(3),
          description             char(15),
          unit_price              money(6),
          unit                    char(4),
          unit_descr              char(15),
         primary key (stock_num, manu_code)
           );
  create table tmp_tb(col1 char(50));

  CREATE TRIGGER upd_up UPDATE OF unit_price
    ON stock
  FOR EACH ROW(insert into tmp_tb values('UPDATE FOR EACH ROW'));

  CREATE TRIGGER upsert2 UPDATE OF unit_price
  ON stock
  BEFORE(insert into tmp_tb values('UPDATE BEFORE'))
  AFTER(insert into tmp_tb values('UPDATE AFTER'));

  CREATE TRIGGER ins_up INSERT
  ON stock
  FOR EACH ROW(insert into tmp_tb values('INSERT FOR EACH ROW'));

  CREATE TRIGGER upsert1 INSERT
  ON stock
  BEFORE(insert into tmp_tb values('INSERT BEFORE'))
  AFTER(insert into tmp_tb values('INSERT AFTER'));


  INSERT INTO stock_br1 VALUES(414,"HRO","football shoes",'$100',
         "each","case");

  INSERT INTO stock_br1 VALUES(415,"HRO","football jsey",'$150',
         "each","case");

  MERGE INTO stock T
  USING (SELECT * FROM stock_br1) as S
  ON(T.stock_num=S.stock_num)
  WHEN MATCHED THEN UPDATE SET T.unit_price='$1000'
  WHEN NOT MATCHED THEN
  INSERT(T.stock_num,T.manu_code,T.description,T.unit_price,
         T.unit,T.unit_descr)
  VALUES(S.stock_num,S.manu_code,S.description,S.unit_price,
         S.unit,S.unit_descr);

  SELECT * FROM tmp_tb;

The code in Listing 9 creates a similar source table stock_br1 as Listing 8 did. In Listing 9, four triggers are created on the stock target table for each row as follows:

  • Update
  • Insert before
  • Insert after
  • Insert update

The merge statements insert only two rows in the stock table, but this triggers 3 events: upsert1, upsert2, and ins_up. Even if only inserts are occurring, all update events are also triggered.

Listing 10 shows the output of the SQL statement in the last line.

Listing 10. Output of SQL statement
INSERT BEFORE
UPDATE BEFORE
INSERT FOR EACH ROW
INSERT FOR EACH ROW
INSERT AFTER
UPDATE AFTER

Using views

Views can be a source or target for merge statements, but as with inserts, updates, or deletes, you need the required privileges. You can also use views inside the using clause of the merge statement. The trigger instead of is not supported on views if the target table is a view.

Listing 11 shows the merge statement with views.

Listing 11. The merge statement with views
DATABASE stores_demo;
CREATE VIEW
        custview
          (firstname, lastname, company, city)
    AS
    SELECT fname, lname, company, city
    FROM customer
    WHERE city = 'Redwood City'
    WITH CHECK OPTION;

GRANT DELETE, INSERT, SELECT, UPDATE
    ON custview
    TO PUBLIC;

CREATE TABLE s_view(
    firstname   char(15),
    lastname    char(15),
    company char(20),
    city    char(15)
    );

INSERT INTO s_view VALUES ("Brian","Cahill","NIKE","Redwood City");
INSERT INTO s_view VALUES ("Chris","Hong","ADIDAS","Redwood City");
INSERT INTO s_view VALUES ("Kevin","Mills","REEBOK","Redwood City");
INSERT INTO s_view VALUES ("Anthony","Higgins","Play Ball!",
                           "Redwood City");
INSERT INTO s_view VALUES ("Donald","Quinn","Quinn's Sports",
                           "Redwood City");

MERGE INTO custview T
USING (select * from s_view) as S
ON(T.firstname=S.firstname)
WHEN MATCHED THEN UPDATE
SET company="NIKE"
WHEN NOT MATCHED THEN
INSERT(T.firstname,T.lastname,T.company,T.city)
VALUES(S.firstname,S.lastname,S.company,S.city);

SELECT * from custview;

In Listing 11, you are creating a view and a table from where you want to merge the data. The merge statement updates two rows (for Anthony and Donald, because they match the update condition) and inserts three rows that are not in the target table. The output from running Listing 11 is shown in Listing 12.

Listing 12. The output of running the merge statement with views
firstname       lastname        company              city            

Anthony         Higgins         NIKE                 Redwood City   
Donald          Quinn           NIKE                 Redwood City   
Roy             Jaeger          AA Athletics         Redwood City   
Frank           Albertson       Sporting Place       Redwood City   
Arnold          Sipes           Kids Korner          Redwood City   
Brian           Cahill          NIKE                 Redwood City   
Chris           Hong            ADIDAS               Redwood City   
Kevin           Mills           REEBOK               Redwood City   

8 row(s) retrieved.

Supporting a distributed scenario

The merge statement supports distributed queries with the restriction that the target table cannot be a remote table. You can have distributed queries inside the using clause that access databases on more than one server instance. The merge statement also enables remote table, synonym, and views references in the subqueries of the set clause and in the subqueries used inside the values clause.

On server instance r_server, create the source table and insert data, as shown in Listing 13.

Listing 13. Merge statements in distributed scenarios
DATABASE stores_demo;
CREATE TABLE stock_br1
        (
        stock_num               smallint,
        manu_code               char(3),
        description             char(15),
        unit_price              money(6),
        unit                    char(4),
        unit_descr              char(15),
       primary key (stock_num, manu_code)
         );

INSERT INTO stock_br1 VALUES(414,"HRO","football shoes",
'$100',"each","case");

On the server with the target table, run the merge statements, as shown in Listing 14.

Listing 14. Running the merge statements in distributed scenarios
DATABASE stores_demo;
MERGE INTO stock T
USING (SELECT * FROM r_server@stores_demo:stock_br1) as S
ON(T.stock_num=S.stock_num)
WHEN NOT MATCHED THEN
INSERT(T.stock_num,T.manu_code,T.description,
T.unit_price,T.unit,T.unit_descr)
VALUES(S.stock_num,S.manu_code,S.description,
S.unit_price,S.unit,S.unit_descr);

The merge statement in Listing 14 inserts data from the remote table as a source table.


Handling label-based access control

The merge statement can be applied on label-based access control (LBAC) tables. LBAC write access rules are enforced while merging into the target table. LBAC read access rules are enforced while reading from both source and target tables. Following are different ways the merge statement can be used on LBAC tables.

Both source and target tables are protected by the same security policy.
If the IDSSECURITYLABEL column is not specified in the insert column list of the merge table, the user’s label is implicitly inserted. If the IDSSECURITYLABEL column is one of the columns in the insert column list, an explicit label is inserted into the target table.

As LBAC rules are enforced, the user performing the merge must have the appropriate label, otherwise the user must be granted exemptions so that rows of all labels can be successfully updated or inserted.

The source table is protected, and the target table is unprotected.
Rows that go into the target lose their protection, because the target table is unprotected. LBAC read access rules are enforced while reading from source table, however.
The target table is protected, and the source table is unprotected.
A default label can be specified for all rows inserted into the target table. The user performing the merge must have exemptions granted or the appropriate label granted to perform the merge successfully.

A label can be inserted explicitly or implicitly. If the user performing the merge is granted exemption on LBAC write access rules, the entire merge operation is likely to succeed because the user has exemption to modify and insert rows at any label. When the user has exemptions, rows inserted into the target table can have implicit or explicit labels. If the label is not explicitly specified in the insert column value list, the row inserted has the label of the user.

On the other hand, if the user doesn’t have any exemptions but labels are granted, the success of the merge operation depends on which rows from source table qualified to join with target table rows and whether any updated or inserted rows are at the user’s label. If some rows have labels that are different than the user’s labels and qualify for an update, the merge operation might fail due to lack of appropriate LBAC credentials. Listing 15 demonstrates this.

In Listing 15, c1 is an IDSSECURITYLABEL column. The user has exemptions as described above, he has no label granted. All inserted rows will have the same label as the source table rows.

Listing 15. Merge on LBAC tables and columns
MERGE INTO target t
USING source s
ON s.c2=t.c2
WHEN MATCHED THEN
UPDATE SET t.c3="updated"
WHEN NOT MATCHED THEN
INSERT (t.c1, t.c2, t.c3) VALUES (s.c1, s.c2, "inserted");

If the user doesn’t have the exemptions, Informix returns an error when Listing 15 is executed, as shown in Listing 16.

Listing 16. Error message for missing LBAC credentials
MERGE INTO target t
USING source s
on s.c2=t.c2
WHEN MATCHED THEN
UPDATE set t.c3="updated"
WHEN NOT MATCHED THEN
INSERT (t.c1, t.c2, t.c3) values (s.c1, s.c2, "inserted");
 8249: User does not have the LBAC credentials 
to perform UPDATE on table (sqlqa.target).
Error in line 12
Near character position 55

Consider another example where the rows in the source and target tables are engineer and manager labels. Assume that the manager label dominates the engineer label. If the user performing the merge has an engineer label, the only qualified rows for insert or update will be at the engineer label level, and the merge operation succeeds. If the user performing the merge has a manager label, a few qualified rows will be at manager label level and a few other rows will be at the engineer label level. Clearly a user at manager level cannot modify rows at engineer level, and the merge operation fails.


Auditing the merge statement

There is no separate audit mnemonic to audit the merge statement. But if you set up the audit mask for insert, update, and delete events (INRW, UPRW, and DLRW), those events get logged when the merge statement is executed.


Specifying directives

All directives specified in the merge statement are applied to the join between source and target tables. The explain directive is one exception that is applied to the merge statement in general. Some of the directives that are applicable include the following:

  • USE_NL
  • USE_HASH
  • AVOID_HASH
  • FULL
  • AVOID_FULL
  • INDEX
  • ORDERED

The noexecute directive can be specified to get the explain output without actually executing the merge statement. When a directive cannot be applied, the appropriate message appears in the explain output.

The Informix optimizer might not honor some of the directives that you specify. Most commonly, when both update and insert clauses are specified in a merge statement, an outer join is performed between the source and target tables. Similarly, if only an insert clause is specified, an outer join is performed between source and target tables. In this outer join, the source table is the dominant table whose rows are preserved. The target table, which is null extended, is the subservient table. Informix doesn’t allow the subservient table in an outer join to be on the probe side of a hash-join or on the outer side of a nested loop join. In Listing 17, the optimizer does not honor the use_hash directive because the source table (dominant table) is specified to be on the build side.

Listing 17. Merge with directives
MERGE {+ USE_HASH(CT/BUILD) } INTO customer as C
USING customer_trans as CT
ON C.customer_num = CT.customer_num
WHEN MATCHED THEN UPDATE SET C.phone = CT.phone
WHEN NOT MATCHED THEN INSERT VALUES (CT.customer_num,
 CT.fname, CT.lname, CT.company, CT.address1, 
CT.address2, CT.city, CT.state, CT.zipcode, CT.phone)


QUERY: 
------
MERGE {+ USE_HASH(CT/BUILD) } INTO customer as C
USING customer_trans as CT
ON C.customer_num = CT.customer_num
WHEN MATCHED THEN UPDATE SET C.phone = CT.phone
WHEN NOT MATCHED THEN INSERT VALUES (CT.customer_num, 
CT.fname, CT.lname, CT.company, CT.address1, CT.address2,
 CT.city, CT.state, CT.zipcode, CT.phone)


DIRECTIVES FOLLOWED:
DIRECTIVES NOT FOLLOWED:
USE_HASH ( ct/BUILD ) 
Join directives not compatible with Outerjoin nesting.

Estimated Cost: 4
Estimated # of Rows Returned: 1
Temporary Files Required For: MERGE

  1) informix.ct: SEQUENTIAL SCAN

  2) informix.c: INDEX PATH

    (1) Index Name: informix. 100_1
        Index Keys: customer_num   (Serial, fragments: ALL)
        Lower Index Filter: 
informix.c.customer_num = informix.ct.customer_num
NESTED LOOP JOIN

There are no restrictions when the merge statement contains only an update clause. In this case, an inner join is performed between the source and target tables, and the matched rows updated. In an inner join, the source and target tables can be on either side of a hash-join or in a nested-loop join.


Using the merge statement to perform an update and delete join

In an update join, an update is performed based on the join condition between the target table and another table. Similarly for a delete join, rows from the target table are deleted based on the join condition. Informix does not support specific syntax for update or delete joins, but the same can be achieved with a special use of the merge statement.

If the merge statement contains only the when matched then clause, the matched rows in the target table are updated or deleted. The unmatched rows from the source table are ignored. By performing only updates or deletes, this accomplishes the same function as update or delete joins. Listing 18 shows how an inner join is performed between the source and target tables, and there are no restrictions on the kind of plan that can be used.

Listing 18. Update and delete join
MERGE {+ FULL(customer) } INTO customer as C
USING customer_trans as CT
ON C.customer_num = CT.customer_num
WHEN MATCHED THEN UPDATE SET C.phone = CT.phone
WHEN NOT MATCHED THEN INSERT VALUES 
(CT.customer_num, CT.fname, CT.lname, CT.company, 
CT.address1, CT.address2, CT.city, CT.state, CT.zipcode, CT.phone)
~
QUERY: 
------
MERGE {+ FULL(customer) } INTO customer as C
USING customer_trans as CT
ON C.customer_num = CT.customer_num
WHEN MATCHED THEN UPDATE SET C.phone = CT.phone
WHEN NOT MATCHED THEN INSERT VALUES (CT.customer_num, 
CT.fname, CT.lname, CT.company,
CT.address1, CT.address2, CT.city, 
CT.state, CT.zipcode, CT.phone)


DIRECTIVES FOLLOWED:
FULL ( customer )
DIRECTIVES NOT FOLLOWED:

Estimated Cost: 4
Estimated # of Rows Returned: 1

  1) informix.ct: SEQUENTIAL SCAN

  2) informix.c: SEQUENTIAL SCAN


DYNAMIC HASH JOIN
    Dynamic Hash Filters: informix.c.customer_num = 
informix.ct.customer_num

Listing 19 shows the delete join. In this example, the merge statement specifies the following:

  • Stock as the target table
  • Manufact as the source table
  • A join condition in the on clause
  • The delete clause to delete the stock rows with incorrect manufacturer codes
Listing 19. A merge and delete join
MERGE INTO stock 
USING manufact
ON stock.manu_code != manufact.manu_code
WHEN MATCHED THEN DELETE;

Porting an update and delete join from extended parallel server

The extended parallel server (XPS) enables users to use a join on tables for certain columns to be updated. A user can use in the set clause columns from any table that they list in the from clause to specify values for the columns and rows to update, as shown in Listing 20.

Listing 20. XPS update join
UPDATE stock SET 
stock.description = stock.description||manufact.manu_code 
FROM stock, manufact 
WHERE stock.manu_code = manufact.manu_code;

You can achieve similar results using the merge statement as shown in Listing 21.

Listing 21. Informix server update join using the merge statement
MERGE INTO stock 
USING manufact
ON stock.manu_code = manufact.manu_code
WHEN MATCHED THEN UPDATE SET 
stock.description = stock.description||manufact.manu_code ;

Listing 22 shows another XPS update join example.

Listing 22. Another XPS update join
UPDATE stock SET 
stock.description = stock.description||manufact.manu_code  
FROM stock, manufact, items 
WHERE stock.manu_code = manufact.manu_code and 
items.manu_code = stock.manu_code ;

The equivalent way to use the merge statement in Informix server to accomplish the same results as the XPS update join from Listing 22 is shown in Listing 23.

Listing 23. Equivalent update join in IDS using the merge statement
MERGE INTO stock 
USING (select manufact.manu_code as manu_code, 
FROM items, manufact 
WHERE items.manu_code = manufact.manu_code) source
ON stock.manu_code = source.manu_code
WHEN MATCHED THEN SET 
stock.description = stock.description||source.manu_code ;

Similarly, for delete statements in XPS, you can delete information from a table based on information contained in one or more other tables. You can specify the using keyword or a second from keyword to introduce the list of tables that you want to join using the where clause, as shown in Listing 24.

Listing 24. XPS delete join
DELETE FROM stock  USING stockdiscount, stock WHERE
 stockdiscount.unit_discount < 5 and 
stock_discount.stock_num = stock.stock_num;

The equivalent way to use the merge statement in Informix server to accomplish the same results as the XPS delete join from Listing 24 is shown in Listing 25.

Listing 25. Equivalent delete join in IDS using the merge statement
MERGE INTO stock 
USING (select stock_num from stockdiscount where 
unit_discount < 5) source
ON stock.stock_num = source.stock_num
WHEN MATCHED THEN DELETE;

Remembering additional restrictions

There are certain restrictions that the merge statement enforces, and violating those will result in an error. Following are some of the restrictions, assuming target table is the table where the data will be merged.

  • The target cannot be a typed table in the same table hierarchy as the target table.
  • The target cannot be a virtual table interface (VTI) table.
  • The target cannot be in a database of a remote IDS instance.
  • The target cannot be a system catalog table or an external table.
  • The target cannot be a view on which an instead of trigger is defined.
  • The target cannot be a read-only view.
  • The target cannot be a pseudo-table, which is a memory-resident object in a system database, such as the sysmaster or sysadmin databases.
  • The target cannot be a data source of any subquery of the same merge statement, including subqueries on any of the following clauses:
    • on
    • set
    • values
    • Any clause of a select statement within the merge statement

Optimizing performance

The merge statement executes with in the server entirely, so it performs better than other methods used to perform the same task. However, if the performance is not satisfactory, you can capture the explain output to investigate the possible reason. It is possible that the access method used to scan the source and target tables and the join method being used to perform the join might not be the right ones.

Directives are a low-cost way to try alternatives if the performance is not satisfactory. Experiment by specifying directives to change the access method or join methods. If the join method being used is nested-loop, specify the directive use_hash so that hash joins are chosen.

To illustrate, a customer reported that a particular merge statement (with 22000 source table rows and 9 million target table rows) was taking around 4 hours to finish. The explain output showed that IDS was using a hash-join plan. By applying directives and forcing IDS to use a nested-loop plan (with index on target table) the query finished in less than a minute. The IDS optimizer tries to come up with the right plan in most cases. If you encounter a case where the optimizer generated a sub-optimal plan, directives can be used to tweak response times of alternative plans to select the right plan.

Try creating indexes on source and target tables so that different plans are chosen.

Experiment with using a full table scan or using the index to scan a particular table.


Conclusion

The merge statement is a very handy tool in warehouse-based systems. You can easily change application logic into a simple merge statement to achieve the same results. This article describes basic usage of merge statements and describes usage with other features of Informix server, including constraints, triggers, views, and audit.

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=623841
ArticleTitle=Use the MERGE statement in Informix Dynamic Server
publish-date=02102011