Load and import error checking for ETL users in DB2 for Linux, UNIX, and Windows

Checking load and import operations in a partitioned database environment

This article demonstrates how to check LOAD and IMPORT operations through ETL calls to SYSPROC.ADMIN_CMD in a DB2® for Linux®, UNIX®, and Windows® Database Partitioning Feature (DPF) environment. Using these calls, you can accomplish error checking with more control and precision than most ETL tools provide.

Alex Levy (alex@sustainablesoftware.net), Consultant, Sustainable Software Ltd.

Alex Levy is an independent DB2 consultant for Sustainable Software Ltd. His interests include expanding the adoption of DB2 within the NGO and small-to-medium business sectors.



11 July 2013

Also available in Chinese and Russian

Introduction

Why is this necessary? Many ETL tools are generic and database-agnostic. Because they run on Java™, they are constrained to run all utilities either through a call to the stored procedure SYSPROC.ADMIN_CMD or through vendor extensions like the SAP or BODS bulk loader. These extensions are effectively unpublished APIs and often rely on inefficient commitcount and warningcount processing. They are also poor at pinpointing the exact data error in the input. Furthermore, some ETL tools are constrained by their internal design to use IMPORT in a serial, row-by-row insert fashion without buffered inserts, either because that is the only way they can catch errors or because they want to allow write access to the target table while it is being updated.

Reporting and error handling is actually much clearer, less complex and less verbose from the command line interface, not least because you can test the operating system return code from the utility rather than that from the stored procedure.
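A wrapper that shells out to the CLP can branch on that operating system return code. Here is a minimal Python sketch, assuming the standard DB2 CLP return-code convention (0 success, 1 no rows returned, 2 warning, 4 DB2 or SQL error, 8 CLP system error); the function name is illustrative:

```python
# Classify the operating system return code produced by "db2 <command>".
# The code-to-meaning mapping is the documented DB2 CLP convention.
def classify_clp_rc(rc: int) -> str:
    meanings = {
        0: "success",
        1: "no rows returned",
        2: "warning",
        4: "DB2 or SQL error",
        8: "command line processor system error",
    }
    return meanings.get(rc, "unknown return code")

# An ETL wrapper would typically fail the job for rc >= 4
# and log (but continue) for rc in (1, 2).
```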

DB2's LOAD utility offers the same reporting and error-handling functionality as IMPORT, and often improves on it. Given that LOAD is unlogged, handles its own parallelism, and is much faster than IMPORT at volume, the conclusion is that ETL operations can easily be enhanced to error-check and to replace IMPORT with LOAD wherever possible, and certainly for bulk data.

The examples that follow use the ADMIN_CMD procedure in a five-partition database (one catalog partition and four data partitions), and they demonstrate how to check for errors by using simple SQL calls.

Test setup

The examples use a simple table defined and populated as shown in Listing 1:

Listing 1. Defining and populating the test table
-- Demonstrate LOAD and IMPORT options. Create a simple test table of 100
-- rows (approx 25 rows per partition) and populate:

set current schema foo;
create table bar (
  col1 smallint not null,col2 char(1) not null, some_date date not null)
  distribute by hash(col1) in s_fact4k1;

-- Add a unique constraint so we can later simulate invalid data on load or
-- import files.
alter table foo.bar add constraint bar_u1 unique (col1);

-- create an exceptions table for the load, if needed;
-- this is only for constraint violations.
create table bar_exception like bar distribute by hash(col1) in s_fact4k1;

insert into bar
  with units(unit) as (values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)),
     tens(ten) as (select unit * 10 from units)
select ten + unit, 'Y', current date
  from tens, units;

-- Another benefit of load: create a sampled statistics profile for use
-- with the load command; this only works in load replace mode.
runstats on table foo.bar with distribution on all columns and detailed 
  indexes all tablesample system(10) set profile only;

-- create a load/import file 
--(the order by clause is not required but makes the file easier to read)
export to /tmp/foo.bar.del of del select * from foo.bar order by col1;

Scenario 1 — Normal running

Listing 2 shows the output from IMPORT in a normal run.

Listing 2. S1.1 Output from IMPORT
set current schema foo
DB20000I  The SQL command completed successfully.
delete from bar
DB20000I  The SQL command completed successfully.
call sysproc.admin_cmd('import from /tmp/foo.bar.del of del allow write access messages
on server insert into foo.bar')
  Result set 1
  --------------

  ROWS_READ  ROWS_SKIPPED  ROWS_INSERTED  ROWS_UPDATED  ROWS_REJECTED  ROWS_COMMITTED
  ---------  ------------  -------------  ------------  -------------  --------------
        100             0            100             0              0             100

  MSG_RETRIEVAL: SELECT SQLCODE, MSG FROM
                 TABLE(SYSPROC.ADMIN_GET_MSGS('71481989_DB2INST1')) AS MSG
  MSG_REMOVAL:   CALL SYSPROC.ADMIN_REMOVE_MSGS('71481989_DB2INST1')

  1 record(s) selected.

  Return Status = 0

This is self-explanatory, but there is one catch: the return status of 0 is the return from the call to SYSPROC.ADMIN_CMD, not the status of the IMPORT itself. The zero means "I ran without error, whether or not the command I executed actually succeeded." To test for errors or rejections on the import itself, you need to do two things:

  1. Test the operating system return code.
  2. Query the result set to compare ROWS_READ with ROWS_INSERTED, or test ROWS_REJECTED.
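Those two checks are easy to script. Here is a minimal Python sketch of the result-set comparison (the dict-shaped row and the function name are illustrative assumptions; the column names are those in the result set above):

```python
# Decide whether an IMPORT via ADMIN_CMD was fully clean, given the
# first result-set row as fetched by your database client.
def import_is_clean(row: dict) -> bool:
    return (row["ROWS_REJECTED"] == 0
            and row["ROWS_READ"] == row["ROWS_INSERTED"])

# The clean run shown above:
row = {"ROWS_READ": 100, "ROWS_SKIPPED": 0, "ROWS_INSERTED": 100,
       "ROWS_UPDATED": 0, "ROWS_REJECTED": 0, "ROWS_COMMITTED": 100}
assert import_is_clean(row)
```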

Alternatively, get the "bottom line" message (this may also be useful for application logging):

Listing 3. Testing for errors or rejections on the import
$ db2 -x "select substr(msg,1,254) from 
TABLE(SYSPROC.ADMIN_GET_MSGS('71481989_DB2INST1')) AS MSG where sqlcode = 
'SQL3149N'"

"100" rows were processed from the input file.  "100" rows were successfully 
inserted into the table.  "0" rows were rejected.

One further observation: the table function SYSPROC.ADMIN_GET_MSGS returns a table of SQLCODEs from each step of the IMPORT. It does NOT return the overall SQLCODE, which you would expect to be zero. So if you query the table function, you get:

Listing 4. Querying the table function ADMIN_GET_MSGS
$ db2 " select dbpartitionnum, sqlcode, substr(msg,1,254) from 
TABLE(SYSPROC.ADMIN_GET_MSGS('71481989_DB2INST1')) AS MSG"
DBPARTITIONNUM SQLCODE   3
-------------- --------- -------
             - SQL3109N  The utility is beginning to load data from file
                         "/db2/landing/foo.bar.del".
             - SQL3110N  The utility has completed processing.  
                         "100" rows were read from the input file.
             - SQL3221W  ...Begin COMMIT WORK. Input Record Count = "100".
             - SQL3222W  ...COMMIT of any database changes was successful.
             - SQL3149N  "100" rows were processed from the input file.  
                         "100" rows were successfully inserted into the table.
                         "0" rows were rejected.

Now let's compare this to the output from a successful LOAD.

S1.2 Output from LOAD

You will receive two result sets. Result set 1 is the output from the splitting and partitioning of the load file, the step that precedes placing the data into the table. Result set 2 is the data placement itself.

Listing 5. LOAD output - Result set 1
call sysproc.admin_cmd('load from /db2/landing/foo.bar.del of del messages on 
server insert into foo.bar nonrecoverable without prompting')
  Result set 1
  --------------

  ROWS_READ  ROWS_SKIPPED  ROWS_LOADED  ROWS_REJECTED  ROWS_DELETED  ROWS_COMMITTED  ROWS_PARTITIONED  NUM_AGENTINFO_ENTRIES
  ---------  ------------  -----------  -------------  ------------  --------------  ----------------  ---------------------
        100             -            -              0             -               -               100                      6

  MSG_RETRIEVAL: SELECT DBPARTITIONNUM, AGENTTYPE, SQLCODE, MSG FROM
                 TABLE(SYSPROC.ADMIN_GET_MSGS('548225794_DB2INST1')) AS MSG
  MSG_REMOVAL:   CALL SYSPROC.ADMIN_REMOVE_MSGS('548225794_DB2INST1')

At this stage, ROWS_LOADED and ROWS_COMMITTED will always be NULL (displayed as -), not zero. Note also that there are six agents: one to coordinate, one to do the partitioning or splitting, and four further agents, one per data partition.

Listing 6. LOAD output - Result set 2
Result set 2
  --------------
  DBPARTITIONNUM SQLCODE     TABSTATE             AGENTTYPE
  -------------- ----------- -------------------- --------------------
               1           0 NORMAL               LOAD
               2           0 NORMAL               LOAD
               3           0 NORMAL               LOAD
               4           0 NORMAL               LOAD
               0           0 NORMAL               PARTITIONING
               0           0 NORMAL               PRE_PARTITIONING

  6 record(s) selected.

  Return Status = 0

Now notice that here you actually get the 'real' SQLCODE. The test for a clean load is simply that every partition agent, plus the pre-partitioning and partitioning agents, returns a zero SQLCODE.
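That test is a one-liner in client code. A minimal Python sketch, assuming result set 2 is fetched as (dbpartitionnum, sqlcode, tabstate, agenttype) tuples in the column order shown above:

```python
# A load is clean when every agent (one per data partition, plus the
# partitioning and pre-partitioning agents) reports SQLCODE 0 and the
# table state is NORMAL on every partition.
def load_is_clean(result_set_2) -> bool:
    return all(sqlcode == 0 and tabstate == "NORMAL"
               for _dbpart, sqlcode, tabstate, _agent in result_set_2)
```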

You still get the step-by-step SQLCODEs as well, by following up with the call to SYSPROC.ADMIN_GET_MSGS. The list of "OK" step SQLCODEs is simply:

Listing 7. Fetching SQLCODEs from ADMIN_GET_MSGS
$ db2 "SELECT  distinct SQLCODE FROM 
TABLE(SYSPROC.ADMIN_GET_MSGS('548225794_DB2INST1')) AS MSG where sqlcode is not 
null order by 1 for fetch only"

SQLCODE
---------
SQL27903I
SQL27910I
SQL27914I
SQL27920I
SQL27921I
SQL27935I
SQL27936I
SQL27937I
SQL27939I
SQL27950I
SQL3109N
SQL3110N
SQL3213I
SQL3500W
SQL3501W
SQL3515W
SQL3519W
SQL3520W

  18 record(s) selected.
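An ETL job can also keep this allow-list client-side rather than in a database table. A minimal Python sketch (the set contents are the 18 codes listed above; any code outside the set flags a step worth investigating):

```python
# SQLCODEs emitted by the individual steps of a clean partitioned LOAD.
OK_SQLCODES = {
    "SQL27903I", "SQL27910I", "SQL27914I", "SQL27920I", "SQL27921I",
    "SQL27935I", "SQL27936I", "SQL27937I", "SQL27939I", "SQL27950I",
    "SQL3109N", "SQL3110N", "SQL3213I", "SQL3500W", "SQL3501W",
    "SQL3515W", "SQL3519W", "SQL3520W",
}

def unexpected_sqlcodes(codes):
    """Return step SQLCODEs that are not on the known-good list."""
    return sorted(set(codes) - OK_SQLCODES)
```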

Scenario 2 — Duplicate errors

We'll now introduce a duplicate into the load file that would break the unique constraint.

Listing 8. Introducing an error
 set current schema foo ;
-- clear down table for IMPORT
truncate table foo.bar immediate;

-- introduce some invalid (duplicate) data into the import/load file
!echo "99,\"Y,\"20110527" >> /tmp/foo.bar.del ;
-- import, with messages
call sysproc.admin_cmd('import from /tmp/foo.bar.del of del allow write access 
messages on server insert into foo.bar');

-- clear down table again, this time for LOAD
truncate table foo.bar immediate;

-- A load which will also capture the duplicate data in an exceptions table:
call sysproc.admin_cmd('load from /tmp/foo.bar.del of del
     messages on server
     insert into foo.bar
     FOR EXCEPTION FOO.BAR_EXCEPTION nonrecoverable without prompting');

select * from foo.bar_exception;

The next listing shows the output:

Listing 9. Output from IMPORT after introduction of an error
  Result set 1
  --------------

  ROWS_READ  ROWS_SKIPPED  ROWS_INSERTED  ROWS_UPDATED  ROWS_REJECTED  ROWS_COMMITTED
  ---------  ------------  -------------  ------------  -------------  --------------
        101             0            100             0              1             101

  MSG_RETRIEVAL: SELECT SQLCODE, MSG FROM
                 TABLE(SYSPROC.ADMIN_GET_MSGS('2021311734_DB2INST1')) AS MSG
  MSG_REMOVAL:   CALL SYSPROC.ADMIN_REMOVE_MSGS('2021311734_DB2INST1')

  1 record(s) selected.

  Return Status = 0
SQL3107W  There is at least one warning message in the message file.

This is very clear. As before, the return status from SYSPROC.ADMIN_CMD is 0, but ROWS_REJECTED shows 1, ROWS_READ is 101, and ROWS_INSERTED is just 100. But wait: can you explain why ROWS_COMMITTED is 101? The answer is in the message output below: the COMMIT covers every input record processed (SQL3221W's Input Record Count), including the rejected row.

Retrieving messages from the generated command does pinpoint the error.

Listing 10. Messages retrieved after IMPORT of the duplicate row
SQL3148W  A row from the input file was not inserted into the table.  SQLCODE
"-803" was returned.
SQL0803N  One or more values in the INSERT statement, UPDATE statement, or
foreign key update caused by a DELETE statement are not valid because the
primary key, unique constraint or unique index identified by "1" constrains
table "FOO.BAR" from having duplicate values for the index key.
SQLSTATE=23505     
SQL3185W  The previous error occurred while processing data from row "101" of
the input file.
SQL3110N  The utility has completed processing.  "101" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "101".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N  "101" rows were processed from the input file.  "100" rows were
successfully inserted into the table.  "1" rows were rejected.

This highlights a key difference between IMPORT and LOAD. IMPORT attempts to insert every record in the file without checking it first, then recovers from any error that occurs. LOAD is more sophisticated:

Listing 11. Output from LOAD after introduction of an error
call sysproc.admin_cmd('load from /tmp/foo.bar.del of del WARNINGCOUNT 0 messages 
on server insert into foo.bar FOR EXCEPTION FOO.BAR_EXCEPTION nonrecoverable 
without prompting')

  Result set 1
  --------------

  ROWS_READ  ROWS_SKIPPED  ROWS_LOADED  ROWS_REJECTED  ROWS_DELETED  ROWS_COMMITTED  ROWS_PARTITIONED  NUM_AGENTINFO_ENTRIES
  ---------  ------------  -----------  -------------  ------------  --------------  ----------------  ---------------------
        101             -            -              0             -               -               101                      6

  MSG_RETRIEVAL: SELECT DBPARTITIONNUM, AGENTTYPE, SQLCODE, MSG FROM
                 TABLE(SYSPROC.ADMIN_GET_MSGS('1672226768_DB2INST1')) AS MSG
  MSG_REMOVAL:   CALL SYSPROC.ADMIN_REMOVE_MSGS('1672226768_DB2INST1')

Now why is nothing rejected? Because this result set reports the splitting and partitioning of the input file, not the data placement, and the duplicate record is perfectly valid in isolation. Note also that ROWS_LOADED is NULL because no rows have yet been loaded at this point. Turning to the second result set:

Listing 12. Output from LOAD after introduction of an error -- Result set 2
  Result set 2
  --------------                

  DBPARTITIONNUM SQLCODE     TABSTATE             AGENTTYPE
  -------------- ----------- -------------------- --------------------
               1           0 NORMAL               LOAD
               2           0 NORMAL               LOAD
               3           0 NORMAL               LOAD
               4           0 NORMAL               LOAD
               0           0 NORMAL               PARTITIONING
               0           0 NORMAL               PRE_PARTITIONING

  6 record(s) selected.

So where is the evidence of a rejected record? In this scenario, you look in the exceptions table, and you look for SQLCODEs that do not fit the acceptable SQLCODE list from S1.2. This is easy to do in SQL (here foo.load_sqlcode is a one-column table, ok_sqlcode, populated with the acceptable SQLCODEs). The query template is exactly the same for each load; all that needs to change is the argument to ADMIN_GET_MSGS:

Listing 13. Looking for SQLCODEs that don't fit acceptable patterns
select  dbpartitionnum, agenttype, sqlcode, substr(msg,1,254) message
from    table(sysproc.admin_get_msgs('1672226768_DB2INST1')) msg
            left outer join
        foo.load_sqlcode fl
            on msg.sqlcode = fl.ok_sqlcode
where   fl.ok_sqlcode is null
and     msg.sqlcode is not null
order   by 1
;

DBPARTITIONNUM AGENTTYPE SQLCODE   MESSAGE
-------------- --------- --------- --------------------------------------------------
             3 load      SQL3509W  The utility has deleted "1" rows from the table.


$ db2 "select * from foo.bar_exception"

COL1   COL2 SOME_DATE
------ ---- ----------
    99 Y    05/27/2011

  1 record(s) selected.

Scenario 3 — Invalid data

This example shows what happens when you introduce invalid data into the import/load file.

Listing 14. Introducing invalid data
-- - 1 row with an out of range numeric value ( > smallint)
-- - 1 row with a strapline from Cato instead of a date
-- - 1 row with a correctly formatted date but an invalid value of 32nd February
!echo "32768,\"N\",20110531" >> /tmp/foo.bar.del ;
!echo "32767,\"N\",\"Ceterum censeo Carthaginem esse delendam.\"" >> /tmp/foo.bar.del ;
!echo "32767,\"N\",20110232" >> /tmp/foo.bar.del ;

delete from bar
DB20000I  The SQL command completed successfully.

S3.1 IMPORT with invalid data

Listing 15. Output from IMPORT after introduction of invalid data
call sysproc.admin_cmd('import from /tmp/foo.bar.del of del allow write access 
messages on server insert into foo.bar')

  Result set 1
  --------------

  ROWS_READ  ROWS_SKIPPED  ROWS_INSERTED  ROWS_UPDATED  ROWS_REJECTED  ROWS_COMMITTED
  ---------  ------------  -------------  ------------  -------------  --------------
        103             0            100             0              3             103

  MSG_RETRIEVAL: SELECT SQLCODE, MSG FROM
                 TABLE(SYSPROC.ADMIN_GET_MSGS('245743373_DB2INST1')) AS MSG
  MSG_REMOVAL:   CALL SYSPROC.ADMIN_REMOVE_MSGS('245743373_DB2INST1')

  1 record(s) selected.

  Return Status = 0

SQL3107W  There is at least one warning message in the message file.

This is correct: three records have been rejected. Note, though, that the IMPORT utility did not pre-validate the records; it simply attempted to insert them serially and then recovered from each SQL error. If you follow through on the call to SYSPROC.ADMIN_GET_MSGS, you will see that each error is identified over several messages, but the identity of the invalid row (that is, its position in the input file) might appear in any of the messages for that error and is sometimes duplicated. It depends on the error itself, and this aspect of reporting is not consistently presented.

Listing 16. Messages retrieved after IMPORT of invalid data
SQL3118W  The field value in row "101" and column "1" cannot be converted to a
SMALLINT value, but the target column is not nullable.  The row was not
loaded. 
SQL3128W  The field containing ""Ceterum censeo Carthaginem es" in row "102"
and column "3" was truncated into a DATE field because the data is longer than
the database column.
SQL3148W  A row from the input file was not inserted into the table.  SQLCODE
"-180" was returned.
SQL0180N  The syntax of the string representation of a datetime value is
incorrect. SQLSTATE=22007
SQL3185W  The previous error occurred while processing data from row "102" of
the input file.
SQL3148W  A row from the input file was not inserted into the table.  SQLCODE
"-181" was returned.
SQL0181N  The string representation of a datetime value is out of range.
SQLSTATE=22007
SQL3185W  The previous error occurred while processing data from row "103" of
the input file.
SQL3110N  The utility has completed processing.  "103" rows were read from the
input file.            
SQL3149N  "103" rows were processed from the input file.  "100" rows were
successfully inserted into the table.  "3" rows were rejected.

The second thing to notice is that the message SQL3107W There is at least one warning message in the message file is returned AFTER the return code from SYSPROC.ADMIN_CMD. This did not happen in the duplicate-row scenario, but it does mean the Java code can now test the operating system return code.

S3.2 LOAD with invalid data

For the load, notice the use of the dumpfile modifier to capture invalid input records:

Listing 17. Output from LOAD after introduction of invalid data
call sysproc.admin_cmd('load from /tmp/foo.bar.del of del modified by 
dumpfile=/tmp/humptydumpty messages on server insert into foo.bar nonrecoverable 
without prompting')


  Result set 1
  --------------

  ROWS_READ  ROWS_SKIPPED  ROWS_LOADED  ROWS_REJECTED  ROWS_DELETED  ROWS_COMMITTED  ROWS_PARTITIONED  NUM_AGENTINFO_ENTRIES
  ---------  ------------  -----------  -------------  ------------  --------------  ----------------  ---------------------
        102             -            -              2             -               -               102                     33

  MSG_RETRIEVAL: SELECT DBPARTITIONNUM, AGENTTYPE, SQLCODE, MSG FROM
                 TABLE(SYSPROC.ADMIN_GET_MSGS('721439627_DB2INST1')) AS MSG
  MSG_REMOVAL:   CALL SYSPROC.ADMIN_REMOVE_MSGS('721439627_DB2INST1')

  1 record(s) selected.


  Result set 2
  --------------

  DBPARTITIONNUM SQLCODE     TABSTATE             AGENTTYPE
  -------------- ----------- -------------------- --------------------
               1           0 NORMAL               LOAD
               2           0 NORMAL               LOAD
               3        3107 NORMAL               LOAD
               4           0 NORMAL               LOAD
               0        3107 NORMAL               PARTITIONING
               0           0 NORMAL               PRE_PARTITIONING

  6 record(s) selected.

  Return Status = 0

SQL3107W  There is at least one warning message in the message file.

There are some interesting differences here:

  • Only 102 records reached the load phase, and two of those were rejected. What happened to the remaining invalid record? It was rejected by the partitioning agent before it could hit the table (as shown below).
  • The 3107 warning stands out like a sore thumb in result set 2.
  • A call to SYSPROC.ADMIN_GET_MSGS identifies the partition at which records were rejected, as does result set 2 above.
  • The row number of a rejected row is its position in the partitioned load file, not the unitary input file.
Listing 18. Call to SYSPROC.ADMIN_GET_MSGS
select  dbpartitionnum, agenttype, sqlcode, substr(msg,1,254) message
from    table(sysproc.admin_get_msgs('721439627_DB2INST1')) msg
            left outer join
        foo.load_sqlcode fl
            on msg.sqlcode = fl.ok_sqlcode
where   fl.ok_sqlcode is null
and     msg.sqlcode is not null
order   by 1
;

DBPARTITIONNUM AGENTTYPE SQLCODE   MESSAGE
-------------- --------- --------- ------------------------------------------------
             0 part      SQL27930N A record was rejected during partitioning with
reason  code "1" and partition-relative record number  "1".
             0 part      SQL6044N  The syntax of the string representation "32768" 
with value of datatype "SMALL INTEGER" and length "5" is not correct.
             3 load      SQL0180N  The syntax of the string representation of a 
datetime value is incorrect.  SQLSTATE=22007
             3 load      SQL0181N  The string representation of a datetime value is 
out of range.  SQLSTATE=22007
             3 load      SQL3125W  The character data in row "4" and column "3" was 
truncated because the data is longer than the target database column.
             3 load      SQL3185W  The previous error occurred while processing data
 from row "4" of the input file.
             3 load      SQL3185W  The previous error occurred while processing data
 from row "5" of the input file.
             - -         SQL3107W  There is at least one warning message in the 
message file.

In the load command, we specified the dumpfile as /tmp/humptydumpty. DB2 appends the utility name and partition number to each file, thus:

Listing 19. Dump files
-rw-r-----  1 db2inst1   db2sygrp            21 May 31 11:28 humptydumpty.part.000
-rw-r-----  1 db2inst1   db2sygrp             0 May 31 11:28 humptydumpty.load.001
-rw-r-----  1 db2inst1   db2sygrp             0 May 31 11:28 humptydumpty.load.002
-rw-r-----  1 db2inst1   db2sygrp            73 May 31 11:28 humptydumpty.load.003
-rw-r-----  1 db2inst1   db2sygrp             0 May 31 11:28 humptydumpty.load.004

So the invalid data is readily identifiable by partition, in any non-zero length file:

Listing 20. Identifying invalid data by partition
$ for i in $(ls -1 /tmp/humptydumpty*)
> do
> cat $i
> done
Listing 21. Results
32767,"N","Ceterum censeo Carthaginem esse delendam."
32767,"N",20110232
32768,"N",20110531

S4 — Other tips

Tip 1: Load file splitting

Splitting the file can be justified for transform and data-scrub edits outside of DB2; indeed, I have used the technique extensively for iSeries data transforms into DB2 data warehouses. There is no efficient way to split a file using IMPORT, since the data is still in file format: you would need to load ALL the data into a short-life table and then query it with a call to the dbpartitionnum() function.

However, the LOAD utility automatically splits the file and loads each split file. To stop it after the split and before the actual load, use the following syntax:

Listing 22. Stopping load after the split
call sysproc.admin_cmd('load from /tmp/foo.bar.del of del
                        modified by dumpfile=/tmp/humptydumpty
                        messages on server
                        insert into foo.bar
                        nonrecoverable without prompting
                        partitioned db config mode partition_only
                        part_file_location /db2/landing/db2utility_messages')

PART_FILE_LOCATION defaults to the current working directory if you do not specify it explicitly. The results:

Listing 23. Result set 1
  Result set 1
  --------------

  ROWS_READ  ROWS_SKIPPED  ROWS_LOADED  ROWS_REJECTED  ROWS_DELETED  ROWS_COMMITTED  ROWS_PARTITIONED  NUM_AGENTINFO_ENTRIES
  ---------  ------------  -----------  -------------  ------------  --------------  ----------------  ---------------------
          0             -            -              0             -               -               100                      6

  MSG_RETRIEVAL: SELECT DBPARTITIONNUM, AGENTTYPE, SQLCODE, MSG FROM
                 TABLE(SYSPROC.ADMIN_GET_MSGS('1679601175_DB2INST1')) AS MSG
  MSG_REMOVAL:   CALL SYSPROC.ADMIN_REMOVE_MSGS('1679601175_DB2INST1')

  1 record(s) selected.

Notice that ROWS_READ is 0 and ROWS_LOADED is NULL; however, ROWS_PARTITIONED is 100. In result set 2, observe that the AGENTTYPE is LOAD_TO_FILE, not LOAD.

Listing 24. Result set 2
Result set 2
  --------------

  DBPARTITIONNUM SQLCODE     TABSTATE             AGENTTYPE
  -------------- ----------- -------------------- --------------------
               1           0 NORMAL               LOAD_TO_FILE
               2           0 NORMAL               LOAD_TO_FILE
               3           0 NORMAL               LOAD_TO_FILE
               4           0 NORMAL               LOAD_TO_FILE
               0           0 NORMAL               PARTITIONING
               0           0 NORMAL               PRE_PARTITIONING

  6 record(s) selected.

  Return Status = 0

To load all the split files, simply run in LOAD_ONLY mode:

Listing 25. Loading the split files
call sysproc.admin_cmd('load from /tmp/foo.bar.del of del
                        modified by dumpfile=/tmp/humptydumpty
                        messages on server
                        insert into foo.bar
                        nonrecoverable without prompting
                        partitioned db config mode load_only
                        part_file_location /tmp/db2utility_messages')

Output is identical to the Scenario #1 load above. The only difference is that there are no partitioning and pre-partitioning agents at partition 0, because that work has already been done.

To load at just one partition, say partition 1, add the OUTPUT_DBPARTNUMS parameter. You do, of course, lose all load parallelism this way:

Listing 26. Loading at just one partition
call sysproc.admin_cmd('load from /tmp/foo.bar.del of del
                        modified by dumpfile=/tmp/humptydumpty
                        messages on server
                        insert into foo.bar
                        nonrecoverable without prompting
                        partitioned db config mode load_only
                        output_dbpartnums (1)
                        part_file_location /tmp/db2utility_messages')

And to specify multiple partitions, just comma-separate them, for example output_dbpartnums(1,2,4).

Tip 2: Loading without files

Loading from one table into another does not require staging via a flat file, unless you wish to split the data first. You can still do this using a variant of the load-from-cursor syntax, without having to reference the dbpartitionnum() function. Assuming a copy of foo.bar in foo.bar2:

Listing 27. Loading from a table
call sysproc.admin_cmd('load from  (select * from foo.bar2) of cursor
                        messages on server
                        insert into foo.bar
                        nonrecoverable without prompting
                        partitioned db config mode partition_only
                        part_file_location /db2/landing/db2utility_messages')

Finally, it is your responsibility to tidy up the split files from the PART_FILE_LOCATION. The file names start with 'LOAD' and identify the partition and the login of the user who ran the load, but otherwise they are not especially meaningful. For example:

Listing 28. The split files
-rw-r----- 1 db2fgvp  db2fgvp          0 May 31 14:19 LOAD_6P9P52_DB2INST1
-rw-r--r-- 1 db2fgvp  db2fgvp       1008 May 31 14:19 LOAD_6P9P52.load.001_DB2INST1

(Note: these files are written by the fenced user.)


Conclusion

The examples above demonstrate how ETL developers can error check data movement utilities by simple SQL calls, and with much more control and precision than many ETL tools provide.

There is one example of normal running (no errors in the data input); one example of a duplicate error (where a unique or primary key constraint is violated); and one example of "bad" data handling (for example, an invalid date or timestamp representation). There is also a demonstration of how to use the LOAD utility to split a master file by partition in a DPF environment. It is NOT a DBA recommendation of best practice, but simply a means of splitting a file (or table) very efficiently if that is required.

Finally there is also an example of a load by cursor that removes the need for file inputs.

