The data movement utilities in DB2 are simple to use with a single partition database environment. However, there are extra considerations when it comes to large data warehouse environment having terabytes of STAGING, STORE, and DATAMART tables on Database Partition Feature (DPF) environment of InfoSphere Warehouse. In this article we will look at the data movement options available in DB2 and the best suitable methods for data warehouse environment.
The table below lists the data movement options available in DB2 9.7, along with a description of how each is commonly used and an example.
Table 1. Data movement options available in DB2
| Utility name | Purpose | Usage practice | Example |
|---|---|---|---|
| Export | To export data from a database table to a file | This method can be used if you want to keep the table data in a file for future use or to refresh another environment from the current data. | EXPORT TO DATAMART. F_CUST_PROF.DEL OF DEL MESSAGES DATAMART. F_CUST_PROF.EXPORT.MSG SELECT * FROM DATAMART.F_CUST_PROF; |
| Load | To perform fast data inserts into a existing table in the database | This is the utility you are looking if your primary concern is data insert performance. It inserts formatted pages into the database instead of a row-by-row insertion. The database administrator/user can also choose not to log the activity in transaction logs. But be aware, this utility has the capability to fully exploit the system resources. | LOAD FROM DATAMART. F_CUST_PROF.DEL OF DEL SAVECOUNT 10000 MESSAGES DATAMART. F_CUST_PROF.LOAD.MSG INSERT INTO DATAMART.F_CUST_PROF; |
| Import | To insert data from a file to a table or view | This utility is beneficial when inserting data to a view and table that has a constraint, and when your intention is not to place the table in 'set integrity' state. Also, it is useful if you have triggers and you want them to be fired while inserting data. | IMPORT FROM DATAMART. F_CUST_PROF.DEL OF DEL COMMITCOUNT 1000 MESSAGES DATAMART. F_CUST_PROF.IMPORT.MSG INSERT INTO DATAMART. F_CUST_PROF;
If you have an IXF format data file, you can create a table and insert data in one single command, if the table doesn't exist in the target environment |
| db2move | Copy tables from one environment to another at schema level (multiple tables in general) | When you have many tables that need to be copied between environments based on the schema, this can easily be achieved using db2move with the COPY option. | Export all the tables from STAGING schema:db2move SourceDB EXPORT –sn STAGING
Export all the tables from the database: db2move SourceDB EXPORT
Import all the table data into a target database: db2move TargetDB IMPORT
Copy data from source to target: db2move SourceDB COPY –co TARGET_DB TargetDB USER <username>USING <password> |
Database administrators often find it difficult to copy a large volume of data from one database server to another over the network. Some of the major challenges to accomplish this task are:
- The volume of data
- Terabytes of data
- Hundreds of tables
- Tables with hundreds of millions of records along with thousands of range partitions
- The need for faster data transfer and data reloads
- The requirement to evenly distribute the data across the database partition nodes
Now that you have an overview of movement options in DB2, take a detailed look at techniques with the actual code for data movement.
DB2 warehouse data movement techniques
In this section, we will examine the techniques available to move a DATAMART from source to target database. We will evaluate the pros and cons of each method, with the goal of easing the work for database administrators during data transfer.
The examples were tested on an IBM Balanced Warehouse D5100 with the following configuration:
- 11 physical nodes and 41 logical nodes
- Each server has 4 CPUs, 2800 MHz
- Each server has 32 GB RAM
- SUSE Linux 10 2.6.16.60-0.66.1-smp
- DB2 9.5 FP7
- Database size: 6.8 TB
- Major fact tables are range partitioned and the number of partitions in each table is about 21237
The data copy was from the production of Balanced Warehouse to the user acceptance test (UAT) Balanced Warehouse of a similar infrastructure and DB2 version.
As a business requirement, the two UAT database schemas, DATASTORE and DATAMARTS, were refreshed from production.
As discussed earlier, there are many techniques available in DB2 to refresh large data sets from one environment to another. Those techniques are:
- Export data on local database server, transfer data file and load data locally on target database server (small tables with hash partitioning, or non-hash partitioned)
- Export data from local database server and remotely load data to target database server
- Export data from remote database server and locally load data to target database server
- Export data to an operating system pipe and import or load data from pipe to target remote database server
- Export data on local database server in parallel (each part residing in its respective partition file system), use a data file transfer, and then load the parts locally in parallel
The following sections examine each of these techniques, with examples.
Export data on local database server, transfer data file and load data locally on target database server (small tables with hash partitioning or non-hash partitioned)
Figure 1. Technique #1
Here are the steps to implement this technique.
- Connect to the SourceDB locally on the source database server.
CONNECT TO SourceDB;
- Perform a DB2 export on table in source database server.
EXPORT TO DATAMARTS.SCENARIO_CALENDAR.DEL OF DEL MESSAGES DATAMARTS.SCENARIO_CALENDAR.MSG SELECT * FROM DATAMARTS.SCENARIO_CALENDAR;
- Compress the exported file to shorten the time required for file transfer between
the servers.
gzip DATAMARTS.SCENARIO_CALENDAR F.DEL
- Transfer the compressed file from SourceDB server to TargetDB server using sftp or
scp.
cd <export file path>
sftp username@<targetDB Server hostname> put DATAMARTS.SCENARIO_CALENDAR.DEL.gz OR scp DATAMARTS.SCENARIO_CALENDAR.DEL.gz username@<targetDB Server hostname>:/<PATH>
- Uncompress the transferred file at the target database server.
gunzip DATAMARTS.SCENARIO_CALENDAR.DEL.gz
- Connect to the TargetDB locally on the target database server.
CONNECT TO TargetDB;
- Perform a load or import.
LOAD FROM DATAMARTS.SCENARIO_CALENDAR.DEL OF DEL SAVECOUNT 10000 MESSAGES DATAMARTS.SCENARIO_CALENDAR.LOAD.MSG INSERT INTO DATAMARTS.SCENARIO_CALENDAR;
- In case of load command, perform a SET INTEGRITY at the end of the operation.
SET INTEGRITY FOR DATAMARTS.SCENARIO_CALENDAR IMMEDIATE CHECKED;
- Perform a RUNSTATS to keep the statistics up to date.
RUNSTATS ON TABLE DATAMARTS.SCENARIO_CALENDAR WITH DISTRIBUTION AND DETAILED INDEXES ALL;
Export data from local database server and remotely load data to target database server.
Figure 2. Technique #2
Follow these steps to implement this technique.
- Catalog the target database in the source database server.
CATALOG TCPIP NODE TargetND REMOTE TargetDBServer.ibm.com SERVER 50001; CATALOG DATABASE TargetDB AT NODE TargetND; - Connect to the SourceDB locally on the source database server.
CONNECT TO SourceDB; - Perform a DB2 export from table on the source database server.
EXPORT TO DATAMARTS.SCENARIO_CALENDAR.DEL OF DEL MESSAGES DATAMARTS.SCENARIO_CALENDAR.msg SELECT * FROM DATAMARTS.SCENARIO_CALENDAR; - Connect to the TargetDB remotely on the target database server.
CONNECT TO TargetDB user <username> using <Password>;
- Perform a load or import remotely from source to target.
LOAD CLIENT FROM DATAMARTS.SCENARIO_CALENDAR.DEL OF DEL SAVECOUNT 10000 MESSAGES DATAMARTS.SCENARIO_CALENDAR.LOAD.msg INSERT INTO DATAMARTS.SCENARIO_CALENDAR;
- In case of load command, perform a SET INTEGRITY at the end of the operation.
SET INTEGRITY FOR DATAMARTS.SCENARIO_CALENDAR IMMEDIATE CHECKED;
- Perform a RUNSTATS to keep the statistics up to date.
RUNSTATS ON TABLE DATAMARTS.SCENARIO_CALENDAR WITH DISTRIBUTION AND DETAILED INDEXES ALL;
Export data from remote database server and locally load data to target database server.
Figure 3. Technique #3
Follow these steps to implement this technique.
- Catalog the source database in the target database server.
CATALOG TCPIP NODE SourceND REMOTE SourceDBServer.ibm.com SERVER 50001; CATALOG DATABASE SourceDB AT NODE SourceND;
- Connect to the source database remotely from the target server.
CONNECT TO SourceDB user <username> using <password>;
- Perform a DB2 export from table remotely.
EXPORT TO DATAMARTS.SCENARIO_CALENDAR.DEL OF DEL MESSAGES DATAMARTS.SCENARIO_CALENDAR.msg SELECT * FROM DATAMARTS.SCENARIO_CALENDAR;
- Connect to the TargetDB locally on the target database server.
CONNECT TO TargetDB user <username> using <Password>;
- Perform a local load or import.
LOAD FROM DATAMARTS.SCENARIO_CALENDAR.DEL OF DEL SAVECOUNT 10000 MESSAGES DATAMARTS.SCENARIO_CALENDAR.LOAD.msg INSERT INTO DATAMARTS.SCENARIO_CALENDAR;
- In case of load command, perform a SET INTEGRITY at the end of the operation.
SET INTEGRITY FOR DATAMARTS.SCENARIO_CALENDAR IMMEDIATE CHECKED;
- Perform a RUNSTATS to keep the statistics up to date.
RUNSTATS ON TABLE DATAMARTS.SCENARIO_CALENDAR WITH DISTRIBUTION AND DETAILED INDEXES ALL;
Export data to an operating system pipe and load data from pipe to target remote database server.
Figure 4. Technique #4
Follow these steps to implement this technique.
- Catalog the source database in the target database server:
CATALOG TCPIP NODE SourceND REMOTE SourceDBServer.ibm.com SERVER 50001; CATALOG DATABASE SourceDB AT NODE SourceND;
- Create an operating system pipe in the target database server.
mkfifo datapipe ls –ltr datapipe prw-r--r-- 1 bculinux bcuigrp 0 2011-09-18 16:32 datapipe
- Connect to the source database remotely from the target server.
CONNECT TO SourceDB user <username> using <password>;
- Export data from source database and write it to one end of the OS pipe (datapipe).
In the business scenario, the database administration team refreshed UAT from
PROD.
only for the period 2011 of records 12904084.
EXPORT TO datapipe OF DEL MODIFIED BY COLDEL, MESSAGES FACT_CUST_FPI_VALIDATION.EXP.msg SELECT * FROM DATAMARTS.F_CUST_FPI_VALIDATION WHERE REC_LOAD_DT > '2011-01-01-00.00.00.000000' WITH UR;
- Connect to the source database remotely from the target server.
CONNECT TO TargetDB user <username> using <password>;
- Import or load data from another end of OS pipe into a regular hash range
partitioned table.
IMPORT FROM datapipe OF DEL MODIFIED BY COLDEL, MESSAGES FACT_CUST_FPI_VALIDATION.IMP.msg INSERT INTO DATAMARTS.FACT_CUST_FPI_VALIDATION; LOAD FROM datapipe OF DEL MODIFIED BY COLDEL, MESSAGES FACT_CUST_FPI_VALIDATION.LD.msg INSERT INTO DATAMARTS.FACT_CUST_FPI_VALIDATION;
Note: Be cautious when using the load command in a partitioned database having many range partitions. Load may fail with SQLCODE SQL0973N, and you may lose the data in the target table.
Take an application snapshot of the target database. You will see an insert occurring on the target database server table. Until the data transfer is complete, you will see both the export to pipe and import from pipe running on the server.
The following listings summarize the status at source and target after completion:
Listing 1. From the source database connection
db2 "EXPORT TO datapipe OF DEL MODIFIED BY COLDEL, MESSAGES FPI_VALIDATION.EXP.msg SELECT * FROM DATAMARTS.F_CUST_FPI_VALIDATION WHERE REC_LOAD_DT > '2011-01-01-00.00.00.000000' WITH UR" Number of rows exported: 12904084 |
Listing 2. From the target database connection
db2 "IMPORT FROM datapipe OF DEL MODIFIED BY COLDEL, MESSAGES
FPI_VALIDATION.IMP.msg INSERT INTO
DATAMARTS.FACT_CUST_FPI_VALIDATION"
Number of rows read = 12904084
Number of rows skipped = 0
Number of rows inserted = 12904084
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 12904084
|
To remove the operating system pipe after completion, use the following command:
rm datapipe |
Limitations and workarounds for technique #4
When loading data into a heavily range-partitioned table in a database partitioned environment, you may encounter the following error.
SQL0973N Not enough storage is available in the "UTIL_HEAP_SZ" heap to process the statement. SQLSTATE=57011
Here are some ways to work around the problem:
- Increase the UTIL_HEAP_SZ database configuration parameter value to a maximum of 524288 pages of 4K page size. Then force all applications and reactivate the database.
- Decrease DATA BUFFER number while loading the data.
- If the previous two steps don't enable you to complete the load, perform a load terminate.
- Create a temporary table DATAMARTS.FACT_CUST_FPI_VALIDATION_TMP without range partitions and perform a load on the temporary table from the pipe with nonrecoverable clause to make the load faster.
- Upon completion, perform an
INSERT INTO ... SELECT * FROM.... You can perform a NOT LOGGED INITIALLY insert to improve the performance of the data loads. The listing below shows such an insert.
Listing 3. NOT LOGGED INITIALLY inserts
db2 +c "ALTER TABLE DATAMARTS.FACT_CUST_FPI_VALIDATION
ACTIVATE NOT LOGGED INITIALLY"
db2 "INSERT INTO DATAMARTS.FACT_CUST_FPI_VALIDATION SELECT *
FROM DATAMARTS.FACT_CUST_FPI_VALIDATION_TMP"
|
Note: If a NOT LOGGED INITIALLY insert fails for some reason, you must recreate the table.
Export data on local database server, transfer the data file, and load data locally on target database server (small tables with hash partitioned or non-hash partitioned).
Figure 5. Technique #5
At the source database server, perform the following steps to export data.
- Create soft links from the admin node export directory to all the data nodes. In this example, the export directory is
$HOME/db2backup/exports.
ln -s /db2fs/bculinux/NODE0001 NODE1 ln -s /db2fs/bculinux/NODE0002 NODE2 ......... ln -s /db2fs/bculinux/NODE0040 NODE40
- The next listing shows the resulting files after creating the soft links.
ls –ltr lrwxrwxrwx 1 bculinux bcuigrp 24 2011-04-13 19:25 NODE1 -> /db2fs/bculinux/NODE0001 lrwxrwxrwx 1 bculinux bcuigrp 24 2011-04-13 19:25 NODE2 -> /db2fs/bculinux/NODE0002 ......... lrwxrwxrwx 1 bculinux bcuigrp 24 2011-04-13 19:28 NODE40 -> /db2fs/bculinux/NODE0040
- In each physical data node server, create directory structure similar to this.
mkdir –p /db2fs/bculinux/NODE0001/exports/datamarts mkdir –p /db2fs/bculinux/NODE0002/exports/datamarts ......... mkdir –p /db2fs/bculinux/NODE0040/exports/datamarts
- Find the hash partition columns from SYSCAT.COLUMNS for the table that needs to be
exported.
db2 "SELECT SUBSTR(COLNAME,1,20) COLNAME, PARTKEYSEQ FROM SYSCAT.COLUMNS WHERE TABNAME=''F_CUST_PROFTBLTY_TMP' AND TABSCHEMA='DATAMARTS'" COLNAME PARTKEYSEQ -------------------- ---------- ACCT_KEY 0 BUSS_UNIT_KEY 1 CALENDAR_MONTH_KEY 2 CRNCY_CODE 0
In this table we have two hash partition columns. We choose one of the hash columns to export data in the respective partition.
- Export data across all the partitions in parallel using the DB2 EXPORT command as shown in this listing.
db2_all "\"|| db2 \"EXPORT TO $HOME/db2backup/exports/NODE##/exports/datamarts/ DATAMARTS.F_CUST_PROFTBLTY_TMP.del OF DEL SELECT * FROM DATAMARTS.F_CUST_PROFTBLTY_TMP WHERE DBPARTITIONNUM (BUSS_UNIT_KEY)=##\""
This command exports each partition data to respective partition nodes.
- Perform file copy from each source database server node to the target database
server using scp.
scp -p <File Name> <User Name>@<Host Name>:<File Name>
At the target database server, perform the following steps to export the data.
- Create soft links as illustrated in the source database, as shown in Listing 4.
- Load from each partition data in parallel using DB2 load command.
Listing 4. Parallel LOAD commanddb2_all "<<-0<<\" db2 -v \"LOAD FROM db2backup/exports/NODE##/exports/datamarts DATAMARTS.F_CUST_PROFTBLTY_TMP.del OF DEL INSERT INTO DATAMARTS.F_CUST_PROFTBLTY_TMP NONRECOVERABLE \""
This command would load each partition data to respective partition nodes.
Pros and cons of each technique
Table 2 highlights the benefits and drawbacks of these techniques.
Table 2. Pros and Cons
| Technique | Pros | Cons |
|---|---|---|
| 1. Export locally and load locally |
|
|
| 2. Export locally and load remotely |
|
|
| 3. Export remotely and load locally |
|
|
| 4. Export and load using pipe |
|
|
| 5. Parallel export and parallel load |
|
|
This article has described how to use the DB2 data movement utilities to meet requirements in a balanced warehouse environment with special considerations for multi-partitioned fact tables. You should now have a better understanding of the issues which can occur when working with huge fact tables that are heavily range partitioned. If you encounter LBAC protected data, please refer back to IBM Information Center for more details.
Special thanks to Nelson Coish, President, Coish Consulting Inc for helping the author to complete and fine-tune the article.
Learn
- Consult the IBM DB2 9.7
Database for Linux, UNIX, and Windows Information Center to
find information that you need to use the DB2 9.7 family of products and features.
-
Visit the developerWorks resource page for DB2 for Linux, UNIX, and Windows
to read articles and tutorials and connect to other resources to expand your DB2 skills.
-
Visit the developerWorks resource page data warehousing
to grow your skills on InfoSphere Warehouse and learn more about warehousing topics.
- Attend a free
developerWorks Live! briefing to get up-to-speed quickly on IBM
products and tools as well as IT industry trends.
- Watch developerWorks
on-demand demos ranging from product installation and setup demos for
beginners, to advanced functionality for experienced developers.
- Learn more about Information Management at the developerWorks Information Management
zone. Find technical documentation,
how-to articles, education, downloads, product information, and
more.
- Stay current with
developerWorks technical events and webcasts.
- Follow developerWorks on
Twitter.
Get products and technologies
-
Download a free trial version of DB2 for Linux, UNIX, and Windows.
-
Evaluate IBM
products in the way that suits you best: Download a product trial, try
a product online, use a product in a cloud environment.
- Build your next
development project with
IBM trial software,
available for download directly from developerWorks.
Discuss
- Participate in the discussion forum.
- Get involved in the My developerWorks
community. Connect with other developerWorks users while exploring the
developer-driven blogs, forums, groups, and wikis.
- Check out the
developerWorks
blogs and get involved in the
developerWorks community.

Mohan works as a lead DB2 Database Administrator at Reckitt Benckiser plc focusing on balanced warehouse business intelligence solutions, application performance tuning, and DB2 implementation/migration. While with the IBM Software Lab, India, he worked with the High Performance On Demand Solutions (HiPODS) team to provide the performance solutions to clients. Earlier he worked with the DB2 product development team, India. Mohan is an IBM certified DB2 Advanced Database Administrator, DB2 Application Developer and DB2 Problem Determination Master. Mohan completed his M Tech (Master of Technology) in computer science in the year 2004 and Executive MBA from IIMC in the year 2007.




