© 2003 International Business Machines Corporation. All rights reserved.
Read the disclaimer first.
Data warehouse administrators frequently face the challenge of integrating data in a heterogeneous database environment. You may have data stored in DB2®, but want to take advantage of the capabilities of IBM Red Brick Warehouse to analyze your data. So you're faced with the problem of transferring data from DB2 to Red Brick. What's the best way to accomplish your goal?
Integrating these two database systems through IBM DB2 Warehouse Manager brings technical advantages and business benefits. In this article, I'll share my experiences of making DB2 Warehouse Manager work with Red Brick Warehouse. The configuration suggestions which follow are intended to help you to automate large data transfers between DB2 and Red Brick Warehouse installations.
In an Export, Transform and Load (ETL) process, a database can be either a source of the data flow, or a target of the data flow. In DB2 Warehouse Manager, Red Brick can be configured to export data (as the source) or to receive imported data (as the target) from other sources, such as DB2. These are the areas of concern in making these products work together. Other than configuring a specific Red Brick data source or target, Warehouse Manager processes involving Red Brick are no different than any other data source when you're using data warehouse functions such as transformation, job control and scheduling.
This article has two sections:
- First we look at how to export data from Red Brick.
I'll discuss the options for exporting data from a Red Brick Warehouse: using ODBC, or using the Red Brick SQL Export command. - The second section covers how to import data into Red Brick.
Here I'll discuss the ways of importing data into a Red Brick Warehouse: using ODBC, or using the Red Brick loader, called Table Management Utility (TMU), through a User Defined Program (UDP).
I developed and tested the recommendations in this article on a Windows NT® 4.0 workstation with the following software installed:
- IBM DB2 Universal DatabaseTM V7.2 (with DB2 Control Center) and DB2 Warehouse Manager
- IBM Red Brick Warehouse 6.11
We'll assume that you already have basic working knowledge of DB2 Warehouse Manager as well as Red Brick Warehouse. The business intelligence tutorial in DB2's "First Steps" option is a good starting point for Warehouse Manager. On the other side, walking through the AROMA database example in Red Brick's Administrator's Guide is an excellent way to learn Red Brick.
In my examples I use a Red Brick database called AROMA. AROMA is accessed using the user name system and the password manager. We will be using a table called MARKET which is defined in AROMA. The scripts for creating this database can be found in the Red Brick installation under the sample_input directory.
You may export data from a Red Brick warehouse in one of two ways.
A Red Brick database can be defined as a Generic ODBC source in DB2 Warehouse Manager by following these steps:
- First, you need to define an ODBC data source name (DSN) to access a Red Brick database.
- Then, in DB2 Data Warehouse Center, right click the Warehouse Sources folder, then follow Define->Generic ODBC to finish the standard configuation steps.
- After that, you can use an SQL step in the Warehouse Manager to retrieve data from any Red Brick table through the ODBC interface.
First, follow the steps described above in "Use ODBC". Let us assume you used sqlstr as the query string in the original SQL step. Change the query string to:
export to output_file ddl_file tmu_file format_type ( sqlstr ); |
The detailed export syntax can be found in Red Brick's SQL Reference Guide. The export statement is invoked through ODBC just like any other normal query, but it directly writes the output of the query execution into files, a directory, or a pipe, bypassing the overhead of the ODBC layer. The supported file formats include Fixed-Length, Delimited, and Binary. In addition, support for XML format was added in the Red Brick 6.2 release. All the output files from the SQL Export must be accessible from the local machine.
For example, to export every row in the MARKET
table to c:\temp\market.txt, the query string of the SQL step will look like the following:
export to 'c:\temp\market.txt' ddlfile 'c:\temp\market.ddl' tmufile 'c:\temp\market.tmu' format external (select * from market); |
When you specify external format, as we did in the command above, each column in the output will be stored in fixed length. Here is a sample row from market.txt:
00000000001 Atlanta GA Atlanta South .... |
As for the other two files, the market.ddl contains
a create table statement, which defines each column in the select list column
of the query as a table column.The market.tmu file contains a TMU script for loading the market.txt
file into the table defined by market.ddl.
The information supplied in these two files can be used to define the schema information for the exported files. Users can then define the exported files as Flat File Warehouse Sources and use them in later ETL steps.
Which export method should you use?
Of the two export methods, the SQL Export method is the preferred one because of its high performance. Note that the files that result from SQL Export must be accessible from the local machine. If users need to move data between machines, they can use an additional FTP step to ship the result files to a remote machine.
The ODBC method is slower than SQL Export. On the other hand, it can move data across machines without requiring additional steps.
Like exporting data, you have a choice between two methods for importing data.
A Red Brick database can be defined as an ODBC data target. There is no native Red Brick data target in DB2 V7.2. Luckily, there's a workaround. Users can use the existing target type "DB2 UDB for VSE". But there are a few problems to be aware of if you choose to use this workaround.
Although the VSE target uses ODBC, it embeds some DB2-only logic. There are two main problems.
- First, the VSE target tries to find a table called IBMSNAP_REGISTER when retrieving the table list. To get around this problem, you will need to create the table. Log in as a DBA and issue the following SQL to the Red Brick database:
grant connect, resource to asn with asn; connect as asn asn; create table ibmsnap_register ( source_owner char(18)not null, source_table char(18)not null, source_view_qual smallint not null, global_record char(1) not null, source)_structure smallint not null, source_condensed char(1) not null, source_complete char(1) not null, cd_owner char(18), cd_table char(18), phys_change_owner char(18), phys_change_table char(18), cd_old_synchpoint char(10), cd_new_synchpoint char(10), disable_refresh smallint not null, ccd_owner char(18), ccd_table char(18), ccd_old_synchpoint char(10), synchpoint char(10), synchtime timestamp, ccd_condensed char(1), ccd_complete char(1), arch_level char(4) not null, description char(254), before_img_prefix varchar(4), conflict_level char(1), partition_keys_chg char(1) );
After defining the IBMSNAP_REGISTER table, you can add the Red Brick database as a Warehouse ODBC Target by following this click sequence: Warehouse Target->Define->ODBC target->DB2->DB2 VSE in Data Warehouse Center.
- The second problem is that the VSE target generates a CREATE TABLE statement using a table name preceded with a user name qualifier. Unfortunately, Red Brick does not currently support that syntax.
Because of this syntax difference, you cannot use the "Automatically Generate Target Table" option. To avoid the problem, for SQL steps using the Red Brick target, bring up its properties dialog box, and under the Target Table tab, uncheck "Data Warehouse Center created table". You then must use the Red Brick RISQL program to create the target table separately.
Note that this problem is fixed in the Red Brick Warehouse 6.20.xC2 release where user name qualifiers are supported.
After resolving these two issues, the Red Brick target can be defined successfully and many functions can be performed.
Users can configure a UDP program in Warehouse Manager to use the Red Brick TMU, the high-speed loader for Red Brick Warehouse. To set up UDP parameters, a user specifies a TMU executable path, a TMU control file, a target database, user name and password. After that, the TMU UDP will be available on the tool panel of the Process Model window. It can be selected to create a step in an ETL process. When the step is executed, the TMU will be invoked to load data into the Red Brick database. Please note that the target database does not need to be defined as a Warehouse Target. Note also that the input files here must be accessible from the local machine.
Let's look at an example of defining a TMU UDP. For our example the TMU executable is
c:\redbrick\rbw\bin\rb_tmu.exe. The exact location of the TMU executable
depends on the Red Brick installation directory. Instead of system and manager,
you may have a different user name and password. We want to use this TMU to
load c:\temp\market.txt into the target table MARKET in the Red Brick AROMA
database.
Here is the SQL defining the MARKET table:
create table market( mktkey integer not null, hq_city char(20) not null, hq_state char(20) not null, district char(20) not null, region char(20) not null); |
The input file c:\temp\market.txt is a text file delimited by '*':
04*Houston*TX*New Orleans*South 05*New York*NY*New York*North |
The following TMU control file
c:\temp\market.tmu instructs TMU to load market.txt into the MARKET table:
load data inputfile 'c:\temp\market.txt' append format separated by '*' into table market ( mktkey integer external, hq_city char, hqstate char, district char, region char); |
Follow these steps to define a TMU UDP:
- In the left panel of the Data Warehouse Center window, open the User-Defined Programs and Transformers folder in the Administration folder.
- Right-click on it to define a group named "Red Brick TMU".
- Right-click on the "Red Brick TMU group" to bring up the "Define Program" window. Name it "Market TMU".
- In the Program tab, click Insert, and then edit the parameter entry as follows:
Name: TMU parameter Value: -d AROMA c:\temp\market.tmu system manager > c:\temp\log 2>&1
- In the Agent sites tab, add "Default DWC AgentSite" to the selected agent sites.
- Click OK and do a refresh to finish the definition process.
After that, the icon for Market
TMU UDP appears in the left panel of the Process Model window. Usually, the
Market TMU UDP step immediately follows the previous step, which creates
market.txt. The UDP can then be scheduled to run at a specified time or on
demand. The TMU's output is in c:\temp\log.
When there is a warning in the load process, for example, a column truncation,
TMU typically prints it and proceeds with the load. After the TMU finishes, it
exits with return code 1. Because the Warehouse Manager treats any return code
of 1 as an error, it will pop up an error message box even when the TMU
finishes the load successfully with expected warning messages. Users can choose to ignore this particular
error message. The log output in c:\temp\log can be used to check a load's status.
As you can see from the process above, other Red Brick programs can also be
used in a very similar way through a UDP program. For example, to get a
parallel load, one can use rb_ptmu.exe. In the Red Brick 6.2 release, a client
TMU rb_ctmu.exe is added to load data from a local machine to a remote server
directly. Using rb_ctmu.exe in the UDP, we can get remote loading.
Which import method should you use?
Of the two import methods I discussed, I would recommend the TMU UDP method for handling large volumes of data. This method does require that the input files and the TMU file be accessible from the local machine. If accessibility is a problem, you can use the client TMU in the Red Brick 6.2 release or use an additional FTP step to ship the files to the local machine.
The ODBC method can move data across machines. However, because it is much slower, the ODBC method is only suitable for handling small volumes of data.
In this article we discussed the alternatives for importing data into and exporting data from Red Brick Warehouse using DB2 Warehouse Manager. Because of their high performance, I recommend the SQL Export command and the TMU UDP program for exporting and importing large amounts of data. Either method will give you one more tool for managing your data warehouse.
The configuration recommendations contained in this article were derived under specific operating and environmental conditions. While the information has been reviewed for accuracy under the given conditions, the results obtained in your operating environments may vary significantly. Accordingly, IBM does not provide any representations, assurances, guarantees, or warranties regarding results or performance.
I am grateful to Rick Cole, Bob Rumsby, Loganathan Venkatachalam, Albert Yao, and Cheung-Yuk Wu for their insightful comments.
Qi Jin is a software developer manager at the IBM Silicon Valley Lab in San Jose, California. He has worked in the data loading and kernel areas for IBM Red Brick Warehouse for the past six years. You can reach Qi Jin at qijin@us.ibm.com.




