Using IBM InfoSphere Warehouse Design Studio with pureXML data, Part 1: Create an ETL data flow to populate a hybrid data warehouse

Learn how to integrate business-critical XML data into your data warehouse using IBM® InfoSphere™ Warehouse Design Studio and DB2® 9.7 pureXML®. This two-part article series provides step-by-step instructions for using pureXML as both a source and target data source for extract, transform, and load (ETL) operations developed with InfoSphere Warehouse Design Studio. This article explains how to build a single data flow that uses an XML-based source table to populate two target data warehouse tables. One of these tables contains only relational data, while the other contains both relational and XML data.


Cynthia M. Saracco, Senior Software Engineer, IBM

Cynthia M. Saracco is a Senior Solutions Architect at the IBM Silicon Valley Laboratory who specializes in emerging technologies and database management topics. She has 25 years of software industry experience. A former instructor for the University of California, Santa Cruz, extension program, Cynthia has written three books and more than 70 technical papers.



25 March 2010


Introduction

The increased use of XML to capture and exchange critical business data is prompting firms to evaluate how to integrate XML data in their data warehouse environments. This article explores how firms can use IBM InfoSphere Warehouse Design Studio to graphically create data flows that extract, transform, and load XML data managed by DB2. You will learn how to extract pureXML data from DB2 9.7, transform the XML as needed, and populate a hybrid relational/XML data warehouse.

For this article to be effective for you, some knowledge of data warehousing, ETL (extract-transform-load) technology, and XML will be useful. If you're not familiar with DB2 pureXML or IBM InfoSphere Warehouse Design Studio, brief introductions are provided in the following sections, and you can find other references in this article's Resources.

Introducing DB2 pureXML

DB2 provides businesses with a common application programming interface and database management platform for data modeled in tables and in XML hierarchies. This hybrid database management architecture enables businesses to integrate XML messages and documents into their database environments without shredding (or decomposing) the XML into columns of various tables. With DB2, application developers can retrieve relevant portions of the XML data easily and efficiently, and they can integrate XML and relational data with little effort. Figure 1 illustrates DB2's hybrid database architecture.

Figure 1. DB2's hybrid database architecture supports relational and XML data
diagram showing both XML and relational data, both accessible by SQL or XQuery

To use DB2 pureXML, administrators create a table with one or more columns of type XML. When XML data is entered into this column, DB2 automatically preserves its inherent structure and provides optimized access to the XML data as needed. XML indexing, compression, and other DB2 features provide for strong runtime performance and efficiency.
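For readers new to pureXML, the following minimal sketch illustrates the pattern just described. The table, index, and element names here are illustrative only; they are not part of this article's sample schema:

```sql
-- A table with a native XML column; DB2 stores the document's hierarchy intact
CREATE TABLE sample.orders (id INT, details XML);

-- An optional XML index over a node that queries filter on frequently
-- (the '/order/customer/@id' pattern is hypothetical)
CREATE INDEX sample.ord_cust_idx ON sample.orders (details)
    GENERATE KEY USING XMLPATTERN '/order/customer/@id' AS SQL VARCHAR(20);

-- Retrieve a fragment of each document with SQL/XML
SELECT id,
       XMLQUERY('$d/order/customer' PASSING details AS "d")
FROM sample.orders;
```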

DB2 pureXML data is increasingly becoming part of data warehouse environments. Message-based applications, service-oriented architectures (SOA), Web-centric applications, and application integration projects often rely on XML to define how important business data is represented and exchanged.

As more and more critical business data is captured in XML, it's not surprising that businesses want to manage, share, query, and report on XML data. Some businesses are creating data warehouse staging areas that use pureXML as the underlying storage mechanism. Others are extending the relational tables in their data warehouses with one or more pureXML columns to accommodate new business requirements.

Introducing InfoSphere Warehouse Design Studio

InfoSphere Warehouse Design Studio (Design Studio) provides data warehouse architects and developers with data modeling, query generation, data mining, and ETL services through an Eclipse-based interface. Using graphical tools, designers can perform several functions, including connecting to source and target databases, reverse-engineering physical data models, and building DB2 ETL jobs (data flows). This article focuses on using Design Studio to build an ETL data flow.

Design Studio includes a variety of SQL warehousing operators and XML operators to manipulate relational and pureXML data. These operators support tasks such as the following:

  • Reading from source tables or files
  • Writing data to target tables or files
  • Performing query functions, such as joins, unions, and aggregations

The XML operators are new in DB2 V9.7 and include the following:

  • XML Relational Mapping operator: Shreds XML data into relational columns of a result set.
  • XML Composer operator: Creates XML data from relational or XML data.
  • XQuery operator: Executes XQueries to retrieve specific XML data. The generated code uses the SQL XMLQuery() function to invoke the XQuery.
  • Two XML Node operators: One renames an XML node, such as an element or an attribute; the other extracts one or more nodes from an XML document.
  • Two XML File operators: One reads from XML files; the other writes to them.

The sample data flow discussed in this article uses the XML Relational Mapping operator and several Data Warehousing operators to populate a target relational table and a hybrid relational/XML table from a single XML source table.

Setting up the development and runtime environments

To create the sample scenarios described in this article, install DB2 9.7 Enterprise Server Edition and IBM InfoSphere Warehouse Design Studio 9.7. You can install both on a single Windows® system. Also, download the sample data and configuration script included with this article.

The sample operational data includes information about customer accounts and their financial holdings. The data are based on a subset of the open source Transaction Processing over XML (TPoX) benchmark. You'll learn more about the operational data and the data warehouse design shortly. For more information about TPoX, see Resources.


Introducing the scenario

The scenario described in this article uses operational data stored in DB2 pureXML as input to the extract, transform, and load (ETL) job that you'll design to populate a DB2-based data warehouse. You'll use Design Studio to extract data stored in one DB2 XML column and map this data to two tables in a DB2 data warehouse, as shown in Figure 2. One of these tables contains only traditional relational columns, while the second contains relational columns and one XML column.

Figure 2. The sample scenario calls for operational data stored in DB2 pureXML to serve as input to a DB2 data warehouse
Diagram shows XML data mapped to XML and relational data in the warehouse

This data model represents a common scenario in which portions of XML data are often shredded into a relational structure. Such portions represent data that business users might frequently analyze and query. Many business intelligence tools are optimized to support relational structures, so shredding frequently queried XML data into relational columns can be quite effective. However, business needs can vary over time, making it difficult for administrators to determine which XML values should be shredded into relational columns. Maintaining full XML data in the data warehouse enables users to immediately access important business data that wasn't previously shredded into a relational format.

To simplify the sample scenarios in this article, you'll use a single DB2 database to store both the operational data and warehouse data. Of course, in a production environment, operational data and warehouse data would be managed in separate databases, usually on separate servers. However, using a single test database enables you to concentrate on how Design Studio can manipulate pureXML data.

Source and target tables

The sample scenario stores operational XML data in the TPOXADMIN.ACCOUNT table, which serves as the source table to the ETL data flow. The ACCOUNT table contains one XML column (INFO). You can use Design Studio's data modeling wizards to create this table (as well as the data warehouse target tables). However, the sample scenario simply includes the DB2 DDL statements for all necessary tables.

Listing 1 shows how to create the TPOXADMIN.ACCOUNT table.

Listing 1. SQL to create the source ACCOUNT table
 create table tpoxadmin.account(info xml)

The INFO column contains details about the account, including its title, opening date, working balance, portfolio holdings, and other information. (In the TPoX benchmark, account documents such as these are fed into broader customer account documents.) Figure 3 shows a portion of one XML document stored in the ACCOUNT table referenced in this article.

Figure 3. Portion of an XML record stored in the TPOXADMIN.ACCOUNT table
Shows an XML record with begin and end tags
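Because the figure is not reproduced here, the following hedged example suggests what such a record might look like. The element values are invented, and the document is trimmed to just the nodes that this article's mappings reference:

```sql
INSERT INTO tpoxadmin.account VALUES ('
<Account xmlns="http://tpox-benchmark.com/custacc" id="8001">
   <AccountTitle>Sample Retirement Account</AccountTitle>
   <Currency>USD</Currency>
   <Balance><WorkingBalance>25000</WorkingBalance></Balance>
   <AccountOfficer>A. Officer</AccountOfficer>
   <LastUpdate>2010-01-25T13:09:49</LastUpdate>
   <Holdings>
      <Position><Symbol>IBM</Symbol><Type>Stock</Type><Quantity>100.000</Quantity></Position>
      <Position><Symbol>BOND1</Symbol><Type>Bond</Type><Quantity>50.000</Quantity></Position>
   </Holdings>
</Account>')
```

The supplied DWsetup.db2 script loads the actual sample documents; this statement merely illustrates the shape of the data.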

For testing simplicity, the target data warehouse tables also reside in the TPoX database. Source information from the INFO column of TPOXADMIN.ACCOUNT will be mapped to two tables: the DWADMIN.ACCT table, which contains information about the overall account, and the DWADMIN.HOLDINGS table, which contains information about the various investments (portfolio holdings) of a given account. Listing 2 shows how to define these tables.

Listing 2. Definitions for target data warehouse tables
-- DWADMIN.ACCT contains general information about accounts. 
create table dwadmin.acct (
     id                 int,   
     title              varchar(100), 
     currency           char(3), 
     workingbalance     int,  
     officer            varchar(50), 
     datechanged        date,
     timechanged        time, 
     totalholdings      int,
     holdingtypes       int,
     fullrecord         xml 
)

-- DWADMIN.HOLDINGS tracks specific investments (holdings) for a given account 
create table dwadmin.holdings (
     id                 int,    
     symbol             varchar(10), 
     type               varchar(25), 
     quantity           decimal(12,3) 
)

To understand how the XML source data in the INFO column of TPOXADMIN.ACCOUNT is mapped to the various columns of the data warehouse tables, see Table 1 and Table 2. Note that the structure of the target DWADMIN.ACCT table calls for more than just retrieving specific XML data values from source documents. In particular, the DATECHANGED and TIMECHANGED columns require simple transformations of data values, while the TOTALHOLDINGS and HOLDINGTYPES columns require aggregating certain data.

Table 1. XML source data for columns in the DWADMIN.ACCT target table
  Column name in DWADMIN.ACCT   XPath expression or data source
  ID                            /Account/@id
  TITLE                         /Account/AccountTitle
  CURRENCY                      /Account/Currency
  WORKINGBALANCE                /Account/Balance/WorkingBalance
  OFFICER                       /Account/AccountOfficer
  DATECHANGED                   Extracted from /Account/LastUpdate
  TIMECHANGED                   Extracted from /Account/LastUpdate
  TOTALHOLDINGS                 Computed (count of /Account/Holdings/Position/Type)
  HOLDINGTYPES                  Computed (count distinct of /Account/Holdings/Position/Type)
  FULLRECORD                    Entire XML document (/Account)
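To see how entries like these translate into generated SQL, consider this simplified sketch of the XMLTable() call that the mapping produces for the directly extracted columns. The aggregated and transformed columns are handled in later steps, and the column lengths here follow the target DDL:

```sql
SELECT x.id, x.title, x.currency
FROM tpoxadmin.account a,
     XMLTABLE(XMLNAMESPACES(DEFAULT 'http://tpox-benchmark.com/custacc'),
              '$d/Account' PASSING a.info AS "d"
              COLUMNS
                id       INTEGER      PATH '@id',
                title    VARCHAR(100) PATH 'AccountTitle',
                currency CHAR(3)      PATH 'Currency') AS x
```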

Table 2 shows the XML source data for columns in the DWADMIN.HOLDINGS target table.

Table 2. XML source data for columns in the DWADMIN.HOLDINGS target table
  Column name in DWADMIN.HOLDINGS   XPath expression or data source
  ID                                /Account/@id
  SYMBOL                            /Account/Holdings/Position/Symbol
  TYPE                              /Account/Holdings/Position/Type
  QUANTITY                          /Account/Holdings/Position/Quantity

Data flow design

As you might expect, there are several ways to program an ETL job for this scenario. In this article, you'll explore how to define a single data flow to accomplish all the necessary work. In the second part of this series, you'll implement the same scenario using multiple data flows and a single control flow to manage the execution sequence. The latter technique is useful for complex operations and helps promote code reuse.

The single data flow that you'll build in this article uses multiple Design Studio operators, including Table Source and Table Target operators, XML Relational Mapping operators, a Group By operator, and a Table Join operator. Figure 4 shows the overall design, which is discussed in more detail below.

Figure 4. Data flow for ETL scenario
flow chart of XMLTableACCTHoldingsGroupBy entities with Inputs and Outputs

The Table Source operator (shown in the upper left of Figure 4) identifies the TPOXADMIN.ACCOUNT table as the input source. XML data in this table feeds two XML Relational Mapping operators. One of these operators (shown on the lower branch in Figure 4) feeds a Group By operator, which performs the aggregations shown in Table 1. However, one of the target tables (DWADMIN.ACCT) also requires that the full XML record be maintained for compliance purposes. Because XML columns can't participate in Group By clauses, you need a second XML Relational Mapping operator (shown on the upper branch of Figure 4). This operator will extract the account ID value from the XML data and preserve the full XML record. The account ID from this mapping operation can then be joined with the account ID included in the output of the Group By operation, creating a result set that will populate the target ACCT table.

Populating the target DWADMIN.HOLDINGS table is simpler. The output from one of the XML Relational Mapping operators (shown on the lower branch of Figure 4) can feed the target table directly.
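The SQL that this flow ultimately generates is roughly equivalent to the following hand-written sketch. Only the ID, aggregate, and FULLRECORD columns are shown for brevity; the remaining Table 1 columns would be carried through the shredded branch in the same way:

```sql
WITH fullxml (id, fullrecord) AS (
  -- Upper branch: account ID plus the intact XML document
  SELECT x.id, a.info
  FROM tpoxadmin.account a,
       XMLTABLE(XMLNAMESPACES(DEFAULT 'http://tpox-benchmark.com/custacc'),
                '$d/Account/@id' PASSING a.info AS "d"
                COLUMNS id INTEGER PATH '.') AS x
),
shredded (id, holdingtype) AS (
  -- Lower branch: one row per holding
  SELECT x.id, x.holdingtype
  FROM tpoxadmin.account a,
       XMLTABLE(XMLNAMESPACES(DEFAULT 'http://tpox-benchmark.com/custacc'),
                '$d/Account/Holdings/Position/Type' PASSING a.info AS "d"
                COLUMNS id          INTEGER     PATH '../../../@id',
                        holdingtype VARCHAR(25) PATH '.') AS x
),
agg (id, totalholdings, holdingtypes) AS (
  SELECT id, COUNT(holdingtype), COUNT(DISTINCT holdingtype)
  FROM shredded
  GROUP BY id
)
SELECT g.id, g.totalholdings, g.holdingtypes, f.fullrecord
FROM agg g JOIN fullxml f ON g.id = f.id
```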


Step 1: Preparing the DB2 database environment

The Downloads section of this article contains a DB2 script that creates the necessary operational (source) table and data warehouse (target) tables. It also populates the operational table with sample XML account information. Download the DWsetup.db2 file, open a DB2 command window, and issue the command shown in Listing 3.

Listing 3. Invoking the DB2 script
 db2 -td@ -vf DWsetup.db2

Note that this script is designed to support a DB2 9.7 server running on Windows.


Step 2: Preparing your Design Studio environment

Before creating your data flow, prepare the Design Studio environment. The following steps describe how to follow standard Design Studio processes to establish a connection to the TPoX database, create a Data Warehouse project, create a physical data model of the TPoX database through reverse engineering facilities, and create a new data flow for your work.

  1. In the Data Source Explorer, highlight the TPoX database, and right-click to connect to it.
  2. Enter the appropriate information, including your user name and password, as shown in Figure 5.
Figure 5. Establishing a connection to the TPoX database
Screen capture showing driver properties for TPoX database
  3. Test the connection to verify that you can connect to the database.
  4. In the Data Project Explorer pane, right-click to create a new data warehousing project with the name of your choice.
  5. Highlight the project you just created and navigate to the Data Flows folder.
  6. Right-click to create a new data flow with the name of your choice.
  7. For this scenario, specify that you intend to work against an online database, and select the TPoX database connection when prompted. Ignore any errors that might appear because the data flow is empty.
  8. Within your project, navigate to the Data Models folder.
  9. Right-click to create a new physical data model.
  10. Specify that you want to create the model by reverse engineering a DB2 9.7 database.
  11. Identify the TPoX database connection when prompted, and select TPOXADMIN and DWADMIN as the target schemas.

Step 3: Adding the source table to the data flow

Complete the following steps to program the flow to read source data from the TPOXADMIN.ACCOUNT table.

  1. From the Palette, select SQL Warehousing Operators > Sources and Tables > Table Source.
  2. Drag it onto the left side of the empty data flow. A wizard appears.
  3. Select TPOXADMIN.ACCOUNT as the source database table, as shown in Figure 6.
Figure 6. Identify TPOXADMIN.ACCOUNT as the source table
screen capture of ACCOUNT general info window showing source database table
  4. Verify that the result set includes the INFO column (the XML column) in the output.
  5. Save your work.

Step 4: Adding an XML Relational Mapping operator

Complete the following steps to add an XML Relational Mapping operator, connecting it to the output of your source table. Program this operator to extract the account ID from each XML document and to include the full contents of each XML document in the result set. The sample target DWADMIN.ACCT table includes several relational columns as well as the original XML records. This operator extracts account ID values to drive a join operation that can produce the desired result set.

  1. From the Palette, select XML Operators > XML Relational Mapping.
  2. Drag it to the right of the Table Source icon.
  3. Connect the source table to the XML Relational Mapping operator.
  4. Double-click the XML Relational Mapping operator to program it.
  5. Identify the INFO column as the XML column.
  6. Specify a default namespace for input XML documents. As Figure 3 shows, all account records have an XML namespace.
  7. Click the yellow + icon to add a line to the Namespaces box, and enter http://tpox-benchmark.com/custacc as the URI.
  8. Leave the Prefix field blank to use this as the default namespace, as shown in Figure 7.
Figure 7. Define the default namespace for XPath expressions
Screen cap: MappingWithXML window showing tpox namespace with no Prefix
  9. Complete the following steps to specify the XQuery expression and column definition that will drive the underlying XMLTable() function. The XML Relational Mapping operator generates a DB2 query that includes an XMLTable() function call.
    1. Specify /Account/@id as the XQuery expression.
    2. Define the output result set to include an ID column of type integer that is derived from the node specified in the XQuery expression. The Path field should be set to . (a period). This causes DB2 to transform account ID information located at /Account/@id into integer values.
    3. Verify that your specification shows XQuery expression as /Account/@id and Column definition with id as the Column Name and INTEGER as the Data Type, as shown in Figure 8.
Figure 8. Extract account ID information, converting the output to an integer value
Column definition screen with /Account/@id, id, and INTEGER specified
  10. Verify that the output of this data flow operator includes the ID column as well as the original XML input column. The final result set in the sample shows two columns: fullrecord (XML) and id (integer), as shown in Figure 9.
Figure 9. Define final result set with two columns
Screen cap: Result Columns with 1 XML and 1 INTEGER expression for Data Types
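The generated SQL for this first mapping operator is roughly equivalent to the following sketch. Design Studio produces its own internal names, so the aliases here are illustrative:

```sql
-- Row expression /Account/@id yields one attribute node per document;
-- PATH '.' casts that node to INTEGER, while a.info carries the full record
SELECT a.info AS fullrecord, x.id
FROM tpoxadmin.account a,
     XMLTABLE(XMLNAMESPACES(DEFAULT 'http://tpox-benchmark.com/custacc'),
              '$d/Account/@id' PASSING a.info AS "d"
              COLUMNS id INTEGER PATH '.') AS x
```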

Step 5: Adding a second XML Relational Mapping operator

The sample data flow needs two XML Relational Mapping operators. The second operator extracts a number of relational column values from various XML nodes. This operator also extracts the date and time information from a timestamp, populating two relational columns from one XML data value. Finally, this operator extracts one relational value that drives two aggregate operations computed in a subsequent Group By operation.

Like the first XML Relational Mapping operator you defined, this operator uses the INFO column of the TPOXADMIN.ACCOUNT table as its input source.

Complete the following steps to add the second XML Relational Mapping operator:

  1. From the Palette, select XML Operators > XML Relational Mapping.
  2. Drag it beneath the XML Relational Mapping operator already included in your flow.
  3. Connect the source table to this new XML Relational Mapping operator.
  4. Double-click the new XML Relational Mapping operator to program it.
  5. Identify the INFO column as the XML column.
  6. Specify a default namespace of http://tpox-benchmark.com/custacc.
  7. Specify the XQuery expression and column definition that will drive the underlying XMLTable() function. Because each account can have multiple investments, complete the following steps to specify an XQuery expression that will generate one row per investment per account.
    1. Specify /Account/Holdings/Position/Type as the row-generating XQuery expression.
    2. Define the output result set to include relational columns that will match those in the target DWADMIN.ACCT and DWADMIN.HOLDINGS tables, with the exception of the two computed columns (TOTALHOLDINGS and HOLDINGTYPES). Table 3 shows the needed columns and their corresponding data types and path expressions.
Table 3. Column definition table for /Account/Holdings/Position/Type
  Column Name     Data Type   Length  Scale  Path
  id              INTEGER                    ../../../@id
  title           VARCHAR     50             ../../../AccountTitle
  currency        CHAR        3              ../../../Currency
  workingbalance  INTEGER                    ../../../Balance/WorkingBalance
  officer         VARCHAR     50             ../../../AccountOfficer
  datechanged     TIMESTAMP                  ../../../LastUpdate
  timechanged     TIMESTAMP                  ../../../LastUpdate
  holdingtype     VARCHAR     25             .
  symbol          VARCHAR     30             ../Symbol
  qty             DECIMAL     12      3      ../Quantity
  8. Specify the output of this data flow operator by mapping all available columns except the INFO column to corresponding result columns, as shown in Figure 10.
Figure 10. Define the result set to be generated by this operator
Screen cap: Columns and values from Table 3 prepended by XMLTABLE_08 for Expressions
  9. Change the definition of the DATECHANGED and TIMECHANGED columns in the result set so that their values are derived from appropriate SQL expressions involving DATE/TIME operations. You can specify these expressions by double-clicking each row, which causes another window to appear with an area for building or typing SQL expressions. Use DATE(table.column) for the DATECHANGED column and TIME(table.column) for the TIMECHANGED column in the result set. The expression date("XMLTABLE_08"."datechanged") defines the DATECHANGED result column, as shown in Figure 11.
Figure 11. Specify the appropriate DATE/TIME expression in the SQL text pane
SQL expression builder screen with SQL expression for datechanged
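Taken together, the mapping and DATE/TIME steps above correspond roughly to this hand-written equivalent of the generated query (the correlation names are illustrative):

```sql
-- One row per Position/Type node; parent values reached with ../ steps
SELECT x.id, x.title, x.currency, x.workingbalance, x.officer,
       DATE(x.lastupdate) AS datechanged,
       TIME(x.lastupdate) AS timechanged,
       x.holdingtype, x.symbol, x.qty
FROM tpoxadmin.account a,
     XMLTABLE(XMLNAMESPACES(DEFAULT 'http://tpox-benchmark.com/custacc'),
              '$d/Account/Holdings/Position/Type' PASSING a.info AS "d"
              COLUMNS
                id             INTEGER       PATH '../../../@id',
                title          VARCHAR(50)   PATH '../../../AccountTitle',
                currency       CHAR(3)       PATH '../../../Currency',
                workingbalance INTEGER       PATH '../../../Balance/WorkingBalance',
                officer        VARCHAR(50)   PATH '../../../AccountOfficer',
                lastupdate     TIMESTAMP     PATH '../../../LastUpdate',
                holdingtype    VARCHAR(25)   PATH '.',
                symbol         VARCHAR(30)   PATH '../Symbol',
                qty            DECIMAL(12,3) PATH '../Quantity') AS x
```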

Step 6: Adding a Group By operator

With all the necessary relational values extracted from the source XML column, it's time to aggregate the two values needed for one of the target data warehouse tables (the DWADMIN.ACCT table). This table tracks the number of investments held by each account and the number of distinct types of holdings in each account. For example, an account that invested in four stock funds and three bond funds would have a total of seven holdings and two distinct types of holdings. The Group By operator enables you to generate a result set that includes the required aggregate operations. Complete the following steps to create and program the Group By operator.

  1. From the Palette, select SQL Warehousing Operators > Transformations > Group By.
  2. Drag it to the right of the second XML Relational Mapping operator.
  3. Connect the XML Relational Mapping operator to the Group By operator.
  4. Double-click the Group By operator to program it.
  5. Specify the columns to be included in the result set of this operator by completing the following steps:
    1. Remove the INFO column (the XML column) from the Select List shown in the result set. XML columns cannot be included in SQL Group By statements. In a subsequent data flow operation, you'll use a join operator to ensure that full XML account records (contained in the INFO column) are fed into the target data warehouse table.
    2. Remove HOLDINGTYPE, SYMBOL, and QTY columns from the result set. These columns are not needed for the DWADMIN.ACCT target table.
    3. Change the input expression for the TOTALHOLDINGS column to perform a count of HOLDINGTYPE values. Your expression should be similar to COUNT("INPUT_023_0"."holdingtype").
    4. Change the input expression for the HOLDINGTYPES column to perform a count distinct of HOLDINGTYPE values. Your expression should be similar to COUNT(DISTINCT "INPUT_023_0"."holdingtype").

    Figure 12 shows how to configure the Group By operator. Note that the TOTALHOLDINGS and the HOLDINGTYPES columns are computed using COUNT() expressions.

Figure 12. Specify columns for the result set
Shows select list with COUNT expressions
  6. Specify the columns to be included in the generated GROUP BY clause, as shown in Figure 13.
Figure 13. Specify the columns to be included in the Group By clause
Screen cap shows Aggregation Group By available columns
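In generated SQL terms, the Group By operator contributes a clause along these lines, where shredded stands in for the second mapping operator's output:

```sql
SELECT id, title, currency, workingbalance, officer,
       datechanged, timechanged,
       COUNT(holdingtype)          AS totalholdings,
       COUNT(DISTINCT holdingtype) AS holdingtypes
FROM shredded
GROUP BY id, title, currency, workingbalance, officer,
         datechanged, timechanged
```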

Step 7: Adding a Table Join operator

At this point, you're ready to join data produced by the Group By operator with data produced by the initial XML Relational Mapping operator. After doing so, you'll have a single result set that can populate the target DWADMIN.ACCT table.

  1. From the Palette, select SQL Warehousing Operators > Transformations > Table Join.
  2. Drag it to the right of the Group By operator.
  3. Connect the first XML Relational Mapping operator to the Table Join.
  4. Connect the Group By operator to the Table Join.
  5. Double-click the Table Join operator to program it.
  6. Create an expression that performs an inner equi-join on the ID columns in the two result sets by clicking the button or by specifying it manually. For example, a Join condition might be "IN_045_0"."id" = "IN1_045_1"."id", as shown in Figure 14.
Figure 14. Build the join expression
Screen cap of JoinGroupedDataWithFullXML window showing the Join condition
  7. You can optionally use Design Studio to graphically construct the join expression, as shown in Figure 15.
Figure 15. Graphic construction of the join
Screen cap: SQL Condition Builder to select conditions or type SQL Text
  8. Identify the columns to be included in the result set, as shown in Figure 16. These columns should match those of the target DWADMIN.ACCT table.
Figure 16. Select the columns to be included in the operator's result set
Screen cap: Project Expressions Output area shows Avail Columns and Result Columns

Step 8: Adding a Target Table operator for ACCT

You're ready to define operators to populate the two target data warehouse tables. Complete the following steps to populate the DWADMIN.ACCT table.

  1. From the Palette, select SQL Warehousing Operators > Target Table.
  2. Drag it to the right of the Table Join operator.
  3. When prompted, specify DWADMIN.ACCT as the Target Database Table, as shown in Figure 17.
Figure 17. Identify DWADMIN.ACCT as the target table
Screen cap: Insert : ACCT window showing DWADMIN.ACCT for Target database table
  4. Connect the port of the Table Join operator labeled Inner (for inner join) to the Target Table operator.
  5. Double-click the Target Table operator to program it.
  6. Specify that you want to clean the target table before performing insert operations by selecting Delete all rows, as shown in Figure 18. This enables you to easily test the data flow multiple times.
Figure 18. Delete all rows before insert
Screen cap: Insert : ACCT window with Clean Target Table Before Insert field set to Delete all rows
  7. Map the source columns from the Table Join operator's result set to the target columns of the table by dragging and dropping source columns (on the left) into the Map From area of the result columns definition on the right, as shown in Figure 19.
Figure 19. Identify the source data for each column in the target ACCT Table
Screen cap: Map Source to Target window with Available Columns and Result Columns
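The configured operator behaves roughly like the following two statements, where joined_acct is a hypothetical stand-in for the Table Join operator's result set:

```sql
-- "Delete all rows" before insert keeps repeated test runs idempotent
DELETE FROM dwadmin.acct;

INSERT INTO dwadmin.acct (id, title, currency, workingbalance, officer,
                          datechanged, timechanged, totalholdings,
                          holdingtypes, fullrecord)
SELECT id, title, currency, workingbalance, officer,
       datechanged, timechanged, totalholdings, holdingtypes, fullrecord
FROM joined_acct;
```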

Step 9: Adding a Target Table operator for HOLDINGS

Now define a Target Table operator to populate the DWADMIN.HOLDINGS table by completing the following steps:

  1. From the Palette, select SQL Warehousing Operators > Target Table.
  2. Drag it to the right of the lower XML Relational Mapping operator. This operator also serves as input to the Group By operator.
  3. When prompted, specify DWADMIN.HOLDINGS as the Target database table, as shown in Figure 20.
Figure 20. Identify DWADMIN.HOLDINGS as the target table
Screen cap: Insert : HOLDINGS that shows DWADMIN.HOLDINGS as the Target database table
  4. Connect the XML Relational Mapping operator to the Target Table operator for DWADMIN.HOLDINGS.
  5. Double-click the Target Table operator to program it.
  6. Specify that you want to clean the target table before performing insert operations by selecting Delete all rows, as shown in Figure 21. This will enable you to easily test the data flow multiple times.
Figure 21. Delete all rows before insert
Screen cap: Advanced options shows Delete All Rows for Clean Target Table Before Insert field
  7. Map the appropriate source columns from the XML Relational Mapping operator to the target columns of the table, as shown in Figure 22. You can insert a subset of available columns into this target table.
Figure 22. Map available input to columns in HOLDINGS table
Screen cap: Map Source to Target for HOLDINGS shows Avail Columns and condition results
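Because this target takes its input directly from the lower mapping operator, its behavior can be sketched as a single INSERT. The XMLTABLE call repeats the relevant subset of the second mapping, and only four of the mapped columns feed the target:

```sql
DELETE FROM dwadmin.holdings;

INSERT INTO dwadmin.holdings (id, symbol, type, quantity)
SELECT x.id, x.symbol, x.holdingtype, x.qty
FROM tpoxadmin.account a,
     XMLTABLE(XMLNAMESPACES(DEFAULT 'http://tpox-benchmark.com/custacc'),
              '$d/Account/Holdings/Position/Type' PASSING a.info AS "d"
              COLUMNS id          INTEGER       PATH '../../../@id',
                      symbol      VARCHAR(10)   PATH '../Symbol',
                      holdingtype VARCHAR(25)   PATH '.',
                      qty         DECIMAL(12,3) PATH '../Quantity') AS x;
```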

Step 10: Testing your work

Now you can test your data flow following the standard Design Studio process. You may find it helpful to use the debugger for your initial test, although this is optional.

  1. If you haven't already done so, save your work.
  2. In the Data Project Explorer pane, right-click your flow and select Validate.
  3. If any errors are detected, correct these before proceeding.
  4. If desired, set breakpoints in your data flow by right-clicking on a connection between operators in your data flow and clicking Toggle Breakpoint. Figure 23 shows how the data flow looks when breakpoints are set for connection.
Figure 23. Data flow with breakpoints set for debugging
Screen cap of XMLTableACCTHoldingsGroupBy with breakpoints after outputs and after Inner in Table Join
  5. In the Data Project Explorer pane, right-click your flow, and click Debug or Execute. If you debug the flow, you can monitor the output of each step in the data flow using the Resume icon. Note that the debugger might generate warnings related to DROP TABLE statements, which can be ignored.
  6. If desired, inspect the statements generated by debugging or executing your flow. A window appears containing the DB2 SQL and SQL/XML statements associated with your flow. Listing 4 contains a partial summary of the generated code for this sample data flow.
Listing 4. Data flow execution results, including generated code
Execution Result:
INFO: SQW10102I: Execution succeeded.


------------------------------------------------------
Execution Log:
C:/DWE/workspace/TPoXDW/run-profiles/logs/XMLTableACCTHoldingsGroupBy_SQW00...

 Data-Flows  Jan 25, 2010 1:09:49 PM  SQW03415I: Flow execution started for SQW00...
 Data-Flows  Jan 25, 2010 1:09:49 PM  SQW38501I: Execution log file:  C/DWE/works...
 Data-Flows  Jan 25, 2010 1:09:49 PM  SQW38509I: Execution temporary work directory...
 Data-Flows  Jan 25, 2010 1:09:52 PM  SQW03275I: Execution of SQL statement "SET...
 Data-Flows  Jan 25, 2010 1:09:52 PM  SQW03275I: Execution of SQL statement "CREATE...
  INFO
) AS
WITH
  Q22 (INFO) AS
    (SELECT INFO AS INFO
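After a successful run, a few ad hoc queries can confirm the results. These are optional sanity checks, not part of the flow itself:

```sql
-- Expect one row per account, and one row per holding per account
SELECT COUNT(*) FROM dwadmin.acct;
SELECT COUNT(*) FROM dwadmin.holdings;

-- Verify the intact XML record was preserved alongside the shredded columns
SELECT id, totalholdings, holdingtypes
FROM dwadmin.acct
WHERE XMLEXISTS('declare default element namespace
                 "http://tpox-benchmark.com/custacc"; $d/Account'
                PASSING fullrecord AS "d");
```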

Conclusion

This series explores how IBM InfoSphere Warehouse Design Studio and DB2 pureXML can be used together to support increased use of XML in data warehouses. XML has emerged as the preferred format for data exchange in many organizations. Consequently, data architects are evaluating options for integrating XML into their data warehouses. Designing ETL jobs (or data flows) is an important aspect of integrating XML into data warehouse environments.

This article, the first in a two-part series, illustrated how to build a single data flow that uses an XML-based source table to populate two target data warehouse tables. One of these tables contains only relational data, while the other contains both relational and XML data. The second article in this series will explore how to implement a similar scenario using a control flow that sequences the execution of two different data flows that both rely on a common XML-based source table. Controlling the ETL work in this manner can be useful for guaranteeing the proper sequence for loading data warehouse tables that contain primary or foreign key dependencies.

Acknowledgments

Thanks to Matthias Nicola, Jing Shan, Lin Xu, and Shumin Wu for reviewing this article series.


Download

  Description                                Name          Size
  Sample DB2 script and data (for Windows)   dwsetup.zip   100KB

Resources

Get products and technologies

  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.
  • Download a free trial version of InfoSphere Warehouse Enterprise Edition.


