The Federation: Database Interoperability, the Adventure Continues (Part 2)

The second of a two-part series on how to implement data federation using DB2 Information Integrator, this article focuses on a case study for a marketing campaign mailing list. In addition it covers dealing with big tables and data, creating XML documents from federated joins, and publishing federated XML via WebSphere MQ.

Share:

Marty Lurie (lurie@us.ibm.com), Systems Engineer, IBM

Marty LurieMarty Lurie started his computer career generating chads while attempting to write Fortran on an IBM 1130. His day job is in Systems Engineering for IBM's Informix organization, but if pressed he will admit he mostly plays with computers. His favorite program is the one he wrote to connect his Nordic Track to his laptop (the laptop lost two pounds, and lowered its cholesterol by 20%). Marty is an IBM-certified DB2 DBA, IBM-certified Business Intelligence Solutions Professional, and an Informix-certified Professional. He can be reached at lurie@us.ibm.com.


developerWorks Contributing author
        level

17 July 2003

© 2003 International Business Machines Corporation. All rights reserved.

Introduction

Federated databases increase productivity by providing a single view of many servers. This simplifies data access, and the federation server can join and optimize across different databases very effectively, as we saw in Part 1 of this article. To make Part 2 interesting, we'll focus on a case study using IBM DB2® Information IntegratorTM (DB2 II). The examples are intended to spark your thought process, not represent a suggested production environment. I find that case studies are easier to "sink your teeth into" more than abstract concepts.

Here is the challenge: our data is spread over many servers, as shown in Figure 1:

Figure 1. Campaign mailing list data spread over many data sources
Data spread over many data sources

Your mission, should you choose to accept it, is:

Create a mailing list for a marketing campaign. The data for this campaign is spread over multiple sources. Specifically, the required data sources are shown in Table 1:

Table 1. Data sources for mailing list

ContentData source
"do not mail" listOracle 9i
High value customersExcel® spreadsheet
Customer list, system of recordInformix® XPS
Customer credit ratingsInformix IDS

The marketing campaign you must create provides different incentives based on credit rating, customer prior purchasing history to yield the "high value" customers, and purchased marketing data.

The marketing list must be formatted as an XML stream and sent to the campaign production company via a persistent message queue.

The solution to the case study involves the following steps:

  1. Set up and test the data sources
  2. Develop the SQL to implement the marketing campaign
  3. "XML-ize" the query result, and enqueue the XML stream so it can be read by any queue client.

As promised in Part 1 of this article, the solution to the case study includes federated inserts, XML from federated joins, and XML and Websphere® MQ.

When my editor saw this challenge she said, "You're going to fit that into part 2?" Here goes...


First, a very brief word from our sponsors: the DB2 Information Integrator announcement

The DB2 Information Integrator product was announced on May 20, 2003. Here is the announcement letter, and here is a snippet from that letter in case you don't go find it yourself:

"IBM DB2 Information Integrator V8.1 represents the next generation of information integration software that provides a fundamental underpinning for e-business on demandTM. Responding quickly to changes means that businesses can quickly and easily correlate information not only across the enterprise, but also across its value chain. DB2 Information Integrator gives companies integrated, real-time access to business information across and beyond the enterprise. Such information may reside in diverse source systems (such as Oracle databases, MicrosoftTM spreadsheets, flat files) and be distributed across a variety of operating environments (such as Windows®, Linux, UNIX®, z/OS®)...."

Rather than spend a lot of time describing features, let's dive right in and use the features.


Setting up the data sources for the case study

In Part 1, we set up the Informix data sources, so I won't repeat that information here. See those instructions to set up the Informix sources. If you prefer the command line instead of a point and click interface see the Appendix from Part 1 of the article.

So in this section I'll describe:

  • Setting up and using and an Oracle data source for the do not mail list
  • Setting up and using an Excel data source for the marketing information
  • Bonus round: Setting up and using an ODBC data source
  • A very brief summary on how to create the customer table and credit table in Informix (see Part 1 for more details).

Setting up and using an Oracle data source

From Part 1 of this article we know the DB2 Information Integrator server just appears to the remote servers as a regular client program running queries. The federated environment we need to prepare for the case study consists of setting up a series of remote data sources. Don't be overwhelmed by the complexity; a DB2 Information Integrator server could be as simple as a single remote source with DB2 Information Integrator acting as a data access gateway. The steps to setting up data source are:

  1. Establish client connectivity.
  2. Configure the db2dj.ini file and set the ORACLE_HOME environmental variable.
  3. Create the wrapper.
  4. Create the server.
  5. Create a user mapping.
  6. Create table nicknames.

After we build this data source you'll see a pattern based on the two prior examples in Part 1 and this example, and should be able to configure federation to access just about any source given the required wrapper.

This section isn't a substitute for reading the fine documentation, but I will try to hit the highlights. There is now also a Redbook on Information Integration, see: Getting Started on Integrating Your Information.

Step 1: Establish client connectivity

Installing the Oracle client SDK on the DB2 Information Integrator server is rather uneventful. The critical step is validating that the connection works! Figure 2 shows the configuration screen that also provides a simple function test of the connectivity to the remote Oracle server.

The highlighted name in Figure 2, "foo", is the Node Name required for defining the configuration in Information Integrator to access the remote Oracle server.

Figure 2. Testing the connection to the Oracle server
Testing the connection to the Oracle server

Step 2: Configure the db2dj.ini file

By installing DB2 Information Integrator (I used the third and final beta release preceding the announcement) after installing the Oracle Client, the installation program was able to detect the location of the client code and add the required ORACLE_HOME stanza to the db2dj.ini file automatically. Listing 1 shows a Windows example of the db2dj.ini file:

Listing 1. The db2dj.ini file contains the Oracle client location and the parameters for Informix from Part 1.

C:\Program Files\IBM\SQLLIB\cfg>type db2dj.ini 
INFORMIXDIR=C:\PROGRA~1\Informix\CLIENT~1 
INFORMIXSERVER=fliif 
INFORMIXSERVER=flspy 
INFORMIXSERVER=flxps 
ORACLE_HOME=C:\oracle\ora92

When running DB2 II with the DB2 v8 fix pack 2 release, the ORACLE_HOME environmental variable must be set. The Win32 and UNIX commands are shown here for setting this variable:

set ORACLE_HOME=c:\oracle\ora92 
 
export ORACLE_HOME=/export/home/oracle

Step 3: Configure the wrapper

We learned in Part 1 that the DB2 Information Integrator wrapper tells DB2 Information Integrator which program libraries to use to get to the remote data source. I'll illustrate the Oracle setup, because Part 1 only included Informix remote servers.

There are both graphical and command line interfaces for configuring DB2 Information Integrator. To get to the graphical tool:

  • Run db2cc from the command line
  • In Windows, use Start-> Programs-> IBM DB2 -> General Administration Tools -> Control Center. Expand the tree on the left, left click on Federated Database Objects and take the Create Wrapper option. See Figure 3.
Figure 3. Creating a wrapper from the Control Center
Creating a wrapper from the Control Center

The next dialog defines the type of wrapper. For current releases of Oracle, versions 8 and 9, NET8 is the correct option. See Figure 4.

Figure 4. Choose NET8 for the Oracle wrapper name
Choose NET8 for the Oracle wrapper name

Your private tutor is always waiting behind the Show SQL button - please use it! Figure 5 shows an example of how you can find out exactly what commands you are executing so you can script the entire process for this or other data sources.

Figure 5. Behind the scenes of creating a wrapper
Behind the scenes of creating a wrapper

Step 4: Create the server

Now we need to identify the IP address and port for the remote Oracle system. There are a number of options available when creating the server. My favorite is DB2_MAXIMAL_PUSHDOWN. Figure 6 below shows how to do this from Control Center.

Figure 6. Creating a remote server
Creating a remote server

Step 5: Creating a user mapping

The user mapping provides authentication at the remote Oracle instance (Figure 7). To the remote database, the DB2 Information Integrator server looks like any other client. You can't expect the remote server to give up the goods without telling it who you are, now can you? We'll use the "scott" user id that comes as part of the standard Oracle install. No, my password for "scott" is not "tiger" and yours shouldn't be either! The user mapping GUI has some minor changes in the GA code; the screen shots here are from the final Beta test.

Figure 7. Creating a user mapping
Creating a user mapping

Step 6: Create table nicknames

For existing tables, the nickname is the local handle that the DB2 Information Integrator database uses to refer to the remote table. For a brand new table, the process is simplified.

For our case study we need a "do not mail" list stored in a database table. We'll create and populate a table on the remote Oracle instance from the db2=> command line.

First, let's create the table:

db2 => create table LURIE.do_not_mail (customer_num int)   
options (REMOTE_SERVER 'oralin',  
  REMOTE_SCHEMA 'SCOTT', REMOTE_TABNAME 'donotmail') 
 
DB20000I  The SQL command completed successfully.

Now let's insert some values, again from the db2 command line, using the convenient DB2 syntax:

db2 => insert into do_not_mail values (101), (105), (110) 
DB20000I  The SQL command completed successfully.

And for fun, let's select our values from the table:

db2 => select * from do_not_mail 
 
CUSTOMER_NUM 
------------ 
         101 
         105 
         110 
 
  3 record(s) selected.

If the tables already exist in the remote server, the last task is to create a table nickname so that we can refer to the remote tables as if they were resident on the DB2 Information Integrator server. This is not required for our case study, but I wanted you to have this for future reference. Again, the graphical user interface makes the setup very simple (Figure 8), and saving the SQL avoids all the pointing and clicking for future definitions.

Figure 8. Creating a nickname for an existing remote table
Creating a nickname for an existing remote table

The next thing you'll see (Figure 9) is a way to filter names from the remote catalog. I'm grabbing the entire remote catalog for this example; you should plan on using this filter dialog to cut down on the number of tables returned if you have many tables in your database.

Figure 9. Filtering the number of remote table names returned
Filtering the number of remote table names returned

Debugging tip: If you get this message:

IBM][CLI Driver][DB2/NT] SQL1822N  Unexpected error  
code "12541" received from data source "oralin".  
Associated text and tokens are "ORA-12541:  
TNS:no listener ".  SQLSTATE=560BD

Ask your Oracle DBA to run the lsnrctl start command. It turns out that the Oracle TCP remote listener doesn't auto-start when the server comes up. Put it in a script and then you won't forget.

Figure 10 the last two dialog boxes to create a nickname:

Figure 10. Creating the wrapper (continued)
Creating the wrapper (continued)

As you may have guessed, I'm really a command line guy, so for those of you who shun graphical user interfaces, see Listing 2.

And now, illustrated in Listing 2 are steps 3 to 6 for those who prefer the command line:

Listing 2. Creating wrapper, server, user mapping, and nickname from the command line

-- get a database connection  
connect to sample; 
-- STEP 3 create the wrapper, indicating which dll to use 
CREATE WRAPPER "NET8" LIBRARY 'db2net8.dll'; 
-- STEP 4 create the server that defines 
--  the remote data source 
-- notice the use of the MAXIMAL pushdown option 
CREATE SERVER "oralin"  
TYPE ORACLE VERSION '9i'   
WRAPPER "NET8" OPTIONS ( NODE 'foo', 
	 ADD DB2_MAXIMAL_PUSHDOWN 'Y' ); 
-- STEP 5 define the user mapping for authenticating 
-- to the remote server 
CREATE USER MAPPING FOR "LURIE"  
SERVER "oralin" OPTIONS( REMOTE_AUTHID 'scott',  
 REMOTE_PASSWORD 'use_yours'); 
 
-- STEP 6 with the configuration defined above, set  
-- up some tables in the remote server with nicknames 
 
create nickname ora_dept for  
  "oralin"."SCOTT"."DEPT"; 
create nickname ora_card1 for  
  "oralin"."SCOTT"."card1"; 
select count(*) from ora_card1; 
create nickname ora_card2 for  
  "oralin"."SCOTT"."ORA_CARD2"; 
select count(*) from ora_card2;

Setting up the Microsoft Excel Data Source

Many business users are happiest when they have their data right where they can see it - in Excel. This example accesses the Excel data from a local worksheet.

Excel data can be served over a network. A local DB2 Information Integrator server on the Windows PC makes this data viewable as a DB2 table. This may seem like overkill, but in some environments being able to deliver this kind of access can solve some hard problems. OK - here is a real example that a buddy of mine ran into:

You receive 200+ spreadsheets containing data about market segments. The business analysts are very proud of their creation - but you need to get all this data into one table. You could open each spreadsheet and do over 200 cut and pastes, or you could write a script to pump out the DDL shown in Listing 3 below, and do a series of insert-selects or a union. It is your choice, but I personally hate to type, to say nothing of cut and paste, and would certainly choose the script option.

Figure 11 is our original spreadsheet with a list of the premium customers identified by customer number.

Figure 11. Microsoft Excel spreadsheet
Microsoft Excel spreadsheet

This simple DDL provides SQL access to the spreadsheet data:

Listing 3. DDL to get access to the Excel data

create wrapper excelw library 'db2lsxls.dll'; 
create server excel_src wrapper excelw;  
db2 => create nickname rc_mkt  
  ( customer_num integer )  
  for server excel_src options (file_path  
  'c:\edrive\src\ii\mkt_list.xls', range 'a2:a20')

Now, let's select from our nickname:

db2 => select * from rc_mkt 
 
CUSTOMER_NUM 
------------ 
         102 
         107 
         111 
         103 
         104 
         105 
 
  6 record(s) selected.

It works great without cutting and pasting!

Bonus round: setting up an ODBC data source

For those remote servers that don't have a native DB2 Information Integrator driver, there is always ODBC. This example shows how to set up an ODBC connection to Informix - you can plug in the ODBC you need to get to your desired database.

Remember, the ODBC driver is resident on the DB2 Information Integrator server, and connects to the remote database. I used a remote Informix data source since it was easily available, but this is not a supported configuration - it is shown here just as a configuration example.

Listing 4. SQL to access an ODBC data source

db2 => create wrapper odbc library 'db2rcodbc.dll' 
DB20000I  The SQL command completed successfully. 
-- the NODE is the odbc dsn name 
db2 => create server ifxodbc type odbc version 3.0  
  wrapper ODBC options (node 'stores_odbc') 
DB20000I  The SQL command completed successfully. 
db2 => create user mapping for lurie server ifxodbc  
  options (REMOTE_AUTHID 'informix',  
  REMOTE_PASSWORD 'use_yours' ) 
DB20000I  The SQL command completed successfully.

The passthru feature lets you run SQL on the remote server as if you were directly connected to it. This is good for bypassing local optimization, validating what you think is on the remote server, etc. Don't forget to turn off passthru when you're finished.

Listing 5. Passthru

db2 => set passthru ifxodbc 
DB20000I  The SQL command completed successfully. 
db2 => select count(*) from customer 
 
 
----------------- 
              28. 
 
  1 record(s) selected. 
 
 
db2 => set passthru reset 
db2 => create nickname odbc_customer for  
 ifxodbc."informix"."customer" 
DB20000I  The SQL command completed successfully.

Setting up the Customer and Credit tables in Informix

We'll exercise a few more features of DB2 II as we set up the customer table and the credit rating table for the case study. The capability to create a table on a remote server and insert data into the remote server will be illustrated.

Creating tables and inserting data

The DB2 Information Integrator server can create tables, insert, update, and delete all without leaving the comfort of the db2 => prompt or your DB2 GUI of choice.

To illustrate this example, let's move data from Oracle to Informix. Sorry to let my bias show, but I think moving data out of Oracle and into Informix or DB2 UDB is a splendid idea.

Table creation

Without leaving the environment of DB2 Information Integrator, you can create a table at a remote server. In this case, we are creating a table on IDS:

db2 => create table lurie.credit_rating  (  
customer_num int, credit_rank int)  
options ( remote_server 'rcfliif' ,  
remote_schema 'informix',  
remote_tabname 'cust_credit' ); 
 
DB20000I  The SQL command completed successfully.

Populating a table with an INSERT-SELECT

Now populate the remote table with an INSERT-SELECT. The SQL below will take two columns from the remote Oracle table and put that data into the Informix table we just added.

Notice that we get an error!

db2 => insert into credit_rating (customer_num, credit_rank) 
  select superkey, c3 from ora_card2 
 
DB21034E  The command was processed as an SQL statement because  
it was not a valid Command Line Processor command.  During SQL  
processing it returned: 
SQL20190N  Federated insert, update, or delete operation  
cannot be compiled because of potential data inconsistency.   
SQLSTATE=0A503

DB2 Information Integrator is saving us from an error if we fail during insert. Because the insert is an easily repeatable transaction, we'll override this with a server option. You can accomplish this via the GUI with the Add server options dialog, or just run it from the command line:

db2 => ALTER SERVER "rcfliif" OPTIONS  
  (  ADD IUD_APP_SVPT_ENFORCE 'N') 
DB20000I  The SQL command completed successfully.

Now we'll have a go at the INSERT-SELECT again:

db2 => insert into credit_rating  
  (customer_num, credit_rank)  
   select superkey, c3 from ora_card2 
DB20000I  The SQL command completed successfully. 
db2 => select count(*) from credit_rating 
 
1 
----------- 
      50001 
 
  1 record(s) selected.

The data looks like this:

db2 => select * from credit_rating  
  where customer_num > 100 fetch first 10 rows only 
 
CUSTOMER_NUM CREDIT_RANK 
------------ ----------- 
         101           2 
         102           3 
         103           1 
         104           2 
         105           3 
         106           1 
         107           2 
         108           3 
         109           1 
         110           2 
 
  10 record(s) selected.

Create the customer table for the case study

The customer table is the last of the four tables we need for doing the join to create the marketing list. What does the customer table look like? I thought you'd never ask:

Figure 12. The customer table in XPS
The customer table in XPS

It is a very simple nickname. Part 1 of the article provides lots more background about setting up Informix nicknames.

-- set up the customer table nickname 
db2 => create nickname rc8_customer  
  for "rc_xps"."informix"."customer" 
DB20000I  The SQL command completed successfully.

The mailing list join

Now for the "mega-query". Please notice the comments that are in stream with the SQL. I have put these in bold for you.

-- the rc8_customer table is the system of  
-- record for addresses 
-- so get all the address fields there 
select x.fname,  
	x.lname,  
	x.company,  
	x.address1,  
	x.city, 
	x.state,  
	x.zipcode, 
-- use the CASE SQL statement to put in the logic... 
-- here is the magic to deal with each customer  
-- situation, on the marketing list, or a range of  
-- credit ratings 
	case  
-- if the customer is on the marketing list send 
-- the "super" mailing 
	 when x.customer_num in  
(select customer_num from rc_mkt ) then 'super'  
-- if the customer has a good credit, a 1 of 3 
-- then send a "deluxe" offer, not quite as good as super 
	 when i.credit_rank = 1  then 'deluxe'  
-- average rating, a 2, send standard mailing 
	 when i.credit_rank = 2  then 'standard'  
-- for the 3 rankings, they owe us money from the last  
--  order 
	 when i.credit_rank = 3  then 'collection letter'  
	 else 'standard'  
	  end  
-- join condition across the the tables 
from rc8_customer x, credit_rating i  
where x.customer_num = i.customer_num  
-- now suppress all maling to those on the do not mail list 
	and i.customer_num not in  
( select z.customer_num from do_not_mail z) 
;

The results look like this

select x.fname, x.lname, x.company, x.address1, x.city, x.state, x.zipcode, case 
 when x.customer_num in (select customer_num from rc_mkt ) then 'super' when i.c 
redit_rank = 1  then 'delux' when i.credit_rank = 2  then 'standard' when i.cred 
it_rank = 3  then 'collection letter' else 'standard' end from rc8_customer x, c 
redit_rating i where x.customer_num = i.customer_num and i.customer_num not in ( 
 select z.customer_num from do_not_mail z) 
 
FNAME           LNAME           COMPANY              ADDRESS1             CITY 
          STATE ZIPCODE 8 
--------------- --------------- -------------------- -------------------- ------ 
--------- ----- ------- ----------------- 
Carole          Sadler          Sports Spot          785 Geary St         San Fr 
ancisco   CA    94117   super 
Philip          Currie          Phil's Sports        654 Poplar           Palo A 
lto       CA    94303   super 
Anthony         Higgins         Play Ball!           East Shopping Cntr.  Redwoo 
d City    CA    94026   super 
George          Watson          Watson & Son         1143 Carver Place    Mounta 
in View   CA    94063   delux 
Charles         Ream            Athletic Supplies    41 Jordan Avenue     Palo A 
lto       CA    94304   super 
Donald          Quinn           Quinn's Sports       587 Alvarado         Redwoo 
d City    CA    94063   collection letter 
Jane            Miller          Sport Stuff          Mayfair Mart         Sunnyv 
ale       CA    94086   delux 
Frances         Keyes           Sports Center        3199 Sterling Court  Sunnyv 
ale       CA    94085   super 
Margaret        Lawson          Runners & Others     234 Wyandotte Way    Los Al 
tos       CA    94022   delux 
Lana            Beatty          Sportstown           654 Oak Grove        Menlo 
Park      CA    94025   standard 
(... rows not shown)

DB2 Visual Explain shows the use of a new operator, RPD, to access the Excel data, as shown in Figure 13:

Figure 13. Visual Explain of the mailing list join
Visual Explain of the mailing list join

Let's put this query into a view for future reference. IBM recommends creating views against remote tables as preferred to views on remote views.

Again, from the comport of the db2 command prompt we create the following view:

drop view v_mkt_list; 
create view v_mkt_list (fname, lname, company,  
 addr, city, st,zip, mail_type) as 
 
select x.fname,  
	x.lname,  
	x.company,  
	x.address1,  
	x.city, 
	x.state,  
	x.zipcode, 
	case  
	 when x.customer_num in (select customer_num from rc_mkt )  
	   then 'super'  
	 when i.credit_rank = 1  then 'delux'  
	 when i.credit_rank = 2  then 'standard'  
	  when i.credit_rank = 3  then 'collection letter'  
	 else 'standard'  
	  end  
from rc8_customer x, credit_rating i  
where x.customer_num = i.customer_num  
	and i.customer_num not in  
	 ( select z.customer_num from do_not_mail z) 
;

Dealing with big tables and dirty data

There are two design aspects of federation that pose significant challenges: VLDBs (very large databases), and dirty data. With good planning and architecture these issues can be managed.

Big tables

In a VLDB environment, a shared nothing architecture is required to scale up the processing power to perform "boil the ocean" type queries. See my article Simulating MPP Processing on Linux for more information about shared nothing architectures. If we try to join our large database to a remote database, then we have a torrent of data converging on the DB2 Information Integrator server.

There is a simple work-around. Create a table on the shared nothing MPP server and do an insert-select from the remote database into the MPP environment. Then send the join to the MPP environment and enjoy the rapid response, and end-user accolades, of running a highly scalable shared nothing server.

Dirty data

The second issue of federation is "dirty data". No, nothing that exciting, just that customer_number from system A is a household number, which doesn't mean the same at all as customer_number from system B, which is social security number. What's a DBA to do?

The DBA really needs to know the metadata - and apply the right tools to do data scrubbing. There is are wealth of extract, transform, and load (ETL) tools to help with this problem. We shouldn't over-look what DB2 offers in its native form.

The soundex function is a good example of a simple tool that can be used to join names. An attempt to join "LURIE" and "Laurie" and "Luria" would fail. Even if the names were all converted to upper case the join would fail. How can soundex help? It changes all the characters to upper case, compresses out the vowels, and uses a sound matching score for each letter.

Here is a working soundex() example:

create table soundex_test( lname char(18) ) 
DB20000I  The SQL command completed successfully. 
 
insert into soundex_test values  
  ('LURIE'), ('Laurie'), ('Luria') 
DB20000I  The SQL command completed successfully. 
 
select * from soundex_test 
 
LNAME 
------------------ 
LURIE 
Laurie 
Luria 
 
  3 record(s) selected. 
 
 
select soundex(lname) from soundex_test 
 
1 
---- 
L600 
L600 
L600 
 
  3 record(s) selected.

All three names produce the same soundex() result so our join would be successful. You can also write your own user-defined functions to do data scrubbing. A word of caution: if you need the highest possible match rates with the minimal false positives you really need to evaluate the data cleansing products, unless you have way too much free time and enjoy re-inventing the wheel.


Creating XML documents from federated joins

XML has become an excellent format for inter-system communication. What could be better for exchanging than self-describing data? XML certainly suffers from a low signal-to-noise ratio, being quite verbose, but has still gained wide acceptance.

DB2 has several built-in functions to XMLize your data. Let's just consider the rec2xml() function for now.

The rec2xml() function is a very quick and easy way to turn table data into XML. Our marketing campaign is defined by the view we created. Applying rec2xml() gives an XML stream in a single step:

select rec2xml ( 1.0, 'COLATTVAL','MKT_MAILING' 
	, fname  
	, lname  
	, company  
	, addr  
	, city  
	, st  
	,zip  
	, mail_type) 
from v_mkt_list;

and some sample output looks like this:

<MKT_MAILING> 
<column name="FNAME">Frank          </column> 
<column name="LNAME">Lessor         </column> 
<column name="COMPANY">Phoenix University  </column> 
<column name="ADDR">Athletic Department </column> 
<column name="CITY">Phoenix        </column> 
<column name="ST">AZ</column> 
<column name="ZIP">85008</column> 
<column name="MAIL_TYPE">standard</column> 
</MKT_MAILING>

This can also be encapsulated in a view as follows:

create view mkt_xml (mail_label_xml) as ( 
select rec2xml ( 1.0, 'COLATTVAL','MKT_MAILING' 
	, fname  
	, lname  
	, company  
	, addr  
	, city  
	, st  
	,zip  
	, mail_type) 
from v_mkt_list ) 
;

Putting it all together: publishing federated XML via WebSphere® MQ

We're now ready to put this all together and publish our XML marketing campaign to the company that will do the mailing. The communication method is IBM Websphere MQ. This is a persistent queue.

What is a persistent queue? Think of it as a UNIX® named pipe on steroids. This queue can contain many distinct messages, and it can be read without destroying the message. It can be used for publish-and-subscribe or point-to-point messaging. It is a very powerful messaging system.

To publish our XML to WebSphere MQ we need to do some set-up work. We'll take the easiest path and do everything locally:

  • Install WebSphere on the same server as DB2. Check the IBM web site or contact your IBM team for evaluation software.
  • Find the MQ functions that shipped with DB2. On Windows the file is ma0f_nt.zip. Unzip and install these binaries.
  • Change to the DB2 cfg directory and run enable_mqfunctions, as shown here:
...\IBM\SQLLIB\cfg>enable_mqfunctions -n sample -u lurie -p use_yours 
 
  *** Please wait: creating queue manager (DB2MQ_DEFAULT_MQM) ... 
 
  *** Please wait: starting the queue manager (DB2MQ_DEFAULT_MQM) ... 
 
 (0) db2mq UDFs were found 
     -- Create MQ Functions ... 
  Validate MQ Functions ... 
  Validate successfully. 
 
  *** Please allow a few seconds to clean up the system ...

If you need to manually start WebSphere MQ, the commands are:

#start the queue and command server 
strmqm DB2MQ_DEFAULT_MQM 
strmqcsv DB2MQ_DEFAULT_MQM

And now for the moment you've been waiting for - publishing our XML to WebSphere MQ. The command is so simple it is almost an anti-climax.

C:\edrive\src\ii>db2 select db2mq.mqsend ( mail_label_xml ) from mkt_xml 
 
1 
- 
1 
(... rows not shown)  
1 
1 
1 
 
  25 record(s) selected.

What now? We can see the real results by using the MQ API Exerciser, which is easily found on the "MQ first steps" launcher. Fire up the API Exerciser, connect to the DB2 Queue, and perform an MQGET as shown in Figure 14.

Figure 14. MQGET on the DB2 queue
MQGET on the DB2 queue

Figure 15 is the actual message from the queue, and it is exactly what we need to complete our case study. We have federated across three different relational data sources and an Excel spreadsheet, converted the join output to XML, and published the XML in a persistent message queue so that others may access this information. Wow. We're done.

Figure 15. Contents of the XML message shows federated data
Contents of the XML message shows federated data

Conclusion

I hope these two articles have sparked your imagination about how to combine data across different servers to make your DBA job easier.

We've looked at multiple sources, joins, inserts, optimization, and a case study. We've looked at the implications of large data volumes and the need for data scrubbing. We've combined remote data into an XML stream and published it in a persistent message queue.

Please try this at home. It should keep you busy for long enough so I can write my next article.

Resources

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, WebSphere
ArticleID=13284
ArticleTitle=The Federation: Database Interoperability, the Adventure Continues (Part 2)
publish-date=07172003