© 2003 International Business Machines Corporation. All rights reserved.
Federated databases increase productivity by providing a single view of many servers. This simplifies data access, and the federation server can join and optimize across different databases very effectively, as we saw in Part 1 of this article. To make Part 2 interesting, we'll focus on a case study using IBM DB2® Information Integrator™ (DB2 II). The examples are intended to spark your thought process, not represent a suggested production environment. I find that case studies are easier to "sink your teeth into" than abstract concepts.
Here is the challenge: our data is spread over many servers, as shown in Figure 1:
Figure 1. Campaign mailing list data spread over many data sources
Your mission, should you choose to accept it, is:
Create a mailing list for a marketing campaign. The data for this campaign is spread over multiple sources. Specifically, the required data sources are shown in Table 1:
Table 1. Data sources for mailing list
Data                               Source
"do not mail" list                 Oracle 9i
High value customers               Excel® spreadsheet
Customer list, system of record    Informix® XPS
Customer credit ratings            Informix IDS
The marketing campaign you must create provides different incentives based on credit rating, prior purchasing history (which identifies the "high value" customers), and purchased marketing data.
The marketing list must be formatted as an XML stream and sent to the campaign production company via a persistent message queue.
The solution to the case study involves the following steps:
- Set up and test the data sources
- Develop the SQL to implement the marketing campaign
- "XML-ize" the query result, and enqueue the XML stream so it can be read by any queue client.
As promised in Part 1 of this article, the solution to the case study includes federated inserts, XML from federated joins, and XML and WebSphere® MQ.
When my editor saw this challenge she said, "You're going to fit that into part 2?" Here goes...
First, a very brief word from our sponsors: the DB2 Information Integrator announcement
The DB2 Information Integrator product was announced on May 20, 2003. Here is the announcement letter, and here is a snippet from that letter in case you don't go find it yourself:
"IBM DB2 Information Integrator V8.1 represents the next generation of information integration software that provides a fundamental underpinning for e-business on demand™. Responding quickly to changes means that businesses can quickly and easily correlate information not only across the enterprise, but also across its value chain. DB2 Information Integrator gives companies integrated, real-time access to business information across and beyond the enterprise. Such information may reside in diverse source systems (such as Oracle databases, Microsoft™ spreadsheets, flat files) and be distributed across a variety of operating environments (such as Windows®, Linux, UNIX®, z/OS®)...."
Rather than spend a lot of time describing features, let's dive right in and use the features.
Setting up the data sources for the case study
In Part 1, we set up the Informix data sources, so I won't repeat that information here. See those instructions to set up the Informix sources. If you prefer the command line instead of a point and click interface see the Appendix from Part 1 of the article.
So in this section I'll describe:
- Setting up and using an Oracle data source for the do not mail list
- Setting up and using an Excel data source for the marketing information
- Bonus round: Setting up and using an ODBC data source
- A very brief summary on how to create the customer table and credit table in Informix (see Part 1 for more details).
Setting up and using an Oracle data source
From Part 1 of this article we know the DB2 Information Integrator server appears to the remote servers as just another client program running queries. The federated environment we need to prepare for the case study consists of a series of remote data sources. Don't be overwhelmed by the complexity; a DB2 Information Integrator configuration could be as simple as a single remote source with DB2 Information Integrator acting as a data access gateway. The steps to set up a data source are:
- Establish client connectivity.
- Configure the db2dj.ini file and set the ORACLE_HOME environment variable.
- Create the wrapper.
- Create the server.
- Create a user mapping.
- Create table nicknames.
After we build this data source you'll see a pattern, based on the two examples in Part 1 and this one: given the required wrapper, you should be able to configure federation to access just about any source.
This section isn't a substitute for reading the fine documentation, but I will try to hit the highlights. There is now also a Redbook on information integration; see Getting Started on Integrating Your Information.
Step 1: Establish client connectivity
Installing the Oracle client SDK on the DB2 Information Integrator server is rather uneventful. The critical step is validating that the connection works! Figure 2 shows the configuration screen that also provides a simple function test of the connectivity to the remote Oracle server.
The highlighted name in Figure 2, "foo", is the Node Name required for defining the configuration in Information Integrator to access the remote Oracle server.
Figure 2. Testing the connection to the Oracle server
Step 2: Configure the db2dj.ini file
By installing DB2 Information Integrator (I used the third and final beta release preceding the announcement) after installing the Oracle client, the installation program was able to detect the location of the client code and add the required ORACLE_HOME stanza to the db2dj.ini file automatically. Listing 1 shows a Windows example of the db2dj.ini file.
Listing 1. The db2dj.ini file contains the Oracle client location and the parameters for Informix from Part 1
C:\Program Files\IBM\SQLLIB\cfg>type db2dj.ini
INFORMIXDIR=C:\PROGRA~1\Informix\CLIENT~1
INFORMIXSERVER=fliif
INFORMIXSERVER=flspy
INFORMIXSERVER=flxps
ORACLE_HOME=C:\oracle\ora92
When running DB2 II with the DB2 v8 fix pack 2 release, the ORACLE_HOME environment variable must be set. The Win32 and UNIX commands for setting this variable are shown here:

set ORACLE_HOME=c:\oracle\ora92
export ORACLE_HOME=/export/home/oracle
Step 3: Configure the wrapper
We learned in Part 1 that the DB2 Information Integrator wrapper tells DB2 Information Integrator which program libraries to use to get to the remote data source. I'll illustrate the Oracle setup, because Part 1 only included Informix remote servers.
There are both graphical and command line interfaces for configuring DB2 Information Integrator. To get to the graphical tool:
- Enter db2cc from the command line, or
- In Windows, use Start -> Programs -> IBM DB2 -> General Administration Tools -> Control Center. Expand the tree on the left, click Federated Database Objects, and choose the Create Wrapper option. See Figure 3.
Figure 3. Creating a wrapper from the Control Center
The next dialog defines the type of wrapper. For current releases of Oracle, versions 8 and 9, NET8 is the correct option. See Figure 4.
Figure 4. Choose NET8 for the Oracle wrapper name
Your private tutor is always waiting behind the Show SQL button - please use it! Figure 5 shows an example of how you can find out exactly what commands you are executing so you can script the entire process for this or other data sources.
Figure 5. Behind the scenes of creating a wrapper
Step 4: Create the server
Now we need to identify the IP address and port for the remote Oracle system. There are a number of options available when creating the server. My favorite is DB2_MAXIMAL_PUSHDOWN. Figure 6 below shows how to do this from Control Center.
Figure 6. Creating a remote server
Step 5: Creating a user mapping
The user mapping provides authentication at the remote Oracle instance (Figure 7). To the remote database, the DB2 Information Integrator server looks like any other client. You can't expect the remote server to give up the goods without telling it who you are, now can you? We'll use the "scott" user id that comes as part of the standard Oracle install. No, my password for "scott" is not "tiger" and yours shouldn't be either! The user mapping GUI has some minor changes in the GA code; the screen shots here are from the final Beta test.
Figure 7. Creating a user mapping
Step 6: Create table nicknames
For existing tables, the nickname is the local handle that the DB2 Information Integrator database uses to refer to the remote table. For a brand new table, the process is simplified.
For our case study we need a "do not mail" list stored in a database table. We'll create and populate a table on the remote Oracle instance from the db2 => command line.
First, let's create the table:
db2 => create table LURIE.do_not_mail (customer_num int)
       options (REMOTE_SERVER 'oralin', REMOTE_SCHEMA 'SCOTT', REMOTE_TABNAME 'donotmail')
DB20000I  The SQL command completed successfully.
Now let's insert some values, again from the db2 command line, using the convenient DB2 syntax:
db2 => insert into do_not_mail values (101), (105), (110)
DB20000I  The SQL command completed successfully.
And for fun, let's select our values from the table:
db2 => select * from do_not_mail

CUSTOMER_NUM
------------
         101
         105
         110

  3 record(s) selected.
If the tables already exist in the remote server, the last task is to create a table nickname so that we can refer to the remote tables as if they were resident on the DB2 Information Integrator server. This is not required for our case study, but I wanted you to have this for future reference. Again, the graphical user interface makes the setup very simple (Figure 8), and saving the SQL avoids all the pointing and clicking for future definitions.
Figure 8. Creating a nickname for an existing remote table
The next thing you'll see (Figure 9) is a way to filter names from the remote catalog. I'm grabbing the entire remote catalog for this example; you should plan on using this filter dialog to cut down on the number of tables returned if you have many tables in your database.
Figure 9. Filtering the number of remote table names returned
Debugging tip: If you get this message:
[IBM][CLI Driver][DB2/NT] SQL1822N  Unexpected error code "12541" received from data source "oralin".
Associated text and tokens are "ORA-12541: TNS:no listener".  SQLSTATE=560BD
Ask your Oracle DBA to run the lsnrctl start command. It turns out that the Oracle TCP remote listener doesn't auto-start when the server comes up. Put it in a script and then you won't forget.
Figure 10 shows the last two dialog boxes used to create a nickname:
Figure 10. Creating the nickname (continued)
As you may have guessed, I'm really a command line guy. For those of you who shun graphical user interfaces, Listing 2 shows steps 3 through 6 from the command line:
Listing 2. Creating wrapper, server, user mapping, and nickname from the command line
-- get a database connection
connect to sample;

-- STEP 3 create the wrapper, indicating which dll to use
CREATE WRAPPER "NET8" LIBRARY 'db2net8.dll';

-- STEP 4 create the server that defines the remote data source
-- notice the use of the MAXIMAL pushdown option
CREATE SERVER "oralin" TYPE ORACLE VERSION '9i' WRAPPER "NET8"
   OPTIONS ( NODE 'foo', ADD DB2_MAXIMAL_PUSHDOWN 'Y' );

-- STEP 5 define the user mapping for authenticating to the remote server
CREATE USER MAPPING FOR "LURIE" SERVER "oralin"
   OPTIONS ( REMOTE_AUTHID 'scott', REMOTE_PASSWORD 'use_yours');

-- STEP 6 with the configuration defined above, set up
-- some tables in the remote server with nicknames
create nickname ora_dept for "oralin"."SCOTT"."DEPT";
create nickname ora_card1 for "oralin"."SCOTT"."card1";
select count(*) from ora_card1;
create nickname ora_card2 for "oralin"."SCOTT"."ORA_CARD2";
select count(*) from ora_card2;
Setting up the Microsoft Excel Data Source
Many business users are happiest when they have their data right where they can see it - in Excel. This example accesses the Excel data from a local worksheet.
Excel data can be served over a network. A local DB2 Information Integrator server on the Windows PC makes this data viewable as a DB2 table. This may seem like overkill, but in some environments being able to deliver this kind of access can solve some hard problems. OK - here is a real example that a buddy of mine ran into:
You receive 200+ spreadsheets containing data about market segments. The business analysts are very proud of their creation - but you need to get all this data into one table. You could open each spreadsheet and do over 200 cut and pastes, or you could write a script to pump out the DDL shown in Listing 3 below, and do a series of insert-selects or a union. It is your choice, but I personally hate to type, to say nothing of cut and paste, and would certainly choose the script option.
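The generated DDL can then feed one consolidating statement. Here is a minimal sketch, assuming hypothetical nicknames mkt_seg1 and mkt_seg2 (one per spreadsheet, each defined the same way as the rc_mkt nickname in Listing 3) and a hypothetical local target table:

```sql
-- local table that consolidates every spreadsheet's data
-- (all_segments, mkt_seg1, and mkt_seg2 are hypothetical names)
create table all_segments (customer_num integer);

-- one UNION ALL branch per generated nickname; a series of
-- separate insert-selects, one per nickname, works equally well
insert into all_segments
   select customer_num from mkt_seg1
   union all
   select customer_num from mkt_seg2;
```

With 200+ spreadsheets, the script simply emits 200+ nickname definitions and 200+ UNION ALL branches.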
Figure 11 is our original spreadsheet with a list of the premium customers identified by customer number.
Figure 11. Microsoft Excel spreadsheet
This simple DDL provides SQL access to the spreadsheet data:
Listing 3. DDL to get access to the Excel data
create wrapper excelw library 'db2lsxls.dll';
create server excel_src wrapper excelw;

db2 => create nickname rc_mkt ( customer_num integer )
       for server excel_src
       options (file_path 'c:\edrive\src\ii\mkt_list.xls', range 'a2:a20')
Now, let's select from our nickname:
db2 => select * from rc_mkt

CUSTOMER_NUM
------------
         102
         107
         111
         103
         104
         105

  6 record(s) selected.
It works great without cutting and pasting!
Bonus round: setting up an ODBC data source
For those remote servers that don't have a native DB2 Information Integrator driver, there is always ODBC. This example shows how to set up an ODBC connection to Informix - you can plug in the ODBC you need to get to your desired database.
Remember, the ODBC driver is resident on the DB2 Information Integrator server, and connects to the remote database. I used a remote Informix data source since it was easily available, but this is not a supported configuration - it is shown here just as a configuration example.
Listing 4. SQL to access an ODBC data source
db2 => create wrapper odbc library 'db2rcodbc.dll'
DB20000I  The SQL command completed successfully.

-- the NODE is the odbc dsn name
db2 => create server ifxodbc type odbc version 3.0 wrapper ODBC
       options (node 'stores_odbc')
DB20000I  The SQL command completed successfully.

db2 => create user mapping for lurie server ifxodbc
       options (REMOTE_AUTHID 'informix', REMOTE_PASSWORD 'use_yours')
DB20000I  The SQL command completed successfully.
The passthru feature lets you run SQL on the remote server as if you were directly connected to it. This is good for bypassing local optimization, validating what you think is on the remote server, etc. Don't forget to turn off passthru when you're finished.
Listing 5. Passthru
db2 => set passthru ifxodbc
DB20000I  The SQL command completed successfully.

db2 => select count(*) from customer

-----------------
              28.

  1 record(s) selected.

db2 => set passthru reset

db2 => create nickname odbc_customer for ifxodbc."informix"."customer"
DB20000I  The SQL command completed successfully.
Setting up the Customer and Credit tables in Informix
We'll exercise a few more features of DB2 II as we set up the customer table and the credit rating table for the case study. I'll illustrate the capability to create a table on a remote server and to insert data into it.
Creating tables and inserting data
The DB2 Information Integrator server can create tables, insert, update, and delete, all without leaving the comfort of the db2 => prompt or your DB2 GUI of choice.
To illustrate this example, let's move data from Oracle to Informix. Sorry to let my bias show, but I think moving data out of Oracle and into Informix or DB2 UDB is a splendid idea.
Without leaving the environment of DB2 Information Integrator, you can create a table at a remote server. In this case, we are creating a table on IDS:
db2 => create table lurie.credit_rating ( customer_num int, credit_rank int )
       options ( remote_server 'rcfliif', remote_schema 'informix',
                 remote_tabname 'cust_credit' )
DB20000I  The SQL command completed successfully.
Populating a table with an INSERT-SELECT
Now populate the remote table with an INSERT-SELECT. The SQL below will take two columns from the remote Oracle table and put that data into the Informix table we just added.
Notice that we get an error!
db2 => insert into credit_rating (customer_num, credit_rank)
       select superkey, c3 from ora_card2
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL20190N  Federated insert, update, or delete operation cannot be compiled
because of potential data inconsistency.  SQLSTATE=0A503
DB2 Information Integrator is protecting us from data inconsistency if the insert fails partway through. Because the insert is an easily repeatable transaction, we'll override this safeguard with a server option. You can accomplish this via the GUI with the Add server options dialog, or just run it from the command line:
db2 => ALTER SERVER "rcfliif" OPTIONS ( ADD IUD_APP_SVPT_ENFORCE 'N')
DB20000I  The SQL command completed successfully.
Now we'll have a go at the INSERT-SELECT again:
db2 => insert into credit_rating (customer_num, credit_rank)
       select superkey, c3 from ora_card2
DB20000I  The SQL command completed successfully.

db2 => select count(*) from credit_rating

1
-----------
      50001

  1 record(s) selected.
The data looks like this:
db2 => select * from credit_rating where customer_num > 100 fetch first 10 rows only

CUSTOMER_NUM CREDIT_RANK
------------ -----------
         101           2
         102           3
         103           1
         104           2
         105           3
         106           1
         107           2
         108           3
         109           1
         110           2

  10 record(s) selected.
Create the customer table for the case study
The customer table is the last of the four tables we need for doing the join to create the marketing list. What does the customer table look like? I thought you'd never ask:
Figure 12. The customer table in XPS
It is a very simple nickname. Part 1 of the article provides lots more background about setting up Informix nicknames.
-- set up the customer table nickname
db2 => create nickname rc8_customer for "rc_xps"."informix"."customer"
DB20000I  The SQL command completed successfully.
The mailing list join
Now for the "mega-query". Please notice the comments that run inline with the SQL.
-- the rc8_customer table is the system of record for addresses,
-- so get all the address fields there
select x.fname, x.lname, x.company, x.address1,
       x.city, x.state, x.zipcode,
-- use the CASE SQL expression to put in the logic...
-- here is the magic to deal with each customer situation:
-- on the marketing list, or a range of credit ratings
case
   -- if the customer is on the marketing list, send the "super" mailing
   when x.customer_num in (select customer_num from rc_mkt) then 'super'
   -- if the customer has good credit, a 1 of 3,
   -- then send a "deluxe" offer, not quite as good as super
   when i.credit_rank = 1 then 'deluxe'
   -- average rating, a 2: send the standard mailing
   when i.credit_rank = 2 then 'standard'
   -- the 3 rankings owe us money from the last order
   when i.credit_rank = 3 then 'collection letter'
   else 'standard'
end
-- join condition across the tables
from rc8_customer x, credit_rating i
where x.customer_num = i.customer_num
-- now suppress all mailing to those on the do not mail list
and i.customer_num not in (select z.customer_num from do_not_mail z);
The results look like this:
select x.fname, x.lname, x.company, x.address1, x.city, x.state, x.zipcode,
case when x.customer_num in (select customer_num from rc_mkt) then 'super'
when i.credit_rank = 1 then 'delux'
when i.credit_rank = 2 then 'standard'
when i.credit_rank = 3 then 'collection letter'
else 'standard' end
from rc8_customer x, credit_rating i
where x.customer_num = i.customer_num
and i.customer_num not in (select z.customer_num from do_not_mail z)

FNAME    LNAME    COMPANY             ADDRESS1             CITY           STATE ZIPCODE 8
-------- -------- ------------------- -------------------- -------------- ----- ------- -----------------
Carole   Sadler   Sports Spot         785 Geary St         San Francisco  CA    94117   super
Philip   Currie   Phil's Sports       654 Poplar           Palo Alto      CA    94303   super
Anthony  Higgins  Play Ball!          East Shopping Cntr.  Redwood City   CA    94026   super
George   Watson   Watson & Son        1143 Carver Place    Mountain View  CA    94063   delux
Charles  Ream     Athletic Supplies   41 Jordan Avenue     Palo Alto      CA    94304   super
Donald   Quinn    Quinn's Sports      587 Alvarado         Redwood City   CA    94063   collection letter
Jane     Miller   Sport Stuff         Mayfair Mart         Sunnyvale      CA    94086   delux
Frances  Keyes    Sports Center       3199 Sterling Court  Sunnyvale      CA    94085   super
Margaret Lawson   Runners & Others    234 Wyandotte Way    Los Altos      CA    94022   delux
Lana     Beatty   Sportstown          654 Oak Grove        Menlo Park     CA    94025   standard
(... rows not shown)
DB2 Visual Explain shows the use of a new operator, RPD, to access the Excel data, as shown in Figure 13:
Figure 13. Visual Explain of the mailing list join
Let's put this query into a view for future reference. IBM recommends creating views against remote tables rather than views on remote views.
Again, from the comfort of the db2 command prompt, we create the following view:
drop view v_mkt_list;

create view v_mkt_list (fname, lname, company, addr, city, st, zip, mail_type) as
select x.fname, x.lname, x.company, x.address1,
       x.city, x.state, x.zipcode,
case
   when x.customer_num in (select customer_num from rc_mkt) then 'super'
   when i.credit_rank = 1 then 'delux'
   when i.credit_rank = 2 then 'standard'
   when i.credit_rank = 3 then 'collection letter'
   else 'standard'
end
from rc8_customer x, credit_rating i
where x.customer_num = i.customer_num
and i.customer_num not in (select z.customer_num from do_not_mail z);
Dealing with big tables and dirty data
There are two design aspects of federation that pose significant challenges: VLDBs (very large databases), and dirty data. With good planning and architecture these issues can be managed.
In a VLDB environment, a shared nothing architecture is required to scale up the processing power to perform "boil the ocean" type queries. See my article Simulating MPP Processing on Linux for more information about shared nothing architectures. If we try to join our large database to a remote database, then we have a torrent of data converging on the DB2 Information Integrator server.
There is a simple work-around. Create a table on the shared nothing MPP server and do an insert-select from the remote database into the MPP environment. Then send the join to the MPP environment and enjoy the rapid response, and end-user accolades, of running a highly scalable shared nothing server.
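In SQL terms, the work-around looks something like this sketch (the table, nickname, and column names here are hypothetical, and the partitioning details depend on your MPP layout):

```sql
-- stage the remote rows into a table that lives on the MPP server;
-- ord_nick is a nickname for the remote table, mpp_orders is local to MPP
create table mpp_orders (order_num int, customer_num int, amount decimal(12,2));

insert into mpp_orders
   select order_num, customer_num, amount from ord_nick;

-- the big join now runs entirely inside the shared nothing environment
select c.customer_num, sum(o.amount)
from customer c, mpp_orders o
where c.customer_num = o.customer_num
group by c.customer_num;
```

The insert-select is the only statement that crosses the federation boundary; everything after it scales with the MPP server.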
The second issue of federation is "dirty data". No, nothing that exciting, just that customer_number from system A is a household number, which doesn't mean the same thing at all as customer_number from system B, which is a social security number. What's a DBA to do?
The DBA really needs to know the metadata, and apply the right tools to do data scrubbing. There is a wealth of extract, transform, and load (ETL) tools to help with this problem. We shouldn't overlook what DB2 offers in its native form.
The soundex function is a good example of a simple tool that can be used to join names. An attempt to join "LURIE", "Laurie", and "Luria" would fail, even if the names were all converted to upper case. How can soundex help? It converts the characters to upper case, compresses out the vowels, and maps similar-sounding consonants to the same code.
Here is a working soundex() example:
create table soundex_test ( lname char(18) )
DB20000I  The SQL command completed successfully.

insert into soundex_test values ('LURIE'), ('Laurie'), ('Luria')
DB20000I  The SQL command completed successfully.

select * from soundex_test

LNAME
------------------
LURIE
Laurie
Luria

  3 record(s) selected.

select soundex(lname) from soundex_test

1
----
L600
L600
L600

  3 record(s) selected.
All three names produce the same soundex() result so our join would be successful. You can also write your own user-defined functions to do data scrubbing. A word of caution: if you need the highest possible match rates with the minimal false positives you really need to evaluate the data cleansing products, unless you have way too much free time and enjoy re-inventing the wheel.
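As a sketch of the user-defined function idea, here is a toy SQL scalar function that normalizes a name before matching. The function name is hypothetical, and a real cleansing product does far more than this:

```sql
-- toy scrubbing UDF: upper-case and strip blanks before matching
-- (scrub_name is an illustrative name, not a DB2 built-in)
create function scrub_name ( raw varchar(18) )
   returns varchar(18)
   language sql
   deterministic
   no external action
   contains sql
   return upper(rtrim(ltrim(raw)));

-- combine it with soundex() in a join predicate
select a.lname, b.lname
from soundex_test a, soundex_test b
where soundex(scrub_name(a.lname)) = soundex(scrub_name(b.lname));
```

Layering scrubbing functions like this keeps the cleanup logic in one place instead of scattering it through every query.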
Creating XML documents from federated joins
XML has become an excellent format for inter-system communication. What could be better for exchange than self-describing data? XML certainly suffers from a low signal-to-noise ratio, being quite verbose, but it has still gained wide acceptance.
DB2 has several built-in functions to XMLize your data. Let's just consider the rec2xml() function for now.
The rec2xml() function is a very quick and easy way to turn table data into XML. Our marketing campaign is defined by the view we created. Applying rec2xml() gives an XML stream in a single step:
select rec2xml ( 1.0, 'COLATTVAL', 'MKT_MAILING',
                 fname, lname, company, addr,
                 city, st, zip, mail_type )
from v_mkt_list;
and some sample output looks like this:
<MKT_MAILING>
<column name="FNAME">Frank</column>
<column name="LNAME">Lessor</column>
<column name="COMPANY">Phoenix University</column>
<column name="ADDR">Athletic Department</column>
<column name="CITY">Phoenix</column>
<column name="ST">AZ</column>
<column name="ZIP">85008</column>
<column name="MAIL_TYPE">standard</column>
</MKT_MAILING>
This can also be encapsulated in a view as follows:
create view mkt_xml (mail_label_xml) as
( select rec2xml ( 1.0, 'COLATTVAL', 'MKT_MAILING',
                   fname, lname, company, addr,
                   city, st, zip, mail_type )
  from v_mkt_list );
Putting it all together: publishing federated XML via WebSphere® MQ
We're now ready to put this all together and publish our XML marketing campaign to the company that will do the mailing. The communication method is IBM WebSphere MQ, a persistent queue.
What is a persistent queue? Think of it as a UNIX® named pipe on steroids. This queue can contain many distinct messages, and it can be read without destroying the message. It can be used for publish-and-subscribe or point-to-point messaging. It is a very powerful messaging system.
To publish our XML to WebSphere MQ we need to do some set-up work. We'll take the easiest path and do everything locally:
- Install WebSphere MQ on the same server as DB2. Check the IBM web site or contact your IBM team for evaluation software.
- Find the MQ functions that shipped with DB2. On Windows the file is ma0f_nt.zip. Unzip and install these binaries.
- Change to the DB2 cfg directory and run enable_mqfunctions, as shown here:
...\IBM\SQLLIB\cfg>enable_mqfunctions -n sample -u lurie -p use_yours
*** Please wait: creating queue manager (DB2MQ_DEFAULT_MQM) ...
*** Please wait: starting the queue manager (DB2MQ_DEFAULT_MQM) ...
(0) db2mq UDFs were found
-- Create MQ Functions ...
Validate MQ Functions ...
Validate successfully.
*** Please allow a few seconds to clean up the system ...
If you need to manually start WebSphere MQ, the commands are:
# start the queue manager and the command server
strmqm DB2MQ_DEFAULT_MQM
strmqcsv DB2MQ_DEFAULT_MQM
And now for the moment you've been waiting for - publishing our XML to WebSphere MQ. The command is so simple it is almost an anti-climax.
C:\edrive\src\ii>db2 select db2mq.mqsend ( mail_label_xml ) from mkt_xml

1
-
1
(... rows not shown)
1
1
1

  25 record(s) selected.
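The enable_mqfunctions step also installed read functions, so you can inspect the queue from SQL as well. As a hedged sketch, DB2MQ.MQREAD should let you peek at the first message on the default queue without removing it, while DB2MQ.MQRECEIVE consumes it:

```sql
-- non-destructive read: look at the first message, leave it on the queue
select db2mq.mqread() from sysibm.sysdummy1;

-- destructive read: return the first message and remove it from the queue
-- select db2mq.mqreceive() from sysibm.sysdummy1;
```

This is handy for a quick sanity check before handing the queue over to the campaign production company.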
What now? We can see the real results by using the MQ API Exerciser, which is easily found on the "MQ first steps" launcher. Fire up the API Exerciser, connect to the DB2 Queue, and perform an MQGET as shown in Figure 14.
Figure 14. MQGET on the DB2 queue
Figure 15 is the actual message from the queue, and it is exactly what we need to complete our case study. We have federated across three different relational data sources and an Excel spreadsheet, converted the join output to XML, and published the XML in a persistent message queue so that others may access this information. Wow. We're done.
Figure 15. Contents of the XML message shows federated data
I hope these two articles have sparked your imagination about how to combine data across different servers to make your DBA job easier.
We've looked at multiple sources, joins, inserts, optimization, and a case study. We've looked at the implications of large data volumes and the need for data scrubbing. We've combined remote data into an XML stream and published it in a persistent message queue.
Please try this at home. It should keep you busy for long enough so I can write my next article.
- Getting Started on Integrating Your Information, available from IBM Redbooks at http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246892.html.
- Saracco, C. M., Susanne Englert and Ingmar Gebert, "J2EE Development Across Multiple Data Sources: Digging into the Details," DB2 Developer's Domain, June 2003. Available at http://www-106.ibm.com/developerworks/db2/library/techarticle/0306saracco/0306saracco.html
- Hayes, Holly A, "Creating a flexible infrastructure for integrating information," available at http://www.ibm.com/software/data/pubs/pdfs/ii.pdf
- Saracco, C. M., Susanne Englert and Ingmar Gebert, "Performance of DB2 Information Integrator in a J2EE Environment with Multiple Data Sources," DB2 Developer's Domain, June 2003. Available at http://www-106.ibm.com/developerworks/db2/library/techarticle/0306saracco1/0306saracco1.html
- Saracco, C. M. and T. J. Rieger, "Our Experience with Developing Entity EJBs over Disparate Data Sources," DB2 Developer's Domain, May 2003. Available at http://www-106.ibm.com/developerworks/db2/library/techarticle/0305saracco/0305saracco.html