Skip to main content

Coping with Disparate Data in Web Applications

C. M. Saracco, DBMS and Web Application Server Integration, IBM, Software Group
C. M. Saracco is a senior software engineer at IBM's Silicon Valley laboratory and a former software technology instructor for UC Santa Cruz extension program. She has lectured on a variety of technology topics throughout North America, South America, Europe and the Middle East.

Summary:  Federated DBMSs offer Web architects some help in coping with disparate data. In this first of three articles, author C. M. Saracco explores some key technologies relevant to accessing federated data from Web application servers and reviews techniques for configuring DB2 federated so that Web component developers can build J2EE components and Web services that transparently access a wide range of data.

Date:  22 Aug 2002
Level:  Introductory
Activity:  675 views

©2002 International Business Machines Corporation. All rights reserved.

Introduction

As companies continue to drive for higher revenues and greater operational efficiency, IT leaders frequently seek better ways to access, integrate, and act upon critical business data. The task is rarely an easy one, since this data is often dispersed across a variety of systems and stored in a variety of formats. Some of this data even may be owned by business partners or generated dynamically by applications, Web services, or JavaTM 2 Enterprise Edition (J2EE) server-side components.

Consolidating and working with this plethora of data poses a serious challenge to many IT organizations -- a challenge that is arguably extreme in Web-based environments, where software development cycles are often very short. In such environments, the Web application server tier is usually where critical business logic -- often embodied in reusable software components such as Enterprise JavaBeansTM (EJBs), Web services, servlets, and the like -- must ultimately work with a wide range of data stored or generated by a variety of remote systems. Failure to do so in a transparent and reliable fashion restricts the usefulness of these mid-tier servers and, in turn, limits the capabilities of the applications built upon them.

Federated database management systems (DBMSs) offer Web architects some help in coping with disparate data. Such systems provide a single-site image of distributed data stored (or generated) in a variety of formats. Furthermore, federated DBMSs offer Web programmers a common interface for accessing this data. Finally, an efficient federated DBMS may feature an extended catalog capable of maintaining statistics about remote data and use these statistics to globally optimize data access.

As a result, federated DBMSs can help firms integrate information quickly and effectively, relieving component developers of much of the burden associated with connecting to, accessing, combining, filtering, and transforming data from multiple sources. In this article, we'll explore some key technologies relevant to accessing federated data from Web application servers. We'll also review techniques for configuring a federated DBMS -- DB2® Version 8 for Linux, UNIX®, and Windows® platforms -- so that Web component developers can build J2EE components and Web services that transparently access a wide range of data. Subsequent articles will delve into design and programming issues associated with developing session beans, Web services, and entity beans with container-managed persistence over a federated database. Building Web Components that Access Federated Data reviews design requirements for building session EJBs and Web services that work with federated data. In Building Entity EJBs that Span Federated Data , we see how to build entity EJBs that span federated data.


Learning the key technologies

To determine if a federated DBMS can help your Web application developers, it's important to understand the technologies that comprise the environment we've been discussing. Key technologies include federated DBMSs , Web application servers , EJBs , and Web services . We'll look into each of these briefly here; for more details, see some of the resources listed under Resources and related readings .

Federated DBMS

A federated DBMS, such as IBM's DB2 on Linux, UNIX and Windows, provides a virtual database for multiple data sources. These data sources may run on different hardware and operating system platforms, may be marketed by different vendors, and may use different application programming interfaces (including different SQL dialects). A federated DBMS shields programmers from these differences, enabling programmers to work at a higher level of abstraction than would otherwise be possible. For example, nicknames for tables (or other data objects, such as files) offers programmers location transparency, eliminating the need for them to know precisely where the desired data resides. Functional compensation can mask differences between different data sources and simulate capabilities that aren't natively supported. Multi-site joins and unions promote integration of data from multiple sources.

Federated DBMS technology debuted commercially in the 1990s. Product offerings were variously called next-generation gateways, data-access middleware and multi-database servers. DataJoiner® was IBM's first commercial release of this technology; once marketed as a separate product, it has since been enhanced and integrated into DB2 V8. In addition to the capabilities we just discussed, DB2 supports a "wrapper" architecture, which enables programmers to customize the federated DBMS to access a data source of their choosing. IBM supplies various ready-made wrappers for DB2, enabling its federated DBMS to interface to a number of relational and non-relational data sources. Relational data sources include all members of the DB2 family, Microsoft® SQL Server, Oracle, Sybase, and Informix®. For a complete list of supported data sources and available wrappers, download the DB2 V8 beta library and see the documentation on "Federated Systems." The DB2 V8 library, as well as the product itself, can be downloaded by following the link on the DB2 home page at http://www.software.ibm.com/data/db2 .

A sample federated DBMS server architecture is shown in Figure 1 . JDBC applications connect to a federated DB2 server that is configured to access three different data sources on different platforms. This enables the JDBC applications to work transparently with any or all of these data sources. Furthermore, views can be created across these data sources to simplify data integration issues for read-only applications.


Figure 1. Sample federated DBMS environment configured to access three remote DBMSs
Sample federated DBMS environment configured to access three remote DBMSs

Of course, DB2 can store and manage its own local data objects, such as tables, views, and indexes. Its optimizer is designed to consider the disparate and physically distributed nature of its environment so that an efficient data access strategy can be selected for each query. Those who build wrappers can provide cost estimation information to help the optimizer make an efficient selection when working with custom data sources. However, as of this writing, two-phase commit processing is not supported. Thus, programmers should avoid attempting to insert, update, or delete data from multiple data sources within a single transaction.

Web application servers

Web application servers help firms manage and deploy server-side business logic. This logic, typically written in Java, is crucial to supporting multi-tiered Internet, intranet, and extranet applications. Depending on application needs, different technologies may be employed to implement this logic. Examples include Java Server PagesTM (JSPs), Java servlets, EJBs, and Web services. We'll discuss some of these shortly.

A sample Web application server environment is shown in Figure 2 . The mid-tier server can manage requests from a variety of remote clients. Installed on the middle tier is an HTTP Web server (such as the IBM HTTP Server) as well as a Web application server (such as IBM's WebSphere Application Server Advanced Edition). The application server supports various technologies, such as EJBs and Web services, which perform relevant functions in support of client applications. Among these functions may be access to local or remote data sources from a variety of vendors; the configuration illustrates local data source access only.


Figure 2. Sample Web application server installation, configured with a local DBMS
Sample Web application server installation, configured with a local DBMS

Web application servers employ various techniques to support efficient access of local or remote data. Connection pooling, for example, can conserve resources and prevent software components from consistently incurring the overhead of creating a new connection at the data source (and, later, terminating the connection). With EJBs, proper settings for transaction attributes and isolation levels can improve concurrent data access and potentially minimize deadlocks.

Federated DBMS technology can complement built-in database support provided by Web application servers. It can make joining and unioning data across multiple data sources as simple as writing a single SQL statement. This is a considerable improvement over the alternative: connecting to each data source individually, extracting the necessary data using different native APIs, and filtering, sorting, and consolidating the data manually. In addition to its complexity and error-prone nature, the latter approach affords little possibility for ensuring high performance. In such a scenario, the software developer is forced to contend with global optimization issues manually and without the benefits of a global catalog maintaining important statistics about the remote data.

In addition to providing transparent and efficient access to data spanning multiple sources, federated DBMSs can extend the reach of Web application servers to include support of remote data -- both physically stored or dynamically generated -- that the application server doesn't natively support.

Enterprise JavaBeans (EJBs)

EJBs are server-side software Java components designed to promote code reuse and provide built-in support for functions critical to production applications, such as transaction management, security, and persistence. Java client applications can access EJBs directly using remote method invocation (RMI) over Internet Inter-Orb Protocol (IIOP). Web clients, on the other hand, access EJBs indirectly; they use HTTP to communicate to an HTTP server, which in turn may invoke a servlet, JSP, or Web service that accesses the EJB. Figure 3 illustrates both approaches.


Figure 3. Both HTTP-based clients and "traditional" Java applications can use EJBs
Both HTTP-based clients and traditional Java applications can use EJBs

EJBs are deployed into containers running on the Web application server. We won't discuss the role of EJB containers in great detail here. However, it's worth noting that these containers may provide support for persistence. Certain kinds of EJBs rely on the container to implement and manage access to the target data source. This relieves EJB developers from writing data access routines and can help promote DBMS portability of their beans. In a future article, we'll see how to build such beans over federated data and explore when it may be beneficial to do so. But for now, let's continue our general discussion of EJBs.

The EJB 1.1 specification defines two overall types of beans: session EJBs, which are transient by nature, and entity EJBs, which are persistent. Session beans themselves can be stateless or stateful , and developers of such beans can employ JDBC for read/write access to supported DBMSs. Indeed, many session beans are written to perform some database operation or transactional work. However, any data associated with a session bean is presumed to be transient; the container provides no automated support for persistence. Entity EJBs, by contrast, are presumed to possess data that is persistent. Developers can manage this persistence themselves (via bean-managed persistence) or they can delegate this responsibility to the container (via container-managed persistence).

The different types of EJBs place different coding requirements on EJB developers and imply that minimal services available to clients will differ to some degree. Future articles will discuss the different types of EJBs in greater detail, as well as explain how you can build session and entity beans that span federated data.

Web services

Web service technology is relatively new, although a number of major vendors (including IBM, Microsoft, Oracle, and Sun Microsystems) have already announced or shipped products to support Web services. Even if you haven't worked with Web services yet, chances are the basic concepts will sound familiar to you.

A Web service is a set of one or more business functions that can be invoked programmatically over the Internet by applications or by other Web services. As such, they facilitate distributed computing and are designed to promote interoperability. Several underlying technologies play a key role in supporting web services, including HTTP, Extensible Markup Language (XML), Simple Object Access Protocol (SOAP), Web Services Definition Language (WSDL), and Universal Description, Discovery and Integration (UDDI). Web service providers publish their Web services so that clients can access these services using SOAP over HTTP. The client request causes the Web service to invoke a business function and, in most cases, to return a response to the client. The Web service itself is described by a WSDL document stored in a repository (such as a UDDI registry) or on the Web service provider's server. Storing the Web service description in an appropriate repository offers the potential for interested customers to discover its existence, potentially generating new business for the Web service provider.

Java is often the programming language of choice for developing Web services. And, as such, Web services that need to access DBMS data frequently make use of JDBC. In a future article, we'll see how such Web services can be written to exploit federated database technology.


Sample architecture for WebSphere and DB2 federated technology

By now, you may be wondering exactly how DB2's federated technology and WebSphere products can be configured to work together. While a number of installation options are possible, we'll explore the configuration used by this author for an internal project. A Windows NT® 4.0 workstation served as the development platform. We ran client software for DB2 V7.2 with FixPak 6 on this machine, along with WebSphere Studio Application Developer Integration Edition (WSADIE) 4.1. The latter product provides a Java-integrated development environment and includes a single-server version of WebSphere Application Server 4.0.2, which we used as a test bed for EJBs and Web services built to access federated data.

A remote AIX 4.3.3 server contained an instance of DB2 V8. We used various pre-beta level drivers for our work, but the functions supported by these drivers are now part of the open beta for DB2 V8 . The DB2 server was configured to act as a federated DBMS capable of accessing data in remote Oracle, Sybase, and Microsoft SQL Server DBMSs. DB2 supports access to other data sources -- such as DB2 for OS/390TM and z/OSTM, and Informix® -- so it's certainly possible to build a federated database environment involving more than the three data sources we used. Software prerequisites and the configuration process vary depending on the data sources involved, so we won't delve into this topic in detail. However, we'll outline the basic steps to give you an idea of what's involved in setting up a working environment. Consult the DB2 V8 beta manuals for further information.

Configuring the DB2 client and server

To enable a DB2 client to connect to a remote DB2 server, ensure that basic network connectivity is operational. We used TCP/IP as our communications network and added entries into the services files on each system to specify a service name and port number for our use.

To configure the DB2 server, we updated the database manager configuration to set properties for SVCENAME and FEDERATED . We connected to a locally created DB2 database ( rdjdb in our environment). Then, for each data source we wished to access (one instance each of an Oracle, Sybase, and Microsoft SQL Server DBMS), we created the necessary wrappers objects, server objects, and user mappings. The following example shows the commands that are contained within a script file used to accomplish this. Items shown in italics are peculiar to our environment.

db2 update dbm cfg using svcenamemyID authentication server 
db2 update dbm cfg using federated yes db2 connect reset
db2stop
db2start 
db2 connect tordjdb user user1 using pass1word

db2 create wrapper net8 options (DB2_FENCED 'N')

db2 create server oracle8 type oracle version 8.1.5 wrapper net8 authorization 
  oracleuser1    password oraclepwd options (node'oracle8.world', password 'Y', 
  pushdown 'Y')

db2 create user mapping for user1server oracle8 options ( REMOTE_AUTHID 
oracleuser1', REMOTE_PASSWORD'oraclepwd' )
. . .
			

To configure the DB2 client, you may use the DB2 Client Configuration Assistant or issue statements through the DB2 command line processor. We did the latter, issuing statements to identify the remote node on which the DB2 server resides ( blackcat.ibm.com in our example) and to specify a logical database name for the remote federated database we intended to use ( djdb ). Regardless of the method you use to configure your client, it's wise to test your configuration to ensure basic connectivity is working before developing Java applications or components that rely on the federated environment.

Here's an outline of the script we used to complete our client configuration, with italicized text representing input unique to our environment:

db2 catalog tcpip node <i>fednode</i> remote <i>blackcat.ibm.com</i> server <i>myID</i> 
db2 catalog database <i>rdjdb</i> as djdb at node <i>fednode</i> 
db2 terminate 
db2stop 
db2start 
db2 connect to <i>djdb</i> user <i>user1</i> using <i>pass1word</i>

Enabling transparent data access

As a final step to enable transparent data access, we created nicknames for remote data we wished to access. There are at least two ways to do this. For existing remote data, use the CREATE NICKNAME statement to map a remote object (such as a table stored in an Oracle database) to a nickname understood by DB2. Thereafter, applications can reference the nickname to access data in the relevant Oracle table rather than manually connecting to the Oracle system (via a SET PASSTHRU statement) and referencing its table directly. The following example connects to the federated database and creates a nickname of "budget" for the "budget" table residing on a remote Oracle system.

db2 connect to djdb user user1 using pass1word 
db2 create nickname budget for oracle8.oracleuser1.budget

However, if you want to create new data objects on remote relational DBMSs and map these to nicknames, it's often easiest to use DB2's data definition language (DDL) transparency feature. Doing so minimizes the number of SQL statements you'll have to write. Issuing a CREATE TABLE ... WITH OPTIONS statement accomplishes two tasks: it creates a table at the remote data source and a corresponding DB2 nickname for this table. The following example creates a "project" table on a remote Oracle system and a corresponding "project" nickname in our DB2 federated database.

db2 connect to djdb user user1 using pass1word 
db2 create table project ( 
ID int primary key not null, 
name varchar(30), 
managerID varchar(10)) 
options (remote_server 'oracle8', remote_schema 'oracleuser1')

Configuring the Java IDE and Web application server

While it's not mandatory to use a Java integrated development environment (IDE) to build mid-tier server components, it certainly can speed and simplify the process. As mentioned earlier, we installed IBM's WSADIE 4.1 product, which includes a version of WebSphere Application Server for testing purposes. To enable WSADIE and its embedded versions of WebSphere to work with DB2 federated data, there's little you need to do once you've completed the tasks outlined in the previous section.

When developing server-side Java components and Web services with WSADIE, make sure that the Java build path associated with your projects includes the location of the db2java.zip file. This file is found in the \java subdirectory of your DB2 installation location. For example, if you installed DB2 in the c:\sqllib directory, you would need to specify c:\sqllib\java\db2java.zip in the Java build path of your WSADIE projects.

In addition, you'll want to create a DataSource object in your WebSphere environment that maps to the DB2 database you configured for federated support. DataSource objects, which are part of the JDBC specification, provide for connection pooling in WAS. DataSources are easy to create; both the WebSphere test environment included with WSADIE and the stand-alone version of WebSphere provide graphical tools to enable you to create a DataSource. Doing so requires that you specify a name, user ID and password for the target data source. In our environment, we defined a DataSource named jdbc/Federated for the JDBC URL of jdbc:db2:djdb. We set the user ID to user1 and the password to pass1word . In a future article, we'll discuss DataSources in more detail as well as see examples of how they're used.


Tips for successful deployment

By now, you have some idea of what federated DBMS technology can bring to a Web application development environment. And you also understand what's generally involved in getting things set up. If the combination of WSADIE, WebSphere, and federated DBMS technologies sounds appealing to you, it's best to consider a few steps you can take to ensure a successful deployment of these technologies in your environment. Here are some tips to consider:

  • Ensure that subcomponents of your architecture are properly working before trying to integrate all software technologies together. For example, ensure your DB2 environment is properly configured to all necessary back-end data sources before trying to integrate it with WSADIE or WebSphere Application Server. This will aid debugging of any installation-related problems.
  • Familiarize yourself with problem diagnosis tools available with the products you're using. In particular, when debugging, make an early attempt to isolate the source of the failure. For example, if a database access activity is failing, determine the exact SQL statement being issued and run this statement from a DB2 command line processor connected to the DB2 federated database to help narrow down the cause.
  • Understand performance monitoring and tuning facilities available with the products you're using. If database-related activities appear to be slow, enlist the support of a database administrator. Activities such as adding appropriate indexes or updating catalog statistics (including information kept by DB2 for global optimization purposes) can have considerable impact on performance.
  • Consider using DB2's DDL transparency feature if you want to create new data objects in remote relational data sources to support your work. This can save time and simplify your work.

Useful Web sites

Acknowledgments

The author would like to thank Vander Alves, Jordan Barnes, and Dirk Wollscheid for their technical contributions to this project.

Top of page


Resources

  • Bontempo, Charles J. and C. M. Saracco. "Data Access Middleware: Seeking Out the Middle Ground," InfoDB, Volume 9 Number 4, August 1995. Available for order via http://www.middlewarespectra.com/abstracts/5_96_07.htm

  • Bontempo, Charles J. and C. M. Saracco. Database Management: Principles and Products , Prentice Hall, 1995, ISBN 0-13-380189-6. In particular, see Chapter 9.

  • Glass, Graham. "The Web services (r)evolution," multi-part article published electronically on IBM developerWorks, at http://www.ibm.com/developerworks

  • Haas, Laura and Eileen Lin. IBM Federated Database Technology , DB2 Developer Domain, at http://www.ibm.com/developerworks/db2/library/techarticle/0203haas/0203haas.html , March 2002.

  • Monson-Haefal, Richard. Enterprise JavaBeans , O'Reilly and Associates, 1999, ISBN 1-55860-519-3.

  • Purnell, Micks. Fundamentals of IBM DB2 Federated Server and Relational Connect at http://www.ibm.com/developerworks/db2/library/techarticle/0206purnell/0206purnell.html , June 2002.

  • Roth, M. Tork and P. Schwarz.: "Don't scrap it, wrap it! a wrapper architecture for legacy sources," Proceeding of the 23th VLDB Conference, Athens, Greece, 1997. Available for download from http://citeseer.nj.nec.com/roth97wrapper.html

  • Saracco, C. M. An Introduction to Data Access Middleware , IBM Corp., Technical Report STL 03.529, October 1993.

  • Saracco, C. M. Universal Database Management: A Guide to Object/Relational Technology , Morgan Kaufmann, 1998, ISBN 1-55860-519-3. In particular, see Chapter 9.

  • Schlimmer, Jeffery C., ed., Web Service Description Requirements, Working Draft , W3C, latest draft accessible at http://www.w3.org/TR/ws-desc-reqs

  • White, Seth, et. al. JDBC API Tutorial and Reference , Second Edition, Addison-Wesley, 1999, ISBN 0-201-43328-1.

Java tutorials, specifications, and news items:

IBM DB2 manuals, articles, and white papers:

IBM WebSphere Application Server manuals, articles, and white papers:

IBM redbooks on a variety of topics:

Web services articles and papers:

About the author

C. M. Saracco is a senior software engineer at IBM's Silicon Valley laboratory and a former software technology instructor for UC Santa Cruz extension program. She has lectured on a variety of technology topics throughout North America, South America, Europe and the Middle East.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, WebSphere
ArticleID=14072
ArticleTitle=Coping with Disparate Data in Web Applications
publish-date=08222002
author1-email=saracco@us.ibm.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers