Skip to main content

skip to main content

developerWorks  >  Information Management | WebSphere  >

Our Experience with Developing Entity EJBs over Disparate Data Sources

developerWorks
Document options

Document options requiring JavaScript are not displayed


Rate this page

Help us improve this content


Level: Introductory

C. M. Saracco, Senior Software Engineer, IBM Silicon Valley Lab
Tim Rieger, Student, University of Rostock, Germany

08 May 2003

Our authors share their experiences and compare the relative cost of implementing entity Enterprise JavaBeans that span different backend data sources both with and without DB2 Information Integrator for federated database access.

Introduction

If you're like many server-side JavaTM developers, you've probably found yourself struggling to cope with data that's spread across a variety of sources. In earlier articles, we described how Java 2 Enterprise Edition (J2EE) programmers can use federated database management (DBMS) technology to ease their burden. At least two of these articles, Building Entity EJBs that Span Federated Data and Accessing Federated Databases with Application Server Components, provided step-by-step instructions for entity Enterprise JavaBeans (EJB) developers working with the Application Developer configuration of IBM® WebSphere® Studio.

We wanted to follow up on these articles and explore just how beneficial federated database technology might be. We implemented some entity EJBs that needed to access data in different sources. To keep things interesting, we had critical data stored in two relational DBMSs (DB2® Universal DatabaseTM and Oracle) and one spreadsheet system (Microsoft® Excel). Arguably, our life would've been easier if we'd just used relational data sources. But then, who ever said life was always easy, anyway?

We used the Application Developer configuration of WebSphere Studio Version 5 as our development and test platform.

  • In one scenario, we relied on IBM DB2 Information Integrator to simulate a single-site image of our data sources. We call this the federated implementation.
  • In another scenario, we built our entity EJBs to access each of the required data sources directly. We call this the native data access implementation.

(In both cases, we also built facade session EJBs to front-end our entity EJBs, as many people do. But we'll get to that a bit later.) What follows is our development tale: what we had to do to make each scenario work and what we learned as a result.

We hope this experience will give you with some insight into the development challenges of building these types of EJBs with and without the use of a federated DBMS. For those of you who aren't interested in all the gory details, we've put our conclusions in the next section. And for those of you who really like to dig into design and coding issues, you can read through the subsequent sections to draw your own conclusions -- and see if they match ours.



Back to top


What we learned

We didn't think things were going to be easy, and we weren't disappointed. We had to work through design issues carefully, and we still encountered a surprise or two at implementation.

But we did find our overall task to be a lot easier when we used a federated DBMS than when we used native data access. With a federated DBMS, our design was cleaner, our implementation phase went more quickly, and we ended up with much less handwritten code to test and maintain (less than half, actually). We realized these benefits largely because the federated DBMS enabled us to build a single entity EJB with container-managed persistence (CMP) to perform the work that we needed.

Without the federated server, we needed to build three separate entity EJBs: one for the Oracle data, one for the DB2 UDB data, and one for the Excel data. Furthermore, because we needed to rely on a free JDBC/ODBC bridge from a third party to access our Excel data, we couldn't model this data using a CMP entity bean. Instead, we had to hand-code our own persistence mechanism, building an entity EJB with bean-managed persistence (BMP). This, of course, resulted in much more work.

These architectural issues had a ripple effect. Like many EJB developers, we realized our beans needed to support more flexible search mechanisms than just the default of finding a bean by its primary key value. Of course, this meant we had to implement a custom finder method, as many EJB developers do. For the federated implementation of our CMP entity EJB, this wasn't terribly difficult; we merely wrote one finder method using the EJB Query Language (EJB QL). Because of restrictions associated with EJB QL, we couldn't express everything in the query that we would have liked (more on that later), but we were able to cope with these limits adequately for our test case.

For the native data access implementation of our entity EJBs, things again got tougher. We were able to use EJB QL for our CMP EJBs only (our Oracle and DB2 UDB data). Because EJB QL isn't supported for BMP EJBs, we had to hand-code SQL queries in our custom finder methods, using JDBC to execute these queries to retrieve the necessary data. Since we're already familiar with SQL, this wasn't a great hardship. However, it did mean that implementing our solution without the use of a federated DBMS required us to be proficient in two different query languages -- EJB QL and SQL.

Finally, the work of our facade session EJB was also affected by our entity EJB implementations. With the federated implementation, the session EJB's work was relatively straightforward:

  1. Look up the single entity EJB of interest,
  2. Invoke its custom finder method, and
  3. Process the results.

With the native data access implementation, the session EJB had to:

  1. Look up each of the three entity EJBs of interest,
  2. Invoke the custom finder method for each,
  3. Integrate the results returned by each,
  4. Finally process the combined result as desired.

Again, this required more thought, effort, and time to code and test.

The one drawback to our federated implementation was that we had to make our CMP entity EJB read only (by modifying the Access Intent setting in the EJB deployment descriptor), because we mapped our EJB to a view that was not updateable. For our application scenario, which we'll describe shortly, this was acceptable.

If all this has made you curious about our work, read on. We'll take you through the design issues and review relevant code samples as we go along. And if you've had trouble following along so far, don't worry. The next two sections will give you a quick overview of federated DBMS and EJB technologies.



Back to top


Understanding the concepts

This section is intended for those of you new to either federated database management systems or to Enterprise JavaBeans (EJBs).

Federated DBMSs

We've been tossing around the term "federated DBMS" a lot already, and if you haven't read some of the previous articles on DB2 Developer Domain about the topic, you're probably curious just exactly what such a system is.

In simplest terms, a federated DBMS is a virtual database server. It provides a single application programming interface (API) to multiple data sources. These data sources may run on different hardware and operating system platforms, may be developed by different vendors, and may employ different APIs (including different SQL dialects). Programmers use the federated server to work at a higher level of abstraction than would otherwise be possible, because the server presents a single-site image of physically disparate data. The use of nicknames for tables (or other data objects, such as files) offers programmers location transparency, eliminating the need for them to know precisely where the desired data resides. Functional compensation can mask differences between different vendors' products and simulate capabilities that aren't natively supported by a given data source. Multi-site joins and unions promote integration of data from multiple sources.

Federated DBMS technology debuted commercially in the 1990s under various names; papers of that time often referred to such products as next-generation gateways, data access middleware and multi-database servers. IBM DB2 DataJoiner® was IBM's first commercial release of this technology. DB2 Universal Database V8 for Linux, UNIX® and Windows platforms now provides built-in federated DBMS support for a select number of data sources, including the DB2 family, Web services, and WebSphere MQ. DB2 Information Integrator adds federated support for a wide range of additional relational and non-relational data sources, which we'll discuss shortly. DB2 Information Integrator also features an extensible architecture that enables users to customize the federated DBMS to support access to a data source of their choosing.

A sample federated DBMS server architecture is shown in Figure 1. In this environment, Java programmers can write JDBC-based applications that connect to the server. This server, in turn, interfaces to data sources supported by different vendors on different platforms. As a result, JDBC applications can reach any or all of these data sources without the application programmer needing to learn the underlying API of each data source. Furthermore, views can be created across these data sources to simplify data integration issues for read-only applications.


Figure 1. Sample federated DBMS environment with DB2 Information Integrator
Sample federated DBMS environment with DB2 Information Integrator

Data source support and product function varies among commercial offerings. DB2 Information Integrator supports:

  • All members of the IBM DB2 family
  • IBM Informix®
  • Microsoft SQL Server
  • Oracle
  • Sybase
  • ODBC-enabled data sources
  • XML
  • Web services
  • WebSphere MQ
  • Excel spreadsheets
  • Flat files
  • Life sciences data sources

In addition, DB2 Information Integrator can also access Web search engines, content repositories, e-mail databases, and other content-based data sources through IBM Lotus® Extended Search.

Because DB2 Information Integrator contains a full-fledged relational DBMS, it can store and manage its own local data objects, such as tables, views, and indexes. Its optimizer is designed to consider the disparate and physically distributed nature of its environment so that it can select an efficient data access strategy for each query. This release of DB2 Information Integrator supports reading data from multiple data sources within a single transaction; write activities are supported against one data source per transaction.

EJBs

EJBs are server-side software components that implement a minimal set of behaviors, encapsulating business logic in a way that simplifies application development and helps promote portability. Support for functions frequently required by production applications, such as transactions, security, and persistence, are mandated by the EJB specification. EJBs may be accessed directly from client-side Java applications by using RMI/IIOP protocols or may be accessed indirectly from Web clients, which communicate to a Web server via HTTP and invoke a servlet, JSP, or Web service that in turn accesses the EJB.

At deployment, EJBs reside in containers that provide a variety of services, most of which we won't discuss. However, we've already alluded to one critical function that is managed by containers: support for persistence. CMP entity EJBs rely on the EJB container to implement and manage access to the target data source. BMP entity EJBs, by contrast, require the EJB developer to implement and manage persistent data access.

As mentioned earlier, the EJB 2.0 specification describes several types of EJBs, some of which we'll discuss here. Session EJBs typically encapsulate functions or business services that an application might wish to call, while entity EJBs represent persistent data. EJB developers can create session beans that employ JDBC for read/write access to supported DBMSs. Indeed, many session beans are written to perform some database operation or transactional work. However, any data associated with a session bean is presumed by the EJB container to be transient; the container provides no automated support for persistence, and the values of any variables declared by a session bean won't exist after an application completes its work with the bean. Entity EJBs, by contrast, are presumed to possess data that is persistent. Developers can manage this persistence themselves (via bean-managed persistence) or they can delegate this responsibility to the container (via container-managed persistence).

The different types of EJBs place different coding requirements on EJB developers and imply that minimal services available to clients will differ to some degree. For example, at deployment, each CMP entity bean has several code modules, including:

  • The remote interface, which defines the business methods associated with the bean. "Getters" and "setters" are commonly used to get/set individual attributes of the bean.
  • The remote home interface, which defines client methods for creating, finding, and removing instances of the bean.
  • The local interface (EJB 2.0) provides the same services as the remote interface to other EJBs in the same container. This improves performance by eliminating the overhead of a distributed object protocol.
  • The local home interface (EJB 2.0) provides the same services as the remote home interface to other EJBs in the same container with the same effects on performance. Again, this improves performance by eliminating the overhead of a distributed object protocol.
  • The bean class, which contains business logic methods coded by the EJB developer as well as EJB life cycle methods used by the container. EJB clients don't access objects of this class directly, but instead work with the container-generated classes that implement the home and remote interfaces to indirectly employ services of this class.
  • The primary key class, which identifies the attribute (or set of attributes) that uniquely identifies each instance of this bean and provides methods for creating and manipulating the key.

Session beans also have all of the interfaces just described, as well as a bean class. However, they do not have a primary key class.

After developing an EJB, the programmer must set deployment descriptors that govern characteristics such as transaction support, isolation levels, and others. Finally, the bean must be packaged and deployed in an EJB server. The deployment process causes additional classes to be generated, including those associated with the remote, remote home, local, and local home interfaces described earlier. An appropriate Java development environment (such as WebSphere Studio) can be of considerable help when developing, deploying, and testing EJBs.



Back to top


Our sample scenario

To explore design and development tradeoffs of using a federated DBMS to support entity EJB development, we populated three different data sources -- Oracle, DB2, and Excel -- with data appropriate for our project. We reasoned this would support a business scenario in which multiple firms (parts distributors, in our case) merged into a single company or wanted to collaborate on a joint sales venture. In such a situation, each firm is likely to store its own data in a different data source, forcing our application to integrate data from each. Our objective was simple: we wanted our Web users to be able to search for various parts that might reside in any or all of these data sources. Each of our data sources contain similar data about parts available for order from our distributors, including a unique part key, a name, a type, a retail price, etc.

We set out to support a very simple search function: enable users to find a part of interest to them. We wrote custom EJB finder methods that would enable people to search PART data (spanning our three data sources) based on a part's type and name. Expressed in SQL, a sample search might look something like this:

Select p_name, p_mfgr, p_type, p_partkey from part
where p_type like '%BURNISHED%'
and p_name like '%KNOB%'
order by p_partkey
fetch first 20 rows only;

Of course, our EJB finder methods don't really look like this. We don't hard-code the values for our query search predicates into our methods, and we don't use SQL (except for our BMP entity EJB for Excel). But SQL is arguably much more popular than EJB QL, and we wanted to give you a rough idea of what we were trying to implement. As queries go, this one's really pretty easy.



Back to top


Issues with the design step for the two approaches

This section describes the design issues for both the federated approach and the native data access approach.

The federated design

After defining application objectives, we needed to address some critical design issues. For our federated implementation, this included the definition of a federated database schema. In other words, we had to figure out how to represent PART data to our J2EE programmers. Typically, a database administrator would take on this task or at least help with it.

Defining the federated database objects
We created objects in our federated database that promote transparent access to our DB2 UDB, Oracle, and Excel data. In particular, we defined nicknames for the data managed by each data source, and then we created a UNION ALL view based upon these. In this manner, all PART data could be perceived as residing in one logical table in our federated database. This would enable us to build a single CMP entity EJB across all our disparate data sources -- a simple, clean solution.

Strictly speaking, a UNION ALL view (which retains duplicate rows) wasn't necessary for our work; we could have easily used a UNION view (which eliminates duplicate rows) instead. However, our particular UNION ALL view definition included an added "server attribute" column, which we later found useful for identifying a composite primary key to WebSphere. This point is somewhat subtle, so we'll explore it further.

Entity EJB developers must identify one or more attributes of their beans that guarantee the uniqueness of each object. In other words, if you create a Part entity EJB, at least one attribute must serve as its primary key. This enables software components to use the EJB's findByPrimaryKey() method to retrieve a single Part by passing in a given primary key value, such as "1" in the following case:

p_partkey            p_name                 . . .

      1              Foo
      2              Foobar
      6              Fusilli

This is all well and good when an entity EJB is mapped to a single, physical table. But things get more complicated when an entity EJB is mapped to a logical view spanning multiple data sources. In relational DBMSs, primary keys are associated with physical tables, not logical views. Similarly, unique indexes are defined on tables, not views. This means that EJB developers must be careful when using views to make sure the values of their EJB primary keys really are unique.

Confused? Let's continue with our example. Imagine you want to map your CMP entity EJB to a view that unions data from Oracle and DB2 UDB. (Forget about Excel for the moment -- it's not a relational table, and it doesn't inherently have a concept of a primary key.) Let's further suppose that the Oracle and DB2 UDB PART tables each have a primary key defined on their own p_partkey column. Everything's fine, right? Just define a view that unions the two tables together, and you'll be OK, right? Wrong.

The relational union operator eliminates duplicate rows. You might have a case in which Oracle and DB2 UDB have rows with the same p_partkey value but different values for p_name (or some other column), as shown below.

DB2 PART table:
   p_partkey              p_name                 . . .

        1                 Foo
        2                 Foobar
        6                 Fusilli
        ...               ...

Oracle PART table:
   p_partkey              p_name                 . . .

        1                 Farfalle
        8                 Fettuccine
        2                 Foobar
        ...               ...
 

Therefore, the rows aren't true duplicates and won't be filtered out by the union operation. Instead, a view that unions the Oracle and DB2 UDB PART tables (shown above) will yield:

UNION view:
p_partkey              p_name                 . . .

      1                 Foo
      2                 Foobar
      6                 Fusilli
      1                 Farfalle
      8                 Fettuccine
      ...               ...

Note that two rows contain p_partkey values of "1", making this column unsuitable to be the sole attribute upon which our EJB's primary key value is based.

But don't despair. There are options.

The most obvious option is to extend the EJB's primary key definition to include more attributes derived from the underlying tables. While this is tempting, it can lead to including all the columns (or, at least all those in the view) in the EJB's primary key definition. This is quite restrictive and, in some cases, downright impractical.

Another option is to carefully craft the view definition to eliminate the problem. This is by far the better solution. But how can you do this? Well, we elected to add a new server_attribute column to our view definition, and we populated it with information regarding the data source from which the row of interest originated. Thus, our EJB primary key definition spans the p_partkey column and the new server_attribute column, as shown below. In this example, db2_part, ora_part, and odbc_part are nicknames we previously defined for our PART data managed by DB2 UDB, Oracle, and Excel.

CREATE VIEW fed_part AS
  SELECT db2_part.*, 'db2' AS p_server
  FROM db2_part
UNION ALL
  SELECT ora_part.*, 'ora' AS p_server
  FROM ora_part
UNION ALL
  SELECT odbc_part.*, 'xls' AS p_server
  FROM odbc_part

This approach enabled us to preserve the integrity of our EJB primary key definition (and its corresponding mandatory home interface method) as well as retrieve all the data we desired from our various data sources. Note that our Excel spreadsheets had unique values for the part keys it maintained.

Designing the custom finder method
As we mentioned earlier (in Our sample scenario), we wanted to implement a custom finder method in our entity EJB to support searching for parts with certain characteristics. To do so, we planned to write a single query in EJB QL. This design proved feasible, although we were disappointed to learn of certain EJB QL restrictions, which we'll discuss shortly.

But first, now that you know how we designed our federated implementation, let's look at how we designed our native data access implementation.

The native data access design

When working directly with each of our data sources, we knew we'd need to create multiple entity EJBs to represent the PART data stored in Oracle, DB2 UDB, and Excel. We had hoped to be able to create one CMP entity EJB for each, but we ran into trouble doing so for Excel.

Using BMP and CMP entity EJBs
Although various JDBC/ODBC bridges are available commercially, ours didn't support DataSource objects, which WebSphere uses for pooled connections. Thus, we couldn't rely on a CMP entity EJB design for the Excel PART data. We had to implement a BMP entity EJB instead.

Like anything else, BMP entity EJBs have their advantages and disadvantages. On the plus side, they're quite flexible and can be implemented against virtually any data source. On the negative side, this flexibility and wide range of data access comes at a heavy price: you, the EJB programmer, must write all the data access code. In particular, you must be sure to implement and test all the mandatory EJB methods that you don't have to worry about with a CMP entity EJB. (The EJB container generates those methods when it deploys CMP entity EJBs.) This requires a greater level of programming skill and translates into longer, more complex test cycles. It also increases the maintenance costs of your work.

Using EJB QL and SQL for our custom finder methods
Our need to implement a BMP entity EJB also meant that we couldn't readily reuse our simple EJB QL query, because EJB QL only works with CMP entity EJBs. So, while we wrote EJB QL queries for our Oracle- and DB2-based PART entity EJBs, we had to write our own SQL-based method in our Excel PART entity EJB. The Excel custom finder method looked a lot like a miniature JDBC application, while our custom finder methods for Oracle and DB2 UDB were simply a few lines of EJB QL code.

Designing the queries for each data source
When working with different data sources directly, we expected that some of our queries might need to be modified to conform to different native APIs. Fortunately, WebSphere's support for EJB QL took care of the necessary translations for Oracle and DB2 UDB in our test case. Furthermore, our query was simple enough that we were able to issue standard SQL statements through our JDBC/ODBC bridge to retrieve the necessary Excel data.

However, we easily identified cases where we wouldn't be so lucky. Such cases might involve certain aggregate operations and retrieval of subsets of data (fetching the first n rows). The issue is worth exploring further.

When you're building applications or J2EE components that need to work directly with multiple data sources, it's up to you to figure out just how to decompose the queries you want answered to maximize efficiency while still preserving the proper query semantics. And sometimes, that's easier said that done.

In general, you want to push as much data filtering down to each data source as possible. For example, our query is only interested in certain kinds of parts (those whose name and type match some user-specified criteria). It would be very inefficient if we retrieved all parts from our data sources and then filtered out unqualified parts within our EJBs. However, if your queries involve aggregations, GROUP BY clauses, or FETCH FIRST n ROWS clauses, you must think carefully about query semantics before simply passing off your original query to each data source. If you don't, you can end up with incorrect information when you attempt to consolidate the final results.

If that sounds confusing, consider a simple example. Imagine that two banks, BankA and BankB, merge and wish to determine the average account balance for their (combined) customers. If John Doe has two accounts in BankA with balances of $200 and $400, and this same customer has one account in BankB with a balance of $900, the average balance for this customer would be $500. ($200 + $400 + $900 = $1500. $1500/3 = $500.) However, if each data source containing this customer's records computed the AVG balance and the results were combined and the overall AVG were computed again, you'd get a very different result. BankA's data source would return an average balance of $300, BankB's data source would return an average of $900, and the data source containing the combined results would compute the "final" average to be $600 ($300 + $900 = $1200. $1200/2 = $600). This answer is obviously incorrect, yet each database would have properly processed its query and no errors would have been issued.

Thus, determining how to correctly decompose complex queries so that performance won't suffer and the integrity of your work won't be compromised can be a time-consuming and error-prone process. Again, our test case was quite simple, so we really didn't have to confront this issue. However, it's worth noting that you may not be so lucky. With DB2 Information Integrator, its global query optimizer takes care of this work for you automatically.



Back to top


How we built our implementations

In this section, we describe the implementation issues that arose for both the federated and native data access approaches.

Our federated implementation

Our federated implementation consisted of a single CMP entity EJB and a single stateless session EJB. The Fed_part entity EJB mapped to the FED_PART view that unioned data from three data sources. Fed_part has a custom finder method, findRowsForQ1(), that is called in the getQ1Results() method of the session bean SessionQ1Fed. This session bean acts as a front-end, or a facade, to the CMP bean. See Figure 2.


Figure 2. Software design with entity EJBs using DB2 Information Integrator
Software design with entity EJBs using DB2 Information Integrator

The following sections describe how we built these EJBs after setting up our development environment, which we reviewed in the project description.

CMP entity EJB
We used WebSphere Studio's standard top-down modeling approach to develop a CMP entity EJB that maps to the DB2 federated view. We created attributes for each of the columns required by our query, which were all derived from the FED_PART view.

Next, we created a custom finder method using EJB QL for this CMP bean to retrieve the desired results for our query, following the standard WebSphere Studio process. It's worth noting that EJB QL restricts the type of output we could return. We chose to return a collection of bean interfaces, since we expected many parts might match our caller's search criteria. Due to EJB QL restrictions, we could not express the fetch first 20 rows only part of our query. However, we could push down the order by p_partkey part of the query, as you can see below:

SELECT DISTINCT object(o) FROM Parts o
WHERE o.p_name LIKE ?1
  AND o.p_type LIKE ?2
ORDER BY o.p_partkey

Of course, we used markers for the String values of the predicates. These are passed as parameters when the finder method is called.

In case you're wondering why we didn't just write this query in SQL, the answer is pretty simple: EJB QL is the query language supported for CMP entity EJBs. If we wanted to express our query directly in SQL through JDBC, we'd need to build a BMP entity EJB. This would have involved much more work, and we wanted to take advantage of the benefits that CMP entity EJBs offer, including development productivity and greater portability.

Session EJB
Our session bean uses JNDI services to look up the CMP bean's local home interface and then invokes its custom finder method. The session bean iterates over the returned collection of CMP beans, retrieving the desired data (that is, columns p_partkey, p_name, p_type and p_mfgr) from the first 20 beans and storing it in an array. Finally, the session bean returns this array to the caller.

Here's the getQ1Results() method from our session EJB, which performs these steps.

public Q1Tuple[] getQ1Results() {
	Q1Tuple[] result = new Q1Tuple[20];
	try {
		javax.naming.InitialContext ctx = new InitialContext();
		/* get access to 'Parts' EJB local home interface
		 *  "ejb/q1/PartsLocalHome" is a local EJB reference in the
		 * Deployment Descriptor.  You must prefix it with "java:comp/env/"
		 */
		PartsLocalHome home =
			(PartsLocalHome)ctx.lookup("java:comp/env/ejb/PartsLocalHome");
		/* receive a collection of all rows (i.e. a list of beans) that
		 * satisfy our criteria, except for "fetch first 20 rows"
		 */
		AbstractCollection coll =
			(AbstractCollection)  home.findRowsForQ1("%KNOB%","%BURNISHED%");
		Iterator beanIter = coll.iterator();
		/* now we get the desired data from the first 20 entries only
		 */
		for (int i = 0; i < 20; i++) {
			PartsLocal tempBean = (PartsLocal)beanIter.next();
			Q1Tuple tempTuple = new Q1Tuple();
			tempTuple.setP_partkey(tempBean.getP_partkey().intValue());
			tempTuple.setP_name(tempBean.getP_name());
			tempTuple.setP_type(tempBean.getP_type());
			tempTuple.setP_mfgr(tempBean.getP_mfgr());
			result[i] = tempTuple;
		}
	} catch (Exception e) { e.printStackTrace(); }
	return result;
}

New Java class
The requested result consists of two different data types: an Integer for the p_partkey attribute and Strings for all others. Naturally, there is no data type prepared in Java to hold this result #x2013 we had to create a new one.

To do this, we created a new serializable Java class in the EJB module using WebSphere Studio's standard process. Our Q1Tuple class had four fields for holding the four column values, and get() and set() methods for manipulating these fields.

We stored each resulting row in a Q1Tuple object and returned all 20 rows as array of Q1Tuple.

Transaction scope
Our session bean invoked appropriate CMP bean methods in a single transaction. We used the EJB deployment process to control our transaction scope, editing the CMP bean's deployment descriptor to set all appropriate methods to transaction "Required."

Code count
We wrote just under 200 lines to implement our federated solution. WebSphere generated additional EJB code at deployment time, bringing the runtime code total to about 3100 lines.

Our native data access implementation

Developing the native version of our work was more expensive (as could be expected). We had more entity EJBs to create, and the complexity of our session EJB increased. Figure 3 illustrates our architecture.


Figure 3. Software design with entity EJBs using native data access
Software design with entity EJBs using native data access

As you'll note, we used three entity EJBs and one session EJB with this approach. For Oracle and DB2 UDB, we built CMP entity beans; however, we had to write a BMP entity bean for Excel. Our session EJB invoked methods from each of our entity beans and served as a facade to calling applications.

It's worth mentioning that although our need to develop a BMP entity bean increased our effort considerably, it wasn't the only factor that forced us to take on more work. Indeed, even if we were able to work only with CMP entity EJBs, we still would have had more of these beans to create. This, in turn, would have increased the complexity of our facade session bean, which would need to perform multiple JNDI lookups, invoke multiple custom finder methods, integrate results returned by each finder method, sort the integrated results, and eliminate potential duplicates.

CMP beans
We followed the standard WebSphere Studio development process to create CMP entity beans for our Oracle and DB2 UDB data. Each of these beans has CMP fields for the columns we needed.

We placed all entity beans in the same EAR file with the facade session bean so that the latter could access the entity beans' local home interfaces. We also created custom finder methods for our two CMP entity beans. Again, we created these in EJB QL using the wizard in the deployment descriptor editor. The query remained the same as for the federated version except for the table name. Of course, the same limitations that applied to EJB QL for the federated query also applied here.

BMP entity EJB
Because our JDBC-ODBC driver did not support JDBC 2.0 data sources, we could not create a CMP bean for accessing the Excel spreadsheet. Instead, we had to rely on bean-managed persistence (BMP). BMP beans require developers to implement all bean methods, as well as test and maintain this code. By contrast, with CMP entity beans, the EJB container generates the necessary code at deployment.

Although the data access code of our BMP entity bean was relatively simple to implement -- after all, we just wrote JDBC statements to access Excel -- it still required considerably more labor than implementing a CMP entity bean. In particular, about half of the code we needed to write for our native data access implementation was due to our need to use a BMP entity bean. Furthermore, BMP development is much more time-consuming than CMP entity bean development. For an inexperienced BMP programmer, design, development, and testing work can easily span days or weeks, depending on the complexity of data access code required to reach the target data source. Even in our simple case, building the Excel BMP entity bean required several hours of work and additional time for testing. By contrast, developing and testing a simple CMP entity EJB in WebSphere Studio typically takes a matter of minutes.

Conceptually, our Excel bean performs the same function as the CMP beans. In particular, it also has a method findRowsForQ1() in its local home interface for fetching the rows that satisfy our query. The most positive aspect of using BMP was that we weren't restricted to the limitations of EJB QL anymore. Instead, we could implement the query directly in standard SQL.

As we mentioned earlier, each entity bean needs a primary key class to provide a way to uniquely identify it. For CMP beans, WebSphere Studio generates the primary key class automatically. However, for BMP beans the developer has to create this class because s/he is the only one who knows which attributes identify the bean. Since p_partkey is sufficient as a primary key in our case, we developed our primary key class based upon this.

Session EJB
The implementation of our facade session bean was more complex for the native data access implementation, as we expected. Our session EJB looks up the local home interfaces for our entity EJBs (the two CMP beans and the one BMP bean) and then invokes the custom finder methods associated with each EJB. Then it must merge and sort the results.

There are at least two ways to do this:

  • Use a temporary table in a local database to sort the interim results and eliminate duplicates
  • Use the java.util.TreeSet class to automatically sort the interim results and eliminate duplicates

We thought most J2EE programmers would be more inclined to implement the second option, so that's what we did. We also expected this approach would result in less overhead and resource utilization, although we didn't explicitly test this.

After invoking each entity EJB custom finder method, we added the returned results to our TreeSet object, which transparently takes care of sorting the results and eliminating duplicates. Using a TreeSet object meant we had to write our own comparator class that implements a compareTo() method. This method evaluates whether one object in the set is smaller, bigger, or equal to another one. The Q1TupleComparator class implements this method for Q1Tuple objects which we store in the TreeSet. See the code of Q1TupleComparator.java below:

import java.util.Comparator;

/**
 * Implements the Comparator interface. The Q1TupleComparator is used
 * to compare Q1Tuple objects.
 */
public class Q1TupleComparator implements Comparator {

	public Q1TupleComparator() {
		super();
	}

	public int compare(Object o1, Object o2) {

		//cast objects back to Q1Tuple
		Q1Tuple x = (Q1Tuple)o1;
		Q1Tuple y = (Q1Tuple)o2;

		//assign p_partkey values
		Integer ix = new Integer(x.getP_partkey());
		Integer iy = new Integer(y.getP_partkey());

		/* the result to be returned will be
		 *  < 0 , if ix < iy
		 *  = 0 , if ix = iy
		 *  > 0 , if ix > iy,
		 * where ix and iy contain p_partkey values
		 */
		return ix.compareTo(iy);
	}
}

With this work done, we were ready to implement our facade session bean. As you'll see from the code sample below, this bean retrieves query results from each data source, combines these results in a set (which does the ordering and duplicate elimination), and returns the set to the caller. To accomplish this, the getQ1Results method of our session EJB relies on three helper methods to retrieve the result: getDB2Results(), getOraResults(), and getXLSResults(). We haven't included the source code for these helper methods here, as they're somewhat redundant.

public Q1Tuple[] getQ1Results() {
	//holds the final result
	Q1Tuple[] result = new Q1Tuple[20];
	/* create a comparator capable of comparing the string arrays in
	 * the result set */
	Q1TupelComparator qtc = new Q1TupelComparator();
	//is used for sorting and duplicate elimination
	TreeSet resultSet = new TreeSet(qtc);
	try {
		//get the DB2 results and append them to the result set
		resultSet.addAll(this.getDB2Results());
		//get the Oracle results and append them to the result set
		resultSet.addAll(this.getOraResults());
		//get the Excel results and append them to the result set
		resultSet.addAll(this.getXLSResults());

		//copy results from result set to result array
		Iterator resultSetIter = resultSet.iterator();
		for (int i = 0; i < 20; i++) {
			result[i] = (Q1Tuple)resultSetIter.next();
		}
	} catch (Exception e) { e.printStackTrace(); }
	return result;
}

New Java class
As with our federated implementation, we needed to create a serializable Java class to represent the results returned by our entity EJBs. This Q1Tuple class was identical to the class we used for our federated implementation, which we described earlier.

Transaction scope
We had trouble establishing a single transaction scope spanning all three data sources. We suspect at least part of the problem was due to limitations associated with Excel and our JDBC/ODBC bridge. Because two-phase commit logic wasn't important to our application, we didn't dwell on this issue. Instead, we performed our DB2 UDB work within the same transaction scope as our session EJB, executed the Oracle work in a separate transaction scope, and executed the Excel work outside of a transaction context (with a transaction attribute of "not supported").

Code count
We wrote 545 lines to implement our native data access approach, nearly half of which was due to our need to write a BMP entity EJB. WebSphere generated additional EJB code at deployment time, bringing the runtime code total to about 5600 lines.



Back to top


Understanding the limitations of EJB QL

We've already mentioned several EJB QL limitations we encountered for our work. If you're an experienced SQL programmer with little knowledge of EJB QL, you'll probably find the language easy to learn. It really looks a lot like SQL. However, you may be surprised to learn that some language elements you take for granted aren't currently supported in EJB QL 2.0. These include support for certain string comparison operations, DATE/TIME expressions, aggregations, ORDER BY, GROUP BY, HAVING, EXISTs, FETCH FIRST n ROWS, and others.

Some of these restrictions may be lifted in future releases of the EJB specification, and WebSphere provides extensions to EJB QL to address many of these issues today. For example, we were able to use WebSphere's support for the ORDER BY clause in our work. However, FETCH FIRST n ROWS wasn't supported, so we had to write appropriate logic in our facade session EJBs to implement this.

Furthermore, the EJB 2.0 specification restricts the type of output that can be returned from a finder method to one of three types:

  • An interface to a CMP bean
  • A collection of CMP bean interfaces
  • Object attributes

Because our query involved only PART data, it was easy for us to implement a finder method that returned a collection of CMP bean interfaces. However, if we had to express SQL queries that returned attributes spanning multiple CMP beans and (possibly) dynamically generated data, this would present a problem. Such queries wouldn't readily return collections of existing CMP bean interfaces; they'd really be returning new types of objects. Thus, we'd need to reevaluate our entity EJB data model or consider using session EJBs for this work.

It's worth noting, however, that the Enterprise Developer configuration of WebSphere Studio -- which we didn't use for our Java development work -- provides the most extensive support for EJB queries found in the WebSphere development family. For example, this product features an executeQuery() API that supports dynamic execution of an EJB query, syntax for various expressions not natively supported in the EJB 2.0 specification (including certain aggregations, ORDER BY, GROUP BY, HAVING, EXISTS), and greater flexibility regarding query output. This API also supports a parameter than can be used to limit the size of the returned result set, offering similar capability to the FETCH FIRST n ROWS clause of SQL. These extensions can be quite useful; however, their use involves a tradeoff between advanced function and portability.



Back to top


Conclusions

Building Web components that need to integrate data from multiple sources is no trivial task, but we found that federated DBMS technology can help alleviate many of the burdens associated with doing so.

We built entity EJBs that needed to integrate data from multiple systems. One implementation used a federated DBMS, while another accessed each data source directly. Our experience showed a significant reduction in the design, development, and maintenance efforts when we used a federated DBMS to transparently access data across various remote data sources. We cut the number of lines of code we had to write in half and shortened our development cycle considerably. And we achieved this largely because the federated DBMS enabled us to use a new CMP entity EJB data modeling feature: the ability to build a single entity EJB whose attributes span multiple data sources.

Our work has made us optimistic about the potential benefits that federated DBMS technology can bring to Web component developers who need to integrate data from multiple sources. We encourage firms with such business requirements to investigate how the combination of WebSphere and DB2 Information Integrator may help speed such development efforts and reduce future maintenance requirements.



Back to top


Bibliography and related readings

DBMS and federated DBMS topics

Java and WebSphere topics

  • Flanagan, David, et. al. Java Enterprise in a Nutshell, O'Reilly and Associates, 1999, ISBN 1-56592-483-5.
  • Marinescu, Floyd. EJB Design Patterns, J. Wiley and Sons, Inc., 2002, ISBN 0-471-20831-0.
  • Monson-Haefal, Richard. Enterprise JavaBeans, O'Reilly and Associates, 1999, ISBN 1-55860-519-3.
  • Rowan, Ed, et. al., Mastering Enterprise JavaBeans, Second Edition, John Wiley and Sons, Inc., 2001, ISBN 0-471- 41711-4.
  • White, Seth, et. al. JDBC API Tutorial and Reference, Second Edition, Addison-Wesley, 1999, ISBN 0-201-43328-1.

Useful Web sites



About the authors

C. M. Saracco is a senior software engineer at IBM's Silicon Valley laboratory and a former software technology instructor for UC Santa Cruz extension program. She has lectured on a variety of technology topics throughout North America, South America, Europe and the Middle East.


Tim Rieger is a student of Business Informatics at the University of Rostock, Germany. He recently completed an internship at IBM's Silicon Valley Laboratory, where he researched issues involving entity EJBs and database management systems.




Rate this page


Please take a moment to complete this form to help us better serve you.



 


 


Not
useful
Extremely
useful
 


Share this....

digg Digg this story del.icio.us del.icio.us Slashdot Slashdot it!



Back to top