IBM WebSphere Developer Technical Journal: Eliminate caching in service locator implementations in J2EE 1.3

When bad things happen to good applications

This article shows how implementations of the Service Locator pattern that include a resource cache can cause code to run incorrectly in J2EE 1.3 and later versions. While the Service Locator pattern itself is still useful, caching with this pattern is harmful rather than helpful. This article explains why caching should be eliminated from service locator implementations and offers some practical alternatives.

Bobby Woolf (bwoolf@us.ibm.com), WebSphere J2EE Consultant, IBM Software Services for WebSphere

Bobby Woolf is a member of the IBM Software Services for WebSphere consulting practice, where he assists clients in developing applications for WebSphere Application Server using WebSphere Studio Application Developer. He is co-author of Enterprise Integration Patterns and The Design Patterns Smalltalk Companion, both from Addison-Wesley, and a frequent conference speaker. Bobby's blog, J2EE in Practice, is available through developerWorks.



13 October 2004

Introduction

The Service Locator pattern is a popular application design pattern in Java™ 2 Enterprise Edition (J2EE) applications. This pattern encapsulates code for accessing components through directory services, such as JNDI client code, so that a client can simply pass in the name of a resource and get back that resource. Service locator implementations usually include a resource cache to avoid repeated lookups of the same resource. While this worked well in J2EE 1.2, caching can introduce subtle and difficult-to-diagnose errors into applications deployed in J2EE 1.3 and later. Therefore, J2EE 1.3 applications should not include the resource cache in their service locator implementations.


The JNDI directory service

The Java Naming and Directory Interface (JNDI), part of the J2EE platform, enables a Java program to access a resource via a unique name without regard to where the resource is stored, how it is implemented, or how the container and its JNDI provider actually enable access to it. A resource can be any object that the program needs to access globally.

We will briefly review how JNDI works here, primarily to understand the parts that affect the Service Locator pattern. To learn more, please refer to Sun's JNDI tutorial (see Resources).

JNDI contexts

JNDI names are arranged in a hierarchical tree structure, like a file system's directory structure or the package structure for a set of Java classes. J2EE defines conventional subcontext names for the resource references of common resource types. Table 1 shows typical JNDI subcontexts and their types.

Table 1

Subcontext | Java interface | Description
ejb | javax.ejb.EJBHome, javax.ejb.EJBLocalHome | An Enterprise JavaBean (EJB) home
jdbc | javax.sql.DataSource | A Java Database Connectivity (JDBC) data source
jms | javax.jms.ConnectionFactory, javax.jms.Destination | A Java Message Service (JMS) connection factory or destination
eis | javax.resource.cci.ConnectionFactory | A J2EE Connector connection factory
mail | javax.mail.Session | A JavaMail session
url | java.net.URL | A Web service connection factory

Each subcontext name is used as part of a JNDI expression for accessing objects in the client's local context. For example, java:comp/env/ejb provides access to EJB homes, while java:comp/env/jdbc provides access to JDBC data sources.

Why use JNDI?

JNDI, like most J2EE services, provides a standard API (defined in the javax.naming packages) regardless of implementation. As an example of the transparency JNDI offers, consider a JMS provider (such as WebSphere® MQ) that is configured with a set of administered objects (connection factories and destinations) that the component applications would use to perform messaging. The provider will organize these resources in some kind of directory (or perhaps multiple directories, such as one per administered object type) that enables an application to access a component by its name. This directory structure may be vendor-specific, but because the provider implements the JMS specification, the provider must also make its resources available through the JNDI API, regardless of how exactly the directory is actually implemented. JNDI provides uniform access to resources that may not be so uniform.

Furthermore, what appears to the application to be one single, continuous JNDI tree may (and usually does) aggregate together resources from several different providers. JNDI hides this from the application so that the code only needs to know the name of the resource, not details such as what provider contains the resource, how to access the provider, what name the provider may actually use for the resource, and so on.

A J2EE application server (such as WebSphere Application Server) implements its own JNDI directory which its applications use to access all resources. The application server needs to make the JDBC, JMS, and other resources available to its applications. Because the providers make their resources available through JNDI, the application server's JNDI tree can simply include nodes for the resources.

Figure 1. Application server JNDI tree

The EJB component doesn't even realize that some of the resources are remote and managed outside the EJB container. It accesses all resources the same way through the same JNDI tree. This is an important part of component reuse: the ability of a component to access resources without having to know exactly how they are provided.

How to use JNDI

To use JNDI, code accesses its component's JNDI context and then uses it to perform a lookup, which is a JNDI operation that takes in a name and returns the object bound to that name, much like the get(Object) method in a java.util.Map object.

While JNDI can be used to bind any kind of object to a name, J2EE primarily uses JNDI to provide access to resources. To access a resource through a JNDI lookup, a component passes in a resource name and gets back the resource described in the deployment descriptor by a resource reference element. Most types of resources are actually resource factories, which the container uses to control instance creation, pooling, and sharing of the resource connections. Examples of resource factory types include: javax.ejb.EJBHome, javax.ejb.EJBLocalHome, javax.sql.DataSource, and javax.jms.ConnectionFactory.

For example, here is a code snippet that uses JNDI to look up the resource name jdbc/datasource, which returns a resource factory that is a DataSource object.

javax.naming.Context root = new javax.naming.InitialContext();
javax.sql.DataSource datasource =
	(javax.sql.DataSource) root.lookup("java:comp/env/jdbc/datasource");

This way, the code can leave the resource management to the container and yet easily access a resource through its name. The code and the container need only agree on the name to use.

Resource name mapping

A resource actually has two separate but related names within J2EE:

  1. Resource reference name: the name for the resource reference that the code component uses to identify the resource.
  2. JNDI name: the name that the container uses to identify the resource.

With this setup, the code and the container do not even have to agree on the name to use for the resource. When the code is deployed, the deployer (the J2EE administrator who installs the application components into the container) uses the deployment tool to map each resource reference name to its corresponding JNDI name.

For example, the code may refer to a database it calls jdbc/AccountDB. Meanwhile, the account data may be stored in a Cloudscape® database the container refers to as jdbc/Cloudscape. The deployer binds the resource reference name jdbc/AccountDB to the JNDI name jdbc/Cloudscape:

Table 2

EJB Name | Application Resource Name | Container Resource Name
any EJB component | jdbc/AccountDB | jdbc/Cloudscape

Then in production, when the code accesses AccountDB, it will get the Cloudscape database.
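The two-name arrangement can be modeled with two maps. This is only a minimal sketch under stated assumptions: SimulatedContainer and its methods are hypothetical names invented for illustration, and a real container performs this mapping internally through its deployment tooling rather than through any such class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a container; not a real J2EE or WebSphere API.
class SimulatedContainer {
    // JNDI name -> resource object, as configured by the administrator.
    private final Map<String, Object> resourcesByJndiName = new HashMap<>();
    // Resource reference name -> JNDI name, as mapped by the deployer.
    private final Map<String, String> bindings = new HashMap<>();

    void defineResource(String jndiName, Object resource) {
        resourcesByJndiName.put(jndiName, resource);
    }

    void bind(String referenceName, String jndiName) {
        bindings.put(referenceName, jndiName);
    }

    // Conceptually what a lookup of java:comp/env/<referenceName> resolves to.
    Object lookup(String referenceName) {
        String jndiName = bindings.get(referenceName);
        if (jndiName == null) {
            throw new IllegalArgumentException("No binding for " + referenceName);
        }
        return resourcesByJndiName.get(jndiName);
    }
}
```

With jdbc/Cloudscape defined and jdbc/AccountDB bound to it, a lookup of jdbc/AccountDB in this model returns the Cloudscape resource, even though the code never mentions the name jdbc/Cloudscape.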

Application resource name scoping

In J2EE 1.2, resource names are all global, so when two components reference the same resource name, they receive the same resource, since that single resource name maps to a single resource. Starting in J2EE 1.3, each component (that is, each EJB bean class and each Web application) defines its own set of resource names, where each component's set of names is independently bound to a set of container resources.

This change is reflected in a single paragraph that was added to the J2EE 1.3 specification, in section J2EE.5.4.1.2, "Declaration of Resource Manager Connection Factory References in Deployment Descriptor." The new paragraph says:

A resource manager connection factory reference is scoped to the application component whose declaration contains the resource-ref element. This means that the resource manager connection factory reference is not accessible from other application components at runtime, and that other application components may define resource-ref elements with the same res-ref-name without causing a name conflict.

Other application components may use the same resource reference name without conflict (see the J2EE 1.2 and 1.3 specifications in Resources).

This means that two components can refer to the same resource name, yet actually be bound in production to access two different container resources. This can be thought of as an overloaded resource name. The same resource name in two different components actually maps to two different container resources. Thus, to know which resource an overloaded name refers to, one must also know the component which is using the overloaded name, and how that component is mapped to container names.

Example of an overloaded name

For example, an application may contain two session beans:

  1. EmployeeBean: enables users who are employees to access multiple customer accounts and confidential information about the accounts.
  2. CustomerBean: enables users who are customers to access their own customer account.

For simplicity, both beans may use the resource name jdbc/AccountDB because they are both using the customer account database. The administrator would define one data source for accessing that database, called jdbc/AccountDS, and at deployment time would bind jdbc/AccountDB to jdbc/AccountDS in both components.

Table 3

EJB Name | Application Resource Name | Container Resource Name
EmployeeBean | jdbc/AccountDB | jdbc/AccountDS
CustomerBean | jdbc/AccountDB | jdbc/AccountDS

At this point, the resource name jdbc/AccountDB is not overloaded because it always maps to the same container resource. The resource name is shared, because it is used by multiple components, but it is not overloaded.

Now, let's say that management becomes concerned that customers may be able to access and even change confidential data in the accounts database that only employees should be able to access. The business requirements dictate that certain database tables should only be accessible by employees and not by customers. To help enforce this, the database administrators create two separate logins, one for employees and another for customers. The customer login cannot access the confidential tables, and has read-only access to certain data; the employee login has full access. This way, even if the Java application mistakenly allows customers to access these tables, the database will prevent it.

To support this new requirement and use these new database logins, the deployer configures the container with two separate data sources, one for each login, called jdbc/AccountEmployeeDS and jdbc/AccountCustomerDS. At deployment time, the deployer binds the resource names accordingly:

Table 4

EJB Name | Application Resource Name | Container Resource Name
EmployeeBean | jdbc/AccountDB | jdbc/AccountEmployeeDS
CustomerBean | jdbc/AccountDB | jdbc/AccountCustomerDS

The application resource name jdbc/AccountDB is now an overloaded name. Which resource it refers to -- the data source for employee access or the one for customer access -- depends on which component is using the resource name.

In this way, the deployer is able to change the application's operation to meet the new requirements without having to change the code; only the container configuration and the deployment of the existing code change.
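One way to picture component-scoped names is a map of maps, keyed first by component and then by resource reference name. The sketch below is a hypothetical model (ComponentScopedBindings is not part of any J2EE API) that reproduces the Table 4 bindings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of component-scoped resource references (J2EE 1.3 style).
class ComponentScopedBindings {
    // component name -> (resource reference name -> container resource name)
    private final Map<String, Map<String, String>> bindings = new HashMap<>();

    void bind(String component, String referenceName, String containerName) {
        bindings.computeIfAbsent(component, k -> new HashMap<>())
                .put(referenceName, containerName);
    }

    // Returns the container resource name, or null if the component
    // declares no reference with that name.
    String resolve(String component, String referenceName) {
        Map<String, String> scoped = bindings.get(component);
        return scoped == null ? null : scoped.get(referenceName);
    }
}
```

After binding jdbc/AccountDB to jdbc/AccountEmployeeDS for EmployeeBean and to jdbc/AccountCustomerDS for CustomerBean, resolving jdbc/AccountDB gives a different answer depending on which component asks. The reference name alone is not enough to identify the resource.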

Likewise, this kind of overloaded resource mapping can be used to make different components use several different types of resources:

  • Multiple databases, perhaps containing separate sets of data or running as products from different vendors.
  • Multiple JMS messaging systems, perhaps an internal one and a separate one for business partners, or running as products from different vendors.
  • Multiple deployments of the same Web service, perhaps offering different qualities of service.

Overloading is possible with J2EE 1.3 application resource names because they are component-scoped. It is not possible with J2EE 1.2's global application resource names.


The Service Locator pattern

The Service Locator pattern (see Resources) is a well-known J2EE application design pattern that encapsulates the code needed to look up resources through directory services such as JNDI. Business logic code that uses a service locator avoids becoming cluttered with directory lookup code, and so is easier to understand. The client passes in a unique identifier -- a resource name -- for the resource; the service locator finds the resource and returns it to the client. The service locator encapsulates accessing the JNDI context, narrowing and casting the reference to the appropriate Java interface, and handling errors.

For example, the service locator code that enables a client to access a data source looks like this:

private InitialContext initialContext;

private ServiceLocator() throws ... {
    try {
        // Obtain the JNDI context once, when the locator is created
        initialContext = new InitialContext();
    }
    catch ...
}

public DataSource getDataSource(String dataSourceName) throws ... {
    DataSource dataSource = null;
    try {
        // Look up the name on every call and cast to the expected interface
        dataSource = (DataSource) initialContext.lookup(dataSourceName);
    }
    catch ...
    return dataSource;
}

A service locator is commonly implemented as a singleton, such that the entire application (specifically, all of the code running in the same Java virtual machine (JVM)) uses the same instance. (If the service locator class is part of a utility JAR shared by multiple applications, those applications will share the same instance.) Since the service locator is basically stateless, multiple components sharing the same instance is not a problem.

A service locator implementation commonly includes a resource cache so that when the application accesses the same resource repeatedly, the service locator only has to perform the lookup the first time. When resource lookups are expensive, caching improves performance by avoiding the repeated lookup of the same resource. This helps most when the service locator is also a singleton, so that a lookup from one component helps that component instance, other instances of the same component type, and instances of other component types avoid repeating that lookup.

A service locator implementation that includes caching accesses a data source like this:

private InitialContext initialContext;

// Shared cache: resource name -> resource returned by the first lookup
private Map cache;

private ServiceLocator() throws ... {
    try {
        initialContext = new InitialContext();
        cache = Collections.synchronizedMap(new HashMap());
    }
    catch ...
}

public DataSource getDataSource(String dataSourceName) throws ... {
    DataSource dataSource = null;
    try {
        if (cache.containsKey(dataSourceName)) {
            // Cache hit: the JNDI lookup is skipped
            dataSource = (DataSource) cache.get(dataSourceName);
        }
        else {
            // Cache miss: look up the resource and remember it under its name
            dataSource = (DataSource) initialContext.lookup(dataSourceName);
            cache.put(dataSourceName, dataSource);
        }
    }
    catch ...
    return dataSource;
}

Service locator caching error

When the Service Locator pattern was first developed for J2EE 1.2, caching references was a good idea, or at least it didn't hurt anything. But starting in J2EE 1.3, application resource names are component-scoped, not global. Because each component is mapped separately, two components which happen to use the same resource name may not necessarily be bound to the same container resource. The name may be overloaded, so that components which use the same name map to different resources.

A service locator that caches references will cause a J2EE 1.3 (and later) application with an overloaded resource name to work incorrectly. It will deploy successfully and seem to run correctly, but will run into subtle and difficult-to-diagnose problems when a component uses the wrong resource. This is because the service locator will cache the resource for whichever component uses the overloaded name first. When a differently bound component uses the same resource name, it will not receive the resource it is bound to; it will receive the resource in the cache, which is the first component's resource.

When service locators go bad

Let's walk through the process step-by-step to see what exactly goes wrong:

  1. At deployment, the deployer binds two resources, as shown earlier (Table 4).
  2. Let's say that after the application is started, the first user is an employee and uses EmployeeBean to access the database. EmployeeBean uses a caching service locator to access jdbc/AccountDB:
    1. The service locator does not find jdbc/AccountDB in its cache because the cache is empty.

      reference cache { }
    2. The service locator performs the JNDI lookup and gets the container resource jdbc/AccountEmployeeDS. The service locator adds this resource to the cache under the key jdbc/AccountDB:

      reference cache {
      jdbc/AccountDB jdbc/AccountEmployeeDS
      }
    3. The service locator returns the container resource jdbc/AccountEmployeeDS.
    4. The EmployeeBean now has the resource jdbc/AccountEmployeeDS, as intended.
  3. Now another user, a customer, starts using the application, using CustomerBean to access the database. CustomerBean uses a caching service locator to access jdbc/AccountDB:
    1. This time, the service locator does find jdbc/AccountDB in its cache.

      reference cache {
      jdbc/AccountDB jdbc/AccountEmployeeDS
      }
    2. The service locator skips the JNDI lookup and uses the resource from the cache instead.
    3. The service locator returns the container resource jdbc/AccountEmployeeDS.
    4. The CustomerBean now has the resource jdbc/AccountEmployeeDS. Problem: CustomerBean now has jdbc/AccountEmployeeDS, not jdbc/AccountCustomerDS as it should. CustomerBean is now using the wrong data source.
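The walkthrough above can be reproduced outside a container with a small simulation. This is a sketch under stated assumptions, not the locator from any real application: CachingLocatorSimulation is a hypothetical class, and each component's java:comp/env context is stood in for by a plain function that maps a resource name to a resource.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical simulation of a singleton locator with a shared cache.
class CachingLocatorSimulation {
    // One cache for all components, keyed only by resource reference name.
    private final Map<String, Object> cache = new HashMap<>();

    // componentContext stands in for the calling component's own JNDI context.
    Object getResource(String referenceName, Function<String, Object> componentContext) {
        if (cache.containsKey(referenceName)) {
            // Cache hit: the caller's own context is never consulted.
            return cache.get(referenceName);
        }
        Object resource = componentContext.apply(referenceName);
        cache.put(referenceName, resource);
        return resource;
    }
}
```

If the first call passes a context that maps jdbc/AccountDB to jdbc/AccountEmployeeDS and the second passes one that maps the same name to jdbc/AccountCustomerDS, both calls return the employee data source. The second caller's binding is ignored because the cache key carries no information about the component.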

Detecting the caching problem

When two components use a resource name that is overloaded -- that is, a name that is bound to two different resources -- a caching service locator returns the same resource to both components. This is a very subtle error that is difficult to detect and diagnose.

Consider what would go wrong in our example and how difficult this would be to diagnose:

  • If an employee goes first, the customers get the data source whose login has access to confidential tables. This will be difficult to detect in functional testing, especially if the Java application properly prevents customers from accessing confidential data. Even if this problem is detected, it will only happen sometimes: only when an employee goes first. It is very difficult to determine the cause of a problem that only occurs sometimes.
  • If a customer goes first, the customer data source gets used and employees don't have access to the confidential tables. This problem is more obvious and thus easier to detect, but difficult to explain. A static code analysis clearly shows that EmployeeBean uses the data source with the employee login. Also, the problem only happens sometimes. How long will it take to figure out that the problem only happens when a customer goes first? How hard will it be to figure out that the service locator is causing the problem?

Run time proof

If you're specifically looking for this error, it's reasonably easy to see -- as long as you're looking in the right place for the right symptoms. I implemented two stateless session bean classes that both access a data source through the same resource name: jdbc/datasource. After a bean accesses its data source, it prints out the object's toString(). This code is the same in both beans:

public void persist() throws Exception {
	Context root = new InitialContext();
	DataSource ds = (DataSource) root.lookup("java:comp/env/jdbc/datasource");
	System.out.println("The datasource is: " + ds.toString());
	// use the datasource to persist some data
}

A servlet creates an instance of each bean class and runs it. I also configured the container with two different data sources. When I deployed my application, I bound one bean to each data source.

My first implementation used no service locator; the beans performed all of the JNDI code themselves. Here are the two toString() results I got:

The datasource is: com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource@2ecdaf6c
The datasource is: com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource@75282f6b

So what does that mean? The class com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource is a proprietary WebSphere Application Server class that implements a JDBC data source; the class is not important. The important part is the hexadecimal number at the end. Assuming the class uses the default toString() format, this number is the instance's hash code, and it serves here as an object identifier (OID).

Notice that the two OIDs are different. This shows that the two beans are using two different data sources.
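Comparing printed strings works here, but it depends on the class using the default toString() format. Where test code can hold both references at once, reference equality is a more direct check. The helper below is a hypothetical sketch (IdentityCheck is not part of the article's test application); its describe method mirrors the default Object.toString() format using the identity hash code.

```java
// Hypothetical helper for test code that can see both references.
class IdentityCheck {
    // True only when both variables refer to the very same object.
    static boolean sameInstance(Object a, Object b) {
        return a == b;
    }

    // Same shape as the default Object.toString():
    // class name, '@', then the identity hash code in hexadecimal.
    static String describe(Object o) {
        return o.getClass().getName() + "@"
                + Integer.toHexString(System.identityHashCode(o));
    }
}
```

Two different strings from describe mean two different objects. Equal strings strongly suggest the same object, but only sameInstance settles it, since distinct objects are not guaranteed distinct hash codes.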

My second implementation used a caching service locator. The JNDI lookup code is in the service locator implementation, and it caches resources. Since both beans use the same resource name, jdbc/datasource, the service locator will cache the first lookup and reuse it for the second bean. Here is the output:

The datasource is: com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource@4652e820
The datasource is: com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource@4652e820

Notice that the two OIDs are the same. Because of the caching service locator, the two beans are now using the same data source. The second bean is using the first bean's data source, which is the wrong one.

My third implementation used a service locator with no cache. The service locator's code looks up the resource both times because the first lookup is not cached. Here is the output:

The datasource is: com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource@4652e820
The datasource is: com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource@277d6821

The OIDs are different. Because there is no caching, each bean gets the proper data source. Just as predicted, overloaded resource names and a caching service locator make the code execute differently: some components get the wrong resources.


Fixing the implementation

We have seen that an overloaded resource name and the typical service locator implementation, a singleton with a resource cache, do not get along. How did we get here? And more importantly, how can we fix it?

Why cache?

OK, so a caching service locator causes errors for overloaded resource names. Why do we need caching anyway?

When the Service Locator pattern was first developed for J2EE 1.2, caching resource references was a good idea because JNDI lookups were often slow and hurt performance. The J2EE container vendors discovered this performance problem and improved their implementations in J2EE 1.3. Therefore, as of J2EE 1.3, JNDI lookups perform much better and caching lookups does not help as much.

Do not just assume that a caching service locator significantly improves application performance; use performance testing to confirm that it does. Even if caching does improve performance, that's little comfort when a component fails to receive the resource it is mapped to. "Sure, the component got the wrong resource and therefore didn't work right, but thanks to caching, it failed really fast." Fast code that doesn't work right is no good.

Caching isn't even the main point of the Service Locator pattern. The motivation for Service Locator is to create a service with a simple, clean API for locating and accessing other services and components in a uniform manner. Caching is simply an optimization. It is supposed to enable the locator to perform the same functionality, but perform it faster. It's not supposed to change the behavior of the locator; the results must be the same with or without caching. Clearly, as this article shows, caching can change the results from the locator and thus the behavior of the application.

So caching may not be desirable. In a decision between correct behavior and fast behavior, correctness has to win every time.

The best way to cache is for an object to store any JNDI reference it uses in an instance variable. Each instance has to initialize its own cache, but it only caches the references it uses, and is able to access the reference directly rather than having to look it up in a hash map. It is always a good practice to avoid getting the same value repeatedly, even from a service locator which may cache; instead, get the value once, store it in a variable with a descriptive name, and access it from the variable when needed.
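A minimal sketch of this per-instance approach follows. The names are hypothetical: AccountDao is an invented component, and the Supplier stands in for whatever performs the real lookup (direct JNDI code or a non-caching service locator), so that the sketch runs without a container.

```java
import java.util.function.Supplier;

// Hypothetical component that keeps the reference it looked up
// in an instance variable with a descriptive name.
class AccountDao {
    private final Supplier<Object> lookup; // stands in for the JNDI or locator call
    private Object accountDataSource;      // per-instance cache of the reference

    AccountDao(Supplier<Object> lookup) {
        this.lookup = lookup;
    }

    // Not synchronized: assumes the instance is used by one thread at a time.
    Object accountDataSource() {
        if (accountDataSource == null) {
            accountDataSource = lookup.get(); // performed once per instance
        }
        return accountDataSource;
    }
}
```

Because the cached reference lives inside the object that looked it up, it was resolved through that object's own component bindings and cannot leak to a differently bound component.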

Why overload resource names?

So overloaded resource names mess up a service locator and its cache. Is there any other option besides removing the cache? Why not just avoid having overloaded resource names?

Binding each component's resource name to the container's resource name is part of the J2EE 1.3 specification. An application which cannot support this is not J2EE-compliant. Applications have to support resource name binding at deployment. This means that if multiple components use the same resource name, that name could become overloaded at deployment time.

Alternative solutions

Since a shared resource name can become overloaded, how could we use a caching service locator and still have our components receive the right resources? Here are some options:

  • Unique resource names per component
    One way is to make sure that every component uses unique resource names. That way, no two components would use the same resource name, and so no name could become overloaded. For example, the EJB com.acme.AcmeEJB using the container resource jdbc/datasource could actually use the resource name jdbc/com/acme/AcmeEJB/datasource. Since each EJB class has a unique package and class name, this naming approach ensures that the resource name will be unique for each component. This is a very cumbersome approach, though. When multiple component types plan on using the same resource, it's easier for them to all use the same resource name.
  • Unique service locator instance per component type
    Another way is for each component type to use its own instance of service locator, not a singleton. This way, every component of a certain type would get its own reference cache that won't conflict with any other component's caches. However, this would be difficult to implement. Where would each Web application store its service locator that is unique from other Web applications and EJB types? As for the EJB types, each class could declare a static variable for its service locator, but an EJB class's static variable must be read-only (and therefore final), which requires redesigning the service locator class so that its zero-argument constructor does not throw exceptions. Furthermore, well-factored EJBs usually delegate a lot to plain Java objects, which won't know which service locator to use and will have to get it from their calling EJB somehow. It is much simpler for the service locator to be a singleton.
  • Do not use service locators
    Components could avoid the problem of overloaded names being cached by not using service locators at all, but that would be overkill. The Service Locator pattern is still useful for encapsulating the code that uses JNDI, even if the locator does not use caching. It would be unfortunate to lose that encapsulation by avoiding the pattern entirely.
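To give a sense of the first option, the unique-name convention could be derived mechanically from the component class. This is a hypothetical helper that follows the jdbc/com/acme/AcmeEJB/datasource example; UniqueResourceNames and its method are illustrative names only.

```java
// Hypothetical helper that builds a per-component resource name.
class UniqueResourceNames {
    // subcontext: for example "jdbc"; localName: for example "datasource".
    static String uniqueName(String subcontext, Class<?> componentClass, String localName) {
        // Turn the fully qualified class name into a path-like segment.
        String classPath = componentClass.getName().replace('.', '/');
        return subcontext + "/" + classPath + "/" + localName;
    }
}
```

Every such name would still need its own resource reference in the deployment descriptor and its own binding at deployment, which illustrates why the approach is cumbersome.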

The remaining alternative is to use a service locator, make it a singleton, but remove the reference cache. This is simple to apply to an existing application: just change the service locator implementation to disable or remove the cache.
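A cache-free singleton might look like the following sketch. It is one possible shape rather than a definitive implementation. The class name NonCachingServiceLocator is an assumption, as is the choice to create a new InitialContext for each lookup instead of holding one in a field, and to let NamingException propagate to the caller.

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

// Sketch of a singleton service locator without a resource cache.
final class NonCachingServiceLocator {
    private static final NonCachingServiceLocator INSTANCE = new NonCachingServiceLocator();

    private NonCachingServiceLocator() {
        // No state: there is no cache to initialize.
    }

    static NonCachingServiceLocator getInstance() {
        return INSTANCE;
    }

    // Performs a fresh JNDI lookup on every call, so the name is always
    // resolved through the calling component's own bindings.
    DataSource getDataSource(String dataSourceName) throws NamingException {
        InitialContext context = new InitialContext();
        try {
            return (DataSource) context.lookup(dataSourceName);
        } finally {
            context.close(); // release the context's resources promptly
        }
    }
}
```

A component would call NonCachingServiceLocator.getInstance().getDataSource("java:comp/env/jdbc/AccountDB"). Outside a container, where no JNDI provider is configured, the lookup fails with a NamingException.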

Caching in the container

The best answer ultimately is to not cache in the application, but rather to cache in the container. First, this would make the performance benefits of caching available to all applications, whether or not they use the Service Locator pattern. Second, the container could cache resources by their container names, which must be unique and therefore cannot be overloaded within any single J2EE application deployment.
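The difference between caching by application resource name and caching by container name can be shown with a small model. This sketch is purely illustrative: ContainerLevelCache is a hypothetical class, it does not describe how WebSphere Application Server or any other container is actually implemented, and the expensive lookup is represented by a plain function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a container that caches by container resource name.
class ContainerLevelCache {
    // component -> (resource reference name -> container name), set at deployment
    private final Map<String, Map<String, String>> bindings = new HashMap<>();
    // container name -> resource; container names are unique, so no overloading
    private final Map<String, Object> cacheByContainerName = new HashMap<>();
    private final Function<String, Object> expensiveLookup;

    ContainerLevelCache(Function<String, Object> expensiveLookup) {
        this.expensiveLookup = expensiveLookup;
    }

    void bind(String component, String referenceName, String containerName) {
        bindings.computeIfAbsent(component, k -> new HashMap<>())
                .put(referenceName, containerName);
    }

    Object lookup(String component, String referenceName) {
        // Resolve the component-scoped name first, then consult the cache.
        Map<String, String> scoped = bindings.get(component);
        String containerName = scoped == null ? null : scoped.get(referenceName);
        if (containerName == null) {
            throw new IllegalArgumentException(
                    "No binding for " + referenceName + " in " + component);
        }
        return cacheByContainerName.computeIfAbsent(containerName, expensiveLookup);
    }
}
```

Because the component-scoped name is resolved before the cache is consulted, EmployeeBean and CustomerBean each get their own data source, and each data source is still looked up only once.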

It just so happens that WebSphere Application Server does cache the results of JNDI references. For more details, see the WebSphere Application Server Information Center (see Resources).

As mentioned earlier, each object can also cache the JNDI references it uses in instance variables, so that each object only has to access a reference one time.


Conclusion

Overloaded resource names are a reality starting with J2EE 1.3. Components usually share resource names and the deployer can always map components sharing a name to different container resources. A singleton service locator with a resource cache creates a global cache that cannot properly handle the component-level mappings for an overloaded resource name.

The simple conclusion is that service locators that cache are a bad idea. Applications that contain a caching service locator are a problem waiting to happen -- a problem that will be difficult to detect, difficult to reproduce reliably, and difficult to diagnose. It is not a question of if the problem may happen; this problem will occur whenever a deployment binds the same resource name to different resources. Guaranteed.

It is still a good idea for components which expect to use the same resource to share the same resource name. It is still a good idea to use service locators to encapsulate resource access. And it is still a good idea for the service locator to be a singleton. But the service locator should not include a resource cache.


Acknowledgements

The author would like to thank Roland Barcia, Keys Botzum, Tom Alcott, and Bill Higgins for their help in developing this article.

Resources
