Writing great code with the IBM FileNet P8 APIs, Part 3: Take a number

Implementing a sequence number dispenser in FileNet P8

Yes, you, too, can have an ECM-backed corner bakery with a tidy customer queue! Just have them take a number. This article discusses implementation techniques for getting reliably unique sequence numbers from an IBM® FileNet® P8 repository. Some of the obvious approaches have hidden dangers, but a correct and useful approach is simple and performant. Along the way to solving this common problem, we'll see some things about P8 development that have a much wider scope.

Bill Carpenter (WJCarpenter@us.ibm.com), ECM Software Architect, IBM

Bill Carpenter is an ECM Architect with IBM in the Seattle, Washington area. Bill has worked in the Enterprise Content Management business since 1998 as a developer, development manager, and architect. He is co-author of the books IBM FileNet Content Manager Implementation Best Practices and Recommendations and Developing Applications with IBM FileNet P8 APIs. He has previous experience building large software systems at Fortune 50 companies and has also served as the CTO of an Internet startup. He has been a frequent mailing list and patch contributor to several open source projects. Bill holds degrees in Mathematics and Computer Science from Rensselaer Polytechnic Institute in Troy, New York.



15 October 2009

Also available in Vietnamese

Introduction

It is the custom in many areas for small shops to keep track of their queued-up customers by assigning them sequential numbers in the approximate order in which they arrive. The numbers are typically printed on slips of paper and dispensed from a single mechanical dispenser. If multiple customers arrive simultaneously, etiquette and common courtesy easily break the tie.

DISCLAIMER OF WARRANTIES

The accompanying code is sample code created by IBM Corporation. This sample code is not part of any standard or IBM product and is provided to you solely for the purpose of assisting you in the development of your applications. The code is provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample code, even if they have been advised of the possibility of such damages.

Similar problems often arise in software systems. One often needs to assign numbers to things with a guarantee that those numbers are unique and follow some pattern. There are some common solutions for this problem, but distributed systems complicate matters. You would be highly unlikely to use an ECM system for assigning numbers to customers in a corner bakery. (But if you were interested in that, I could hook you up with a willing salesperson!) You might, however, need to assign case numbers, customer IDs, part numbers, or something simpler. Database vendors implement sequence number column types for just this sort of problem. However, P8 does not provide direct access to database sequence number types, so you must use other mechanisms.

In this article, we'll look at solving this problem in a P8 environment. Let's summarize the requirements:

  1. We need number assignments that are absolutely guaranteed to be unique. It is completely unacceptable for the same number to ever be assigned twice.
  2. We want the numbers to follow some pattern. We don't want gaps in the number assignments. The pattern could be a lot of things, but for our purposes we'll just use simple incrementing. The next number we get will be one greater than the previous number.
  3. We want all of this to work reliably and with decent performance in a P8 environment that has multiple threads, multiple processors, multiple servers, multiple tiers, and multiple users.
  4. While we're at it, we need about a dozen of those delicious-looking red velvet cupcakes with cream cheese frosting!

Before describing our preferred implementation, we'll first look at a few techniques that either don't work or don't work very well. Even if you never have a need to implement this particular use case, most of the points illustrated in this article can apply to many areas of P8 programming.


Java or .NET synchronization

If you are new to enterprise development or distributed development in general, your first thought might be to use a singleton object with some sort of synchronized access to the part that updates the counter. In Java, that would be a synchronized method or code block. In C#, it would be a method marked with [MethodImpl(MethodImplOptions.Synchronized)] or a code block protected by a lock statement. Code areas with synchronized access are sometimes called critical sections. Listing 1 shows one of many ways to implement this.

Listing 1. Synchronized code block
/**
 * *** DON'T DO IT THIS WAY ***
 */
public class Dispenser
{
    /** static access only, so private constructor */
    private Dispenser() {}
    private static int counter = 0;
    public static final synchronized int getNextValue()
    {
        return ++counter;
    }
}

Using synchronized code is fine for certain problems, but its weaknesses are fairly obvious for our use case. Because the value of the counter exists only in the memory of the running program, the values will start over if the program is restarted. You could change the Dispenser class to save the updated counter value to a file, but that would lead to new problems because the synchronization is not coordinated across process boundaries. Different independently running applications (or copies of the same application), even if they all use the Dispenser class, could interleave their reads and writes to the file. Worse yet, they could be accessing separate files with the same name on different machines, each holding its own independent counter. Both of those situations violate our use case's requirements.

Here's an example of the trouble that can result from interleaved reads and writes. Assume the current value recorded in the file is 7 and then the following takes place:

  • A reads value 7 from the file.
  • B reads value 7 from the file.
  • A writes value 8 to the file.
  • C reads value 8 from the file.
  • C writes value 9 to the file.
  • B writes value 8 to the file.

At this point, both A and B have the same number that they believe is exclusive to them. The next reader of the file will end up with the same value that C already has. You might have some ideas about using file locking or other operating system-specific tricks to synchronize access to a file in a distributed environment. However, the difficulties with that type of solution are fairly well understood in computer science at large, and it's notoriously tricky to get it right, so we don't want to belabor this scenario too much. If you need further convincing, use your favorite search engine to research "NFS lock problem".
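The interleaving above can be replayed deterministically. The following minimal sketch uses a hypothetical SharedCounterFile class as an in-memory stand-in for the counter file (no real file I/O or operating system locking is involved) and walks through the same six steps, starting from 7.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a counter file shared by several processes. */
class SharedCounterFile {
    private int storedValue;

    SharedCounterFile(int initial) { storedValue = initial; }

    int read() { return storedValue; }

    void write(int value) { storedValue = value; }
}

class InterleavingReplay {
    /**
     * Replays the A/B/C interleaving from the text. Returns the number each
     * client believes it owns, plus the value left in the "file".
     */
    static Map<String, Integer> replay() {
        SharedCounterFile file = new SharedCounterFile(7);
        Map<String, Integer> read = new LinkedHashMap<>();
        Map<String, Integer> assigned = new LinkedHashMap<>();

        read.put("A", file.read());            // A reads 7
        read.put("B", file.read());            // B reads 7
        assigned.put("A", read.get("A") + 1);
        file.write(assigned.get("A"));         // A writes 8
        read.put("C", file.read());            // C reads 8
        assigned.put("C", read.get("C") + 1);
        file.write(assigned.get("C"));         // C writes 9
        assigned.put("B", read.get("B") + 1);
        file.write(assigned.get("B"));         // B writes 8, clobbering C's 9

        assigned.put("file", file.read());     // what the next reader will see
        return assigned;
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

Running main prints {A=8, C=9, B=8, file=8}: A and B hold the same number, and because B's late write left 8 in the file, the next reader will also compute 9 and collide with C.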


P8 dispenser object

Using a J2EE servlet

It's beyond the scope of this article to describe it fully, but it is possible to use J2EE declarative security to provide controlled superuser access to a P8 object. That technique can work for both Java and .NET clients. Briefly, you would create a simple servlet to implement the techniques discussed in this section. This could be done as a Web service or something similar as long as it returned the counter value for the caller to use in some easily-consumed format.

The deployment descriptor for a J2EE servlet can specify a RunAs role, which indirectly specifies a security principal. Downstream J2EE calls (including P8 calls using EJB transport) are performed with the security identity of that principal. In our case, that would be a user with appropriate access rights to the otherwise locked-down dispenser object. Even if the configured security identity were a relatively highly privileged user, this technique is safe because the servlet tightly controls what activity is allowed. However, the technique becomes correspondingly less safe as the complexity of the servlet grows.

Not only does this technique isolate the security access aspects, but a nice side-effect of using the servlet is that it isolates the business logic of getting a sequence number. Any number of disparate clients with any number of technology stacks can access the servlet and get a sequence number. The obvious downside is the complexity of adding the servlet to the architecture and providing the client plumbing to access it transparently. Whether that downside is worth it depends on the specifics of your scenario.

A popular solution to these problems is to keep the dispenser in a database. Enterprise-ready relational databases are built to be shared by distributed clients and have reliable locking semantics. In the P8 architecture, however, applications do not have direct access to the backing databases, so database access must be arranged separately for such applications, whether through J2EE datasources, direct JDBC connections, or other means. This may work fine in some scenarios, but generally it would be a lot of bother just to access the dispenser data.

One thing we do know is that all P8-based applications have access to P8 ObjectStores and objects, subject to P8-enforced access checks. We can therefore model the dispenser as a P8 object. Specifically, we can create an instance of a subclass of CustomObject called WjcDispenser with an integer custom property called WjcCounter. (The names are given the arbitrary prefix "Wjc" to avoid conflicts with other class and property names.) Figure 1 shows a UML diagram for this simple subclass.

Figure 1. UML diagram for WjcDispenser
UML diagram showing WjcDispenser as a subclass of CustomObject

We assume that security access to this object can be conveniently arranged for all users of all applications that need to use it. For now, we'll just assume that anyone can connect to the ObjectStore and update the dispenser object. See the Using a J2EE servlet sidebar for an interesting approach to this security situation.

Furthermore, we will gloss over the initial creation of the dispenser object. A good technique would be to have the accompanying Java and .NET classes detect whether the dispenser object is missing and create it on demand or within the static initializer for the class. The two popular techniques for locating the dispenser object are to use a predefined ID value or to store the object in a predefined path within the ObjectStore. It would also be possible to use a query to find all instances of the WjcDispenser class. For the examples that follow, we assume that the ObjectStore identity and specific location of the dispenser object are somehow configured for the application.


FileNet Content Engine (CE) cooperative locks

When using a P8-based dispenser object, the conceptual idea for getting a sequence number is the same: read the old value, update and store the new value, and return the new value to the caller. Obviously, the specific implementation logistics change. Java or .NET synchronization still cannot coordinate independent clients, so the interleaving problem described in the previous section remains.

The CE server and APIs implement a feature called cooperative locking. This feature was originally implemented to provide cooperative locking semantics compatible with RFC 2518 (WebDAV). API classes for Folder, Document, and CustomObject have methods for locking and unlocking those objects. Because this is a built-in P8 feature implemented in the server, you might develop an implementation similar to what is shown in Listing 2. This example shows an internal method and assumes some other piece of code has identified the dispenser object.

Listing 2. P8 cooperative locking
private static final String COUNTER_PROPERTY_NAME = "WjcCounter";
/**
 * *** DON'T DO IT THIS WAY ***
 */
private static int getNextValue(CustomObject dispenser)
{
    final Properties dispenserProperties = dispenser.getProperties();
    boolean locked = false;
    // Object might be locked by someone else, so try a few times
    for (int attemptNumber=0; attemptNumber<10 && !locked; ++attemptNumber)
    {
        dispenser.lock(15, null);  // LOCK the object for 15 seconds
        try
        {
            // Because we use a refreshing save, the counter property
            // value will be returned.
            dispenser.save(RefreshMode.REFRESH);  // R/T
            locked = true;
        }
        catch (EngineRuntimeException ere)
        {
            ExceptionCode ec = ere.getExceptionCode();
            if (ec != ExceptionCode.E_OBJECT_LOCKED)
            {
                // If we get an exception for any reason other than
                // the object already being locked, rethrow it.
                throw ere;
            }
            // already locked; try again after a little sleep
            try
            {
                Thread.sleep(100); // milliseconds
            }
            catch (InterruptedException e)
            {
                // Restore the interrupt flag and give up; reported below
                Thread.currentThread().interrupt();
                break;
            }
        }
    }
    if (!locked)
    {
        // We never got our turn, so we must not use the cached counter value
        throw new IllegalStateException("Dispenser still locked after 10 attempts");
    }
    int oldValue = dispenserProperties.getInteger32Value(COUNTER_PROPERTY_NAME);
    int newValue = oldValue + 1;
    dispenserProperties.putValue(COUNTER_PROPERTY_NAME, newValue);
    dispenser.unlock();  // UNLOCK the object
    dispenser.save(RefreshMode.NO_REFRESH);  // R/T
    return newValue;
}

Assuming the dispenser object is fetchlessly instantiated (that is, via a Factory.CustomObject.getInstance() method), this technique costs one round-trip to the CE server to apply the lock and fetch the current property value. If the object is already locked, we won't get the current value, so we iterate a few times to get our turn at locking the dispenser. It costs another round-trip to the CE server to store the new counter value and release the lock. Two round-trips is a reasonable performance cost for a sequence number, and this use of the locking and unlocking feature is also reasonable.

The main problem with using P8 cooperative locking for this use case is that it is only cooperative locking. The CE server will not prohibit any change due to the mere presence of a lock. Optimistically, you might assume that all of the applications will follow the cooperative locking sequence. Realistically, though, you still have to allow for the possibility of application bugs bypassing the locking. It's easy to imagine someone writing an independent piece of code that uses the dispenser object but neglects to do the locking.
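To see why a purely cooperative lock is not enough, here is a minimal sketch using a hypothetical in-memory CooperativelyLockedObject class (this is not the P8 API). As with CE cooperative locks, holding the lock does not stop anyone else's write from being accepted; only clients that ask for the lock are affected by it.

```java
/** Hypothetical in-memory object whose lock is purely cooperative. */
class CooperativelyLockedObject {
    private String lockOwner;   // null when unlocked
    private int counter;

    CooperativelyLockedObject(int initial) { counter = initial; }

    /** Succeeds only if nobody else holds the lock; clients are merely expected to call it. */
    synchronized boolean tryLock(String owner) {
        if (lockOwner != null && !lockOwner.equals(owner)) {
            return false;
        }
        lockOwner = owner;
        return true;
    }

    synchronized void unlock(String owner) {
        if (owner.equals(lockOwner)) {
            lockOwner = null;
        }
    }

    // Reads and writes deliberately ignore the lock, just as a cooperative
    // lock does not make the server reject an update.
    synchronized int read() { return counter; }

    synchronized void write(int value) { counter = value; }
}

class CooperativeLockDemo {
    /** Returns {value given to the polite client, value given to the rogue client}. */
    static int[] run() {
        CooperativelyLockedObject dispenser = new CooperativelyLockedObject(7);

        if (!dispenser.tryLock("polite")) {
            throw new IllegalStateException("lock unavailable");
        }
        int politeOld = dispenser.read();   // polite client reads 7 under its lock

        int rogueOld = dispenser.read();    // rogue client never asks for the lock
        dispenser.write(rogueOld + 1);      // ...and its write is accepted anyway

        dispenser.write(politeOld + 1);     // polite client writes 8
        dispenser.unlock("polite");

        return new int[] { politeOld + 1, rogueOld + 1 };
    }

    public static void main(String[] args) {
        int[] values = run();
        System.out.println("polite=" + values[0] + ", rogue=" + values[1]);
    }
}
```

Running main prints polite=8, rogue=8: the polite client followed the locking protocol and still received a duplicate number.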


Touch plus an event handler

It's a bit of a side note, but if you look at the discussion of cooperative locking in the previous section, you see that it takes at least two round-trips to the server to reliably obtain a sequence number. Can it be done with a single round-trip? For that idea to work, we would need at least part of the computation to be done on the CE server. The CE mechanism for that is event handlers. Here's a thought experiment:

  • Fetchlessly instantiate the dispenser object (no round-trip).
  • We need to make some kind of change to the dispenser object so that an event will be fired. CE has a feature that allows you to define custom events. In fact, the class is called CustomEvent. So as part of the one-time setup for all of this, define a new custom event and persist it in the ObjectStore holding the dispenser object.
  • Instead of being raised as a side-effect of something else, a custom event is triggered when anyone calls the raiseEvent() method on a subscribable object. On the server, the subscription and event handling are the same as for system-defined events. Call raiseEvent() on the dispenser object and then call save() with refresh (to get the current value of WjcCounter).
  • The save() will trigger the CustomEvent.
  • As part of the one-time setup, provide an event handler subscribed to that specific type of CustomEvent on the WjcDispenser instance or class. The event handler will calculate and save the new value for the WjcCounter property. Since properties cannot be updated in a synchronous event handler, we'll make the event handler asynchronous (synchronous or asynchronous is specified in the subscription).
  • The client application knows how the event handler is going to update WjcCounter, so it does the same calculation to predict the new value of WjcCounter.

In this thought experiment, you take into account that updates to the dispenser object will happen one at a time, no matter how many independent client applications are requesting it. The CE server does not "optimize away" intermediate (redundant) updates. You remember that asynchronous event handlers are guaranteed to be executed. In fact, you might even remember hearing somewhere that asynchronous event handlers are processed via a queue (which is true). All of that seems to add up to a single, reliable, and predictable update to WjcCounter per client round-trip to the CE server.

You have probably guessed from the tone of the above description that there is some problem lurking in here. In fact, there are two problems. First, there is a small period of time between the update of the dispenser object in the ObjectStore and the execution of the asynchronous event handler. If there are multiple independent clients that happen to do updates at the same time, they will all see the same refreshed value of WjcCounter and calculate the same updated value. Even if you could overcome that and somehow tie specific client save() activities to the specific firings of the event handler, there is a second problem. The second problem is that although asynchronous events are indeed processed through a queue, there are multiple readers of the queue. Therefore, there is no guarantee that asynchronous event handlers will execute in the same order as the triggering updates.
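The first problem can be reproduced with a small, single-threaded model. The sketch below is hypothetical: an ArrayDeque stands in for the server's asynchronous event queue and a lambda stands in for the event handler. It does not model the second problem (multiple queue readers running handlers out of order).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of "touch plus an asynchronous event handler". */
class AsyncHandlerModel {
    private int counter;
    private final Deque<Runnable> eventQueue = new ArrayDeque<>();

    AsyncHandlerModel(int initial) { counter = initial; }

    /** raiseEvent() plus a refreshing save: queue the handler, return the stored value. */
    int raiseEventAndRefresh() {
        eventQueue.add(() -> counter++);   // the handler's increment happens later
        return counter;                    // the refresh sees the not-yet-updated value
    }

    /** Eventually the server runs every queued handler. */
    void runPendingHandlers() {
        while (!eventQueue.isEmpty()) {
            eventQueue.poll().run();
        }
    }

    int storedValue() { return counter; }
}

class AsyncHandlerDemo {
    /** Returns {value predicted by A, value predicted by B, final stored value}. */
    static int[] run() {
        AsyncHandlerModel dispenser = new AsyncHandlerModel(7);
        // Both clients save before either handler runs, and each predicts
        // the handler's result by adding 1 to what its refresh returned.
        int predictedByA = dispenser.raiseEventAndRefresh() + 1;
        int predictedByB = dispenser.raiseEventAndRefresh() + 1;
        dispenser.runPendingHandlers();
        return new int[] { predictedByA, predictedByB, dispenser.storedValue() };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("A=" + r[0] + ", B=" + r[1] + ", stored=" + r[2]);
    }
}
```

Running main prints A=8, B=8, stored=9: the handler ran once per save, so the stored counter advanced correctly, yet both clients predicted the same number.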


First writer wins

The CE server has a built-in feature for reliably detecting interleaved updates. CE implements a policy called first writer wins. That means that if two requests are updating the same object, the first will succeed and the second will fail. For the failed update, the server throws an EngineRuntimeException with an ExceptionCode of E_OBJECT_MODIFIED. The accompanying exception message is "The object ... has been modified since it was retrieved." What does that mean, though?

Every independently persistable object in an ObjectStore is marked with an Update Sequence Number (USN). This is not a property in the ordinary sense, but its value is exposed via the method IndependentlyPersistableObject.getUpdateSequenceNumber(). The CE server automatically increments the USN whenever an object is updated in the ObjectStore. When you fetch an object from the server, the USN is also fetched and brought to the client side. The APIs send the USN back to the server as part of the object save(). If the USN value sent does not match the USN value currently persisted in the repository, the CE server knows that a change was made (by some other caller) since the object was fetched. This represents a simplified form of what the database world calls optimistic locking.

Given that the CE server detects these interleaved changes, it is logical to exploit that rather than the purely voluntary cooperative locking feature. When you instantiate an object locally in the API without fetching it from the server, that is called fetchless instantiation (mentioned earlier). In that case, the USN has a special value that signals the CE server to skip the USN check, and the resulting save is sometimes called an unprotected update. Fetchless instantiation therefore lets you bypass the server check, but you'd have to go out of your way to do that for this use case. If you later fetch or refresh any properties from the server, the current USN is obtained as well.

In our use case, however, it doesn't make sense for someone to do a fetchless instantiation followed by an unprotected update. To get the current value of the counter property, it has to be fetched from the server one way or another. It would be possible for someone to maliciously corrupt the counter property via an unprotected update, but that same malicious party could do the same thing with a normal update cycle, so there's no new danger there. Because of the semantics of the use case, the odds of someone doing this through a coding mistake are low.
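The USN check itself can be modeled in a few lines. This is a hypothetical in-memory sketch, not the CE implementation: ObjectModifiedException stands in for an EngineRuntimeException with E_OBJECT_MODIFIED, and passing null as the expected USN models an unprotected update that skips the check.

```java
/** Hypothetical in-memory model of an object protected by an update sequence number. */
class UsnProtectedStore {
    /** Stands in for EngineRuntimeException with E_OBJECT_MODIFIED. */
    static class ObjectModifiedException extends RuntimeException {
        ObjectModifiedException(String message) { super(message); }
    }

    private int counter;
    private int usn = 1;

    UsnProtectedStore(int initialCounter) { counter = initialCounter; }

    /** A "fetch": returns {counter, usn}, the state a client would cache. */
    synchronized int[] fetch() { return new int[] { counter, usn }; }

    /**
     * A "save". A null expectedUsn models an unprotected update and skips the
     * check; otherwise the save succeeds only if nobody has saved since the
     * caller's fetch. Returns the new USN.
     */
    synchronized int save(Integer expectedUsn, int newCounter) {
        if (expectedUsn != null && expectedUsn.intValue() != usn) {
            throw new ObjectModifiedException("modified since it was retrieved");
        }
        counter = newCounter;
        return ++usn;   // every successful update advances the USN
    }
}

class FirstWriterWinsDemo {
    /** Two clients fetch the same state; only the first save succeeds. */
    static String run() {
        UsnProtectedStore store = new UsnProtectedStore(7);
        int[] seenByA = store.fetch();                  // {7, 1}
        int[] seenByB = store.fetch();                  // {7, 1}
        store.save(seenByA[1], seenByA[0] + 1);         // A wins: counter 8, USN 2
        try {
            store.save(seenByB[1], seenByB[0] + 1);     // B still holds USN 1
            return "B succeeded (lost update!)";
        } catch (UsnProtectedStore.ObjectModifiedException e) {
            return "B rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

After being rejected, B would re-fetch (picking up counter 8 and the new USN) and retry; that retry loop is what the following listing builds around the real API.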

To take advantage of USN checking and the first-writer-wins policy of the CE server, you would attempt to update the counter in the dispenser object and detect the error reported for an interleaved change. Listing 3 shows an example of how to do this.

Listing 3. First writer wins
private static final String COUNTER_PROPERTY_NAME = "WjcCounter";
/** 
 * This property filter is used to minimize data returned in fetches and refreshes.
 */
private static final PropertyFilter PF_COUNTER = new PropertyFilter();
static
{
    PF_COUNTER.addIncludeProperty(1, null, null, COUNTER_PROPERTY_NAME, null);
}

/**
 * Get the next value efficiently by exploiting First Writer Wins
 */
public int getNextValue(boolean feelingUnlucky)
{
    final Properties dispenserProperties = dispenser.getProperties();
    // Object might be updated by someone else, so try a few times
    for (int attemptNumber=0; attemptNumber<10; ++attemptNumber)
    {
        // If cached data invalid, fetch the current value
        // from the server.  This also covers the fetchless
        // instantiation case.
        if (feelingUnlucky
        ||  dispenser.getUpdateSequenceNumber() == null 
        ||  !dispenserProperties.isPropertyPresent(COUNTER_PROPERTY_NAME))
        {
            // A fetch can fail if the cached USN doesn't match the server's, so null it out first
            dispenser.setUpdateSequenceNumber(null);
            dispenser.refresh(PF_COUNTER);  // R/T
        }
        int oldValue = dispenserProperties.getInteger32Value(COUNTER_PROPERTY_NAME);
        int newValue = oldValue + 1;
        dispenserProperties.putValue(COUNTER_PROPERTY_NAME, newValue);
        try
        {
            // Because we use a refreshing save, the counter property's
            // new value will be returned from the server.
            dispenser.save(RefreshMode.REFRESH, PF_COUNTER);  // R/T
            return newValue;
        }
        catch (EngineRuntimeException ere)
        {
            ExceptionCode ec = ere.getExceptionCode();
            if (ec != ExceptionCode.E_OBJECT_MODIFIED)
            {
                // If we get an exception for any reason other than
                // the object being concurrently modified, rethrow it.
                throw ere;
            }
            // Someone else modified it.  Invalidate our cached data and try again.
            dispenser.setUpdateSequenceNumber(null);
            dispenserProperties.removeFromCache(COUNTER_PROPERTY_NAME);
            continue;  
        }
    }
    // too many iterations without success
    throw new IllegalStateException("Dispenser update failed after 10 attempts");
}

/**
 * Set by constructor or some other means.
 * Fetchless instantiation is OK.
 */
private final CustomObject dispenser;

At first glance, this seems to take the same two server round-trips as our earlier cooperative locking code. That's true for the first counter update, because we must fetch the current counter value from the server. After a successful update, though, the counter value and USN are remembered in the client-side property cache. If nobody else has updated the dispenser object in the meantime, later updates cost only a single round-trip.

On the other hand, if a pair of applications is taking turns updating the dispenser, perhaps as some kind of load balancing, that remembered state backfires on us. In that case, it will usually cost three round-trips to do an update (the initial failed update attempt, the fetch of the current counter value, and the final successful update). This extra cost is incurred even if the competing applications are doing updates with a lot of time in between them (for example, A gets a counter value on the hour and B gets a counter value on the half hour). Whether you will pay the extra cost depends on how many independent updating applications you have, how their updates overlap, and so on. The Boolean parameter feelingUnlucky controls whether the method always pays for a two round-trip update sequence or gambles on the one-or-three round-trip update sequence.
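The round-trip arithmetic above can be checked with a small simulation. The sketch below is hypothetical: FakeServer is an in-memory stand-in for the CE server that counts each refresh and each save as one round-trip, and CachingDispenserClient mirrors the caching and retry logic of Listing 3 rather than calling the real P8 API.

```java
/** Hypothetical in-memory stand-in for the CE server; counts round-trips. */
class FakeServer {
    int counter = 0;
    int usn = 1;
    int roundTrips = 0;

    /** A refresh of the counter property: returns {counter, usn}. */
    int[] refresh() {
        roundTrips++;
        return new int[] { counter, usn };
    }

    /** A refreshing save; returns the new USN, or -1 in place of E_OBJECT_MODIFIED. */
    int saveWithRefresh(Integer expectedUsn, int newCounter) {
        roundTrips++;
        if (expectedUsn != null && expectedUsn.intValue() != usn) {
            return -1;
        }
        counter = newCounter;
        return ++usn;
    }
}

/** Mirrors the cache-and-retry logic of Listing 3 against FakeServer. */
class CachingDispenserClient {
    private final FakeServer server;
    private Integer cachedUsn;      // null models fetchless instantiation
    private Integer cachedCounter;  // null means "not in the property cache"

    CachingDispenserClient(FakeServer server) { this.server = server; }

    int getNextValue(boolean feelingUnlucky) {
        for (int attempt = 0; attempt < 10; attempt++) {
            if (feelingUnlucky || cachedUsn == null || cachedCounter == null) {
                int[] state = server.refresh();                        // R/T
                cachedCounter = state[0];
                cachedUsn = state[1];
            }
            int newValue = cachedCounter + 1;
            int newUsn = server.saveWithRefresh(cachedUsn, newValue);  // R/T
            if (newUsn >= 0) {
                cachedCounter = newValue;   // remembered for the next call
                cachedUsn = newUsn;
                return newValue;
            }
            cachedUsn = null;               // someone else won; invalidate and retry
            cachedCounter = null;
        }
        throw new IllegalStateException("Dispenser update failed after 10 attempts");
    }
}

class RoundTripDemo {
    /** Round-trips used when one client makes every request. */
    static int solo(boolean feelingUnlucky, int requests) {
        FakeServer server = new FakeServer();
        CachingDispenserClient client = new CachingDispenserClient(server);
        for (int i = 0; i < requests; i++) {
            client.getNextValue(feelingUnlucky);
        }
        return server.roundTrips;
    }

    /** Round-trips used when two clients take turns. */
    static int alternating(boolean feelingUnlucky, int requests) {
        FakeServer server = new FakeServer();
        CachingDispenserClient a = new CachingDispenserClient(server);
        CachingDispenserClient b = new CachingDispenserClient(server);
        for (int i = 0; i < requests; i++) {
            CachingDispenserClient turn = (i % 2 == 0) ? a : b;
            turn.getNextValue(feelingUnlucky);
        }
        return server.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("solo: " + solo(false, 4) + " optimistic, " + solo(true, 4) + " pessimistic");
        System.out.println("alternating: " + alternating(false, 4) + " optimistic, "
                + alternating(true, 4) + " pessimistic");
    }
}
```

For four requests, a single optimistic client uses 5 round-trips (2 for the first request, then 1 each) and a pessimistic one uses 8. Two alternating optimistic clients use 10 (2 + 2 + 3 + 3), while two alternating pessimistic clients use 8, which is the trade-off feelingUnlucky selects between.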


Other considerations

Here are some additional things to consider for your implementation and deployment.

USN instead of property

Since every independently persistable object in a repository already has a monotonically increasing Update Sequence Number, why not use that instead of a custom property for the counter value? You can do this if you are willing to accept some compromises, but other than avoiding the definition of the counter property itself, you wouldn't really save much.

  • The normal progression of the USN is to be incremented by one. If you need some other progression pattern for your sequence numbers, you're out of luck.
  • Actually, the CE progression pattern for USN is only vaguely specified. The only officially supported uses are comparing a USN against null and comparing two USNs for less than, greater than, or equal values. The increment-by-one behavior could change in a future CE release, though the USN would still increase monotonically.
  • The incrementing of the USN is not, strictly speaking, under your control. The CE server increments the USN whenever the persisted object is updated, not just when someone has requested a new sequence number.
  • You're still going to have to make the same number of server round-trips to update the dispenser object or fetch the USN value. There is no performance advantage.

Many dispensers

Regardless of the technique you use in the implementation code, repeatedly updating a single object can create a performance hotspot in the repository if the volume is high. By high, we mean high by database standards, so even a few thousand updates a day is unlikely to be a problem. At very high volumes, you can make things more scalable by having multiple dispenser objects, each responsible for its own range of values. Because the ranges do not overlap, the values given out by different dispenser instances are still globally unique.
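Here is one way to partition the values, as a hypothetical sketch: an AtomicLong stands in for the counter property on each dispenser object, and dispenser k owns the block [k * rangeSize + 1, (k + 1) * rangeSize]. In a P8 deployment, each range would presumably live in its own dispenser object updated with first-writer-wins code like Listing 3.

```java
import java.util.concurrent.atomic.AtomicLong;

/** One dispenser owning the inclusive block [first, last]; AtomicLong stands in for its counter. */
class RangedDispenser {
    private final long first;
    private final long last;
    private final AtomicLong next;

    RangedDispenser(long first, long last) {
        this.first = first;
        this.last = last;
        this.next = new AtomicLong(first);
    }

    long take() {
        long value = next.getAndIncrement();
        if (value > last) {
            throw new IllegalStateException("Range [" + first + ", " + last + "] is exhausted");
        }
        return value;
    }
}

/** Several dispensers with non-overlapping ranges, so their values never collide. */
class DispenserPool {
    private final RangedDispenser[] dispensers;

    /** Dispenser k owns [k * rangeSize + 1, (k + 1) * rangeSize]. */
    DispenserPool(int count, long rangeSize) {
        dispensers = new RangedDispenser[count];
        for (int k = 0; k < count; k++) {
            dispensers[k] = new RangedDispenser(k * rangeSize + 1, (k + 1) * rangeSize);
        }
    }

    /** Spreads load by picking a dispenser from a caller-supplied key (e.g., a client index). */
    long take(int callerKey) {
        return dispensers[Math.floorMod(callerKey, dispensers.length)].take();
    }

    public static void main(String[] args) {
        DispenserPool pool = new DispenserPool(3, 1000);
        System.out.println(pool.take(0) + " " + pool.take(1) + " " + pool.take(2) + " " + pool.take(0));
    }
}
```

Running main prints 1 1001 2001 2. The values are unique across the pool, but they no longer form a single gap-free, increasing sequence, so this design relaxes requirement 2 and fits best where uniqueness matters more than ordering.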

An aside for the 3.x Java API

The CE 3.x Java API does not expose the Update Sequence Number used above. In fact, for most operations, that API turns things on their head and uses a last-writer-wins policy. It's not impossible, but it is fairly tricky to write a dispenser that is both performant and functionally correct in the 3.x Java API. If your business constraints allow it, the best thing for you to do would be to migrate that part (or all) of your application to the CE 4.x APIs.


Final thoughts

This article has covered a variety of techniques for implementing a sequence number dispenser. Like everything else, how much trouble you want to go to is partially dependent on business requirements and whatever else you have on your implementation schedule. If someone asked me to implement this in a totally gold-plated, no-holds-barred way, it would look something like this:

  • Use first-writer-wins code similar to that shown in Listing 3.
  • Put that code in a simple J2EE servlet. By deploying it in a J2EE Web container, we would automatically get the scaling, failover, isolation, and other benefits of the J2EE infrastructure.
  • Restrict P8 write access to the dispenser object or objects so that ordinary users cannot bypass the servlet to update the value.
  • Configure the servlet with a RunAs role that has CE rights to update the dispenser objects. (See the Using a J2EE servlet sidebar.)
  • If necessary, implement some scheme to verify that requesters have the right to ask the dispenser for a number. This is only interesting if there is a concern about numbers being wasted.
  • Start the servlet with an optimistic assumption that single round-trips will be the normal cost for getting a number from a dispenser. In other words, start with feelingUnlucky being false. Keep track of how often that turns out to be wrong in practice. When it's wrong more than some percentage of the time, switch to the pessimistic view that always pays the two round-trip cost. Periodically switch back to the optimistic view to see if things have changed.
  • Provide, as needed, client side utility code for making calls to the servlet to get sequence numbers.
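The optimistic/pessimistic switching described above could be sketched as a small policy object. Everything here is hypothetical: the class name, the window size, the miss-rate threshold, and the length of a pessimistic stint are illustrative knobs, and the caller is assumed to know whether an optimistic attempt hit E_OBJECT_MODIFIED (Listing 3 as written does not report that).

```java
/**
 * Hypothetical policy for choosing Listing 3's feelingUnlucky argument.
 * Starts optimistic, turns pessimistic when too many optimistic requests
 * miss, and periodically goes back to optimistic to re-check.
 */
class UnluckyPolicy {
    private final int windowSize;        // optimistic requests per evaluation window
    private final double maxMissRate;    // turn pessimistic when the miss rate exceeds this
    private final int pessimisticStint;  // requests to stay pessimistic before re-probing

    private boolean pessimistic = false;
    private int optimisticRequests = 0;
    private int optimisticMisses = 0;
    private int pessimisticRequestsLeft = 0;

    UnluckyPolicy(int windowSize, double maxMissRate, int pessimisticStint) {
        this.windowSize = windowSize;
        this.maxMissRate = maxMissRate;
        this.pessimisticStint = pessimisticStint;
    }

    /** The value to pass as feelingUnlucky for the next request. */
    synchronized boolean feelingUnlucky() {
        return pessimistic;
    }

    /**
     * Report the outcome of a request. "missed" means an optimistic attempt
     * hit E_OBJECT_MODIFIED before succeeding; it is ignored while pessimistic.
     */
    synchronized void record(boolean missed) {
        if (pessimistic) {
            if (--pessimisticRequestsLeft <= 0) {
                pessimistic = false;        // periodically switch back to see if things changed
            }
            return;
        }
        optimisticRequests++;
        if (missed) {
            optimisticMisses++;
        }
        if (optimisticRequests >= windowSize) {
            if ((double) optimisticMisses / optimisticRequests > maxMissRate) {
                pessimistic = true;
                pessimisticRequestsLeft = pessimisticStint;
            }
            optimisticRequests = 0;         // start a fresh window either way
            optimisticMisses = 0;
        }
    }
}
```

In the servlet, each request would call feelingUnlucky() before getNextValue() and record() afterward. A fixed window keeps the bookkeeping simple; any other way of estimating the recent miss rate would serve the same purpose.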

Needless to say, I would also demand payment in cupcakes. Yum!
