
Workflow toolkit: Case study of an open source project
Sponsoring open source development can benefit everyone involved

Michael Roberts
Owner, Vivtek
July 2000

Contents:
 Open source can be powerful
 When OS projects fail
 Commercial enterprise fears
 What is workflow?
 Commercial workflow systems
 No OS workflow systems
 Sponsored open source
 WFTK in a nutshell
 The OS experience
 Resources
 About the author

The open source workflow toolkit (WFTK) is a task management and workflow system. It serves as an excellent example of a new model of open source development: sponsored funding. This article provides a technical overview of WFTK; describes the history of the project; and discusses advantages of the sponsored open source model for everyone involved, sponsor, developer, and community alike.

Open source can be powerful
Although press coverage of open source software development has largely focused on the sudden and unexpected success of Linux, other open source projects actually have stronger market share. The Apache Web server has some 70% of the HTTP server market, sendmail handles something like 90% of e-mail on the Internet, and BIND provides such a successful solution to DNS serving that there are basically no commercial products in its class. (The Domain Name System is a general-purpose distributed, replicated data query service used for translating hostnames into Internet addresses.)

But Linux is going up against the industry behemoth of Microsoft Windows and is starting to hold its own, and that has attracted the notice of the analysts.

The idea of open source software is simple: all users have free access to the source code of the product. Having free access to source code means three basic things to software users: they are free to find and fix errors in their own installations; they have the greatest possible freedom to integrate and customize the software; and they are fully empowered to control the software for their individual needs. The efficiency of Linux development has been one of the most famous success stories of the open source model. In many cases, bugs in the operating system are fixed within hours of discovery.

When open source projects fail
One explanation for the failure of open source projects is the hit-or-miss nature of open source development culture. The typical open source project is started when a programmer has a need ("an itch to scratch"). The programmer's first run at a tool to satisfy the need forms the core of a new project. If it's a good tool and it solves a general problem, the project will ideally attract a user base. Among the user base will be a new generation of core developers. A healthy community will thus develop and become stable. If large enough, the user community for the project may even sustain commercial support providers, paid developers, and so forth. (These are phenomena witnessed with Linux, of course, but also with sendmail and Apache, among others.)

But there are many ways this path can fail. The interest level in the project may not be high enough, the implementation may be too specialized, or the developer may have neither time nor interest to market the tool. The lead developer may get a different job before the community stabilizes, for instance, and the whole project loses momentum before it gets off the ground. Because open source development is often unpaid work, it's hard to consistently run successful open source projects in the real world.

Commercial enterprises fear open source as a solution
In the enterprise market, open source is often seen as a dangerous proposition. After all, what happens if a company bases its solution on an open source project that later fails? But if you look closely, you'll see that this attitude rests on a false understanding of open source projects, which are neither more nor less likely to fail than proprietary ones. Open source is primarily a method of code development and maintenance, and an enterprise that chooses an open source solution can often save money in the initial development phase.

Enterprise managers tend to prefer closed source commercial solutions because they perceive that someone must be held accountable in case of failure. But if you read the licensing terms for almost any software package, you'll notice that software companies almost never agree to be held accountable in case of failure. The recent estimates of damages due to e-mail worms exploiting known security problems in Microsoft Outlook illustrate this very well: Microsoft is not obligated by contract to make these losses good.

So it would seem that open source is as good a fit for an applications enterprise as for the developers and users of the applications. And of course enterprise participation in any open source project would be beneficial to the project. Now let's look at how the sponsored funding model has worked for one open source project: a workflow system.

What is workflow, anyway?
If you'll recall, this article is about a workflow toolkit. So what is workflow? A workflow system automates coordination. This can be seen in two ways. One view of workflow makes it a set of tasks assigned to people (or programs) in an organization. What a workflow system does, seen from this perspective, is to coordinate who does what and when. So if I request a chair, instead of getting a paper form from an administrator, filling it out, and sending it via internal mail to the purchasing department, I simply initiate a workflow process. The workflow system knows who needs to sign off on the request and who needs to be informed of the purchase, and the system is able to create tasks accordingly. People are either assigned those tasks explicitly or may select them from a list of "stuff to be done by the department." As tasks are completed, the workflow engine keeps track of what happens next.

That's one view. The other way to view workflow is from the process side: The process can be thought of as a data structure that is built up over time via the combined efforts of many people (or programs). The workflow engine coordinates that process. So you can see that a workflow engine could also be the basis for countless useful Web applications, customer-service status report pages, and so on. If you're doing any kind of coding of complex Web applications, chances are you're already doing workflow. But you're probably not doing it as well as you could if you formalized the workflow into an engine.
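The task-oriented view described above can be sketched in a few lines of Python. This is only an illustration of the idea, not WFTK code: the `Task` and `TaskList` names, their fields, and the user names are all hypothetical. It shows the two ways a person ends up with a task, either by explicit assignment or by claiming one from the department's list of open work.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    label: str
    role: str                    # department or role responsible for the task
    owner: Optional[str] = None  # None means "stuff to be done by the department"
    done: bool = False


class TaskList:
    """Hypothetical in-memory task list; not WFTK's actual task manager."""

    def __init__(self) -> None:
        self.tasks: List[Task] = []

    def create(self, label: str, role: str, owner: Optional[str] = None) -> Task:
        # A task may be assigned explicitly (owner given) or left open for the role.
        task = Task(label, role, owner)
        self.tasks.append(task)
        return task

    def unclaimed(self, role: str) -> List[Task]:
        # Open tasks that anyone holding the role may pick up.
        return [t for t in self.tasks
                if t.role == role and t.owner is None and not t.done]

    def claim(self, task: Task, user: str) -> None:
        if task.owner is not None:
            raise ValueError(f"{task.label!r} is already owned by {task.owner}")
        task.owner = user

    def owned_by(self, user: str) -> List[Task]:
        # Answers the question "what tasks do I own?"
        return [t for t in self.tasks if t.owner == user and not t.done]


tasks = TaskList()
tasks.create("Approve chair request", role="Supervisor", owner="alice")
order = tasks.create("Order chair", role="Purchasing")
tasks.claim(tasks.unclaimed("Purchasing")[0], "bob")
print([t.label for t in tasks.owned_by("bob")])
```

A real engine would also decide which task to create next when one completes; that part is deliberately left out of this sketch.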

Commercial workflow systems
The workflow market, as you can imagine, is saturated. There are over 40 known commercial workflow applications. Yet Microsoft is busily trying to work its way into the workflow market. Why?

The economics are simple. A typical workflow installation at a fairly large organization requires an enormous cash outlay. Server and per-seat user fees are generally high. The vendor is almost always hired to customize the product. Affected personnel need training courses, and workflow analysts are needed to codify the business's practices for formal workflow processing. The list goes on. The software licensing fees alone for a fairly small FileNet Workflow installation start around $50,000, and a typical real-world workflow system runs into the millions. In short, there's gold in them thar hills.

There are no open source workflow systems
With all that interest in workflow, you'd think there would already be some sort of open source system on the market. But the fact is that most open source programmers don't work for enterprises. They are typically either staff at universities or independent consultants (like me). In the rare instance where an enterprise might develop its own tool from scratch, there is basically no chance that the tool will be open sourced. Management just doesn't think that way.

So when Galactic Marketing decided that they needed a workflow system, they looked at the pricing of commercial products and, realizing that those products were the only readily available option, saw that they had an itch to scratch. The only hitch was that they're not programmers.

Sponsored open source: The best of both worlds
Enter SourceXchange. SourceXchange is a site offered by Collab.net, a new company trying out new models of open source business. The idea of SourceXchange is to provide a venue whereby sponsors can post projects, along with fairly detailed specs and an amount of money they're willing to pay. Interested developers can then make proposals for the projects. And the resulting code is required to be open source. The sponsor and the developer share copyright, so that alternative licensing and further development needn't be subject to the open source license.

So Galactic Marketing posted an RFP (Request for Proposal) for a workflow system on SourceXchange. I won the contract. And the WFTK was born. Galactic is paying for initial development and partial ownership of the code.

The key to this partnership is that open source programming needn't be a volunteer project, as is often assumed. One of the criticisms leveled at open source in the enterprise environment is that changes, while freely promised, may take too long to complete. (Let's ignore for a moment that commercial application vendors often exhibit the same behavior.) Why should an enterprise expect someone working on a volunteer basis to conform to their schedule? The answer should have been obvious all along: if you have a problem now, then offer money for the solution. SourceXchange isn't the only venue where this sort of transaction is arranged. Sites that offer these services are popping out of the woodwork lately. But SourceXchange is organized, they're already in existence, and they're building credibility in both enterprise and open source communities.

The open source workflow toolkit in a nutshell
So the idea of the WFTK project was (and is) an engine that:

  • is practical to embed in arbitrary Web applications
  • is easy to use and install
  • is easily integrated with existing infrastructure
  • is easily customized
  • includes User Interface (UI) to manage open tasks/processes as well as UI to edit process definitions

In other words, an open source workflow toolkit is more than just a plain old workflow engine that includes source code. The real strength of open source development, and of the open source workflow toolkit, is that you don't have to hide anything. There are no trade secrets. You can work with anything you want to, and you can shoot for any application you want. As long as the needs of the sponsor are met, you're free to put more effort into it. In the long run, the extra effort will pay off.

The basic setup of WFTK consists of the following modules:

  • The core engine
    The core engine is responsible for actually executing workflows. Thus, if I have an active process and a task has just completed, I tell the core engine, and it updates the state of the process, determines what has to happen next, and gives me the resulting information. The state of each process is stored in an XML file called the "datasheet", which is stored in a central directory.

  • The task manager
    Although the core engine could be regarded as a complete workflow system (and will suffice for many workflow applications), a task manager can be used to maintain information about active processes across the board. The task manager component in the prototype system is a database implemented in PostgreSQL and includes a full-featured UI implemented as AOLserver/Tcl code. The task manager talks to the core engine by invoking it as a command-line program. It takes the task activation notifications from the core engine and uses them to create task records in the database. Thus, across-the-board queries such as "what tasks do I own" are very easy to answer. The task manager can also organize ad hoc tasks (to-do list entries) that are not managed by the core engine at all.

  • The process definition manager
    The process definition (procdef) manager maintains the procdef repository, which is effectively a simple document management and version control system for process definitions. Version control is needed because individual processes may stay active for long periods of time. If the process definition changes under a running process, chaos will ensue. So the repository ensures that a process always works from the same version of the process definition that was used to start it. In addition, the procdef manager provides a simple editing UI for creating test versions of processes, modifying them, and (soon, anyway) testing them in a "sandbox."

  • The user manager
    At some point it would be a nice thing to include LDAP (Lightweight Directory Access Protocol) in all this, but for the time being, and for those of us who don't use LDAP, WFTK includes a simple user manager. Each user is represented as an XML file, and permissions are managed centrally.
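The version pinning that the process definition manager performs can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `ProcdefRepository` class, its methods, and the in-memory storage are hypothetical and say nothing about how WFTK actually stores definitions. The point is only that a running process records the version it was started with and keeps reading that version after a newer one is published.

```python
class ProcdefRepository:
    """Hypothetical in-memory repository of versioned process definitions."""

    def __init__(self):
        self._versions = {}  # procdef name -> list of definition texts

    def publish(self, name, definition):
        # Append a new version; version numbers are 1-based.
        versions = self._versions.setdefault(name, [])
        versions.append(definition)
        return len(versions)

    def latest_version(self, name):
        return len(self._versions[name])

    def fetch(self, name, version):
        return self._versions[name][version - 1]


def start_process(repo, name):
    # Pin the process to whatever version is current at start time.
    return {"procdef": name, "version": repo.latest_version(name)}


repo = ProcdefRepository()
repo.publish("Purchase request", "<workflow name='Purchase request'><!-- v1 --></workflow>")
running = start_process(repo, "Purchase request")

# A later edit must not change the definition under the running process.
repo.publish("Purchase request", "<workflow name='Purchase request'><!-- v2 --></workflow>")
pinned = repo.fetch(running["procdef"], running["version"])
print(running["version"], "v1" in pinned)
```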

Everything in WFTK is represented as an XML document. The "expat" parser -- also an open source product (by James Clark) -- parses the documents; and a full-featured manipulation API works with the documents, once loaded.
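To give a feel for callback-style parsing with expat, here is a minimal sketch that uses Python's standard `xml.parsers.expat` binding to the same parser. It is not WFTK's own code or its manipulation API. The fragment being parsed is a cut-down process definition in the style of the example in this article, and the `summarize_procdef` helper is a hypothetical name.

```python
import xml.parsers.expat

# A small fragment in the style of the article's process definition.
PROCDEF = """<workflow name="Purchase request">
  <role name="Supervisor"/>
  <role name="Purchasing"/>
  <sequence>
    <task label="Approval" role="Supervisor">
      <data name="Approval code" type="string"/>
    </task>
    <task label="Order item" role="Purchasing">
      <data name="Purchasing record" type="string"/>
    </task>
  </sequence>
</workflow>"""


def summarize_procdef(xml_text):
    """Collect role names and (task label, role) pairs via expat callbacks."""
    roles = []
    tasks = []

    def start_element(name, attrs):
        # expat calls this once for every opening tag, in document order.
        if name == "role":
            roles.append(attrs["name"])
        elif name == "task":
            tasks.append((attrs["label"], attrs["role"]))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return roles, tasks


roles, tasks = summarize_procdef(PROCDEF)
print(roles)
print(tasks)
```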

A typical workflow process definition might look like this (and, in fact, this is the first usage scenario for the design):

<workflow name="Purchase request" author="Michael michael@vivtek.com">
  <role name="Supervisor"/>
  <role name="Purchasing"/>
  <role name="Accounting"/>
  <role name="Receiving"/>

  <data name="Product requested" type="string"/>
  <data name="Reason for request" type="string"/>
  <data name="Requester's email" type="string"/>

  <sequence>
    <task label="Approval" role="Supervisor">
      <data name="Approval code" type="string"/>
    </task>

    <if expr="${Approval code} == 'No'">
      <situation name="Request rejected"/>
    </if>

    <task label="Order item" role="Purchasing">
      <data name="Purchasing record" type="string"/>
    </task>

    <alert type="email" to="${Requester's email}">
      Your request for the purchase of ${Product requested} has been approved and the
      order was placed.  The purchasing record is ${Purchasing record} if you need to
      contact Purchasing for inquiries.
    </alert>

    <alert type="role" to="Accounting">
      An order for ${Product requested} has been placed.
    </alert>

    <alert type="role" to="Receiving">
      An order for ${Product requested} has been placed.  Expect delivery.
    </alert>

    <parallel>
      <sequence>
        <task label="Receive ${Product requested}" role="Receiving"></task>
        <alert type="email" to="${Requester's email}">
          Your requested ${Product requested} has arrived.
        </alert>
      </sequence>

      <task label="File invoice" role="Accounting">
        <data name="Invoice number" type="string">INV309843</data>
      </task>
    </parallel>

    <task label="Pay invoice" role="Accounting"/>
    <alert type="role" to="Purchasing">
      The purchase has been paid.
    </alert>
  </sequence>

  <handle situation="Request rejected">
    <alert type="email" to="${Requester's email}">
      Your request for ${Product requested} has been rejected by your supervisor.
    </alert>
  </handle>
</workflow>

I won't analyze the code in detail here; visit the project site (listed below) for more in-depth information. Still, the example should give you a taste of what an XML process definition looks like.

There are a few interesting points I'd like to note. First, notice that the titles of tasks and other actions can depend on data collected for the specific case, as in the task label "Receive ${Product requested}". In fact, the description of the process itself can depend on data collected when the process is started. Alerts and notifications can be coded as an integral part of the process. Handling data received via e-mail is planned as a later addition to the engine and should allow a great deal of flexibility.
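The `${...}` references in labels and alert texts suggest a simple substitution step against the data collected so far. The sketch below shows one way such a step might work. It is an assumption about the behavior, not a description of the WFTK engine: the `expand` function, the choice to leave unknown names untouched, and the sample values are all hypothetical.

```python
import re

# Matches ${Name}, where Name may contain spaces and apostrophes but no "}".
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(template, datasheet):
    """Replace each ${Name} with its value; leave unknown names untouched."""

    def lookup(match):
        name = match.group(1)
        return str(datasheet.get(name, match.group(0)))

    return PLACEHOLDER.sub(lookup, template)


# Sample values, made up for this illustration.
datasheet = {
    "Product requested": "office chair",
    "Requester's email": "pat@example.com",
}

label = expand("Receive ${Product requested}", datasheet)
recipient = expand("${Requester's email}", datasheet)
pending = expand("The purchasing record is ${Purchasing record}.", datasheet)
print(label)
print(recipient)
print(pending)
```

Leaving an unresolved reference in place, rather than substituting an empty string, makes it easier to spot data that has not been collected yet; a real engine might reasonably choose otherwise.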

Also, specific data items may be collected at any point in the process. The task manager builds a form, depending on the process definition, that the user then uses to complete the task. All the data thus collected becomes part of the process datasheet, making it extremely simple to report the state of each individual process.
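As a rough illustration of how a form could be derived from a task's definition and how submitted values might end up in the datasheet, consider the following sketch. It uses Python's standard `xml.etree.ElementTree` for brevity rather than expat. The `form_fields` and `complete_task` helpers and the dictionary used as a datasheet are hypothetical stand-ins, not the task manager's real interface.

```python
import xml.etree.ElementTree as ET

# One task element, copied from the style of the process definition above.
TASK_XML = """<task label="Approval" role="Supervisor">
  <data name="Approval code" type="string"/>
</task>"""


def form_fields(task_element):
    """List the (name, type) pairs a form for this task would need to show."""
    return [(d.get("name"), d.get("type")) for d in task_element.findall("data")]


def complete_task(datasheet, task_element, submitted):
    """Return a new datasheet with the values submitted for this task's fields."""
    updated = dict(datasheet)
    for name, _type in form_fields(task_element):
        if name in submitted:
            updated[name] = submitted[name]
    return updated


task = ET.fromstring(TASK_XML)
fields = form_fields(task)

before = {"Product requested": "office chair"}
# Values not declared by the task (here "Unrelated") are ignored.
after = complete_task(before, task, {"Approval code": "Yes", "Unrelated": "x"})
print(fields)
print(after)
```

Because every collected value lands in one structure per process, reporting the state of a process amounts to reading that structure back.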

The open source experience
WFTK has been my first experience developing in an open source environment. I was initially quite nervous that my code would be visible to the public. But this was an incentive to write better code, and it certainly caused me to document my code extremely well. Where else would I be able to explain the corners I was cutting to get a prototype out?

But the most striking aspect of the open source world is that people keep e-mailing me out of the blue, giving extremely valuable advice about possible tools, mistakes I have made, possible extensions, and so on. I just received a Rational Rose implementation of a subset of my engine from a college class team in India. I have had extensive conversations about workflow design with academics, an Air Force implementer, and the maintainers of other projects. The public nature of open source development has brought a much higher quality to this project than to any of my previous work.

Resources

About the author
Michael Roberts has been slinging code for thirteen years, but has only done it in public for five months. You can contact Michael Roberts at michael@vivtek.com or visit his Web site.


 