Open source can be powerful
Although press coverage of open source software development has largely focused on the sudden and unexpected success of Linux, other open source projects actually have stronger market
share: the Apache Web server has some 70% of the HTTP server market, sendmail
handles something like 90% of e-mail on the Internet, and BIND provides such a
successful solution to DNS serving that there are basically no commercial
products in its class. (The Domain Name System is a general-purpose distributed, replicated data query service used for translating hostnames into Internet addresses.)
But Linux is going up against the industry behemoth of
Microsoft Windows and is starting to hold its own, and that has attracted the
notice of the analysts.
The idea of open source software is simple: all users have free access to the
source code of the product. Having free access to source code means three basic things to software users: they are free to find and fix errors
in their own installations; they have the greatest possible freedom to integrate
and customize the software; and they are fully empowered to control the software for their individual needs. The efficiency of Linux development has been one of the most famous success stories of the open source model. In many cases, bugs in the operating system are fixed within hours of
discovery.
When open source projects fail
One explanation for the failure of open source projects is the hit-or-miss nature of open source development culture. The typical open source project is started when a programmer has a need ("an itch to scratch"). The programmer's first run at a tool to satisfy the need forms the core of a new project.
If it's a good tool and it solves a general problem, the project will ideally
attract a user base. Among the user base will be a new generation of core developers.
A healthy community will thus develop and become stable. If large enough, the user community for the project may even
sustain commercial support providers, paid developers, and so forth. (These are
phenomena witnessed with Linux, of course, but also with sendmail and Apache,
among others.)
But there are many ways this path can fail. The interest level in the project may not be high enough, the implementation may be too specialized, or the developer may have neither time nor interest to market the tool. The lead developer may get a
different job before the community stabilizes, for instance, and the whole
project loses momentum before it gets off the ground. Because open source development is often unpaid work, it's hard to consistently run successful open source projects in the real world.
Commercial enterprises fear open source as a solution
In the enterprise market, open source is often seen as a
dangerous proposition. After all, what happens if a company bases its solution on an
open source project that later fails? But if you look closely, you'll see that this attitude is based on a misunderstanding of open source projects, which are no more likely to fail than proprietary projects. Open source is primarily a method of code development and maintenance, and an enterprise that builds on existing open source code can often save money in the initial development phase.
The rationale for enterprise managers to prefer closed source commercial
solutions is based on the perception that someone must be held
accountable in case of failure. But if you read the licensing terms for a typical commercial
software package, you'll notice that the vendor almost never agrees to be held accountable in case of
failure. The recent estimates of damages due to e-mail worms
exploiting known security problems in Microsoft Outlook serve to illustrate
this very well. Microsoft is not obligated by contract to make these losses good.
So it would seem that open source is as good a fit for an applications enterprise as for the developers and users of the applications. And of course enterprise participation in any
open source project would be beneficial to the project. Now let's look at how a sponsored funding model has worked for one open source project, a workflow toolkit.
What is workflow, anyway?
If you'll recall, this article is about a workflow toolkit. So what is workflow? A workflow
system automates coordination. This can be seen in two ways. One view of
workflow makes it a set of tasks assigned to people (or programs) in an
organization. What a workflow system does, seen from this perspective, is to
coordinate who does what and when. So if I request a chair, instead of
getting a paper form from an administrator, filling it out, and sending it via
internal mail to the purchasing department, I simply initiate a workflow
process. The workflow system knows who needs to sign off on the request and who
needs to be informed of the purchase, and the system is able to create tasks accordingly. People are
either assigned those tasks explicitly or may select them from a list of "stuff
to be done by the department." As tasks
are completed, the workflow engine keeps track of what happens next.
That's one view. The other way to view workflow is from the process side: The
process can be thought of as a data structure that is built up over time via
the combined efforts of many people (or programs). The workflow engine
coordinates that process. So you can see that a workflow engine could also be
the basis for countless useful Web applications, customer-service status report
pages, and so on. If you're doing any kind of coding of complex Web
applications, chances are you're already doing workflow. But you're probably
not doing it as well as you could if you formalized the workflow into an engine.
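To make the first view concrete, here is a minimal Python sketch of the chair request as an ordered list of tasks. It is only an illustration under my own assumptions: the Task and Process classes, the role names, and the strictly sequential flow are all hypothetical and are not taken from any particular workflow product.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    title: str
    role: str          # who may pick the task up (hypothetical role names)
    done: bool = False

@dataclass
class Process:
    steps: list                      # ordered (title, role) pairs
    tasks: list = field(default_factory=list)
    position: int = 0

    def start(self):
        """Activate the first task of the process."""
        return self._activate()

    def _activate(self):
        # Create a task for the current step, or return None when finished.
        if self.position >= len(self.steps):
            return None
        title, role = self.steps[self.position]
        task = Task(title, role)
        self.tasks.append(task)
        return task

    def complete(self, task):
        """Mark a task done and let the engine decide what happens next."""
        task.done = True
        self.position += 1
        return self._activate()

chair_request = Process(steps=[
    ("Approve chair request", "manager"),
    ("Order chair", "purchasing"),
    ("Notify requester", "admin"),
])
first = chair_request.start()
second = chair_request.complete(first)
```

The requester only starts the process; the engine, not the people involved, keeps track of who is next.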
Commercial workflow systems
The workflow market, as you can imagine, is saturated. There are over 40 known commercial workflow applications. Yet Microsoft is busily trying to work its way into the workflow market. Why?
The economics are simple. A typical workflow installation at a fairly large
organization requires an incredible cash outlay: There are
generally high server and per-seat user fees. The vendor is almost always hired
to customize the product. There are training courses for affected personnel
and/or workflow analysts in order to codify the business's practices for
formal workflow processing. And the list goes on. The software licensing fees
alone for a fairly small FileNet Workflow installation start around $50,000. A
typical real-world workflow system runs into the millions. In short, there's
gold in them thar hills.
There are no open source workflow systems
With all that interest in workflow, you'd think there would
already be some sort of open source system on the market. But the fact is that most
open source programmers don't work for enterprises. They are typically either
staff at universities or independent consultants (like me). In the rare instance where an enterprise might develop its own
tool from scratch, there is basically no chance that the tool will be open sourced. Management just doesn't think that way.
So when Galactic Marketing decided that they needed a workflow system, they looked around at the pricing of commercial products and, realizing that those
products were the only readily available option, they saw that they had an itch to
scratch. The only hitch was that they're not programmers.
Sponsored open source: The best of both worlds
Enter SourceXchange. SourceXchange is a site offered by
Collab.net, a new company trying out new models of open source business. The
idea of SourceXchange is to provide a venue whereby sponsors can post projects,
along with fairly detailed specs and an amount of money they're willing to pay.
Interested developers can then make proposals for the projects. And the
resulting code is required to be open source. The
sponsor and the developer share copyright, so that alternative licensing and
further development needn't be subject to the open source license.
So Galactic Marketing posted an RFP (Request for Proposal) for a workflow system on SourceXchange. I won the contract. And the WFTK was born. Galactic is paying for initial
development and partial ownership of the code.
The key to this partnership is that open source programming needn't be a volunteer project,
as is often assumed. One of the criticisms leveled at open source in the
enterprise environment is that changes, while freely promised, may take too long
to complete. (Let's ignore for a moment that commercial application vendors
often exhibit the same behavior.) Why should an enterprise expect someone working on a volunteer basis to conform to their schedule? The answer should have been obvious all along: if you have a problem now, then offer money for the solution. SourceXchange isn't the only venue where this sort of transaction is
arranged. Sites that offer these services have been coming out of the woodwork lately. But SourceXchange is
organized, they're already in existence, and they're building credibility in
both enterprise and open source communities.
The open source workflow toolkit in a nutshell
So the idea of the WFTK project was (and is) an engine that:
- is practical to embed in arbitrary Web applications
- is easy to use and install
- is easily integrated with existing infrastructure
- is easily customized
- includes a user interface (UI) to manage open tasks and processes, as well as a UI to edit process definitions
In other words, an open source workflow toolkit is more
than just a plain old workflow engine that includes source code. The real
strength of open source development, and of the open source workflow toolkit, is that you don't have to hide anything.
There are no trade secrets. You can work with anything you want to, and you can
shoot for any application you want. As long as the needs of the sponsor are met,
you're free to put more effort into it. In the long run, the extra effort will pay off.
The basic setup of WFTK consists of the following modules:
The core engine
The core engine is responsible for actually executing
workflows. Thus, if I have an active process and a task has just completed,
I tell the core engine, and it updates the state of the process,
determines what has to happen next, and gives me the resulting information. The state of each process
is stored in an XML file called the "datasheet", which is kept in a central
directory.
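The following Python sketch shows the kind of state update described above: a task completes, the datasheet is updated, and the next task is activated. The datasheet layout (the task element and its id, status, and next attributes) is an assumption made up for this illustration; WFTK's real datasheet format may look quite different.

```python
import xml.etree.ElementTree as ET

# Hypothetical datasheet layout, invented for this sketch.
DATASHEET = """
<datasheet process="chair-request">
  <task id="approve" status="active" next="order"/>
  <task id="order" status="pending" next="notify"/>
  <task id="notify" status="pending" next=""/>
</datasheet>
"""

def complete_task(datasheet, task_id):
    """Mark task_id complete, activate its successor, and return the
    successor's id (or None when the process has finished)."""
    task = datasheet.find(f"task[@id='{task_id}']")
    if task is None or task.get("status") != "active":
        raise ValueError(f"task {task_id!r} is not active")
    task.set("status", "complete")
    next_id = task.get("next")
    if not next_id:
        return None
    datasheet.find(f"task[@id='{next_id}']").set("status", "active")
    return next_id

sheet = ET.fromstring(DATASHEET)
activated = complete_task(sheet, "approve")
```

Because the whole state lives in one XML document, a caller can write the updated datasheet back to disk after each call and pick the process up again later.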
The task manager
Although the core engine could be regarded as a
complete workflow system (and will suffice for many workflow applications), a
task manager can be used to maintain information about active processes across
the board. The task manager component in the prototype system is a database
implemented in PostgreSQL and includes a full-featured UI implemented as
AOLServer/Tcl code. The task manager talks to the core engine by invoking it
as a command-line program. It takes the task activation notifications from the
core engine and uses them to create task records in the database. Thus,
across-the-board queries about "what tasks do I own" are very easy to answer.
The task manager can also organize ad hoc tasks (to-do list entries)
that are not managed by the core engine at all.
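As a rough sketch of why a task database makes across-the-board queries easy, the Python below records task activations in a table and then asks "what tasks do I own". It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the table layout, function names, and sample data are all assumptions for illustration, not the prototype's actual schema.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the schema is hypothetical.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE tasks (
        id      INTEGER PRIMARY KEY,
        process TEXT,               -- NULL for ad hoc to-do entries
        title   TEXT NOT NULL,
        owner   TEXT,
        status  TEXT NOT NULL DEFAULT 'active'
    )
""")

def record_activation(process, title, owner):
    """Create a task record from an activation notification."""
    db.execute(
        "INSERT INTO tasks (process, title, owner) VALUES (?, ?, ?)",
        (process, title, owner),
    )

def tasks_owned_by(owner):
    """Answer 'what tasks do I own' across all processes."""
    rows = db.execute(
        "SELECT title FROM tasks WHERE owner = ? AND status = 'active' ORDER BY id",
        (owner,),
    )
    return [title for (title,) in rows]

record_activation("chair-request-17", "Approve chair request", "alice")
record_activation("chair-request-17", "Order chair", "bob")
record_activation(None, "Water the office plants", "alice")  # ad hoc task
```

Ad hoc tasks fit in the same table simply by leaving the process column empty, so one query covers both kinds.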
The process definition manager
The procdef manager maintains the
procdef repository. The procdef repository is effectively a simple document
management and version control system for process definitions. It is needed
because individual processes may stay active for long periods of time.
If the process definition changed underneath a running process, chaos would ensue. So the
repository ensures that a process will always work from the same version of a
process definition that was used to start it. In addition, the procdef manager
provides a simple editing UI for creating test versions of processes,
modifying them, and (soon, anyway) testing them in a "sandbox."
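The version pinning described above can be sketched in a few lines of Python. The class and method names here are hypothetical, and a real repository would persist its versions rather than hold them in memory; the point is only that a running process remembers the version it started with.

```python
class ProcdefRepository:
    """Keeps every published version of each process definition."""

    def __init__(self):
        self._versions = {}  # name -> list of definitions (version n at index n - 1)

    def publish(self, name, definition):
        versions = self._versions.setdefault(name, [])
        versions.append(definition)
        return len(versions)  # the new version number

    def latest_version(self, name):
        return len(self._versions[name])

    def get(self, name, version):
        return self._versions[name][version - 1]


class RunningProcess:
    """A process instance pinned to the definition version it started with."""

    def __init__(self, repo, name):
        self.repo = repo
        self.name = name
        self.version = repo.latest_version(name)  # pinned at start time

    def definition(self):
        return self.repo.get(self.name, self.version)


repo = ProcdefRepository()
repo.publish("chair-request", ["approve", "order"])
running = RunningProcess(repo, "chair-request")

# A later edit creates version 2 but does not disturb the running process.
repo.publish("chair-request", ["approve", "budget-check", "order"])
newer = RunningProcess(repo, "chair-request")
```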
The user manager
At some point it would be a nice thing to include
LDAP (Lightweight Directory Access Protocol) in all this, but for the time being, and for those of us who don't use
LDAP, WFTK includes a simple user manager. Each user is represented as an
XML file, and permissions are managed centrally.
Everything in WFTK is represented as an XML document. The "expat" parser -- also an open source product (by
James Clark) -- parses the
documents; and a full-featured manipulation API works with the
documents, once loaded.
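To give a feel for event-driven parsing with expat, here is a small Python sketch that reads a user record through Python's standard xml.parsers.expat binding. The XML layout of the user record (the user, name, and group elements) is an assumption invented for this example and is not WFTK's actual user format.

```python
import xml.parsers.expat

# Hypothetical user record; the element names are assumptions.
USER_XML = (
    '<user id="alice"><name>Alice Example</name>'
    '<group>purchasing</group><group>admin</group></user>'
)

def parse_user(text):
    """Build a plain dict from a user document using expat callbacks."""
    user = {"id": None, "name": "", "groups": []}
    open_elements = []  # stack of element names currently open

    def start(name, attrs):
        open_elements.append(name)
        if name == "user":
            user["id"] = attrs.get("id")
        elif name == "group":
            user["groups"].append("")

    def end(name):
        open_elements.pop()

    def chars(data):
        # Character data may arrive in several chunks, so always append.
        if open_elements and open_elements[-1] == "name":
            user["name"] += data
        elif open_elements and open_elements[-1] == "group":
            user["groups"][-1] += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return user

alice = parse_user(USER_XML)
```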
A typical workflow process definition might look like this (and, in fact, this
is the first usage scenario for the design):
I will not go into any great detail of code
analysis here. Visit the project site (listed below) for more in-depth information. But what I've included in the code example should give you a taste of what an XML process definition might look like.
There are a few interesting points I'd like to note. First, notice that the titles of
tasks and other actions can depend on data collected for the specific case. In
fact, the description of the process itself can depend on data collected when
the process is started. You can code for alerts and notifications as
an integral part of the process. Support for data received via e-mail is planned as a later
addition to the engine, which should allow a great deal of flexibility.
Also, specific data items may be collected at any point in the process. The
task manager builds a form, depending on the process definition, that the user
then uses to complete the task. All the data thus collected becomes part of the
process datasheet, making it extremely simple to report the state of each individual process.
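The two ideas above, titles that depend on case data and form data that flows back into the datasheet, can be sketched in Python as follows. The dictionary layout, the $name placeholder syntax, and the function names are assumptions chosen for this illustration; the real process definitions are XML and may express this differently.

```python
from string import Template

# Hypothetical task definition: a title template plus the fields to collect.
task_def = {
    "title": "Approve $item for $requester",
    "fields": ["approved", "comment"],
}

# Data already collected for this particular case.
datasheet = {"item": "chair", "requester": "pat"}

def render_title(task_def, datasheet):
    """Fill the title template from data collected so far."""
    return Template(task_def["title"]).safe_substitute(datasheet)

def complete_with_form(task_def, datasheet, form_values):
    """Copy the fields named by the task definition into the datasheet."""
    for name in task_def["fields"]:
        datasheet[name] = form_values.get(name, "")
    return datasheet

title = render_title(task_def, datasheet)
complete_with_form(
    task_def,
    datasheet,
    {"approved": "yes", "comment": "standard model only"},
)
```

Since every answer lands in the same datasheet, a status report is just a matter of reading that one record.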
The open source experience
WFTK has been my first experience developing in an open source environment. I was initially quite nervous that my code would be visible to the public. But this was an incentive to write better code, and certainly it caused me to
document my code extremely well. Where else would I be able to explain the
corners I was cutting to get a prototype out?
But the most striking aspect of the open source world is that people keep
e-mailing me out of the blue, giving extremely valuable advice about possible
tools, mistakes I have made, possible extensions, etc. I just received a
Rational Rose implementation of a subset of my engine from a college class team
in India. I have had extensive conversations about workflow design with academics,
an Air Force implementer, and the maintainers of other projects. The public nature
of open source has brought a much higher quality to this
project than to any of my previous work.