In the previous two articles in this series, I explained the rationale and design considerations for an XML/RDF vocabulary to describe open source projects. The Description of a Project (DOAP) vocabulary will meet the needs of project maintainers who find they must register their software at myriad Web sites, and for anyone seeking to exchange such data. Part 1 outlined existing work in this area, and defined the boundaries of the project. Part 2 presented candidate terms for the vocabulary, and mentioned some design concerns.
In this article, I present the first draft of the DOAP vocabulary along with some example descriptions of projects. A lot of this article is example-based: You are encouraged to experiment with and create your own DOAP descriptions as you read.
I'll use the language of RDF schemas to talk about DOAP. Although DOAP will be pretty easy to use as XML, you'll see that it is fundamentally an RDF vocabulary. Be aware of two main concepts in RDF schemas as used in this article: the class and the property. A class is a type of resource in RDF, similar to the way that a class is a type of object in Java programming. A property is a relationship between one resource and either another resource or a literal value. For additional developerWorks articles explaining RDF schemas, see Resources.
Before I start explaining the terms of the DOAP vocabulary, take a look at this simple example DOAP file -- Listing 1 shows a minimal description of the DOAP project itself:
Listing 1. A minimal description of the DOAP project
<Project xmlns="http://usefulinc.com/ns/doap#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <name>DOAP</name>
  <homepage rdf:resource="http://usefulinc.com/doap" />
  <created>2004-05-04</created>
  <shortdesc xml:lang="en">
    Tools and vocabulary for describing community-based software projects.
  </shortdesc>
  <description xml:lang="en">
    DOAP (Description of a Project) is an RDF vocabulary and associated
    set of tools for describing community-based software projects. It is
    intended to be an interchange vocabulary for software directory sites,
    and to allow the decentralized expression of involvement in a project.
  </description>
  <maintainer>
    <foaf:Person>
      <foaf:name>Edd Dumbill</foaf:name>
      <foaf:homepage rdf:resource="http://usefulinc.com/edd" />
    </foaf:Person>
  </maintainer>
</Project>
Here are some general rules about writing DOAP files that you can draw from Listing 1:
- Classes are labeled with capitalized terms, such as "Project" and "Person." This convention in writing RDF vocabularies is a general one that seems to work well. Properties are written in lower case.
- The outer element of a DOAP document is <Project>. The February 2004 RDF syntax specification (see Resources) allows the omission of the <rdf:RDF> container where the description can be written with one outer node.
- The Friend-of-a-Friend (FOAF) vocabulary is used to describe people. I have written several articles on this topic (see Resources).
- The DOAP namespace is http://usefulinc.com/ns/doap#, declared as the default namespace in Listing 1.
- The standard xml:lang attribute denotes the language of textual properties.
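To make the rule about the optional <rdf:RDF> container concrete, here is a minimal Python sketch that reads a cut-down copy of Listing 1 both with and without the wrapper. It treats DOAP as plain XML using only the standard library, which is a simplification; the helper names find_projects and summarize are my own and not part of any DOAP tool.

```python
import xml.etree.ElementTree as ET

DOAP = "http://usefulinc.com/ns/doap#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# A cut-down copy of Listing 1, with <Project> as the outer element.
BARE = f"""
<Project xmlns="{DOAP}" xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <name>DOAP</name>
  <homepage rdf:resource="http://usefulinc.com/doap" />
  <maintainer>
    <foaf:Person>
      <foaf:name>Edd Dumbill</foaf:name>
    </foaf:Person>
  </maintainer>
</Project>
"""

# The same description with the optional rdf:RDF container restored.
WRAPPED = f'<rdf:RDF xmlns:rdf="{RDF}">{BARE}</rdf:RDF>'


def find_projects(root):
    """Return the Project elements whether or not rdf:RDF wraps them."""
    if root.tag == f"{{{DOAP}}}Project":
        return [root]
    return root.findall(f"{{{DOAP}}}Project")


def summarize(project):
    """Pull out the name, homepage URI, and maintainer name (hypothetical helper)."""
    name = project.findtext(f"{{{DOAP}}}name")
    # URI-valued properties carry their value in the rdf:resource attribute.
    homepage = project.find(f"{{{DOAP}}}homepage").get(f"{{{RDF}}}resource")
    maintainer = project.findtext(
        f"{{{DOAP}}}maintainer/{{{FOAF}}}Person/{{{FOAF}}}name"
    )
    return name, homepage, maintainer


for document in (BARE, WRAPPED):
    for project in find_projects(ET.fromstring(document)):
        print(summarize(project))
```

Both documents yield the same summary, because the capitalized class element and the lower-case property elements sit in the same positions relative to each other in either form.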
The DOAP vocabulary currently contains three kinds of classes:
- Project: The main project resource
- Version: An instance of released software
- Repository: A source code repository
In fact, Repository has several subclasses, which I will describe later. Now, I'll examine each of the classes in turn.
As you might expect, a Project is the main class in DOAP. Each Project is uniquely identified by its home page URI (I outlined the reasons for this in the previous article). Additionally, a project description should also list all of its old home page URIs, as they are still valid unique identifiers that others may still use to refer to the project.
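The practical effect of identifying a project by its home page URIs can be sketched in a few lines of Python. This is only an illustration of the idea, assuming descriptions have already been reduced to dictionaries; the function names and the example.org URIs are hypothetical.

```python
def identifying_uris(description):
    """Collect the homepage and old-homepage URIs of one description."""
    uris = set(description.get("old-homepage", []))
    if "homepage" in description:
        uris.add(description["homepage"])
    return uris


def same_project(a, b):
    """Two descriptions refer to one project if they share an identifying URI."""
    return bool(identifying_uris(a) & identifying_uris(b))


# Hypothetical descriptions: the project moved, but lists its old home page.
current = {
    "name": "Example",
    "homepage": "http://example.org/project/",
    "old-homepage": ["http://old.example.net/~dev/project/"],
}
stale = {
    "name": "Example (old listing)",
    "homepage": "http://old.example.net/~dev/project/",
}
unrelated = {"name": "Other", "homepage": "http://example.com/other/"}

print(same_project(current, stale))      # True
print(same_project(current, unrelated))  # False
```

The stale listing still matches because the current description records the old URI, which is why listing past home pages matters.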
Table 1 shows the permissible properties for a Project. A description can contain an unlimited number of instances of each property. It is probably common sense to have only one name for a Project, but this is not a necessity. Additionally, a Project description can have as few properties as the author desires. The only minimum requirement to be useful is the homepage property.
Table 1. Properties of the Project class (an asterisk denotes an identifying property)
|Property|Description|
|---|---|
|name|The main name of the project, by which it is known publicly|
|shortname|A short name of the project, most often used for filenames|
|homepage*|URI of the project's homepage, associated only with this one project|
|old-homepage*|URI of a project's past homepage, associated only with this one project|
|created|Date when the project was created, in YYYY-MM-DD form|
|description|Plain text description of a project; several sentences|
|shortdesc|A short plain text description of a project; eight or nine words|
|category|A URI denoting a category assigned to the project|
|wiki|URI of a Wiki attached to this project|
|bug-database|URI of a bug tracker or e-mail address to report bugs on the project|
|screenshots|URI of a Web page with screenshots of the project|
|mailing-list|URI of a mailing list attached to this project|
|programming-language|Programming language this project is implemented in or intended for use with|
|os|Operating system the project is limited to (omit if the project is not OS-specific)|
|license|URI of a license through which the project software is available|
|download-page|URI of the location where the project software can be downloaded|
|download-mirror|URI of a download mirror site|
Listing 2 shows some properties that can be used to extend Listing 1 when you insert them before the closing </Project> tag:
Listing 2. Some additional properties of the Project class
<mailing-list rdf:resource="http://lists.usefulinc.com/mailman/listinfo/doap-interest" />

<!-- Freshmeat category: Information Management :: Metadata/Semantic Models -->
<category rdf:resource="http://software.freshmeat.net/browse/1020/" />

<!-- OSDIR category: All Platforms :: Information Management (XML) -->
<category rdf:resource="http://osdir.com/Downloads+index-req-viewsdownload-sid-201.phtml" />

<license rdf:resource="http://usefulinc.com/doap/licenses/GPL" />
Listing 2 demonstrates several more principles of the DOAP vocabulary, such as:
- Properties whose values are URIs use the RDF construct rdf:resource to contain the URI.
- DOAP passes the buck on categorization schemes. Of the many categorization schemes, each has different advantages. Specialist communities may well have their own schemes. The approach taken in DOAP is simply to mandate that the category must be a URI. Listing 2 shows the use of two category schemes -- for Freshmeat and OSDIR.com. In each case, the category was derived from the URI of the page for the corresponding category on the Web site. Take care with canonicalization, however, as the Web sites often allow different forms of a URL, all pointing to the same page. DOAP needs to standardize these for the common sites.
- The common software licenses will each have a well-known URI assigned to them by DOAP, as described in Part 2 of this series. However, as maintainer of the DOAP project, I have no desire to own identifiers for licenses and will put in place a mechanism to allow arbitrary URIs. For license URIs that processing software doesn't already know, it should be possible to retrieve a small RDF description of the license to provide software with human-readable license descriptions.
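The canonicalization concern raised for category URIs can be illustrated with a small Python sketch. DOAP has not defined canonical forms, so the rules below (lower-case scheme and host, no default port, no fragment, a non-empty path) are my own assumptions about a reasonable starting point, and the function name canonicalize is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(uri):
    """Normalize the parts of a URI that commonly vary between equivalent forms.

    Assumed rules, not a DOAP standard. Any user:password part is dropped.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment is discarded; the query is kept because it may select the page.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


variants = [
    "http://software.freshmeat.net/browse/1020/",
    "HTTP://Software.Freshmeat.net:80/browse/1020/#top",
]
print({canonicalize(v) for v in variants})
```

Site-specific variations, such as alternative query strings that lead to the same category page, cannot be handled generically and would still need per-site rules.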
As I mentioned earlier, the xml:lang attribute can be used to implement internationalization of a DOAP description. The permissible values of xml:lang are the standard codes for languages as defined in RFC 3066 (see Resources). Figure 1 shows a screenshot of an excerpt from the full DOAP description of DOAP itself (see Resources). I took a screenshot because not all readers will have the right fonts on hand to view the text.
Figure 1. Internationalized description properties
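A consumer of an internationalized description has to choose among the language variants. The following Python sketch shows one way to do that, assuming exact matches on the language code only (RFC 3066 also allows subtags such as en-GB, which this ignores). The French text is my own illustrative translation, not taken from the real DOAP file, and pick_text is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DOAP = "http://usefulinc.com/ns/doap#"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = f"""
<Project xmlns="{DOAP}">
  <shortdesc xml:lang="en">
    Tools and vocabulary for describing community-based software projects.
  </shortdesc>
  <shortdesc xml:lang="fr">
    Outils et vocabulaire pour décrire des projets logiciels communautaires.
  </shortdesc>
</Project>
"""


def pick_text(project, prop, preferred):
    """Return the first variant of a property matching the preferred languages."""
    texts = {}
    for element in project.findall(f"{{{DOAP}}}{prop}"):
        # Collapse the indentation whitespace that surrounds the text.
        texts[element.get(XML_LANG)] = " ".join((element.text or "").split())
    for lang in preferred:
        if lang in texts:
            return texts[lang]
    return None


project = ET.fromstring(DOC)
print(pick_text(project, "shortdesc", ["de", "fr", "en"]))
```

With the preference order de, fr, en, the French variant is returned because no German one exists.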
The DOAP schema defines a Repository class, a general class used to describe source code repositories. In itself this is not very useful, so DOAP has four more concrete subclasses of Repository, for the Subversion, BitKeeper, CVS, and GNU Arch source revision control systems. Table 2 shows each of the subclasses and the properties that are applicable to them:
Table 2. Properties applicable to Repository subclasses
|Property|Description|SVNRepository|BKRepository|CVSRepository|ArchRepository|
|---|---|---|---|---|---|
|anon-root|Path of the root of the anonymously accessible repository| | |*| |
|module|Module name of source code within the repository| | |*|*|
|browse|URL of Web browser interface to the repository|*|*|*| |
|location|Base URL of archive|*|*| |*|
DOAP is restricted to describing public access versions of the repositories, which are read-only. This removes the need to codify access control information for the writeable repositories, thus simplifying DOAP without much penalty as participant developers will have other ways of discovering this information.
To make this clearer, here are some example descriptions for each of these systems. Subversion repositories are simply URLs. For example, the DOAP public repository I set up for this project has the URL http://svn.usefulinc.com/svn/repos/trunk/doap/ and is also publicly browseable (see Resources). Written using DOAP, these details look like this:
<SVNRepository>
  <location rdf:resource="http://svn.usefulinc.com/svn/repos/trunk/doap/" />
  <browse rdf:resource="http://svn.usefulinc.com/cgi-bin/viewcvs.cgi/trunk/doap/" />
</SVNRepository>
DOAP entries for BitKeeper look similar to those for Subversion, as a single URL is enough to identify a repository. Here's a sample description for the Linux 2.6 kernel:
<BKRepository>
  <location rdf:resource="http://linux.bkbits.net/linux-2.6" />
  <browse rdf:resource="http://linux.bkbits.net:8080/linux-2.6" />
</BKRepository>
CVS is probably the most popular source revision control system used in the open source world. Each repository is identified by a root and a module name. For instance, the Epiphany Web browser for GNOME can be checked out using the command cvs -d :pserver:email@example.com:/cvs/gnome co epiphany. A DOAP description for Epiphany's repository looks like:
<CVSRepository>
  <anon-root>:pserver:firstname.lastname@example.org:/cvs/gnome</anon-root>
  <module>epiphany</module>
  <browse rdf:resource="http://cvs.gnome.org/viewcvs/epiphany/" />
</CVSRepository>
The GNU Arch revision control system is currently gaining
popularity. It eschews the idea of a central repository, and works
on the principle that every developer has a repository. However, a
project still needs to designate one repository as the place
where the official released versions of the software are created.
Arch has the concepts of archive location and module name. For instance, you can access a version of the "PlanetPlanet" RSS aggregation system using the archive at http://www.gnome.org/~jdub/arch/, with the module name email@example.com/planet--devel--0.0. The following example shows how to write this in DOAP:
<ArchRepository>
  <location rdf:resource="http://www.gnome.org/~jdub/arch" />
  <module>firstname.lastname@example.org/planet--devel--0.0</module>
</ArchRepository>
To embed the repository location in the DOAP description, it must be the value of the repository property. Look at DOAP's own DOAP file (in Resources) to see how this works.
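As a rough sketch of that nesting, the Python below parses a Project whose repository property holds the Subversion description shown earlier, then checks each repository's properties against the subclass they belong to. The APPLICABLE mapping is my reading of Table 2 and the four examples above, and the helper names are hypothetical; the real DOAP file may differ in detail.

```python
import xml.etree.ElementTree as ET

DOAP = "http://usefulinc.com/ns/doap#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Which properties each Repository subclass uses, as read from the examples.
APPLICABLE = {
    "SVNRepository": {"location", "browse"},
    "BKRepository": {"location", "browse"},
    "CVSRepository": {"anon-root", "module", "browse"},
    "ArchRepository": {"location", "module"},
}

DOC = f"""
<Project xmlns="{DOAP}" xmlns:rdf="{RDF}">
  <name>DOAP</name>
  <homepage rdf:resource="http://usefulinc.com/doap" />
  <repository>
    <SVNRepository>
      <location rdf:resource="http://svn.usefulinc.com/svn/repos/trunk/doap/" />
      <browse rdf:resource="http://svn.usefulinc.com/cgi-bin/viewcvs.cgi/trunk/doap/" />
    </SVNRepository>
  </repository>
</Project>
"""


def local_name(element):
    """Strip the {namespace} prefix that ElementTree puts on tag names."""
    return element.tag.partition("}")[2]


def repositories(project):
    """Yield each node that is the value of a repository property."""
    for prop in project.findall(f"{{{DOAP}}}repository"):
        yield from prop


def unexpected_properties(repo):
    """List properties that the mapping above does not expect on this subclass."""
    allowed = APPLICABLE.get(local_name(repo), set())
    return [local_name(child) for child in repo if local_name(child) not in allowed]


project = ET.fromstring(DOC)
for repo in repositories(project):
    print(local_name(repo), unexpected_properties(repo))
```

The alternation is the usual RDF/XML pattern: a class element (Project), then a property element (repository), then the class element that is its value (SVNRepository).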
Although, as stated in the first article in this series, the tracking of each project release is not part of the first phase of DOAP, you still need to describe current releases of software projects. The Version class represents an instance of a software release. Table 3 shows its properties.
Table 3. Properties of the Version class
|Property|Description|
|---|---|
|branch|A string indicating the branch of this version, such as stable, unstable, gnome24, or gnome26|
|name|A release name, such as Panther|
|created|Date of release in YYYY-MM-DD form|
|revision|Revision number of the release, such as 1.0|
An example version description for Mac OS X 10.3 might look like the following:
<Version>
  <branch>stable</branch>
  <name>Panther</name>
  <revision>10.3</revision>
  <created>2003-10-24</created>
</Version>
Each project may well have more than one current release, hence
the need for the
branch property. For example, it is not
uncommon for projects to maintain a stable branch, while also
releasing an unstable branch for testing new features.
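The role of the branch property can be shown with a short Python sketch that picks the current release on each branch. It assumes Version descriptions have already been reduced to dictionaries and that "current" means the most recent created date; the release data and the function name current_releases are invented for illustration.

```python
from datetime import date

# Hypothetical releases of one project on two branches.
releases = [
    {"branch": "stable", "revision": "1.0", "created": "2004-03-01"},
    {"branch": "stable", "revision": "1.1", "created": "2004-06-15"},
    {"branch": "unstable", "revision": "1.3", "created": "2004-07-02"},
]


def current_releases(versions):
    """Return the most recently created Version for each branch."""
    latest = {}
    for version in versions:
        # The YYYY-MM-DD form used by DOAP parses directly as an ISO date.
        created = date.fromisoformat(version["created"])
        best = latest.get(version["branch"])
        if best is None or created > date.fromisoformat(best["created"]):
            latest[version["branch"]] = version
    return latest


for branch, version in sorted(current_releases(releases).items()):
    print(branch, version["revision"])
# stable 1.1
# unstable 1.3
```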
The formal definition of classes and properties in the DOAP vocabulary can be found in the DOAP schema (see Resources). It is written as an RDF schema, borrowing one term from the OWL ontology language to denote the identifying properties (in OWL-speak, inverse functional properties). Tools that are RDF-schema- or OWL-enabled can use the schema to assist authoring or interpretation of DOAP descriptions.
One nice feature of RDF schemas is that if you use them properly, you can transform them into a good source of documentation. Morten Frederiksen has created an online service for doing this. From the Resources section, you can view a fully hyperlinked reference to all of the DOAP terms. Figure 2 shows an excerpt:
Figure 2. DOAP schema transformed with Frederiksen's schema viewer
As the examples in this article show, you can also process DOAP as straight XML. I do not encourage this approach to DOAP. RDF is best processed as RDF, and you will find no shortage of tools for doing this. However, one large advantage to DOAP having a reasonably regular XML syntax is the ability to create an XSLT stylesheet to transform a DOAP description into an easy-to-read chunk of HTML.
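The kind of transformation an XSLT stylesheet would perform can be sketched in Python instead, which keeps the example self-contained. This is a stand-in for the stylesheet idea rather than a recommended way to process DOAP, and it handles only three properties; the function name to_html and the HTML layout are my own choices.

```python
import xml.etree.ElementTree as ET
from html import escape

DOAP = "http://usefulinc.com/ns/doap#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = f"""
<Project xmlns="{DOAP}" xmlns:rdf="{RDF}">
  <name>DOAP</name>
  <homepage rdf:resource="http://usefulinc.com/doap" />
  <shortdesc xml:lang="en">
    Tools and vocabulary for describing community-based software projects.
  </shortdesc>
</Project>
"""


def to_html(project):
    """Render a few Project properties as a small HTML fragment."""
    name = project.findtext(f"{{{DOAP}}}name", default="(unnamed project)")
    homepage = project.find(f"{{{DOAP}}}homepage")
    shortdesc = project.findtext(f"{{{DOAP}}}shortdesc", default="")
    lines = [f"<h1>{escape(name)}</h1>"]
    if homepage is not None:
        url = escape(homepage.get(f"{{{RDF}}}resource", ""), quote=True)
        lines.append(f'<p><a href="{url}">{url}</a></p>')
    if shortdesc.strip():
        # Collapse indentation whitespace before escaping the text.
        lines.append(f"<p>{escape(' '.join(shortdesc.split()))}</p>")
    return "\n".join(lines)


print(to_html(ET.fromstring(DOC)))
```

A regular XML syntax is what makes this simple: each property sits at a predictable place under <Project>, so a template can address it directly.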
The crucial next step in the DOAP project is to establish a suite of tools for the creation and consumption of the vocabulary. If DOAP is to live up to its promise as an interchange vocabulary for software directories, then it needs some real-world deployment.
Secondly, it is plain that DOAP needs to be extended to cover software releases. Making release announcements is a major burden on software projects, and some way of automating this process would help maintainers.
- Review the previous articles in this series: Part 1 introduces the DOAP project, while Part 2 presents candidate terms for the vocabulary and identifies several design concerns.
- Read "Basic XML and RDF techniques for knowledge management, Part 4" (developerWorks, February 2002), in which author Uche Ogbuji explains RDF schemas.
- Get the latest in the RDF/XML Syntax Specification (Revised), published in February 2004.
- Check out the author's description of the FOAF vocabulary in "Finding friends with XML and RDF" (developerWorks, June 2002) and "Support online communities with FOAF" (developerWorks, August 2002).
- Review RFC 3066, which defines the language codes used with the xml:lang attribute.
- Use Morten Frederiksen's RDF Schema viewer to produce a pleasingly human-readable version of the DOAP schema.
- Find hundreds more XML resources on the developerWorks XML technology zone. Read previous installments in the XML Watch column series.
- Browse for books on these and other technical topics.
- Learn how you can become an IBM Certified Developer in XML and related technologies.