XML Watch: Describe open source projects with XML, Part 3

A first draft of the DOAP vocabulary

Edd Dumbill (edd@xml.com), Editor and publisher, xmlhack.com
Edd Dumbill is managing editor of XML.com and program chair of the XML Europe conference. He is co-author of the forthcoming O'Reilly book Mono: A Developer's Notebook. You can contact him at edd@xml.com.

Summary:  In this installment of XML Watch, Edd Dumbill continues the development of a vocabulary for describing open source software projects, presenting a schema for the new vocabulary and example project descriptions.

Date:  11 Jun 2004
Level:  Intermediate

In the previous two articles in this series, I explained the rationale and design considerations for an XML/RDF vocabulary to describe open source projects. The Description of a Project (DOAP) vocabulary is meant to serve project maintainers who find they must register their software at myriad Web sites, as well as anyone seeking to exchange such data. Part 1 outlined existing work in this area and defined the boundaries of the project. Part 2 presented candidate terms for the vocabulary and raised some design concerns.

In this article, I present the first draft of the DOAP vocabulary along with some example descriptions of projects. A lot of this article is example-based: You are encouraged to experiment with and create your own DOAP descriptions as you read.

Overview

I'll use the language of RDF schemas to talk about DOAP. Although DOAP will be pretty easy to use as XML, you'll see that it is fundamentally an RDF vocabulary. Be aware of two main concepts in RDF schemas as used in this article: the class and the property. A class is a type of resource in RDF, similar to the way that a class is a type of object in Java programming. A property is a relationship between one resource and either another resource or a literal value. For additional developerWorks articles explaining RDF schemas, see Resources.

Before I start explaining the terms of the DOAP vocabulary, take a look at this simple example DOAP file -- Listing 1 shows a minimal description of the DOAP project itself:


Listing 1. A minimal description of the DOAP project
<Project xmlns="http://usefulinc.com/ns/doap#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">

        <name>DOAP</name>
        <homepage rdf:resource="http://usefulinc.com/doap" />
        <created>2004-05-04</created>
        <shortdesc xml:lang="en">
            Tools and vocabulary for describing community-based
            software projects.
        </shortdesc>
        <description xml:lang="en">
            DOAP (Description of a Project) is an RDF vocabulary and
            associated set of tools for describing community-based software
            projects.  It is intended to be an interchange vocabulary for
            software directory sites, and to allow the decentralized
            expression of involvement in a project.
        </description>
        <maintainer>
            <foaf:Person>
                <foaf:name>Edd Dumbill</foaf:name>
                <foaf:homepage rdf:resource="http://usefulinc.com/edd" />
            </foaf:Person>
        </maintainer>
</Project>

Here are some general rules about writing DOAP files that you can draw from Listing 1:

  • Classes are labeled with capitalized terms, such as "Project" and "Person". This is a general convention in RDF vocabularies, and it seems to work well. Properties are written in lower case.
  • The outer element of a DOAP document is <Project>. The February 2004 RDF syntax specification (see Resources) allows the omission of the <rdf:RDF> container where the description can be written with one outer node.
  • The Friend-of-a-Friend (FOAF) vocabulary is used to describe people. I have written several articles on this topic (see Resources).
  • The DOAP namespace is http://usefulinc.com/ns/doap#.
  • The standard xml:lang attribute denotes the language of textual properties.
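For comparison, here is a sketch of the same kind of description written with the explicit <rdf:RDF> container that the syntax specification allows you to omit. It is a shortened form of Listing 1 (only name and homepage are kept for brevity), and an RDF parser should read it as describing the same project resource:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://usefulinc.com/ns/doap#">
  <!-- With the container present, Project is no longer the document element -->
  <Project>
    <name>DOAP</name>
    <homepage rdf:resource="http://usefulinc.com/doap" />
  </Project>
</rdf:RDF>
```

The container form becomes necessary once a file holds more than one top-level description; for a single project, the shorter form in Listing 1 is enough.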

The DOAP vocabulary currently contains three main classes:

  • Project: The main project resource
  • Version: An instance of released software
  • Repository: A source code repository

In fact, Repository has several subclasses, which I will describe later. Now, I'll examine each of the classes in turn.

The Project class

As you might expect, Project is the main class in DOAP. Each Project is uniquely identified by its home page URI (I outlined the reasons for this in the previous article). A project description should also list all of its old home page URIs, because they remain valid unique identifiers that others may still use to refer to the project.
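As a rough illustration of the identification rule, the following sketch describes a project that has moved home pages. The project name and both example.org URIs are hypothetical and chosen only for this example; the homepage and old-homepage properties are the identifying properties of the Project class:

```xml
<Project xmlns="http://usefulinc.com/ns/doap#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <name>Example Project</name>
  <!-- Current home page: the primary identifier of the project -->
  <homepage rdf:resource="http://example.org/project/" />
  <!-- Former home page: still identifies the same project -->
  <old-homepage rdf:resource="http://example.org/~someone/project/" />
</Project>
```

A consumer that only knows the project by its former URI can still match it to this description.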

Table 1 shows the permissible properties for a Project. A description can contain an unlimited number of instances of each property. It is probably common sense to have only one name for a Project, but this is not a necessity. Additionally, a Project description can have as few properties as the author desires. The only minimum requirement to be useful is the homepage property.


Table 1. Properties of the Project class (an asterisk denotes an identifying property)
Property | Description
name | The main name of the project, by which it is known publicly
shortname | A short name of the project, most often used for filenames
homepage* | URI of the project's homepage, associated only with this one project
old-homepage* | URI of a project's past homepage, associated only with this one project
created | Date when the project was created, in YYYY-MM-DD form
description | Plain text description of a project; several sentences
shortdesc | A short plain text description of a project; eight or nine words
category | A URI denoting a category assigned to the project
wiki | URI of a Wiki attached to this project
bug-database | URI of a bug tracker or e-mail address to report bugs on the project
screenshots | URI of a Web page with screenshots of the project
mailing-list | URI of a mailing list attached to this project
programming-language | Programming language this project is implemented in or intended for use with
os | Operating system the project is limited to (omit if the project is not OS-specific)
license | URI of a license through which the project software is available
download-page | URI of the location where the project software can be downloaded
download-mirror | URI of a download mirror site
repository | A doap:Repository describing a source code repository for the project
release | A doap:Version describing a current release of the project's software
maintainer | A foaf:Person describing the project maintainer or leader
developer | A foaf:Person describing a developer on the project
documenter | A foaf:Person describing a contributor of documentation to the project
translator | A foaf:Person describing a contributor of translations to the project
helper | A foaf:Person describing a contributor to the project not otherwise described by the other properties

Listing 2 shows some properties that can be used to extend Listing 1 when you insert them before the </Project> tag:


Listing 2. Some additional properties of the Project class
 <mailing-list 
  rdf:resource="http://lists.usefulinc.com/mailman/listinfo/doap-interest" />

 <!-- Freshmeat category:
     Information Management :: Metadata/Semantic Models -->
 <category rdf:resource="http://software.freshmeat.net/browse/1020/" />
 <!-- OSDIR category:
     All Platforms :: Information Management (XML) -->
 <category 
  rdf:resource="http://osdir.com/Downloads+index-req-viewsdownload-sid-201.phtml" />

 <license rdf:resource="http://usefulinc.com/doap/licenses/GPL" />

Listing 2 demonstrates several more principles of the DOAP vocabulary, such as:

  • Properties whose values are URIs use the RDF construct rdf:resource to contain the URI.
  • DOAP passes the buck on categorization schemes. Many categorization schemes exist, each with different advantages, and specialist communities may well have their own. The approach taken in DOAP is simply to mandate that the category must be a URI. Listing 2 shows the use of two category schemes, one for Freshmeat and one for OSDIR.com. In each case, the category was derived from the URI of the page for the corresponding category on the Web site. Take care with canonicalization, however, as the Web sites often allow different forms of a URL, all pointing to the same page. DOAP needs to standardize these for the common sites.
  • The common software licenses will each have a well-known URI assigned to them by DOAP, as described in Part 2 of this series. However, as maintainer of the DOAP project, I have no desire to own identifiers for licenses and will put in place a mechanism to allow arbitrary URIs. For license URIs that processing software doesn't already know, it should be possible to retrieve a small RDF description of the license to provide software with human-readable license descriptions.

As I mentioned earlier, the xml:lang attribute can be used to implement internationalization of a DOAP description. The permissible values of xml:lang are the standard codes for languages as defined in RFC 3066 (see Resources). Figure 1 shows a screenshot of an excerpt from the full DOAP description of DOAP itself (see Resources). I took a screenshot because not all readers will have the right fonts on hand to view the text.


Figure 1. Internationalized description properties
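In markup, internationalization amounts to repeating a textual property once per language and varying only the xml:lang value. The following is a minimal sketch, not an excerpt from the real DOAP file: the English text comes from Listing 1, while the French text is an illustrative translation written for this example:

```xml
<Project xmlns="http://usefulinc.com/ns/doap#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <name>DOAP</name>
  <homepage rdf:resource="http://usefulinc.com/doap" />
  <!-- One shortdesc per language, distinguished only by xml:lang -->
  <shortdesc xml:lang="en">
    Tools and vocabulary for describing community-based
    software projects.
  </shortdesc>
  <!-- Illustrative French translation, not taken from the real DOAP file -->
  <shortdesc xml:lang="fr">
    Outils et vocabulaire pour décrire des projets logiciels
    communautaires.
  </shortdesc>
</Project>
```

Because a description can contain any number of instances of each property, the repeated shortdesc elements need no special handling in the vocabulary.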

The Repository classes

The DOAP schema defines a Repository class, a general class used to describe source code repositories. In itself this is not very useful, so DOAP has four more concrete subclasses of Repository, for the Subversion, BitKeeper, CVS, and GNU Arch source revision control systems. Table 2 shows each of the subclasses and the properties that are applicable to them:


Table 2. Properties applicable to Repository subclasses (an asterisk marks an applicable property, a hyphen one that does not apply)
Property | Description | SVNRepository | BKRepository | CVSRepository | ArchRepository
anon-root | Path of the root of the anonymously accessible repository | - | - | * | -
module | Module name of source code within the repository | - | - | * | *
browse | URL of Web browser interface to the repository | * | * | * | -
location | Base URL of archive | * | * | - | *

DOAP is restricted to describing the public access versions of repositories, which are read-only. This removes the need to codify access control information for the writeable repositories. The simplification costs little, because participating developers will have other ways of discovering that information.

To make this clearer, here are some example descriptions for each of these systems. Subversion repositories are simply URLs. For example, the DOAP public repository I set up for this project has the URL http://svn.usefulinc.com/svn/repos/trunk/doap/. It's also publicly browseable (see Resources). Written using DOAP, these details look like this:

<SVNRepository>
 <location rdf:resource="http://svn.usefulinc.com/svn/repos/trunk/doap/" />
 <browse rdf:resource="http://svn.usefulinc.com/cgi-bin/viewcvs.cgi/trunk/doap/" />
</SVNRepository>

DOAP entries for BitKeeper look similar to those for Subversion, as a single URL is enough to identify a repository. Here's a sample description for the Linux 2.6 kernel:

<BKRepository>
  <location rdf:resource="http://linux.bkbits.net/linux-2.6" />
  <browse rdf:resource="http://linux.bkbits.net:8080/linux-2.6" />
</BKRepository>

CVS is probably the most popular source revision control system used in the open source world. Each repository is identified by a root and a module name. For instance, the Epiphany Web browser for GNOME can be checked out using the command cvs -d :pserver:anonymous@anoncvs.gnome.org:/cvs/gnome co epiphany. A DOAP description for Epiphany's repository looks like this:

<CVSRepository>
  <anon-root>:pserver:anonymous@anoncvs.gnome.org:/cvs/gnome</anon-root>
  <module>epiphany</module>
  <browse rdf:resource="http://cvs.gnome.org/viewcvs/epiphany/" />
</CVSRepository>

The GNU Arch revision control system is currently gaining popularity. It eschews the idea of a central repository, and works on the principle that every developer has a repository. However, a project still needs to designate one repository as the place where the official released versions of the software are created. Arch has the concepts of archive location and module name. For instance, you can access a version of the "PlanetPlanet" RSS aggregation system using the archive at http://www.gnome.org/~jdub/arch/, with the module name jdub@perkypants.org--projects/planet--devel--0.0. The following example shows how to write these details in DOAP:

<ArchRepository>
  <location rdf:resource="http://www.gnome.org/~jdub/arch" />
  <module>jdub@perkypants.org--projects/planet--devel--0.0</module>
</ArchRepository>

To embed the repository location in the DOAP description, it must be the value of the repository property. Look at DOAP's own DOAP file (in Resources) to see how this works.
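As a sketch of that embedding, the Subversion description shown earlier can be wrapped in a repository property inside the project description. The name and homepage values are reused from Listing 1; the real DOAP file may contain more detail than this reduced version:

```xml
<Project xmlns="http://usefulinc.com/ns/doap#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <name>DOAP</name>
  <homepage rdf:resource="http://usefulinc.com/doap" />
  <!-- The repository property takes a Repository subclass as its value -->
  <repository>
    <SVNRepository>
      <location rdf:resource="http://svn.usefulinc.com/svn/repos/trunk/doap/" />
      <browse rdf:resource="http://svn.usefulinc.com/cgi-bin/viewcvs.cgi/trunk/doap/" />
    </SVNRepository>
  </repository>
</Project>
```

The nesting alternates between property and class (repository, then SVNRepository, then location), the same pattern that maintainer and foaf:Person follow in Listing 1.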

The Version class

Although, as stated in the first article in this series, the tracking of each project release is not part of the first phase of DOAP, you still need to describe current releases of software projects. The Version class represents an instance of a software release. Table 3 shows its properties.


Table 3. Properties of the Version class
Property | Description
branch | A string indicating the branch of this version, such as stable, unstable, gnome24, or gnome26
name | A release name, such as Panther
created | Date of release in YYYY-MM-DD form
revision | Revision number of the release, such as 1.0

An example version description for Mac OS X 10.3 might look like the following:

<Version>
  <branch>stable</branch>
  <name>Panther</name>
  <revision>10.3</revision>
  <created>2003-10-24</created>
</Version>

Each project may well have more than one current release, hence the need for the branch property. For example, it is not uncommon for projects to maintain a stable branch, while also releasing an unstable branch for testing new features.
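To attach releases to a project, each Version is given as the value of a release property (see Table 1). The sketch below shows a project with one stable and one unstable current release. The project name, example.org URI, revision numbers, and dates are all hypothetical values invented for illustration:

```xml
<Project xmlns="http://usefulinc.com/ns/doap#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <name>Example Project</name>
  <homepage rdf:resource="http://example.org/project/" />
  <!-- Current release on the stable branch -->
  <release>
    <Version>
      <branch>stable</branch>
      <revision>1.2</revision>
      <created>2004-03-01</created>
    </Version>
  </release>
  <!-- Current release on the unstable branch, for testing new features -->
  <release>
    <Version>
      <branch>unstable</branch>
      <revision>1.3.1</revision>
      <created>2004-05-20</created>
    </Version>
  </release>
</Project>
```

The branch value is what lets a consumer tell the two current releases apart.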


The DOAP schema

The formal definition of classes and properties in the DOAP vocabulary can be found in the DOAP schema (see Resources). It is written as an RDF schema, borrowing one term from the OWL ontology language to denote the identifying properties (in OWL-speak, inverse functional properties). Tools that are RDF-schema- or OWL-enabled can use the schema to assist authoring or interpretation of DOAP descriptions.

One nice feature of RDF schemas is that if you use them properly, you can transform them into a good source of documentation. Morten Frederiksen has created an online service for doing this. From the Resources section, you can view a fully hyperlinked reference to all of the DOAP terms. Figure 2 shows an excerpt:


Figure 2. DOAP schema transformed with Frederiksen's schema viewer

As the examples in this article show, you can also process DOAP as straight XML. I do not encourage this approach to DOAP. RDF is best processed as RDF, and you will find no shortage of tools for doing this. However, one large advantage to DOAP having a reasonably regular XML syntax is the ability to create an XSLT stylesheet to transform a DOAP description into an easy-to-read chunk of HTML.
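A minimal sketch of such a stylesheet follows, written in XSLT 1.0. It assumes the input uses the same XML shape as Listing 1 (nested elements, URIs in rdf:resource attributes); other valid RDF/XML serializations of the same data would not match these templates, which is exactly the weakness of treating DOAP as plain XML. The HTML structure and the class name are arbitrary choices for this example:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doap="http://usefulinc.com/ns/doap#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    exclude-result-prefixes="doap rdf foaf">

  <xsl:output method="html" indent="yes"/>

  <!-- Render one project as a block of HTML -->
  <xsl:template match="doap:Project">
    <div class="doap-project">
      <h1><xsl:value-of select="doap:name"/></h1>
      <!-- Pick the English short description and trim its whitespace -->
      <p><xsl:value-of select="normalize-space(doap:shortdesc[lang('en')])"/></p>
      <p>
        <a href="{doap:homepage/@rdf:resource}">Home page</a>
      </p>
      <ul>
        <!-- Assumes each maintainer is written as a nested foaf:Person -->
        <xsl:for-each select="doap:maintainer/foaf:Person">
          <li>Maintainer: <xsl:value-of select="foaf:name"/></li>
        </xsl:for-each>
      </ul>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

Run against Listing 1 with any XSLT 1.0 processor, this should produce a heading, the short description, a home page link, and a one-item maintainer list.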


Next steps

The crucial next step in the DOAP project is to establish a suite of tools for the creation and consumption of the vocabulary. If DOAP is to live up to its promise as an interchange vocabulary for software directories, then it needs some real-world deployment.

Secondly, it is plain that DOAP needs to be extended to cover software releases. Making release announcements is a major burden on software projects, and some way of automating this process would help maintainers.


Resources
