XML Watch: Describe open source projects with XML, Part 4

Launching the DOAP vocabulary

Edd Dumbill (edd@xml.com), Editor and publisher, xmlhack.com
Edd Dumbill is managing editor of XML.com and program chair of the XML Europe conference. You can contact him at edd@xml.com.

Summary:  In this installment of XML Watch, Edd Dumbill concludes the development of a vocabulary for describing open source software projects, exploring the documentation, tools, and community that are required for the successful launch of the DOAP vocabulary. The steps taken are drawn from the author's experience with both open source projects and vocabularies such as FOAF and RSS.


Date:  28 Jul 2004
Level:  Intermediate
Also available in:   Japanese


In the previous three articles in this series, I covered the development of an XML/RDF vocabulary, Description of a Project (DOAP), for describing open source projects and related resources. With DOAP, software maintainers will no longer have to register their programs at multiple Web sites. Instead, they can simply give the URL of the DOAP description. As more applications become DOAP-aware, new possibilities open up for participation in and administration of open source projects.

To reach these goals, it's important to do more than merely create the vocabulary. In this concluding article, I look at what's needed to get adoption for DOAP, in terms of documentation, tools, and community.

Viewing DOAP

To refresh your memory, Listing 1 shows a minimal DOAP description of the DOAP project itself.


Listing 1. Simple DOAP description of the DOAP project itself
<Project xmlns="http://usefulinc.com/ns/doap#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">

    <name>DOAP</name>
    <homepage rdf:resource="http://usefulinc.com/doap" />
    <created>2004-05-04</created>
    <shortdesc xml:lang="en">
        Tools and vocabulary for describing community-based
        software projects.
    </shortdesc>
    <description xml:lang="en">
        DOAP (Description of a Project) is an RDF vocabulary and
        associated set of tools for describing community-based software
        projects.  It is intended to be an interchange vocabulary for
        software directory sites, and to allow the decentralized
        expression of involvement in a project.
    </description>
    <maintainer>
        <foaf:Person>
            <foaf:name>Edd Dumbill</foaf:name>
            <foaf:homepage rdf:resource="http://usefulinc.com/edd" />
        </foaf:Person>
    </maintainer>
</Project>

The first and most basic requirement for users of DOAP is a user-friendly view of this data. Until other editing mechanisms exist, authors can edit a file by hand and then use the viewer to check it. This is the time-honored (and admittedly problematic) method by which many Web pages have been created. The quickest way to create a viewer would probably be to write an XSLT stylesheet that transforms the incoming RDF/XML into HTML.

However, DOAP is an RDF vocabulary rather than just an XML one, which makes XSLT a less obvious choice. The same RDF data can be written in RDF/XML in several ways, and a stylesheet that matches one form will miss the others. The viewer will also serve more than one purpose. It not only displays DOAP data but is also the first example of a DOAP processing application, so many implementors coming to DOAP for the first time will take it as a model. Therefore, it's helpful if the DOAP viewer is an RDF-aware application.

To create a viewer, I used the Redland RDF Application library to read the DOAP file into an in-memory RDF store and extract the data from the store into XML, which I then formatted with XSLT. This intermediate XML representation can either be transformed into HTML on a server-side app, or can be used to drive a graphical user interface for a client-side DOAP viewer. Figure 1 shows the transformed HTML output from the viewer.


Figure 1. Transformed DOAP output

For the implementation, I used Redland's Mono/.NET bindings (see Resources). The code snippet in Listing 2 loops over every project in a DOAP file (normally there's only one) and outputs its data inside a <project> element in an XML document. The expressions rdf ["type"] and doap ["Project"] are shortcuts for resource nodes whose URIs correspond to those terms in the RDF and DOAP schemas.


Listing 2. Redland C# snippet to extract projects from a DOAP file
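// Each subject typed as doap:Project becomes one <project> element.
foreach (Node proj in model.GetSources (rdf ["type"], doap ["Project"])) {
    w.WriteStartElement (null, "project", null);

    // Copy the project's doap:name literal, if it has one, into <name>.
    Node name = model.GetTarget (proj, doap ["name"]);
    if (name != null) {
        w.WriteStartElement ("name");
        w.WriteString (name.Literal);
        w.WriteEndElement ();
    }
    ...
    w.WriteEndElement (); // closes <project>
}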
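The same pattern can be sketched in Python for readers who don't have Redland at hand. This is a minimal illustration, not the viewer's code, and it makes several assumptions:

- A plain list of triples stands in for Redland's in-memory store, and the RDF/XML parsing step is assumed to have already happened.
- The helpers `get_sources` and `get_target` are hypothetical stand-ins named after the calls in Listing 2.
- The wrapping `<projects>` element is an assumption.
- The toy store does not distinguish literals from URIs.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP = "http://usefulinc.com/ns/doap#"

# Toy "store": (subject, predicate, object) tuples, assumed already parsed.
triples = [
    ("_:p1", RDF + "type", DOAP + "Project"),
    ("_:p1", DOAP + "name", "DOAP"),
    ("_:p1", DOAP + "homepage", "http://usefulinc.com/doap"),
]

def get_sources(store, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the store."""
    return [s for (s, p, o) in store if p == predicate and o == obj]

def get_target(store, subject, predicate):
    """The first object o such that (subject, predicate, o) is in the store, or None."""
    for (s, p, o) in store:
        if s == subject and p == predicate:
            return o
    return None

def projects_to_xml(store):
    """Project every doap:Project in the store into an intermediate XML tree."""
    root = ET.Element("projects")  # hypothetical wrapper element
    for proj in get_sources(store, RDF + "type", DOAP + "Project"):
        project = ET.SubElement(root, "project")
        name = get_target(store, proj, DOAP + "name")
        if name is not None:
            ET.SubElement(project, "name").text = name
    return root

print(ET.tostring(projects_to_xml(triples), encoding="unicode"))
# prints <projects><project><name>DOAP</name></project></projects>
```

The code queries the RDF model rather than the XML syntax. That is the point of making the viewer RDF-aware: any serialization that yields the same triples yields the same intermediate XML.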

Another interesting option for writing a DOAP viewer is to use the RDF rendering support inside the Mozilla Web browser and create a XUL-based DOAP viewer. This could lead to an intriguing scenario for extensions in a browser such as Firefox.


Validating DOAP

The ability to validate DOAP files is an essential part of processing them. Validation is useful in both the creation and consumption of DOAP. For creators, whether they're writing DOAP by hand or creating tools to output it, a validator shows compliance with the specification. For consumers of DOAP, validation is required so that software doesn't end up processing junk.

At its most basic, validation simply reports "yes" or "no" to the question of whether an input file meets the specification. More helpful validators also report errors and make recommendations. Perhaps the best known of these is the W3C HTML Validator (see Resources), which returns all manner of helpful information for improving your Web pages' compliance with W3C specifications.

Because DOAP is RDF/XML, it can be validated at a variety of levels:

  • XML: DOAP must be well-formed XML.
  • RDF: DOAP must be valid RDF.
  • Semantic: The document must contain enough DOAP terms for it to make sense; for instance, a DOAP file would be pretty useless without a name, description, and homepage property.

Traditionally, pure XML validation techniques extend only to syntactic validation, that is, the presence or absence of elements and attributes and their ordering. While this catches a lot of errors, it cannot detect content that is syntactically valid yet nonsensical, where some data processing is needed to spot the problem. However, certain semantic constraints can be expressed as syntactic ones. For instance, DOAP requires a "homepage" property, so an XML schema can catch files that omit it.

The problem with using XML schemas to validate RDF is that RDF has a very flexible XML syntax, with more than one way of writing the same thing. Indeed, W3C XML Schema cannot describe all the legal RDF/XML forms of a document. It's easier to run the document through an RDF parser, such as Redland's Raptor, to check that it is valid RDF. Once you know that an incoming DOAP file is valid RDF, you can apply a sequence of semantic tests to determine the quality of the rest of the file.
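The syntax problem is easy to demonstrate. RDF/XML lets a literal-valued property be written either as a child element or as an attribute on the node element, and both documents below encode the same triple. The sketch uses plain Python `xml.etree`, and its helper names are hypothetical. A lookup that matches only the element form misses the attribute form. Covering every such variant by hand is exactly the work an RDF parser already does.

```python
import xml.etree.ElementTree as ET

DOAP = "http://usefulinc.com/ns/doap#"
NAME = f"{{{DOAP}}}name"  # ElementTree's "{namespace}local" form of doap:name

# Two RDF/XML spellings of the same triple: (this project, doap:name, "DOAP").
PROPERTY_ELEMENT = """<doap:Project xmlns:doap="http://usefulinc.com/ns/doap#">
  <doap:name>DOAP</doap:name>
</doap:Project>"""

PROPERTY_ATTRIBUTE = """<doap:Project xmlns:doap="http://usefulinc.com/ns/doap#"
    doap:name="DOAP"/>"""

def element_only_name(root):
    """Finds doap:name only as a child element, as an element-based
    schema or XSLT template would."""
    el = root.find(NAME)
    return el.text.strip() if el is not None and el.text else None

def name_either_form(root):
    """Also accepts the property-attribute form. A real RDF/XML parser
    handles many more variants than these two."""
    value = root.get(NAME)
    return value if value is not None else element_only_name(root)

for label, doc in [("element", PROPERTY_ELEMENT), ("attribute", PROPERTY_ATTRIBUTE)]:
    root = ET.fromstring(doc)
    print(label, element_only_name(root), name_either_form(root))
# element DOAP DOAP
# attribute None DOAP
```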

This strategy works fine as long as RDF processing tools are available to everyone, or as long as you're willing to provide a free Web service to validate DOAP. However, you can give up some of RDF's syntactic flexibility in return for wider reach. An XML schema can handle roughly the first 70% of the validation process and use syntactic methods to verify a certain amount of semantic integrity. This, together with a bit of common sense, may be enough for a lot of people.

Working along those lines, I constructed a RELAX NG schema for a restricted RDF/XML syntax version of DOAP. Note that the schema does not validate all DOAP files, but what it does validate is guaranteed to be processable by an application as DOAP. Listing 3 shows the schema in RELAX NG Compact notation.


Listing 3. RELAX NG schema for restricted XML profile of DOAP
default namespace = "http://usefulinc.com/ns/doap#"
namespace foaf = "http://xmlns.com/foaf/0.1/"
namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
namespace rdfs = "http://www.w3.org/2000/01/rdf-schema#"

grammar {

rdf-resource = attribute rdf:resource { text }
xml-lang = attribute xml:lang { text }
literal = xml-lang?, text

Person-element = element foaf:Person {
            element foaf:name { literal }
            & (
                element foaf:homepage { rdf-resource } |
                element foaf:mbox { rdf-resource } |
                element foaf:mbox_sha1sum { text }
            )+&
            element rdfs:seeAlso { rdf-resource}*
        }

cvs-repository = element CVSRepository {
            element anon-root { text },
            element module { text },
            element browse { rdf-resource}?
        }

svn-repository = element SVNRepository {
            element location { rdf-resource },
            element browse { rdf-resource}?
        }

bk-repository = element BKRepository {
            element location { rdf-resource },
            element module { text },
            element browse { rdf-resource}?
        }

arch-repository = element ArchRepository {
            element location { rdf-resource },
            element module { text },
            element browse { rdf-resource}?
        }

start = element Project
{
    element name { literal }&
    element homepage { rdf-resource }&
    element old-homepage { rdf-resource }*&
    element created { text }&
    element shortdesc { literal }+&
    element description { literal }+&
    element mailing-list { rdf-resource }*&

    element maintainer { Person-element }+&
    element developer { Person-element }*&
    element documenter { Person-element }*&
    element translator { Person-element }*&
    element tester { Person-element }*&
    element helper { Person-element }*&

    element category { rdf-resource }*&
    element release {
        element Version {
            element name { text },
            element created { text },
            element revision { text }
        }
    }*&
    element license { rdf-resource }*&
    element download-page { rdf-resource }?&
    element download-mirror { rdf-resource }*&
    element repository { ( cvs-repository | svn-repository
        | bk-repository | arch-repository ) }*&
    element bug-database { rdf-resource }?&
    element screenshots { rdf-resource }*&
    element wiki { rdf-resource }*&

    element programming-language { text }*&
    element os { text }*
}
}
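The schema's occurrence markers carry some of DOAP's semantic requirements, so a few of them can be approximated without a RELAX NG processor. The sketch below is only that approximation. It counts some direct children of `Project` against the markers in Listing 3 and ignores interleaving, content models, and the `foaf:Person` pattern, so a real RELAX NG validator is still needed for the actual schema. The table and function names are illustrative.

```python
import xml.etree.ElementTree as ET

DOAP = "http://usefulinc.com/ns/doap#"

# (local name, minimum, maximum) for some children of <Project>, read off
# Listing 3: a bare "element x" means exactly one, "+" one or more,
# "?" at most one, "*" any number. None means no upper bound.
CARDINALITY = [
    ("name", 1, 1),
    ("homepage", 1, 1),
    ("created", 1, 1),
    ("shortdesc", 1, None),
    ("description", 1, None),
    ("maintainer", 1, None),
    ("download-page", 0, 1),
    ("bug-database", 0, 1),
]

def cardinality_errors(xml_text):
    """Count direct DOAP children of the root and report counts outside the bounds."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{DOAP}}}Project":
        return [f"root element is {root.tag}, expected DOAP Project"]
    errors = []
    for local, low, high in CARDINALITY:
        count = len(root.findall(f"{{{DOAP}}}{local}"))
        if count < low:
            errors.append(f"{local}: found {count}, need at least {low}")
        if high is not None and count > high:
            errors.append(f"{local}: found {count}, allowed at most {high}")
    return errors

# Two names and no homepage (created, description, maintainer also missing).
sample = """<Project xmlns="http://usefulinc.com/ns/doap#">
  <name>DOAP</name><name>DOAP again</name>
  <shortdesc>Describes projects.</shortdesc>
</Project>"""
for problem in cardinality_errors(sample):
    print(problem)
```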

One very handy spin-off of constructing the schema is that XML editing tools can now be used to write valid DOAP files. Figure 2 shows James Clark's nxml mode for the Emacs text editor being used to edit the DOAP description for DOAP itself. The nxml mode uses RELAX NG Compact schemas. Note the underlining of the validity error. At the hand-authoring level, the tradeoff between losing some expressivity of syntax and getting editing tool support is worth it.


Figure 2. Emacs with nxml mode editing a DOAP file

Validating against the schema assures both XML well-formedness and a certain amount of semantic integrity. That's not enough, however: the remaining validation steps require RDF processing. Such steps might include checking whether DOAP tools recognize the license URIs given, spell-checking the text where appropriate, and so on.
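One such check, recognizing license URIs, might look like the following sketch. It rests on two explicit assumptions. First, it reads the restricted XML profile rather than a full RDF model. Second, `KNOWN_LICENSES` is a hypothetical table whose example.org URIs are placeholders, not real DOAP license identifiers.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

DOAP = "http://usefulinc.com/ns/doap#"
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

# HYPOTHETICAL: the set of license URIs some DOAP tool recognizes.
# These example.org URIs are placeholders, not real DOAP identifiers.
KNOWN_LICENSES = {
    "http://example.org/licenses/gpl",
    "http://example.org/licenses/bsd",
}

def license_warnings(xml_text):
    """Warn about doap:license values that are missing, not absolute
    http(s) URIs, or absent from KNOWN_LICENSES."""
    root = ET.fromstring(xml_text)
    warnings = []
    for lic in root.findall(f"{{{DOAP}}}license"):
        uri = lic.get(RDF_RESOURCE)
        if uri is None:
            warnings.append("license element without rdf:resource")
        elif urlparse(uri).scheme not in ("http", "https"):
            warnings.append(f"license {uri!r} is not an absolute http(s) URI")
        elif uri not in KNOWN_LICENSES:
            warnings.append(f"license {uri!r} is not recognized")
    return warnings

sample = """<Project xmlns="http://usefulinc.com/ns/doap#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <license rdf:resource="http://example.org/licenses/gpl"/>
  <license rdf:resource="http://example.org/licenses/mystery"/>
  <license rdf:resource="gpl"/>
</Project>"""
for warning in license_warnings(sample):
    print(warning)
```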

I have begun work on a validator that works along these lines, giving XML output in a way that's similar to the viewer application. This way, the validator code can be used both for Web and client-side GUI applications. The validator performs four levels of checks:

  • XML well-formedness
  • Validity against the RELAX NG Schema (which can be optionally disabled)
  • RDF validity
  • A series of DOAP-specific semantic checks

Listing 4 shows the output of two example runs of the validator. One run catches an XML well-formedness error and the other a syntactic validation error. The intention is that the test name in the <Test> element can serve as an index into more detailed explanations and recommendations.


Listing 4. Validator output
<Problems>
<Problem>
<Test>ParseXML</Test>
<Title>Not well-formed XML</Title>
<Description>DOAP files must follow the rules of XML syntax.</Description>
<Detail>unmatched closing element: expected name but found Project Line
27, position 10.</Detail>
</Problem>
</Problems>

<Problems>
<Problem>
<Test>Doap.Tests.Xml.RelaxValidate</Test>
<Title>XML syntax validation</Title>
<Description>This test checks to see that the DOAP file validates against
a restricted XML syntax, which guarantees its validity.</Description>
<Detail>Invalid start tag found. LocalName = Person, NS = 
http://xmlns.com/foaf/0.1/.  line 26, column 3</Detail>
</Problem>
</Problems>
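The first stage of such a validator, together with the report format of Listing 4, can be sketched in Python. This is not the DOAP validator's code. The Test, Title, and Description strings are copied from Listing 4, but the function names are hypothetical, and the RELAX NG, RDF, and semantic stages are omitted. The Detail text also comes from Python's own XML parser, so it will not match the .NET message above.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Stage 1: return a list of problems (empty if xml_text parses as XML)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [{
            "Test": "ParseXML",
            "Title": "Not well-formed XML",
            "Description": "DOAP files must follow the rules of XML syntax.",
            "Detail": str(err),  # the parser's own message and position
        }]
    return []

def problems_to_xml(problems):
    """Serialize problems in the <Problems>/<Problem> layout of Listing 4."""
    root = ET.Element("Problems")
    for problem in problems:
        item = ET.SubElement(root, "Problem")
        for field in ("Test", "Title", "Description", "Detail"):
            ET.SubElement(item, field).text = problem[field]
    return ET.tostring(root, encoding="unicode")

# An unmatched closing tag, as in the first run of Listing 4.
print(problems_to_xml(check_well_formed("<Project><name>DOAP</Project>")))
```

Each stage returns a list rather than raising, so a fuller pipeline could concatenate the lists from every stage into one report.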

You can obtain the code for the validator from the DOAP home page (see Resources), and a public-access instance of the validator will be set up on the Web. It's important for potential adopters of DOAP to be able to check their output at an early stage. Additionally, I will recommend that all DOAP processing applications validate their input first, at the very least with the RELAX NG schema. This sets minimum expectations for the quality of DOAP content on the Web. A brief look at the mess caused by lax parsing of RSS feeds on the Web should be enough to convince you that some level of strictness is helpful. To encourage this, the validator and viewer code will be released under a permissive open source license so that it can be integrated into other programs without worry.


Creating DOAP

How will DOAP files be created in the first place? The very first adopters will write them by hand in a text editor. The combination of the validator and the XML schema will help ensure that such files do not contain errors. As mentioned above, DOAP files that conform to the schema lose some RDF expressivity, but for most people this loss will not matter.

Most people who create DOAP files will have no interest in the DOAP vocabulary itself. They will merely want the resulting functionality: registering their projects with software registries. They are likely to take an example file and tweak it to include their own data. Because of this, it's critical to provide enough instructional material, good examples, and validation tools to keep bad examples from proliferating. Once a bad cut-and-paste gets out into the wild, there's little stopping it.

The cut-and-paste phase of DOAP creation should be kept to a minimum. You can do this by quickly introducing some level of tool support for DOAP file creation. Most software developers already use some kind of packaging and configuration system that contains at least some of the metadata required to generate a DOAP file for their software. Such systems include GNU Autoconf for C programming, Python distutils, Perl's MakeMaker files, and so on. If an easy DOAP generation solution is available for a developer's system of choice, it maximizes the chances of getting the DOAP file right.
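As an illustration of generating DOAP from existing packaging metadata, the sketch below maps a dictionary onto the structure of Listing 1. The dictionary's keys are named after Python distutils `setup()` arguments: `name`, `url`, `description`, `long_description`, `author`, and `author_email`. The mapping itself is an assumption, not an established convention. Because `created` has no distutils equivalent, it is passed separately. The output uses a `doap:` prefix where Listing 1 uses a default namespace, which RDF/XML treats identically.

```python
import xml.etree.ElementTree as ET

DOAP = "http://usefulinc.com/ns/doap#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Prefixes ElementTree uses when serializing the qualified names below.
ET.register_namespace("doap", DOAP)
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)

def q(ns, local):
    """ElementTree's "{namespace}local" name."""
    return f"{{{ns}}}{local}"

def doap_from_metadata(meta, created):
    """meta: dict keyed by distutils setup() argument names (assumed mapping)."""
    project = ET.Element(q(DOAP, "Project"))
    ET.SubElement(project, q(DOAP, "name")).text = meta["name"]
    ET.SubElement(project, q(DOAP, "homepage"), {q(RDF, "resource"): meta["url"]})
    ET.SubElement(project, q(DOAP, "created")).text = created
    ET.SubElement(project, q(DOAP, "shortdesc"), {XML_LANG: "en"}).text = meta["description"]
    # Listing 3 requires a description; fall back to the short one if needed.
    long_text = meta.get("long_description", meta["description"])
    ET.SubElement(project, q(DOAP, "description"), {XML_LANG: "en"}).text = long_text
    maintainer = ET.SubElement(project, q(DOAP, "maintainer"))
    person = ET.SubElement(maintainer, q(FOAF, "Person"))
    ET.SubElement(person, q(FOAF, "name")).text = meta["author"]
    # foaf:mbox takes a mailto: URI rather than a bare address.
    ET.SubElement(person, q(FOAF, "mbox"),
                  {q(RDF, "resource"): "mailto:" + meta["author_email"]})
    return ET.tostring(project, encoding="unicode")

print(doap_from_metadata({
    "name": "example",  # hypothetical project
    "url": "http://example.org/example/",
    "description": "An example project.",
    "author": "A. Developer",
    "author_email": "dev@example.org",
}, created="2004-07-28"))
```

The generated file includes every element that Listing 3 marks as required, so it should pass the schema check as well as the validator's later stages.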

In addition, you can use various guided methods of DOAP file creation:

  • DOAP-a-matic: Leigh Dodds' FOAF-a-matic is a JavaScript-enabled Web page that makes it easy to create FOAF (Friend-of-a-friend) files. A DOAP equivalent would be useful, so that all a developer has to do is answer a few questions.
  • Freshmeat-to-DOAP conversion: The Freshmeat software registry is probably the largest around, and it provides an XML export of its content. It would be a relatively easy matter to write a program to generate DOAP descriptions from Freshmeat content.
  • GUI application: A graphical application could be integrated into an IDE such as Eclipse or MonoDevelop, with the metadata necessary for DOAP stored with the build information for a project. The DOAP file would then be part of the resulting build.

However DOAP is created, the two important goals are that it should be as easy as possible for the developer to use, and the output it creates should be as rich and as high quality as possible. The ubiquity and the quality of the available data are both key to DOAP's prospects.


Community

Even with tools in place, if there's no community gathered around the DOAP project, then it is unlikely to last very long. When introducing a new technology, communication is paramount. It is important that the aims of the project are clearly expressed, as are the rules of engagement. The most basic step in communication is to construct a Web site that will hold all the relevant documentation and point to resources that those interested in the project can use.

Figure 3 shows a screenshot of the DOAP home page. Note two important features: clear navigation and news. A static Web site may often give the impression of a stagnant technology, and having regularly updated news meets the dual need of informing people and making the project seem alive. Using an RSS feed for the project news is also a must these days.


Figure 3. Screenshot of the DOAP home page

Although it's not necessary to have all conceivable questions answered from the get-go, the project Web site needs to be responsive in adding documentation and tutorial material as quickly as possible. How-tos and FAQs will often be the first place new adopters look.

However, communication isn't all one-way. There needs to be a forum in which those interested in using and adopting DOAP can meet each other and ask questions. The mailing list is historically the most successful medium for doing this with open source projects, and indeed DOAP has one of those too (see Resources). As the community grows, third parties that use DOAP can announce their own products and achievements there, spurring adoption and further growth.

Finally, the project must be seeded by promoting it to those who are likely to be interested. In addition to presenting on DOAP at XML conferences, I will be promoting it in various mailing lists and to key people in the open source world.


Conclusion

Although this article is the end of a four-part series on the creation of DOAP, it also marks the beginning of the project's lifetime in the public eye. I believe that DOAP hits the right compromise between accessing the power of semantic Web technologies and remaining "Webby" enough to gain large-scale support. I will revisit the project in this column in some months' time to see just how these decisions have worked in the medium-to-long term.


Resources

