I concluded my most recent column by posing a series of questions raised by my use of the FOAF application. In the course of this column, I suggest some answers to those questions concerning accountability and privacy. Although my suggestions are certainly not the only solutions, I highlight techniques developed and proven through testing a FOAF software agent in online communities.
In my earlier FOAF article, I touched on the possibilities that exist when FOAF descriptions are aggregated, and showed the value that can be obtained by gathering a community's knowledge in one place. However, I left undiscussed the subject of how this information might be discovered.
The simplest way is to have participants register the URL of their FOAF file with a central service, which then does the aggregation. This is unfortunately rather self-defeating. One great advantage of the Web's infrastructure is its ability to enable the decentralization of publication. The moment you use a central resource for connecting FOAF files together, you may as well abandon the Web and just use a single relational database.
Instead, I follow the Web's model of linking. The particular mechanism used in FOAF is the RDF Schema seeAlso property (rdfs:seeAlso), which can hint at the existence of related machine-readable information. Listing 1 shows a FOAF description of me that contains a link to an acquaintance of mine.
Listing 1. Using rdfs:seeAlso to link FOAF descriptions
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Edd Dumbill</foaf:name>
    <foaf:mbox rdf:resource="mailto:firstname.lastname@example.org" />
    <foaf:knows>
      <foaf:Person>
        <foaf:mbox rdf:resource="mailto:email@example.com" />
        <rdfs:seeAlso rdf:resource="http://example.org/jdoe/foaf.rdf" />
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
Software agents that spider FOAF files from the Web can thus follow links from document to document, accumulating more information about a community.
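As a rough sketch of that link-following step, the Python below extracts rdfs:seeAlso targets from an RDF/XML document and walks them breadth-first. The names see_also_links and crawl, the fetch callable, and the sample document are illustrative assumptions; a real agent would use a full RDF parser, since RDF/XML permits equivalent serializations that this simple element walk does not handle, and it would also resolve relative URLs.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical sample document, modeled on Listing 1.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Edd Dumbill</foaf:name>
    <foaf:knows>
      <foaf:Person>
        <rdfs:seeAlso rdf:resource="http://example.org/jdoe/foaf.rdf" />
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def see_also_links(rdf_xml):
    """Return the URLs named by rdfs:seeAlso elements, in document order."""
    root = ET.fromstring(rdf_xml)
    links = []
    for element in root.iter("{%s}seeAlso" % RDFS_NS):
        # Short form: <rdfs:seeAlso rdf:resource="..." />
        target = element.get("{%s}resource" % RDF_NS)
        if target is None:
            # Long form: a nested node identified by rdf:about
            for child in element:
                target = child.get("{%s}about" % RDF_NS)
                if target is not None:
                    break
        if target is not None:
            links.append(target)
    return links


def crawl(start_url, fetch, limit=50):
    """Visit documents breadth-first; fetch(url) must return RDF/XML text."""
    seen = []
    queue = [start_url]
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.append(url)
        for link in see_also_links(fetch(url)):
            if link not in seen:
                queue.append(link)
    return seen
```

Passing fetch in as a parameter keeps the sketch free of network code; the limit argument is a crude guard against spidering without end.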
The rdfs:seeAlso link also serves as a useful way of separating FOAF descriptions into multiple files. Listing 2 shows an example of this.
Listing 2. Using rdfs:seeAlso to help manage large descriptions
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Edd Dumbill</foaf:name>
    <foaf:mbox rdf:resource="mailto:firstname.lastname@example.org" />
    <!-- personal details here -->
    <rdfs:seeAlso rdf:resource="http://example.org/edd/personal.rdf" />
    <!-- photo album metadata here -->
    <rdfs:seeAlso rdf:resource="http://example.org/edd/photos.rdf" />
    <!-- descriptions of friends here -->
    <rdfs:seeAlso rdf:resource="http://example.org/edd/friends.rdf" />
  </foaf:Person>
</rdf:RDF>
With rdfs:seeAlso links, you do run the risk that a software agent could spend a long time spidering descriptions that are of no interest to it; for instance, an agent that's designed to discover my publication list would have no interest in the stacks of metadata I have about my holiday photographs. Fortunately, you can use RDF to provide more information about the linked resource. Listing 3 shows how to denote a publication list.
Listing 3. Adding more information to rdfs:seeAlso links
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:bib="http://example.org/biblio/0.1/">
  <foaf:Person>
    <foaf:name>Edd Dumbill</foaf:name>
    <foaf:mbox rdf:resource="mailto:email@example.com" />
    <!-- publication list -->
    <rdfs:seeAlso>
      <foaf:BiblioRefs rdf:about="http://example.org/edd/biblio.rdf" />
    </rdfs:seeAlso>
  </foaf:Person>
</rdf:RDF>
When you use this mechanism, software agents can identify the specific areas of data that interest them.
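One way an agent might act on that extra information is sketched below in Python: it reads the type of each node nested inside rdfs:seeAlso and keeps only the links whose type it cares about. The function names and the embedded sample (modeled on Listing 3) are assumptions for illustration, and the element walk stands in for a proper RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Sample modeled on Listing 3; the type name is taken from that listing.
LISTING_3 = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Edd Dumbill</foaf:name>
    <rdfs:seeAlso>
      <foaf:BiblioRefs rdf:about="http://example.org/edd/biblio.rdf" />
    </rdfs:seeAlso>
    <rdfs:seeAlso rdf:resource="http://example.org/edd/photos.rdf" />
  </foaf:Person>
</rdf:RDF>"""


def typed_see_also(rdf_xml):
    """Return (url, type_uri) pairs for seeAlso links that carry a typed node."""
    root = ET.fromstring(rdf_xml)
    found = []
    for link in root.iter("{%s}seeAlso" % RDFS_NS):
        for node in link:
            url = node.get("{%s}about" % RDF_NS)
            if url is None:
                continue
            # ElementTree spells a qualified name as "{namespace}local".
            namespace, _, local = node.tag[1:].partition("}")
            found.append((url, namespace + local))
    return found


def links_of_type(rdf_xml, wanted_type):
    """Return only the linked URLs whose declared type matches wanted_type."""
    return [url for url, type_uri in typed_see_also(rdf_xml)
            if type_uri == wanted_type]
```

Untyped links (the rdf:resource short form) are skipped here, so an agent that also wants to follow them would combine this with a plain link extractor.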
Every existing community is based on some degree of shared trust. If that trust disappears or is abused, the community suffers, and may indeed collapse. Online communities demonstrate that although one may freely assume a fictional persona, the general rules of inter-personal relationships remain firmly in place; for example, if you start a pernicious rumor, the community suffers.
If FOAF is to support an online community, it is essential that the provenance of every piece of information be traceable. If I claim my name is "Edd Dumbill," and that claim is attributed to me, it might prove a more trustworthy fact than if somebody else asserted it. (In other circumstances, you might not consider me a trustworthy source of information about myself. Being able to trace who said what is a necessary condition for building a system of trust, but this is not generally sufficient by itself.)
As all the FOAF files in the examples are retrieved through the Web, it makes sense to annotate all the statements made in a file with that file's source URL. Once you have that, you can then annotate the URL with any other useful information that you find out about it (for example, its author).
To a large extent, the actual implementation of provenance tracking depends on the underlying toolkit that you use to process RDF. In my experiments, I've used the Redland toolkit (see Resources), which doesn't have native provenance tracking capabilities (although they are under development). I followed a suggestion of developerWorks RDF expert Uche Ogbuji for layering provenance tracking on top of Redland's RDF store. (If you're not interested in the implementation details, you can skip to Attributing ownership.)
In this mechanism, I rewrite incoming RDF statements to add their source. For instance, the FOAF file at http://example.org/edd/foaf.rdf might generate the statement shown in Table 1.
Table 1. Example statement
If I use the invented properties, http://example.org/prov/pred and http://example.org/prov/source, to denote the original predicate and source URLs respectively, I can rewrite this statement as shown in Table 2.
Table 2. Statement rewritten with provenance tracking
You can use this mechanism to track every single statement to a source. Note also that this radically increases the storage and computation overheads associated with aggregating all the data: Every single statement turns into three statements, and every query on a predicate must go through a layer of indirection. For this reason, native toolkit support for provenance tracking is preferable.
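The rewriting can be sketched in a few lines of Python. This is one plausible reading of the scheme, not the Redland-based code itself: each incoming statement gets a freshly generated predicate URI, and two further statements record the original predicate and the source URL, which gives three stored statements per original. The ProvenanceStore class, the generated URI pattern, and the sample subject are assumptions; only the two prov property URIs come from the text.

```python
import itertools

PROV_PRED = "http://example.org/prov/pred"
PROV_SOURCE = "http://example.org/prov/source"


class ProvenanceStore:
    """Toy in-memory triple store that rewrites statements to carry a source."""

    def __init__(self):
        self.statements = set()
        self._ids = itertools.count(1)

    def add(self, subject, predicate, obj, source_url):
        # Hypothetical URI scheme for the generated linking predicate.
        link = "http://example.org/prov/stmt#%d" % next(self._ids)
        self.statements.add((subject, link, obj))
        self.statements.add((link, PROV_PRED, predicate))
        self.statements.add((link, PROV_SOURCE, source_url))

    def find(self, subject, predicate):
        """Return sorted (object, source_url) pairs for subject and predicate."""
        results = []
        for s, link, o in self.statements:
            if s != subject:
                continue
            # The layer of indirection: look up what the link stands for.
            if (link, PROV_PRED, predicate) not in self.statements:
                continue
            for l, p, source in self.statements:
                if l == link and p == PROV_SOURCE:
                    results.append((o, source))
        return sorted(results)
```

Even in this toy form the cost is visible: find has to consult the store again for every candidate statement, which is the indirection that native toolkit support would avoid.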
Even with source tracking in place, I still can't answer the "who said what" question properly. For that, I need to associate an author with each document. Using RDF and the Dublin Core vocabulary, I can do this by adding the code from Listing 4 into my file at http://example.org/edd/foaf.rdf. Note that rdf:about="" means "this document."
Listing 4. Using Dublin Core to express authorship of a document
<rdf:Description rdf:about="">
  <dc:creator>
    <foaf:Person>
      <foaf:mbox rdf:resource="mailto:firstname.lastname@example.org" />
    </foaf:Person>
  </dc:creator>
</rdf:Description>
The difficulty with this method is that it guarantees nothing. Since RDF is a decentralized description framework, another person could make a conflicting statement of authorship about http://example.org/edd/foaf.rdf that would be just as credible. One solution might be to believe only those authorship claims that are contained in the document in question. This is limiting as it requires the alteration of a document about which the claim is being made -- often undesirable, and if the document format isn't RDF, impossible.
Instead, FOAF tools use digital signatures to associate an e-mail address with a document. Specifically, OpenPGP is used, along with a new namespace, http://xmlns.com/wot/0.1/, to denote web of trust concepts. Listing 5 shows how a digital signature can be associated with a document. In particular, it illustrates the association of a signature with the document itself in the first description, and then as part of an rdfs:seeAlso link that specifies a signature for the linked document. Thus, authorship information can be expressed inside or outside of the documents concerned.
Listing 5. Associating digital signatures with FOAF documents
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:wot="http://xmlns.com/wot/0.1/">
  <rdf:Description rdf:about="">
    <wot:assurance rdf:resource="foaf.rdf.asc" />
  </rdf:Description>
  <foaf:Person>
    <foaf:name>Edd Dumbill</foaf:name>
    <foaf:mbox rdf:resource="mailto:email@example.com" />
    <!-- personal, PGP signed, details here -->
    <rdfs:seeAlso>
      <rdf:Description rdf:about="http://example.org/edd/personal.rdf">
        <wot:assurance rdf:resource="http://example.org/edd/personal.rdf.asc" />
      </rdf:Description>
    </rdfs:seeAlso>
  </foaf:Person>
</rdf:RDF>
The actual meaning of the digital signature should be understood as: "a person who claims to have such and such e-mail address certified this file, and it hasn't changed since then." In my experiments I have assumed that certification carries the semantics of the signer endorsing the statements in the contained FOAF file. If the signature in foaf.rdf.asc as referenced in Listing 5 checks out okay, a processing tool can implicitly add the sort of description contained in Listing 4.
The question of whether a digital signature really belongs to a particular person is conveniently handed off to the mechanisms that PGP itself uses for establishing trust. Discussing PGP's web of trust is outside of the scope of this article, but you can find some references in Resources.
At this point, let's see the discussed mechanisms in action. While investigating FOAF, I built a software agent, FOAFBot, which aggregates and answers queries about FOAF information. The code is now publicly available (see Resources).
Listing 6 shows a sample session that includes the provenance tracking.
Listing 6. Example FOAFBot session
<edd> firstname.lastname@example.org's name
<foafbot> email@example.com's name is 'Edd Dumbill', according to Dave Beckett, Edd Dumbill, Jo Walsh, Kip Hampton, Matt Biddulph, Dan Brickley; and anonymous sources Anon47
<edd> edd's email
<foafbot> edd's email is '[mailto:firstname.lastname@example.org]', according to Dave Beckett, Edd Dumbill, Kip Hampton, Matt Biddulph; and anonymous sources Anon14, Anon2, Anon43, Anon47, Anon92, Anon93
<foafbot> edd's email is '[mailto:email@example.com]', according to Edd Dumbill; and anonymous sources Anon21
<foafbot> edd's email is '[mailto:firstname.lastname@example.org]', according to Edd Dumbill, Jo Walsh, Dan Brickley
In the first query, you see a list naming all the people who have made the statement that my name is "Edd Dumbill." You also see that one source hasn't digitally signed its information, and is therefore denoted as Anon47. The second query shows multiple statements made about my e-mail address, and who made them.
The matter of whom to trust is thus left to the user's own discretion. A great deal of research on automating trust ratings exists elsewhere; FOAFBot doesn't attempt to solve this problem.
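The grouping behind answers like those in Listing 6 can be approximated as follows. This is not FOAFBot's code, only a sketch of the idea under stated assumptions: claims arrive as (value, source URL) pairs from the provenance layer, and signers is a hypothetical mapping from source URL to the name of the person whose signature on that document has been verified. Sources without a verified signature are reported separately, leaving the trust decision to the reader.

```python
from collections import defaultdict


def who_said(claims, signers):
    """Group claimed values by who asserted them.

    claims  -- iterable of (value, source_url) pairs
    signers -- dict mapping source_url to a verified signer's name
    Returns {value: (sorted signer names, sorted unsigned source URLs)}.
    """
    named = defaultdict(set)
    anonymous = defaultdict(set)
    for value, source in claims:
        if source in signers:
            named[value].add(signers[source])
        else:
            anonymous[value].add(source)
    values = set(named) | set(anonymous)
    return {value: (sorted(named[value]), sorted(anonymous[value]))
            for value in values}
```

A front end could then print each value followed by its named sources and a count or label for the unsigned ones.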
The Web is a very public place, and for many good reasons, you wouldn't want to make information available to everybody. For instance, I'm happy to make my cellphone number available to my friends, but not keen that telemarketers should get hold of it. Many people with home pages solve this problem by giving out passwords to their friends. In many cases, however, people opt not to publish the information at all. Google and other Web spiders are incredibly effective at finding things that you forgot are publicly accessible!
The same principles apply to FOAF. Although I'm happy for everybody's FOAF spider to pick up on my name and homepage -- very public information -- I'd like to restrict certain information to particular communities. I can achieve this by extending the use of OpenPGP and rdfs:seeAlso.
With OpenPGP, I can encrypt content using a public key. The content can only be decrypted by an agent with the corresponding private key. As an example, I created a public key for the FOAFBot I'm running on the #foaf discussion channel on the OpenProjects IRC network, and decided to make only certain information available to members of that channel. I put that information in the foaf-private.rdf file, signed and encrypted it to the relevant public key (generating the foaf-private.rdf.asc file), and then linked to it from my FOAF file as shown in Listing 7. Only the ASCII-encoded encrypted file is published; the unencrypted source of the private data remains on my private machine.
Listing 7. Fragment showing linking of an encrypted FOAF file
<!-- private info for authorized agents only -->
<rdfs:seeAlso>
  <foaf:Document rdf:about="http://example.org/edd/foaf-private.rdf.asc">
    <!-- encrypted for the #foaf community -->
    <wot:encryptedTo>
      <wot:PubKey wot:hex_id="6C7F734E" />
    </wot:encryptedTo>
  </foaf:Document>
</rdfs:seeAlso>
Along with the rdfs:seeAlso usage, further concepts from the web of trust namespace are used in Listing 7 to denote the public key used. Spidering agents can thus tell when a FOAF file isn't encrypted for them, and can ignore that file.
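A spidering agent's check might look like the Python sketch below: before fetching a linked file, it compares any wot:encryptedTo key IDs against the keys it holds. The function name readable_links and the wrapper document around the Listing 7 fragment are assumptions for illustration; decryption itself (for example with GnuPG) is out of scope here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
WOT_NS = "http://xmlns.com/wot/0.1/"

# The Listing 7 fragment, wrapped in a hypothetical enclosing document.
FRAGMENT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:wot="http://xmlns.com/wot/0.1/">
  <foaf:Person>
    <rdfs:seeAlso>
      <foaf:Document rdf:about="http://example.org/edd/foaf-private.rdf.asc">
        <wot:encryptedTo>
          <wot:PubKey wot:hex_id="6C7F734E" />
        </wot:encryptedTo>
      </foaf:Document>
    </rdfs:seeAlso>
  </foaf:Person>
</rdf:RDF>"""


def readable_links(rdf_xml, my_key_ids):
    """Return linked URLs that are unencrypted or encrypted to one of our keys."""
    root = ET.fromstring(rdf_xml)
    usable = []
    for link in root.iter("{%s}seeAlso" % RDFS_NS):
        for node in link:
            url = node.get("{%s}about" % RDF_NS)
            if url is None:
                continue
            key_ids = [key.get("{%s}hex_id" % WOT_NS)
                       for key in node.iter("{%s}PubKey" % WOT_NS)]
            # No encryptedTo information means the file is assumed readable.
            if not key_ids or any(k in my_key_ids for k in key_ids):
                usable.append(url)
    return usable
```

An agent holding key 6C7F734E would fetch and decrypt the private file; any other agent would skip it without wasting a request.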
When you publish anything with your e-mail address in it, one major concern is unsolicited commercial e-mail, or spam. While increasingly effective measures for detecting and junking spam are available, many people attempt to avoid receiving spam by not publishing their e-mail addresses on the Web.
This would appear to present a difficulty for the FOAF vocabulary. If you recall from the previous article on this topic, the e-mail address is assumed to be an unambiguous property -- used to merge descriptions from diverse sources. Without it, FOAF aggregation just doesn't work. Fortunately, FOAF's creators have come up with an alternative, the foaf:mbox_sha1sum property. This property contains the ASCII-encoded SHA1 hash of a mailbox URI; this encoding is a one-way mapping and cannot be trivially reverse-engineered to give the original e-mail address. Listing 8 shows how you can use this instead of the foaf:mbox property in a FOAF file.
Listing 8. Using mbox_sha1sum to protect e-mail addresses from harvesting
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Edd Dumbill</foaf:name>
    <foaf:mbox_sha1sum>fd991c34194a4428e70cdcc8068deaa6f35464fe</foaf:mbox_sha1sum>
  </foaf:Person>
</rdf:RDF>
For merging FOAF descriptions to continue to work, processing agents must instead merge on the foaf:mbox_sha1sum property, and dynamically compute sums for cases where they only know the foaf:mbox property. In your descriptions of other people, it is probably courteous for you to reference their foaf:mbox_sha1sum property instead of their e-mail address. If one of those people wishes to make her e-mail address available, she can reference it in her own FOAF files, and perhaps encrypt it as described above.
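Computing the sum and merging on it takes only the standard library. In the Python sketch below, the hash is taken over the full mailbox URI, including the mailto: prefix, as the property's description above implies. Representing descriptions as plain dictionaries, and the names merge_key and merge, are simplifying assumptions; the e-mail address in the usage note is a placeholder.

```python
import hashlib


def mbox_sha1sum(mbox_uri):
    """Return the hex-encoded SHA1 hash of a mailbox URI such as mailto:..."""
    return hashlib.sha1(mbox_uri.encode("utf-8")).hexdigest()


def merge_key(description):
    """Return the sum to merge on, computing it when only mbox is known."""
    if "mbox_sha1sum" in description:
        return description["mbox_sha1sum"].lower()
    if "mbox" in description:
        return mbox_sha1sum(description["mbox"])
    return None


def merge(descriptions):
    """Combine descriptions that share a mailbox sum into one record each."""
    people = {}
    for description in descriptions:
        key = merge_key(description)
        if key is None:
            continue  # nothing unambiguous to merge on
        person = people.setdefault(key, {})
        for prop, value in description.items():
            if prop in ("mbox", "mbox_sha1sum"):
                continue  # the key itself already identifies the person
            person.setdefault(prop, set()).add(value)
    return people
```

For example, a description carrying mbox "mailto:jdoe@example.org" and another carrying only the matching mbox_sha1sum end up in the same record, without the second publisher ever revealing the address.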
The FOAF vocabulary provides a useful means of managing information within communities. Information about other people is often the most interesting sort of data, and FOAF fulfills the need for decentralized, machine-readable, personal descriptions. But beyond its own application sphere, FOAF provides a useful testbed for exploring concepts on which the semantic Web is being built -- notions of linking, trust, and provenance.
- Finding friends with XML and RDF, the author's previous column covering FOAF (developerWorks, June 2002). You can access all of Edd Dumbill's XML Watch columns.
- Explore FOAFBot, an experimental IRC community support agent.
- Find helpful FOAF hints and tips on the author's FOAF page.
- Take a look at the FOAF home page, part of the RDFWeb site.
- Try The GNU Privacy Guard, a popular and free implementation of OpenPGP.
- Create your own FOAF file quickly with Leigh Dodds' FOAF-a-matic.
- Check out Dan Brickley's Ruby program Scutter, which follows rdfs:seeAlso links and aggregates FOAF and RDF content.
- Take a closer look at the Redland toolkit.
Read fellow IBM developerWorks columnist Uche Ogbuji's other articles on RDF:
- Introduction to RDF (developerWorks, December 2000)
- Thinking XML: Basic XML and RDF techniques for knowledge management, Part 1: Generate RDF using XSLT (developerWorks, July 2001)
- Thinking XML: Basic XML and RDF techniques for knowledge management, Part 2: Combining files into an RDF model, and basic RDF querying (developerWorks, September 2001)
- Thinking XML: Basic XML and RDF techniques for knowledge management, Part 3: Knowledge from semantics (developerWorks, November 2001)
- Thinking XML: Basic XML and RDF techniques for knowledge management, Part 4: Issue tracker schema (developerWorks, February 2002)
- Thinking XML: Basic XML and RDF techniques for knowledge management, Part 5: Defining RDF and DAML+OIL schemata (developerWorks, March 2002)
- Find more XML resources on the developerWorks XML technology zone.
- Find out how you can become an IBM Certified Developer in XML and related technologies.
Edd Dumbill is managing editor of XML.com and the editor and publisher of the XML developer news site XMLhack. He is co-author of O'Reilly's Programming Web Services with XML-RPC, and co-founder and adviser to the Pharmalicensing life sciences intellectual property exchange. Edd is also program chair of the XML Europe conference. You can contact Edd at email@example.com.