In an earlier article, I used the heading "Repeat after me: There is no syntax". RDF's traditional XML syntax is often maligned, but luckily it is not what makes RDF tick, and the emergence of alternative serializations has always been inevitable. One problem with XML as a serialization syntax is that it is so flexible that it can be difficult to compare desired versus actual results in the process of automated testing. Whether in regression testing or conformance testing, it is often useful to try to normalize XML to some form so that simple text comparisons give meaningful results. The XML community developed XML canonical form for such purposes, and the W3C RDF working group required the same sort of form for RDF while it was developing RDF conformance test suites.
One option would have been to define a canonical form of RDF/XML for any given graph, and then to canonicalize the resulting XML according to the relevant W3C recommendation. Instead, the RDF working group developed a simple and strictly defined textual format for RDF graphs, and I think that was the right course. This format is named N-Triples, and is incorporated into the RDF Test Cases working draft (see Resources). In this article I introduce N-Triples, using examples converted from RDF/XML. You should be familiar with XML and RDF.
I'll start with a simple example of N-Triples. Listing 1 is RDF/XML taken from my earlier article on PRISM.
Listing 1. Thinking XML column 12 described formally in RDF/XML (basic PRISM vocabulary)
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xml:lang="en"
>
  <rdf:Description rdf:about="http://www.ibm.com/developerworks/xml/library/x-think12.html">
    <dc:description>
      A discussion of the broader context and relevance of XML/RDF techniques.
    </dc:description>
    <dc:title>
      Basic XML and RDF techniques for knowledge management, Part 7
    </dc:title>
    <dc:publisher>IBM developerWorks</dc:publisher>
    <dc:creator>Uche Ogbuji</dc:creator>
    <dc:subject>XML</dc:subject>
    <dc:subject>RDF</dc:subject>
    <dc:format>text/html</dc:format>
  </rdf:Description>
</rdf:RDF>
Listing 2 shows an N-Triples equivalent to Listing 1.
I would describe N-Triples as "verbose but explicit." As you can see, there are no abbreviations -- not even namespaces. All the URIs are fully spelled out. This is ideal for testing and the like because it introduces no confusion over what the corresponding RDF model is.
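Because every term is spelled out in full, comparing expected and actual output can come down to comparing sets of lines. The following Python sketch shows one way to do that, under stated assumptions: the function names `triple_lines` and `same_graph` are hypothetical, the `example.com` URIs are placeholders, both inputs are assumed to use the same spacing between terms, and neither graph is assumed to contain anonymous (blank) nodes, whose labels can differ between equivalent files.

```python
def triple_lines(text):
    """Return the set of triple lines, skipping blank lines and # comment lines."""
    result = set()
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            result.add(line)
    return result


def same_graph(expected, actual):
    """True if both texts contain the same triples, in any order."""
    return triple_lines(expected) == triple_lines(actual)


# Placeholder data: the same two triples in a different order.
expected = (
    '<http://example.com/a> <http://example.com/p> "1" .\n'
    '<http://example.com/a> <http://example.com/q> "2" .\n'
)
actual = (
    "# regenerated output\n"
    '<http://example.com/a> <http://example.com/q> "2" .\n'
    '<http://example.com/a> <http://example.com/p> "1" .\n'
)
print(same_graph(expected, actual))  # True
```

A set comparison ignores the order of the triples, which matches RDF's view of a graph as an unordered collection of statements.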
N-Triples is a line-oriented format. Each triple must be written on a separate line, and consists of a subject specifier, a predicate specifier, then an object specifier, followed by a period. One or more spaces or tabs separate subject from predicate, and predicate from object. Resources are specified in one of two forms. If they have a URI, they must be presented in the form you see in Listing 2: the absolute URI reference enclosed in angle brackets. Relative URI references such as <local/file.ext> are not allowed.
Of course in RDF all subjects and predicates are URIs, but objects can be URIs or literals. All literals are presented as strings in quotes, although N-Triples does support language specifiers and data typing, as I discuss later on in The details, literally.
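The line structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the listings in this article: the helper names `uri`, `literal`, and `to_ntriples` are hypothetical, the check for an absolute URI reference is deliberately crude, and the sketch assumes literal text that is plain ASCII with no characters that need escaping. The two statements are modeled on Listing 1, with language information left out.

```python
def uri(ref):
    """Format an absolute URI reference as an N-Triples term."""
    # Crude test for a scheme; relative references are not allowed.
    if ":" not in ref:
        raise ValueError("N-Triples requires absolute URI references: %r" % ref)
    return "<%s>" % ref


def literal(text):
    """Format a plain literal (assumes nothing in text needs escaping)."""
    return '"%s"' % text


def to_ntriples(triples):
    """Write each (subject, predicate, object) tuple of terms on its own line."""
    return "".join("%s %s %s .\n" % t for t in triples)


DC = "http://purl.org/dc/elements/1.1/"
ARTICLE = "http://www.ibm.com/developerworks/xml/library/x-think12.html"

doc = to_ntriples([
    (uri(ARTICLE), uri(DC + "publisher"), literal("IBM developerWorks")),
    (uri(ARTICLE), uri(DC + "format"), literal("text/html")),
])
print(doc, end="")
```

Calling `uri("local/file.ext")` raises a `ValueError`, which mirrors the rule that relative URI references are not allowed.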
As I mentioned, there are two forms for expressing resources in N-Triples. I've already discussed the form for resources with URIs. N-Triples also has a convention for expressing anonymous nodes (also known as blank nodes). Listing 3 is a simple RDF/XML example containing a couple of blank nodes:
Listing 3. Simple RDF/XML example with blank nodes
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description>
    <dc:title>Unwritten work</dc:title>
    <dc:creator rdf:parseType="Resource">
      <dc:title>The League of Procrastinators</dc:title>
    </dc:creator>
    <dc:contributor rdf:resource="http://put-off.org"/>
  </rdf:Description>
</rdf:RDF>
Figure 1. A model diagram of Listing 3
As you can see, two of the ovals have no labels. These are blank nodes. They do have identity, but that identity is not given by a URI. Blank nodes are often used when there is really no URI appropriate to associate with the resource, as in the example in Listing 3 and Figure 1, where a work is being described that has not yet been written.
Listing 4. N-Triples equivalent to Listing 3
_:blank1 <http://purl.org/dc/elements/1.1/title> "Unwritten work" .
_:blank2 <http://purl.org/dc/elements/1.1/title> "The League of Procrastinators" .
_:blank1 <http://purl.org/dc/elements/1.1/creator> _:blank2 .
_:blank1 <http://purl.org/dc/elements/1.1/contributor> <http://put-off.org> .
Blank nodes are in the form _:name, where name is an identifier for that blank node within a given set of N-Triples. The _:name identifiers maintain the identity of the nodes, even though they don't have any corresponding identifiers in the RDF model. RDF/XML has recently added a similar facility for you to use rdf:nodeID in an rdf:Description or typed node start tag. Listing 5 is equivalent to Listing 3, but the same local node IDs are used as in Listing 4.
Listing 5. Simple RDF/XML example with blank nodes
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:nodeID="blank1">
    <dc:title>Unwritten work</dc:title>
    <dc:creator>
      <rdf:Description rdf:nodeID="blank2">
        <dc:title>The League of Procrastinators</dc:title>
      </rdf:Description>
    </dc:creator>
    <dc:contributor rdf:resource="http://put-off.org"/>
  </rdf:Description>
</rdf:RDF>
Again, it is very important to note that these local IDs for blank nodes are purely a convention within a particular RDF/XML or N-Triples file. Just because Listings 4 and 5 both use the node ID "blank1" does not mean the corresponding blank nodes have the same identity. This can be a bit confusing, but it is an essential property of blank nodes.
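One practical consequence is that combining two N-Triples files calls for renaming blank node labels, so that nodes from different files are not accidentally treated as the same node. The Python sketch below illustrates the idea. The function name `relabel` and the prefixing scheme are hypothetical, and the regular expression assumes that the text `_:name` never appears after whitespace inside a quoted literal in the input.

```python
import re

# A blank node label at a term boundary: "_:" followed by letters and digits.
BLANK = re.compile(r"(?<!\S)_:([A-Za-z][A-Za-z0-9]*)")


def relabel(text, prefix):
    """Prefix every blank node label so it cannot clash with another file's labels."""
    return BLANK.sub(lambda m: "_:" + prefix + m.group(1), text)


# Two separate files that both happen to use the label "blank1".
file_a = '_:blank1 <http://purl.org/dc/elements/1.1/title> "Unwritten work" .\n'
file_b = '_:blank1 <http://purl.org/dc/elements/1.1/title> "A lo cubano"@es .\n'

merged = relabel(file_a, "a") + relabel(file_b, "b")
print(merged, end="")
```

After relabeling, the merged text refers to `_:ablank1` and `_:bblank1`, so the two works remain distinct nodes.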
RDF has always allowed users to specify the language used to express the values of properties. Listing 6 shows an example in RDF/XML of an anonymous resource with a property given in both English and Spanish.
Listing 6. An RDF description using language meta-properties.
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description>
    <dc:title xml:lang="es">A lo cubano</dc:title>
    <dc:title xml:lang="en">Cuban style</dc:title>
    <dc:creator>Orishas</dc:creator>
  </rdf:Description>
</rdf:RDF>
The dc:title property is given in two different languages. The language specifier is not a property of the statement as a whole (which is why internationalization does not turn RDF into a system of quads rather than triples). The language is instead a fundamental property of the literal itself. N-Triples shows this in its notation for languages, as you can see in Listing 7, which is a conversion of Listing 6 to N-Triples.
Listing 7. N-Triples equivalent to Listing 6
_:blank1 <http://purl.org/dc/elements/1.1/title> "A lo cubano"@es .
_:blank1 <http://purl.org/dc/elements/1.1/title> "Cuban style"@en .
_:blank1 <http://purl.org/dc/elements/1.1/creator> "Orishas" .
An @ is tacked on to the representation of the literal value. It is followed by a language code as defined in RFC 3066; this is the primary designation for a language ("en" for English, "es" for Spanish, and so forth). It can also designate a language variant; for example, "en-US" for American English, "en-GB" for British English, or "es-MX" for Mexican Spanish.
Another property literals can have -- and one introduced more recently in RDF -- is a data type. RDF literals can be given data types such as "integer", "string", "date", or even "morse code". The data type is designated as a URI, and you can use the common data types from the W3C XML Schema (WXS) language using URLs based on the WXS namespace, which is commonly mapped to the prefix xsd. One of the N-Triples in Listing 8 includes a data type designation.
Listing 8. A triple whose object includes a data type designation
#This is a comment in N-Triples
#It must appear by itself on a separate line
#The object of the following triple is of type xsd:int
<http://example.com/employees/jdoe> <http://example.com/employee-id> "23"^^<http://www.w3.org/2001/XMLSchema#int> .
The ^^ marker is followed by a URI specifying the data type, which may be based on a standard (as in this case) or could be a local convention. The important thing to remember is that even though the object here is expressed in quotes, it is actually interpreted as a WXS integer by any data-types-aware system. Listing 8 also shows how you can embed comments in N-Triples. Be careful: I have seen many N-Triples examples with comments on the same line as a triple, after the closing period. The current N-Triples grammar does not support this usage.
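To make these pieces concrete, here is a rough Python sketch of a line parser that recognizes the term forms covered so far: URI references, blank nodes, and literals with an optional language tag or data type. It is an illustration rather than a conforming parser. The names `TRIPLE` and `parse_line` and the group names are hypothetical, backslash escapes inside literals are left undecoded, and the regular expression is only an approximation of the grammar (for instance, it accepts uppercase letters in language tags so that "en-US" passes).

```python
import re

# One triple per line: subject, predicate, object, then a closing period.
TRIPLE = re.compile(r"""
    ^[ \t]*
    (?: <(?P<s_uri>[^<>\s]+)> | _:(?P<s_blank>[A-Za-z][A-Za-z0-9]*) )
    [ \t]+
    <(?P<p_uri>[^<>\s]+)>
    [ \t]+
    (?: <(?P<o_uri>[^<>\s]+)>
      | _:(?P<o_blank>[A-Za-z][A-Za-z0-9]*)
      | "(?P<o_text>(?:[^"\\]|\\.)*)"
        (?: @(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)
          | \^\^<(?P<datatype>[^<>\s]+)> )?
    )
    [ \t]*\.[ \t]*$
""", re.VERBOSE)


def parse_line(line):
    """Return None for blank or comment lines, else a dict describing the triple."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = TRIPLE.match(stripped)
    if match is None:
        raise ValueError("not a valid N-Triples line: %r" % line)
    # Keep only the groups that took part in the match.
    return {k: v for k, v in match.groupdict().items() if v is not None}


typed = parse_line(
    '<http://example.com/employees/jdoe> <http://example.com/employee-id> '
    '"23"^^<http://www.w3.org/2001/XMLSchema#int> .'
)
print(typed["o_text"], typed["datatype"])
```

Because the pattern requires the line to end right after the period, a comment placed after a triple on the same line raises a `ValueError`, in keeping with the restriction noted above.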
That's all there is to the structure of N-Triples. I did not cover a few nuances; for example, a very strict set of characters is allowed in the syntax, and you must be careful to escape any characters outside these ranges. Some characters (in URI references) must be escaped using URI conventions, and others use an N-Triples convention with a leading backslash. If you are writing code to read or write N-Triples, be sure to see the specification for these details.
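As a rough illustration of the backslash convention for literals, the Python sketch below escapes a string for use between the quotes of an N-Triples literal. The function name `escape_literal` is hypothetical, and the rules encoded here (backslash forms for the backslash, double quote, newline, carriage return, and tab; `\u` and `\U` hexadecimal forms for anything else outside printable ASCII) are a reading of the specification that you should verify before relying on it. Escaping of URI references is not covered.

```python
def escape_literal(text):
    """Escape a Python string for use between the quotes of an N-Triples literal."""
    simple = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in text:
        code = ord(ch)
        if ch in simple:
            out.append(simple[ch])           # named backslash escapes
        elif 0x20 <= code <= 0x7E:
            out.append(ch)                   # printable ASCII passes through
        elif code <= 0xFFFF:
            out.append("\\u%04X" % code)     # four hex digits
        else:
            out.append("\\U%08X" % code)     # eight hex digits
    return "".join(out)


print(escape_literal('Say "hola"\tse\u00f1or'))
```

For the sample string, the quotes and the tab become backslash escapes, and the "ñ" becomes `\u00F1`.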
N-Triples is one of several efforts aimed at a simple triples-based representation for RDF. Another is N3 (see Resources), which is quite popular and is the source of some of the ideas in N-Triples. But N-Triples has the advantage of being written into a formal specification, and because of its use in the standard RDF test cases, it will probably be implemented by all RDF processors.
- Participate in the discussion forum.
- Find the formal specification of N-Triples in the relevant section of the W3C RDF Test Cases specification.
- Keep abreast of recent happenings in the RDF space on the RDF Core Working group home page, and on Dave Beckett's excellent RDF Resource Guide.
- Tim Berners-Lee's Notation 3 (N3) is another very popular representation for RDF. But it is more than just a representation -- it also includes mathematical and logic primitives.
- I used 4Suite to perform the conversions to and from N-Triples format in this article.
- RFC 3066 defines the permissible values for language entries in xml:lang tags, used in XML (see "Localization within a document format" developerWorks, September 2002) or RDF. RFC 3066 leaves the actual lists of language codes to ISO 639: Code for the representation of names of languages.
- Check out Thinking XML's previous columns.
- Find more XML resources on the developerWorks XML zone.
- IBM WebSphere Studio provides a suite of tools that automate XML development, both in Java and in other languages. It is closely integrated with the WebSphere Application Server, but can also be used with other J2EE servers.
- Find out how you can become an IBM Certified Developer in XML and related technologies.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at email@example.com.