06 December 2011 - Added resource item for "Toward a Basic Profile for Linked Data" article in Resources.
RDF is a model for describing objects. These objects can range from physical objects such as planets, people, or countries, to virtual objects, such as blog posts or wiki pages, to abstract objects, such as the definition of a document or a chat message. An ontology is an object definition system, which you can implement using RDF modeling.
This article details some of the recent updates to the RDF concepts specification and some of the latest ontologies. The article is fairly technical. If you want a basic guide to RDF, I recommend that you read "An introduction to RDF" by Uche Ogbuji on IBM developerWorks or the World Wide Web Consortium (W3C) RDF Primer (see Resources).
One recent conceptual update related to RDF is the concept of Linked Data. There has been a lot of misunderstanding, particularly within the Linked Data and Semantic Web communities, about what Linked Data is. I'll try to explain it briefly for anyone with some knowledge of web technologies. Linked Data is relatively easy to grasp because it is a simple concept with a rich historic tapestry spanning thousands of years, and it essentially hooks into the natural human ability to observe, name, investigate, and describe objects in the real world.
To understand Linked Data objects, you must first understand data objects and distributed data objects. Data objects are named (that is, labeled) descriptions (that is, encapsulations) of observations (that is, objects). Distributed data objects are data objects whose identifiers act as a key (or a handle) for access over a network. This is roughly like calling someone or something by name or, in more technical speak, like using a primary key (in database terminology) over a network.
Linked Data objects are distributed data objects that have a uniform style of identifier. By that I mean that a Linked Data object identifier takes the same shape and style as every other Linked Data object identifier. A good example is the use of a Hypertext Transfer Protocol (HTTP) Internationalized Resource Identifier (IRI) as a uniform style of identifier. Take the Doctor Who television program, for example, which has the following identifier: http://www.bbc.co.uk/programmes/b006q2x0. This is a Linked Data identifier for Doctor Who. You can dereference it using HTTP, meaning that you use it as a locator to retrieve details about the program over HTTP. You can also use it as a reference in your own data store. For example, suppose you write some personal notes about the Doctor Who television series, and you want to store those notes for later use. You, quite literally, append those notes to the identifier http://www.bbc.co.uk/programmes/b006q2x0 in your personal data store. Note that this is in your data store and not in the data store at the British Broadcasting Corporation (BBC). Your notes are your data, whereas the program description is hosted by the BBC.
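The two uses of the identifier described above can be sketched in Python with only the standard library: building (but not sending) an HTTP GET request that dereferences the BBC identifier, and keeping personal notes keyed by that same IRI in a local store. The Accept header value and the PersonalStore class are illustrative assumptions, not part of any BBC or W3C interface.

```python
import urllib.request
from collections import defaultdict

DOCTOR_WHO = "http://www.bbc.co.uk/programmes/b006q2x0"


def dereference_request(iri: str) -> urllib.request.Request:
    """Build a GET request that uses the identifier as a locator.

    The Accept header is an assumption: it asks for HTML or RDF/XML and
    leaves the final choice of format to the server.
    """
    return urllib.request.Request(
        iri, headers={"Accept": "text/html, application/rdf+xml;q=0.9"}
    )


class PersonalStore:
    """Hypothetical local store that attaches notes to remote identifiers.

    Nothing is written to the BBC; the IRI is used purely as a key.
    """

    def __init__(self) -> None:
        self._notes = defaultdict(list)

    def append_note(self, iri: str, note: str) -> None:
        self._notes[iri].append(note)

    def notes_for(self, iri: str) -> list:
        # .get() does not trigger defaultdict, so unknown IRIs stay absent
        return list(self._notes.get(iri, []))


store = PersonalStore()
store.append_note(DOCTOR_WHO, "Favourite episodes to rewatch")
request = dereference_request(DOCTOR_WHO)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would perform the actual network fetch.
```

The same string serves as both a network locator and a local key, which is the property that makes the identifier useful across data stores.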
A common misunderstanding about Linked Data is that it requires data in one of the RDF formats (for example, Turtle, RDF/XML, RDFa, or N3). This assumption is certainly incorrect. All that is required is some data in an accessible format available through a dereferenceable Uniform Resource Identifier (URI) or IRI, that is to say an accessible object identifier in the W3C or Internet Engineering Task Force (IETF) network resource identifier format. That aside, it is possible to describe Linked Data objects using RDF, but you must keep in mind that RDF is not a necessity for Linked Data.
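To illustrate that RDF is optional, the following sketch treats a plain JSON document as the data behind a dereferenceable identifier and extracts the links it contains, meaning any string value that is an absolute HTTP or HTTPS IRI. The JSON layout and the example.org identifiers are hypothetical.

```python
import json
from urllib.parse import urlparse

# Hypothetical non-RDF description served at a dereferenceable identifier.
DOCUMENT = """
{
  "id": "http://www.bbc.co.uk/programmes/b006q2x0",
  "title": "Doctor Who",
  "broadcaster": "http://example.org/organisations/bbc",
  "related": ["http://example.org/programmes/spin-off", "not a link"]
}
"""


def is_http_iri(value: object) -> bool:
    """True if the value looks like an absolute HTTP(S) identifier."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def outgoing_links(doc: dict) -> list:
    """Return (key, target) pairs for every value that is an HTTP(S) IRI.

    The object's own "id" is skipped, so only links to other objects remain.
    """
    links = []
    for key, value in doc.items():
        if key == "id":
            continue
        values = value if isinstance(value, list) else [value]
        links.extend((key, v) for v in values if is_http_iri(v))
    return links


data = json.loads(DOCUMENT)
for key, target in outgoing_links(data):
    print(key, "->", target)
```

Nothing here is RDF, yet each object is named by an HTTP identifier and points at other objects by their identifiers, which is the essence of the definition above.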
Many ontologies exist for RDF. They are usually defined using the Web Ontology Language (OWL) or, in a simpler fashion, using the RDF Schema (RDFS) system. Ontologies exist for defining many types of things, for instance:
- What a blog post is, using the Semantically-Interlinked Online Communities Project (SIOC) ontology.
- How tags can be related to each other, using the Meaning Of A Tag (MOAT) and Simple Knowledge Organization System (SKOS) ontologies.
- What properties are associated with a user profile, using the Friend of a Friend (FOAF) ontology.
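As a rough illustration of how these ontologies combine, the sketch below stores a few statements as plain Python tuples using the published RDF, FOAF, SIOC, and SKOS namespace IRIs, then asks which subjects have a given class. The example.org resources are hypothetical, and a real application would normally use an RDF library rather than bare tuples.

```python
from dataclasses import dataclass

# Published namespace IRIs for the vocabularies mentioned above.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/"  # hypothetical resources


@dataclass(frozen=True)
class Literal:
    """A plain-text value, kept distinct from IRIs (which are plain str)."""
    value: str


TRIPLES = [
    # FOAF: a user profile
    (EX + "alice", RDF + "type", FOAF + "Person"),
    (EX + "alice", FOAF + "name", Literal("Alice")),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    # SIOC: a blog post
    (EX + "post/1", RDF + "type", SIOC + "Post"),
    (EX + "post/1", SIOC + "content", Literal("Notes on Linked Data")),
    (EX + "post/1", SIOC + "topic", EX + "tag/semweb"),
    # SKOS: how tags relate to one another
    (EX + "tag/semweb", RDF + "type", SKOS + "Concept"),
    (EX + "tag/semweb", SKOS + "broader", EX + "tag/web"),
]


def subjects_of_type(triples, cls):
    """Return every subject declared with rdf:type cls."""
    return {s for s, p, o in triples if p == RDF + "type" and o == cls}


def objects(triples, subject, predicate):
    """Return the objects of all statements matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(subjects_of_type(TRIPLES, SIOC + "Post"))
print(objects(TRIPLES, EX + "tag/semweb", SKOS + "broader"))
```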
How to develop an ontology is beyond the scope of this article. If you want to learn more about ontologies, there is a good tutorial about creating an ontology using OWL on IBM developerWorks by Michel Mitri and Nicholas Chase. See Resources for links to this and other articles and tutorials.
The idea of the RDF concepts document is to specify the rules of RDF modeling, that is, the complexities of the logic behind the rather simple subject-predicate-object triple structure and the more recent graph-subject-predicate-object quad structure that many contemporary data stores now provide.
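The difference between the triple and quad structures can be sketched with two named tuples: a quad is a triple plus the name of the graph it belongs to, and dropping the graph component projects a quad store back to plain triples. The graph names and resources are hypothetical.

```python
from typing import NamedTuple

EX = "http://example.org/"  # hypothetical resources and graph names


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class Quad(NamedTuple):
    graph: str  # the extra component that turns a triple into a quad
    subject: str
    predicate: str
    obj: str

    def as_triple(self) -> Triple:
        return Triple(self.subject, self.predicate, self.obj)


QUADS = {
    Quad(EX + "graphs/bbc", EX + "doctor-who", EX + "genre", EX + "science-fiction"),
    Quad(EX + "graphs/my-notes", EX + "doctor-who", EX + "genre", EX + "science-fiction"),
    Quad(EX + "graphs/my-notes", EX + "doctor-who", EX + "rating", EX + "five-stars"),
}


def graph(quads, name):
    """Triples that belong to one named graph."""
    return {q.as_triple() for q in quads if q.graph == name}


def merged(quads):
    """Drop the graph component; identical triples from different graphs collapse."""
    return {q.as_triple() for q in quads}


print(len(QUADS), len(merged(QUADS)))
```

The graph component is what lets a store remember who asserted a statement: the same triple appears once in the merged view but remains attributable to both graphs.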
The RDF Concepts and Abstract Syntax 1.0 recommendation of 2004 and the working draft of 2011 differ in many ways. The most important differences are the separation of RDF from RDF/XML, the explicit use of Best Current Practice (BCP) 47 for language codes, the use of Unicode-friendly IRIs over ASCII-based URIs, the favoring of Skolem IRIs over unnamed blank nodes, and the removal of the Extensible Markup Language (XML) literal type from the RDF concepts. Aside from that, most changes are editorial: cleaning up the specification and improving the clarity and flow of its language. The 2011 working draft includes a more extensive list of changes from the 2004 version. The implications of these changes are more interesting than the changes themselves, so I describe them next.
Note that, at the time of writing, the RDF Concepts 1.1 specification is a working draft, which means that some of its concepts might not carry through to the final published specification. Regardless, it does show contemporary thought and best practice. Also keep in mind that this is an RDF concepts document, so it is about concepts rather than any specific format. The RDF Working Group at the W3C is due to complete its work on upgrading RDF Concepts, RDFS, Turtle, RDF/XML, and RDF/JSON by February 2013.
Based on my summary of the changes to RDF Concepts, you can see that developers within the Semantic Web community use RDF as a modeling framework (as it should be used) and are separating it from its formats (whether XML, Turtle, and so on). This change should enable the development and use of other formats for the RDF model. It also means that developers can use whichever format they are comfortable with or that is most suitable for the particular software or web application being developed. It also allows third-party services or libraries to convert from one format to another with no logical issues.
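To make the separation of model and format concrete, here is a minimal sketch that keeps one format-independent list of triples and serializes it two ways: as N-Triples and as RDF/JSON (subject, then predicate, then a list of objects with "type", "value", and optional "lang" keys). It then parses the RDF/JSON back and recovers the same model. Blank nodes and datatyped literals are left out for brevity, and the example.org resources are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str
    lang: Optional[str] = None


Term = Union[IRI, Literal]

EX = "http://example.org/"  # hypothetical resources
TRIPLES = [
    (EX + "post/1", "http://purl.org/dc/terms/title", Literal("Hello, RDF", "en")),
    (EX + "post/1", "http://xmlns.com/foaf/0.1/maker", IRI(EX + "people/alice")),
]


def escape(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def nt_term(term: Term) -> str:
    if isinstance(term, IRI):
        return f"<{term.value}>"
    suffix = f"@{term.lang}" if term.lang else ""
    return f'"{escape(term.value)}"{suffix}'


def to_ntriples(triples) -> str:
    """Serialize the model as N-Triples, one statement per line."""
    return "".join(f"<{s}> <{p}> {nt_term(o)} .\n" for s, p, o in triples)


def to_rdf_json(triples) -> dict:
    """Serialize the same model as RDF/JSON: subject -> predicate -> [object]."""
    doc: dict = {}
    for s, p, o in triples:
        if isinstance(o, IRI):
            node = {"type": "uri", "value": o.value}
        else:
            node = {"type": "literal", "value": o.value}
            if o.lang:
                node["lang"] = o.lang
        doc.setdefault(s, {}).setdefault(p, []).append(node)
    return doc


def from_rdf_json(doc: dict):
    """Parse RDF/JSON back into the format-independent list of triples."""
    triples = []
    for s, predicates in doc.items():
        for p, nodes in predicates.items():
            for node in nodes:
                if node["type"] == "uri":
                    obj: Term = IRI(node["value"])
                else:
                    obj = Literal(node["value"], node.get("lang"))
                triples.append((s, p, obj))
    return triples


print(to_ntriples(TRIPLES), end="")
print(json.dumps(to_rdf_json(TRIPLES), indent=2))
```

Because both serializers read from, and the parser writes to, the same in-memory model, converting between formats is just a parse followed by a serialize.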
The use of an international-friendly character set has always been at the heart of the W3C, an organization which, after all, has "World Wide" in its name. Therefore, the continuing use of Unicode throughout RDF's frameworks and formats is essential. As a result, you see an upgrade from Unicode v3 to Unicode v4 in the specification and the desire to use the Unicode-based IRIs over the ASCII-based URIs.
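The relationship between Unicode IRIs and ASCII-only URIs can be sketched as a simple mapping in the spirit of RFC 3987, section 3.1: each non-ASCII character is encoded as UTF-8 and every resulting byte is percent-encoded, while ASCII characters pass through unchanged. This is a simplification; a real converter would usually handle internationalized host names with IDNA (punycode) rather than percent-encoding.

```python
def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding its non-ASCII characters.

    Simplified: the whole string is treated uniformly, so internationalized
    host names are percent-encoded instead of being converted with IDNA.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)  # ASCII characters are already valid URI text
        else:
            # one %XX escape per UTF-8 byte, with uppercase hex digits
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("http://example.org/wiki/Résumé"))
```

Because the IRI form keeps the original characters, it stays readable in any script, while the URI form remains available for software that only understands ASCII.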
Blank nodes have always been a tricky concept within Semantic Web theory, and their trickiness has led data stores to implement them differently, often in a non-uniform manner. Skolemization comes from formal logic, and Skolem IRIs have been suggested as a uniform system for generating globally unique IRIs in place of blank nodes. There is still much debate about this, and some people are particularly worried about using the name Skolem because of its overtly mathematical nature and possible conceptual baggage. The most important point of this part of the draft is that it discourages the use of blank nodes when creating, manipulating, and rendering data.
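Skolemization can be sketched as a relabeling pass: every blank-node label is replaced by a freshly minted, globally unique IRI, and repeated occurrences of the same label map to the same IRI. The /.well-known/genid/ path follows the convention proposed in the later RDF 1.1 work; treat it, the UUID-based naming, and the example.org authority as assumptions rather than something the 2011 draft mandates.

```python
import uuid

# Assumed path convention for Skolem IRIs (from later RDF 1.1 drafts).
GENID_PATH = "/.well-known/genid/"

# Terms use N-Triples notation: "<iri>", "_:label", or '"literal"'.
TRIPLES = [
    ("<http://example.org/alice>", "<http://xmlns.com/foaf/0.1/knows>", "_:b0"),
    ("_:b0", "<http://xmlns.com/foaf/0.1/name>", '"Bob"'),
    ("_:b1", "<http://xmlns.com/foaf/0.1/name>", '"Carol"'),
]


def skolemize(triples, authority):
    """Replace each blank-node label with a fresh Skolem IRI.

    Within one call the same label always maps to the same IRI; a second
    call mints new IRIs, because the label mapping is local to the call.
    """
    mapping = {}

    def replace(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"<{authority}{GENID_PATH}{uuid.uuid4().hex}>"
            return mapping[term]
        return term

    return [tuple(replace(term) for term in triple) for triple in triples]


for triple in skolemize(TRIPLES, "http://example.org"):
    print(" ".join(triple), ".")
```

After this pass every node has a name that other data stores can refer to, which is exactly what an unnamed blank node cannot offer.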
The final important implication of the working draft is the acknowledgment that literal values (that is, plain-text values) in RDF are somewhat bound to the XML literal data type. The idea is to separate RDF completely from its actual and symbolic dependence on XML serialization, so it is necessary to rid RDF of the XML literal data type. The RDF Working Group hasn't finalized what an RDF plain literal will look like. Literals in RDF 1.1, however, will probably not look much different from literals in RDF 1.0, except that developers must use a BCP 47 language tag when specifying the language of a literal (for example, a tag for general English, or one for Mandarin Chinese written in the traditional script).
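A minimal sketch of checking a tag before attaching it to a literal is shown below. The regular expression covers only a subset of the RFC 5646 (BCP 47) grammar, namely language, extended language, script, region, and variants. Extensions, private-use subtags, and grandfathered tags are deliberately not handled, and the example tags (en-GB, zh-cmn-Hant, and so on) are illustrations of my own choosing.

```python
import re

# Simplified subset of the BCP 47 "langtag" production:
# language[-extlang][-script][-region](-variant)*
LANGTAG = re.compile(
    r"[A-Za-z]{2,3}"                                    # language, e.g. en, zh
    r"(?:-[A-Za-z]{3}){0,3}"                            # extlang, e.g. cmn
    r"(?:-[A-Za-z]{4})?"                                # script, e.g. Hant
    r"(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"                   # region, e.g. GB, 419
    r"(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"   # variants, e.g. 1901
)


def is_language_tag(tag: str) -> bool:
    """True if the tag matches the simplified BCP 47 subset above."""
    return LANGTAG.fullmatch(tag) is not None


def lang_literal(text: str, tag: str) -> str:
    """Render a language-tagged literal in N-Triples/Turtle notation."""
    if not is_language_tag(tag):
        raise ValueError(f"not a BCP 47 language tag: {tag!r}")
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{tag}'


print(lang_literal("colour", "en-GB"))
```

Note that BCP 47 tags are compared case-insensitively, so a production system would also normalize case before comparing tags.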
The second part of this article is a brief survey of some recently developed but important ontologies. They have the potential to be game changers in the future Linked Data web. I have separated them into data catalog ontologies and RDF database mapping systems.
As more organizations develop and use their own ontologies and datasets, and often release them to the public, there is growing demand to list those datasets and to link between them, declaring which class definitions and property definitions are the same.
The Vocabulary of Interlinked Datasets (VoID) is an ontology developed using RDFS. It describes datasets or, rather, allows dataset publishers to provide metadata about their datasets. It is separated into four areas:
- General metadata (through the Dublin Core)
- Access metadata (that is, how to connect to the dataset and access particular data)
- Structural metadata (that is, details of the schemas used, how to query the dataset, and examples of how to integrate with the dataset)
- A description of links between datasets, which essentially allows you to describe linksets (collections of links between two datasets; in essence, distributed graphs)
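The four areas can be seen together in a small VoID description. The Python sketch below generates one as Turtle text, with a comment marking each area; it uses the VoID terms void:Dataset, void:sparqlEndpoint, void:dataDump, void:vocabulary, void:exampleResource, and void:Linkset. All dataset, endpoint, and dump IRIs are hypothetical, as is the "#links" naming of the linkset.

```python
import json


def turtle_string(text: str) -> str:
    # JSON string escapes (\", \\, \n, \uXXXX, ...) are also valid in Turtle.
    return json.dumps(text, ensure_ascii=False)


def void_turtle(dataset, title, endpoint, dump, vocabularies, example, target):
    """Return a VoID description of `dataset` as Turtle text."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "",
        f"<{dataset}> a void:Dataset ;",
        # 1. General metadata, through Dublin Core terms
        f"    dcterms:title {turtle_string(title)} ;",
        # 2. Access metadata: where to query or download the data
        f"    void:sparqlEndpoint <{endpoint}> ;",
        f"    void:dataDump <{dump}> ;",
        # 3. Structural metadata: vocabularies used and a sample resource
        *[f"    void:vocabulary <{v}> ;" for v in vocabularies],
        f"    void:exampleResource <{example}> .",
        "",
        # 4. A linkset: owl:sameAs links from this dataset to another one
        f"<{dataset}#links> a void:Linkset ;",
        f"    void:subjectsTarget <{dataset}> ;",
        f"    void:objectsTarget <{target}> ;",
        "    void:linkPredicate owl:sameAs .",
    ]
    return "\n".join(lines) + "\n"


TEXT = void_turtle(
    dataset="http://example.org/dataset/spending",
    title="Example spending data",
    endpoint="http://example.org/sparql",
    dump="http://example.org/dumps/spending.nt",
    vocabularies=["http://xmlns.com/foaf/0.1/", "http://purl.org/dc/terms/"],
    example="http://example.org/spending/2011",
    target="http://example.net/dataset/organisations",
)
print(TEXT)
```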
The Data Catalog Vocabulary (dcat) is a simple ontology, developed using RDFS, that you can use to describe a collection of datasets. The purpose of dcat is catalog interoperability. It is typically used to describe government datasets. Because government data tends to be published in various formats, it is often awkward to query. Therefore, dcat provides metadata about the contained data, its available formats, and various other conceptual and organizational metadata.
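The kind of question dcat answers ("which formats is this dataset available in, and where?") can be sketched with a handful of triples. The catalog, dataset, and file IRIs are hypothetical, and describing each distribution's format with dcterms:format is a modeling assumption on my part; for brevity, literals and IRIs are both plain strings here.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical catalog

TRIPLES = [
    (EX + "catalog", RDF_TYPE, DCAT + "Catalog"),
    (EX + "catalog", DCAT + "dataset", EX + "spending-2011"),
    (EX + "spending-2011", RDF_TYPE, DCAT + "Dataset"),
    (EX + "spending-2011", DCT + "title", "Public spending 2011"),
    (EX + "spending-2011", DCAT + "keyword", "budget"),
    (EX + "spending-2011", DCAT + "distribution", EX + "spending-2011-csv"),
    (EX + "spending-2011", DCAT + "distribution", EX + "spending-2011-xls"),
    (EX + "spending-2011-csv", DCAT + "accessURL", EX + "files/spending-2011.csv"),
    (EX + "spending-2011-csv", DCT + "format", "text/csv"),
    (EX + "spending-2011-xls", DCAT + "accessURL", EX + "files/spending-2011.xls"),
    (EX + "spending-2011-xls", DCT + "format", "application/vnd.ms-excel"),
]


def values(triples, subject, predicate):
    """Objects of every statement with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def available_formats(triples, catalog):
    """Map each dataset in the catalog to {format: access URL}."""
    result = {}
    for dataset in values(triples, catalog, DCAT + "dataset"):
        result[dataset] = {}
        for dist in values(triples, dataset, DCAT + "distribution"):
            for fmt in values(triples, dist, DCT + "format"):
                for url in values(triples, dist, DCAT + "accessURL"):
                    result[dataset][fmt] = url
    return result


print(available_formats(TRIPLES, EX + "catalog"))
```

The data itself stays in whatever format the publisher chose; only the catalog metadata needs to be RDF for a client to discover it.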
Relational databases are traditionally used in software and web application systems. It is often quite difficult to move away from the relational structure of the relational database management system (RDBMS) to the object-oriented structure that RDF points to.
RDB to RDF Mapping Language (R2RML) is a language for expressing mappings between a relational database and an RDF dataset. The R2RML specification uses the Turtle serialization of RDF to express these mappings. The idea is to make the mapping as direct and as simple as possible. This form of mapping can, for example, allow a user to query a relational database with SPARQL, something that's traditionally possible only using Structured Query Language (SQL). It can also allow row-level linking to and from relational databases across the Linked Data web. This concept is powerful because it eliminates a manually created middle layer and essentially exposes relational data as RDF on the fly. Thanks to R2RML, opening previously closed relational databases to the web becomes much easier.
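The following sketch imitates what an R2RML processor does with one simple triples map: it expands a subject template for each row, adds an rdf:type triple for the mapped class, and adds one triple per mapped column. It is not an R2RML processor; the Python dictionary layout is a hypothetical stand-in for the Turtle mapping shown in the comment, the EMP table and example namespace are illustrative, and template values are percent-encoded more aggressively than R2RML strictly requires.

```python
import sqlite3
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://data.example.com/"  # hypothetical namespace

# Python mirror of a small R2RML triples map. In R2RML itself this would be
# written in Turtle, roughly:
#   <#EmpMap> rr:logicalTable [ rr:tableName "EMP" ] ;
#     rr:subjectMap [ rr:template "http://data.example.com/employee/{EMPNO}" ;
#                     rr:class ex:Employee ] ;
#     rr:predicateObjectMap [ rr:predicate ex:name ;
#                             rr:objectMap [ rr:column "ENAME" ] ] .
TRIPLES_MAP = {
    "table": "EMP",
    "subject_template": EX + "employee/{EMPNO}",
    "class": EX + "Employee",
    "predicate_columns": {EX + "name": "ENAME", EX + "job": "JOB"},
}


def expand_template(template, row):
    """Fill {COLUMN} placeholders, percent-encoding each value for IRI safety."""
    return template.format(**{col: quote(str(val), safe="") for col, val in row.items()})


def apply_triples_map(conn, tmap):
    cursor = conn.execute(f'SELECT * FROM "{tmap["table"]}"')
    columns = [d[0] for d in cursor.description]
    triples = []
    for values in cursor:
        row = dict(zip(columns, values))
        subject = expand_template(tmap["subject_template"], row)
        triples.append((subject, RDF_TYPE, tmap["class"]))
        for predicate, column in tmap["predicate_columns"].items():
            if row[column] is not None:  # NULL values generate no triple
                triples.append((subject, predicate, row[column]))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, JOB TEXT)")
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)",
                 [(7369, "SMITH", "CLERK"), (7499, "ALLEN", None)])
TRIPLES = apply_triples_map(conn, TRIPLES_MAP)
for t in TRIPLES:
    print(t)
```

Because the triples are produced from the live table on each call, no hand-written export layer sits between the database and its RDF view.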
Relational database (RDB) direct mapping is essentially the default behavior of R2RML. The idea is to provide a simple, automatic transformation from a relational database table to a collection of RDF data, so that the rows of a table can be retrieved as RDF. RDB direct mapping produces a simple, direct graph for each table, whereas the more robust R2RML system provides the capability to map across tables into more advanced RDF models.
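Direct mapping needs no mapping document at all: the IRIs are derived from the table and column names. The sketch below follows the IRI patterns of the W3C Direct Mapping draft as I understand them (a class IRI per table, row IRIs of the form Table/PK=value, and property IRIs of the form Table#column). Foreign-key references, blank nodes for tables without a primary key, and typed literals are not handled, and the People table and base IRI are hypothetical.

```python
import sqlite3
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_mapping(conn, table, base):
    """Produce a direct graph for one table (no foreign keys, no datatypes)."""
    info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    columns = [col[1] for col in info]
    # col[5] is the column's 1-based position in the primary key (0 if absent)
    pk = [col[1] for col in sorted(info, key=lambda c: c[5]) if col[5] > 0]
    if not pk:
        raise ValueError("rows without a primary key map to blank nodes; not handled here")
    table_iri = base + quote(table, safe="")
    triples = []
    for values in conn.execute(f'SELECT * FROM "{table}"'):
        row = dict(zip(columns, values))
        key = ";".join(f"{quote(c, safe='')}={quote(str(row[c]), safe='')}" for c in pk)
        subject = f"{table_iri}/{key}"
        triples.append((subject, RDF_TYPE, table_iri))
        for col in columns:
            if row[col] is not None:  # NULL columns produce no triple
                triples.append((subject, f"{table_iri}#{quote(col, safe='')}", row[col]))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (ID INTEGER PRIMARY KEY, fname TEXT, addr INTEGER)")
conn.executemany("INSERT INTO People VALUES (?, ?, ?)", [(7, "Bob", 18), (8, "Sue", None)])
TRIPLES = direct_mapping(conn, "People", "http://foo.example/DB/")
for t in TRIPLES:
    print(t)
```

Compared with the R2RML sketch, nothing here is configurable, which is precisely the trade-off: the output is immediate but mirrors the table layout exactly.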
In conclusion, RDF has reached maturity. The RDF 1.0 concepts specification was released in 2004, but RDF itself has been part of the W3C's work since 1999. The concepts of RDF are crucial to Linked Data theory, which has its history in formal logic and the natural process of naming things. The RDF 1.1 concepts specification contains many welcome updates, the most important being the decoupling of RDF/XML from RDF. Many other updates in the current draft will be useful for integration and semantic improvements and, no doubt, additional concepts will be added to the 1.1 specification.
RDF concepts are not the only developments in the RDF-related world. Many exciting ontologies have also been released, including VoID and dcat for dataset cataloguing, plus R2RML and RDB direct mapping for mapping relational databases to RDF. As more of these ontologies reach maturity, more commercial and open source web applications will implement them.
Exciting times are ahead for the Linked Data web. Some welcome changes are the separation of web-based Linked Data from the Semantic Web, the separation of RDF from RDF/XML, and the increased overall understanding of distributed data and internationalized data. The concepts, frameworks, ontologies, and technologies are in place to handle it. Now it's time to put them into practice.
- An introduction to RDF (Uche Ogbuji, developerWorks, December 2000): Explore this good, basic introduction to RDF and its RDF/XML syntax in particular.
- RDF Primer (W3C, February 2004): Find the basics of describing objects in RDF. Be aware that the examples are in RDF/XML.
- RDF Concepts and Abstract Syntax (W3C, February 2004): Learn about the conceptual aspects of RDF. I use this document for the comparison between the 1.0/2004 and 1.1/2011 versions of RDF concepts in this article.
- RDF 1.1 Concepts and Abstract Syntax (W3C, August 2011): Read about the conceptual aspects of RDF. I use this document for the comparison between 1.1/2011 and 1.0/2004 versions of RDF concepts in this article.
- DCAT Vocabulary Overview (DERI, March 2010): Review the conceptual aspects of the Data Catalog Vocabulary.
- VoID Vocabulary Overview (DERI, March 2010): Learn about the conceptual aspects of the Vocabulary of Interlinked Datasets.
- R2RML: RDB to RDF Mapping Language (W3C, September 2011): Explore the concepts and syntax for mapping relational databases to RDF data structures.
- A Direct Mapping of Relational Data to RDF (W3C, September 2011): Read about the concepts and syntax for mapping relational database tables directly to RDF data structures.
- FOAF Project website: Find further details on the Friend of a Friend ontology that is commonly used to describe user profiles and friendships.
- The SKOS home page: Check out a portal to further details on the SKOS ontology, which is commonly used to quickly and easily describe knowledge concepts.
- MOAT website: Get further details on this ontology, which is used to quickly and easily describe relationships between tags on online tagging systems. There is even a Drupal plugin that implements MOAT in the Drupal Taxonomy System, allowing more meaningful or semantic tagging.
- SIOC website: Check for further details on this ontology, which is commonly used to describe relationships between social networking systems.
- The Dublin Core Metadata Initiative: Explore this long-time group for metadata design and best practices. It originally assisted in adding metadata to Hypertext Markup Language (HTML) but now provides an ontology for RDF. The Dublin Core Metadata describes basic authorship and contextual metadata on online documents.
- RDF Vocabulary Description Language 1.0: RDF Schema (W3C, February 2004): Read the specification on the RDFS, which is a basic ontology/schema system for describing data types (that is, classes) in RDF.
- OWL (W3C, October 2009): See the specification for a robust ontology/schema system for describing data types (that is, classes) in logically-sound RDF.
- The ultimate mashup -- Web services and the semantic Web, Part 4: Create an ontology (Michel Mitri and Nicholas Chase, developerWorks, March 2007): Learn to create an ontology for use in RDF.
- The BCP 47 Specification (September 2009): Read and learn more about the structure, content, construction, and semantics of language tags.
- Toward a Basic Profile for Linked Data: A collection of best practices and a simple approach for a Linked Data architecture (Martin Nally and Steve Speicher, developerWorks, December 2011): Look through this collection of best practices and simple approaches for a Linked Data architecture. Although there is interest in using Linked Data for inferring new information from existing information, there is little guidance. This article gives background information on this interest and then provides a proposal for a Basic Profile for Linked Data.
- More articles by this author (Daniel J. Lewis, developerWorks, October 2008-current): Read articles about Semantic Web, Linked Data, PHP, CSS, ODBC, and other technologies.
Daniel Lewis is a British freelance computer scientist, web developer, and writer working under the business name of Vanir Systems. His academic background is in intelligent systems, artificial intelligence, and data mining. His professional background is in web development, the Semantic Web, Linked Data, database architecture, and technology evangelism. He is well-versed in RDF, XML, and HTML technologies in addition to various other languages such as PHP, Ruby, Python, Java, SQL, and SPARQL. He practices his skills using the Virtuoso Universal Server by OpenLink Software.