Web authors began to realize they needed more than presentational flourishes. They needed a way to tell a search engine, or other such software, what the structured data on a page really is: this is a price, this is an event's date, and that other item is a person's contact information. There was a need for a "web of data," an idea that had long been the main concern of those interested in the Resource Description Framework (RDF).
The idea behind RDF is very simple, but unfortunately much of how it is defined, discussed, and processed is far too complex. Regarding web technology, a common pragmatic tip is: "Make sure a regular web author can learn it in a day." It has taken many years, but there's finally a flavor of RDF that passes this test: RDFa 1.1 Lite.
RDFa Lite is a simplified version of RDF in Attributes (RDFa), a mechanism for encoding the full RDF data model within HTML and similar vocabularies. RDFa is a bit too complicated to meet the "learn it in a day" threshold, so the WHATWG (the group that spawned HTML5) created an HTML Microdata specification. Microdata became the kernel of Schema.org (see Resources), an initiative to codify ways to mark up data on the web. Microdata has since become a W3C Working Draft (see Resources), but advocates of RDF and RDFa saw an opportunity to promote RDFa if they had a subset at least as simple as Microdata. RDFa Lite is that subset: it gives you the benefits of a long-settled data model, but in a simple syntax.
Last year, Schema.org founders conducted a workshop for community experts in structured data on the web. I attended the workshop, where Ben Adida, editor of the RDFa specification, showed us the work the RDF Web Applications Working Group had done on RDFa Lite. The very positive feedback to this draft specification inspired the W3C to redouble those efforts. RDFa 1.1 Lite is a mercifully short working draft from the W3C (and may already be a full W3C recommendation by the time you read this).
In this article, learn about RDFa 1.1 Lite, and get a jump-start on producing HTML pages that participate in the web of data. This article assumes that you are already familiar with HTML.
RDFa involves marking up the structured data within a website. You don't simply focus on how the content should look to the user. You can mark a date as a date, a person's name as a person's name, an event as an event, an organization as an organization, and so on. RDFa Lite reduces that lofty ambition to the simplest thing that could possibly work: the addition of just five attributes to HTML or XHTML. A machine can easily interpret these attributes to extract useful data from the page. That's it, in a nutshell.
Listing 1 is an example of plain HTML for an online article.
Listing 1. Plain HTML for an online article
<body>
  <a href="http://www.ibm.com/developerworks/">IBM developerWorks</a> &gt;
  <a href="http://www.ibm.com/developerworks/web/">Web development</a> &gt;
  <a href="http://www.ibm.com/developerworks/views/web/library.jsp">Technical library</a>
  <div>
    An introduction to RDF by <a href="http://uche.ogbuji.net">Uche Ogbuji</a>,
    Partner, Zepheira.
  </div>
  <div>Published: 01 Dec 2000</div>
  <div>
    <b>Summary</b>: This article introduces Resource Description Framework (RDF),
RDF, which was developed by the W3C for web-based metadata, uses XML as an interchange syntax. RDF's essential goal is to make work easier for autonomous agents, which would refine the web by improving search engines and service directories. See "An introduction to RDF" (developerWorks, December 2000) for more information about RDF.
In the remainder of this article, we'll look at annotations of the HTML in Listing 2 with RDFa Lite to demonstrate all five attributes from the spec: vocab, typeof, property, resource, and prefix.
Listing 2. HTML for RDFa Lite demonstration
  <div>Tags for this article: introduction, rdf, tutorial.</div>
  <div>
    This article's text is suitable for a wide audience, with a Fog index of 10.2.
  </div>
</body>
Listing 3 shows the change to the body element, which wraps the whole article.
Listing 3. Using the vocab attribute
<body vocab="http://schema.org/"> ... </body>
The vocab attribute sets the stage in RDFa Lite. It designates the vocabulary that will be used, by default, in the annotations. The example uses a vocabulary defined in Schema.org, so we pick a wrapper element to mark it as such. You don't have to use the body element, but you should use an element that wraps all the annotations.
Listing 4 shows the outline of a div that encloses the one article being described.
Listing 4. Using the typeof attribute
<div typeof="Article"> .... </div>
The typeof attribute marks its element as a description or representation of an instance of the given class. This is put into context by the in-scope vocab value, http://schema.org/, making it clear whose definition of an "article" we're dealing with. Article is one of the classes defined in Schema.org. Each class definition is documented, including conventions for its use and its associated properties.
Listing 5 shows the fragment that marks the article title.
Listing 5. Using the property attribute
<div property="name">An introduction to RDF</div>
The property attribute provides a property of the entity that was just declared as a Schema.org article. The name property is defined to give the title of the article.
Listing 6 shows the first three RDFa Lite attributes.
Listing 6. Three RDFa Lite attributes
<body vocab="http://schema.org/">
  ...
  <div typeof="Article">
    <div property="name">An introduction to RDF</div>
    ...More information regarding this article, or even the content itself...
  </div>
  ...
</body>
Listing 6 is the RDFa way of saying the following so that a machine can process it:
This HTML document represents an article as defined by Schema.org, and the title of the article is "An introduction to RDF."
The HTML specification already provides a machine-processable place to put an article title: the title element. Schema.org's vocabulary, however, lets you express far more about an article than just a title, including constructs that are not in HTML (46 properties, in fact, at the time of writing). Even in the case of properties such as title, what HTML provides has not always been sufficient. What if you have a page comprising an index of articles or a landing page from a search? You would be describing multiple actual article entities within one HTML page. HTML has no provisions for this case, but with Schema.org you could have multiple div elements, one for each referenced article.
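As a rough sketch of that index-page case, the fragment below describes two articles on a single page, each in its own div. It uses only the three attributes introduced so far. The page heading and the second article's title are hypothetical placeholders invented for illustration; only the first title comes from the running example.

```html
<body vocab="http://schema.org/">
  <h1>Article index</h1>
  <!-- First entity: the article from the running example. -->
  <div typeof="Article">
    <span property="name">An introduction to RDF</span>
  </div>
  <!-- Second entity: a hypothetical placeholder title, for illustration only. -->
  <div typeof="Article">
    <span property="name">A second article (placeholder title)</span>
  </div>
</body>
```

Because each typeof starts a new entity, a processor should read this as two separate articles, each with its own name, rather than one article with two titles.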
If you can describe multiple entities on a page, then it becomes important
to be able to give each one a handle. RDFa provides the
resource attribute for this purpose. Listing 7 shows a snippet
annotating the article's author.
Listing 7. The resource attribute
<p property="author" resource="#uche.ogbuji" typeof="Person">
  <span property="name">Uche Ogbuji</span>,
  <span property="jobTitle">Partner</span>,
  <span property="worksFor">Zepheira</span>.
  01 Dec 2000.
</p>
The snippet adds details that "The article is authored by a person named Uche Ogbuji who is Partner at Zepheira." Notice how the p element has been marked as a description of a new entity, of type Person, as defined by Schema.org. The p element also has a property attribute that connects this entity to the enclosing one (the article).
The resource attribute gives a handle to the entity described in the p element. It is a fragment identifier relative to the page's own URL. For example, if this page is at http://www.ibm.com/developerworks/library/w-rdf, there is now an entity that can be understood with the handle of http://www.ibm.com/developerworks/library/w-rdf#uche.ogbuji.
Zepheira is an organization. You could take this example further by marking up Zepheira as yet another nested entity, using the Organization class in Schema.org. RDFa is all about what you think is important to annotate. In this case, we're interested in more detail about the author but not necessarily about the organization he works for.
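If you did want that extra detail, one way to take the step is sketched below. It is an illustration under the assumption that the Organization only needs a name; the nesting reuses the same property-plus-typeof pattern that Listing 7 applies to the author.

```html
<p property="author" resource="#uche.ogbuji" typeof="Person">
  <span property="name">Uche Ogbuji</span>,
  <span property="jobTitle">Partner</span>,
  <!-- worksFor now points at a nested entity instead of plain text. -->
  <span property="worksFor" typeof="Organization">
    <span property="name">Zepheira</span>
  </span>.
</p>
```

The inner name property now belongs to the Organization, not to the Person, because it sits inside the element that carries the nested typeof. No resource attribute is given here, so the organization has no handle of its own; add one if other descriptions need to refer to it.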
The final attribute defined by RDFa Lite is
prefix, which is used to combine multiple
vocabularies in one description. Listing 8 shows how you might annotate the readability
index of the article.
Listing 8. The prefix attribute
<div prefix="fben: http://rdf.freebase.com/ns/">
  This article's text is suitable for a wide audience, with a Fog index of
  <span property="fben:gunning_fog_index">10.2</span>.
</div>
Schema.org does not include a property for a text's Gunning-Fog readability index, but Freebase does. Freebase, which is a bit like a more structured version of Wikipedia, manages schemata and descriptions for a wide variety of entities and entity types. I defined a prefix to refer to properties as defined in Freebase, rather than the default vocabulary of Schema.org, then used that prefix around the text of the readability index value for this article.
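A single prefix attribute can also declare more than one mapping, separated by whitespace. The sketch below is an illustration, not part of the article's example: the dc prefix for Dublin Core terms and the use of its language property are assumptions added here to show two extra vocabularies alongside the default one.

```html
<body vocab="http://schema.org/">
  <div typeof="Article"
       prefix="fben: http://rdf.freebase.com/ns/ dc: http://purl.org/dc/terms/">
    <!-- Unprefixed names resolve against the default vocab (Schema.org). -->
    <div property="name">An introduction to RDF</div>
    <!-- Prefixed names resolve against the mapping declared for that prefix. -->
    <div>
      Fog index: <span property="fben:gunning_fog_index">10.2</span>
    </div>
    <!-- Assumed extra property, for illustration only. -->
    <div>
      Language: <span property="dc:language">en</span>
    </div>
  </div>
</body>
```

Declaring the prefixes on the element that wraps the whole description keeps them in scope for every nested annotation.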
Listing 9 pulls together everything demonstrated so far and adds back bits such as the breadcrumbs.
Listing 9. Using all five RDFa Lite attributes
<body vocab="http://schema.org/">
  ...
  <div property="breadcrumb">
    <a href="http://www.ibm.com/developerworks/">IBM developerWorks</a> &gt;
    <a href="http://www.ibm.com/developerworks/web/">Web development</a> &gt;
    <a href="http://www.ibm.com/developerworks/views/web/library.jsp">Technical library</a>
  </div>
  <div typeof="Article">
    <div property="name">An introduction to RDF</div>
    <p property="author" resource="#uche.ogbuji" typeof="Person">
      by <span property="name">Uche Ogbuji</span>,
      <span property="jobTitle">Partner</span>,
      <span property="worksFor">Zepheira</span>.
    </p>
    <div>Published: <span property="datePublished">01 Dec 2000</span></div>
    <div property="description">
      <b>Summary</b>: This article introduces Resource Description Framework (RDF),
      developed by the W3C for Web-based metadata, using XML as an interchange syntax.
      RDF's essential aim is to make work easier for autonomous agents, which would
      refine the Web by improving search engines and service directories. Author
      Uche Ogbuji gives an overview of RDF aspects from schemas to usage scenarios.
      The article assumes that you are already familiar with XML.
    </div>
    <div>Tags for this article:
      <span property="keywords">introduction</span>,
      <span property="keywords">rdf</span>,
      <span property="keywords">tutorial</span>.
    </div>
    <div prefix="fben: http://rdf.freebase.com/ns/">
      This article's text is suitable for a wide audience, with a Fog index of
      <span property="fben:gunning_fog_index">10.2</span>.
    </div>
  </div>
  ...
</body>
There is nothing new here structurally, though Listing 9 does show you a few more Schema.org properties you can use for an article.
There are other vocabularies besides Schema.org, including the venerable Dublin Core (which has core support in RDFa) and Facebook's Open Graph protocol. But, given its high-profile backers, Schema.org has been getting most of the press lately. It is just the latest chapter in the long saga of search-engine optimization (SEO).
Though SEO might be a marketing concept, it is greatly enhanced by core web engineering practices such as accessibility, clean markup, and annotating pages where possible with more information about what their contents actually mean. Developments such as RDFa Lite should put powerful SEO within the reach of almost any web author. I hope this article helps you learn RDFa Lite in a day; I encourage you to start annotating your pages right away.
Resources
- RDFa Lite 1.1: Read the W3C working draft.
- HTML Microdata: Read this specification that defines the HTML microdata mechanism.
- Schema.org: Dive in and get familiar with the definitions of Article and Person (used in this article).
- Freebase: Explore this large registry of concepts and entities.
- "Put XHTML 2 to work now" (developerWorks, June 2007): Get a sense of how to add RDFa to XHTML.
- "An introduction to RDF" (developerWorks, December 2000): Learn more about RDF.
- "Implementing rich snippets on your WebSphere Commerce site to improve search engine results" (Naomi Wan, developerWorks, August 2011): Start generating HTML Microdata right away.
- Read Uche Ogbuji's Real Web 2.0 column.
Uche Ogbuji is a partner at Zepheira, where he oversees creation of sophisticated web catalogs and other richly contextual databases. He has a long history of pioneering in advanced web technologies such as XML, semantic web and web services, and open source projects like Akara, an open source platform for web data applications. Uche is a computer engineer and writer born in Nigeria, living and working near Boulder, Colorado. You can find more about Mr. Ogbuji at his weblog, Copia.