Make your HTML pages smarter with RDFa 1.1 Lite

Optimize for search engines and other sophisticated tools

Resource Description Framework (RDF) has evolved into increasingly pragmatic formats over time. RDF in attributes (RDFa) has been particularly successful as a system for annotating HTML documents inline on the web. It is supported by Google and other search engines in the form of Rich Snippets. The emergence of microdata and the Schema.org initiative applied pressure to simplify RDFa even further. The W3C took action and produced a radically simplified version: RDFa 1.1 Lite. In this article, learn about RDFa Lite, and get a head start on producing and processing the shape of Rich Snippets to come.


Uche Ogbuji, Partner, Zepheira, LLC

Uche Ogbuji is a partner at Zepheira, where he oversees creation of sophisticated web catalogs and other richly contextual databases. He has a long history of pioneering in advanced web technologies such as XML, semantic web and web services, and open source projects like Akara, an open source platform for web data applications. Uche is a computer engineer and writer born in Nigeria, living and working near Boulder, Colorado. You can find more about Mr. Ogbuji at his weblog, Copia.



01 May 2012


Overview

Historically, technologies on the web have focused on how pages and sites looked and felt. The most important considerations were a visually attractive design, links and JavaScript applications that worked as intended, and support for the latest multimedia features. Over time, it also became increasingly important to have a prominent listing on search engines. Alas, search engines see none of the design or dynamic behavior of a website; they see only markup, most of which says little more than that some text is boldface, italic, or set aside as a content block.

Frequently used abbreviations

  • HTML: HyperText Markup Language
  • RDF: Resource Description Framework
  • SEO: Search Engine Optimization

Web authors began to realize they needed more than presentational flourishes. They needed to tell a search engine, or other such software, what the structured data on a page really was: this is a price, this is an event's date, and this other item is a person's contact information. There was a need for a "web of data," an idea that had long been the main concern of those interested in Resource Description Framework (RDF).

The idea behind RDF is very simple, but unfortunately much of how it is defined, discussed, and processed is far too complex. Regarding web technology, a common pragmatic tip is: "Make sure a regular web author can learn it in a day." It has taken many years, but there's finally a flavor of RDF that passes this test: RDFa 1.1 Lite.

RDFa Lite is a simplified version of RDF in attributes (RDFa), a mechanism for encoding the full RDF data model within HTML and similar vocabularies. RDFa is a bit too complicated to meet the "learn it in a day" threshold, so the WHATWG (the group that spawned HTML 5) created an HTML Microdata specification. Microdata became the kernel of Schema.org (see Resources), an initiative to codify ways to mark up data on the web. Microdata has since become a W3C Working Draft (see Resources), but advocates of RDF and RDFa saw an opportunity to promote RDFa if they had a subset at least as simple as Microdata. With RDFa Lite, you get the benefits of a long-settled data model in a simple syntax.

Last year, Schema.org founders conducted a workshop for community experts in structured data on the web. I attended the workshop, where Ben Adida, editor of the RDFa specification, showed us the work the RDF Web Applications Working Group had done on RDFa Lite. The very positive feedback to this draft specification inspired the W3C to redouble those efforts. RDFa 1.1 Lite is a mercifully short working draft from the W3C (and may already be a full W3C recommendation by the time you read this).

In this article, learn about RDFa 1.1 Lite, and get a jump-start on producing HTML pages that participate in the web of data. This article assumes that you are already familiar with HTML.


Attributes to clarify your content

Compatibility

RDFa is designed to be compatible with HTML 4, HTML 5, or XHTML.

RDFa involves marking up the structured data within a website. You don't simply focus on how the content should look to the user. You can mark a date as a date, a person's name as a person's name, an event as an event, an organization as an organization, and so on. RDFa Lite reduces that lofty ambition to the simplest thing that could possibly work: the addition of just five attributes to HTML or XHTML. A machine can easily interpret these attributes to extract useful data from the page. That's it, in a nutshell.

Listing 1 is an example of plain HTML for an online article.

Listing 1. Plain HTML for an online article
<body>

<a href="http://www.ibm.com/developerworks/">IBM developerWorks</a> >
 <a href="http://www.ibm.com/developerworks/web/">Web development</a> >
 <a href="http://www.ibm.com/developerworks/views/web/library.jsp">Technical library</a>

<div>
An introduction to RDF by <a href="http://uche.ogbuji.net">Uche Ogbuji</a>, 
Partner, Zepheira.
</div>

<div>Published: 01 Dec 2000</div>

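<div>
<b>Summary</b>: This article introduces Resource Description Framework (RDF),
developed by the W3C for Web-based metadata, using XML as an interchange syntax.
RDF's essential aim is to make work easier for autonomous agents,
which would refine the Web by improving search engines and service directories.
Author Uche Ogbuji gives an overview of RDF aspects from schemas to usage scenarios.
The article assumes that you are already familiar with XML.
</div>

RDF, which was developed by the W3C for web-based metadata, uses XML as an interchange syntax. RDF's essential goal is to make work easier for autonomous agents, which would refine the web by improving search engines and service directories. See "An introduction to RDF" (developerWorks, December 2000) for more information about RDF.

In the remainder of this article, we'll annotate the HTML in Listings 1 and 2, which together make up one page, with RDFa Lite to demonstrate all five attributes from the spec: vocab, typeof, property, resource, and prefix. Listing 2 shows the rest of the page, through the closing body tag.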

Listing 2. HTML for RDFa Lite demonstration
<div>Tags for this article: introduction, rdf, tutorial.</div>

<div>
  This article's text is suitable for a wide audience, with a Fog index of 10.2.
</div>

</body>

The vocab attribute

Listing 3 shows the change to the body element to wrap the whole article description.

Listing 3. Using the vocab attribute
<body vocab="http://schema.org/">
    ...
</body>

The vocab attribute sets the stage in RDFa Lite. It designates the vocabulary that will be used, by default, in the annotations. The example uses a vocabulary defined in Schema.org, so we pick a wrapper element to mark it as such. You don't have to use the body element, but you should use an element that wraps all the annotations.

The typeof attribute

Listing 4 shows the outline of a div to enclose the one article being described.

Listing 4. Using the typeof attribute
<div typeof="Article">
    ....
</div>

The typeof attribute marks its element as a description or representation of an instance of the given class. This is put into context by the in-scope vocab designation of http://schema.org/, making it clear whose definition of an "article" we're dealing with. Article is one of the classes defined in Schema.org. Each class definition is documented, including conventions for its use and its associated properties.

The property attribute

Listing 5 shows the fragment that marks the article title.

Listing 5. Using the property attribute
<div property="name">An introduction to RDF</div>

The property attribute provides a property of the entity that was just declared as a Schema.org article. The name property is defined to give the title of the Schema.org Article.
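To see what the in-scope vocab contributes, it can help to spell the same annotations out without it. The fragment below is a minimal sketch, assuming a processor that accepts full IRIs in typeof and property, which the RDFa syntax allows. Each short term is written as the vocab value with the term appended, which is how such terms are resolved.

```html
<!-- No vocab in scope: the class and the property are written as full IRIs -->
<div typeof="http://schema.org/Article">
  <div property="http://schema.org/name">An introduction to RDF</div>
</div>
```

The short form with vocab carries the same meaning and is easier to write and read.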

Listing 6 shows the first three RDFa Lite attributes.

Listing 6. Three RDFa Lite attributes
<body vocab="http://schema.org/">
...
<div typeof="Article">

<div property="name">An introduction to RDF</div>

...More information regarding this article, or even the content itself...

</div>
...
</body>

Listing 6 is the RDFa way of saying the following so that a machine can process it:

This HTML document represents an article as defined by Schema.org, and the title of the
article is "An introduction to RDF."

More sophisticated statements

The HTML specification already provides a machine-processable place to put an article title—in the head element. Schema.org's vocabulary, however, lets you express far more about an article than just a title, including constructs that are not in HTML (46 properties, in fact, at the time of writing). Even in the case of properties such as title, what HTML provides has not always been sufficient. What if you have a page comprising an index of articles or a landing page from a search? You would be describing multiple actual article entities within one HTML page. HTML has no provisions for this case, but with Schema.org you could have multiple div elements—one for each referenced article.
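As a sketch of that index-page case, the fragment below describes two articles within one page. The titles and dates are those of this article and of the one it annotates; the layout itself is only an illustration, not markup from a real index page.

```html
<body vocab="http://schema.org/">

<!-- First article entity -->
<div typeof="Article">
  <div property="name">An introduction to RDF</div>
  <div>Published: <span property="datePublished">01 Dec 2000</span></div>
</div>

<!-- Second article entity; its properties stay separate from the first -->
<div typeof="Article">
  <div property="name">Make your HTML pages smarter with RDFa 1.1 Lite</div>
  <div>Published: <span property="datePublished">01 May 2012</span></div>
</div>

</body>
```

Each div with typeof starts a new entity, so every property inside it attaches to that article rather than to the page as a whole.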

The resource attribute

If you can describe multiple entities on a page, then it becomes important to be able to give each one a handle. RDFa provides the resource attribute for this purpose. Listing 7 shows a snippet annotating the article's author.

Listing 7. The resource attribute
<p property="author" resource="#uche.ogbuji" typeof="Person">
   <span property="name">Uche Ogbuji</span>, <span property="jobTitle">Partner</span>,
   <span property="worksFor">Zepheira</span>. 01 Dec 2000.
</p>

The snippet adds the detail that "The article is authored by a person named Uche Ogbuji, who is a Partner at Zepheira." Notice how the p element has been marked as a description of a new entity, of type Person, as defined by Schema.org. The p element also has a property attribute that connects this entity to the enclosing one (the article).

The resource attribute gives a handle to the entity described in the p element. It is a fragment identifier relative to the page's URL itself. For example, if this page is at http://www.ibm.com/developerworks/library/w-rdf, there is now an entity that can be understood with the handle of http://www.ibm.com/developerworks/library/w-rdf#uche.ogbuji.
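Because the handle is an ordinary URL, markup on another page can point at the same entity. The sketch below assumes the page address used in the example above and a second, hypothetical article by the same author. The span has no typeof, so it refers to the existing entity rather than describing a new one.

```html
<div vocab="http://schema.org/" typeof="Article">
  <div property="name">Another article (hypothetical title)</div>
  <!-- resource holds the full handle, so the author is the entity declared on the other page -->
  by <span property="author"
           resource="http://www.ibm.com/developerworks/library/w-rdf#uche.ogbuji">Uche Ogbuji</span>
</div>
```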

Zepheira is an organization. You could take this example further, by expressing the <span property="worksFor">Zepheira</span> as yet another nested entity, using the Organization class in Schema.org. RDFa is all about what you think is important to annotate. In this case, we're interested in more detail about the author but not necessarily about the organization he works for.
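If you did want that extra detail, a sketch of the nested form might look like the following. It assumes only the Organization class and the name property from Schema.org. Adding typeof to the worksFor span turns it from a plain text value into an entity of its own.

```html
<p property="author" resource="#uche.ogbuji" typeof="Person">
   <span property="name">Uche Ogbuji</span>, <span property="jobTitle">Partner</span>,
   <!-- worksFor now points at a nested Organization entity, not at a string -->
   <span property="worksFor" typeof="Organization">
     <span property="name">Zepheira</span></span>.
</p>
```

The inner name property belongs to the organization, while the outer one still belongs to the person.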

The prefix attribute

The final attribute defined by RDFa Lite is prefix, which is used to combine multiple vocabularies in one description. Listing 8 shows how you might annotate the readability index of the article.

Listing 8. The prefix attribute
<div prefix="fben: http://rdf.freebase.com/ns/">
  This article's text is suitable for a wide audience, with a Fog index of 
  <span property="fben:gunning_fog_index">10.2</span>.
</div>

Schema.org does not include a property for a text's Gunning-Fog readability index, but Freebase does. Freebase, which is a bit like a more structured version of Wikipedia, manages schemata and descriptions for a wide variety of entities and entity types. I defined a prefix to refer to properties as defined in Freebase, rather than the default vocabulary of Schema.org, then used that prefix around the text of the readability index value for this article.
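The prefix attribute can hold more than one mapping, written as a whitespace-separated series of prefix and URL pairs. The sketch below is an illustration rather than part of the running example: it assumes a second prefix, dc, mapped to the Dublin Core terms vocabulary at http://purl.org/dc/terms/, alongside the Freebase one.

```html
<div vocab="http://schema.org/" typeof="Article"
     prefix="fben: http://rdf.freebase.com/ns/ dc: http://purl.org/dc/terms/">
  <!-- Unprefixed terms still come from the default vocabulary (Schema.org) -->
  <div property="name">An introduction to RDF</div>
  <!-- Prefixed terms come from the vocabulary mapped to that prefix -->
  Language: <span property="dc:language">en</span>.
  Fog index: <span property="fben:gunning_fog_index">10.2</span>.
</div>
```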


Pulling the example together

Listing 9 pulls together everything demonstrated so far and adds back bits such as the breadcrumbs.

Listing 9. Using all five RDFa Lite attributes
<body vocab="http://schema.org/">
...
<div property="breadcrumb">
  <a href="http://www.ibm.com/developerworks/">IBM developerWorks</a> >
    <a href="http://www.ibm.com/developerworks/web/">Web development</a> >
    <a href="http://www.ibm.com/developerworks/views/web/library.jsp"
        >Technical library</a>
</div>

<div typeof="Article">

<div property="name">An introduction to RDF</div>

<p property="author" resource="#uche.ogbuji" typeof="Person">
   by <span property="name">Uche Ogbuji</span>, <span property="jobTitle">Partner</span>,
   <span property="worksFor">Zepheira</span>.
</p>

<div>Published: <span property="datePublished">01 Dec 2000</span></div>

<div property="description">
  <b>Summary</b>: This article introduces Resource Description Framework (RDF),
  developed by the W3C for Web-based metadata, using XML as an interchange syntax.
  RDF's essential aim is to make work easier for autonomous agents, 
  which would refine the Web by improving search engines and service directories. 
  Author Uche Ogbuji gives an overview of RDF aspects from schemas to usage scenarios.
  The article assumes that you are already familiar with XML.
</div>

<div>Tags for this article: 
  <span property="keywords">introduction</span>,
  <span property="keywords">rdf</span>,
  <span property="keywords">tutorial</span>
.</div>

<div prefix="fben: http://rdf.freebase.com/ns/">
  This article's text is suitable for a wide audience, with a Fog index of 
  <span property="fben:gunning_fog_index">10.2</span>.
</div>

</div>
...
</body>

There is nothing new here structurally, though Listing 9 does show you a few more Schema.org properties you can use for an article.


Wrap-up

There are other vocabularies besides Schema.org, including the venerable Dublin Core (which has core support in RDFa) and Facebook's Open Graph protocol. But, given its high-profile backers, Schema.org has been getting the most press lately. It is just the latest chapter in the long saga of search-engine optimization (SEO).

Though SEO might be a marketing concept, it benefits greatly from core web engineering practices such as accessibility, clean markup, and annotating pages where possible with more information about what their contents actually mean. Developments such as RDFa Lite should put powerful SEO within the reach of almost any web author. I hope this article helps you learn RDFa Lite in a day; I encourage you to start annotating your pages right away.

Resources

Learn

Get products and technologies

Discuss
