08 Nov 2011 - As a followup to reader comments, the author added a span element to the birthDate and replaced the content of Listing 1.
In June 2011, the triumvirate of Google, Yahoo, and Bing announced schema.org and got everyone talking about structured data. Schema.org is a new way for search engines to understand web pages. If web content authors add a little bit of metadata to their pages (just a few vocabulary terms), all three search engines can present those pages more richly in their search results.
The extra markup hasn't yet changed the way search results are displayed for many sites that have implemented schema.org. Web content authors are still eager, though, to get their pages marked up and ready for consumption by the big three.
Schema.org poses a challenge for web authors who don't have experience with the different syntaxes for adding structured data to HTML. The syntaxes are:
- Microformats
- RDFa
- Microdata
To add to the challenge, Google (the most influential search engine for many web authors) indicated that it will only process microdata. Microdata, which is the newest of the three syntaxes, does not yet have much tool support.
In this article, learn to use Drupal to add microdata to your pages. Prepare your content so it can be used in applications such as Google's Rich Snippets.
Download the source code for this article.
Microdata is a simple way to add structured data to pages. It defines a few attributes,
such as itemtype and itemprop,
that can be placed on HTML tags to indicate what the page is about. Microdata
was introduced by Ian Hickson, the editor of the HTML 5 specification, in 2009. The
roots of the idea existed much earlier than that, though.
Microdata is based on RDFa, which is a way of placing RDF in HTML. The idea for RDFa
was introduced by Mark Birbeck in 2004 with a note published by the W3C. The idea was then incorporated into the next version of XHTML. RDFa introduced several new
HTML attributes, such as property and about, and reused some attributes, such as rel.
RDFa is powerful, but it can be difficult for authors to know if their RDFa is correct due to the sometimes-complex interactions of the attributes. RDFa also inherited some features of XML, such as namespace prefixes, which can be confusing.
Microformats, another approach to structured data in XHTML, were launched a little more than a year later by a grassroots group of developers. In contrast to RDFa, microformats reuse existing XHTML attributes that web content authors were already used to, such as the rel attribute on links. Microformats also add a little bit of semantics within those attributes. An emphasis was placed on only marking up visible content; it's easy for invisible content to be abused or go out of sync with visible content.
One problem with microformats is that there isn't a generic way to parse them. Instead, support has to be added for each microformat. For example, if you want to process both calendar data and address data, you have to make sure your parser supports both or use two different parsers. It can also be difficult to get a new microformat published through the community process.
Microdata brings together good ideas from both microformats and RDFa. Microdata:
- Reduces the complexity of RDFa by reducing the number of attributes and the options for their placement.
- Eliminates the namespace prefixes.
- Maintains the generic parsing of RDFa, which makes it much easier to make tools that work on top of published data.
- Maintains the ability for different groups of people to create their own sets of attribute values, called vocabularies, to use with microdata.
Placing the schema.org vocabulary with microdata
Schema.org is a vocabulary that works well with microdata. Because no approval body is in charge of vocabularies, the search engine owners were able to devise their own vocabulary to meet their needs. Most of the vocabulary deals with the kinds of things Google already focused on for its Rich Snippets: people, places, events, entertainment, and commerce.
Several good examples (see Resources) demonstrate how to place schema.org terms on a site. For example, Listing 1 shows simple markup for a description of a movie enhanced with schema.org terms.
Listing 1. Simple markup for a movie enhanced with schema.org
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
What the extra markup does might not be immediately clear. To get an idea, publish a page with this snippet to the web. You can then enter the URL for that page in Google's Rich Snippets Testing Tool (see Resources), as in Figure 1. If you don't have easy access to a web server, you can also copy and paste the snippet into the live microdata testing tool provided by Opera developer Philip Jägenstedt (see Resources).
Figure 1. Schema.org microdata extracted from the example in Listing 1
The tool pulled out information about two things: the movie and its director.
The two main concepts in microdata are items and properties of those items. A property can be set either to a string or another item. For example, the movie is an item. It has a name, which is a property with a string value. It also has a director, which is a property with an item value—the person.
To let the parser know you're starting to talk about an item, use the itemscope attribute. You can also use the itemtype attribute to let the parser know what type of thing you're talking about. The itemtype determines which properties can be used in the itemprop attribute. For example, on the page for the Movie itemtype you'll find a list of properties that can be used on the movie (see Resources). Other properties outside this list can also be used if you use the full URL of the property. For example, the FOAF vocabulary also specifies a name property. You could use itemprop="http://xmlns.com/foaf/0.1/name" to use the FOAF name property instead of the schema.org name property.
All of the properties inside the Movie's <div> are understood as properties of the movie until you reach the end of the div or until you reach an itemscope on an element inside the Movie, as in Listing 1. The itemscope attribute indicates that you are now talking about a different thing (a Person in this case), so the birthDate property is understood as a property of the Person instead of the Movie.
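The scoping rule described above can be illustrated with a minimal sketch in plain PHP. It is not how Google's tool or the Drupal Microdata module works internally; it only assumes PHP's bundled DOM extension, and the two `sketch_` helper functions are hypothetical names introduced for this illustration. The sketch deliberately ignores parts of the microdata specification such as itemref, itemid, and the special value rules for elements like meta, img, and time.

```php
<?php
// Hypothetical helper: describes one microdata item as a nested array.
function sketch_extract_item(DOMElement $element) {
  $item = array(
    'type' => $element->getAttribute('itemtype'),
    'properties' => array(),
  );
  sketch_collect_properties($element, $item['properties']);
  return $item;
}

// Hypothetical helper: gathers the properties that belong to a single item.
function sketch_collect_properties(DOMElement $parent, array &$properties) {
  foreach ($parent->childNodes as $child) {
    if (!($child instanceof DOMElement)) {
      continue;
    }
    $is_item = $child->hasAttribute('itemscope');
    if ($child->hasAttribute('itemprop')) {
      $name = $child->getAttribute('itemprop');
      if ($is_item) {
        // The property value is another item, such as the director.
        $properties[$name][] = sketch_extract_item($child);
      }
      elseif ($child->hasAttribute('href')) {
        // For links, the value is the URL rather than the link text.
        $properties[$name][] = $child->getAttribute('href');
      }
      else {
        $properties[$name][] = trim($child->textContent);
      }
    }
    // A nested itemscope starts a new item, so its contents are not
    // collected as properties of the current item.
    if (!$is_item) {
      sketch_collect_properties($child, $properties);
    }
  }
}

$html = <<<'HTML'
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
HTML;

libxml_use_internal_errors(TRUE);
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);
// Top-level items have an itemscope but are not a property of another item.
$root = $xpath->query('//*[@itemscope and not(@itemprop)]')->item(0);
$movie = sketch_extract_item($root);
print_r($movie);
```

In the resulting array, birthDate appears under the director item and not among the movie's own properties, which mirrors the two items that the testing tools report.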
Because you added a little structure to your content, it's easy for either of the tools to extract the relevant information. By adding the attributes in the HTML, you made the data in your page easy to process—almost as if it were in an Excel spreadsheet or a database.
Though microdata is fairly simple, it can still be difficult to place and maintain the content by hand. Some tools support the production of microdata, including Drupal's Microdata module (see Resources).
Using Drupal to add microdata to your pages
Drupal is a content management system that powers an estimated 2% of the web. With its user interface, site administrators can create forms to collect content from users. Drupal then automatically creates the appropriate tables and fields in the database for the form data and handles the display of the data in a configurable way.
Drupal is particularly well suited for outputting structured data because of the way the content is handled—as discrete things (called entities) that have properties in the form of field values. With Drupal 7, the capability to add structured data to HTML using RDFa was incorporated into the Drupal core.
Since the schema.org announcement on 02 June 2011, work has progressed to also add the same support for microdata output. The microdata module is still under development and isn't ready for use on live sites. For experimentation on testing sites, you can use the microdata module to generate microdata for fields and test the Rich Snippet displays based on that microdata.
Start by recreating the example above using Drupal. See Resources to download and enable the latest release of the following modules:
- Microdata
- Entity API
- CTools
A content type allows users to define what field values are collected and stored for an entity. For example, you might create a product content type that has form fields for collecting the price, available colors, sizes, and manufacturer's model number, which makes it easy to maintain an inventory.
For this exercise, you'll create a movie content type. Go to Structure > Content Types, click the Add Content type link, and enter the following information.
- Name: Movie
- Description: A page describing a movie
- Comment settings: Select Closed. You don't need the comment function on that page.
- Microdata settings: Add the itemtype http://schema.org/Movie. The title is a special kind of field and does not have its own edit screen, so you add the title here as well. Use the name property to mark up the title.
You can test whether this example worked by creating a new Movie item. Go to Add content and create the Movie; the example here uses the title Cool Hand Luke. After you create it, use the Rich Snippets testing tool to determine if you can extract the data from the page. You should see a single item with a Type of http://schema.org/movie and a name of Cool Hand Luke, as in Figure 2.
Figure 2. Microdata extracted after mapping the content type and title
The content type was recognized as being a Movie with a title. However, there's more information about this movie.
Fields are attached to content types to collect extra information about the content. In the example, add the genre of a movie as its own field.
To add the genre to the content type, go to Structure > Content types and click Manage fields for the Movie content type. You'll use a text field to collect the genre. Enter the following information.
- Label: Genre
- Field name: genre
- Field type: Text
- Field widget: Text field
Click Save field settings on the next page. At the bottom of the field
instance configuration form, you will see Genre Microdata
Mapping, as in Figure 3. Set the field property to genre and click Save.
Figure 3. Interface for mapping the text field
Edit your piece of content and add the genre of the movie. Refresh the Rich Snippet. The genre now displays with the type and name.
Though the example didn't demonstrate images, you can add an image, such as the movie poster, to this content type. A thumbnail of the image then displays for the Rich Snippet.
To add the image to the content type, go to Structure > Content types and click Manage fields for the Movie content type. Enter the following information.
- Label: Poster
- Field name: poster
- Field type: Image
- Field widget: Image
Use the image schema.org property
for the poster. In the field property field, enter image, as in Figure 4.
Figure 4. Interface for mapping the image field
Save and edit the movie to add an image. Retest the Rich Snippet. You should see the
image property with its URL,
http://lin-clark.com/sites/default/files/cool-hand-luke.jpg,
as in Figure 5. The single item also has a Type of http://schema.org/movie,
a name of Cool Hand Luke, and
a genre of prison drama.
Figure 5. Microdata extracted from the text and image field
You might also see a Rich Snippet displayed with a thumbnail of the poster, as in Figure 6. Google's testing tool is under very active development; the display of the Rich Snippet for the same markup changes over time. This Rich Snippet was captured on 14 September, but the display changed by 19 September.
Figure 6. Rich Snippet displayed for movie
Enabling microdata in field formatters
Text and image fields cover a lot of the data that people usually put on a site, but there are other types of data. To cover all kinds of data that a site administrator might need, Drupal's field system gives users a selection of basic field types and provides an API so that modules can define new field types. Within these modules, you can define different data collection forms (widgets), data storage, and display (formatters) for each field type. Site administrators can then install such field modules and configure the widgets and formatters without having to write any code.
Microdata has strict requirements about where to place the microdata attributes in the HTML, so each field type in Drupal needs to define where to place the attribute within its formatters. While microdata is supported for most field types defined by core, many widely used field types still do not support microdata.
To use a field formatter defined in a contributed module, you can check the table that tracks microdata support. Even if the field formatter isn't supported yet, that doesn't mean you can't use it. It's easy to add microdata support to a field formatter. You can even contribute microdata support back to the module by creating a patch with your changes. This is a great way to get started with the Drupal developer community.
In the example from schema.org, a link to the movie's trailer was marked up. At the time of this writing, the link field formatter defined by the Drupal Link module doesn't support microdata, but you can change that.
You'll add microdata support to the Link module. The examples below use the Link module code from 20 September 2011, which is provided in the download file with this article. (The current version of the Link module has changed and might already contain microdata support.)
The link field has two different bits of data that you might want to expose using microdata:
- The URL for the link
- The text that is linked to that URL
At this point, you need to notify the system of these two properties through a part of the Entity API module: the Entity Property API.
You must add the information to the field definition, which is registered by link_field_info. Add the property_type for
the field itself and the property_callbacks, as in Listing 2.
Listing 2. Add property information for the field to link_field_info
/**
 * Implements hook_field_info().
 */
function link_field_info() {
  return array(
    'link_field' => array(
      'label' => t('Link'),
      'description' => t('Store a title, href, and attributes in the database to assemble a link.'),
      // ...
      // Tell the Entity Property API which data type the field holds and
      // which callback registers its component properties.
      'property_type' => 'field_item_link',
      'property_callbacks' => array('link_field_property_info_callback'),
    ),
  );
}
The property type lets the system know the data type of the field. Because field_item_link isn't a recognized data type or entity, the data type
defaults to struct when it is processed.
This struct acts as a container for the properties that you mark
up (the link URL and linked text). Because it is simply a container, you don't enable
microdata for the field itself—only for its properties.
The property callback is a function that registers the same
property type information for the component properties. To mark up the
properties with microdata, set microdata to TRUE for
each property, as in Listing 3. This provides the graphical user
interface for adding microdata for these properties.
Listing 3. Register the field's properties with the property callback
/**
 * Additional callback to adapt the property info of link fields.
 *
 * @see entity_metadata_field_entity_property_info()
 */
function link_field_property_info_callback(&$info, $entity_type, $field, $instance, $field_type) {
  $property = &$info[$entity_type]['bundles'][$instance['bundle']]['properties'][$field['field_name']];
  // Register the two component properties and enable microdata for each.
  $property['property info'] = array(
    'title' => array(
      'type' => 'text',
      'label' => t('The title of the link.'),
      'microdata' => TRUE,
    ),
    'url' => array(
      'type' => 'uri',
      'label' => t('The URL of the link.'),
      'microdata' => TRUE,
    ),
  );
  // Links configured without a title have no title property to mark up.
  if ($instance['settings']['title'] == 'none') {
    unset($property['property info']['title']);
  }
}
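To see what the callback in Listing 3 leaves behind, the following sketch exercises the same logic outside Drupal. Everything around the callback is an assumption made for illustration: the t() stand-in, the heavily simplified $info array, the field name field_trailer, and the 'optional' title setting are hypothetical and do not reproduce the real structures that the Entity API passes in.

```php
<?php
// Stand-in for Drupal's t() so that the sketch runs outside Drupal.
if (!function_exists('t')) {
  function t($string) {
    return $string;
  }
}

// Same logic as Listing 3, repeated so that this sketch is self-contained.
function link_field_property_info_callback(&$info, $entity_type, $field, $instance, $field_type) {
  $property = &$info[$entity_type]['bundles'][$instance['bundle']]['properties'][$field['field_name']];
  $property['property info'] = array(
    'title' => array('type' => 'text', 'label' => t('The title of the link.'), 'microdata' => TRUE),
    'url' => array('type' => 'uri', 'label' => t('The URL of the link.'), 'microdata' => TRUE),
  );
  if ($instance['settings']['title'] == 'none') {
    unset($property['property info']['title']);
  }
}

// Hypothetical, simplified input: one link field on a movie bundle.
$info = array(
  'node' => array('bundles' => array('movie' => array('properties' => array(
    'field_trailer' => array('type' => 'field_item_link'),
  )))),
);
$field = array('field_name' => 'field_trailer');
$with_title = array('bundle' => 'movie', 'settings' => array('title' => 'optional'));
$without_title = array('bundle' => 'movie', 'settings' => array('title' => 'none'));

// Run the callback once for each instance configuration.
$info_with = $info;
link_field_property_info_callback($info_with, 'node', $field, $with_title, 'link_field');
$info_without = $info;
link_field_property_info_callback($info_without, 'node', $field, $without_title, 'link_field');

$props_with = array_keys($info_with['node']['bundles']['movie']['properties']['field_trailer']['property info']);
$props_without = array_keys($info_without['node']['bundles']['movie']['properties']['field_trailer']['property info']);

print_r($props_with);
print_r($props_without);
```

Because $info is taken by reference, the callback changes the caller's array in place. When the instance is configured without a link title, only the url property remains available for mapping.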
The user interface pulls the label from the property information and uses the type
to determine which kind of form fields to display. If the property is an item
instead of a string, an itemtype field also displays. Figure 7 shows an example for two properties of a trailer: the
link title and link URL.
Figure 7. Link microdata mapping form
You can now specify which vocabulary terms to use for the field's properties on the field configuration form. However, the attributes aren't inserted into the HTML until you add a little more code.
Adding microdata to the themed output
To place the microdata, you need to change the HTML output for the field. For example,
to add a link to a software application, you might want the link text (the name of the software) to use the name property
and the link itself to use the url property. Listing 4 shows how to do this by adding the itemprop of the URL to the <a> tag and inserting a span with
the itemprop of the text around the text content.
Listing 4. A link before and after adding microdata
Before:
<a href="http://drupal.org">Drupal</a>
After:
<a itemprop="url" href="http://drupal.org"><span itemprop="name">Drupal</span></a>
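The transformation in Listing 4 can be sketched in plain PHP, without any Drupal functions. The helpers sketch_attributes() and sketch_link() are hypothetical names used only for this illustration; they loosely mimic the idea of rendering an attribute array onto a tag and are not the Link module's code.

```php
<?php
// Hypothetical helper: renders an attribute array as an HTML attribute string.
function sketch_attributes(array $attributes) {
  $output = '';
  foreach ($attributes as $name => $value) {
    // Attribute values may be given as a list, for example several itemprops.
    if (is_array($value)) {
      $value = implode(' ', $value);
    }
    $output .= ' ' . $name . '="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '"';
  }
  return $output;
}

// Hypothetical helper: builds a link, optionally wrapping the link text in a
// span that carries its own attributes.
function sketch_link($text, $url, array $link_attributes = array(), array $text_attributes = array()) {
  $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
  if (!empty($text_attributes)) {
    $text = '<span' . sketch_attributes($text_attributes) . '>' . $text . '</span>';
  }
  // Append href after any attributes that were passed in.
  $link_attributes += array('href' => $url);
  return '<a' . sketch_attributes($link_attributes) . '>' . $text . '</a>';
}

$plain = sketch_link('Drupal', 'http://drupal.org');
$marked_up = sketch_link(
  'Drupal',
  'http://drupal.org',
  array('itemprop' => 'url'),
  array('itemprop' => 'name')
);

echo $plain, "\n";
echo $marked_up, "\n";
```

The itemprop for the URL travels with the other attributes of the <a> tag, while the itemprop for the link text needs an extra <span> element to sit on.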
Things would be easier if you could get the Link module to insert these attributes. To
transform the content from the database for the field into HTML, each field formatter
module has its own view function. Within the view function, some formatters use theme
functions to generate the HTML. An example is theme_link_formatter_link_default(). Often, the microdata attributes need to be passed from the field_formatter_view function into the theme function.
In the Link module, the formatter already passes an array of attributes to be placed on
the <a> tag using the item variable. You can add the URL itemprop to that array to have it automatically output where you need
it, as in Listing 5.
Listing 5. Adding microdata in hook_field_formatter_view
/**
 * Implements hook_field_formatter_view().
 */
function link_field_formatter_view($entity_type, $entity, $field, $instance, $langcode, $items, $display) {
  $elements = array();
  $microdata = array();
  // If the microdata module is enabled, the microdata mapping will have been
  // passed in via the entity. The isset() check avoids a notice for entities
  // that carry no mapping for this field.
  if (module_exists('microdata') && isset($entity->microdata[$field['field_name']])) {
    $microdata = $entity->microdata[$field['field_name']];
  }
  foreach ($items as $delta => $item) {
    // Add the url attributes to $item['attributes'] because the theme function
    // will pass it through to l(), properly placing the itemprop for the url.
    if (isset($microdata['url'])) {
      $item['attributes'] += $microdata['url']['#attributes'];
    }
    // Pass the microdata array to the theme function so it can be used to place
    // the link title's attribute.
    $elements[$delta] = array(
      '#markup' => theme('link_formatter_' . $display['type'], array(
        'element' => $item,
        'field' => $instance,
        'microdata' => $microdata,
      )),
    );
  }
  return $elements;
}
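One detail in Listing 5 is worth a closer look: the microdata attributes are combined with the existing link attributes using the += array union operator. A small sketch of that operator's behavior follows; the attribute values are made up for illustration and are not taken from the Link module.

```php
<?php
// Attributes that the formatter already carries for the <a> tag
// (hypothetical values).
$attributes = array(
  'href' => 'http://drupal.org',
  'class' => 'trailer',
);

// Attributes coming from a microdata mapping (hypothetical values). The class
// key is included only to show what happens when keys collide.
$microdata_attributes = array(
  'itemprop' => 'url',
  'class' => 'ignored',
);

// Array union keeps every key that already exists on the left-hand side and
// only adds the keys that are missing.
$attributes += $microdata_attributes;

print_r($attributes);
```

With the union operator, existing attributes on the link are never overwritten by the microdata mapping; array_merge() would instead let the right-hand array win for string keys. Once the itemprop is in the attribute array, it is output on the <a> tag along with the other attributes.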
There is no automatic way to place the attributes for the text content, however. You have to pass them into the theme function and change the theme function to use them.
After you pass the microdata variables to the theme function, you can add the <span> tag containing the itemprop around the title. The code checks whether there is an itemprop for the text and, if there is, wraps the title in a span that carries the microdata attributes, as in Listing 6.
Listing 6. Add microdata in the theme function
/**
 * Theme function for 'default' text field formatter.
 */
function theme_link_formatter_link_default($vars) {
  $url = $vars['element']['url'];
  $microdata = $vars['microdata'];
  // If there is an itemprop set for the title, wrap the title in a span and
  // add the itemprop to that span.
  if (!empty($microdata['title'])) {
    $title = '<span ' . drupal_attributes($microdata['title']['#attributes'])
      . '>' . $vars['element']['title'] . '</span>';
  }
  else {
    $title = $vars['element']['title'];
  }
  // Create the array of options to pass to l(). The title and URL are passed
  // to l() as separate arguments, so remove them from the options.
  $link_options = $vars['element'];
  unset($link_options['title']);
  unset($link_options['url']);
  // Display a normal link if both title and URL are available.
  if (!empty($title) && !empty($url)) {
    return l($title, $url, $link_options);
  }
  // If only a title, display the title.
  elseif (!empty($title)) {
    return check_plain($title);
  }
  // If only a url, display the full url as a link.
  elseif (!empty($url)) {
    return l($url, $url, $link_options);
  }
}
You can now test the microdata output for the formatter.
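Besides the online testing tools, a quick local check is possible. The sketch below assumes that you have copied the formatter's HTML output into a string (here, the marked-up link from Listing 4) and uses PHP's bundled DOM extension to confirm that the attributes landed where expected. The variable names are illustrative only.

```php
<?php
// Sample formatter output; in practice, paste the HTML rendered for the field.
$output = '<a itemprop="url" href="http://drupal.org"><span itemprop="name">Drupal</span></a>';

libxml_use_internal_errors(TRUE);
$document = new DOMDocument();
$document->loadHTML($output);
$xpath = new DOMXPath($document);

// The URL itemprop should sit on the <a> tag itself.
$links = $xpath->query('//a[@itemprop="url"]');
// The text itemprop should sit on a span inside that link.
$titles = $xpath->query('//a[@itemprop="url"]/span[@itemprop="name"]');

$url_ok = $links->length === 1
  && $links->item(0)->getAttribute('href') === 'http://drupal.org';
$title_ok = $titles->length === 1
  && $titles->item(0)->textContent === 'Drupal';

echo $url_ok ? "url property found\n" : "url property missing\n";
echo $title_ok ? "name property found\n" : "name property missing\n";
```

A check like this only confirms where the attributes are placed; it does not replace the Rich Snippets testing tool, which shows how a search engine interprets the page.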
Contributing your changes back to the community
One of the things that makes Drupal a powerful technical solution is the large number of contributors who make up its community. Contributors aren't just people who live and breathe Drupal; many contributors make the occasional code fix for their own sites, which they then post as a patch for others to use.
If you add microdata to a field formatter for your own project, you can contribute that work back to the Drupal community. Simply post an issue in the issue queue for the module and suggest that the module support microdata. This type of issue is called a feature request. You can then post a patch with your changes on the issue. (There are some great tutorials that demonstrate how to create patches for Drupal projects.) Once you've posted the patch, mark the issue as "needs review."
In this article, you learned to use Drupal to add microdata to your pages so your content can be used in applications like Google's Rich Snippets. With the new microdata module you can configure microdata output for basic field types and add microdata output to custom field types. Now your data is available for others to make applications on top of it.
| Description | Name | Size | Download method |
|---|---|---|---|
| Article source code | microdata-source2.zip | 820KB | HTTP |
Learn
- Schema.org: Learn more about this collection of schemas, which are HTML tags that webmasters can use to mark up their pages in ways recognized by major search providers.
- Getting started with schema.org: In these tutorials, learn to mark up your content using microdata and to use the schema.org vocabulary. Advanced topics are also covered.
- Itemtype URL: Find the properties that you can use on a schema.org item by visiting the itemtype URL (http://schema.org/Movie, for example).
- Microdata support: Find out if a field formatter has microdata support.
- Data types: See how microdata in Drupal uses the entity properties.
- The Semantic web, Linked Data and Drupal, Part 1: Expose your data using RDF (Lin Clark, developerWorks, April 2011): Make your web data more interoperable and your data sharing more efficient. An example shows how to use Drupal 7 to publish Linked Data by exposing content with RDF.
- The Semantic web, Linked Data and Drupal, Part 2: Combine linked datasets with Drupal 7 and SPARQL Views (Stéphane Corlosquet and Lin Clark, developerWorks, May 2011): Learn to use the existing Linked Data available today on the web of data, and how to enrich a Drupal 7 site with data coming from different endpoints.
- Creating patches for Drupal projects: Learn what patches are and how to work with them in the context of the Drupal project. From Drupal's HTML5 initiative lead, Jacine Luisi.
- Scientific American article on the Semantic web: Read this seminal article by Tim Berners-Lee, James Hendler and Ora Lassila.
- Linked Data: Read the ReadWriteWeb interview about linked data with Tim Berners-Lee.
- Linked Data Design Issues: Learn more about linked data from Tim Berners-Lee.
- Rich snippets (microdata, microformats, and RDFa) - Webmaster Tools Help: Learn more about Google Rich snippets and how to label your web content to indicate clearly the data type, such as a restaurant name, an address, or a rating.
- Implement Semantic web standards in your Web site (Rob Crowther, developerWorks, May 2008): Create a simple social networking site using PHP and MySQL, which implements Semantic web standards such as hCard and Friend of a Friend (FOAF) as part of a semantic Uniform Resource Identifier (URI) scheme.
- Developing Drupal publications to support standards-based XML (Garrick Bodine and Stephanie Schlitz, developerWorks, Feb 2011): Learn how to customize your Drupal installation to support the publication of TEI (or other) XML documents.
- Drupal Installation Guide: Read about preparing for installation, running the installation script itself, and the steps to do after running the installation script completes.
- Install Drupal 7 with the Acquia Stack Installer: Get step-by-step instructions in this video.
- FOAF Vocabulary Specification 0.98: Explore the FOAF language, defined as a dictionary of named properties and classes using W3C's RDF technology.
- Dublin Core Metadata Initiative (DCMI): Learn about this open organization engaged in the development of interoperable metadata standards that support a broad range of purposes and business models.
- SIOC (Semantically-Interlinked Online Communities) Core Ontology Specification: Learn the main concepts and properties required to describe information from online communities (such as message boards, wikis, or weblogs) on the Semantic web.
- SPARQL Explorer for http://dbpedia.org/sparql: Try a demonstration query interface available on the web.
- New to XML? Get the resources you need to learn XML.
- XML area on developerWorks: Find the resources you need to advance your skills in the XML arena, including DTDs, schemas, and XSLT. See the XML technical library for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- developerWorks on Twitter: Join today to follow developerWorks tweets.
- developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
- developerWorks on-demand demos: Watch demos ranging from product installation and setup for beginners to advanced functionality for experienced developers.
Get products and technologies
- Acquia Drupal: Get the freely available packaged distribution of the open source Drupal social publishing system.
- Google's Rich Snippets Testing Tool: Test your schema.org markup.
- Google Rich Snippets, Field collection, and Entity API: Download the modules and be sure to get the development releases.
- Live Microdata testing tool: Get another tool, created by Opera developer Philip Jägenstedt, for testing microdata.
- IBM product evaluation versions: Download or explore the online trials in the IBM SOA Sandbox and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Discuss
- developerWorks profile: Create your profile today and set up a watchlist.
- XML zone discussion forums: Participate in any of several XML-related discussions.
- The developerWorks community: Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

Lin Clark is a Drupal developer specializing in Linked Data. She is the maintainer of multiple Drupal modules, such as Microdata and SPARQL Views, and is an active participant in the W3C’s HTML Data Task Force and Drupal's HTML5 initiative. She attended Carnegie Mellon University and is finishing a research masters degree at the Digital Enterprise Research Institute at NUI Galway. More information is available at lin-clark.com.




