Combine Drupal, HTML5, and microdata

Make your content easier to find and reuse

With Google, Yahoo, and Bing's announcement of schema.org, microdata is quickly gaining ground as a way to create applications that rely on data from many different websites. In this article, learn how to use Drupal to add microdata to your pages. Easily make your content available for use in applications such as Google's Rich Snippets.

Lin Clark, Drupal Developer, Digital Enterprise Research Institute, NUI Galway

Lin Clark is a Drupal developer specializing in Linked Data. She is the maintainer of multiple Drupal modules, such as Microdata and SPARQL Views, and is an active participant in the W3C’s HTML Data Task Force and Drupal's HTML5 initiative. She attended Carnegie Mellon University and is finishing a research master's degree at the Digital Enterprise Research Institute at NUI Galway. More information is available at lin-clark.com.



08 November 2011 (First published 01 November 2011)


08 Nov 2011 - As a followup to reader comments, the author added a span element to the birthDate and replaced the content of Listing 1.

Introduction

In June 2011, the triumvirate of Google, Yahoo, and Bing announced schema.org and got everyone talking about structured data. Schema.org is a shared vocabulary that helps search engines understand web pages. If web content authors add a little bit of metadata to their pages (just a few vocabulary terms), then all three search engines can use that metadata to present richer results for those pages.

The extra markup hasn't yet changed the way search results are displayed for many sites that have implemented schema.org. Web content authors are still eager, though, to get their pages marked up and ready for consumption by the big three.

Schema.org poses a challenge for web authors who don't have experience with the different syntaxes for adding structured data to HTML. The syntaxes are:

  • Microformats
  • RDFa
  • Microdata

To add to the challenge, Google (the most influential search engine for many web authors) indicated that it will only process microdata. Microdata, which is the newest of the three syntaxes, does not yet have much tool support.

In this article, learn to use Drupal to add microdata to your pages. Prepare your content so it can be used in applications such as Google's Rich Snippets.

Download the source code for this article.


What is microdata?

Frequently used abbreviations

  • FOAF: Friend of a Friend
  • RDF: Resource Description Framework
  • RDFa: RDF in attributes

Microdata is a simple way to add structured data to pages. It defines a few attributes, such as itemtype and itemprop, that can be placed on HTML tags to indicate what the page is about. Microdata was introduced by Ian Hickson, the editor of the HTML 5 specification, in 2009. The roots of the idea existed much earlier than that, though.

Microdata is based on RDFa, which is a way of placing RDF in HTML. The idea for RDFa was introduced by Mark Birbeck in 2004 with a note published by the W3C. The idea was then incorporated into the next version of XHTML. RDFa introduced several new HTML attributes, such as property and about, and reused some attributes, such as rel.

RDFa is powerful, but it can be difficult for authors to know if their RDFa is correct due to the sometimes-complex interactions of the attributes. RDFa also inherited some features of XML, such as namespace prefixes, which can be confusing.

Microformats, another approach to structured data in XHTML, were launched a little more than a year later by a grassroots group of developers. In contrast to RDFa, microformats reuse existing XHTML attributes that web content authors were already used to, such as the rel attribute on links. Microformats also add a little bit of semantics within those attributes. An emphasis was placed on only marking up visible content; it's easy for invisible content to be abused or go out of sync with visible content.

One problem with microformats is that there isn't a generic way to parse them. Instead, support has to be added for each microformat. For example, if you want to process both calendar data and address data, you have to make sure your parser supports both or use two different parsers. It can also be difficult to get a new microformat published through the community process.

Microdata brings together good ideas from both microformats and RDFa. Microdata:

  • Reduces the complexity of RDFa by reducing the number of attributes and the options for their placement.
  • Eliminates the namespace prefixes.
  • Maintains the generic parsing of RDFa, which makes it much easier to make tools that work on top of published data.
  • Maintains the ability for different groups of people to create their own sets of attribute values, called vocabularies, to use with microdata.

Placing the schema.org vocabulary with microdata

Schema.org is a vocabulary that works well with microdata. Because no approval body is in charge of vocabularies, the search engine owners were able to devise their own vocabulary to meet their needs. Most of the vocabulary deals with the kinds of things Google already focused on for its Rich Snippets: people, places, events, entertainment, and commerce.

Several good examples (see Resources) demonstrate how to place schema.org terms on a site. For example, Listing 1 shows simple markup for a description of a movie enhanced with schema.org terms.

Listing 1. Simple markup for a movie enhanced with schema.org
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
  Director: <span itemprop="name">James Cameron</span> 
            (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>

What the extra markup does might not be immediately clear. To get an idea, publish a page with this snippet to the web. You can then enter the URL for that page in Google's Rich Snippets Testing Tool (see Resources), as in Figure 1. If you don't have easy access to a web server, you can also copy and paste the snippet into the live microdata testing tool provided by Opera developer Philip Jägenstedt (see Resources).

Figure 1. Schema.org microdata extracted from the example in Listing 1

The tool pulled out information about two things: the movie and its director.

The two main concepts in microdata are items and properties of those items. A property can be set either to a string or another item. For example, the movie is an item. It has a name, which is a property with a string value. It also has a director, which is a property with an item value—the person.

To let the parser know you're starting to talk about an item, use the itemscope attribute. You can also use the itemtype attribute to let the parser know what type of thing you're talking about.

Use itemtype to determine which properties can be used in the itemprop attribute. For example, on the page for the Movie itemtype you'll find a list of properties that can be used on the movie (see Resources). Other properties outside this list can also be used if you use the full URL of the property. For example, the FOAF vocabulary also specifies a name property. You could use itemprop="http://xmlns.com/foaf/0.1/name" to use the FOAF name property instead of the schema.org name property.
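The difference between a short property name and a full-URL property name can be sketched in a few lines of PHP. This is only an illustration: sketch_property_url() is a hypothetical helper, not part of Drupal or the Microdata module, and it assumes that short names sit next to the type in the same vocabulary, as schema.org terms do.

```php
<?php
/**
 * Hypothetical helper for illustration only: returns the full URL that
 * identifies a property used on an item of the given type.
 */
function sketch_property_url($itemtype, $itemprop) {
  // A value that is already an absolute URL names the property directly.
  if (preg_match('#^[a-z][a-z0-9+.\-]*://#i', $itemprop)) {
    return $itemprop;
  }
  // Assumption: short names live next to the type, as schema.org terms do,
  // so swap the last path segment of the type URL for the property name.
  return substr($itemtype, 0, strrpos($itemtype, '/') + 1) . $itemprop;
}

// The short name is read in the context of the Movie type's vocabulary.
$schema_name = sketch_property_url('http://schema.org/Movie', 'name');

// The full URL points at the FOAF property regardless of the item type.
$foaf_name = sketch_property_url('http://schema.org/Movie', 'http://xmlns.com/foaf/0.1/name');
```

Under that assumption, the short name resolves to the schema.org term, while the full URL is passed through untouched.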

All of the properties inside of the Movie's <div> are understood as properties of the movie until you reach the end of the div or until you reach an itemscope on an element inside the Movie, as in Listing 1. The itemscope attribute indicates that you are now talking about a different thing (a Person in this case), so the birthDate property is understood as a property of the Person instead of the Movie.

Because you added a little structure to your content, it's easy for either of the tools to extract the relevant information. By adding the attributes in the HTML, you made the data in your page easy to process—almost as if it were in an Excel spreadsheet or a database.
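To make the idea of generic extraction concrete, here is a minimal sketch of a microdata walker in PHP. It is not the algorithm used by either testing tool and it is far from a complete parser: it ignores itemref and itemid, takes every value from text content, and does not look for separate items nested inside another item. The function names are hypothetical, and the sketch assumes PHP 7.1 or later with the DOM extension. The sample markup is modeled on Listing 1.

```php
<?php
// Collect top-level items: elements with itemscope but no itemprop.
function sketch_items(DOMNode $root): array {
  $items = [];
  foreach ($root->childNodes as $child) {
    if (!($child instanceof DOMElement)) {
      continue;
    }
    if ($child->hasAttribute('itemscope') && !$child->hasAttribute('itemprop')) {
      $items[] = sketch_item($child);
    }
    else {
      $items = array_merge($items, sketch_items($child));
    }
  }
  return $items;
}

// Build one item: its type plus the properties found inside its scope.
function sketch_item(DOMElement $element): array {
  $item = ['type' => $element->getAttribute('itemtype'), 'properties' => []];
  sketch_collect($element, $item['properties']);
  return $item;
}

// Walk the descendants of an item and record each itemprop.
function sketch_collect(DOMElement $scope, array &$properties): void {
  foreach ($scope->childNodes as $child) {
    if (!($child instanceof DOMElement)) {
      continue;
    }
    if ($child->hasAttribute('itemscope')) {
      // A nested item is recorded as the value of its property; its own
      // properties belong to it, so do not descend further from here.
      if ($child->hasAttribute('itemprop')) {
        $properties[$child->getAttribute('itemprop')][] = sketch_item($child);
      }
      continue;
    }
    if ($child->hasAttribute('itemprop')) {
      $properties[$child->getAttribute('itemprop')][] = trim($child->textContent);
    }
    sketch_collect($child, $properties);
  }
}

$html = <<<'HTML'
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
</div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$items = sketch_items($doc);
```

The same three functions work for any vocabulary, which is the point of generic parsing: nothing in the code knows about movies or people.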

Though microdata is fairly simple, it can still be difficult to place and maintain the markup by hand. Some tools support the production of microdata, including Drupal's Microdata module (see Resources).


Using Drupal to add microdata to your pages

Drupal is a content management system that powers an estimated 2% of the web. With its user interface, site administrators can create forms to collect content from users. Drupal then automatically creates the appropriate tables and fields in the database for the form data and handles the display of the data in a configurable way.

Drupal is particularly well suited for outputting structured data because of the way the content is handled—as discrete things (called entities) that have properties in the form of field values. With Drupal 7, the capability to add structured data to HTML using RDFa was incorporated into the Drupal core.

Since the schema.org announcement on 02 June 2011, work has progressed to also add the same support for microdata output. The microdata module is still under development and isn't ready for use on live sites. For experimentation on testing sites, you can use the microdata module to generate microdata for fields and test the Rich Snippet displays based on that microdata.

Start by recreating the example above using Drupal. See Resources to download and enable the latest release of the following modules:

  • Microdata
  • Entity API
  • CTools

Marking up the content type

A content type allows users to define what field values are collected and stored for an entity. For example, you might create a product content type that has form fields for collecting the price, available colors, sizes, and manufacturer's model number, which makes it easy to maintain an inventory.

For this exercise, you'll create a movie content type. Go to Structure > Content Types, click the Add Content type link, and enter the following information.

  • Name: Movie
  • Description: A page describing a movie
  • Comment settings: Select Closed. You don't need the comment function on that page.
  • Microdata settings: Add the itemtype http://schema.org/Movie.

    The title is a special kind of field and does not have its own edit screen, so you add the title here as well. Use the name property to mark up the title.

You can test whether this example worked by creating a new Movie item. Go to Add content and create a Movie; this example uses the title Cool Hand Luke. After you create it, use the Rich Snippets testing tool to determine if you can extract the data from the page. You should see a single item with a Type of http://schema.org/movie and name of Cool Hand Luke as in Figure 2.

Figure 2. Microdata extracted after mapping the content type and title

The content type was recognized as being a Movie with a title. However, there's more information about this movie.


Marking up text fields

Fields are attached to content types to collect extra information about the content. In the example, add the genre of a movie as its own field.

To add the genre to the content type, go to Structure > Content types and click Manage fields for the Movie content type. You'll use a text field to collect the genre. Enter the following information.

  • Label: Genre
  • Field name: genre
  • Field type: Text
  • Field widget: Text field

Click Save field settings on the next page. At the bottom of the field instance configuration form, you will see Genre Microdata Mapping, as in Figure 3. Set the field property to genre and click Save.

Figure 3. Interface for mapping the text field

Edit your piece of content and add the genre of the movie. Then rerun the Rich Snippets testing tool. The genre now displays with the type and name.

Marking up image fields

Though the example didn't demonstrate images, you can add an image, such as the movie poster, to this content type. A thumbnail of the image then displays for the Rich Snippet.

To add the image to the content type, go to Structure > Content types and click Manage fields for the Movie content type. Enter the following information.

  • Label: Poster
  • Field name: poster
  • Field type: Image
  • Field widget: Image

Use the image schema.org property for the poster. In the field property field, enter image, as in Figure 4.

Figure 4. Interface for mapping the image field

Save and edit the movie to add an image. Retest the Rich Snippet. You should see the image property with its URL, http://lin-clark.com/sites/default/files/cool-hand-luke.jpg, as in Figure 5. The single item also has a Type of http://schema.org/movie, a name of Cool Hand Luke, and a genre of prison drama.

Figure 5. Microdata extracted from the text and image field

You might also see a Rich Snippet displayed with a thumbnail of the poster, as in Figure 6. Google's testing tool is under very active development; the display of the Rich Snippet for the same markup changes over time. This Rich Snippet was captured on 14 September, but the display changed by 19 September.

Figure 6. Rich Snippet displayed for movie

Enabling microdata in field formatters

Text and image fields cover a lot of the data that people usually put on a site, but there are other types of data. To cover all kinds of data that a site administrator might need, Drupal's field system gives users a selection of basic field types and provides an API so that modules can define new field types. Within these modules, you can define different data collection forms (widgets), data storage, and display (formatters) for each field type. Site administrators can then install such field modules and configure the widgets and formatters without having to write any code.

Microdata has strict requirements about where to place the microdata attributes in the HTML, so each field type in Drupal needs to define where to place the attribute within its formatters. While microdata is supported for most field types defined by core, many widely used field types still do not support microdata.
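The reason placement matters is that a microdata parser reads a property's value from a different place depending on the element that carries the itemprop. The sketch below illustrates the rules from the HTML microdata specification in simplified form; sketch_property_value() is a hypothetical name, relative URLs are not resolved, and PHP's DOM extension is assumed.

```php
<?php
/**
 * Hypothetical helper for illustration: returns the value a microdata
 * parser would read from an element that carries an itemprop attribute.
 */
function sketch_property_value(DOMElement $element) {
  // Elements whose value comes from a URL attribute, not from their text.
  $url_attributes = array(
    'a' => 'href', 'area' => 'href', 'link' => 'href',
    'audio' => 'src', 'embed' => 'src', 'iframe' => 'src', 'img' => 'src',
    'source' => 'src', 'track' => 'src', 'video' => 'src',
    'object' => 'data',
  );
  $tag = strtolower($element->tagName);
  if ($tag === 'meta') {
    return $element->getAttribute('content');
  }
  if (isset($url_attributes[$tag])) {
    return $element->getAttribute($url_attributes[$tag]);
  }
  if ($tag === 'time' && $element->hasAttribute('datetime')) {
    return $element->getAttribute('datetime');
  }
  // Everything else contributes its text content.
  return trim($element->textContent);
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML(
  '<a itemprop="url" href="http://drupal.org"><span itemprop="name">Drupal</span></a>'
  . '<img itemprop="image" src="http://lin-clark.com/sites/default/files/cool-hand-luke.jpg" alt="Poster">'
);

$link_value = sketch_property_value($doc->getElementsByTagName('a')->item(0));
$text_value = sketch_property_value($doc->getElementsByTagName('span')->item(0));
$image_value = sketch_property_value($doc->getElementsByTagName('img')->item(0));
```

An itemprop on the link yields the URL, while an itemprop on the span inside it yields the link text. A formatter therefore has to put each attribute on the element that carries the value it means to expose.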

To use a field formatter defined in a contributed module, you can check the table that tracks microdata support. Even if the field formatter isn't supported yet, that doesn't mean you can't use it. It's easy to add microdata support to a field formatter. You can even contribute microdata support back to the module by creating a patch with your changes. This is a great way to get started with the Drupal developer community.

In the example from schema.org, a link to the movie's trailer was marked up. At the time of this writing, the link field formatter defined by the Drupal Link module doesn't support microdata, but you can change that.

You'll add microdata support to the Link module. The examples below use the Link module code from 20 September 2011, which is provided in the download file with this article. (The current version of the Link module has changed and might already contain microdata support.)

Registering properties

The link field has two different bits of data that you might want to expose using microdata:

  • The URL for the link
  • The text that is linked to that URL

At this point, you need to notify the system of these two properties through an Entity API module: the Entity Property API.

You must add the information to the field definition, which is registered by link_field_info. Add the property_type for the field itself and the property_callbacks, as in Listing 2.

Listing 2. Add property information for the field to link_field_info
/**
 * Implements hook_field_info().
 */
function link_field_info() {
  return array(
    'link_field' => array(
      'label' => t('Link'),
      'description' => t('Store a title, href, and attributes in the database to assemble a link.'),
      // ...
      'property_type' => 'field_item_link',
      'property_callbacks' => array('link_field_property_info_callback'),
    ),
  );
}

The property type lets the system know the data type of the field. Because field_item_link isn't a recognized data type or entity, the data type defaults to struct when it is processed. This struct acts as a container for the properties that you mark up (the link URL and linked text). Because it is simply a container, you don't enable microdata for the field itself—only for its properties.

The property callback is a function that registers the same property type information for the component properties. To mark up the properties with microdata, set microdata to TRUE for each property, as in Listing 3. This provides the graphical user interface for adding microdata for these properties.

Listing 3. Register the field's properties with the property callback
/**
 * Additional callback to adapt the property info of link fields.
 *
 * @see entity_metadata_field_entity_property_info()
 */
function link_field_property_info_callback(&$info, $entity_type, $field, $instance, $field_type) {
  $property = &$info[$entity_type]['bundles'][$instance['bundle']]['properties'][$field['field_name']];

  $property['property info'] = array(
    'title' => array(
      'type' => 'text',
      'label' => t('The title of the link.'),
      'microdata' => TRUE,
    ),
    'url' => array(
      'type' => 'uri',
      'label' => t('The URL of the link.'),
      'microdata' => TRUE,
    ),
  );
  // When the field instance is configured to have no title, do not expose a
  // title property.
  if ($instance['settings']['title'] == 'none') {
    unset($property['property info']['title']);
  }
}
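To see what the callback in Listing 3 leaves behind, you can run it outside Drupal against an empty info array. The sketch below repeats the callback so that it is self-contained and substitutes a trivial stand-in for Drupal's t(). The field name field_trailer, the bundle name movie, and the 'optional' title setting are assumptions made for this illustration only.

```php
<?php
// Stand-in for Drupal's t() so the sketch runs without Drupal.
if (!function_exists('t')) {
  function t($string) {
    return $string;
  }
}

// Same logic as Listing 3, repeated here to keep the sketch self-contained.
function link_field_property_info_callback(&$info, $entity_type, $field, $instance, $field_type) {
  $property = &$info[$entity_type]['bundles'][$instance['bundle']]['properties'][$field['field_name']];
  $property['property info'] = array(
    'title' => array('type' => 'text', 'label' => t('The title of the link.'), 'microdata' => TRUE),
    'url' => array('type' => 'uri', 'label' => t('The URL of the link.'), 'microdata' => TRUE),
  );
  if ($instance['settings']['title'] == 'none') {
    unset($property['property info']['title']);
  }
}

// Hypothetical field and instance definitions, for illustration only.
$field = array('field_name' => 'field_trailer');
$with_title = array('bundle' => 'movie', 'settings' => array('title' => 'optional'));
$without_title = array('bundle' => 'movie', 'settings' => array('title' => 'none'));

$info = array();
link_field_property_info_callback($info, 'node', $field, $with_title, array());
$properties_with_title = array_keys($info['node']['bundles']['movie']['properties']['field_trailer']['property info']);

$info = array();
link_field_property_info_callback($info, 'node', $field, $without_title, array());
$properties_without_title = array_keys($info['node']['bundles']['movie']['properties']['field_trailer']['property info']);
```

With a title enabled, both title and url are registered as properties that can be mapped. With the title set to none, only url remains.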

The user interface pulls the label from the property information and uses the type to determine which kind of form fields to display. If the property is an item instead of a string, an itemtype field also displays. Figure 7 shows an example for two properties of a trailer: the link title and link URL.

Figure 7. Link microdata mapping form

You can now specify which vocabulary terms to use for the field's properties on the field configuration form. However, the attributes aren't inserted into the HTML until you add a little more code.

Adding microdata to the themed output

To place the microdata, you need to change the HTML output for the field. For example, to add a link to a software application, you might want the link text (the name of the software) to use the name property and the link itself to use the url property. Listing 4 shows how to do this by adding the itemprop of the URL to the <a> tag and inserting a span with the itemprop of the text around the text content.

Listing 4. A link before and after adding microdata
<a href="http://drupal.org">Drupal</a>

<a itemprop="url" href="http://drupal.org"><span itemprop="name">Drupal</span></a>

Things would be easier if the Link module inserted these attributes for you. To transform the content from the database for the field into HTML, each field formatter module has its own view function. Within the view function, some formatters use theme functions to generate the HTML. An example is theme_link_formatter_link_default(). Often, the microdata attributes need to be passed from the field_formatter_view function into the theme function.

In the Link module, the formatter already passes an array of attributes to be placed on the <a> tag using the item variable. You can add the URL itemprop to that array to have it automatically output where you need it, as in Listing 5.

Listing 5. Adding microdata in hook_field_formatter_view
/**
 * Implements hook_field_formatter_view().
 */
function link_field_formatter_view($entity_type, $entity, $field, $instance, $langcode, $items, $display) {
  $elements = array();
  $microdata = array();

  // If the microdata module is enabled, the microdata mapping will have been
  // passed in via the entity. The isset() check avoids a notice when no
  // mapping exists for this field.
  if (module_exists('microdata') && isset($entity->microdata[$field['field_name']])) {
    $microdata = $entity->microdata[$field['field_name']];
  }

  foreach ($items as $delta => $item) {
    // Add the url attributes to $item['attributes'] because the theme function
    // will pass it through to l(), properly placing the itemprop for the url.
    if (isset($microdata['url'])) {
      $item['attributes'] += $microdata['url']['#attributes'];
    }
    // Pass the microdata array to the theme function so it can be used to place
    // the link title's attribute.
    $elements[$delta] = array(
      '#markup' => theme('link_formatter_' . $display['type'], array(
        'element' => $item,
        'field' => $instance,
        'microdata' => $microdata,
      )),
    );
  }
  return $elements;
}

There is no automatic way to place the attributes for the text content, however. You have to pass them into the theme function and change the theme function to use them.
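The += in Listing 5 is PHP's array union operator, and its behavior is worth a quick look: keys that already exist on the left side are kept, and only missing keys are added from the right. The sketch below uses made-up attribute values. The shape of the '#attributes' array (itemprop as an array of terms) is an assumption for this illustration.

```php
<?php
// Attributes the link item already carries (illustrative values).
$item = array(
  'attributes' => array(
    'target' => '_blank',
    'class' => array('trailer-link'),
  ),
);

// Assumed shape of the mapping passed in for the url property.
$microdata = array(
  'url' => array(
    '#attributes' => array(
      'itemprop' => array('trailer'),
      'class' => array('from-microdata'),
    ),
  ),
);

// Array union: existing keys win, missing keys are added.
$item['attributes'] += $microdata['url']['#attributes'];
```

After the union, the itemprop is added, the existing target is untouched, and the class from the microdata array is dropped because the item already had a class key. The operator merges by key and does not combine values, so an attribute present on both sides keeps the formatter's value.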

After you pass the microdata variables to the theme function, you can add the <span> tag containing the itemprop around the title. The code checks whether there is an itemprop for the text and, if there is, wraps the title in a span that carries the microdata, as in Listing 6.

Listing 6. Add microdata in the theme function
/**
 * Theme function for 'default' text field formatter.
 */
function theme_link_formatter_link_default($vars) {
  $url = $vars['element']['url'];
  $microdata = $vars['microdata'];
  // If there is an itemprop set for the title, wrap the title in a span and
  // add the itemprop to that span. drupal_attributes() returns a string with
  // a leading space, so no space is needed after the tag name.
  if (!empty($microdata['title'])) {
    $title = '<span' . drupal_attributes($microdata['title']['#attributes'])
                . '>' . $vars['element']['title'] . '</span>';
  }
  else {
    $title = $vars['element']['title'];
  }

  // Create the array of options to pass to l(). The title and url are passed
  // to l() as separate arguments, so remove them from the options.
  $link_options = $vars['element'];
  unset($link_options['title']);
  unset($link_options['url']);

  // Display a normal link if both title and URL are available.
  if (!empty($title) && !empty($url)) {
    return l($title, $url, $link_options);
  }
  // If only a title, display the title.
  elseif (!empty($title)) {
    return check_plain($title);
  }
  // If only a url, display the full url as a link.
  elseif (!empty($url)) {
    return l($url, $url, $link_options);
  }
}

You can now test the microdata output for the formatter.
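If you want to check the expected shape of the output before testing on a site, the following sketch reproduces it without Drupal. sketch_attributes() is a simplified stand-in for drupal_attributes(), and sketch_link() is a hypothetical helper that mimics only the title-wrapping step of Listing 6. Neither is the Link module's code.

```php
<?php
// Simplified stand-in for drupal_attributes(): builds ' name="value"' pairs.
function sketch_attributes(array $attributes) {
  $output = '';
  foreach ($attributes as $name => $value) {
    $value = implode(' ', (array) $value);
    $output .= ' ' . $name . '="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '"';
  }
  return $output;
}

// Hypothetical helper: renders a link, wrapping the text in a span when
// there are attributes (such as an itemprop) to place on the title.
function sketch_link($title, $url, array $link_attributes, array $title_attributes) {
  $text = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
  if ($title_attributes) {
    $text = '<span' . sketch_attributes($title_attributes) . '>' . $text . '</span>';
  }
  return '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '"'
    . sketch_attributes($link_attributes) . '>' . $text . '</a>';
}

$with_microdata = sketch_link('Drupal', 'http://drupal.org',
  array('itemprop' => array('url')), array('itemprop' => array('name')));
$without_microdata = sketch_link('Drupal', 'http://drupal.org', array(), array());
```

With both mappings set, the result matches the second link in Listing 4 apart from attribute order. With no mappings, the plain link from the first line of Listing 4 comes back.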


Contributing your changes back to the community

One of the things that makes Drupal a powerful technical solution is the large number of contributors that make up its community. Contributors aren't just people who live and breathe Drupal; many contributors make the occasional code fix for their own sites, which they then post as a patch for others to use.

If you add microdata to a field formatter for your own project, you can contribute that work back to the Drupal community. Simply post an issue in the issue queue for the module and suggest that the module support microdata. This type of issue is called a feature request. You can then post a patch with your changes on the issue. (There are some great tutorials that demonstrate how to create patches for Drupal projects.) Once you've posted the patch, mark the issue as "needs review."


Conclusion

In this article, you learned to use Drupal to add microdata to your pages so your content can be used in applications like Google's Rich Snippets. With the new Microdata module, you can configure microdata output for basic field types and add microdata output to custom field types. Now your data is available for others to build applications on.


Download

Description            Name                     Size
Article source code    microdata-source2.zip    820KB

Resources

Learn

Get products and technologies

Discuss
