The Semantic Web, Linked Data and Drupal, Part 1: Expose your data using RDF

Drupal 7 is the first mainstream content management system to support Semantic Web technology in its core. Readily usable, Semantic Web technology lets you push the web from a web of documents to a web of data. In this article, learn to make your web data more interoperable and your data sharing more efficient. An example shows how to use Drupal 7 to publish Linked Data by exposing content with RDF.


Lin Clark (lin.w.clark@gmail.com), Drupal Developer, Digital Enterprise Research Institute, NUI Galway

Lin Clark is a Drupal developer who specializes in Linked Data. She contributed to the RDF in Drupal 7 core initiative, created SPARQL Views as part of the 2010 Google Summer of Code, and has spoken extensively about the benefits of using Linked Data technologies in everyday applications. She attended Carnegie Mellon University and is currently pursuing a research master's degree at the Digital Enterprise Research Institute at NUI Galway. More information is available at lin-clark.com.



12 April 2011


Introduction

In 2001, in the now legendary article "The Semantic Web," Sir Tim Berners-Lee outlined a world where your handheld agent would exchange data with other agents and make decisions to help simplify your life. The Semantic Web effort has evolved quite a bit since then, and has taken a more pragmatic turn in recent years. With the Linked Data initiative, research and development have focused on what Berners-Lee calls the most important piece of the puzzle: data interoperability.

You can benefit from Linked Data technologies without creating a lot of custom code. In this article, explore how Drupal 7 provides far-reaching data interoperability capabilities. Learn to expose your web data using Resource Description Framework (RDF). You can download the source code used in this article.

Prerequisites

To follow along with the example in this article, install a Drupal 7 site. Drupal requires PHP and a database server, such as MySQL, to run.


Data interoperability

Much of the information on the web is not currently interoperable. For example, if you want to take data from one site and mash it up with data from another site, you might need to write a custom crawler to scrape only the information you want from the page. This is especially true if you want to use information on sites with smaller budgets, such as those run by individuals, governments, and educational institutions. When developers do have access to structured information on sites, it is often through proprietary APIs that differ from site to site.

The Linked Data initiative uses a narrow slice of Semantic Web technologies and concepts (such as RDF) to try to solve the interoperability problem, and to make it easier to reuse and combine data on the web.

With the new focus on data interoperability, the pace of development and innovation has quickened as companies see the power of Semantic Web technologies. For example, Google uses RDF in attributes (RDFa) to enable Rich Snippets: search results in which Google's algorithms highlight structured data embedded in web pages. Rich Snippets provide more helpful search results by displaying certain parts of the page content prominently, as shown in Figure 1. Various accounts suggest that the enhanced search results see a 15-30% increase in click-through rate.

Figure 1. A Google Rich Snippet for a recipe (image not reproduced; it shows a Google Rich Snippet displaying certain elements of the page more prominently)

Facebook began using RDFa in 2010 to power the Like button that developers can place on their sites. Including just a little bit of RDFa on a web page makes it equivalent to a Facebook page. When site visitors click the Like button, a connection is made in Facebook between the visitor and the external page. Such interoperability power comes from adherence to some simple principles and from a handful of Semantic Web technologies.


RDF, vocabularies, and Linked Data principles

With Linked Data, the most important principle is to use globally distinct names for things on the web rather than a serial ID or other local identifier. An easy way to create distinct names is to build on the Domain Name System. For example, if you identify a person on your website only as jane-doe, it can be very hard to tell her apart from other people named Jane Doe elsewhere on the web. By using the identifier http://example.com/people/jane-doe, it's easy to identify the intended Jane Doe. This style of identifier is called an HTTP URI.

Since you're using HTTP URIs to identify things, you can also use the architecture of the web to make more information available about Jane Doe. When users go to http://example.com/people/jane-doe, you can provide them with more information about Jane, such as her name, her online accounts, the location where she is based, or publications she has authored.

Exposing data with RDF and vocabularies

If you've worked with databases or with object-oriented languages before, then RDF will look familiar. It is simply an entity-attribute-value model (though in RDF, attributes are often called properties), as shown in Table 1.

Table 1. Entity-attribute-value model
Entity | Property | Value
http://example.com/people/jane-doe | name | Jane Doe

To ensure interoperability, you must also use property names that everyone will understand, so use URIs for the properties as well. People publish bundles of property URIs in what are called vocabularies (or, more formally, ontologies).

Below are examples of vocabularies.

Friend of a Friend (FOAF): Provides properties for describing people: name, homepage, mbox (email), account, based_near
Dublin Core (DC): Provides properties for describing published works: abstract, created, dateCopyrighted, publisher
Semantically-Interlinked Online Communities (SIOC): Provides properties for describing online social networks and their users: follows, has_reply, last_reply_date, moderator_of, subscriber_of

See Resources for more information about vocabularies.

Using URIs for properties, the entity-attribute-value statement from above looks something like this:

<http://example.com/people/jane-doe> <http://xmlns.com/foaf/0.1/name> "Jane Doe"

Angle brackets are placed around full URI values, and quotes are placed around literal values.

You can make things a little easier to read by using compact URIs (CURIEs) and defining what the prefixes of those CURIEs mean. Listing 1 shows an example.

Listing 1. RDF statement using CURIEs
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX person: <http://example.com/people/>

person:jane-doe foaf:name "Jane Doe"

In this way, the web architecture becomes a sort of database structure. But instead of all of the data living in one data store, the data can live on different independent sites and can easily be brought together and integrated as needed.
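The CURIE shorthand in Listing 1 amounts to simple string substitution. The following Python sketch is an illustration only: the function names expand_curie and to_full_statement are hypothetical, and it ignores edge cases that real RDF libraries handle. It expands the prefixes from Listing 1 back into the long form of the statement.

```python
# Prefix map taken from Listing 1.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "person": "http://example.com/people/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a compact URI such as 'foaf:name' into a full URI."""
    prefix, sep, local_name = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("Unknown or missing prefix in %r" % curie)
    return prefixes[prefix] + local_name


def to_full_statement(subject, predicate, literal):
    """Render a CURIE-based statement with full URIs in angle brackets."""
    return '<%s> <%s> "%s"' % (
        expand_curie(subject), expand_curie(predicate), literal)


# Prints the same statement as the long form shown before Listing 1.
print(to_full_statement("person:jane-doe", "foaf:name", "Jane Doe"))
```

Because the prefix map is just data, two sites that agree on the vocabulary URIs can choose different prefixes and still produce identical statements once the CURIEs are expanded.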

You can use the SPARQL query language to retrieve data from this RDF dataset.


Using SPARQL to query DBpedia

Many datasets and information hubs, such as Wikipedia, are being opened up with Linked Data. While much of the content in Wikipedia is free text and is hard to expose as Linked Data, a lot of structured information is found in the infoboxes to the right of many Wikipedia pages. The infoboxes might contain such information as the population of a city or the genre and pen names of an author.

DBpedia is a community effort to extract structured information from Wikipedia and to make the information available on the web by exposing it with RDF. The entire RDF dataset can be downloaded as a dump for use in applications. A demonstration query interface, Snorql, is also available.

An easy way to start exploring the data in DBpedia is to find the DBpedia URI that's associated with a topic. For example, suppose you want to find the population or other information about Pittsburgh. First find the URL for the Wikipedia page, and then use the SPARQL query in Listing 2 to find the DBpedia URI associated with that page. Because the Snorql interface predefines the prefixes used in the example, you can use CURIEs without having to declare your prefixes.

Listing 2. Query to find the DBpedia URI for Pittsburgh
SELECT ?uri WHERE {
  ?uri foaf:page <http://en.wikipedia.org/wiki/Pittsburgh>
}

The URI is <http://dbpedia.org/resource/Pittsburgh> (though the Snorql interface simply gives you a link to another query, as shown in Listing 3).

Listing 3. Query to find all information in DBpedia about Pittsburgh
SELECT ?property ?hasValue ?isValueOf
WHERE {
  { <http://dbpedia.org/resource/Pittsburgh> ?property ?hasValue }
  UNION
  { ?isValueOf ?property <http://dbpedia.org/resource/Pittsburgh> }
}

Don't worry if this query is confusing. The important thing to understand is that when you run this query, you will see all of the information DBpedia contains about Pittsburgh, such as average temperatures throughout the year, latitude/longitude coordinates, an image of the city flag, and famous people born in the city.

If you wanted to pull just one fact out of DBpedia—such as the city's metro population—you could use a query like the one shown in Listing 4.

Listing 4. Query to find population of Pittsburgh
SELECT ?population WHERE {
<http://dbpedia.org/resource/Pittsburgh> dbpedia2:populationMetro ?population
}

You can also find more facts by traversing links between different things. To explore further, get a list of people who were born in Pittsburgh. The dbo prefix is declared so you can write dbo:birthPlace instead of the full URI <http://dbpedia.org/ontology/birthPlace>, as shown in Listing 5.

Listing 5. Query to get a list of people born in Pittsburgh
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?person WHERE {
?person dbo:birthPlace <http://dbpedia.org/resource/Pittsburgh> .
}

Now you have a list of people born in Pittsburgh. To make it more interesting, filter that list based on some criteria. For example, ask for only those people who are categorized as American bloggers. To do this, use a special property called rdf:type, as shown in Listing 6.

Listing 6. Query for people born in Pittsburgh who are also American bloggers
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?person WHERE {
?person dbo:birthPlace <http://dbpedia.org/resource/Pittsburgh> ;
        rdf:type <http://dbpedia.org/class/yago/AmericanBloggers> .
}

You can get more information about the list of bloggers born in Pittsburgh. Listing 7 shows how to get the article abstract, which can serve as a short biography.

Listing 7. Query to get short abstracts for American bloggers born in Pittsburgh
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?person ?bio WHERE {
?person dbo:birthPlace <http://dbpedia.org/resource/Pittsburgh> ;
        rdf:type <http://dbpedia.org/class/yago/AmericanBloggers> ;
        dbo:abstract ?bio .
}

You might have noticed that this query gives you a little more than you bargained for. It contains the abstract in English and in every other language for which a translation of the article exists. Get just the English version by using a FILTER and checking the language of the value, as shown in Listing 8.

Listing 8. Query for English abstracts
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?person ?bio WHERE {
?person dbo:birthPlace <http://dbpedia.org/resource/Pittsburgh> ;
        rdf:type <http://dbpedia.org/class/yago/AmericanBloggers> ;
        dbo:abstract ?bio .
FILTER langMatches( lang(?bio), "en" )
}

The above query returns only the English version of the abstract.
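To make the effect of the FILTER in Listing 8 concrete, here is a minimal Python sketch of basic language-range matching in the style of langMatches. The function name lang_matches and the sample data are hypothetical stand-ins for the ?bio values, not real DBpedia results.

```python
def lang_matches(tag, lang_range):
    """Basic language-range matching, in the style of SPARQL's langMatches."""
    if not tag:
        return False  # literals without a language tag never match
    if lang_range == "*":
        return True   # "*" matches any non-empty language tag
    tag, lang_range = tag.lower(), lang_range.lower()
    # A range matches the identical tag or any tag that extends it with "-".
    return tag == lang_range or tag.startswith(lang_range + "-")


# Hypothetical (value, language tag) pairs standing in for ?bio bindings.
abstracts = [
    ("English abstract", "en"),
    ("German abstract", "de"),
    ("US English abstract", "en-US"),
]

english = [text for text, tag in abstracts if lang_matches(tag, "en")]
print(english)
```

Note that the range "en" also accepts regional variants such as en-US, which is why the query uses langMatches rather than comparing the language tag to "en" directly.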

SPARQL has made it easy to target the information you want from an RDF dataset, which in turn makes it easy to access data and reuse it across the web.
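The same queries can be sent from a program instead of a web form. The Python sketch below rests on several assumptions: that DBpedia's public endpoint is at http://dbpedia.org/sparql, that it accepts the query in a query parameter, and that it honors a request for the standard SPARQL JSON results format. The helper names and the sample result document are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint location

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?person WHERE {
  ?person dbo:birthPlace <http://dbpedia.org/resource/Pittsburgh> .
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Build an HTTP GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def column(results, name):
    """Pull one variable's values out of a SPARQL JSON results document."""
    return [row[name]["value"]
            for row in results["results"]["bindings"] if name in row]


def run(query):
    """Send the query over the network and return the parsed JSON results."""
    with urlopen(build_request(query)) as response:
        return json.load(response)


# Hypothetical response in the SPARQL JSON results layout, so the parsing
# step can be tried without network access.
sample = {
    "head": {"vars": ["person"]},
    "results": {"bindings": [
        {"person": {"type": "uri",
                    "value": "http://dbpedia.org/resource/Example_Person"}},
    ]},
}
print(column(sample, "person"))
```

With network access, column(run(QUERY), "person") would be the equivalent of running Listing 5 in the query interface, provided the assumptions above hold.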


RDF support in Drupal 7

Drupal provides far-reaching data interoperability capabilities to small data publishers. It is an open source content management system that makes it easy for developers and end users to create robust data entry forms. The forms are perfect for capturing structured data and flexibly formatting that data in different ways.

With Drupal 7, even small sites are capable of exposing their data with RDF. Basic page and article content is exposed by default in any new Drupal 7 site. The key to this offering is the RDF Mapping API. With this API, any form field can be mapped to an RDF property and any content type can be mapped to an RDF type. For example, if the site has listings of recording artists, the content can have the type mo:MusicArtist. The fields can be mapped to mo:fanpage, mo:discography, mo:biography, and so on.

Any content types that have the mapping defined will automatically expose content using RDFa, which is RDF in HTML attributes. For example, the Jane Doe scenario used previously would look like Listing 9 in RDFa.

Listing 9. Statement from Listing 1 expressed in RDFa
<div about="http://example.com/people/jane-doe">
  <h1 property="foaf:name">Jane Doe</h1>
</div>
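To see how a consumer might read the statement back out of the markup in Listing 9, consider the following Python sketch. It is deliberately minimal and is not a conforming RDFa processor: the class name TinyRdfaReader is hypothetical, the prefix map is supplied by hand rather than read from the document, and only the about and property attributes are handled.

```python
from html.parser import HTMLParser

# Assumption: the foaf prefix is known in advance instead of being read
# from namespace declarations in the page.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


class TinyRdfaReader(HTMLParser):
    """Collects (subject, property URI, literal) triples from simple RDFa."""

    def __init__(self):
        super().__init__()
        self.subjects = []   # subject in effect for each open element
        self.pending = None  # property URI waiting for its text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inherited = self.subjects[-1] if self.subjects else None
        # 'about' sets a new subject; otherwise the parent's subject applies.
        self.subjects.append(attrs.get("about", inherited))
        if "property" in attrs:
            prefix, _, name = attrs["property"].partition(":")
            self.pending = PREFIXES[prefix] + name

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append(
                (self.subjects[-1], self.pending, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        if self.subjects:
            self.subjects.pop()
        self.pending = None


reader = TinyRdfaReader()
reader.feed("""
<div about="http://example.com/people/jane-doe">
  <h1 property="foaf:name">Jane Doe</h1>
</div>
""")
print(reader.triples)
```

The extracted triple is the same statement as in Listing 1, which illustrates the point of RDFa: the data travels inside the HTML that visitors already see.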

Drupal takes care of the formatting of the markup, making it much easier to publish valid RDFa. Site owners can now share their data with others. It's also easy to take advantage of opportunities like Google's Rich Snippets.


Setting up custom RDF mappings in Drupal

In this section, you'll start publishing Linked Data. The example creates a Recipe content type for publishing cooking recipes that can be displayed as Google Rich Snippets.

Getting started

If you already have PHP and a database server installed in your development environment, follow the instructions in the installation guide to set up a Drupal 7 site. If not, Acquia provides the easy-to-use Acquia Stack Installer, which installs the Apache Web Server, PHP, MySQL server, and your Drupal 7 site for you. See Resources for more information.

Once you have the site ready, try adding an Article by using the Add content button in the second toolbar. Your new content should show up in the list on the front page. Look at the source code, and notice that RDFa has been included in the HTML. For example, the article author is marked up with sioc:has_creator because both Page and Article content types have default RDF mappings.

In the rest of this section, create a new content type with its own custom mapping that you'll define with code. (Content types and their mappings can also be defined through a user interface, but that is outside the scope of this article.)

Creating the module

The first requirement for a Drupal module is the .info file, as shown in Listing 10. It provides information to the system about the files in the module and about other modules that yours depends on.

Listing 10. rdf_example.info file
name = RDF Example
description = Demonstrates an RDF mapping using the RDF Mapping API.
core = 7.x
dependencies[] = richsnippets 
dependencies[] = field_collection

The dependencies array tells Drupal which other modules need to be installed before this one can be. Note that field_collection itself depends on Entity API. See Resources for links to download the modules.
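The effect of chained dependencies can be sketched as a depth-first walk. This Python illustration is not how Drupal itself resolves dependencies. It assumes that entity is the machine name of the Entity API module and that richsnippets has no dependencies of its own, and it does not detect circular dependencies.

```python
# Dependency data from Listing 10, plus the assumptions named above.
DEPENDENCIES = {
    "rdf_example": ["richsnippets", "field_collection"],
    "field_collection": ["entity"],
    "richsnippets": [],  # assumption: no further dependencies
    "entity": [],
}


def install_order(module, deps=DEPENDENCIES, seen=None):
    """Return modules in an order where each follows its dependencies."""
    seen = [] if seen is None else seen
    for required in deps.get(module, []):
        install_order(required, deps, seen)
    if module not in seen:
        seen.append(module)
    return seen


print(install_order("rdf_example"))
```

In this sketch, entity always lands before field_collection, and rdf_example comes last.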

The example adds a dependency on the Rich Snippets module because you need to make some minor adjustments to the RDFa for Google. The dependency is added on field_collection because you'll use it to create groups of fields, such as the nutrition information or ingredients.

Creating the content type

The next step is to create the content type and attach the relevant fields in an .install file. The .install file consists of three functions, as described below.

Function | Purpose
rdf_example_install | Implements hook_install and is called when this module is installed.
_rdf_example_installed_fields | A private function that is called by rdf_example_install; it provides field definitions for the fields we need.
_rdf_example_installed_instances | Another private function called by rdf_example_install; it defines which fields to attach to which content types.

In rdf_example_install, first create the Recipe node type, as shown in Listing 11.

Listing 11. Defining a node type in rdf_example_install
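  // get_t() returns a translation function that is safe to call during
  // installation; it is used as $t() below.
  $t = get_t();

  // Define the node type.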
  $rdf_example = array(
    'type' => 'recipe',
    'name' => $t('Recipe'),
    'base' => 'node_content',
    'description' => $t('The recipe node is defined to demonstrate RDF mapping.'),
  );

  // Set additional defaults and save the content type.
  $content_type = node_type_set_defaults($rdf_example);
  node_type_save($content_type);

Create the fields that can be attached to the Recipe node type, as shown in Listing 12. These fields can also be reused by other modules and by the user through the Fields UI. The field definition is kept in another function (_rdf_example_installed_fields) to keep the code clean.

Listing 12. Creating fields that can be added to the node type
  foreach (_rdf_example_installed_fields() as $field) {
    field_create_field($field);
  }

The previous code calls _rdf_example_installed_fields to get a list of fields that need to be installed. The field definitions contain the field name and type, as well as the cardinality (which is the number of values this field can have). Additional settings are configured on some fields. For the field definition, the example simply creates all the fields that will be used. You don't have to worry about which content type they will be attached to.

Note in Listing 13 that recipe_nutrition is a field_collection field, which is a special type. It links to a field_collection entity that can have its own RDF type and contain its own fields. The entity type is automatically created for recipe_nutrition. Attach fields to it in the next step, as shown in Listing 14.

Listing 13. Defining the fields in _rdf_example_installed_fields()
  return array(
    'recipe_photo' => array(
      'field_name' => 'recipe_photo',
      'cardinality' => 1,
      'type'        => 'image',
    ),
    'recipe_summary' => array(
      'field_name'  => 'recipe_summary',
      'cardinality' => 1,
      'type'        => 'text',
      'settings'    => array(
        'max_length' => 500,
      ),
    ),
    'recipe_nutrition' => array(
      'field_name'  => 'recipe_nutrition',
      'cardinality' => 1,
      'type'        => 'field_collection',
    ),
    'recipe_serving_size' => array(
      'field_name'  => 'recipe_serving_size',
      'cardinality' => 1,
      'type'        => 'text',
    ),
    'recipe_calories' => array(
      'field_name'  => 'recipe_calories',
      'cardinality' => 1,
      'type'        => 'number_integer',
    ),
  );

Now that the fields are created in the system, attach instances of the fields to the Recipe content type and the Recipe Nutrition field collection using field_create_instance. (In Drupal, these types are called bundles.) Once again, the instance definitions are kept in a separate function (_rdf_example_installed_instances) to keep the code clean. Listing 14 shows an example.

Listing 14. Attaching instances of fields to the content types in rdf_example_install
  foreach (_rdf_example_installed_instances() as $bundle_name => $bundle) {
    foreach ($bundle as $instance) {
      $instance['entity_type'] = $bundle_name == 'recipe' ? 'node' : 'field_collection_item';
      $instance['bundle'] = $bundle_name;
      field_create_instance($instance);
    }
  }

The above code calls _rdf_example_installed_instances to get the field instance definition. In _rdf_example_installed_instances, you create an array keyed by the type that the instance is being attached to. In this example, that is either the Recipe content type or the Recipe Nutrition field collection, as shown in Listing 15.

Listing 15. Defining the instances in _rdf_example_installed_instances()
  // Use the installation-safe translation function for the labels.
  $t = get_t();
  $instances = array();
  $instances['recipe'] = array(
    'recipe_photo' => array(
      'field_name'  => 'recipe_photo',
      'label'       => $t('Photo of the prepared dish'),
    ),
    'recipe_summary' => array(
      'field_name' => 'recipe_summary',
      'label'       => $t('Short summary describing the dish'),
      'widget'      => array(
        'type'    => 'text_textarea',
      ),
    ),
    'recipe_nutrition' => array(
      'field_name' => 'recipe_nutrition',
      'label'      => $t('Recipe Nutrition Information'),
    ),
  );
  $instances['recipe_nutrition'] = array(
    'recipe_serving_size' => array(
      'field_name' => 'recipe_serving_size',
      'label'       => $t('Serving size'),
    ),
    'recipe_calories' => array(
      'field_name' => 'recipe_calories',
      'label'       => $t('Calories'),
    ),
  );
  return $instances;

Mapping the content to RDF

In the .module file, create the RDF mapping for the fields and content types you defined. Since the content type is defined in this module, use hook_rdf_mapping to create the RDF mapping for the content type. Altering the mapping for a content type created by another module is not demonstrated here. To alter existing mappings, you would use the rdf_mapping_load and rdf_mapping_save functions provided by the RDF Mapping API in the install function.

Just as you did when creating the instances, structure this array by the bundle type. Mappings are defined for the Recipe content type in Listing 16, and for the Recipe Nutrition field collection in Listing 17.

Listing 16. RDF mapping declaration for the recipe from rdf_example.module
    'recipe' => array(
      'type' => 'node',
      'bundle' => 'recipe',
      'mapping' => array(
        'rdftype' => array('v:Recipe'),
        // We don't use the default bundle mapping for title. Instead, we add
        // the v:name property. We still want to use dc:title as well, though,
        // so we include it in the array.
        'title' => array(
          'predicates' => array('dc:title', 'v:name'),
        ),
        'recipe_summary' => array(
          'predicates' => array('v:summary'),
        ),
        // The photo URI isn't a string but instead points to a resource, so we
        // indicate that the attribute type is rel. If type isn't specified, it
        // defaults to property, which is used for string values.
        'recipe_photo' => array(
          'predicates' => array('v:photo'),
          'type' => 'rel',
        ),
        'recipe_nutrition' => array(
          'predicates' => array('v:nutrition'),
          'type' => 'rel',
        ),
      ),
    ),

Listing 17. RDF mapping declaration for the recipe nutrition from rdf_example.module
    'nutrition' => array(
      'type' => 'field_collection_item',
      'bundle' => 'recipe_nutrition',
      'mapping' => array(
        'rdftype' => array('v:Nutrition'),
        'recipe_serving_size' => array(
          'predicates' => array('v:servingSize'),
        ),
        'recipe_calories' => array(
          'predicates' => array('v:calories'),
        ),
      ),
    ),
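The role of the type key in these mappings can be illustrated outside Drupal. The Python sketch below is a simplification under stated assumptions: the function name rdfa_attributes is hypothetical, the dictionary copies part of Listing 16, and the actual markup that Drupal generates contains more than this single attribute.

```python
from html import escape

# Part of the mapping from Listing 16, rewritten as a Python dictionary.
RECIPE_MAPPING = {
    "title": {"predicates": ["dc:title", "v:name"]},
    "recipe_summary": {"predicates": ["v:summary"]},
    "recipe_photo": {"predicates": ["v:photo"], "type": "rel"},
}


def rdfa_attributes(field_name, mapping=RECIPE_MAPPING):
    """Return the RDFa attribute implied by a field's mapping."""
    field = mapping[field_name]
    # Without an explicit type, the attribute defaults to 'property'.
    attribute = field.get("type", "property")
    return '%s="%s"' % (attribute, escape(" ".join(field["predicates"])))


print(rdfa_attributes("title"))         # property="dc:title v:name"
print(rdfa_attributes("recipe_photo"))  # rel="v:photo"
```

Several predicates on one field become a space-separated list in a single attribute, which is how the title can carry both dc:title and v:name.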

Finally, implement hook_rdf_namespaces so the prefix definition is included at the top of the HTML document, as shown in Listing 18.

Listing 18. RDF namespace declaration from rdf_example.module
function rdf_example_rdf_namespaces() {
  return array(
    // Google's namespace for their custom vocabularies.
    'v' => 'http://rdf.data-vocabulary.org/#', 
  );
}

Testing the Rich Snippets

The module is finished, so try installing it. The Recipe content type should display on the Add content page.

The title, image, and summary fields will be available on the Recipe node form. To see a Rich Snippet in the testing tool, you will need to add an image. Once you have submitted a node, you'll have the option to add nutrition information. Be sure to add at least a calorie count.

Now that you have your recipe, test it with the Rich Snippets Testing Tool. You should see a preview showing your image and the calorie count.


Conclusion

Linked Data technologies help make the web's data more interoperable, reusable, and easier to work with. Companies like Google and Facebook have started building upon these technologies because of the simplicity they lend to data sharing. With Drupal, small data publishers and consumers can also benefit from these technologies without creating a lot of custom code. In this article, you learned how to expose your site's data using RDF.

Stay tuned for Part 2 of this series, which shows how to consume data provided by others and how to combine it with your own data in your Drupal website.


Download

Description | Name | Size
Article source code | rdf_example.zip | 3KB

Resources

Learn

Get products and technologies

Discuss
