The Semantic Web, Linked Data and Drupal, Part 2: Combine linked datasets with Drupal 7 and SPARQL Views

Part 1 of this series explored some of the new concepts recently incorporated into the Drupal content management system, such as how sharing and reusing data is now easier because the technologies discourage proprietary APIs in favor of a machine-readable format, the Resource Description Framework (RDF). In this article, learn how to take advantage of the Linked Data already available on the web of data, and how to enrich a Drupal 7 site with data from different endpoints. A real-world use case with data from independent publishers provides examples.

Stéphane Corlosquet, Software Engineer, MassGeneral Institute for Neurodegenerative Disease (MIND), MGH

Stéphane Corlosquet holds a master's degree specializing in the Semantic Web from the Digital Enterprise Research Institute (DERI) in Ireland. He is a software engineer at the Mass General Institute for Neurodegenerative Disease (MIND), MGH, and is working on the Science Collaboration Framework, a Drupal-based distribution to build online communities of researchers in biomedicine. Stéphane has contributed to Drupal 6 and is one of the top 30 contributors to Drupal 7 core. He maintains the RDF module in Drupal 7 and is a member of the Drupal security team. More information is available at http://openspring.net/.



Lin Clark, Drupal Developer, Digital Enterprise Research Institute, NUI Galway

Lin Clark is a Drupal developer who specializes in Linked Data. She contributed to the RDF in Drupal 7 core initiative, created SPARQL Views as part of the 2010 Google Summer of Code, and has spoken extensively about the benefits of using Linked Data technologies in everyday applications. She attended Carnegie Mellon University and is currently pursuing a research master's degree at the Digital Enterprise Research Institute at NUI Galway. More information is available at lin-clark.com.



03 May 2011

Introduction

Part 1 of this series, "The Semantic Web, Linked Data and Drupal, Part 1: Expose your data using RDF," covered some of the new features incorporated in Drupal 7. The article outlined how to make your web data more interoperable and your data sharing more efficient. An example showed how to use Drupal 7 to publish Linked Data by exposing content with RDF.

In this article, learn how to take advantage of existing Linked Data on the web. Explore how to enrich Drupal 7 with data coming from different endpoints, and walk through a real-world use case with data from two independent publishers.


Scenario: British research funding

The scenario in this article is from a real-world use case with data from two independent publishers: DBpedia and the British government.

The DBpedia project, presented in Part 1, takes the infobox information from Wikipedia and makes that content machine readable. One of the first, and largest, efforts in the Linking Open Data initiative, it contains information about more than 3.5 million things and includes 1,850,000 image links. All data is provided under a Creative Commons Attribution-ShareAlike license, so you can reuse it on your Drupal site as long as you give attribution and share any derived data under the same license.

The British government is a more recent addition to the Linking Open Data movement. On the advice of the web's inventor, Sir Tim Berners-Lee, the government launched an initiative to get government agencies and local authorities to release their public data online, and it continues to publish more. At the time of this writing, 5,400 datasets are available. Some of them, such as the research funding dataset, are available as RDF through a SPARQL endpoint. This dataset includes more than 43,000 facts about projects that have been funded by various British funding agencies. The facts recorded for a typical project include:

  • Name of the project
  • Description of the project
  • What agency funded the project
  • Value of the grant
  • Start and target dates of the project
  • Grant reference number
  • Status of the project (still running or closed)

The Linking Open Data cloud is a community project for publishing open datasets on the web and linking the data from different datasets together using RDF. Both the DBpedia and data.gov.uk research funding datasets are important nodes in the Linking Open Data cloud. Having links between datasets means it's possible to browse the web of data as if it were a database, pulling content from different places and merging it together.

The data.gov.uk research funding dataset includes links to resources in DBpedia, which lets you combine data from both datasets. Each project from the research funding dataset links to the DBpedia URI of its funding agency. An example of such a funding agency is the Engineering and Physical Sciences Research Council (EPSRC), whose DBpedia URI is http://dbpedia.org/resource/Engineering_and_Physical_Sciences_Research_Council. Browsing to this HTTP URI will show you the information that DBpedia has about this agency, including:

  • Agency name
  • Agency abbreviation
  • A description of what the agency does
  • Location
  • Homepage

In Figure 1, a portion of the Linking Open Data cloud shows the link between the data.gov.uk research funding dataset and DBpedia.

Figure 1. Linking Open Data cloud, link between data.gov.uk research funding dataset and DBpedia
The Linking Open Data cloud diagram, with the data.gov.uk research funding dataset and DBpedia circled and linked by a red arrow

At this point, the data is on two separate pages on the Internet (a page for an agency and a page for a typical project), with no page aggregating the wealth of information you can find in both. What if you could build a single page describing the agency, with data from DBpedia and a listing of all the projects this agency funded from the research funding dataset? In the rest of this article, you'll build such a page with Drupal 7 and SPARQL Views.
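Conceptually, the page you are about to build performs a join on the agency URI: take the agency's own triples from DBpedia, then every project whose project:funder value is that same URI. The Python sketch below performs that join over a few in-memory triples. The project URIs and names are made-up placeholders, not data from either dataset; the predicate URIs are the full forms of rdfs:label and of project:funder in the namespace used by the research funding dataset.

```python
from collections import defaultdict

AGENCY = "http://dbpedia.org/resource/Engineering_and_Physical_Sciences_Research_Council"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FUNDER = "http://research.data.gov.uk/def/project/funder"

# Placeholder triples standing in for the two datasets; values are made up.
dbpedia = [
    (AGENCY, LABEL, "Engineering and Physical Sciences Research Council"),
]
research_funding = [
    ("urn:example:project-a", LABEL, "Project A"),
    ("urn:example:project-a", FUNDER, AGENCY),
    ("urn:example:project-b", LABEL, "Project B"),
    ("urn:example:project-b", FUNDER, AGENCY),
    ("urn:example:project-c", LABEL, "Project C"),
    ("urn:example:project-c", FUNDER, "urn:example:other-agency"),
]


def describe(triples, subject):
    """Group one subject's triples as {predicate: [values]}."""
    props = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            props[p].append(o)
    return dict(props)


def agency_page(agency_uri):
    """Combine the agency's DBpedia description with the projects it funded."""
    funded = sorted(s for s, p, o in research_funding if p == FUNDER and o == agency_uri)
    return {
        "agency": describe(dbpedia, agency_uri),
        "projects": [describe(research_funding, s) for s in funded],
    }


page = agency_page(AGENCY)
print(page["agency"][LABEL][0])
print([project[LABEL][0] for project in page["projects"]])
```

The shared URI is what makes the join possible; no name matching or guessing is involved.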


Set up Drupal 7 and required packages

Part 1 explained how to install a LAMP stack and get started with Drupal 7. For the use case in this article, download the following packages from drupal.org. These versions were current at the time of writing (see Resources for individual links):

  • RDF Extensions 7.x-2.0-alpha1
  • SPARQL 7.x-2.0-alpha1
  • Views 7.x-3.0-beta3
  • SPARQL Views 7.x-2.0-alpha2
  • Entity API 7.x-1.0-beta8
  • CTools 7.x-1.0-alpha4

You can use newer versions of these modules when they become available. Drush's package manager (the Drupal equivalent of apt-get or yum) can download all the required packages in one step, as shown in Listing 1.

Listing 1. Downloading all required packages with Drush's package manager
drush pm-download rdfx sparql views sparql_views entity ctools

After all these packages are downloaded into the sites/all/modules directory, enable the following modules:

  • SPARQL Views
  • RDF UI
  • Views UI

Drupal will also offer to enable the modules they depend on. Alternatively, run: drush pm-enable sparql_views rdfui views_ui.

The RDF Extensions package requires the ARC2 library; see README.txt in sites/all/modules/rdfx for installation instructions. Drush can save you time by downloading ARC2 to the right location: run drush rdf-download.


Introducing SPARQL Views

Before delving into SPARQL Views, it's important to understand that it builds on Views. Views, one of the most popular Drupal modules, is used to build listings of content in various formats. It lets site administrators decide what data to show and how to display it through a flexible user interface.

Even those unfamiliar with query languages can create a query, use filters and arguments to customize their query, and use a wide variety of plug-ins that have templates for presenting the data, such as Google charts and jQuery slideshows. Because all parts of that process are pluggable, there are many different modules that extend Views and make it an extremely powerful tool for selecting and presenting data.

A view typically displays a set of related data, such as members of a group, a list of blog posts, or an image gallery. Earlier versions of Views could only access data in the local Drupal database, but Views 3 supports pluggable query backends, so modules can pull in data from other sources such as Flickr, Amazon S3, or a Solr server. SPARQL Views adds the ability to query SPARQL endpoints with Views and to display the results as if they came from the local database.
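The idea behind a pluggable query backend is that the view asks for rows and does not care where they come from. The following Python sketch is a loose analogy, not the Views 3 plugin API: two hypothetical backend classes, one reading local rows and one reading RDF-style triples through a field-to-predicate mapping, hand the same rows to the same rendering function. The class names and sample data are made up.

```python
from abc import ABC, abstractmethod


class QueryBackend(ABC):
    """Hypothetical stand-in for a query plugin: returns rows as dicts."""

    @abstractmethod
    def fetch(self, fields):
        """Return a list of {field: value} dicts for the requested fields."""


class LocalTableBackend(QueryBackend):
    """Rows already stored locally, like records in the Drupal database."""

    def __init__(self, rows):
        self.rows = rows

    def fetch(self, fields):
        return [{f: row.get(f) for f in fields} for row in self.rows]


class TripleBackend(QueryBackend):
    """Rows assembled from (subject, predicate, object) triples, one row per subject."""

    def __init__(self, triples, mapping):
        self.triples = triples
        self.mapping = mapping  # field name -> predicate

    def fetch(self, fields):
        rows = {}
        for s, p, o in self.triples:
            row = rows.setdefault(s, {f: None for f in fields})
            for f in fields:
                if self.mapping.get(f) == p:
                    row[f] = o
        return [rows[s] for s in sorted(rows)]


def render_list(backend, fields):
    """The 'view': formats rows without knowing which backend produced them."""
    return [" | ".join(str(row[f]) for f in fields) for row in backend.fetch(fields)]


local = LocalTableBackend([{"name": "Post A", "author": "alice"}])
remote = TripleBackend(
    [("urn:example:a", "rdfs:label", "Post A"), ("urn:example:a", "dc:creator", "alice")],
    {"name": "rdfs:label", "author": "dc:creator"},
)
print(render_list(local, ["name", "author"]))
print(render_list(remote, ["name", "author"]))
```

The field-to-predicate mapping passed to TripleBackend plays the same role as the RDF mappings you define for SPARQL Views resource types later in this article.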

Tell Drupal about the datasets

Drupal needs to know where the data will be coming from. For this, you first need to set up a few namespaces and register the two SPARQL endpoints that will be used for this example.

  1. To set up the namespaces, click Configuration in the administrative toolbar, go to the RDF publishing settings page in the Web Services block, and select the RDF namespaces tab. The Research funding dataset uses a specific RDF schema with the namespace: http://research.data.gov.uk/def/project/.

    Associate this namespace with the project prefix, as shown in Figure 2. For DBpedia, associate the prefix dbp with the namespace http://dbpedia.org/property/.

    Figure 2. Register project namespace used in research funding dataset
    Screen showing prefix and Vocabulary URI fields. Prefix has project in the field and vocabulary URI has http://research.data.gov.uk/def/project/
  2. To register the SPARQL endpoints, click Structure in the administrative toolbar and select SPARQL Endpoints Registry. Figure 3 shows an example. Add each SPARQL endpoint by entering its title and endpoint URL:
    • DBpedia: http://dbpedia.org/sparql
    • Research funding: http://services.data.gov.uk/research/sparql
    Figure 3. SPARQL endpoints registry with endpoints of two datasets
    SPARQL endpoints registry containing the endpoints of our two datasets
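Registering a prefix lets a short name such as project:grantRefNumber stand for a full predicate URI. This Python sketch shows that expansion, along with the PREFIX declarations a SPARQL query would carry. The project and dbp entries come from the steps above; the rdfs, foaf, and dc entries are assumed to be the standard namespaces (dc is taken to be DCMI Terms), and the function names are hypothetical.

```python
# Registered prefixes. "project" and "dbp" come from the steps above; the
# others are assumed standard namespaces (dc assumed to be DCMI Terms).
NAMESPACES = {
    "project": "http://research.data.gov.uk/def/project/",
    "dbp": "http://dbpedia.org/property/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/terms/",
}


def expand_curie(curie: str, namespaces: dict = NAMESPACES) -> str:
    """Turn 'prefix:localName' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"unknown prefix in {curie!r}")
    return namespaces[prefix] + local


def prefix_block(namespaces: dict = NAMESPACES) -> str:
    """PREFIX declarations so a SPARQL query can use the short forms directly."""
    return "\n".join(f"PREFIX {p}: <{uri}>" for p, uri in namespaces.items())


print(expand_curie("project:grantRefNumber"))
# http://research.data.gov.uk/def/project/grantRefNumber
print(expand_curie("dbp:abbreviation"))
# http://dbpedia.org/property/abbreviation
print(prefix_block().splitlines()[0])
# PREFIX project: <http://research.data.gov.uk/def/project/>
```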

Describe the datasets to Drupal

You need to describe the datasets within Drupal so that SPARQL Views knows what particular data items you're interested in pulling from the datasets. As explained in Part 1, the web of data is composed of resources that have attributes (predicates) and values (objects). When querying RDF, SPARQL Views will need to know what attributes are actually available for querying. These attributes are called fields in Views. You need to define what these fields are prior to building any SPARQL views. The example use case has two types of resources: agency and project. For each type of resource, define the fields you're interested in and what RDF predicate can be used to grab their values from the respective datasets.

Click Structure in the administrative toolbar and select SPARQL Views resource types. Click Add sparql views resource type to create each of your two types and select its endpoint: agency is associated with DBpedia, and project with the Research funding endpoint, as shown in Figure 4.

Figure 4. Form for creating a SPARQL View resource type
Form for creating a SPARQL View resource type, with Research funding endpoint checked.

Now that both resource types are in the system, you can add the fields that you want to query for each of them. These fields are created the same way as fields for node content types.

Click Manage fields for the agency resource type and enter the label of the field (for example, Description). For the machine name, which Drupal uses internally to refer to the field, use the same word in lowercase, description. Machine names can contain only lowercase letters, digits, and underscores, and Drupal 7 prefixes them with field_ (giving field_description). For the field type, choose Text, as shown in Figure 5, and save the form.

Figure 5. Add fields corresponding to attributes you want to extract from dataset
Form to add fields corresponding to the attributes you want to extract from the dataset

You are then taken to the field settings form. Leave all the settings at their defaults except for the last fieldset, where you specify which RDF predicate to use to extract this data item from the dataset. The example uses rdfs:comment for the Description field of an agency on DBpedia, as shown in Figure 6.

Figure 6. Specify the RDF predicate used in the dataset to query for a given value
Specify the RDF predicate used in the dataset to query for a given value (here the description of an agency)

Table 1 shows the fields and associated RDF mappings to be used for the agency resource type.

Table 1. Fields and associated RDF mappings for agency resource type
Field          Associated RDF mapping
Name           rdfs:label
Abbreviation   dbp:abbreviation
Description    rdfs:comment
Homepage       foaf:homepage
Location       dbp:location

Table 2 shows the fields and RDF mappings corresponding to the project resource type.

Table 2. Fields and RDF mappings corresponding to the project resource type
Field              Associated RDF mapping
Name               rdfs:label
Reference Number   project:grantRefNumber
Value              project:grant
Abstract           dc:abstract
Funding agency     project:funder

Create all the fields shown in the tables for each resource type. The Name field is shared by both types, so when you create it for the project type, use the Add existing field row of the field creation form instead of adding a new field. The names you choose for the Drupal fields are arbitrary and do not have to match the RDF mapping used in the dataset.
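Each field-to-predicate mapping corresponds to one triple pattern in a SPARQL query. The following Python sketch assembles a SELECT query from the agency mappings in Table 1 to illustrate the idea; it is not the query SPARQL Views itself generates. The machine names (with Drupal's field_ prefix) are assumed, and requiring only the Name field while wrapping the others in OPTIONAL is a design choice so that an agency missing, say, a homepage is still returned.

```python
# Agency fields from Table 1, keyed by assumed Drupal machine names.
AGENCY_FIELDS = {
    "field_name": "rdfs:label",
    "field_abbreviation": "dbp:abbreviation",
    "field_description": "rdfs:comment",
    "field_homepage": "foaf:homepage",
    "field_location": "dbp:location",
}

PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbp": "http://dbpedia.org/property/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def build_select(fields, prefixes, required=("field_name",), subject="?s"):
    """Build a SPARQL SELECT with one triple pattern per field mapping."""
    used = sorted({curie.split(":", 1)[0] for curie in fields.values()})
    lines = [f"PREFIX {p}: <{prefixes[p]}>" for p in used]
    lines.append("SELECT " + " ".join([subject] + [f"?{name}" for name in fields]))
    lines.append("WHERE {")
    for name, curie in fields.items():
        pattern = f"{subject} {curie} ?{name} ."
        # Required fields must match; the rest are OPTIONAL so a resource
        # lacking one attribute is still returned.
        if name in required:
            lines.append(f"  {pattern}")
        else:
            lines.append(f"  OPTIONAL {{ {pattern} }}")
    lines.append("}")
    return "\n".join(lines)


print(build_select(AGENCY_FIELDS, PREFIXES))
```

On its own, a query like this matches every labelled resource in the endpoint; the contextual filter configured later narrows it to a single agency.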

Prepare a page to display the SPARQL View results

Eventually, you'll need a container to display the results from both datasets. The example uses a Drupal node as that container. The first task is to create a content type named Agency page. Go to Structure > Content types and click Add content type:

  • Name: Agency page
  • Description: A page containing information about an agency from various datasets.
  • Display settings: Uncheck Display author and date information because it's irrelevant for the example.
  • Comment settings: Select Closed because you don't need the comment function on that page.

After you've saved this form, click Manage fields and delete the Body field; it won't be used. Add a field called Agency URI, which will hold the URI of the agency about which you want to display information. You don't need to specify any RDF mapping for this field.

Now you can create an actual page for the agency about which you want to display information. Click Add content in the grey shortcut bar and select Agency page. Fill in a title for this page, such as "Information about the Engineering and Physical Sciences Research Council." In the Agency URI text field, paste the DBpedia URI of this agency:

http://dbpedia.org/resource/Engineering_and_Physical_Sciences_Research_Council

You can find this URI on the page of a project the agency funded, such as:

http://research.data.gov.uk/doc/project/epsrc/EP/C545222/1

by right-clicking the name of the funder and copying the link address.

This URI was chosen because it is what the two datasets have in common: DBpedia describes the agency with it, and the research funding dataset uses it to identify each project's funder. After you submit the form, the URI you pasted is displayed below the title. This URI plays an important role later, when you assemble the views on that page, but it does not need to be displayed. To hide it, edit the Agency page content type, open the Manage display tab, and select <hidden> in the format drop-down list for the Agency URI field.


Build the DBpedia SPARQL view

Now that all the elements have been prepared, it's time to:

  • Build the actual views that will query the datasets using the RDF mappings you specified.
  • Display the results in the appropriate page.

Create a new SPARQL view: go to Structure > Views and click Add new view. Start with the view for the agency information from DBpedia. Give it a name, such as "Agency information," and select SPARQL Views: DBpedia type in the Show drop-down list. Instead of a page, create a block with the title "Funding Agency Details (source: DBpedia)." Click Continue and edit to open the main Views user interface.

Most of this user interface is not specific to SPARQL Views. If you're familiar with Views, you'll recognize many of the concepts, such as fields, relationships, sort criteria, and filters. Views is very flexible, with many options and settings available.

Fields are the data items that will be made available during the display of the results. They can be displayed, hidden, or combined with one another depending on the use case.

  1. Click the add button at the top right corner of the Fields box to list the available fields. Check the box for agency: field_name and click Add and configure fields to add it to the fields list. Leave the configure field form as it is and click Apply.

    Repeat this for the other fields: description, homepage, location, and abbreviation. Experiment with the settings of each field, for example to specify a label or to wrap the field output in <strong> or <em>. The homepage URI can be turned into a link by choosing Output this field as a link and setting the link path to [agency_field_homepage].

  2. Contextual filters let you specify an input parameter for the view to use when executing a query. The example uses one to tell the "Agency information" view to query only data about the agency at hand. This is where the URI you entered on the node comes in handy.

    To add a contextual filter, open the "Advanced" fieldset on the right and click Add, check the agency: URI checkbox, and click Add and configure contextual filters. In "When the filter value is NOT in the URL" choose Provide default argument. In "Type" choose Field Value (Node). In Source Field select the Agency URI field. These settings instruct the view to use the value of the Agency URI field as the subject of the triple patterns in the WHERE clause of the SPARQL query. Click Apply.

  3. Don't forget to click Save in the top right corner to save all the settings you made.
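With the contextual filter in place, the subject variable is replaced by the Agency URI stored on the node. This Python sketch shows what that substitution amounts to for two of the agency fields. The machine names and function names are hypothetical, and the IRI check (rejecting characters that SPARQL 1.1 forbids inside <...>) is a safeguard added here, not a documented SPARQL Views feature.

```python
import re

PREFIXES = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}
# Two of the agency fields (hypothetical machine names).
FIELDS = {"field_name": "rdfs:label", "field_description": "rdfs:comment"}

# Characters SPARQL 1.1 does not allow inside an IRI written as <...>.
_FORBIDDEN_IN_IRI = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def iri_term(uri: str) -> str:
    """Return uri as a SPARQL IRI term, rejecting values that could break the query."""
    if not uri or _FORBIDDEN_IN_IRI.search(uri):
        raise ValueError(f"not usable as a SPARQL IRI: {uri!r}")
    return f"<{uri}>"


def agency_query(agency_uri: str) -> str:
    """Query the agency fields with the contextual filter value as the subject."""
    subject = iri_term(agency_uri)
    lines = [f"PREFIX {p}: <{ns}>" for p, ns in PREFIXES.items()]
    lines.append("SELECT " + " ".join(f"?{name}" for name in FIELDS))
    lines.append("WHERE {")
    for name, curie in FIELDS.items():
        lines.append(f"  OPTIONAL {{ {subject} {curie} ?{name} . }}")
    lines.append("}")
    return "\n".join(lines)


print(agency_query(
    "http://dbpedia.org/resource/Engineering_and_Physical_Sciences_Research_Council"))
```

On DBpedia, rdfs:label and rdfs:comment usually have values in several languages, so a query like this can return one row per language combination; adding a FILTER on lang() inside each OPTIONAL block is one way to keep a single language.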

You now have your first basic SPARQL view, which can be placed inside the node you created for the Engineering and Physical Sciences Research Council. The view is displayed in that node through a block.

  1. Go to Structure > Blocks to manage the blocks on the site, and find the block named agency_information: Block. Move this block to the Content region, just above the Main page content block, as shown in Figure 7. Click Save.
    Figure 7. Manage the blocks on the site
    3-column form with dropdown fields and a configure option.
  2. To manage the visibility settings of this block, click Configure for this block. In the Content types vertical tab, choose Agency page so that this SPARQL View block only appears on the intended pages. See Figure 8.
    Figure 8. Manage the visibility settings
    Visibility settings form, with agency page checked under show block for specific content types
  3. Go to the node you created for the agency, and you should now see the SPARQL View block you just created containing details about the agency from DBpedia, as shown in Figure 9.
    Figure 9. SPARQL View block with details about the agency
    Two tabs on a page with SPARQL View block with details about the agency

Build the research funding SPARQL view

Building the second SPARQL view is very similar to the way you built the first view. Following the same instructions, name the second view Funded projects and choose the type SPARQL Views: Research funding. The title of the block will be "Projects funded (source: data.gov.uk)." Add the following fields:

  • Name
  • Reference Number
  • Value
  • Abstract

Leave out Funding agency: on the agency's own page, its value would be the same for every row.

Add the contextual filter field_funding_agency and configure it the same way as in the first view. This time the Agency URI is matched against the value of the Funding agency field, so the view returns the projects whose funder is this agency. Add the block to the Content region and configure it like the block of the first view. Look at the agency node again: a lot more content has been added, but the layout does not look right. The first view displays the values of a single resource (the agency), whereas the second view displays many resources at once, so it needs a different format. A table makes more sense here: each result becomes a row, and each field becomes a column. Views supports this directly.
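In this second view, the agency URI plays the opposite role from the first: it is the object of the project:funder triple, and the projects are the unknowns. The Python sketch below builds such a query for the fields listed above. The machine names, the dc namespace (assumed to be DCMI Terms), and the LIMIT value are assumptions for illustration, and the query is not necessarily the one SPARQL Views produces.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "project": "http://research.data.gov.uk/def/project/",
    "dc": "http://purl.org/dc/terms/",  # assumed namespace for dc:abstract
}

# Hypothetical machine names -> RDF mappings from Table 2 (Funding agency omitted).
PROJECT_FIELDS = {
    "field_name": "rdfs:label",
    "field_reference_number": "project:grantRefNumber",
    "field_value": "project:grant",
    "field_abstract": "dc:abstract",
}


def funded_projects_query(agency_uri: str, limit: int = 50) -> str:
    """Select every project whose funder is agency_uri, with optional details."""
    if any(ch in agency_uri for ch in '<>" '):
        raise ValueError(f"not usable as a SPARQL IRI: {agency_uri!r}")
    lines = [f"PREFIX {p}: <{ns}>" for p, ns in PREFIXES.items()]
    lines.append("SELECT ?project " + " ".join(f"?{n}" for n in PROJECT_FIELDS))
    lines.append("WHERE {")
    # The contextual filter value is bound as the object of project:funder.
    lines.append(f"  ?project project:funder <{agency_uri}> .")
    for name, curie in PROJECT_FIELDS.items():
        lines.append(f"  OPTIONAL {{ ?project {curie} ?{name} . }}")
    lines.append("}")
    lines.append(f"LIMIT {limit}")  # assumed page size, similar to a Views pager
    return "\n".join(lines)


if __name__ == "__main__":
    print(funded_projects_query(
        "http://dbpedia.org/resource/Engineering_and_Physical_Sciences_Research_Council"))
```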

Head back to the edit form of the Funded projects view. In the Format section on the left, click Unformatted list and select Table instead. A list of all the columns appears, where you can refine the layout of the table, for example by right-aligning the Value column for better readability. You can also make the columns sortable.
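Views builds the table for you, but the effect of the two options mentioned here, sorting on a column and right-aligning numbers, is easy to see in a plain-text rendering. This Python sketch only illustrates the layout, not how the Views Table style is implemented; the row keys and the project names, reference numbers, and values are made-up placeholders.

```python
def render_table(rows, sort_key="value", descending=True):
    """Render rows as a plain-text table sorted on one column.

    Name and reference are left-aligned; Value is right-aligned so that
    amounts of different sizes line up on their last digit.
    """
    rows = sorted(rows, key=lambda r: r[sort_key], reverse=descending)
    header = ("Name", "Reference", "Value")
    cells = [(r["name"], r["ref"], f"{r['value']:,}") for r in rows]
    widths = [max(len(text) for text in column) for column in zip(header, *cells)]

    def fmt(name, ref, value):
        return "  ".join([name.ljust(widths[0]), ref.ljust(widths[1]), value.rjust(widths[2])])

    lines = [fmt(*header), "  ".join("-" * w for w in widths)]
    lines += [fmt(*c) for c in cells]
    return "\n".join(lines)


# Made-up placeholder rows, not real grants.
rows = [
    {"name": "Project A", "ref": "REF/0001", "value": 250000},
    {"name": "Project B", "ref": "REF/0002", "value": 1200000},
    {"name": "Project C", "ref": "REF/0003", "value": 98000},
]
print(render_table(rows))
```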

You could make other improvements to the view, such as choosing Trim this field to a maximum length for the abstract (for example, 150 characters). You can also output the Name field as a link: add the URI field, exclude it from display, and use its replacement pattern as the link path of the Name field. Save the view and browse to the agency page, which now includes a table of all projects funded by the agency, as shown in Figure 10.

Figure 10. Agency page including a table of all projects that have been funded
Agency page including a table of all projects that have been funded
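The two field rewrites described above can be sketched in Python: trimming text to a maximum number of characters on a word boundary, and building a link whose path comes from a replacement token. The token name [project_uri], the row keys, and the ellipsis handling are assumptions for illustration; Views' own trimming and token replacement may differ in detail.

```python
import html


def trim_on_word_boundary(text: str, max_length: int = 150, ellipsis: str = "...") -> str:
    """Shorten text to at most max_length characters without splitting a word."""
    if len(text) <= max_length:
        return text
    limit = max_length - len(ellipsis)  # leave room for the ellipsis
    cut = text[:limit]
    if text[limit] != " ":  # the cut falls inside a word: back up to the last space
        space = cut.rfind(" ")
        if space > 0:
            cut = cut[:space]
    return cut.rstrip() + ellipsis


def replace_tokens(pattern: str, row: dict) -> str:
    """Substitute each [token] in pattern with the row value of the same name."""
    for key, value in row.items():
        pattern = pattern.replace(f"[{key}]", str(value))
    return pattern


def name_as_link(row: dict, link_path: str = "[project_uri]") -> str:
    """Render the project name as a link whose href comes from a hidden URI field."""
    href = replace_tokens(link_path, row)
    return f'<a href="{html.escape(href)}">{html.escape(row["project_name"])}</a>'


row = {"project_name": "Project A", "project_uri": "http://example.org/project/a"}
print(name_as_link(row))
print(trim_on_word_boundary("alpha beta gamma delta", max_length=14))
```

Escaping the name and URL before inserting them into HTML keeps data from a remote endpoint from injecting markup into the page.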

Conclusion

In this article, you learned how to combine data about a single resource from two different SPARQL endpoints using Drupal 7 and SPARQL Views. The article explored only a few of the many ways to combine Views with SPARQL, the RDF query language. We hope you will continue to explore the possibilities that Drupal 7, Views, and SPARQL Views offer, and find new ways to combine Linked Data.

Resources

Learn

Get products and technologies

Discuss
