Part 1 of this series, "The Semantic Web, Linked Data and Drupal, Part 1: Expose your data using RDF," covered some of the new features incorporated in Drupal 7. The article outlined how to make your web data more interoperable and your data sharing more efficient. An example showed how to use Drupal 7 to publish Linked Data by exposing content with RDF.
In this article, learn how to take advantage of existing Linked Data on the web. Explore how to enrich Drupal 7 with data coming from different endpoints, and walk through a real-world use case with data from two independent publishers.
Scenario: British research funding
The scenario in this article is from a real-world use case with data from two independent publishers: DBpedia and the British government.
The DBpedia project, presented in Part 1, takes the infobox information from Wikipedia and makes that content machine readable. One of the first, and largest, efforts in the Linking Open Data initiative, it contains information about more than 3.5 million things and includes 1,850,000 image links. All data is provided under a Creative Commons Attribution-ShareAlike license, meaning you can reuse it on your Drupal site as long as you follow the license terms.
The British government is a more recent addition to the Linking Open Data movement. On the advice of the web's inventor, Sir Tim Berners-Lee, the government launched an initiative to get government agencies and local authorities to release their public data online. The government is serious about getting more and more data published. At the time of this writing, 5,400 datasets are available. Some of the datasets are available as RDF through a SPARQL endpoint, such as the research funding dataset. This dataset includes more than 43,000 facts about projects that have been funded by various British funding agencies, including:
- Name of the project
- Description of the project
- What agency funded the project
- Value of the grant
- Start and target dates of the project
- Grant reference number
- Status of the project (still running or closed)
The Linking Open Data cloud is a community project for publishing open datasets on the web and linking the data from different datasets together using RDF. Both the DBpedia and data.gov.uk research funding datasets are important nodes in the Linking Open Data cloud. Having links between datasets means it's possible to browse the web of data as if it were a database, pulling content from different places and merging it together.
The data.gov.uk research funding dataset includes links to resources in DBpedia, which lets you combine data from both datasets. Each project from the research funding dataset links to the DBpedia URI of its funding agency. An example of such a funding agency is the Engineering and Physical Sciences Research Council (EPSRC), whose DBpedia URI is http://dbpedia.org/resource/Engineering_and_Physical_Sciences_Research_Council. Browsing to this HTTP URI will show you the information that DBpedia has about this agency, including:
- Agency name
- Agency abbreviation
- A description of what the agency does
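The role of the shared URI can be illustrated with a minimal Python sketch. Only the EPSRC DBpedia URI comes from the text; the example.org URIs, the short predicate names, and the sample triples are hypothetical placeholders, not values from either dataset.

```python
# Shared identifier: the DBpedia URI of the funding agency (from the article).
EPSRC = ("http://dbpedia.org/resource/"
         "Engineering_and_Physical_Sciences_Research_Council")

# Hypothetical triples standing in for the DBpedia dataset.
dbpedia_triples = [
    (EPSRC, "name", "Engineering and Physical Sciences Research Council"),
    (EPSRC, "abbreviation", "EPSRC"),
]

# Hypothetical triples standing in for the research funding dataset.
funding_triples = [
    ("http://example.org/project/1", "funder", EPSRC),
    ("http://example.org/project/2", "funder", EPSRC),
    ("http://example.org/project/3", "funder", "http://example.org/agency/other"),
]


def describe_agency(agency_uri):
    """Merge facts about one agency from both triple lists."""
    # Facts where the agency is the subject (DBpedia side).
    details = {p: o for s, p, o in dbpedia_triples if s == agency_uri}
    # Projects where the agency is the object of the funder link.
    projects = [s for s, p, o in funding_triples
                if p == "funder" and o == agency_uri]
    return {"details": details, "projects": projects}


page = describe_agency(EPSRC)
print(page["details"]["abbreviation"], len(page["projects"]))
```

Because both lists use the same URI for the agency, no lookup table or name matching is needed to join them.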
In Figure 1, a portion of the Linked Open Data cloud shows the link between the data.gov.uk research funding dataset and DBpedia.
Figure 1. Linking Open Data cloud, link between data.gov.uk research funding dataset and DBpedia
At this point, the data is on two separate pages on the Internet — a page for an agency and a page for a typical project — with no page aggregating the wealth of information you can find in both these pages. What if you could build a single page describing the agency, with data from DBpedia and a listing of all the projects this agency funded from the research funding dataset? In the rest of this article, you'll build such a page with Drupal 7 and SPARQL Views.
Set up Drupal 7 and required packages
Part 1 explained how to install a LAMP stack and get started with Drupal 7. For the use case in this article, download the following packages from drupal.org (see Resources for individual links). The versions listed are current at the time of writing.
- RDF Extensions 7.x-2.0-alpha1
- SPARQL 7.x-2.0-alpha1
- Views 7.x-3.0-beta3
- SPARQL Views 7.x-2.0-alpha2
- Entity API 7.x-1.0-beta8
- CTools 7.x-1.0-alpha4
Feel free to use the latest version of the modules when one is available. You can use Drush's package manager (equivalent to apt-get or yum for Drupal) to download, in one step, all the packages needed for Drupal 7, as shown in Listing 1.
Listing 1. Drush's package manager to download all packages needed for Drupal 7
drush pm-download rdfx sparql views sparql_views entity ctools
After all these packages are downloaded into the /sites/all/modules directory, enable the following modules:
- SPARQL Views
- RDF UI
- Views UI
Drupal will also enable the dependencies of these modules. Alternatively, use Drush:
drush pm-enable sparql_views rdfui views_ui
The main RDF package requires the ARC2 library. See the README.txt in sites/all/modules/rdfx for more instructions. Drush can save you some time and download ARC2 to the right place for you, by using:
Introducing SPARQL Views
Before delving into SPARQL Views, it's important to understand that it leverages the power of Views. Views, one of the most popular Drupal modules, is used to build listings of content in various formats. Views lets site administrators decide what data to show and how to display it using a flexible user interface.
Even those unfamiliar with query languages can create a query, use filters and arguments to customize their query, and use a wide variety of plug-ins that have templates for presenting the data, such as Google charts and jQuery slideshows. Because all parts of that process are pluggable, there are many different modules that extend Views and make it an extremely powerful tool for selecting and presenting data.
A view typically displays some kind of related data, such as members of a group, a list of blog posts, image galleries, and so on. While the previous versions of Views could only access data in the local Drupal database, Views 3 can connect to any data source and pull in data from anywhere: Flickr, Amazon S3, Solr server, and so forth. SPARQL Views adds the ability to query SPARQL endpoints with Views and to display the results as if they were coming from the local database.
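To make "querying a SPARQL endpoint" concrete, here is a minimal Python sketch of the HTTP request that the SPARQL protocol defines: the query text travels as a `query` parameter. It is not how SPARQL Views is implemented. The endpoint URL is DBpedia's public endpoint, which this article does not spell out and is given here as an assumption. The sketch only builds the URL and does not contact the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# DBpedia's public SPARQL endpoint (an assumption; not named in the text).
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
SELECT ?comment WHERE {
  <http://dbpedia.org/resource/Engineering_and_Physical_Sciences_Research_Council>
      <http://www.w3.org/2000/01/rdf-schema#comment> ?comment .
}
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL (SPARQL protocol style)."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Fetching that URL with an HTTP client returns the result set, which a module such as SPARQL Views can then hand to Views for display.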
Tell Drupal about the datasets
Drupal needs to know where the data will be coming from. For this, you first need to set up a few namespaces and register the two SPARQL endpoints that will be used for this example.
- To set up the namespaces, click Configuration in the administrative toolbar, go to the RDF publishing settings page in the Web Services block, and select the RDF namespaces tab. The research funding dataset uses a specific RDF schema with the namespace:
Associate this namespace with the project prefix, as shown in Figure 2. For DBpedia, use the prefix dbp associated with the namespace
Figure 2. Register project namespace used in research funding dataset
- To register the SPARQL endpoints, click Structure in the administrative toolbar and select SPARQL Endpoints Registry. Figure 3 shows an example. Add each SPARQL endpoint by entering its title and endpoint:
- Research funding:
Figure 3. SPARQL endpoints registry with endpoints of two datasets
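What a namespace registration buys you can be sketched in a few lines of Python: a prefix such as `rdfs` becomes shorthand for a full namespace URI, so `rdfs:comment` expands to a complete predicate URI. The `rdfs` namespace below is the standard one. The `project` namespace is a hypothetical placeholder, because the real one is specific to the research funding dataset.

```python
# Prefix-to-namespace registry. The "project" entry is a placeholder,
# not the namespace actually used by the research funding dataset.
NAMESPACES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "project": "http://example.org/research-funding-schema#",  # placeholder
}


def expand(curie, namespaces=NAMESPACES):
    """Expand a prefixed name such as 'rdfs:comment' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in namespaces:
        raise KeyError(f"unknown prefix in {curie!r}")
    return namespaces[prefix] + local


print(expand("rdfs:comment"))
```

An unregistered prefix raises an error instead of silently producing a wrong URI, which is why the namespaces need to be registered before the field mappings are defined.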
Describe the datasets to Drupal
You need to describe the datasets within Drupal so that SPARQL Views knows what particular data items you're interested in pulling from the datasets. As explained in Part 1, the web of data is composed of resources that have attributes (predicates) and values (objects). When querying RDF, SPARQL Views will need to know what attributes are actually available for querying. These attributes are called fields in Drupal, and you need to define what these fields are prior to building any SPARQL views.
The example use case has two types of resources: agency and project. For each type of resource, define the fields you're interested in and what RDF predicate can be used to grab their values from the respective datasets.
Click Structure in the administrative toolbar and select SPARQL Views resource types. Click Add sparql views resource type and enter the name of the endpoint for each of your types: agency will be associated with DBpedia and project will be associated with the Research funding endpoint, as shown in Figure 4.
Figure 4. Form for creating a SPARQL View resource type
Now that both resource types are in the system, you can add the appropriate fields that you want to query for each of them. Creating a field in Drupal 7 is easy, and it works the same way as creating fields for node content types.
Click Manage fields for the agency resource type, and enter the name of the field (for example, Description). Choose a machine name, usually the same but in lowercase, such as description; the machine name can only contain lowercase characters, digits, and underscores. It is what is used internally to refer to the field. For the field type, choose Text, as shown in Figure 5, and Save the form.
Figure 5. Add fields corresponding to attributes you want to extract from dataset
You should then be at the field configuration form. When saving the form, leave all the settings as they are except for the last fieldset. This is where you specify what RDF predicate to use to extract this data item from the dataset. The example uses rdfs:comment for the Description field of an agency on DBpedia, as shown in Figure 6.
Figure 6. Specify the RDF predicate used in the dataset to query for a given value
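The relationship between fields and RDF predicates can be sketched as a mapping that is turned into one triple pattern per field. This is a minimal Python illustration of the idea, not the query SPARQL Views actually generates. Only the description mapping (rdfs:comment) comes from the text; the name mapping to rdfs:label is an assumption.

```python
# Field machine name -> RDF predicate. The "name" mapping is an
# assumption for illustration; "description" follows the article.
AGENCY_FIELDS = {
    "name": "rdfs:label",
    "description": "rdfs:comment",
}


def build_select(resource_var, fields):
    """Build a SPARQL SELECT with one triple pattern per mapped field."""
    variables = " ".join(f"?{name}" for name in fields)
    patterns = "\n".join(
        f"  ?{resource_var} {predicate} ?{name} ."
        for name, predicate in fields.items()
    )
    prefix = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>"
    return f"{prefix}\nSELECT {variables} WHERE {{\n{patterns}\n}}"


query = build_select("agency", AGENCY_FIELDS)
print(query)
```

Each field name becomes a query variable, which is why the Drupal field names can be chosen freely while the predicates must match what the dataset uses.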
Table 1 shows the fields and associated RDF mappings to be used for the agency resource type.
Table 1. Fields and associated RDF mappings for agency resource type
|Field|Associated RDF mapping|
Table 2 shows the fields and RDF mappings corresponding to the project resource type.
Table 2. Fields and RDF mappings corresponding to the project resource type
|Field|Associated RDF mapping|
Create all the fields shown in the tables for each type. When creating the field Name for the second time, make sure to use the Add existing field row of the field creation form. The names you choose for the Drupal fields are arbitrary and do not have to match the RDF mapping used in the dataset.
Prepare a page to display the SPARQL View results
Eventually, you'll need a container to display the results of both datasets. The example uses a Drupal node as a container for that purpose. The first task is to create a content type called Agency page. In Structure > Content types, click Add content type and fill in the form as follows:
- Name: Agency page
- Description: A page containing information about an agency from various datasets.
- Display settings: Uncheck Display author and date information because it's irrelevant for the example.
- Comment settings: Select Closed because you don't need the comment function on that page.
After you've saved this form, click Manage fields and delete the Body field; it won't be used. Add a field called Agency URI, which will hold the URI of the agency about which you want to display information. You don't need to specify any RDF mapping for this field.
Now you can create an actual page for the agency about which you want to display information. Click Add content in the grey shortcut bar and select Agency page. Fill in a title for this page, such as "Information about the Engineering and Physical Sciences Research Council." In the Agency URI text field, paste the DBpedia URI of this agency, which you can find on the project page:
by right-clicking on the name of the funder:
This particular URI was chosen because it is the common denominator between the two datasets. After submitting the form, you'll notice that the URI you just pasted is displayed below the title. This URI will play an important role later when you assemble the views onto that page, but it does not necessarily need to be displayed. You can hide it in the Manage Display tab when editing the Agency page content type by selecting <hidden> in the format drop-down.
Build the DBpedia SPARQL view
Now that all the elements have been prepared, it's time to:
- Build the actual views that will query the datasets using the RDF mappings you specified.
- Display the results in the appropriate page.
Create a new SPARQL view by going to Structure > Views and clicking Add new view. Start by creating the view for the information about the agency from DBpedia. Give it a name, such as "Agency information," and select SPARQL Views: DBpedia type in the Show drop-down list. You won't create a page, but instead will create a block with the title "Funding Agency Details (source: DBpedia)." Clicking Continue and edit will bring you to the main user interface of Views. Most of this user interface is not specific to SPARQL Views. If you're familiar with Views, you'll recognize a lot of the concepts, such as fields, relationships, sort criteria, filters, and so on. Views is very flexible, with a lot of options and settings available.
Fields are the data items that will be made available during the display of the results. They can be displayed, hidden, or combined with one another depending on the use case.
- Click the add button at the top right corner of the Fields box to list the available fields.
Check the box for agency: field_name and click Add and configure fields to add it to the fields list. Leave the configure field form as it is and click Apply.
Repeat this for all the other fields: description, homepage, location, and abbreviation. Experiment with the settings of each field to specify a label or to wrap the field output in <strong> or <em>. The homepage URI can be turned into a link by choosing Output this field as a link and setting the link path to
- Contextual filters let you specify an input parameter for the view to use when executing a query. The example uses an argument to tell the "Agency information" view that it should only query data about the Agency at hand. This is where the URI you entered on the node will come in handy.
To add a contextual filter, open the "Advanced" fieldset on the right and click Add, check the agency: URI checkbox, and click Add and configure contextual filters. In "When the filter value is NOT in the URL" choose Provide default argument. In "Type" choose Field Value (Node). In Source Field select the Agency URI field. These settings will instruct the view to use the value of the Agency URI as the subject of the WHERE pattern of the SPARQL query. Click Apply.
- Don't forget to click Save in the top right corner to save all the settings you made.
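The effect of the contextual filter can be sketched in Python: the value of the node's Agency URI field replaces the variable in the subject position of the WHERE pattern. This is a minimal sketch of the idea rather than the query SPARQL Views actually builds. The `node` dictionary is a hypothetical stand-in for the Drupal node, and the character check is a simplified guard against values that would break out of the `<...>` delimiters.

```python
import re

# A simplified check for characters that may not appear inside a
# SPARQL IRI reference written as <...>.
_FORBIDDEN = re.compile(r'[<>"{}|^`\\\s]')


def agency_query(agency_uri):
    """Return a query whose WHERE subject is fixed to the given URI."""
    if _FORBIDDEN.search(agency_uri):
        raise ValueError(f"not a safe IRI: {agency_uri!r}")
    return (
        "SELECT ?description WHERE {\n"
        f"  <{agency_uri}> <http://www.w3.org/2000/01/rdf-schema#comment> ?description .\n"
        "}"
    )


# Hypothetical stand-in for the value stored in the node's Agency URI field.
node = {"agency_uri": "http://dbpedia.org/resource/"
                      "Engineering_and_Physical_Sciences_Research_Council"}
query = agency_query(node["agency_uri"])
print(query)
```

With the subject fixed, the query returns values for a single resource, which is why the first view shows one agency rather than a list.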
You now have your first basic SPARQL view, which can be placed inside the node you created for the Engineering and Physical Sciences Research Council. This view will be materialized using a block to be displayed in that node.
- Go to Structure > Blocks to manage the blocks on the site, and identify the block named agency_information: Block. Move this block to the content region, right above the Main page content block, as shown in Figure 7. Click Save.
Figure 7. Manage the blocks on the site
- To manage the visibility settings of this block, click Configure for this block. In the Content types vertical tab, choose Agency page so that this SPARQL View block only appears on the intended pages. See Figure 8.
Figure 8. Manage the visibility settings
- Go to the node you created for the agency, and you should now see the SPARQL View block you just created containing details about the agency from DBpedia, as shown in Figure 9.
Figure 9. SPARQL View block with details about the agency
Build the research funding SPARQL view
Building the second SPARQL view is very similar to building the first. Following the same instructions, name the second view Funded projects and choose the type SPARQL Views: Research funding. The title of the block will be "Projects funded (source: data.gov.uk)." Add the following fields:
- Reference Number
Leave out Funding Agency because you already know what it is when you look at its page.
Add the contextual filter and configure it the same way you did in the first view.
Add the block to the content region and configure it like the block of the first view. Take a peek at the agency node. You should see that a lot more content has been added. However, you'll notice that the layout does not look right: the first view displayed the values of only one resource (the agency), whereas the second view displays many resources at once, so it needs a different template. A table makes more sense here; each result will be a row, and the values will be the columns of the table. This is not a problem for Views.
Head back to the edit form of the Funded Projects view. In the format pane on the left, click Unformatted list and select Table instead. A list of all the columns will appear where you can enhance the layout of the table, such as right aligning the Value column for better readability. You can also make it sortable.
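The idea of one row per result can be sketched in Python using the SPARQL JSON results format, in which each binding maps variable names to values. The response below is a hypothetical sample; the project names and grant values are made-up placeholders, not data from the research funding dataset. The sketch sorts by name and right-aligns the Value column, mirroring the table settings described above.

```python
import json

# Hypothetical response in the SPARQL JSON results format; names and
# values are placeholders, not real data.
RESPONSE = json.loads("""
{
  "head": {"vars": ["name", "value"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Project B"},
     "value": {"type": "literal", "value": "250000"}},
    {"name": {"type": "literal", "value": "Project A"},
     "value": {"type": "literal", "value": "1200000"}}
  ]}
}
""")


def to_rows(response):
    """Flatten each binding into a plain dict of variable -> value."""
    variables = response["head"]["vars"]
    return [
        {var: binding.get(var, {}).get("value", "") for var in variables}
        for binding in response["results"]["bindings"]
    ]


def render_table(rows, sort_by="name"):
    """Sort rows by one column and right-align the Value column."""
    rows = sorted(rows, key=lambda row: row[sort_by])
    name_width = max([len("Name")] + [len(r["name"]) for r in rows])
    value_width = max([len("Value")] + [len(r["value"]) for r in rows])
    lines = [f"{'Name':<{name_width}}  {'Value':>{value_width}}"]
    for row in rows:
        lines.append(f"{row['name']:<{name_width}}  {row['value']:>{value_width}}")
    return "\n".join(lines)


table = render_table(to_rows(RESPONSE))
print(table)
```

Right alignment keeps the digits of the grant values in the same columns, which makes amounts of different magnitude easier to compare.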
You could make other improvements to the view, such as choosing Trim this field to a maximum length for the abstract (for example, 150 characters). You can output the Name field as a link by adding the URI field, excluding it from the display, and using it as a replacement pattern for the name. Save the view and browse to the agency page, which now includes a table of all projects that have been funded by the agency, as shown in Figure 10.
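Both refinements can be sketched in Python. This is only an approximation of what the Views settings do, not their implementation; the row contents and the example.org URI are hypothetical placeholders. The hidden `uri` value becomes the link target for the visible name, and the abstract is cut at a word boundary within the length limit.

```python
from html import escape


def trim(text, max_length=150):
    """Trim text to at most max_length characters at a word boundary."""
    if len(text) <= max_length:
        return text
    # Look one character past the limit so a word that ends exactly at
    # the limit is kept whole.
    cut = text[:max_length + 1]
    index = cut.rfind(" ")
    cut = cut[:index] if index > 0 else text[:max_length]
    return cut + "..."


def name_as_link(row):
    """Use the hidden uri value as the link target for the visible name."""
    return f'<a href="{escape(row["uri"], quote=True)}">{escape(row["name"])}</a>'


# Hypothetical result row; the URI and text are placeholders.
row = {
    "uri": "http://example.org/project/1",
    "name": "Project A",
    "abstract": "word " * 40,
}
print(name_as_link(row))
print(trim(row["abstract"]))
```

Escaping both the URI and the name matters because the values come from an external dataset and end up in HTML.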
Figure 10. Agency page including a table of all projects that have been funded
In this article, you learned how to combine data about a single resource from two different SPARQL endpoints using Drupal 7 and SPARQL Views. The article explored only a few of the numerous ways of combining Views with SPARQL, the RDF querying language. We hope you will continue to explore the new possibilities that Drupal 7, Views, and SPARQL Views have to offer, and to find new ways to combine Linked Data.
- DBpedia is a project aiming to extract structured content from the information created as part of the Wikipedia project.
- The Linked Data initiative is about using the web to connect related data that wasn't previously linked, or using the web to lower the barriers to linking data currently linked using other methods.
- The Linking Open Data cloud diagram shows datasets that have been published in Linked Data format by contributors to the Linking Open Data community project and other individuals and organizations.
- SPARQL Views was built as part of Google Summer of Code 2010 by Lin Clark, who has been constantly improving the project since then.
- Share your findings with us on the Semantic Web group.
- The DBpedia project is available for querying at the following SPARQL endpoint.
- The Opening up government research funding dataset includes links to resources in DBpedia, which allow you to combine data from both datasets.
- At the time of this writing, 5,400 datasets are available at Opening up government - Data. Some of these datasets are available as RDF through a SPARQL endpoint, such as the research funding dataset available at services.data.gov.uk/research/sparql. This dataset includes more than 43,000 facts about the projects that have been funded by various British funding agencies.
- View the facts for a typical project from the HM Government site "Opening up government."
- The seminal article "The Semantic Web" by Tim Berners-Lee, James Hendler, and Ora Lassila explores a new form of web content.
- Read the ReadWriteWeb interview with Tim Berners-Lee about Linked Data.
- Learn about Linked Data design issues from Tim Berners-Lee.
- Read the W3C specification for SPARQL Query Language for RDF.
- Read the Google Rich Snippets Documentation.
- "Implement Semantic Web standards in your Web site" (developerWorks, May 2008) explains how to implement a simple social networking site using PHP and MySQL, which will implement Semantic Web standards such as hCard and Friend of a Friend (FOAF) as part of a semantic Uniform Resource Identifier (URI) scheme.
- In "Developing Drupal publications to support standards-based XML" (developerWorks, Feb 2011), learn to customize your Drupal installation to support the publication of TEI (or other) XML documents.
- Read the official Drupal Installation Guide.
- Get step-by-step instructions on how to install Drupal 7 with the Acquia Stack Installer.
- The FOAF Vocabulary Specification 0.98 describes the FOAF language, defined as a dictionary of named properties and classes using W3C's RDF technology.
- The Dublin Core Metadata Initiative (DCMI) is an open organization engaged in the development of interoperable metadata standards that support a broad range of purposes and business models.
- The SIOC (Semantically-Interlinked Online Communities) Core Ontology Specification provides the main concepts and properties required to describe information from online communities (such as message boards, wikis, weblogs, and so on) on the Semantic Web.
- A demonstration query interface is also available on the Web at http://dbpedia.org/snorql/.
- The developerWorks Web Development zone specializes in articles covering various web-based solutions.
- developerWorks technical events and webcasts: Stay current with developerWorks technical events and webcasts.
Get products and technologies
- Get support in the SPARQL Views bug tracker.
- Get RDF Extensions 7.x-2.0-alpha1.
- Get SPARQL 7.x-2.0-alpha1.
- Get Views 7.x-3.0-beta3.
- Get SPARQL Views 7.x-2.0-alpha1.
- Get Entity API 7.x-1.0-beta8.
- Get CTools 7.x-1.0-alpha4.
- The main RDF package requires the ARC2 library.
- Drush is the Drupal command line tool which can save you a lot of time. It can download modules and perform various operations on your Drupal site.
- You can download the modules from Google Rich Snippets, Field collection, and Entity API. Be sure to download the development releases.
- Acquia Drupal is a freely available packaged distribution of the open source Drupal social publishing system.