What's new in RDF application development in DB2 10.1 Fix Pack 2

Beginning with DB2® 10.1 for Linux®, UNIX®, and Windows®, DB2 has supported RDF data and SPARQL application development. In this article, learn about important enhancements for RDF application development that were added in DB2 10.1 Fix Pack 2.


Mario Briggs (mario.briggs@in.ibm.com), Senior Software Engineer, IBM

Mario Briggs leads the open source offerings for IBM DB2 and IBM Informix, including PHP, Ruby/Rails, Python/Django, Perl, and Java data access frameworks. He also leads the RDF support in DB2. He has about 14 years of experience in software development, with many of those years spent in the areas of data access, relational engines, and application-database performance.



Priya Ranjan Sahoo (prrsahoo@in.ibm.com), Senior Software Engineer, IBM

Priya Ranjan Sahoo works on the RDF support in DB2. He has about six years of experience in software development, mostly in Java application development and databases.



Farzana Anwar (fanwar@ca.ibm.com), Information Developer, IBM

Farzana Anwar is a member of the DB2 for Linux, UNIX, and Windows team at the IBM Canada Lab in Markham, Ontario. Since 2004, she has held various roles across IBM, in the areas of application development, system verification test, technical support, and information development. She has a bachelor's degree in computer science from Acadia University and a master's degree in information systems from Athabasca University. In her current role as a technical writer for DB2 for Linux, UNIX, and Windows, she focuses on making DB2 products more consumable for users.



07 February 2013

Introduction

The Resource Description Framework (RDF) is a family of W3C specifications that enable the interchange of data and metadata. SPARQL is the query language that you use to retrieve data that is stored in RDF format. DB2 10.1 Fix Pack 2 (DB2 10.1 FP2) contains a number of important enhancements for RDF application development. These include enhancements to SPARQL support, SPARQL over HTTP support, SPARQL 1.1 Graph Store HTTP Protocol support, performance enhancements, JENA model API enhancements, and new and updated utilities.

Enhancements to SPARQL support

Support for the SPARQL 1.1 UPDATE specification

DB2 10.1 FP2 supports the SPARQL 1.1 Update specification. In addition to using the JENA API, you can now run SPARQL update queries to modify data in an RDF data set. To run update queries from the command line, you can use the updaterdfstore utility. The arguments for this utility are described in the section on the new utilities in DB2 10.1 FP2. For a sample of how to use a Java API to run SPARQL updates, see the DB2 Information Center link in the Resources section of this article.
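The exact API is documented in the DB2 Information Center, but one assumed route is sketched below: because the DB2-backed data set plugs into the JENA framework, a SPARQL UPDATE request can be handed to the standard JENA UpdateAction helper. The Dataset parameter is assumed to be a DB2 data set obtained with RdfStoreFactory.connectDataset, the graph name and triple are placeholders, and the Jena 2.x package names shown here might differ in later Jena releases.

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.update.UpdateAction;

public class SparqlUpdateSketch {
    // Runs a SPARQL 1.1 INSERT DATA request against a DB2-backed Dataset (assumed usage).
    public static void insertSample(Dataset ds) {
        String update =
              "PREFIX dc: <http://purl.org/dc/terms/> "
            + "INSERT DATA { GRAPH <http://someURI/versionv1> "
            + "  { <http://example.org/doc1> dc:title \"Quarterly report\" } }";

        // Parse the update request and execute it against the data set.
        UpdateAction.parseExecute(update, ds);
    }
}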

Enhancements to SPARQL 1.1 Query

Support for the VALUES clause

Unlike SQL, SPARQL does not support the notion of parameterized queries. The closest concept is the SPARQL 1.1 VALUES clause, which DB2 10.1 FP2 supports. If your application must run the same SPARQL query with multiple sets of input values at once, you can now do so efficiently by using the VALUES clause. DB2 prepared statement batching is used to optimize the execution of the SPARQL query.

For a link to a short set of examples of how to use the SPARQL VALUES clause, see the Resources section.
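The following sketch shows the general shape of such a query as run from Java. It assumes a connected DB2 Dataset, assumes that the RdfStoreQueryExecutionFactory class used in Listing 1 later in this article lives in the com.ibm.rdf.store.jena package and returns a standard JENA QueryExecution, and uses placeholder subjects in the VALUES clause.

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;
import com.ibm.rdf.store.jena.RdfStoreQueryExecutionFactory;

public class ValuesClauseSketch {
    // Runs one query for several candidate subjects supplied through a VALUES clause.
    public static void listTitles(Dataset ds) {
        String sparql =
              "PREFIX dc: <http://purl.org/dc/terms/> "
            + "SELECT ?doc ?title WHERE { ?doc dc:title ?title } "
            + "VALUES ?doc { <http://example.org/doc1> <http://example.org/doc2> }";

        QueryExecution qe = RdfStoreQueryExecutionFactory.create(sparql, ds);
        try {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                QuerySolution row = rs.next();
                System.out.println(row.get("doc") + " -> " + row.get("title"));
            }
        } finally {
            qe.close();
        }
    }
}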

Removal of the SPARQL restriction from the DB2 10.1 base release

The restriction in the DB2 SPARQL engine that was outlined in the DB2 Information Center has been removed. For details about this restriction, see the SPARQL query support topic in the DB2 Information Center.

Easy querying of particular versions of RDF graphs

RDF users often need to store historical versions of the data. This requirement is met by using named graphs. A common and easy way to achieve this goal consists of the following steps:

  1. For each version of the data, create a named graph (or set of named graphs) whose triples contain the names of the graphs that make up that version of the data. Such a named graph is often referred to as the baseline graph.
  2. When querying using SPARQL, query only the required graphs.

SPARQL supports using the FROM and FROM NAMED clauses in a query, but using this approach poses several problems, such as the following:

  • The list of graphs can be huge, which makes the SPARQL query unreadable.
  • Because the FROM and FROM NAMED clauses are part of the query itself, you must maintain separate queries for each version of the data that you want to query.
  • You must issue two SPARQL queries: one that retrieves the contents of the baseline graph and one that performs the actual query against those graphs.

DB2 10.1 FP2 includes a simple solution to all these problems. The RdfStoreFactory class provides an overloaded connectDataset method whose parameters include the names of the baseline graphs and the predicate whose values refer to the named graphs that make up the particular version of the data.

Listing 1 shows how to use the connectDataset method to obtain a Dataset object that executes SPARQL queries against only the named graphs that are referenced by the http://purl.org/dc/terms/references predicate in the baseline graph http://someURI/versionv1:

Listing 1. Using connectDataset method to obtain a Dataset reference object
Dataset baselineDs = RdfStoreFactory.connectDataset(historyStore, db2Connection,
       new String[] { "http://someURI/versionv1" },
       ResourceFactory.createProperty("http://purl.org/dc/terms/references"));

RdfStoreQueryExecutionFactory.create("select * where { ?s ?p ?o } LIMIT 10", baselineDs);

SPARQL over HTTP

Fuseki is a SPARQL server that provides support for the SPARQL protocol over HTTP. In DB2 10.1 FP2, SPARQL over HTTP support is provided for Apache Fuseki versions 0.2.4 and 0.2.5.

Setting up Apache Fuseki

To set up Fuseki on your system:

  1. For the complete list of binaries for Fuseki, go to http://www.apache.org/dist/jena/binaries. Download and extract the jena-fuseki-0.2.5-distribution.zip file.
  2. Navigate to the Fuseki_install_dir/jena-fuseki-0.2.5 directory.
  3. Add the DB2 RDF jar files, JCC driver jar file, and Fuseki jar file to the classpath. The following example shows how to do this on a Windows operating system:
    SET CLASSPATH=./fuseki-server.jar;<DB2_FOLDER>/rdf/lib/rdfstore.jar;<DB2_FOLDER>/rdf/lib/wala.jar;<DB2_FOLDER>/rdf/lib/antlr-3.3-java.jar;<DB2_FOLDER>/rdf/lib/commons-logging-1-0-3.jar;<DB2_FOLDER>/java/db2jcc4.jar;%CLASSPATH%;
  4. Start Fuseki:
    java org.apache.jena.fuseki.FusekiCmd --config config.ttl
  5. Start your browser and point it to the URL localhost:3030. You should see the Fuseki main page, as follows:
    Figure 1. Fuseki main page
    screen cap: links to Control Panel, Fuseki docs, validators, general SPARQL service, and standards
  6. Configure Fuseki to use DB2, following these steps:
    1. In the root folder into which you extracted the Fuseki files, locate the config.ttl file.
    2. In the file, add a service for the DB2 database, as described in the following steps.
    3. Immediately after all the prefix definitions, add the db2rdf prefix:
      @prefix db2rdf:  <http://rdfstore.ibm.com/IM/fuseki/configuration#> .
    4. In the section of the file where the services are registered, add one or more DB2 services. Create a separate service for each DB2 RDF data set.
      Listing 2. Creating a service for each RDF data set
      fuseki:services (
           <#service1>
           <#service2>
           <#serviceDB2RDF_staffing>
         ) .
    5. Register the assembler that creates the DB2Dataset class. This step also registers the DB2QueryEngine and DB2UpdateEngine engines.
      Listing 3. Registering the assembler
      # DB2 
      [] ja:loadClass "com.ibm.rdf.store.jena.DB2" .
      db2rdf:DB2Dataset  rdfs:subClassOf  ja:RDFDataset .
    6. Configure the registered DB2 RDF service using the example as shown:
      Listing 4. Configuring the RDF service
      # Service: DB2 staffing store
      <#serviceDB2RDF_staffing>
      rdf:type fuseki:Service ;
      rdfs:label "SPARQL against DB2 RDF store" ;
      fuseki:name "staffing" ; 
      fuseki:serviceQuery "sparql" ;
      fuseki:serviceQuery "query" ;
      fuseki:serviceUpdate "update" ;
      fuseki:serviceUpload "upload" ;
      fuseki:serviceReadWriteGraphStore "data" ; 
      fuseki:serviceReadGraphStore "get" ;
      fuseki:serviceReadGraphStore "" ; 
      fuseki:dataset <#db2_dataset_read> ;
      .
      
      <#db2_dataset_read> rdf:type db2rdf:DB2Dataset ;
      
      # Specify the RDF store data set and schema
      db2rdf:store "staffing" ;
      db2rdf:schema "db2admin" ;
      
      # Database details. Either specify a jdbcConnectString
      # with user name and password or specify a jndiDataSource.
      db2rdf:jdbcConnectString "jdbc:db2://localhost:50000/RDFSAMPL" ;
      db2rdf:user "db2admin" ;
      db2rdf:password "db2admin" .
      
      #db2rdf:jndiDataSource "jdbc/DB2RDFDS" .
    7. Restart Fuseki:

      java org.apache.jena.fuseki.FusekiCmd --config config.ttl

    8. Go to http://localhost:3030/control-panel.tpl, select the /staffing data set, and click Select.
      Figure 2. Fuseki Control Panel
      screen cap: shows dataset option
    9. The Fuseki query screen opens. Use this interface to issue SPARQL queries, run updates, and load RDF graphs into the DB2 store. A sample Java client that queries this service over HTTP follows the figure.
      Figure 3. Fuseki query screen
      screen cap: shows SPARQL query pane and SPARQL update pane
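Beyond the browser interface, the /staffing service can also be queried over HTTP from a Java client by using the standard ARQ remote query API. The following is a minimal sketch: the endpoint URL is derived from the fuseki:name "staffing" and fuseki:serviceQuery "sparql" settings in Listing 4, and Jena 2.x package names are assumed.

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;

public class FusekiClientSketch {
    public static void main(String[] args) {
        // Endpoint assumed from the Fuseki service configuration shown in Listing 4.
        String endpoint = "http://localhost:3030/staffing/sparql";
        String query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";

        // Send the query over the SPARQL protocol and print the results as a table.
        QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query);
        try {
            ResultSet results = qe.execSelect();
            ResultSetFormatter.out(System.out, results);
        } finally {
            qe.close();
        }
    }
}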

SPARQL 1.1 Graph Store HTTP Protocol

In DB2 10.1 FP2, the SPARQL 1.1 Graph Store HTTP Protocol is supported. You can exploit this feature by using Apache Fuseki, as shown in the previous section.
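The protocol itself is plain HTTP, so any HTTP client can use it. As a rough illustration, the following sketch retrieves one named graph in Turtle format through the graph store endpoint (the "data" service registered in Listing 4); the endpoint URL and graph name are placeholders.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class GraphStoreGetSketch {
    public static void main(String[] args) throws Exception {
        // GET <service>?graph=<uri> returns the content of that named graph.
        String graph = URLEncoder.encode("http://someURI/versionv1", "UTF-8");
        URL url = new URL("http://localhost:3030/staffing/data?graph=" + graph);

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "text/turtle");

        BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        reader.close();
        conn.disconnect();
    }
}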


Performance enhancements

SPARQL-to-SQL cache

For SPARQL queries that return only a few rows and have very short execution times (for example, 1 millisecond), generating the SQL statements from the SPARQL statements can account for a noticeable share of the total request time. For this scenario, DB2 10.1 FP2 supports a SPARQL-to-SQL cache. The first time that a SPARQL query is executed and its SQL is generated, the SQL string is put into a cache whose key is a hash of the SPARQL query with its constants removed. When another SPARQL query is executed, it is first parsed to remove its constants, and the result is used to look up the SPARQL-to-SQL cache. On a cache hit, the cached SQL string is reused by binding the new parameter values, rather than generating the SQL for the SPARQL query again. The benefit of a cache hit is that it removes the need to build a plan for the SPARQL query. This performance boost can be very handy for SPARQL queries whose execution times are in the range of a few milliseconds.

The SPARQL-to-SQL cache is not turned on by default. You must explicitly turn it on by setting the db2rdf.QueryCache Java system property to true, as shown in the following code snippet:

System.setProperty("db2rdf.QueryCache","true");

The SPARQL-to-SQL cache holds 512 entries by default and evicts entries by using a least recently used (LRU) algorithm when elements are added beyond the configured size. You can change the size of the SPARQL-to-SQL cache by setting the db2rdf.QueryCacheSize Java system property. For example:

System.setProperty("db2rdf.QueryCacheSize",”1024”);

Describe handlers

In the DB2 10.1 base release, a default DESCRIBE handler was provided that was efficient at describing resources when the RDF data was spread across multiple named graphs. In DB2 10.1 FP2, the DESCRIBE handler was enhanced so that it can describe multiple resources to the required level in a single JDBC request to the DB2 server. This enhancement provides significant performance gains.

Also, in DB2 10.1 FP2, the DESCRIBE handler is not registered by default. For situations where the RDF data in the data set is spread across multiple named graphs, you can register the provided DESCRIBE handler as follows:

Listing 5. Registering the DESCRIBE handler
DescribeHandlerRegistry.get().add(new DescribeHandlerFactory() {
    public DescribeHandler create() {
        return new DB2DescribeHandler();
    }
});

Do not use the provided DESCRIBE handler if all the RDF triples are in a single graph, because the optimizations that it uses do not apply in that case. For this situation, develop and register your own DESCRIBE handlers that use SPARQL or SQL to exploit the way the data is organized in the data set.
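For illustration only, the following sketch registers a trivial custom handler that copies the statements whose subject is the described resource. It assumes the ARQ DescribeHandler interface (start, describe, and finish methods), assumes that the resource passed to describe is attached to the queried model, and uses Jena 2.x package names; a production handler would instead issue SPARQL or SQL that is tuned to your data layout.

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.sparql.core.describe.DescribeHandler;
import com.hp.hpl.jena.sparql.core.describe.DescribeHandlerFactory;
import com.hp.hpl.jena.sparql.core.describe.DescribeHandlerRegistry;
import com.hp.hpl.jena.sparql.util.Context;

public class SingleGraphDescribeHandler implements DescribeHandler {
    private Model result;

    public void start(Model accumulateResultModel, Context qContext) {
        this.result = accumulateResultModel;
    }

    public void describe(Resource resource) {
        // Copy every statement that has the described resource as its subject.
        result.add(resource.listProperties());
    }

    public void finish() {
        // Nothing to clean up in this sketch.
    }

    // Register the handler, mirroring Listing 5.
    public static void register() {
        DescribeHandlerRegistry.get().add(new DescribeHandlerFactory() {
            public DescribeHandler create() {
                return new SingleGraphDescribeHandler();
            }
        });
    }
}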


JENA Model API enhancements

The restriction for the DB2 implementation of the JENA Model API, as explained in the DB2 Information Center, has been removed. Previously, if you specified duplicate triples in a Model.add(model) or Model.read() call, the duplicates were not filtered out.

A further optimization is provided for applications that model their RDF data as many small named graphs, where each graph is inserted or deleted as a whole rather than having individual triples updated. In this case, you can speed up graph insertion significantly by setting the following symbol:

store.getContext().set(Symbols.partialGraphInserts,false);
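With that symbol set, a small named graph can then be written in one operation through the standard JENA Dataset and Model APIs, as in the following sketch. This is only one plausible usage pattern: the graph name and triple are placeholders, and the exact conditions under which DB2 applies the optimization are documented in the DB2 Information Center.

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.ResourceFactory;

public class WholeGraphInsertSketch {
    public static void insertVersionGraph(Dataset ds) {
        // Build the complete graph in memory first.
        Model graph = ModelFactory.createDefaultModel();
        graph.add(ResourceFactory.createResource("http://example.org/doc1"),
                  ResourceFactory.createProperty("http://purl.org/dc/terms/title"),
                  ResourceFactory.createPlainLiteral("Quarterly report"));

        // Add the whole graph to the data set under its graph name in a single call.
        ds.getNamedModel("http://example.org/graphs/version2").add(graph);
    }
}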


New and updated utilities

DB2 10.1 FP2 provides a number of new command-line utilities in the rdf/bin folder. The utilities are briefly covered in the following subsections. For details about the parameters for each command, see the DB2 Information Center link in the Resources section.

loadrdfstore utility

You can use this utility to load smaller RDF files into a DB2 RDF store from the command line. This utility does not use the DB2 LOAD command, which is the most efficient way of bulk loading large amounts of data into DB2. The utility can load RDF data from NQUAD (.nq), NTRIPLE (.nt), RDFXML (.xml), TURTLE (.ttl), N3 (.n3), or TRIG (.trig) files. The file extensions that you use must match the ones in parentheses.

The loadrdfstore utility uses a streaming mechanism when loading NQUAD and NTRIPLE files, which ensures that a JVM out-of-memory error does not occur. To optimize the loading process, sort NQUAD files by the graph column and sort NTRIPLE files by subject.

queryrdfstore utility

You can use this utility to run a SPARQL query against a DB2 RDF store from the command line. You can specify the SPARQL query in an external file, by using the -file option, or directly, as part of the command. You can run multiple queries by specifying a comma-separated list of files for the -file parameter. You can view the SQL statements that are generated for a SPARQL query by using the -querylog parameter, and you can run the queries multiple times by using the -numruns parameter.

updaterdfstore utility

You can use this utility to run a SPARQL update query against a DB2 RDF store from the command line. You can specify the SPARQL update query in an external file, by using the -file option, or directly, as part of the command.

genpredicatemappings utility

You can use this utility to compute the predicate correlation and produce the optimized predicate mappings for a DB2 RDF store from the command line. The output is sent to the console; you can redirect it to a file by using the greater than (>) operator. You can then pass the optimized predicate mappings to the createrdfstore command to create an optimized RDF store.

createrdfstore utility

This utility has been updated with additional parameters. You can use the -predicatemappings parameter to specify the optimized predicate mappings so that an optimized store is created instead of a default store. You can use the -systempredicates parameter to create an RDF store with row and column access control.


Listing RDF stores in a database

You might often need information about all the RDF stores that you created in a database. In the DB2 10.1 base release, there was no single place to find this information. With DB2 10.1 FP2, whenever you create an RDF store, the information is recorded in a table named RDFSTORES in the DB2 SYSTOOLS schema. Similarly, when you drop a store, the information in this table is updated. To find the names of the RDF stores in a database, issue the following SQL statement:

select storeName, schemaName from SYSTOOLS.RDFSTORES
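If you prefer to check this from application code, a minimal JDBC sketch (with placeholder connection details) might look like the following:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ListRdfStoresSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:db2://localhost:50000/RDFSAMPL", "db2admin", "db2admin");
        Statement stmt = conn.createStatement();

        // SYSTOOLS.RDFSTORES records every RDF store that was created in the database.
        ResultSet rs = stmt.executeQuery(
                "SELECT storeName, schemaName FROM SYSTOOLS.RDFSTORES");
        while (rs.next()) {
            System.out.println(rs.getString(2) + "." + rs.getString(1));
        }

        rs.close();
        stmt.close();
        conn.close();
    }
}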


Conclusion

This article took you through a quick tour of the main feature enhancements that have been added for RDF application development. The DB2 for Linux, UNIX, and Windows Version 10.1 FP2 release adds significant features in the area of SPARQL specification support, including SPARQL 1.1 Update, the SPARQL 1.1 Graph Store HTTP Protocol, and other new SPARQL 1.1 features. The fix pack also includes performance enhancements, such as the SPARQL-to-SQL cache and updated DESCRIBE handlers, as well as new and updated utilities. Finally, you can now use DB2 RDF support out of the box with the latest Apache Fuseki software.

Resources
