Resource description framework application development in DB2 10 for Linux, UNIX, and Windows, Part 1: RDF store creation and maintenance

The Resource Description Framework (RDF) is a family of W3 specification standard that enables the exchange of data and metadata. Using IBM® DB2® 10 for Linux®, UNIX®, and Windows® Enterprise Server Edition, applications can store and query RDF data. This tutorial walks you through the steps of building and maintaining a sample RDF application. During this process you will learn hands-on how to use DB2 software in conjunction with RDF technology. Note: This tutorial has been updated in response to reader questions to include an additional method for querying an RDF store using the QueryRdfStore command.


Mario Briggs (, Senior Software Engineer, IBM China

Mario  Briggs photoMario Briggs leads the open source offerings for IBM DB2 and IBM Informix, including PHP, Ruby/Rails, Python/Django, Perl, and Java data access frameworks. He also leads the RDF support in DB2. He has about 14 years of experience in software development with many of those years spent in the area of data access, relational engines, and application-database performance.

Priya Ranjan Sahoo (, Senior Software Engineer, IBM China

Priya SahooPriya Ranjan Sahoo works for the RDF support in DB2. He has about six years of experience in software development, mostly in Java application development and databases.

Gayathri Raghavendra (, Associate Software Engineer, IBM China

Gayathri RaghavendraGayathri Raghavendra works as a QA engineer for IBM ISL. Gayathri has worked on FVT and regression testing for several releases of IBM DB2 Universal JDBC and SQLJ Driver on z/OS and Linux, UNIX, and Windows platforms. She also worked on testing of the RDF support in DB2. Gayathri also handles FVT for the OCR reporting tool of DB2.

Rajendran Appavu (, Advisory Software Engineer, IBM China

Rajendran AppavuRajendran Appavu works for the RDF support in DB2. He has about 19 years of software design and development experience. He has worked on networking, network management, provisioning, CORBA, and data management software products.

Farzana Anwar (, Information Developer, IBM China

Farzana Anwar is a member of the DB2 for Linux, UNIX, and Windows team at the IBM Canada Lab in Markham, Ontario. Since 2004, she has held various roles across IBM, in the areas of application development, system verification test, technical support, and information development. She has a bachelor's degree in computer science from Acadia University and a master's degree in information systems from Athabasca University. In her current role as a technical writer for DB2 for Linux, UNIX, and Windows, she focuses on making DB2 products more consumable for users.

23 January 2013 (First published 24 May 2012)

Also available in Chinese

Before you start

About this tutorial and series

This tutorial is the first in a three-part series that gives you hands-on experience in using DB2 for Linux, UNIX, and Windows software with Resource Description Framework (RDF) technology. This tutorial introduces a sample use case for an RDF application and walks you through the following steps of building and maintaining this application:

  • Creating an RDF store
  • Inserting data into it
  • Querying the data
  • Updating the data
  • Maintaining the store by ensuring that statistics required for good query performance are up to date

The tutorial also includes instructions on migrating to DB2 software.

Part 2 covers more advanced topics, such as access control and optimized stores. Part 3 explains how to use SPARQL-specific features, such as the various query forms like DESCRIBE, CONSTRUCT, and union of graphs.

The example application development use case

This tutorial uses as a fictitious example a company named XYZ Inc., which has a number of organizations. The locations span multiple geographies, so this company must conform to the regulations in each geography. Each organization runs several projects of its own, and each project requires skilled personnel and various resources. All XYZ employees are managed by a single HR system. Based on the needs of the business and the employees' interests, the employees can be moved across projects within an organization or even to a different organization. Each organization manages its projects and its finances independently.

Major challenges XYZ faces are conforming to legal regulations and staffing upcoming projects with specialized skilled personnel. Most of time for these new projects, the resources are available within XYZ, but getting all the necessary data is difficult because it is located in several systems. Today, the systems must follow a pattern of logging in to a system to collect some information and then logging in to another system to use the information from the first system to get the next piece of information. In order to collect information more efficiently, XYZ decides to build a new system: the Staffing System. Assume that your job is to build this system.

Requirements for the new Staffing System

The Staffing System must interact with the following other systems to meet the requirements for information:

  • Org System— This system maintains all the details about an organization in the company. Details include the name of the organization, the name of the head of the organization, number of employees in the organization, and financials. This system generates a unique organization ID other systems can refer to.
  • Projects System— This system maintains the details about each project that has been completed or is being done by the company. Details include the names of the employees working for a project, name of the project manager, billing details, and name of the organization to which the project belongs. This system generates a unique project ID other systems can refer to.
  • HR System— This system maintains the details of the company employees. Details include the employee's name, employee's address, project the employee is working on, and references to any work permits. This system generates a unique employee ID other systems can refer to.
  • Legal System— This system maintains the legal approvals the company must obtain. It also maintains references to document IDs issued by government agencies.

The new system must be able to meet the following needs of the company:

  • When a new project is started, the project manager must be able to find the available staff with the required skills for the new project.
  • Based on the location of the project, the staff, and other resources, such as software, the system must be able to find the legal approvals for the project.
  • The project manager must be able to consider the staff's interests in working on specific technology areas and eligible skills.
  • Today, the Staffing System must integrate with only the four systems specified earlier, but the company expects that in the future, the Staffing System must integrate with other systems. Addition of new systems should be seamless and must not affect existing systems.
  • Each existing system must be able to independently undergo other revisions, data schema changes, or interface changes without taking into account compatibility with the new Staffing System. Similarly, the Staffing System should not be affected by changes to the internals of those existing systems.

The four existing systems are isolated from each other, although the maintained information is interconnected. The unique identifiers each system generates identify some entities, but their relationships with other entities in other systems are not available in one place. This results in multiple hops among these existing systems. The company wants to use the Staffing System to simply and efficiently navigate relationships across entities in the different systems.

To address this, you require a data store that can store the identifiers generated in each system and store information about their relationships. Because interfaces with other systems might be required in the future, this data store must be generic in nature. RDF provides a data store that is ideal for meeting these kinds of requirements. An RDF data store does not have a fixed schema. You can use the SPARQL query language to query the data without having to know the data schema. This is an architecture commonly referred to as linked data. The new Staffing System will adopt this architecture.

The following table describes the URI structure each system will use to uniquely identify its entities.

Table 1. URI structures
<> Refers to XYZ Inc.
<> Refers to HR organization in the company XYZ Inc.
<> Employee of XYZ Inc.
<> Manager in XYZ Inc.
<> An XYZ project
<> An XYZ Product
<> Lead of a project
<> Team member
<> QA for a project
<> Information development person for a project in XYZ Inc.
<> Approval ID to use some software
<> Government legal approval for an employee etc.

Overview of the new Staffing System

Whenever data is inserted or modified within the existing four systems, these systems must generate the relevant RDF data, then the Staffing System loads this data into its RDF store. Users can then run SPARQL queries on the Staffing System.

Each of the existing systems needs access to the unique IDs (such as URIs) that the other systems generate. For example, when the Projects System refers to a member of a particular project, it must use the unique ID that the HR system generates for the employee. An existing system can obtain a unique URI by querying the new Staffing System.

The existing information in the Org, Projects, HR, and Legal systems must be generated in RDF. Then this data must be loaded into the new Staffing System. To keep the application simple for this tutorial, assume that each of the existing systems generates an N-triple formatted file with this data, which is then loaded into the Staffing System. Therefore, the file is available on disk. Again, for simplicity, assume that the existing systems generate an N-triple file whenever data is inserted or updated in those systems and that this file is accessible on disk to the Staffing System. (In reality, updates would flow through the systems through an online restful service.)

Samples provided with this tutorial

This tutorial provides a file called This contains a sample Java™ Eclipse project, which contains Java programs. Download it to a local disk and extract the compressed files.

In this tutorial, it is assumed that you will perform all DB2 tasks by using the db2admin authorization ID, which has all required administrative privileges. If you use a different authorization ID, first ensure that it has the required privileges.

Getting started with RDF stores

Types of RDF stores in DB2 database servers

DB2 supports the creation of two types of RDF stores:

  1. Default RDF stores— You should create a default RDF store when you do not have information about the RDF data being loaded or when no appropriate sample is available. Use the createrdfstore RDF command to create this type of store.
  2. Optimized RDF stores— You can create an optimized store when there is representative sample data for the RDF data set. Optimized stores provide an optimal schema based on the input RDF data set you provide. Creating optimized stores will be covered in Part 2.

In a DB2 database, an RDF store retains the data of one RDF data set. Any insert, update, or query operation can be performed on only one RDF data set or store at one time.

For the Staffing System, you will use a default RDF store.

Creating an RDF store

Now that you have the details of the Staffing System, let's start building it by using the RDF support in the DB2 software. You must first create a store that holds the RDF data of the Staffing System.

Prerequisites for creating RDF stores

The prerequisites for creating an RDF store:

  • The database must have a 32-KB page size.
  • Set the LOGFILSIZ configuration parameter to be greater than 20000.
  • The SYSTOOLSPACE table space should exist.
  • The DB2 administrative task scheduler must be turned on.
  • Runstats should be turned ON.
  • The authorization ID creating the RDF store should have permission to create external routines.
  • The authorization ID creating the RDF store should update permissions on the SYSTOOLS.ADMINTASKSTATUS table.
  • Set the buffer pools to automatic with good defaults.

If the SYSTOOLSPACE table space does not exist, you can create it by using the following command at a DB2 command prompt:


In this tutorial, you must create a database called RDFSAMPL, with a 32-KB page size. To create the database and meet the previously stated prerequisites:

  1. Turn on the administrative task scheduler by using the following command: db2set DB2_ATS_ENABLE=YES".
  2. At a DB2 command prompt, run the setup.sql script with system administrator authority: db2 -f c:\db2rdftutorial\resources\setup.sql.

You are now ready to create the RDF store.

Setting up your environment to run DB2 RDF commands

It is assumed that you installed DB2 for Linux, UNIX, and Windows on your local Windows operating system. You can do the steps in this section on a Linux or UNIX operating system, too; the only difference is that you must execute shell files instead of .bat files.

The DB2 installation provides RDF support in the install_path/sqllib/rdf folder. The RDF folder contains command-line utilities in the bin folder and the DB2 RDF JAR files in the lib folder.

To set up your environment to use DB2 RDF support:

  1. Download ARQ package Version 2.8.5 from Copy the JAR files from the lib folder of the ARQ package to the <install_path>/SQLLIB/rdf/lib directory.
    Note: You can skip copying over the 'xxx-tests.jar', 'xxx-sources.jar', and 'xxx-test-sources.jar' JAR files.
  2. From the Apache commons project, download the commons-logging.jar file into the sqllib/rdf/lib folder. Figure 1 shows the sqllib/rdf/lib folder after you complete the first and second steps.
    Figure 1. JARs in the sqllib/rdf/lib folder 
    Image shows a list of JAR files
  3. Open a command prompt and navigate to the sqllib/rdf/bin folder: cd "\Program Files\IBM\SQLLIB\rdf\bin".
  4. Add the DB2 JCC driver (the db2jcc4.jar file), located in the /sqllib/java directory to the classpath: set classpath=c:\progra~1\ibm\sqllib\java\db2jcc4.jar;%classpath%.

The remainder of this tutorial refers to the previously mentioned command prompt as the DB2 RDF command prompt whenever you must execute DB2 RDF commands.

Now you are ready to execute the DB2 RDF commands.

Creating an RDF store

Every RDF store has a unique name. This name must follow the naming conventions of DB2 database objects. For the Staffing System, you will use the following store name: staffing.

To create the staffing store, at the DB2 RDF command prompt, issue the following command:

createrdfstore staffing -db RDFSAMPL -user \
db2admin -password db2admin -objectnames \


  • -user specifies the authorization name to use to establish the connection — in this case, db2admin.
  • -password specifies the password to use to establish the connection — in this case, db2admin.
  • –db specifies the database name — in this case, RDFSAMPL.
  • –objectnames specifies a file that contains the tables to be created and their table spaces. In this case, the name of the file is staffingstore_names.props. This file is provided in the resources folder of the unzip_directory, which is the directory into which you extracted the file.

The contents of the file specify that STAFFING_DPH is the name for the direct_primary_hash table, and UserSpace1 as the table space for that table. Other tables' names are similarly formatted. By specifying the table space names, you can manage storage more effectively. If you do not specify the table space name for a table in the file for the –objectnames option, the default table space is used. If you do specify the table space, it must be large enough to hold the record length of the direct_primary_hash and reverse_primary_hash tables; for default stores, 32-KB page size.

If you specify the -objectnames option, the file must contain at least the names for all the tables. If you do not specify the -objectnames option, system-generated names are used for the tables, and default table spaces are used. The system-generated table names generally follow the pattern of STORENAME_DPH/DS.

You must set up the appropriate table-level permissions for the tables of the staffing store.

Assume that the DB2 server is on your local machine and is running at the default Windows port on 50000. Therefore, you do not have to specify the -host or -port option. Also, assume you are creating the store in the default schema for the db2admin authorization name. Therefore, you do not have to specify the -schema option.

You can now move on to inserting the RDF data generated for existing information in the Org, HR, Legal, and Projects systems. To accomplish these tasks, you must use the JENA API supported by the DB2 software, using the Eclipse IDE. The JENA API is based on the Java language.

Setting up the RDF application development environment

In this tutorial, you use the IBM Data Studio product, which is based on Eclipse and provides numerous features for working with databases. However, you can also use Eclipse for this tutorial. Ensure that JRE 1.6 or later is installed on your system.

To set up the RDF application development environment:

  1. Import the DB2RDFTutorial project into IBM Data Studio:
    1. Click File > Import.
    2. In the pop-up window, select General > Existing Projects into Workspace.
    3. In the location into which you extracted the files from the file, select the folder.
    4. elect the DB2RDFTutorial project, and click Finish.
    Figure 2. Importing the DB2RDFTutorial into IBM Data Studio
    Image shows import window
  2. Add all the required JAR files to the classpath:
    1. Right-click the project and select Properties.
    2. In the pop-up window, select Java build path.
    Figure 3. Properties
    Image shows Java build path
  3. Add all the JAR files in the install_path/sqllib/rdf/lib folder to the project build path by clicking Add External Jars and selecting all the JAR files in the install_path/sqllib/rdf/lib folder.
  4. Add the db2jcc4.jar file from the install_path/sqllib/java folder to the build the path in the same way as in the previous step.

Your final structure of the workspace should look like Figure 3.

Figure 4. Package Explorer view for JARs in this tutorial
Image shows list of JARs

Loading data into the RDF application

The four original systems generate files you can load into the Staffing System. When the Staffing System is set up, it requests that the four original systems create the data in N-triple format. For this tutorial, sample N-triple files are located under the resources folder.

Now let's see how to insert this data into the Staffing System.

Inserting graphs

If you have a complete graph to insert into the store, and the graph does not exist in the store, you can use the bulk insert option, instead of manually inserting data. In this case, all the triples in the graph are inserted into the store by using a batch operation.

To insert the initial data from the four systems into the RDF store, use the sample program. The program performs the following steps:

  1. It creates a connection. To run any RDF programs with the DB2 database server, you must create a connection and pass it as an argument.
    Listing 1. Creating a connection
    	Connection conn = null;
    	Store store = null;
    	String storeName = "staffing";
    	String schema = "prsahoo";
    	try {
    	conn = DriverManager.getConnection(
    	"jdbc:db2://localhost:50000/REPOSDB", "prsahoo", "Ranjan60");
    	} catch (ClassNotFoundException e1) {
  2. It connects to the store by using this connection:
    Listing 2. Connecting to the store
    // Connect to the store
    store = StoreManager.connectStore(conn, schema, storeName);

    You must create a connection and connect to the store each time you want to work on the store. These actions are required for all the sample programs in this tutorial, but will be skipped in subsequent descriptions of sample programs.

  3. It creates an in-memory model for each N-triple file. The method is used to read from the N-triple files.
    Listing 3. Creating an in-memory model
    	private static Model createInMemoryModel(String inputFileName) {
    	Model memModel = null;
    	//Input the file using the FileManager
    	InputStream in = FileManager.get().open(inputFileName);
    	if (in == null) {
    	   throw new IllegalArgumentException(
    	           "File: " + inputFileName + " not found");
    	// Read the RDF/XML file
    	Data setGraph dsg = Data setGraphFactory.createMem();
    	// Because n-triples is nothing but n-quad without graph, this will
    	// just go to defaultgraph, dsg, Lang.NQUADS, "");
    	memModel = ModelFactory.createModelForGraph(dsg.getDefaultGraph());
    	try {
    	} catch (IOException e) {
    	return memModel;
  4. It merges all four models into a single model by using the Model.union() operation.
    Listing 4. Creating a connection
    	// Merge all the four in-memory models into a single graph
    	Model merged = mHr.union(mProj);
    	merged = merged.union(mOrg);
    	merged= merged.union(mLegal);
  5. It adds this merged model to the default graph (model) in the staffing store. This operation must be wrapped within a DB2 transaction.
    Listing 5. Adding the merged model to the default graph
    // Connect to the defaultModel in the staffing store
    Model staffingDefaultModel = RdfStoreFactory.connectDefaultModel(store, conn);
    // Begin a transaction, add the merged in-memory model to \
    the staffing store's default model and commit the transaction

    To add the data to a named graph instead of the default graph, use the RdfStoreFactory.connectNamedModel() method to get a reference to the named graph.

  6. It validates that the data was correctly added to the DB2 database, by dumping the data to the console:
    Listing 6. Sending the data to the console
    //Verify that the data is stored in DB2 by dumping out the data from DB2.
    RiotWriter.writeNQuads(System.out, ds.asData setGraph());

The output is as follows:

Listing 7. Data from DB2
<> \
<> \
<> .
<> \
<> \
<> .
<> \
<> \
<> .
<> \
<> \
<> .
<> \
<> <> .
<> \
<> <> .
<> \
<> <> .
<> \
<> <> .
<> \
<> <> .
<> \
<> <> .
<> \
<> <> .
<> \
<> <> .
<> \
<> <> .
<> \
<> <> .
<> \
<> <> .
_:BX2D4c66b0d5X3A1362f5f6ca0X3AX2D7ffc \
<> "Iti Rawat" .
<> \
<> "Priya Ranjan Sahoo" .
<> \
<> .
<> \
<> <> .
<> \
<> <> .
<> <> 
<> .
<> \
<> .
<> \
<> "Farzana Anwar" .
<> \
<> _:BX2D4c66b0d5X3A1362f5f6ca0X3AX2D7ffb .
_:BX2D4c66b0d5X3A1362f5f6ca0X3AX2D7ffb \
<> .
_:BX2D4c66b0d5X3A1362f5f6ca0X3AX2D7ffc \
<> .
_:BX2D4c66b0d5X3A1362f5f6ca0X3AX2D7ffd \
<> .
_:BX2D4c66b0d5X3A1362f5f6ca0X3AX2D7ffe \
<> .
<> \
<> _:BX2D4c66b0d5X3A1362f5f6ca0X3AX2D7fff .
_:BX2D4c66b0d5X3A1362f5f6ca0X3AX2D7ffd \
<> "Robert" .
<> \
<> <> .
<> \
<> <> .
<> \
<> .
_:BX2D4c66b0d5X3A1362f5f6ca0X3AX2D7fff \
<> "Varuna Subramaniam" .
<> \
<> "Rajesh K Arora" .
<> \
<> <> .
<> \
<> "Gayathri Raghavendra" .
_:BX2D4c66b0d5X3A1362f5f6ca0X3AX2D7ffb \
<> "Alan Ng" .
<> \
<> _:BX2D4c66b0d5X3A1362f5f6ca0X3AX2D7ffd .
<> \
<> _:BX2D4c66b0d5X3A1362f5f6ca0X3AX2D7ffc .
<> \
<> "Mario Ds Briggs" .
<> \
<> .

Here, the triples from all the four N-triple files are loaded into the default graph in the Staffing System.

Using SPARQL queries in the Staffing System application 

Now that the Staffing System is populated with data, you can query it to easily find information that spans the multiple original systems. Let's build the SPARQL query for a sample scenario and use the JENA APIs to execute the SPARQL query against the DB2 database server to display the results.

Later, this section shows how you can run SPARQL queries against the DB2 database server through HTTP, by using the JOSEKI project. SPARQL Protocol [5] describes a means for conveying SPARQL queries and updates to a SPARQL processing service and returning the results through HTTP to the entity that requested them.

Sample SPARQL query 

Assume that you are staffing a new project very similar to the existing project RobotX. It is a good idea to start with all the friends of the members of the RobotX project because it is likely that the friends have similar skills. This information about friends is spread across the existing HR and Projects systems. In the past, getting this information would have required an API or some kind of integration between the systems. However, with the new Staffing System, you can use a simple SPARQL query to fetch this information.

To find all members of the project RobotX, you need a triple pattern in SPARQL. Specify the subject as the IRI of projectRobotX, and use as the predicate, as follows:

Listing 8. Finding triple pattern in SPARQL
select ?member where {
    <> <> ?memberId .
    ?memberId <> ?member .

To now find all the friends of these project members, you can modify the previous query as follows:

select ?member ?friendname where {
	<> <> ?memberId .
	?memberId <> ?member .
	?memberId <>?friendId.
	?friendId      <>  ?friendname .

Executing SPARQL queries by using JENA APIs

The following code is implemented in the sample file.

Listing 9.
// Create a 'Query' object by passing in the SPARQL  query string \
to the 'create' method of the RdfStoreQueryFactory class
Query q=RdfStoreQueryFactory.create(query);
// Use the connectData set method RdfStoreFactory class to get a \
Data set interface to the store
Data set ds = RdfStoreFactory.connectData set(store, conn);

// Use the connectData set method RdfStoreFactory class to get a \
Data set interface to the store
QueryExecution qe = RdfStoreQueryExecutionFactory.create(q, ds);

// Call the 'execSelect' method on the QueryExecution object to get a back a resultset
ResultSet rs=qe.execSelect();

There are many ways to iterate through the result set and access the individual result sets. Following is one simple way, in which you iterate through the result set by using the hasNext method and retrieve each solution by using the next method.

Listing 10. Iterating through the result set
while (rs.hasNext()) {
    QuerySolution qs =;

The output of this SPARQL query is as shown below.

Listing 11. Output from SPARQL query
( ?member = "Rajesh K Arora" ) ( ?friendname = "Varuna Subramaniam" )
( ?member = "Mario Ds Briggs" ) ( ?friendname = "Rajesh K Arora" )
( ?member = "Priya Ranjan Sahoo" ) ( ?friendname = "Robert" )
( ?member = "Gayathri Raghavendra" ) ( ?friendname = "Iti Rawat" )
( ?member = "Farzana Anwar" ) ( ?friendname = "Alan Ng" )

Executing SPARQL queries using QueryRdfStore command

DB2 RDF also provides a Java class java in rdfstore.jar, which can also be executed for running SPARQL queries.

The syntax for the QueryRdfStore command is:

Listing 12. QueryRdfStore command
java <RDF-store-name>
                -db <database-name>
                -user <user-name>
                -password <user-password>
              [ -host <host-name>
              [ -port <port-number>
              [ -schema <schema-name>
                 <query-string / query-file>


  • RDF-store-name is the name of the RDF store
  • database-name, user, password, host-name, port-number, and schema name identify the database and schema where the store exists
  • query-string / query file are either the SPARQL query as a string, or the path of the file containing the query.

At the DB2 RDF Command prompt , execute the command:

java staffing -db RDFSAMPL -user db2admin -password db2admin "select ?member ?friendname where { <> <> ?memberId .?memberId <> ?member . ?memberId <> ?friendId . ?friendId <> ?friendname .}

The output of the command will be as follows:

Listing 13. QueryRdfStore command
| member                      | friendname                   |
| "Rajesh K Arora"            | "Varuna Subramaniam"         |
| "Mario Ds Briggs"           | "Rajesh K Arora"             |
| "Priya Ranjan Sahoo"        | "Robert"                     |
| "Gayathri Raghavendra"      | "Iti Rawat"                  |
| "Farzana Anwar"             | "Alan Ng"                    |
Time taken      : 43

You can also put the query in a file named query.txt and execute it as follows:

java staffing -db RDFSTORE -user db2admin -password db2admin query.txt

Note: This utility is not yet externalized since its parameters might change in the future.

Executing SPARQL queries over HTTP

The W3C has specification drafts for performing RDF operations over HTTP. The Graph Store HTTP Protocol describes updating RDF graph content in a Graph Store and fetching RDF graph content from a Graph Store over HTTP in the REST style.

The Joseki HTTP engine is supported for executing SPARQL queries (not updates) over HTTP against the DB2 database.

Setting up Joseki

To set up Joseki on your system:

  1. Download and extract the Joseki on your local system (see Resources).
  2. Open a command prompt and change to the folder where you extracted Joseki.
  3. Set the JOSEKIROOT variable to the current folder.
  4. Run the bin/rdfserver.bat file.
  5. Start your browser and point it to the URL localhost:2020. You should see the Joseki main page, as follows:
    Figure 5. Joseki web application
    Screen cap shows SPARQLer
  6. Modify Joseki to access the DB2 database server. In the root folder into which you extracted the Joseki files, you will find a joseki-config.ttl file. In this file, add a service for the DB2 database as shown in the following example.
    Listing 14. Adding a service for DB2 to the joseki-config.ttl file
    # Service for DB2
    # General-purpose SPARQL processor for DB2 RDF store, 
    # The store name and how to connect to it are specified in the 'processor' definition
    # The data set should be skipped in the service definition
    # Service for DB2 - SPARQL processor for a DB2 RDF store
    @prefix db2rdf: <> .
    rdf:type   joseki:Service ;
       rdfs:label          "SPARQL against DB2 RDF store" ;
       joseki:serviceRef   "db2" ;   # web.xml must route this name to Joseki
       # data set part
       #joseki:data set      <#sample> ; NOTE RDF store does not take in data set
       # Service part.
       # This processor does not allow either the protocol,
       # or the query, to specify the data set.
       joseki:processor    joseki:ProcessorDB2SPARQL ;
    ## Processors
    # DB2 processor support using FROM/FROM NAMED in the request
    # It does not support specify the data set in the protocol request).
       rdfs:label "DB2 RDF General SPARQL processor" ;
       rdf:type joseki:Processor ;
       module:implementation joseki:DB2ImplSPARQL ;
       # Parameters - this processor processes FROM/FROM NAMED automatically
        # RDF store parameters: Database details. Specify either \
    specify a jdbcConnectString
    with username and password or specify a jndiDataSource
    	db2rdf:jdbcConnectString   "jdbc:db2://localhost:50000/RDFSAMPL" ;
    	db2rdf:userName        "db2admin" ;
    	db2rdf:password "db2admin";
    	#db2rdf:jndiDataSource       "testDS" ;
    	# RDF data set details : storeName and schema in which the store exists
    	db2rdf:storeName       "staffing" ;
    	#db2rdf:schema       "db2admin" ;
       rdf:type   joseki:ServiceImpl ;
      <> .
  7. In the lib folder under the location into which you extracted the Joseki files, add the rdfstore.jar, wala.jar, antlr-3.3-java.jar, commons-logging-1-0-3.jar, and db2jcc4.jar files.
  8. In the lib folder under the location into which you extracted the Joseki files, add the rdfstore-joseki.jar file. The rdfstore-joseki.jar file is in the\lib folder.
  9. In the webapps\joseki\WEB-INF\web.xml file, add /db2 as the service name entry, as shown in the following example. This tutorial uses /db2 because db2 is the service name that is registered in the joseki-config.ttl file. This needs to be added in the same place where other <servlet-mapping> entries are located.
    Listing 15. Servlet-mapping entry
    	   <servlet-name>SPARQL service processor</servlet-name>
  10. In the webapps\joseki\sparql.html file, change the value of the action attribute of the form element from SPARQL to the DB2 database server, as shown in the following example. This tutorial uses db2 because db2 is the service name that is registered in the joseki-config.ttl file.
    Listing 16. Modifying sparql.html
    	<div class="moreindent">
    	     <form action="db2" method="get">
    	       <p>General SPARQL query : input query, \
    set any options and press "Get Results"</p>
  11. Restart Joseki. Sample output is as follows:
    Listing 17. Sample output
    	15:24:54 INFO  Configuration        :: ==== Configuration ====
    	15:24:54 INFO  Configuration        :: Loading : <joseki-config.ttl>
    	15:24:55 INFO  ServiceInitSimple    :: Init: Example initializer
    	15:24:55 INFO  Configuration        :: ==== Datasets ====
    	15:24:55 INFO  Configuration        :: New dataset: Books
    	15:24:55 INFO  Configuration        ::   Default graph : books.ttl
    	15:24:55 INFO  Configuration        :: New dataset: MEM
    	15:24:55 INFO  Configuration        ::   Default graph : <<blank node>>
    	15:24:55 INFO  Configuration        :: ==== Services ====
    	15:24:55 INFO  Configuration        :: Service reference: "books"
    	15:24:55 INFO  Configuration        ::   Class name: org.joseki.processors.SPARQL
    	15:24:55 INFO  SPARQL               :: SPARQL processor
    	15:24:55 INFO  SPARQL               :: Locking policy: \
    multiple reader, single writer
    	15:24:55 INFO  SPARQL               :: Dataset description: \
    false // Web loading: false
    	15:24:55 INFO  Configuration        :: Dataset: Books
    	15:24:55 INFO  Configuration        :: Service reference: "db2"
    	15:24:55 INFO  Configuration        ::   Class name: \
    	15:24:57 INFO  Configuration        :: Service reference: "sparql"
    	15:24:57 INFO  Configuration        ::   Class name: org.joseki.processors.SPARQL
    	15:24:57 INFO  SPARQL               :: SPARQL processor
    	15:24:57 INFO  SPARQL               :: Locking policy: none
    	15:24:57 INFO  SPARQL               :: Dataset description: \
    true // Web loading: true
    	15:24:57 INFO  Configuration        :: ==== Bind services to the server ====
    	15:24:57 INFO  Configuration        :: Service: <db2>
    	15:24:57 INFO  Configuration        :: Service: <sparql>
    	15:24:57 INFO  Configuration        :: Service: <books>
    	15:24:57 INFO  Configuration        :: ==== Initialize datasets ====
    	15:24:58 INFO  Configuration        :: ==== End Configuration ====
    	15:24:58 INFO  Dispatcher           :: Loaded data source \
    configuration: joseki-config.ttl
    	15:24:58 INFO  log                  :: Logging to \
    org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
    	15:24:58 INFO  log                  :: jetty-6.1.25
    	15:24:58 INFO  log                  :: NO JSP Support for /
    , did not find org.apache.jasper.servlet.JspServlet
    	15:24:58 INFO  log                  :: Started \
  12. Point your browser to http://localhost:2020/sparql.html, type in your SPARQL query, and click Get Results, as shown:
    Figure 6. SPARQL query view in a web browser
    Screen cap shows SPARQL query

The browser will now display the following output.

Figure 7. Output for Get Results action
Image shows query results

Updating data in the Staffing System application

From time to time after the Staffing System is up and running, the company must update data in each of the original four systems. For example, the company must make updates after adding staff, completing projects, and moving staff to other projects. Let's see how to reflect these updates in the new Staffing System by using the DB2 database server.

Inserting new triples into existing graphs

Assume that a new employee joins the organization and that the HR system sends the new employee information in an N-triples format file to be added to the Staffing System. In the tutorial, the new employee information is in the newdata.nt file, located in the resources folder. To add the new information:

  1. Run the sample program. Following is an excerpt.
    Listing 18.
    // Get the new triples to add
    StmtIterator it = getNewTriplesToAdd("./resources/newdata.nt");
    // Connect to the defaultModel in the store
    Model storeDefModel = RdfStoreFactory.connectDefaultModel(store, conn);
    // Begin a DB2 transaction, and add each new triple to the store
    while (it.hasNext()) {

    As shown, the program performs the following steps:
    1. Uses the getNewTriplesToAdd method to read the newdata.nt file and return the list of triples in it.
    2. Connects to the default model in the store.
    3. Adds the triples into the staffing store one by one. This must be done within a DB2 transaction boundary.
  2. To verify that the new data was inserted, count the employees in the staffing store. Previously, there were five; now there should be seven. You can use the following SPARQL query to get the employee count.
    Listing 19. SPARQL query for employee count
    select (count(?emp)AS?employeeCount)where{ 
     ?emp <> 

    The results of this query are as follows:

    Figure 8. New employee count
    Image shows count

Deleting triples

To delete a triple or a set of triples from a graph in the store, use one of the following approaches. The code shown in these examples is in the file.

Use the model interface. This approach is illustrated by the removeEmployeeInformationUsingModelInterface method.

Listing 20. removeEmployeeInformationUsingModelInterface method
//Connect to the default model
Model defModel = RdfStoreFactory.connectDefaultModel(store, conn);
//List the statements for the particular subject
Resource employee = new ResourceImpl(""+eid);
StmtIterator stmtIterator = defModel.listStatements(employee,null,(RDFNode)null);
//Remove the statements for the particular subject

As shown, the method performs the following steps:

  1. It connects to the model.
  2. It lists all the statements for any particular combination of subject, object, and predicate. In this example, all the statements with a particular employee ID as the subject are listed.
  3. Deletes the statements individually by passing the iterator to the model.remove method. You can also use the defModel.removeAll() method (not shown) to remove all the statements.

Use the graph interface model. This approach is illustrated by the removeEmployeeInformationUsingGraphInterface method.

Listing 21. removeEmployeeInformationUsingGraphInterface method
//Connect to the default graph
Graph defGraph = RdfStoreFactory.connectDefaultGraph(store, conn);
//Create a triple with the particular subject and any predicate or value
Node s = Node.createURI(""+eid);
Triple triple = new Triple(s, Node.ANY, Node.ANY);
//Search for the all the matching triples
ExtendedIterator<Triple> matchedTripleIterator = defGraph.find(triple);

//Delete all the matching triples
Triple matchedTriple =;

As shown, the method performs the following steps:

  1. It finds the triples to remove by using the Graph.find method. This method takes a triple as its argument. The subject and predicate are set to a particular employee ID, and their value is set to Node.ANY. All the triples with the particular employee ID as the subject are selected. You can use other variations of triples. The find method returns a triple iterator.
  2. It removes all the required triples from the graph by using the iterator in a loop.

Deleting graphs

To delete an entire graph from the store, use the removeGraph method, which is shown in the sample file, specifying the name of the graph.

Listing 22. Removing the graph
Data setGraph dsg = RdfStoreFactory.connectData set(store, conn).asData setGraph();

You cannot delete the default graph from the data set. However, you can remove all the triples from the default graph by using the removeAll method.

Listing 23. removeAll method
Model defaultmodel = RdfStoreFactory.connectDefaultModel(store, conn);

Updating triples

To update the value of a triple:

  1. Find the triple by using the graph.find method.
  2. Delete the triple.
  3. Add a new triple with the new value.

Administration and migration

Administering RDF stores

Statistics in the RDF store

DB2 RDF store uses two sets of distribution statistics for improving performance of the SPAQL queries.

  • DB2 table statistics— The DB2 database server gathers these statistics for the RDF store tables. Setup of the RUNSTATS utility profile for the DB2 database is done automatically during store creation. The DB2 software uses these statistics to optimize the access plan for SQL queries.
  • RDF store statistics. Use the RDF store to gathers these statistics which help optimize SPARQL queries.

The following statistics are collected:

  • The average number of triples per subject, object, or graph
  • The total number of triples in the store subject, object, or graph, with poor selectivity of triples

Setting up automated statistics collection

You can schedule statistics to be collected at frequent intervals. This is one using the DB2 administrative tasks scheduler.

  1. Turn on the Administrative Task Scheduler. Issue the following command: db2set DB2_ATS_ENABLE=YES.
  2. Activate the database.
  3. Ensure that the user who created the store has update privileges on SYSTOOLS.ADMIN_TASKS and SYSTOOLS.ADMIN_TASK_STATUS tables.

The statistics in all RDF stores are updated only if the following additional conditions are satisfied:

  • The number of triples in the stores exceeds 10 million.
  • There is a 25-percent increase or decrease in the number of triples since statistics were previously collected.

To schedule statistics collection, use the schedule parameter of the setstatsschedule command. For example, to trigger the scheduler to gather store statistics 15 minutes past every hour, issue the following command.

Listing 24. Scheduling statistics collection
setstatsschedule staffing -db RDFSAMPL -user db2admin -password db2admin 
-schedule "15 * * * *"

Refer to the DB2 for Linux, UNIX, and Windows Information Center for more details on the parameters for the setstatsschedule command.

Setting up manual statistics collection

To manually gather statistics for the RDF store, issue the updaterdfstorestats command: updaterdfstorestats staffing -db RDFSAMPL -user db2admin -password db2admin.

Refer to the DB2 for Linux, UNIX, and Windows Information Center for more details on the parameters for the updaterdfstorestats command.

Migrating to the DB2 database server from other RDF storage engines

The DB2 database server offers benefits such as compression, scalability, parallel execution, security, backup and recovery, and mature administration practices. If you are using RDF technology with a different RDF storage engine, consider moving your RDF data to the DB2 database server. This section shows an example of how you can use the createrdfstoreandloader command to move your RDF data from other RDF stores (open source or proprietary) to the DB2 database server.

Exporting data from existing RDF storage engines

First, export your entire RDF data set from the current RDF storage engine into an nquad-formatted file. Most RDF storage engines provide APIs or commands to export the RDF data set to an N-quad or N-triple file. Save this file to disk as c:\exported.nq.

Populating a new RDF store

The createrdfstoreandloader command loads RDF bulk data into the DB2 database server. The command parses the N-quad or N-triple input file, generates DB2 load files, and creates the required RDF store tables. You can then use the load files to populate a new DB2 RDF store.

To populate a new RDF store:

  1. At the DB2 RDF command prompt, issue the createrdfstoreandloader command, as shown in the following example. In the example, the new store that will contain the migrated RDF data is called migratedStore.
    Listing 25. createrdfstoreandloader command
    createrdfstoreandloader migratedstore -db RDFSAMPL -user db2admin -password 
    db2admin -rdfdata c:\exported.nq -storeloadfile c:\loadfolder\migratedstore.sql 
    -storeschemafile c:\loadfolder\migratestoreddl.sql


    • -db specifies the database in which to create the store — in this case, RDFSAMPL.
    • -user specifies the authorization name to use to establish the connection — in this case, dbadmin.
    • -password specifies the password to use to establish the connection — in this case, db2admin.
    • -rdfdata specifies the nquad or ntriple input file (c:\exported.nq, in this case) to load onto the DB2 database server.
    • -storeloadfile specifies an SQL file that will be generated that contains the DB2 load commands. If you do not specify the path in which to create the file, the file is created in the current folder. In this example, the existing folder c:\loadfolder is specified as the path.
    • -storeschemafile (optional parameter) creates an SQL file containing the DDL statements for the creation of the store. This parameter is provided in case you must deploy multiple instances of the RDF data. Instead of running the createrdfstoreandloader command multiple times, for faster performance you can specify the -storeschemafile parameter followed by a .sql file name. Ensure that you save the file and the dependent load files.

    Assume that the DB2 server is on a local system and is running off the default Windows port 50000. You do not have to specify the -host and -port parameters. Assume also that you are using the default schema. Therefore, you do not have to specify the -schema parameter. Because you did not specify the objectnames parameter, which controls the names of the tables and their table spaces, system-generated table names and default table spaces are used.

    Important: On Windows, the createrdfstoreandloader command requires Cygwin. V4.0 of the Gawk utility and V8.14 or later of the Core utility are required for this command. After installing Cygwin add <CgyWin_install_directory>/bin to the PATH environment variable. If you don't have Cygwin, the following error message is displayed when you run the createrdfstoreandloader command: 'Cannot run program "sh": CreateProcess error=2, The system cannot find the file specified.'.

    On Windows, the createrdfStoreAndLoader command can be invoked from the Cygwin command prompt or default command prompt. When using a Cygwin command prompt, please note that all file paths (-rdfdata , -storeloadfile, -storeschemafile , -objectnames) should not include the 'cygdrive' prefix. Instead use normal Windows path like 'C:\....'

    If folder or file names contain a space, then the whole string should be enclosed within double quotes.

  2. Open a DB2 command prompt and change the directory to the location of the file that contains the DB2 load commands. In this example, the folder is c:\loadfolder.
  3. Connect to the database that you created — in this case, RDFSAMPL: db2 connect to RDFSAMPL user db2admin.
  4. Run the file containing the DB2 commands — in this case, the migratedstore.sql file: db2 –f migratedstore.sql.

    Note: Do not use the -t argument when running the SQL script because it is generated with newline as the command separator.

In this example, the DB2 database now contains an RDF store called migratedstore that contains all the RDF data from the RDF store you used earlier. You can now update and query migratedstore.


This tutorial showcased a sample RDF application scenario and walked you through the steps to build this application by using the DB2 RDF feature. You learned how to create a default RDF store and insert graphs into it. Next, you learned how to execute SPARQL queries by using the JENA APIs, as well as over HTTP by using Joseki. You then learned how to update the graphs in the RDF store.

In addition, you learned how to maintain the statistics for an RDF store so that queries perform efficiently. Finally, you learned about a process for migrating to the DB2 database server from other RDF storage engines.

In Part 2, we cover advanced topics such as creating optimized RDF stores and using access control for RDF.


DB2RDF JavadocDB2RDFjavadoc.zip87KB
Tutorial samplesDB2RDFTutorial.zip21KB



Get products and technologies

  • Download Joseki.
  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.



developerWorks: Sign in

Required fields are indicated with an asterisk (*).

Need an IBM ID?
Forgot your IBM ID?

Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.


All information submitted is secure.

Dig deeper into Information management on developerWorks

Zone=Information Management
ArticleTitle=Resource description framework application development in DB2 10 for Linux, UNIX, and Windows, Part 1: RDF store creation and maintenance