Resource description framework application development in DB2 10 for Linux, UNIX, and Windows, Part 2: Optimize your RDF data stores in DB2 and provide fine-grained access control

The Resource Description Framework (RDF) is a family of W3 specification standards that enable the exchange of data and metadata. Using IBM® DB2® 10 for Linux®, UNIX®, and Windows® Enterprise Server Edition, applications can store and query RDF data. This tutorial looks at the characteristics of RDF data and describes the process for creating optimized stores. In addition, it describes how to provide fine-grained access control to RDF stores using the DB2 engine or the application. It includes a sample application.

Mario Briggs (mario.briggs@in.ibm.com), Senior Software Engineer, IBM India

Mario Briggs leads the open source offerings for IBM DB2 and IBM Informix, including PHP, Ruby/Rails, Python/Django, Perl, and Java data access frameworks. He also leads the RDF support in DB2. He has about 14 years of experience in software development, with many of those years spent in the areas of data access, relational engines, and application-database performance.



Rajendran Appavu (apprajen@in.ibm.com), Advisory Software Engineer, IBM India

Rajendran Appavu works on RDF support in DB2. He has about 19 years of software design and development experience. He has worked on networking, network management, provisioning, CORBA, and data management software products.



Farzana Anwar (fanwar@ca.ibm.com), Information Developer, IBM Canada

Farzana Anwar is a member of the DB2 for Linux, UNIX, and Windows team at the IBM Canada Lab in Markham, Ontario. Since 2004, she has held various roles across IBM in the areas of application development, system verification test, technical support, and information development. She has a bachelor's degree in computer science from Acadia University and a master's degree in information systems from Athabasca University. In her current role as a technical writer for DB2 for Linux, UNIX, and Windows, she focuses on making DB2 products more consumable for users.



Ganesh Choudhary (kumar.ganesh@in.ibm.com), System Software Engineer, IBM India

Ganesh Choudhary is a System Software Engineer in the Open Source Team at IBM India Software Labs. He works on the pureQuery auto-tuning feature for Hibernate applications. He graduated with an M.Tech in Information Technology from IIIT Bangalore.



Priya Ranjan Sahoo (prrsahoo@in.ibm.com), Senior Software Engineer, IBM India

Priya Ranjan Sahoo works on RDF support in DB2. He has about six years of experience in software development, mostly in Java application development and databases.



04 October 2012


Before you start

About this tutorial

This tutorial is the second in a three-part "Resource description framework application development in DB2 10 for Linux, UNIX, and Windows" series of tutorials that gives you hands-on experience in using DB2 for Linux, UNIX, and Windows software with Resource Description Framework (RDF) technology:

  • Part 1 of the series introduces a sample use case for an RDF application. The tutorial walks you through the steps of building a sample application, creating an RDF store, querying data by using SPARQL queries, and maintaining the statistics. Part 1 also provides a process to migrate your RDF stores to DB2 software.
  • Part 2 covers advanced topics:
    • Providing an overview of the characteristics of RDF data and how DB2 software optimizes the storage of RDF data.
    • Describing the process for creating optimized stores.
    • Comparing the optimized store against the default store you created for the same application in Part 1, so you can see the benefits of an optimized store.
    • Describing how to provide more fine-grained access control for the RDF stores. The tutorial walks you through enforcing access control by using the DB2 engine and by using the application.

Here in Part 2, we build on the sample application use case from Part 1. If you have not done so already, review Part 1 for context.

Part 3 will cover more SPARQL-specific features, such as the DESCRIBE and CONSTRUCT query forms and unions of named graphs.

Default and optimized stores

To review from Part 1, DB2 software supports the creation of two types of RDF stores:

  • Default RDF stores — Create a default RDF store when you do not have information about the RDF data you are loading or when no appropriate sample is available. To create this type of store, use the createrdfstore RDF command.
  • Optimized RDF stores — Create an optimized store when there is representative sample data for the RDF data set. Optimized stores provide an optimal schema based on the input RDF data set you provide.

A major focus of this tutorial is the creation of optimized stores.


Characteristics of RDF data

Schemaless data model

The RDF data model is schemaless. Unlike the relational model in which each table has a fixed number of columns, an RDF data set does not have a fixed number of predicates. A particular RDF subject can have any number of predicates. Furthermore, an RDF data set stores data across any number of domains, which further increases the "schemalessness" of the model. So when mapping RDF data to a relational schema, you must use a mechanism to support the schemaless nature of RDF data.

The most common mechanism for handling the schemaless nature of RDF data when mapping it to a relational schema is to have a table with three columns — one each for subject, predicate, and object. With this method, each triple occupies a new row in the table, so a variable number of predicates can be handled. However, this mapping does not scale well and has performance problems, because querying the data requires a large number of self-joins and makes little effective use of relational indexes. For example, a simple query to retrieve two predicates of a single subject involves a self-join and the fetching of two rows. In comparison, a traditional relational model of the same data, where both predicates exist in a single row, requires no join, and the data can be retrieved with a single fetch.
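The cost difference can be sketched with a small in-memory simulation. This is illustrative Java only, not DB2 code; the subjects, predicates, and values are hypothetical:

```java
import java.util.List;
import java.util.Map;

// Fetching two predicates of one subject from a three-column triple table
// (two scans, i.e. a self-join in SQL terms) versus a wide one-row-per-subject
// layout (a single fetch).
public class TripleVsWideRow {

    // Triple table: one (subject, predicate, object) row per triple.
    static final List<String[]> TRIPLES = List.of(
            new String[] {"emp1", "name", "Mario"},
            new String[] {"emp1", "dept", "IM"},
            new String[] {"emp2", "name", "Farzana"});

    // Wide layout: all predicates of a subject packed into one "row".
    static final Map<String, Map<String, String>> WIDE = Map.of(
            "emp1", Map.of("name", "Mario", "dept", "IM"),
            "emp2", Map.of("name", "Farzana"));

    /** Two passes over the triple table, one per requested predicate. */
    static String viaTripleTable() {
        String name = null, dept = null;
        for (String[] t : TRIPLES)                       // scan 1: predicate "name"
            if (t[0].equals("emp1") && t[1].equals("name")) name = t[2];
        for (String[] t : TRIPLES)                       // scan 2: predicate "dept"
            if (t[0].equals("emp1") && t[1].equals("dept")) dept = t[2];
        return name + "/" + dept;
    }

    /** One fetch: both predicates live in the same row. */
    static String viaWideRow() {
        Map<String, String> row = WIDE.get("emp1");
        return row.get("name") + "/" + row.get("dept");
    }

    public static void main(String[] args) {
        System.out.println(viaTripleTable());   // Mario/IM, after two scans
        System.out.println(viaWideRow());       // Mario/IM, after one lookup
    }
}
```

The wide layout is what the DB2 DP table approximates, as described in the next section.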

DB2 handling of the schemaless feature

DB2 software removes the need for a large number of self-joins when you query RDF data. It does so by storing all predicates and objects for a subject in a single row, or a minimal number of rows, in a table. Because a relational table must have a fixed number of columns (governed by page size and column length), a mechanism is needed to assign a variable number of predicates to those columns.

DB2 software uses two mechanisms to assign predicates to columns in a table:

  • Hashing — To reduce hash collisions, a set of hash functions, rather than a single hash function, is used. Hashing is random, and, in spite of the use of multiple hash functions, collisions can still occur relatively easily. If collisions occur, a new row is created in the table. The hashing mechanism is used in the default store.
  • Predicate correlation — If a representative sample of the RDF data is available, DB2 software calculates the correlation among the predicates of the various resource types in the RDF data set. The software uses this correlation to assign predicates to columns in the table. This process leads to better space utilization in the table with diminished chances of collision. Multiple correlation functions are used to further diminish the chances of collision. The predicate correlation mechanism is used in the optimized store.
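As a rough illustration of the hashing option, the following toy sketch places each predicate in one of several candidate columns proposed by a small set of hash functions, spilling to a new row when all candidates collide. The column count and hash functions are hypothetical, not DB2's internals:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of hashed predicate-to-column assignment in a default store.
public class HashedPredicateColumns {
    static final int COLUMNS = 8;   // predicate slots per row (hypothetical)
    static final int K = 3;         // number of hash functions (hypothetical)

    static int candidate(String predicate, int i) {
        return Math.floorMod(predicate.hashCode() + i * 17, COLUMNS);
    }

    /** Number of rows needed to place all predicates of one subject. */
    static int rowsNeeded(String... predicates) {
        List<Map<Integer, String>> rows = new ArrayList<>();
        rows.add(new HashMap<>());
        for (String p : predicates) {
            Map<Integer, String> row = rows.get(rows.size() - 1);
            boolean placed = false;
            for (int i = 0; i < K && !placed; i++) {
                int col = candidate(p, i);
                if (!row.containsKey(col)) { row.put(col, p); placed = true; }
            }
            if (!placed) {                       // all K candidates collided:
                Map<Integer, String> fresh = new HashMap<>();
                fresh.put(candidate(p, 0), p);   // spill into a new row
                rows.add(fresh);
            }
        }
        return rows.size();
    }

    public static void main(String[] args) {
        System.out.println(rowsNeeded("name", "dept", "salaryBand",
                "perksBand", "stocksBand", "managerId") + " row(s)");
    }
}
```

Because hashing is random, unlucky predicate sets still spill into extra rows, which is exactly what predicate correlation avoids.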

Creating an optimized store

There are multiple ways to create an optimized store with DB2. The migration scenario in Part 1 uses the createrdfstoreandloader command, which creates an optimized store and bulk-loads RDF data into a DB2 database. For bulk loading, the input data is expected to contain a good sampling of how predicates correlate across the different resource types. However, you might begin application development without representative data for creating an optimized store. In this case, you have two options:

  • Start by creating a default store, as in Part 1, and use it until you generate sufficient data — in the system QA cycle for the application, for example. You can then have the DB2 database engine calculate the predicate correlation for the default store and use this correlation information to create an optimized store for production. We will walk you through this mechanism here in Part 2.
  • Start by creating a default store and use the REORG family of DB2 RDF commands to reorganize the store into an optimized store. For more information, see the DB2 Information Center topic titled "Converting a default store to an optimized store" (see Resources).

To use predicate correlation to recreate the default staffing store you created in Part 1 as an optimized store:

  1. Using the existing store, generate predicate mappings.
  2. Create the optimized store by using the predicate mappings.

A predicate mapping maps each predicate in the data set to the column of the underlying relational table of the DB2 RDF store in which that predicate will be stored. The mapping is determined based on the correlation of the predicate among the various RDF resource types on which it occurs, so that the RDF data is packed into the fewest possible columns.
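The packing idea can be sketched in a few lines: predicates that co-occur on the same subject must land in different columns, while predicates that never co-occur may share one. The greedy strategy and sample data below are hypothetical stand-ins, not DB2's actual algorithm:

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy sketch of correlation-driven predicate-to-column packing.
public class PredicateCorrelation {

    /** Each argument lists the predicates of one sampled subject, comma-separated. */
    static Map<String, Integer> mapToColumns(String... subjects) {
        Map<String, Integer> column = new LinkedHashMap<>();
        for (String subject : subjects) {
            String[] preds = subject.split(",");
            Set<Integer> usedInRow = new HashSet<>();
            for (String p : preds)                       // columns already taken
                if (column.containsKey(p)) usedInRow.add(column.get(p));
            for (String p : preds) {
                if (column.containsKey(p)) continue;
                int c = 0;
                while (usedInRow.contains(c)) c++;       // lowest free column
                column.put(p, c);
                usedInRow.add(c);
            }
        }
        return column;
    }

    static int columnsNeeded(String... subjects) {
        return mapToColumns(subjects).values().stream()
                .mapToInt(Integer::intValue).max().orElse(-1) + 1;
    }

    public static void main(String[] args) {
        // Employees carry name+dept, projects carry title+budget: four
        // predicates fit into two columns because the pairs never co-occur.
        System.out.println(columnsNeeded("name,dept", "title,budget"));  // 2
    }
}
```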

Generating the predicate mapping for the existing store

The com.ibm.rdf.store.cmd.GeneratePredicateMappings Java™ class generates the predicate mapping for a default store and prints the mapping to the console. The syntax is as follows:

java com.ibm.rdf.store.cmd.GeneratePredicateMappings RDF_store_name
                -db database_name
                -user user_name
                -password user_password
              [ -host host_name ]
              [ -port port_number ]
              [ -schema schema_name ]

The program resources for this tutorial are in the DB2RDFTutorial-Part2.zip file (see Downloads). Follow the steps described in the "Setting up the RDF application development environment" section of Part 1 to import the DB2RDFTutorial-Part2 Java project.

You can run GeneratePredicateMappings at the command prompt by using the genpredicatemappings.bat file available in the resources folder of the DB2RDFTutorial-Part2.zip file (see Downloads). Copy genpredicatemappings.bat to the <DB install location>\SQLLIB\rdf\bin folder and run it from there, passing all the required arguments as follows. You should direct the output of this command to a file:

C:\Program Files\IBM\SQLLIB\rdf\bin>genpredicatemappings.bat staffing -db RDFSAMPL 
-user db2admin -password db2admin > /predicate_mappings.txt

To run it inside IBM Data Studio, you can use the GeneratePredicateMappings.launch configuration file in the DB2RDFTutorial-Part2 project to call the com.ibm.rdf.store.cmd.GeneratePredicateMappings class and generate the predicate mapping for the default staffing store. The output is printed to the console and written to the staffing_predicate_mappings.txt file so that you can use it in the next step, in which you create an optimized store.

To generate the mappings by using the launch configuration:

  1. Open IBM Data Studio.
    Figure 1. The launch application in IBM Data Studio
    Screen cap shows GeneratePredicateMappings.launch highlighted

    Optional: Change the location of the output file by right-clicking the GeneratePredicateMappings.launch configuration file and selecting Run As > Run Configurations.

    Figure 2. Run Configurations view to change the location
    Screen cap shows where to change the output location

    Click Close to close this window.

  2. Right-click the launch configuration and click Run As > GeneratePredicateMappings. The predicate mapping is generated, printed to the console, and written to the specified file.

Creating an optimized store

To create an optimized store:

  1. Issue the createrdfstore command, using the -predicatemappings parameter to specify the location of the predicate mapping you generated. In this tutorial, issue the createrdfstore command as follows:
    C:\Program Files\IBM\SQLLIB\rdf\bin>createrdfstore 
    staffing_optimized -db RDFSAMPL -user 
    db2admin -password db2admin -predicatemappings \
    staffing_predicate_mappings.txt
  2. Load the data into the optimized store. You can use the InsertGraph.java sample program, described in Part 1. Because you specified the name of the store in the previous step as staffing_optimized, you must modify line 44 of the InsertGraph.java program as follows:
	String storeName = "staffing_optimized";

The InsertGraph.java program inserts the triples into the optimized store successfully. This demonstrates that any application you build against a default store works with an optimized store, without any code changes.


Structural differences between optimized and default stores

Optimized stores and default stores differ structurally in two of their tables: the Direct Primary (DP) and Reverse Primary (RP) tables. All other tables of the RDF store are the same for default and optimized stores. The differences in DP and RP tables in an optimized store can help speed up insert and query performance. To understand the differences in the structure of the stores, you must understand how the predicates are mapped to the columns and are stored in the underlying tables.

DP tables

A DP table stores the RDF triples and the graph they belong to, indexed by subject. Predicates and objects for a subject are stored in pair columns in this table. A particular predicate can occur in any one of three columns in this table. The object for that predicate is stored in the corresponding object column of the predicate-object pair.

Compared to a DP table in a default store, a DP table in an optimized store has fewer columns and fewer null values. The DP table in the default store that you created in Part 1 has 128 pairs of predicate and object columns (propXXX, valXXX), many of which contain null values, as shown in Figure 3. The DP table in the optimized store you just created has only eight pairs of predicate and object columns and, of course, far fewer null values.

RP tables

An RP table stores the RDF triples and the graph that they belong to, indexed by object. Predicates and subjects for an object are stored in pair columns in this table. A particular predicate can occur in any one of three columns in this table, and the subject for that predicate is in the corresponding subject column of the pair.

Figure 3. Data in the direct primary table in default and optimized stores
Screen cap shows number of placeholders for predicates

Comparing the tables in the optimized store and the default store reveals that the optimized store:

  • Requires fewer columns and contains fewer null values.
  • Provides faster insert operations and improved query performance.

If you compare the RP tables in the default and optimized stores you created, you will notice the same pattern of fewer columns and better utilization in the optimized store, as shown below.

Figure 4. Data in the Reverse Primary table in default and optimized stores
Screen cap shows five placeholders for predicates in optimized store and 33 placeholders in default store

Optimized stores and new predicates

An optimized store can handle new predicates you add to the store (predicates that were unavailable during the predicate correlation calculation).

If you add predicates, the optimized store automatically and transparently uses hash functions to determine the column positions for the new predicates. Therefore, an optimized store can use a mixture of predicate correlation and hashing for the assignment of predicates to columns in the primary tables.

Application development and maintenance for optimized stores

From the perspective of store maintenance in the DB2 database and application development, there is no difference between a default store and an optimized store. You do not have to develop an application differently depending on whether it uses an optimized store or a default store.


Access control for RDF stores 

As a starting point for controlling access to an RDF store, you can use the table-level permissions on the tables that make up the RDF store to set up coarse-grained access control on the store. You can use this mechanism to restrict who has access to the entire store and control what actions they can do, for example, whether they can only read or can also update the store.

To enable store-level access control, issue the following SQL statement: GRANT privilege ON TABLE tablename TO auth_name, where privilege can be SELECT, UPDATE, INSERT, or DELETE.

You can control the names of the tables that make up the RDF store by specifying the -objectnames parameter for the createrdfstore or createrdfstoreandloader command. However, if you did not use the -objectnames parameter and want to issue the GRANT statement, you can easily determine the names of the tables of an RDF store by using the following SQL statement:

SELECT DIRECTPRIMARY, REVERSEPRIMARY, DIRECTSECONDARY, REVERSESECONDARY FROM tablename

RDF graph-level access control

Store-level access control might be too coarse-grained. You might want to use a fine-grained access-control mechanism for a more granular level. DB2 software supports access control at the graph level within an RDF store. You can use fine-grained graph-level access control and coarse-grained store-level access control at the same time.

To use graph-level access control:

  • You must use the named graphs feature of RDF. In particular, you must keep the confidential triples in one named graph and the non-confidential triples in another named graph. Additionally, each named graph stores triples that govern access to the graph.
  • SPARQL queries must query triples across the named graphs rather than in the default graph. (A handy mechanism for this is the unionDefaultGraph feature, which creates a union of all the named graphs and treats that union as the default graph.) The sample application described below demonstrates how to use this mechanism.
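The unionDefaultGraph behavior can be pictured with a minimal stand-in. This is plain Java simulating the idea, not the DB2 RDF API or a SPARQL engine; the triples loosely mirror the tutorial's sample data:

```java
import java.util.ArrayList;
import java.util.List;

// A query pattern that does not name a graph is evaluated against the
// union of all named graphs instead of an (empty) default graph, so a
// join can span triples that live in different named graphs.
public class UnionDefaultGraphDemo {

    record Quad(String s, String p, String o, String g) {}

    static final List<Quad> STORE = List.of(
        new Quad("employee#000002", "hr/salaryBand", "band3",
                 "graph/hr/employee#000002"),
        new Quad("employee#000002", "foaf/name", "Mario Ds Briggs",
                 "graph/org"));

    /** Joins ?s's name and salary band across graphs, as unionDefaultGraph allows. */
    static String selectNameAndBand() {
        List<Quad> union = new ArrayList<>(STORE);   // ignore the graph column
        for (Quad a : union)
            if (a.p().equals("hr/salaryBand"))
                for (Quad b : union)
                    if (b.s().equals(a.s()) && b.p().equals("foaf/name"))
                        return b.o() + " -> " + a.o();
        return null;
    }

    public static void main(String[] args) {
        // The two triples live in different named graphs, yet the join succeeds.
        System.out.println(selectNameAndBand());
    }
}
```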

Sample application scenario for graph-level access control

So far, data in the sample staffing application is available to all users of the application because the data is not confidential. Now, assume you want to update the staffing application to also store the salary details of employees. In this case, you must ensure that the only people who can view the salary information for a particular employee are the employee and the employee's direct manager.

Using named graphs for access control

In our example, to restrict access to an employee's salary information, we:

  1. Put the triples containing each employee's salary information in a separate named graph.
  2. Keep all triples containing non-confidential information about an employee together in another named graph.

Every graph has two triples that store the employee ID and manager ID. The predicates for these triples are http://xyz.com/ctx/employeeId and http://xyz.com/ctx/managerId. These triples govern access to the graph.

The sample data for this is stored in an N-Quads file named accesscontrol.nq in the resources folder. Notice the following aspects of the accesscontrol.nq file:

  • Triples for each employee's salary details are included in a separate named graph for each employee. The predicates used for salary details are http://xyz.com/hr/salaryBand, http://xyz.com/hr/PerksBand, and http://xyz.com/hr/StocksBand.
<http://xyz.com/employee#000001>  <http://xyz.com/hr/salaryBand> 
<http://xyz.com/hr/salaryBand/band3>  <http://xyz.com/ns/graph/hr/employee#000001> .
<http://xyz.com/employee#000001>  <http://xyz.com/hr/PerksBand> 
<http://xyz.com/hr/PerksBand/HIGH>  <http://xyz.com/ns/graph/hr/employee#000001> .
<http://xyz.com/employee#000001>  <http://xyz.com/hr/StocksBand> 
<http://xyz.com/hr/StocksBand/HIGH>  <http://xyz.com/ns/graph/hr/employee#000001> .
<http://xyz.com/employee#000002>  <http://xyz.com/hr/salaryBand> 
<http://xyz.com/hr/salaryBand/band3>  <http://xyz.com/ns/graph/hr/employee#000002> .
<http://xyz.com/employee#000002>  <http://xyz.com/hr/PerksBand> 
<http://xyz.com/hr/PerksBand/LOW>   <http://xyz.com/ns/graph/hr/employee#000002> .
<http://xyz.com/employee#000002>  <http://xyz.com/hr/StocksBand> 
<http://xyz.com/hr/StocksBand/HIGH>  <http://xyz.com/ns/graph/hr/employee#000002> .
  • The named graphs that have the salary details also contain the http://xyz.com/ctx/employeeId and http://xyz.com/ctx/managerId predicates. The http://xyz.com/ctx/managerId predicate values show that Gayathri is the manager for the four employees: Rajesh, Mario, Farzana, and Ranjan.
<http://xyz.com/employee#000001>  <http://xyz.com/ctx/employeeId> "rajesh" 
<http://xyz.com/ns/graph/hr/employee#000001> .
<http://xyz.com/employee#000002>  <http://xyz.com/ctx/employeeId> "mario"  
<http://xyz.com/ns/graph/hr/employee#000002> .
<http://xyz.com/employee#000003>  <http://xyz.com/ctx/employeeId> "ranjan" 
 <http://xyz.com/ns/graph/hr/employee#000003> .
<http://xyz.com/employee#000004>  <http://xyz.com/ctx/employeeId> "gayathri" 
 <http://xyz.com/ns/graph/hr/employee#000004> .
<http://xyz.com/employee#000005>  <http://xyz.com/ctx/employeeId> "farzana"  
<http://xyz.com/ns/graph/hr/employee#000005> .
<http://xyz.com/employee#000001>  <http://xyz.com/ctx/managerId> "gayathri"  
<http://xyz.com/ns/graph/hr/employee#000001> .
<http://xyz.com/employee#000002>  <http://xyz.com/ctx/managerId> "gayathri" 
 <http://xyz.com/ns/graph/hr/employee#000002> .
<http://xyz.com/employee#000003>  <http://xyz.com/ctx/managerId> "gayathri"  
<http://xyz.com/ns/graph/hr/employee#000003> .
<http://xyz.com/employee#000004>  <http://xyz.com/ctx/managerId> "gayathri"  
<http://xyz.com/ns/graph/hr/employee#000004> .
<http://xyz.com/employee#000005>  <http://xyz.com/ctx/managerId> "gayathri"  
<http://xyz.com/ns/graph/hr/employee#000005> .
  • The http://xyz.com/ctx/employeeId and http://xyz.com/ctx/managerId predicates are set to PUBLIC in the other graphs.
<http://xyz.com/ns/graph/hr> <http://xyz.com/ctx/employeeId> "PUBLIC" <http://xyz.com/ns/graph/hr> .
<http://xyz.com/ns/graph/hr> <http://xyz.com/ctx/managerId> "PUBLIC" <http://xyz.com/ns/graph/hr> .
<http://xyz.com/ns/graph/legal> <http://xyz.com/ctx/employeeId> "PUBLIC" <http://xyz.com/ns/graph/legal> .
<http://xyz.com/ns/graph/legal> <http://xyz.com/ctx/managerId> "PUBLIC" <http://xyz.com/ns/graph/legal> .
<http://xyz.com/ns/graph/org> <http://xyz.com/ctx/employeeId> "PUBLIC" <http://xyz.com/ns/graph/org> .
<http://xyz.com/ns/graph/org> <http://xyz.com/ctx/managerId> "PUBLIC" <http://xyz.com/ns/graph/org> .
<http://xyz.com/ns/graph/project> <http://xyz.com/ctx/employeeId> "PUBLIC" <http://xyz.com/ns/graph/project> .
<http://xyz.com/ns/graph/project> <http://xyz.com/ctx/managerId> "PUBLIC" <http://xyz.com/ns/graph/project> .

Creating a store with graph-level access control

To create an RDF store with graph-level access control, you must specify the predicates whose values govern access to the RDF graphs in the data set. In the staffing application scenario, these predicates are http://xyz.com/ctx/employeeId and http://xyz.com/ctx/managerId. When creating a store, you specify these predicates by using the -systempredicates parameter of the createrdfstoreandloader command.

The -systempredicates parameter accepts a properties file whose keys are the predicates used for graph-level access control and whose values are the DB2 column data types for those predicates' values. Currently, only the VARCHAR data type is supported. You can use the colon (:) or equal sign (=) as the separator character between a key and its value. If the key or value itself contains the separator character (for example, IRIs contain a colon), you must use a backslash (\) as the escape character. For more information, see the sample syspredsfile.props file provided with the tutorial.
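The escaping rules follow the standard Java properties format, which can be verified in memory. The file contents below recreate the likely shape of syspredsfile.props; the exact shipped file and the VARCHAR lengths are assumptions:

```java
import java.io.StringReader;
import java.util.Properties;

// Keys are the access-governing predicate IRIs with their colons escaped
// by backslashes; values are the DB2 column data type.
public class SystemPredicatesProps {

    static final String CONTENT =
        "http\\://xyz.com/ctx/employeeId=VARCHAR(100)\n" +
        "http\\://xyz.com/ctx/managerId=VARCHAR(100)\n";

    static Properties load() {
        try {
            Properties props = new Properties();
            props.load(new StringReader(CONTENT));  // java.util.Properties honors \: escapes
            return props;
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // After loading, the escaped key reads back as the plain IRI.
        System.out.println(load().getProperty("http://xyz.com/ctx/employeeId"));
    }
}
```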

To create a store with graph-level access control for the staffing application scenario:

  1. Issue the createrdfstoreandloader command, specifying the syspredsfile.props file for the -systempredicates parameter and the accesscontrol.nq file for the -rdfdata parameter, as shown in the following example:
C:\Program Files\IBM\SQLLIB\rdf\bin>createrdfstoreandloader.bat 
STAFFINGSECURE -db RDFSAMPL -user db2admin -password db2admin
	-rdfdata .\accesscontrol.nq -systempredicates .\syspredsfile.props
	-storeloadfile staffingsecureloader.sql

The staffingsecure RDF store is created.

  2. At the DB2 command prompt, load the store with the triples by running the staffingsecureloader.sql file:
	C:\Program Files\IBM\SQLLIB\rdf\bin>db2 -tf staffingsecureloader.sql

Enforcing access control by using DB2 row and column access control

Now, set up DB2 row and column access control (RCAC) for the RDF store so that:

  • A named graph that contains a triple whose http://xyz.com/ctx/employeeId predicate's value is PUBLIC is accessible to everyone.
  • Named graphs that do not match the preceding condition are accessible only if the named graph contains a triple whose http://xyz.com/ctx/employeeId predicate's value matches the login ID of the current user, or a triple whose http://xyz.com/ctx/managerId predicate's value matches the login ID of the current user.
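The two conditions amount to one boolean rule per row. The following hypothetical Java stand-in mirrors the logic only; the real enforcement happens inside the DB2 engine via CREATE PERMISSION, as shown later in this section:

```java
// employeeId and managerId are the system-predicate values stored with a
// row's graph; sessionUser is the login ID of the current user.
public class GraphAccessRule {

    static boolean canAccess(String employeeId, String managerId, String sessionUser) {
        if ("PUBLIC".equals(employeeId)) return true;       // public graph
        return sessionUser.equalsIgnoreCase(employeeId)     // the employee
            || sessionUser.equalsIgnoreCase(managerId);     // the direct manager
    }

    public static void main(String[] args) {
        System.out.println(canAccess("rajesh", "gayathri", "MARIO"));    // false
        System.out.println(canAccess("rajesh", "gayathri", "GAYATHRI")); // true
        System.out.println(canAccess("PUBLIC", "PUBLIC", "DB2ADMIN"));   // true
    }
}
```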

To set up DB2 row and column access control for the above conditions:

  1. Create the following five users and their passwords.

Table 1. User IDs and passwords

User ID     Password
rajesh      rajesh123
mario       mario123
ranjan      ranjan123
farzana     farzana123
gayathri    gayathri123

To create the users, issue the following commands at a Windows® command prompt. You must have the Windows privilege to create new users.

net user rajesh rajesh123 /add
net user mario mario123 /add
net user ranjan ranjan123 /add
net user farzana farzana123 /add
net user gayathri gayathri123 /add
  2. Grant the users permissions on the RDF store tables in the DB2 database. You can grant the required permissions on the RDF store tables to PUBLIC, rather than granting them to all five users individually. To grant permissions:
    1. Log in as the db2admin user.
    2. From the DB2 CLP, execute the resource/grantperms.sql file, which contains the SQL statements for granting permissions:
      C:\Users\IBM_ADMIN>db2 connect to RDFSAMPL user db2admin
      Enter current password for db2admin:
      
         Database Connection Information
      
       Database server        = DB2/NT64 10.1.1
       SQL authorization ID   = DB2ADMIN
       Local database alias   = RDFSAMPL
      
      C:\Users\IBM_ADMIN>db2 -f grantperms.sql
      DB20000I        The SQL command completed successfully.
      DB20000I        The SQL command completed successfully.
      DB20000I        The SQL command completed successfully.
      DB20000I        The SQL command completed successfully.
      DB20000I        The SQL command completed successfully.
      DB20000I        The SQL command completed successfully.
      DB20000I        The SQL command completed successfully.
      DB20000I        The SQL command completed successfully.
      DB20000I        The SQL command completed successfully.
      DB20000I        The SQL command completed successfully.
      
      C:\Users\IBM_ADMIN>
  3. Identify the names of the columns in the DP and RP tables of the RDF store that contain the values of the predicates that govern access to the graph. To do this, query the SP table, as shown:
    C:\Users\IBM_ADMIN>db2 select colName from 
    staffingsecure_sp where mapName='http://xyz.com/ctx/employeeId'
    
    COLNAME
    --------------------------------
    SYSPRED_1
    
      1 record(s) selected.
    
    
    C:\Users\IBM_ADMIN>db2 select colName from 
    staffingsecure_sp where mapName='http://xyz.com/ctx/managerId'
    	
    COLNAME
    --------------------------------
    SYSPRED_0
    	
     1 record(s) selected.

    The preceding query output shows that SYSPRED_0 is the column for the http://xyz.com/ctx/managerId predicate's value and that SYSPRED_1 is the column for the http://xyz.com/ctx/employeeId predicate's value.

  4. Enforce the rule that the only people who can view a particular employee's salary are the employee and the employee's direct manager. To enforce the rule, create row permissions on the SYSPRED_0 and SYSPRED_1 columns of the DP and RP tables of the staffingsecure RDF store. The statements to create these permissions are in the resources/fgacperms.sql file; for example, the permission for the DP table is as follows:
    CREATE PERMISSION DPHROWACCESS ON DB2ADMIN.STAFFINGSECURE_DPH FOR ROWS WHERE
      ( SYSPRED_0 = 'PUBLIC'  OR ( UCASE(SYSPRED_1) = SESSION_USER OR 
       UCASE(SYSPRED_0) = SESSION_USER) ) ENFORCED FOR ALL ACCESS ENABLE

    Run this file from a DB2 CLP window, as shown:

    C:\Users\IBM_ADMIN>db2 -f fgacperms.sql
    
    DB20000I        The SQL command completed successfully.
    DB20000I        The SQL command completed successfully.
    DB20000I        The SQL command completed successfully.
    DB20000I        The SQL command completed successfully.

    To run these statements, the user must have SECADM authority. You can grant SECADM authority by using the GRANT SECADM ON DATABASE TO USER DB2ADMIN statement.

    You have now set up graph-level access control for the RDF store by using DB2 row and column access control. To verify that it behaves as required, run the SPARQL query to select the employee salary band by logging in as different users.

    To run the SPARQL queries, you need the queryrdfstore.bat file, which is in the resources folder of the DB2RDFTutorial-Part2.zip file. Copy queryrdfstore.bat to the <DB install location>\SQLLIB\rdf\bin folder.

    1. First, run the query logged in as the db2admin user. Notice that no rows of data are returned, because db2admin does not match any employee ID or manager ID value.
      C:\Program Files\IBM\SQLLIB\rdf\bin>queryrdfstore.bat staffingsecure -db RDFSAMP
      L -user db2admin -password db2admin -schema db2admin \
      -uniondefaultgraph true "select 
      ?name ?salband where { ?s <http://xyz.com/hr/salaryBand> ?salband . ?s 
      <http://xmlns.com/foaf/0.1/name> ?name }"
      ------------------
      | name | salband |
      ==================
      ------------------
      Time taken      : 712 ms
    2. Run the same query as the user Mario. Notice that only this employee's salary band is displayed.
      C:\Program Files\IBM\SQLLIB\rdf\bin>queryrdfstore.bat 
      staffingsecure -db RDFSAMPL 
      -user mario -password mario123 -schema db2admin -uniondefaultgraph true "select 
      ?name ?salband where { ?s <http://xyz.com/hr/salaryBand> ?salband . ?s 
      <http://xmlns.com/foaf/0.1/name> ?name }"
      ------------------------------------------------------------
      | name              | salband                              |
      ============================================================
      | "Mario Ds Briggs" | <http://xyz.com/hr/salaryBand/band3> |
      ------------------------------------------------------------
      Time taken      : 830 ms
    3. Now, run the same query logged in as the user Gayathri, who is the manager. All the employees for that manager are listed, along with their salary bands.
      C:\Program Files\IBM\SQLLIB\rdf\bin>queryrdfstore.bat staffingsecure -db RDFSAMP
      L -user gayathri -password gayathri123 -schema db2admin -uniondefaultgraph true 
      "select ?name ?salband where { ?s <http://xyz.com/hr/salaryBand> ?salband . ?s 
      <http://xmlns.com/foaf/0.1/name> ?name }"
      -----------------------------------------------------------------
      | name                   | salband                              |
      =================================================================
      | "Rajesh K Arora"       | <http://xyz.com/hr/salaryBand/band3> |
      | "Mario Ds Briggs"      | <http://xyz.com/hr/salaryBand/band3> |
      | "Priya Ranjan Sahoo"   | <http://xyz.com/hr/salaryBand/band3> |
      | "Gayathri Raghavendra" | <http://xyz.com/hr/salaryBand/band3> |
      | "Farzana Anwar"        | <http://xyz.com/hr/salaryBand/band3> |
      -----------------------------------------------------------------
      Time taken      : 711 ms

The current set of row access permissions in the sample allows employees to update their own salary details, which would not be the case in reality. In practice, you would set up the row access permissions to prevent employees from modifying their own salaries. This is left as an exercise for the reader; the Resources section includes links to articles on advanced features of DB2 row and column access control to help you.

There are many benefits to having the DB2 engine enforce graph-level access control by using the DB2 row and column access-control feature, including:

  • The same access-control mechanism applies to any and all SQL access to the tables, for example, when a user issues SQL directly against the store's tables.
  • You can define access-control rules that are as complex as you need.

Alternatively, you can enforce graph-level access control in the application by using the Java API.

The Resources section provides a link to a DB2 Information Center topic on using DB2's RDF Java API to enforce runtime graph-level access control for SPARQL queries.


Conclusion

In this tutorial, you examined the characteristics of RDF data and how RDF data is stored in a DB2 database. You walked through the process of creating an optimized store and reviewed its benefits by comparing it with a default store.

You then learned how to create an RDF store with graph-level access control. The tutorial explained how to implement graph-level access control by using the row and column access-control feature of the DB2 engine, and pointed to resources for doing so in the application by using the Java API.

In Part 3, we cover the basics of how DB2 software converts SPARQL to SQL. We will then delve into more SPARQL-specific features, such as the DESCRIBE and CONSTRUCT query forms and creation of a union of named graphs.


Download

Description: Sample programs
Name: DB2RDFTutorial-Part2.zip
Size: 27KB

Resources

