Administration and migration
The DB2 RDF store uses two sets of distribution statistics to improve the performance of SPARQL queries.
- DB2 table statistics. The DB2 database server gathers these statistics for the RDF store tables. A RUNSTATS utility profile for the DB2 database is set up automatically during store creation. The DB2 software uses these statistics to optimize the access plan for SQL queries.
- RDF store statistics. The RDF store gathers these statistics, which help optimize SPARQL queries.
The following statistics are collected:
- The average number of triples per subject, object, or graph
- The total number of triples in the store
- The subjects, objects, or graphs with poor selectivity of triples
You can schedule statistics to be collected at regular intervals. This is done using the DB2 administrative task scheduler.
- Turn on the DB2 administrative task scheduler.
- Activate the database.
- Ensure that the user who created the store has update privileges on SYSTOOLS.ADMIN_TASKS and SYSTOOLS.ADMIN_TASK_STATUS tables.
The statistics in all RDF stores are updated only if the following additional conditions are satisfied:
- The number of triples in the store exceeds 10 million.
- There is a 25-percent increase or decrease in the number of triples since statistics were previously collected.
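These two gating conditions can be sketched as a simple predicate. This is a hypothetical illustration of the rules stated above; the function and argument names are illustrative, not a DB2 API.

```python
# Hypothetical sketch of the two conditions that gate an RDF statistics
# refresh; thresholds mirror the text above, the names are not a DB2 API.
def stats_need_update(current_triples: int, triples_at_last_stats: int) -> bool:
    """True when both refresh conditions hold."""
    if current_triples <= 10_000_000:   # store must exceed 10 million triples
        return False
    if triples_at_last_stats == 0:
        return True                     # no previous collection recorded
    change = abs(current_triples - triples_at_last_stats) / triples_at_last_stats
    return change >= 0.25               # 25-percent growth or shrinkage
```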
To schedule statistics collection, use the -schedule parameter of the setstatsschedule command. For example, to trigger the scheduler to gather store statistics at 15 minutes past every hour, issue the following command.
Listing 24. Scheduling statistics collection
setstatsschedule staffing -db RDFSAMPL -user db2admin -password db2admin -schedule "15 * * * *"
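The -schedule value is a standard five-field cron expression (minute, hour, day of month, month, day of week). As a rough illustration, a minimal matcher that handles only plain numbers and "*" (no ranges, lists, or step values) might look like this:

```python
# Minimal matcher for five-field cron expressions such as "15 * * * *";
# supports only plain numbers and '*' (no ranges, lists, or steps).
def cron_matches(expr: str, minute: int, hour: int,
                 dom: int, month: int, dow: int) -> bool:
    fields = expr.split()
    assert len(fields) == 5, "expected: minute hour day-of-month month day-of-week"
    values = (minute, hour, dom, month, dow)
    return all(f == "*" or int(f) == v for f, v in zip(fields, values))
```

Under this sketch, "15 * * * *" matches any time whose minute field is 15, which is why statistics are gathered at 15 minutes past every hour.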
Refer to the DB2 for Linux, UNIX, and Windows Information Center for more details on the parameters for the setstatsschedule command.
To manually gather statistics for the RDF store, issue the updaterdfstorestats command:
updaterdfstorestats staffing -db RDFSAMPL -user db2admin -password db2admin
Refer to the DB2 for Linux, UNIX, and Windows Information Center for more details on the parameters for the updaterdfstorestats command.
The DB2 database server offers benefits such as compression,
scalability, parallel execution, security, backup and recovery, and
mature administration practices. If you are using RDF technology with
a different RDF storage engine, consider moving your RDF data to the
DB2 database server. This section shows an example of how you can use
the createrdfstoreandloader command to move
your RDF data from other RDF stores (open source or proprietary) to
the DB2 database server.
First, export your entire RDF data set from the current RDF storage engine into an N-quad formatted file. Most RDF storage engines provide APIs or commands to export the RDF data set to an N-quad or N-triple file. Save this file to disk as c:\exported.nq.
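For intuition about the input format, each N-quad line carries a subject, predicate, and object term plus an optional graph term, terminated by a period. The following sketch is a rough shape-check only, not a conforming N-Quads parser (it ignores escape sequences and the fine points of literals):

```python
import re

# Rough shape-check for N-quad lines: three or four terms followed by ' .'.
# Illustrative only; a real parser must handle escapes, datatypes, and
# language tags inside literals.
_TERM = r'(?:<[^>]*>|_:\S+|"[^"]*"\S*)'
_NQUAD = re.compile(rf'^\s*({_TERM})\s+({_TERM})\s+({_TERM})(?:\s+({_TERM}))?\s*\.\s*$')

def split_nquad(line: str):
    """Return (subject, predicate, object, graph-or-None), or None if malformed."""
    m = _NQUAD.match(line)
    return m.groups() if m else None
```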
The createrdfstoreandloader command loads
RDF bulk data into the DB2 database server. The command parses the
N-quad or N-triple input file, generates DB2 load files, and creates
the required RDF store tables. You can then use the load files to
populate a new DB2 RDF store.
To populate a new RDF store:
- At the DB2 RDF command prompt, issue the createrdfstoreandloader command, as shown in the following example. In the example, the new store that will contain the migrated RDF data is called migratedstore.
Listing 25. createrdfstoreandloader command
createrdfstoreandloader migratedstore -db RDFSAMPL -user db2admin -password db2admin -rdfdata c:\exported.nq -storeloadfile c:\loadfolder\migratedstore.sql -storeschemafile c:\loadfolder\migratestoreddl.sql
-db specifies the database in which to create the store (in this case, RDFSAMPL).
-user specifies the authorization name to use to establish the connection (in this case, db2admin).
-password specifies the password to use to establish the connection (in this case, db2admin).
-rdfdata specifies the N-quad or N-triple input file (in this case, c:\exported.nq) to load onto the DB2 database server.
-storeloadfile specifies an SQL file that will be generated to contain the DB2 load commands. If you do not specify the path in which to create the file, the file is created in the current folder. In this example, the existing folder c:\loadfolder is specified as the path.
-storeschemafile (optional parameter) creates an SQL file containing the DDL statements for the creation of the store. This parameter is provided in case you must deploy multiple instances of the RDF data. Instead of running the createrdfstoreandloader command multiple times, for faster performance you can specify the -storeschemafile parameter followed by a .sql file name. Ensure that you save the file and the dependent load files.
Assume that the DB2 server is on a local system and is running off the default Windows port, 50000; you therefore do not have to specify the -port parameter. Assume also that you are using the default schema, so you do not have to specify the -schema parameter. Because you did not specify the -objectnames parameter, which controls the names of the tables and their table spaces, system-generated table names and default table spaces are used.
Important: On Windows, the createrdfstoreandloader command requires Cygwin. V4.0 of the Gawk utility and V8.14 or later of the Core utilities are required for this command. After installing Cygwin, add <Cygwin_install_directory>/bin to the PATH environment variable. If you do not have Cygwin, the following error message is displayed when you run the command:
'Cannot run program "sh": CreateProcess error=2, The system cannot find the file specified.'
On Windows, the createrdfstoreandloader command can be invoked from the Cygwin command prompt or the default command prompt. When using a Cygwin command prompt, note that the file paths (-rdfdata, -storeloadfile, -storeschemafile, -objectnames) should not include the 'cygdrive' prefix. Instead, use a normal Windows path such as 'C:\....'
If folder or file names contain a space, then the whole string should be enclosed within double quotes.
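To illustrate the path rule, this hypothetical helper (not part of the DB2 tooling) rewrites a Cygwin-style /cygdrive path into the plain Windows form that the command parameters expect:

```python
import re

# Convert a Cygwin-style path such as /cygdrive/c/loadfolder/file.sql into
# the plain Windows form (C:\loadfolder\file.sql) the parameters expect.
# Hypothetical helper for illustration; not part of the DB2 tooling.
def cygdrive_to_windows(path: str) -> str:
    m = re.match(r'^/cygdrive/([a-zA-Z])(/.*)?$', path)
    if not m:
        return path  # already a Windows path, or not a cygdrive path
    drive, rest = m.group(1).upper(), m.group(2) or ""
    return drive + ":" + rest.replace("/", "\\")
```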
- Open a DB2 command prompt and change the directory to the location of the file that contains the DB2 load commands. In this example, the folder is c:\loadfolder.
- Connect to the database that you created; in this case, issue:
db2 connect to RDFSAMPL user db2admin
- Run the file containing the DB2 commands; in this case, issue:
db2 -f migratedstore.sql
Note: Do not use the -t option when running the SQL script, because the script is generated with a newline as the command separator.
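To see why the -t option (which makes the semicolon the statement terminator) would mis-split the script, note that the generated file separates commands with newlines. A minimal sketch of that split, with an invented example script:

```python
# The generated load script separates statements with newlines, not
# semicolons, which is why the -t option must not be used to run it.
def split_statements(script: str) -> list:
    """Split a newline-separated DB2 command script into statements."""
    return [line.strip() for line in script.splitlines() if line.strip()]
```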
In this example, the DB2 database now contains an RDF store called migratedstore that contains all the RDF data from the RDF store you used earlier. You can now update and query migratedstore.