Before you start
About this series
If you are preparing to take the DB2 DBA certification exam 611, you've come to the right place — a study hall of sorts. This series of DB2 certification preparation tutorials covers the major concepts you'll need to know for the test. Do your homework here and ease the stress on test day.
About this tutorial
This tutorial introduces skills you must have to properly manage a DB2 server. This is Part 5 in a series of eight to help you prepare for the DB2 10.1 for Linux, UNIX, and Windows Database Administration Certification (exam 611).
This tutorial is based on "DB2 9 DBA exam 731 prep, Part 5: DB2 Utilities" with updates specific to DB2 10.1 and additional topics and examples.
In this tutorial, you will learn:
- How to extract data using the EXPORT utility.
- How to populate tables with the IMPORT, LOAD, and INGEST utilities.
- How to use the db2move utility.
- When and how to use the ADMIN_MOVE_TABLE stored procedure.
- How to use ADMIN_COPY_SCHEMA in data movement.
- How to use the RUNSTATS, REORG, REORGCHK, and REBIND utilities, and the FLUSH PACKAGE CACHE statement.
- When and how to use the DB2 Design Advisor.
To take the DB2 10.1 DBA exam, you must have already passed DB2 10.1 Fundamentals, exam 610 or DB2 9 Fundamentals, exam 730. We recommend you take the DB2 Fundamentals tutorial series before starting this series. This tutorial is one tool to help you prepare for exam 611. You should also review the resources at the end of this tutorial for more information about DB2 utilities (see Resources). Although not all materials discussed in the Fundamentals tutorial series are required to understand the concepts described here, you should at least have a basic knowledge of:
- DB2 products
- DB2 tools
- DB2 instances
- Database objects
You do not need DB2 to complete this tutorial. However, you will get more out of it if you download the free trial version of IBM DB2 10.1 to work along with this tutorial.
Data movement utilities
A key objective of the exam is to demonstrate the ability to use traditional DB2 data movement utilities. In this section, we will discuss the various available DB2 data movement tools, utilities, stored procedures, and commands.
DB2 has four data movement utilities: EXPORT, IMPORT, LOAD, and INGEST. In addition to the above utilities, commands and stored procedures such as db2move, ADMIN_MOVE_TABLE, and ADMIN_COPY_SCHEMA are also available; they are covered later in this section.
File formats and source types
Four file formats supported by the DB2 export, import, ingest, and load utilities are described below:
- Non-delimited or fixed-length ASCII (ASC)— As the name implies, this file type contains ASCII data in fixed length to align with column data. Each ASC file is a stream of ASCII characters consisting of data values ordered by row and column. Rows in the data stream are separated by row delimiters, which are assumed to be newline characters.
- Delimited ASCII (DEL)— This is the most common file format used by a variety of database managers for data exchange. It contains ASCII data and uses special character delimiters to separate column values. Rows in the data stream are separated by a newline character as the row delimiter.
- PC version of the Integrated Exchange Format (PC/IXF)— This is a structured description of a database table. This file format can be used not only to import data but also to create a table that does not already exist in the target database.
- Cursor— A cursor is declared with a query. It can only be used as the input of a load operation.
Table 1. Supported file formats summary
| File format | IMPORT | LOAD | INGEST |
| --- | --- | --- | --- |
| ASC (including binary) | supported | supported | supported |
| Cursor | not supported | supported | not supported |
The EXPORT utility

The EXPORT utility extracts data from database tables to a file using an SQL SELECT or an XQuery statement. The exported data can be in the DEL, IXF, or WSF file formats. It is recommended that you include the MESSAGES clause in the export to capture errors, warnings, and informational messages during the export.

To successfully invoke the EXPORT utility, you must have SYSADM or DBADM authority, or CONTROL or SELECT privilege on the tables or views being exported.
When you export data protected by label-based access control (LBAC), your LBAC credentials might limit the rows exported. Rows you don't have read access to will not be exported, and no error or warning is given. However, if your LBAC credentials do not allow reading from one or more protected columns included in the export, the export will fail with an error.
Let's look at a simple export example. The command below exports the result of the
SELECT statement to a file in DEL format. The message file msg.out
records useful information, as well as any errors or warnings encountered:
Listing 1. Simple export example
EXPORT TO myfile.del OF DEL MESSAGES msg.out SELECT staff.name, staff.dept, org.location FROM org, staff WHERE org.deptnumb = staff.dept;
File type modifiers
In the previous example, data is extracted to a file in DEL format. By default, column values are separated by commas (,), and character strings are enclosed by quotation marks ("). What if the data to be extracted already contains commas and quotes? It will then be impossible for the import or load utility to determine which symbols are actual data and which are delimiters. To customize how EXPORT operates, you can use the MODIFIED BY clause and specify what to change with file type modifiers. The EXPORT command with the MODIFIED BY clause will look like this:
Listing 2. MODIFIED BY clause example
EXPORT TO file_name OF file_type MODIFIED BY file_type_modifiers MESSAGES message_file select_statement
A complete listing of the file type modifiers can be found in the Command Reference Guide under EXPORT. Some commonly used modifiers are listed here for demonstration:

- chardelx — Specify x to be the new single-character string delimiter. The default value is a quotation mark (").
- coldelx — Specify x to be the new single-character column delimiter. The default value is a comma (,).
- codepage=x — Specify x, an ASCII character string, to be the new code page of the output data. During the export operation, character data is converted to this code page from the application code page.
- timestampformat="x" — x is the format of the time stamp in the source table.
Listing 3. File type modifiers example
EXPORT TO myfile.del OF DEL MODIFIED BY chardel! coldel@ codepage=1208 timestampformat="yyyy.mm.dd hh:mm tt" MESSAGES msg.out SELECT * FROM schedule
The command above exports data from the SCHEDULE table in DEL format with the following behavior:
- Character strings are enclosed by exclamation marks (!).
- Columns are delimited by the at sign (@).
- Character strings are converted to code page 1208.
- The user-defined timestamp in the SCHEDULE table has the format yyyy.mm.dd hh:mm tt.
Exporting large objects with the EXPORT utility
When exporting tables with large object columns, by default, only the first 32 KB of LOB data is exported. This part of the object is placed in the same file as the rest of the column data. If you are exporting LOB values that exceed 32 KB, you should have the LOB data written to a separate file to avoid truncation.
To export the LOB data in full and store them in files different from the other column data, you must use the LOB options. In DB2 10.1, you can specify whether you want multiple LOB values to be concatenated and exported in the same output file or each LOB value to be exported to a separate file.
Below is an EXPORT command with the LOBSINFILE modifier, which causes the export utility to write multiple LOB values to the same output file.

Listing 4. EXPORT command with the LOBSINFILE modifier
EXPORT TO file_name OF file_type LOBS TO lobfile_directory_1, lobfile_directory_2, ... LOBFILE lobfilename MODIFIED BY LOBSINFILE MESSAGES message_file select_statement
The LOBS TO clause specifies the directories in which the LOB files will
be stored. If no
LOBS TO clause is found, LOB data is sent to the
current working directory. Notice from the command above that you can specify more
than one path as the LOB file target directories. There will be at least one file
per LOB path, and each file will contain at least one LOB.
It is probably helpful to identify the extracted LOB files with user-specified file names; the LOBFILE clause can be used for this purpose. Each LOB file will have a sequence number as the file extension (e.g., lobfile.001, lobfile.002, etc.).
When either the LOBS TO or LOBFILE option is specified, the LOBSINFILE behavior will be implicitly activated. However, it is always a good practice to explicitly specify the LOBSINFILE modifier to avoid confusion with the LOBSINSEPFILES modifier behavior, which will be discussed shortly.
Exporting large objects with the LOBSINFILE modifier
When exporting large objects with the
LOBSINFILE modifier, a LOB
Location Specifier (LLS) is generated and stored in the export output file. The LLS
is a string used to indicate where LOB data can be found. It has a format of
filename.ext.lob.nnn.mmm/. Let's look at that in more detail:
- filename.ext.lob is the name of the file that contains the LOB data; ext is a sequence number, as described in the previous panel.
- nnn is the offset of the large object within the LOB file in bytes.
- mmm is the length of the large object in bytes.
For example, an LLS of resume.001.lob.1257.2415/ indicates that the large object is located in the file resume.001.lob, that the actual LOB data begins at an offset of 1257 bytes of the file, and that it is 2,415 bytes long.
Listing 5. LLS usage example
EXPORT TO empresume.del OF DEL LOBS TO d:\lob1\ LOBFILE resume MODIFIED BY LOBSINFILE MESSAGES msg.out SELECT * FROM emp_resume
Exporting large objects with the LOBSINSEPFILES modifier
As mentioned, you can also choose to export LOB data in full and store each of them in separate files. The LOB options described earlier remain the same, except that the LOBSINSEPFILES modifier is used instead. Following is an example with such a modifier.
Listing 6. LOBSINSEPFILES modifier example
EXPORT TO empresume.del OF DEL LOBS TO d:\lob1\ LOBFILE resume MODIFIED BY LOBSINSEPFILES MESSAGES msg.out SELECT * FROM emp_resume
With this EXPORT command, the export utility will write each LOB value to a separate file named resume.ext.lob (i.e., resume.001.lob, resume.002.lob, resume.003.lob, etc.), all located in the LOB path d:\lob1.
Exporting XML data with XML options and modifiers
When exporting a table (defined with XML data) without specifying any XML-related options, the associated XML data will be written to a file or files separate from the rest of the exported relational data.
Listing 7. EXPORT command issued on the PRODUCT table with one XML column
EXPORT TO prodexport.del OF DEL MESSAGES msg.out SELECT * FROM product
In this example, the export utility will generate two output files: one for the data other than the XML data, and the other for the XML data. The first file is called prodexport.del, as specified in the command; in addition to the relational data of the table, it contains XML Data Specifiers (XDS).
XDS is a string represented as an XML tag named "XDS." It has attributes that describe information about the actual XML data in the column. Here are the attributes you might see in an XDS string.
- FIL specifies the name of the file that contains the XML data.
- OFF specifies the byte offset of the XML data in the file named by the FIL attribute.
- LEN specifies the length in bytes of the XML data in the file named by the FIL attribute.
- SCH specifies the fully qualified SQL identifier of the XML schema used to validate this XML document. This attribute will be discussed next.
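As an illustrative sketch (the relational values shown are hypothetical; the XDS values match the description that follows), a couple of rows of prodexport.del might look like this:

10,"PROD0010","<XDS FIL='prodexport.del.001.xml' OFF='0' LEN='252' />"
20,"PROD0020","<XDS FIL='prodexport.del.001.xml' OFF='252' LEN='310' />"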
From the content of prodexport.del above, you can see that the first XML data is stored in prodexport.del.001.xml, starting at 0 byte offset, and it has a length of 252 bytes.
The second file generated by the export utility in this example is prodexport.del.001.xml, which contains the XML content. All XML data exported is concatenated and written to this file. Here is the content of the prodexport.del.001.xml file to give you a better idea.
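As a hedged sketch (the element names and values are hypothetical and depend on the actual table data), the file would contain the XML documents concatenated back to back, along the lines of:

<product pid="PROD0010"><description>...</description></product><product pid="PROD0020"><description>...</description></product>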
Like exporting large objects, you can specify the path(s) where the exported XML documents will go and the base filename of the output files. Consider the following example:
Listing 8. Specifying the path for XML documents
EXPORT TO prodexport.del OF DEL XML TO d:\xmlpath XMLFILE proddesc MODIFIED BY XMLINSEPFILES XMLNODECLARATION XMLCHAR XMLSAVESCHEMA MESSAGES msg.out SELECT * FROM product
Here, the relational data of the PRODUCT table is exported to the prodexport.del
file. All XML data is then written to the directory specified in the XML TO clause, d:\xmlpath. The files with XML data are named proddesc.ext.xml, where ext is a sequence number (e.g., proddesc.001.xml, proddesc.002.xml, proddesc.003.xml, etc.). This base filename is defined with the XMLFILE option.
You might also notice that a few modifiers are used in the example. Here is a summary of all the XML-related modifiers.
- XMLINSEPFILES causes the export utility to write each exported XML document to a separate XML file.
- XMLNODECLARATION indicates that the XML data is exported without an XML declaration tag. By default, an XML declaration tag that includes an encoding attribute is written at the beginning of each XML document.
- XMLCHAR indicates that the XML data is written in the character code page. By default, XML data is written out in Unicode. When this modifier is used, the value of the codepage file type modifier or the application code page will be used instead.
- XMLGRAPHIC indicates that the exported XML data will be encoded in the UTF-16 code page regardless of the codepage file type modifier or the application code page. Note that XMLGRAPHIC is not used in this example.
The last option we are introducing here is XMLSAVESCHEMA. When an XML document is inserted, it can be validated against an XML schema. The XMLSAVESCHEMA option causes the export utility to also save the XML schema information for every exported XML document. A fully qualified SQL identifier of that schema will be stored as an SCH attribute inside the corresponding XML Data Specifier (XDS). Note that if the exported XML document was not validated against an XML schema, or the schema object no longer exists in the database, an SCH attribute will not be included in the corresponding XDS.
Exporting XML data with an XQuery
The EXPORT command also allows you to specify an XQuery statement so
that the export utility writes the result of an XQuery to an XML file. Let's examine
the following example.
Listing 9. EXPORT command specifying an XQuery
EXPORT TO custexport.del OF DEL XML TO d:\xmlpath XMLFILE custphone MODIFIED BY XMLINSEPFILES XMLNODECLARATION MESSAGES msg.out SELECT XMLQUERY ('$doc/customerinfo/phone' PASSING INFO AS "doc") FROM customer
The XQuery in the above example returns all the phone numbers for every customer
stored in the CUSTOMER table under the XML column INFO. All the XML options and
modifiers discussed apply to XQuery statements. Hence, this example will generate
separate XML documents for each result of the XQuery. The files are located in
d:\xmlpath, and they are named custphone.ext.xml where ext is a sequence
number. In addition, no XML declaration tag will be included in the documents.
Following is the content of one of the exported XML documents:
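As a sketch of what such a document might look like (the phone number and type attribute are hypothetical):

<phone type="work">416-555-1358</phone>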
Exporting from the IBM Data Studio
In addition to executing the EXPORT command from the DB2 command line, you can export using the Data Studio. From this tool, it is possible to specify all the options and clauses supported by the export utility, such as the large object and XML options.
Figure 1. Invoking the export utility from the Data Studio
The IMPORT utility

The IMPORT utility populates a table with data from an input file of type ASC, DEL, or IXF. The target can be a table, a typed table, or a view.
However, imports to detached tables and temporary tables are not permitted. It is
also recommended that you use the
MESSAGES clause so errors, warnings,
and informational messages are recorded.
To successfully import data, you must have SYSADM or DBADM authority, or underlying privileges (SELECT, INSERT, CONTROL, or CREATETAB) on the target table or database, depending on which option you use. To import data into a table that has protected rows and columns, you must have LBAC credentials that allow write access to all protected data in the table. In addition, importing into a table with protected rows requires that your LBAC credentials be part of the security policy protecting the table.
Listing 10. IMPORT command with five options
IMPORT FROM file_name OF file_type MESSAGES message_file [ INSERT | INSERT_UPDATE | REPLACE | REPLACE_CREATE | CREATE ] INTO target_table_name
- The INSERT option inserts imported data into the table. The target table must already exist.
- The INSERT_UPDATE option inserts data into the table, or updates existing rows of the table with matching primary keys. The target table must exist with a primary key defined.
- The REPLACE option deletes all existing data and inserts imported data into an existing target table.
- With the REPLACE_CREATE option, if the target table exists, the utility deletes existing data and inserts new data as if the REPLACE option were specified. If the target table is not defined, the table and its associated indices are created before data is imported. As you can imagine, the input file must be in PC/IXF format, because that format contains a structured description of an exported table. REPLACE_CREATE cannot be used if the target table is a parent table referenced by a foreign key.
- The CREATE option creates the target table and its indices, then imports data into the new table. The only file format supported is PC/IXF. You can also specify the name of the tablespace where the new table should be created, as in the following example:
IMPORT FROM emp.ixf OF IXF MESSAGES msg.out CREATE INTO employee IN datatbsp INDEX IN indtbsp
IMPORT is basically a utility to insert data into a table in bulk. This
bulk insert operation is just like a normal insert statement in that the activity is
logged, indices are updated, referential integrity is checked, and table constraints
are checked. By default,
IMPORT commits only once, at the end of the
operation. If a large number of rows are imported or inserted into the table,
sufficient transaction logs are required for rollback and recovery. You can request
periodic commits to prevent the logs from getting full. By committing the inserts
regularly, you also reduce the number of rows being lost if a failure occurs during
the import operation. The
COMMITCOUNT option forces a COMMIT after a
set number of records is imported. You can also specify the AUTOMATIC option, which lets the import utility internally determine when a commit needs to be performed. The utility will consider issuing a commit to avoid running into a log-full situation or to avoid lock escalation.
Listing 11. COMMITCOUNT option example
IMPORT FROM myfile.ixf OF IXF COMMITCOUNT 500 MESSAGES msg.out INSERT INTO newtable
If for some reason the above command fails during its execution, you could use the message file to determine the last row that was successfully imported and committed. Then, you could restart the import with the RESTARTCOUNT option. Note that the behavior of the SKIPCOUNT option is the same as RESTARTCOUNT. In the command below, the utility will skip the first 30,000 records before beginning the IMPORT operation.
Listing 12. Restarting an IMPORT operation example
IMPORT FROM myfile.ixf OF IXF COMMITCOUNT 500 RESTARTCOUNT 30000 ROWCOUNT 100000 MESSAGES msg.out INSERT INTO newtable
In the example, notice that the
ROWCOUNT option is also used. It
specifies the number of physical records to be imported. Because the
RESTARTCOUNT option is used, the import utility will skip the first
30,000 records and import the next 100,000 records into the table.
By default, the import utility will acquire an exclusive lock on the target table
before any rows are inserted. The exclusive lock is released as soon as the import
completes. This is the behavior of the
ALLOW NO ACCESS option. To allow
concurrent applications to access the table data, you can use the ALLOW WRITE ACCESS option. Note that this option is not compatible with the REPLACE, CREATE, or REPLACE_CREATE import options.
Listing 13. ALLOW WRITE ACCESS option example
IMPORT FROM myfile.ixf OF IXF ALLOW WRITE ACCESS MESSAGES msg.out INSERT INTO newtable
Importing XML data
To import XML files, use the
XML FROM option to specify one or more
paths where XML files are stored. Otherwise, the import utility will look for the
XML files in the current directory. You can choose how the XML documents are parsed: strip whitespace or preserve whitespace. If the
XMLPARSE option is not
specified, the parsing behavior for XML documents will be determined by the
CURRENT XMLPARSE OPTION special register.
Listing 14. XML FROM and XMLPARSE options example
IMPORT FROM myfile.ixf OF IXF XML FROM d:\xmlpath XMLPARSE PRESERVE WHITESPACE MESSAGES msg.out INSERT INTO newtable
When you insert or update an XML document, you might want to determine whether the
structure, content, and data types of the XML document are valid. The import utility
also supports XML validation through the use of the XMLVALIDATE option. There are three possible methods:
- USING XDS — Recall that you can export XML schema information and store it in the SCH attribute of the XML Data Specifier (XDS). The value of the SCH attribute will be used to perform validation. If there is no SCH attribute in the XDS, the DEFAULT, IGNORE, or MAP clauses will be considered.
- USING SCHEMA schema-sqlid — Use the XML schema specified in this clause.
- USING SCHEMALOCATION HINTS — Validate the XML documents against the schemas identified by the XML schema location hints in the source XML documents.
Listing 15. XMLVALIDATE option example
IMPORT FROM myfile.ixf OF IXF XML FROM d:\xmlpath XMLPARSE PRESERVE WHITESPACE XMLVALIDATE USING XDS DEFAULT S1.SCHEMA_A IGNORE (S1.SCHEMA_X, S1.SCHEMA_Y, S1.SCHEMA_Z) MAP (S1.SCHEMA_A, S1.SCHEMA_B) COMMITCOUNT 500 RESTARTCOUNT 30000 MESSAGES msg.out INSERT INTO newtable
The above IMPORT command will:
- Insert data from myfile.ixf and XML files located in d:\xmlpath.
- Whitespace is preserved when the XML document is parsed.
- Each XML document is validated using the schema information identified in the SCH attribute of the XDS. However, if XDS for any particular row doesn't contain a SCH attribute, S1.SCHEMA_A will be used instead.
- For SCH attribute specified as S1.SCHEMA_X, or S1.SCHEMA_Y, or S1.SCHEMA_Z, validation will not be performed for the imported XML document.
- If the SCH attribute is specified as S1.SCHEMA_A, it will then be mapped to S1.SCHEMA_B. Note that although the DEFAULT clause specifies S1.SCHEMA_A, any subsequent mapping will not be performed.
- The import utility will issue a commit after every 500 rows are imported.
- The import operation is started at record 30,001. The first 30,000 records are skipped.
- Any errors, warnings, and informational messages are written to the msg.out file.
- New data are inserted (or appended) into the newtable.
This example only gives you some idea of how the imported XML documents can be validated. There are more examples in the DB2 documentation that demonstrate the power of the XMLVALIDATE option.
File type modifiers
The IMPORT utility also supports file type modifiers to customize the import operation. A complete list of modifiers can be found in the DB2 Command Reference, under IMPORT. A few of them are outlined here:

- compound=x — Uses non-atomic compound SQL to insert data. x statements will be attempted each time.
- indexschema=schema — Uses the specified schema for the index during index creation.
- striptblanks — Truncates any trailing blank spaces when loading data into a variable-length field.
- lobsinfile — Indicates that LOB data is being imported. The utility will check the LOBS FROM clause to get the path of the input LOB files.
Listing 16. File type modifiers example
IMPORT FROM inputfile.asc OF ASC LOBS FROM /u/db2load/lob1, /u/db2load/lob2 MODIFIED BY compound=5 lobsinfile INSERT INTO newtable
IMPORT using the IBM Data Studio
The Data Studio provides easy-to-use graphical interfaces to perform import operations. All the import options and file modifiers discussed in the previous section are also available in this interface.
Overview of the LOAD utility

The LOAD utility is another method to populate tables with data. Formatted pages are written directly into the database. This mechanism allows more efficient data movement than the IMPORT utility. However, some operations, such as referential or table constraint checking and trigger invocation, are not performed by the LOAD utility.
Following is the core of the
LOAD command; other options and modifiers
are supported and will be introduced later in this section. To successfully execute
this command, you must have SYSADM, DBADM, or LOAD authority, or INSERT and/or
DELETE privileges on the table involved in the load. To load data into a table that
has protected columns, you must have LBAC credentials that allow write access to all
protected columns in the table. To load data into a table that has protected rows,
you must have been granted a security label for write access that is part of the
security policy protecting the table.
Listing 17. LOAD command example
LOAD FROM input_source OF input_type MESSAGES message_file [ INSERT | REPLACE | TERMINATE | RESTART ] INTO target_tablename
An example of using CURSOR as the load input is shown here:
DECLARE mycursor CURSOR FOR SELECT col1, col2, col3 FROM tab1; LOAD FROM mycursor OF CURSOR INSERT INTO newtab;
The load target must exist before the utility starts. It can be a table, a typed
table, or a table alias. Loading to tables with XML columns, system tables and
temporary tables is not supported. Use the
MESSAGES option to capture
any errors, warnings, and informational messages during the load.
LOAD can be executed in four modes:
- INSERT mode adds input data to a table without changing the existing table data.
- REPLACE mode deletes all existing data from the table and populates it with input data.
- TERMINATE mode terminates a load operation and rolls back to the point in time at which it started. One exception is that if REPLACE was specified, the table will be truncated.
- RESTART is used to restart a previously interrupted load. It will automatically continue from the last consistency point. To use this mode, specify the same options as in the previous LOAD command, but with RESTART (see the sketch following this list). This allows the utility to find all necessary temporary files generated during load processing. Therefore, it is important not to manually remove any temporary files generated from a load unless you are sure they are not required. Once the load completes without error, the temporary files are automatically removed. By default, they are created in the current working directory; you can specify the directory where temporary files are stored with the TEMPFILES PATH option.
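As a minimal sketch, suppose an earlier load of a hypothetical file mydata.del into table mytable ran in INSERT mode and was interrupted; the restart reuses the same options with RESTART in place of INSERT:

LOAD FROM mydata.del OF DEL MESSAGES msg.out RESTART INTO mytable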
Phases of a load process
A complete load process has four distinct phases:
- Load phase:
- Loads data into the table.
- Collects index keys and table statistics.
- Records consistency points.
- Places invalid data into dump files and records messages in the message file. When rows of data do not comply with the definition of the table, they are considered to be invalid data and will be rejected (not loaded into the table). Use the dump-file modifier to specify the name and location of a file to record any rejected rows.
- Build phase:
- Creates indices based on keys collected during the load phase. If STATISTICS USE PROFILE is specified, statistics are also collected according to the profile defined for the target table. This profile must be created before the load is executed; otherwise, a warning is returned and no statistics are collected.
- Delete phase:
- Deletes rows that caused unique key violations and places them in the exception table. Besides data that simply doesn't match the definition of the target table, as described above, there may be data that passes the load phase but violates a unique constraint defined on the table. Note that only the rows violating unique keys are considered bad data here; other constraints are not checked at this time. Since this type of data is already loaded into the table, the LOAD utility deletes the offending rows in this phase. An exception table can be used to store the deleted rows so you can decide what to do with them after the load operation completes. If no exception table is specified, the offending rows are deleted without a trace. The exception table is discussed in more detail below.
- Records messages in the message file.
- Index copy phase:
- If ALLOW READ ACCESS is specified with USE TABLESPACE, index data is copied from the system temporary tablespace to the tablespace where the index should reside.
An exception table is a user-defined table that must have the same column definitions as the target table being loaded. If at least one of the columns is not present in the exception table, the offending rows will be discarded. Only two additional columns can be added to the end of the table: a timestamp column to record when a row is inserted, and a CLOB column to store the reason (or message) why the row is considered bad.
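A minimal sketch of creating such an exception table for a hypothetical target table mytable (the names mytable_exc, record_ts, and reason_msg are arbitrary):

CREATE TABLE mytable_exc LIKE mytable;
ALTER TABLE mytable_exc ADD COLUMN record_ts TIMESTAMP ADD COLUMN reason_msg CLOB(32K);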
A load example
Let's look at an example to illustrate the steps involved in a load process.
Listing 18. Load process example
LOAD FROM act.del OF DEL MODIFIED BY DUMPFILE=/home/db2inst1/act.dmp MESSAGES msg.out INSERT INTO act FOR EXCEPTION actexp
The ACT table definition is as follows:
Listing 19. ACT table definition
CREATE TABLE "DB2INST1"."ACT" ( "ACTNO" SMALLINT NOT NULL, "ACTKWD" CHAR(6) NOT NULL, "ACTDESC" VARCHAR(20) NOT NULL, PRIMARY KEY (ACTNO) )
The table is currently populated with one row.
Figure 2. ACT table content
The load input data file act.del contains three rows, with the last row duplicating the row already in the ACT table, thus violating the primary key constraint.
Figure 3. act.del file content
An exception table, ACTEXP, is created with the same definition as the ACT table.
CREATE TABLE ACTEXP LIKE ACT
In the load phase, all the data from the input file is loaded into the ACT table.

In the delete phase, the last row (marked in blue in the figure) is deleted from ACT, as it violates the primary key constraint, and is inserted into the exception table ACTEXP.

Note that any row in violation of a unique index or a primary key index is copied to the exception table and is not written to the dump file. The dump file, on the other hand, contains rows that cannot be loaded because they are invalid or have syntax errors.
At the end of the load, you should examine the message file, the dump file, and the exception table, then decide how to deal with the rejected rows.
Listing 20. ACT table content after the load operation is done
ACTNO ACTKWD ACTDESC ------ ------ -------------------- 10 MANAGE MANAGE/ADVISE 20 ECOST ESTIMATE COST 100 TEACH TEACH CLASSES
And the ACTEXP table:
ACTNO ACTKWD ACTDESC ------ ------ -------------------- 100 TEACH TEACH CLASSES
After the load is done, the tablespace containing the loaded table will be in BACKUP PENDING state, and the table will be in SET INTEGRITY PENDING state.
Load options and file type modifiers
Some load options and file type modifiers were introduced previously. A few more are discussed here:
- ROWCOUNT n: Allows users to specify that only the first n records in the input file are to be loaded.
- SAVECOUNT n: Establishes consistency points after every n rows are loaded. Messages are generated and recorded in the message file to indicate how many input rows were successfully loaded at the time of the save point. Establishing consistency points is not possible when the input type is CURSOR.
- WARNINGCOUNT n: Stops the load after n warnings have been raised.
- INDEXING MODE [ REBUILD | INCREMENTAL | AUTOSELECT | DEFERRED ]: In the build phase, indices are built. This option specifies whether the LOAD utility is to rebuild indices or to extend them incrementally. Four modes are supported:
- REBUILD forces all indices to be rebuilt.
- INCREMENTAL extends indices with new data only.
- AUTOSELECT allows the utility to choose between REBUILD and INCREMENTAL.
- DEFERRED means index creation does not happen during the load. The indices involved are marked as needing a refresh; they will be rebuilt when the database is restarted or at the first access to those indices.
STATISTICS USE PROFILE: After a load is performed, previous statistics of the target table are most likely not valid, as a lot more data has been added. You can choose to collect the statistics in the build phase according to the profile defined for the target table.
File type modifiers are specified with the MODIFIED BY clause. Here are a few you may find useful:
- fastparse — Syntax checking on loaded data is reduced to enhance performance.
- identityignore, identitymissing, and identityoverride — Used to ignore, indicate missing, or override identity column data, respectively.
- indexfreespace n, pagefreespace n, and totalfreespace n — Leave the specified amount of free space in index and data pages.
- norowwarnings — Suppresses row warnings.
- lobsinfile — Indicates that LOB files are to be loaded; checks LOBS FROM option for LOB path.
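The following sketch combines several of the options and modifiers above (the file name mydata.del and table mytable are hypothetical):

LOAD FROM mydata.del OF DEL MODIFIED BY fastparse norowwarnings SAVECOUNT 1000 WARNINGCOUNT 50 MESSAGES msg.out INSERT INTO mytable INDEXING MODE AUTOSELECT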
Table access during load
While a table is being loaded, it is locked by the LOAD utility with an exclusive
lock. No other access is allowed until the load completes. This is the default
behavior of the ALLOW NO ACCESS option. During such a load, the table is in the
state of LOAD IN PROGRESS. There is a handy command that checks the status of a load
operation and also returns the table state:
LOAD QUERY TABLE table_name
You may have guessed that there is an option to allow table access. The ALLOW READ ACCESS option causes the table to be locked in share mode. Readers
may access the data that already exists in the table but not the new portion. Data
that is being loaded is not available until the load is complete. This option puts
the loading table in LOAD IN PROGRESS and READ ACCESS ONLY state.
As mentioned, a full index can be rebuilt, or an index can be extended with the new
data during the build phase. With
ALLOW READ ACCESS, if a full index is
being rebuilt, a shadow copy of it is created. When the load operation gets to the index copy phase (see Phases of a load process), the target table
is taken offline, and the new index is copied into the target tablespace.
Regardless of which table access option is specified, various locks are required for
the load to process. If the target table is already locked by some application, the
LOAD utility will have to wait until the locks are released.
Instead of waiting for a lock, you can use the LOCK WITH FORCE option in the LOAD command to force off other applications that hold conflicting locks.
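For example, a load that allows readers on the existing data and forces off applications holding conflicting locks might look like this sketch (newdata.del and sales are hypothetical names):

LOAD FROM newdata.del OF DEL MESSAGES msg.out INSERT INTO sales ALLOW READ ACCESS LOCK WITH FORCE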
SET INTEGRITY PENDING table state
So far, we know that input data that does not comply with the target table definition
is not loaded into the table. Such data is rejected and recorded in the message file
at the load phase. In the delete phase, the
LOAD utility deletes rows
that violated any unique constraints. The offending rows are inserted into an
exception table if specified. What about other constraints that the table might have
defined, such as referential integrity and check constraints? These constraints are
not checked by the
LOAD utility. The table will be placed in SET
INTEGRITY PENDING state, which forces you to manually check data integrity before
the table can be accessed. The table state can be queried using the LOAD QUERY command, as discussed. The column CONST_CHECKED in the system catalog table SYSCAT.TABLES also indicates the status of each constraint defined on the table.

To manually perform or bypass integrity checking for one or more tables, use the SET INTEGRITY statement. Examples are presented here to demonstrate some options. To immediately check the integrity of the appended data in the EMPLOYEE and STAFF tables, use:

SET INTEGRITY FOR employee, staff IMMEDIATE CHECKED
To bypass foreign key checking on table EMPLOYEE with the UNCHECKED option, use:

SET INTEGRITY FOR employee FOREIGN KEY IMMEDIATE UNCHECKED
In some cases, you may want to place the target table as well as its descendent
tables with foreign key relationship in SET INTEGRITY PENDING state after the load
completes. This ensures that access to all these tables is controlled until a manual integrity check is performed. The load option is SET INTEGRITY PENDING CASCADE IMMEDIATE, which indicates that the SET INTEGRITY PENDING state for foreign key constraints is immediately extended to all descendent foreign key tables. By default, only the loaded table is placed in SET INTEGRITY PENDING state.
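A sketch of a load using this option (emp.del and employee are illustrative names):

LOAD FROM emp.del OF DEL MESSAGES msg.out INSERT INTO employee SET INTEGRITY PENDING CASCADE IMMEDIATE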
Since the LOAD utility writes formatted pages into the database directly, no database logging is performed to record the new data being loaded. If you have a recoverable database (i.e., with LOGRETAIN and/or USEREXIT turned on),
DB2 needs to ensure that the database is still recoverable after the load completes.
In order to enforce recoverability, the tablespace where the table is stored will be
placed in BACKUP PENDING mode. This means that the tablespace must be backed up
before it can be accessed.
This is the default way to make the tablespace accessible after a load operation
completes. Another method is to back up the loaded data while the load is running
with the COPY YES option. A backup file will be created at the end of the load.
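A sketch of such a load, assuming a hypothetical copy target directory /db2/loadcopy:

LOAD FROM trans.del OF DEL MESSAGES msg.out INSERT INTO transactions COPY YES TO /db2/loadcopy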
There is another option you may consider to avoid backing up the tablespace right
after the load completes.
The NONRECOVERABLE option marks the table being loaded as unrecoverable. The associated tablespace is fully accessible after the load completes, and DB2 does not stop you from querying and modifying the table data. However, if you eventually need to restore the tablespace and roll forward through a time at which a NONRECOVERABLE load operation occurred, the loaded table is not recoverable: the recovery process skips all the logs associated with the table, and you can only drop and recreate the table. Therefore, it is still recommended that you back up the tablespace at a convenient time so existing and loaded data is saved in the backup image.
Load using the IBM Data Studio
The Data Studio provides easy-to-use graphical interfaces to perform load operations. All the load options and file modifiers discussed are also available in this interface.
The INGEST utility

The INGEST utility is a new DB2 10.1 data movement tool. It is a high-speed, client-side utility that streams data from files and pipes into DB2 target tables. It is sometimes referred to as Continuous Data Ingest (CDI) because it can be run continuously, processing continuous data streams from pipes. Additionally, it is well suited for data currency and availability because it uses row locking, so it does not lock the entire target table.
The INGEST utility can perform the following DML operations on target tables using an SQL-like interface: INSERT, UPDATE, MERGE, REPLACE, and DELETE. The INGEST utility also supports the use of SQL expressions to build individual column values from more than one data field.
INGEST utility features:
- Meets modern requirements by processing data from a continuous data stream while maintaining table availability
- Capable of performing data transformation
- Can place unwanted rows in an exception file or table or just discard them
- Ability to restart failed INGEST operations from the last commit point
- Part of DB2 Client, so it can be installed and executed on a separate machine other than the DB2 server with no additional license
The INGEST command goes through three major phases:

- Transport — In this phase, the transporter threads read from the input data source and put records on the formatter queues. For INSERT and MERGE operations, there is one transporter thread for each input source (for example, one thread for each input file). For UPDATE and DELETE operations, there is only one transporter thread.
- Format — In this phase, input data is converted to the required DB2 format. The formatters pull records from the formatter queues, parse each record, convert the data into the format that DB2 database systems require, and put each formatted record on one of the flusher queues for that record's partition. The number of formatter threads is specified by the num_formatters configuration parameter. The default is (number of logical CPUs)/2.
- Flush — In this phase, the ingest DML operations are run on the target DB2 tables. The flushers issue the SQL statements that perform the operations on the DB2 tables. The number of flushers for each partition is specified by the num_flushers_per_partition configuration parameter. The default is max( 1, ((number of logical CPUs)/2)/(number of partitions) ).
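These configuration parameters can be adjusted for the current session with the INGEST SET command; a sketch with illustrative values:

INGEST SET num_formatters 4;
INGEST SET num_flushers_per_partition 2;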
Listing 21. A typical INGEST command
INGEST FROM input_source FORMAT format-definition MESSAGES message_file RESTART [NEW | OFF | CONTINUE | TERMINATE] SQL-statement
The SQL statement specifies the DML operation to be run against the target tables.
These operations can be DELETE, INSERT, MERGE, REPLACE, or UPDATE.
The following example inserts data from a delimited text file:
INGEST FROM FILE delimited_file.txt FORMAT DELIMITED INSERT INTO mytable;
Extra fields used to compute column values example
The following example shows some of the basic transformation capabilities of the INGEST utility. You have an input file, myfile.txt, whose format consists of five fields: product ID, product description, price, sales tax, and shipping. The fields are separated by the | delimiter. This data is to be inserted into the mytable table,
which comprises three columns: prod_id, description, and total_price. The
total_price column is the summation of the last three columns in the input file.
The input file content is as follows:
PROD1|product 1|10|5|3 PROD2|product 2|15|5|3 PROD3|product 3|20|15|3
Listing 22. INGEST command usage
INGEST FROM FILE myfile.txt FORMAT DELIMITED BY '|' ( $prod_ID CHAR(8), $description CHAR(32), $price DECIMAL(5,2) EXTERNAL, $sales_tax DECIMAL(4,2) EXTERNAL, $shipping DECIMAL(3,2) EXTERNAL ) INSERT INTO mytable(prod_ID, description, total_price) VALUES($prod_id, $description, $price + $sales_tax + $shipping);
In the above statement:

- The default delimiter, the comma, is overridden using DELIMITED BY '|'.
- The EXTERNAL keyword is used with numeric field types to indicate that the field value is specified as ASCII characters rather than in binary.
- The values of the total_price column are calculated from the price, sales tax, and shipping fields in the input file.
You can define the fields to be inserted to the table columns according to their position in the input file.
Listing 23. Positional example syntax
INGEST FROM FILE input_file.txt FORMAT POSITIONAL ( $field1 POSITION(1:8) INTEGER, $field2 POSITION(10:19) DATE 'yyyy-mm-dd', $field3 POSITION(25:34) CHAR(10) ) INSERT INTO target_table VALUES($field1, $field2, $field3);
So, field1 comes from positions 1 through 8 in the input file, field2 from positions 10 through 19, and so on.
The following example updates the table rows whose primary key
matches the corresponding field
$key1 in the input file.
It can be written in either of the following syntaxes.
Listing 24. UPDATE example syntax 1
INGEST FROM FILE input_file.txt FORMAT DELIMITED ( $key1 INTEGER EXTERNAL, $data1 CHAR(8), $data2 CHAR(32), $data3 DECIMAL(5,2) EXTERNAL ) UPDATE target_table SET (data1, data2, data3) = ($data1, $data2, $data3) WHERE (key1 = $key1);
Listing 25. UPDATE example syntax 2
INGEST FROM FILE input_file.txt FORMAT DELIMITED ( $key1 INTEGER EXTERNAL, $data1 CHAR(8), $data2 CHAR(32), $data3 DECIMAL(5,2) EXTERNAL ) UPDATE target_table SET data1 = $data1, data2 = $data2, data3 = $data3 WHERE (key1 = $key1);
You can merge the data from the input file into that of the target table using the MERGE INTO clause. This example merges data from the input file into the target table. For input rows whose primary key fields match a table row, it updates that table row with the input row. For other input rows, it inserts the row into the table.
Listing 26. Basic MERGE example syntax
INGEST FROM FILE input_file.txt FORMAT DELIMITED ( $key1 INTEGER EXTERNAL, $data1 CHAR(8), $data2 CHAR(32), $data3 DECIMAL(5,2) EXTERNAL ) MERGE INTO target_table ON (key1 = $key1) WHEN MATCHED THEN UPDATE SET (data1, data2, data3) = ($data1, $data2, $data3) WHEN NOT MATCHED THEN INSERT VALUES($key1, $data1, $data2, $data3);
You can delete rows from the target table based on key values in the input file. This example deletes table rows whose primary key matches the corresponding primary key fields in the input file.
Listing 27. Basic DELETE example syntax
INGEST FROM FILE input_file.txt FORMAT DELIMITED ( $key1 INTEGER EXTERNAL, $key2 INTEGER EXTERNAL ) DELETE FROM target_table WHERE (key1 = $key1) AND (key2 = $key2);
Consider the following example in which there is a table with columns KEY, DATA, and ACTION, and you have an input file with two fields: KEY and DATA. You can conditionally update the data field when the KEY value matches the corresponding one in the input file and the ACTION value is U.
Listing 28. SQL example syntax
INGEST FROM FILE input_file.txt FORMAT DELIMITED ( $key_fld INTEGER EXTERNAL, $data_fld INTEGER EXTERNAL ) UPDATE target_table SET data = $data_fld WHERE (key = $key_fld) AND (action = 'U');
Handling rejected rows
INGEST can handle rejected rows much as the LOAD utility does. These rows can simply be discarded (the default) or written to a file or DB2 table for further review. This allows you to potentially fix the issue that caused the records to be rejected and then rerun INGEST using the new file.
Listing 29. Rejected rows inserted into the exception table excp_table example
INGEST FROM FILE input_file.txt FORMAT DELIMITED ( $field1 INTEGER EXTERNAL, $field2 DATE 'mm/dd/yyyy', $field3 CHAR(32) ) EXCEPTION TABLE excp_table MESSAGES messages.txt INSERT INTO target_table VALUES($field1, $field2, $field3);
Creating the restart table
If INGEST fails, it can be restarted from the last committed point, but you are first required to create the restart table, which is used by the INGEST utility to store its state while operating and thus holds the information needed to restart.

Only one restart table is used by all the INGEST operations in the database. The restart table does not hold a copy of the ingested data, only counters and indicators defining the state.
To create a restart table:
db2 "CALL SYSPROC.SYSINSTALLOBJECTS('INGEST', 'C', tablespace-name, NULL)"
Restarting a failed ingest operation
You can restart a failed
INGEST operation by issuing the
INGEST command with the
RESTART CONTINUE option. This
will cause the
INGEST utility to restart from the last commit point.
The user ID restarting the failed
INGEST command must have SELECT,
INSERT, UPDATE, and DELETE privilege on the restart table.
Listing 30. Failed INGEST command example
INGEST FROM FILE input_file.txt FORMAT DELIMITED ( $field1 INTEGER EXTERNAL, $field2 CHAR(8) ) RESTART NEW 'job01' INSERT INTO target_table VALUES($field1, $field2);
Listing 31. Restarting
INGEST from the last commit
INGEST FROM FILE input_file.txt FORMAT DELIMITED ( $field1 INTEGER EXTERNAL, $field2 CHAR(8) ) RESTART CONTINUE 'job01' INSERT INTO target_table VALUES($field1, $field2);
In the above example, you will notice:

- The first INGEST command has the RESTART NEW option with an arbitrary ID. It is recommended to give the INGEST job an arbitrary ID — job01 in this example — so you can use it to restart the job in case of failure.
- The second INGEST command uses the job01 ID in the RESTART CONTINUE clause to restart the job.
Terminating a failed INGEST job
If you do not want to restart a failed INGEST job, you need to reissue the command with the RESTART TERMINATE option in order to clean up its records in the restart table.
The user terminating the failed command must have SELECT and DELETE privilege on the restart table.
To terminate the
INGEST job in the above example instead of restarting
it, you can issue the following.
Listing 32. Terminating the INGEST job instead of restarting it
INGEST FROM FILE input_file.txt FORMAT DELIMITED ( $field1 INTEGER EXTERNAL, $field2 CHAR(8) ) RESTART TERMINATE 'job01' INSERT INTO target_table VALUES($field1, $field2)
There are two commands for monitoring the progress of the ingest utility: INGEST LIST and INGEST GET STATS/STATISTICS commands. They must be run on the same machine running the ingest utility in a separate CLP session.
INGEST LIST gives basic information about all running INGEST utilities
on the machine, while
INGEST GET STATS gives more detailed information
about a specific INGEST command.
Listing 33. INGEST LIST example
=> INGEST LIST Ingest job ID = DB21000:20101116.123456.234567:34567:45678 Ingest temp job ID = 1 Database Name = MYDB Input type = FILE Target table = MY_SCHEMA.TARGET_TABLE Start Time = 04/10/2010 11:54:45.773215 Running Time = 01:02:03 Number of records processed = 30,000
The temporary job ID can be given to the
INGEST GET STATS command to get
detailed information about a specific
INGEST command. It just saves you
from retyping the actual job ID.
Here is an example of basic INGEST GET STATS command output.
Listing 34. Statistics for the INGEST job whose temporary job id equals 4
=> INGEST GET STATS FOR 4 Ingest job ID = DB21000:20101116.123456.234567:34567:4567 Database = MYDB Target table = MY_SCHEMA.TARGET_TABLE Overall Overall Current Current ingest rate write rate ingest rate write rate Total records (records/second) (writes/second) (records/second) (writes/second) ---------------- --------------- --------------- ---------------- ---------------- 54321 5432 76543 87654 98765
You can use the EVERY n SECONDS clause to get statistics at fixed intervals. Additionally, you can use SHOW TOTALS to get aggregate statistics about each phase of ingest processing (transporter, formatter, and flusher). For more information about the INGEST command syntax and available options, please refer to the DB2 documentation.
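For example, to display statistics for the job with temporary job ID 4 at 10-second intervals (a sketch of the clause described above):

INGEST GET STATS FOR 4 EVERY 10 SECONDS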
Comparing IMPORT, LOAD, and INGEST

Table 2. IMPORT, LOAD, and INGEST comparison

| IMPORT | LOAD | INGEST |
| --- | --- | --- |
| Server-side utility | Server-side utility | Client-side utility that comes with the DB2 client |
| Transactional; slower on large amounts of data | Writes formatted pages; faster for large data loads | High-speed client-side utility that can continuously stream data from files and pipes into DB2 target tables |
| All inserted rows are logged | Minimal logging | All inserted rows are logged |
| Triggers are fired | Triggers are not fired | Triggers are fired |
| Constraints are validated during import | All unique keys are validated; other constraints should be validated with the SET INTEGRITY statement | Constraints are validated during the ingest |
| If interrupted, tables are usable up to the last commit point | If interrupted, tables are held in the LOAD PENDING state; either restart the load or restore the affected tables | If interrupted, tables are usable up to the last commit point, and you can restart the ingest process from this point |
| No backup required | Backup is required for the affected tablespaces | No backup required |
| Its impact on the server can be throttled | Its impact on the server can be throttled | Not affected by utility throttling |
| Supports importing to updatable views (except typed views), range-clustered tables (RCT), and nicknames | Does not support loading to updatable views, range-clustered tables (RCT), or nicknames | Supports ingesting to updatable views (except typed views), range-clustered tables (RCT), and nicknames |
| Supports importing to typed tables | Does not support typed tables | Does not support typed tables |
| Supports LOBs from files: BLOB, CLOB, DBCLOB, NCLOB | Supports LOBs from files: BLOB, CLOB, DBCLOB, NCLOB | Does not support LOBs from files |
| Supports importing structured types | Does not support structured types | Does not support structured types |
Data movement utilities and hidden columns
A hidden column is a table column defined with the IMPLICITLY HIDDEN attribute. Such a column is unavailable unless explicitly referenced. For example, if a SELECT * query is run against a table, implicitly hidden columns are not returned in the result table. To access such a column, its name must be explicitly referenced in the query.
The IMPORT, INGEST, and LOAD utilities require you to specify whether data for the hidden columns is included in the operation. If a column list is not specified, data movement utilities must use the implicitlyhiddeninclude or implicitlyhiddenmissing file type modifiers when working with tables that contain implicitly hidden columns.
You can also use the
DB2_DMU_DEFAULT registry variable to set the
default behavior when data movement utilities encounter tables with implicitly
hidden columns. Similarly,
EXPORT requires that you specify whether
data for the hidden columns is included in the operation.
If you have a table, table1, with a hidden column c3, the following command shows loading data to hidden columns by explicitly stating the hidden column name using the insert into clause.
db2 load from delfile1 of del insert into table1 (c1, c2, c3)
Alternatively, you can specify the
implicitlyhiddeninclude modifier when
the input file contains data for the hidden columns, or the
implicitlyhiddenmissing modifier when the input file does not.
db2 load from delfile1 of del modified by implicitlyhiddeninclude insert into table1
Additionally, you can use the
DB2_DMU_DEFAULT registry variable on the
server-side to set the behavior when data movement utilities encounter tables with
implicitly hidden columns.
db2set DB2_DMU_DEFAULT=IMPLICITLYHIDDENINCLUDE db2 load from delfile1 of del insert into table1
The ADMIN_MOVE_TABLE stored procedure

You can move tables both online and offline using the ADMIN_MOVE_TABLE stored procedure. In fact, this procedure can be used to move the data in a table to a new table object of the same name (but with possibly different storage characteristics, such as a different tablespace) while the data remains online and available for access. You can also generate a new optimal compression dictionary when a table is moved.
The ADMIN_MOVE_TABLE stored procedure creates a protocol table composed
of rows containing status information and configuration options related to the table
to be moved. The return set from this procedure is the rows from that protocol table
related to the table to be moved.
You can invoke ADMIN_MOVE_TABLE in two ways.

Method 1 — Modify only certain parts of the table definition for the target table (if you just want to modify the tablespaces for the table objects, for example). All you need to do is fill out the data_tbsp, index_tbsp, and lob_tbsp parameters when calling the procedure, leaving the other optional parameters blank.
This example calls the stored procedure using the first method, where the target table is defined within the procedure, to move a table named T1, which is located in the schema titled SCHEMA1. Additionally, the column definitions of the target table are passed to the procedure.
Listing 35. Example for stored procedure using the first method
CALL SYSPROC.ADMIN_MOVE_TABLE( 'SCHEMA1', 'T1', 'ACCOUNTING', 'ACCOUNT_IDX', 'ACCOUNT_LONG', '', '', '', 'CUSTOMER VARCHAR(80), REGION CHAR(5), YEAR INTEGER, CONTENTS CLOB', '', 'MOVE')
The above example moves the table SCHEMA1.T1 to a new table with the same name that has the column definitions CUSTOMER VARCHAR(80), REGION CHAR(5), YEAR INTEGER, CONTENTS CLOB, and that resides in tablespace ACCOUNTING, with its indices in tablespace ACCOUNT_IDX and its LOBs in tablespace ACCOUNT_LONG.
Method 2— Create the target table and provide its name to the procedure. This provides you with more control and flexibility by allowing you to create the target table beforehand, rather than having the stored procedure create it.
This example is equivalent to the previous one, but it calls the stored procedure using the second method, where the target table is created outside the procedure and is then named in the target_tabname parameter.
Listing 36. Example for stored procedure using the second method
CREATE TABLE SCHEMA1.T1_TGT ( CUSTOMER VARCHAR(80), REGION CHAR(5), YEAR INTEGER, CONTENTS CLOB) IN ACCOUNTING INDEX IN ACCOUNT_IDX LONG IN ACCOUNT_LONG
CALL SYSPROC.ADMIN_MOVE_TABLE( 'SCHEMA1', 'T1', 'T1_TGT', '', 'MOVE')
For online data movement:
- The procedure creates a shadow table to which the data are copied.
- Any changes to the source table during the copy phase are captured using triggers and placed in a staging table.
- After the copy phase is completed, the changes captured in the staging table are replayed to the shadow copy.
- Following that, the stored procedure briefly takes the source table offline and assigns the source table name and index names to the shadow copy and its indices.
- The shadow table is then brought online, replacing the source table. By default, the source table is dropped, but you can use the KEEP option to retain it under a different name.
Obviously, the online operation costs more server resources (disk space and processing power), so make sure you use it only if you value availability more than cost, space, move performance, and transaction overhead. Additionally, avoid performing online moves for tables without indices, particularly unique indices, as this might result in deadlocks and a complex or expensive replay.
Handling online move failure
If the online move fails, rerun it:
- Fix the problem that caused the table move to fail.
- Determine the stage that was in progress when the table move failed by querying the SYSTOOLS.ADMIN_MOVE_TABLE protocol table for the status.
- Call the stored procedure again, specifying the applicable option:
  - If the status of the procedure is INIT, use the INIT option.
  - If the status of the procedure is COPY, use the COPY option.
  - If the status of the procedure is REPLAY, use the REPLAY or COPY option.
  - If the status of the procedure is CLEANUP, use the CLEANUP option.
If the status of an online table move is not COMPLETED or CLEANUP, you can cancel the
move by specifying
CANCEL for the stored procedure.
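For example, assuming the move of SCHEMA1.T1 from Listing 35 is still in progress, a hedged sketch of such a cancellation call looks like this; all table-definition parameters are left blank and only the operation changes:

CALL SYSPROC.ADMIN_MOVE_TABLE( 'SCHEMA1', 'T1', '', '', '', '', '', '', '', '', 'CANCEL')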
ADMIN_MOVE_TABLE procedure and temporal tables
There are some limitations when using the procedure to move data in an active system-period temporal table into a new table with the same name. The following restrictions apply:
- Alter table operations that change the definition of the system-period temporal table or the associated history table are blocked during online move operations.
- ADMIN_MOVE_TABLE is unavailable for system-period temporal tables.
Additionally, the online-table-move operation is not supported for history tables.
Other DB2 data movement utilities
db2move is a data movement tool that can be used to move large numbers of tables between DB2 databases. Supported actions in the command are EXPORT, IMPORT, LOAD, and COPY. The behavior of the EXPORT, IMPORT, and LOAD actions is exactly the same as described previously. The only action you probably are not familiar with is COPY. It duplicates tables in a schema or schemas into a target database. The syntax of db2move is as simple as:
db2move database_name action options
A list of user tables is extracted from the system catalog tables, and each table is exported in PC/IXF format. The PC/IXF files can then be imported or loaded into another DB2 database.
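For instance, the following sketch (the table and schema names are placeholders) exports only two tables under the db2inst1 schema from the sample database:

db2move sample EXPORT -sn db2inst1 -tn EMPLOYEE,DEPARTMENT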
Here are some examples. This command imports all tables in the sample database in REPLACE mode with the specified user ID and password:
db2move sample IMPORT -io REPLACE -u userid -p password
And this command loads all tables under the schemas db2admin and db2user in REPLACE mode:
db2move sample LOAD -sn db2admin,db2user -lo REPLACE
Refer to the Command Reference to get a complete listing and descriptions of all the db2move options. The COPY action warrants a discussion. With COPY, you specify one or more schemas with the -sn option. Only tables with exactly the same schema names specified in the -sn option will be copied (via export). If multiple schema names are specified, use commas to separate them; no blanks are allowed. Refer to the example below.
db2move sample COPY -sn db2inst1,prodschema -co TARGET_DB acctdb USER peter USING petepasswd DDL_AND_LOAD
The above db2move command copies supported objects under the db2inst1 and prodschema schemas. The -co option that follows introduces the COPY-specific options. TARGET_DB specifies the target database to which the schemas are going to be copied, acctdb in this case. This option is mandatory when COPY is specified. In addition, the target database must be different from the source database. You may provide the user and password with the USER and USING options when connecting to the target database.
By default, supported objects from the source schema will be created, and tables will be populated in the target database. This is the behavior of the DDL_AND_LOAD mode. Two other modes are available: DDL_ONLY and LOAD_ONLY. As the names imply, DDL_ONLY only creates all the supported objects from the source schema and LOAD_ONLY loads all specified tables from the source to the target database. Note that tables must already exist in the target database when this option is used.
Sometimes you may want to rename the schema when copying the objects to the target database. The SCHEMA_MAP option can be used for this purpose. You simply provide one or more pairs of schema mappings, like this: SCHEMA_MAP ((source_schema1,target_schema1),(source_schema2,target_schema2)).
Extra attention is recommended when SCHEMA_MAP is used. Only the schema of the object itself is renamed; qualified objects inside the object body remain unchanged. For example, consider a view defined as CREATE VIEW FOO.v1 AS 'SELECT c1 FROM FOO.T1'. Renaming the schema from FOO to BAR will result in CREATE VIEW BAR.v1 AS 'SELECT c1 FROM FOO.T1'. Creating BAR.v1 in the target database might fail if FOO.T1 is not defined there.
A similar mapping idea also applies to tablespaces. For example, suppose you want the copied tables to be stored under different tablespace names than in the source database. The db2move command lets you specify tablespace name mappings. Consider the following option: TABLESPACE_MAP ((TS1,TS2),(TS2,TS3),SYS_ANY).
The above tablespace name mapping indicates that source TS1 is mapped to target TS2 and source TS2 is mapped to target TS3. The keyword SYS_ANY indicates that the remaining tablespaces will use tablespaces chosen by the database manager based on the tablespace selection algorithm. Let's put the pieces together in an example.
Listing 37. db2move utility example
db2move sample COPY -sn db2inst1,prodschema -co TARGET_DB acctdb USER peter USING petepasswd LOAD_ONLY SCHEMA_MAP ((db2inst1,db2inst2),(prodschema,devschema)) TABLESPACE_MAP SYS_ANY NONRECOVERABLE
This command copies supported objects in the db2inst1 and prodschema schemas from the SAMPLE database to the ACCTDB database. The authorization ID peter and the associated password are used to connect to ACCTDB. The target tables already exist in ACCTDB and will be repopulated. All objects under the db2inst1 and prodschema schemas are placed under db2inst2 and devschema, respectively. Instead of the tablespace names defined in the SAMPLE database, default tablespaces in ACCTDB will be used.
The NONRECOVERABLE option allows you to use the tablespaces that were loaded into immediately after the copy completes. Backups of the tablespaces are not required, but they are highly recommended at the earliest convenient time.
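For example, a sketch of such a follow-up tablespace backup, assuming the target tablespace names from the mapping example above and a hypothetical /backup directory:

db2 BACKUP DATABASE acctdb TABLESPACE (TS2, TS3) ONLINE TO /backup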
ADMIN_COPY_SCHEMA is used to copy a specific schema and all objects in
it. The new target schema objects will be created using the same object names as the
objects in the source schema, but with the target schema qualifier.
ADMIN_COPY_SCHEMA can be used to copy tables with or without the
data of the original tables.
This procedure invokes the LOAD command to load the data into the target schema and, hence, has three modes of operation:
DDL— Create empty copies of all supported objects from the source schema.
COPY— Create empty copies of all objects from the source schema, then load each target schema table with data. Load is done in NONRECOVERABLE mode. A backup must be taken after calling
ADMIN_COPY_SCHEMA; otherwise, the copied tables will be inaccessible following recovery.
COPYNO— Create empty copies of all objects from the source schema, then load each target schema table with data. Load is done in COPYNO mode.
In this example, the SOURCE_SCHEMA objects residing in tablespaces SOURCETS1 and SOURCETS2 will be copied to the target schema TARGET_SCHEMA in tablespaces TARGETTS1 and TARGETTS2, respectively. SYS_ANY indicates that the remaining tablespaces will use tablespaces chosen by the database manager based on the tablespace selection algorithm.
CALL SYSPROC.ADMIN_COPY_SCHEMA('SOURCE_SCHEMA', 'TARGET_SCHEMA', 'COPY', NULL, 'SOURCETS1 , SOURCETS2', 'TARGETTS1, TARGETTS2, SYS_ANY', 'ERRORSCHEMA', 'ERRORNAME')
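If you only need empty copies of the objects, the same call can be made in DDL mode with no tablespace mappings. A sketch, keeping the error schema and table names from the example above:

CALL SYSPROC.ADMIN_COPY_SCHEMA('SOURCE_SCHEMA', 'TARGET_SCHEMA', 'DDL', NULL, NULL, NULL, 'ERRORSCHEMA', 'ERRORNAME')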
db2look is a handy tool that can be invoked from the command prompt. It can:
- Extract data definition language (DDL) statements from database objects.
- Generate UPDATE statements to update database manager and database configuration parameters.
- Generate db2set commands to set the DB2 profile registries.
- Extract and generate database statistical reports.
- Generate UPDATE statements to replicate statistics on database objects.
Utilities like LOAD require the existence of a target table. You can use db2look to extract the table's DDL, run it against the target database, and then invoke the LOAD operation. db2look is very easy to use, as the following examples illustrate. This command generates DDL statements for all objects created by peter in the database department, and the output is stored in alltables.sql:
db2look -d department -u peter -e -o alltables.sql
The next command generates:
- DDL for all objects in the database department (specified by options -d, -a, and -e)
- UPDATE statements to replicate the statistics on all tables and indices in the database (specified by option -m)
- GRANT authorization statements (specified by option -x)
- UPDATE statements for the database manager and database configuration parameters, and db2set commands for the profile registries (specified by option -f)
db2look -d department -a -e -m -x -f -o db2look.sql
db2look is also capable of generating commands to register XML schemas. The following example generates the necessary REGISTER XMLSCHEMA and COMPLETE XMLSCHEMA commands (specified by option -xs) for objects with schema name db2inst1. The output db2look.sql will be created under /home/db2inst1, which is specified in the -xdir option.
db2look -d department -z db2inst1 -xs -xdir /home/db2inst1 -o db2look.sql
Benchmarking is a process of evaluating an application in various aspects, such as database response time and CPU and memory usage. Benchmark tests are based on a repeatable environment so the same test runs under the same conditions. Results collected from the tests can then be evaluated and compared.
db2batch is a benchmarking tool that takes a set of SQL and/or XQuery statements, dynamically prepares and describes the statements, and returns an answer set. Depending on the options used in the db2batch command, the output might include the elapsed time of execution of the statements and database manager snapshots on memory usage, such as bufferpool and cache information.
You can specify the statements you want to benchmark in a flat file or through standard input. A number of control options can be set in the input file. They are specified with this syntax: --#SET control_option value. Here is an example of an input file with control options. For a complete listing of control options, please refer to the Information Center.
Listing 38. db2batch example
-- db2batch.sql
-- ------------
--#SET PERF_DETAIL 3
--#SET ROWS_OUT 5
-- This query lists employees, the name of their department
-- and the number of activities to which they are assigned for
-- employees who are assigned to more than one activity less than
-- full-time.
--#COMMENT Query 1
select lastname, firstnme, deptname, count(*) as num_act
from employee, department, emp_act
where employee.workdept = department.deptno
and employee.empno = emp_act.empno
and emp_act.emptime < 1
group by lastname, firstnme, deptname
having count(*) > 2;
--#SET PERF_DETAIL 1
--#SET ROWS_OUT 5
--#COMMENT Query 2
select lastname, firstnme, deptname, count(*) as num_act
from employee, department, emp_act
where employee.workdept = department.deptno
and employee.empno = emp_act.empno
and emp_act.emptime < 1
group by lastname, firstnme, deptname
having count(*) <= 2;
- PERF_DETAIL 3 means that performance detail on elapsed time, and a snapshot for the database manager, the database, and the application will be returned.
- ROWS_OUT 5 means that only five rows are to be fetched from the result set regardless of the actual number of rows returned for the query.
- COMMENT Query 1 simply gives the statement a name: Query 1.
The following command invokes the benchmark tool against the SAMPLE database with the input file db2batch.sql:
db2batch -d sample -f db2batch.sql
This command will return the result set of both queries limited to five rows, elapsed time, and CPU time of the queries. Database manager, database, and application snapshots are also returned. Since the output is quite large, we are only showing the summary of the db2batch command here.
Listing 39. db2batch summary table
* Summary Table:

Type      Number      Repetitions  Total Time (s)  Min Time (s)  ...
--------- ----------- -----------  --------------  ------------
Statement 1           1            0.052655        0.052655      ...
Statement 2           1            0.004518        0.004518      ...

... Max Time (s)  Arithmetic Mean  Geometric Mean  Row(s) Fetched  Row(s) Output
... ------------  ---------------  --------------  --------------  -------------
... 0.052655      0.052655         0.052655        5               5
... 0.004518      0.004518         0.004518        8               5

* Total Entries:        2
* Total Time:           0.057173 seconds
* Minimum Time:         0.004518 seconds
* Maximum Time:         0.052655 seconds
* Arithmetic Mean Time: 0.028587 seconds
* Geometric Mean Time:  0.015424 seconds
The db2batch command supports many options. We are just listing a few here to give you an idea of the power of the tool:
- -m parameter_file specifies an input file with parameter values to bind to the SQL statement parameter markers.
- -r result_file specifies an output file to contain the result of the command.
- -i short|long|complete specifies what is being measured for the elapsed time intervals:
  - short measures the elapsed time to run each statement.
  - long measures the elapsed time to run each statement, including overhead between statements.
  - complete measures the elapsed time to run each statement, where the prepare, execute, and fetch times are reported separately.
- -iso specifies the isolation level used for the statements. By default, db2batch uses the Repeatable Read isolation level.
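Putting a few of these options together, a hedged example run might look like the following; it measures prepare, execute, and fetch times separately, uses Cursor Stability, and writes the results to a hypothetical results.out file:

db2batch -d sample -f db2batch.sql -i complete -iso CS -r results.out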
DB2 maintenance utilities
The RUNSTATS utility
DB2 utilizes a sophisticated cost-based optimizer to determine how data is being accessed. Its decisions are heavily influenced by statistical information about the size of the database tables and indices. Therefore, it is important to keep the database statistics up to date so an efficient data access plan can be chosen. The RUNSTATS utility is used to update statistics about the physical characteristics of a table and the associated indices. Characteristics include number of records (cardinality), number of pages, average record length, etc.
Let's use some examples to illustrate the usage of this utility. The following
command collects statistics on the table db2user.employee. Readers and writers are
allowed to access the table while the statistics are being calculated:
RUNSTATS ON TABLE db2user.employee ALLOW WRITE ACCESS.
The following command collects statistics on the table db2user.employee, as well as on the columns empid and empname with distribution statistics. While the command is running, the table is only available for read-only requests:
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION ON COLUMNS ( empid, empname ) ALLOW READ ACCESS
The following command collects statistics on table db2user.employee and detailed statistics on all its indices:
RUNSTATS ON TABLE db2user.employee AND DETAILED INDEXES ALL
You can be very specific when it comes to collecting statistics on database objects. Different combinations of the RUNSTATS options can be used to collect table statistics, index statistics, distribution statistics, sampling information, and so on. To simplify statistics collection, you can save the options you specify when you issue the RUNSTATS command in a statistics profile. If you want to collect the same statistics repeatedly on a table and do not want to retype the command options, use:
RUNSTATS ON TABLE db2user.employee USE PROFILE
This command collects statistics on db2user.employee using the options recorded in
the statistics profile for that table. So, how do you set a statistics profile? It
is as easy as using the
SET PROFILE ONLY option.
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION ON COLUMNS ( empid, empname ) SET PROFILE ONLY
Notice that this option will only set the profile; the RUNSTATS command itself will not run. If you need to modify a previously registered statistics profile, use the UPDATE PROFILE ONLY option. Similarly, this option will only update the profile without running the RUNSTATS command. If you want to update the profile as well as the statistics, use UPDATE PROFILE instead.
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 50 NUM_QUANTILES 50 UPDATE PROFILE
RUNSTATS is a resource-intensive utility. However, in order to maintain
efficient database operation, statistics must be collected regularly. You should
find regular windows of reduced database activity so database statistics can be
collected without affecting database performance. In some environments, there is no
such window. Throttling of
RUNSTATS can be considered to limit the
amount of resources consumed by the utility. When database activity is low, the
utility runs more aggressively. On the other hand, when database activity increases,
the resources allocated to executing
RUNSTATS are reduced. Following is
how to specify the level of throttling.
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 50 NUM_QUANTILES 50 UTIL_IMPACT_PRIORITY 30
The acceptable priority value ranges from 1 to 100, 100 representing the highest priority (meaning un-throttled) and 1 representing the lowest; 50 is the default priority level.
Note that automatic statistics collection is enabled by default when the database is
created. It can be turned off by setting the database configuration parameter
AUTO_RUNSTATS to OFF.
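For example, assuming a database named sample, the parameter can be changed with an UPDATE DB CFG command like this:

db2 UPDATE DB CFG FOR sample USING AUTO_RUNSTATS OFF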
The REORG and REORGCHK utilities
Data being added to and removed from the database might not be physically placed in sequential order. In such a case, DB2 must perform additional read operations to access data. This usually means that more disk I/O operations are required, and we all know such operations are costly. You should then consider physically reorganizing the table according to the index so related data are located close to one another, minimizing I/O operations.
REORG is a utility to reorganize data for a table or index. Although data is
physically rearranged, DB2 provides the option of performing this online or offline.
Offline REORG by default allows other users to read the table. You may restrict
table access by specifying the
ALLOW NO ACCESS option. Online REORG
(also called in-place REORG) supports read and write access to the table. Since data
pages are rearranged, concurrent applications might have to wait for REORG to
complete with the current pages. You can easily stop, pause, or resume the process
with the appropriate options.
The following examples are fairly self-explanatory:
REORG TABLE db2user.employee INDEX db2user.idxemp INPLACE ALLOW WRITE ACCESS
REORG TABLE db2user.employee INDEX db2user.idxemp INPLACE PAUSE
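As noted above, an in-place REORG can also be resumed or stopped entirely. Continuing the PAUSE example, a sketch of the corresponding commands:

REORG TABLE db2user.employee INDEX db2user.idxemp INPLACE RESUME
REORG TABLE db2user.employee INPLACE STOP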
You can also reorganize an index. If the CLEANUP clause is used, as shown in the second example below, a cleanup is performed instead of a full index reorganization:
REORG INDEX db2user.idxemp FOR TABLE db2user.employee ALLOW WRITE ACCESS
REORG INDEX db2user.idxemp FOR TABLE db2user.employee CLEANUP ONLY
REORGCHK is another data maintenance utility that has an option to retrieve current
database statistics or update the database statistics. It will also generate a
report on the statistics with REORG indicators. Using the statistics formulae,
REORGCHK marks the tables or indices with asterisks (
*) if there is a
need to REORG.
Let's consider some examples. The following command generates a report of the current statistics on all tables owned by the runtime authorization ID:
REORGCHK CURRENT STATISTICS ON TABLE USER
The following command updates the statistics and generates a report on all the tables created under the schema smith:
REORGCHK UPDATE STATISTICS ON SCHEMA smith
And here's some REORGCHK sample output:
Listing 40. REORGCHK sample output
Table statistics:

F1: ... < 5
F2: ... > 70
F3: ... > 80

SCHEMA.NAME               ... F1  F2  F3  REORG
-----------------------------------------------
Table: DB2INST1.XMLFILES  ... 0   89  98  ---
-----------------------------------------------

Index statistics:

F4: ... > 80
F5: ... > MIN(50, (100 - PCTFREE))
F6: ... < 100
F7: ... < 20
F8: ... < 20

SCHEMA.NAME               ... PCT_PAGES_SAVED  F4  F5   F6  F7  F8  REORG
--------------------------------------------------------------------------
Table: DB2INST1.XMLFILES
Index: DB2INST1.IDX1      ... 50               98  82   -   0   0   -----
Index: DB2INST1.IDX2      ... 0                71  -    -   0   0   *----
Index: DB2INST1.IDX3      ... 33               98  220  -   0   0   -----
--------------------------------------------------------------------------
Notice that DB2INST1.IDX2 is marked by an asterisk in the REORG column as its formula F4 calculation is 71, which does not meet the evaluation criteria: F4: ... > 80.
The above REORGCHK output indicates that only the DB2INST1.IDX2 index requires
reorganization. So, you will only need to run the
REORG command on that
index instead of the whole database.
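Assuming DB2INST1.XMLFILES is a regular (nonpartitioned) table, a hedged sketch of such a targeted fix is to reorganize only that table's indexes while keeping the table available:

REORG INDEXES ALL FOR TABLE DB2INST1.XMLFILES ALLOW WRITE ACCESS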
The REBIND utility and the FLUSH PACKAGE CACHE statement
Before a database application program or any SQL statement can be executed, it is precompiled by DB2, and a package is produced. A package is a database object that contains compiled SQL statements used in the application source file. DB2 uses the packages to access data referenced in the SQL statements. So, how does the DB2 optimizer choose the data access plan for these packages? It relies on database statistics at the time the packages are created.
For static SQL statements, packages are created and bound to the database at compile
time. If statistics are updated to reflect the physical database characteristics,
existing packages should also be updated. The REBIND utility allows you to recreate
a package so that the current database statistics can be used. The command is very simple:
REBIND PACKAGE package_name
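If many packages are affected by a statistics refresh, the db2rbind command can rebind all packages in a database in one step. A sketch, with db2rbind.log as a hypothetical log file name:

db2rbind sample -l db2rbind.log all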
In many cases, SQL statements contain host variables, parameter markers, and special registers. The values of these variables are not known until runtime. With the REOPT clause of the REBIND command, you can specify whether DB2 should optimize an access path using real values for host variables, parameter markers, and special registers. There are three REOPT options:
NONE— Real values of the host variables, parameter markers, and special registers used in the SQL statement will not be used to optimize an access path. The default estimates for these variables will be used instead.
ONCE— The access path for a given SQL statement will be optimized using the real values of the host variables, parameter markers, or special registers when the query is first executed.
ALWAYS— The access path for a given SQL statement will always be compiled and reoptimized using the values of the host variables, parameter markers, or special registers.
REBIND PACKAGE ACCTPKG REOPT ONCE
However, if you are going to change the application source, the existing associated package needs to be explicitly dropped and recreated. The REBIND utility is not used for this purpose. We bring this to your attention here because DBAs often misunderstand the usage of REBIND.
As for dynamic SQL statements, they are precompiled at runtime and stored in the
package cache. If statistics are updated, you may flush the cache so that dynamic
SQL statements are compiled again to pick up the updated statistics. The command
looks like this:
FLUSH PACKAGE CACHE DYNAMIC.
Database maintenance process
Now that you understand
RUNSTATS, REORG, REORGCHK, REBIND, and
FLUSH PACKAGE CACHE, let's review the data maintenance process that
should be performed regularly against your database. The process is illustrated in
the following diagram.
Figure 4. Database maintenance process
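As a rough sketch of that process (the table and package names are placeholders), the cycle corresponds to a command sequence like the following:

-- Collect fresh statistics
RUNSTATS ON TABLE db2user.employee AND DETAILED INDEXES ALL
-- Check whether reorganization is needed
REORGCHK CURRENT STATISTICS ON TABLE db2user.employee
-- If REORGCHK flags the table or an index, reorganize and re-collect statistics
REORG TABLE db2user.employee
RUNSTATS ON TABLE db2user.employee AND DETAILED INDEXES ALL
-- Let static and dynamic SQL pick up the new statistics
REBIND PACKAGE acctpkg
FLUSH PACKAGE CACHE DYNAMIC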
DB2 Design Advisor
The Design Advisor can help you determine which database objects can improve the performance of a given workload. A workload is basically a set of SQL statements that the Design Advisor evaluates based on the characteristics of the workload, the characteristics of the database, and the hardware resources. It uses the DB2 optimizer, the database statistics, and the Explain mechanism to generate recommendations for new indices, new materialized query tables (MQTs), conversion to multidimensional clustering (MDC) tables, redistribution of tables, and deletion of indices and MQTs that are unused by the specified workload. You might be familiar with MQTs and MDC tables; they are advanced database objects that are not covered in the DBA exam (611). If you are interested in advanced database objects, please refer to the DB2 Information Center.
You can start the DB2 Design Advisor from the command line with the db2advis command along with the necessary inputs. There are a few ways to specify a workload using the db2advis command.
You can specify a single SQL statement when issuing the db2advis command. The following example evaluates the given SQL statement and makes recommendations accordingly.
db2advis -d sample -s "select * from employee where workdept='A00' and salary > 40000" -o output.out
You can use a set of dynamic SQL statements captured in a DB2 snapshot. To do so, you need to reset the database monitor with the command:
db2 reset monitor for database database_name
Let your application run for a desired period of time and allow DB2 snapshots to capture the dynamic SQL statements. Issue the following db2advis command to evaluate the workload and make recommendations. Here the -g option tells the Design Advisor to get the SQL statements from the dynamic SQL snapshots. In addition, the -p option causes the captured SQL statements to be stored in the ADVISE_WORKLOAD system table.
db2advis -d sample -g -p -o output.out
Alternatively, you can create a workload file containing a set of SQL statements. Set the frequency of the statements in the workload. Following is a sample workload file.
--#SET FREQUENCY 100
SELECT COUNT(*) FROM EMPLOYEE;
SELECT * FROM EMPLOYEE WHERE LASTNAME='HAAS';
--#SET FREQUENCY 1
SELECT * FROM EMPLOYEE WHERE WORKDEPT='A00' AND SALARY > 40000;
Then, simply run the db2advis command with the -i option:
db2advis -d sample -i input.sql -o output.out
In this tutorial, a number of DB2 utilities were introduced to assist you in maintaining DB2 data. You have learned about:
- File formats that DB2 data movement utilities can work with
- The EXPORT utility, which is used to extract data from a table or a view (XML data is also supported)
- The IMPORT utility, which can be used to perform bulk insert into a table or a view (XML data is also supported)
- The LOAD utility, which can also populate tables with input data by writing formatted pages into the database directly
- The four phases of a load operation: load, build, delete, and index copy
- That a loaded table is placed in the SET INTEGRITY PENDING state if the table has constraints other than unique constraints defined
- The use of the SET INTEGRITY command to remove the SET INTEGRITY PENDING state
- That, for recoverable databases, the tablespace for which the loaded table is defined will be placed in BACKUP PENDING state
- The INGEST utility, which is a high-speed client-side DB2 utility that ingests data from files and pipes into DB2 LUW tables, using SQL-like commands
- Different options and file type modifiers to customize the EXPORT, IMPORT, LOAD, and INGEST operations
- A comparison of the IMPORT, LOAD and INGEST utilities
- The purpose of each data maintenance tool: RUNSTATS, REORG, REORGCHK, and REBIND, and the FLUSH PACKAGE CACHE command
- The use of stored procedures in data movement: ADMIN_MOVE_TABLE and ADMIN_COPY_SCHEMA
- The use of utilities such as db2move, db2look, db2batch and the Design Advisor.
Good luck with the exam!
- Read the Preparation Guide for DB2 10.1 DBA for Linux, UNIX, and Windows Exam 611 to learn in-depth information about each of the concepts presented in this tutorial. This guide is a compilation of topics from the DB2 10.1 documentation.
- Use the DB2 documentation in IBM Knowledge Center to find more details about each of the concepts presented in this tutorial.
- See the other tutorials in this DB2 10.1 certification preparation series to continue your education.
- Use an RSS feed to request notification for the upcoming tutorials in this series. (Find out more about RSS feeds of developerWorks content.)
- Learn more about Information Management at the developerWorks Information Management zone. Find technical documentation, how-to articles, education, downloads, product information, and more.
Get products and technologies
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.