© 2002 International Business Machines Corporation. All rights reserved.
This paper is intended to describe and differentiate the three methods available for generating unique values in a table. They are the:
GENERATE_UNIQUEfunction available in DB2 UDB Version 6.1 and laterIDENTITYcolumn in theCREATE TABLEstatement available in DB2 UDB Version 7.1 and laterSEQUENCEobject available in DB2 UDB Version 7.2
In general, these methods are designed to avoid the concurrency and performance problems that can occur when an application generates its own unique counter outside the database. When you do not use these methods to automatically generate unique primary keys, a common design is to store a counter in a table with a single row. Each transaction then locks this table, increments the number, and then commits the transaction to unlock the counter. Unfortunately, this design only allows a single transaction to increment the counter one at a time. In contrast, if you use one of the above methods to automatically generate unique values, the application can achieve much higher levels of concurrency and reduce coding complexities.
This overview is limited to Windows® and UNIX® environments. If
you plan to use identity column attributes in a DB2 UDB® Enterprise
- Extended Edition (EEE) database, be aware that this functionality is not
supported when more than one partition is implemented. In addition, the
ALTER TABLE statement cannot be used to add an
identity column to an existing table. DB2 UDB Version 7.1 documentation
incorrectly states that you can issue an
ALTER TABLE statement to add an identity
column. All examples are executed in the DB2 Command Window. When using a
termination character ';', as shown in the example below, make sure you
specify the -t command option as shown here:
C:\>db2 -t "SELECT * FROM TABLE;" |
This function is the first method designed by DB2 to handle the creation of
unique values in a table. The GENERATE_UNIQUE
function returns a bit data character string 13 bytes long
(CHAR(13) FOR BIT DATA) that is unique compared
to any other execution of the same function. The 13-bit data type allows
unique timestamp values to be created. This not only provides a unique row
ID but also eliminates the need for an additional column to record the row
creation date and time. Following are examples that show the usage of the
GENERATE_UNIQUE function.
In this first example, the GENERATE_UNIQUE
function is used to populate a table. Notice how the table is created with
a ROW_ID of CHAR(13) FOR BIT DATA data type.
Figure 1A
CREATE TABLE WIDGET_INVENT ( ROW_ID CHAR(13) FOR BIT DATA, WIDGET_NO CHAR(6), INV_COUNT INT WITH DEFAULT 0 ); |
After creating the table in Figure 1A, an insert can be performed as shown in Figure 1B:
Figure 1B
INSERT INTO WIDGET_INVENT
VALUES (GENERATE_UNIQUE(), '000005', 600),
(GENERATE_UNIQUE(), '000006', 200);
|
In Figure 1C a trigger is created on the table
WIDGET_INVENT to generate unique values with
the insertion of a new row in the table.
Figure 1C
CREATE TRIGGER UPDATE_WIDGET NO CASCADE BEFORE INSERT ON UPDATE_WIDGET REFERENCING NEW AS NEW_WIDGET FOR EACH ROW MODE DB2SQL SET NEW_WIDGET.UNIQUE_ID = GENERATE_UNIQUE(); |
With this trigger defined, the previous INSERT
statement could be issued without the first column as follows:
INSERT INTO WIDGET_INVENT (WIDGET_NO, INV_COUNT)
VALUES ('000005', 600),
('000006', 200);
|
The timestamp (in UTC) for when a row is added to
WIDGET_INVENT can be returned using the
following select query:
SELECT TIMESTAMP(ROW_ID), WIDGET_NO, INV_COUNT FROM WIDGET_INVENT; |
Although this function generates a unique timestamp value, when multiple
rows are inserted using this function, the unique values from consecutive
rows will have time gaps. This is important to recognize if you plan to
perform a query based on a particular number of rows. For instance, it
will not be practical to query the first 500 rows of a table using the
unique values generated by the GENERATE_UNIQUE
function.
Another point to keep in mind is that since the timestamp value created
from the GENERATE_UNIQUE function is obtained
from the system clock, adjustments to the system clock backward could
result in duplicate values.
Like the GENERATE_UNIQUE function, the identity
column introduced in DB2 UDB Version 7.1 provides developers with an easy
way to automatically generate a unique value for every row in a table. And
like the GENERATE_UNIQUE function, the identity
column is tied to the table. To create an identity column, include the
IDENTITY clause in your
CREATE TABLE statement. Unlike the
GENERATE_UNIQUE function, the identity column
is declared with a numeric data type that has a precision of zero instead
of the CHAR(13) FOR BIT DATA data type. This
allows the identity column to be incremented by sequential integers. The
following figure shows an example for creating a table with an identity
column:
Figure 2A
CREATE TABLE WIDGET_INVENT
( ROW_ID INT NOT NULL GENERATED ALWAYS AS IDENTITY
(START WITH 1, INCREMENT BY 1, NO CACHE),
WIDGET_NO CHAR(6),
INV_COUNT INT WITH DEFAULT 0
);
|
In the above figure, table WIDGET_INVENT uses
INTEGER data type for the identity column
ROW_ID. Other data types that are supported
with the identity column include
SMALLINT, BIGINT, or
DECIMAL with a scale of zero, or a distinct
type based on one of the aforementioned data types.
After creating the table with an identity column, you can test the column by inserting values into the other two columns. Make sure that when you insert values into the table you specify each column name, excluding the identity column as shown below:
Figure 2B
INSERT INTO WIDGET_INVENT (WIDGET_NO, INV_COUNT)
VALUES ('000005', 600),
('000006', 200); |
The result of a select statement on the table will produce the following output:
C:\>db2 -t "SELECT * FROM WIDGET_INVENT;" ROW_ID WIDGET_NO INV_COUNT 1 000005 600 2 000006 200 2 record(s) selected. |
- The
GENERATED ALWAYS AS IDENTITYclause indicates that DB2 will always generate a unique value for the identity column for each row inserted into the table. DB2 will issue an error if a value is specified in an insert operation for the identity column defined using this clause. - Another clause that may be used in creating identity columns is the
GENERATED BY DEFAULT AS IDENTITYclause. This will cause DB2 to generate unique values for the identity column during insert operations if no value is specified for the identity column. However, if a value is specified for the identity column, DB2 will use it in the insert operation. - Identity columns are implicitly
NOT NULLand a table may have only one identity column. - The
START WITHnumeric-constant clause specifies the starting value for the identity column which can be positive or negative. If noSTART WITHclause is used, the default is 1. - The
INCREMENT BYnumeric-constant clause specifies the interval between consecutive values when rows are inserted in a consecutive manner. This interval can be assigned a positive or negative value. If the value is negative, then the sequence of values for this identity column descends. If this value is positive, then the sequence of values for this identity column ascends. If noINCREMENT BYclause is used, the default increment value is 1. - The
CACHEinteger-constant option specifies how many values of the identity column DB2 pre-allocates and keeps in memory. Pre-allocating and storing values in the cache reduces logging when values are generated for the identity column. If a new value is needed for the identity column and there are none available in the cache, then the allocation of the value involves writing to the log. However, when a new value is needed for the identity column and there is an unused value in the cache, the allocation of that identity value can be made quicker without performing the logging. - In the event of a database deactivation, all cached sequence values
that have not been used are lost. The value specified for the
CACHEoption is the maximum number of values for the identity column that could be lost in case of database deactivation. The minimum value is 2 and the maximum value is 32767. The default is CACHE 20. - The
NO CACHEoption specifies that values for the identity column are not to be pre-allocated. When this option is specified, the values of the identity column are not stored in the cache. In this case, every request for a new identity value results in logging.
Using IMPORT and LOAD commands with identity columns
Another improvement with the introduction of identity columns is the ability to import or load data into tables and populate the identity column with unique values. When performing these operations, there are three modifiers that should be taken into consideration. They are:
The IDENTITYMISSING modifier is used when data
is LOADed or
IMPORTed from a delimited file that contains no
identity column data into a table containing an identity column. For
example, consider a file a.del that contains two rows:
000005, 600 000006, 200 |
The above data is to be imported into a table called
WIDGET_INVENT that has the following
definition:
CREATE TABLE WIDGET_INVENT
( ROW_ID INT NOT NULL GENERATED ALWAYS AS IDENTITY
(START WITH 100, INCREMENT BY 10, NO CACHE),
WIDGET_NO CHAR(6),
INV_COUNT INT WITH DEFAULT 0
);
|
Since file a.del does not have values for
ROW_ID, we will need to use the
IDENTITYMISSING modifier in the
IMPORT or LOAD
command in order to have generated values inserted into the
ROW_ID column. Following is an example of using
this modifier with the IMPORT command:
IMPORT FROM a.del OF DEL MODIFIED BY IDENTITYMISSING INSERT INTO WIDGET_INVENT; |
A select query on table WIDGET_INVENT after
issuing the above command will return the following results:
C:\>db2 -t "SELECT * FROM WIDGET_INVENT;" ROW_ID WIDGET_NO INV_COUNT 100 000005 600 110 000006 200 2 record(s) selected. |
It can be seen from the above SELECT statement
that the identity column ROW_ID contains values
generated by DB2 which were previously not in the a.del file.
The IDENTITYIGNORE modifier is used when data is
LOADed or IMPORTed
from a delimited file that contains identity column data. This modifier
will ignore the identity column data and replace those values with DB2
generated values. Consider again the file a.del containing the
following two rows:
1, 000005, 600 2, 000006, 200 |
These rows are to be loaded into the table
WIDGET_INVENT that is created with the
following definition:
CREATE TABLE WIDGET_INVENT
( ROW_ID INT NOT NULL GENERATED ALWAYS AS IDENTITY
(START WITH 100, INCREMENT BY 10, NO CACHE),
WIDGET_NO CHAR(6),
INV_COUNT INT WITH DEFAULT 0
);
|
The file a.del has values for ROW_ID
which are highlighted in bold above. We want to load the data from
a.del into WIDGET_INVENT and have
DB2 generate its own unique values for ROW_ID.
Here's the command you would use:
LOAD FROM a.del OF DEL MODIFIED BY IDENTITYIGNORE INSERT INTO WIDGET_INVENT; |
The output from a select on table WIDGET_INVENT
after issuing the above command will contain the following results:
C:\>db2 "SELECT * FROM WIDGET_INVENT;" ROW_ID WIDGET_NO INV_COUNT 100 000005 600 110 000006 200 2 record(s) selected. |
Notice how the results of the above select query verify that the identity
column ROW_ID shows values generated by DB2 and
not the values from the file a.del.
The IDENTITYOVERRIDE modifier is valid only for
the LOAD command and is used when data from a
delimited file contains user supplied values for the identity column and
those values are to be inserted into a table with an identity column. This
identifier will override any attempts by DB2 to generate unique values for
the identity column and will insert the user-supplied values into the
identity column. Here is an example. File a.del contains the
following rows:
1, 000005, 600 2, 000006, 200 |
The above file will be loaded into a table called
WIDGET_INVENT with the following definition:
CREATE TABLE WIDGET_INVENT
( ROW_ID INT NOT NULL GENERATED ALWAYS AS IDENTITY
(START WITH 100, INCREMENT BY 10, NO CACHE),
WIDGET_NO CHAR(6),
INV_COUNT INT WITH DEFAULT 0
);
|
The file a.del has values for ROW_ID,
which are highlighted in bold above. We want to load the data from
a.del into the table WIDGET_INVENT
and have the values for ROW_ID inserted into
the identity column. Here's the command you would use:
LOAD FROM a.del OF DEL MODIFIED BY IDENTITYOVERRIDE INSERT INTO WIDGET_INVENT; |
A select query on table WIDGET_INVENT after
issuing the above command will return the following results:
C:\>db2 -t "SELECT * FROM WIDGET_INVENT;" ROW_ID WIDGET_NO INV_COUNT 1 000005 600 2 000006 200 2 record(s) selected. |
The resulting select statement verifies that the identity column
ROW_ID reflects values from file a.del
and overrides the ability for DB2 to generate unique values. It is
important to note that using the
IDENTITYOVERRIDE modifier will not guarantee
uniqueness within the identity column. It is also only valid for identity
columns created with the GENERATED ALWAYS
option.
Note: An IXF file can also be used to import data and this will be explained in the following section, "Using EXPORT command with identity columns."
Using EXPORT command with identity columns
The export utility can be used to export data from a table containing an
identity column. If the SELECT statement
specified for the export operation is of the form "select * from
tablename", and the METHOD option is not used,
exporting identity column properties to IXF files is supported. The
REPLACE_CREATE and the
CREATE options of the
IMPORT command can then be used with the IXF
file to recreate the table, including its identity column properties. If
such an IXF file is created from a table containing an identity column of
type GENERATED ALWAYS, the only way that the
data file can be successfully imported is to specify the
IDENTITYIGNORE modifier. Otherwise, all rows
will be rejected.
Here is an example of the EXPORT and
IMPORT commands using an IXF file: Consider a
table WIDGET_INVENT with the following
definition:
CREATE TABLE WIDGET_INVENT ( ROW_ID INT NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 100, INCREMENT BY 10, NO CACHE), WIDGET_NO CHAR(6), INV_COUNT INT WITH DEFAULT 0 ); |
and containing the following data:
100, 000005, 600 110, 000006, 200 |
Execute the following command to export the above table data into an IXF file:
EXPORT TO a.ixf OF IXF SELECT * FROM WIDGET_INVENT; |
To recreate this table in a different database, execute the following command:
IMPORT FROM a.ixf OF IXF MODIFIED BY IDENTITYIGNORE CREATE INTO WIDGET_INVENT; |
The output from a select on table WIDGET_INVENT
after issuing the above command will contain the following results:
C:\>db2 -t "SELECT * FROM WIDGET_INVENT;" ROW_ID WIDGET_NO INV_COUNT 100 000005 600 110 000006 200 2 record(s) selected. |
The sequence object is a new feature in DB2 that has been introduced in DB2 UDB Version 7.2 (i.e., DB2 UDB Version 7.1 + fixpak 3). A sequence is a database object that allows the automatic generation of numeric values.
While the identity column feature also allows automatic generation of numeric values, it is essentially an attribute of a column and thus exists for the period of existence of the table containing that column. On the other hand, a sequence exists outside of a table and is not tied to a column of a table. This allows more flexibility in using the sequence values in SQL operations.
We will explore briefly the processes of creating, using, deleting and altering a sequence. Before we conclude, we will discuss some of the differences between sequences and identity columns and the performance considerations with using sequences.
A DB2 sequence can be created to generate values in:
- An ascending sequence without an upper bound
- A descending sequence without a lower bound
- An ascending sequence up to a user-defined limit and stop or cycle to the start of the sequence
- A descending sequence up to a user-defined limit and stop or cycle to the start of the sequence
The sequence is created by issuing the
CREATE SEQUENCE command. The user creating a
sequence should have either CREATEIN privilege
for the implicitly or explicitly specified schema or have either
SYSADM or DBADM
authority. Figures 3A and 3B outline how sequences are created in DB2.
Figure 3A: Creating an ascending sequence
CREATE SEQUENCE myseq AS INTEGER START WITH 1 INCREMENT BY 1 NO MINVALUE NO MAXVALUE NO CYCLE CACHE 10 ORDER |
where myseq is the name of the sequence.
AS INTEGER:
specifies the data type of the sequence, which is
INTEGER in this case. Other valid data types
are SMALLINT, BIGINT or
DECIMAL. The data type of the sequence must be
a numeric data type with a scale of zero or a user-defined distinct that
has an exact numeric type with scale 0 as its source type. The default is
INTEGER.
START WITH
numeric-constant:
specifies the first value for the sequence (1 in this case), which can be
a positive or a negative value with no non-zero digits to the right of the
decimal point. If this clause is omitted, the default first value is
MINVALUE for ascending sequences and
MAXVALUE for descending values.
INCREMENT BY
numeric-constant:
specifies the interval between consecutive values of the sequence. The
numeric value specified here can be positive or negative with no non-zero
digits to the right of the decimal point and the value should not exceed
the numeric limits for the data type of the sequence. A descending
sequence is created by specifying a negative increment value. The default
increment value is 1. To restrict the sequence to a single value, specify
an increment value of 0 or an increment value that is greater than the
range defined by MINVALUE and
MAXVALUE.
NO MINVALUE:
The MINVALUE clause specifies the:
- minimum value at which a descending sequence either cycles or stops generating values
- minimum value to which an ascending sequence cycles to after reaching its maximum value
If NO MINVALUE is specified, the default value
- for ascending sequences is the
START WITHvalue or 1 ifSTART WITHis not specified - for descending sequences is the minimum value of the data type of the sequence
To specify a MINVALUE, define the sequence using
the 'MINVALUE
numeric-constant' clause instead of the
NO MINVALUE clause. The default is
NO MINVALUE.
NO MAXVALUE:
The MAXVALUE clause specifies the
- maximum value to which a descending sequence cycles to after reaching the minimal value
- maximum value at which an ascending sequence either cycles or stops generating values
If NO MAXVALUE is specified, the default
MAXVALUE
- for ascending sequences is the maximum value of the data type of the sequence
- for descending sequences is the
START WITHvalue or -1 ifSTART WITHis not specified
To specify a MAXVALUE, define the sequence using
the 'MAXVALUE
numeric-constant' clause instead of the
NO MAXVALUE clause. The default is
NO MAXVALUE.
NO CYCLE:
Specifies whether the sequence should continue generating values after
reaching either its maximum or minimum value.
NO CYCLE is the default behavior. To create a
sequence which cycles through its values, specify
CYCLE.
CACHE
integer-constant:
Specifies the number of sequence values that will be pre-allocated and kept
in cache memory for faster access to improve performance. Note that in the
event of a system failure the cache values are all destroyed. The minimum
number of values in the CACHE is 2. The maximum
number of cache values is 32767. The default number of cache values is 20.
To disable caching of sequence values, specify
NO CACHE instead of 'CACHE
integer-constant'. This will force the sequence object to write to
the log every time a new value is generated that will impact application
performance.
ORDER:
Specifies that the sequence values are to be generated in the order of
request. The default is NO ORDER, which
specifies that the sequence values need not be generated in the order of
request.
Figure 3B: Creating a descending sequence
CREATE SEQUENCE myseq1 START WITH 1 INCREMENT BY -1 MINVALUE -10 NO MAXVALUE CYCLE CACHE 4 |
This command creates a descending sequence that generates consecutive integer values starting with 1 and incremented by -1, until the minimum value of -10 is reached. At this point, the sequence cycles back to 1 since CYCLE has been specified. This sequence caches 4 values.
Using sequences in SQL statements
After a sequence object has been created as described above, its values can be accessed only through sequence expressions. The next value in the sequence is accessed using the 'NEXTVAL FOR sequence_name' clause and the most recently generated value is accessed using the 'PREVVAL FOR sequence_name' clause.
For example, given the sequence myseq from Figure 3A, the statement
C:\>db2 -t "SELECT NEXTVAL FOR myseq FROM staff WHERE id=10;" |
will produce the output:
1
-------
1
|
Re-executing the above statement will produce the output:
1
-------
2
|
The statement
C:\>db2 -t "SELECT PREVVAL FOR myseq FROM staff WHERE id=10;" |
will produce the output:
1
-------
2
|
A DB2 sequence is dropped using the
DROP SEQUENCE command. The full syntax of the
command is
'DROP SEQUENCE sequence_name RESTRICT' |
where sequence_name is the name of the sequence explicitly or
implicitly qualified with a schema name. This is the same name that is
used in the CREATE SEQUENCE command. The
RESTRICT keyword is not optional. The purpose
of this keyword is to prevent the sequence from being dropped if it is
referenced by a table column.
Sequences that are system-generated on identity columns cannot be dropped using this command. Dropping a sequence will also drop all privileges on that sequence. Figure 3C shows how to drop a sequence:
Figure 3C: Dropping a sequence
DROP SEQUENCE myseq RESTRICT |
The attributes of an existing sequence can be modified with using the
ALTER SEQUENCE command. The attributes of the
sequence that can be modified include:
- Changing the increment between future values
(
INCREMENT BY) - Establishing a new minimum value
(
MINVALUE/NO MINVALUE) - Establishing a new maximum values
(
MAXVALUE/NO MAXVALUE) - Changing the number of cached sequence numbers
(
CACHE/NO CACHE) - Changing whether the sequence will cycle or not
(
CYCLE/NO CYCLE) - Changing whether sequence numbers must be generated in order of
request (
ORDER/NO ORDER) - Restarting the sequence with or without a specified value (
RESTART/RESTART WITHnumeric-constant)
The user ID issuing this command must have at least one of the following privileges:
- Definer of the sequence
ALTERINprivilege for the implicitly or explicitly specified schema of the sequenceSYSADMorDBADMauthority
Figure 3C shows an example of issuing the
ALTER SEQUENCE command:
Figure 3C: Altering a sequence
ALTER SEQUENCE myseq RESTART WITH 2 INCREMENT BY 3 MAXVALUE 33 CYCLE CACHE 12 ORDER |
The sequence myseq, which was initially defined to start with an initial value of 1 with an increment of 3, no maximum value, without recycle, with a cache of 10 and without order or request, is altered by the above command to restart with an initial value of 1 and increment 3 and will keep a cache of 12 values and will recycle upon reaching or exceeding the new maximum value of 33.
Note that restarting a sequence or changing the sequence to
CYCLE may result in the generation of duplicate
values. Also, only new sequence values are affected by this command. The
data type of the sequence cannot be changed with this command; instead the
sequence will need to be dropped and a new sequence will need to be
created with the desired data type. All previously cached values of the
sequence are lost by changes made using the
ALTER SEQUENCE command.
Comparing sequence objects with identity columns
Sequences and identity columns can both be used to fulfill the need for generating unique integer values. However, there are some differences between them that must be kept in mind.
- An identity column is tied to a table; i.e., it is dependent on a base table in the database. A sequence is a database object that is not tied to any column of a table and hence resides independently of a base table.
- An identity column generates unique values for a particular column of a table, whereas a sequence object generates unique values that can be used in any SQL statement.
- The values generated by an identity column are guaranteed to be unique, whereas there could be duplicate values generated by a sequence object when the sequence is recycled or is restarted with the same initial value and increment.
- The values from an identity column may be DB2 generated values
(
GENERATED ALWAYS) or user-specified values (GENERATED BY DEFAULT). The values from a sequence object are always generated by DB2 (i.e., generated by definition of the sequence object). - With sequence objects, it is possible to access using SQL expressions
the next value in a sequence or the most recently generated value in a
sequence using the and '
PREVVAL FORsequence_name' clauses respectively. This is not possible with identity columns.
Performance considerations with sequence objects
Sequence objects help to avoid locking or concurrence issues that can arise
with other techniques for the generation of unique sequential integer
values as discussed in the beginning of this paper. The values generated
by a sequence object can be accessed only via sequence expressions namely, 'PREVVAL FOR
sequence_name' retrieve the next consecutive value in the sequence
and 'PREVVAL FOR
sequence_name' to retrieve the most recently generated value in the
sequence.
Sequence objects can cache values in memory to improve DB2 response time when an application accesses a sequence value. A sequence object that does not have caching enabled is forced to write to the log every time a new value is generated. This can cause a negative impact to DB2 response time when accessing sequence values.
On the other hand, if caching of sequence values is enabled, there is a
potential for gaps to appear in the order of the sequence values. This
happens when the values in the cache are destroyed through events such as
altering the sequence object or a system failure. If no gaps should appear
in the order of the sequence values, then the
ORDER and NO CACHE
clauses should be specified for the sequence object. Though this will
impact application performance due to a log write that takes place for
every new sequence value that is generated, it will guarantee that no gaps
will arise in generated values.
This paper attempted to gather in one place the information about the different approaches available in DB2 for the generation of unique values. In order to determine which approach to choose, individual database and application requirements will need to be assessed to determine which of these basic approaches will best serve those needs.
For example, if uniqueness alone is desired, the
GENERATE_UNIQUE function will probably suffice.
If uniqueness is desired in a sequential manner, then the identity feature
or the sequence object will probably need to be considered. Likewise, a
multithreaded application that performs a high degree of write operations
on several tables with unique integer values will benefit from using a
sequence due to the high concurrency possible in accessing sequences.
The reader is advised to refer to the
SQL Reference manual
for more information on the IDENTITY clause and
the GENERATE_UNIQUE function, the release notes
for DB2 UDB Version 7.1 fixpack 3 for more information on sequences and
the Administration Guide for more information about identity columns.
Nanda Pilaka can be reached at nandap@us.ibm.com.
Comments (Undergoing maintenance)





