Database partitioning occurs when a database consists of two or more database partitions, where each database partition stores a subset of table data for each table that resides within it. Partitions can be on separate machines or on the same machine (known as logical partitions). Within this type of database, data is hashed to determine which database partition will store it. A database partition consists of its own data, indexes, configuration files, and transaction logs. In older versions of DB2, a database partition was referred to as a node or a database node; however, in this article we use the term partition. Logical partitions share the resources of the machine where they reside.
Tables within a partitioned database can be located in one or more database partitions. The idea behind partitioning is to spread a table's data across multiple partitions, storing some of its rows in one partition and other rows in other partitions. The database manager (DBM) handles data retrieval and update by decomposing SQL requests automatically into sub-requests. The sub-requests are then executed in parallel among the applicable database partitions. This splitting of tables across database partitions is completely transparent to users and means that data is physically stored across more than one database partition, yet accessed as though it were located in the same place.
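The decompose-and-merge flow can be sketched in a few lines of Python. This is an illustration of the scatter-gather idea only, not DB2's actual implementation; the data layout and function names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: each partition holds a disjoint subset of a table's rows.
partitions = [
    [("alice", 120), ("carol", 80)],   # partition 0
    [("bob", 200)],                    # partition 1
    [("dave", 50), ("erin", 310)],     # partition 2
]

def sub_request(rows, threshold):
    """Evaluate the same predicate against one partition's local rows."""
    return [name for name, amount in rows if amount > threshold]

def query(threshold):
    """Coordinator: scatter the sub-request to every partition, merge the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda rows: sub_request(rows, threshold), partitions))
    return sorted(name for partial in partials for name in partial)

print(query(100))  # ['alice', 'bob', 'erin']
```

The caller sees a single answer, even though three sub-requests ran in parallel.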
Figure 1 shows a single partition running on a single machine, which is analogous to not using DPF:
Figure 1. A single partition on a single machine
The next two figures show DPF in action. Figure 2 illustrates DPF being used across three different machines, with a single partition on each.
Figure 2. A single partition on each machine using high-speed communications
Note the use of the high-speed communications interconnect, which could be a TCP/IP network, between the individual machines. The next figure, Figure 3, shows three partitions running on the same machine; inter-partition communication here is performed within memory, as opposed to over a network (DB2_FORCE_FCM_BP is enabled by default on all Linux platforms).
Figure 3. Three partitions on a single machine using memory for communication
It is important to note that in the above figures, the symmetric multi-processors (SMP) could be replaced by a uni-processor; however, SMP servers are more typical in a DPF environment.
Partitioning keys consist of a column or set of columns from the table and are used to allow users to choose how to partition their data. Using database partition groups and tablespaces, a user can determine across which and how many database partitions their table data is to be spread.
User interaction occurs through one database partition, known as the coordinator partition for that user. The coordinator partition runs on the same database partition as the application, or, in the case of remote applications, the database partition to which that application is connected. Any database partition can become a coordinator partition.
In order to partition a database, you require the Database Partitioning Feature (DPF) of DB2. This optional DPF license is only available for DB2 Enterprise Server Edition (ESE), and the cost is based on the number of CPUs used in the partitioned database.
DB2 UDB ESE with DPF is ideal for managing OLAP, data warehousing, and data mining workloads. It can also be a good choice for large-scale OLTP workloads. The most common partitioned environment consists of a single database partition on each physical system that makes up the database partition cluster. The processors on each system are used by the database manager at each database partition to manage its own part of the total data in the database.
Since the data is divided across database partitions, you can use the power of multiple processors on multiple computers to satisfy requests for information. Data retrieval and update requests are decomposed automatically into sub-requests, which are then executed in parallel among the applicable database partitions. Each additional server means more CPUs, more memory, and more disks for your partitioned database.
This performance boost stems from the fact that large amounts of data are broken into smaller, more manageable chunks that each partition/system can work on simultaneously. You may also want to consider using DPF if you are doing extract, transform, and load (ETL) operations and your batch window is small. For example, DB2 LOAD could run simultaneously on each partition, splitting the load operation up in parallel across the servers. Index creation times can also see significant performance boosts. Furthermore, backup and recovery times can be significantly reduced when using DPF, due to the smaller amount of data that each participating machine would be dealing with.
Here we cover the necessary steps to get DB2 DPF up and running on your SUSE Linux server (or multi-server) environment.
DB2 requires the following packages to be installed on your Linux system:
- Public domain korn shell (pdksh)
- Remote shell (rsh) or open source secure shell (OpenSSH)
- IBM Java™ 2 Java Run Time
- Network File System (NFS)
Use the rpm command to verify packages installation:
rpm -qa | grep pdksh
rpm -qa | grep rsh
rpm -qa | grep openssh
rpm -qa | grep IBMjava
rpm -qa | grep nfs
Before installing Linux, please confirm which versions of Linux are currently recommended and validated for DB2. This link "http://www.ibm.com/software/data/db2/linux/validate/" shows the supported distributions of Linux. Ensure a recommended and supported version of Linux is installed.
During the installation of DB2, you may want to pay attention to some of the kernel parameters:
Table 1. Kernel parameters that impact DB2
|Parameter||Description|
|kernel.shmmax||Defines the system-wide maximum allowable shared memory segment size, in bytes|
|kernel.msgmni||Defines the maximum number of message queue identifiers allowed on the system at any given time|
|fs.file-max||Defines the maximum number of file handles that the Linux kernel will allocate|
|kernel.sem||Defines the semaphore limits: semmsl, semmns, semopm, and semmni|
As of kernel 2.4.18, the file-max parameter's default value is 8192; therefore, you may not need to modify it. You can check the current file-max value by reading /proc/sys/fs/file-max. You can display the current Linux kernel semaphore and shared memory settings using the ipcs command.
As of DB2 UDB Version 8.2, DB2 checks the values of the semmni, msgmni, and shmmax parameters at start-up time, and updates them if the current values are not optimal. DB2 will automatically set up the kernel parameters as follows:
- semmni is changed to 1024
- msgmni is changed to 1024
- shmmax is changed to 268435456 (32-bit) or 1073741824 (64-bit)
It should be noted that the maximum number of semaphores system-wide (semmns) is equal to:
max number of arrays (semmni) * max semaphores per array (semmsl)
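As a quick worked example of that formula (semmni as set by DB2 at start-up, and 250 as a common Linux default for semmsl; the exact semmsl on your system may differ):

```python
semmni = 1024              # max number of semaphore arrays (DB2's start-up value)
semmsl = 250               # max semaphores per array (common Linux default)
semmns = semmni * semmsl   # maximum semaphores system-wide
print(semmns)  # 256000
```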
You can also verify the semmni, msgmni, and shmmax parameters after the DB2 installation by using the db2pd command.
For performance tuning, you might need to adjust other kernel parameters. The manual procedure is as follows:
- Log in as a user with root authority.
- Create a /etc/sysctl.conf file, if it does not already exist.
- Add the kernel entries in the /etc/sysctl.conf file.
- Run sysctl -p to load the sysctl settings from the default file /etc/sysctl.conf.
- Add sysctl -p to a system initialization file to set kernel parameters after each reboot.
The file /etc/hosts contains addresses that are assigned to hostnames (including an IP address, the fully qualified hostname, and the hostname). For optimal performance, you should verify that all the participating servers are defined in the /etc/hosts file.
For data partitioning environments, DB2 requires a remote shell utility to execute commands on a remote partition. Prior to DB2 UDB V8.2.2, the rsh utility was implicitly used as the remote shell mechanism for communicating among the partitions. With the demand for strong authentication and secure network communications, DB2 UDB V8.2.2 allows you to specify the full path name of the remote shell utility in a registry profile variable called DB2RSHCMD, which adds support for an alternative remote shell utility.
OpenSSH (Open Source Secure Shell) is intended to replace the rsh utility in order to provide secured communication over insecure channels. OpenSSH supports two types of authentication, namely host-based authentication and public-key authentication. For details of using OpenSSH with DB2, refer to the technical article entitled "Configure DB2 Universal Database on Unix to use OpenSSH," which illustrates how to enable OpenSSH with DB2.
For a secured environment, you should use OpenSSH instead of rsh in the database partition environment. If you choose to enable the rsh service instead, you just need to remove the "#" character from the in.rshd entry in the /etc/inetd.conf file, then restart the inetd server.
For a data partitioning configuration, you must have a file system that is available to all machines that participate in your partitioned database system. This file system will be used for the DB2 instance home directory. The machine that owns the DB2 instance home directory is called the instance-owning machine. In order to share the file system, the Network File System (NFS) must be configured. For Linux, you can verify that the NFS server is running by issuing the following command:
showmount -e hostname
This command shows the list of file systems defined in an NFS server. If NFS is not active, you
will receive the following message:
showmount: hostname: RPC: Program not registered
You can configure NFS server using YaST by selecting Misc and Installation Server. Note that you should log on as the root ID.
Figure 4. YaST Installation Server: Selecting the server type
Set up NFS server on instance-owning machine
You need to export the instance owner's directory, which will be shared with the other database partitioning server(s).
- Create the instance-owner directory, and ensure the directory is mounted.
- Modify the file system table (/etc/fstab) to include the new file system so that it will be mounted at startup time. The entry format is:
<device> <mountpoint> <filesystemtype> <options> <dump> <fsckorder>
Your entry may look similar to this:
Listing 1. Sample instance-owning /etc/fstab listing
/dev/hda7 /db2home ext3 defaults 1 2
This entry uses an ext3 file system, which is supported in Linux kernel versions 2.4.16 and higher, and the order for file system check (fsck) program is set to 2.
After the instance-owning file system has been set up, you can export this file system through an NFS service at startup time by adding the following entry in the /etc/exports file:
Listing 2. Sample /etc/exports listing
/db2home InstanceOwnerServer(rw,sync,no_root_squash) db1(rw,sync,no_root_squash)
where InstanceOwnerServer is the instance owner server name and db1 is the other database server name.
The export file options are listed in Table 2:

Table 2. Export file options

|Option||Description|
|rw||File system is exported with read-write permission.|
|sync||Server waits for files to be written to the disk before the next read.|
|no_root_squash||Keeps the root permissions valid.|
- Execute the exportfs -a command to make the instance-owner directory available for NFS clients, where option -a exports all directories listed in the /etc/exports file.
Set up NFS client on participating machines
Once the instance-owner directory has been made available, you can import the directory to the remaining database partitioning servers.
- Create the shared directory (/db2home) on the other servers.
- Add an entry to /etc/fstab in order to NFS-mount the file system automatically at boot time:
Listing 3. Sample participating machines' /etc/fstab
server1:/db2home /db2home nfs rw,timeo=300,retrans=5,hard,intr,bg,suid
where server1 is the instance-owner server name, and the other options are listed in Table 3:

Table 3. NFS mounting options

|Option||Description|
|rw||File system is mounted with read-write permission.|
|timeo||The time, in tenths of a second, before sending the first retransmission after an RPC timeout. The default value is 7 tenths.|
|retrans||The number of minor timeouts and retransmissions that must occur before a major action is either aborted or a "server not responding" message is printed on the console.|
|hard||If an NFS file operation has a major timeout, report "server not responding" on the console and continue retrying indefinitely. This is the default.|
|intr||If an NFS file operation has a major timeout and the file system is hard mounted, allow signals to interrupt the operation.|
|bg||If the first NFS mount attempt times out, retry the mount in the background.|
|suid||Permits the set-user-identifier (SUID) and set-group-identifier (SGID) bits to take effect.|
- Mount the exported file system on the other database partitioning servers with the following command:
mount server1:/db2home /db2home
Verify the NFS lock daemon
DB2 needs an NFS lock daemon for sharing the instance-owner directory among all participating machines in a cluster. You can verify whether the NFS lock daemon is enabled by using the chkconfig command. You can enable the NFS lock daemon and then restart the nfslock service with the following command:
chkconfig nfslock on
Verify NFS lock and NSM (Network Status Monitor) services
The NSM service, rpc.statd, implements a reboot notification service, which is used by the NFS file locking service (rpc.lockd). You can verify the status of these services using the following command:
ps -ef | grep rpc.statd
ps -ef | grep rpc.lockd
In a database partitioning environment, you need to create three groups and user IDs for DB2. They are shown in the table below:
Table 4. Required DB2 groups and users
|User ID||Group ID||Description|
|db2inst1||db2iadm1||DB2 Instance user ID and group ID|
|db2fenc1||db2fadm1||DB2 fenced user ID and group ID|
|dasusr1||dasadm1||DB2 Administration Server user ID and group ID|
The DB2 instance owner ID and password should be the same on all participating machines, and its home directory should be located within the NFS-shared directory. The fenced ID should be configured similarly to the DB2 instance owner ID. The DAS user ID and password can be different on each participating machine; however, you might want to keep them the same for simpler configuration and maintenance.
For example, you can create DB2 groups, create DB2 user IDs, and set initial passwords in all machines with the following commands:
groupadd -g 999 db2iadm1
groupadd -g 998 db2fadm1
groupadd -g 997 dasadm1
useradd -u 1100 -g db2iadm1 -m -d /db2home/db2inst1 -p password1 db2inst1
useradd -u 1101 -g db2fadm1 -m -d /db2home/db2fenc1 -p password2 db2fenc1
useradd -u 1102 -g dasadm1 -m -d /home/dasusr1 -p password3 dasusr1
The db2_install script can be used to install all DB2 ESE packages on your Linux system using the RPM installation utility. You can use this script to first install DB2 on the instance-owning machine and then on the participating machines.
Since DB2 must be installed on each of the participating machines, this is best accomplished by putting the install code on a shared disk (like /db2home, in our case). You can also optionally record your settings into a response file and then use that file to configure the remaining machines.
First, you must log on as a root user to install DB2:
- Enter the db2_install command to start the db2_install script, which is stored in the root directory of the CD-ROM.
- Enter DB2.ESE when db2_install prompts you for the product keyword.
This machine will be considered the instance-owning machine, as it stores and shares the instance owner's home directory. It's within this directory that the sqllib directory exists, as well as the Database Manager Configuration file (DBM CFG), Instance Profile Registry, Node directory, and System Database directory. Since these files are shared, updates done to any one of them will be seen by all participating machines. For configuring the instance-owning machine, you need to perform the following steps:
- Create a DB2 Instance
You should log in to the system using the DB2 instance owner ID, and create a DB2 instance shared by all machines by executing db2icrt under /opt/IBM/db2/V8.2/instance:
./db2icrt -u db2fenc1 db2inst1
where -u specifies the fenced ID.
It is important to note that a DPF Instance will actually exist separately on each participating machine, all using the same instance name, password, and shared home directory. It is the database and not the instance that is partitioned when using DPF.
- Enable FCM
All participating machines must be interconnected by a communication facility, such as a high-speed TCP/IP network. The Fast Communications Manager (FCM) handles communication between database partitioning servers. To enable FCM, a port range must be reserved in the /etc/services file on all participating machines. During instance creation, DB2 modifies /etc/services to reserve a default port range: ports 60000-60003 for inter-partition communication and port 50000 for application communication. One port is required for each logical partition, and each machine must reserve the same ports. Each entry contains a port name and a port number.
In addition to the base configuration, the FCM ports should appear similar to the following:
Listing 4. FCM ports in /etc/services
db2c_db2inst1 50000/tcp
# Add FCM port information
DB2_db2inst1 60000/tcp
DB2_db2inst1_1 60001/tcp
DB2_db2inst1_2 60002/tcp
DB2_db2inst1_END 60003/tcp
As you see in Listing 4, the port name is composed of the instance name prefixed with "DB2_". The last port name is composed of the instance name, prefixed with "DB2_" and suffixed by "_END". The port names between the first one and the last one are suffixed by "_<sequence number>". The port number must be followed by "/tcp".
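Since the naming rule is mechanical, it can be expressed as a short sketch. This is an illustration only (DB2 writes these entries itself during instance creation); the function name is hypothetical:

```python
def fcm_entries(instance, base_port, num_ports):
    """Build /etc/services FCM entries using DB2's naming convention:
    DB2_<instance>, DB2_<instance>_1, ..., DB2_<instance>_END."""
    names = [f"DB2_{instance}"]
    names += [f"DB2_{instance}_{i}" for i in range(1, num_ports - 1)]
    names.append(f"DB2_{instance}_END")
    return [f"{name} {base_port + i}/tcp" for i, name in enumerate(names)]

# Reproduces the four default FCM entries for instance db2inst1
for entry in fcm_entries("db2inst1", 60000, 4):
    print(entry)
```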
- Update DB2 instance parameters
Log in to the machine using the instance owner ID and specify which protocols will be used by DB2. To initialize the protocol(s), you need to update the DB2COMM profile registry variable using the following command:

db2set DB2COMM=tcpip
Additionally, you need to update the service name in the database manager (DBM) configuration parameter to indicate which port name will be used for communicating with the DB2 server and remote clients. This can be done using the following command:
db2 update dbm cfg using SVCENAME db2c_db2inst1
- Create DB2 Administration Server (DAS)
You must have a DAS running if you want to use any GUI tools like the Configuration Assistant, Control Center, Replication Center, or the Development Center. You can log on as the DAS user and create a DAS server using the following command:
./dascrt -u dasusr1
where option -u specifies the user ID of the DAS.
You can start the DAS server by logging on as the DAS administration user ID and executing the command db2admin start. To start the DAS server automatically at boot time, use the same user ID to execute the dasauto -on command. You can also display the current autostart status.
- Update db2nodes.cfg file
For the final configuration, you need to update $HOME/sqllib/db2nodes.cfg. The db2nodes.cfg file contains configuration information that tells DB2 which partition on which server participates in the instance. The file can have four columns:
- database partition number
- server hostname
- logical port number
- network interface for inter-partition communication (optional)
Each entry represents a database partition on a particular server that participates in the database partitioning of the instance. For example, the following db2nodes.cfg file defines four data partitions. Data partitions 0 and 1 are on server1, and data partitions 2 and 3 are on server2. Each server has two logical ports (one for each logical partition), namely 0 and 1.
Listing 5. Sample db2nodes.cfg
0 server1 0
1 server1 1
2 server2 0
3 server2 1
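A minimal reader for this format makes the column layout explicit. This is a hypothetical helper for illustration, not a DB2 utility:

```python
def parse_db2nodes(text):
    """Parse db2nodes.cfg lines into (partition number, hostname, logical port, netname)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        netname = fields[3] if len(fields) > 3 else None  # 4th column is optional
        entries.append((int(fields[0]), fields[1], int(fields[2]), netname))
    return entries

cfg = """0 server1 0
1 server1 1
2 server2 0
3 server2 1"""
print(parse_db2nodes(cfg))
```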
- Update the .rhosts file to enable remote commands
In order to enable the execution of remote commands, each database partition must have the authority to perform remote commands on all other participating server(s). Hence, you must supply an entry for each server in the /db2home/db2inst1/.rhosts file. The .rhosts file should have two columns, which are hostname and instance owner ID. For example, the following entry in .rhosts file represents two servers participating in an instance.
Listing 6. Sample .rhosts
server1 db2inst1
server2 db2inst1
You should ensure that only the instance owner has read and write access by using:
chmod 600 /db2home/db2inst1/.rhosts
For configuring the participating machines, you need to do the following steps:
- Enable FCM
You must log on to each participating machine as the root user and add identical port names and port numbers for FCM definitions to the /etc/services file.
- Create DB2 Administration Server
You have to install DAS in all participating machines. For details, refer to Create DB2 Administration Server in Configure on the instance-owning machine.
Once all the updates are done, you might want to reboot the servers and execute the command db2_all date on all the participating servers to ensure that inter-partition communication is working.
The following are some important concepts that are applicable to DB2 with DPF, and you should be aware of them.
The DB2 catalog is a set of tables and views that are maintained by the database manager. These tables and views contain metadata; that is, information about the database and its objects, such as descriptions of tables, views, and indexes and security information about the authority that users have on these objects.
The catalog partition is the database partition where the catalog tables for the database are stored. Each database in a partitioned database environment can have its catalog partition on a different database partition server. The catalog partition for a database is automatically created on the database partition server where the CREATE DATABASE command is issued.
User interaction with a partitioned database occurs through one database partition, known as the coordinator partition for that user. The coordinator partition runs on the same database partition as the application, or in the case of a remote application, the database partition to which that application is connected. Any database partition can act as, or become, a coordinator partition.
The coordinator partition is called what it is because of the coordinator agent that runs on it, communicating with the application. You can control which partition you use as the coordinator partition; for example, to connect to partition 3, issue:
set client connect_dbpartitionnum 3
connect to SAMPLE
A database partition group is a set of one or more database partitions. Before creating tables for the database, you first create the database partition group where the tablespaces will be stored, and then you create the tablespace where the tables will be stored.
You can define named subsets of one or more database partitions within a database. Each subset is known as a database partition group. Each subset that contains more than one database partition is known as a multi-partition database partition group. Multi-partition database partition groups can only be defined with database partitions that belong to the same instance. A database partition can belong to more than one partition group.
For example, Figure 5 shows three database partition groups defined across a database of five partitions:
Figure 5. Sample of 3 database partition groups
In this case, 'Database Partition Group 1' spans all but one of the database partitions. The database also contains a single-partition database partition group called 'Database Partition Group 2', as well as 'Database Partition Group 3', whose single database partition is shared with Database Partition Group 1.
A new database partition group can be created with the
CREATE DATABASE PARTITION GROUP statement and modified using the
ALTER DATABASE PARTITION GROUP statement.
Data is divided across all the partitions in a database partition group, and you can add or drop one or more database partitions from a database partition group. Each database partition that is part of the database system configuration must already be defined in a partition configuration file called db2nodes.cfg. A database partition group can contain as few as one database partition, or as many as the entire set of database partitions defined for the database system.
When a database partition group is created or modified, a partitioning map is associated with it. A partitioning map, in conjunction with a partitioning key and a hashing algorithm, is used by the database manager to determine which database partition in the database partition group will store a given row of data. In a non-partitioned database, no partitioning key or partitioning map is required. These concepts are discussed in more detail shortly.
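The lookup can be sketched as follows. The map size of 4096 matches DB2 UDB V8's partitioning map; the hash function and round-robin map construction here are stand-ins for illustration, not DB2's internal algorithm:

```python
import zlib

MAP_SIZE = 4096  # DB2 UDB V8 uses a 4096-entry partitioning map

def build_partition_map(partitions):
    """Spread the group's partitions round-robin across the map entries."""
    return [partitions[i % len(partitions)] for i in range(MAP_SIZE)]

def target_partition(partition_map, key):
    """Hash the partitioning key into the map to find the owning partition."""
    index = zlib.crc32(str(key).encode()) % MAP_SIZE  # stand-in hash function
    return partition_map[index]

pmap = build_partition_map([0, 1, 2, 3])
# The same key value always lands on the same partition.
print(target_partition(pmap, "customer-42") == target_partition(pmap, "customer-42"))
```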
A database partition is a part of the database, complete with user data, indexes, configuration files, and transaction logs. There are several default database partition groups that are created when a database is created:
- IBMCATGROUP - default database partition group for the tablespace containing the system catalogs.
- IBMTEMPGROUP - default database partition group for system temporary tablespaces.
- IBMDEFAULTGROUP - default database partition group for the tablespaces containing the user tables.
A user temporary tablespace, for declared temporary tables, can be created in IBMDEFAULTGROUP or any user-created database partition group, but not in IBMTEMPGROUP.
To create a new partition group consisting of all partitions in db2nodes.cfg, issue the following command:
create database partition group ALLPART on all dbpartitionnums
If instead you wanted to create a new partition group, only consisting of partitions 1 and 2, you
could issue a command similar to the following:
create database partition group PART12 on dbpartitionnums (1,2)
The base data types of corresponding columns of partitioning keys are compared and can be declared partition compatible. Partition-compatible data types have the property that two variables, one of each type, with the same value, are mapped to the same partition number by the same partitioning algorithm. Partition compatibility has the following characteristics:
- A base data type is compatible with another of the same base data type.
- Internal formats are used for DATE, TIME, and TIMESTAMP data types and are not compatible with each other, and none are compatible with CHAR.
- Partition compatibility is not affected by columns with NOT NULL or FOR BIT DATA definitions.
- NULL values of compatible data types are treated identically; those of non-compatible data types may not be.
- Base data types of a user-defined type are used to analyze partition compatibility.
- Decimals of the same value in the partitioning key are treated identically, even if their scale and precision differ.
- Trailing blanks in character strings (CHAR, VARCHAR, GRAPHIC, or VARGRAPHIC) are ignored by the hashing algorithm.
- BIGINT, SMALLINT, and INTEGER are compatible data types.
- REAL and FLOAT are compatible data types.
- CHAR and VARCHAR of different lengths are compatible data types.
- GRAPHIC and VARGRAPHIC are compatible data types.
- Partition compatibility does not apply to LONG VARCHAR, LONG VARGRAPHIC, CLOB, DBCLOB, and BLOB data types, because they are not supported as partitioning keys.
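Two of these rules -- trailing blanks ignored for character strings, and decimals of equal value treated alike -- can be mimicked by normalizing a key value before hashing it. An illustrative sketch only; DB2's internal handling differs:

```python
import zlib
from decimal import Decimal

def normalized_key_hash(value):
    """Hash a partitioning key value after applying compatibility normalizations."""
    if isinstance(value, str):
        value = value.rstrip(" ")     # trailing blanks are ignored by the hash
    elif isinstance(value, Decimal):
        value = value.normalize()     # equal values hash alike despite scale/precision
    return zlib.crc32(repr(value).encode())

print(normalized_key_hash("abc") == normalized_key_hash("abc   "))
print(normalized_key_hash(Decimal("10.50")) == normalized_key_hash(Decimal("10.5")))
```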
Collocation is the placement of rows from different tables that contain related data in the same database partition. Collocated tables allow DB2 to use more efficient join strategies.
You may discover that two or more tables frequently contribute data in response to certain queries. In such cases, you will want related data to be located as close together as possible. In an environment where the database is physically divided among two or more database partitions, there must be a way to keep the related pieces of the divided tables as close together as possible, which is called table collocation.
DB2 recognizes, when accessing more than one table for a join or a subquery, that the data to be joined is located at the same database partition. DB2 can then choose to perform the join or subquery at the database partition where the data is stored, instead of having to move data between database partitions. This localized join/subquery ability has significant performance advantages.
For collocation to occur, tables must:
- Be in the same database partition group, one that is not being redistributed. (During redistribution, tables in the database partition group may be using different partitioning maps -- they are not collocated.)
- Have partitioning keys with the same number of columns.
- Have the corresponding columns of the partitioning key be partition compatible.
Collocation can also occur when a table is located within a single partition database partition group defined on the same partition as another table.
Bufferpools are an area of memory into which data pages are read, modified, and held during
processing. By default, when you create a bufferpool it will be created on each partition. For ease
of management, it is usually best to keep the bufferpools the same size on each partition; however,
it is possible to change the size on specific partitions. For example, to customize the size of
partition four's 'BUF8K' bufferpool to 400MB, we could issue:
alter bufferpool BUF8K dbpartitionnum 4 size 51200
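The SIZE clause is given in pages, so the 51200 comes from dividing the 400 MB target by the bufferpool's 8 KB page size:

```python
page_size = 8 * 1024          # BUF8K uses 8 KB pages
target = 400 * 1024 * 1024    # 400 MB target for partition 4
pages = target // page_size
print(pages)  # 51200
```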
You can also selectively choose which partitions to create a bufferpool on by specifying a database partition group, in which case the bufferpool will only be created on partitions that are in that database partition group.
A tablespace is an abstraction of a collection of containers into which database objects are stored. A tablespace provides a level of indirection between a database and the tables stored within the database. A tablespace has space on storage devices assigned to it. The data, index, long field, and LOB portions of a table can be stored in the same tablespace, or can be individually broken out into separate tablespaces for performance reasons.
Database Partition Expressions are typically used when defining containers of a tablespace across multiple logical database partitions of the same machine, to avoid having the same path/device name used by more than one logical partition. This ensures that container names are unique across partitions, by using the database partition number as part of the container name. This is an alternative to manually specifying the location for each partition.
You use the argument " $N" ([blank]$N) to indicate a database partition expression and it can be used anywhere in the container name, and multiple database partition expressions can be specified. Terminate the database partition expression with a space character; whatever follows the space is appended to the container name after the database partition expression is evaluated. If there is no space character in the container name after the database partition expression, it is assumed that the rest of the string is part of the expression.
The argument can only be used in one of the following forms (for this example, assume the partition number is 5):
Table 5. Common database partition expressions
|Syntax||Example (partition number 5)||Value|
|[blank]$N||" $N"||5|
|[blank]$N+[number]||" $N+1000"||1005|
|[blank]$N%[number]||" $N%3" (% is modulus)||2|
|[blank]$N+[number]%[number]||" $N+1%3"||0|
|[blank]$N%[number]+[number]||" $N%3+20"||22|
So, if strings similar to the above were used and our partition was still 5, we would see:
Table 6. Database partition expressions in use

|Container name specified||Container created on partition 5|
|'/dbdir/node $N /cont1'||'/dbdir/node5/cont1'|
|'/ $N+1000 /file1'||'/1005/file1'|
|' $N%10 /container'||'5/container'|
|'/dir/ $N%5+2000 /dmscont'||'/dir/2000/dmscont'|
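The substitution rules above can be captured in a short evaluator. This is a hypothetical sketch that assumes left-to-right evaluation of + and % (consistent with the values in the tables), not DB2's actual parser:

```python
import re

def expand(container, partnum):
    """Replace each ' $N' database partition expression with its evaluated value."""
    out = container
    while " $N" in out:
        i = out.index(" $N")
        rest = out[i + 3:]
        expr = re.match(r"([+%]\d+)*", rest).group(0)  # e.g. '%5+2000'
        after = rest[len(expr):]
        value = partnum
        for op, num in re.findall(r"([+%])(\d+)", expr):
            value = value + int(num) if op == "+" else value % int(num)
        if after.startswith(" "):
            after = after[1:]  # the space terminates the expression
        out = out[:i] + str(value) + after
    return out

print(expand("/dbdir/node $N /cont1", 5))  # /dbdir/node5/cont1
```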
If we issued the following statement on a two-partition database:
CREATE TABLESPACE TBSP1 MANAGED BY DATABASE USING (device '/dev/container $N' 10000)
We would see the following containers created:
- /dev/container0 - on DATABASE PARTITION 0
- /dev/container1 - on DATABASE PARTITION 1
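The evaluation rules above can be sketched in a few lines of Python. This is a hypothetical helper for illustration only, not DB2's actual expression parser; operators are applied left to right, which matches the evaluated values in Table 6:

```python
import re

def expand_partition_expression(container, partition):
    """Expand ' $N' database partition expressions in a container name.

    A ' $N' token (note the leading blank) may be followed by +<number>
    and/or %<number> operators, applied left to right. The expression
    ends at the next space; the remainder of the string after that
    space is appended directly to the evaluated value.
    """
    def repl(match):
        value = partition
        # Apply each +n / %n operator in the order written.
        for op, num in re.findall(r"([+%])(\d+)", match.group(1)):
            value = value + int(num) if op == "+" else value % int(num)
        return str(value)

    # Match ' $N' plus any +n/%n operators, consuming the terminating
    # space (if present) so the remainder joins the evaluated value.
    return re.sub(r" \$N((?:[+%]\d+)*) ?", repl, container)

print(expand_partition_expression("/dbdir/node $N /cont1", 5))  # /dbdir/node5/cont1
print(expand_partition_expression("/dev/container $N", 1))      # /dev/container1
```

Running the helper against the strings in Table 6 with partition 5 reproduces the evaluated container names shown there.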
A partitioning key is a column (or group of columns) that is used to determine the partition in which a particular row of data is stored.
A partitioning key is defined on a table with the CREATE TABLE statement. If one is not supplied, one is created by default from the first column of the primary key. If there is no primary key, the default partitioning key is the first non-long/non-LOB column defined on that table. If no columns satisfy the requirement for a default partitioning key, the table is created without one, which means the table cannot span more than one partition.
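These default-selection rules can be summarized in a short sketch. The column descriptions here are hypothetical stand-ins for illustration, not DB2 catalog structures:

```python
# Long and LOB types that are excluded from a default partitioning key.
LONG_TYPES = {"LONG VARCHAR", "LONG VARGRAPHIC", "BLOB", "CLOB", "DBCLOB"}

def default_partitioning_key(columns, primary_key=None):
    """Pick the default partitioning key for a new table.

    columns     -- list of (name, type) pairs in declaration order
    primary_key -- ordered list of primary-key column names, if any
    """
    if primary_key:                       # 1. first column of the primary key
        return primary_key[0]
    for name, col_type in columns:        # 2. first non-long/non-LOB column
        if col_type not in LONG_TYPES:
            return name
    return None                           # 3. no valid column: no partitioning key

print(default_partitioning_key([("DOC", "CLOB"), ("ID", "INTEGER")]))  # ID
```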
You should bear in mind the following considerations when choosing a partitioning key:
- How tables are to be accessed
- The nature of the query workload
- The join strategies employed by the database system
If collocation is not a major consideration, a good partitioning key for a table is one that spreads the data evenly across all database partitions in the database partition group. The partitioning key for each table in a tablespace that is associated with a database partition group determines if the tables are collocated.
An inappropriate partitioning key can cause uneven data distribution. Columns with unevenly distributed data and columns with a small number of distinct values should not be chosen as a partitioning key. The number of distinct values must be great enough to ensure an even distribution of rows across all database partitions in the database partition group. The cost of applying the partitioning hash algorithm is proportional to the size of the partitioning key. The partitioning key cannot contain more than 16 columns, but fewer columns result in better performance. Unnecessary columns should not be included in the partitioning key.
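To see why low-cardinality columns make poor partitioning keys, the sketch below hashes two candidate keys across four partitions. CRC32 is used here as a stand-in for DB2's internal hashing algorithm, which is not what DB2 actually uses:

```python
from collections import Counter
from zlib import crc32

def partition_of(key, num_partitions):
    # Stand-in hash: map the key to one of 4096 slots, then to a partition.
    return crc32(key.encode()) % 4096 % num_partitions

# High-cardinality key (e.g. an employee number): rows spread across partitions.
emp_no = Counter(partition_of(str(n), 4) for n in range(10_000))

# Low-cardinality key (two distinct values): at most 2 partitions ever hold rows.
low_card = Counter(partition_of(v, 4) for v in ["M", "F"] * 5_000)

print(sorted(emp_no))   # all four partitions receive rows
print(len(low_card))    # number of partitions actually holding rows (at most 2)
```

However uniform the hash, a key with only two distinct values can never use more than two partitions, leaving the rest idle.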
The following points should also be considered when defining partitioning keys:
- Creation of a multiple partition table that contains only long data types (LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, or DBCLOB) is not supported.
- The partitioning key definition cannot be altered.
- The partitioning key should include the most frequently joined columns.
- The partitioning key should be made up of columns that often participate in a GROUP BY clause.
- Any unique key or primary key must contain all of the partitioning key columns.
- In an online transaction processing (OLTP) environment, all columns in the partitioning key should participate in the transaction by using equal (=) predicates with constants or host variables. For example, assume you have an employee number, emp_no, that is often used in transactions such as:
UPDATE emp_table SET ... WHERE emp_no = host-variable
In this case, the EMP_NO column would make a good single-column partitioning key for EMP_TABLE.
Hash partitioning is the method by which the placement of each row in the partitioned table is determined. The method works as follows:
- The hashing algorithm is applied to the value of the partitioning key, and generates a number between 0 and 4095.
- The partitioning map is created when a database partition group is created. Each of the database partition numbers in the group is sequentially repeated in a round-robin fashion to fill the partitioning map.
- The generated number is used as an index into the partitioning map. The number at that location in the partitioning map is the number of the database partition where the row is stored.
In a partitioned database environment, the database manager must have a way of knowing which table rows are stored on which database partition, so that it can find the data it needs. It uses a map, called a partitioning map, to find the data.
A partitioning map is an internally generated array containing either 4,096 entries for multiple-partition database partition groups, or a single entry for single-partition database partition groups.
For a single-partition database partition group, the partitioning map has only one entry containing the partition number of the database partition where all the rows of a database table are stored. For multiple-partition database partition groups, the partition numbers of the database partition group are specified in a round-robin fashion. Just as a city map is organized into sections using a grid, the database manager uses a partitioning key to determine the location (the database partition) where the data is stored.
For example, assume that you have a database created on five database partitions
(numbered 0-4). The partitioning map for the IBMDEFAULTGROUP database partition group of this
database would be:
0 1 2 3 4 0 1 2 3 4 0 1 2...
If a database partition group had been created in the database using database partitions 1
and 2, the partitioning map for that database partition group would be:
1 2 1 2 1 2 1 2...
If the partitioning key for a table to be loaded in the database is an integer that has possible values between 1 and 500,000, the partitioning key is hashed to a number between 0 and 4095. That number is used as an index into the partitioning map to select the database partition for that row.
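The map construction and lookup described above can be sketched as follows. CRC32 again stands in for DB2's internal hashing algorithm; the round-robin fill and the 4096-entry map match the behavior described in this section:

```python
from zlib import crc32

def build_partitioning_map(partitions, entries=4096):
    # Fill the map by repeating the group's partition numbers round-robin.
    return [partitions[i % len(partitions)] for i in range(entries)]

def target_partition(key, pmap):
    # Hash the partitioning key to an index 0..4095, then look it up.
    index = crc32(str(key).encode()) % len(pmap)
    return pmap[index]

ibmdefaultgroup = build_partitioning_map([0, 1, 2, 3, 4])
two_partition_group = build_partitioning_map([1, 2])

print(ibmdefaultgroup[:12])     # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1]
print(two_partition_group[:8])  # [1, 2, 1, 2, 1, 2, 1, 2]
```

The first twelve entries of each map reproduce the "0 1 2 3 4 0 1 2..." and "1 2 1 2..." sequences shown above.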
Partitioning maps are a flexible way of controlling where data is stored in a partitioned database. There is also a redistribution utility that allows you to modify (rebalance or skew) the data distribution across the database partitions in your database, but that is beyond the scope of this article.
The DB2NODE environment variable is used to specify the target logical partition that you want to attach or connect to. If this variable is not set, it defaults to the partition that is defined with port 0 on the machine (in the db2nodes.cfg file).
If you wanted to connect to logical partition 2, you would enter the following commands:
export DB2NODE=2
db2 terminate
The terminate command is necessary to ensure that the change to DB2NODE takes effect.
To identify the current active logical node, you can issue the following statement:
db2 "values (current dbpartitionnum)"
The rah and db2_all tools make the task of executing commands across all machines or all partitions easy. Both tools share the same useful set of options, which can be displayed by entering either "rah ?" or "db2_all ?" from a DB2 command line.
db2_all is used to run a command on all (or all specified) partitions. For example:
db2_all ";db2 update db cfg for SAMPLE using NEWLOGPATH /db2_db/logs"
The semicolon prefix indicates that the command is to run concurrently on all partitions.
rah is used to run a command on all machines that make up the partitioned environment.
When performing an offline database backup, the catalog partition needs to be backed up separately; however, this is not a concern with online backups, as the logs are included in the backup image. For example, suppose we have a database called SAMPLE and a directory /dev/backup that is accessible from all partitions.
First we back up the catalog partition, which resides on partition 0, by specifying the "<<+0<" prefix:
db2_all '<<+0< db2 BACKUP DATABASE sample TO /dev/backup'
Next we back up the remaining partitions (excluding partition 0) by specifying the "|<<-0<" prefix:
db2_all '|<<-0< db2 BACKUP DATABASE sample TO /dev/backup'
Note that the "|" prefix character causes the BACKUP commands to run in sequence. Now each partition's backup image can be found in the /dev/backup directory.
The DBPARTITIONNUM function can be used to determine the partition that a row resides on. For example, if used in a SELECT clause, it returns the partition number for each row of the table that was used to form the result of the SELECT statement.
The argument must be the qualified or unqualified name of a column in a table; the column can be of any data type. The data type of the result is INTEGER and is never null. Since row-level information is returned, the results are the same regardless of which column of the table is specified. If there is no db2nodes.cfg file, the result is 0.
select lastname, dbpartitionnum(lastname) as part_num from employee order by 1
Listing 7. Result set using dbpartitionnum function
LASTNAME        PART_NUM
--------------- -----------
ADAMSON         0
BROWN           2
GEYER           1
GOUNOT          1
HAAS            1
HENDERSON       0
JEFFERSON       1
JOHNSON         1
JONES           3
A node directory is a directory that contains information that is necessary to establish communications from a client workstation to all applicable database servers.
A database directory is a directory that contains database access information for all databases to which a client can connect. The entries also reference the node directory.
The DB2 Design Advisor is the most straightforward tool for obtaining efficient partitioning suggestions and can be accessed from the Control Center GUI or db2advis command line tool. An updatable partitioning map is used with a hashing algorithm to specify the mapping of partitioning key to database partitions, which is used to determine the placement and retrieval of each row of data.
A workload can then be spread across multiple partitions for large tables, while allowing smaller tables to be stored on one or more database partitions. Since each database partition has local indexes on its data there is increased performance for local data access.
DB2 also supports partial de-clustering, where tables and tablespaces can be spread across a subset of the available partitions. Depending on the number of database partitions, you may have one or more single-partition database partition groups, and one or more multi-partition database partition groups. Each partition must use a unique partition number and the same database partition may be found in one or more database partition groups.
To ensure fast recovery of the partition containing system catalog tables, avoid placing user tables on the same database partition. This is accomplished by placing user tables in database partition groups that do not include the database partition in the IBMCATGROUP database partition group.
Small tables should be placed in single-partition database partition groups, except when you want to take advantage of collocation with a larger table. Avoid extending medium-sized tables across too many database partitions. For example, a 100 MB table may perform better on a 16-partition database partition group than on a 32-partition database partition group.
You can use database partition groups to separate online transaction processing (OLTP) tables from decision support (DSS) tables, to ensure that the performance of OLTP transactions are not adversely affected.
In a multi-partition database partition group, you can only create a unique index if it is a superset of the partitioning key.
When creating a database, make sure to specify a local (not shared) directory in the "ON"
clause for the database location. For example,
CREATE DATABASE SAMPLE ON /db2_db
where /db2_db is a pre-existing local directory.
The default database path (DFTDBPATH) parameter in the database manager configuration defaults to the home directory of the instance owner (which is shared via NFS). When you create a database without specifying its location, it is created under DFTDBPATH, which points to the shared instance-owner directory. This will degrade performance.
Once the database is created, you should ensure that each database partition also has its own local log directory. You can use the following command:
db2_all ";db2 update db cfg for SAMPLE using NEWLOGPATH /db2_db/logs"
The node number is automatically appended to the path. This is done to maintain the uniqueness of the path in multiple logical node configurations.
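As an illustration, DB2 appends the node number as a NODEnnnn suffix (a four-digit node number). The helper below is a hypothetical sketch of that naming convention, not DB2 code:

```python
def logpath_for_node(base, node):
    # DB2 appends the node number (e.g. NODE0002) so that each logical
    # node on the same machine gets a unique log path.
    return f"{base}/NODE{node:04d}"

print(logpath_for_node("/db2_db/logs", 2))  # /db2_db/logs/NODE0002
```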
This article walked you through the rationale for using the Data Partitioning Feature (DPF) in DB2 UDB, while providing a detailed explanation of the DPF installation process and the important concepts and design considerations for enabling DPF in a SUSE Linux Enterprise environment. With this understanding, you can jumpstart enabling DPF in your own SUSE Linux environment.
The article "Configure DB2 Universal Database for UNIX to use OpenSSH" (developerWorks, June 2005) explains how to use OpenSSH as a secure alternative to rsh for configuring remote DB2 partitions.
The article "Install the DB2 UDB data partitioning feature on Linux" (developerWorks, April 2005) walks you through an example of installing the DPF on Redhat Linux.
The article "A colorful introduction to DB2 UDB: Overview of the Database Partitioning Feature" (developerWorks, March 2004) introduces the basics of data partitioning.
Visit the developerWorks DB2 UDB page to expand your skills on DB2 Universal Databases.
Visit the developerWorks Linux zone to expand your Linux skills.
Stay current with developerWorks technical events and Webcasts.
Get products and technologies
Download a free trial version of DB2 Universal Database Enterprise Server Edition.
Build your next development project with IBM trial software, available for download directly from developerWorks.
- Participate in the discussion forum.
- Participate in developerWorks blogs and get involved in the developerWorks community.
Fraser McArthur is a consultant at the IBM Toronto Lab, where DB2 UDB for distributed platforms (Windows/Unix) is developed. He is a member of the Data Management Partner Enablement organization where he works with IBM Business Partners to migrate applications to DB2 and perform performance tuning. Fraser is a DB2 Certified Solutions Expert in both DB2 Administration and Application Development.
Samuel Poon is a DB2 Consultant from the IBM Toronto Laboratory and works primarily with IBM Business Partners. Sam has worked on several articles published on developerWorks. You can reach him at: firstname.lastname@example.org.