Leverage data partitioning for scalability and high performance on Linux

Configure DB2 UDB ESE V8.2 with the Database Partitioning Feature on SUSE Linux Enterprise Server

Learn the ins and outs and explore the performance and scalability advantages of the IBM DB2® Universal Database™ Data Partitioning Feature (DPF) in DB2 UDB for Linux®, UNIX®, and Windows®. Then, walk through the steps to install and configure DB2 with DPF on SUSE Linux Enterprise Server. You'll also learn important concepts and design considerations that will help you jumpstart your DPF installation in the SUSE Linux Enterprise environment.

Fraser McArthur (fgmcarth@ca.ibm.com), DB2 Consultant, IBM, Software Group

Photo: Fraser McArthurFraser McArthur is a consultant at the IBM Toronto Lab, where DB2 UDB for distributed platforms (Windows/Unix) is developed. He is a member of the Data Management Partner Enablement organization where he works with IBM Business Partners to migrate applications to DB2 and perform performance tuning. Fraser is a DB2 Certified Solutions Expert in both DB2 Administration and Application Development.

Samuel Poon (poons@ca.ibm.com), DB2 Consultant, IBM, Software Group

Sam Poon photoSamuel Poon is a DB2 Consultant from the IBM Toronto Laboratory and works primarily with IBM Business Partners. Sam has worked on several articles published on developerWorks. You can reach him at: poons@ca.ibm.com.

12 January 2006


Database partitioning occurs when a database consists of two or more database partitions, where each database partition stores a subset of table data for each table that resides within it. Partitions can be on separate machines or on the same machine (known as logical partitions). Within this type of database, data is hashed for storage location, to identify which database partition will store the data. A database partition consists of its own data, indexes, configuration files, and transaction logs. In older versions of DB2, a database partition was referred to as a node or a database node; however, in this article we use the term partition. Logical partitions share resources of the same machine where they reside.

Tables within a partitioned database can be located in one or more database partitions. The idea behind partitioning is to spread a table's data across multiple partitions, storing some of its rows in one partition and other rows in other partitions. The database manager (DBM) handles data retrieval and update by decomposing SQL requests automatically into sub-requests. The sub-requests are then executed in parallel among the applicable database partitions. This splitting of tables across database partitions is completely transparent to users and means that data is physically stored across more than one database partition, yet accessed as though it were located in the same place.

Figure 1 shows a single partition running on a single machine, which is analogous to not using DPF:

Figure 1. A single partition on a single machine
A single partition on a single machine

The next two figures show DPF in action. Figure 2 illustrates DPF being used across three different machines, with a single partition on each.

Figure 2. A single partition on each machine using high-speed communications
A single partition on each machine using high-speed communications

Note the use of the high-speed communications interconnect, which could be a TCPIP network that is used between the individual machines. The next figure, Figure 3, shows three partitions running on the same machine; inter-partition here is performed within memory, as opposed to a network (DB2 _FORCE_FCM_BP is enabled by default on all Linux platforms).

Figure 3. Three partitions on a single machine using memory for communication
Three partitions on a single machine using memory for communication

It is important to note that in the above figures, the symmetric multi-processors (SMP) could be replaced by a uni-processor; however, SMP servers are more typical in a DPF environment.

Partitioning keys consist of a column or set of columns from the table and are used to allow users to choose how to partition their data. Using database partition groups and tablespaces, a user can determine across which and how many database partitions their table data is to be spread.

User interaction occurs through one database partition, known as the coordinator partition for that user. The coordinator partition runs on the same database partition as the application, or, in the case of remote applications, the database partition to which that application is connected. Any database partition can become a coordinator partition.

Understanding the DB2 Data Partitioning Feature (DPF)

In order to partition a database, you will require the Database Partitioning Feature (DPF) of DB2. This optional Database Partition Feature license is only available for DB2 Enterprise Server Edition (ESE) and the cost is based on the number of CPUs used in the partitioned database.

When and why to partition

DB2 UDB ESE with DPF is ideal to manage OLAP, data warehousing, and data mining workloads. It can also make a good choice for large-scale OLTP workloads. The most common partitioned environment consists of a single database partition on each physical system that makes up the database partition cluster. The processors on each system are used by the database manager at each database partition to manage its own part of the total data in the database.

Since the data is divided across database partitions, you can use the power of multiple processors on multiple computers to satisfy requests for information. Data retrieval and update requests are decomposed automatically into sub-requests, which are then executed in parallel among the applicable database partitions. Each additional server means more CPUs, more memory, and more disks for your partitioned database.

This performance boost stems from the fact that large amounts of data are broken into smaller, more manageable chunks that each partition/system can work on simultaneously. You may also want to consider using DPF if you are doing extract, transform, and load operations (ETL) and your batch window is small. For example, DB2 LOAD could run simultaneously on each partition, splitting the load operation up in parallel across the servers. Index creations times can also see significant performance boosts. Furthermore, backup and recovery times can be significantly reduced when using DPF, due to the smaller amounts of data that each partitioning machine would be dealing with.

Install DB2 V8.2 on SUSE Linux Enterprise Server V9

Here we cover the necessary steps to get DB2 DPF up and running on your SUSE Server (or multiple server) environment.

Step 1. Verify additional software packages

DB2 requires the following packages to be installed on your Linux system:

  • Public domain korn shell (pdksh)
  • Remote shell (rsh) or open source secure shell (OpenSSH)
  • IBM Java™ 2 Java Run Time
  • Network File Server (nfs)

Use the rpm command to verify packages installation:

  • rpm -qa | grep pdksh
  • rpm -qa | grep rsh or rpm -qa | grep openssh
  • rpm -qa | grep IBMjava
  • rpm -qa | grep nfs

Step 2. Verify kernel parameters

Before installing Linux, please confirm which versions of Linux are currently recommended and validated for DB2. This link "http://www.ibm.com/software/data/db2/linux/validate/" shows the supported distributions of Linux. Ensure a recommended and supported version of Linux is installed.

During the installation of the DB2, you may want to pay attention to some of the Kernel parameters:

Table 1. Kernel parameters that impact DB2
Kernel parameterDescription
kernel.shmmaxDefines the system-wide maximum allowable shared memory segment size in bytes
kernel.msgmniDefines the maximum number of message queue identifiers allowed on the system at any given time
fs.file-maxDefines the maximum number of file-handles that the Linux kernel will allocate
kernel.semDefines the semaphore limits:
  • max semaphores per array (semmsl)
  • max semaphores system wide (semmns)
  • max ops per semop call (semopm)
  • max number of arrays (semmni)
** Semaphores provide low-level synchronization between processes so that only one process has access to the shared segment, thereby ensuring the integrity of that shared resource.

As of Kernel 2.4.18 and later, the file-max parameter's default value is 8192; therefore, you may not need to modify it. You can check the file-max value, using the following command:
cat /proc/sys/fs/file-max

You can display the current Linux kernel setting using the lpcs command:
lpcs -l

As of DB2 UDB Version 8.2, DB2 checks the values of the semmni, msgmni, and shmmax parameters at start-up time, and updates them if the current values are not optimal. DB2 will automatically set up the kernel parameters as follows:

  • semmni is changed to 1024
  • msgmni is changed to 1024
  • shmmax is changed to 268435456 (32-bit) or 1073741824 (64-bit)

It should be noted that the maximum semaphores systemwide (semmns) is equal to:

max number of arrays (semmni) * max semaphores per array (semmsl)

You can also verify the semni, msgmni, shmmax parameters after the DB2 installation by using the db2pd command:
db2pd -osinfo

For performance tuning, you might need to adjust other kernel parameters. The manual procedure is as follows:

  1. Log in as a user with root authority.
  2. Create a /etc/sysctl.conf file, if it does not already exist.
  3. Add the kernel entries in the /etc/sysctl.conf file.
  4. Run sysctl -p to load in sysctl settings from the default file /etc/sysctl.conf.
  5. Add sysctl -p to a system initialization file to set kernel parameters after each reboot.

Step 3. Verify network

The file /etc/hosts contains addresses that are assigned to hostnames (including an IP address, the fully qualified hostname, and the hostname). For optimal performance, you should verify that all the participating servers are defined in the /etc/hosts file.

Step 4. Enable remote shell utility

For data partitioning environments, DB2 requires a remote shell utility to execute commands on a remote partition. Prior to DB2 UDB V8.2.2, the rsh utility was implicitly used as the remote shell mechanism for communicating among the partitions. With the demand of strong authentication and secure network communications, DB2 UDB V8.2.2 allows you to now specify the full path name of the remote shell utility in a registry profile variable called DB2RSHCMD, which adds support for an alternative remote shell utility.

OpenSSH (Open Source Secure Shell) is intended to replace the rsh utility in order to provide secured communication over insecure channels. OpenSSH supports two types of encryption, namely host-based authentication and public-key authentication. For details of using OpenSSH with DB2, refer to the technical article entitled "Configure DB2 Universal Database on Unix to use OpenSSH," which illustrates how to enable OpenSSH with DB2.

For supporting a secured environment, you should use OpenSSH instead of rsh in the database partition environment. If you choose to enable rsh service, you just need to remove the "#" character for in.rshd service in /etc/inetd.conf file. To restart the inetd server, you can enter:
/etc/init.d/inetd restart

Step 5. Set up Network File System (NFS)

For a data partitioning configuration, you must have a file system that is available to all participating machines that participate in your partitioned database system. This file system will be used for the DB2 instance home directory. The machine that owns the DB2 instance home directory is called the instance-owning machine. In order to share the file system, the Network File System must be configured. For Linux, you can verify that the NFS server is running by issuing the following command:
showmount -e hostname

This command shows the list of file systems defined in an NFS server. If NFS is not active, you will receive the following message:
showmount: hostname: RPC: Program not registered

You can configure NFS server using YaST by selecting Misc and Installation Server. Note that you should log on as the root ID.

Figure 4. YaST Installation Server: Selecting the server type
YaST Installation Server: Selecting the server type

Set up NFS server on instance-owning machine

You need to export the instance owner's directory, which will be shared with the other database partitioning server(s).

  1. Create the instance-owner directory, and ensure the directory is mounted. Use commands similar to the following:
    • mkdir /db2home
    • mount /db2home
  2. Modify the file system table (fstab), which is located in /etc/fstab to include the new file system so that the new file system will be mounted at startup time. The entry format is :

    <device> <mountpoint> <filesystemtype> <options> <dump> <fsckorder>

    Your entry may look similar to this:

    Listing 1. Sample instance-owning /etc/fstab listing
     /dev/had7 /db2home ext3 defaults 1 2

    This entry uses an ext3 file system, which is supported in Linux kernel versions 2.4.16 and higher, and the order for file system check (fsck) program is set to 2.

  3. After the instance-owning file system has been set up, you can export this file system through an NFS service at startup time by adding the following entry in the /etc/exports file:

    Listing 2. Sample /etc/exports listing
     db2home InstanceOwnerServer (rw,sync,no_root_squash) db1(rw,sync,no_root_squash)

    where InstanceOwnerServer is the instance owner server name and db1 is the other database server name.

    The export file options are listed in Table 2:

    Table 2. Export file options
    rwFile system is exported with read-write permission.
    syncServer waits for files to be written to the disk before the next read.
    no_root_squashKeeps the root permissions valid.
  4. Execute the exportfs command to make the instance-owner directory available for NFS clients to mount

    /usr/sbin/exportfs -a

    where option a is to export all directories listed in the /etc/exports file.

Setup NFS client on participating machines

Once the instance-owner directory has been made available, you can import the directory to the remaining database partitioning servers.

  1. Create the shared directory in the other servers as follows:

    mkdir /db2home

  2. Add an entry to /etc/fstab in order to NFS mount the file system automatically at boot time:

    Listing 3. Sample participating machines' /etc/fstab
     server1:/db2home /db2home nfs rw,timeo=300,retrans=5,hard,intr,bg,suid

    where server1 is an instance-owner server name, and the other options are listed in the following table:

    Table 3. NFS mounting options
    rwFile system is exported with read-write permission.
    timeoThis value is in tenths of a second before sending the first retransmission after an RPC timeout. The default value is 7-tenths.
    retransThe number of minor timeouts and retransmissions that must occur before a major action is either aborted or a "server not responding" message is printed on the console.
    hardIf an NFS file operation has a major timeout then report "server not responding" on the console and continue retrying indefinitely. This is the default.
    intrIf an NFS file operation has a major timeout and it is hard mounted, then allow signals to interrupt the file operation and operations to be interrupted.
    bgIf the first NFS mount attempt times out, retry the mount in the background.
    suidThis value permits the set-user-identifier bit (SUID) or set group-identifier bit (SGID) to take effect.
  3. Mount the exported file system on the other database partitioning servers with the following command:

    mount server1:/db2home /db2home

Verify the NFS lock daemon

DB2 needs an NFS lock daemon for sharing the instance-owner directory among all participating machines in a cluster. You can verify if the NFS lock daemon is enabled by using the chkconfig command: chkconfig nfslock

You can enable the NFS lock daemon and restart nfslock, with the following commands:

  • chkconfig nfslock on
  • /etc/rc.d/nfslock restart

Verify NFS lock and NSM (Network Status Monitor) services

The NSM service, rpc.statd, implements a reboot notification service, which is used by the NFS file locking service (rpc.lockd). You can verify the status of these services using the following command:

  • ps -ef | grep rpc.statd
  • ps -ef| grep rps.lockd

Step 6. Create required groups and users

In a database partitioning environment, you need to create three groups and user IDs for DB2. They are shown in the table below:

Table 4. Required DB2 groups and users
User IDGroup IDDescription
db2inst1db2iadm1DB2 Instance user ID and group ID
db2fenc1db2fadm1DB2 fenced user ID and group ID
dasusr1dasadm1DB2 Administration Server user ID and group ID

The DB2 instance owner ID and password should be the same on all participating machines, as well, its home directory should be located within the NFS-shared directory. The fenced ID should be configured similarly to the DB2 instance owner ID. The DAS user ID and password can be different on all participating machines; however, you might want to keep the DAS user ID and password the same for simple configuration and maintenance.

For example, you can create DB2 groups, create DB2 user IDs, and set initial passwords in all machines with the following commands:

  • groupadd -g 999 db2iadm1
  • groupadd -g 998 db2fadm1
  • groupadd -g 997 dasadm1
  • useradd -u 1100 -g db2iadm1 -m -d /db2home/db2inst1 db2inst1 -p password1
  • useradd -u 1101 -g db2fadm1 -m -d /db2home/db22fenc1 db2fenc1 -p password2
  • useradd -u 1102 -g dasadm1 -m -d /home/dasadm1 dasusr1 -p password3
  • passwd db2inst1
  • passwd db2fenc1
  • passwd dasusr1

Step 7. Install DB2 UDB on instance-owning machine and all participating machines

The db2_install script can be used to install all DB2 ESE packages on your Linux system using the RPM installation utility. You can use this script to first install DB2 on the instance-owning machine and then on the participating machines.

Since DB2 must be installed on each of the participating machines, this is best accomplished by putting the install code on a shared disk (like /db2home, in our case). You can also optionally record your settings into a response file and then use that file to configure the remaining machines.

First, you must log on as a root user to install DB2:

  • Enter the db2_install command to start the db2_install script, which is stored in the root directory of the CD-ROM.
  • Enter DB2.ESE when db2_install prompts you for the product keyword.

Step 8. Instance-owning machine configuration

This machine will be considered the instance-owning machine, as it stores and shares the instance owner's home directory. It's within this directory that the sqllib directory exists, as well as the Database Manager Configuration file (DBM CFG), Instance Profile Registry, Node directory, and System Database directory. Since these files are shared, updates done to any one of them will be seen by all participating machines. For configuring the instance-owning machine, you need to perform the following steps:

  1. Create a DB2 Instance

    You should log in to the system using the DB2 instance owner ID, and create a DB2 instance shared by all machines by executing db2icrt under /opt/IBM/db2/V8.2/instance:

    • cd /opt/IBM/db2/V8.1/instance
    • ./db2icrt -u db2fenc1 db2inst1

    where -u specifies the fenced ID.

    It is important to note that a DPF Instance will actually exist separately on each participating machine, all using the same instance name, password, and shared home directory. It is the database and not the instance that is partitioned when using DPF.

  2. Enable FCM

    All participating machines must be interconnected by a communication facility, such as a high-speed TCPIP network. The Fast Communications Manager (FCM) is designed to handle communication between database partitioning servers. To enable FCM, a port range must be reserved in the /etc/services file on all participating machines. DB2 provides a default port range that has been reserved in the /etc/services file; by default DB2 modifies the /etc/services file to reserve ports 60000-60003 for inter-partition communication and port 50000 for application communication. DB2 reserves several ports for this during the instance creation. One port for each logical partition is required and each machine will need to reserve the same ports. Each entry contains a port name and port number.

    In addition to the base configuration, the FCM ports should appear similar to the following:

    Listing 4. FCM ports in /etc/services
     DB2c_db2inst1               50000/tcp
     #Add FCM port information 
     DB2_db2inst1           60000/tcp
     DB2_db2inst1_1         60001/tcp
     DB2_db2inst1_2         60002/tcp
     DB2_db2inst1_END       60003/tcp

    As you see in Listing 4, the port name is composed of the instance name prefixed with "DB2_". The last port name is composed of the instance name, prefixed with "DB2_" and suffixed by "_END". The port names between the first one and the last one are suffixed by "_<sequence number>". The port number must be followed by "/tcp".

  3. Update DB2 instance parameters

    Log into the machine using the instance owner ID and specify which protocols will be used by DB2. To initialize the protocol(s), you need to update the DB2COMM profile registry variable using the following command:
    db2set DB2COMM=TCPIP

    Additionally, you need to update the service name in the database manager (DBM) configuration parameter to indicate which port name will be used for communicating with the DB2 server and remote clients. This can be done using the following command:
    db2 update dbm cfg using SVCENAME db2c_db2inst1

  4. Create DB2 Administration Server (DAS)

    You must have a DAS running if you want to use any GUI tools like the Configuration Assistant, Control Center, Replication Center, or the Development Center. You can log on as the DAS user and create a DAS server using the following command:

    • cd /opt/IBM/db2/V8.1/instance
    • ./dascrt -u dasusr1

    where option -u specifies the user ID of the DAS.

    You can start the DAS server by logging on as the DAS administration user ID and executing the command db2admin start. To start the DAS server automatically, you need to use the DAS administration user ID and execute the following:

    • cd /opt/IBM/db2/V8.1/instance
    • ./dasauto -on

    To display autostart status, you can execute:
    ./dasauto ?

  5. Update db2nodes.cfg file

    For the final configuration, you need to update $HOME/sqllib/db2nodes.cfg. The db2nodes.cfg file contains configuration information that tells DB2 which partition on which server participates in the instance. The file can have four columns:

    • database partition number
    • server hostname
    • logical port number
    • network interface for inter-partition communication (optional)

    Each entry represents a database partition on a particular server that participates in the database partitioning of the instance. For example, the following db2nodes.cfg file defines four data partitions. The data partitions 0 and 1 are on server1, and the data partitions 2 and 3 on server2. Each server has two logical ports (one for each logical partition), namely 0 and 1.

    Listing 5. Sample db2nodes.cfg
     0     server1     0
     1     server1     1
     2     server2     0
     3     server2     1
  6. Update the .rhosts file to enable remote commands

    In order to enable the execution of remote commands, each database partition must have the authority to perform remote commands on all other participating server(s). Hence, you must supply an entry for each server in the /db2home/db2inst1/.rhosts file. The .rhosts file should have two columns, which are hostname and instance owner ID. For example, the following entry in .rhosts file represents two servers participating in an instance.

    Listing 6. Sample .rhosts
     server1 db2inst1
     server2 db2inst1

    You should ensure that only root will have read and write access by using:
    chmod 600 /db2home/db2inst1/.rhosts

Step 9. Configure on the participating machines

For configuring the participating machines, you need to do the following steps:

  1. Enable FCM

    You must log on to each participating machine as the root user and add identical port names and port numbers for FCM definitions to the /etc/services file.

  2. Create DB2 Administration Server

    You have to install DAS in all participating machines. For details, refer to Create DB2 Administration Server in Configure on the instance-owning machine.

Step 10. Verification

Once all the updates are done, you might want to reboot the servers, and execute the command db2_all date on all the participating servers to ensure successful completion.

Important concepts

The following are some important concepts that are applicable to DB2 with DPF, and you should be aware of them.

Catalog partition

The DB2 catalog is a set of tables and views that are maintained by the database manager. These tables and views contain metadata; that is, information about the database and its objects, such as descriptions of tables, views, and indexes and security information about the authority that users have on these objects.

The catalog partition is the database partition where the catalog tables for the database are stored. Each database in a partitioned database environment can have its catalog partition on a different database partition server. The catalog partition for a database is automatically created on the database partition server where the CREATE DATABASE command is run.

Coordinator partition

User interaction with a partitioned database occurs through one database partition, known as the coordinator partition for that user. The coordinator partition runs on the same database partition as the application, or in the case of a remote application, the database partition to which that application is connected. Any database partition can be used as or can become a coordinator node.

The coordinator partition is called what it is because of the coordinator agent that runs on it, communicating with the application. You can control which partition you use as the coordinator partition; for example, to connect to partition 3, issue:

  • set client connect_dbpartitionnum 3
  • connect to SAMPLE

Database partition groups

A database partition group is a set of one or more database partitions. Before creating tables for the database, you first create the database partition group where the tablespaces will be stored, and then you create the tablespace where the tables will be stored.

You can define named subsets of one or more database partitions within a database. Each subset is known as a database partition group. Each subset that contains more than one database partition is known as a multi-partition database partition group. Multi-partition database partition groups can only be defined with database partitions that belong to the same instance. A database partition can belong to more than one partition group.

For example, Figure 5 shows a database partition group made of five partitions:

Figure 5. Sample of 3 database partition groups
Sample of 3 database partition groups

In this case, the 'Database partition group 1' spans all but one of the database partitions. It also contains an additional single-partition database partition group called 'Database Partition Group 2.' There is also a single database partition within 'Database Partition Group 3' that is shared with Database Partition Group 1.

A new database partition group can be created with the CREATE DATABASE PARTITION GROUP statement and modified using the ALTER DATABASE PARTITION GROUP statement.

Data is divided across all the partitions in a database partition group and you can add or drop one or more database partitions from a database partition group. Each database partition that is part of the database system configuration must already be defined in a partition configuration file called db2nodes.cfg. A database partition group can contain as little as one database partition, or as much as the entire set of database partitions defined for the database system.

When a database partition group is created or modified, a partitioning map is associated with it. A partitioning map, in conjunction with a partitioning key and a hashing algorithm, is used by the database manager to determine which database partition in the database partition group will store a given row of data. In a non-partitioned database, no partitioning key or partitioning map is required. These concepts are discussed in more detail shortly.

A database partition is a part of the database, complete with user data, indexes, configuration files, and transaction logs. There are several default database partition groups that are created when a database is created:

  • IBMCATGROUP - default database partition group for the tablespace containing the system catalogs.
  • IBMTEMPGROUP is - default database partition group for system temporary tablespaces.
  • IBMDEFAULTGROUP - default database partition group for the tablespaces containing the user tables.

A user temporary tablespace, for declared temporary tables, can be created in IBMDEFAULTGROUP or any user-created database partition group, but not in IBMTEMPGROUP.

To create a new partition group consisting of all partitions in db2nodes.cfg, issue the following command:
create database partition group ALLPART on all dbpartitionnums

If instead you wanted to create a new partition group, only consisting of partitions 1 and 2, you could issue a command similar to the following:
create database partition group PART12 on dbpartitionnums (1,2)

Partition compatibility

The base data types of corresponding columns of partitioning keys are compared and can be declared partition compatible. Partition-compatible data types have the property that two variables, one of each type, with the same value, are mapped to the same partition number by the same partitioning algorithm. Partition compatibility has the following characteristics:

  • A base data type is compatible with another of the same base data type.
  • Internal formats are used for DATE, TIME, and TIMESTAMP data types and are not compatible with each other, and none are compatible with CHAR.
  • Partition compatibility is not affected by columns with NOT NULL or FOR BIT DATA definitions.
  • NULL values of compatible data types are treated identically; those of non-compatible data types may not be.
  • Base data types of a user-defined type are used to analyze partition compatibility.
  • Decimals of the same value in the partitioning key are treated identically, even if their scale and precision differ.
  • Trailing blanks in character strings (CHAR, VARCHAR, GRAPHIC, or VARGRAPHIC) are ignored by the hashing algorithm.
  • BIGINT, SMALLINT, and INTEGER are compatible data types.
  • REAL and FLOAT are compatible data types.
  • CHAR and VARCHAR of different lengths are compatible data types.
  • GRAPHIC and VARGRAPHIC are compatible data types.
  • Partition compatibility does not apply to LONG VARCHAR, LONG VARGRAPHIC, CLOB, DBCLOB, and BLOB data types, because they are not supported as partitioning keys.


Collocation is the placement of rows from different tables that contain related data in the same database partition. Collocated tables allow DB2 to use more efficient join strategies.

You may discover that two or more tables frequently contribute data in response to certain queries. In such cases, you will want related data to be located as close together as possible. In an environment where the database is physically divided among two or more database partitions, there must be a way to keep the related pieces of the divided tables as close together as possible, which is called table collocation.

DB2 recognizes, when accessing more than one table for a join or a subquery, that the data to be joined is located at the same database partition. DB2 can then choose to perform the join or subquery at the database partition where the data is stored, instead of having to move data between database partitions. This localized join/subquery ability has significant performance advantages.

For collocation to occur, tables must:

  • Be in the same database partition group, one that is not being redistributed. (During redistribution, tables in the database partition group may be using different partitioning maps -- they are not collocated.)
  • Have partitioning keys with the same number of columns.
  • Have the corresponding columns of the partitioning key be partition compatible.

Collocation can also occur when a table is located within a single partition database partition group defined on the same partition as another table.


Bufferpools are an area of memory into which data pages are read, modified, and held during processing. By default, when you create a bufferpool it will be created on each partition. For ease of management, it is usually best to keep the bufferpools the same size on each partition; however, it is possible to change the size on specific partitions. For example, to customize the size of partition four's 'BUF8K' bufferpool to 400MB, we could issue:
alter bufferpool BUF8K dbpartitionnum 4 size 51200

You can also selectively choose which partitions to create a bufferpool on by specifying a database partition group, in which case the bufferpool will only be created on partitions that are in that database partition group.


A tablespace is an abstraction of a collection of containers into which database objects are stored. A tablespace provides a level of indirection between a database and the tables stored within the database. A tablespace has space on storage devices assigned to it. The data, index, long field, and LOB portions of a table can be stored in the same tablespace, or can be individually broken out into separate tablespaces for performance reasons.

Database partition expressions

Database Partition Expressions are typically used when defining containers of a tablespace across multiple logical database partitions of the same machine, to avoid having the same path/device name used by more than one logical partition. This ensures that container names are unique across partitions, by using the database partition number as part of the container name. This is an alternative to manually specifying the location for each partition.

You use the argument " $N" ([blank]$N) to indicate a database partition expression and it can be used anywhere in the container name, and multiple database partition expressions can be specified. Terminate the database partition expression with a space character; whatever follows the space is appended to the container name after the database partition expression is evaluated. If there is no space character in the container name after the database partition expression, it is assumed that the rest of the string is part of the expression.

The argument can only be used in one of the following forms (for this example, assume the partition number is 5):

Table 5. Common database partition expressions
[blank]$N" $N"5
[blank]$N+[number]" $N+1011"1016
[blank]$N%[number]" $N%3" (% is modulus)2
[blank]$N+[number]%[number]" $N+12%13"4
[blank]$N%[number]+[number]" $N%3+20"22

So, if strings similar to the above were used and our partition was still 5, we would see:

Table 6. Database partition expressions in use
ExampleContainer Name
'/dbdir/node $N /cont1''/dbdir/node5/cont1'
'/ $N+1000 /file1''/1005/file1'
' $N%10 /container''5/container'
'/dir/ $N%5+2000 /dmscont''/dir/2000/dmscont'

If we issued the following statement on a two-partition database:

We would see the following containers created:

  • /dev/container0 - on DATABASE PARTITION 0
  • /dev/container1 - on DATABASE PARTITION 1

Partitioning keys

A partitioning key is a column (or group of columns) that is used to determine the partition in which a particular row of data is stored.

A partitioning key is defined on a table with the CREATE TABLE statement. If one is not supplied, one is created by default from the first column of the primary key. If there is no primary key, the default partitioning key is the first non-long/non-LOB column defined on that table. If no columns satisfy the requirement for a default partitioning key, the table is created without one, which means the table cannot span more than one partition.

You should bear mind of the following considerations when choosing a partitioning key:

  • How tables are to be accessed
  • The nature of the query workload
  • The join strategies employed by the database system

If collocation is not a major consideration, a good partitioning key for a table is one that spreads the data evenly across all database partitions in the database partition group. The partitioning key for each table in a tablespace that is associated with a database partition group determines if the tables are collocated.

An inappropriate partitioning key can cause uneven data distribution. Columns with unevenly distributed data and columns with a small number of distinct values should not be chosen as a partitioning key. The number of distinct values must be great enough to ensure an even distribution of rows across all database partitions in the database partition group. The cost of applying the partitioning hash algorithm is proportional to the size of the partitioning key. The partitioning key cannot be more than 16 columns, but fewer columns result in better performance. Unnecessary columns should not be included in the partitioning key.

The following points should also be considered when defining partitioning keys:

  • Creation of a multiple partition table that contains only long data types (LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, or DBCLOB) is not supported.
  • The partitioning key definition cannot be altered.
  • The partitioning key should include the most frequently joined columns.
  • The partitioning key should be made up of columns that often participate in a GROUP BY clause.
  • Any unique key or primary key must contain all of the partitioning key columns.
  • In an online transaction processing (OLTP) environment, all columns in the partitioning key should participate in the transaction by using equal (=) predicates with constants or host variables. For example, assume you have an employee number, emp_no, that is often used in transactions such as:

    UPDATE emp_table SET ... WHERE emp_no = host-variable

    In this case, the EMP_NO column would make a good single-column partitioning key for EMP_TABLE.

Hash partitioning is the method by which the placement of each row in the partitioned table is determined. The method works as follows:

  1. The hashing algorithm is applied to the value of the partitioning key, and generates a partition number between zero and 4095.
  2. The partitioning map is created when a database partition group is created. Each of the partition numbers is sequentially repeated in a round-robin fashion to fill the partitioning map.
  3. The partition number is used as an index into the partitioning map. The number at that location in the partitioning map is the number of the database partition where the row is stored.

Partitioning maps

In a partitioned database environment, the database manager must have a way of knowing which table rows are stored on which database partition, so that it can find the data it needs. It uses a map, called a partitioning map, to find the data.

A partitioning map is an internally generated array containing either 4,096 entries for multiple-partition database partition groups, or a single entry for single-partition database partition groups.

For a single-partition database partition group, the partitioning map has only one entry containing the partition number of the database partition where all the rows of a database table are stored. For multiple-partition database partition groups, the partition numbers of the database partition group are specified in a round-robin fashion. Just as a city map is organized into sections using a grid, the database manager uses a partitioning key to determine the location (the database partition) where the data is stored.

For example, assume that you have a database created on five database partitions (numbered 0-4). The partitioning map for the IBMDEFAULTGROUP database partition group of this database would be:
0 1 2 3 4 0 1 2 3 4 0 1 2...

If a database partition group had been created in the database using database partitions 1 and 2, the partitioning map for that database partition group would be:
1 2 1 2 1 2 1 2...

If the partitioning key for a table to be loaded in the database is an integer that has possible values between 1 and 500,000, the partitioning key is hashed to a partition number between 0 and 4 095. That number is used as an index into the partitioning map to select the database partition for that row.

Partitioning maps are a flexible way of controlling where data is stored in a partitioned database. There is also a redistribution utility that allows you to modify (rebalance or skew) the data distribution across the database partitions in your database, but that is beyond the scope of this article.

DB2NODE environment variable

The DB2NODE Environment Variable is used to specify the target logical partition that you want to attach to or connect to. If this variable is not set, it will default to the partition that is defined with port 0 on the machine (in the db2nodes.cfg file).

If you wanted to connect to logical partition 2, you would enter the following commands:

  • DB2NODE=2
  • export DB2NODE
  • db2 terminate

The terminate command is necessary to ensure that the change to takes effect.

To identify the current active logical node, you can issue the following command:
db2 "values (current dbpartitionnum)"

Issuing commands across machines and partitions

These two tools make the task of executing commands across all machines or all partitions, easy. Both tools share the same useful set of options, which can be displayed with either "rah ?" or "db2_all ?" from a DB2 Command Window.


db2_all is used to run a command on all the specified partitions.
db2_all ";db2 update db cfg for SAMPLE using NEWLOGPATH /db2_db/logs"

The semi-colon is used as a prefix to indicate that the command is to run concurrently on partitions.


rah is used to run a command on all machines that make up the partitioned environment.

Database backup

When performing an offline database backup, the catalog partition needs to be backed separately; however, this is not a concern with online backups, as the logs are included in the backup image. For example, we have a database called sample and a directory /dev/backup, which is accessible from all partitions.

First we need to back up the catalog partition, which resides on partition 0 only by specifying "<<+0<"" (in this example):
db2_all '<<+0< db2 BACKUP DATABASE sample TO /dev/backup'

Next we back up the remaining partitions (exclude partition 0) by specifying "<<-0<":
db2_all '|<<-0< db2 BACKUP DATABASE sample TO /dev/backup'

Note that the prefix sequence "|" will run the BACKUP commands in sequence. Now, each partition's backup image can be found in the /dev/backup directory.

Row distribution of a table

The DBPARTITIONNUM function can be used to determine the partition that a row resides on. For example, if used in a SELECT clause, it returns the partition number for each row of the table that was used to form the result of the SELECT statement.

The accepted argument must be the qualified or unqualified name of a column in a table and can contain any data type. The data type of the result is INTEGER and is never null. Since row-level information is returned, the results are the same, regardless of which column is specified for the table. If there is no db2nodes.cfg file, the result is 0.

For example:

select lastname, dbpartitionnum(lastname) as part_num from employee order by 1

Listing 7. Result set using dbpartitionnum function
--------------- -----------
ADAMSON                   0
BROWN                     2
GEYER                     1
GOUNOT                    1
HAAS                      1
HENDERSON                 0
JEFFERSON                 1
JOHNSON                   1
JONES                     3

Node and database directories

A node directory is a directory that contains information that is necessary to establish communications from a client workstation to all applicable database servers.

A database directory is a directory that contains database access information for all databases to which a client can connect. The entries also reference the node directory.

Design considerations

The DB2 Design Advisor is the most straightforward tool for obtaining efficient partitioning suggestions and can be accessed from the Control Center GUI or db2advis command line tool. An updatable partitioning map is used with a hashing algorithm to specify the mapping of partitioning key to database partitions, which is used to determine the placement and retrieval of each row of data.

A workload can then be spread across multiple partitions for large tables, while allowing smaller tables to be stored on one or more database partitions. Since each database partition has local indexes on its data there is increased performance for local data access.

DB2 also supports partial de-clustering, where tables and tablespaces can be spread across a subset of the available partitions. Depending on the number of database partitions, you may have one or more single-partition database partition groups, and one or more multi-partition database partition groups. Each partition must use a unique partition number and the same database partition may be found in one or more database partition groups.

To ensure fast recovery of the partition containing system catalog tables, avoid placing user tables on the same database partition. This is accomplished by placing user tables in database partition groups that do not include the database partition in the IBMCATGROUP database partition group.

Small tables should be placed in single-partition database partition groups, except when you want to take advantage of collocation with a larger table. Avoid extending medium-sized tables across too many database partitions. For example, a 100 MB table may perform better on a 16-partition database partition group than on a 32-partition database partition group.

You can use database partition groups to separate online transaction processing (OLTP) tables from decision support (DSS) tables, to ensure that the performance of OLTP transactions are not adversely affected.

In a multi-partition database partition group, you can only create a unique index if it is a superset of the partitioning key.

When creating a database, make sure to specify a local (not shared) directory in the "ON" clause for the database location. For example,

where /db2_db is a pre-existing local directory.

The default database path (DBTDBPATH) parameter in the database manager configuration, defaults to the location of the home directory of the instance owner (which is shared by NFS). When creating a database without specifying the database location, it will create a database using DBTDBPATH which points to the shared instance-owner directory. This will degrade performance.

Once the database is created, you should ensure that each data partition should also have its own log local directory. You can use the following command:
db2_all ";db2 update db cfg for SAMPLE using NEWLOGPATH /db2_db/logs"

The node number is automatically appended to the path. This is done to maintain the uniqueness of the path in multiple logical node configurations.


This article walked you through the rationale of using the Data Partitioning Feature (DPF) in DB2 UDB, while providing a detailed explanation of the DPF installation process and the important concepts and design considerations for enabling DPF in a SUSE Linux Enterprise environment. With a better understanding of DB2 DPF with SUSE Linux environment, you can jumpstart enabling DPF in your SUSE Linux environment.



Get products and technologies



developerWorks: Sign in

Required fields are indicated with an asterisk (*).

Need an IBM ID?
Forgot your IBM ID?

Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.


All information submitted is secure.

Dig deeper into Information management on developerWorks

Zone=Information Management, Linux
ArticleTitle=Leverage data partitioning for scalability and high performance on Linux