Level: Intermediate Fraser McArthur (fgmcarth@ca.ibm.com), DB2 Consultant, IBM Samuel Poon (poons@ca.ibm.com), DB2 Consultant, IBM
12 Jan 2006 Learn the ins and outs and explore the performance and scalability advantages of the IBM DB2® Universal Database™ Data Partitioning Feature (DPF) in DB2 UDB for Linux®, UNIX®, and Windows®. Then, walk through the steps to install and configure DB2 with DPF on SUSE Linux Enterprise Server. You'll also learn important concepts and design considerations that will help you jumpstart your DPF installation in the SUSE Linux Enterprise environment.
Introduction
Database partitioning occurs when a database consists of two or more database partitions, where each database partition stores a subset of table data for each table that resides within it. Partitions can be on separate machines or on the same machine (known as logical partitions). Within this type of database, data is hashed for storage location, to identify which database partition will store the data. A database partition consists of its own data, indexes,
configuration files, and transaction logs. In older versions of DB2, a database partition was referred to as a node or a database node; however, in this article we use the term partition. Logical partitions share resources of the same machine where they reside.
Tables within a partitioned database can be located in one or more database partitions. The idea behind partitioning is to spread a table's data across multiple partitions, storing some of its rows in one partition and other rows in other partitions. The database manager (DBM) handles data retrieval and update by decomposing SQL requests automatically into sub-requests. The sub-requests are then executed in parallel among the applicable database partitions. This
splitting of tables across database partitions is completely transparent to users and means that data is physically stored across more than one database partition, yet accessed as though it were located in the same place.
Figure 1 shows a single partition running on a single machine, which is analogous
to not using DPF:
Figure 1. A single partition on a single machine
The next two figures show DPF in action. Figure 2 illustrates DPF being used across three
different machines, with a single partition on each.
Figure 2. A single partition on each machine using high-speed communications
Note the use of the high-speed communications interconnect, which could be a TCPIP network that is used between the individual machines. The next figure, Figure 3, shows three partitions running on the same machine; inter-partition here is performed within memory, as opposed to a network (DB2 _FORCE_FCM_BP is enabled by default on all Linux platforms).
Figure 3. Three partitions on a single machine using memory for communication
It is important to note that in the above figures, the symmetric multi-processors (SMP) could be replaced by a uni-processor; however, SMP servers are more typical in a DPF environment.
Partitioning keys consist of a column or set of columns from the table and are used to allow users to choose how to partition their data. Using database partition groups and tablespaces, a user can
determine across which and how many database partitions their table data is to be spread.
User interaction occurs through one database partition, known as the coordinator partition for that
user. The coordinator partition runs on the same database partition as the application, or, in the
case of remote applications, the database partition to which that application is connected. Any
database partition can become a coordinator partition.
Understanding the DB2 Data Partitioning Feature (DPF)
In order to partition a database, you will require the Database Partitioning Feature (DPF) of DB2.
This optional Database Partition Feature license is only available for DB2 Enterprise Server Edition
(ESE) and the cost is based on the number of CPUs used in the partitioned database.
When and why to partition
DB2 UDB ESE with DPF is ideal to manage OLAP, data warehousing, and data mining workloads. It can
also make a good choice for large-scale OLTP workloads. The most common partitioned environment consists
of a single database partition on each physical system that makes up the database partition cluster.
The processors on each system are used by the database manager at each database partition to manage its
own part of the total data in the database.
Since the data is divided across database partitions, you can use the power of multiple processors
on multiple computers to satisfy requests for information. Data retrieval and update requests are
decomposed automatically into sub-requests, which are then executed in parallel among the applicable
database partitions. Each additional server means more CPUs, more memory, and more disks for your
partitioned database.
This performance boost stems from the fact that large amounts of data are broken into smaller, more
manageable chunks that each partition/system can work on simultaneously. You may also want to consider
using DPF if you are doing extract, transform, and load operations (ETL) and your batch window is
small. For example, DB2 LOAD could run simultaneously on each partition, splitting the load operation
up in parallel across the servers. Index creations times can also see significant performance boosts.
Furthermore, backup and recovery times can be significantly reduced when using DPF, due to the
smaller amounts of data that each partitioning machine would be dealing with.
Install DB2 V8.2 on SUSE Linux Enterprise Server V9
Here we cover the necessary steps to get DB2 DPF up and running on your SUSE Server (or multiple
server) environment.
Step 1. Verify additional software packages
DB2 requires the following packages to be installed on your Linux system:
- Public domain korn shell (pdksh)
- Remote shell (rsh) or open source secure shell (OpenSSH)
- IBM Java™ 2 Java Run Time
- Network File Server (nfs)
Use the rpm command to verify packages installation:
-
rpm -qa | grep pdksh
-
rpm -qa | grep rsh
or
rpm -qa | grep openssh
-
rpm -qa | grep IBMjava
-
rpm -qa | grep nfs
Step 2. Verify kernel parameters
Before installing Linux, please confirm which versions of Linux are currently recommended and validated for
DB2. This link "http://www.ibm.com/software/data/db2/linux/validate/" shows the supported distributions
of Linux. Ensure a recommended and supported version of Linux is installed.
During the installation of the DB2, you may want to pay attention to some of the Kernel parameters:
Table 1. Kernel parameters that impact DB2
| Kernel parameter | Description |
|---|
| kernel.shmmax | Defines the system-wide maximum allowable shared memory segment size in bytes | | kernel.msgmni | Defines the maximum number of message queue identifiers allowed on the system at any given time | | fs.file-max | Defines the maximum number of file-handles that the Linux kernel will allocate | | kernel.sem | Defines the semaphore limits:
- max semaphores per array (semmsl)
- max semaphores system wide (semmns)
- max ops per semop call (semopm)
- max number of arrays (semmni)
**
Semaphores provide low-level synchronization between processes so that only one process
has access to the shared segment, thereby ensuring the integrity of that shared resource. |
As of Kernel 2.4.18 and later, the file-max parameter's default value is 8192; therefore, you may
not need to modify it. You can check the file-max value, using the following command:
cat /proc/sys/fs/file-max
You can display the current Linux kernel setting using the lpcs command:
lpcs -l
As of DB2 UDB Version 8.2, DB2 checks the values of the semmni, msgmni, and shmmax parameters at
start-up time, and updates them if the current values are not optimal. DB2 will automatically set up
the kernel parameters as follows:
- semmni is changed to 1024
- msgmni is changed to 1024
- shmmax is changed to 268435456 (32-bit) or 1073741824 (64-bit)
It should be noted that the maximum semaphores systemwide (semmns) is equal to:
max number of arrays (semmni) * max semaphores per array (semmsl)
You can also verify the semni, msgmni, shmmax parameters after the DB2 installation by using
the db2pd command:
db2pd -osinfo
For performance tuning, you might need to adjust other kernel parameters. The manual procedure
is as follows:
- Log in as a user with root authority.
- Create a /etc/sysctl.conf file, if it does not already exist.
- Add the kernel entries in the /etc/sysctl.conf file.
- Run
sysctl -p to load in sysctl settings from the default file /etc/sysctl.conf.
- Add sysctl -p to a system initialization file to set kernel parameters after each reboot.
Step 3. Verify network
The file /etc/hosts contains addresses that are assigned to hostnames (including an IP address, the
fully qualified hostname, and the hostname). For optimal performance, you should verify that all
the participating servers are defined in the /etc/hosts file.
Step 4. Enable remote shell utility
For data partitioning environments, DB2 requires a remote shell utility to execute commands on a
remote partition. Prior to DB2 UDB V8.2.2, the rsh utility was implicitly used as the remote shell
mechanism for communicating among the partitions. With the demand of strong authentication and
secure network communications, DB2 UDB V8.2.2 allows you to now specify the full path name of the
remote shell utility in a registry profile variable called DB2RSHCMD, which adds support for an
alternative remote shell utility.
OpenSSH (Open Source Secure Shell) is intended to replace the rsh utility in order to provide
secured communication over insecure channels. OpenSSH supports two types of encryption, namely
host-based authentication and public-key authentication. For details of using OpenSSH with DB2,
refer to the technical article entitled
"Configure DB2 Universal Database on Unix to use OpenSSH,"
which illustrates how to enable OpenSSH with DB2.
For supporting a secured environment, you should use OpenSSH instead of rsh in the
database partition environment. If you choose to enable rsh service, you just need to remove the
"#" character for in.rshd service in /etc/inetd.conf file. To restart the inetd server, you can
enter:
/etc/init.d/inetd restart
Step 5. Set up Network File System (NFS)
For a data partitioning configuration, you must have a file system that is available to all
participating machines that participate in your partitioned database system. This file system will
be used for the DB2 instance home directory. The machine that owns the DB2 instance home directory
is called the instance-owning machine. In order to share the file system, the Network File System
must be configured. For Linux, you can verify that the NFS server is running by issuing the following
command:
showmount -e hostname
This command shows the list of file systems defined in an NFS server. If NFS is not active, you
will receive the following message:
showmount: hostname: RPC: Program not registered
You can configure NFS server using YaST by selecting Misc and Installation Server. Note that you should log on as the root ID.
Figure 4. YaST Installation Server: Selecting the server type
Set up NFS server on instance-owning machine
You need to export the instance owner's directory, which will be shared with the other database
partitioning server(s).
- Create the instance-owner directory, and ensure the directory is mounted. Use commands
similar to the following:
-
mkdir /db2home
-
mount /db2home
- Modify the file system table (fstab), which is located in /etc/fstab to include the new
file system so that the new file system will be mounted at startup time. The entry format
is :
<device> <mountpoint> <filesystemtype> <options> <dump> <fsckorder>
Your entry may look similar to this:
Listing 1. Sample instance-owning /etc/fstab listing
/dev/had7 /db2home ext3 defaults 1 2 |
This entry uses an ext3 file system, which is supported in Linux kernel versions 2.4.16
and higher, and the order for file system check (fsck) program is set to 2.
-
After the instance-owning file system has been set up, you can export this file system
through an NFS service at startup time by adding the following entry in the /etc/exports
file:
Listing 2. Sample /etc/exports listing
db2home InstanceOwnerServer (rw,sync,no_root_squash) db1(rw,sync,no_root_squash) |
where InstanceOwnerServer is the instance owner server name and db1 is the other
database server name.
The export file options are listed in Table 2:
Table 2. Export file options
| Option | Description |
|---|
| rw | File system is exported with read-write permission. | | sync | Server waits for files to be written to the disk before the next read. | | no_root_squash | Keeps the root permissions valid. |
- Execute the exportfs command to make the instance-owner directory available for NFS clients
to mount
/usr/sbin/exportfs -a
where option a is to export all directories listed in the /etc/exports file.
Setup NFS client on participating machines
Once the instance-owner directory has been made available, you can import the directory to
the remaining database partitioning servers.
- Create the shared directory in the other servers as follows:
mkdir /db2home
-
Add an entry to /etc/fstab in order to NFS mount the file system automatically at boot time:
Listing 3. Sample participating machines' /etc/fstab
server1:/db2home /db2home nfs rw,timeo=300,retrans=5,hard,intr,bg,suid |
where server1 is an instance-owner server name, and the other options are listed in
the following table:
Table 3. NFS mounting options
| Option | Description |
|---|
| rw | File system is exported with read-write permission. | | timeo | This value is in tenths of a second before sending the first retransmission after an RPC timeout. The default value is 7-tenths. | | retrans | The number of minor timeouts and retransmissions that must occur before a major action is either aborted or a "server not responding" message is printed on the console. | | hard | If an NFS file operation has a major timeout then report "server not responding" on the console and continue retrying indefinitely. This is the default. | | intr | If an NFS file operation has a major timeout and it is hard mounted, then allow signals to interrupt the file operation and operations to be interrupted. | | bg | If the first NFS mount attempt times out, retry the mount in the background. | | suid | This value permits the set-user-identifier bit (SUID) or set group-identifier bit (SGID) to take effect. |
- Mount the exported file system on the other database partitioning servers with the following command:
mount server1:/db2home /db2home
Verify the NFS lock daemon
DB2 needs an NFS lock daemon for sharing the instance-owner directory among all participating
machines in a cluster. You can verify if the NFS lock daemon is enabled by using the chkconfig command:
chkconfig nfslock
You can enable the NFS lock daemon and restart nfslock, with the following commands:
-
chkconfig nfslock on
-
/etc/rc.d/nfslock restart
Verify NFS lock and NSM (Network Status Monitor) services
The NSM service, rpc.statd, implements a reboot notification service, which is used by the NFS file
locking service (rpc.lockd). You can verify the status of these services using the following command:
-
ps -ef | grep rpc.statd
-
ps -ef| grep rps.lockd
Step 6. Create required groups and users
In a database partitioning environment, you need to create three groups and user IDs for DB2.
They are shown in the table below:
Table 4. Required DB2 groups and users
| User ID | Group ID | Description |
|---|
| db2inst1 | db2iadm1 | DB2 Instance user ID and group ID | | db2fenc1 | db2fadm1 | DB2 fenced user ID and group ID | | dasusr1 | dasadm1 | DB2 Administration Server user ID and group ID |
The DB2 instance owner ID and password should be the same on all participating machines, as well, its home directory should be located within the NFS-shared directory. The fenced ID should be configured similarly to the DB2 instance owner ID. The DAS user ID and password can be
different on all participating machines; however, you might want to keep the DAS user ID and password
the same for simple configuration and maintenance.
For example, you can create DB2 groups, create DB2 user IDs, and set initial passwords in all
machines with the following commands:
-
groupadd -g 999 db2iadm1
-
groupadd -g 998 db2fadm1
-
groupadd -g 997 dasadm1
-
useradd -u 1100 -g db2iadm1 -m -d /db2home/db2inst1 db2inst1 -p password1
-
useradd -u 1101 -g db2fadm1 -m -d /db2home/db22fenc1 db2fenc1 -p password2
-
useradd -u 1102 -g dasadm1 -m -d /home/dasadm1 dasusr1 -p password3
-
passwd db2inst1
-
passwd db2fenc1
-
passwd dasusr1
Step 7. Install DB2 UDB on instance-owning machine and all participating machines
The db2_install script can be used to install all DB2 ESE packages on your Linux system using
the RPM installation utility. You can use this script to first install DB2 on the instance-owning
machine and then on the participating machines.
Since DB2 must be installed on each of the participating machines, this is best accomplished by
putting the install code on a shared disk (like /db2home, in our case). You can also optionally record
your settings into a response file and then use that file to configure the remaining machines.
First, you must log on as a root user to install DB2:
- Enter the db2_install command to start the db2_install script, which is stored in the
root directory of the CD-ROM.
- Enter DB2.ESE when db2_install prompts you for the product keyword.
Step 8. Instance-owning machine configuration
This machine will be considered the instance-owning machine, as it stores and shares the instance
owner's home directory. It's within this directory that the sqllib directory exists, as well as the
Database Manager Configuration file (DBM CFG), Instance Profile Registry, Node directory, and System
Database directory. Since these files are shared, updates done to any one of them will be seen by all
participating machines. For configuring the instance-owning machine, you need to perform the
following steps:
-
Create a DB2 Instance
You should log in to the system using the DB2 instance owner ID, and create a DB2 instance shared
by all machines by executing db2icrt under /opt/IBM/db2/V8.2/instance:
-
cd /opt/IBM/db2/V8.1/instance
-
./db2icrt -u db2fenc1 db2inst1
where -u specifies the fenced ID.
It is important to note that a DPF Instance will actually exist separately on each participating
machine, all using the same instance name, password, and shared home directory. It is the database
and not the instance that is partitioned when using DPF.
-
Enable FCM
All participating machines must be interconnected by a communication facility, such as a
high-speed TCPIP network. The Fast Communications Manager (FCM) is designed to handle communication
between database partitioning servers. To enable FCM, a port range must be reserved in the
/etc/services file on all participating machines. DB2 provides a default port range that has been
reserved in the /etc/services file; by default DB2 modifies the /etc/services file to reserve
ports 60000-60003 for inter-partition communication and port 50000 for application communication.
DB2 reserves several ports for this during the instance creation. One port for each logical
partition is required and each machine will need to reserve the same ports. Each entry contains a
port name and port number.
In addition to the base configuration, the FCM ports should appear similar to the following:
Listing 4. FCM ports in /etc/services
DB2c_db2inst1 50000/tcp
#Add FCM port information
#
DB2_db2inst1 60000/tcp
DB2_db2inst1_1 60001/tcp
DB2_db2inst1_2 60002/tcp
DB2_db2inst1_END 60003/tcp |
As you see in Listing 4, the port name is composed of the instance name prefixed with "DB2_". The last
port name is composed of the instance name, prefixed with "DB2_" and suffixed by "_END". The
port names between the first one and the last one are suffixed by "_<sequence number>".
The port number must be followed by "/tcp".
-
Update DB2 instance parameters
Log into the machine using the instance owner ID and specify which protocols will be used by DB2. To initialize the protocol(s), you need to update the DB2COMM profile registry variable using the following command:
db2set DB2COMM=TCPIP
Additionally, you need to update the service name in the database manager (DBM) configuration
parameter to indicate which port name will be used for communicating with the DB2 server and remote
clients. This can be done using the following command:
db2 update dbm cfg using SVCENAME db2c_db2inst1
-
Create DB2 Administration Server (DAS)
You must have a DAS running if you want to use any GUI tools like the Configuration Assistant,
Control Center, Replication Center, or the Development Center. You can log on as the DAS user and
create a DAS server using the following command:
-
cd /opt/IBM/db2/V8.1/instance
-
./dascrt -u dasusr1
where option -u specifies the user ID of the DAS.
You can start the DAS server by logging on as the DAS administration user ID and executing the
command db2admin start. To start the DAS server automatically, you need
to use the DAS administration user ID and execute the following:
-
cd /opt/IBM/db2/V8.1/instance
-
./dasauto -on
To display autostart status, you can execute
./dasauto ?
-
Update db2nodes.cfg file
For the final configuration, you need to update $HOME/sqllib/db2nodes.cfg. The db2nodes.cfg
file contains configuration information that tells DB2 which partition on which server participates
in the instance. The file can have four columns:
- database partition number
- server hostname
- logical port number
- network interface for inter-partition communication (optional)
Each entry represents a database partition on a particular server that participates in the
database partitioning of the instance. For example, the following db2nodes.cfg file defines four
data partitions. The data partitions 0 and 1 are on server1, and the data partitions 2 and 3 on
server2. Each server has two logical ports (one for each logical partition), namely 0 and 1.
Listing 5. Sample db2nodes.cfg
0 server1 0
1 server1 1
2 server2 0
3 server2 1 |
-
Update the .rhosts file to enable remote commands
In order to enable the execution of remote commands, each database partition must have the
authority to perform remote commands on all other participating server(s). Hence, you must supply
an entry for each server in the /db2home/db2inst1/.rhosts file. The .rhosts file should have
two columns, which are hostname and instance owner ID. For example, the following entry in
.rhosts file represents two servers participating in an instance.
Listing 6. Sample .rhosts
server1 db2inst1
server2 db2inst1 |
You should ensure that only root will have read and write access by using:
chmod 600 /db2home/db2inst1/.rhosts
Step 9. Configure on the participating machines
For configuring the participating machines, you need to do the following steps:
-
Enable FCM
You must log on to each participating machine as the root user and add identical port names and
port numbers for FCM definitions to the /etc/services file.
-
Create DB2 Administration Server
You have to install DAS in all participating machines. For details, refer to Create DB2
Administration Server in Configure on the instance-owning machine.
Step 10. Verification
Once all the updates are done, you might want to reboot the servers, and execute the command
db2_all date on all the participating servers to ensure successful
completion.
Important concepts
The following are some important concepts that are applicable to DB2 with DPF, and you should be
aware of them.
Catalog partition
The DB2 catalog is a set of tables and views that are maintained by the database manager. These
tables and views contain metadata; that is, information about the database and its objects, such
as descriptions of tables, views, and indexes and security information about the authority that
users have on these objects.
The catalog partition is the database partition where the catalog tables for the database are
stored. Each database in a partitioned database environment can have its catalog partition on a
different database partition server. The catalog partition for a database is automatically created
on the database partition server where the CREATE DATABASE command is
run.
Coordinator partition
User interaction with a partitioned database occurs through one database partition, known as
the coordinator partition for that user. The coordinator partition runs on the same database
partition as the application, or in the case of a remote application, the database partition to
which that application is connected. Any database partition can be used as or can become a coordinator
node.
The coordinator partition is called what it is because of the coordinator agent that runs on it, communicating with the application. You can control which partition you use as the coordinator partition; for example, to connect to partition 3, issue:
-
set client connect_dbpartitionnum 3
-
connect to SAMPLE
Database partition groups
A database partition group is a set of one or more database partitions. Before creating tables
for the database, you first create the database partition group where the tablespaces will be
stored, and then you create the tablespace where the tables will be stored.
You can define named subsets of one or more database partitions within a database. Each subset
is known as a database partition group. Each subset that contains more than one database partition is
known as a multi-partition database partition group. Multi-partition database partition groups can
only be defined with database partitions that belong to the same instance. A database partition can
belong to more than one partition group.
For example, Figure 5 shows a database partition group made of five partitions:
Figure 5. Sample of 3 database partition groups
In this case, the 'Database partition group 1' spans all but one of the database partitions. It
also contains an additional single-partition database partition group called 'Database Partition
Group 2.' There is also a single database partition within 'Database Partition Group 3' that is
shared with Database Partition Group 1.
A new database partition group can be created with the
CREATE DATABASE PARTITION GROUP statement and modified using the
ALTER DATABASE PARTITION GROUP statement.
Data is divided across all the partitions in a database partition group and you can add or drop
one or more database partitions from a database partition group. Each database partition that is part
of the database system configuration must already be defined in a partition configuration file
called db2nodes.cfg. A database partition group can contain as little as one database partition, or
as much as the entire set of database partitions defined for the database system.
When a database partition group is created or modified, a partitioning map is associated with it.
A partitioning map, in conjunction with a partitioning key and a hashing algorithm, is used by the
database manager to determine which database partition in the database partition group will store a
given row of data. In a non-partitioned database, no partitioning key or partitioning map is
required. These concepts are discussed in more detail shortly.
A database partition is a part of the database, complete with user data, indexes, configuration
files, and transaction logs. There are several default database partition groups that are created
when a database is created:
- IBMCATGROUP - default database partition group for the tablespace containing the system catalogs.
- IBMTEMPGROUP is - default database partition group for system temporary tablespaces.
- IBMDEFAULTGROUP - default database partition group for the tablespaces containing the user tables.
A user temporary tablespace, for declared temporary tables, can be created in IBMDEFAULTGROUP or
any user-created database partition group, but not in IBMTEMPGROUP.
To create a new partition group consisting of all partitions in db2nodes.cfg, issue the following
command:
create database partition group ALLPART on all dbpartitionnums
If instead you wanted to create a new partition group, only consisting of partitions 1 and 2, you
could issue a command similar to the following:
create database partition group PART12 on dbpartitionnums (1,2)
Partition compatibility
The base data types of corresponding columns of partitioning keys are compared and can be
declared partition compatible. Partition-compatible data types have the property that two variables,
one of each type, with the same value, are mapped to the same partition number by the same
partitioning algorithm. Partition compatibility has the following characteristics:
- A base data type is compatible with another of the same base data type.
- Internal formats are used for DATE, TIME, and TIMESTAMP data types and are not compatible with
each other, and none are compatible with CHAR.
- Partition compatibility is not affected by columns with NOT NULL or FOR BIT DATA definitions.
- NULL values of compatible data types are treated identically; those of non-compatible data
types may not be.
- Base data types of a user-defined type are used to analyze partition compatibility.
- Decimals of the same value in the partitioning key are treated identically, even if their
scale and precision differ.
- Trailing blanks in character strings (CHAR, VARCHAR, GRAPHIC, or VARGRAPHIC) are ignored by
the hashing algorithm.
- BIGINT, SMALLINT, and INTEGER are compatible data types.
- REAL and FLOAT are compatible data types.
- CHAR and VARCHAR of different lengths are compatible data types.
- GRAPHIC and VARGRAPHIC are compatible data types.
- Partition compatibility does not apply to LONG VARCHAR, LONG VARGRAPHIC, CLOB, DBCLOB, and
BLOB data types, because they are not supported as partitioning keys.
Collocation
Collocation is the placement of rows from different tables that contain related data in the same
database partition. Collocated tables allow DB2 to use more efficient join strategies.
You may discover that two or more tables frequently contribute data in response to certain queries.
In such cases, you will want related data to be located as close together as possible. In an
environment where the database is physically divided among two or more database partitions, there
must be a way to keep the related pieces of the divided tables as close together as possible, which
is called table collocation.
DB2 recognizes, when accessing more than one table for a join or a subquery, that the data to be
joined is located at the same database partition. DB2 can then choose to perform the join or subquery
at the database partition where the data is stored, instead of having to move data between database
partitions. This localized join/subquery ability has significant performance advantages.
For collocation to occur, tables must:
- Be in the same database partition group, one that is not being redistributed. (During
redistribution, tables in the database partition group may be using different partitioning
maps -- they are not collocated.)
- Have partitioning keys with the same number of columns.
- Have the corresponding columns of the partitioning key be partition compatible.
Collocation can also occur when a table is located within a single partition database partition group defined on the same partition as another table.
Bufferpools
Bufferpools are an area of memory into which data pages are read, modified, and held during
processing. By default, when you create a bufferpool it will be created on each partition. For ease
of management, it is usually best to keep the bufferpools the same size on each partition; however,
it is possible to change the size on specific partitions. For example, to customize the size of
partition four's 'BUF8K' bufferpool to 400MB, we could issue:
alter bufferpool BUF8K dbpartitionnum 4 size 51200
You can also selectively choose which partitions to create a bufferpool on by specifying a
database partition group, in which case the bufferpool will only be created on partitions that are
in that database partition group.
Tablespaces
A tablespace is an abstraction of a collection of containers into which database objects are
stored. A tablespace provides a level of indirection between a database and the tables stored within
the database. A tablespace has space on storage devices assigned to it. The data, index, long field,
and LOB portions of a table can be stored in the same tablespace, or can be individually broken out
into separate tablespaces for performance reasons.
Database partition expressions
Database Partition Expressions are typically used when defining containers of a tablespace
across multiple logical database partitions of the same machine, to avoid having the same
path/device name used by more than one logical partition. This ensures that container names
are unique across partitions, by using the database partition number as part of the container
name. This is an alternative to manually specifying the location for each partition.
You use the argument " $N" ([blank]$N) to indicate a database partition expression and it can
be used anywhere in the container name, and multiple database partition expressions can be
specified. Terminate the database partition expression with a space character; whatever follows
the space is appended to the container name after the database partition expression is evaluated.
If there is no space character in the container name after the database partition expression, it
is assumed that the rest of the string is part of the expression.
The argument can only be used in one of the following forms (for this example, assume the
partition number is 5):
Table 5. Common database partition expressions
| Syntax | Example | Value |
|---|
| [blank]$N | " $N" | 5 | | [blank]$N+[number] | " $N+1011" | 1016 | | [blank]$N%[number] | " $N%3" (% is modulus)
| 2 | | [blank]$N+[number]%[number] | " $N+12%13" | 4 | | [blank]$N%[number]+[number] | " $N%3+20" | 22 |
So, if strings similar to the above were used and our partition was still 5, we would see:
Table 6. Database partition expressions in use
| Example | Container Name |
|---|
| '/dbdir/node $N /cont1' | '/dbdir/node5/cont1' | | '/ $N+1000 /file1' | '/1005/file1' | | ' $N%10 /container' | '5/container' | | '/dir/ $N%5+2000 /dmscont' | '/dir/2000/dmscont' |
If we issued the following statement on a two-partition database:
CREATE TABLESPACE TBSP1 MANAGED BY DATABASE USING (device '/dev/container $N' 10000)
We would see the following containers created:
- /dev/container0 - on DATABASE PARTITION 0
- /dev/container1 - on DATABASE PARTITION 1
Partitioning keys
A partitioning key is a column (or group of columns) that is used to determine the partition in
which a particular row of data is stored.
A partitioning key is defined on a table with the CREATE TABLE
statement. If one is not supplied, one is created by default from the first column of the primary
key. If there is no primary key, the default partitioning key is the first non-long/non-LOB
column defined on that table. If no columns satisfy the requirement for a default partitioning
key, the table is created without one, which means the table cannot span more than one partition.
You should bear mind of the following considerations when choosing a partitioning key:
- How tables are to be accessed
- The nature of the query workload
- The join strategies employed by the database system
If collocation is not a major consideration, a good partitioning key for a table is one that
spreads the data evenly across all database partitions in the database partition group. The
partitioning key for each table in a tablespace that is associated with a database partition
group determines if the tables are collocated.
An inappropriate partitioning key can cause uneven data distribution. Columns with unevenly
distributed data and columns with a small number of distinct values should not be chosen as a
partitioning key. The number of distinct values must be great enough to ensure an even distribution
of rows across all database partitions in the database partition group. The cost of applying the
partitioning hash algorithm is proportional to the size of the partitioning key. The partitioning
key cannot be more than 16 columns, but fewer columns result in better performance. Unnecessary
columns should not be included in the partitioning key.
The following points should also be considered when defining partitioning keys:
- Creation of a multiple partition table that contains only long data types (LONG VARCHAR,
LONG VARGRAPHIC, BLOB, CLOB, or DBCLOB) is not supported.
- The partitioning key definition cannot be altered.
- The partitioning key should include the most frequently joined columns.
- The partitioning key should be made up of columns that often participate in a GROUP BY clause.
- Any unique key or primary key must contain all of the partitioning key columns.
- In an online transaction processing (OLTP) environment, all columns in the partitioning key
should participate in the transaction by using equal (=) predicates with constants or
host variables. For example, assume you have an employee number, emp_no, that is often
used in transactions such as:
UPDATE emp_table SET ... WHERE emp_no = host-variable
In this case, the EMP_NO column would make a good single-column partitioning key for
EMP_TABLE.
Hash partitioning is the method by which the placement of each row in the partitioned table
is determined. The method works as follows:
- The hashing algorithm is applied to the value of the partitioning key, and generates a
partition number between zero and 4095.
- The partitioning map is created when a database partition group is created. Each of the
partition numbers is sequentially repeated in a round-robin fashion to fill the
partitioning map.
- The partition number is used as an index into the partitioning map. The number at that
location in the partitioning map is the number of the database partition where the row
is stored.
Partitioning maps
In a partitioned database environment, the database manager must have a way of knowing which
table rows are stored on which database partition, so that it can find the data it needs. It uses
a map, called a partitioning map, to find the data.
A partitioning map is an internally generated array containing either 4,096 entries for
multiple-partition database partition groups, or a single entry for single-partition database
partition groups.
For a single-partition database partition group, the partitioning map has only one entry
containing the partition number of the database partition where all the rows of a database table
are stored. For multiple-partition database partition groups, the partition numbers of the
database partition group are specified in a round-robin fashion. Just as a city map is organized
into sections using a grid, the database manager uses a partitioning key to determine the
location (the database partition) where the data is stored.
For example, assume that you have a database created on five database partitions
(numbered 0-4). The partitioning map for the IBMDEFAULTGROUP database partition group of this
database would be:
0 1 2 3 4 0 1 2 3 4 0 1 2...
If a database partition group had been created in the database using database partitions 1
and 2, the partitioning map for that database partition group would be:
1 2 1 2 1 2 1 2...
If the partitioning key for a table to be loaded in the database is an integer that has
possible values between 1 and 500,000, the partitioning key is hashed to a partition number
between 0 and 4 095. That number is used as an index into the partitioning map to select
the database partition for that row.
Partitioning maps are a flexible way of controlling where data is stored in a partitioned
database. There is also a redistribution utility that allows you to modify (rebalance or skew)
the data distribution across the database partitions in your database, but that is beyond
the scope of this article.
DB2NODE environment variable
The DB2NODE Environment Variable is used to specify the target logical partition that you want
to attach to or connect to. If this variable is not set, it will default to the partition that
is defined with port 0 on the machine (in the db2nodes.cfg file).
If you wanted to connect to logical partition 2, you would enter the following commands:
-
DB2NODE=2
-
export DB2NODE
-
db2 terminate
The terminate command is necessary to ensure that the change to
takes effect.
To identify the current active logical node, you can issue the following command:
db2 "values (current dbpartitionnum)"
Issuing commands across machines and partitions
These two tools make the task of executing commands across all machines or all partitions, easy.
Both tools share the same useful set of options, which can be displayed with either
"rah ?" or "db2_all ?" from a DB2 Command Window.
db2_all
db2_all is used to run a command on all the specified partitions.
db2_all ";db2 update db cfg for SAMPLE using NEWLOGPATH /db2_db/logs"
The semi-colon is used as a prefix to indicate that the command is to run concurrently on partitions.
rah
rah is used to run a command on all machines that make up the partitioned environment.
Database backup
When performing an offline database backup, the catalog partition needs to be backed separately;
however, this is not a concern with online backups, as the logs are included in the backup image.
For example, we have a database called sample and a directory /dev/backup, which is accessible from
all partitions.
First we need to back up the catalog partition, which resides on partition 0 only by specifying
"<<+0<"" (in this example):
db2_all '<<+0< db2 BACKUP DATABASE sample TO /dev/backup'
Next we back up the remaining partitions (exclude partition 0) by specifying
"<<-0<":
db2_all '|<<-0< db2 BACKUP DATABASE sample TO /dev/backup'
Note that the prefix sequence "|" will run the BACKUP commands in sequence. Now, each partition's
backup image can be found in the /dev/backup directory.
Row distribution of a table
The DBPARTITIONNUM function can be used to determine the partition that a row resides on. For
example, if used in a SELECT clause, it returns the partition number for each row of the table
that was used to form the result of the SELECT statement.
The accepted argument must be the qualified or unqualified name of a column in a table and can
contain any data type. The data type of the result is INTEGER and is never null. Since row-level
information is returned, the results are the same, regardless of which column is specified for
the table. If there is no db2nodes.cfg file, the result is 0.
For example:
select lastname, dbpartitionnum(lastname) as part_num from employee order by 1
Listing 7. Result set using dbpartitionnum function
LASTNAME PART_NUM
--------------- -----------
ADAMSON 0
BROWN 2
GEYER 1
GOUNOT 1
HAAS 1
HENDERSON 0
JEFFERSON 1
JOHNSON 1
JONES 3 |
Node and database directories
A node directory is a directory that contains information that is necessary to establish
communications from a client workstation to all applicable database servers.
A database directory is a directory that contains database access information for all databases
to which a client can connect. The entries also reference the node directory.
Design considerations
The DB2 Design Advisor is the most straightforward tool for obtaining efficient partitioning
suggestions and can be accessed from the Control Center GUI or db2advis command line tool. An
updatable partitioning map is used with a hashing algorithm to specify the mapping of partitioning
key to database partitions, which is used to determine the placement and retrieval of each row
of data.
A workload can then be spread across multiple partitions for large tables, while allowing
smaller tables to be stored on one or more database partitions. Since each database partition has
local indexes on its data there is increased performance for local data access.
DB2 also supports partial de-clustering, where tables and tablespaces can be spread across a
subset of the available partitions. Depending on the number of database partitions, you may have
one or more single-partition database partition groups, and one or more multi-partition database
partition groups. Each partition must use a unique partition number and the same database partition
may be found in one or more database partition groups.
To ensure fast recovery of the partition containing system catalog tables, avoid placing user
tables on the same database partition. This is accomplished by placing user tables in database
partition groups that do not include the database partition in the IBMCATGROUP database partition
group.
Small tables should be placed in single-partition database partition groups, except when you
want to take advantage of collocation with a larger table. Avoid extending medium-sized tables
across too many database partitions. For example, a 100 MB table may perform better on a 16-partition
database partition group than on a 32-partition database partition group.
You can use database partition groups to separate online transaction processing (OLTP) tables
from decision support (DSS) tables, to ensure that the performance of OLTP transactions are not
adversely affected.
In a multi-partition database partition group, you can only create a unique index if it is a
superset of the partitioning key.
When creating a database, make sure to specify a local (not shared) directory in the "ON"
clause for the database location. For example,
CREATE DATABASE SAMPLE ON /db2_db
where /db2_db is a pre-existing local directory.
The default database path (DBTDBPATH) parameter in the database manager configuration,
defaults to the location of the home directory of the instance owner (which is shared by NFS).
When creating a database without specifying the database location, it will create a database
using DBTDBPATH which points to the shared instance-owner directory. This will degrade
performance.
Once the database is created, you should ensure that each data partition should also have its
own log local directory. You can use the following command:
db2_all ";db2 update db cfg for SAMPLE using NEWLOGPATH /db2_db/logs"
The node number is automatically appended to the path. This is done to maintain the
uniqueness of the path in multiple logical node configurations.
Conclusion
This article walked you through the rationale of using the Data Partitioning Feature (DPF) in DB2 UDB,
while providing a detailed explanation of the DPF installation process and the important concepts
and design considerations for enabling DPF in a SUSE Linux Enterprise environment. With a better
understanding of DB2 DPF with SUSE Linux environment, you can jumpstart enabling DPF in your SUSE Linux
environment.
Resources Learn
Get products and technologies
Discuss
About the authors  | 
|  | Fraser McArthur is a consultant at the IBM Toronto Lab, where DB2 UDB for distributed platforms (Windows/Unix) is developed. He is a member of the Data Management Partner Enablement organization where he works with IBM Business Partners to migrate applications to DB2 and perform performance tuning. Fraser is a DB2 Certified Solutions Expert in both DB2 Administration and Application Development. |
 | 
|  | Samuel Poon is a DB2 Consultant from the IBM Toronto Laboratory and works primarily with IBM Business Partners. Sam has worked on several articles published on developerWorks. You can reach him at: poons@ca.ibm.com. |
Rate this page
|