A colorful introduction to DB2 UDB, Version 8 for UNIX, Linux, and Windows: Overview of the Database Partitioning Feature

Part 2 of 2

This article provides an introduction to DB2 UDB for Linux, UNIX and Windows Version 8 with the Database Partitioning Feature (DPF). It is the second part of a two part series.

Share:

Raul Chong (rfchong@ca.ibm.com), Database consultant, IBM

Raul F. Chong is a database consultant from the IBM Toronto Laboratory and works primarily with IBM Business Partners. Raul has worked at IBM for six years, three years in DB2 Technical Support, and the other three as a consultant specializing in database application development and migrations from other RDBMS to DB2.


developerWorks Contributing author
        level

18 March 2004

Introduction

In part 1 of this article series, you were introduced to the IBM® DB2® Universal Database™ (UDB) for Linux®, UNIX®, and Windows® environments. Instances, configuration parameters, connectivity information, and different database objects were described. In this second part of the series we introduce you to the Database Partitioning Feature (DPF) available on DB2 UDB Enterprise Server Edition (ESE). With DPF your database is scalable as you can add new machines and spread your database across them. This means more CPUs, more memory and more disks from each of the additional machines for your database!. DB2 UDB ESE with DPF is ideal to manage data warehousing, data mining and online analytical processing (OLAP) workloads. It can also work well with online transaction processing (OLTP) workloads. Note that there is no technical requirement to enable this feature other than purchasing the required license. From a user's perspective, you will connect to the database and issue queries as usual without a need to know the database is spread among several machines.

What you learned in part 1 where we discussed single-partition environments is still applicable to a multi-partition one; all the concepts remain the same, though we need to point out some implementation differences. For example, we explain in which machine your configuration files, connectivity information and so on reside. We also introduce a few new concepts like partitions, partition groups, coordinator node, and so on which were not relevant in a single-partition environment.

As in the first part, figures are used to visually explain concepts. In this article, I use DB2 to refer to DB2 Universal Database Enterprise Server Edition Version 8.1 for Linux, UNIX and Windows. I would suggest you print this article rather than reading it online, as I will reference many of the figures all along. Concepts will be explained briefly at first, and will then be tied in last section in a case study.

Note: Prior to Version 8, DB2 UDB ESE with DPF was known as DB2 UDB Enterprise Extended Edition (EEE).

Database Partitions

When a database is partitioned, you split your database into different independent parts, each consisting of its own data, configuration files, indexes and transaction logs. Each of these parts is a database partition. You can assign multiple partitions to a single physical machine. These are sometimes called 'logical partitions' and they share the resources of the machine.

A single-partition database is a database with only one partition. We talked about this type of environment in part 1 of this article series.

A multi-partition database (also referred to as a partitioned database), is a database with two or more partitions. This type of database is what we discuss in this part of this article series. Depending on your hardware environment, there can be several configurations in which you can partition your database. Figure 1 shows configurations of physical partitions, one partition per machine.

Figure 1 - Database partition configurations with one partition per machine
Database partition configurations with one partition per machine
Note: In Figure 1, the SMP systems can be replaced by uniprocessor systems. Uniprocessor systems are fairly rare now for environments using DB2 with DPF, therefore we have omitted them from the figure.

Figure 2 shows more multi-partition configurations with several partitions per machine.

Figure 2 - Database partition configurations with multiple partitions per machine.
Figure 2 - Database partition configurations with multiple partitions per machine.
Note: Prior to Version 8, the term 'node' was used instead of 'partition'. With version 8 there may still be some concepts, commands and messages using this term. Also note that the node directory concept has no relationship whatsoever to the partition concept.

In order to leverage the knowledge gained in part 1 , and to visualize how a DB2 environment is split in a DPF system, we show in Figure 3 a partial reproduction of the figure in part 1 of the article series, and how it has been split into three physical partitions, one partition per SMP machine. Thus, we show how the DB2 environment changes from a single partition multi-processor configuration (shown in I of Figure 1) to a multi-partition multi-processor configuration (shown in II of Figure 1).

We have changed the machine in the original figure to use the Linux operating system instead of the Windows operating system. DPF is more popular in Linux/UNIX environments than in Windows environments, therefore we will focus our discussion on these platforms, though most concepts are the same in Windows.

Figure 3 - The DB2 Environment in DB2 UDB ESE with DPF
The DB2 Environment in DB2 UDB ESE with DPF

As you can see from Figure 3, the DB2 environment is 'split' so that it now resides in three different multi-processor machines running the same operating system (Linux, for this example), and also running the same DB2 version (version 8.1) at the same fixpak level. Note it is not the purpose of this figure to illustrate how to migrate from an single-partition database system to a multi-partition database system, but to visually show where files would be located on a new installation of a multi-partition system.

It is also important to note, all of the machines participating in a DPF environment have to be interconnected by a communication facility which could be a network using the TCPIP protocol. Several TCPIP ports are reserved on each machine for this 'interpartition' communication. For example, by default after installation, the services file in Linux (/etc/services) is updated as follows (assuming you chose to create the db2inst1 instance):

DB2_db2inst1		60000/tcp
DB2_db2inst1_1		60001/tcp
DB2_db2inst1_2		60002/tcp
DB2_db2inst1_END	60003/tcp
db2c_db2inst1		50000/tcp

The ports 60000 through 60003 are reserved for inter-partition communication by default. You need to update the services file with the correct number of entries depending on the number of partitions you are configuring.

When partitions within the same machine are involved, communication between the partitions still requires this setup. Only in AIX can you force inter-partition communication to be performed in memory by setting the DB2 registry variable DB2_FORCE_FCM_BP to YES.

Port 50000 is used by default to connect to the db2inst1 instance. When a client needs to connect to a DPF system, it needs to issue some catalog commands as described in part 1 of this article series, and also explained in the Connectivity Cheatsheet article. The host name to be used in these commands can be from any of the machines participating in the DPF environment. The machine selected would become the coordinator node, unless the NODE option of the CONNECT statement is used. The concept of coordinator node is described later in this article.

Note: Note: The DB2 entries in the services file for each of the participating machines in a DPF environment should be the same.

The node configuration file (db2nodes.cfg)

The db2nodes.cfg file contains information about the database partitions and the servers they reside on that will participate in an instance. The information of this file is used in some cases to determine the command scope. The section issuing commands and SQL statements in a DPF environment' later in this article discuss in more detail command scope.

Figure 4 shows an example of the db2nodes.cfg file for a cluster of four UNIX servers where two partitions on each server are created (option IV in Figure 2).

Figure 4 - The db2nodes.cfg for a cluster of four UNIX servers with 2 logical partitions per server
The db2nodes.cfg for a cluster of four UNIX servers with 2 logical partitions per server

The columns in a Linux/UNIX environment for the db2nodes.cfg file are:

Partition_number Hostname  Logical_Port netname  resourcename

The partition number, the first column in the db2nodes.cfg file, indicates the number that identifies the partition within DB2. In the figure, we see there will be 8 partitions in total starting from partition 0. The numbering of the partitions must be in ascending order, and can start from any number. Gaps between the numbers are also allowed. Note that the numbering used will be important as it will be taken into consideration in commands or SQL statements as we will see in subsequent sections of this article.

The second column is the TCP/IP hostname of the server where the partition is created.

The third column is optional, but must be used if you create more than one partition on the same server. This column specifies the logical port for the partition within the server and must be unique within a server. For the example highlighted in Fgure 4, we can see the mapping between the db2nodes.cfg entry for server 'myserverb', and how the physical machine looks like.

The fourth column, the netname, is optional and is used to support a host that has more than one active TCP/IP interface, each with its own hostname.

The fifth column, resourcename, is optional an applies only to AIX, HP-UX, and Solaris operating systems. It specifies the operating system resource that the partition should be started in.

On Windows, there is also another column, the computer name, which contains the computer name for the machine on which a partition resides. The order of the columns is:

Partition_number  Hostname	Computer_name    Logical_Port    netname

The db2nodes.cfg file must be located as follows:

  • Under the SQLLIB directory for the instance owner on Linux and UNIX
  • Under the SQLLIB\<Instance name> directory on Windows

In Figure 3, it would be included in the Linux3 machine (We did not show it there because we did not want to introduce new concepts on that figure).

On Linux and UNIX you can edit the db2nodes.cfg file with any ASCII editor or using DB2 commands. On Windows, you can only use the db2ncrt and db2ndrop commands to create and drop database partitions; the db2nodes.cfg file should not be edited directly.

You can also use the db2start command to add and or remove a database partition from the DB2 instance and the db2nodes.cfg file using the add dbpartitionnum and drop dbpartitionnum clauses respectively.


Installing DB2 in a DPF environment

In a DPF environment each of the participating machines needs to have DB2 installed locally. To ease the installation of DB2 in other machines, we recommend you to create a response file when you first install DB2 on the first machine. Once this response file is created, you can transfer the file to the other machines and run an unattended installation using the command db2setup -r <response file> on each machine.

The DB2 install code from the installation CD can reside on a shared disk, from which you proceed to locally install DB2. For example, in Figure 3 let's assume the Linux2 machine has an associated disk where the install code from the installation CD has been copied. This disk is set up so that it can be shared by the other two machines, Linux1 and Linux3. Next, from each of the machines, the db2setup command can be executed to locally install DB2, therefore each machine will have its own local directory /opt/IBM/db2/V8.1 containing the DB2 installed binaries. You may think an alternative would be to install DB2 locally only on the Linux2 machine, and share the installed binaries with Linux1 and Linux3, however this configuration is not supported.

Note: In Linux, the DB2 installed binaries are locally stored in directory /opt/IBM/db2/V8.1. In other platforms DB2 is installed in other directories. Use the db2level command to determine where the installed code has been locally installed in your machine.

An instance in the DPF environment

Partitioning is a concept that applies at the database level, not the instance level. Figure 3 may give you the impression that an instance can be partitioned. What in fact happens in a DPF environment is that an instance is separately created on each machine, however, they all use the same characteristics: The same instance name, same password, and a shared home directory.

In Linux and UNIX, an instance maps to an operating system user, therefore, when an instance is created, it will have its own home directory. For example, in most installations /home/<user name> is used as the home directory. All the instances separately created on each of the participating machines of a DPF environment must use the same name and password. In addition, you must specify the home directory of the corresponding operating system user to be the same directory for all instances, which must be created in a shared disk (NFS share). Figure 5 shows an example.

Figure 5 - An instance in a partitioned environment
An instance in a partitioned environment

In Figure 5, the instance 'myInst' has been created separately in each of the three Linux machines (Linux1, Linux2 and Linux3). myInst maps to an operating system user of the same name which in the figure has a home directory of /db2home/myInst. Note that /db2home/myInst is shared by all the three Linux machines as it resides in a shared disk which is local to Linux3. Since the instance owner directory is locally stored in the Linux3 machine, this machine is considered to be the DB2 Instance owning machine.

Figure 5 also shows that the Database Administration Server (DAS) user, db2as, is created locally on each participating machine of a DPF environment. There can only be one DAS per physical machine regardless of the number of partitions that machine contains. The DAS user's home directory cannot be mounted on a shared disk. Alternatively, different userid and passwords can be used to create the DAS in different machines.

Note: In each of the participating machines of a DPF environment, make sure the passwords for the instances are the same, otherwise the partitioned system will look like it is hanging as the partitions are not able to talk with each other.

The instance owning machine

As indicated earlier, the DB2 Instance owning machine in a DPF environment is the one whose associated disk physically stores the instance home directory. The instance owning machine as shown in Figure 3 has several files under the sqllib directory or its subdirectories in the home directory:

  • Instance level profile registry
  • Database Manager Configuration File (dbm cfg)
  • System db directory
  • Node directory
  • DCS directory

If you change the value of any of the parameters or connectivity information in these files, all participating machines will be able to see the change as only one shared file is being updated. For example, if you update the dbm cfg parameter intra_parallel from NO to YES using a command like this from the Linux1 machine:

db2 update dbm cfg using intra_parallel yes

and then you issue the following command from the Linux2 machine:

db2 get dbm cfg

You are able to see from Linux2 that the change took effect as you are updating one single shared file physically stored on machine Linux3. For parameters that are not configurable online, a db2stop/db2start is required. Issuing these commands from any partition will affect all partitions specified in the db2nodes.cfg file. In a Linux/UNIX environment, you will receive messages from each of the partitions. The case study will demonstrate how this is performed.


Partitioning a database

In order to 'partition' your database, create a directory with the same name locally in each of the different participating machines. For figure 1 for example, let's assume the directory /data has been created locally in each machine. Then, when you execute the command:

create database MYDB2 on /data

the following directory structure is automatically built for you:

/data/<instance name>/NODExxxx/SQLyyyyy

where:

/data: Directory specified in the create database command. This directory must exist before executing the create database command.

<instance name>: Name of the instance. For example 'myInst'

NODExxxx: This is used to distinguish which partition you are working with, where 'xxxx' represents the number of the partition as specified in the db2nodes.cfg file which we will discuss later in this article

SQLyyyyy: This identifies the database, where 'yyyyy' represents a number. If you only have one database created in your system, then yyyyy is equal to 00001, if you have 3 databases in your system, you will have different directories as follows: SQL00001, SQL00002, SQL00003. In order to map the database names to these directories, you can review the local database directory using the command:

db2 list db directory on /data

Inside the SQLyyyyy directories, there will be subdirectories for table spaces, and within them, files containing database data (assuming all table spaces are defined as SMS). Figure 6 shows a visual example of a partitioned database.

Figure 6 - A partitioned database created in the /data directory
A partitioned database created in the /data directory

If you don't specify a path in your create database command, by default the database is created in the directory specfied by the dbm cfg parameter DFTDBPATH, which defaults to the the instance owner home directory. The structure created under this scenario would be:

/db2home
   /myInst
      /NODE0000
          /SQL00001
      /NODE0001
          /SQL00001
      /NODE0002
          /SQL00001

This partitioning is not optimal as all the database data would reside in one single disk which is shared by the other machines. Creating a database like this in a DPF environment defeats the purpose of partitioning.

Note: Before creating a database, change the value of dbm cfg parameter DFTDBPATH or make sure to include a path in your create database command. In either case, the DFTDBPATH or path should point to a directory created locally with the same name in all the participating machines of your DPF system. This directory must be created ahead of time. Similarly, to create the SAMPLE database, specify the path in the command as in: db2sampl <path>

Configuration Files in a DPF environment

In a previous section we indicated that the instance level profile registry, the database manager configuration (dbm cfg) file, the system database directory, node directory and DCS directory are all part of the instance owning machine, and are not partitioned. What about the other configuration files?

Environment variables:
Each participating machine in a partitioned environment can have different environment variables.

Global level profile registry variable:
This is stored in file default.env, in a subdirectory under /var. There is a different local copy of this file for each machine.

Database Configuration File (db cfg):
The db cfg is stored in file SQLDBCON, in the corresponding directory for the database in question. In a partitioned database environment, a separate SQLDBCON file exists for each database partition.

The local database directory:
The local database directory is stored in file SQLDBDIR in the corresponding directory for the database in question. It has the same name as the system db directory, which is located under the instance directory. A separate SQLDBDIR file exists for each database partition.

Note: The values of the Global level profile registry variables, database configuration file parameters and local database directory entries may be different at each database partition, however the recommendation is that each of the values on all partitions be the same.

Logs in a DPF environment

Each database partition should keep its logs separate from each other. Therefore the db cfg parameter 'Path to log files' in each of the partitions should not point to a shared disk directory, but to a local directory. The default log path in each partition will include a NODE000x subdirectory as part of the log path. For example, the value of this parameter in the DPF system shown in Figure 3 could be:

For Partition 0:  /datalogs/db2inst1/NODE0000/SQL00001/SQLOGDIR/
For Partition 1:  /datalogs/db2inst1/NODE0001/SQL00001/SQLOGDIR/
For Partition 2:  /datalogs/db2inst1/NODE0002/SQL00001/SQLOGDIR/

If you were to manually update this path, use the db cfg parameter NEWLOGPATH. As we will see in a later section this command can be executed once and affect all partitions using the db2_all command.


The Catalog Partition

When a database is created, several table spaces are created by default. One of them is SYSCATSPACE which contains the DB2 Catalog. In a DPF environment, SYSCATSPACE cannot be partitioned, but must reside in one partition, known as the Catalog Partition. The partition from which the create database command is issued becomes the Catalog partition for the new database. All access to system tables must go through this database partition.

Figure 3 shows SYSCATSPACE residing on machine Linux1, therefore the create database command was issued from this partition and this becomes the Catalog partition. If you create several databases for a particular instance, make sure to issue the create database command from different partitions so that their corresponding Catalog tables are created on different partitions.

For an existing database, you can determine which partition is the Catalog partition by issuing the command list db directory The output of this command has the field 'Catalog database partition number' for each of the entries indicating the Catalog partition number of the given database. Alternatively, you can connect to the database from each of the different machines and issue the list tablespaces show detail command. Only the Catalog partition will show the SYSCATSPACE table space.


Partition Groups

A partition group is a logical layer that allows the grouping of one or more partitions to perform operations on all the partitions in the group. A database partition can belong to more than one partition group. When a database is created, DB2 will create three default partition groups which cannot be dropped:

  • IBMDEFAULTGROUP: This is the default partition group for any table you create. It consists of all database partitions as defined in the db2nodes.cfg. This partition group cannot be modified. Table space USERSPACE1 is created in this partition group.
  • IBMTEMPGROUP: This partition group is used by all system temporary tables. It also consists of all database partitions as defined in the db2nodes.cfg. Table space TEMPSPACE1 is created in this partition.
  • IBMCATGROUP: This partition group contains the catalog tables (table space SYSCATSPACE), and thus, it only includes the database's catalog partition. This partition group cannot be modified.

To create new database partition groups, you can use the create database partition group statement. This statement will create the database partition group within the database, assign database partitions that you specified to the partition group, and then record the partition group definition in the database system catalog tables.

For example, the following statement creates a partition group 'pgrpall' on all partitions specified in the db2nodes.cfg file:

create database partition group pgrpall on all dbpartitionnums

To create a database partition group 'pg23' consisting of partitions 2 and 3, issue this command:

create database partition group pg23 on dbpartitionnums (2,3)

Other relevant partition group statements/commands are:

  • alter database partition group, statement to add or drop a partition in the group
  • drop database partition group, statement to drop a partition group
  • list database partition group, command to list all your partition groups (IBMTEMPGROUP will not be listed)

Buffer pools in a DPF environment

In Figure 3, we showed bufferpools split across the different partitions. Interpreting this figure for buffer pools, is different than for the other objects, because the data cached in the bufferpools in not partitioned as the figure may imply. What is actually happening is that buffer pools in a DPF environment can be tailored to the different partitions. Using the CREATE BUFFERPOOL statement with the DATABASE PARTITION GROUP clause, you can associate a bufferpool to a given partition group. What this means is that you can have the flexibility to define the buffer pool to the specific partitions defined in the partition group. In addition, the size of the buffer pool on each partition in the partition group can be different if desired. For example, the following statement will create buffer pool 'bpool_1' in partition group 'pg1', which let's assume consists of partitions 2, 3 and 4:

CREATE BUFFERPOOL bpool_1 DATABASE PARTITION GROUP pg1 1 
       SIZE 10000 
      EXCEPT ON DBPARTITIONNUM (3 TO 4) SIZE 5000

Each partition in partition group pg1 will have a bufferpool bpool_1 defined with a size of 10000. Partition 3 and 4 will have a bufferpool of size 5000. Using an analogy, think of it as if you were issuing the CREATE BUFFERPOOL statement on each partition separately where the buffer pool name for each partition was the same, but indicating different sizes, that is:

In partition 2:   CREATE BUFFERPOOL bpool_1 SIZE 10000 
In partition 3:   CREATE BUFFERPOOL bpool_1 SIZE 5000 
In partition 4:   CREATE BUFFERPOOL bpool_1 SIZE 5000

Note the above statements are used to clarify the analogy, executing each of these commands as shown will attempt to create a bufferpool using all partitions. It is not equivalent to using the DATABASE PARTITION GROUP clause of the CREATE BUFFERPOOL statement.

Buffer pools can in addition, be associated to several partition groups. All this means is that the bufferpool definition will be applied to the partitions in the given associated partition groups.


Table spaces in a DPF environment

A table space can be created in specific partitions by associating it to a given partition group. The CREATE TABLESPACE statement with the IN DATABASE PARTITION GROUP clause can be used for this purpose. This allows users to have flexibility as to which partitions will actually be storing their tables. For example, the statement:

CREATE REGULAR TABLESPACE mytbls IN DATABASE PARTITION GROUP pg1 
   MANAGED BY SYSTEM USING ('/data')  BUFFERPOOL bpool_1

creates the table space mytbls which spans partitions 2, 3 and 4 (assuming pg1 is a partition group consisting of these partitions). In addition, the table space is associated with buffer pool bpool_1 defined in an earlier section. Note that the creation of the table space would fail if you provide conflicting partition information between the table space and the associated buffer pool. For example, if bpool_1 was created for partitions 5 and 6, and table space mytbls was created for partitions 2, 3 and 4, then you would get an error creating the table space.


The Coordinator node

In general, for each database connection there will be a corresponding DB2 agent performing the DB2 work. An agent can be thought of as a process (in Linux/UNIX) or thread (in Windows) which performs DB2 work on behalf of the application. There are different types of agents, one of them being the coordinator agent. The coordinator agent is the agent that communicates with the application, receiving requests and sending replies. It can either satisfy the request itself or delegate the work to multiple subagents to work on the request.

The coordinator node of a given application is the partition where the coordinator agent exists. The coordinator node can also be set with the SET CLIENT CONNECT_NODE command. When an application issues a query, parts of these database requests are sent by the coordinator node to subagents at the other partitions; and all results from the other partitions are consolidated at the coordinator node before being sent back to the application.

Any partition can potentially be a coordinator node, therefore in Figure 3, we do not specifically point to any partition as the coordinator node. If you would like to know more about DB2 agents and the DB2 process model, refer to the resources section in this article.


Issuing commands and SQL statements in a DPF environment

Say you have twenty physical machines, each with 2 partitions. Issuing individual commands to each physical machine or partition may be quite a task. Fortunately, DB2 provides two main commands that will execute other commands in all machines or partitions:

rah:
Use this command when you want to execute another command for each physical machine. For example, if you want to create a directory for the twenty physical machines, issue the following:

rah ")mkdir /tmp/$USER"

The ')' character indicates that the user's profile and file named in $RAHENV is suppressed. There are several characters that can be used instead. For an explanation of these characters issue the command: rah ?

db2_all:
Use this command when you want to execute another command for each database partition. For example, if you want to change the db cfg parameter logfilsiz for database SAMPLE in each partition, issue the following:

db2_all ";db2 UPDATE DB CFG FOR sample USING LOGFILSIZ 500"

When the semicolon (;) character is placed inside double quotation marks, the request will run concurrently on all partitions. The characters that can be used in the db2_all command are the same as the ones for the rah command.

For either rah or db2_all, the command executed can be almost anything which you could type at an interactive prompt, including several commands to be run in sequence. On Linux and UNIX platforms, you separate multiple commands using a semicolon (;). On Windows, you separate multiple commands using an ampersand (&). Do not use the separator character following the last command.

Command and SQL Statement scope
The scope of a command (such as db2 get db cfg, db2 list tablespaces, etc. ) is limited to the partition the session is attached to.
The scope of a command preceded by db2_all is the complete list of partitions as obtained from db2nodes.cfg
The scope of an SQL statement is the set of all partitions in db2nodes.cfg (unless WHERE DBPARTITIONNUM= predicate is used)

Using database partition expressions
In a partitioned database, if more than one database partition resides on the same physical machine, the same device or path cannot be specified for such database partitions. You can manually specify a unique container for each database partition, or alternatively you can use database partition expressions in your command or statement to generate values based on the partition number found in the db2nodes.cfg file. The following example should clarify this concept: Let's say you issue the command:

CREATE TABLESPACE TS2 
       MANAGED BY DATABASE USING
       (file '/data/TS2/container $N+100' 5000)


On a four database partition system, the following containers would be created:

/data/TS2/container100 - on DATABASE PARTITION 0
/data/TS2/container101 - on DATABASE PARTITION 1
/data/TS2/container102 - on DATABASE PARTITION 2
/data/TS2/container103 - on DATABASE PARTITION 3

A database partition expression is specified by the argument “ $N” (note there is a blank before $N). Table 1 shows other arguments for creating containers. Operators are evaluated from left to right, and '%' represent the modulus (remainder of a division). Assuming the partition number in question is 3, the 'value' column in table 1 shows the result of resolving the database partition expression.

Table 1. Database Partition Expressions
Database Partition Expressions Example Value

[blank]$N

" $N"

3

[blank]$N+[number]

" $N+500"

503

[blank]$N%[number]

" $N%2"

1

[blank]$N+[number]%[number]

" $N+15%13"

5

[blank]$N%[number]+[number]

" $N%2+20"

21


The DB2NODE environment variable

In in part 1 of this article series we talked about the DB2INSTANCE environment variable used to switch between instances in your database system. The DB2NODE environment variable is used in a similar way, but to switch between partitions in your DPF system. By default, the active partition is the one defined with the logical port of zero (0) in the db2nodes.cfg file for a given server. In order to switch the active partition use the SET command in Windows, or export in Linux/UNIX. Make sure to terminate all connections from any partition to your database after changing this variable, otherwise the change will not take effect.

For example, using the db2nodes.cfg file shown in Figure 4, you have four servers, each with 2 logical partitions. If you log on to server myserverb, any commands you execute will affect partition 2, which is the one with logical port of zero, and thus the default active partition for that server machine. If you would like to switch to partition 0 to be the active partition make this change in a Linux system:

DB2NODE=0
export DB2NODE
db2 terminate

(Having no connections from any partitions is not enough, make sure a 'terminate' is issued).

Note that partition 0 is in server myservera. We can make this partition the active one even from myserverb. To determine which is your active node, you can issue this statement after connecting to a database:

db2 "values (current dbpartitionnum)"

Partitioning Maps and Partitioning keys

By now, you should have a good understanding about how to set up a DPF environment. It is now time to understand how the distribution of the rows of data across the different partitions is performed. Figure 7 shows an example of this distribution.

Figure 7 - Distributing data rows in a DPF environment
Distributing data rows in a DPF environment

A partitioning map is an internally generated array containing either 4,096 entries for multiple-partition database partition groups, or a single entry for single-partition database partition groups. The partition numbers of the database partition group are specified in a round-robin fashion.

A partitioning key is a column (or group of columns) that is used to determine the partition in which a particular row of data is stored. A partitioning key is defined on a table using the CREATE TABLE statement.

When a database partition group is created or modified, a partitioning map is associated with it. A partitioning map, in conjunction with a partitioning key and a hashing algorithm, is used by DB2 to determine which database partition in the database partition group will store a given row of data.

For the example in Figure 7, let's assume partition group 'pg0123' has been defined on partitions 0, 1, 2 and 3. An associated partitioning map is automatically created, that is, an array with 4096 entries containing the values 0,1,2,3,0,1,2,3... (partition numbers are stored in round robin fashion by default, though this can be changed). Let's also assume table 'mytable' has been created with a partitioning key consisting of columns col1, col2 and col3. For each row, the partitioning key column values are passed to the hashing algorithm which will return an output number from 0 to 4095. This number corresponds to one of the entries in the partitioning map array which contains the value of the partition number where the row is to be stored. In Figure 5, if the hashing algorithm had returned an output value of 7, then the row would have been stored in partition p3.


Case Study

Now that you are familiar with DPF, let's review all the concepts in this article with a simple case study.

Your company is expanding, and has recently acquired two other firms. Since the amount of data will be increased twofold approximately, you are wondering if your current single partition DB2 database server will be able to handle the load, or if DB2 with DPF is required. Since you are not too familiar with DB2 with DPF, you decide to play around with it using your test machines: Two SMP machines with 4 processors each running the Linux Operating System. The previous DBA already installed DB2 UDB ESE with DPF on these machines. He has left the company, but fortunately, he left the diagram shown in Figure 8 with his design.

Figure 8 - DB2 UDB ESE with DPF - A case study
DB2 UDB ESE with DPF - A case study

Figure 8 is a combination of physical and logical design. When you verify the correctness of the diagram with what your system has, you note the 'mydb1' database has been dropped, so you decide to rebuild this database as practice. The instance 'db2inst1' is still there as well as other databases. These are the steps you follow:

1. You open two telnet sessions, one for each SMP machine. From one of the sessions, you issue the db2stop followed by the db2start command as shown in Figure 9.

Figure 9 - db2stop and db2start
db2stop and db2start

The first thing you note is that for these two commands, there is no need to issue them from each partition. Issuing them from any partition once will affect all partitions. You also can tell there are four partitions, as you will receive a message from each of them.

2. Next, you would like to review the db2nodes.cfg file to review the configuration of your partitions. Using operating system commands, you determine the home directory for instance db2inst1, which for this example is /db2home/db2inst1. Thus, file db2nodes.cfg would be stored in directory /db2home/db2inst1/sqllib. The contents of the file are shown in Figure 10.

Figure 10 - db2nodes.cfg
db2nodes.cfg

Figure 10 shows there are four partitions, two per server. The server host names are 65658572 and 65658161.

3. Next, you would like to create database 'mydb1', but you would like to issue this command from partition 0 because you want this partition to be the catalog partition. So you issue the statement db2 "values (current dbpartitionnum)" to determine which partition is currently the active one from your session. Next you change the DB2NODE environment variable to zero (0).

Just before creating the database, you also check the value of dbm cfg parameter DFTDBPATH. This is the path where the database will be created by default. You want the path specified (/db2database in this example) to be created locally in all the partitions so that the data is spread across them. Using operating system commands, you confirm this is the case, so you go ahead and create the database. This is shown in Figure 11.

Figure 11 - Determining and switching the active partition. Then creating a database
Determining and switching the active partition. Then creating a database

4. To confirm partition 0 is indeed the catalog partition, simply issue a list db directory command and look for the 'Catalog database partition number' field under the entry for the 'mydb1' database. Alternatively issue a list tablespaces command from each partition. The SYSCATSPACE tablespace should be listed only on the Catalog partition.

5. Next, you need to create partition group pg23 on partitions 2 and 3. Figure 12 shows you how to accomplish this. In addition, the figure shows you how to list your partition groups. The IBMTEMPGROUP is never listed with this command.

Figure 12 - Creating partition group pg23
Creating partition group pg23

6. You now want to create and manage your bufferpools. To create bufferpool BP23 on partition group pg23 issue this command:

db2 “create bufferpool BP23 database partition group pg23 size 500”

Figure 13 shows this statement. It also shows you how associate this bufferpool to another partition group using the ALTER BUFFERPOOL statement.

To list your bufferpools and associated partition groups you can query the syscat.bufferpools view or sysibm.sysbufferpools catalog table.

Figure 13 - Managing bufferpools
Managing bufferpools

Note that a bufferpool can be associated to any partition group. Its definition will be applied to all the partitions in the partition group, and you can specify different sizes on the partitions if required.

7. To create tablespace mytbls1, issue the following:

db2 "create tablespace mytbls1 in database partition group pg23 
         managed by system using ('/data') bufferpool bp23"

8. To create table table1 in tablespace mytbls1 and with a partitioning key of col1 and col2, issue the following:

db2 "create table table1 (col1 int, col2 int, col3 char(10)) 
            in mytbls1  
            partitioning key (col1, col2)"

9. Creating the index index1 has nothing specific to a DPF environment:

db2 "create index index1 on table1 (col1, col2)"

The index will be constructed on each partition for its subset of rows.

10. Next, let's say you would like to update parameter INTRA_PARALLEL from NO to YES using the command update dbm cfg using INTRA_PARALLEL YES as shown in Figure 14. Note as well that the get dbm cfg is also issued, but from the other telnet session. The update version of the value of this parameter is shown from any partition, this shows that you are updating one shared file.

Figure 14 - Update dbm cfg
Update dbm cfg

11. Next, you would like to test the db2_all command to update the db cfg file for all partitions with one command. Figure 15 shows an example of this. The second window in the figure shows the get db cfg before and after the execution of the db2_all command on a partition other than the one where the db2_all command was executed.

Figure 15 - Using db2_all to update the db cfg
Using db2_all to update the db cfg

12. The last thing you would like to test is the rah command. Figure 16 shows an example of using rah to create a subdirectory

Figure 16 - Using rah
Using rah

As you can see, the rah command works per machine, while the db2_all command works per database partition.

And you're done!


Acknowledgements

Special thanks to Bill Wilkins (IBM Content Management Partner Enablement consultant) and Scott Martin (IBM Innovation Center for Business Partners consultant) for their technical review of this article.

Resources

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=14583
ArticleTitle=A colorful introduction to DB2 UDB, Version 8 for UNIX, Linux, and Windows: Overview of the Database Partitioning Feature
publish-date=03182004