There has never been a better time to consider switching over to DB2. With the recent announcement of DB2 Universal Database™ (UDB) Express-C Edition V8.2, a fully functional, free version of DB2 for Linux®, Unix®, and Windows® platforms (32-bit and 64-bit), more and more developers will be making the switch to a proven data server with state-of-the-art technology and a growing developer community base. DB2 Express-C is based on the same core technology as the DB2 UDB Express Edition V8.2 product. As such, it supports a wide variety of programming languages to build and deploy applications, including C/C++, Java™, .NET, PHP, and more.
Learning a new product or system can sometimes be a complex and time-consuming endeavor. The goal of this article is to eliminate the steep learning curve and get database administrators (DBAs) and developers who are already familiar with PostgreSQL V8.1 up to speed with DB2's architecture and tools in little time. Get a basic understanding of the DB2 architecture and how it differs from PostgreSQL. Learn about the different tools and utilities that are available in DB2 and which ones are analogous to the ones you are already familiar with in PostgreSQL. Then explore all of the additional, easy-to-use features and tools that DB2 has to offer, and see for yourself why it is worth making the switch.
Figure 1. PostgreSQL V8.1 Architecture
Figure 2. DB2 V8.2 Architecture
There are some similarities and differences in architecture. Both PostgreSQL
and DB2 are based on a client-server architecture, where the client tier is
independent of the server. In both cases, a client application wishing to
use a database makes a request over the network and is served by a process
on the server (
postmaster on PostgreSQL,
db2agent on DB2). PostgreSQL uses a simple
process-per-user client/server model in which one client process connects
to exactly one server process. The main process, called
at a specified TCP/IP port for incoming connections and spawns a new server
postgres every time a connection
is requested. The
postgres processes communicate
with each other using semaphores and shared memory to ensure data integrity
during concurrent data access. Database files are accessed through shared
In DB2, coordinator agents (
coordinate the work on behalf of an application and communicate to other
agents, using inter-process communication (IPC) or remote communication
protocols. All connection requests from client applications, whether
they are local or remote, are allocated a corresponding coordinator
agent. When intra-partition parallelism is enabled, extra worker agents,
called subagents (
db2agentp) are spawned by
the coordinator agent to help perform the work in parallel. In DB2, a
database can be partitioned across multiple logical nodes, transparent
to the application. These nodes may be spread over one or more server
machines. DB2 accesses database files through buffer pools. It has
the ability to fetch data into the buffer pools and move data out of the
buffer pool asynchronously using prefetcher
db2pfchr) and page cleaner
db2pclnr) processes. The log writer process
db2loggw) flushes log records from the log
buffer to the log files on disk, while the log reader process
db2loggr) reads the database log files during
transaction processing (in other words, rollback), restart recovery, and roll-forward
operations. For more information about the DB2 process model, refer to
the article "Everything you wanted to know about DB2 Universal Database processes"
Before continuing to compare the fundamental architecture differences, let's quickly look at the main administration tools used in PostgreSQL and DB2 since this article refers to these tools in subsequent sections.
PostgreSQL's command line interface is the
psql is a terminal-based front end to
PostgreSQL that enables you to type in queries interactively, issue
them to PostgreSQL, and see the query results. Alternatively, input can
come from a file. The
psql interface is shown
in Listing 1.
Listing 1. PostgreSQL's psql command line interface
$ psql mydb Welcome to psql 8.1.3, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit mydb=# mydb=# SELECT version(); version ---------------------------------------------------------------- PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) (1 row) mydb=# SELECT current_date; date ------------ 2006-03-09 (1 row) mydb=#
There are no graphical tools that ship with the PostgreSQL distribution, although two popular open source ones, PGAdmin III and PhpPgAdmin, are recommended on the PostgreSQL Web site. These tools can connect to PostgreSQL databases and help perform common tasks.
DB2 ships with several tools you can use to administer a database. The
Command Line Processor (CLP) is the main tool to administer DB2 from the
command line. To launch this tool from Windows, navigate to it through
the Windows start menu (Programs > IBM DB2 >
Command Line Tools > Command Line Processor), or enter
db2 from a DB2 Command Window. On UNIX and Linux
platforms, log in as the instance owner (described shortly) and enter
db2 from a command prompt. The CLP can be run in
interactive mode, as shown in Listing 2. It can also
be run in non-interactive mode, where commands are prefixed with
db2 or come from an input file script. For more
information about working with the CLP, refer to the article "Getting to know the DB2 UDB command line processor" (developerWorks, 2005).
Listing 2. The Command Line Processor in interactive mode
# db2 (c) Copyright IBM Corporation 1993,2002 Command Line Processor for DB2 SDK 8.2.3 You can issue database manager commands and SQL statements from the command prompt. For example: db2 => connect to sample db2 => bind sample.bnd For general help, type: ?. For command help, type: ? command, where command can be the first few keywords of a database manager command. For example: ? CATALOG DATABASE for help on the CATALOG DATABASE command ? CATALOG for help on all of the CATALOG commands. To exit db2 interactive mode, type QUIT at the command prompt. Outside interactive mode, all commands must be prefixed with 'db2'. To list the current command option settings, type LIST COMMAND OPTIONS. For more detailed help, refer to the Online Reference Manual. db2 => connect to sample Database Connection Information Database server = DB2/LINUX 8.2.3 SQL authorization ID = DB2INST1 Local database alias = SAMPLE db2 => values current_date 1 ---------- 03/09/2006 1 record(s) selected. db2 =>
The Control Center, shown in Figure 3, is the
central graphical administration tool in DB2. From the Control Center,
you can perform tasks such as creating a database, backing up a
database, viewing query execution plans, viewing database objects
such as tables, and configuring database settings. To launch the Control
Center on Windows, navigate to it through the Windows start menu
(Programs > IBM DB2 > General Administration Tools
> Control Center), or enter
from a command prompt. On UNIX and Linux, log in as the instance owner user
db2cc from a command prompt.
Figure 3. The Control Center
In PostgreSQL, you must first initialize a database storage area on disk
before you can create any databases. This storage area is known as a
database cluster. A database cluster is a collection of databases
that is managed by a single instance of a running database server (in other words, a
postmaster process). In file system terms, a
database cluster is a single directory under which all data is stored.
Typically, a special user account, called the
Postgres user account, is used to own a server
and its data. Many operations require that you be logged in as this user,
such as when starting the server.
After initialization, a database cluster contains a database called
postgres, which is a default database used
by utilities, users and third party applications. Another database
template1 is created and is
used as a template for creating new databases. Before users can access a
database, the server must be started. The server can be stopped by sending
different signals to the
DB2 has a somewhat similar concept of a database cluster called an
instance. An instance, also known as a database manager (DBM), is a
logical server environment needed to create and work with databases.
It is really just a collection of processes that provide access to
database(s). If you have more than one instance created, each
instance acts as an independent server environment, which can be
stopped and started independently. Like PostgreSQL, you can have
multiple instances active at the same time and have multiple
databases active within each instance. During a DB2 installation
on UNIX and Linux platforms, you have the option to create a
default instance, usually called
these platforms, a special user account is typically created to "own"
the instance and is known as the instance owning user. During installation
on Windows, an instance called
DB2 is always
created. By default, no databases are created in an instance during
installation -- you must explicitly create them. To view a list of
instances defined on your system, you can use the
db2ilist command, which can be found in the
instance subdirectory of the DB2 installation folder.
Listing 3 shows sample output of this command
Listing 3. Sample output of the db2ilist command
$ /opt/ibm/db2/V8.1/instance/db2ilist db2inst1 $
Alternatively, you can view defined instances in the Control Center. Expand the Instances folder in the left pane object tree, as shown in Figure 4.
Figure 4. Viewing defined instances in the Control Center
To start or stop a DB2 instance on UNIX and Linux, log in as or switch to the
instance owning user, and issue the
command. To stop the instance, use the
command. On Windows, you can use the same commands to start and stop the
instance. However, the user account you use to issue these commands must
be a member of the SYSADM group defined for that instance.
You can also start and stop instances from the Control Center. Right-click on the target instance in the left pane object tree and choose the Start or Stop option, respectively. In order to stop an instance, all connections to databases inside the instance must be severed.
New instances can be created with the
db2icrt command and dropped with the
For more information about the user accounts created when DB2 is installed, refer to the article "Understand how user and group accounts interact with DB2 UDB" (developerWorks, 2005).
In PostgreSQL, a database is a named collection of database objects.
Generally, every database object (table, function, and so on) belongs to one and
only one database. However, there are a few system catalogs (for example,
pg_database), that belong to an entire cluster
and are accessible from each database within the cluster. Databases are
physically separated, and access control is managed at the connection level.
Databases are created with the
command and destroyed with the
In DB2, a database is a collection of tables, including the data and related objects, such as indexes, views, triggers, functions, and stored procedures. Multiple databases can exist in the same instance; however, each database is fully independent of each other in the sense that each has its own system catalog tables (containing the database metadata), its own configuration file, and its own recovery log.
Databases are isolated from each other to the extent of objects and data. For example, an application cannot, by default, issue a query that would fetch data from two different databases. A special technology called federation would need to be set up to do that. Databases created inside the same instance do share the same set of instance processes.
There are several ways to create a database, but the most common one is to
CREATE DATABASE command, such as the
CREATE DATABASE MYDB ON /database/mydb
After issuing this command, a database called MYDB is created in the /database/mydb path using the default options. There are many other options you can specify with this command, so be sure to check the command syntax diagram in the DB2 documentation.
Another easy way to create a database is to use the Create Database wizard from the Control Center. This wizard is shown in Figure 5. To launch the wizard, expand the object tree in the left pane until you see the Databases folder. Right-click on the Databases folder and select the Create Database option, then the Standard or With Automatic Maintenance option. Follow the steps in the wizard to complete the database creation process.
Figure 5. The Create Database wizard
PostgreSQL has many configuration parameters that affect the behavior of the database system. These parameters can be set in several areas. The most common area is the postgresql.conf file, which is normally kept in the data directory. Listing 4 shows sample contents of this file:
Listing 4. Sample contents of the postgresql.conf file
# This is a comment log_connections = yes log_destination = 'syslog' search_path = '$user, public'
A second way to set these parameters is to give them as a command line
option to the
postmaster server, such as:
postmaster -c log_connections=yes -c log_destination='syslog'
Finally, a command line option can be applied to one particular session only.
The environment variable
PGOPTIONS can be used for this purpose on the
env PGOPTIONS='-c geqo=off' psql
DB2 must also be configured at various levels, as shown in Figure 6.
Figure 6. The different levels where DB2 can be configured
First, certain settings are configurable at the instance level, such as
communications and security. To view a list of all the instance level
configuration parameters, you can issue the
GET DB CFG command.
Listing 5 shows a snippet of the output from
Listing 5. Sample output from the GET DBM CFG command
db2 => get dbm cfg Database Manager Configuration Node type = Enterprise Server Edition with local and remote clients Database manager configuration release level = 0x0a00 CPU speed (millisec/instruction) (CPUSPEED) = 4.251098e-07 Communications bandwidth (MB/sec) (COMM_BANDWIDTH) = 1.000000e+02 Max number of concurrently active databases (NUMDB) = 8 Data Links support (DATALINKS) = NO Federated Database System Support (FEDERATED) = NO Transaction processor monitor name (TP_MON_NAME) = Default charge-back account (DFT_ACCOUNT_STR) = Java Development Kit installation path (JDK_PATH) = /opt/IBMJava2-142 Diagnostic error capture level (DIAGLEVEL) = 3 Notify Level (NOTIFYLEVEL) = 3 Diagnostic data directory path (DIAGPATH) = /home/db2inst1/sqllib/db2dump ..... .....
To update an instance parameter, use the
UPDATE DBM CFG command:
UPDATE DBM CFG USING <parameter_name> <value>
parameter_name> with the
parameter name and <
with the parameter value. Some parameters are dynamic, meaning that any
changes take effect immediately. Others require you to stop and restart the
instance in order for them to take effect.
Other settings are configurable at the database level, such as logging,
sort memory, and the SQL statement cache. To view a list of all the database
configuration parameters, issue the
GET DB CFG FOR
<db_name> command where
db_name> is the name of your database.
Listing 6 shows a snippet of the output from this
Listing 6. Sample output from the GET DB CFG command
db2 => get db cfg for sample Database Configuration for Database sample Database configuration release level = 0x0a00 Database release level = 0x0a00 Database territory = US Database code page = 1208 Database code set = UTF-8 Database country/region code = 1 Database collating sequence = BINARY Alternate collating sequence (ALT_COLLATE) = Database page size = 4096 Dynamic SQL Query management (DYN_QUERY_MGMT) = DISABLE Discovery support for this database (DISCOVER_DB) = ENABLE Default query optimization class (DFT_QUERYOPT) = 5 Degree of parallelism (DFT_DEGREE) = 1 Continue upon arithmetic exceptions (DFT_SQLMATHWARN) = NO Default refresh age (DFT_REFRESH_AGE) = 0 Default maintained table types for opt (DFT_MTTB_TYPES) = SYSTEM Number of frequent values retained (NUM_FREQVALUES) = 10 Number of quantiles retained (NUM_QUANTILES) = 20 Backup pending = NO Database is consistent = YES ..... .....
To update a parameter, use the
UPDATE DB CFG
UPDATE DB CFG FOR <db_name> USING <parameter_name> <value>
Substitute <db_name> with your database name, <parameter_name> with the parameter name, and <value> with the parameter value. Some parameters are dynamic. Others require that you deactivate the database (if it was explicitly activated), or wait for all the connections to the database to be disconnected before letting them reconnect.
Instance and database parameters can also be updated using the Control Center. Right-click on the target instance or database, depending on the parameter you want to update, and select the Configure Parameters option. This launches the Instance/Database Configuration dialog window, as shown in Figure 7.
Figure 7. Database Configuration dialog window
Finally, certain parameters are configurable at the DB2 registry level. Every DB2 server has a registry where you can set flags that activate new features as well as platform-specific and special optimizations. This registry area should not to be confused with the operating system's registry. The DB2 registry only exists in and applies to DB2.
To view the current registry settings, use the
db2set command. There are additional flags you
can add to this command to provide additional output. For example, the
-all flag shows all the registry variables that
are currently set. The
-lr flag lists every
available registry variable that can be set for the current platform.
Listing 7 shows the sample output of the
Listing 7. Sample output from the db2set command
# db2set -all [i] DB2COMM=TCPIP [g] DB2SYSTEM=localhost.localdomain [g] DB2ADMINSERVER=dasusr1
To set or change one of the registry parameters, use the
Ensure that you do not leave a space on either side of the equals sign (=) or
else your change will not take effect. Registry variables can be set and
applied at various levels. You can set them globally so that they apply
to all instances. They can also be set for a specific instance.
Alternatively, they can be set at the environment level, which means that
they are set outside of DB2 using the
command in Windows or the
EXPORT command in
UNIX and Linux. With very few exceptions, the instance must be restarted
before any changes you make take effect.
A utility called the Configuration Advisor can also be used to recommend
an initial baseline configuration of your environment. This utility can
be run from the command line using the
AUTOCONFIGURE command or can be run as a wizard
from the Control Center. To launch it from the Control Center, right-click
on the target database and select the Configuration Advisor item.
PostgreSQL uses system shared memory as a buffer. The
shared_buffers configuration parameter in the
postgresql.conf file sets the number of shared memory buffers used by the
database server. The default is typically 1000, but may be less if the
kernel settings do not support it. Each buffer is 8192 bytes, unless a
different value of
BLCKSZ was chosen when
building the server.
DB2 uses database objects called buffer pools as a cache for table and
index data. Buffer pools are created at a database level and are not shared
by multiple databases. When you create a buffer pool, you must specify the
page size you want to use and the number of pages to allocate.
DB2 supports page sizes of 4K, 8K, 16K, and 32K. Depending on your workload
complexity, you may decide to only use one large buffer pool. Or you may
want to use multiple buffer pools, each of different page sizes. To create
a buffer pool, use the
CREATE BUFFERPOOL statement. For example:
CREATE BUFFERPOOL bp16k SIZE 1000 PAGESIZE 16K
A buffer pool can also be created from the Control Center. Expand the target database, then right-click the Buffer Pools folder, and select the Create option. This launches the Create Buffer Pool dialog window, shown in Figure 8.
Figure 8. The Create Buffer Pool dialog window
By default, when a new database is created, a buffer pool with a 4K page size called IBMDEFAULTBP is created. This buffer pool is usually insufficient in size, so you will want to increase its size or create other larger ones.
Notice the Show Command button at the bottom of the window in Figure 8. Most Control Center dialog windows let you view the SQL statement or command that is actually executed in the background. These commands can be saved in scripts for execution at a later time. This is also a great way to learn DB2 syntax.
Table spaces in PostgreSQL allow database administrators to define locations in the file system where the files representing database objects can be stored. Once created, a table space can be referred to by name when creating database objects. By using table spaces, an administrator can control the disk layout of a PostgreSQL installation. Tables, indexes, and entire databases can be assigned to particular table spaces. To do so, a user with the CREATE privilege on a given table space must pass the table space name as a parameter to the relevant command.
In DB2, a table space is a database object that is used as an abstraction layer between tables (and indexes) and the physical disks where their files are stored. When you create a table space, you can associate it with a specific buffer pool as well as to specific containers. A container is a directory, file, or raw device. This architecture gives you a lot of flexibility in managing performance. For example, if you have a "hot" table, you can place it in its own table space and give it its own buffer pool. This helps ensure the data of this table is always cached in memory.
Table spaces can be defined as system-managed spaces (SMS) or database-managed spaces (DMS). SMS table spaces are managed by the operating system and grow as required. They provide decent performance with minimum administration. DMS table spaces are pre-allocated and managed by DB2. This type of table space typically outperforms SMS, but more requires administration overhead. DMS table spaces are also more flexible in the sense that they allow you to split up a table's data, indexes, and large objects into separate table spaces.
To create a table space, use the
CREATE TABLESPACE command, such as:
CREATE TABLESPACE USER_16K
MANAGED BY SYSTEM
USING ('d:\usp3_cont1', 'e:\usp3_cont2', 'f:\usp3_cont3')
This command creates a table space using a page size of 16K and associates it with the bp16k buffer pool. The table space is spread across three containers that each reside on their own drive.
Table spaces can also be created using the Control Center. Expand the target database, right-click the Table Spaces folder, and select the Create option. This launches the Create Table Space wizard, shown in Figure 9.
Figure 9. The Create Table Space wizard
When a new database is created with the default options, three table spaces are created, as described in Table 1.
Table 1. Table spaces created by default when a database is created
|Table space name||Description|
|SYSCATSPACE||Catalog table space containing the system catalog tables.|
|TEMPSPACE1||System temporary table space used as a work area by DB2 to perform operations such as joins and sorts.|
|USERSPACE1||This table space stores user tables when a table space is not explicitly indicated at table creation time.|
You can use these default table spaces or create your own. Additionally, if you want to use declared global temporary tables, you need to create a user temporary table space before creating the temporary table. This type of table space is not created by default when a new database is created.
Schemas are similar in PostgreSQL and DB2. Every database contains one
or more named schemas, which in turn contain tables and other database
objects. Schemas can be created using the
CREATE SCHEMA command. For example:
CREATE SCHEMA myschema
In addition, DB2 allows schemas to be created "on the fly" if the one specified does not exist. Only users who have been granted the IMPLICIT_SCHEMA database authority can do this.
To drop a schema, the
DROP SCHEMA command is
used. For example:
DROP SCHEMA myschema
The major difference between PostgreSQL and DB2 is what happens when a schema is not specified. In PostgreSQL, when you create an object without specifying a schema name, it is automatically put into a schema named public. Every new database contains this schema. In other words, the following statements are equivalent in PostgreSQL:
CREATE TABLE products ( ... )
CREATE TABLE public.products ( ... )
When a schema is not specified, the system determines which object is
intended by following a search path, which is a list of schemas
to look in. The first matching object residing in a schema listed in the
search path is taken to be the one wanted. If there is no match in the
search path, an error is reported, even if matching object names exist
in other schemas in the database. The first schema named in the search
path is called the current schema. By default, it is the
public schema. Aside from being the first schema searched, it is
also the schema in which new objects will be created if the
CREATE <object> command does not specify
a schema name. This is why new objects are created in the public
schema, since it is listed first in the search path by default. It is
possible to change this path using the
statement, such as:
SET search_path TO myschema,public;
Once the target schema is in the path, you can access database objects in the schema without qualifying them. The search path works in the same way for other objects, such as data type names, function names, and operator names.
In DB2, the current schema, by default, is the same as the user ID that is used to connect to the database. Thus, if a user called ACCTREP connects to the database, the default schema that is used for unqualified database objects is ACCTREP. All qualified database object references use the schema that was explicitly specified.
To change the default schema, use the
statement. For example, the statement:
SET SCHEMA JIM
changes the default schema to JIM for the duration of connection, unless it is issued again with a different value. In DB2, there is no concept of schema path, as in PostgreSQL. There can only be one value specified for the current schema. Stored procedures and user-defined functions, on the other hand, do have a function path that works similar to the schema search path in PostgreSQL.
Tables, views, and indexes are similar in PostgreSQL and DB2. In DB2, before creating any type of database object, you must first connect to the database you want to create it in. You can create these objects using command syntax or using wizards in the Control Center. Figure 10 shows the Create Table wizard. To launch the Create Table wizard, right-click the Tables folder under the target database and select the Create Table option.
Figure 10. The Create Table wizard
Although the syntax for creating objects is very similar, each database implements them differently. For example, in PostgreSQL, you can specify what type of index to create, such as a B-Tree, R-Tree, or Hash index. In DB2, you do not have this choice. However, DB2 automatically creates special types of indexes when needed, such as bitmap indices when executing certain types of queries, or block indexes, when working with multi-dimensional clustering (MDC) tables. DB2 also has special implementation of objects that PostgreSQL does not have, such as MDC tables and Materialized Query Tables.
The main purpose of this article is to describe the DB2 environment in comparison with the PostgreSQL environment. Database server logic, including stored procedures, triggers, and user-defined functions is a vast topic, and thus is only covered briefly.
PostgreSQL is extensible because its catalog tables can be modified by the user. Extension includes components such as types, user-defined operators, and user-defined functions.
PostgreSQL provides four kinds of functions:
- Query language functions (functions written in pure SQL)
- Procedural language functions (functions written in, for example, PL/pgSQL or PL/Tcl)
- Internal functions (functions written in C that have been statically linked into the PostgreSQL server)
- C-language functions (written in C, or a language that can be made compatible with C, and are compiled into dynamically loadable objects and loaded by the server on demand)
PostgreSQL also supports triggers. The trigger code is actually comprised of a trigger function, which is a special function containing logic to execute in the trigger. The trigger function must be created before the trigger itself. Trigger functions can be written in most of the available procedural languages, including PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python.
DB2 supports stored procedures, user-defined functions, and stored procedures. Stored procedures can be written in C/C++, Java, OLE, COBOL, a Microsoft CLR-supported language, or DB2's SQL Procedural Language (SQL PL). When you create an SQL stored procedure, its procedural statements are converted into a native representation that is stored in the database catalogs, as is done with other SQL statements. When the procedure is called, the native representation is loaded from the catalogs and executed in the DB2 engine.
DB2 offers the ability to create customized user-defined functions (UDFs) using DB2's SQL PL language. They can also be written in an external language such as C or Java. UDFs can encapsulate frequently used logic to reduce the complexity of application development. DB2's procedural language for UDFs is actually a subset of DB2's PSM procedural language. As such, certain SQL PL elements that are supported in SQL stored procedures are not supported in UDFs.
Finally, DB2 supports triggers for INSERT, UPDATE, and DELETE operations. DB2 also supports triggers defined on views, which are known as INSTEAD OF triggers. It is important to note that DB2's procedural language for triggers is actually a subset of the DB2 PSM procedural language. As such, certain SQL PL elements that are supported in SQL stored procedures are not supported in triggers.
Both PostgreSQL and DB2 have authentication and authorization mechanisms. Authentication refers to the process of validating a user's credentials. Authorization refers to the process of ensuring a user has the appropriate database privileges to perform a given action.
PostgreSQL offers a number of different client authentication methods. The method used to authenticate a particular client connection can be selected on the basis of the client's host address, database, and user. Client authentication is controlled by a configuration file, which traditionally is named pg_hba.conf and is stored in the database cluster's data directory. The general format of the pg_hba.conf file is a set of records, one per line. Each record specifies a connection type, a client IP address range (if relevant for the connection type), a database name, a user name, and the authentication method to be used for connections matching these parameters. The first record with a matching connection type, client address, requested database, and user name is used to perform authentication. If no record matches, access is denied. Listing 8 shows a sample snippet of the pg_hba.conf file.
Listing 8. Sample contents of the pg_hba.conf file
# Allow any user on the local system to connect to any database under # any database user name using Unix-domain sockets (the default for # connections). # # TYPE DATABASE USER CIDR-ADDRESS METHOD local all all trust # The same using local loopback TCP/IP connections. # # TYPE DATABASE USER CIDR-ADDRESS METHOD host all all 127.0.0.1/32 trust # Allow a user from host 192.168.12.10 to connect to database # "postgres" if the user's password is correctly supplied. # # TYPE DATABASE USER CIDR-ADDRESS METHOD host postgres all 192.168.12.10/32 md5
Other authentication and connection settings are maintained in the postgresql.conf file.
In DB2, application or database user accounts do not exist within the database, but are rather managed by an external security facility, such as the operating system. Within the database, privileges on specific database objects can be assigned to user or group accounts defined in the external facility.
By default, authentication takes place on the DB2 server. There are a number of other authentication methods supported by DB2, including authentication performed on the client (allowing "single sign-on" access), a Kerberos security facility, or a user-defined Generic Security Service (GSS) plug-in. Additional authentication options include the ability to encrypt user names and passwords as well as data, as they travel across the network between client and server. You can update the value of the DB2 instance parameter called AUTHENTICATION to change the authentication settings for the entire instance. All databases inside the instance use that value.
PostgreSQL manages database access permissions using the concept of roles. A role can be thought of as either a database user or a group of database users, depending on how the role is set up. Roles can own database objects (for example, tables) and can assign privileges on those objects to other roles. Roles are conceptually separate from operating system users and are global across a database cluster installation and not per individual database.
In DB2, since there is no concept of a database or application user, it is
necessary to create any operating system users that your application uses
to connect to the database and then grant those users specific database
object privileges. To facilitate administration and management, you can
also create groups of users and grant specific database privileges to
those groups. Users belonging to those groups inherit the group's assigned
privileges. Privileges can be granted and revoked using the standard
statements. They can also be managed from the Control Center. A list of
the different privileges that can be granted and revoked can be found in the
DB2 documentation. Privileges are granted at the database level and only
apply to a specific database, not all of the databases inside the instance.
If you want the user to have the same privileges in multiple databases, you
must explicitly grant those privileges in each of the databases.
DB2 UDB also provides a hierarchy of system-level administrative authorities (SYSADM, SYSCTRL, SYSMAINT, SYSMON). These authorities allow users to perform a subset of administrative tasks such as creating/dropping databases and forcing users off the system. Authorities are assigned at the instance level by specifying the name of a group in the associated instance level parameter (SYSADM_GROUP, SYSCTRL_GROUP, SYSMAINT_GROUP, SYSMON_GROUP). Users belonging to that group inherit that authority for all databases in the instance.
For additional information about user accounts and privileges/authorities, refer to the articles " Understand how user and group accounts interact with DB2 UDB" (developerWorks, 2005) and " Understand how authorities and privileges are implemented in DB2 UDB" (developerWorks, 2006).
Unlike traditional database systems that use locks for concurrency control, PostgreSQL maintains data consistency by using a multiversion model (Multiversion Concurrency Control, MVCC). This means that while querying a database, each transaction sees a snapshot of data as it was some time ago, regardless of the current state of the underlying data. This protects the transaction from viewing inconsistent data that could be caused by other concurrent transaction updates on the same data rows.
The SQL standard describes four transaction isolation levels and the corresponding concurrency problems that may occur in each one. In PostgreSQL, you can request any of the four standard isolation levels, but internally, there are only two distinct isolation levels, which correspond to the Read Committed and Serializable level. When you select the level Read Uncommitted you really get Read Committed, and when you select Repeatable Read you really get Serializable, so the actual isolation level may be stricter than what you select.
Read Committed is the default isolation level in PostgreSQL. When a transaction runs on this isolation level, a SELECT query sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. However, the SELECT statement does see the effects of previous updates executed within its own transaction, even though they are not yet committed. In effect, a SELECT query sees a snapshot of the database as of the instant that that query begins to run. Two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes during execution of the first SELECT.
DB2's concurrency model is radically different than PostgreSQL's. At any one time, there is only one copy or instance of a row. DB2 uses row and table locking to ensure data integrity and consistency. Whenever a row is requested for read or update, DB2 obtains a lock, except if the application requesting the row is using the Uncommitted Read isolation level, in which case no row locks are obtained for read operations. Update operations always require an exclusive lock. These locks are released when a transaction is committed or rolled back. The lock settings are configurable, including how much memory to devote to maintaining lock information (LOCKLIST database configuration parameter), how long an application should wait for a lock before timing out (LOCKTIMEOUT database configuration parameter), and the maximum percentage of lock memory one application can consume (MAXLOCKS database configuration parameter).
DB2 supports different levels of concurrency control to suit the needs of most applications. An application should use the level of isolation that provides the appropriate concurrency. The level of isolation is specified by each database connection, meaning different connections to the database can use different levels of isolation. In many cases, DB2's default setting of Cursor Stability is sufficient. DB2 also allows the isolation level to be specified at the SQL statement level, thereby overriding the database default isolation level. For example, suppose that an application needs to get a "rough" count of the number of rows in a table. Performance is of the utmost importance, and the Cursor Stability isolation level is required with the exception of this one SQL statement. You could use the following SQL statement to get this information:
SELECT COUNT(*) FROM tab1 WITH UR
Since it is acceptable to read dirty data in this case, choosing the Uncommitted Read isolation level for this statement will help improve performance because read locks are not obtained.
To increase the concurrency of the system, commit your transactions often, including read-only transactions. If possible, reschedule the applications that compete for access to the same table. Also, use the Uncommitted Read (UR) isolation level for transactions where read consistency is not an issue.
You may also want to enable three relatively new registry variables (DB2_EVALUNCOMMITTED, DB2_SKIPINSERTED, and DB2_SKIPDELETED) to help improve the concurrency of this system. These might especially be beneficial when converting from a MVCC system such as PostgreSQL. For more information about these registry variables, refer to the article "Lock avoidance in DB2 UDB V8" (developerWorks, 2005).
There are three approaches typically used to back up PostgreSQL databases:
SQL dump, file system backup, and on-line backup. The SQL-dump method
generates a text file with SQL commands that, when fed back to the server,
recreate the database in the same state as it was at the time of the
dump. The utility program called
used for this purpose. To backup an entire database cluster, the
pg_dumpall program is provided.
An alternative backup strategy is to directly copy the files that
PostgreSQL uses to store the data in the database using whatever method
you prefer for doing file system backups. Finally, you can also combine
a file-system-level backup with a copy of the write-ahead log files to
restore a database to a particular point-in-time.
In DB2, each database must be backed up separately using the
BACKUP command. This takes a binary copy of
the database state at the time of the backup that can later be used to
restore the database to the same state. DB2 supports three kinds of
backups: full, incremental, and delta. Full backups are a complete
snapshot of the current state of the database. Incremental backup
allows you to back up only index and data pages that have changed since
a previous backup. Delta backups are based on the data that has changed
since the last incremental or delta backup. In addition, DB2 supports
online and offline backups, giving you the flexibility to keep your
system running 24x7.
To take a backup, use the
BACKUP command. For example:
backup database sample to /dev1/backup
This command takes a full offline backup of the SAMPLE database and places the backup files in the /dev1/backup path.
Backups can also be taken using the Control Center. Right-click on the target database in the left pane object tree and select the Backup option. The Backup Wizard, shown in Figure 11, is displayed where you can customize the backup settings.
Figure 11. The Backup Wizard in Control Center
DB2 can also automate all of your backup needs if you choose. DB2 can keep track of when a backup should be taken and will then kick one off in a user-defined maintenance window if required.
For more information about the
refer to the article "The IBM DB2 Universal Database for Linux, UNIX, and Windows Backup
Utility" (developerWorks, 2005).
Backup images can be restored using the
command. For example:
RESTORE DATABASE SAMPLE
Alternatively, you can restore a database from the Control Center using the Restore Database wizard. To launch this wizard, right-click on the All Databases folder, then select the Create Database option, and then the From Backup option.
A database restore operation requires an exclusive connection, and the
RESTORE utility prevents other applications
from accessing the database until the restore operation completes
successfully. A table space restore operation, however, can be
done online. Refer to the
DB2 documentation for more information about the
In addition to the
BACKUP utility, DB2 has
another utility called
db2look that can be
used to regenerate the database object definitions. They can
can then be used to recreate the same objects in a different
database. Note that this utility only generates object definitions
(DDL), not the data. You can also access this utility from the Control
Center. Right-click on an object (in other words, a table, database, and so on)
and select the Generate DDL option. For more information
db2look utility, refer to the
PostgreSQL has a couple of utilities to help maintain the health and
performance of the database. The
command must be run on a regular basis to recover disk space occupied
by updated or deleted rows and to update data statistics used by the
PostgreSQL query planner. Additionally, it is sometimes useful to
rebuild indexes periodically with the
In DB2, there are several utilities that can be used to maintain the health
and performance of the database. The first, and most important, is the
RUNSTATS utility. The
RUNSTATS utility collects statistical information
for tables and indexes, which is then used during query optimization to come
up with an optimal query access plan. It is critical to update statistics on
a regular basis as well as after loading or deleting a lot of rows in a
table or adding a new index. The
runstats command is used to collect statistics
for a table. For example, the following runstats command collects detailed
statistics for the EMPLOYEE table and all of its indexes:
runstats on table tedwas.employee with distribution indexes all
This utility is also available from the Control Center. Right-click on a table in the Tables folder under the database of interest and select the Run Statistics option. The Run Statistics dialog window appears, as shown in Figure 12, where you can define the options to use during statistics collection.
Figure 12. The runstats dialog window
While you can collect statistics anytime you like, DB2 can also automate
this process for you. DB2 can keep track of when statistics should be
updated and will update them in a user-defined maintenance window if
required. To find out more about this capability and the
runstats command in general, refer to the
RUNSTATS in DB2 UDB, Version 8.2" (developerWorks, 2005).
Another important maintenance utility in DB2 is the
REORG utility. The
utility helps arrange the data in tables and indexes more efficiently. For
tables, it reconstructs the rows to eliminate fragmented data and compacts
information. For indexes, it reorganizes all indexes defined on a table by
rebuilding the index data into un-fragmented, physically contiguous pages.
For example, the following command reorganizes the EMPLOYEE table:
REORG TABLE tedwas.employee
REORG utility can also be accessed from the
Control Center. Right-click on a table in the Tables folder under the
target database and select the Reorganize option (or the Reorganize
Indexes option to reorganize indexes). The Reorganize Table dialog window
appears, as shown in Figure 13, where you can define
the options to use for the reorganization.
Figure 13. The Reorganize Table dialog window
A special utility called
REORGCHK can be used to
determine if a table or index needs to be re-organized. For example, the
output (not shown) from the following command can help determine if the
EMPLOYEE table needs to be reorganized:
REORGCHK CURRENT STATISTICS ON TABLE tedwas.employee
PostgreSQL's statistics collector is a subsystem that supports collection
and reporting of information about server activity. Several predefined
pg_stat_database, etc.) are available to
query and display the results of statistics collection. Alternatively,
you can build custom views using the underlying statistics functions.
In DB2, monitoring can be performed at various levels. You can
use the snapshot monitor to capture information about the database and
any connected applications at a specific time. Snapshots are useful for
determining the status of a database system at a point-in-time. Taken
at regular intervals, they are also useful for observing trends and
foreseeing potential problems. Before taking snapshots, you might
want to set a few switches to capture more detailed information if
the basic information that is captured is not enough. Refer to the
DB2 documentation for setting these switches. When you are ready to
take the snapshot, you can issue the
command. Listing 9 shows partial output of the
GET SNAPSHOT FOR ALL DATABASES command.
Listing 9. Output from the GET SNAPSHOT FOR ALL DATABASES command
Database Snapshot Database name = SAMPLE Database path = /home/db2inst1/db2inst1/NODE0000/SQL00001/ Input database alias = Database status = Active Catalog database partition number = 0 Catalog network node name = Operating system running at database server= LINUX Location of the database = Local First database connect timestamp = 03/09/2006 11:03:58.417363 Last reset timestamp = Last backup timestamp = Snapshot timestamp = 03/09/2006 11:41:55.957502 Number of automatic storage paths = 0 High water mark for connections = 5 Application connects = 47 Secondary connects total = 1 Applications connected currently = 2 Appls. executing in db manager currently = 0 Agents associated with applications = 2 Maximum agents associated with applications= 5 Maximum coordinating agents = 5 Locks held currently = 0 Lock waits = 0 Time database waited on locks (ms) = Not Collected Lock list memory in use (Bytes) = 1920 Deadlocks detected = 0 Lock escalations = 0 Exclusive lock escalations = 0 Agents currently waiting on locks = 0 Lock Timeouts = 0 Number of indoubt transactions = 0 Total Private Sort heap allocated = 0 Total Shared Sort heap allocated = 0 Shared Sort heap high water mark = 0 Total sorts = 20 Total sort time (ms) = Not Collected Sort overflows = 1 Active sorts = 0 ... ...
You can also retrieve snapshot information using SQL, using the corresponding
SNAPSHOT_xxxx() table functions. A list of these
functions can be found in the
Event monitors are used to collect information about the database and any connected applications when specific events occur. These events represent transitions in database activity such as a deadlock situation or when an application connects. This type of information would be hard to capture using the snapshot monitor as you would have to take a snapshot during the exact window the event occurred. You can define an event monitor by the type of event or events you want it to monitor. For example, a deadlock event monitor waits for a deadlock to occur; when one does, it collects information about the applications involved and the locks in contention.
You can create an event monitor using the command line syntax or the Control Center. For example, the following statements create and activate an event monitor for application connections and writes its output to a file in a path called /tmp/conevents:
CREATE EVENT MONITOR conmon FOR CONNECTIONS WRITE TO FILE '/tmp/conevents'
SET EVENT MONITOR conmon STATE 1
When the monitoring period has finished, the event monitor buffer can be flushed and the monitor can be stopped using the following statements:
FLUSH EVENT MONITOR conmon
SET EVENT MONITOR conmon STATE 0
The raw event monitor data can then be formatted using the
db2evmon -db sample -evm conmon
Listing 10 shows a snippet of the formatted event monitor output.
Listing 10. A snippet of formatted event monitor output for connection events
-------------------------------------------------------------------------- EVENT LOG HEADER Event Monitor name: CONMON Server Product ID: SQL08024 Version of event monitor data: 7 Byte order: LITTLE ENDIAN Number of nodes in db2 instance: 1 Codepage of database: 1208 Territory code of database: 1 Server instance name: db2inst1 -------------------------------------------------------------------------- -------------------------------------------------------------------------- Database Name: SAMPLE Database Path: /home/db2inst1/db2inst1/NODE0000/SQL00001/ First connection timestamp: 03/09/2006 12:00:22.473810 Event Monitor Start time: 03/09/2006 12:02:39.125790 -------------------------------------------------------------------------- 3) Connection Header Event ... Appl Handle: 9 Appl Id: *LOCAL.db2inst1.060309170022 Appl Seq number: 0007 DRDA AS Correlation Token: *LOCAL.db2inst1.060309170022 Program Name : db2bp Authorization Id: DB2INST1 Execution Id : db2inst1 Codepage Id: 1208 Territory code: 1 Client Process Id: 16627 Client Database Alias: SAMPLE Client Product Id: SQL08024 Client Platform: Unknown Client Communication Protocol: Local Client Network Name: localhost.localdomai Connect timestamp: 03/09/2006 12:00:22.473810 4) Connection Header Event ... Appl Handle: 11 Appl Id: *LOCAL.db2inst1.060309170318 Appl Seq number: 0001 DRDA AS Correlation Token: *LOCAL.db2inst1.060309170318 Program Name : DB2HMON Authorization Id: DB2INST1 Execution Id : db2inst1 Codepage Id: 1208 Territory code: 0 Client Process Id: 16323 Client Database Alias: SAMPLE Client Product Id: SQL08024 Client Platform: Unknown Client Communication Protocol: Local Client Network Name: localhost.localdomain Connect timestamp: 03/09/2006 12:03:17.225823 ... ...
To create an event monitor from Control Center, right-click the Event Monitors folder under the database of interest and select the Create option. The Create Event Monitor dialog window, shown in Figure 14, is displayed where you can configure the event monitor's properties.
Figure 14. Creating an event monitor in Control Center
To start or stop the monitor, right-click on the new monitor created in the Event Monitors folder, and select the Start Event Monitoring or Stop Event Monitoring options, respectively. To view the monitor output, right-click again on the new monitor you created and choose the Analyze Event Monitor Records option. Figure 15 shows the Event Analyzer dialog window.
Figure 15. Viewing event monitor analysis using the Event Analyzer
Event monitors can also be configured to log their output into a pipe or tables.
PostgreSQL contains several utilities and commands that are used to perform maintenance on databases. Table 2 summarizes them and provides the corresponding utilities or commands in DB2, if they exist.
Table 2. A mapping of PostgreSQL's commands and utilities to DB2
|PostgreSQL COMMAND/UTILITY||DESCRIPTION||DB2 EQUIVALENT|
||A utility for re-clustering tables in a database. Re-clusters them again on the same index that was last used.||
||A wrapper around the ||
||Prints configuration parameters of the currently installed version of PostgreSQL||
||A utility for backing up a PostgreSQL database. Dumps can be output in script or archive file formats.||Either the |
||A utility for writing out ("dumping") all PostgreSQL databases of a cluster into one script file. The script file contains SQL commands that can be used as input to psql to restore the databases.||No equivalent in DB2. Each database in an instance must be backed up
separately using the |
||A utility for restoring a PostgreSQL database from an archive created
||A terminal-based front-end to PostgreSQL. Enables queries to be issued interactively and displays the query results. Alternatively, input can be from a file.||
||A wrapper around the ||
||A wrapper around the ||
||Creates a new PostgreSQL database cluster. A database cluster is a collection of databases that are managed by a single server instance.||
||Prints information initialized during
||A utility for starting, stopping, or restarting the PostgreSQL
backend server (||
||Clears the write-ahead log (WAL) and optionally resets some other control information stored in the pg_control file. This function is sometimes needed if these files have become corrupted.||
||The actual PostgreSQL server process that processes queries. It is
normally not called directly; instead a
||The PostgreSQL multi-user database server. In order for a client
application to access a database it connects (over a network or
locally) to a running postmaster. The postmaster then starts a
separate server process (||
DB2 also comes with many other tools, which were not discussed in this article. The Command Editor, shown in Figure 16, is a graphical version of the Command Line Processor for those who like to work in graphical environments.
Figure 16. The Command Editor
The Configuration Assistant, shown in Figure 17, can help you configure client connections to your remote databases by automatically detecting them over the network and configuring them for you. You can also manually configure connections using this tool.
Figure 17. The Configuration Assistant
The Task Center, shown in Figure 18, is used to schedule tasks, such as taking backups or running maintenance scripts.
Figure 18. The Task Center
Finally, the Development Center, shown in Figure 19 is a complete integrated development environment for building SQL and Java stored procedures and user-defined functions.
Figure 19. The Development Center
On Windows, all of these tools can be launched by navigating to them through the Windows Start menu. They can also be launched by clicking the corresponding toolbar button in Control Center, or by selecting them from the Control Center's Tools menu. We highly recommend exploring these tools to help you increase your productivity and reduce your learning curve.
In this article, we reviewed the main architectural and environmental differences between PostgreSQL and DB2. This included a review of important DB2 concepts such as instances, databases, buffer pools, and table spaces, as well as important tools including the Control Center, Command Line Processor, and various wizards.
In addition to this article, there are a variety of other resources available to help you. A new interactive forum has been introduced to help answer your questions about DB2 Express-C. It is staffed by DB2 experts from IBM and is becoming a very strong community for developers and administrators. The Porting Zone Web site offers extensive information about porting your existing database to DB2 and describes a practical approach along with best practices. Finally, the DB2 Migration Station also contains a variety of resources to help you make the switch to DB2.
DB2 Online Information Center:
The most recent DB2 online (and searchable) documentation.
"Everything you wanted to know about DB2 Universal Database processes" (developerWorks, 2003)
"Getting to know the DB2 UDB command line processor" (developerWorks, 2005)
"Understand how user and group accounts interact with DB2 UDB" (developerWorks, 2005)
"Understand how authorities and privileges are implemented in DB2 UDB" (developerWorks, 2006)
"Lock avoidance in DB2 UDB V8" (developerWorks, 2005)
"The IBM DB2 Universal Database for Linux, UNIX, and Windows Backup Utility" (developerWorks, 2004)
"RUNSTATS in DB2 UDB, Version 8.2" (developerWorks, 2004)
developerWorks Information Management zone: Find more resources
for DB2 UDB developers and administrators.
developerWorks DB2 Porting zone: Find
DB2 porting resources here.
developerWorks DB2 Migration Station zone:
Migrate to DB2 technology.
Get products and technologies
Download a free version of IBM DB2
Build your next development project with
trial software, available for download directly from developerWorks.
- Participate in the discussion forum.
Participate in The DB2 Express-C Forum
and get involved in the developerWorks community.
developerWorks blogs: Get involved in
the developerWorks community.
Ted J. Wasserman is a database consultant at the IBM Silicon Valley Laboratory in San Jose, California. Ted works on the DB2 Business Partner Technical Enablement team, where he specializes in helping IBM Business Partners migrate their applications and databases to DB2. Ted has a master's degree in computer science, as well as a bachelor's degree in computer science from Queen's University in Kingston, Ontario, Canada.