developerworks > My developerWorks >  Dashboard > DB2 wiki > ... > Migration > PostgreSQL migration

View Info

PostgreSQL migration

PostgreSQL migration

This section provides an overview of the migration considerations involved in moving from PostgreSQL to DB2 Express-C. There a few important areas you need to be aware of when migrating to DB2, including architectural differences, DDL and DML mapping, and database administration differences.
Generally, it is quite straightforward to migrate from PostgreSQL to DB2. Most of the PostgreSQL DDL elements can easily be ported to DB2 using a simple "search-and-replace" approach. One area that needs closer attention is data concurrency, since PostgreSQL is based on a multi-version concurrency control (MVCC) model. When porting to DB2, you should take extra care to set the DB2 locking parameters appropriately and choose the best isolation level.

Architecture Comparison

These are some of the differences in architecture between PostgreSQL and DB2:
Process handling
Both PostgreSQL and DB2 are based on a client-server architecture. A client application wishing to use a database makes a request over the network and is served by a process on the server, called "postmaster" on PostgreSQL and "db2agent" on DB2.
PostgreSQL uses a simple process-per-user client/server model in which one client process connects to exactly one server process. The main process, called "postmaster", listens at a specified TCP/IP port for incoming connections and spawns a new server process called "postgres" every time a connection is requested.
In DB2, coordinator agents (called "db2agent") coordinate the work on behalf of an application, and communicate to other agents using inter-process communication (IPC) or remote communication protocols. All connection requests from client applications, whether they are local or remote, are allocated a corresponding coordinator agent. When intra-partition parallelism is enabled, extra worker agents, called subagents are spawned by the coordinator agent to help perform the work in parallel.
Instances
In PostgreSQL, you must first initialize a database storage area on disk before you can create any databases. This storage area is known as a "database cluster". A database cluster is a collection of databases managed by a single instance of a running database server (i.e. a "postmaster" process). In filesystem terms, a database cluster is a single directory under which all data is stored.
DB2 has a somewhat similar concept called an "instance". An instance, also known as a database manager (DBM), is a logical server environment needed to create and work with databases. It is really just a collection of processes which provide access to database(s). If you have more than one instance created, each instance acts as an independent server environment which can be stopped and started independently. You can have multiple instances active at the same time, and have multiple databases active within each instance.
Databases
In PostgreSQL, a database is a named collection of database objects. Generally, every database object (table, function, etc.) belongs to one and only one database, but there are a few system catalogs (e.g. "pg_database"), that belong to an entire cluster and are accessible from each database within the cluster. Databases are physically separated and access control is managed at the connection level.
In DB2, a database is a collection of tables, including the data and related objects, such as indexes, views, triggers, functions, and stored procedures. Multiple databases can exist in the same instance; however, each database is fully independent of each other in the sense that each has its own system catalog tables (containing the database metadata), its own configuration file, and its own recovery log. Databases created inside the same instance share the same set of instance processes.
Schemas
Schemas are similar in PostgreSQL and DB2. Every database contains one or more named schemas, which in turn contain tables and other database objects. The major difference between PostgreSQL and DB2 is what happens when a schema is not specified. In PostgreSQL, when you create an object without specifying a schema name, it is automatically put into a schema named "public".
In DB2, the current schema, by default, is the same as the user ID that is used to connect to the database. In DB2, there is no concept of schema path, as in PostgreSQL. There can only be one value specified for the current schema.
Database objects
Tables, views, and indexes are similar in PostgreSQL and DB2. In DB2, before creating any type of database object, you must first connect to the database you want to create it in. You can create these objects using command syntax or using wizards in the Control Center.
Although the syntax for creating objects is very similar, each database implements them differently. For example, in PostgreSQL, you can specify what type of index to create, such as a B-Tree, R-Tree, or Hash index. In DB2, you do not have this choice. However, DB2 automatically creates special types of indexes when needed, such as bitmap indices when executing certain types of queries, or block indexes, when working with multi-dimensional clustering (MDC) tables. DB2 also has special implementation of objects that PostgreSQL does not have, such as MDC tables and Materialized Query Tables.
DML / DDL mapping
Since both PostgreSQL and DB2 follow the SQL standard, migrating DDL and DML statements should be fairly straightforward. Any proprietary PostgreSQL extensions or syntax will need to be removed. In addition, other syntax changes will be required for things such as data types and type casts.
Security
Both PostgreSQL and DB2 have authentication and authorization mechanisms. PostgreSQL offers a number of different client authentication methods. The method used to authenticate a particular client connection can be selected on the basis of the client's host address, database, and user. Client authentication is controlled by a configuration file, which traditionally is named pg_hba.conf and is stored in the database cluster's data directory.
In DB2, application or database user accounts do not exist within the database, but are rather managed by an external security facility such as the operating system. Within the database, privileges on specific database objects can be assigned to user or group accounts defined in the external facility. By default, authentication takes place on the DB2 server, although there are a number of other authentication methods supported by DB2.
PostgreSQL manages database access permissions using the concept of roles. A role can be thought of as either a database user, or a group of database users, depending on how the role is set up. Roles can own database objects (for example, tables) and can assign privileges on those objects to other roles. Roles are conceptually completely separate from operating system users and are global across a database cluster installation and not per individual database.
In DB2, since there is no concept of a database or application user, it is necessary to create any operating system users which your application uses to connect to the database and then grant those users specific database object privileges. To facilitate administration and management, you can also create groups of users and grant specific database privileges to those groups.
Concurrency
Unlike traditional database systems which use locks for concurrency control, PostgreSQL maintains data consistency through a multiversion model (Multiversion Concurrency Control, MVCC). This means that while querying a database, each transaction sees a snapshot of data as it was some time ago, regardless of the current state of the underlying data. This protects the transaction from viewing inconsistent data that could be caused by other concurrent transaction updates on the same data rows.
DB2's concurrency model is radically different than PostgreSQL's. At any one time, there is only one copy or instance of a row. DB2 uses row and table locking to ensure data integrity and consistency. Whenever a row is requested for read or update, DB2 obtains a lock, except if the application requesting the row is using the Uncommitted Read isolation level, in which case no row locks are obtained for read operations. Update operations always require an exclusive lock, which is released when the transaction completes.

Migration considerations

Here are several things to consider with respect to SQL differences between PostgreSQL and DB2:
Proprietary SQL (syntax, functions, implementation)
One particular PostgreSQL function that is a bit cumbersome to port to DB2 is the LIMIT/OFFSET clause in the SELECT statement, because there is no direct equivalent in DB2. The standard SQL 2003 approach to port this functionality is to use a sub-select with ROW_NUMBER OVER(ORDER BY column) AS number, column.
For example:SELECT * FROM (
SELECT ROW_NUMBER OVER (ORDER BY column) AS rownum, id, cost
FROM items
WHERE cost < 595
) AS myresults
WHERE rownum >= 10 AND rownum <= 20

Concurrency
DB2 supports different levels of concurrency control to suit the needs of most applications. An application should use the level of isolation that provides the appropriate concurrency. The level of isolation is specified by each database connection, meaning different connections to the database can use different levels of isolation. In many cases, DB2's default setting of Cursor Stability is sufficient. DB2 also allows the isolation level to be specified at the SQL statement level, thereby overriding the database default isolation level. For example, suppose that an application needs to get a "rough" count of the number of rows in table. Performance is of the utmost importance, and the Cursor Stability isolation level is required with the exception of this one SQL statement. You could use the following SQL statement to get this information:
SELECT COUNT FROM tab1 WITH UR
Since it is acceptable to read dirty data in this case, choosing the Uncommitted Read isolation level for this statement will help improve performance because read locks are not obtained.
To increase the concurrency of the system, commit your transactions often, including read-only transactions. If possible, reschedule the applications that compete for access to the same table. Also, use the Uncommitted Read (UR) isolation level for transactions where read consistency is not an issue.
You may also want to enable three relatively new registry variables (DB2_EVALUNCOMMITTED, DB2_SKIPINSERTED, and DB2_SKIPDELETED) to help improve the concurrency of this system. These might especially be beneficial when converting from a MVCC system such as PostgreSQL.

Security
In DB2, since there is no concept of a database or application user, it is necessary to create any operating system users required by your application to connect to the database and then grant those users specific database object privileges.
DB2 UDB also provides a hierarchy of system-level administrative authorities (SYSADM, SYSCTRL, SYSMAINT, SYSMON). These authorities allow users to perform a subset of administrative tasks such as creating/dropping databases and forcing users off the system. Authorities are assigned at the instance level by specifying the name of a group in the associated instance level parameter (SYSADM_GROUP, SYSCTRL_GROUP, SYSMAINT_GROUP, SYSMON_GROUP). Users belonging to that group inherit that authority for all databases in the instance.

Tools for migration

We are not aware of any prominent commercial migration tools that are available to help migrate a PostgreSQL database and data to DB2. However, since PostgreSQL complies with the same SQL standards as DB2, much of the object creation code (DDL) and SQL statements will work in DB2 with minor or possibly no changes. Any proprietary PostgreSQL extensions or syntax will need to be removed. Much of the conversion effort can be accomplished using a simple "search-and-replace" type of approach in a text editor.

Resources

This page provided a brief comparison between PostgreSQL and DB2, and described some migration considerations. For more information, consult these references:



Browse Space
- Pages
- Labels
- Attachments
- Mail
- News
- Advanced

Explore Confluence
- Popular Labels
- Notation Guide

Your Account
Log In

 

Other Features

View a printable version of the current page.

Add Content


Powered by Atlassian Confluence, the Enterprise Wiki. (Version: 2.3.3 Build:#645 Feb 13, 2007)
Bug/feature request - Contact Administrators