Migrate from MySQL or PostgreSQL to DB2 Express-C

Move to DB2 in three easy steps

Easy to use, powerful and free! That's how MySQL and PostgreSQL are often described. But did you know that DB2® has editions with the same qualities and more? They are DB2 Express and Express-C - members of the DB2 product family that are specifically tailored to meet the needs of small to medium sized businesses. Express and Express-C are professional-grade yet easy-to-use databases that run on the Windows® and Linux® platforms. They are known for their simplified installation, graphical user interfaces, self-managing capabilities, and tools that make DB2 easy to use. And Express-C is free! So let's get started and find out how easy it is to migrate from MySQL/PostgreSQL to DB2.

Share:

Vikram S. Khatri, Certified Consulting I/T Specialist, EMC

Vikram Khatri works for IBM in the Sales and Distribution Division and is a member of DB2 Migration team. Vikram has 21 years of IT experience and specializes in migrating non-DB2 databases to DB2. Vikram supports the DB2 technical sales organization by assisting with complex database migration projects as well as with database performance benchmark testing.


developerWorks Contributing author
        level

Nora Sokolof (nsokolof@us.ibm.com), Certified Consulting IT Specialist, EMC

Nora Sokolof works for IBM in the Sales and Distribution Division and is a member of the DB2 Migration team. Nora has been with IBM for almost 20 years. Before joining the migration team, Nora has held roles as a database designer, database administrator and application developer . Nora has assisted hundreds of customers with their migrations to DB2 from Oracle, PeopleSoft, and Informix.



Manas Dadarkar (manas@us.ibm.com), Advisory Software Engineer, EMC

Manas Dadarkar is the technical leader of the Migration Toolkit (MTK) team. Manas' team develops tools that migrate non-DB2 databases to DB2.



15 June 2006

Also available in Chinese

IBM Data Movement Tool

The tool described in this article has been replaced with a new IBM Data Movement Tool. Please click this link to get the new tool .


Architectural overview and comparison

You'll find many differences between DB2, MySQL, and PostgreSQL. We'll start by looking at some of the fundamental architectural ways that these 3 database servers are different or similar.

  • Architectural model
    • MySQL database server uses a dedicated server thread-based architecture.
    • PostgreSQL and DB2 use a dedicated server process model architecture.
  • Storage engine
    • MySQL database uses pluggable storage engine architecture.
    • PostgreSQL and DB2 use a dedicated storage engine.
  • Table space model
    • MySQL uses table space for InnoDB and NDB storage engines.
    • PostgreSQL table spaces can span multiple databases.
    • DB2 table spaces are specific to a database.
  • Schema support
    • PostgreSQL and DB2 have true schema support.
    • MySQL does not have true schema support. A schema in MySQL can be thought as a MySQL database.
  • Case sensitivity for database object names
    • MySQL uses case-sensitive names for database, tables, and columns on Linux. PostgreSQL table names are case-insensitive on both UNIX and Windows and are stored in lower case for versions 8.1.3 and 8.1.0.
    • DB2 stores object names in upper case by default on all platforms. However, DB2 object names can be stored in mixed or lower case if placed within double quotes in DDL CREATE statements.
  • Array columns
    • MySQL and DB2 do not support array columns.
    • PostgreSQL does support array columns.
  • Authentication
    • DB2 performs authentication using various external security schemes such as the operating system, PAM, Kerberos, Active Directory, LDAP, and so on. It also allows third party security modules to be plugged-in.
    • PostgreSQL performs authentication in one of many different ways. It can use database users/roles, the operating system, PAM, Kerberos, and so on, based upon settings in its host based configuration (pg_hba.conf) file.
    • MySQL implements authentication at the database level and passwords are encrypted.
  • Instance architecture
    • DB2 instance manages different databases and there could be many DB2 instances on the same machine.
    • PostgreSQL concept of instance is similar to database clustering.
    • MySQL instance is similar to a DB2 database.

It is true that MySQL and PostgreSQL databases can post feature changes almost weekly, whereas feature implementation and changes in DB2 occur after a very careful planning as there are innumerable dependencies using DB2 products. Please note that we used MySQL 5.1, PostgreSQL 8.0.3, and DB2 V8.2 for comparison purposes at the time of writing, so take that into consideration when you read this article.

Figure 1, Figure 2, and Figure 3 are architectural diagrams of MySQL, PostgreSQL and DB2. We drew the architectural diagrams for MySQL and PostgreSQL to the best of our ability after reading through several documents. If you find a discrepancy, please let us know and we will correct it.

MySQL

MySQL uses a thread-based architecture, whereas PostgreSQL and DB2 are both based on a process-based architecture. A MySQL instance, as you see in Figure 1, can manage many databases. There is a common system catalog named INFORMATION_SCHEMA shared by all MySQL databases in an instance.

A database in DB2 is a separate entity by itself having its own system catalog, table spaces, buffer pools, and so on. The DB2 instance manages different databases, and they do not share table spaces, logging, system catalogs, or temporary table spaces.

A single instance of MySQL with many databases can be visualized as a single database in DB2 with each MySQL database represented as a schema in DB2. If you have several MySQL instances running on your server and each instance managing several databases, you can take one of the following migration paths:

  • Migrate each MySQL instance as a DB2 database under a single DB2 instance.
  • Migrate each MySQL instance as a DB2 instance with one DB2 database having a different schema for each MySQL database.

Note: When we refer to process-based architecture, we are referring to UNIX® platforms, as the same model on Windows will be a thread-based architecture. This holds true for both DB2 and PostgreSQL.

You can only access a single database resource within the scope of a DB2 connection whereas MySQL allows you to access multiple database resources within the scope of the same connection.

One of the interesting feature of MySQL is its pluggable storage engines. You can choose either MyISAM, InnoDB, Archive, Federated, Memory, Merge, Cluster, NDB or Custom storage engine. Each storage engine has different characteristics, and one can choose a particular storage engine based upon the specific requirements. For comparison purposes, we find InnoDB to be closest to a relational database.

Figure 1. MySQL architecture and processes overview
MySQL architecture and processes overview

A MySQL server process (mysqld) can create a number of threads:

  • A global thread (per server process) is responsible for creating and managing each user connection thread.
  • A thread is created to handle each new user connection.
  • Each connection thread also performs authentication and query execution.
  • On Windows, there is a named pipe handler thread that does the same work as the connection thread for named pipe connection requests.
  • A signal thread handles alarms and forces timeouts on connections that have been idle too long.
  • A thread is allocated to handle shutdown events.
  • There are threads for handling synchronization of master and slave servers for replication.
  • Threads are used for table flushing, maintenance tasks, and so on.

MySQL uses a data cache, record cache, key cache, table cache, hostname cache, and privilege cache for caching and retrieval of different types of data used by all the threads executing within the server process.

In addition, the MySQL main process (mysqld) has threads to handle database management activities such as backup, restore, concurrency control, and so on.

PostgreSQL

A PostgreSQL instance, shown in Figure 2, can manage a cluster of databases. Each database has its own set of system catalogs known as INFORMATION_SCHEMA and pg_catalog. All databases share pg_databases as a common system table. Each database is a separate entity and the collection of databases is called a cluster. A PostgreSQL instance can be thought as managing a cluster of databases. A server can run more than one such instance of database clusters.

A PostgreSQL database can be migrated logically to a DB2 database. Both the databases support the schema object type. You can not access other databases from a named connection.

The most significant difference between PostgreSQL and DB2 is related to table spaces. A PostgreSQL table space can span one or more database whereas DB2 table spaces are specific to a database.

Figure 2. PostgreSQL architecture and processes overview
PostgreSQL architecture and processes overview

A PostgreSQL session consists of several main processes:

  • A postmaster process serves as a supervisory process that spawns other processes and listens for user connections.
  • A user process such as psql is used for interactive SQL queries.
  • One or more server processes named postgres are spawned by postmaster to handle users' requests for data.
  • The server processes communicate with each other through semaphores and shared memory.

DB2

Figure 3 shows DB2's architecture. This figure explains how DB2 handles data between disks (files, raw devices, directories, and so on) using buffer pools. DB2 uses a connection concentrator to handle large number of connections. DB2 page cleaners and pre-fetchers do the work asynchronously and separate processes handle redo log activity. Please refer to the Resources for a detailed description of how locking and processes work in DB2.

Figure 3. DB2 architecture and processes overview
DB2 architecture and processes overview

A DB2 session consists of several processes:

What is different in MySQL, PostgreSQL and DB2?

  • MySQL uses a thread-based architecture model whereas PostgreSQL and DB2 use process-based architecture models.
  • DB2 can have multiple instances on a server. Each instance can have many databases in it. Each database is physically and logically separate from each other.
  • MySQL can have multiple instances of mysqld running on a server. Each instance can manage one or more MySQL databases. Each database in an instance is not physically or logically separate. Each database in MySQL can be thought as a SCHEMA in DB2.
  • A PostgreSQL cluster of databases can be thought of as equivalent to a DB2 instance.
  • db2sysc, the main DB2 system controller or engine process
  • Listener processes such as db2tcpcm and db2ipccm that listen for a user's request for connection
  • One or more agents that work on behalf of an application. Agents can be either of two types:
    • db2agent works on behalf of an application and communicates to other agents using inter-process communication or remote communication protocol.
    • db2agntp works to fulfill client requests on the database when intra-parallel is turned on.
  • A user's process such as db2 which is used for interactive querying from the command line
  • db2bp - a persistent background process for DB2 Command Line Processor (CLP)
  • db2disp - an agent dispatcher process to assign connections to available coordinating agents when connection concentrator is enabled
  • db2fmcd - a fault monitor coordinating daemon per server
  • db2fmd - a fault monitor daemon for each instance
  • db2resyn - a resync manager process to handle two-phase commit
  • db2dlock - a db2 dead lock detector
  • db2loggr - the database log reader
  • db2loggw - the database log writer
  • db2pclnr - the buffer pool page cleaner
  • db2pfchr - the buffer pool pre-fetcher
  • db2fmp - to run user code on the server outside db2 address space
  • ... and many more

DB2 server processes communicate with each other through memory areas known as Database Manager Memory and Database Shared Memory as shown in Figure 4.

Figure 4. DB2 database manager (instance) and database shared memory architecture
DB2 database manager (instance) and database shared memory architecture

Feature comparison

Table 1 compares MySQL, PostgreSQL and DB2 features. This is not an all inclusive list, but a comparison of the most commonly used features.

Table 1. MySQL and DB2 features comparison
FeatureMySQLPostgreSQLDB2
Instance An instance is started by executing the MySQL command (mysqld). An instance may manage one or more databases. A server can run multiple instances of mysqld. An instance manager can monitor various instances of mysqld. An instance is started by executing the Postmaster process (pg_ctl). An instance may manage one or more databases known as a cluster. A cluster is an area on the disk that is initialized during installation and consists of a single directory where all the data is stored. The installation creates the first database using initdb. An instance is a single installation of DB2 that manages one or more databases. A default instance is created during installation. An instance is started using the db2start command. Multiple instances on the same machine may also be created using the db2icrt command. Data storage is not allocated until the database itself is created. The database may manage storage by itself using raw devices or use the operating system file system. The environment variable DB2INSTANCE determines which instance you are connected to.
Database A named collection of objects and is a separate entity from other databases in the instance. All databases in MySQL in an instance share same system catalog. A named collection of objects and each database is a separate entity from other databases. Each database has its own system catalog but pg_databases is shared across all databases. A named collection of objects and is a separate entity from other databases. A database is a separate entity physically and logically with no sharing amongst other databases. A single DB2 instance can manage one or more databases.
Data Buffers This is set by the innodb_buffer_pool_size configuration parameter. This is the size in bytes of the memory buffer InnoDB uses to cache data and indexes of its tables. On a dedicated database server, this may set this to up to 80% of the machine physical memory size. Shared_buffers cache. There are 64 buffers allocated by default. The default block size is 8K. Buffer cache can be updated by setting the shared_buffers parameter in the postgresql.conf file. By default, one buffer pool is allocated and additional buffer pools may be added using the CREATE BUFFERPOOL command. The default page size is determined at database creation time and can be 4, 8, 16 or 32K.
Database Connection A client connects to the database using the CONNECT or the USE statement specifying the database name and optionally the userid and password. Users and groups of users are managed within the database using Roles. A client connects to the database using the connect statement specifying the database name and optionally the userid and password. Users and groups of users are managed within the database using roles. A client connects to the database using the connect statement specifying the database name and optionally the userid and password. Users and Groups are created outside of the database using OS commands.
Authentication MySQL manages authentication at the database level. PostgreSQL's authentication depends upon host based configuration. DB2 uses APIs for authenticating users through various implementations such as Kerberos, LDAP, Active Directory, and PAM at the OS level and has pluggable authentication architecture for third party modules.
Encryption You can specify password at table level to encrypt the data. You can also use AES_ENCRYPT and AES_DECRYPT functions for encryption and decryption of column data. Network encryption can be achieved through SSL connections. You can use functions from pgcrypto library to encrypt / decrypt columns. Network encryption can be achieved through SSL connections. You can use DB2 provided encryption and decryption methods to encrypt / decrypt the column data. The network communication between client and server can be encrypted if you choose DATA_ENCRYPT method of authentication at the instance level.
Auditing You can do grep on querylog. You can use PL/pgSQL triggers on tables to do auditing. DB2 provides db2audit utility to provide detailed auditing without having a need to implement trigger or log based auditing.
Query Explain Use EXPLAIN command to see the explain plan of a query. Use EXPLAIN command to see the explain plan of a query. DB2 provides GUI and command line tools to see explain plan of a query. It can also capture queries from SQL cache and produce the explain plan. You can use tools to see the explain plan of SQL's in all stored procedures.
Backup, Recovery and Logging InnoDB uses write-ahead logging. Supports full online and offline backups and crash and transaction recovery. Maintains a write-ahead log in a subdirectory of the data directory. Supports full online and offline backups and crash, point-in-time and transaction recovery. Uses write-ahead logging. Supports full, incremental, delta and tablespace level online / offline backup and recovery. Supports crash, point-in-time, and transaction recovery.
JDBC Driver JDBC drivers can be downloaded from ResourcesJDBC drivers can be downloaded from ResourcesSupports Type-2 and Type-4 (Universal) driver. JDBC drivers are part of db2 product.
Table Types Depends on the storage engine. For example, NDB storage engine supports partitioned tables and memory engine supports in-memory tables. Supports temporary, regular and partitioned tables of different types of range and list. Hash partitioning is not supported. Supports user, temporary, regular and partitioned tables of different types range, hash and multidimensional clusters.
Index Types Depends on the storage engine. MyISAM: BTREE, InnoDB: BTREE Supports B-tree, Hash, R-tree and Gist indexes. Supports B-tree and bitmap indexes.
Constraints Supports, primary, foreign, unique, not null constraint. Check constraints are parsed but not enforced. Supports, primary, foreign, unique, not null and check constraints. Supports primary, foreign, unique, not null and check constraints.
Stored Procedures and User-Defined Functions Supports both the CREATE PROCEDURE and CREATE FUNCTION statements. Stored procedures can be written in SQL, C++. User-Defined Functions can be written in SQL, C and C++. At times the term stored procedure is used, however, only a CREATE FUNCTION statement is supported. User-Defined Functions can be written in PL/pgSQL (proprietary procedural language) SQL, and C. Supports both the CREATE PROCEDURE and CREATE FUNCTION statements. Stored procedures can be written in SQL (SQL PL), C, Java, COBOL, and REXX. User-Defined Functions can be written in SQL (SQL PL), C, Java.
Triggers Supports before row triggers, after row and statement triggers with trigger statements written in procedure language compound statements. Supports before and after per row and per statement triggers and trigger procedures written in C and all other supported procedural languages.. Supports before row triggers, after row and statement triggers, instead of triggers and triggers with SQL PL compound statements. Stored procedures can be called from triggers.
System Configuration File my.conf Postgresql.conf Database Manager Configuration
Database Configuration my.conf Postgresql.conf Database Configuration
Client Connectivity File my.conf pg_hba.conf System Database Directory
Node Directory
XML support Limited XML support Limited XML support Extensive support for accessing XML data. DB2 Viper (V9) is the first hybrid database to store / retrieve XML in its native form.
Data Access and Administration Tools OPTIMIZE TABLE - reclaims unused space and defragment data file
myisamchk -analyze - updates statistics for use by query optimizer for the MyISAM storage engine
mysql - command line tool
MySQL Administrator - client GUI tool
Vacuum - reclaims unused space
Analyze - updates statistics for use by query optimizer
psql - command line tool
pgAdmin - client GUI tool
Reorg - used to recluster and defragment data
Runstat - collects statistics used by optimizer
CLP - command line processor
Control Center - client GUI tool
Concurrency control Supports table and row level locking. InnoDB Storage Engine supports READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, and SERIALIZABLE. The isolation level is set at the transaction level using the SET TRANSACTION ISOLATION LEVEL statement. Supports table and row level locking. The ANSI isolation levels supported are Read Committed (default - sees a snapshot of the database at the time the query started) and Serialization (like Repeatable Read - only sees committed results before the transaction started). The isolation level is set at the transaction level using the SET TRANSACTION statement. It is set at the session level using SET SESSION. Supports table and row level locking and 4 isolation levels, RR (repeatable read), RS (read stability), CS (default - cursor stability) and UR (uncommitted read). Isolation level is set at the session level using SET ISOLATION, set at the SQL statement level using the WITH clause or set at the database level using a Database Configuration parameter.

So far, we have seen some architectural and feature differences between MySQL, PostgreSQL and DB2 in this section. Let us now explore the data type differences in each of the database servers.


Data type comparisons between MySQL, PostgreSQL and DB2

The SQL ANSI standard lays out rules for data types used in relational database systems. However, not every database platform supports each and every data type defined by the standards committees. Furthermore, vendor data types of a given name may be implemented differently from what the standard may specify and even differently from all other database vendors. As a result, although many MySQL, PostgreSQL and DB2 data types are similar in either their name or meaning or both, many differences can be noted as well.

Table 2 lists the most commonly used DB2 data types. We provide the closest match to DB2 for MySQL and PostgreSQL data types in subsequent sections.

As far as DB2 SQL limits (such as length limits on constraint names, data types limits, and so on) are concerned, these limits are being taken away systematically with each new release.

Table 2. DB2 data types
Data type Remark
BIGINTStores signed or unsigned integers uses 8 bytes of storage
BLOB
BLOB(n)
Holds variable-length binary data up to 2 GB in length. Lengths over 1 GB are not logged
CHAR(n)
CHARACTER(n)
Holds fixed-length character data up to 254 bytes in length. Uses 'n' bytes of storage
CHAR(n) FOR BIT DATAHolds fixed-length binary values
CLOB
CLOB(n)
Holds variable-length character data up to 2 GB in length. Lengths over 1 GB are not logged
DATEHolds a calendar date without time of day. Uses 4 bytes of storage
DEC(p,s)
DECIMAL(p,s)
NUM(p,s)
NUMERIC(p,s)
Holds a precision (p) of 1 to 31 and a scale (s) of 0 to 31. Uses (p/2) +1 bytes of storage.
DOUBLE
DOUBLE PRECISION
FLOAT
Holds floating point numbers and uses 8 bytes of storage
FLOAT(p) Holds a precision (p) of 1 to 53. If p <= 24 then a synonym for REAL. If p >= 25 then is a synonym for DOUBLE PRECISION
GRAPHIC(n) Used for National Language Support (NLS) and fixed-length character strings (usually DBCS) up to 127 bytes in length. Uses n*2 bytes of storage for double-byte character sets or n bytes of storage for single-byte character sets.
INT
INTEGER
Stores signed or unsigned integers and uses 4 bytes of storage
REAL Holds floating point numbers and uses 4 bytes of storage
SMALLINT Holds signed and unsigned integers and uses 2 bytes of storage
TIME Holds the time of day and uses 3 bytes of storage
TIMESTAMP Stores the date (year, month, day) and time (hour, minute, second) with up to a precision of 6 for microseconds. Uses 10 bytes of storage
VARCHAR(n)
CHAR VARYING(n)
CHARACTER VARYING(n)
Holds variable length character data up to 32,672 bytes in length. Uses n+2 bytes of storage
VARCHAR(n) FOR BIT DATA Stores binary data in variable-length. Uses n bytes of storage
VARGRAPHIC(n) Holds variable-length double-byte character data up to 16,336 characters in length. Uses (n*2)+2 bytes of storage

MySQL and DB2

Understand data type caveats between MySQL and DB2

  • Migrate BLOB and CLOB < 32K to VARCHAR(n) WITH BIT DATA or VARCHAR(n) from performance standpoint. The tool handles this conversion by examining the actual data in the tables.
  • Tool handles UNSIGNED data types.
  • Migrate the Boolean data type to either SMALLINT or CHAR(1).
  • If decimal precision is greater than 31, tool converts column to the double data type.
  • Automatically incremented columns are migrated to a numeric data type and an IDENTITY clause.

The definitions and differences between the MySQL and DB2 data types are described in the following tables. Table 3 describes the most commonly used MySQL data types. Table 4 maps the MySQL data type to the closest DB2 equivalent.

MySQL might use a SERIAL alias for as a data type which translates to BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE.

BOOL or BOOLEAN is a synonym for TINYINT(1). The maximum digit for DECIMAL in MySQL is 65 and maximum number of supported decimal is 30. If UNSIGNED is specified for DECIMAL, negative values are not allowed.

The timestamp column does not support microseconds.

Table 3. MySQL data types
Data type Remark
BIT A fixed-length bit string
BOOLEAN Stores a logical Boolean (true/false/unknown) value of either TRUE, true and 1; FALSE, false and 0.
TINYBLOB Raw binary data used to store binary objects such as graphics up to 255 bytes
BLOB Raw binary data used to store binary objects such as graphics up to 65,535 bytes
MEDIUMBLOB Raw binary data used to store binary objects such as graphics up to 16,777,215 bytes
LONGBLOB Raw binary data used to store binary objects such as graphics up to 4GB
CHAR(n)
CHARACTER(n)
Contains a fixed-length character string padded with spaces up to a length of n.
DATE Holds a calendar date (year, month, day) in a 3 byte storage
DATETIME Holds a calendar date and time of day in 8 bytes of storage
YEAR A year in two-digit or four-digit format in 1 byte of storage
DECIMAL(p,s)
NUMERIC(p,s)
Stores exact numeric values with a precision (p) as high as 65 and a scale (s) of 30 or higher.
FLOAT Stores floating-point numbers to the limits allowed by the hardware. A single-precision floating-point number is accurate to approximately 7 decimal places. UNSIGNED attribute disallows negative values
DOUBLE
REAL
Stores double precision floating-point numbers to the limits allowed by the hardware. A double-precision floating-point number is accurate to approximately 15 decimal places. UNSIGNED attribute disallows negative values
TINYINT Stores signed or unsigned 1-byte integers
SMALLINT Stores signed or unsigned 2-byte integers
MEDIUMINT Stores signed or unsigned 3-byte integers
INTEGER Stores signed or unsigned 4-byte integers
BIGINT Stores signed or unsigned 8-byte integers
TINYTEXT Used to store character string data up to 255 bytes
TEXT Used to store character string data up to 65,535 bytes
MEDIUMTEXT Used to store character string data up to 16,777,215 bytes
LONGTEXT Used to store character string data up to 4GB
TIME Holds the time of day in 3 bytes of storage
TIMESTAMP Stores the date and time in 4 bytes of storage. A TIMESTAMP column is automatically set to the date and time of the most recent operation if you do not give it a valid value.
VARCHAR(n)
CHARACTER VARYING(n)
CHARACTER VARYING
Stores variable-length character strings up to a length of n. Trailing spaces are not stored.
ENUM A string object that can have only one value, chosen from the list of values 'value1', 'value2', ..., NULL
SET A string object that can have zero or more values, each of which must be chosen from the list of values 'value1', 'value2', ...
BINARY Similar to the CHAR type, but stores binary byte strings rather than non-binary character strings.
VARBINARY Similar to the VARCHAR type, but stores binary byte strings rather than non-binary character strings.
Table 4. Mapping MySQL data types to DB2
MYSQL DB2 Remark
BIT CHAR(n) FOR BIT DATA See Resources for UDFs to ease migration for more details
BOOLEAN SMALLINT or CHAR(1) Use a check constraint to enforce the rule
TINYBLOB VARCHAR(255) FOR BIT DATA You can either use BLOB(255) or VARCHAR(255) FOR BIT DATA. Using VARCHAR is more efficient in this case.
BLOB BLOB(64K) Consider using VARCHAR(n) FOR BIT DATA if length is < 32K
MEDIUMBLOB BLOB(16M) You can use NOT LOGGED to improve the performance.
LONGBLOB BLOB(2G) The maximum length of BLOB supported is 2GB.
CHAR(n)
CHARACTER(n)
CHAR(n)
CHARACTER(n)
The maximum size of 'n' in DB2 is 254.
DATE DATE -
DATETIME TIMESTAMP You can use CURRENT TIMEZONE special register to transform the date
YEAR SMALLINT You can use a CHECK constraint to enforce YEAR rule
DECIMAL(p,s)
NUMERIC(p,s)
DECIMAL(p,s)
NUMERIC(p,s)
If p > 31,use DOUBLE instead
FLOAT REAL _
DOUBLE
REAL
DOUBLE _
SMALLINT SMALLINT Use check constraint to enforce value < 256
SMALLINT SMALLINT _
MEDIUMINT INTEGER Use check constraint on max size, if required
INTEGER INTEGER
INT
_
BIGINT BIGINT _
TINYTEXT VARCHAR(255) Using VARCHAR for size < 32K is more efficient
TEXT CLOB(64K) DB2 allows you to specify the size parameter for CLOB or BLOB. Use the size that is required for you instead of using TINY, MEDIUM or LONG CLOB
MEDIUMTEXT CLOB(16M) _
LONGTEXT CLOB(2G) The max size is 2GB. You can use LOGGED up to 1GB size of BLOB or CLOB. Use NOT LOGGED option for better performance.
TIME TIME _
TIMESTAMP TIMESTAMP _
VARCHAR(n)
CHARACTER VARYING(n)
VARCHAR(n)
CHARACTER VARYING(n)
Use VARCHAR if size is < 32K
ENUM VARCHAR(n) Use check constraint to enforce the rules.
SET VARCHAR(n) Use check constraint to enforce the rules.
BINARY CHAR(n) FOR BIT DATA Use if n < 254 otherwise use VARCHAR(n) FOR BIT DATA
VARBINARY VARCHAR(n) FOR BIT DATA Use VARCHAR if 'n' < 32K otherwise use BLOB

PostgreSQL and DB2

Data type Caveats for PostgreSQL and DB2

  • One might use the TEXT data type in PostgreSQL in-lieu of CHAR(n) or VARCHAR(n). Migrating a TEXT column to a CLOB data type might not be the best choice if the maximum length of the value to be stored is less than 32K. The migration tool handles the correct conversion of the TEXT column by examining the data.
  • PostgreSQL might use NUMERIC with a precision > than 31. Migrate NUMERICs with a precision > 31 to a DOUBLE data type for DB2
  • Migrate Boolean data type to either SMALLINT or CHAR(1). This might require some changes in the application.
  • BIGSERIAL data type is mapped to DB2's BIGINT and an IDENTITY attribute.
  • PostgreSQL supports implicit casting of data types in mixed expressions. DB2 supports explicit casting of data types in mixed expressions.

The definitions and differences between the DB2 and PostgreSQL data types are described in the following two tables. Table-4 describes the most commonly used PostgreSQL data types. Table-5 maps the PostgreSQL data type to the closest DB2 equivalent.

PostgreSQL uses special network address types such as inet, cidr, macaddr. These data types are migrated to the VARCHAR data type in DB2.

PostgreSQL supports geometric data types as well. The tool does not handle geometric data types. Currently, we are assuming that there is not a great need to support the conversion of such data types. If you do use geometric data type, please send us a note and we will provide a fix in the tool.

Bit string data types in PostgreSQL can be handled with some modifications in your application. At this time, this is not supported in the tool. If you have a need for this, please let us know about it.

PostgreSQL also supports multi-dimensional arrays and they are best migrated in DB2 as child tables. However, the tool does not support multi-dimensional arrays at this time.

Table 5. PostgreSQL data types
Data type Remark
BIGSERIAL
SERIAL8
Stores an auto-incrementing unique integer value up to 8 bytes of data
BIT A fixed-length bit string
BIT VARYING(n)
VARBIT(n)
A variable-length bit string n bits in length
BOOLEAN Stores a logical Boolean (true/false/unknown) value of either TRUE, t, true, y, yes and 1; FALSE, f, false, n, no and 0.
BYTEA Raw binary data used to store large binary objects such as graphics. Uses 4 bytes plus the length of the binary string for storage
CHAR(n)
CHARACTER(n)
Contains a fixed-length character string padded with spaces up to a length of n.
DATE Holds a calendar date (year, month, day) in a 4-byte storage space
DATETIME Holds a calendar date and time of day
DECIMAL(p,s)
NUMERIC(p,s)
Stores exact numeric values with a precision (p) and a scale (s) of 0 or higher.
FLOAT4
REAL
Stores floating-point numbers with a precision of 8 or less and 6 decimal places
FLOAT8
DOUBLE PRECISION
Stores floating-point numbers with a precision of 16 or less and 15 decimal places
SMALLINT Stores signed or unsigned 2-byte integers
INTEGER Stores signed or unsigned 4-byte integers
INT8
BIGINT
Stores signed or unsigned 8-byte integers
SERIAL
SERIAL4
Stores an auto-incrementing unique integer value using up to 4 bytes of its storage.
TEXT Stores large, variable-length, character-string data up to 1 GB. PostgreSQL automatically compresses TEXT strings.
TIME (WITHOUT TIME ZONE |
WITH TIME ZONE)
Holds the time of day and stores either no time zone in 8 bytes of storage or the time zone of the database server using 12 bytes of storage.
TIMESTAMP (WITHOUT TIME ZONE |
WITH TIME ZONE)
Stores the date and time and stores either no time zone or the time zone of the database server and uses 8 bytes of storage.
VARCHAR(n)
CHARACTER VARYING(n)
CHARACTER VARYING
Stores variable-length character strings up to a length of n. Trailing spaces are not stored.
Table 6. Mapping PostgreSQL data types to DB2
POSTGRESQLDB2Remark
BIGSERIAL
SERIAL8
BIGINT Use IDENTITY Attribute to simulate auto-increment feature
BIT CHAR(n) FOR BIT DATA For length up to 254 bytes
BIT VARYING(n)
VARBIT(n)
VARCHAR(n) FOR BIT DATA Use if up to 32,672 bytes
BYTEA BLOB Can be used if between 32K and 2GB bytes
BOOLEAN No Boolean type Use CHAR(1) or SMALLINT
CHAR(n)
CHARACTER (n)
CHAR(n) Up to 254 bytes
DATE DATE Use the CURRENT TIMEZONE special register to transform the date
DATETIME TIMESTAMP Use the CURRENT TIMEZONE special register to transform the date
DECIMAL(p,s)
NUMERIC(p,s)
DECIMAL(p,s) For precision > 31, Use DOUBLE
FLOAT4
REAL
REAL Can use either NUMERIC or FLOAT
FLOAT8
DOUBLE PRECISION
DOUBLE PRECISION Use DOUBLE PRECISION for large number or use NUMERIC if precision < 31
SMALLINT SMALLINT _
INTEGER INTEGER _
INT8
BIGINT
BIGINT _
VARCHAR(n)
CHARACTER VARYING(n)
CHARACTER VARYING
VARCHAR(n) If 'n' is 32K or less. DB2 requires you to specify 'n' while postgres does not enforce a value for 'n'
SERIAL
SERIAL4
INTEGER Use with IDENTITY attribute
TEXT VARCHAR(n)
CLOB
Use VARCHAR if length is less than 32K bytes otherwise use BLOB if size is more than 32K bytes
TIME (WITHOUT TIME ZONE | WITH TIME ZONE) TIME Without a time zone
TIMESTAMP (WITHOUT TIME ZONE | WITH TIME ZONE) TIMESTAMP Without a time zone

You can create a foreign key constraint in PostgreSQL even if the data type is different in the referenced table. For example, if parent table's unique key has a data type as integer, you can create a foreign key on child table for a column using a data type as char(10). The tool will convert the constraint but it will fail as DB2 does not allow data type to be different.

Now, we have seen the data type differences between MySQL, PostgreSQL and DB2 along with some high level feature differences. Let us now discuss migration approach to DB2 in three easy steps.


Step 1: Install DB2 Express/Express-C

Difference between Express and Express-C

  • DB2 Express-C, a version of DB2 Express for the community is a no charge data server for use in development and deployment.
  • DB2 Express-C is completely "free" to download, develop, deploy, test, run, embed, and redistribute.
  • DB2 Express-C can also be installed on Windows XP-Home in addition to Windows and Linux on 32 and 64 bit platform.
  • DB2 Express is a priced option including
    • Warehouse Manager tools & servers,
    • Extender and DB2 Connect support,
    • Informix Data Source Replication,
    • Replication Data Capture,
    • APPC and NetBios,
    • HADR is an additional priced option

Both DB2 Express and Express-C can be installed on Linux or Windows systems running with 32 or 64 bit hardware with up to 2 processors and 4GB of addressable memory. DB2 Express-C can easily be upgraded to DB2 Express, Workgroup and Enterprise Server Editions without modifying the database or applications such as C/C++, Java, .NET, and PHP.

The installation process is largely the same for Linux or Windows. Installation of DB2 Express involves performing the following easy steps:

  1. Log in to your system using a local administrator account (on Windows) or as the root user (on Linux)
  2. Execute the setup.exe on Windows and the Setup launch pad appears
  3. Execute setup on Linux to do GUI installation or run db2install for command line install.
    • If you use db2install, you will need to create DB2 instances and so forth manually.
  4. In GUI install, select install product to launch the Setup wizard
  5. Follow the installation wizard and supply input when prompted
  6. For products embedded with DB2 Express-C, you can choose to perform a silent install using a response file.

A few additional notes:

  • Setting up the DB2 instance owner
    • The default user id on Windows is db2admin and db2inst1 on Linux.
    • On Linux, one difference is that besides the instance owner user id you will also be prompted to provide an additional user id for the fenced user. The fenced user id will be used to run external C or Java stored procedures and user-defined functions.
    • If the specified user does not exist, it will be created and granted the necessary privileges
    • If an existing user id is used, it must already have administrator privileges (Windows).
  • During installation a default instance is created
    • On Windows it is called DB2.
    • On Linux, it is called db2inst1.

By default, the DB2 server is configured to use TCPIP on port 50000. This setting can be optionally changed using the protocols button.

After the installation First Steps will be launched and can be used to help you create your first database called SAMPLE

You can download DB2 Express / Express-C from Resources.


Step 2: Migrate DDL and data using the tool

Modifications to PostgreSQL setup

If you will be connecting to a PostgreSQL database from a remote machine, enable the remote client connections to the PostgreSQL server as follows:

  1. Modify the pg_hba.conf to allow remote connections.

    Find the pg_hba.conf file in the postgres database directory. Add a line in the pg_hba.conf file to allow remote TCPIP connections to PostgreSQL database as follows:

    host    all         all         9.0.0.0 255.0.0.0     trust

    The 4th parameter specifies the IP address range and the 5th parameter specifies the subnet mask. In above example, we exposed all ip’s starting with 9 to be able to connect to PostgreSQL database.

  2. Start the Postgres database server from the command line.
    $ pg_ctl -D /home/postgres/testdb -o -i -l logfile start

Required software for migration

  • JDBC driver PostgreSQL

    To connect to PostgreSQL, use the JDBC driver to connect to the database. At the time of this writing, we used PostgreSQL 8.0.3 with JDBC driver 8.0.315. You can download the JDBC driver for PostgreSQL from resources. Please note that the tool provided in this paper does not include the PostgreSQL JDBC driver.

  • JDBC Driver MySQL

    We used MySQL Connector/J 3.1 V 3.1.12 to connect to MySQL database. You can download the JDBC driver for the MySQL database from resources. The tool provided does not include the JDBC driver.

    After installing the JDBC driver, modify the CLASSPATH parameter to include the JDBC driver.

  • IBM JDK 5.0

    The tool has been tested only with Java JDK 5.0. You can use either a Sun or IBM Java JDK 5.0 to run the tool. Download the IBM 5.0 JDK from resources

Setup migration tool

Things to remember

  • Install IBM JDK 5.0
  • Update JAVA_HOME and include JAVA_HOME/bin directory in your PATH
  • Copy IBMExtract.jar and required JDBC drivers in a directory and include these files in your CLASSPATH variable.

On UNIX systems, you can either install the tool under the DB2 instance user home directory or some other user who has the necessary permissions to run the DB2 LOAD utility.

After you download the tool from download section, unzip the file in a directory. The tool is provided in IBMExtract.jar file. Update the CLASSPATH variable in your profile to include the tool and other necessary JDBC drivers. For example, the following example shows how to include IBMExtract.jar, PostgreSQL and MySQL JDBC drivers in your CLASSPATH.

export JAVA_HOME=/opt/ibm/java2-i386-50
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=$HOME/java/lib/IBMExtract.jar:$CLASSPATH
export CLASSPATH=$HOME/java/lib/postgresql-8.0-315.jdbc3:$CLASSPATH
export CLASSPATH=$HOME/java/lib/mysql-connector-java-3.1.12-bin.jar:$CLASSPATH

On Windows systems, update the CLASSPATH environment variable through Control Panel -> System -> Advanced -> Environment Variables.

Tool structure

There are two components of the tool. The first component (ibm.GenInput) generates the input file for use by the second component (ibm.GenerateExtract). The input file generated by the first component can be modified by the user to remove the tables that are not needed for the migration. Optionally, you can also edit the input file to specify the tables that might be created in DB2 as a result of custom query.

ibm.GenInput

The script to run the first component is a geninput shell script on Linux and geninput.cmd on Windows. You will need to specify correct parameters for your environment to connect to MySQL/PostgreSQL database. You will need to change DBVENDOR, SERVER, DATABASE, PORT, DBUID and DBPWD parameters in the following script. The script is shown below:

Listing 1. Script to run on Windows (geninput.cmd)
@echo off
cls

ECHO Executed by: %USERNAME% Machine: %COMPUTERNAME% On %OS% %DATE% %TIME%
ECHO.
ECHO -------------------------------------------------------------------
ECHO Program to perform MySQL/PostgreSQL to DB2 Migration
ECHO -------------------------------------------------------------------
ECHO.


if "%1" == ""  (
  echo Usage : geninput.cmd dbname
  goto end
)

SET DBVENDOR=postgres

SET DB2SCHEMA=%1
SET SERVER=server.ibm.com
SET DATABASE=%1
SET PORT=5432
SET DBUID=postgres
SET DBPWD=pwd
%JAVA_HOME%\bin\java -DINPUT_DIR=.\migr -cp %CLASSPATH% ibm.GenInput 
  %DBVENDOR% %DB2SCHEMA% %SERVER% %DATABASE% %PORT% %DBUID% %DBPWD%
:end
Listing 2. Script to run on Linux (geninput)
#!/bin/bash
if [ "$1" = ""  ] ; then
  echo Usage : geninput dbname
  exit 1
fi

DBVENDOR=postgres
DB2SCHEMA=$1
SERVER=server.ibm.com
DATABASE=$1
PORT=5432
DBUID=postgres
DBPWD=pwd

java -DINPUT_DIR=$PWD -cp $CLASSPATH ibm.GenInput $DBVENDOR $DB2SCHEMA 
     $SERVER $DATABASE $PORT $DBUID $DBPWD

The INPUT directory to store the input files for the tool is specified through VM parameter as -DINPUT_DIR. In above script, it is specified as current directory. The program will create an input directory in current working directory.

Input file creation parameters

Table 7. GenInput parameters
Argument name Value Description
Java program ibm.GenInput This is the main java program
DBVENDOR postgres or mysql Specify either postgres or mysql
DB2SCHEMA schema_name Specify DB2 schema in which you can import the tables from source database
SERVER Hostname Hostname or IP address of the server hosting PostgreSQL or MySQL database
DATABASE dbname Name of the PostgreSQL or MySQL database
PORT nnn Port number to connect to. The default port to connect to MySQL/Postgresql is 3306/5432.
DBUID uid MySQL or PostgreSQL database user id
DBPWD pwd MySQL or PostgreSQL database password

Structure of the input file

The first component of the tool ibm.GenInput generates an input file to be used by second component ibm.GenerateExtract. The structure of the input file is provided below in case you need to modify the query part of the file to create a table based upon a query on source database.

DB2_Schema_Name.Actual_Table_Name:Table Query

Sample content of the file is below:

ama.ama_msa:SELECT * FROM public.ama_msa
ama.ama_mti:SELECT * FROM public.ama_mti
ama.ama_pe:SELECT * FROM public.ama_pe
ama.ama_pmsa:SELECT * FROM public.ama_pmsa
ama.ama_schools:SELECT * FROM public.ama_schools
ama.ama_specialties_group:SELECT * FROM public.ama_specialties_group

ibm.GenerateExtract

The script to unload the data and to generate DDL for DB2 is an unload shell script on Linux and an unload.cmd script on Windows. You will need to modify the following connection parameters for the MySQL/PostgreSQL databases; DBVENDOR, SERVER, PORT, DBUID and DBPWD. The unload script is shown in the following tables for Windows and Linux environment.

Listing 3. Script to run on Windows (unload.cmd)
@echo off
cls

ECHO Executed by: %USERNAME% Machine: %COMPUTERNAME% On %OS% %DATE% %TIME% 
ECHO.
ECHO -------------------------------------------------------------------
ECHO Program to perform MySQL/PostgreSQL to DB2 Migration
ECHO -------------------------------------------------------------------
ECHO.


if "%1" == ""  (
  echo Usage : unload.cmd dbname
  goto end
)

SET TABLES=input\%1.tables
SET COLSEP=~
SET DBVENDOR=postgres
SET NUM_THREADS=5
SET SERVER=server.ibm.com
SET DATABASE=%1
SET PORT=5432
SET DBUID=postgres
SET DBPWD=pwd
SET GENDDL=true
SET UNLOAD=true
SET FETCHSIZE=100
%JAVA_HOME%\bin\java -DOUTPUT_DIR=output\%1 -cp %CLASSPATH% 
  ibm.GenerateExtract %TABLES% %COLSEP% %DBVENDOR% %NUM_THREADS% %SERVER% 
  %DATABASE% %PORT% %DBUID% %DBPWD% %GENDDL% %UNLOAD% %FETCHSIZE%
:end
Listing 4. Script to run on Linux (unload)
#!/bin/bash
if [ "$1" = ""  ] ; then
  echo Usage : unload dbname
  exit 1
fi

TABLES=$PWD/input/$1.tables
COLSEP=\~
DBVENDOR=postgres
NUM_THREADS=5
SERVER=db2lab9.dfw.ibm.com
DATABASE=$1
PORT=5432
DBUID=postgres
DBPWD=db2mig
GENDDL=true
UNLOAD=true
FETCHSIZE=100

java -DOUTPUT_DIR=$PWD/output/$1 -cp $CLASSPATH ibm.GenerateExtract 
   $TABLES $COLSEP $DBVENDOR $NUM_THREADS $SERVER $DATABASE 
   $PORT $DBUID $DBPWD $GENDDL $UNLOAD $FETCHSIZE

We specify OUTPUT_DIR to the java program through a -D switch to the JVM. In this case, it is defined as output/$1, which will be the output/database name that you specify through the unload command.

Unload parameters

Table 8. GenerateExtract parameters
Argument name Value Description
Java program Ibm.GenerateExtract This is the main Java program
TABLES FileName This is the file that contains the name of the tables and SQL query that will be used to generate DDL for DB2 and to unload the data. This file is generated in the first step.
COLSEP \~ The column separator. In this example, it has been chosen as the tilde character (~). If the tilde character has a special meaning in your Unix environment, you can prefix it with a back slash. On Linux platforms, tilde is used to expand to the home directory name. On Windows platform, you can specify it without preceding it with a back slash.
DBVENDOR postgres The name of the database vendor. If you are running this tool against the MySQL database, specify mysql; for PostgreSQL, use the postgres value.
NUM_THREADS nn The number of threads that the java program will run.
SERVER Hostname The hostname or IP address of the MySQL/PostgreSQL database server. If you are running on a local host, you can specify localhost.
DATABASE dbname The name of the MySQL/PostgreSQL database that you will migrate to db2.
PORT nnn The port number used to connect to the MySQL / PostgreSQL database. The default port on MySQL/PostgreSQL is 3306/5432 respectively.
DBUID uid The user id of the MySQL / PostgreSQL database server.
DBPWD pwd The password of the MySQL/PostgreSQL user id.
GENDDL true This value can be either true/false. This instructs the tool to generate DDL for the tables to be unloaded.
UNLOAD true This value can be either true/false. This instructs the tool to unload the data into the OUTPUT_DIR directory. The OUTPUT_DIR is specified by using the the -D switch for the JVM.
FETCHSIZE 1000 This is an important parameter and may be specified by a value of 100 or greater. If you specify this value to be very large, you may run out of memory as the MySQL/PostgreSQL JDBC driver will try to hold large amounts of data in memory. If you have a large amount of memory, you could improve performance by increasing this parameter. If you run into an "out of memory" issue, then decrease this parameter.

Driver properties

The tool uses a driver properties file to read JDBC driver information for MySQL / PostgreSQL database. The properties file is included in the JAR file.

postgres=org.postgresql.Driver
mysql=com.mysql.jdbc.Driver

URL properties

The tool uses the URL properties file to read JDBC driver URL information for MySQL / PostgreSQL database. This poperties file is included in the JAR file.

postgres=jdbc:postgresql://
mysql=jdbc:mysql://

Mapping properties

Data type mapping between MySQL / PostgreSQL and DB2 is controlled by using a data type mapping properties file. This file is included in the JAR file. If there is a need to change the data type mapping properties between MySQL / PostgreSQL and DB2, you can just modify this file and do not have to modify the program.

Listing 5. Data type mapping properties file
POSTGRES.INT=INTEGER
POSTGRES.INT2=SMALLINT
POSTGRES.INT4=INTEGER
POSTGRES.INT8=INTEGER
POSTGRES.SERIAL4=INTEGER
POSTGRES.SERIAL8=INTEGER
POSTGRES.BOOLEAN=SMALLINT
POSTGRES.BYTEA=BLOB
POSTGRES.VARCHAR=VARCHAR;VARLENGTH=TRUE;DEFAULT=255
POSTGRES.CHARACTER=CHAR;VARLENGTH=TRUE
POSTGRES.BPCHAR=CHAR;VARLENGTH=TRUE
POSTGRES.DATE=DATE
POSTGRES.FLOAT4=REAL
POSTGRES.FLOAT8=DOUBLE PRECISION
POSTGRES.INTEGER=INTEGER
POSTGRES.NUMERIC=NUMERIC;VARLENGTH=TRUE
POSTGRES.TEXT=VARCHAR;VARLENGTH=TRUE;DEFAULT=255;USEACTUALDATA=TRUE
POSTGRES.TIME=TIME
POSTGRES.TIMESTAMP=TIMESTAMP
POSTGRES.OID=INTEGER
MYSQL.BOOLEAN=SMALLINT
MYSQL.BIT=SMALLINT
MYSQL.TINYBLOB=VARCHAR(255) FOR BIT DATA
MYSQL.BLOB=BLOB;VARLENGTH=TRUE
MYSQL.MEDIUMBLOB=BLOB;VARLENGTH=TRUE
MYSQL.LONGBLOB=BLOB;VARLENGTH=TRUE
MYSQL.CHAR=CHAR;VARLENGTH=TRUE
MYSQL.CHARACTER=CHAR;VARLENGTH=TRUE
MYSQL.DATE=DATE
MYSQL.DATETIME=TIMESTAMP
MYSQL.YEAR=SMALLINT
MYSQL.NUMERIC=NUMERIC;VARLENGTH=TRUE
MYSQL.DECIMAL=NUMERIC;VARLENGTH=TRUE
MYSQL.FLOAT=REAL
MYSQL.DOUBLE=DOUBLE
MYSQL.REAL=DOUBLE
MYSQL.TINYINT=SMALLINT
MYSQL.SMALLINT=SMALLINT
MYSQL.MEDIUMINT=INT
MYSQL.INTEGER=INT
MYSQL.BIGINT=BIGINT
MYSQL.BIT_UNSIGNED=SMALLINT
MYSQL.TINYINT_UNSIGNED=SMALLINT
MYSQL.SMALLINT_UNSIGNED=SMALLINT
MYSQL.MEDIUMINT_UNSIGNED=INT
MYSQL.INTEGER_UNSIGNED=INT
MYSQL.BIGINT_UNSIGNED=BIGINT
MYSQL.DECIMAL_UNSIGNED=NUMERIC;VARLENGTH=TRUE
MYSQL.NUMERIC_UNSIGNED=NUMERIC;VARLENGTH=TRUE
MYSQL.TINYTEXT=VARCHAR;VARLENGTH=TRUE;DEFAULT=255
MYSQL.TEXT=VARCHAR;VARLENGTH=TRUE;DEFAULT=65535
MYSQL.MEDIUMTEXT=CLOB(16M)
MYSQL.LONGTEXT=CLOB(2G)
MYSQL.TIME=TIME
MYSQL.TIMESTAMP=TIMESTAMP
MYSQL.VARCHAR=VARCHAR;VARLENGTH=TRUE;DEFAULT=255
MYSQL.BINARY=CHAR FOR BIT DATA;VARLENGTH=TRUE
MYSQL.VARBINARY=VARCHAR FOR BIT DATA;VARLENGTH=TRUE

How to run the tool on Windows or Linux

To use this tool, there are only two commands that you need to run. The first command is named geninput.cmd on Windows and geninput on Linux. The second command is named unload.cmd on Windows and unload on Linux. The example shown here is for Linux but it is the same for both platforms.

First - Generate input file

If you do not specify an argument for the geninput script, a message will be shown stating that a MySQL/PostgreSQL database name is required (shown below). Re-run the script and specify the name of the MySQL/PostgreSQL database; the input file for the given database will be generated.

Listing 6. Running geninput
db2@db2lab9:~/migr> ./geninput
Usage : geninput dbname

db2@db2lab9:~/migr> ./geninput ama
[2006-05-23 09.35.54.563] dbSourceName:postgres
[2006-05-23 09.35.54.564] db2SchemaName:ama
[2006-05-23 09.35.54.565] server:server.ibm.com
[2006-05-23 09.35.54.565] dbName:ama
[2006-05-23 09.35.54.565] port:5432
[2006-05-23 09.35.54.565] uid:postgres
[2006-05-23 09.35.54.566] INPUT Directory = /home/db2/migr/input
[2006-05-23 09.35.54.575] Configuration file loaded: 'driver.properties'
[2006-05-23 09.35.54.576] Configuration file loaded: 'url.properties'
[2006-05-23 09.35.54.599] Driver org.postgresql.Driver loaded
[2006-05-23 09.35.54.960] ama.ama_addresstype:SELECT * FROM public.ama_addresstype
ama.ama_country_codes:SELECT * FROM public.ama_country_codes
ama.ama_hosp_affil:SELECT * FROM public.ama_hosp_affil
ama.ama_msa:SELECT * FROM public.ama_msa
ama.ama_mti:SELECT * FROM public.ama_mti
ama.ama_pe:SELECT * FROM public.ama_pe
ama.ama_physicians:SELECT * FROM public.ama_physicians
ama.ama_pmsa:SELECT * FROM public.ama_pmsa
ama.ama_schools:SELECT * FROM public.ama_schools
ama.ama_specialties_group:SELECT * FROM public.ama_specialties_group
ama.ama_top:SELECT * FROM public.ama_top
ama.ama_type_of_practice:SELECT * FROM public.ama_type_of_practice
ama.calculation:SELECT * FROM public.calculation
ama.calculation_group:SELECT * FROM public.calculation_group
ama.category:SELECT * FROM public.category
ama.code_lookup:SELECT * FROM public.code_lookup
ama.physician_calculation:SELECT * FROM public.physician_calculation
ama.physician_calculation_group:SELECT * FROM public.physician_calculation_group
ama.physician_category:SELECT * FROM public.physician_category
ama.possible_answer:SELECT * FROM public.possible_answer
ama.question:SELECT * FROM public.question
ama.topic:SELECT * FROM public.topic

The input file will be created in the input directory of your current working directory. You can modify this file to remove tables that you do not want to migrate by removing the SQL query which would unload the data.

Second - Generate DDL and unload the data

You are now ready to run the extract program that generates the DDL for DB2 and unloads the data from the MySQL / PostgreSQL database. You will need to specify the name of the database as an argument for this program.

Listing 7. Running unload
db2@db2lab9:~/migr> ./unload ama
[2006-05-23 09.40.43.157] TABLES_PROP_FILE:/home/db2/migr/input/ama.tables
[2006-05-23 09.40.43.159] DRIVER_PROP_FILE:driver.properties
[2006-05-23 09.40.43.160] URL_PROP_FILE:url.properties
[2006-05-23 09.40.43.161] DATAMAP_PROP_FILE:datamap.properties
[2006-05-23 09.40.43.162] colsep:~
[2006-05-23 09.40.43.162] dbSourceName:postgres
[2006-05-23 09.40.43.163] threads:5
[2006-05-23 09.40.43.164] server:server.ibm.com
[2006-05-23 09.40.43.165] dbName:ama
[2006-05-23 09.40.43.166] port:5432
[2006-05-23 09.40.43.167] uid:postgres
[2006-05-23 09.40.43.168] fetchSize:100
[2006-05-23 09.40.43.186] Configuration file loaded: '/home/db2/migr/input/ama.tables'
[2006-05-23 09.40.43.188] query size 22 schemaName size = 22
[2006-05-23 09.40.43.387] Configuration file loaded: 'driver.properties'
[2006-05-23 09.40.43.389] Configuration file loaded: 'url.properties'
[2006-05-23 09.40.43.398] Configuration file loaded: 'datamap.properties'
[2006-05-23 09.40.43.414] Driver org.postgresql.Driver loaded
[2006-05-23 09.40.43.606] Starting Blades
[2006-05-23 09.40.43.607] Starting Blade_1
[2006-05-23 09.40.43.613] Starting Blade_0
[2006-05-23 09.40.43.613] Starting Blade_3
[2006-05-23 09.40.43.615] Starting Blade_2
[2006-05-23 09.40.43.615] Starting Blade_4
[2006-05-23 09.40.43.883] Blade_3 unloaded 21 rows in 269 ms for ama.ama_pe
[2006-05-23 09.40.44.218] Blade_4 unloaded 5 rows in 603 ms for ama.ama_addresstype
[2006-05-23 09.40.44.273] Blade_3 unloaded 0 rows in 390 ms for ama.possible_answer
[2006-05-23 09.40.44.560] Blade_1 unloaded 10 rows in 952 ms for ama.ama_top
[2006-05-23 09.40.44.569] Blade_3 unloaded 0 rows in 296 ms for ama.category
[2006-05-23 09.40.44.687] Blade_2 unloaded 0 rows in 1072 ms for ama.physician_calculation
[2006-05-23 09.40.44.718] Blade_4 unloaded 0 rows in 500 ms for ama.question
[2006-05-23 09.40.44.881] Blade_3 unloaded 0 rows in 312 ms for ama.calculation_group
[2006-05-23 09.40.44.914] Blade_2 unloaded 384 rows in 227 ms for ama.ama_pmsa
[2006-05-23 09.40.44.984] Blade_4 unloaded 493 rows in 266 ms for ama.ama_country_codes
[2006-05-23 09.40.45.076] Blade_2 unloaded 13 rows in 162 ms for ama.ama_type_of_practice
[2006-05-23 09.40.45.343] Blade_4 unloaded 201 rows in 359 ms for 
ama.ama_specialties_group
[2006-05-23 09.40.45.451] Blade_1 unloaded 7141 rows in 891 ms for ama.ama_hosp_affil
[2006-05-23 09.40.45.691] Blade_0 unloaded 6102 rows in 2078 ms for ama.ama_mti
[2006-05-23 09.40.45.869] Blade_1 unloaded 0 rows in 418 ms for ama.code_lookup
[2006-05-23 09.40.46.024] Blade_0 unloaded 0 rows in 333 ms for ama.calculation
[2006-05-23 09.40.46.236] Blade_0 unloaded 0 rows in 212 ms for 
ama.physician_calculation_group
[2006-05-23 09.40.46.380] Blade_0 unloaded 0 rows in 144 ms for ama.physician_category
[2006-05-23 09.40.46.405] Blade_1 unloaded 1863 rows in 536 ms for ama.ama_schools
[2006-05-23 09.40.46.539] Blade_1 unloaded 4 rows in 134 ms for ama.ama_msa
[2006-05-23 09.40.46.917] Blade_0 unloaded 0 rows in 537 ms for ama.topic
[2006-05-23 09.40.48.931] ama_physicians 10000 rows unloaded in 3.835 sec
[2006-05-23 09.40.52.048] ama_physicians 10000 rows unloaded in 3.117 sec
........
[2006-05-23 09.44.21.891] ama_physicians 10000 rows unloaded in 2.152 sec
[2006-05-23 09.44.24.200] ama_physicians 10000 rows unloaded in 2.309 sec
[2006-05-23 09.44.26.670] Blade_2 unloaded 969995 rows in 221594 ms for ama.ama_physicians
[2006-05-23 09.44.26.671] ==== Total time: 223.0 sec 
[2006-05-23 09.44.26.923] done Blade_0
[2006-05-23 09.44.27.175] done Blade_1
[2006-05-23 09.44.27.427] done Blade_2
[2006-05-23 09.44.27.679] done Blade_3
[2006-05-23 09.44.27.931] done Blade_4

Tool output

After the tool is successfully run against your database, navigate to the output directory (in our example, output/ama) to view the output which will appear as shown below:

Listing 8. Tool output
db2@db2lab9:~/migr> ls -l output/ama
total 76
drwxr-xr-x  2 db2 db2  4096 2006-05-23 06:16 ama_data
-rw-r--r--  1 db2 db2  1212 2006-05-23 09:44 ama_db2checkpending.sql
-rw-r--r--  1 db2 db2  2687 2006-05-23 09:44 ama_db2cons.sql
-rw-r--r--  1 db2 db2   662 2006-05-23 09:44 ama_db2drop.sql
-rw-r--r--  1 db2 db2  1378 2006-05-23 09:44 ama_db2fkdrop.sql
-rw-r--r--  1 db2 db2  3523 2006-05-23 09:44 ama_db2fkeys.sql
-rw-r--r--  1 db2 db2 13190 2006-05-23 09:44 ama_db2load.sql
-rw-r--r--  1 db2 db2  3148 2006-05-23 09:44 ama_db2runstats.sql
-rw-r--r--  1 db2 db2  1143 2006-05-23 09:44 ama_db2.sh
-rw-r--r--  1 db2 db2  1192 2006-05-23 09:44 ama_db2tabcount.sql
-rw-r--r--  1 db2 db2  7099 2006-05-23 09:44 ama_db2tables.sql
-rw-r--r--  1 db2 db2  4336 2006-05-23 09:44 ama_db2tabstatus.sql
drwxr-xr-x  2 db2 db2  4096 2006-05-23 06:06 ama_dump
drwxr-xr-x  2 db2 db2  4096 2006-05-23 06:06 ama_msg

The explanation of each file generated is given in following table.

Table 9. Output files
File/Directory Name Description
ama_data Contains all the data files that were unloaded from the source database.
ama_dump Contains the data that was not loaded into DB2. The DB2 LOAD utility will dump the data that was not loaded in DB2.
ama_msg Contains all the messages that were genearated by the DB2 LOAD utility.
ama_db2tables.sql Contains table creation scripts for DB2.
ama_db2cons.sql Contains all constraints and indexes. Please be advised that the check constraints are not reverse engineered by this tool. This file will hold DDL for primary keys, unique constraints and all other indexes.
ama_db2fkeys.sql Contains all foreign keys constraints.
ama_db2load.sql Contains DB2 LOAD utility scripts that will be used to load the data.
ama_db2tabcount.sql
ama_db2tabstatus.sql
ama_db2fkdrop.sql
ama_db2drop.sql
ama_db2checkpending.sql
Contains table row counts used for verifying data movement.
Check the status of the table after the LOAD utility has completed.
Used to drop all foreign keys constraints.
Used to drop all tables in DB2
Used to remove a table from check pending state after loading data.
ama_db2.sh
ama_db2.cmd
The shell script to create all the DB2 objects on the Linux platform.
The shell script to create all the DB2 objects on the Windows platform.

Due to how different databases implement constraints, there a few points to note:

  • In MySQL/PostgreSQL, you can create a unique constraint (or index) index on a nullable column however DB2 requires that a column with a unique constraint is defined as NOT NULL.
  • In MySQL/PostgreSQL, a foreign key can be created for a primary key even if the primary key is null. DB2 does not allow primary keys with nulls.
  • The tool handles these differences by automatically generating the required DDL for DB2 and will define unique or primary key columns as NOT NULL. As a result, some data may require modifications in order to be migrated.

Step 3: Migrate to DB2

Use DB2 autonomics for simplified management

The focus of this paper is for small and medium businesses. We want to make your DB2 experience as smooth as possible without having to worry about day-to-day tuning or maintenance tasks. When you migrate MySQL / PostgreSQL databases to DB2, you can use some or all of these best practices shown below to reduce your cost of maintenance of databases:

  • Your best friend is the DB2 Control Center. This is one of the best tools to ease your DB2 administration tasks. It also teaches you the DB2 principles through GUI wizards and how DB2 is going to construct a command when you click on the Show Command button.
  • After you create the database, run the Configuration Advisor utility from the Control Center. Answer the ten questions that best explain your processing requirements to automate the tuning of your database.
  • Use the DB2 Control Center to ease the creation of table spaces and buffer pools with automatic storage and let DB2 calculate its storage needs. After you create buffer pools etc, let DB2 decide how much memory it should use for its use with the help of the Autoconfigure command or the Configuration Advisor GUI.
  • Do not forget to run Configuration Advisor whenever you make hardware or database changes. If you add server memory or add table spaces, run Configuration Advisor to adjust your DB2 parameters. DB2 Configuration Advisor uses smart algorithms derived from the experience of highly skilled DB2 DBAs and many years of real world tuning exercises based on a wide variety of databases.
  • Configure Alert Notification in DB2 so that you know what is happening inside DB2 or if you need to take some corrective actions before hand. Setting up Alert Notification is easy through Health Center. You need to specify a SMTP server and create user groups and contacts.
  • Use autonomics in DB2 to let DB2 manage its own administration. After you create the database, go to Control Center and right click on Configure Automatic Maintenance to configure automatic maintenance for backups, runstats (update statistics) and table reorganizations. Remember that Automatic Maintenance is not job scheduling tool but is a very smart utility that DB2 uses to determine if it is necessary to perform some basic maintenance tasks.
  • If you start using the autonomics features of DB2, you will soon realize that you are not paying as much attention to DB2 as you thought you would.

Now, you are ready to run the migration. You have generated all the scripts that you need to create DB2 objects. Let us walk you through some of the best practices that can make your DB2 experience as simple as possible.

Create database in DB2

You can create DB2 database with the command CREATE DB dbname. When you create a DB2 database, it creates three table spaces for you - System, Temporary and User. If you use DB2 Viper, it will create the USER table space as a database managed (DMS) using automatic storage. The DMS table spaces give you the best performance along with a good sized buffer pool. Your temporary table space should be created as system managed (SMS).


















































Run Autoconfigure command for tuning

Table 10. Auto-configure DB2 database
Parameter name Description
mem_percent Percentage of the server memory that you want to dedicate to DB2
workload_type Is the database used for OLTP, Data warehouse, or for mixed purposes? Use Mixed if you are not sure.
num_stmts Number of average SQL statements in a unit of work from your application.
tpm What is the transaction rate per minute for your application?
admin_priority What is the administration priority? Is it the performance or recovery of the database?
is_populated Is there enough data populated in your database? Run this tool again when number of rows in your tables changed significantly since you last ran the tool.
num_local_apps How many batch programs accessing database you run on your database server?
num_remote_apps How many remote applications will be connecting to your database? If you are using an application server of your choice like Tomcat or WebSphere, use total number of connections in your pool.
isolation What is the isolation of your application? Use RR as it will calculate lock memory requirements conservatively.
bp_resizable Have you tuned your buffer pools? If not, let DB2 tune your buffer pools.

After you created the database, run DB2 AUTOCONFIGURE command as shown in Listing 9 to tune your database upfront. Answer 10 questions and run the command against your database. Your best bet is to run this command from Control Center as it is very intuitive to run it through GUI.

Run autoconfigure command against your database or use DB2 Control Center to run it interactively through the GUI.

Listing 9. Autoconfigure
$ db2 connect to yourdbname
$ db2 autoconfigure using 
   mem_percent      85 
   workload_type    simple 
   num_stmts        20 
   tpm              3000 
   admin_priority   performance 
   is_populated     yes 
   num_local_apps   0 
   num_remote_apps  500 
   isolation        cs 
   bp_resizeable    yes 
apply db and dbm;
$ db2 connect reset

Create table spaces and buffer pools before running the migration scripts

One of the outputs from the tool is the approximate maximum size of the row in each table. The maximum row size will determine if you need to use a 4K, 8K, 16K or 32K page size for your table space. After going through the output of this file, decide how many table spaces you need for each size. Use the DB2 Control Center to create these table spaces (with automatic storage) before you create the objects. If you have at least one table space for each page size, all of your table creation scripts will run as they should without the need to specify which table space they should be created in. If you want to make life even easier for yourself, create one 32K page size DMS table space and one 4K size DMS table space before running the table creation script.

If you have been using LOBS in MySQL/PostgreSQL database, you will need to also create LARGE table space. Again do it through the Control Center as it is easily done that way.

Run migration scripts in your DB2 database

After performing the above steps, run script <dbname>_db2.sh on Linux or <dbname>_db2.cmd on Windows to create all the objects and to load data into DB2. A sample script is shown below;

Listing 10. Creating the objects and loading data
#!/bin/bash
if [ "$1" = "" ] ; then
   echo To run this script, specify name of the db2 database
   echo For example, \"./ama_db2 sample\"
   echo where sample is the name of the db2 database
   echo 
   exit 1
fi
OUTPUT=ama_db2.log
echo Executing Script ama_db2.sh > $OUTPUT
echo Connecting to $1
db2 connect to $1 >> $OUTPUT
echo Running ama_db2tables.sql script to create all tables
db2 -tvf ama_db2tables.sql >> $OUTPUT
echo Running ama_db2cons.sql script to create primary keys and indexes
db2 -tvf ama_db2cons.sql >> $OUTPUT
echo Running ama_db2load.sql script to create to load the data
db2 -tvf ama_db2load.sql >> $OUTPUT
echo Running ama_db2fkeys.sql script to create all foreign keys
db2 -tvf ama_db2fkeys.sql >> $OUTPUT
echo Running ama_db2tabcount.sql script to count rows from all tables
db2 -tvf ama_db2tabcount.sql >> $OUTPUT
echo Running ama_db2tabstatus.sql script to show status of tables after load
db2 -tvf ama_db2tabstatus.sql >> $OUTPUT
db2 connect reset >> $OUTPUT
echo 
echo Check the log file $OUTFILE for any errors or issues
echo

The migration script performs following tasks.

  • Creates all tables in DB2 database
  • Creates all primary keys and indexes
  • Loads data using the DB2 LOAD utility. This utility will load the data and also generate statistics for the data in your tables.
  • After loading the data, creates the foreign key constraints.
  • Performs the row counts for all DB2 tables so that you can verify the completeness of data movement
  • Reports the availability status of the tables after the LOAD

If you have clean data, the migration should go smoothly. When you create foreign key constraints, check the log file for the errors and fix it in the source database before attempting a new migration

You have additional scripts that you might need for following possible purposes.

  • To drop all tables in DB2
  • To drop all foreign key constraints. It's a good idea to drop all foreign key constraints before you drop the tables.
  • If your migration process went smoothly, you might not need to run the runstats command again (it was specified to run with the load) but it is provided in case you need to run it as a separate step.

Enable automatic maintenance on DB2 database

After your successful migration to DB2, it is highly recommended that you enable automatic maintenance for your database. Through the Control Center, you can configure DB2 to automate the management of backups, runstats and reorgs. You will need to specify a maintenance window so that DB2 knows when it can perform these jobs automatically. Again, this is not a job scheduler but you can also schedule maintenance jobs through DB2 Task Center.

Run autoconfigure again

After you have migrated the data, run the configuration advisor so that DB2 is optimally tuned for your work load.


Frequently asked questions

Question: How do I get the source code of this tool?

Answer: First of all, we would like to know about the bugs / issues. We do not mind sharing the code with you if you promise to share with us the enhancements that you make. Please send a note to one of the authors if you would like to have the source.

Question: I am getting the errors when I create the tables.

Answer: You might be missing a table space of required page size. Create a table space of required page size like 8K, 16K or 32K.

Question: I get an error Missing data map for ... and application quits.

Answer: We must have missed adding the data type in the mapping.properties file. Extract mapping.properties file from the JAR file and add the missing data type and rebuild the JAR file. If it is too much of a problem, send a note to one of the authors.

Question: What are the limitations of the tool?

Answer: The tool does not handle column default values, check constraints, stored procedures or functions from MySQL / PostgreSQL.

Question: Does this tool handle sequence object of PostgreSQL?

Answer: The tool converts sequences to identity attribute. However, DB2 does allow use of sequence objects.

Question: Can I run this tool to do the migration from other databases?

Answer: Yes, you can. The tool has been developed as a generic tool of migration from any database that has a universal JDBC driver. We have tested this tool with Oracle database and it works well. But we have not done enough testing. This tool will also work with Microsoft SQL Server and Sybase. For migration from other well known databases to DB2, it is however recommended to use more versatile and free IBM tool Migration Toolkit also known as MTK. Use resources to download this tool.


Conclusion

While we were developing this tool, our focus was to develop a very small footprint program for best performance and by using a multi-threaded approach we hoped to speed up the migration. This is not a fool-proof program and if you get errors or issues, please let us know. While we may not consider enhancement requests, we will be very receptive to fix the bugs.

There is no guarantee that this program will meet all your requirements, so please read the official disclaimer carefully before you begin.

It is our goal for you to start considering the use of this powerful database which until recently was largely found within the domain of large corporations. Start using DB2 Express-C and leverage the advantages of a world-class database to meet your own database needs.


Disclaimer

This article contains sample code. IBM grants you ("Licensee") a non-exclusive, royalty free, license to use this sample code. However, the sample code is provided as-is and without any warranties, whether EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT. IBM AND ITS LICENSORS SHALL NOT BE LIABLE FOR ANY DAMAGES SUFFERED BY LICENSEE THAT RESULT FROM YOUR USE OF THE SOFTWARE. IN NO EVENT WILL IBM OR ITS LICENSORS BE LIABLE FOR ANY LOST REVENUE, PROFIT OR DATA, OR FOR DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL OR PUNITIVE DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF THE USE OF OR INABILITY TO USE SOFTWARE, EVEN IF IBM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.


Download

DescriptionNameSize
Migration tool - MySQL/PostgreSQL databases to DB2GenerateExtract.zip27KB

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=128869
ArticleTitle=Migrate from MySQL or PostgreSQL to DB2 Express-C
publish-date=06152006