The tool described in this article has been replaced with a new IBM Data Movement Tool. Please click this link to get the new tool .
You'll find many differences between DB2, MySQL, and PostgreSQL. We'll start by looking at some of the fundamental architectural ways that these 3 database servers are different or similar.
- Architectural model
- MySQL database server uses a dedicated server thread-based architecture.
- PostgreSQL and DB2 use a dedicated server process model architecture.
- Storage engine
- MySQL database uses pluggable storage engine architecture.
- PostgreSQL and DB2 use a dedicated storage engine.
- Table space model
- MySQL uses table space for InnoDB and NDB storage engines.
- PostgreSQL table spaces can span multiple databases.
- DB2 table spaces are specific to a database.
- Schema support
- PostgreSQL and DB2 have true schema support.
- MySQL does not have true schema support. A schema in MySQL can be thought as a MySQL database.
- Case sensitivity for database object names
- MySQL uses case-sensitive names for database, tables, and columns on Linux. PostgreSQL table names are case-insensitive on both UNIX and Windows and are stored in lower case for versions 8.1.3 and 8.1.0.
- DB2 stores object names in upper case by default on all platforms. However, DB2 object names can be stored in mixed or lower case if placed within double quotes in DDL CREATE statements.
- Array columns
- MySQL and DB2 do not support array columns.
- PostgreSQL does support array columns.
- DB2 performs authentication using various external security schemes such as the operating system, PAM, Kerberos, Active Directory, LDAP, and so on. It also allows third party security modules to be plugged-in.
- PostgreSQL performs authentication in one of many different ways. It can use database users/roles, the operating system, PAM, Kerberos, and so on, based upon settings in its host based configuration (pg_hba.conf) file.
- MySQL implements authentication at the database level and passwords are encrypted.
- Instance architecture
- DB2 instance manages different databases and there could be many DB2 instances on the same machine.
- PostgreSQL concept of instance is similar to database clustering.
- MySQL instance is similar to a DB2 database.
It is true that MySQL and PostgreSQL databases can post feature changes almost weekly, whereas feature implementation and changes in DB2 occur after a very careful planning as there are innumerable dependencies using DB2 products. Please note that we used MySQL 5.1, PostgreSQL 8.0.3, and DB2 V8.2 for comparison purposes at the time of writing, so take that into consideration when you read this article.
Figure 1, Figure 2, and Figure 3 are architectural diagrams of MySQL, PostgreSQL and DB2. We drew the architectural diagrams for MySQL and PostgreSQL to the best of our ability after reading through several documents. If you find a discrepancy, please let us know and we will correct it.
MySQL uses a thread-based architecture, whereas PostgreSQL and DB2 are both based on a process-based architecture. A MySQL instance, as you see in Figure 1, can manage many databases. There is a common system catalog named INFORMATION_SCHEMA shared by all MySQL databases in an instance.
A database in DB2 is a separate entity by itself having its own system catalog, table spaces, buffer pools, and so on. The DB2 instance manages different databases, and they do not share table spaces, logging, system catalogs, or temporary table spaces.
A single instance of MySQL with many databases can be visualized as a single database in DB2 with each MySQL database represented as a schema in DB2. If you have several MySQL instances running on your server and each instance managing several databases, you can take one of the following migration paths:
- Migrate each MySQL instance as a DB2 database under a single DB2 instance.
- Migrate each MySQL instance as a DB2 instance with one DB2 database having a different schema for each MySQL database.
Note: When we refer to process-based architecture, we are referring to UNIX® platforms, as the same model on Windows will be a thread-based architecture. This holds true for both DB2 and PostgreSQL.
You can only access a single database resource within the scope of a DB2 connection whereas MySQL allows you to access multiple database resources within the scope of the same connection.
One of the interesting feature of MySQL is its pluggable storage engines. You can choose either MyISAM, InnoDB, Archive, Federated, Memory, Merge, Cluster, NDB or Custom storage engine. Each storage engine has different characteristics, and one can choose a particular storage engine based upon the specific requirements. For comparison purposes, we find InnoDB to be closest to a relational database.
Figure 1. MySQL architecture and processes overview
A MySQL server process (mysqld) can create a number of threads:
- A global thread (per server process) is responsible for creating and managing each user connection thread.
- A thread is created to handle each new user connection.
- Each connection thread also performs authentication and query execution.
- On Windows, there is a named pipe handler thread that does the same work as the connection thread for named pipe connection requests.
- A signal thread handles alarms and forces timeouts on connections that have been idle too long.
- A thread is allocated to handle shutdown events.
- There are threads for handling synchronization of master and slave servers for replication.
- Threads are used for table flushing, maintenance tasks, and so on.
MySQL uses a data cache, record cache, key cache, table cache, hostname cache, and privilege cache for caching and retrieval of different types of data used by all the threads executing within the server process.
In addition, the MySQL main process (mysqld) has threads to handle database management activities such as backup, restore, concurrency control, and so on.
A PostgreSQL instance, shown in Figure 2, can manage a cluster of databases. Each database
has its own set of system catalogs known as INFORMATION_SCHEMA and pg_catalog.
All databases share
pg_databases as a common system table. Each database is a
separate entity and the collection of databases is called a cluster. A PostgreSQL instance
can be thought as managing a cluster of databases. A server can run more than one such instance
of database clusters.
A PostgreSQL database can be migrated logically to a DB2 database. Both the databases support the schema object type. You can not access other databases from a named connection.
The most significant difference between PostgreSQL and DB2 is related to table spaces. A PostgreSQL table space can span one or more database whereas DB2 table spaces are specific to a database.
Figure 2. PostgreSQL architecture and processes overview
A PostgreSQL session consists of several main processes:
- A postmaster process serves as a supervisory process that spawns other processes and listens for user connections.
- A user process such as psql is used for interactive SQL queries.
- One or more server processes named postgres are spawned by postmaster to handle users' requests for data.
- The server processes communicate with each other through semaphores and shared memory.
Figure 3 shows DB2's architecture. This figure explains how DB2 handles data between disks (files, raw devices, directories, and so on) using buffer pools. DB2 uses a connection concentrator to handle large number of connections. DB2 page cleaners and pre-fetchers do the work asynchronously and separate processes handle redo log activity. Please refer to the Resources for a detailed description of how locking and processes work in DB2.
Figure 3. DB2 architecture and processes overview
A DB2 session consists of several processes:
- db2sysc, the main DB2 system controller or engine process
- Listener processes such as db2tcpcm and db2ipccm that listen for a user's request for connection
- One or more agents that work on behalf of an application. Agents can be either of two types:
- db2agent works on behalf of an application and communicates to other agents using inter-process communication or remote communication protocol.
- db2agntp works to fulfill client requests on the database when intra-parallel is turned on.
- A user's process such as db2 which is used for interactive querying from the command line
- db2bp - a persistent background process for DB2 Command Line Processor (CLP)
- db2disp - an agent dispatcher process to assign connections to available coordinating agents when connection concentrator is enabled
- db2fmcd - a fault monitor coordinating daemon per server
- db2fmd - a fault monitor daemon for each instance
- db2resyn - a resync manager process to handle two-phase commit
- db2dlock - a db2 dead lock detector
- db2loggr - the database log reader
- db2loggw - the database log writer
- db2pclnr - the buffer pool page cleaner
- db2pfchr - the buffer pool pre-fetcher
- db2fmp - to run user code on the server outside db2 address space
- ... and many more
DB2 server processes communicate with each other through memory areas known as Database Manager Memory and Database Shared Memory as shown in Figure 4.
Figure 4. DB2 database manager (instance) and database shared memory architecture
Table 1 compares MySQL, PostgreSQL and DB2 features. This is not an all inclusive list, but a comparison of the most commonly used features.
Table 1. MySQL and DB2 features comparison
|Instance||An instance is started by executing the MySQL command (mysqld). An instance may manage one or more databases. A server can run multiple instances of mysqld. An instance manager can monitor various instances of mysqld.||An instance is started by executing the Postmaster process (pg_ctl). An instance may manage one or more databases known as a cluster. A cluster is an area on the disk that is initialized during installation and consists of a single directory where all the data is stored. The installation creates the first database using initdb.||An instance is a single installation of DB2 that manages one or more databases.
A default instance is created during installation. An instance is started using the
|Database||A named collection of objects and is a separate entity from other databases in the instance. All databases in MySQL in an instance share same system catalog.||A named collection of objects and each database is a separate entity from other databases. Each database has its own system catalog but pg_databases is shared across all databases.||A named collection of objects and is a separate entity from other databases. A database is a separate entity physically and logically with no sharing amongst other databases. A single DB2 instance can manage one or more databases.|
|Data Buffers||This is set by the innodb_buffer_pool_size configuration parameter. This is the size in bytes of the memory buffer InnoDB uses to cache data and indexes of its tables. On a dedicated database server, this may set this to up to 80% of the machine physical memory size.||Shared_buffers cache. There are 64 buffers allocated by default. The default block size is 8K. Buffer cache can be updated by setting the shared_buffers parameter in the postgresql.conf file.||By default, one buffer pool is allocated and additional buffer pools may be added using the CREATE BUFFERPOOL command. The default page size is determined at database creation time and can be 4, 8, 16 or 32K.|
|Database Connection||A client connects to the database using the CONNECT or the USE statement specifying the database name and optionally the userid and password. Users and groups of users are managed within the database using Roles.||A client connects to the database using the connect statement specifying the database name and optionally the userid and password. Users and groups of users are managed within the database using roles.||A client connects to the database using the connect statement specifying the database name and optionally the userid and password. Users and Groups are created outside of the database using OS commands.|
|Authentication||MySQL manages authentication at the database level.||PostgreSQL's authentication depends upon host based configuration.||DB2 uses APIs for authenticating users through various implementations such as Kerberos, LDAP, Active Directory, and PAM at the OS level and has pluggable authentication architecture for third party modules.|
|Encryption||You can specify password at table level to encrypt the data. You can also use AES_ENCRYPT and AES_DECRYPT functions for encryption and decryption of column data. Network encryption can be achieved through SSL connections.||You can use functions from pgcrypto library to encrypt / decrypt columns. Network encryption can be achieved through SSL connections.||You can use DB2 provided encryption and decryption methods to encrypt / decrypt the column data. The network communication between client and server can be encrypted if you choose DATA_ENCRYPT method of authentication at the instance level.|
|Auditing||You can do grep on querylog.||You can use PL/pgSQL triggers on tables to do auditing.||DB2 provides db2audit utility to provide detailed auditing without having a need to implement trigger or log based auditing.|
|Query Explain||Use EXPLAIN command to see the explain plan of a query.||Use EXPLAIN command to see the explain plan of a query.||DB2 provides GUI and command line tools to see explain plan of a query. It can also capture queries from SQL cache and produce the explain plan. You can use tools to see the explain plan of SQL's in all stored procedures.|
|Backup, Recovery and Logging||InnoDB uses write-ahead logging. Supports full online and offline backups and crash and transaction recovery.||Maintains a write-ahead log in a subdirectory of the data directory. Supports full online and offline backups and crash, point-in-time and transaction recovery.||Uses write-ahead logging. Supports full, incremental, delta and tablespace level online / offline backup and recovery. Supports crash, point-in-time, and transaction recovery.|
|JDBC Driver||JDBC drivers can be downloaded from Resources||JDBC drivers can be downloaded from Resources||Supports Type-2 and Type-4 (Universal) driver. JDBC drivers are part of db2 product.|
|Table Types||Depends on the storage engine. For example, NDB storage engine supports partitioned tables and memory engine supports in-memory tables.||Supports temporary, regular and partitioned tables of different types of range and list. Hash partitioning is not supported.||Supports user, temporary, regular and partitioned tables of different types range, hash and multidimensional clusters.|
|Index Types||Depends on the storage engine. MyISAM: BTREE, InnoDB: BTREE||Supports B-tree, Hash, R-tree and Gist indexes.||Supports B-tree and bitmap indexes.|
|Constraints||Supports, primary, foreign, unique, not null constraint. Check constraints are parsed but not enforced.||Supports, primary, foreign, unique, not null and check constraints.||Supports primary, foreign, unique, not null and check constraints.|
|Stored Procedures and User-Defined Functions||Supports both the CREATE PROCEDURE and CREATE FUNCTION statements. Stored procedures can be written in SQL, C++. User-Defined Functions can be written in SQL, C and C++.||At times the term stored procedure is used, however, only a CREATE FUNCTION statement is supported. User-Defined Functions can be written in PL/pgSQL (proprietary procedural language) SQL, and C.||Supports both the CREATE PROCEDURE and CREATE FUNCTION statements. Stored procedures can be written in SQL (SQL PL), C, Java, COBOL, and REXX. User-Defined Functions can be written in SQL (SQL PL), C, Java.|
|Triggers||Supports before row triggers, after row and statement triggers with trigger statements written in procedure language compound statements.||Supports before and after per row and per statement triggers and trigger procedures written in C and all other supported procedural languages..||Supports before row triggers, after row and statement triggers, instead of triggers and triggers with SQL PL compound statements. Stored procedures can be called from triggers.|
|System Configuration File||my.conf||Postgresql.conf||Database Manager Configuration|
|Database Configuration||my.conf||Postgresql.conf||Database Configuration|
|Client Connectivity File||my.conf||pg_hba.conf||System Database Directory|
|XML support||Limited XML support||Limited XML support||Extensive support for accessing XML data. DB2 Viper (V9) is the first hybrid database to store / retrieve XML in its native form.|
|Data Access and Administration Tools||OPTIMIZE TABLE - reclaims unused space and defragment data file|
myisamchk -analyze - updates statistics for use by query optimizer for the MyISAM storage engine
mysql - command line tool
MySQL Administrator - client GUI tool
|Vacuum - reclaims unused space|
Analyze - updates statistics for use by query optimizer
psql - command line tool
pgAdmin - client GUI tool
|Reorg - used to recluster and defragment data|
Runstat - collects statistics used by optimizer
CLP - command line processor
Control Center - client GUI tool
|Concurrency control||Supports table and row level locking. InnoDB Storage Engine supports READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, and SERIALIZABLE. The isolation level is set at the transaction level using the SET TRANSACTION ISOLATION LEVEL statement.||Supports table and row level locking. The ANSI isolation levels supported are Read Committed (default - sees a snapshot of the database at the time the query started) and Serialization (like Repeatable Read - only sees committed results before the transaction started). The isolation level is set at the transaction level using the SET TRANSACTION statement. It is set at the session level using SET SESSION.||Supports table and row level locking and 4 isolation levels, RR (repeatable read), RS (read stability), CS (default - cursor stability) and UR (uncommitted read). Isolation level is set at the session level using SET ISOLATION, set at the SQL statement level using the WITH clause or set at the database level using a Database Configuration parameter.|
So far, we have seen some architectural and feature differences between MySQL, PostgreSQL and DB2 in this section. Let us now explore the data type differences in each of the database servers.
The SQL ANSI standard lays out rules for data types used in relational database systems. However, not every database platform supports each and every data type defined by the standards committees. Furthermore, vendor data types of a given name may be implemented differently from what the standard may specify and even differently from all other database vendors. As a result, although many MySQL, PostgreSQL and DB2 data types are similar in either their name or meaning or both, many differences can be noted as well.
Table 2 lists the most commonly used DB2 data types. We provide the closest match to DB2 for MySQL and PostgreSQL data types in subsequent sections.
As far as DB2 SQL limits (such as length limits on constraint names, data types limits, and so on) are concerned, these limits are being taken away systematically with each new release.
Table 2. DB2 data types
|BIGINT||Stores signed or unsigned integers uses 8 bytes of storage|
|Holds variable-length binary data up to 2 GB in length. Lengths over 1 GB are not logged|
|Holds fixed-length character data up to 254 bytes in length. Uses 'n' bytes of storage|
|CHAR(n) FOR BIT DATA||Holds fixed-length binary values|
|Holds variable-length character data up to 2 GB in length. Lengths over 1 GB are not logged|
|DATE||Holds a calendar date without time of day. Uses 4 bytes of storage|
|Holds a precision (p) of 1 to 31 and a scale (s) of 0 to 31. Uses (p/2) +1 bytes of storage.|
|Holds floating point numbers and uses 8 bytes of storage|
|FLOAT(p)||Holds a precision (p) of 1 to 53. If p <= 24 then a synonym for REAL. If p >= 25 then is a synonym for DOUBLE PRECISION|
|GRAPHIC(n)||Used for National Language Support (NLS) and fixed-length character strings (usually DBCS) up to 127 bytes in length. Uses n*2 bytes of storage for double-byte character sets or n bytes of storage for single-byte character sets.|
|Stores signed or unsigned integers and uses 4 bytes of storage|
|REAL||Holds floating point numbers and uses 4 bytes of storage|
|SMALLINT||Holds signed and unsigned integers and uses 2 bytes of storage|
|TIME||Holds the time of day and uses 3 bytes of storage|
|TIMESTAMP||Stores the date (year, month, day) and time (hour, minute, second) with up to a precision of 6 for microseconds. Uses 10 bytes of storage|
|Holds variable length character data up to 32,672 bytes in length. Uses n+2 bytes of storage|
|VARCHAR(n) FOR BIT DATA||Stores binary data in variable-length. Uses n bytes of storage|
|VARGRAPHIC(n)||Holds variable-length double-byte character data up to 16,336 characters in length. Uses (n*2)+2 bytes of storage|
The definitions and differences between the MySQL and DB2 data types are described in the following tables. Table 3 describes the most commonly used MySQL data types. Table 4 maps the MySQL data type to the closest DB2 equivalent.
MySQL might use a SERIAL alias for as a data type which translates to
BIGINT UNSIGNED NOT NULL
BOOL or BOOLEAN is a synonym for TINYINT(1). The maximum digit for DECIMAL in MySQL is 65 and maximum number of supported decimal is 30. If UNSIGNED is specified for DECIMAL, negative values are not allowed.
The timestamp column does not support microseconds.
Table 3. MySQL data types
|BIT||A fixed-length bit string|
|BOOLEAN||Stores a logical Boolean (true/false/unknown) value of either TRUE, true and 1; FALSE, false and 0.|
|TINYBLOB||Raw binary data used to store binary objects such as graphics up to 255 bytes|
|BLOB||Raw binary data used to store binary objects such as graphics up to 65,535 bytes|
|MEDIUMBLOB||Raw binary data used to store binary objects such as graphics up to 16,777,215 bytes|
|LONGBLOB||Raw binary data used to store binary objects such as graphics up to 4GB|
|Contains a fixed-length character string padded with spaces up to a length of n.|
|DATE||Holds a calendar date (year, month, day) in a 3 byte storage|
|DATETIME||Holds a calendar date and time of day in 8 bytes of storage|
|YEAR||A year in two-digit or four-digit format in 1 byte of storage|
|Stores exact numeric values with a precision (p) as high as 65 and a scale (s) of 30 or higher.|
|FLOAT||Stores floating-point numbers to the limits allowed by the hardware. A single-precision floating-point number is accurate to approximately 7 decimal places. UNSIGNED attribute disallows negative values|
|Stores double precision floating-point numbers to the limits allowed by the hardware. A double-precision floating-point number is accurate to approximately 15 decimal places. UNSIGNED attribute disallows negative values|
|TINYINT||Stores signed or unsigned 1-byte integers|
|SMALLINT||Stores signed or unsigned 2-byte integers|
|MEDIUMINT||Stores signed or unsigned 3-byte integers|
|INTEGER||Stores signed or unsigned 4-byte integers|
|BIGINT||Stores signed or unsigned 8-byte integers|
|TINYTEXT||Used to store character string data up to 255 bytes|
|TEXT||Used to store character string data up to 65,535 bytes|
|MEDIUMTEXT||Used to store character string data up to 16,777,215 bytes|
|LONGTEXT||Used to store character string data up to 4GB|
|TIME||Holds the time of day in 3 bytes of storage|
|TIMESTAMP||Stores the date and time in 4 bytes of storage. A TIMESTAMP column is automatically set to the date and time of the most recent operation if you do not give it a valid value.|
|Stores variable-length character strings up to a length of n. Trailing spaces are not stored.|
|ENUM||A string object that can have only one value, chosen from the list of values 'value1', 'value2', ..., NULL|
|SET||A string object that can have zero or more values, each of which must be chosen from the list of values 'value1', 'value2', ...|
|BINARY||Similar to the CHAR type, but stores binary byte strings rather than non-binary character strings.|
|VARBINARY||Similar to the VARCHAR type, but stores binary byte strings rather than non-binary character strings.|
Table 4. Mapping MySQL data types to DB2
|BIT||CHAR(n) FOR BIT DATA||See Resources for UDFs to ease migration for more details|
|BOOLEAN||SMALLINT or CHAR(1)||Use a check constraint to enforce the rule|
|TINYBLOB||VARCHAR(255) FOR BIT DATA||You can either use BLOB(255) or VARCHAR(255) FOR BIT DATA. Using VARCHAR is more efficient in this case.|
|BLOB||BLOB(64K)||Consider using VARCHAR(n) FOR BIT DATA if length is < 32K|
|MEDIUMBLOB||BLOB(16M)||You can use NOT LOGGED to improve the performance.|
|LONGBLOB||BLOB(2G)||The maximum length of BLOB supported is 2GB.|
|The maximum size of 'n' in DB2 is 254.|
|DATETIME||TIMESTAMP||You can use CURRENT TIMEZONE special register to transform the date|
|YEAR||SMALLINT||You can use a CHECK constraint to enforce YEAR rule|
|If p > 31,use DOUBLE instead|
|SMALLINT||SMALLINT||Use check constraint to enforce value < 256|
|MEDIUMINT||INTEGER||Use check constraint on max size, if required|
|TINYTEXT||VARCHAR(255)||Using VARCHAR for size < 32K is more efficient|
|TEXT||CLOB(64K)||DB2 allows you to specify the size parameter for CLOB or BLOB. Use the size that is required for you instead of using TINY, MEDIUM or LONG CLOB|
|LONGTEXT||CLOB(2G)||The max size is 2GB. You can use LOGGED up to 1GB size of BLOB or CLOB. Use NOT LOGGED option for better performance.|
|Use VARCHAR if size is < 32K|
|ENUM||VARCHAR(n)||Use check constraint to enforce the rules.|
|SET||VARCHAR(n)||Use check constraint to enforce the rules.|
|BINARY||CHAR(n) FOR BIT DATA||Use if n < 254 otherwise use VARCHAR(n) FOR BIT DATA|
|VARBINARY||VARCHAR(n) FOR BIT DATA||Use VARCHAR if 'n' < 32K otherwise use BLOB|
The definitions and differences between the DB2 and PostgreSQL data types are described in the following two tables. Table-4 describes the most commonly used PostgreSQL data types. Table-5 maps the PostgreSQL data type to the closest DB2 equivalent.
PostgreSQL uses special network address types such as
inet, cidr, macaddr. These
data types are migrated to the VARCHAR data type in DB2.
PostgreSQL supports geometric data types as well. The tool does not handle geometric data types. Currently, we are assuming that there is not a great need to support the conversion of such data types. If you do use geometric data type, please send us a note and we will provide a fix in the tool.
Bit string data types in PostgreSQL can be handled with some modifications in your application. At this time, this is not supported in the tool. If you have a need for this, please let us know about it.
PostgreSQL also supports multi-dimensional arrays and they are best migrated in DB2 as child tables. However, the tool does not support multi-dimensional arrays at this time.
Table 5. PostgreSQL data types
|Stores an auto-incrementing unique integer value up to 8 bytes of data|
|BIT||A fixed-length bit string|
|A variable-length bit string n bits in length|
|BOOLEAN||Stores a logical Boolean (true/false/unknown) value of either TRUE, t, true, y, yes and 1; FALSE, f, false, n, no and 0.|
|BYTEA||Raw binary data used to store large binary objects such as graphics. Uses 4 bytes plus the length of the binary string for storage|
|Contains a fixed-length character string padded with spaces up to a length of n.|
|DATE||Holds a calendar date (year, month, day) in a 4-byte storage space|
|DATETIME||Holds a calendar date and time of day|
|Stores exact numeric values with a precision (p) and a scale (s) of 0 or higher.|
|Stores floating-point numbers with a precision of 8 or less and 6 decimal places|
|Stores floating-point numbers with a precision of 16 or less and 15 decimal places|
|SMALLINT||Stores signed or unsigned 2-byte integers|
|INTEGER||Stores signed or unsigned 4-byte integers|
|Stores signed or unsigned 8-byte integers|
|Stores an auto-incrementing unique integer value using up to 4 bytes of its storage.|
|TEXT||Stores large, variable-length, character-string data up to 1 GB. PostgreSQL automatically compresses TEXT strings.|
|TIME (WITHOUT TIME ZONE ||
WITH TIME ZONE)
|Holds the time of day and stores either no time zone in 8 bytes of storage or the time zone of the database server using 12 bytes of storage.|
|TIMESTAMP (WITHOUT TIME ZONE ||
WITH TIME ZONE)
|Stores the date and time and stores either no time zone or the time zone of the database server and uses 8 bytes of storage.|
|Stores variable-length character strings up to a length of n. Trailing spaces are not stored.|
Table 6. Mapping PostgreSQL data types to DB2
|BIGINT||Use IDENTITY Attribute to simulate auto-increment feature|
|BIT||CHAR(n) FOR BIT DATA||For length up to 254 bytes|
|VARCHAR(n) FOR BIT DATA||Use if up to 32,672 bytes|
|BYTEA||BLOB||Can be used if between 32K and 2GB bytes|
|BOOLEAN||No Boolean type||Use CHAR(1) or SMALLINT|
|CHAR(n)||Up to 254 bytes|
|DATE||DATE||Use the CURRENT TIMEZONE special register to transform the date|
|DATETIME||TIMESTAMP||Use the CURRENT TIMEZONE special register to transform the date|
|DECIMAL(p,s)||For precision > 31, Use DOUBLE|
|REAL||Can use either NUMERIC or FLOAT|
|DOUBLE PRECISION||Use DOUBLE PRECISION for large number or use NUMERIC if precision < 31|
|VARCHAR(n)||If 'n' is 32K or less. DB2 requires you to specify 'n' while postgres does not enforce a value for 'n'|
|INTEGER||Use with IDENTITY attribute|
|Use VARCHAR if length is less than 32K bytes otherwise use BLOB if size is more than 32K bytes|
|TIME (WITHOUT TIME ZONE | WITH TIME ZONE)||TIME||Without a time zone|
|TIMESTAMP (WITHOUT TIME ZONE | WITH TIME ZONE)||TIMESTAMP||Without a time zone|
You can create a foreign key constraint in PostgreSQL even if the data type is different in the referenced table. For example, if parent table's unique key has a data type as integer, you can create a foreign key on child table for a column using a data type as char(10). The tool will convert the constraint but it will fail as DB2 does not allow data type to be different.
Now, we have seen the data type differences between MySQL, PostgreSQL and DB2 along with some high level feature differences. Let us now discuss migration approach to DB2 in three easy steps.
Both DB2 Express and Express-C can be installed on Linux or Windows systems running with 32 or 64 bit hardware with up to 2 processors and 4GB of addressable memory. DB2 Express-C can easily be upgraded to DB2 Express, Workgroup and Enterprise Server Editions without modifying the database or applications such as C/C++, Java, .NET, and PHP.
The installation process is largely the same for Linux or Windows. Installation of DB2 Express involves performing the following easy steps:
- Log in to your system using a local administrator account (on Windows) or as the root user (on Linux)
- Execute the
setup.exeon Windows and the Setup launch pad appears
setupon Linux to do GUI installation or run
db2installfor command line install.
- If you use
db2install, you will need to create DB2 instances and so forth manually.
A few additional notes:
- Setting up the DB2 instance owner
- The default user id on Windows is
- On Linux, one difference is that besides the instance owner user id you will also be prompted to provide an additional user id for the fenced user. The fenced user id will be used to run external C or Java stored procedures and user-defined functions.
- If the specified user does not exist, it will be created and granted the necessary privileges
- If an existing user id is used, it must already have administrator privileges (Windows).
- During installation a default instance is created
- On Windows it is called DB2.
- On Linux, it is called db2inst1.
By default, the DB2 server is configured to use TCPIP on port 50000. This setting can be optionally changed using the protocols button.
After the installation First Steps will be launched and can be used to help you create your first database called SAMPLE
You can download DB2 Express / Express-C from Resources.
If you will be connecting to a PostgreSQL database from a remote machine, enable the remote client connections to the PostgreSQL server as follows:
Modify the pg_hba.conf to allow remote connections.
Find the pg_hba.conf file in the postgres database directory. Add a line in the pg_hba.conf file to allow remote TCPIP connections to PostgreSQL database as follows:
host all all 22.214.171.124 255.0.0.0 trust
The 4th parameter specifies the IP address range and the 5th parameter specifies the subnet mask. In above example, we exposed all ip’s starting with 9 to be able to connect to PostgreSQL database.
Start the Postgres database server from the command line.
$ pg_ctl -D /home/postgres/testdb -o -i -l logfile start
- JDBC driver PostgreSQL
To connect to PostgreSQL, use the JDBC driver to connect to the database. At the time of this writing, we used PostgreSQL 8.0.3 with JDBC driver 8.0.315. You can download the JDBC driver for PostgreSQL from resources. Please note that the tool provided in this paper does not include the PostgreSQL JDBC driver.
- JDBC Driver MySQL
We used MySQL Connector/J 3.1 V 3.1.12 to connect to MySQL database. You can download the JDBC driver for the MySQL database from resources. The tool provided does not include the JDBC driver.
After installing the JDBC driver, modify the CLASSPATH parameter to include the JDBC driver.
- IBM JDK 5.0
The tool has been tested only with Java JDK 5.0. You can use either a Sun or IBM Java JDK 5.0 to run the tool. Download the IBM 5.0 JDK from resources
On UNIX systems, you can either install the tool under the DB2 instance user home directory or some other user who has the necessary permissions to run the DB2 LOAD utility.
After you download the tool from download section, unzip the file in a directory. The tool is provided in IBMExtract.jar file. Update the CLASSPATH variable in your profile to include the tool and other necessary JDBC drivers. For example, the following example shows how to include IBMExtract.jar, PostgreSQL and MySQL JDBC drivers in your CLASSPATH.
export JAVA_HOME=/opt/ibm/java2-i386-50 export PATH=$JAVA_HOME/bin:$PATH export CLASSPATH=$HOME/java/lib/IBMExtract.jar:$CLASSPATH export CLASSPATH=$HOME/java/lib/postgresql-8.0-315.jdbc3:$CLASSPATH export CLASSPATH=$HOME/java/lib/mysql-connector-java-3.1.12-bin.jar:$CLASSPATH
On Windows systems, update the CLASSPATH environment variable through Control Panel -> System -> Advanced -> Environment Variables.
There are two components of the tool. The first component (ibm.GenInput) generates the input file for use by the second component (ibm.GenerateExtract). The input file generated by the first component can be modified by the user to remove the tables that are not needed for the migration. Optionally, you can also edit the input file to specify the tables that might be created in DB2 as a result of custom query.
The script to run the first component is a
geninput shell script on Linux
geninput.cmd on Windows. You will need to specify
correct parameters for your environment to connect to MySQL/PostgreSQL database. You will need
to change DBVENDOR, SERVER, DATABASE, PORT, DBUID and DBPWD parameters in the following script.
The script is shown below:
Listing 1. Script to run on Windows (geninput.cmd)
@echo off cls ECHO Executed by: %USERNAME% Machine: %COMPUTERNAME% On %OS% %DATE% %TIME% ECHO. ECHO ------------------------------------------------------------------- ECHO Program to perform MySQL/PostgreSQL to DB2 Migration ECHO ------------------------------------------------------------------- ECHO. if "%1" == "" ( echo Usage : geninput.cmd dbname goto end ) SET DBVENDOR=postgres SET DB2SCHEMA=%1 SET SERVER=server.ibm.com SET DATABASE=%1 SET PORT=5432 SET DBUID=postgres SET DBPWD=pwd %JAVA_HOME%\bin\java -DINPUT_DIR=.\migr -cp %CLASSPATH% ibm.GenInput %DBVENDOR% %DB2SCHEMA% %SERVER% %DATABASE% %PORT% %DBUID% %DBPWD% :end
Listing 2. Script to run on Linux (geninput)
#!/bin/bash if [ "$1" = "" ] ; then echo Usage : geninput dbname exit 1 fi DBVENDOR=postgres DB2SCHEMA=$1 SERVER=server.ibm.com DATABASE=$1 PORT=5432 DBUID=postgres DBPWD=pwd java -DINPUT_DIR=$PWD -cp $CLASSPATH ibm.GenInput $DBVENDOR $DB2SCHEMA $SERVER $DATABASE $PORT $DBUID $DBPWD
The INPUT directory to store the input files for the tool is specified through VM parameter
as -DINPUT_DIR. In above script, it is specified as current directory.
The program will create an
input directory in current working directory.
Table 7. GenInput parameters
|Java program||ibm.GenInput||This is the main java program|
|DBVENDOR||postgres or mysql||Specify either postgres or mysql|
|DB2SCHEMA||schema_name||Specify DB2 schema in which you can import the tables from source database|
|SERVER||Hostname||Hostname or IP address of the server hosting PostgreSQL or MySQL database|
|DATABASE||dbname||Name of the PostgreSQL or MySQL database|
|PORT||nnn||Port number to connect to. The default port to connect to MySQL/Postgresql is 3306/5432.|
|DBUID||uid||MySQL or PostgreSQL database user id|
|DBPWD||pwd||MySQL or PostgreSQL database password|
The first component of the tool ibm.GenInput generates an input file to be used by second component ibm.GenerateExtract. The structure of the input file is provided below in case you need to modify the query part of the file to create a table based upon a query on source database.
Sample content of the file is below:
ama.ama_msa:SELECT * FROM public.ama_msa ama.ama_mti:SELECT * FROM public.ama_mti ama.ama_pe:SELECT * FROM public.ama_pe ama.ama_pmsa:SELECT * FROM public.ama_pmsa ama.ama_schools:SELECT * FROM public.ama_schools ama.ama_specialties_group:SELECT * FROM public.ama_specialties_group
The script to unload the data and to generate DDL for DB2 is an
shell script on Linux and an
unload.cmd script on Windows. You will need to modify
the following connection parameters for the MySQL/PostgreSQL databases;
DBVENDOR, SERVER, PORT, DBUID and DBPWD. The unload script is shown in the following tables
for Windows and Linux environment.
Listing 3. Script to run on Windows (unload.cmd)
@echo off cls ECHO Executed by: %USERNAME% Machine: %COMPUTERNAME% On %OS% %DATE% %TIME% ECHO. ECHO ------------------------------------------------------------------- ECHO Program to perform MySQL/PostgreSQL to DB2 Migration ECHO ------------------------------------------------------------------- ECHO. if "%1" == "" ( echo Usage : unload.cmd dbname goto end ) SET TABLES=input\%1.tables SET COLSEP=~ SET DBVENDOR=postgres SET NUM_THREADS=5 SET SERVER=server.ibm.com SET DATABASE=%1 SET PORT=5432 SET DBUID=postgres SET DBPWD=pwd SET GENDDL=true SET UNLOAD=true SET FETCHSIZE=100 %JAVA_HOME%\bin\java -DOUTPUT_DIR=output\%1 -cp %CLASSPATH% ibm.GenerateExtract %TABLES% %COLSEP% %DBVENDOR% %NUM_THREADS% %SERVER% %DATABASE% %PORT% %DBUID% %DBPWD% %GENDDL% %UNLOAD% %FETCHSIZE% :end
Listing 4. Script to run on Linux (unload)
#!/bin/bash if [ "$1" = "" ] ; then echo Usage : unload dbname exit 1 fi TABLES=$PWD/input/$1.tables COLSEP=\~ DBVENDOR=postgres NUM_THREADS=5 SERVER=db2lab9.dfw.ibm.com DATABASE=$1 PORT=5432 DBUID=postgres DBPWD=db2mig GENDDL=true UNLOAD=true FETCHSIZE=100 java -DOUTPUT_DIR=$PWD/output/$1 -cp $CLASSPATH ibm.GenerateExtract $TABLES $COLSEP $DBVENDOR $NUM_THREADS $SERVER $DATABASE $PORT $DBUID $DBPWD $GENDDL $UNLOAD $FETCHSIZE
We specify OUTPUT_DIR to the java program through a -D switch to the JVM. In this case,
it is defined as output/$1, which will be the output/database name that you specify through the
Table 8. GenerateExtract parameters
|Java program||Ibm.GenerateExtract||This is the main Java program|
|TABLES||FileName||This is the file that contains the name of the tables and SQL query that will be used to generate DDL for DB2 and to unload the data. This file is generated in the first step.|
|COLSEP||\~||The column separator. In this example, it has been chosen as the tilde character (~). If the tilde character has a special meaning in your Unix environment, you can prefix it with a back slash. On Linux platforms, tilde is used to expand to the home directory name. On Windows platform, you can specify it without preceding it with a back slash.|
|DBVENDOR||postgres||The name of the database vendor. If you are running this tool against the MySQL database, specify mysql; for PostgreSQL, use the postgres value.|
|NUM_THREADS||nn||The number of threads that the java program will run.|
|SERVER||Hostname||The hostname or IP address of the MySQL/PostgreSQL database server. If you are running on a local host, you can specify localhost.|
|DATABASE||dbname||The name of the MySQL/PostgreSQL database that you will migrate to db2.|
|PORT||nnn||The port number used to connect to the MySQL / PostgreSQL database. The default port on MySQL/PostgreSQL is 3306/5432 respectively.|
|DBUID||uid||The user id of the MySQL / PostgreSQL database server.|
|DBPWD||pwd||The password of the MySQL/PostgreSQL user id.|
|GENDDL||true||This value can be either true/false. This instructs the tool to generate DDL for the tables to be unloaded.|
|UNLOAD||true||This value can be either true/false. This instructs the tool to unload the data into the OUTPUT_DIR directory. The OUTPUT_DIR is specified by using the the -D switch for the JVM.|
|FETCHSIZE||1000||This is an important parameter and may be specified by a value of 100 or greater. If you specify this value to be very large, you may run out of memory as the MySQL/PostgreSQL JDBC driver will try to hold large amounts of data in memory. If you have a large amount of memory, you could improve performance by increasing this parameter. If you run into an "out of memory" issue, then decrease this parameter.|
The tool uses a driver properties file to read JDBC driver information for MySQL / PostgreSQL database. The properties file is included in the JAR file.
The tool uses the URL properties file to read JDBC driver URL information for MySQL / PostgreSQL database. This poperties file is included in the JAR file.
Data type mapping between MySQL / PostgreSQL and DB2 is controlled by using a data type mapping properties file. This file is included in the JAR file. If there is a need to change the data type mapping properties between MySQL / PostgreSQL and DB2, you can just modify this file and do not have to modify the program.
Listing 5. Data type mapping properties file
POSTGRES.INT=INTEGER POSTGRES.INT2=SMALLINT POSTGRES.INT4=INTEGER POSTGRES.INT8=INTEGER POSTGRES.SERIAL4=INTEGER POSTGRES.SERIAL8=INTEGER POSTGRES.BOOLEAN=SMALLINT POSTGRES.BYTEA=BLOB POSTGRES.VARCHAR=VARCHAR;VARLENGTH=TRUE;DEFAULT=255 POSTGRES.CHARACTER=CHAR;VARLENGTH=TRUE POSTGRES.BPCHAR=CHAR;VARLENGTH=TRUE POSTGRES.DATE=DATE POSTGRES.FLOAT4=REAL POSTGRES.FLOAT8=DOUBLE PRECISION POSTGRES.INTEGER=INTEGER POSTGRES.NUMERIC=NUMERIC;VARLENGTH=TRUE POSTGRES.TEXT=VARCHAR;VARLENGTH=TRUE;DEFAULT=255;USEACTUALDATA=TRUE POSTGRES.TIME=TIME POSTGRES.TIMESTAMP=TIMESTAMP POSTGRES.OID=INTEGER MYSQL.BOOLEAN=SMALLINT MYSQL.BIT=SMALLINT MYSQL.TINYBLOB=VARCHAR(255) FOR BIT DATA MYSQL.BLOB=BLOB;VARLENGTH=TRUE MYSQL.MEDIUMBLOB=BLOB;VARLENGTH=TRUE MYSQL.LONGBLOB=BLOB;VARLENGTH=TRUE MYSQL.CHAR=CHAR;VARLENGTH=TRUE MYSQL.CHARACTER=CHAR;VARLENGTH=TRUE MYSQL.DATE=DATE MYSQL.DATETIME=TIMESTAMP MYSQL.YEAR=SMALLINT MYSQL.NUMERIC=NUMERIC;VARLENGTH=TRUE MYSQL.DECIMAL=NUMERIC;VARLENGTH=TRUE MYSQL.FLOAT=REAL MYSQL.DOUBLE=DOUBLE MYSQL.REAL=DOUBLE MYSQL.TINYINT=SMALLINT MYSQL.SMALLINT=SMALLINT MYSQL.MEDIUMINT=INT MYSQL.INTEGER=INT MYSQL.BIGINT=BIGINT MYSQL.BIT_UNSIGNED=SMALLINT MYSQL.TINYINT_UNSIGNED=SMALLINT MYSQL.SMALLINT_UNSIGNED=SMALLINT MYSQL.MEDIUMINT_UNSIGNED=INT MYSQL.INTEGER_UNSIGNED=INT MYSQL.BIGINT_UNSIGNED=BIGINT MYSQL.DECIMAL_UNSIGNED=NUMERIC;VARLENGTH=TRUE MYSQL.NUMERIC_UNSIGNED=NUMERIC;VARLENGTH=TRUE MYSQL.TINYTEXT=VARCHAR;VARLENGTH=TRUE;DEFAULT=255 MYSQL.TEXT=VARCHAR;VARLENGTH=TRUE;DEFAULT=65535 MYSQL.MEDIUMTEXT=CLOB(16M) MYSQL.LONGTEXT=CLOB(2G) MYSQL.TIME=TIME MYSQL.TIMESTAMP=TIMESTAMP MYSQL.VARCHAR=VARCHAR;VARLENGTH=TRUE;DEFAULT=255 MYSQL.BINARY=CHAR FOR BIT DATA;VARLENGTH=TRUE MYSQL.VARBINARY=VARCHAR FOR BIT DATA;VARLENGTH=TRUE
To use this tool, there are only two commands that you need to run. The first command is
geninput.cmd on Windows and
geninput on Linux. The second
command is named
unload.cmd on Windows and
unload on Linux. The
example shown here is for Linux but it is the same for both platforms.
First - Generate input file
If you do not specify an argument for the geninput script, a message will be shown stating that a MySQL/PostgreSQL database name is required (shown below). Re-run the script and specify the name of the MySQL/PostgreSQL database; the input file for the given database will be generated.
Listing 6. Running geninput
db2@db2lab9:~/migr> ./geninput Usage : geninput dbname db2@db2lab9:~/migr> ./geninput ama [2006-05-23 09.35.54.563] dbSourceName:postgres [2006-05-23 09.35.54.564] db2SchemaName:ama [2006-05-23 09.35.54.565] server:server.ibm.com [2006-05-23 09.35.54.565] dbName:ama [2006-05-23 09.35.54.565] port:5432 [2006-05-23 09.35.54.565] uid:postgres [2006-05-23 09.35.54.566] INPUT Directory = /home/db2/migr/input [2006-05-23 09.35.54.575] Configuration file loaded: 'driver.properties' [2006-05-23 09.35.54.576] Configuration file loaded: 'url.properties' [2006-05-23 09.35.54.599] Driver org.postgresql.Driver loaded [2006-05-23 09.35.54.960] ama.ama_addresstype:SELECT * FROM public.ama_addresstype ama.ama_country_codes:SELECT * FROM public.ama_country_codes ama.ama_hosp_affil:SELECT * FROM public.ama_hosp_affil ama.ama_msa:SELECT * FROM public.ama_msa ama.ama_mti:SELECT * FROM public.ama_mti ama.ama_pe:SELECT * FROM public.ama_pe ama.ama_physicians:SELECT * FROM public.ama_physicians ama.ama_pmsa:SELECT * FROM public.ama_pmsa ama.ama_schools:SELECT * FROM public.ama_schools ama.ama_specialties_group:SELECT * FROM public.ama_specialties_group ama.ama_top:SELECT * FROM public.ama_top ama.ama_type_of_practice:SELECT * FROM public.ama_type_of_practice ama.calculation:SELECT * FROM public.calculation ama.calculation_group:SELECT * FROM public.calculation_group ama.category:SELECT * FROM public.category ama.code_lookup:SELECT * FROM public.code_lookup ama.physician_calculation:SELECT * FROM public.physician_calculation ama.physician_calculation_group:SELECT * FROM public.physician_calculation_group ama.physician_category:SELECT * FROM public.physician_category ama.possible_answer:SELECT * FROM public.possible_answer ama.question:SELECT * FROM public.question ama.topic:SELECT * FROM public.topic
The input file will be created in the
input directory of your current
working directory. You can modify this file to remove tables that you do not
want to migrate by removing the SQL query which would unload the data.
Second - Generate DDL and unload the data
You are now ready to run the extract program that generates the DDL for DB2 and unloads the data from the MySQL / PostgreSQL database. You will need to specify the name of the database as an argument for this program.
Listing 7. Running unload
db2@db2lab9:~/migr> ./unload ama [2006-05-23 09.40.43.157] TABLES_PROP_FILE:/home/db2/migr/input/ama.tables [2006-05-23 09.40.43.159] DRIVER_PROP_FILE:driver.properties [2006-05-23 09.40.43.160] URL_PROP_FILE:url.properties [2006-05-23 09.40.43.161] DATAMAP_PROP_FILE:datamap.properties [2006-05-23 09.40.43.162] colsep:~ [2006-05-23 09.40.43.162] dbSourceName:postgres [2006-05-23 09.40.43.163] threads:5 [2006-05-23 09.40.43.164] server:server.ibm.com [2006-05-23 09.40.43.165] dbName:ama [2006-05-23 09.40.43.166] port:5432 [2006-05-23 09.40.43.167] uid:postgres [2006-05-23 09.40.43.168] fetchSize:100 [2006-05-23 09.40.43.186] Configuration file loaded: '/home/db2/migr/input/ama.tables' [2006-05-23 09.40.43.188] query size 22 schemaName size = 22 [2006-05-23 09.40.43.387] Configuration file loaded: 'driver.properties' [2006-05-23 09.40.43.389] Configuration file loaded: 'url.properties' [2006-05-23 09.40.43.398] Configuration file loaded: 'datamap.properties' [2006-05-23 09.40.43.414] Driver org.postgresql.Driver loaded [2006-05-23 09.40.43.606] Starting Blades [2006-05-23 09.40.43.607] Starting Blade_1 [2006-05-23 09.40.43.613] Starting Blade_0 [2006-05-23 09.40.43.613] Starting Blade_3 [2006-05-23 09.40.43.615] Starting Blade_2 [2006-05-23 09.40.43.615] Starting Blade_4 [2006-05-23 09.40.43.883] Blade_3 unloaded 21 rows in 269 ms for ama.ama_pe [2006-05-23 09.40.44.218] Blade_4 unloaded 5 rows in 603 ms for ama.ama_addresstype [2006-05-23 09.40.44.273] Blade_3 unloaded 0 rows in 390 ms for ama.possible_answer [2006-05-23 09.40.44.560] Blade_1 unloaded 10 rows in 952 ms for ama.ama_top [2006-05-23 09.40.44.569] Blade_3 unloaded 0 rows in 296 ms for ama.category [2006-05-23 09.40.44.687] Blade_2 unloaded 0 rows in 1072 ms for ama.physician_calculation [2006-05-23 09.40.44.718] Blade_4 unloaded 0 rows in 500 ms for ama.question [2006-05-23 09.40.44.881] Blade_3 unloaded 0 rows in 312 ms for ama.calculation_group [2006-05-23 09.40.44.914] Blade_2 unloaded 384 rows in 227 ms for ama.ama_pmsa [2006-05-23 09.40.44.984] Blade_4 unloaded 493 rows in 266 ms for ama.ama_country_codes [2006-05-23 09.40.45.076] Blade_2 unloaded 13 rows in 162 ms for ama.ama_type_of_practice [2006-05-23 09.40.45.343] Blade_4 unloaded 201 rows in 359 ms for ama.ama_specialties_group [2006-05-23 09.40.45.451] Blade_1 unloaded 7141 rows in 891 ms for ama.ama_hosp_affil [2006-05-23 09.40.45.691] Blade_0 unloaded 6102 rows in 2078 ms for ama.ama_mti [2006-05-23 09.40.45.869] Blade_1 unloaded 0 rows in 418 ms for ama.code_lookup [2006-05-23 09.40.46.024] Blade_0 unloaded 0 rows in 333 ms for ama.calculation [2006-05-23 09.40.46.236] Blade_0 unloaded 0 rows in 212 ms for ama.physician_calculation_group [2006-05-23 09.40.46.380] Blade_0 unloaded 0 rows in 144 ms for ama.physician_category [2006-05-23 09.40.46.405] Blade_1 unloaded 1863 rows in 536 ms for ama.ama_schools [2006-05-23 09.40.46.539] Blade_1 unloaded 4 rows in 134 ms for ama.ama_msa [2006-05-23 09.40.46.917] Blade_0 unloaded 0 rows in 537 ms for ama.topic [2006-05-23 09.40.48.931] ama_physicians 10000 rows unloaded in 3.835 sec [2006-05-23 09.40.52.048] ama_physicians 10000 rows unloaded in 3.117 sec ........ [2006-05-23 09.44.21.891] ama_physicians 10000 rows unloaded in 2.152 sec [2006-05-23 09.44.24.200] ama_physicians 10000 rows unloaded in 2.309 sec [2006-05-23 09.44.26.670] Blade_2 unloaded 969995 rows in 221594 ms for ama.ama_physicians [2006-05-23 09.44.26.671] ==== Total time: 223.0 sec [2006-05-23 09.44.26.923] done Blade_0 [2006-05-23 09.44.27.175] done Blade_1 [2006-05-23 09.44.27.427] done Blade_2 [2006-05-23 09.44.27.679] done Blade_3 [2006-05-23 09.44.27.931] done Blade_4
After the tool is successfully run against your database, navigate to the output directory (in our example, output/ama) to view the output which will appear as shown below:
Listing 8. Tool output
db2@db2lab9:~/migr> ls -l output/ama total 76 drwxr-xr-x 2 db2 db2 4096 2006-05-23 06:16 ama_data -rw-r--r-- 1 db2 db2 1212 2006-05-23 09:44 ama_db2checkpending.sql -rw-r--r-- 1 db2 db2 2687 2006-05-23 09:44 ama_db2cons.sql -rw-r--r-- 1 db2 db2 662 2006-05-23 09:44 ama_db2drop.sql -rw-r--r-- 1 db2 db2 1378 2006-05-23 09:44 ama_db2fkdrop.sql -rw-r--r-- 1 db2 db2 3523 2006-05-23 09:44 ama_db2fkeys.sql -rw-r--r-- 1 db2 db2 13190 2006-05-23 09:44 ama_db2load.sql -rw-r--r-- 1 db2 db2 3148 2006-05-23 09:44 ama_db2runstats.sql -rw-r--r-- 1 db2 db2 1143 2006-05-23 09:44 ama_db2.sh -rw-r--r-- 1 db2 db2 1192 2006-05-23 09:44 ama_db2tabcount.sql -rw-r--r-- 1 db2 db2 7099 2006-05-23 09:44 ama_db2tables.sql -rw-r--r-- 1 db2 db2 4336 2006-05-23 09:44 ama_db2tabstatus.sql drwxr-xr-x 2 db2 db2 4096 2006-05-23 06:06 ama_dump drwxr-xr-x 2 db2 db2 4096 2006-05-23 06:06 ama_msg
The explanation of each file generated is given in following table.
Table 9. Output files
|ama_data||Contains all the data files that were unloaded from the source database.|
|ama_dump||Contains the data that was not loaded into DB2. The DB2 LOAD utility will dump the data that was not loaded in DB2.|
|ama_msg||Contains all the messages that were genearated by the DB2 LOAD utility.|
|ama_db2tables.sql||Contains table creation scripts for DB2.|
|ama_db2cons.sql||Contains all constraints and indexes. Please be advised that the check constraints are not reverse engineered by this tool. This file will hold DDL for primary keys, unique constraints and all other indexes.|
|ama_db2fkeys.sql||Contains all foreign keys constraints.|
|ama_db2load.sql||Contains DB2 LOAD utility scripts that will be used to load the data.|
Contains table row counts used for verifying data movement.|
Check the status of the table after the LOAD utility has completed.
Used to drop all foreign keys constraints.
Used to drop all tables in DB2
Used to remove a table from check pending state after loading data.
The shell script to create all the DB2 objects on the Linux platform.|
The shell script to create all the DB2 objects on the Windows platform.
Due to how different databases implement constraints, there a few points to note:
- In MySQL/PostgreSQL, you can create a unique constraint (or index) index on a nullable column however DB2 requires that a column with a unique constraint is defined as NOT NULL.
- In MySQL/PostgreSQL, a foreign key can be created for a primary key even if the primary key is null. DB2 does not allow primary keys with nulls.
- The tool handles these differences by automatically generating the required DDL for DB2 and will define unique or primary key columns as NOT NULL. As a result, some data may require modifications in order to be migrated.
Now, you are ready to run the migration. You have generated all the scripts that you need to create DB2 objects. Let us walk you through some of the best practices that can make your DB2 experience as simple as possible.
Create database in DB2
You can create DB2 database with the command
CREATE DB dbname. When you create
a DB2 database, it creates three table spaces for you - System, Temporary and User. If you use
DB2 Viper, it will create the USER table space as a database managed (DMS) using automatic storage.
The DMS table spaces give you the best performance along with a good sized buffer pool. Your
temporary table space should be created as system managed (SMS).
Run Autoconfigure command for tuning
Table 10. Auto-configure DB2 database
|mem_percent||Percentage of the server memory that you want to dedicate to DB2|
|workload_type||Is the database used for OLTP, Data warehouse, or for mixed purposes? Use Mixed if you are not sure.|
|num_stmts||Number of average SQL statements in a unit of work from your application.|
|tpm||What is the transaction rate per minute for your application?|
|admin_priority||What is the administration priority? Is it the performance or recovery of the database?|
|is_populated||Is there enough data populated in your database? Run this tool again when number of rows in your tables changed significantly since you last ran the tool.|
|num_local_apps||How many batch programs accessing database you run on your database server?|
|num_remote_apps||How many remote applications will be connecting to your database? If you are using an application server of your choice like Tomcat or WebSphere, use total number of connections in your pool.|
|isolation||What is the isolation of your application? Use RR as it will calculate lock memory requirements conservatively.|
|bp_resizable||Have you tuned your buffer pools? If not, let DB2 tune your buffer pools.|
After you created the database, run DB2 AUTOCONFIGURE command as shown in Listing 9 to tune your database upfront. Answer 10 questions and run the command against your database. Your best bet is to run this command from Control Center as it is very intuitive to run it through GUI.
Run autoconfigure command against your database or use DB2 Control Center to run it interactively through the GUI.
Listing 9. Autoconfigure
$ db2 connect to yourdbname $ db2 autoconfigure using mem_percent 85 workload_type simple num_stmts 20 tpm 3000 admin_priority performance is_populated yes num_local_apps 0 num_remote_apps 500 isolation cs bp_resizeable yes apply db and dbm; $ db2 connect reset
Create table spaces and buffer pools before running the migration scripts
One of the outputs from the tool is the approximate maximum size of the row in each table. The maximum row size will determine if you need to use a 4K, 8K, 16K or 32K page size for your table space. After going through the output of this file, decide how many table spaces you need for each size. Use the DB2 Control Center to create these table spaces (with automatic storage) before you create the objects. If you have at least one table space for each page size, all of your table creation scripts will run as they should without the need to specify which table space they should be created in. If you want to make life even easier for yourself, create one 32K page size DMS table space and one 4K size DMS table space before running the table creation script.
If you have been using LOBS in MySQL/PostgreSQL database, you will need to also create LARGE table space. Again do it through the Control Center as it is easily done that way.
Run migration scripts in your DB2 database
After performing the above steps, run script <dbname>_db2.sh on Linux or <dbname>_db2.cmd on Windows to create all the objects and to load data into DB2. A sample script is shown below;
Listing 10. Creating the objects and loading data
#!/bin/bash if [ "$1" = "" ] ; then echo To run this script, specify name of the db2 database echo For example, \"./ama_db2 sample\" echo where sample is the name of the db2 database echo exit 1 fi OUTPUT=ama_db2.log echo Executing Script ama_db2.sh > $OUTPUT echo Connecting to $1 db2 connect to $1 >> $OUTPUT echo Running ama_db2tables.sql script to create all tables db2 -tvf ama_db2tables.sql >> $OUTPUT echo Running ama_db2cons.sql script to create primary keys and indexes db2 -tvf ama_db2cons.sql >> $OUTPUT echo Running ama_db2load.sql script to create to load the data db2 -tvf ama_db2load.sql >> $OUTPUT echo Running ama_db2fkeys.sql script to create all foreign keys db2 -tvf ama_db2fkeys.sql >> $OUTPUT echo Running ama_db2tabcount.sql script to count rows from all tables db2 -tvf ama_db2tabcount.sql >> $OUTPUT echo Running ama_db2tabstatus.sql script to show status of tables after load db2 -tvf ama_db2tabstatus.sql >> $OUTPUT db2 connect reset >> $OUTPUT echo echo Check the log file $OUTFILE for any errors or issues echo
The migration script performs following tasks.
- Creates all tables in DB2 database
- Creates all primary keys and indexes
- Loads data using the DB2 LOAD utility. This utility will load the data and also generate statistics for the data in your tables.
- After loading the data, creates the foreign key constraints.
- Performs the row counts for all DB2 tables so that you can verify the completeness of data movement
- Reports the availability status of the tables after the LOAD
If you have clean data, the migration should go smoothly. When you create foreign key constraints, check the log file for the errors and fix it in the source database before attempting a new migration
You have additional scripts that you might need for following possible purposes.
- To drop all tables in DB2
- To drop all foreign key constraints. It's a good idea to drop all foreign key constraints before you drop the tables.
- If your migration process went smoothly, you might not need to run the runstats command again (it was specified to run with the load) but it is provided in case you need to run it as a separate step.
Enable automatic maintenance on DB2 database
After your successful migration to DB2, it is highly recommended that you enable automatic maintenance for your database. Through the Control Center, you can configure DB2 to automate the management of backups, runstats and reorgs. You will need to specify a maintenance window so that DB2 knows when it can perform these jobs automatically. Again, this is not a job scheduler but you can also schedule maintenance jobs through DB2 Task Center.
Run autoconfigure again
After you have migrated the data, run the configuration advisor so that DB2 is optimally tuned for your work load.
Question: How do I get the source code of this tool?
Answer: First of all, we would like to know about the bugs / issues. We do not mind sharing the code with you if you promise to share with us the enhancements that you make. Please send a note to one of the authors if you would like to have the source.
Question: I am getting the errors when I create the tables.
Answer: You might be missing a table space of required page size. Create a table space of required page size like 8K, 16K or 32K.
Question: I get an error
Missing data map for ... and application quits.
Answer: We must have missed adding the data type in the mapping.properties file. Extract mapping.properties file from the JAR file and add the missing data type and rebuild the JAR file. If it is too much of a problem, send a note to one of the authors.
Question: What are the limitations of the tool?
Answer: The tool does not handle column default values, check constraints, stored procedures or functions from MySQL / PostgreSQL.
Question: Does this tool handle sequence object of PostgreSQL?
Answer: The tool converts sequences to identity attribute. However, DB2 does allow use of sequence objects.
Question: Can I run this tool to do the migration from other databases?
Answer: Yes, you can. The tool has been developed as a generic tool of migration from any database that has a universal JDBC driver. We have tested this tool with Oracle database and it works well. But we have not done enough testing. This tool will also work with Microsoft SQL Server and Sybase. For migration from other well known databases to DB2, it is however recommended to use more versatile and free IBM tool Migration Toolkit also known as MTK. Use resources to download this tool.
While we were developing this tool, our focus was to develop a very small footprint program for best performance and by using a multi-threaded approach we hoped to speed up the migration. This is not a fool-proof program and if you get errors or issues, please let us know. While we may not consider enhancement requests, we will be very receptive to fix the bugs.
There is no guarantee that this program will meet all your requirements, so please read the official disclaimer carefully before you begin.
It is our goal for you to start considering the use of this powerful database which until recently was largely found within the domain of large corporations. Start using DB2 Express-C and leverage the advantages of a world-class database to meet your own database needs.
This article contains sample code. IBM grants you ("Licensee") a non-exclusive, royalty free, license to use this sample code. However, the sample code is provided as-is and without any warranties, whether EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT. IBM AND ITS LICENSORS SHALL NOT BE LIABLE FOR ANY DAMAGES SUFFERED BY LICENSEE THAT RESULT FROM YOUR USE OF THE SOFTWARE. IN NO EVENT WILL IBM OR ITS LICENSORS BE LIABLE FOR ANY LOST REVENUE, PROFIT OR DATA, OR FOR DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL OR PUNITIVE DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF THE USE OF OR INABILITY TO USE SOFTWARE, EVEN IF IBM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
|Migration tool - MySQL/PostgreSQL databases to DB2||GenerateExtract.zip||27KB||HTTP|
- Porting to
DB2 Universal Database
The developerWorks DB2 Universal Database porting site gives you the
information you need to port an application and its data from other database
management systems to DB2 UDB.
- Visit the
developerWorks DB2 zone
to expand your DB2 skills.
- Stay current with
DeveloperWorks technical events and Webcasts
IBM Data Movement Tool
UDFs to ease migration
Developerworks Apr 2005
Leverage PostgreSQL skills to learn DB2 Express
Developerworks Mar 2006
Everything You Wanted to Know About DB2 Universal Database Processes
Developerworks Apr 2003
Leverage MySQL skills to learn DB2 Express
Developerworks Feb 2006
DB2 Express-C, the developer-friendly alternative
Developerworks Feb 2006
Log DB2 UDB stored procedure messages
Developerworks Jan 2006
Getting to know the DB2 UDB command line processor
Developerworks March 2005
The IBM DB2 Universal Database for Linux, UNIX, and Windows Backup Utility
Developerworks Jan 2005
Lock avoidance in DB2 UDB V8
Developerworks Sep 2005
MySQL to DB2 UDB Conversion Guide
Redbook May 2004
Migrating to IBM database servers gets easier with the latest MTK release
Developerworks Mar 2006
Get products and technologies
DB2 Express-C Absolutely Free
Download IBM MTK
Software Evaluation Kits on DVDs
Featured product trials
PostgreSQL JDBC Driver
MySQL JDBC Driver
IBM JDK 5.0
- Participate in
DB2 Express-C Forum
- Participate in
and get involved in the developerWorks community.
Vikram Khatri works for IBM in the Sales and Distribution Division and is a member of DB2 Migration team. Vikram has 21 years of IT experience and specializes in migrating non-DB2 databases to DB2. Vikram supports the DB2 technical sales organization by assisting with complex database migration projects as well as with database performance benchmark testing.
Nora Sokolof works for IBM in the Sales and Distribution Division and is a member of the DB2 Migration team. Nora has been with IBM for almost 20 years. Before joining the migration team, Nora has held roles as a database designer, database administrator and application developer . Nora has assisted hundreds of customers with their migrations to DB2 from Oracle, PeopleSoft, and Informix.