Dbexport and dbimport are IBM Informix® data movement utilities that move databases from one machine to another. The whole process is simple and straightforward. Dbexport creates a special directory in which it places a schema file that records the current database structure, and then it unloads the data for each table into ASCII files. You can keep those schema and data files on disk or on tape and easily transfer them to another machine using UNIX® tar or other utilities. At the same time, dbexport creates a message file in the user's current directory to record any errors and warnings that occur during the dbexport operation. The name of the message file is dbexport.out.
The dbimport utility uses the schema and data files created by dbexport to create an identical database on another machine. Like dbexport, dbimport creates a message file, dbimport.out, in the user's current directory to log errors and warnings during the dbimport operation.
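As a quick illustration, here is roughly what a round trip looks like on the command line. The directory and database names are illustrative; check the utilities' reference pages for the full option list.

```sh
# On the source machine: export database mydb. dbexport creates the
# directory /work/export/mydb.exp, holding the schema file (mydb.sql)
# and one ASCII data file per table.
dbexport mydb -o /work/export

# Inspect the message file written to the current directory.
cat dbexport.out

# On the target machine (after copying the files over): recreate the
# database from the exported directory.
dbimport mydb -i /work/export
cat dbimport.out
```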
You use these utilities for logical backups. By logical backup, I mean the backup of the database structure or schema, and logical database objects such as tables, indexes, views, triggers and stored procedures. A logical backup is quite different from physical backups produced by such utilities as ontape, on-archive, or on-bar, which back up the entire Informix instance and raw disk space.
One big advantage of logical backup is its flexibility. The schema file created by dbexport is an ASCII file and can be edited at any time before importing the database. There are quite a few areas you can modify:
- Extent sizes for tables. You can enlarge a table's first and next extent sizes so that the table has fewer but larger extents, which can improve overall performance (see the sketch after this list).
- Physical location for a table or an index. Based on your analysis of your operating system's disk I/O, you may want to move some tables and indexes to another location to reduce the I/O load.
- Names for indexes and constraints. You may want to replace automatically generated index and constraint names with more meaningful ones.
- Structures of database objects. You can apply a different fragmentation strategy to tables and indexes by editing the schema file, so that when the database is imported, tables and indexes are fragmented across different disks for better performance.
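As a small sketch of the first kind of edit, here is how the extent sizes could be changed with sed. The file layout follows dbexport's dbname.exp convention, and the extent values are purely illustrative:

```sh
# Keep the original schema, then rewrite the extent clause of a table.
cp big_db.exp/big_db.sql big_db.exp/big_db.sql.orig
sed 's/extent size 16 next size 16/extent size 2048 next size 512/' \
    big_db.exp/big_db.sql.orig > big_db.exp/big_db.sql
```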
In addition, you can change the name of the exported database so that you can keep more than one copy of the same database on the same Informix instance. This is especially handy for developers and testers who want to keep an original database around for comparison.
Another advantage of logical backup is its finer granularity. When you restore, you don't need to shut down the whole Informix instance; instead, you can restore at the database level. That way, you avoid interfering with other people who are working with other databases.
Because of those advantages, we use the dbexport and dbimport utilities frequently during development, unit testing, box testing, and integration testing.
Most of our databases are pretty small, only a few hundred megabytes in size. But a few of them are quite large, both in the size of the data and in the number of triggers and stored procedures. One of the databases exceeds two gigabytes and contains more than four hundred stored procedures and almost a thousand triggers.
Because IBM Informix Stored Procedure Language (SPL) is a handy mid-tier API, many developers and programmers like to use it to implement business rules and as a link between front-end applications and the database server. We expect the number of stored procedures and triggers to grow from release to release. We've had no problems exporting and importing small databases of a few hundred megabytes, but when we import large databases, with large amounts of data and large numbers of triggers and stored procedures, we run into problems with long transactions and long compilation times.
IBM Informix treats the whole process of importing a database as a single transaction. When the data is large and the triggers and stored procedures are many, it is easy to get into a "long transaction" state. A long transaction is defined by Informix as a transaction that starts in one logical log file and is not committed by the time the database server needs to reuse that same logical log file. When a long transaction hits the long-transaction high-water mark, it starts to roll back; when it hits the long-transaction exclusive-access high-water mark, the Informix server is blocked while the long transaction is rolled back. In other words, other database operations on the server, such as SELECT, INSERT, DELETE, or UPDATE, are held up until the long transaction is rolled back. The common way to avoid this is either to turn off database logging or to add logical logs so that larger transactions can go through.
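As a hedged sketch of the first approach, the logging mode can be flipped with the ondblog utility; this assumes ontape is your backup tool, since switching logging back on only takes effect after a level-0 archive:

```sh
# Switch big_db to non-logging mode before the big import.
ondblog nolog big_db

# ... run the import here ...

# Request unbuffered logging again, then take the level-0 archive
# that makes the mode change take effect.
ondblog unbuf big_db
ontape -s -L 0
```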
When importing large databases, most of the time is spent compiling stored procedures, especially long and complicated ones. Compilation takes an extremely long time, and memory core dumps can occur along the way. This often forces us to drop the half-imported database and restart the whole import from the beginning, which is quite a time-consuming process.
To resolve these problems, we wrote a UNIX shell script to facilitate the database import process. This script makes the process of importing large databases much easier and smoother, and, most importantly, it improves the speed significantly: it makes the whole import process at least four to five times faster. Before we had this script, we needed four to five hours to import large databases; with the script, we cut the time down to one hour and avoid the errors I described above.
To understand the logic of the script, you need to understand a bit about our system configuration. We have two types of systems, the primary and the secondary. On the primary system we have four databases: big_db, small_db1, small_db2 and small_db3. Of all the databases, big_db is the largest: it holds about 2 GB of data and has more than 500 stored procedures and over one thousand triggers. This is the database that caused us the most problems when importing and exporting. The other databases are very small, only a few megabytes in size, and have almost no stored procedures or triggers.
On the secondary system, there are only three small databases: small_db1, small_db2 and small_db3.
The script consists of many functions, each of which contributes to smooth database export and import operations. Figure 1 shows the relationships among those functions:
Figure 1. Export and import operations
Note: P stands for Primary and S stands for Secondary.
As you can see from the flowchart, the heart of this script is two functions: export_db and import_db. The basic logic is divide and conquer. In export_db, we take advantage of the dbexport/dbimport utilities, which allow us to edit the exported schema. After we export big_db, we split its exported schema into three parts, as follows (a sketch of the idea appears after this list):
- Tables: includes only table structures and data
- Procedures: includes all stored procedures
- Rest: includes all views, indexes, constraints, and triggers
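The sketch below shows the splitting idea in miniature; it is not the article's split_schema function (Listing 2 shows that in full). It assumes the exported schema lists tables first, then procedures, then everything else, and the output file names are illustrative:

```sh
# Route each line of the exported schema into one of three part files,
# switching at the first "create procedure" and at the first
# view/index/trigger statement that follows the procedures.
awk '
    BEGIN { out = "big_db_tab.sql" }
    /^create (dba )?procedure/ && out == "big_db_tab.sql" {
        out = "big_db_proc.sql"
    }
    /^create (view|index|unique index|trigger)/ && out == "big_db_proc.sql" {
        out = "big_db_rest.sql"
    }
    { print > out }
' big_db.exp/big_db.sql
```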
The import_db function uses a different method to process each part (see the sketch after this list):
- For tables, we use the dbimport utility to create the database and then import the tables with data only. This part goes very fast because there are no indexes or constraints on the tables yet.
- The second part, stored procedures, used to be the bottleneck, because the dbimport utility compiles stored procedures line by line. Because the database has already been created by dbimport, we can now use a UNIX pipe for this part. This improves the speed significantly, because piping the file lets the stored procedures be compiled as one batch.
- We use the same method for the third part, creating all views, indexes, constraints, and triggers.
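Sketched in shell, the sequence looks something like this. The part-file names follow the split sketch above, and dbaccess is the standard Informix SQL tool the parts are piped into; the real function appears in Listing 5.

```sh
# Step 1: dbimport creates big_db and loads the raw table data; this
# assumes big_db.exp/big_db.sql was replaced by the tables-only part.
dbimport big_db -i /work/export

# Steps 2 and 3: pipe the remaining parts into dbaccess, which runs
# them as one batch instead of line by line.
cat big_db_proc.sql | dbaccess big_db
cat big_db_rest.sql | dbaccess big_db
```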
I'll explore this in more detail later.
Let's have a close look at the script in Listing 1 and see how this divide-and-conquer logic gets implemented. Remember, this code is presented to you "as is" without warranty of any kind.
The purpose of the two functions in Listing 1 is to ensure that all database connections are killed before exporting the database. When you export a database, you need exclusive access to it; otherwise you will get an error from Informix and won't be able to export. The logic here is to bounce Informix to get rid of all database connections and then, as soon as the Informix engine comes back, rename the database so that other application programs cannot access the database we are about to export. Then we can export the database without any interference.
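In miniature, the bounce-and-rename idea looks like this; it assumes the commands run as user informix, and the temporary name is illustrative:

```sh
onmode -ky       # take the instance down immediately, killing all sessions
oninit           # bring it back up
# (A real script would poll "onstat -" here until the server is on-line.)

# Rename the database before applications can reconnect to it by name.
echo 'rename database big_db to big_db_tmp;' | dbaccess sysmaster
```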
Listing 2 shows the most important function of the script, the split_schema function.
As mentioned before, export_db is one of the script's major functions, so let's explore it a little more.
First, the function calls rename_db. The purpose is to obtain exclusive access to the database to be exported, so that we can export it without interference from other programs. Then the function exports the database. Note that it uses quiet mode when exporting; because quiet mode does not show errors and warnings on the screen during export, it is very important to check dbexport.out after the database is exported, or you may miss them. Right after the export, the function calls rename_db again to rename the database back so that other programs can access it. The function then checks the database-name argument to see if it is big_db. If it is, it splits the exported schema into three parts; for the small databases, we don't split the exported schemas.
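Condensed, the export flow looks roughly like this; rename_db stands in for the helper from Listing 1, and -q is dbexport's quiet-mode flag:

```sh
rename_db                               # bounce and grab exclusive access
dbexport big_db -q -o /work/export      # quiet mode: nothing on the screen,
grep -iE 'error|warning' dbexport.out   # so always inspect the message file
rename_db                               # hand the database back
```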
The purpose of the function in Listing 4 is to turn off logging while importing the large database, so that the dbimport utility uses a minimal number of logical logs, preventing the import operation from getting into a "long transaction" state and rolling back. We then turn logging back on after the import.
Listing 5 shows how this function is called.
Import_db (Listing 5) is another major function of the script. It first checks the database-name argument passed in. If it is big_db, the function uses the dbimport utility to process the tables part of the exported schema and then uses a UNIX pipe to compile the stored procedures, indexes, and triggers. The purpose is to speed up the whole process; as I mentioned before, doing it this way reduced the time needed to import big_db by at least 50-60%. For big_db, we also turn off logging before the import and turn it back on afterward to avoid a "long transaction." We only do this for big_db; for the other databases, we believe we have plenty of logical logs and will never run into a "long transaction."
The rest of the script creates a user-friendly, easy-to-use menu, which I don't describe in this article, but you can see it for yourself in the downloadable script.
As you have seen, the heart of the script is the logic that splits an exported schema into three parts and processes each part with a different tool: the Informix dbimport utility or a UNIX pipe. For us, this method is more efficient than using the Informix dbimport utility alone to import large databases; it greatly speeds up the whole database import process and makes better use of Informix and system resources such as logical logs and memory. Because the script is composed of many functions, it is very flexible and can easily be modified to meet your specific requirements. It should give you some practical and useful ideas on how to import large databases more efficiently.
Download: new.sh (14 KB), available via FTP or HTTP.
Jianing Fan is a software engineer at Motorola specializing in relational database management systems. He is an Informix Certified Professional and an Oracle Certified Professional, with over 10 years of database and system experience as a developer, system administrator, and DBA. Jianing can be reached at email@example.com.