IBM Informix data movement utilities provide an easy way to move databases from one machine to another. Large databases, especially those with a large number of associated stored procedures and triggers, can pose special challenges. This article provides a sample script that you can modify to make the job easier and faster.

Jianing Fan, Software Engineer, Motorola

Jianing Fan is a software engineer at Motorola specializing in relational database management systems. He is an Informix Certified Professional, Oracle Certified Professional, and has over 10 years of database and system experience as a developer, system administrator, and DBA. Jianing can be reached at fan@eis.comm.mot.com.



08 August 2002

© 2002 International Business Machines Corporation. All rights reserved.

Introduction

Dbexport and dbimport are IBM Informix® data movement utilities that move databases from one machine to another. The whole process is fairly simple and straightforward. Dbexport creates a special directory in which it writes a schema file that records the current database structure, and then unloads the data for each database object into ASCII files. You can keep those schema and data files on disk or on tape and then easily transfer them to another machine using UNIX® tar or other utilities. Dbexport also creates a message file, dbexport.out, in the user's current directory to record any errors and warnings that occur during the export.

The dbimport utility uses the schema and data files created by dbexport and creates exactly the same database on other machines. Like dbexport, dbimport also creates a message file, dbimport.out, in the user's current directory to log errors and warnings during the dbimport operation.
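For reference, here is a minimal sketch of the round trip on the command line; the database name, dbspace, and paths are hypothetical, so check the options against your Informix version:

# Export my_db to disk; -ss preserves server-specific properties such as
# extent sizes and lock modes, and -q suppresses screen output.
dbexport -q my_db -o /usr/gsm/db -ss

# The export creates /usr/gsm/db/my_db.exp, which holds my_db.sql (the
# schema) plus one ASCII unload file per table. Check dbexport.out.

# After copying my_db.exp to the target machine (tar, tape, and so on),
# recreate the database there; -d picks the dbspace, -l turns on logging.
dbimport -q my_db -d my_dbspace -l -i /usr/gsm/db

# Check dbimport.out for errors and warnings.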

You use these utilities for logical backups. By logical backup, I mean the backup of the database structure or schema, and logical database objects such as tables, indexes, views, triggers and stored procedures. A logical backup is quite different from physical backups produced by such utilities as ontape, on-archive, or on-bar, which back up the entire Informix instance and raw disk space.

One big advantage of logical backup is its flexibility. The schema file created by dbexport is an ASCII file and can be edited at any time before importing the database. There are quite a few areas you can modify:

  • Extent size for tables. You can enlarge the first and next extent sizes so that a table ends up with fewer but larger extents, which can improve overall performance (a sketch of such an edit follows this list).
  • Physical location for a table or an index. Based on your studies of your operating system's disk I/O, you may want to move some tables and indexes to another location so as to reduce I/O load.
  • Names for indexes and constraints. You may want to replace automatically generated index and constraint names with more meaningful names.
  • Structures of database objects. You may apply a different fragmentation strategy for tables and indexes by editing the schema file, so that when the database gets imported, tables and indexes will be fragmented across different disks to achieve better performance.
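As a sketch of the first item, you could raise the extent sizes in the schema file with sed before importing. The 16 KB defaults and the new sizes below are only assumptions; inspect your own schema first and keep a copy of the original:

SCHEMA=/usr/gsm/db/big_db.exp/big_db.sql

# Keep the schema produced by dbexport before editing it.
cp $SCHEMA $SCHEMA.orig

# Replace the small default extent clause with larger first/next extents,
# so that large tables end up with fewer but larger extents after dbimport.
sed 's/extent size 16 next size 16/extent size 20000 next size 10000/' \
    $SCHEMA.orig > $SCHEMA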

In addition, you can change the name of the exported database so that you can keep more than one copy of the same database on the same Informix instance. This is really handy for developers and testers who want to keep an original database around for comparison.
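One way to do this, assuming dbimport looks for an .exp directory and a schema file named after the database you ask it to create (which matches the layout dbexport produces here), is to rename the exported files before importing; the copy name and dbspace are hypothetical:

# Keep a second copy of big_db on the same instance under the name big_db_copy.
cd /usr/gsm/db
cp -r big_db.exp big_db_copy.exp
mv big_db_copy.exp/big_db.sql big_db_copy.exp/big_db_copy.sql

# dbimport now creates big_db_copy from the renamed export.
dbimport -q big_db_copy -d my_dbspace -l -i /usr/gsm/db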

Another advantage of logical backup is its finer granularity. When you restore, you don't need to shut down the whole Informix instance; instead, you can restore at the database level. In this way, you avoid interfering with other people who are working with other databases.

Because of those advantages, we use dbexport and dbimport utilities frequently when developing, unit testing, box testing, and integration testing.


Problems we ran into

Most of our databases are pretty small, only a few hundred megabytes in size. But a few of them are quite large - not only because of the size of the data, but also because of the number of triggers and stored procedures. One of the databases exceeds two gigabytes and contains more than four hundred stored procedures and almost a thousand triggers.

Because IBM Informix Stored Procedure Language (SPL) is a handy mid-tier API, a lot of developers and programmers like to use it to implement business rules and as a link between front-end applications and the database server. We expect the number of stored procedures and triggers to grow from release to release. We've had no problems exporting and importing the small databases - those of a few hundred megabytes - but when we import large databases, with a large amount of data and a large number of triggers and stored procedures, we run into problems with long transactions and long compilation times.

Long transactions

IBM Informix treats the whole process of importing a database as a single transaction. When the data is large and the triggers and stored procedures are many, it is easy to get into a "long transaction" state. A long transaction is defined by Informix as a transaction that starts in one logical log file and is not committed when the database server needs to reuse that same logical log file. When a long transaction hits the long-transaction high-water mark, it starts to roll back; when it hits the exclusive-access high-water mark, the Informix server is blocked while the long transaction is rolled back. In other words, other database operations on the server such as SELECT, INSERT, DELETE, or UPDATE are held until the long transaction is rolled back. The common way to avoid this is either to turn off database logging or to add logical-log space so that larger transactions can go through.
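For example, run as user informix (the dbspace name and log size are hypothetical, and your ontape may prompt for an archive level):

# Option 1: switch the database to no logging for the duration of the
# import, then restore unbuffered logging afterward. Changing the logging
# mode with ontape performs an archive; TAPEDEV can temporarily point at
# /dev/null, as the turnonlogging function later in this article does.
su - informix -c "ontape -s -N big_db"    # no logging during the import
su - informix -c "ontape -s -U big_db"    # unbuffered logging afterward

# Option 2: add logical-log space so that larger transactions fit, for
# example a 10000 KB logical log in the logdbs dbspace.
su - informix -c "onparams -a -d logdbs -s 10000"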

Long compilation time

When importing large databases, most of the time is spent compiling stored procedures, especially long and complicated ones. The compilation takes an extremely long time, and core dumps can occur while it runs. This often forces us to drop the half-imported database and restart the whole import from the beginning, which is quite time-consuming.


Resolving the problems

To resolve these problems, we wrote a UNIX shell script to facilitate the database import process. This script makes the process of importing large databases much easier and smoother, and, most importantly, it improves the speed significantly: it makes the whole import at least four to five times faster. Before we had this script, we needed four to five hours to import our large databases; with the script, we cut the time down to one hour and avoid the errors I described above.

Description of our system

To understand the logic of the script, you need to understand a bit about our system configuration. We have two types of systems, primary and secondary. On the primary system we have four databases: big_db, small_db1, small_db2, and small_db3. Of these, big_db is the largest; it holds about 2 GB of data and has more than 500 stored procedures and over one thousand triggers. This is the database that caused us the most problems when importing and exporting. The other databases are very small, only a few megabytes in size, and have almost no stored procedures or triggers.

On the secondary system, there are only three small databases: small_db1, small_db2 and small_db3.

The script's logic

The script consists of a number of functions, each of which contributes to smooth database export and import operations. Figure 1 shows the relationships among those functions:

Figure 1. Export and import operations

Note: P stands for Primary and S stands for Secondary.

As you can see from the flowchart, the heart of this script is two functions: export_db and import_db. The basic logic is divide and conquer. In export_db, we take advantage of the fact that dbexport and dbimport let us edit the exported schema. After we export big_db, we split its exported schema into three parts as follows:

  • Tables - includes only the table structures and data
  • Procedures - includes all stored procedures
  • Rest - includes all views, indexes, constraints and triggers

The import_db function uses different methods to process each of them:

  • For tables, we use the dbimport utility to create the database and then import tables with data only. This part goes very fast, because there are no indexes and constraints for the tables.
  • The second part, stored procedures, used to be the bottleneck, because the dbimport utility compiles stored procedures line by line. Because the database has already been created by dbimport, we can now use a UNIX pipe into dbaccess for this part (a short sketch follows this list). This improves the speed significantly because the pipe lets the stored procedures be compiled as a batch.
  • We use the same method for the third part to create all views, indexes, constraints and triggers.
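In other words, once dbimport has created the database and loaded the data, the remaining SQL is simply fed to dbaccess in batches, roughly like this (the locked export directory name is illustrative; the files are the ones written by the split_schema function shown later):

EXPORTED_DIR=/usr/gsm/db/big_db_locked.exp

# Compile all stored procedures in one dbaccess session...
cat $EXPORTED_DIR/proc.sql | dbaccess big_db

# ...then create the views, indexes, constraints, and triggers.
cat $EXPORTED_DIR/rest.sql | dbaccess big_db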

I'll explore this in more detail later.

Understanding the script in detail

Let's have a close look at the script in Listing 1 and see how this divide-and-conquer logic gets implemented. Remember, this code is presented to you "as is" without warranty of any kind.

Listing 1. stop_start and rename_db functions
#!/bin/ksh 
 
. /usr/informix/.profile 
 
DBACCESS=${INFORMIXDIR}/bin/dbaccess 
DBEXPORT=${INFORMIXDIR}/bin/dbexport 
DBIMPORT=${INFORMIXDIR}/bin/dbimport 
EXPORT_PATH="/usr/gsm/db" 
STAT_DB=stat_db 
WEN_DB=wen_db 
CONFIG_BIN=/SYS_SETUP/SYSCONFIG 
LOGFILE="/usr/gsm/logs/cutover/db_exim.$$"  
 
####################### 
# stop_start_informix # 
####################### 
stop_start_informix() { 
 
   echo "" 
   echo "Stopping/Starting Informix." 
   echo "This may take up to 200 seconds. Please Wait..." 
   /usr/bin/su - informix -c "onmode -ky"     >/dev/null 2>&1 
   #Wait for 50 seconds, then start the Informix engine back up 
   sleep 50 
   /usr/bin/su - informix -c "oninit; exit 0" >/dev/null 2>&1 
   rc=1 
   while [ $rc -ne 5 ] 
   do 
      echo "Informix oninit return code is $rc"  
      echo "Waiting for 50 seconds..." 
      sleep 50  
      echo "Checking if the Informix Engine is Online..." 
      /usr/bin/su - informix -c "onstat -" >/dev/null 2>&1 
      rc=$? 
      # 
      # onstat return codes: 
      #    0 = Initialization 
      #    1 = Quiescent 
      #    2 = Fast Recovery 
      #    3 = Archive Backup 
      #    4 = Shutting Down 
      #    5 = Online 
      #    6 = System Aborting 
      # 
   done 
   echo "The Informix Engine is Online." 
   echo "" 
 
 
} 
 
 
############# 
# rename_db # 
############# 
rename_db() { 
 
   rc=1 
   while [ $rc -ne 0 ] 
   do 
      echo "Trying to rename the database $1 to $2" 
      echo "rename database $1 to $2" | $DBACCESS  
      rc=$? 
      if [ $rc -ne 0 ] 
      then 
         echo "WARNING: Failed to rename the database $1 to $2." 
         echo "Retrying. Please wait..." 
         # 
         # bounce the Informix engine 
         # 
         stop_start_informix 
 
      fi 
   done 
   # 
   # bounce the Informix engine 
   # 
   stop_start_informix 
 
}

The purpose of the two functions in Listing 1 is to ensure that all database connections are killed before exporting the database. When you export databases, you need exclusive access to the database; otherwise you will get an error from Informix and won't be able to export. The logic here is to bounce Informix to get rid of all database connections and then when the Informix engine comes back, immediately rename the database so that other application programs cannot access the database we are going to export. Then we can export the database without any interference.
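If you want to see which sessions are still attached before bouncing the engine, onstat (run as informix) is one way to check; this is just an optional sanity check, not part of the script:

# List current sessions and the SQL they are running; any session still
# attached to the database will keep dbexport from getting exclusive access.
su - informix -c "onstat -g ses"
su - informix -c "onstat -g sql"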

Listing 2 shows the most important function of the script, the split_schema function.

Listing 2. split_schema function
########################### 
# split schema            # 
########################### 
split_schema() { 
 
#define file separators 
file_seperator0="^CREATE PROCEDURE" 
file_seperator1="^{ TABLE" 
file_seperator2="^create view" 
 
TABLE_SQL=table.sql 
PROC_SQL=proc.sql 
REST_SQL=rest.sql 
 
#copy old schema file 
 
echo "Copying $EXPORTED_SQL ..." 
 
cp ${EXPORTED_SQL} ${EXPORTED_SQL}.orig 
 
if [ "$?" -ne 0 ] 
   then 
     echo "error with creating files.please check directory permission" 
     exit 1 
fi 
 
echo "Parsing ${EXPORTED_SQL}..." 
 
#parse into three parts 
 
position0=$(( $(cat ${EXPORTED_SQL} | grep -n -i "$file_seperator0"  | head -1 |cut -f1 -d:) - 1 )) 
 
position1=$(( $(cat ${EXPORTED_SQL} | grep -n -i "$file_seperator1"  | head -1 |cut -f1 -d:) - 1 )) 
 
position2=$(( $(cat ${EXPORTED_SQL} | grep -n -i "$file_seperator2" | head -1 |cut -f1 -d:) - 1 )) 
 
if [ $position0 -lt 0 ] 
   then  
     echo "First seperator $file_seperator0 is not found in file. " 
     exit 1 
elif [ $position1 -lt 0 ] 
   then  
     echo "Second seperator $file_seperator1 is not found in file." 
     exit 1 
elif [ $position2 -lt 0 ] 
   then 
     echo "Second seperator $file_seperator2 is not found in file." 
     exit 1 
fi 
 
echo "$position0 $position1 $position2" 
 
diff_12=`expr $position2 - $position1` 
 
 
head -${position0} ${EXPORTED_SQL}> ${EXPORTED_DIR}/${TABLE_SQL} 
 
head -${position1} ${EXPORTED_SQL} | tail +${position0}  > ${EXPORTED_DIR}/${PROC_SQL} 
 
tail +${position1} ${EXPORTED_SQL} | head -${diff_12}  >>${EXPORTED_DIR}/${TABLE_SQL} 
 
tail +${position2} ${EXPORTED_SQL} > ${EXPORTED_DIR}/${REST_SQL} 
 
mv ${EXPORTED_DIR}/${TABLE_SQL} ${EXPORTED_SQL} 
 
}

This is the heart of the script, and its logic is explained in detail in the section The script's logic. This function is called right after big_db is exported in the export_db function, as shown in Listing 3.

Listing 3. export_db function
############# 
# export_db # 
############# 
export_db() { 
 
   date 
 
   DB=$1 
   lockedDB=$2 
   EXPORTED_SQL="$EXPORT_PATH/$lockedDB.exp/$lockedDB.sql" 
   EXPORTED_DIR="$EXPORT_PATH/$lockedDB.exp" 
 
   if [ -d "$EXPORT_PATH/$lockedDB.exp" ]; then 
      echo "Moving $EXPORT_PATH/$lockedDB.exp to another directory." 
      mv $EXPORT_PATH/$lockedDB.exp $EXPORT_PATH/$lockedDB.exp.old 
   fi 
 
   if [ "$DB" = "small_db2" ] 
   then 
      su - informix -c "echo 'grant dba to root' | $DBACCESS $DB; exit 0" 
   fi 
 
   rename_db $DB $lockedDB 
 
   echo "Exporting the $lockedDB database. Please Wait..." 
    
   $DBEXPORT -q $lockedDB -o $EXPORT_PATH -ss 
    
   ok=`grep "dbexport comleted" dbexport.out` 
 
   if [ "$ok" = "dbexport completed" ] 
   then 
      echo "dbexport completed with no error." 
   else  
      echo "Check dbexport.out file; your export has problems!" 
      tail dbexport.out  
      exit 1; 
   fi  
 
   rename_db $lockedDB $DB 
 
   if [ "$DB" = "big_db" ] 
   then 
      split_schema 
   fi  
 
}

As we mentioned before, export_db is one of the major functions of the script, so let's explore it a little more.

First, the function calls the rename_db function. The purpose is to obtain exclusive access to the database to be exported, so that we can export it without interference from other programs. Then the function exports the database. Note that it uses quiet mode (-q); because quiet mode does not display errors and warnings on the screen during the export, it is very important to check dbexport.out after the database has been exported; otherwise, you may miss errors and warnings. Right after the export, the function calls rename_db again to rename the database back so that other programs can access it. The function then checks the database name argument to see whether it is big_db. If it is, the exported schema is split into three parts; for the small databases, we don't split the exported schemas.
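As a usage sketch, the "locked" names below are arbitrary placeholders; they only need to differ from the real database names:

# Export big_db under a temporary locked name; export_db renames it back
# and splits the schema once dbexport has finished.
export_db big_db big_db_locked

# The small databases are exported the same way, but their schemas are not split.
export_db small_db1 small_db1_locked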

Listing 4. Function to turn logging back on
################### 
# turn on logging # 
################### 
turnonlogging() { 
 
infxcon="/usr/informix/etc/onconfig" 
 
cp ${infxcon} ${infxcon}.ori 
 
cat $infxcon.ori |sed 's/^TAPEDEV.*$/TAPEDEV  \/dev\/null/' > $infxcon 
 
/usr/bin/su informix -c "ontape -s -U $1;exit 0" 
 
cp ${infxcon}.ori ${infxcon} 
 
 
}

The function in Listing 4 turns logging back on after the large database has been imported. The import itself runs with logging turned off (note that the big_db dbimport call in Listing 5 omits the -l option), so dbimport uses a minimal number of logical logs and the operation cannot get into a "long transaction" state and roll back. Once the import completes, turnonlogging temporarily points TAPEDEV at /dev/null and runs ontape -s -U to restore unbuffered logging for the database.
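If you want to confirm afterward that logging really is back on, one way (assuming you can query the sysmaster database) is to check the flags in sysdatabases:

# is_logging = 1 means the database is logged; is_buff_log = 1 means buffered logging.
echo "select name, is_logging, is_buff_log from sysdatabases where name = 'big_db'" \
    | dbaccess sysmaster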

Listing 5 shows how this function is called.

Listing 5. import_db
############# 
# import_db # 
############# 
import_db() { 
 
   date 
 
   DB=$1 
   lockedDB=$2 
   DBSPACE=$3 
   EXPORTED_SQL="$EXPORT_PATH/$lockedDB.exp/$lockedDB.sql" 
   EXPORTED_DIR="$EXPORT_PATH/$lockedDB.exp" 
 
   echo "Importing the $lockedDB database. Please Wait..." 
 
   if [ "$DB" = "big_db" ] 
   then 
       $DBIMPORT -q $lockedDB -d $DBSPACE -i $EXPORT_PATH  >$TEMP3 2>&1 
       /usr/bin/su - informix -c "cat ${EXPORTED_DIR}/proc.sql | $DBACCESS $DB; exit 0" 
       /usr/bin/su - informix -c "cat ${EXPORTED_DIR}/rest.sql | $DBACCESS $DB; exit 0"  
       turnonlogging $DB 
   else 
            $DBIMPORT -q $lockedDB -d $DBSPACE -l -i $EXPORT_PATH  >$TEMP3 2>&1 
   fi 
    
   ok=`grep "dbimport completed" dbimport.out` 
 
   if [ "$ok" = "dbimport completed" ] 
   then 
      echo "dbimport completed with no errors." 
   else 
      echo "Check dbimport.out; problems in dbimport." 
      tail dbimport.out 
      exit 1; 
   fi 
 
   rename_db $lockedDB $DB 
    
}

Import_db (Listing 5) is another major function of the script. It first checks the database name argument passed in. If it is big_db, it uses the dbimport utility to process the tables part of the exported schema and then pipes the stored procedures and the remaining DDL (views, indexes, constraints, and triggers) into dbaccess. The purpose is to speed up the whole process; as I mentioned before, doing it this way reduced the time needed to import big_db by at least 50-60%. For big_db, we also import without logging and turn logging back on afterward to avoid a "long transaction." We only do this for big_db; for the other databases, we believe we have plenty of logical logs and will never run into a "long transaction."
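A matching usage sketch, with a hypothetical dbspace name:

# Import big_db without logging into the datadbs dbspace; turnonlogging
# restores logging once the procedures, views, and triggers are in place.
import_db big_db big_db_locked datadbs

# The small databases go through plain dbimport with logging enabled (-l).
import_db small_db1 small_db1_locked datadbs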

The rest of the script creates a user-friendly, easy-to-use menu, which I don't describe in this article; you can see it for yourself in the downloadable script.


Conclusion

As you have seen, the heart of the script is the logic that splits an exported schema into three parts and processes each part with a different method: the Informix dbimport utility for the tables and data, and SQL files piped into dbaccess for the stored procedures and the remaining DDL. For us, this method is more efficient than just using the Informix dbimport utility to import large databases; it greatly speeds up the whole database import process and makes better use of Informix and system resources such as logical logs and memory. Because the script is composed of many functions, it is very flexible and can easily be modified to meet your specific requirements. It should give you some practical and useful ideas on how to import large databases more efficiently.


Download

Code sample: new.sh (14 KB)
