Data archiving with Informix Dynamic Server table-level restore

A practical guide for independent data archiving

If you need to archive your data for a long period of time, you must consider all the possible archiving techniques, and weigh their pros and cons. This article discusses the most common options for archiving your IBM® Informix® Dynamic Server (IDS) data, and illustrates techniques to retrieve your data from a previously taken backup in order to store it in an independent format, preserved against future changes in your environment.

Volker Fraenkle (volker.fraenkle@de.ibm.com), IT Specialist for Informix Dynamic Server and DB2 for Linux, UNIX, and Windows, IBM

Volker Fraenkle started working with Informix in 1990. Currently, he is responsible for supporting Informix customers within the Software Group Information Management Service of IBM Germany. Volker's areas of special expertise are replication, high availability, backup and restore, and security.



26 April 2007

Database administrator responsibilities

Nowadays, many governments have laws that regulate the retention period for business data. Snapshots of business data need to be preserved for several years to ensure that the data can be accessed as it existed at a specified point in time. In Germany, for example, business data must be archived for 10 years.

To meet these legal requirements, companies usually delegate the responsibility for data archiving to their database administrators (DBAs). Here are some of the most common data archiving strategies of IBM IDS DBAs:

  • Using the standard backup features:

    The easiest way of archiving data is to use the regularly scheduled backups. DBAs keep their backups on tapes or within a storage manager for the requested period.

    Disadvantages:

    • Keeping regular backups does not guarantee that you can restore the data when necessary. Are you sure that the next generation of database servers will still support the backup format of your current version? For example, the migration path from IDS, Version 7.21 to IDS, Version 10 would be a difficult one.

      So you also need to keep a full copy of the IDS installation.

    • Are you sure that the current database server will run on future platforms (hardware, operating system)? As an example, look at the list of supported hardware platforms for IDS, Version 9.21.

      So you also need to stock the matching hardware.

  • Using dbexport:

    Basically, using the dbexport utility is the most secure way to keep your data readable over the years (see the example after this list).

    Disadvantages:

    • dbexport locks the database in exclusive mode. While dbexport runs, no users are allowed to be connected to the database, so they cannot work with it during the export.
    • Speed. dbexport is not the fastest way to unload data; the export may take anywhere from minutes to several hours. But even 10 minutes is too long in a 24x7 environment, because of the exclusive lock at the database level. Most installations are 24x7 environments today, with normal user work during the day and batch processing at night.

  • Using High Performance Unload (HPL):

    HPL has the same advantage as the dbexport utility. Additionally, HPL allows you to compress your data during the export, which saves disk space. Users are not locked out of the database; they are still able to access the tables.

    Disadvantage:

    • In contrast to dbexport, HPL does not enforce consistency of the exported data because no exclusive lock is set. So you have to prevent the data from being updated yourself while the unload is running.

  • Making your own archiving tool:

    A home-grown tool may have the same advantages and disadvantages as dbexport and HPL.

  • Using third party archiving software:

    Third-party archiving software uses the same techniques as described above, so it may share the advantages and disadvantages already discussed.

    Disadvantage:

    • Third-party archiving software is, in most cases, not free.
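
For reference, a minimal dbexport invocation might look like the following. The database name foo_db and the target directory /archive/export are assumptions for illustration; the -ss option preserves server-specific information such as extent sizes and lock modes.

# export database foo_db into /archive/export (requires exclusive access to the database)
dbexport foo_db -o /archive/export -ss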

Data archiving through table-level restore

With IDS, Version 10, IBM introduced a new feature called table-level restore. Table-level restore allows restoring one or more tables, in whole or in part, from any available backup archive (ontape as well as onbar).

External tables and data consistency

For technical reasons, the data that is extracted to external tables is not consistent across the whole database if any transaction was open during the backup. This circumstance, and how to avoid it, is discussed later in this article.

An often overlooked feature is the option to restore the data into external tables. The only restriction is that a restore to external tables is possible only from a Level-0-Archive. All you need to archive your data in an independent format are the dbschema utility and the archecker utility.

The following section describes the steps to extract your data into an independent format using the table-level restore feature of IDS.

In the Download section, you will find the arexport.pl tool, which demonstrates a technique for producing a dbexport-compatible format for use with dbimport.
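
For example, assuming arexport.pl has produced a dbexport-compatible directory such as /archive/export (the path and database name are illustrative), the data could later be re-imported with dbimport:

# re-create database foo_db from the dbexport-compatible directory
dbimport foo_db -i /archive/export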

Five steps to archive your data

Figure 1. Process schema of table-level restore

Perform a Level-0-Archive

You can use either ontape or onbar to create your backup archive. With ontape, you can also use the STDIO option.

Listing 1. Examples of backup commands
ontape -s -L 0
ontape -s -L 0 -t STDIO | gzip >/tmp/backup.gz
onbar -b -w -L 0
onbar -b -L 0

Create a schema file for each table

You have to create a schema file for each table that you want to extract. Use dbschema with the -ss option to obtain a valid schema file.

Listing 2. Example dbschema-command
dbschema -d foo_db -t foo foo.schema -ss

Remove all definitions except the table definition itself. However, do not remove any fragmentation definition.

Listing 3. Create-table-schema example
create table foo (
  f1 integer,
  f2 char(10),
  primary key (f1)
) in dbspace1 extent size 16 next size 16 lock mode row;

Prepare the external table definition

Edit the previously created schema file and add the external table definition.

Listing 4. External-table-schema example
create external table foo_unl (
  f1 integer,
  f2 char(10),
  primary key (f1)
) using ("./foo.unl", delimited);

Prepare the archecker schema file

  • Put all previously created schema files into a single file.
  • Insert "database foo_db;" before the first line.
  • Add a data manipulation language (DML) command, like "insert into foo_unl select * from foo;", for each table at the end of the file.
  • Finally, add "restore to current with no log;" at the end of the file.
Listing 5. Example for a complete archecker schema file
database foo_db;
   
create table foo1 (
  f1 integer,
  f2 char(10),
  primary key (f1)
) in dbspace1 extent size 16 next size 16 lock mode row;

create external table foo1_unl (
  f1 integer,
  f2 char(10),
  primary key (f1)
) using ("./foo1.unl", delimited);

create table foo2 (
  f1 integer,
  f2 integer,
  primary key (f1)
) in dbspace2 extent size 16 next size 16 lock mode row;

create external table foo2_unl (
  f1 integer,
  f2 integer,
  primary key (f1)
) using ("./foo2.unl", delimited);

insert into foo1_unl select * from foo1;
insert into foo2_unl select * from foo2;

restore to current with no log;
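
If you archive many tables, the archecker schema file can be assembled from the individual schema files with a small shell script. The following is only a minimal sketch; the per-table schema file names and the output file name archecker.schema are assumptions for illustration:

# build the archecker schema file from previously prepared per-table schema files
(
  echo "database foo_db;"
  cat foo1.schema foo1_unl.schema foo2.schema foo2_unl.schema
  echo "insert into foo1_unl select * from foo1;"
  echo "insert into foo2_unl select * from foo2;"
  echo "restore to current with no log;"
) > archecker.schema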

Execute the archecker utility

Execute the archecker utility, depending on your backup command.

Listing 6. Examples of archecker commands
# archive done with ontape to file
archecker -tvsdX -s <archecker.schema.file>       

# archive done with ontape to STDIO
gzip -cd /tmp/backup.gz | archecker -tvsdX -s <archecker.schema.file>

# archive done with onbar
archecker -bvsdX -s <archecker.schema.file>

If your archecker schema file is well formed, archecker extracts all requested tables into the defined files. If you have used ontape to file for the backup, archecker prompts you to put in the first physical tape. Press Return to continue.

Listing 7. Archecker sample output
IBM Informix Dynamic Server Version 10.00.UC5
Program Name:   archecker
Version:        8.0
Released:       2006-05-18 03:24:37
CSDK:           IBM Informix CSDK Version 2.90
ESQL:           IBM Informix-ESQL Version 2.90.UC1
Compiled:       05/18/06 03:25  on Linux 2.4.21.ELsmp #1 SMP Wed Jan 12 23:35:44 EST 2005

AC_STORAGE               /tmp
AC_MSGPATH               /tmp/ac_msg.log
AC_VERBOSE               on
AC_TAPEDEV               /tmp/backup.ids10base
AC_TAPEBLOCK             32 KB
AC_LTAPEDEV              /dev/null
AC_LTAPEBLOCK            32 KB
Dropping old log control tables
Restore only physical image of archive
Extracting table foo_db:foo1 into foo_db:foo1_unl

Please put in Phys Tape 1.
Type <return> or 0 to end:

Tape type:      Archive Backup Tape
OnLine version: IBM Informix Dynamic Server Version 10.00.UC5
Archive date:   Sat Feb 17 02:59:52 2007
Archive level:  0
Tape blocksize:  32768
Tape size:  10240
Tape number in series:  1

Scan PASSED
Control page checks PASSED
Table checks PASSED
Table extraction commands 2
Tables found on archive 2
LOADED: foo_db:foo1 produced 13 rows.
LOADED: foo_db:foo2 produced 99 rows.

External tables and data consistency

As already mentioned, archecker is unable to extract the data of a database in a consistent manner if any writing transactions happened during the backup. This is a restriction if you are using external tables. The reason behind this restriction is the way Informix processes the physical and logical restore of the data:

Dirty read backup

You can think of archecker's inability to achieve logical consistency as a kind of dirty-read isolation level for the backup.

  • Physical restore:

    During the physical restore, archecker performs the following tasks:

    • Reads the schema command file

      Retrieves the information about source tables, destination tables, schema, dbspace names, and determines the required archive

    • Scans the archive pages
    • Processes each row and determines if the row is complete or partial

      If the row is a partial row, archecker stages the row for later processing

    • Inserts the row into the destination table

      If a logical restore is requested, two additional columns are added to the table. These columns (rowid and part number of the original table) provide a primary key to identify the physical record that needs to be updated during logical restore.

  • Logical restore:

    If the physical restore is followed by the logical restore, archecker performs the following additional tasks:

    • Retrieves the necessary logical log records

      All necessary logical log information is stored in a temporary table

    • Applies the logical logs

      archecker converts the logical log records from the temporary table into SQL statements and applies them to the table

To ensure logical consistency for external tables, archecker would have to perform additional tasks for handling external tables. Such additional tasks would cause a lot of overhead without offering any advantage over using internal tables. So archecker simply does not support logical consistency for external tables.

It is up to the database administrator to ensure data consistency by preventing the application from modifying the data during the backup. The simplest solution is to take the Informix instance into quiescent mode.

Listing 8. Quiesce instance for backup
$ onmode -ys       # take instance into quiescent mode
$ ontape -s -L 0   # run backup
$ onmode -ym       # take instance into online mode

Taking the instance into quiescent mode is a disadvantage that brings the table-level restore feature close to the dbexport utility. But you should also consider the advantages:

  • Speed

    Taking a backup is much faster than exporting the data to flat files, because the data is copied in binary mode, whereas the dbexport utility reads the data on a row-by-row basis. Therefore, the period of time during which the database is unavailable is minimized when you use the backup utility.

  • Available file system space on the production system

    There might not be enough space on the production system to export the data locally. In that case, you would have to export the data over the network, which consumes additional time.

Table 1. Comparison of using external tables versus dbexport

                                                          External tables    dbexport
  Archiving can be done at any time                       Yes                No
  Data consistency with offline backup                    Yes                No
  Data consistency with online backup                     No                 No
  Restore data to a point in time                         No                 No
  Need temporary space in dbspaces                        No                 No
  Generating external files floods the bufferpool         No                 Yes
  Generating external files is fast                       Yes                No
  Extracting needs to be done on the production system    No                 Yes
  Extracting can be done on the production system         Yes                Yes

Use internal tables

Using internal tables has the advantage that there is no need to prevent the application from updating the data during the backup. But there are also disadvantages, such as the need for sufficient space in dbspaces.

Table 2. Comparison of using internal versus external tables

                                                          Internal tables             External tables
  Archiving can be done at any time                       Yes                         Yes
  Data consistency with offline backup                    Yes                         Yes
  Data consistency with online backup                     Yes                         No
  Restore data to a point in time                         Yes                         No
  Need temporary space in dbspaces                        Yes                         No
  Generating external files is fast                       No                          Yes
  Extracting needs to be done on the production system    No                          No
  Extracting can be done on the production system         Yes, but not recommended    Yes

Archive with internal tables

  • Perform a Level-0-Archive

    There is no need to take the server into quiescent mode.

  • Create a schema file for each table
  • Prepare the instance

    You need at least one dbspace that meets the size of the largest table in the backup. If you have fragmented tables, you need a dbspace that meets the size of all fragments of the largest fragmented table. There is no need to have the same dbspaces as on the production system.

  • Create a new database

    You need a database to hold the internal table temporarily.

  • Prepare the archecker schema file
  • Execute the archecker utility
  • Unload the internal table to file
  • Drop the internal table
Listing 9. Example for instance preparations
# create a dbspace of approximately 5 GB (the onspaces -s size is given in KB)
onspaces -c -d archive_dbs -p /tmp/archive.000 -o 0 -s 5000000

# create a database
dbaccess - -
create database archive in archive_dbs;
Listing 10. Example of an archecker schema file
database foo_db;
create table foo (
  f1 integer,
  f2 char(10),
  primary key (f1)
) in dbspace1 extent size 16 next size 16 lock mode row;

database archive;
create table foo (
  f1 integer,
  f2 char(10),
  primary key (f1)
) in dbspace1;

insert into archive:foo select * from foo_db:foo;

restore to current;
Listing 11. Example for extracting the data
# run archecker
archecker -tvsdX -s <archecker.schema.file>       

# unload and drop table
dbaccess archive -
unload to foo.unl select * from foo;
drop table foo;

What are the differences in comparison to archiving into external tables?

  • The archive database

    To store the data internally, you need to specify a target database. The target database must exist before retrieving the data.

  • Definition of the target table

    The definition of the target table is essentially the same as for a standard table. You can specify a dbspace if necessary. If the dbspace is omitted, the dbspace of the archive database is used.

  • The retrieve statement

    You have to specify the source and the target database for the retrieve statement.

  • The restore command

    Omit the with no log option of the restore to command, or specify an additional timestamp, like restore to '2006-12-31 23:59:59'.

  • The unload command

    You have to manually unload the table.

  • Speed versus space

    With internal tables, it is recommended to run the archive on a per-table basis rather than retrieving all tables in one cycle as in the external table mode (a sketch of such a loop follows this list). The advantage is that you only need to reserve space for the largest table in the backup. The disadvantage is that archiving the whole database takes more time. But because you do not need the production system for archiving, time is not a factor that really matters.
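
A per-table archiving cycle could be scripted roughly as follows. This is only a sketch under stated assumptions: a prepared archecker schema file named <table>.acschema exists for every table listed in tables.txt, and the archive database and dbspace from Listing 9 are already in place.

# archive one table per cycle: extract it from the backup into database "archive",
# unload it to a flat file, then drop the temporary copy again
for TAB in $(cat tables.txt)
do
  archecker -tvsdX -s ${TAB}.acschema
  dbaccess archive - <<EOF
unload to ${TAB}.unl select * from ${TAB};
drop table ${TAB};
EOF
done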

Archive to XML

Sometimes the unload format, together with a database schema file, is not independent enough, because the data is only usable by someone with knowledge of the Informix database. If you have to hand the data over to someone who needs to work with the data but has no knowledge of the Informix unload format, you need a more independent format.

Nowadays the most independent data format is XML/XSD. If you prepare your data in XML/XSD, everyone should be able to extract the data, without any knowledge of the format.

Listing 12. Example of an XML/XSD file
<?xml version="1.0" encoding="utf-8"?>
<informix xmlns="http://www.w3.org/2001/XMLSchema" 
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
          xsi:noNamespaceSchemaLocation="test.xsd">
  <table owner="informix" name="foo">
    <row>
      <f1>1</f1>
      <f2>ONE</f2>
    </row>
    <row>
      <f1>2</f1>
      <f2>TWO</f2>
    </row>
  </table>
</informix>

With Informix, Version 10, you can use user-defined routines (UDRs) to unload the tables into XML/XSD. These UDRs are not shipped with the server; you have to compile and install them manually. For details, see "Generating XML from IDS 9.x" (developerWorks, February 2003), found in the Resources section.

Another possibility is to use the dbexp2xml tool, which is included in the zip file found in the Download section. You can use dbexp2xml to generate XML/XSD files from a given dbexport directory.
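
The basic idea behind such a conversion is simple: read the pipe-delimited unload file and wrap every field in an element. The following awk sketch illustrates it for the two-column table foo used in the examples above; it deliberately ignores details that a real converter such as dbexp2xml has to handle, for example escaping of special XML characters and reading the column names from the schema file.

# convert a pipe-delimited unload file of table foo (columns f1, f2) into simple XML rows
awk -F'|' '
BEGIN { print "<table name=\"foo\">" }
      { printf "  <row>\n    <f1>%s</f1>\n    <f2>%s</f2>\n  </row>\n", $1, $2 }
END   { print "</table>" }
' foo.unl > foo.xml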

IDS, Version 11 and XML

Starting with Informix, Version 11, a new set of XML SQL functions is included. You can use these functions to publish SQL result sets as XML. Use the genxmlschema function to publish a table's schema together with its data.

Consider that IDS, Version 11 is still in beta, and therefore the functionality of the genxmlschema function might still change before IDS, Version 11 is released.

Listing 13. genxmlschema syntax
>>-- genxmlschema(row, element) -->>
Listing 14. SQL command with genxmlschema
select genxmlschema (foo, "foo") from foo;
Listing 15. Result set of genxmlschema
genxmlschema  <?xml version="1.0" encoding="en_US.819" ?>
              <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
              targetNamespace="http://schemas.ibm.com/informix/2006/sqltypes"
              xmlns="http://schemas.ibm.com/informix/2006/sqltypes"
              ElementFormDefault="qualified">
              <xs:element name="foo">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="f1"  type="xs:integer"/>
                    <xs:element name="f2"  type="xs:char(10)"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              </xs:schema>
<foo>
  <row>
     <f1>1</f1>
     <f2>ONE      </f2>
  </row>
  <row>
     <f1>2</f1>
     <f2>TWO      </f2>
  </row>
</foo>
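
To capture the result in a file, you can feed the query to dbaccess and redirect the output. This is a sketch using the illustrative database and table names from above; note that dbaccess also writes the column label and blank lines, so the resulting file may need minor cleanup.

# write the genxmlschema result for table foo to a file
dbaccess foo_db - <<'EOF' > foo.xml
select genxmlschema(foo, "foo") from foo;
EOF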

Requirements

Database

The archecker utility tries to connect to the requested database. If the database does not exist, the extraction fails. See the Resources section for details.
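
If you run the extraction on a machine where the requested database does not yet exist, create an empty database with that name first. A minimal example, assuming the illustrative database name foo_db and the root dbspace:

# create an empty database so that archecker can connect to it
dbaccess - - <<'EOF'
create database foo_db in rootdbs;
EOF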

STDIO

If you want to restore from a backup taken to STDIO, you have to modify the AC_TAPEDEV setting in the archecker configuration file. See the Resources section for details.

Configuration settings

The archecker utility should work without any changes in the configuration. However, special environments need special settings.

Environment variable $AC_CONFIG

For more information, see the IDS Information Center.

The default configuration file for the archecker utility is $INFORMIXDIR/etc/ac_config.std. If you need different settings to meet your special environment, it is recommended to create your own configuration file and set the $AC_CONFIG environment variable (%AC_CONFIG% for Microsoft Windows®).

Listing 16. AC_CONFIG UNIX example
AC_CONFIG=/home/informix/my_special_ac.cfg; export AC_CONFIG
Listing 17. AC_CONFIG Microsoft Windows example
set AC_CONFIG="C:\Program Files\Informix\etc\my_special_ac.cfg"

AC_STORAGE

For more information, see the IDS Information Center.

Normally, archecker uses about 15 megabytes of temporary space. For very large databases, this may not be enough. If this is the case, set AC_STORAGE to a directory on a larger file system.

AC_TAPEDEV

For more information, see the IDS Information Center.

If you want to extract tables from a backup taken to STDIO, you have to set AC_TAPEDEV to STDIO, because archecker does not have a command-line override option the way ontape does.
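
Putting these settings together, a customized archecker configuration file (referenced through AC_CONFIG as shown in Listings 16 and 17) might look like the following sketch. The paths are assumptions for illustration; the parameter names correspond to those shown in the archecker output in Listing 7.

# my_special_ac.cfg -- customized archecker configuration (paths are illustrative)
AC_MSGPATH    /tmp/ac_msg.log      # archecker message log
AC_STORAGE    /data/ac_tmp         # directory on a larger file system for temporary files
AC_VERBOSE    1                    # 1 = verbose messages, 0 = terse messages
AC_TAPEDEV    STDIO                # read the backup archive from standard input
AC_TAPEBLOCK  32                   # tape block size in KB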

Disclaimer

This article illustrates programming techniques on various operating platforms. The examples within this document have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these examples.


Download

  Description                              Name           Size
  Sample Perl scripts for this article     samples.zip    25 KB

Resources

