Informix Dynamic Server data compression and storage optimization

Save storage resources, reduce I/O, and optimize performance with new IDS features

Starting with IBM® Informix® Dynamic Server (IDS) Version 11.50.xC4, you can compress data and optimize storage in IDS databases. The advantages of data compression and storage optimization include significant storage savings, reduced I/O activity, and faster backup and restore. IDS provides full online support for enabling storage optimization and compressing existing table data, while applications continue to use the table. This article provides an overview of IDS data compression and storage optimization functionality and shows you how to perform both tasks.

Share:

Nilesh Ozarkar (nilesho@us.ibm.com), Advisory Software Engineer, IBM

Photo of Nilesh OzarkarNilesh Ozarkar is an Advisory Software Engineer at the IBM Informix Lab in Lenexa, Kansas. Nilesh has been with IBM Informix for approximately nine years. He has worked on several IDS components and co-authored an IBM Redbook publication on data replication.



26 September 2013 (First published 27 April 2009)

Also available in Chinese

How IDS storage optimization works

IDS storage optimization works by considering the entire row and all its columns (except column data that is stored outside the row as a single string of bytes, such as BLOB data). IDS then looks for repeating patterns and stores those patterns as symbols in a compression dictionary, as shown in Figure 1:

Figure 1. Patterns that can be stored as symbols in a dictionary
Representation of an employee database table, showing what data can be compressed and how it is represented in the compression dictionary

After creating the dictionary, IDS stores it in a dictionary repository.

The process of optimizing storage for a table involves four steps:

  1. Create a compression dictionary.
  2. Compress the data in rows in a table or table fragment.
  3. Repack the table or fragment rows.
  4. Reclaim free space.

The following sections provide more detail on each of these steps

Creating a compression dictionary

To create a dictionary, IDS samples a set of rows from an existing table and creates a dictionary of symbols, which represent byte patterns. IDS stores the dictionary in the dictionary repository. Each dbspace that contains a compressed partition has a dictionary repository. The dictionary also has an in-memory representation, so that active queries and updates can quickly compress and uncompress data.

The size of each dictionary is approximately 75KB. However, each dictionary can grow to be as large as 150KB. IDS stores one compression dictionary in memory for each compressed partition. Therefore, it is important for DBAs to take into account these resource requirements.

Compressing the data in table or fragment rows

After creating the dictionary, IDS starts a background process that goes through the table or fragment and compresses the table or fragment rows. The process compresses each row and leaves it in the page where it was compressed. Any new rows that are inserted or updated are also compressed.

This compress operation runs while other transactions and queries are occurring on the table. Therefore, IDS performs the operation in small transactions and holds locks on the rows being actively compressed for only a short duration.

Repacking the rows

Once data has been compressed, a significant amount of unused space or holes may exist between the rows. The repacking operation uses an algorithm similar to the compress operation to move all the rows to the front of the partition. This operation also utilizes small transactions and locks the rows that are actively being moved.

Reclaiming free space

After all the rows have been repacked, the shrink operation removes the unused table or fragment space and returns free space to the dbspace that contains the table or fragment.


Prerequisite for compression and storage optimization tasks

To perform storage optimization tasks, you must be able to connect to the sysadmin database, and you must be a Database Server Administrator (DBSA). By default, only the user named Informix may perform these tasks.


Identify compression candidates

Issuing a command that estimates compression ratios

You can perform storage optimization at the table level or the fragment level for a fragmented table. You can identify compression candidates (in Version 11.50.xC4 and later) by issuing the table or fragment command with the estimate_compression parameter.

The ratios this command displays are data compression estimates based on a random sampling of rows from the table. The actual ratio of space saved can vary slightly.

Listing 1 shows a sample employee table with three fragments. The first fragment is not compressed and has a value of 0.0% in the curr (current compression ratio) column. The curr column for the other two fragments represents the currently achieved compression ratio. The change column for each fragment represents the possible change in the compression ratio if you were to uncompress and compress the table.

Listing 1: Compression estimates
$ dbaccess hrdb –

Database selected.

> select sysadmin:task("table estimate_compression", tabname) 
 from systables where tabid > 100;

(expression)  est   curr  change partnum    table
              ----- ----- ------ ---------- -----------------------------------
              85.9%  0.0%  +85.9 0x00200002 hrdb:informix.employee
              85.9% 85.5%   +0.4 0x00300002 hrdb:informix.employee
              86.1% 86.1%   +0.0 0x00400002 hrdb:informix.employee

              Succeeded: table estimate_compression  hrdb:informix.employee

(expression)  est   curr  change partnum    table
              ----- ----- ------ ---------- -----------------------------------
              22.5%  0.0%  +22.5 0x001001f6 hrdb:informix.audit_log

              Succeeded: table estimate_compression  hrdb:informix.audit_log

2 row(s) retrieved.

Using the stand-alone IDS Compression Estimator on Windows

To identify compression candidates in a prior (pre-11.50.xC4) version of IDS, you can use the stand-alone Windows® utility called IDS Compression Estimator. The Resources section of this article contains a link for downloading the utility. Install the utility on a Windows system along with Client-SDK or I-Connect (required for connecting to an IDS instance). After installing the estimator, you can connect to any IDS instance remotely by providing the necessary connection information.

Figure 2 shows an example of how the utility displays potential storage savings if data is compressed:

Figure 2. IDS Compression Estimator
Screen shot of the IDS Compression Estimator. The results tab is shown. Each database table has columns for the size, the compressed size and the percent saved by compression.

Optimize your storage

Enable compression first

You must enable disk compression before you can compress data. You need to enable compression only once in the lifetime of the database server. However, you do not need to enable compression to estimate how much space compressing your data would save. You also do not need to enable compression if you want to use the repack or shrink operations to free up space from tables without compressing any data.

To enable compression, execute the following SQL administration API statement:

> execute function sysadmin:task("enable compression");

Note that once compression is enabled, you must follow IDS reversion procedures to go back to a prior version of IDS that does not support compression. You would also need to uncompress or drop any compressed tables or fragments before reverting.

Perform compression tasks

The modular design of IDS storage optimization allows you to perform compression and storage optimization tasks both individually and collectively. This means you can perform all operations (such as compressing data, moving rows to the front of the partition, and truncating free space from the end of the partition) with a single SQL statement or with multiple separate statements. Additionally, you can repack or shrink tables (both compressed or uncompressed) separately or at any time (and any number of times) without preventing other DML activity.

For example, Listing 2 shows a command you can issue to reduce the size of older, more static fragments of a time-fragmented table, while leaving more recent and frequently accessed data in uncompressed form:

Listing 2: Command that reduces the size of a fragment
> select sysadmin:task('fragment compress repack shrink', partn)
>     from sysfragments F, systables T
>         where F.tabid = T.tabid
>         and F.partition in LIST{"dbsp3", "dbsp4"}
>         and T.tabname = "trans_detail";

Figure 3 shows a representation of data and free space within a partition during the storage optimization process:

Figure 3. Data in a partition during the compression and storage optimization process
1) uncompressed data uses most of partition 2) compress operation creates free space between rows 3) repack operation moves free space to front of partition 4) shrink operation truncates the free space from the end of the partition

If you want to compress a single fragment, use the fragment command with the partition number as an argument. For example:

> execute function sysadmin:task("fragment compress", “0x500002”);

If you want to compress and repack all the fragments in a table, use the table command with the table name as an argument. For example:

> execute function sysadmin:task("table compress repack", "trans_detail",
  "hrdb", "informix");

Note: In the function sysadmin:task, the third and fourth arguments (database name and owner name) are optional. The default for these arguments is the current database and owner name, respectively.

Create the dictionary only

If you do not want to compress existing row data but you want to compress newly inserted or updated rows, then just create a compression dictionary by executing the table or fragment command with the create_dictionary parameter. To build a comprehensive compression dictionary, you must ensure that each fragment of the table has at least 2000 rows of data to sample.

For example, the following command creates a compression dictionary for a table named employee, but it does not compress existing rows:

> execute function sysadmin:task("table create_dictionary", "employee");

Uncompress data

You can uncompress a previously compressed table or fragment. Uncompressing a table or fragment deactivates compression for new inserts and updates, uncompresses all compressed rows, and marks the compression dictionary as inactive. The inactive dictionary will be available for use by Enterprise Replication and the Change Data Capture API, which might be processing logs from an earlier point in time. However, once a dictionary is marked as inactive, IDS will no longer use it to compress new or changed rows.

The uncompress operation normally needs to allocate new pages for rows that no longer fit on their original pages.

Following are examples of commands to uncompress data. The first example is for a fragment and uses the uncompress command. The second example is for a table and uses the uncompress_offline command. With the uncompress_offline command, the uncompress operation is performed while holding an exclusive lock on the fragment being processed, preventing all other access to the row data until the operation is complete.

> execute function sysadmin:task(“fragment uncompress”, “4194306”);

> execute function sysadmin:task(“table uncompress_offline”, “employee”);

Monitoring Interface

System-Monitoring Interface (SMI)

IDS stores compression dictionaries, both active and inactive, in a dictionary repository called rsccompdict. Each dbspace that contains a compressed table or fragment has a dictionary repository.

The syscompdicts view in the sysmaster database brings together the information from all dictionary repositories. Each row represents a compression dictionary associated with a table or fragment by its part number (partnum).

Listing 3 shows an active dictionary that has only creation information (such as a creation timestamp, the log unique id, and the log). An inactive dictionary would have additional drop information.

Listing 3. syscompdicts compression Dictionary Information
> select * from sysmaster:syscompdicts;

dict_partnum        2097154
dict_code_version   1
dict_dbsnum         2
dict_create_times+  1239141487
dict_create_logun+  24
dict_create_logpos  4997200
dict_drop_timesta+  1239235592
dict_drop_loguniq+  26
dict_drop_logpos    5375332

dict_partnum        5242882
dict_code_version   1
dict_dbsnum         5
dict_create_times+  1239293762
dict_create_logun+  33
dict_create_logpos  10055760
dict_drop_timesta+  0
dict_drop_loguniq+  0
dict_drop_logpos    0

2 row(s) retrieved.

onstat utility

To view currently opened active compression dictionaries, use the onstat -g ppd command, as shown in Listing 4. (This onstat command does not list inactive dictionaries.)

Listing 4. Information displayed by -g ppd
$ onstat -g ppd

IBM Informix Dynamic Server Version 11.50.FC4      \\
            -- On-Line -- Up 1 days 19:42:23 -- 157696 Kbytes

Partition Compression Dictionary Info
partnum    Version  DbsNum   CrTS     CrLogID  CrLogPos DrTS     DrLogID  DrLogPos
0x300002   1        3        1239139862 24       3559504  0        0        0 
0x400002   1        4        1239139959 24       3682384  0        0        0 
0x500002   1        5        1239293762 33       10055760 0        0        0

You can also use the onstat -g dsk command to track the progress of currently running compression operations, as shown in Listing 5:

Listing 5. Information displayed by onstat -g dsk
$ onstat –g dsk

IBM Informix Dynamic Server Version 11.50.FC4     \\
            -- On-Line -- Up 3 days 01:18:15 -- 174080 Kbytes

Partnum      OP    Processed     Cur Page  Duration  Table
0x00500002    2        2000           112        0s  trans_detail 
0x00400002    4        1300           334        0s  employee

oncheck utility

To get the TBLspace disk utilization report for a partition issue, use the oncheck -pT command, as shown in Listing 6. For a compressed partition, the last part of the report includes a compressed data summary.

Listing 6. Information displayed by oncheck -pT
                  Table fragment partition dbsp4 in DBspace dbsp4

    Physical Address               5:5
    Creation date                  04/09/2009 10:56:44
    TBLspace Flags                 8000801    Page Locking
                                              TBLspace use 4 bit bit-maps
                                              TBLspace is compressed
    Maximum row size               107
    Number of special columns      0
    Number of keys                 0
    Number of extents              1
    Current serial value           1
    Current SERIAL8 value          1
    Current BIGSERIAL value        1
    Current REFID value            1
    Pagesize (k)                   2
    First extent size              8
    Next extent size               8
    Number of pages allocated      337
    Number of pages used           337
    Number of data pages           336
    Number of rows                 6242
    Partition partnum              5242882
    Partition lockid               2097156

    Extents
         Logical Page     Physical Page        Size Physical Pages
                    0              5:53         337        337

TBLspace Usage Report for hrdb:informix.trans_detail

    Type                  Pages      Empty  Semi-Full       Full  Very-Full
    ---------------- ---------- ---------- ---------- ---------- ----------
    Free                      0
    Bit-Map                   1
    Index                     0
    Data (Home)             336
                     ----------
    Total Pages             337

    Unused Space Summary

        Unused data slots                                 0
        Unused bytes per data page                       22
        Total unused bytes in data pages               7392

    Home Data Page Version Summary

                 Version                                 Count

                       0 (current)                        336

    Compressed Data Summary

        Number of  rows                        6242
        Number of compressed rows              5005
        Percentage of compressed rows           80.18

To see the actual storage savings, compare the disk utilization report before and after you perform the storage optimization.


Open Admin Tool (OAT)

OAT provides an easy-to-use web interface for IDS administration. The Resources section of this article contains a link you can use to get more information and downloading instructions for OAT.

Using OAT, you can easily identify compression candidates and perform storage optimization tasks. Figure 4 shows a view of a database and its tables along with their current compression status. You get the compression estimates when the cursor hovers over the usage column. Alternately, you can look at the DBspaces view to identify compression candidates from a dbspace.

Figure 4. OAT view of the compression status of tables in a database
OAT screen shot showing current compression status of a database. Cursor is over one table causing compression estimates to be displayed

After a compression candidate is identified, you can select the storage optimization operations that you want to perform. These tasks are shown in Figure 5:

Figure 5. OAT view of storage optimization tasks
OAT screen shot showing user has selected the compress, repack, and shrink operations for a database table

To view the status of any long-running compression tasks, click the Compression Task Status tab and view status information.


Conclusion

There are several advantages you can gain by using the data compression and optimization features of IDS. These include significant storage savings, reduced I/O activity, and faster backup and restore. Download the latest version of IDS (Version 11.50.xC4) and try it out yourself.

Acknowledgements

Thanks to Patricia Smith, John Lengyel, Kevin Cherkauer, and Scott Lashley for providing assistance in writing this document. For additional information, see the IBM Informix Dynamic Server Data Compression and Storage Optimization White Paper.

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=385628
ArticleTitle=Informix Dynamic Server data compression and storage optimization
publish-date=09262013