How IDS storage optimization works
IDS storage optimization works by considering the entire row and all its columns (except column data that is stored outside the row as a single string of bytes, such as BLOB data). IDS then looks for repeating patterns and stores those patterns as symbols in a compression dictionary, as shown in Figure 1.
Figure 1. Patterns that can be stored as symbols in a dictionary
After creating the dictionary, IDS stores it in a dictionary repository.
The process of optimizing storage for a table involves four steps:
- Create a compression dictionary.
- Compress the data in rows in a table or table fragment.
- Repack the table or fragment rows.
- Reclaim free space.
The following sections provide more detail on each of these steps
Creating a compression dictionary
To create a dictionary, IDS samples a set of rows from an existing table and creates a dictionary of symbols, which represent byte patterns. IDS stores the dictionary in the dictionary repository. Each dbspace that contains a compressed partition has a dictionary repository. The dictionary also has an in-memory representation, so that active queries and updates can quickly compress and uncompress data.
The size of each dictionary is approximately 75KB. However, each dictionary can grow to be as large as 150KB. IDS stores one compression dictionary in memory for each compressed partition. Therefore, it is important for DBAs to take into account these resource requirements.
Compressing the data in table or fragment rows
After creating the dictionary, IDS starts a background process that goes through the table or fragment and compresses the table or fragment rows. The process compresses each row and leaves it in the page where it was compressed. Any new rows that are inserted or updated are also compressed.
This compress operation runs while other transactions and queries are occurring on the table. Therefore, IDS performs the operation in small transactions and holds locks on the rows being actively compressed for only a short duration.
Once data has been compressed, a significant amount of unused space or holes may exist between the rows. The repacking operation uses an algorithm similar to the compress operation to move all the rows to the front of the partition. This operation also utilizes small transactions and locks the rows that are actively being moved.
After all the rows have been repacked, the shrink operation removes the unused table or fragment space and returns free space to the dbspace that contains the table or fragment.
Prerequisite for compression and storage optimization tasks
To perform storage optimization tasks, you must be able to connect to the sysadmin database, and you must be a Database Server Administrator (DBSA). By default, only the user named Informix may perform these tasks.
Identify compression candidates
Issuing a command that estimates compression ratios
Storage optimization can be done at the table level or the fragment level for a fragmented table.
You can identify compression candidates (in Version 11.50.xC4 and later) by issuing the
table or fragment command with
the estimate_compression parameter.
The ratios this command displays are data compression estimates based on a random sampling of rows from the table. The actual ratio of space saved can vary slightly.
Listing 1 shows a sample employee table with three fragments. The first fragment is not compressed and has a value of 0.0% in the curr (current compression ratio) column. The curr column for the other two fragments represents the currently achieved compression ratio. The change column for each fragment represents the possible change in the compression ratio if you were to uncompress and compress the table.
Listing 1: Compression estimates
$ dbaccess hrdb –
Database selected.
> select sysadmin:task("table estimate_compression", tabname)
from systables where tabid > 100;
(expression) est curr change partnum table
----- ----- ------ ---------- -----------------------------------
85.9% 0.0% +85.9 0x00200002 hrdb:informix.employee
85.9% 85.5% +0.4 0x00300002 hrdb:informix.employee
86.1% 86.1% +0.0 0x00400002 hrdb:informix.employee
Succeeded: table estimate_compression hrdb:informix.employee
(expression) est curr change partnum table
----- ----- ------ ---------- -----------------------------------
22.5% 0.0% +22.5 0x001001f6 hrdb:informix.audit_log
Succeeded: table estimate_compression hrdb:informix.audit_log
2 row(s) retrieved.
|
Using the standalone IDS Compression Estimator on Windows
To identify compression candidates in a prior (pre-11.50.xC4) version of IDS, you can use the standalone Windows® utility called IDS Compression Estimator. The Resources section of this article contains a link for downloading the utility. Install the utility on a Windows system along with Client-SDK or I-Connect (required for connecting to an IDS instance). After installing the estimator, you can connect to any IDS instance remotely by providing the necessary connection information.
The screen shot in Figure 2 shows an example of how the utility displays potential storage savings if data is compressed.
Figure 2. IDS Compression Estimator
You must enable disk compression before you can compress data. You need to enable compression only once in the lifetime of the database server. However, you do not need to enable compression to estimate how much space compressing your data would save. You also do not need to enable compression if you want to use the repack or shrink operations to free up space from tables without compressing any data.
To enable compression, execute the following SQL administration API statement:
> execute function sysadmin:task("enable compression");
|
Note that once compression is enabled, you must follow IDS reversion procedures to go back to a prior version of IDS that does not support compression. You would also need to uncompress or drop any compressed tables or fragments before reverting.
The modular design of IDS storage optimization allows you to perform compression and storage optimization tasks both individually and collectively. This means you can perform all operations (such as compressing data, moving rows to the front of the partition, and truncating free space from the end of the partition) with a single SQL statement or with multiple separate statements. Additionally, you can repack or shrink tables (both compressed or uncompressed) separately or at any time (and any number of times) without preventing other DML activity.
For example, Listing 2 shows a command you can issue to reduce the size of older, more static fragments of a time-fragmented table, while leaving more recent and frequently accessed data in uncompressed form.
Listing 2: Command that reduces the size of a fragment
> select sysadmin:task('fragment compress repack shrink', partn)
> from sysfragments F, systables T
> where F.tabid = T.tabid
> and F.partition in LIST{"dbsp3", "dbsp4"}
> and T.tabname = "trans_detail";
|
Figure 3 shows a representation of data and free space within a partition during the storage optimization process.
Figure 3. Data in a partition during the compression and storage optimization process
If you want to compress a single fragment, use the fragment
command with the partition number as an argument. For
example:
> execute function sysadmin:task("fragment compress", “0x500002”);
|
If you want to compress and repack all the fragments in a table, use the table command with the table name as an argument. For example:
> execute function sysadmin:task("table compress repack", "trans_detail",
"hrdb", "informix");
|
Note: In the function sysadmin:task, the third and
fourth arguments (database name and owner name) are optional. The default for these arguments is the current database and owner name, respectively.
If you do not want to compress
existing row data but you want to compress newly inserted or updated rows,
then just create a compression dictionary by executing the table or fragment command with the create_dictionary parameter.
To build a comprehensive compression dictionary, you must ensure that each fragment of
the table has at least 2000 rows of data to sample.
For example, the following command creates a compression dictionary for a table named employee, but it does not compress existing rows:
> execute function sysadmin:task("table create_dictionary", "employee");
|
You can uncompress a previously compressed table or fragment. Uncompressing a table or fragment deactivates compression for new inserts and updates, uncompresses all compressed rows, and marks the compression dictionary as inactive. The inactive dictionary will be available for use by Enterprise Replication and the Change Data Capture API, which might be processing logs from an earlier point in time. However, once a dictionary is marked as inactive, IDS will no longer use it to compress new or changed rows.
The uncompress operation normally needs to allocate new pages for rows that no longer fit on their original pages.
Following are examples of commands to uncompress data.
The first example is for a fragment and uses the uncompress command.
The second example is for a table and uses the uncompress_offline command.
With the uncompress_offline command, the uncompress
operation is performed while holding an exclusive lock on the fragment being
processed, preventing all other access to the row data until the operation is complete.
> execute function sysadmin:task(“fragment uncompress”, “4194306”); > execute function sysadmin:task(“table uncompress_offline”, “employee”); |
System-Monitoring Interface (SMI)
IDS stores compression dictionaries, both active and inactive, in a dictionary repository called rsccompdict. Each dbspace that contains a compressed table or fragment has a dictionary repository.
The syscompdicts view in the sysmaster database brings together the information from all dictionary repositories. Each row represents a compression dictionary associated with a table or fragment by its part number (partnum).
The following example shows an active dictionary that has only creation information (such as a creation timestamp, the log unique id, and the log). An inactive dictionary would have additional drop information.
Listing 3. syscompdicts compression Dictionary Information
> select * from sysmaster:syscompdicts; dict_partnum 2097154 dict_code_version 1 dict_dbsnum 2 dict_create_times+ 1239141487 dict_create_logun+ 24 dict_create_logpos 4997200 dict_drop_timesta+ 1239235592 dict_drop_loguniq+ 26 dict_drop_logpos 5375332 dict_partnum 5242882 dict_code_version 1 dict_dbsnum 5 dict_create_times+ 1239293762 dict_create_logun+ 33 dict_create_logpos 10055760 dict_drop_timesta+ 0 dict_drop_loguniq+ 0 dict_drop_logpos 0 2 row(s) retrieved. |
To view currently opened active compression dictionaries, use the onstat -g ppd command, as shown in Listing 4.
(This onstat command does not list inactive dictionaries.)
Listing 4. Information displayed by
-g ppd
$ onstat -g ppd
IBM Informix Dynamic Server Version 11.50.FC4 \\
-- On-Line -- Up 1 days 19:42:23 -- 157696 Kbytes
Partition Compression Dictionary Info
partnum Version DbsNum CrTS CrLogID CrLogPos DrTS DrLogID DrLogPos
0x300002 1 3 1239139862 24 3559504 0 0 0
0x400002 1 4 1239139959 24 3682384 0 0 0
0x500002 1 5 1239293762 33 10055760 0 0 0
|
You can also use the onstat -g dsk command to track the progress of currently
running compression operations, as shown in Listing 5.
Listing 5. Information displayed by
onstat -g dsk
$ onstat –g dsk
IBM Informix Dynamic Server Version 11.50.FC4 \\
-- On-Line -- Up 3 days 01:18:15 -- 174080 Kbytes
Partnum OP Processed Cur Page Duration Table
0x00500002 2 2000 112 0s trans_detail
0x00400002 4 1300 334 0s employee
|
To get the TBLspace disk utilization report for a partition issue, use the oncheck -pT command, as shown in Listing 6.
For a compressed partition, the last part of the report includes a compressed data summary.
Listing 6. Information displayed by
oncheck -pT
Table fragment partition dbsp4 in DBspace dbsp4
Physical Address 5:5
Creation date 04/09/2009 10:56:44
TBLspace Flags 8000801 Page Locking
TBLspace use 4 bit bit-maps
TBLspace is compressed
Maximum row size 107
Number of special columns 0
Number of keys 0
Number of extents 1
Current serial value 1
Current SERIAL8 value 1
Current BIGSERIAL value 1
Current REFID value 1
Pagesize (k) 2
First extent size 8
Next extent size 8
Number of pages allocated 337
Number of pages used 337
Number of data pages 336
Number of rows 6242
Partition partnum 5242882
Partition lockid 2097156
Extents
Logical Page Physical Page Size Physical Pages
0 5:53 337 337
TBLspace Usage Report for hrdb:informix.trans_detail
Type Pages Empty Semi-Full Full Very-Full
---------------- ---------- ---------- ---------- ---------- ----------
Free 0
Bit-Map 1
Index 0
Data (Home) 336
----------
Total Pages 337
Unused Space Summary
Unused data slots 0
Unused bytes per data page 22
Total unused bytes in data pages 7392
Home Data Page Version Summary
Version Count
0 (current) 336
Compressed Data Summary
Number of rows 6242
Number of compressed rows 5005
Percentage of compressed rows 80.18
|
To see the actual storage savings, compare the disk utilization report before and after you perform the storage optimization.
OAT provides an easy-to-use web interface for IDS administration. The Resources section of this article contains a link you can use to get more information and downloading instructions for OAT.
Using OAT, you can easily identify compression candidates and perform storage optimization tasks. Figure 4 shows a view of a database and its tables along with their current compression status. You get the compression estimates when the cursor hovers over the usage column. Alternately, you can look at the DBspaces view to identify compression candidates from a dbspace.
Figure 4. OAT view of the compression status of tables in a database
After a compression candidate is identified, you can select the storage optimization operations that you want to perform. These tasks are shown in Figure 5.
Figure 5. OAT view of storage optimization tasks
To view the status of any long-running compression tasks, click the “Compression Task Status” tab and view status information.
There are several advantages you can gain by using the data compression and optimization features of IDS. These include significant storage savings, reduced I/O activity, and faster backup and restore. Download the latest version of IDS (Version 11.50.xC4) and try it out yourself.
Thanks to Patricia Smith, John Lengyel, Kevin Cherkauer, and Scott Lashley for providing assistance in writing this document. For additional information, see the IBM Informix Dynamic Server Data Compression and Storage Optimization White Paper.
Learn
-
Check out the GReen
IT Report for links to some of the latest GRIT about the environmentally friendly side of information technology (and of life).
-
Learn more about Informix at the Informix page on
developerWorks.
-
Learn more about Informix at the IBM Informix Dynamic Server 11.5 Information Center.
-
Stay current with developerWorks
technical events and webcasts.
- Learn about and download the
Open Admin Tool (OAT).
Get products and technologies
-
Download the Informix Compression Estimator Program.
-
Download a free trial version of Informix Dynamic Server.
Discuss





