DB2 9: Row compression and large RIDs

Consider large RIDs on row compression in DB2 9

This article describes the impact of large row IDs (RIDs) on the row compression feature in DB2 for Linux, UNIX, and Windows. Both large RIDs and row compression features were introduced in DB2 9. Row compression can significantly reduce the average row size, but when not using large RIDs, you are limited by the regular tablespace limits of 255 rows per page. By examining a simple test case of the table ORDERS from the TPCH database, you'll see how using large RIDs can circumvent the limits on the number of rows per page, a relevant factor for optimizing the benefits of row compression.

Peter Schurr (peter.schurr@de.ibm.com), IT Specialist DB2, IBM

Author Photo: Peter SchurrPeter Schurr works as an IT services specialist for IBM Software Group in IBM Deutschland. His area of expertise is administration and application development with DB2 for Linux, UNIX, and Windows. He has eight years of experience with DB2 and is an IBM Certified Advanced DBA and IBM Certified Application Developer. Peter's areas of special expertise are performance tuning, replication, federated databases, and data modeling.



05 July 2007

Also available in Chinese

Large RIDs

In DB2 Version 8, table and tablespace sizes are limited, as shown in Table 1. The table and tablespace size limits depend on the page size. The number of bytes used as a pointer is 3 bytes. Therefore, only 2 to the power of 24 units are available. This results in 16,777,216 pages. Because 1 byte is being used for the slot number in a single page, 255 multiplied by 16,777,216 rows can be addressed. Depending on the page size, the following limits exist:

Table 1. Tablespace limits on page size in DB2 V8
# of pagesPage sizeLimit of table / tablespace
16,777,2164 K64 GB
16,777,2168 K128 GB
16,777,21616 K256 GB
16,777,21632 K512 GB

In DB2 9, the limits have been expanded. The number of bytes for the page address has been increased to 4 bytes, and the slot number now uses 2 bytes. Table 2 shows the table and tablespace limits in DB2 9.

Table 2. Tablespace limits on page size in DB2 9
# of pagesPage sizeLimit of table / tablespace
536,870,9124 K2 TB
536,870,9128 K4 TB
536,870,91216 K8 TB
536,870,91232 K16 TB

Large RIDs are only supported in large tablespaces in DB2 9. This differs from DB2 Version 8, where a large tablespace was designed for LOBs and LONG data types only. Large tablespaces are the default mode in DB2 9. But when you migrate from DB2 8 to DB2 9, keep in mind that the regular tablespaces are not converted to large tablespaces. Consider in your migration plan that you may want to change the regular tablespaces into large tablespaces.

Row compression

The row compression feature in DB2 can be used to save storage space at the table level. The benefits are saving container space, smaller backup image size (and therefore reduced backup duration), and less page activity in the bufferpools. You can activate row compression for a single table. A dictionary is created that contains reusable patterns. For the patterns, a pointer is stored. It is possible to estimate the ratio of compression for each table by using the DB2 INSPECT command. You must reorganize a table to enforce the compression.

Compression and large RIDs

Row compression reduces the average size of a row. When using a regular tablespace (suppose you are migrating from DB2 Version 8 to DB2 9, where the regular tablespace will be kept), the limit is still fixed at 255 rows per page. Although you can reduce the average row length by using row compression, you are limited by the number of rows per page and you will waste storage space on the tablespace container.

Example on table ORDERS in database TPCH

The following example uses the table ORDER of a TPCH database (of 1 GB total size) to show the impact of the number of RIDs after using the row compression. A table is used with a small average row size. Therefore, the table has been modified by changing the initial maximum row length. The column O_COMMENT has been cut to a length of 20 characters (instead of 79). A page size of 16 K is used for the tablespace to store the table ORDERS. Using this page size, the effect of large RIDs can be explained very well.

The main steps in the following example are:

  1. Create a table ORDERS in a regular tablespace in DB2 Version 8, check the number of pages, average row size, and number of rows per page.
  2. Migrate the instance and the database to DB2 9.
  3. Create a new large tablespace in DB2 9.
  4. Create the table ORDERS2 like ORDER in the new large tablespace.
  5. Compress both tables, ORDERS and ORDERS2, and run reorg on them.
  6. Now you can compare the number of pages, average row size, and the number of rows per page of both tables. You can see the difference between a regular and a large tablespace.

Modify table design

Example 1 shows the structure of table ORDERS. The column O_COMMENT has been modified.

Example 1. DDL of table ORDERS
db2inst1@mstar:~/test/compr_lrid> db2 describe table orders

Column                         Type      Type
name                           schema    name               Length   Scale Nulls
------------------------------ --------- ------------------ -------- ----- ------
O_ORDERKEY                     SYSIBM    INTEGER                   4     0 No
O_CUSTKEY                      SYSIBM    INTEGER                   4     0 No
O_ORDERSTATUS                  SYSIBM    CHARACTER                 1     0 No
O_TOTALPRICE                   SYSIBM    DECIMAL                  15     2 No
O_ORDERDATE                    SYSIBM    DATE                      4     0 No
O_ORDERPRIORITY                SYSIBM    CHARACTER                15     0 No
O_CLERK                        SYSIBM    CHARACTER                15     0 No
O_SHIPPRIORITY                 SYSIBM    INTEGER                   4     0 No
O_COMMENT                      SYSIBM    VARCHAR                  20     0 No

  9 record(s) selected.

db2inst1@mstar:~/test/compr_lrid>

The average row size (that is column AVG_ROW_SIZE in the query) decreases from 107 bytes of the original table to 79 bytes, see Example 2. The column ROWS_PER_PAGE has been calculated by dividing the number of rows (CARD) by the number of used pages (NPAGES).

Example 2. Numbers of table ORDERS in regular tablespace
db2inst1@mstar:~/test/compr_lrid> db2 -tvf ars.sql
SELECT SUBSTR(a.tabname,1,10) AS table, b.npages , 
CASE WHEN (b.NPAGES  > 0) THEN (b.CARD / b.NPAGES) ELSE -1 END AS ROWS_PER_PAGE, 
SUM(AVGCOLLEN) AVG_ROW_SIZE 
FROM SYSCAT.COLUMNS a, SYSCAT.TABLES b, SYSCAT.TABLESPACES c 
WHERE a.tabschema = b.tabschema AND a.tabname = b.tabname AND b.tbspaceid = c.tbspaceid 
AND a.tabname = 'ORDERS' 
GROUP BY a.tabschema, a.tabname, pagesize, card, npages
TABLE      NPAGES               ROWS_PER_PAGE        AVG_ROW_SIZE
---------- -------------------- -------------------- ------------
ORDERS     8198                  182                           79

  1 record(s) selected.


db2inst1@mstar:~/test/compr_lrid>

In the regular tablespace, in DB2 Version 8, in non-compressed mode, 182 rows per-page are stored. In total these are 8198 pages. This is near the limit of 255 pages per row, so the page size of 16 K was a good choice. After migrating the instance and the database to DB2 9, the statistics must be refreshed. A new large tablespace ltb16K with a page size of 16 K will be created and a new table ORDERS2 will be created like the table ORDER in the new large tablespace. No indexes or constraints are used in the TPCH database. Therefore, the creation of the table is very simple. The data will be loaded by cursor, see Example 3.

Create the large tablespace

Example 3. Creation of table ORDERS2
db2inst1@mstar:~/test/compr_lrid> ./cr_tbspace_ltb16k.sh
   Database Connection Information

 Database server        = DB2/LINUX 9.1.0
 SQL authorization ID   = DB2INST1
 Local database alias   = TPCH

DROP TABLESPACE ltb16K
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL0204N  "LTB16K" is an undefined name.  SQLSTATE=42704

CREATE LARGE TABLESPACE ltb16K PAGESIZE 16 K MANAGED BY DATABASE 
  USING ( FILE '/db2/db2inst1/TPCH/ltb16K.001' 20000 ) BUFFERPOOL bp16k
DB20000I  The SQL command completed successfully.

CREATE TABLE orders2 LIKE orders IN ltb16K
DB20000I  The SQL command completed successfully.

DB20000I  The SQL command completed successfully.
DECLARE c1 CURSOR FOR SELECT * FROM orders
DB20000I  The SQL command completed successfully.

LOAD FROM c1 OF CURSOR INSERT INTO orders2
SQL3501W  The table space(s) in which the table resides will not be placed in
backup pending state since forward recovery is disabled for the database.
SQL1193I  The utility is beginning to load data from the SQL statement "
SELECT * FROM orders".
SQL3500W  The utility is beginning the "LOAD" phase at time "2007-02-09
23.03.50.673543".
SQL3519W  Begin Load Consistency Point. Input record count = "0".
SQL3520W  Load Consistency Point was successful.
SQL3110N  The utility has completed processing.  "1500000" rows were read from
the input file.
SQL3519W  Begin Load Consistency Point. Input record count = "1500000".
SQL3520W  Load Consistency Point was successful.
SQL3515W  The utility has finished the "LOAD" phase at time "2007-02-09
23.04.17.263410".
Number of rows read         = 1500000
Number of rows skipped      = 0
Number of rows loaded       = 1500000
Number of rows rejected     = 0
Number of rows deleted      = 0
Number of rows committed    = 1500000
db2inst1@mstar:~/test/compr_lrid>

In the large tablespace, in DB2 9, in non-compressed mode, also 182 rows per-page are stored. The average row size (79) and the number of pages (8198) are the same as in the regular tablespace, see Example 4.

Example 4. Numbers of ORDERS2 in large tablespace
db2inst1@mstar:~/test/compr_lrid> db2 -tvf ars.orders2.sql
SELECT SUBSTR(a.tabname,1,10) AS table, PAGESIZE, b.CARD, b.npages , 
CASE WHEN (b.NPAGES  > 0) THEN (b.CARD / b.NPAGES) ELSE -1 END AS ROWS_PER_PAGE, 
SUM(AVGCOLLEN) AVG_ROW_SIZE FROM SYSCAT.COLUMNS a, SYSCAT.TABLES b, SYSCAT.TABLESPACES c 
WHERE a.tabschema = b.tabschema AND a.tabname = b.tabname AND b.tbspaceid = c.tbspaceid 
AND a.tabname = 'ORDERS2' 
GROUP BY a.tabschema, a.tabname, pagesize, card, npages
TABLE      NPAGES               ROWS_PER_PAGE        AVG_ROW_SIZE
---------- -------------------- -------------------- ------------
ORDERS2                    8198                  182           79

  1 record(s) selected.
db2inst1@mstar:~/test/compr_lrid>

The catalog in DB2 9 contains a new column called AVGROWSIZE, which shows the average length (in bytes) of both compressed and uncompressed rows in a table. It is being retrieved for the tables ORDERS and ORDERS2 in Example 5. Both tables have 89 bytes as the average row size. This value differs a little from the calculated value of 79 bytes - calculated by the AVGCOLLEN of all columns.

Example 5. Retrieving the AVGROWSIZE in uncompressed mode
db2inst1@mstar:~/test/compr_lrid> db2 "select AVGROWSIZE 
from syscat.tables where tabname = 'ORDERS2' "

AVGROWSIZE
----------
        89

  1 record(s) selected.

db2inst1@mstar:~/test/compr_lrid>
db2inst1@mstar:~/test/compr_lrid> db2 "select AVGROWSIZE 
from syscat.tables where tabname = 'ORDERS' "

AVGROWSIZE
----------
        89

  1 record(s) selected.

db2inst1@mstar:~/test/compr_lrid>

Compress the tables

Now both tables are being compressed, see Example 6.

Note: A reorganization of the tables is necessary, the ALTER TABLE statement just updates the catalog content.

Example 6. Compressing the tables
db2inst1@mstar:~/test/compr_lrid> db2 "alter table db2inst1.orders compress yes"
DB20000I  The SQL command completed successfully.
db2inst1@mstar:~/test/compr_lrid>
db2inst1@mstar:~/test/compr_lrid> db2 "alter table db2inst1.orders2 compress yes"
DB20000I  The SQL command completed successfully.
db2inst1@mstar:~/test/compr_lrid>
db2inst1@mstar:~/test/compr_lrid> time db2 -v "reorg table db2inst1.ORDERS 
 resetdictionary"
reorg table  db2inst1.ORDERS resetdictionary
DB20000I  The REORG command completed successfully.

real  0m58.335s
user  0m0.018s
sys 0m0.029s
db2inst1@mstar:~/test/compr_lrid>

db2inst1@mstar:~/test/compr_lrid> time db2 -v "reorg table  db2inst1.ORDERS2 
 resetdictionary"
reorg table  db2inst1.ORDERS2 resetdictionary
DB20000I  The REORG command completed successfully.

real  0m59.505s
user  0m0.020s
sys 0m0.028s
db2inst1@mstar:~/test/compr_lrid> db2 "reorgchk update statistics 
on table db2inst1.orders " > reorgchk.orders.compr.out

db2inst1@mstar:~/test/compr_lrid> db2 "reorgchk update statistics 
on table db2inst1.orders2 " > reorgchk.orders2.compr.out
db2inst1@mstar:~/test/compr_lrid>

In Example 7, you can see that the number of pages in the regular tablespace is now 253, it is yet limited by 255 rows per page. However, the large tablespace allows 427 rows per-page for the table ORDERS2. Therefore, the number of pages used for table ORDERS2 is smaller than in table ORDERS. This shows the effect when compressing a table in a regular tablespace. You will still hit the limit of 255 pages. To avoid this, you have to use a large tablespace, then you are able to store more rows per page after the compression.

Example 7. Numbers after compressing
db2inst1@mstar:~/test/compr_lrid> db2 -tvf ars.sql
SELECT SUBSTR(a.tabname,1,10) AS table, PAGESIZE, b.CARD, b.npages , 
CASE WHEN (b.NPAGES  > 0) THEN (b.CARD / b.NPAGES) ELSE -1 END AS ROWS_PER_PAGE, 
SUM(AVGCOLLEN) AVG_ROW_SIZE FROM SYSCAT.COLUMNS a, SYSCAT.TABLES b, SYSCAT.TABLESPACES c 
WHERE a.tabschema = b.tabschema AND a.tabname = b.tabname AND b.tbspaceid = c.tbspaceid 
AND a.tabname = 'ORDERS' GROUP BY a.tabschema, a.tabname, pagesize, card, npages
TABLE      NPAGES               ROWS_PER_PAGE        AVG_ROW_SIZE
---------- -------------------- -------------------- ------------
ORDERS                     5907                  253           79

  1 record(s) selected.

db2inst1@mstar:~/test/compr_lrid> db2 -tvf ars.orders2.sql
SELECT SUBSTR(a.tabname,1,10) AS table, PAGESIZE, b.CARD, b.npages , 
CASE WHEN (b.NPAGES  > 0) THEN (b.CARD / b.NPAGES) ELSE -1 END AS ROWS_PER_PAGE, 
SUM(AVGCOLLEN) AVG_ROW_SIZE FROM SYSCAT.COLUMNS a, SYSCAT.TABLES b, SYSCAT.TABLESPACES c 
WHERE a.tabschema = b.tabschema AND a.tabname = b.tabname AND b.tbspaceid = c.tbspaceid 
AND a.tabname = 'ORDERS2' GROUP BY a.tabschema, a.tabname, pagesize, card, npages
TABLE      NPAGES               ROWS_PER_PAGE        AVG_ROW_SIZE
---------- -------------------- -------------------- ------------
ORDERS2                    3512                  427           79

  1 record(s) selected.

db2inst1@mstar:~/test/compr_lrid>

The average row size (AVGROWSIZE) is the same, as shown in Example 8. It is now 38 bytes. Without compression it was 89 bytes (see Example 5).

Example 8. Retrieving the AVGROWSIZE
db2inst1@mstar:~/test/compr_lrid>

db2inst1@mstar:~/test/compr_lrid> db2 "select AVGROWSIZE 
 from syscat.tables where tabname = 'ORDERS' "

AVGROWSIZE
----------
        38

  1 record(s) selected.
db2inst1@mstar:~/test/compr_lrid> db2 "select AVGROWSIZE 
 from syscat.tables where tabname = 'ORDERS2' "

AVGROWSIZE
----------
        38

  1 record(s) selected.
db2inst1@mstar:~/test/compr_lrid>

Introduced with the compression feature are some new columns in the catalog. Specifically, the column PCTPAGESSAVED shows the ratio of savings of pages after the compression. In this example, 57 percent of the pages were saved in the large tablespace, because the number was reduced from 8198 to 3512. The regular tablespace needs some more pages. The number of pages could be reduced from 8198 to 5907 pages, see Example 9. Using the large tablespace saves 40.5 percent of the page space (from 5.907 down to 3.512 pages).

Example 9. Saving in the tables
db2inst1@mstar:~/test/compr_lrid> db2 "select SUBSTR(tabname,1,20) AS table, npages, 
from syscat.tables where tabschema = 'DB2INST1' and tabname LIKE  'ORDERS%' "
TABLE                NPAGES               PCTPAGESSAVED
-------------------- -------------------- ------------------------ -------------
ORDERS                               5907                   27
ORDERS2                              3512                   57

  2 record(s) selected.
db2inst1@mstar:~/test/compr_lrid>

Summary

In Table 3, the results are summarized. The best savings are reached by using the large tablespace. Only 3.512 pages are necessary after the compression, the regular tablespace still needs 5.907 pages. The difference between the regular and the large tablespace is 5.907 pages - 3.512 pages, which results in 2.395 pages of wasted space.

Table 3. Comparing the average rows per page page size 16 K
TableTablespaceModeCARDNPAGESAVGROWSIZEAVG ROWS PER PAGE
ORDERSREGULARNo compression1.500.0008.19889182
ORDERSREGULARCompressed1.500.0005.90738253
ORDERS2LARGENo compression1.500.0008.19889182
ORDERS2LARGECompressed1.500.0003.51228427

Table 4 compares the savings. In the table ORDERS, 28 percent of pages have been saved. In the table ORDERS2, 57 percent of the pages have been saved by the compression.

Table 4. Comparing the savings
TableTablespacePCTPAGESSAVEDNPAGES before compressionNPAGES after compressionSAVINGS
ORDERSREGULAR278.1985.90728
ORDERS2LARGE5708.1983.51257

When using a page size of 32 K, the compression in the regular tablespace fails with the warning SQL2220W. There is a minimum record length of 127 bytes for a 32 K regular tablespace (see Table 6). The average row size of 89 bytes is too low.

Table 6. Min and max record length
Page sizeRegular tablespace min record lengthRegular tablespace max record lengthLarge tablespace min record lengthLarge tablespace max record length
4 K1425112287
8 K3025312580
16 K62254121165
32 K127253122335

Conclusion: Best practices

To fully take advantage of the benefits of row compression in DB2 9, follow these best practices:

  • Refresh the statistics after migrating to DB2 9 (by running RUNSTATS or REORGCHK UPDATE STATISTICS).
  • Use the DB2 INSPECT command to check the savings on the tables to decide which tables are worth compressing.
  • A reorganization of tables with row compression is necessary for the compression to be effective.
  • When migrating from DB2 Version 8 to DB2 9, plan to migrate the regular tablespaces to large tablespaces.
  • When migrating to large tablespaces, consider scheduling both the data movement and the reorganization of the data.

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=237759
ArticleTitle=DB2 9: Row compression and large RIDs
publish-date=07052007