Topic
  • 3 replies
  • Latest Post - 2014-09-12T18:40:22Z by tmhajas
tmhajas
15 Posts

Pinned topic: database compression applied by default?

2014-08-25T23:11:22Z |

hi everyone,

I wonder whether DB2 applies any compression (either row or value compression) by default, without it being explicitly enabled as described here:

http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0007306.html?cp=SSEPGG_9.7.0%2F3-0-2-1-1-2

 

Here is one thing that puzzles me. I'm trying to do an estimation (database sizing) based on some of our tables that are expected to grow the most. These tables are constantly inserted into (every 5 minutes or more often), but data is also deleted periodically (e.g. every 2 weeks, 1 month, or 3 months); all of this is factored into my formula.

Using either:

  • NEW - IBM Data Studio client (it shows the average row size, e.g. 200 bytes)
  • OLD - DB2 Control Center (it shows the average row size, but it can also show table size on disk)

I could roughly determine the size of my entire DB. But based on a simple calculation of "average row size" x "number of rows" for one of the largest tables, the result does not match the size the DB2 Control Center is returning.
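
(The same kind of calculation can also be done straight from the catalog, assuming runstats are current; 'MYSCHEMA' is just a placeholder for our application schema, and 200 is the average row size in bytes:)

db2 => SELECT TABNAME, CARD, (CARD * 200) / 1048576 AS EST_MB_AT_200_BYTES FROM SYSCAT.TABLES WHERE TABSCHEMA = 'MYSCHEMA' ORDER BY CARD DESC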

On the other hand, using get_dbsize_info() 

e.g.  db2 => call get_dbsize_info(?,?,?,0)

  Value of output parameters
  --------------------------
  Parameter Name  : SNAPSHOTTIMESTAMP
  Parameter Value : 2014-08-25-10.50.34.703000
 
  Parameter Name  : DATABASESIZE
  Parameter Value : 2395136000
 
  Parameter Name  : DATABASECAPACITY
  Parameter Value : 10164322304
 
  Return Status = 0

the size comes out to be much, much smaller, about 20% of my calculated value. Based on my calculation (worst-case scenario) the DB size should be around 500 GB, while get_dbsize_info() reports only about 120 GB.

So what can be wrong? My first thought was that there might be some compression going on by default (especially for the many zero-value inserts - value compression?), but I checked with our application DB admin, and the tables are not created with any compression.
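
For reference, one way to double-check this directly from the catalog, rather than relying on the DDL, would be something like the following ('MYSCHEMA' is again just a placeholder):

db2 => SELECT TABNAME, COMPRESSION FROM SYSCAT.TABLES WHERE TABSCHEMA = 'MYSCHEMA' ORDER BY TABNAME

where COMPRESSION should come back as 'N' for no compression ('R' = row, 'V' = value, 'B' = both).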

Is the Data Studio client unreliable?

Is the DB2 Control Center unreliable?

Am I miscalculating something in my formula so badly?

 

NOTE: We are currently using DB2 9.7 FP4 (+ newer FP levels) and DB2 10.1 FP2 (+ newer FP levels).

Any feedback is greatly appreciated! Thanks

 

Updated on 2014-09-10T17:45:23Z by tmhajas
  • mor
    520 Posts

    Re: database compression ?

    2014-08-26T09:00:37Z

    db2 v9.7 / v10.1 does not automatically use row-compression or value-compression.


    If licensed, row-compression applies only to those tables/indexes that are explicitly compressed (either in their CREATE TABLE / CREATE INDEX DDL or via ALTER TABLE).
    The same is true for value-compression - it has to be requested explicitly.
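
    For example (table names are just placeholders, and row-compression needs the Storage Optimization feature licence on top of ESE):

    CREATE TABLE myschema.t_new (id INTEGER NOT NULL, payload VARCHAR(200)) COMPRESS YES;
    -- an existing table can be altered, but existing rows are only compressed after a classic REORG
    ALTER TABLE myschema.t_old COMPRESS YES;
    REORG TABLE myschema.t_old;
    -- value compression is likewise requested explicitly in the DDL
    CREATE TABLE myschema.t_val (id INTEGER NOT NULL, val BIGINT) VALUE COMPRESSION;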

    If your runstats are up to date, you don't need any GUI to determine table/index sizes, but instead can SUM + Group-By on the various columns in sysibmadm.admintabinfo.
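
    A quick sketch along those lines (the *_P_SIZE columns are physical sizes in KB; there are matching *_L_SIZE columns for logical sizes):

    SELECT TABSCHEMA,
           SUM(DATA_OBJECT_P_SIZE)  AS DATA_KB,
           SUM(INDEX_OBJECT_P_SIZE) AS INDEX_KB,
           SUM(LONG_OBJECT_P_SIZE)  AS LONG_KB,
           SUM(LOB_OBJECT_P_SIZE)   AS LOB_KB,
           SUM(XML_OBJECT_P_SIZE)   AS XML_KB
    FROM SYSIBMADM.ADMINTABINFO
    GROUP BY TABSCHEMA;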

    Control Center is deprecated in current DB2 versions.
    Data Studio: a sledgehammer to crack a nut.
     

  • tmhajas
    15 Posts

    Re: database compression ?

    2014-09-10T23:08:03Z
    • mor
    • 2014-08-26T09:00:37Z


    hi Mike,

    Thanks for the reply. I think I've read that compression requires additional licensing (the Storage Optimization feature), which is not included in our DB2 ESE license.

    SYSIBMADM.ADMINTABINFO is certainly useful, but it's overkill for what I need (too many tables to take into account). I just need to validate my DB sizing roughly; I'm not looking for an exact match, but when in reality it's 120 GB and in theory 500 GB, that's a problem that is hard to overlook.

     

    I've done some more testing with ADMINTABINFO in a smaller-scale environment we have, and something is not consistent:

    db2 => select tabschema, SUM(data_object_p_size), SUM(index_object_p_size), SUM(long_object_p_size), SUM(lob_object_p_size), SUM(xml_object_p_size) from sysibmadm.admintabinfo group by tabschema

    TABSCHEMA    DATA_P_KB    INDEX_P_KB    LONG_P_KB    LOB_P_KB    XML_P_KB
    ---------    ---------    ----------    ---------    --------    --------
    TPC            2124608        239744          832         704           0

      1 record(s) selected.

    The total is 2365888, which according to http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.1.0/com.ibm.db2.udb.admin.doc/doc/r0022024.htm?cp=SSEPGG_9.1.0%2F14-0-2-11-3 is reported in kilobytes, so ~2.3 GB.
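
    (The same total can also be had in a single statement, if that is more convenient:)

    db2 => SELECT SUM(DATA_OBJECT_P_SIZE + INDEX_OBJECT_P_SIZE + LONG_OBJECT_P_SIZE + LOB_OBJECT_P_SIZE + XML_OBJECT_P_SIZE) AS TOTAL_PHYSICAL_KB FROM SYSIBMADM.ADMINTABINFO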

    Then if I check this number against get_dbsize_info:

    db2 => call get_dbsize_info(?,?,?,0)

      Value of output parameters
      --------------------------
      Parameter Name  : SNAPSHOTTIMESTAMP
      Parameter Value : 2014-09-10-05.08.06.857000

      Parameter Name  : DATABASESIZE
      Parameter Value : 3173998592

      Parameter Name  : DATABASECAPACITY
      Parameter Value : 10166206464

      Return Status = 0

    which is approximately 3 GB; that matches the size on disk under c:\DB2, but it's again off by ~30% from the ADMINTABINFO total.

     

    Finally, here is the formula I use to estimate the DB size (based on our application):

    (# bytes in DB) = R x (S x V x ((24 x CR x R1) + (24 x R2) + R3))

    Where:
     R = performance record size; for a subsystem approximately 200 bytes
     S = number of subsystems
     V = number of volumes
     CR = Collection Rate; number of performance samples in an hour, for example, 5-minute sampling = 12 samples
     R1 = retention period for sample data, in days
     R2 = retention period for hourly data, in days
     R3 = retention period for daily data, in days

    Now, this 200 bytes is the average row size shown by the DB2 Control Center or Data Studio client (performance data tab) for the tables where I expect the most growth; I briefly checked across those tables, and it's about 200 bytes on average.

    Again, if it were only a 10-20% difference, that would be fine. Let me know if you have any thoughts. Thanks

  • tmhajas
    15 Posts

    Re: database compression ?

    2014-09-12T18:40:22Z
    • tmhajas
    • 2014-09-10T23:08:03Z


    Apparently my formula was wrong: the number of subsystems should not have been part of the calculation. Without that variable, the database sizing matches the size on disk. Thanks
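
    (For the record, the corrected estimate is (# bytes in DB) = R x V x ((24 x CR x R1) + (24 x R2) + R3). Just to show the arithmetic with purely hypothetical numbers - R = 200 bytes, V = 500 volumes, CR = 12 samples/hour, R1 = 14 days, R2 = 30 days, R3 = 90 days:)

    db2 => VALUES 200 * 500 * ((24 * 12 * 14) + (24 * 30) + 90)

    which works out to 200 * 500 * 4842 = 484,200,000 bytes, i.e. roughly 0.45 GB for that hypothetical configuration.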