Take advantage of fragment-level statistics and smarter statistics in IBM Informix

Keeping statistics up to date is a regular part of the maintenance of relational databases. However, running update statistics itself incurs overhead. Learn how new features in IBM® Informix® V11.7 help you to improve efficiency by determining when you really need to run update statistics. Another enhancement in IBM Informix V11.7 improves efficiency by letting you gather fragment-level statistics for fragmented tables. Learn how these enhancements can help you optimize your database and applications.

Nita Dembla (nita@us.ibm.com), Advisory Software Engineer, IBM

Photo of author Nita DemblaNita Dembla has worked on the IBM Informix development team for the past 10 years, specializing in SQL and the optimizer.



21 April 2011

Also available in Chinese

Introduction

New features in IBM Informix V11.7 let you dramatically improve the performance of running update statistics. Because running update statistics is now quicker and more efficient, you can run the utility more often then you have in the past. For example, you could run the update statistics command daily instead of weekly. Depending on the workload and queries on the system, this can improve the query plans and help improve system performance. These features impact and improve the way update statistics collects information for the optimizer.

Updating statistics is one of the most important administrative tasks that ensure the optimizer picks optimum execution plans for your queries. Collecting statistics on large systems can take a long time. So, DBAs tend to reduce the frequency of running update statistics to avoid the overhead.

This article first describes how smarter statistics speed the running of update statistics on your entire system, and then describes Fragment Level Statistics, a feature which further improves update statistics speed on fragmented tables.

Smarter statistics

Overview

Have you ever wondered if the update statistics command you are about to run is actually needed since the last time you updated statistics? How much has the data changed? Has it changed significantly enough to affect column distribution?

Smarter statistics allows the server to determine if the existing statistics accurately represent the current data or if they need to be refreshed. Prior to IBM Informix V11.7, when you ran the update statistics command, statistics were always recreated. Informix V11.7 introduces an AUTOMATIC mode of operation for update statistics. In this mode, the server allows you to define a data change threshold and refresh statistics only if the data has changed beyond the threshold since they were last built. Since the determination of stale statistics is based on how much the data in the table or fragment has changed, a set of data change counters (update, delete, inster (UDI)) are maintained for the table and indices. These counters are maintained automatically at the storage layer without requiring any switch.

This means that if you have tables that have not changed by more than a certain low percentage, then the update statistics command on these tables will not refresh the statistics, and the server considers the statistics good enough for optimization purposes.

Tracking data changes using UDI counters

In Informix V11.7, the update, delete and insert counters have a size of eight bytes each, and are added to the table and index partition page. These counters are incremented by the respective DML operation occurring on the partition, both table and index partitions.

When you run the update statistics command, a snapshot of these counters is saved in their respective system catalogs. These values are then used by consecutive update statistics runs to determine if existing statistics are good or stale.

The following tables list the new columns in system catalogs that capture UDI counters, and additional information captured during statistics run.

Table 1. New columns in V11.7 updated by UPDATE STATISTICS LOW
ColumnDescription
sysfragments.nupdatesNumber of updates on an detached index fragment
sysfragments.ndeletesNumber of deletes on an detached index fragment
sysfragments.ninsertsNumber of inserts on an detached index fragment
sysindices.ustlowtsTimestamp of last update statistics low on index
sysindices.nupdatesNumber of updates on table for attached index
sysindices.ndeletesNumber of deletes on table for attached index
sysindices.ninsertsNumber of inserts on table for attached index
sysindices.ustbuilddurationTime taken to build low statistics on an attached index
Table 2. New columns in V11.7 updated by UPDATE STATISTICS MEDIUM/HIGH
ColumnDescription
sysdistrib.nupdatesNumber of updates done on table when statistics are generated
sysdistrib.ndeletesNumber of deletes on table when statistics are generated
sysdistrib.ninsertsNumber of inserts on table when statistics are generated
sysdistrib.ustbuilddurationTime taken to build table level column statistics.

System setup and configuration

To enable the automatic mode of update statistics, set the ONCONFIG variable AUTO_STAT_MODE to 1. This is the default setting and affects the entire system. If you wish to restrict the feature to certain databases only, you can set the session environment AUTO_STAT_MODE to 1 in your sysdbopen procedure for the desired databases.

Another ONCONFIG parameter related to smarter statistics is STATCHANGE. STATCHANGE is the percentage change threshold value for all tables in your system. This ONCONFIG parameter takes an integer value between 0 and 100 and is set to a default value of 10%. Alternatively, STATCHANGE can also be set using the SQL statement session environment STATCHANGE x, where x is an integer between 0 and 100. When STATCHANGE is set in a session, it affects the tables on which update statistics is run in that session.

When you set the session level values using the SET ENVIRONMENT statement for AUTO_STAT_MODE and STATCHANGE, they override their corresponding ONCONFIG values.

                SET ENVIRONMENT STATCHANGE "5";
                UPDATE STATISTICS LOW FOR TABLE tab1;

In the previous example, you set the change threshold at 5% to decide if update statistics low is run on tab1.

STATCHANGE table property

You can use the table property STATCHANGE if you wish to control the change threshold at the table level instead of system level. The table property is used to set the change threshold for that specific table and its indexes. Similar to its ONCONFIG and session environment counterpart, the property takes an integer values between 0 and 100. Table property can be used to override the system (ONCONFIG and/or session) setting.

You can either set the table property at table creation time, or alter it at a later time. Syntax for STATCHANGE property in CREATE TABLE and ALTER TABLE statements is: STATCHANGE [integer | AUTO].

You can use the integer value or use AUTO to default it to system setting.

                CREATE TABLE tab1(cid serial, lname varchar(32), fname varchar(32))
                STATCHANGE 5;
                ALTER TABLE tab1 STATCHANGE AUTO;

As you can see from previous example, the smarter statistics is universal to all tables and indexes regardless of whether they are fragmented or not.

Note: Since statistics impact the optimizer, you are advised to set STATCHANGE to a reasonable value so that queries continue to return in optimum fashion.

Specifying and/or overriding Smarter Statistics using the update statistics command

Once your system level configuration finishes using AUTO_STAT_MODE and STATCHANGE at system or session level, smarter statistics ensures that statistics are gathered only for those tables that need new statistics. For the rest of the table, the update statistics command is NO-OP. However, there can be times when you want to refresh statistics for the table irrespective of how much the data has changed. You can use the AUTO and FORCE keywords in the update statistics command to specify if Smarter statistics should be used or not.

AUTO means to use AUTOMATIC mode of update statistics thereby enforcing smarter statistics.

FORCE means to forcefully rebuild all statistics regardless of change thresholds.

Note: The AUTO and FORCE keywords work with ALL modes of update statistics and have no effect on procedure statistics.

This command will ignore the STATCHANGE computation and rebuild all the table tab1 statistics.

                UPDATE STATISTICS HIGH FOR TABLE tab1 FORCE;

The AUTO option uses STATCHANGE computation to detect and build stale statistics only, for example:

UPDATE STATISTICS LOW FOR TABLE tab1
AUTO;

Effect of smarter statistics

The effect of smarter statistics can be quite profound on the amount of time it takes to update statistics on large systems.

  • Tables and indexes with changes less than the imposed STATCHANGE are completely skipped during update statistics.
  • Since each index (fragment) has its own UDI counter, the decision to skip or rebuild index statistics is done based on how much the index (fragment) has changed.
  • If the table has fragment-level statistics, the decision to refresh statistics is done at the fragment level, thereby rebuilding statistics only for changed or new fragments.

You can query columns ustlowts in systables and sysindices catalog tables before and after running UPDATE STATISTICS LOW to check if statistics are skipped or rebuilt. For MEDIUM and HIGH mode, query column constr_time in sysdistrib and sysfragdist catalogs.


Fragment-level statistics

Concept

Fragment-level statistics affects the way UPDATE STATISTICS MEDIUM/HIGH gathers data and generates column distribution on fragmented tables. Statistics generated by UPDATE STATISTICS LOW on tables and indexes have always been maintained at fragment level. These statistics are stored in catalog sysfragments.

In MEDIUM or HIGH mode, a histogram of the column data is created. Column histogram is represented as bins or buckets holding the column values.

Figure 1 shows how table level statistics are generated on a fragmented table. Notice that data from all fragments is sorted together to form table level statistics, which is encoded and stored in the system catalog sysdistrib. Table level statistics are decoded and loaded into the data distribution cache for the optimizer to use.

Figure 1. Sample figure showing generation of table level statistics for a fragmented table
Table level statistics illustration

In simple terms, fragment-level statistics allows storage of column data distributions per fragment, and building the table level from its constituent fragments. Fragment level column statistics are represented as mini-bins because each mini-bin is a small percentage of the final column bin.

Figure 2 shows how fragment-level statistics are built and stored. Data from each fragment is individually sorted, and mini-bin distribution is built and stored in the sysfragdist catalog. Finally table level statistics is formed by merging all constituent fragments statistics and stored in sysdistrib.

Figure 2. Sample figure showing generation of fragment-level statistics for a fragmented table
fragment-level statistics illustration

Fragmented statistics of a column is uniquely identified by the combination of tabid, fragid, and colno. Schema of sysfragdist is shown in table 3.

Table 3. Schema of sysfragdist catalog
ColumnDescription
tabidUnique identifying code of table ( = systables.tabid)
fragidUnique identifying code of fragment ( = sysfragments.partnum)
colnoUnique identifying code of column ( = syscolumns.colno)
seqnoSequence number (for distributions that span multiple rows)
modeUPDATE STATISTICS mode (H = high, or M = medium)
resolutionAverage percentage of the sample in each bin
confidenceEstimated likelihood that a MEDIUM mode sample value is equivalent to an exact HIGH mode result
rowssampledNumber of rows in the sample
ustbuilddurationTime spent to calculate the fragmented distribution for this column
constr_timeTime when the distribution was recorded
ustnrowsRows in fragment when distribution was calculated
minibinsizeFor internal use only
nupdatesNumber of updates to the fragment
ndeletesNumber of deletes to the fragment
ninsertsNumber of inserts to the table
encdistEncrypted fragment distribution
dbsnumUnique identifying code of sbspace where encdist is stored
versionReserved for future use

Considerations

You may want to consider fragment-level statistics in two main situations:

  • If you have a time cyclic database schema where data changes are localized to rows with certain specific column values. For example, a sales table fragmented by transaction date column such that new rows are added each month to store current month sales data.
  • You have fragmented tables and you frequently use ALTER FRAGMENT ATTACH/DETACH to add or remove data from the fragmented table.

Breaking the column distribution per fragment allows for statistics management at the fragment level. For example, refreshing of statistics can happen at fragment level, and Fragment Statistics can be added to, or removed from, table statistics.

Setting your system to use Fragment Level Statistics

Do the following to allow the server to build and use Fragment Level Statistics:

  • Set ONCONFIG parameter AUTO_STAT_MODE to 1. This is a default setting and affects the entire system. If you wish to restrict the feature to certain databases only, you can alternatively set the session environment AUTO_STAT_MODE to 1 in your sysdbopen procedure for the desired databases.
  • Ensure the ONCONFIG parameter SYSSBSPACENAME is set to a valid smartblob space, and allocate the smartblob using the onspaces –c –S command.

    For example: onspaces -c -S sbspace -p /work/dbspaces/sbspace -s 100000 -o 0.

    The amount of space required for a fragmented statistics of column depends on its datatype, number of fragments, and the resolution used in update statistics. The datatype of the column determines the actual number of bytes a single minibin requires. Let's call this number as x. The number of maximum values that a distribution minibin can hold, lets call this y, is calculated as:

    y = Round(Total number of rows * resolution * 0.01 / number of fragments * 100)

    The approximate number of total minibins for a table with uniform distribution can then be calculated as:

    z = Round(Total number of rows / number of values in a minibin (y))

    The total space required can then be determined as:

    Total space required = x * z

    You may also use the Storage Provisioning feature in V11.7 to ensure chunks are automatically allocated when required.

    The following table gives the size of minibin for each datatype with a combination of resolution and number of fragments.


    DatatypeMinibin size in bytesNumber of fragmentsResolutionSpace required in kilobytes
    integer2221044
    integer2231066
    integer2231.0773
    Character types278 maximum210556
    Character types278 maximum310834
    Character types278 maximum31.08340
    Decimal(10,2)2721054
    Decimal(10,2)2731081
    Decimal(10,2)2731.0810
    Decimal(32)59210118
    Decimal(32)59210177
    Decimal(32)592101770
    Datetime/Interval year to fraction(5)4021080
    Datetime/Interval year to fraction(5)40310120
    Datetime/Interval year to fraction(5)4031.01200
    Datetime/Interval year to fraction3821076
    Datetime/Interval year to fraction38310114
    Datetime/Interval year to fraction3831.01140
    Datetime/Interval year to day2421048
    Datetime/Interval year to day2431072
    Datetime/Interval year to day2431.0720
    Date2221044
    Date2231066
    Date2231.0660
    Float3221064
    Float3231096
    Float3231.0960

    The number of rows sampled affect the number of values contained in a minibin but do not directly impact the total space required to the fragmented statistics. However, depending on the skew or uniformity of the data, the number of overflow minibins, and hence the size, may vary.

  • Consider increasing logical logs to account for the additional fragment-level statistics rows in the sysfragdist catalog. Since the fragmented statistics are more granular than table statistics, they do take up more space as well.

With these steps, the database server can now identify the candidate tables that can benefit from fragment-level statistics and creates them AUTOMATICALLY when the regular update statistics is run. The candidate tables are identified by using the following set of rules:

  • Table is fragmented by expression, LIST, or INTERVAL strategy.
  • Table has over a million rows.

Alternatively, you can explicitly indicate which tables should have fragmented statistics by tweaking the table property STATLEVEL explained in the next section.

STATLEVEL table property

STATLEVEL is the level or granularity of the column distributions. STATLEVEL can be specified with CREATE TABLE and/or ALTER TABLE commands.

  • TABLE – Distributions are created at table level.
  • FRAGMENT – Distributions are created and maintained at each fragment.
  • AUTO – In Automatic mode, apply rules to determine if Fragment Level Statistics should be created, otherwise default to table level distributions.

The following example forces the use of fragment-level statistics by setting the STATLEVEL at creation time.

                CREATE TABLE tab1(col1 integer, col2 char(10))
                FRAGMENT BY EXPRESSION
                (col1 >= 0 AND col1 < 1000) IN dbspace1,
                (col1 >= 1000 AND col1 < 2000) IN dbspace2,
                (col1 >= 2000 AND col1 < 3000) IN dbspace3,
                remainder in rootdbs
                STATLEVEL FRAGMENT;

Note: The FRAGMENT option of STATLEVEL can be used for any fragmented table regardless of strategy used.

Alternatively, the following ALTER command used at a later time achieves the same effect:

                 ALTER TABLE tab1 STATLEVEL FRAGMENT;

And you are done! The new fragment-level statistics will be created and stored in the sysfragdist system catalog when UPDATE STATISTICS MEDIUM/HIGH is run on table tab1e.


Statistics and ALTER FRAGMENT

When you perform an ALTER FRAGMENT ATTACH/DETACH operation on a fragmented table with fragment-level statistics, the server will automatically refresh column statistics of the tables involved. Refreshing of the statistics runs in the background and the ALTER FRAGMENT command returns immediately after the alter operation is completed. The refreshing of statistics either starts after the alter operation is completed or if you are in a transaction, statistics refresh starts after the transaction is committed.

  • For ATTACH operation: fragmented statistics of the new fragment is built, and table level statistics is rebuilt from all fragmented statistics. Any existing fragments with out-of-date column statistics will also be rebuilt at this time.
  • For DETACH operation: table level statistics of the resulting tables are rebuilt from the fragmented statistics.

Note: Due to smarter statistics, any good fragmented statistics will be directly used for merging into table level statistics. This is not equivalent to running a complete statistics refresh on resulting tables of an ALTER FRAGMENT.

The background task that refreshes statistics called refresh_table_stats is defined in the ph_task table of the sysadmin database. If any errors are found, it will print errors in online.log.


Conclusion

With the smarter statistics feature, you can now configure the server to automatically detect and refresh only stale statistics when you run the update statistics command. By avoiding any unnecessary statistics updates, the smarter statistics feature reduces the time it takes to do statistics maintenance on your system. You also have the flexibility to configure settings at all levels, including instance, session and table, based on the needs of your system. With fragment-level statistics, the server can create and manage column statistics at the fragment level. This allows the server to refresh statistics for specific fragment(s) only. The server can also efficiently rebuild table column statistics after ALTER FRAGMENT ATTACH or DETACH is executed. Even if you configure and enable Smarter Statistics, the AUTO and FORCE enhancements to update statistics command allows you to specify if only stale or all statistics should be refreshed.

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=647743
ArticleTitle=Take advantage of fragment-level statistics and smarter statistics in IBM Informix
publish-date=04212011