New features in IBM Informix V11.7 let you dramatically improve the
performance of running
update statistics is now
quicker and more efficient, you can run the utility more often then you
have in the past. For example, you could run the
update statistics command
daily instead of weekly. Depending on the workload and queries on the
system, this can improve the query plans and help improve system
performance. These features impact and improve the way update statistics
collects information for the optimizer.
Updating statistics is one of the most important administrative tasks that
ensure the optimizer picks optimum execution plans for your queries.
Collecting statistics on large systems can take a long time. So, DBAs tend
to reduce the frequency of running
update statistics to avoid the
This article first describes how smarter statistics speed the running of
update statistics on your entire system, and then describes Fragment Level
Statistics, a feature which further improves
update statistics speed on
Have you ever wondered if the
update statistics command you are about to
run is actually needed since the last time you updated statistics? How
much has the data changed? Has it changed significantly enough to affect
Smarter statistics allows the server to determine if the existing
statistics accurately represent the current data or if they need to be
refreshed. Prior to IBM Informix V11.7, when you ran the
command, statistics were always recreated. Informix V11.7 introduces an
AUTOMATIC mode of operation for
update statistics. In this mode, the
server allows you to define a data change threshold and refresh statistics
only if the data has changed beyond the threshold since they were last
built. Since the determination of stale statistics is based on how much
the data in the table or fragment has changed, a set of data change
counters (update, delete, inster (UDI)) are maintained for the table and
indices. These counters are maintained automatically at the storage layer
without requiring any switch.
This means that if you have tables that have not changed by more than a
certain low percentage, then the
update statistics command on these tables
will not refresh the statistics, and the server considers the statistics
good enough for optimization purposes.
Tracking data changes using UDI counters
In Informix V11.7, the update, delete and insert counters have a size of eight bytes each, and are added to the table and index partition page. These counters are incremented by the respective DML operation occurring on the partition, both table and index partitions.
When you run the
update statistics command, a snapshot of these counters
is saved in their respective system catalogs. These values are then used
update statistics runs to determine if existing statistics
are good or stale.
The following tables list the new columns in system catalogs that capture UDI counters, and additional information captured during statistics run.
Table 1. New columns in V11.7 updated by UPDATE STATISTICS LOW
|sysfragments.nupdates||Number of updates on an detached index fragment|
|sysfragments.ndeletes||Number of deletes on an detached index fragment|
|sysfragments.ninserts||Number of inserts on an detached index fragment|
|sysindices.ustlowts||Timestamp of last update statistics low on index|
|sysindices.nupdates||Number of updates on table for attached index|
|sysindices.ndeletes||Number of deletes on table for attached index|
|sysindices.ninserts||Number of inserts on table for attached index|
|sysindices.ustbuildduration||Time taken to build low statistics on an attached index|
Table 2. New columns in V11.7 updated by UPDATE STATISTICS MEDIUM/HIGH
|sysdistrib.nupdates||Number of updates done on table when statistics are generated|
|sysdistrib.ndeletes||Number of deletes on table when statistics are generated|
|sysdistrib.ninserts||Number of inserts on table when statistics are generated|
|sysdistrib.ustbuildduration||Time taken to build table level column statistics.|
System setup and configuration
To enable the automatic mode of
update statistics, set the ONCONFIG
AUTO_STAT_MODE to 1.
This is the default setting and affects the entire system. If you wish to
restrict the feature to certain databases only, you can set the session
1 in your
procedure for the desired databases.
Another ONCONFIG parameter related to smarter statistics is STATCHANGE.
STATCHANGE is the percentage change threshold value for all tables in your
system. This ONCONFIG parameter takes an integer value between 0 and 100
and is set to a default value of 10%. Alternatively, STATCHANGE can also
be set using the SQL statement session environment STATCHANGE
x is an
integer between 0 and 100. When STATCHANGE is set in a session, it affects
the tables on which
update statistics is run in that session.
When you set the session level values using the SET ENVIRONMENT statement for AUTO_STAT_MODE and STATCHANGE, they override their corresponding ONCONFIG values.
SET ENVIRONMENT STATCHANGE "5"; UPDATE STATISTICS LOW FOR TABLE tab1;
In the previous example, you set the change threshold at 5% to decide if
low is run on
STATCHANGE table property
You can use the table property STATCHANGE if you wish to control the change threshold at the table level instead of system level. The table property is used to set the change threshold for that specific table and its indexes. Similar to its ONCONFIG and session environment counterpart, the property takes an integer values between 0 and 100. Table property can be used to override the system (ONCONFIG and/or session) setting.
You can either set the table property at table creation time, or alter it
at a later time. Syntax for STATCHANGE property in CREATE TABLE and ALTER
TABLE statements is:
STATCHANGE [integer |
You can use the integer value or use AUTO to default it to system setting.
CREATE TABLE tab1(cid serial, lname varchar(32), fname varchar(32)) STATCHANGE 5; ALTER TABLE tab1 STATCHANGE AUTO;
As you can see from previous example, the smarter statistics is universal to all tables and indexes regardless of whether they are fragmented or not.
Note: Since statistics impact the optimizer, you are advised to set STATCHANGE to a reasonable value so that queries continue to return in optimum fashion.
Specifying and/or overriding Smarter
Statistics using the
update statistics command
Once your system level configuration finishes using AUTO_STAT_MODE and
STATCHANGE at system or session level, smarter statistics ensures that
statistics are gathered only for those tables that need new statistics.
For the rest of the table, the
update statistics command is NO-OP.
However, there can be times when you want to refresh statistics for the
table irrespective of how much the data has changed. You can use the AUTO
and FORCE keywords in the
update statistics command to specify if Smarter
statistics should be used or not.
AUTO means to use AUTOMATIC mode of
update statistics thereby enforcing
FORCE means to forcefully rebuild all statistics regardless of change thresholds.
Note: The AUTO and FORCE keywords work with ALL modes of
and have no effect on procedure statistics.
This command will ignore the STATCHANGE computation and rebuild all the
UPDATE STATISTICS HIGH FOR TABLE tab1 FORCE;
The AUTO option uses STATCHANGE computation to detect and build stale statistics only, for example:
UPDATE STATISTICS LOW FOR TABLE tab1 AUTO;
Effect of smarter statistics
The effect of smarter statistics can be quite profound on the amount of time it takes to update statistics on large systems.
- Tables and indexes with changes less than the imposed STATCHANGE are
completely skipped during
- Since each index (fragment) has its own UDI counter, the decision to skip or rebuild index statistics is done based on how much the index (fragment) has changed.
- If the table has fragment-level statistics, the decision to refresh statistics is done at the fragment level, thereby rebuilding statistics only for changed or new fragments.
You can query columns
sysindices catalog tables before and after
running UPDATE STATISTICS LOW to check if statistics are skipped or
rebuilt. For MEDIUM and HIGH mode, query column
Fragment-level statistics affects the way UPDATE STATISTICS MEDIUM/HIGH
gathers data and generates column distribution on fragmented tables.
Statistics generated by UPDATE STATISTICS LOW on tables and indexes have
always been maintained at fragment level. These statistics are stored in
In MEDIUM or HIGH mode, a histogram of the column data is created. Column histogram is represented as bins or buckets holding the column values.
Figure 1 shows how table level statistics are generated on a fragmented
table. Notice that data from all fragments is sorted together to form
table level statistics, which is encoded and stored in the system catalog
sysdistrib. Table level statistics are decoded
and loaded into the data distribution cache for the optimizer to use.
Figure 1. Sample figure showing generation of table level statistics for a fragmented table
In simple terms, fragment-level statistics allows storage of column data distributions per fragment, and building the table level from its constituent fragments. Fragment level column statistics are represented as mini-bins because each mini-bin is a small percentage of the final column bin.
Figure 2 shows how fragment-level statistics are built and stored. Data
from each fragment is individually sorted, and mini-bin distribution is
built and stored in the
Finally table level statistics is formed by merging all constituent
fragments statistics and stored in
Figure 2. Sample figure showing generation of fragment-level statistics for a fragmented table
Fragmented statistics of a column is uniquely identified by the combination
colno. Schema of
sysfragdist is shown in table 3.
Table 3. Schema of sysfragdist catalog
|tabid||Unique identifying code of table ( = systables.tabid)|
|fragid||Unique identifying code of fragment ( = sysfragments.partnum)|
|colno||Unique identifying code of column ( = syscolumns.colno)|
|seqno||Sequence number (for distributions that span multiple rows)|
|mode||UPDATE STATISTICS mode (H = high, or M = medium)|
|resolution||Average percentage of the sample in each bin|
|confidence||Estimated likelihood that a MEDIUM mode sample value is equivalent to an exact HIGH mode result|
|rowssampled||Number of rows in the sample|
|ustbuildduration||Time spent to calculate the fragmented distribution for this column|
|constr_time||Time when the distribution was recorded|
|ustnrows||Rows in fragment when distribution was calculated|
|minibinsize||For internal use only|
|nupdates||Number of updates to the fragment|
|ndeletes||Number of deletes to the fragment|
|ninserts||Number of inserts to the table|
|encdist||Encrypted fragment distribution|
|dbsnum||Unique identifying code of |
|version||Reserved for future use|
You may want to consider fragment-level statistics in two main situations:
- If you have a time cyclic database schema where data changes are localized to rows with certain specific column values. For example, a sales table fragmented by transaction date column such that new rows are added each month to store current month sales data.
- You have fragmented tables and you frequently use ALTER FRAGMENT ATTACH/DETACH to add or remove data from the fragmented table.
Breaking the column distribution per fragment allows for statistics management at the fragment level. For example, refreshing of statistics can happen at fragment level, and Fragment Statistics can be added to, or removed from, table statistics.
Setting your system to use Fragment Level Statistics
Do the following to allow the server to build and use Fragment Level Statistics:
- Set ONCONFIG parameter AUTO_STAT_MODE to 1. This is a
default setting and affects the entire system. If you wish to restrict
the feature to certain databases only, you can alternatively set the
session environment AUTO_STAT_MODE to 1 in your
sysdbopenprocedure for the desired databases.
Ensure the ONCONFIG parameter SYSSBSPACENAME is set to a valid smartblob space, and allocate the smartblob using the
onspaces –c –Scommand.
onspaces -c -S sbspace -p /work/dbspaces/sbspace -s 100000 -o 0.
The amount of space required for a fragmented statistics of column depends on its datatype, number of fragments, and the resolution used in
update statistics. The datatype of the column determines the actual number of bytes a single minibin requires. Let's call this number as
x. The number of maximum values that a distribution minibin can hold, lets call this
y, is calculated as:
y = Round(Total number of rows * resolution * 0.01 / number of fragments * 100)
The approximate number of total minibins for a table with uniform distribution can then be calculated as:
z = Round(Total number of rows / number of values in a minibin (y))
The total space required can then be determined as:
Total space required = x * z
You may also use the Storage Provisioning feature in V11.7 to ensure chunks are automatically allocated when required.
The following table gives the size of minibin for each datatype with a combination of resolution and number of fragments.
Datatype Minibin size in bytes Number of fragments Resolution Space required in kilobytes integer 22 2 10 44 integer 22 3 10 66 integer 22 3 1.0 773 Character types 278 maximum 2 10 556 Character types 278 maximum 3 10 834 Character types 278 maximum 3 1.0 8340 Decimal(10,2) 27 2 10 54 Decimal(10,2) 27 3 10 81 Decimal(10,2) 27 3 1.0 810 Decimal(32) 59 2 10 118 Decimal(32) 59 2 10 177 Decimal(32) 59 2 10 1770 Datetime/Interval year to fraction(5) 40 2 10 80 Datetime/Interval year to fraction(5) 40 3 10 120 Datetime/Interval year to fraction(5) 40 3 1.0 1200 Datetime/Interval year to fraction 38 2 10 76 Datetime/Interval year to fraction 38 3 10 114 Datetime/Interval year to fraction 38 3 1.0 1140 Datetime/Interval year to day 24 2 10 48 Datetime/Interval year to day 24 3 10 72 Datetime/Interval year to day 24 3 1.0 720 Date 22 2 10 44 Date 22 3 10 66 Date 22 3 1.0 660 Float 32 2 10 64 Float 32 3 10 96 Float 32 3 1.0 960
The number of rows sampled affect the number of values contained in a minibin but do not directly impact the total space required to the fragmented statistics. However, depending on the skew or uniformity of the data, the number of overflow minibins, and hence the size, may vary.
Consider increasing logical logs to account for the additional fragment-level statistics rows in the
sysfragdistcatalog. Since the fragmented statistics are more granular than table statistics, they do take up more space as well.
With these steps, the database server can now identify the candidate tables
that can benefit from fragment-level statistics and creates them
AUTOMATICALLY when the regular
update statistics is run. The candidate
tables are identified by using the following set of rules:
- Table is fragmented by expression, LIST, or INTERVAL strategy.
- Table has over a million rows.
Alternatively, you can explicitly indicate which tables should have fragmented statistics by tweaking the table property STATLEVEL explained in the next section.
STATLEVEL table property
STATLEVEL is the level or granularity of the column distributions. STATLEVEL can be specified with CREATE TABLE and/or ALTER TABLE commands.
- TABLE – Distributions are created at table level.
- FRAGMENT – Distributions are created and maintained at each fragment.
- AUTO – In Automatic mode, apply rules to determine if Fragment Level Statistics should be created, otherwise default to table level distributions.
The following example forces the use of fragment-level statistics by setting the STATLEVEL at creation time.
CREATE TABLE tab1(col1 integer, col2 char(10)) FRAGMENT BY EXPRESSION (col1 >= 0 AND col1 < 1000) IN dbspace1, (col1 >= 1000 AND col1 < 2000) IN dbspace2, (col1 >= 2000 AND col1 < 3000) IN dbspace3, remainder in rootdbs STATLEVEL FRAGMENT;
Note: The FRAGMENT option of STATLEVEL can be used for any fragmented table regardless of strategy used.
Alternatively, the following ALTER command used at a later time achieves the same effect:
ALTER TABLE tab1 STATLEVEL FRAGMENT;
And you are done! The new fragment-level statistics will be created and
stored in the
sysfragdist system catalog when
UPDATE STATISTICS MEDIUM/HIGH is run on table tab1e.
Statistics and ALTER FRAGMENT
When you perform an ALTER FRAGMENT ATTACH/DETACH operation on a fragmented table with fragment-level statistics, the server will automatically refresh column statistics of the tables involved. Refreshing of the statistics runs in the background and the ALTER FRAGMENT command returns immediately after the alter operation is completed. The refreshing of statistics either starts after the alter operation is completed or if you are in a transaction, statistics refresh starts after the transaction is committed.
- For ATTACH operation: fragmented statistics of the new fragment is built, and table level statistics is rebuilt from all fragmented statistics. Any existing fragments with out-of-date column statistics will also be rebuilt at this time.
- For DETACH operation: table level statistics of the resulting tables are rebuilt from the fragmented statistics.
Note: Due to smarter statistics, any good fragmented statistics will be directly used for merging into table level statistics. This is not equivalent to running a complete statistics refresh on resulting tables of an ALTER FRAGMENT.
The background task that refreshes statistics called
refresh_table_stats is defined in the
ph_task table of the
sysadmin database. If any errors are found, it
will print errors in online.log.
With the smarter statistics feature, you can now configure the server to
automatically detect and refresh only stale statistics when you run the
update statistics command. By avoiding any unnecessary statistics updates,
the smarter statistics feature reduces the time it takes to do statistics
maintenance on your system. You also have the flexibility to configure
settings at all levels, including instance, session and table, based on
the needs of your system. With fragment-level statistics, the server can
create and manage column statistics at the fragment level. This allows the
server to refresh statistics for specific fragment(s) only. The server can
also efficiently rebuild table column statistics after ALTER FRAGMENT
ATTACH or DETACH is executed. Even if you configure and enable Smarter
Statistics, the AUTO and FORCE enhancements to
update statistics command
allows you to specify if only stale or all statistics should be
- Learn more about
update statisticsin the article " Understanding and tuning update statistics" (developerWorks, March 2002).
- Learn about storage space management in the article "Storage provisioning in Informix" (developerWorks, March, 2011).
- Get the resources you need to advance your Informix database skills in the Informix area on developerWorks.
- Read more articles about Informix on developerWorks.
- Learn more about Information Management at the developerWorks Information Management zone. Find technical documentation, how-to articles, education, downloads, product information, and more.
- Stay current with developerWorks technical events and webcasts.
- Follow developerWorks on Twitter.
Get products and technologies
- Download a trial version of Informix Ultimate Edition or one of the free editions, Informix Developer Edition of Innovator-C Edition.
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Participate in the discussion forum.
- Check out the developerWorks blogs and get involved in the developerWorks community.