New features in IBM Informix V11.7 let you dramatically improve the
performance of running update statistics.
Because running update statistics is now
quicker and more efficient, you can run the utility more often then you
have in the past. For example, you could run the update statistics command
daily instead of weekly. Depending on the workload and queries on the
system, this can improve the query plans and help improve system
performance. These features impact and improve the way update statistics
collects information for the optimizer.
Updating statistics is one of the most important administrative tasks that
ensure the optimizer picks optimum execution plans for your queries.
Collecting statistics on large systems can take a long time. So, DBAs tend
to reduce the frequency of running update statistics to avoid the
overhead.
This article first describes how smarter statistics speed the running of
update statistics on your entire system, and then describes Fragment Level
Statistics, a feature which further improves update statistics speed on
fragmented tables.
Have you ever wondered if the update statistics command you are about to
run is actually needed since the last time you updated statistics? How
much has the data changed? Has it changed significantly enough to affect
column distribution?
Smarter statistics allows the server to determine if the existing
statistics accurately represent the current data or if they need to be
refreshed. Prior to IBM Informix V11.7, when you ran the update statistics
command, statistics were always recreated. Informix V11.7 introduces an
AUTOMATIC mode of operation for update statistics. In this mode, the
server allows you to define a data change threshold and refresh statistics
only if the data has changed beyond the threshold since they were last
built. Since the determination of stale statistics is based on how much
the data in the table or fragment has changed, a set of data change
counters (update, delete, inster (UDI)) are maintained for the table and
indices. These counters are maintained automatically at the storage layer
without requiring any switch.
This means that if you have tables that have not changed by more than a
certain low percentage, then the update statistics command on these tables
will not refresh the statistics, and the server considers the statistics
good enough for optimization purposes.
Tracking data changes using UDI counters
In Informix V11.7, the update, delete and insert counters have a size of eight bytes each, and are added to the table and index partition page. These counters are incremented by the respective DML operation occurring on the partition, both table and index partitions.
When you run the update statistics command, a snapshot of these counters
is saved in their respective system catalogs. These values are then used
by consecutive update statistics runs to determine if existing statistics
are good or stale.
The following tables list the new columns in system catalogs that capture UDI counters, and additional information captured during statistics run.
Table 1. New columns in V11.7 updated by UPDATE STATISTICS LOW
| Column | Description |
|---|---|
| sysfragments.nupdates | Number of updates on an detached index fragment |
| sysfragments.ndeletes | Number of deletes on an detached index fragment |
| sysfragments.ninserts | Number of inserts on an detached index fragment |
| sysindices.ustlowts | Timestamp of last update statistics low on index |
| sysindices.nupdates | Number of updates on table for attached index |
| sysindices.ndeletes | Number of deletes on table for attached index |
| sysindices.ninserts | Number of inserts on table for attached index |
| sysindices.ustbuildduration | Time taken to build low statistics on an attached index |
Table 2. New columns in V11.7 updated by UPDATE STATISTICS MEDIUM/HIGH
| Column | Description |
|---|---|
| sysdistrib.nupdates | Number of updates done on table when statistics are generated |
| sysdistrib.ndeletes | Number of deletes on table when statistics are generated |
| sysdistrib.ninserts | Number of inserts on table when statistics are generated |
| sysdistrib.ustbuildduration | Time taken to build table level column statistics. |
System setup and configuration
To enable the automatic mode of update statistics, set the ONCONFIG
variable AUTO_STAT_MODE to 1.
This is the default setting and affects the entire system. If you wish to
restrict the feature to certain databases only, you can set the session
environment AUTO_STAT_MODE to
1 in your sysdbopen
procedure for the desired databases.
Another ONCONFIG parameter related to smarter statistics is STATCHANGE.
STATCHANGE is the percentage change threshold value for all tables in your
system. This ONCONFIG parameter takes an integer value between 0 and 100
and is set to a default value of 10%. Alternatively, STATCHANGE can also
be set using the SQL statement session environment STATCHANGE
x, where x is an
integer between 0 and 100. When STATCHANGE is set in a session, it affects
the tables on which update statistics is run in that session.
When you set the session level values using the SET ENVIRONMENT statement for AUTO_STAT_MODE and STATCHANGE, they override their corresponding ONCONFIG values.
SET ENVIRONMENT STATCHANGE "5";
UPDATE STATISTICS LOW FOR TABLE tab1;
|
In the previous example, you set the change threshold at 5% to decide if
update statistics low is run on
tab1.
You can use the table property STATCHANGE if you wish to control the change threshold at the table level instead of system level. The table property is used to set the change threshold for that specific table and its indexes. Similar to its ONCONFIG and session environment counterpart, the property takes an integer values between 0 and 100. Table property can be used to override the system (ONCONFIG and/or session) setting.
You can either set the table property at table creation time, or alter it
at a later time. Syntax for STATCHANGE property in CREATE TABLE and ALTER
TABLE statements is:
STATCHANGE [integer |
AUTO].
You can use the integer value or use AUTO to default it to system setting.
CREATE TABLE tab1(cid serial, lname varchar(32), fname varchar(32))
STATCHANGE 5;
ALTER TABLE tab1 STATCHANGE AUTO;
|
As you can see from previous example, the smarter statistics is universal to all tables and indexes regardless of whether they are fragmented or not.
Note: Since statistics impact the optimizer, you are advised to set STATCHANGE to a reasonable value so that queries continue to return in optimum fashion.
Specifying and/or overriding Smarter
Statistics using the update statistics command
Once your system level configuration finishes using AUTO_STAT_MODE and
STATCHANGE at system or session level, smarter statistics ensures that
statistics are gathered only for those tables that need new statistics.
For the rest of the table, the update statistics command is NO-OP.
However, there can be times when you want to refresh statistics for the
table irrespective of how much the data has changed. You can use the AUTO
and FORCE keywords in the update statistics command to specify if Smarter
statistics should be used or not.
AUTO means to use AUTOMATIC mode of update statistics thereby enforcing
smarter statistics.
FORCE means to forcefully rebuild all statistics regardless of change thresholds.
Note: The AUTO and FORCE keywords work with ALL modes of update statistics
and have no effect on procedure statistics.
This command will ignore the STATCHANGE computation and rebuild all the
table tab1 statistics.
UPDATE STATISTICS HIGH FOR TABLE tab1 FORCE;
|
The AUTO option uses STATCHANGE computation to detect and build stale statistics only, for example:
UPDATE STATISTICS LOW FOR TABLE tab1 AUTO; |
The effect of smarter statistics can be quite profound on the amount of time it takes to update statistics on large systems.
- Tables and indexes with changes less than the imposed STATCHANGE are
completely skipped during
update statistics. - Since each index (fragment) has its own UDI counter, the decision to skip or rebuild index statistics is done based on how much the index (fragment) has changed.
- If the table has fragment-level statistics, the decision to refresh statistics is done at the fragment level, thereby rebuilding statistics only for changed or new fragments.
You can query columns ustlowts in
systables and
sysindices catalog tables before and after
running UPDATE STATISTICS LOW to check if statistics are skipped or
rebuilt. For MEDIUM and HIGH mode, query column
constr_time in
sysdistrib and
sysfragdist catalogs.
Fragment-level statistics affects the way UPDATE STATISTICS MEDIUM/HIGH
gathers data and generates column distribution on fragmented tables.
Statistics generated by UPDATE STATISTICS LOW on tables and indexes have
always been maintained at fragment level. These statistics are stored in
catalog sysfragments.
In MEDIUM or HIGH mode, a histogram of the column data is created. Column histogram is represented as bins or buckets holding the column values.
Figure 1 shows how table level statistics are generated on a fragmented
table. Notice that data from all fragments is sorted together to form
table level statistics, which is encoded and stored in the system catalog
sysdistrib. Table level statistics are decoded
and loaded into the data distribution cache for the optimizer to use.
Figure 1. Sample figure showing generation of table level statistics for a fragmented table
In simple terms, fragment-level statistics allows storage of column data distributions per fragment, and building the table level from its constituent fragments. Fragment level column statistics are represented as mini-bins because each mini-bin is a small percentage of the final column bin.
Figure 2 shows how fragment-level statistics are built and stored. Data
from each fragment is individually sorted, and mini-bin distribution is
built and stored in the sysfragdist catalog.
Finally table level statistics is formed by merging all constituent
fragments statistics and stored in
sysdistrib.
Figure 2. Sample figure showing generation of fragment-level statistics for a fragmented table
Fragmented statistics of a column is uniquely identified by the combination
of tabid, fragid,
and colno. Schema of
sysfragdist is shown in table 3.
Table 3. Schema of sysfragdist catalog
| Column | Description |
|---|---|
| tabid | Unique identifying code of table ( = systables.tabid) |
| fragid | Unique identifying code of fragment ( = sysfragments.partnum) |
| colno | Unique identifying code of column ( = syscolumns.colno) |
| seqno | Sequence number (for distributions that span multiple rows) |
| mode | UPDATE STATISTICS mode (H = high, or M = medium) |
| resolution | Average percentage of the sample in each bin |
| confidence | Estimated likelihood that a MEDIUM mode sample value is equivalent to an exact HIGH mode result |
| rowssampled | Number of rows in the sample |
| ustbuildduration | Time spent to calculate the fragmented distribution for this column |
| constr_time | Time when the distribution was recorded |
| ustnrows | Rows in fragment when distribution was calculated |
| minibinsize | For internal use only |
| nupdates | Number of updates to the fragment |
| ndeletes | Number of deletes to the fragment |
| ninserts | Number of inserts to the table |
| encdist | Encrypted fragment distribution |
| dbsnum | Unique identifying code of sbspace
where encdist is stored |
| version | Reserved for future use |
You may want to consider fragment-level statistics in two main situations:
- If you have a time cyclic database schema where data changes are localized to rows with certain specific column values. For example, a sales table fragmented by transaction date column such that new rows are added each month to store current month sales data.
- You have fragmented tables and you frequently use ALTER FRAGMENT ATTACH/DETACH to add or remove data from the fragmented table.
Breaking the column distribution per fragment allows for statistics management at the fragment level. For example, refreshing of statistics can happen at fragment level, and Fragment Statistics can be added to, or removed from, table statistics.
Setting your system to use Fragment Level Statistics
Do the following to allow the server to build and use Fragment Level Statistics:
- Set ONCONFIG parameter AUTO_STAT_MODE to 1. This is a
default setting and affects the entire system. If you wish to restrict
the feature to certain databases only, you can alternatively set the
session environment AUTO_STAT_MODE to 1 in your
sysdbopenprocedure for the desired databases. Ensure the ONCONFIG parameter SYSSBSPACENAME is set to a valid smartblob space, and allocate the smartblob using the
onspaces –c –Scommand.For example:
onspaces -c -S sbspace -p /work/dbspaces/sbspace -s 100000 -o 0.The amount of space required for a fragmented statistics of column depends on its datatype, number of fragments, and the resolution used in
update statistics. The datatype of the column determines the actual number of bytes a single minibin requires. Let's call this number asx. The number of maximum values that a distribution minibin can hold, lets call thisy, is calculated as:y = Round(Total number of rows * resolution * 0.01 / number of fragments * 100)The approximate number of total minibins for a table with uniform distribution can then be calculated as:
z = Round(Total number of rows / number of values in a minibin (y))The total space required can then be determined as:
Total space required = x * zYou may also use the Storage Provisioning feature in V11.7 to ensure chunks are automatically allocated when required.
The following table gives the size of minibin for each datatype with a combination of resolution and number of fragments.
Datatype Minibin size in bytes Number of fragments Resolution Space required in kilobytes integer 22 2 10 44 integer 22 3 10 66 integer 22 3 1.0 773 Character types 278 maximum 2 10 556 Character types 278 maximum 3 10 834 Character types 278 maximum 3 1.0 8340 Decimal(10,2) 27 2 10 54 Decimal(10,2) 27 3 10 81 Decimal(10,2) 27 3 1.0 810 Decimal(32) 59 2 10 118 Decimal(32) 59 2 10 177 Decimal(32) 59 2 10 1770 Datetime/Interval year to fraction(5) 40 2 10 80 Datetime/Interval year to fraction(5) 40 3 10 120 Datetime/Interval year to fraction(5) 40 3 1.0 1200 Datetime/Interval year to fraction 38 2 10 76 Datetime/Interval year to fraction 38 3 10 114 Datetime/Interval year to fraction 38 3 1.0 1140 Datetime/Interval year to day 24 2 10 48 Datetime/Interval year to day 24 3 10 72 Datetime/Interval year to day 24 3 1.0 720 Date 22 2 10 44 Date 22 3 10 66 Date 22 3 1.0 660 Float 32 2 10 64 Float 32 3 10 96 Float 32 3 1.0 960
The number of rows sampled affect the number of values contained in a minibin but do not directly impact the total space required to the fragmented statistics. However, depending on the skew or uniformity of the data, the number of overflow minibins, and hence the size, may vary.
Consider increasing logical logs to account for the additional fragment-level statistics rows in the
sysfragdistcatalog. Since the fragmented statistics are more granular than table statistics, they do take up more space as well.
With these steps, the database server can now identify the candidate tables
that can benefit from fragment-level statistics and creates them
AUTOMATICALLY when the regular update statistics is run. The candidate
tables are identified by using the following set of rules:
- Table is fragmented by expression, LIST, or INTERVAL strategy.
- Table has over a million rows.
Alternatively, you can explicitly indicate which tables should have fragmented statistics by tweaking the table property STATLEVEL explained in the next section.
STATLEVEL is the level or granularity of the column distributions. STATLEVEL can be specified with CREATE TABLE and/or ALTER TABLE commands.
- TABLE – Distributions are created at table level.
- FRAGMENT – Distributions are created and maintained at each fragment.
- AUTO – In Automatic mode, apply rules to determine if Fragment Level Statistics should be created, otherwise default to table level distributions.
The following example forces the use of fragment-level statistics by setting the STATLEVEL at creation time.
CREATE TABLE tab1(col1 integer, col2 char(10))
FRAGMENT BY EXPRESSION
(col1 >= 0 AND col1 < 1000) IN dbspace1,
(col1 >= 1000 AND col1 < 2000) IN dbspace2,
(col1 >= 2000 AND col1 < 3000) IN dbspace3,
remainder in rootdbs
STATLEVEL FRAGMENT;
|
Note: The FRAGMENT option of STATLEVEL can be used for any fragmented table regardless of strategy used.
Alternatively, the following ALTER command used at a later time achieves the same effect:
ALTER TABLE tab1 STATLEVEL FRAGMENT;
|
And you are done! The new fragment-level statistics will be created and
stored in the sysfragdist system catalog when
UPDATE STATISTICS MEDIUM/HIGH is run on table tab1e.
When you perform an ALTER FRAGMENT ATTACH/DETACH operation on a fragmented table with fragment-level statistics, the server will automatically refresh column statistics of the tables involved. Refreshing of the statistics runs in the background and the ALTER FRAGMENT command returns immediately after the alter operation is completed. The refreshing of statistics either starts after the alter operation is completed or if you are in a transaction, statistics refresh starts after the transaction is committed.
- For ATTACH operation: fragmented statistics of the new fragment is built, and table level statistics is rebuilt from all fragmented statistics. Any existing fragments with out-of-date column statistics will also be rebuilt at this time.
- For DETACH operation: table level statistics of the resulting tables are rebuilt from the fragmented statistics.
Note: Due to smarter statistics, any good fragmented statistics will be directly used for merging into table level statistics. This is not equivalent to running a complete statistics refresh on resulting tables of an ALTER FRAGMENT.
The background task that refreshes statistics called
refresh_table_stats is defined in the
ph_task table of the
sysadmin database. If any errors are found, it
will print errors in online.log.
With the smarter statistics feature, you can now configure the server to
automatically detect and refresh only stale statistics when you run the
update statistics command. By avoiding any unnecessary statistics updates,
the smarter statistics feature reduces the time it takes to do statistics
maintenance on your system. You also have the flexibility to configure
settings at all levels, including instance, session and table, based on
the needs of your system. With fragment-level statistics, the server can
create and manage column statistics at the fragment level. This allows the
server to refresh statistics for specific fragment(s) only. The server can
also efficiently rebuild table column statistics after ALTER FRAGMENT
ATTACH or DETACH is executed. Even if you configure and enable Smarter
Statistics, the AUTO and FORCE enhancements to update statistics command
allows you to specify if only stale or all statistics should be
refreshed.
Learn
- Learn more about
update statisticsin the article " Understanding and tuning update statistics" (developerWorks, March 2002). - Learn about storage space management in
the article "Storage provisioning in Informix" (developerWorks, March,
2011).
- Get the resources you need to advance your
Informix database skills in the Informix area on developerWorks.
- Read more articles about Informix on developerWorks.
- Learn more about Information Management at
the developerWorks
Information Management zone. Find technical documentation, how-to
articles, education, downloads, product information, and more.
- Stay current with developerWorks technical events and webcasts.
- Follow developerWorks on
Twitter.
Get products and technologies
- Download a trial version of Informix
Ultimate Edition or one of the free editions, Informix Developer
Edition of Innovator-C Edition.
- Build your next
development project with IBM trial
software, available for download directly from
developerWorks.
Discuss
- Participate in the discussion forum.
- Check out the developerWorks
blogs and get involved in the developerWorks
community.





