The Informix Dynamic Server does all of its work in shared memory. The server keeps pages in shared memory as long as it can to avoid needless I/O. For example, rows can be inserted into a data page by separate transactions without reading the page from disk, modifying it, and writing it back to disk for each transaction. At some point, though, the database server must write that page back to disk, if only because room may be needed for other pages.
Retaining modified pages in memory creates recovery considerations. If the database server goes offline unexpectedly, it might leave modified pages in memory that were never written to disk. During recovery, the server must bring the data to a point of consistency before it can begin new processing. This is the process of fast recovery, which is achieved in two steps:
- Physical recovery
- Physical recovery restores all pre-images of pages to disk. A pre-image is the image of a page as it was before it was modified.
- Logical recovery
- Logical recovery rolls forward all transactions from the last checkpoint. All transactions that were committed are replayed completely; any transactions that were not committed are rolled back.
A checkpoint creates a point of synchronization between disk and shared memory. After synchronization, the database server has a known point of consistency. During a checkpoint, all buffers in the buffer pool plus the logical log buffers are flushed to disk so that they are stably stored.
When a checkpoint occurs, a checkpoint record is recorded in the logical log. It is also recorded in the reserve pages. During recovery, the reserve page checkpoint record is used to find the last point of data consistency, and it is the starting point for fast recovery.
Before V11, the following situations would trigger a checkpoint:
- Administrative events, such as the engine being shut down or a chunk or dbspace being added
- The checkpoint interval (CKPTINTVL) elapsing
- The physical log becoming 75% full
- A checkpoint already in the logical log space: if the oldest logical log (the next to be written) contains the last checkpoint, Informix must write a new checkpoint so that the log can be freed for reuse
All four triggers remain in V11, but the interval between checkpoints can be modified by new features.
Informix needs a checkpoint for recovery, because it is a known point of consistency. Checkpoints are indirectly affected by the LOGFILES configuration parameter. When a logical log file is examined to see whether it can be freed (and overwritten), it is checked to see whether it contains the last checkpoint. Before that log can be freed, the database server must write a new checkpoint record in the current logical-log file. If the logical logs cycle quickly, this can increase the frequency of checkpoints. This is not usually a significant factor in V11.
If the physical log is very large, a long checkpoint interval can lead to an increase in the time needed for fast recovery. This matters when the engine crashes in an environment where database availability is important. In that situation, using a combination of CKPTINTVL and a moderately sized physical log is recommended.
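As a rough sketch, such a configuration pairs a checkpoint interval with a moderately sized physical log in the onconfig file. The values below are illustrative assumptions, not recommendations; appropriate sizes depend on your workload:

```
# onconfig fragment (illustrative values only)
CKPTINTVL  300      # force a checkpoint at least every 300 seconds
PHYSFILE   50000    # physical log size in KB; moderate, to bound fast-recovery time
```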
If CKPTINTVL expires and there have been no updates (nothing physically logged and no new administrative events), a checkpoint does not occur. When there are few page updates or changes to the system, it is therefore common to have a longer interval between checkpoints.
Depending on parameters such as the size of the physical log buffer, the size of the buffer pool, and how many dirty buffers there are, a checkpoint can take some time to finish. Before V11, checkpoints would frequently block threads from entering a critical section of code, which is code that must execute as a single unit, such as writing a page. This sometimes had the effect of making user applications wait on checkpoints. Database administrators often spent a great deal of time tuning database parameters to minimize blocking, typically by keeping the number of dirty buffers low so that fewer buffers had to be cleaned during a checkpoint. The configuration parameters used for tuning checkpoints were as follows:
- LRU_MIN_DIRTY and LRU_MAX_DIRTY were used to tune the number of dirty buffers in the buffer pool.
- CKPTINTVL sets the interval between checkpoints.
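A typical pre-V11 tuning along these lines might look like the following onconfig fragment. The values are illustrative assumptions only; aggressive settings like these were chosen to keep the pool of dirty buffers small:

```
# pre-V11 onconfig fragment (illustrative values only)
LRU_MIN_DIRTY  5      # stop LRU cleaning when 5% of a queue is dirty
LRU_MAX_DIRTY  10     # start LRU cleaning when 10% of a queue is dirty
CKPTINTVL      300    # seconds between checkpoints
```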
However, minimizing the number of dirty pages in the buffer pool also meant reducing the beneficial effects of caching.
Introduced in Informix V9, fuzzy checkpoints were meant to reduce disk writes during a checkpoint. Fuzzy checkpoints have been deprecated and replaced with new configuration parameters in Informix V11.
Three new configuration parameters that affect checkpoints were introduced in Informix V11. This tutorial describes them in more detail in a later section that is focused on automatic tuning.
For checkpoint tuning, you have the option of retaining the checkpoints offered in previous versions or using the new features. The new feature parameters are as follows:
- AUTO_CKPTS modifies checkpoint frequency to prevent blocking due to lack of resources.
- AUTO_LRU_TUNING automatically adjusts LRU flushing to prevent foreground writes.
- RTO_SERVER_RESTART sets the checkpoint interval to achieve a configured fast recovery time. CKPTINTVL is ignored when RTO_SERVER_RESTART is set.
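Enabling the new features amounts to a few onconfig settings. The fragment below is a sketch with illustrative values; in particular, the RTO_SERVER_RESTART target of 60 seconds is an assumption, and the right target depends on your availability requirements:

```
# V11 onconfig fragment (illustrative values only)
AUTO_CKPTS          1     # server triggers checkpoints before log resources run low
AUTO_LRU_TUNING     1     # server adjusts LRU flushing to avoid foreground writes
RTO_SERVER_RESTART  60    # target fast-recovery time in seconds; CKPTINTVL is then ignored
```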
The checkpoint algorithm was reworked for Informix V11. The new algorithm is almost non-blocking. The V11 database server allows threads that would have been blocked in previous engines to do work during a checkpoint.
Before V11, database administrators commonly tuned LRU flushing aggressively so that LRUs were constantly being drained. As a consequence, checkpoints finished faster because there was less work to do, which minimized the total time other threads had to wait for the checkpoint to finish.
It is no longer necessary to tune LRU flushing so aggressively. Because transactions are not blocked during checkpoints, you can typically set LRU_MAX_DIRTY to values that are much higher than were practical in pre-V11 engines. The number of dirty buffers processed during a checkpoint is much higher, and checkpoints often take much longer. However, this approach yields performance gains from less LRU flushing and more caching of dirty pages.
Certain aspects of the checkpoint process have not changed with the advent of non-blocking checkpoints. During checkpoint processing, transactions continue to write before images to the physical log and transaction records to the logical logs. The engine must write at least one checkpoint in the span of the logical logs. The following circumstances trigger checkpoints that might become blocking if the log resources become too low during the checkpoint:
- Physical log is 75% full
- Checkpoint is required because logical logs are almost spanned
In order to avoid situations in which checkpoints block transaction processing, complete the following steps:
- Turn on the automatic checkpoint feature. Checkpoints will occur more frequently, which prevents blocking due to lack of resources.
- Increase the physical or logical log size. The server will place a message in the online log to suggest which resource to increase and what size the resource should be.
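Log sizes are controlled by onconfig parameters, so either step above is a configuration change. The fragment below is a sketch with illustrative values; use the sizes suggested in your own online log rather than these:

```
# onconfig fragment (illustrative values only)
PHYSFILE  100000   # physical log size in KB; larger log delays the 75%-full trigger
LOGFILES  20       # number of logical log files
LOGSIZE   10000    # size of each logical log file in KB
```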
ONDBSPACEDOWN is a configuration parameter that dictates how the server handles a non-mirrored regular dbspace going down because of an I/O error. Depending on the setting, the engine could hang during a checkpoint. Temporary dbspaces are not affected by this parameter.
A message is written to the online log when a checkpoint completes. This message also specifies how long the checkpoint took. Before V11, this also served as a rough indication of how long users were blocked out of critical sections, and DBAs could use this information to help tune their checkpoints. You can read these messages with the onstat -m command.
A new onstat option (-g ckp) enables you to track the checkpoint history for the previous 20 checkpoints. The option reports the duration of each checkpoint and the trigger that caused it, which is useful for configuration tuning.