Monitoring checkpoint performance
Table 1. Monitoring checkpoints with onstat -g ckp
| Interval | Clock Time | Trigger | LSN | Total Time | Flush Time | Block Time | # Waits | Ckpt Time | Wait Time | Long Time | # Dirty Buffers | Dskflu/Sec | Plog Total Pages | Plog Avg/Sec | Llog Total Pages | Llog Avg/Sec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 18:41:36 | Startup | 1:f8 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 4 | 4 | 3 | 0 | 1 | 0 |
| 2 | 18:41:49 | Admin | 1:11c12cc | 0.3 | 0.2 | 0.0 | 1 | 0.0 | 0.0 | 0.0 | 2884 | 2884 | 1966 | 163 | 4549 | 379 |
| 3 | 18:42:21 | Llog | 8:188 | 2.3 | 2.0 | 2.0 | 1 | 0.0 | 2.0 | 2.0 | 14438 | 7388 | 318 | 10 | 65442 | 2181 |
| 4 | 18:42:44 | *User | 10:19c018 | 0.0 | 0.0 | 0.0 | 1 | 0.0 | 0.0 | 0.0 | 39 | 39 | 536 | 21 | 20412 | 816 |
| 5 | 18:46:21 | RTO | 13:188 | 197 | 196.9 | 0.0 | 19 | 0.0 | 0.0 | 0.0 | 158381 | 804 | 140253 | 2921 | 301751 | 6286 |

| Max Plog pages/sec | Max Llog pages/sec | Max Dskflush Time | Avg Dskflush pages/sec | Avg Dirty pages/sec | Blocked Time |
|---|---|---|---|---|---|
| 2954 | 1160 | 197 | 780 | 2187 | 0 |

Some of the following items might also be displayed along with the performance advisory information shown above. They are intended to suggest changes that lead to better performance.

| Item | Units | Description |
|---|---|---|
| AUTO_CKPTS | On/Off | Shows whether the automatic checkpoints feature is on or off |
| RTO_SERVER_RESTART | Seconds | Displays the RTO policy. 0=RTO policy is off. |
| Estimated recovery time | Seconds | This is the estimated time it would take the IDS server to perform fast recovery. |
| Interval | Number | Checkpoint interval id |
| Clock Time | Wall clock time | The wall clock time at which the checkpoint occurred |
| Trigger | Text | Several events can trigger a checkpoint. The most common are RTO (trying to maintain the Recovery Time Objective policy), Plog (physical log is 75% full), and Llog (running out of logical log resources). A * in front of the event indicates that the checkpoint was a transaction-blocking checkpoint. |
| LSN | Log position | Log position of checkpoint |
| Total Time | Seconds | Total checkpoint duration from request time to checkpoint completion |
| Flush Time | Seconds | Time to flush bufferpools |
| Block Time | Seconds | Transaction blocking time |
| # Waits | Number | Number of transactions that blocked waiting for checkpoint |
| Ckpt Time | Seconds | This is the amount of time it takes for all transactions to recognize a checkpoint has been requested |
| Wait Time | Seconds | Average time a thread waited for the checkpoint |
| Long Time | Seconds | Longest amount of time a transaction waited for checkpoint |
| # Dirty Buffers | Number | Number of buffers flushed to disk during checkpoint processing |
| Dskflu/Sec | Number | Number of buffers flushed to disk per sec during checkpoint processing |
| Plog Total Pages | Number | Total number of pages physically logged during the checkpoint interval |
| Plog Avg/Sec | Number | Average rate of physical log activity during the checkpoint interval |
| Llog Total Pages | Number | Total number of pages logically logged during the checkpoint interval |
| Llog Avg/Sec | Number | Average rate of logical log activity during the checkpoint interval |
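
The Trigger and LSN fields described above pack two pieces of information each, so a small parser can help when post-processing the output. The following is a minimal sketch, not part of the product: the function names `parse_trigger` and `parse_lsn` are hypothetical, and it assumes that the LSN is written as a decimal log id and a hexadecimal log position separated by a colon, as the sample rows in Table 1 suggest.

```python
from typing import NamedTuple, Tuple


class Trigger(NamedTuple):
    event: str      # for example "RTO", "Plog", "Llog", "Admin"
    blocking: bool  # True when the value carried a leading "*"


def parse_trigger(field: str) -> Trigger:
    """Split a Trigger value into the event name and the blocking flag."""
    field = field.strip()
    blocking = field.startswith("*")
    return Trigger(event=field.lstrip("*"), blocking=blocking)


def parse_lsn(field: str) -> Tuple[int, int]:
    """Split an LSN such as '10:19c018' into (log id, log position).

    Assumption: the log id is decimal and the position is hexadecimal.
    """
    log_id, log_pos = field.strip().split(":")
    return int(log_id), int(log_pos, 16)
```

For the fourth sample row, `parse_trigger("*User")` reports a blocking checkpoint requested by a user, and `parse_lsn("10:19c018")` separates the log id from the position inside that log.
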
Table 2. Monitoring checkpoints with sysmaster
| Column Name | Column Type | Description |
|---|---|---|
| intvl | int | Checkpoint interval id |
| type | char(12) | Blocking or Non-blocking |
| caller | char(10) | Several events can trigger a checkpoint. The most common are RTO (trying to maintain the Recovery Time Objective policy), Plog (physical log is 75% full), and Llog (running out of logical log resources). A * in front of the event indicates that the checkpoint was a transaction-blocking checkpoint. |
| clock_time | int | The wall clock time at which the checkpoint occurred. See the UNIX localtime() function. |
| crit_time | float | This is the amount of time it takes for all transactions to recognize a checkpoint has been requested |
| flush_time | float | Time taken to flush all the bufferpools |
| cp_time | float | Total checkpoint duration from request time to checkpoint completion |
| n_dirty_buffs | int | Number of buffers flushed to disk during checkpoint processing |
| plogs_per_sec | int | Average rate of physical log activity during the checkpoint interval |
| llogs_per_sec | int | Average rate of logical log activity during the checkpoint interval |
| dskflush_per_sec | int | Number of buffers flushed to disk per sec during checkpoint processing |
| ckpt_logid | int | Log position of checkpoint: logical log id |
| ckpt_logpos | int | Log position of checkpoint: position within that logical log |
| physused | int | Total number of pages physically logged during the checkpoint interval |
| logused | int | Total number of pages logically logged during the checkpoint interval |
| n_crit_waits | int | Number of transactions that blocked waiting for checkpoint |
| tot_crit_wait | float | Time accumulated by all transactions waiting on checkpoint |
| longest_crit_wait | float | Longest amount of time a transaction waited for checkpoint |
| block_time | float | Transaction blocking time |
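
Because clock_time is stored as an integer and the description points to localtime(), a client usually has to convert it before showing it next to the Clock Time column of Table 1. The following is a minimal sketch under the assumption that the value is a count of seconds since the UNIX epoch; the function name `format_clock_time` is hypothetical.

```python
import time


def format_clock_time(clock_time: int) -> str:
    """Render a clock_time value as local HH:MM:SS.

    Assumption: clock_time holds seconds since the UNIX epoch, which is
    what the C localtime() function expects.
    """
    return time.strftime("%H:%M:%S", time.localtime(clock_time))
```

The result depends on the time zone of the machine that runs the conversion, so it matches the onstat output only when both run in the same time zone.
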
| Column Name | Column Type | Description |
|---|---|---|
| ckpt_status | int | 0x0011 = a checkpoint was blocked because the physical log ran out of resources; 0x0020 = a checkpoint was blocked because the logical log ran out of resources; 0x0040 = a checkpoint was blocked because of long transactions; 0x1000 = the physical log is too small; 0x2000 = the logical log space is too small; 0x4000 = the physical log is too small for RTO |
| plogs_per_S | int | Average rate of physical logging activity |
| llogs_per_S | int | Average rate of logical logging activity |
| dskF_per_S | int | Average rate of pages flushed to disk |
| longest_dskF | int | Longest time taken to flush the bufferpool to disk during checkpoint processing |
| dirty_pgs_S | int | Average rate of pages being modified |
| sug_plog_size | int | Suggested physical log size |
| sug_llog_sz | int | Suggested logical log space size |
| ras_plog_sp | int | Rate at which fast recovery can restore the physical log |
| ras_llog_sp | int | Rate at which fast recovery can replay the logical log |
| boottime | int | Time it takes for the IDS server to boot shared memory and open chunks |
| auto_ckpts | int | Automatic checkpoints: 1=On, 0=Off |
| auto_lru | int | Automatic LRU tuning: 1=On, 0=Off |
| cur_intvl | int | Current checkpoint interval id |
| boottime | int | 1=On,0=Off |
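
The ckpt_status values listed above look like hexadecimal flags, so a reader of this column may want to turn a raw integer into readable text. The following is a minimal sketch that assumes the listed values can be combined as bit masks in a single status word; the table does not state this, and the names `CKPT_STATUS_FLAGS` and `decode_ckpt_status` are hypothetical.

```python
from typing import List

# Masks and meanings copied from the ckpt_status description.
CKPT_STATUS_FLAGS = [
    (0x0011, "checkpoint blocked: physical log ran out of resources"),
    (0x0020, "checkpoint blocked: logical log ran out of resources"),
    (0x0040, "checkpoint blocked: long transactions"),
    (0x1000, "physical log is too small"),
    (0x2000, "logical log space is too small"),
    (0x4000, "physical log is too small for RTO"),
]


def decode_ckpt_status(status: int) -> List[str]:
    """Return the meaning of every listed mask fully set in status.

    Assumption: the values are bit flags that may be OR-ed together.
    """
    return [text for mask, text in CKPT_STATUS_FLAGS if status & mask == mask]
```

Under that assumption, a status of 0x2020 would decode to both the logical log blocking condition and the advisory that the logical log space is too small.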
