IT27526: ONDBSPACEDOWN ALLOWING A CHECKPOINT WITH I/O ERRORS TO COMPLETE

APAR status

Closed as program error.

Error description

With ONDBSPACEDOWN set to 2 (wait), the online log could show the following:

01:54:22  Assert Warning: I/O write chunk 28, pagenum 20, pagecnt 4 --> errno = 2
01:54:22  IBM Informix Dynamic Server Version 12.10.FC5W1XZ
01:54:22   Who: Thread(13, flush_sub(1), 6207b1a8, 11)
                File: rsbuff.c Line: 5725
01:54:22   Action: Please notify IBM Informix Technical Support.
01:54:22  stack trace for pid 113092 written to /opt/informix/tmp/af.3f5d91d
01:54:22   See Also: /opt/informix/tmp/af.3f5d91d
01:54:22  I/O write chunk 28, pagenum 20, pagecnt 4 --> errno = 2
01:54:33  Checkpoint Completed:  duration was 70 seconds.
01:54:33  Sat Nov 18 - loguniq 84375, logpos 0x3926b018, timestamp: 0xd2a3ebff Interval: 324187

01:54:33  Maximum server connections 806
01:54:33  Checkpoint Statistics - Avg. Txn Block Time 0.122, # Txns blocked 27, Plog used 130899, Llog used 266693

01:54:33  WARNING: Checkpoint blocked by down space, waiting for override or shutdown
02:00:07  Logical Log 84376 Complete, timestamp: 0xd2a401a4.
02:00:07  Logical Log 84376 - Backup Started


That is, the ongoing checkpoint, which encountered errors while flushing
dirty buffers to disk, is allowed to complete; only the next checkpoint
is blocked.
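The symptom can be searched for in an online log. The following Python sketch assumes one message per line (wrapped lines rejoined) and the message texts shown in the excerpt above ("I/O write chunk", "Checkpoint Completed", "Checkpoint blocked by down space", "Interval:"). It flags any checkpoint reported as complete after a chunk write error but before the server started blocking checkpoints. The function name find_suspect_checkpoints is hypothetical; it is not an Informix utility.

```python
import re
from typing import List

# Message fragments as they appear in the online log excerpt (assumed stable).
IO_ERROR = re.compile(r"I/O write chunk (\d+), pagenum (\d+)")
CKPT_DONE = re.compile(r"Checkpoint Completed")
CKPT_BLOCKED = re.compile(r"Checkpoint blocked by down space")
INTERVAL = re.compile(r"Interval: (\d+)")


def find_suspect_checkpoints(log_lines: List[str]) -> List[int]:
    """Return interval numbers of checkpoints that completed after an I/O
    write error and before checkpoints were blocked (hypothetical helper)."""
    suspects: List[int] = []
    error_seen = False         # an I/O write error has been logged
    awaiting_interval = False  # a suspect "Checkpoint Completed" needs its interval
    for line in log_lines:
        if CKPT_BLOCKED.search(line):
            # From here on the server is waiting as expected; reset state.
            error_seen = False
            awaiting_interval = False
        elif IO_ERROR.search(line):
            error_seen = True
        elif CKPT_DONE.search(line) and error_seen:
            awaiting_interval = True
        else:
            m = INTERVAL.search(line)
            if m and awaiting_interval:
                suspects.append(int(m.group(1)))
                awaiting_interval = False
    return suspects
```

Applied to the excerpt above, the sketch would report interval 324187, the checkpoint that completed despite the write error on chunk 28.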

Consequently, after 'onmode -ky', the affected chunks are not marked
down, and fast recovery starts from that checkpoint, which is complete
but inconsistent:

...
02:02:59  Logical Recovery Started.
02:02:59  56 recovery worker threads will be started.
02:03:04  Fast Recovery Switching to Log 84376
02:03:05  Fast Recovery Switching to Log 84377
02:03:06  Logical Recovery has reached the transaction cleanup phase.
02:03:12  Checkpoint Completed:  duration was 6 seconds.
02:03:12  Sat Nov 18 - loguniq 84376, logpos 0x35e5018, timestamp: 0xd2aa771e Interval: 324188

02:03:12  Maximum server connections 0
02:03:12  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 16455, Llog used 0

02:03:13  Checkpoint Completed:  duration was 0 seconds.
02:03:13  Sat Nov 18 - loguniq 84377, logpos 0x1018, timestamp: 0xd2aa774d Interval: 324189

02:03:13  Maximum server connections 0
02:03:13  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 1900, Llog used 2

02:03:14  Logical Recovery Complete.
          21224 Committed, 21 Rolled Back, 0 Open, 0 Bad Locks

02:03:15  Onconfig parameter RAS_PLOG_SPEED modified from 148485 to 89735.
02:03:15  Onconfig parameter RAS_LLOG_SPEED modified from 690 to 1565.
02:03:15  Dataskip is now OFF for all dbspaces
02:03:15  listener-thread: err = -27002: oserr = 0: errstr = : No connections are allowed in quiescent mode.

02:03:16  Checkpoint Completed:  duration was 0 seconds.
02:03:16  Sat Nov 18 - loguniq 84377, logpos 0x290c0, timestamp: 0xd2aa7fae Interval: 324190

02:03:16  Maximum server connections 0
02:03:16  Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 1562, Llog used 41

02:03:16  On-Line Mode

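... which can be seen from the interval number of the first checkpoint
after fast recovery: 324188 directly follows 324187, the interval of
the checkpoint that completed despite the I/O error.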
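The interval check can be sketched in Python. This assumes, as in the excerpt above, that the first checkpoint written after "Logical Recovery Started" carries the interval number one higher than the checkpoint recovery started from, and that each checkpoint's "Interval: N" appears on a single (rejoined) line. The name recovery_base_interval is hypothetical.

```python
import re
from typing import List, Optional

INTERVAL = re.compile(r"Interval: (\d+)")
RECOVERY_START = "Logical Recovery Started"


def recovery_base_interval(log_lines: List[str]) -> Optional[int]:
    """Infer the interval of the checkpoint fast recovery started from.

    Assumption: the first checkpoint logged during recovery is numbered
    exactly one higher than the checkpoint recovery started at.
    Returns None if no recovery checkpoint is found.
    """
    in_recovery = False
    for line in log_lines:
        if RECOVERY_START in line:
            in_recovery = True
            continue
        m = INTERVAL.search(line)
        if in_recovery and m:
            return int(m.group(1)) - 1
    return None
```

If the returned interval matches a checkpoint that completed after an I/O write error (324187 in the excerpt), recovery most likely started from the inconsistent checkpoint.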

Although fast recovery appears to complete normally, at least one page
that should have been flushed to disk is missing, so the instance must
be considered inconsistent, and oncheck is likely to find corruption.
In a clustered environment, this can also be assumed to be the cause of
(differing) corruption on an HDR secondary or RSS server.
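Why a checkpoint that completes despite a failed page write loses data can be shown with a deliberately simplified model. In the Python sketch below, which is an assumption and not how Informix implements checkpoints or fast recovery (it ignores the physical log entirely), a checkpoint records the logical-log position up to which all changes are taken to be on disk, and recovery replays only the log after that position. ToyServer, complete_on_error and run are hypothetical names; the sketch also assumes the failed disk has been repaired before recovery.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Simplified model (assumption, not Informix internals): each page holds one
# value, the logical log is a list of (page, value) updates, and a completed
# checkpoint records the log position covered by pages already on disk.


@dataclass
class ToyServer:
    disk: Dict[int, str] = field(default_factory=dict)
    buffers: Dict[int, str] = field(default_factory=dict)  # dirty pages in memory
    log: List[Tuple[int, str]] = field(default_factory=list)
    last_checkpoint: int = 0  # log position of the last *completed* checkpoint
    failing_pages: Set[int] = field(default_factory=set)  # writes here fail

    def update(self, page: int, value: str) -> None:
        self.log.append((page, value))
        self.buffers[page] = value

    def checkpoint(self, complete_on_error: bool) -> bool:
        """Flush dirty buffers; return True if the checkpoint completed."""
        write_failed = False
        for page, value in list(self.buffers.items()):
            if page in self.failing_pages:
                write_failed = True  # I/O error: page stays unwritten
            else:
                self.disk[page] = value
                del self.buffers[page]
        if write_failed and not complete_on_error:
            return False  # checkpoint blocked, last_checkpoint unchanged
        self.last_checkpoint = len(self.log)  # recorded as complete
        return True

    def crash_and_recover(self) -> Dict[int, str]:
        """Lose memory, then replay the log from the last completed checkpoint."""
        self.buffers.clear()
        self.failing_pages.clear()  # assume the disk problem was repaired
        for page, value in self.log[self.last_checkpoint:]:
            self.disk[page] = value
        return self.disk


def run(complete_on_error: bool) -> Dict[int, str]:
    srv = ToyServer(disk={20: "old", 21: "old"}, failing_pages={20})
    srv.update(20, "new")
    srv.update(21, "new")
    srv.checkpoint(complete_on_error)
    return srv.crash_and_recover()
```

With complete_on_error=True (the reported behavior) page 20 keeps its stale value after recovery, because the update to it lies before the recorded checkpoint position and is never replayed. With complete_on_error=False the checkpoint does not complete, recovery starts from the earlier position, and page 20 is brought up to date.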

Local fix

Problem summary

****************************************************************
* USERS AFFECTED:                                              *
* Users of IDS 12.10.xC10 and earlier versions.                *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* ONDBSPACEDOWN allowing a checkpoint with I/O errors to       *
* complete.                                                    *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************

Problem conclusion

```
Fixed in IDS 12.10.xC11.
```

Temporary fix

Comments

APAR Information

APAR number
IT27526
Reported component name
INFORMIX SERVER
Reported component ID
5725A3900
Reported release
C10
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2018-12-24
Closed date
2019-10-08
Last modified date
2019-10-08

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
INFORMIX SERVER
Fixed component ID
5725A3900

Applicable component levels

Informix Servers (SSGU8G), version C10, platform independent; business unit: Cloud & Data Platform (BU053)

Document Information

Modified date:
08 October 2019
