APAR status
Closed as program error.
Error description
This problem occurs when the primary server has MIRROR set to 1 in its onconfig file while the secondary servers are mistakenly configured with MIRROR set to 0. The primary must also have enough chunks allocated to force the server to extend the reserve pages for both the chunk page and the mirror chunk page.

Because of the defect, if the secondary is restarted after this point, the engine calls chfree on the secondary to free the extended mirror chunk reserve pages, because MIRROR is 0 there. From then on, the chunk free list pages for the root chunk are out of sync between the primary and the secondaries. This can lead to errors and crashes of the following type on the secondary when space needs to be allocated in the rootdbs.

From the online log:

13:19:23 Rollforward of log record failed. iserrno = 0
13:19:23 Log Record: log = 3, pos = 0x155ec, type = OLDRSAM:CHALLOC(51), trans = 27
13:19:23 Assert Failed: INFORMIX-OnLine Must ABORT Critical media failure.
13:19:23 IBM Informix Dynamic Server Version 11.50.FC6W1
13:19:23 Who: Session(16, informix@rocklaubster.lenexa.ibm.com, 0, 0x4b82ae08)
         Thread(62, xchg_2.0, 4b7f4b28, 1)
         File: rsmirror.c Line: 1727
13:19:23 stack trace for pid 30728 written to /work2/prod/ids1150fc6w1/tmp/af.426c32b
13:19:23 See Also: /work2/prod/ids1150fc6w1/tmp/af.426c32b, shmem.426c32b.0
13:19:27 rsmirror.c, line 1727, thread 62, proc id 30728, INFORMIX-OnLine Must ABORT Critical media failure..
13:19:27 The Master Daemon Died
13:19:27 PANIC: Attempting to bring system down

Stack trace of the xchg thread from the af file:

afstack
afhandler
afcrash_interface
bring_media_down
rollfwd_error
rlogm_redo
scan_logredo
scan_logredo
next_lscan
prod_loop1
producer_thread
startup
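The precondition described above, a MIRROR value that differs between the primary and a secondary, can be detected by comparing the onconfig files before a secondary is restarted. The following is a minimal sketch, not an IBM-supplied tool. It assumes the usual onconfig layout of one parameter name followed by whitespace and a value per line, with `#` starting a comment, and it assumes that a missing MIRROR line means 0. The function names and the server labels are hypothetical.

```python
def parse_onconfig(text):
    """Return a dict mapping parameter name to value string.

    Assumes one 'NAME value' pair per line and '#' starting a comment.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def mirror_mismatches(primary_text, secondaries):
    """Return the labels of secondaries whose MIRROR differs from the primary.

    secondaries maps a hypothetical server label to its onconfig text.
    Assumption: a missing MIRROR line is treated as 0 (mirroring disabled).
    """
    primary = parse_onconfig(primary_text).get("MIRROR", "0")
    return [
        label
        for label, text in secondaries.items()
        if parse_onconfig(text).get("MIRROR", "0") != primary
    ]
```

Passing the text of the primary's onconfig file and a dict of secondary onconfig texts returns the secondaries that are exposed to this defect. An empty list means the MIRROR settings agree.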
Local fix
Problem summary
USERS AFFECTED:
Users with a secondary server and a MIRROR value in onconfig that is inconsistent between the primary and secondary servers.

PROBLEM DESCRIPTION:
If MIRROR is set to 1 on the primary and 0 on the secondary and the secondary is bounced, all the information regarding mirror chunks can be lost on the secondary. At this point the primary's and secondaries' chunk free list pages for the root chunk are out of sync. This can lead to errors and crashes of the following type on the secondary if space needs to be allocated in the rootdbs.

From the online log:

13:19:23 Rollforward of log record failed. iserrno = 0
13:19:23 Log Record: log = 3, pos = 0x155ec, type = OLDRSAM:CHALLOC(51), trans = 27
13:19:23 Assert Failed: INFORMIX-OnLine Must ABORT Critical media failure.
13:19:23 IBM Informix Dynamic Server Version 11.50.FC6W1
13:19:23 Who: Session(16, informix@rocklaubster.lenexa.ibm.com, 0, 0x4b82ae08)
         Thread(62, xchg_2.0, 4b7f4b28, 1)
         File: rsmirror.c Line: 1727
13:19:23 stack trace for pid 30728 written to /work2/prod/ids1150fc6w1/tmp/af.426c32b
13:19:23 See Also: /work2/prod/ids1150fc6w1/tmp/af.426c32b, shmem.426c32b.0
13:19:27 rsmirror.c, line 1727, thread 62, proc id 30728, INFORMIX-OnLine Must ABORT Critical media failure..
13:19:27 The Master Daemon Died
13:19:27 PANIC: Attempting to bring system down

Stack trace of the xchg thread from the af file:

afstack
afhandler
afcrash_interface
bring_media_down
rollfwd_error
rlogm_redo
scan_logredo
scan_logredo
next_lscan
prod_loop1
producer_thread
startup

RECOMMENDATION:
Upgrade to 11.50.xC8 or above.
Problem conclusion
The fix for this problem is included in IDS 11.50.xC8. With the fix, the server does not clear mirror reserved pages on a secondary server, irrespective of the MIRROR setting.
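Whether a given 11.50 installation is at or above the fix level can be checked from the version string that appears in the online log (for example 11.50.FC6W1). The sketch below is illustrative only. It assumes the version layout major.minor, then a platform letter, then C and the fix level, optionally followed by a suffix such as W1. The function name is hypothetical, and because this APAR text only names the 11.50 release line, other release lines are reported as undetermined.

```python
import re

# Assumed layout: <major>.<minor>.<platform letter>C<fix level>[suffix]
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.[A-Za-z][Cc](\d+)")


def has_fix_level(version):
    """Return True/False for 11.50 versions, None for other release lines.

    True means the fix level is xC8 or above, as recommended in this APAR.
    """
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor, level = (int(g) for g in match.groups())
    if (major, minor) != (11, 50):
        return None  # not covered by the text of this APAR
    return level >= 8
```

For the version shown in the log above, `has_fix_level("11.50.FC6W1")` evaluates to False, which is consistent with the crash having occurred on that level.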
Temporary fix
Comments
APAR Information
APAR number
IC68817
Reported component name
IBM IDS ENTRP E
Reported component ID
5724L2304
Reported release
B15
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2010-05-24
Closed date
2011-01-20
Last modified date
2011-01-20
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
IBM IDS ENTRP E
Fixed component ID
5724L2304
Applicable component levels
RB15 PSY
UP
Document Information
Modified date:
20 January 2011