APAR status
Closed as program error.
Error description
On an instance in an HA environment, the server may freeze completely because of a deadlock between an RSS_send thread and a dr_prsend thread. The wait can be identified in the list of locked mutexes:

Locked mutexes:
mid    addr      name                holder  lkcnt  waiter  waittime
21294  cc52ce68  drcb_lock           25677   0
21295  cc52cf10  drcb_node_count_lo  25677   0      69992   24827
21307  cc534080  SynchSWMR_t::0xcc5  69992   0

Thread 25677, which is dr_prsend, owns drcb_node_count_lock. Thread 69992, which is RSS_send, is waiting for this mutex. The onstat -g ath output shows the following status for the two threads:

25677  dr_prsend        1cpu  11/01 11:40:40  2.5455  46375  cond wait   ReliableCV
69992  RSS_Send_ie1_ix  8cpu  11/01 11:40:40  0.0029  7      mutex wait  drcb_node_

The owner of the mutex, thread 25677, is waiting for a condition (ReliableCV) that is tied to the SynchSWMR_t mutex owned by RSS_send (thread 69992). Because each thread is waiting for a resource the other one holds, neither can proceed.

The stacks for the threads are:

Stack for thread: 69992 RSS_Send_ie1_ixdpp01a_qa
base: 0x00000000d8126000
len: 69632
pc: 0x000000000143ead7
tos: 0x00000000d8136a10
state: mutex wait
vp: 8
0x000000000143ead7 (oninit) yield_processor_mvp
0x000000000144afae (oninit) mt_lock_wait
0x0000000001451072 (oninit) mt_lock_helper
0x0000000001202138 (oninit) cloneAttachCB
0x0000000001206bf2 (oninit) cloneSend_Int
0x00000000011f0b82 (oninit) cloneStdSend
0x0000000001419870 (oninit) th_init_initgls
0x000000000145f2b7 (oninit) startup

Stack for thread: 25677 dr_prsend
base: 0x00000000dc972000
len: 69632
pc: 0x000000000143ead7
tos: 0x00000000dc982c40
state: cond wait
vp: 1
0x000000000143ead7 (oninit) yield_processor_mvp
0x0000000001453441 (oninit) mt_wait
0x000000000107db04 (oninit) reliablecv_wait
0x000000000107ed7b (oninit) synchswmr_reader_enter
0x000000000128e418 (oninit) SendGlobalVersionInfo
0x00000000011d3822 (oninit) dr_state_change
0x00000000011dbf46 (oninit) dr_session_thread
0x000000000145f2b7 (oninit) startup

Additionally, in the customer environment where the problem was diagnosed, there were many waiters on the ReliableCV condition because sessions were reading the tables syscluster and sysha_nodes. These waiters are victims of the deadlock, not its root cause.
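The diagnosis above amounts to finding a cycle in a wait-for graph: each edge points from a waiting thread to the thread that holds what it needs. The following is a minimal sketch, not an Informix tool. It hard-codes the values from the Locked mutexes output in this APAR and adds the condition-variable edge by hand, because a condition wait does not appear as a row in that list. The helper names build_wait_for_graph and find_cycle are hypothetical.

```python
# Rows from the "Locked mutexes" output: (name as printed, holder, waiter).
# waiter is None where the row shows no waiting thread.
MUTEXES = [
    ("drcb_lock", 25677, None),
    ("drcb_node_count_lo", 25677, 69992),
    ("SynchSWMR_t::0xcc5", 69992, None),
]

# Added by hand from the analysis: 25677 (dr_prsend) waits on ReliableCV,
# which is tied to the SynchSWMR_t mutex held by 69992 (RSS_send).
CONDITION_WAITS = [(25677, "SynchSWMR_t::0xcc5")]


def build_wait_for_graph(mutexes, condition_waits):
    """Map each waiting thread id to the set of thread ids it waits for."""
    holder_of = {name: holder for name, holder, _ in mutexes}
    graph = {}
    for name, holder, waiter in mutexes:
        if waiter is not None:
            graph.setdefault(waiter, set()).add(holder)
    for waiter, name in condition_waits:
        graph.setdefault(waiter, set()).add(holder_of[name])
    return graph


def find_cycle(graph):
    """Return one cycle (first thread id repeated at the end), or None."""
    visiting, done, path = set(), set(), []

    def dfs(node):
        visiting.add(node)
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            if nxt in visiting:
                # Back edge: the cycle runs from nxt to the current node.
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        visiting.discard(node)
        done.add(node)
        path.pop()
        return None

    for start in sorted(graph):
        if start not in done:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None


cycle = find_cycle(build_wait_for_graph(MUTEXES, CONDITION_WAITS))
print(" -> ".join(str(tid) for tid in cycle))
```

With these inputs the script prints 25677 -> 69992 -> 25677, which is the dr_prsend/RSS_send cycle. Without the hand-added condition edge, the mutex rows alone show only a one-way wait and no cycle. This is why the condition wait in the dr_prsend stack is the key piece of evidence.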
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* Users of Informix Server prior to 12.10.xC14 and 14.10.xC4.  *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Update to Informix Server 12.10.xC14 or 14.10.xC4.           *
****************************************************************
Problem conclusion
Fixed in Informix Server 12.10.xC14 and 14.10.xC4.
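To check whether a given server level predates the fix, the comparison can be sketched as follows. This is a hypothetical helper, not part of Informix. It assumes version strings of the form 12.10.FC13. The letter before C, written as x in this APAR, is treated as a wildcard, and any suffix after the fix pack number (such as W1) is ignored. Release families other than 12.10 and 14.10 return None, because this APAR names fix levels only for those two.

```python
import re

# Fix levels named in this APAR: 12.10.xC14 and 14.10.xC4.
FIXED_IN = {(12, 10): 14, (14, 10): 4}

# major.minor.<letter>C<fixpack>; trailing text after the fix pack is ignored.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.[A-Za-z]C(\d+)")


def affected_by_it30876(version):
    """Return True if the level predates the fix, False if it includes it,
    or None for a release family this APAR does not name."""
    m = _VERSION.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, fixpack = (int(g) for g in m.groups())
    fixed = FIXED_IN.get((major, minor))
    if fixed is None:
        return None
    return fixpack < fixed
```

For example, the helper reports 12.10.FC13 and 14.10.FC3 as affected, and 12.10.FC14 and 14.10.FC4 as not affected.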
Temporary fix
Comments
APAR Information
APAR number
IT30876
Reported component name
INFORMIX SERVER
Reported component ID
5725A3900
Reported release
C10
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-11-07
Closed date
2020-02-27
Last modified date
2020-02-27
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
INFORMIX SERVER
Fixed component ID
5725A3900
Applicable component levels
Business unit: Cloud & Data Platform
Product: Informix Servers (SSGU8G)
Platform: Platform Independent
Version: C10
Line of business: Data and AI
Document Information
Modified date:
27 February 2020