A fix is available
APAR status
Closed as program error.
Error description
On systems with multiple adapters on the same subnet, separate threads within the NIM module can run into a race condition if the NIM is unable to bind a socket to the broadcast address, which may result in a core dump of the NIM process. Two errpt entries are involved, a TS_NIM_DIED_ER and a CORE_DUMP. The relevant details should look similar to:

-------------------------------------------------
LABEL: TS_NIM_DIED_ER
IDENTIFIER: 38D19956
DETECTING MODULE
rsct,nim_control.C,1.11.1.3,1845
Exit value, if not terminated with a signal
0
Signal number (0: no signal)
11
Core file created (1: core file; 0: no core file)
1
-------------------------------------------------
LABEL: CORE_DUMP
IDENTIFIER: C60BB505
SIGNAL NUMBER
11
PROGRAM NAME
(any hats nim process; was hats_nim in this case)
ADDITIONAL INFORMATION
receive_n CC
receive_n 64
receive_t A4
_pthread_ D4
??
Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/hats_nim SIG/11 FLDS/receive_n VALU/cc FLDS/recei
----------------------------------------------------------------

A dbx of the core dump should reveal:

(dbx) where
local_adapter::receive_non_blocking(nim_adap_addr_union_t*,int*, (this = 0x2004b1d8, source_p = 0x20066978, pack_len_p = 0x200 msg = 0x200669e0), line 768 in "nim_local_adapter_ipv4.C"
receive_thread_main(void*)(0x2004b638), line 896 in "nim_send_recv_ipv4.C"
_pthread_body(??) at 0xd00080c8
Local fix
N/A - NIM module will be restarted automatically by hatsd.
Problem summary
A race condition has been identified in the RSCT Topology Services NIM (Network Interface Module). As a result of this race condition, the NIM process may terminate abnormally with a core dump. When the NIM process terminates, a new instance is started automatically, and the subsystem resumes normal operation without interruption to its client programs. An error log entry like the following is created:

-------------------------------------------------
LABEL: TS_NIM_DIED_ER
IDENTIFIER: 38D19956
...
DETECTING MODULE
rsct,nim_control.C,1.11.1.3,1845
Exit value, if not terminated with a signal
0
Signal number (0: no signal)
11
Core file created (1: core file; 0: no core file)
1
-------------------------------------------------

When the core file is examined with dbx:

dbx /usr/sbin/rsct/bin/hats_nim <core file>

where the core file is located at /var/ha/run/topsvcs.<cluster_name>/core.nim.topsvcs.*, a sequence like the following should appear in the traceback:

local_adapter::receive_non_blocking(nim_adap_addr_union_t*,int*,char**)(this = 0x2004b1d8, source_p = 0x20066978, pack_len_p = 0x200669dc, msg = 0x200669e0), line 768 in "nim_local_adapter_ipv4.C"
receive_thread_main(void*)(0x2004b638), line 896 in "nim_send_recv_ipv4.C"
_pthread_body(??) at 0xd00080c8

An additional entry with LABEL "CORE_DUMP" is created as well. The problem occurs infrequently, and only in HACMP configurations where multiple standby adapters for a given node belong to the same subnet.
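The traceback check can also be scripted once the output of the dbx where subcommand has been saved as text. This is a minimal sketch under stated assumptions: has_apar_traceback is a hypothetical name, and it only requires that the receive_non_blocking frame appears above (that is, is more recent than) the receive_thread_main frame. The line numbers 768 and 896 are specific to the reported build, so they are deliberately not matched; matching across newlines also tolerates dbx output that wraps long frames.

```python
import re

# Hypothetical helper: decides whether a captured dbx "where" traceback shows
# the frame sequence reported in this APAR. dbx lists the innermost frame
# first, so receive_non_blocking must appear before receive_thread_main.
# Source line numbers are build-specific and intentionally not checked.
_SIGNATURE = re.compile(
    r"local_adapter::receive_non_blocking\(.*?receive_thread_main\(",
    re.DOTALL,  # allow the match to span wrapped or separate frame lines
)


def has_apar_traceback(where_output: str) -> bool:
    """True if receive_non_blocking is followed later by receive_thread_main."""
    return _SIGNATURE.search(where_output) is not None
```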
Problem conclusion
The code in the RSCT Topology Services NIM (Network Interface Module) was fixed to eliminate the race condition that caused the abnormal termination of the NIM. With the fix applied, this problem should no longer produce TS_NIM_DIED_ER error log entries.
Temporary fix
Comments
APAR Information
APAR number
IY44496
Reported component name
RSCT/RMC FOR CS
Reported component ID
5765F07AP
Reported release
231
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Submitted date
2003-05-14
Closed date
2003-05-14
Last modified date
2004-01-20
APAR is sysrouted FROM one or more of the following:
IY43266
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
RSCT/RMC FOR CS
Fixed component ID
5765F07AP
Applicable component levels
R231 PSY U497101
UP04/01/20 I 1000
PTF to Fileset Mapping
U489029 rsct.basic.rte 2.3.1.1
U489856 rsct.basic.rte 2.3.1.2
U495556 rsct.basic.rte 2.3.1.3
U496336 rsct.basic.rte 2.3.1.4
U497101 rsct.basic.rte 2.3.1.5
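Per the applicable component level above, the fix ships in PTF U497101, which maps to rsct.basic.rte 2.3.1.5. The following minimal sketch checks an installed level against that; the installed level would be obtained separately (for example with lslpp -l rsct.basic.rte). The names PTF_LEVELS, parse_level, and includes_fix are hypothetical, and the check assumes the fix is present at 2.3.1.5 and every later 2.3.1.x level. Levels outside the 2.3.1 stream are rejected rather than guessed at, since sysrouted APARs may deliver the fix at different levels.

```python
# Hypothetical helpers built from the PTF to Fileset Mapping above.
# Assumption: the fix is included at rsct.basic.rte 2.3.1.5 (PTF U497101)
# and in all later 2.3.1.x levels.

PTF_LEVELS = {
    "U489029": "2.3.1.1",
    "U489856": "2.3.1.2",
    "U495556": "2.3.1.3",
    "U496336": "2.3.1.4",
    "U497101": "2.3.1.5",
}

FIXED_LEVEL = (2, 3, 1, 5)


def parse_level(level: str) -> tuple:
    """Turn a dotted fileset level such as '2.3.1.4' into (2, 3, 1, 4)."""
    return tuple(int(part) for part in level.strip().split("."))


def includes_fix(installed_level: str) -> bool:
    """True if a 2.3.1.x rsct.basic.rte level is at or above 2.3.1.5."""
    parsed = parse_level(installed_level)
    if parsed[:3] != FIXED_LEVEL[:3]:
        # Other release streams are out of scope for this APAR's PTF list.
        raise ValueError(f"{installed_level} is not a 2.3.1.x level")
    # Tuple comparison is element-wise, so 2.3.1.10 correctly sorts after 2.3.1.5.
    return parsed >= FIXED_LEVEL
```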