IV21752: EIF: ERROR CODE 67 IS NOT HANDLED WHILE SENDING EVENTS

Fixes are available

IBM Tivoli Monitoring 6.2.3 Fix Pack 5 (6.2.3-TIV-ITM-FP0005)
IBM Tivoli Monitoring 6.2.3 Fix Pack 2 (6.2.3-TIV-ITM-FP0002)
IBM Tivoli Monitoring 6.2.3 Fix Pack 4 (6.2.3-TIV-ITM-FP0004)
Tivoli Log File Agent, Version 6.3.0 Fix Pack 01 (6.3.0-TIV-ITM_LFA-FP0001)
Tivoli Log File Agent, Version 6.3.0 Interim Fix 04 6.3.0-TIV-ITM_LFA-IF0004
Tivoli Log File Agent, Version 6.3.0 Fix Pack 02 (6.3.0-TIV-ITM_LFA-FP0002)
Tivoli Log File Agent, Version 6.3.0 Interim Fix 05 6.3.0-TIV-ITM_LFA-IF0005

APAR status

Closed as program error.

Error description

1) When the error code is 67 (i.e. E_IPC_BROKEN), the EIF sender
should at the very least try to switch over or resend the event.


2) The head pointer keeps getting moved even though the event
has not been sent.


I think both issues can be addressed if we add checks for
E_IPC_BROKEN.

With several events written to the cache, it looks like this:


+4F97A3BD.00C1 maxsz:      65536
+4F97A3BD.00C1 head :         54
+4F97A3BD.00C1 tail :        746

For the first event I see:

(4F97A3BD.00F6-2:sockeif.c,814,"_imp_eipc_recv_data")
<0x41EEBB50,0x0>
recv on fd 5, sock_error 0xFFFFFFFF, error 67
(4F97A3BD.00F7-2:eipc.c,564,"get_peer_response_timed") peer
response
PEER_RESPONSE_UNKNOWN

+4F97A3BD.010B maxsz:      65536
+4F97A3BD.010B head :        227
+4F97A3BD.010B tail :        746

The head has moved up by 173, the number of bytes in the first
event, even though an error was returned. The error seems to have
been ignored and processing carried on.
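
The head movement described above can be checked directly from the
two cache dumps. The following is a minimal sketch, assuming that
head and tail are byte offsets into the cache (as the description
implies); the helper name cache_pointers is hypothetical and is not
part of EIF.

```python
import re

# Cache dumps copied from the trace excerpts in this APAR.
TRACE = """\
+4F97A3BD.00C1 head :         54
+4F97A3BD.00C1 tail :        746
+4F97A3BD.010B head :        227
+4F97A3BD.010B tail :        746
"""

def cache_pointers(trace):
    """Return a list of (stamp, head, tail) tuples, one per cache dump."""
    dumps = {}
    pattern = r"\+(\S+)\s+(head|tail)\s*:\s*(\d+)"
    for stamp, field, value in re.findall(pattern, trace):
        dumps.setdefault(stamp, {})[field] = int(value)
    # dicts keep insertion order, so dumps come back in trace order
    return [(stamp, d["head"], d["tail"]) for stamp, d in dumps.items()]

snapshots = cache_pointers(TRACE)
(_, head_before, tail_before), (_, head_after, tail_after) = snapshots

# Bytes the head advanced after the failed send of the first event.
head_delta = head_after - head_before
# Bytes still between head and tail after that advance.
pending_after = tail_after - head_after
print(head_delta, pending_after)
```

The head delta matches the 173-byte event size quoted above, which
is what shows that the event was dropped from the cache despite the
error.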


Then, for the second event, the same:

(4F97A3BD.012B-2:sockeif.c,814,"_imp_eipc_recv_data")
<0x41EEBB50,0x0>
recv on fd 5, sock_error 0xFFFFFFFF, error 67
(4F97A3BD.012C-2:eipc.c,564,"get_peer_response_timed") peer
response
PEER_RESPONSE_UNKNOWN

I did not expect this: in general the first event gets lost and
the connection is re-made so that the second can be sent. Again
the error appears not to be caught, and the cache moves the head
up by one event's worth:

+4F97A3BD.013A maxsz:      65536
+4F97A3BD.013A head :        400
+4F97A3BD.013A tail :        746

Actually, what I didn't expect was this for the third event:

(4F97A3BD.015A-2:sockeif.c,814,"_imp_eipc_recv_data")
<0x41EEBB50,0x0>
recv on fd 5, sock_error 0xFFFFFFFF, error 67
(4F97A3BD.015B-2:eipc.c,564,"get_peer_response_timed") peer
response
PEER_RESPONSE_UNKNOWN

But this time it's rapidly followed by:

(4F97A3BD.015D-2:sockeif.c,338,"_imp_do_send") send 40 bytes
(4F97A3BD.015E-2:socket_imp.c,1741,"send_to") 174 bytes on send
rc=-1
(4F97A3BD.015F-2:socket_imp.c,1639,"socket_put_event_conn")
Connection
Oriented send failed will wait 120 seconds before resend.

I didn't see this for the first two events. It then counts down:

(4F97A3C4.0000-2:socket_imp.c,1658,"socket_put_event_conn")
resend
approximate time remaining: 110 seconds
(4F97A3CB.0000-2:socket_imp.c,1658,"socket_put_event_conn")
resend
approximate time remaining: 99 seconds
(4F97A3D2.0000-2:socket_imp.c,1658,"socket_put_event_conn")
resend
approximate time remaining: 89 seconds
(4F97A3D9.0000-2:socket_imp.c,1658,"socket_put_event_conn")
resend
approximate time remaining: 78 seconds
(4F97A3E0.0000-2:socket_imp.c,1658,"socket_put_event_conn")
resend
approximate time remaining: 68 seconds
(4F97A3E7.0000-2:socket_imp.c,1658,"socket_put_event_conn")
resend
approximate time remaining: 57 seconds
(4F97A3EE.0000-2:socket_imp.c,1658,"socket_put_event_conn")
resend
approximate time remaining: 47 seconds
(4F97A3F5.0000-2:socket_imp.c,1658,"socket_put_event_conn")
resend
approximate time remaining: 36 seconds
(4F97A3FC.0000-2:socket_imp.c,1658,"socket_put_event_conn")
resend
approximate time remaining: 26 seconds
(4F97A403.0000-2:socket_imp.c,1658,"socket_put_event_conn")
resend
approximate time remaining: 15 seconds
(4F97A40A.0000-2:socket_imp.c,1658,"socket_put_event_conn")
resend
approximate time remaining: 5 seconds
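
To read the spacing of the countdown lines, the trace stamps can be
decoded. This is a sketch under an assumption the APAR does not
state: that the part of each stamp before the dot is the number of
seconds since the Unix epoch, written in hexadecimal. The helper
name trace_time is hypothetical.

```python
from datetime import datetime, timezone

def trace_time(stamp):
    """Decode the hex-seconds part of a trace stamp such as '4F97A3C4.0000'.

    Assumption: the digits before the dot are Unix epoch seconds in hex.
    """
    seconds_hex = stamp.split(".")[0]
    return datetime.fromtimestamp(int(seconds_hex, 16), tz=timezone.utc)

# Stamps copied from the trace excerpts above.
send_failed = trace_time("4F97A3BD.015F")      # "will wait 120 seconds"
first_countdown = trace_time("4F97A3C4.0000")  # "remaining: 110 seconds"
last_countdown = trace_time("4F97A40A.0000")   # "remaining: 5 seconds"

gap_to_first = (first_countdown - send_failed).total_seconds()
gap_to_last = (last_countdown - send_failed).total_seconds()
print(send_failed.isoformat(), gap_to_first, gap_to_last)
```

If the assumption holds, consecutive countdown lines are 7 seconds
apart in the trace.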

Then it looks like it closes the connection:

(4F97A40D.0004-2:sockeif.c,255,"_imp_eipc_shutdown")
_imp_eipc_shutdown
fd 5  option 2 rc=-1
(4F97A40D.0005-2:sockeif.c,259,"_imp_eipc_shutdown")
_imp_eipc_shutdown
shutdown - [sys errno 107] fd 5  option 2 rc=-1

(Surely rc=-1 means the connection failed to close?)

Then the connection is created:

(4F97A40D.001E-2:socket_imp.c,1920,"_create_eipc_client")
Connected to
[legacy_01] fujiobj <fujiobj.test.com@10.22.58.99>:9998 1

The third event gets sent:

+4F97A40D.0034 maxsz:      65536
+4F97A40D.0034 head :        573
+4F97A40D.0034 tail :        919

More events were added, but both the first AND the second return
error 67, and only the third event does anything about it: it
closes the connection and re-establishes a good one. It's purely
my opinion (DS), but I think it should do for the first event what
it did for the third.



A really good log. It seems to do the right thing for the third
event: when the connection is detected as bad, it does not remove
the current event from the cache, but makes the connection and
tries again. For events one and two it seems to ignore that the
connection was bad, moves the cache marker up, and effectively
loses/deletes the event.

Customer's environment is ITM 6.2.2 FP03.

Curious about the 120 second delay to re-establish a connection;
unsure what the original intention there might have been. Surely
it should:
a) detect that the connection has gone when dealing with the first
event.
b) remake the connection without a 120 second delay.

All files on ecurep under pmr.
RHEL 5.5 64bit given as cust env in pmr.

Local fix

Problem summary

EIF: Error Code 67 is not handled while sending events.


If Error code 67 (Connection is broken) is seen while sending
events to the Event Integration Facility (EIF) receiver, then
the EIF sender ignores it and keeps sending events forward, even
though the events are not being received by the EIF receiver.

Problem conclusion

The code has been modified to check for this error code and take
action accordingly.  The action would be either to try to
connect to a failover EIF receiver, if configured, or to keep
the event in the cache file and mark it as unsent.  This would
ensure that for this error condition, events are not lost.  This
fix is not applicable to 32-bit UNIX/Linux or Windows platforms.
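
The behavior described in this conclusion can be sketched as
follows. This is an illustrative model only, not the EIF source:
apart from the error code 67 (E_IPC_BROKEN) named in this APAR,
every name here (EventCache, flush, the receiver callables) is
hypothetical, and the cache is an in-memory list rather than a
cache file.

```python
E_IPC_BROKEN = 67  # error code from this APAR; other names are hypothetical

class EventCache:
    """Toy in-memory stand-in for the EIF cache file."""

    def __init__(self, events):
        self.events = list(events)
        self.head = 0  # index of the oldest unsent event

    def pending(self):
        return self.events[self.head:]

def flush(cache, receivers):
    """Send cached events in order; advance head only after a good send.

    `receivers` is a list of callables that take an event and return 0
    on success or an error code. The first entry is the primary
    receiver; any others are failover receivers.
    """
    active = 0
    while cache.head < len(cache.events):
        rc = receivers[active](cache.events[cache.head])
        if rc == 0:
            cache.head += 1   # delivered, safe to move past this event
        elif rc == E_IPC_BROKEN and active + 1 < len(receivers):
            active += 1       # broken connection: try the next receiver
        else:
            break             # nothing left to try: event stays unsent
    return cache.head

def broken(event):
    return E_IPC_BROKEN

delivered = []

def healthy(event):
    delivered.append(event)
    return 0

# No failover configured: head must not move, all events stay cached.
cache = EventCache(["ev1", "ev2", "ev3"])
head_without_failover = flush(cache, [broken])
pending_without_failover = cache.pending()

# Failover configured: the same events are retried against it.
head_with_failover = flush(cache, [broken, healthy])
```

The point of the sketch is the ordering: the head index changes only
in the branch where the send returned success, so a broken
connection can never cause an event to be skipped.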


The fix for this APAR is contained in the following maintenance
packages:

  | fix pack | 6.2.3-TIV-ITM-FP0002

Temporary fix

Comments

APAR Information

APAR number
IV21752
Reported component name
TEC GUI INTEGRA
Reported component ID
5724C04TG
Reported release
622
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-05-25
Closed date
2012-09-14
Last modified date
2012-10-08

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

OA40438

Fix information

Fixed component name
TEC GUI INTEGRA
Fixed component ID
5724C04TG

Applicable component levels

R623 PSY
UP


Document Information

Modified date:
08 October 2012
