IBM Support

IV21752: EIF: ERROR CODE 67 IS NOT HANDLED WHILE SENDING EVENTS


APAR status

  • Closed as program error.

Error description

  • 1) When the error code is 67 (i.e., E_IPC_BROKEN), the EIF sender
    should try to switch over or, at the very least, resend the event.
    
    
    2) The head pointer keeps getting moved even though the event
    has not been sent.
    
    
    I think both issues can be addressed if we add checks for
    E_IPC_BROKEN.
    
    With several events written to the cache, it looks like this:
    
    
    +4F97A3BD.00C1 maxsz:      65536
    +4F97A3BD.00C1 head :         54
    +4F97A3BD.00C1 tail :        746
    
    For the first event I see:
    
    (4F97A3BD.00F6-2:sockeif.c,814,"_imp_eipc_recv_data")
    <0x41EEBB50,0x0>
    recv on fd 5, sock_error 0xFFFFFFFF, error 67
    (4F97A3BD.00F7-2:eipc.c,564,"get_peer_response_timed") peer
    response
    PEER_RESPONSE_UNKNOWN
    
    +4F97A3BD.010B maxsz:      65536
    +4F97A3BD.010B head :        227
    +4F97A3BD.010B tail :        746
    
    The head has moved up by the number of bytes in the first event,
    which was 173 (54 + 173 = 227), even though an error was returned.
    The error seems to have been ignored and processing carried on.
    
    
    Then the same for the second event:
    
    (4F97A3BD.012B-2:sockeif.c,814,"_imp_eipc_recv_data")
    <0x41EEBB50,0x0>
    recv on fd 5, sock_error 0xFFFFFFFF, error 67
    (4F97A3BD.012C-2:eipc.c,564,"get_peer_response_timed") peer
    response
    PEER_RESPONSE_UNKNOWN
    
    I did not expect this: in general, the first event gets lost and
    the connection is made for the second event to be sent. Again the
    error appears not to be caught, and the cache moves the head up by
    one event's worth:
    
    +4F97A3BD.013A maxsz:      65536
    +4F97A3BD.013A head :        400
    +4F97A3BD.013A tail :        746
    
    Actually, what I didn't expect was this for the third event:
    
    (4F97A3BD.015A-2:sockeif.c,814,"_imp_eipc_recv_data")
    <0x41EEBB50,0x0>
    recv on fd 5, sock_error 0xFFFFFFFF, error 67
    (4F97A3BD.015B-2:eipc.c,564,"get_peer_response_timed") peer
    response
    PEER_RESPONSE_UNKNOWN
    
    But this time it is rapidly followed by:
    
    (4F97A3BD.015D-2:sockeif.c,338,"_imp_do_send") send 40 bytes
    (4F97A3BD.015E-2:socket_imp.c,1741,"send_to") 174 bytes on send
    rc=-1
    (4F97A3BD.015F-2:socket_imp.c,1639,"socket_put_event_conn")
    Connection
    Oriented send failed will wait 120 seconds before resend.
    
    I didn't see this for the first two events. It then counts down:
    
    (4F97A3C4.0000-2:socket_imp.c,1658,"socket_put_event_conn")
    resend
    approximate time remaining: 110 seconds
    (4F97A3CB.0000-2:socket_imp.c,1658,"socket_put_event_conn")
    resend
    approximate time remaining: 99 seconds
    (4F97A3D2.0000-2:socket_imp.c,1658,"socket_put_event_conn")
    resend
    approximate time remaining: 89 seconds
    (4F97A3D9.0000-2:socket_imp.c,1658,"socket_put_event_conn")
    resend
    approximate time remaining: 78 seconds
    (4F97A3E0.0000-2:socket_imp.c,1658,"socket_put_event_conn")
    resend
    approximate time remaining: 68 seconds
    (4F97A3E7.0000-2:socket_imp.c,1658,"socket_put_event_conn")
    resend
    approximate time remaining: 57 seconds
    (4F97A3EE.0000-2:socket_imp.c,1658,"socket_put_event_conn")
    resend
    approximate time remaining: 47 seconds
    (4F97A3F5.0000-2:socket_imp.c,1658,"socket_put_event_conn")
    resend
    approximate time remaining: 36 seconds
    (4F97A3FC.0000-2:socket_imp.c,1658,"socket_put_event_conn")
    resend
    approximate time remaining: 26 seconds
    (4F97A403.0000-2:socket_imp.c,1658,"socket_put_event_conn")
    resend
    approximate time remaining: 15 seconds
    (4F97A40A.0000-2:socket_imp.c,1658,"socket_put_event_conn")
    resend
    approximate time remaining: 5 seconds
    
    Then it looks like it closes the connection:
    
    (4F97A40D.0004-2:sockeif.c,255,"_imp_eipc_shutdown")
    _imp_eipc_shutdown
    fd 5  option 2 rc=-1
    (4F97A40D.0005-2:sockeif.c,259,"_imp_eipc_shutdown")
    _imp_eipc_shutdown
    shutdown - [sys errno 107] fd 5  option 2 rc=-1
    
    (surely rc=-1 means the connection failed to close?)
    
    Then the connection is created:
    
    (4F97A40D.001E-2:socket_imp.c,1920,"_create_eipc_client")
    Connected to
    [legacy_01] fujiobj <fujiobj.test.com@10.22.58.99>:9998 1
    
    The third event gets sent:
    
    +4F97A40D.0034 maxsz:      65536
    +4F97A40D.0034 head :        573
    +4F97A40D.0034 tail :        919
    
    More events were added. The first AND second events both returned
    error 67, BUT only the third event does anything about it: it
    closes the connection and re-establishes a good one. It's purely my
    opinion (DS), but I think it should do for the first event what it
    did for the third.
    
    
    
    A really good log. It seems to do the right thing for the third
    event: when the connection is detected as bad, it does not remove
    the current event from the cache, but makes the connection and
    tries again. For events one and two, it seems to ignore that the
    connection was bad, moves the cache marker up, and effectively
    loses/deletes the event.
    
    The customer's environment is ITM 6.2.2 FP03.
    
    I am curious about the 120-second delay to re-establish a
    connection and unsure what the original intention might have been.
    Surely the sender should:
    a) detect that the connection has gone when dealing with the first
    event;
    b) remake the connection without a 120-second delay.
    
    All files are on ecurep under the PMR.
    RHEL 5.5 64-bit is given as the customer environment in the PMR.
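
The trace stamps above can be used to check how long the resend wait actually lasted. The following is a minimal sketch, assuming that the hexadecimal value before the dot in each stamp (for example 4F97A3BD) is a count of seconds. That is an assumption about the trace format, not something stated in this APAR, and the helper names are hypothetical.

```python
def trace_seconds(stamp: str) -> int:
    """Return the seconds part of a trace stamp such as '4F97A3BD.015F'.

    Assumption: the hex value before the dot counts seconds.
    """
    seconds_hex, _, _sequence = stamp.partition(".")
    return int(seconds_hex, 16)


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two trace stamps (hypothetical helper)."""
    return trace_seconds(end) - trace_seconds(start)


wait_announced = "4F97A3BD.015F"   # "will wait 120 seconds before resend"
first_countdown = "4F97A3C4.0000"  # "approximate time remaining: 110 seconds"
shutdown = "4F97A40D.0004"         # first _imp_eipc_shutdown line

print(elapsed_seconds(wait_announced, first_countdown))  # 7
print(elapsed_seconds(wait_announced, shutdown))         # 80
```

Under that assumption, about 80 seconds pass between the "will wait 120 seconds" message and the shutdown of the old connection.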
    

Local fix

Problem summary

  • EIF: Error Code 67 is not handled while sending events.
    
    
    If error code 67 (connection is broken) is seen while sending
    events to the Event Integration Facility (EIF) receiver, the EIF
    sender ignores it and moves on to the following events, even
    though the events are not being received by the EIF receiver.
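
The effect on the cache can be illustrated with a small model. This is a sketch only, not the EIF source: the `EventCache` class and its `send_next` method are hypothetical, and only the head/tail offsets and the value 67 come from this APAR. It shows a head pointer that advances only when the send reports success.

```python
E_IPC_BROKEN = 67  # error code named in this APAR


class EventCache:
    """Toy model of the cache offsets shown in the trace (bytes)."""

    def __init__(self, head: int, tail: int) -> None:
        self.head = head
        self.tail = tail

    def send_next(self, event_size: int, send) -> bool:
        """Try to send the event at head; advance head only on success.

        `send` is any callable returning 0 on success or an error code.
        """
        error = send()
        if error != 0:
            # The event stays in the cache so that it can be resent.
            return False
        self.head += event_size
        return True


cache = EventCache(head=54, tail=746)

sent = cache.send_next(173, lambda: E_IPC_BROKEN)
print(sent, cache.head)  # False 54

sent = cache.send_next(173, lambda: 0)
print(sent, cache.head)  # True 227
```

In the trace reported above, head moved from 54 to 227 although error 67 was returned; with a guard like this one it would have stayed at 54 until the event was really sent.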
    

Problem conclusion

  • The code has been modified to check for this error code and take
    action accordingly.  The action is either to try to connect to a
    failover EIF receiver, if one is configured, or to keep the event
    in the cache file and mark it as unsent.  This ensures that events
    are not lost under this error condition.  This fix is not
    applicable to 32-bit UNIX/Linux or Windows platforms.
    
    
    The fix for this APAR is contained in the following maintenance
    packages:
    
      | fix pack | 6.2.3-TIV-ITM-FP0002
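
The two actions described in the conclusion can be sketched as follows. This is an illustration under stated assumptions, not the shipped fix: the `deliver` function, the receiver names and the `fake_send` stand-in are all hypothetical, and only the error code 67 comes from this APAR.

```python
from typing import Callable, List, Optional, Sequence

E_IPC_BROKEN = 67  # error code named in this APAR


def deliver(event: str,
            receivers: Sequence[str],
            send: Callable[[str, str], int]) -> Optional[str]:
    """Try each configured receiver in order.

    Returns the receiver that accepted the event, or None if every
    attempt failed and the event must stay in the cache as unsent.
    """
    for receiver in receivers:
        if send(receiver, event) == 0:
            return receiver
    return None


def fake_send(receiver: str, event: str) -> int:
    """Stand-in transport: the primary connection is broken."""
    return E_IPC_BROKEN if receiver == "primary" else 0


unsent: List[str] = []  # events kept for a later resend

for name, receivers in [("event-1", ["primary", "failover"]),
                        ("event-2", ["primary"])]:
    accepted_by = deliver(name, receivers, fake_send)
    if accepted_by is None:
        unsent.append(name)
    print(name, accepted_by)

print(unsent)  # ['event-2']
```

With a failover receiver configured, the event is delivered there; without one, it is kept and marked as unsent instead of being dropped.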
    

Temporary fix

Comments

APAR Information

  • APAR number

    IV21752

  • Reported component name

    TEC GUI INTEGRA

  • Reported component ID

    5724C04TG

  • Reported release

    622

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-05-25

  • Closed date

    2012-09-14

  • Last modified date

    2012-10-08

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    OA40438

Fix information

  • Fixed component name

    TEC GUI INTEGRA

  • Fixed component ID

    5724C04TG

Applicable component levels

  • R623 PSY

       UP


Document Information

Modified date:
08 October 2012