IBM Support

KAINT (keepalive) for an orphaned client channel in IBM MQ for z/OS

Troubleshooting


Problem

You observe a SVRCONN client channel in RUNNING state even though the network connection to the client has been lost. If the channel has uncommitted gets or puts, after 2 log archives you will see the following message:
 CSQJ160I CSQ1 CSQJLRUR LONG-RUNNING UOW FOUND, URID=urid CONNECTION NAME=CSQ1CHIN
or, at V6 and above, after 3 checkpoints:

 CSQR026I LONG-RUNNING UOW SHUNTED TO RBA=rba, URID=urid CONNECTION NAME=CSQ1CHIN
The client application is no longer available, but the client channel is in a TCP/IP receive state, as shown by a dump of the MSTR and CHIN address spaces or by the SUBSTATE field of DISPLAY CHSTATUS.

Symptom

Other symptoms might include one of these error messages in the CHIN log:

CSQX489E Maximum instance limit limit exceeded, channel channel-name connection conn-id 
CSQX490E Maximum client instance limit limit exceeded, channel channel-name connection conn-id
CSQX513E Current channel limit exceeded, channel channel-name connection conn-id 
CSQX573E Channel channel-name exceeded active channel limit
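To spot these symptoms quickly, the CHIN and MSTR job logs can be scanned for the message IDs listed above. The following Python sketch is illustrative only: it assumes the log is available as plain text lines and that message IDs appear as tokens such as CSQX489E (optionally preceded by "+", as in console output). The `Finding` and `scan_log` names are hypothetical, and the URID in the sample is made up.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Message IDs that this technote lists as signs of an orphaned client channel.
ORPHAN_SYMPTOMS = {
    "CSQJ160I": "long-running unit of work found",
    "CSQR026I": "long-running unit of work shunted",
    "CSQX489E": "maximum instance limit exceeded",
    "CSQX490E": "maximum client instance limit exceeded",
    "CSQX513E": "current channel limit exceeded",
    "CSQX573E": "active channel limit exceeded",
}

# Assumed shape of a message ID: CSQ + component letter + 3 digits + severity letter.
MSG_ID = re.compile(r"\b(CSQ[A-Z]\d{3}[A-Z])\b")
URID = re.compile(r"URID=([^\s,]+)")
CONNECTION = re.compile(r"CONNECTION NAME=(\S+)")


@dataclass
class Finding:
    line: int                  # 1-based line number in the scanned log
    msg_id: str
    meaning: str
    urid: Optional[str]        # present only on the CSQJ160I / CSQR026I messages
    connection: Optional[str]


def scan_log(lines: Iterable[str]) -> List[Finding]:
    """Return one Finding per symptom message ID found in the log lines."""
    findings = []
    for number, text in enumerate(lines, start=1):
        for msg_id in MSG_ID.findall(text):
            if msg_id not in ORPHAN_SYMPTOMS:
                continue  # e.g. CSQX599E is not in the symptom list above
            urid = URID.search(text)
            conn = CONNECTION.search(text)
            findings.append(Finding(
                line=number,
                msg_id=msg_id,
                meaning=ORPHAN_SYMPTOMS[msg_id],
                urid=urid.group(1) if urid else None,
                connection=conn.group(1) if conn else None,
            ))
    return findings


if __name__ == "__main__":
    sample = [
        " CSQJ160I CSQ1 CSQJLRUR LONG-RUNNING UOW FOUND, URID=0000ABCD1234 CONNECTION NAME=CSQ1CHIN",
        "CSQX573E Channel PC1.TO.CSQ1.SVRCONN exceeded active channel limit",
        "+CSQX599E +CSQ1 CSQXRESP Channel PC1.TO.CSQ1.SVRCONN ended abnormally",
    ]
    for f in scan_log(sample):
        print(f.line, f.msg_id, f.meaning, f.urid, f.connection)
```

A URID reported this way can then be matched against the output of DISPLAY CONN to find the client application that owns the unit of work.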

Cause

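TCP/IP has not notified the channel initiator that the connection is broken. Unless keepalive is active, TCP/IP does not probe an idle connection, so when a client disappears without closing its socket, the channel's outstanding receive never completes.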

Resolving The Problem

  • To back out the unit of work:

Stop the SVRCONN channel with MODE(FORCE).

WARNING: Archive logs as far back as the beginning of the UOW must be available to avoid an abend during backout of persistent messages. If the archive logs are not available, you will need to recycle the queue manager, commit the messages when message CSQR021D is issued, and then manually recover from the partially processed UOW. For WebSphere MQ V6, PK24888 must be applied to ensure that CSQR021D is issued correctly.
 

  • To avoid the need for manual intervention:
    • Activate TCP/IP Keepalive for the queue manager:

      ALTER QMGR TCPKEEP(YES)


      AND

    • Set the Keepalive Interval (KAINT) parameter for the SVRCONN definition:

      - For IBM MQ V8 and above, the default value of KAINT(AUTO) is appropriate in most cases. KAINT resolves to the negotiated HBINT+60 if HBINT is nonzero and to zero if HBINT is zero.

      - For WebSphere MQ V6 and V7, specify an integer in the KAINT (keepalive interval) parameter. For example, if HBINT is the default of 300 seconds, set KAINT to HBINT + 60:

           ALTER CHANNEL(channel_name) CHLTYPE(SVRCONN) KAINT(360)

      - For WebSphere MQ for z/OS V6 and above, heartbeats are supported for client channels. At V6 with PK51598 applied, and at V7, KAINT(AUTO) on a SVRCONN channel means that the default TCP INTERVAL value is used, rather than HBINT+60 as for other channel types.

      PM86354 is needed for the KAINT attribute to be set correctly for WebSphere MQ 7.1.0.

      - For WebSphere MQ for z/OS 5.3.1:
      The default setting of AUTO will not result in a timeout because it is based on HBINT, and heartbeats are not supported for client channels at that release.


      - For a DataPower client, specify a Cache Timeout value in the DataPower configuration that is greater than the negotiated heartbeat interval but less than the keepalive interval.


      The keepalive timeout results in the following messages:
      +CSQX208E +CSQ1 CSQXRESP Error receiving data,
      channel PC1.TO.CSQ1.SVRCONN,
      connection <ipname> (<ip addr>)
      (queue manager ????)
      TRPTYPE=TCP RC=00000461
      +CSQX599E +CSQ1 CSQXRESP Channel PC1.TO.CSQ1.SVRCONN ended abnormally



      The channel then backs out the unit of work: messages put as part of it are deleted, and messages retrieved as part of it are reinstated on the queue.
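The keepalive settings above reduce to simple arithmetic. The Python sketch below models the rules as this technote describes them; it is not MQ code. `effective_kaint` follows the V8-and-above KAINT(AUTO) rule (negotiated HBINT + 60, or 0 when HBINT is 0), `alter_kaint_command` builds the V6/V7 explicit-integer command in the same HBINT + 60 form as the 300/360 example, and `datapower_cache_timeout_ok` checks the DataPower Cache Timeout window. All three function names are hypothetical, and the V6/V7 meaning of KAINT(AUTO) on SVRCONN channels is deliberately not modeled.

```python
from typing import Optional

KEEPALIVE_MARGIN = 60  # seconds added to the negotiated HBINT, per the technote


def effective_kaint(negotiated_hbint: int, kaint: Optional[int] = None) -> int:
    """Keepalive interval in seconds for IBM MQ V8 and above.

    kaint=None models KAINT(AUTO); an integer models an explicit KAINT(n).
    """
    if negotiated_hbint < 0:
        raise ValueError("HBINT cannot be negative")
    if kaint is not None:
        return kaint
    # AUTO: HBINT + 60 when heartbeats are negotiated, otherwise 0.
    return negotiated_hbint + KEEPALIVE_MARGIN if negotiated_hbint > 0 else 0


def alter_kaint_command(channel_name: str, hbint: int) -> str:
    """MQSC text for the V6/V7 advice: set an explicit KAINT of HBINT + 60."""
    return (f"ALTER CHANNEL({channel_name}) CHLTYPE(SVRCONN) "
            f"KAINT({hbint + KEEPALIVE_MARGIN})")


def datapower_cache_timeout_ok(cache_timeout: int, negotiated_hbint: int,
                               keepalive_interval: int) -> bool:
    """True if the DataPower Cache Timeout lies strictly between HBINT and KAINT."""
    return negotiated_hbint < cache_timeout < keepalive_interval


if __name__ == "__main__":
    kaint = effective_kaint(300)  # default HBINT of 300 seconds
    print(kaint)
    print(alter_kaint_command("PC1.TO.CSQ1.SVRCONN", 300))
    print(datapower_cache_timeout_ok(330, 300, kaint))
```

With the default HBINT of 300, the sketch yields a keepalive interval of 360 seconds, matching the ALTER CHANNEL example, so a DataPower Cache Timeout of, for instance, 330 seconds falls inside the permitted window.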

Additional information regarding channel timeout functions
  • WebSphere MQ for z/OS V6 and above:

    Support for HBINT and DISCINT for SVRCONN channels was added.
    HBINT:
    On server-connection and client-connection channels, heartbeats flow only when a server MCA is waiting on an MQGET call with the WAIT option that it has issued on behalf of a client application. Keepalive does not time out the channel as long as the TCP/IP connection to the client still exists.

    DISCINT:
    The server-connection inactivity interval applies only between MQ API calls from a client, so no client is disconnected during a long-running MQGET-with-WAIT call. Take care not to disconnect clients unless they can handle the disconnection appropriately. Client disconnection is not useful for every application, but it is available for client applications that can benefit from it.

    Use DISCINT(0) (zero) on SVRCONN channels used by JMS clients, including WebSphere Application Server (WAS or WSAS). This avoids unexpected AMQ9208 errors, or MQJE001 with reason 2009 (MQRC_CONNECTION_BROKEN), on the client side. The corresponding error in the CHIN job log is CSQX259E. CSQX259E can have more than one root cause; one is a nonzero DISCINT value on the MQI (client) channel, which is acceptable in some cases but causes unexpected errors when connection pooling is in use.

    For WebSphere MQ 7.1.0, PI08703 is needed for DISCINT to be honored for a client channel with SHARECNV(0), and PI12316 is needed for SHARECNV>0.
    For IBM MQ 8.0.0, PI27504 is needed to prevent a premature timeout with SHARECNV and DISCINT nonzero and RCVTIME=0.

    At V6, the SUBSTATE parameter of DISPLAY CHSTATUS gives more detail about what the channel is waiting for, and the DISPLAY CONN command helps identify the application or IP address associated with a long-running unit of work.
  • At WebSphere MQ 7.0.0 and above:
    • If the channel is defined with a nonzero value for SHARECNV and the CHSTATUS has a nonzero value for CURSHCNV, client heartbeating is available whether the channel is in an MQGET call or not. Be aware of the performance implications of sharing conversations on client-connection channels.

      Apply PI62878 and PI62084 for V8, and PI68960 and PI69443 for V9.
       
    • RCVTIME and RCVTMIN apply to MQI channels that are sharing conversations. As described in PM65278, for MQI channels that use sharing conversations, the heartbeat interval used in the RCVTIME/RCVTMIN/RCVTTYPE calculation is 5 seconds greater than the negotiated heartbeat interval.

      SupportPac MD0C: WebSphere MQ - Keeping Channels Up and Running recommends
        /cpf ALTER QMGR RCVTTYPE(ADD) RCVTIME(60)
      where "cpf" is the command prefix for the queue manager.

      For example, with HBINT=60, RCVTTYPE=ADD, and RCVTIME=60, the timeout if a heartbeat response is not received is
         60 (HBINT) + 5 + 60 (RCVTIME) = 125 seconds

      PM70329 is needed for correct heartbeating for WebSphere MQ 7.0.1 and 7.1.0 if RCVTIME and channel read ahead are enabled.
       
    • WebSphere MQ 7.0.1 and above:
      Offers an automatic client reconnection feature.
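The heartbeat rules and the RCVTIME arithmetic in this section can be modeled in a few lines. In the Python sketch below, `client_heartbeats_can_flow` encodes the rule that heartbeats flow on a client channel only during an MQGET with WAIT, unless conversations are being shared (SHARECNV and CURSHCNV both nonzero, V7.0.0 and above), and `shared_conversation_receive_timeout` reproduces the RCVTTYPE(ADD) calculation including the 5-second adjustment from PM65278. The function names are hypothetical, the sketch assumes that no heartbeats flow when the negotiated HBINT is 0, and RCVTMIN and the other RCVTTYPE settings are deliberately not modeled.

```python
PM65278_ADJUSTMENT = 5  # seconds added to the negotiated HBINT for shared conversations


def client_heartbeats_can_flow(negotiated_hbint: int, sharecnv: int,
                               curshcnv: int, in_mqget_wait: bool) -> bool:
    """Whether heartbeats can currently flow on a client (MQI) channel."""
    if negotiated_hbint == 0:
        return False  # assumption: HBINT 0 means heartbeating is off
    if sharecnv > 0 and curshcnv > 0:
        return True   # V7.0.0 and above, sharing conversations: heartbeats at any time
    # Otherwise only while the server MCA waits in MQGET with WAIT for the client.
    return in_mqget_wait


def shared_conversation_receive_timeout(negotiated_hbint: int, rcvtime: int,
                                        rcvttype: str = "ADD") -> int:
    """Seconds to wait for a heartbeat response on a shared-conversation channel."""
    if rcvttype.upper() != "ADD":
        raise NotImplementedError("only RCVTTYPE(ADD) is modeled in this sketch")
    if negotiated_hbint < 0 or rcvtime < 0:
        raise ValueError("HBINT and RCVTIME cannot be negative")
    return negotiated_hbint + PM65278_ADJUSTMENT + rcvtime


if __name__ == "__main__":
    print(shared_conversation_receive_timeout(60, 60))  # the MD0C example
    print(client_heartbeats_can_flow(60, sharecnv=10, curshcnv=1, in_mqget_wait=False))
```

The first call reproduces the 125-second figure from the MD0C example. The second shows why sharing conversations matters for orphan detection: with CURSHCNV nonzero, heartbeats can detect a lost client even when the channel is not inside an MQGET call.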


IC98704 describes a problem with WebSphere MQ 7.1 and 7.5 client application connections not ending after the heartbeat timeout period.

PM84281 says that if you use CSQUTIL to build a client channel definition table (CCDT), CSQUTIL should be run against a queue manager of the same Version/Release/Modification.

Products: IBM MQ for z/OS 9.1, 9.0, 8.0; WebSphere MQ for z/OS 7.1, 7.0.1, 7.0, 6.0, 5.3.1. Component: Channels LU62 / TCP. Business unit: Hybrid Cloud.

Product Synonym

WMQ MQ

Document Information

Modified date:
25 November 2019

UID

swg21232484