Over the last couple of weeks, a couple of customers have reported problems sending data over channels.
The symptoms they saw included:
- Message +CSQX544E +MEMQ CSQXRCTL Messages for channel TO.REMOTE sent to dead-letter queue
- The SYSTEM.CLUSTER.TRANSMIT.QUEUE (SCTQ) was filling up
- No messages were arriving at the remote end
- The buffer pool used by the SCTQ was filling up, and messages were spilling to the page set.
The first problem is the CSQX544E message.
This shows a couple of problems:
- There was a problem with the put of a message to the queue at the remote end.
- An attempt was made to put the message to the Dead Letter Queue (DLQ) at the remote end, but this also failed.
- Because it could not be put to the DLQ at the remote end, it was put to the DLQ at the sender end.
You should do the following:
- Define a DLQ at the remote end
- Have a monitor to tell you if any messages arrive there
- Look at any messages and see why the put failed
- Fix these problems.
The SCTQ filled up
Suppose a transmission queue (XMITQ) holds messages destined for queues QA, QA, QB, QA, QC, QA, QA, QC, in that order.
- The sender end gets the next message from the XMITQ and sends it over the network
- The receiver end does an MQPUT to QA
- It does the same for the next messages, destined for QA, QB, and QA
- When it processes the message for QC, the MQPUT fails with MQRC_Q_FULL (or perhaps the message is too big for the queue)
- If you have enabled message retry with MRTMR and MRRTY on the channel, the channel waits for MRTMR milliseconds and tries again - it gets MQRC_Q_FULL again - so it waits again
- It retries MRRTY times before trying to put the message to the DLQ.
- If the put to the DLQ also fails, it tells the sender end, which then puts the message on its own DLQ.
If you had MRTMR set to 1000 ms and MRRTY of 10 - then the channel will pause for 10 seconds.
While this retry is happening, the messages in the XMITQ are not being processed as the channel is attempting to put messages to the problem queue.
As a result the queue depth will build up.
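The blocking effect described above can be sketched with a small simulation. This is only a rough model, not how the channel code works: it assumes a successful put takes no time, ignores network time and batching, and assumes every put to a problem queue uses all of its retries. The names `ChannelRetry` and `simulate_channel` are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ChannelRetry:
    mrtmr_ms: int  # MRTMR: wait between retries, in milliseconds
    mrrty: int     # MRRTY: number of retries before trying the DLQ


def simulate_channel(xmitq, problem_queues, retry):
    """Return (destination, elapsed_ms) for each message, in XMITQ order.

    elapsed_ms is the simulated time at which the channel finished with
    that message. Messages are handled strictly one after another, so a
    pause on one message delays every message behind it.
    """
    clock_ms = 0
    timeline = []
    for dest in xmitq:
        if dest in problem_queues:
            # Every retry fails, so the channel waits MRTMR ms, MRRTY times.
            clock_ms += retry.mrrty * retry.mrtmr_ms
        timeline.append((dest, clock_ms))
    return timeline


xmitq = ["QA", "QA", "QB", "QA", "QC", "QA", "QA", "QC"]
timeline = simulate_channel(xmitq, {"QC"}, ChannelRetry(mrtmr_ms=1000, mrrty=10))
for dest, elapsed_ms in timeline:
    print(f"{dest}: done after {elapsed_ms / 1000:.0f} s")
```

In this model the two QA messages queued behind the first QC message are not delivered until the 10 second retry cycle has finished, even though there is nothing wrong with QA.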
The symptoms you will see are:
- Messages not flowing
- The queue on the remote end is empty
- The SCTQ starts to fill up
- The buffer pool with the SCTQ starts to fill up
- Your buffer pool stats show lots of reads and writes to the page set
If you had 10 messages for queue QC, then your channel can pause for 10 * 10 seconds = 100 seconds, which is a very long time for a channel to be doing no useful work.
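As a rough check of this arithmetic, the worst-case pause can be worked out from the number of failing messages and the two retry settings. This is a sketch that assumes every failing message uses all of its retries; the function name `worst_case_pause_seconds` is made up for this example.

```python
def worst_case_pause_seconds(failing_messages, mrtmr_ms, mrrty):
    """Total time the channel spends waiting on retries, in seconds.

    Assumes each failing message is retried MRRTY times, with a wait of
    MRTMR milliseconds before each retry.
    """
    return failing_messages * mrrty * mrtmr_ms / 1000


# One message for QC with MRTMR(1000) and MRRTY(10)
one_message = worst_case_pause_seconds(1, mrtmr_ms=1000, mrrty=10)
# Ten messages for QC with the same settings
ten_messages = worst_case_pause_seconds(10, mrtmr_ms=1000, mrrty=10)
print(one_message, ten_messages)
```

Trying a few values of MRTMR and MRRTY in this way shows how quickly the pause grows with the number of messages stuck behind a full queue.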
The root cause of the delays is that the channel had problems putting messages to queue QC. Messages for other queues were delayed until the problem with QC was resolved.
So like many MQ problems - what looks like a problem in one area (QA) is actually caused by something else (QC)!
What can you do?
- Define the DLQ and monitor it - on each queue manager
- Consider splitting your SCTQ to isolate queues - for example isolate QA from QC. See here