Resolve the problem of an MQTT client
program failing to connect to the telemetry (MQXR) service.
Before you begin
Is the problem at the server, at the client, or with the connection? Have you written your
own MQTT v3 protocol handling client, or an MQTT client application using the C or Java MQTT clients?
See Verifying the
installation of MQ Telemetry for further information,
and check that the telemetry channel and telemetry (MQXR) service are running correctly.
About this task
There are a number of reasons why an MQTT client might fail to connect to the telemetry server,
or why you might conclude that it has not connected.
Procedure
-
Consider what inferences can be drawn from the reason code that the telemetry (MQXR) service
returned to MqttClient.Connect. What type of connection failure is it?
Option | Description
REASON_CODE_INVALID_PROTOCOL_VERSION | Make sure that the socket address corresponds to a telemetry channel, and you have not used the same socket address for another broker.
REASON_CODE_INVALID_CLIENT_ID | Check that the client identifier is no longer than 23 bytes, and contains only characters from the range: A-Z, a-z, 0-9, './_%
REASON_CODE_SERVER_CONNECT_ERROR | Check that the telemetry (MQXR) service and the queue manager are running normally. Use netstat to check that the socket address is not allocated to another application.
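The client identifier rule in the table can be checked before the client ever calls connect. The following is a minimal sketch, not part of MQ Telemetry: the class and method names are hypothetical, the permitted character set is taken literally from the table row, the length is measured in UTF-8 bytes, and rejecting an empty identifier is an added assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of any MQ Telemetry or MQTT client library.
final class ClientIdCheck {
    // Punctuation listed in the table, in addition to A-Z, a-z and 0-9.
    private static final String EXTRA_CHARS = "'./_%";
    private static final int MAX_BYTES = 23;

    /** Returns null if the identifier passes the checks, otherwise a short reason. */
    static String problemWith(String clientId) {
        // Assumption: an empty identifier is treated as invalid.
        if (clientId == null || clientId.isEmpty()) {
            return "client identifier is empty";
        }
        int length = clientId.getBytes(StandardCharsets.UTF_8).length;
        if (length > MAX_BYTES) {
            return "client identifier is " + length + " bytes; the limit is " + MAX_BYTES;
        }
        for (int i = 0; i < clientId.length(); i++) {
            char c = clientId.charAt(i);
            boolean permitted = (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || EXTRA_CHARS.indexOf(c) >= 0;
            if (!permitted) {
                return "character '" + c + "' at index " + i + " is outside the permitted range";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String id : new String[] {"sensor_01/eu", "bad id", "abcdefghijklmnopqrstuvwx"}) {
            String problem = problemWith(id);
            System.out.println(id + ": " + (problem == null ? "looks acceptable" : problem));
        }
    }
}
```

A check like this only narrows down a REASON_CODE_INVALID_CLIENT_ID failure on the client side; the telemetry (MQXR) service remains the authority on which identifiers it accepts.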
If you have written an MQTT client library rather than using one of the libraries provided by
MQ Telemetry, look at the CONNACK return code.
From these three errors you can infer that the client has connected to the telemetry (MQXR)
service, but the service has found an error.
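For a hand-written client, the CONNACK return code mentioned above can be decoded directly from the packet. The sketch below assumes the MQTT v3.1 layout: a 4-byte packet consisting of a fixed header (message type 2, remaining length 2), one reserved byte, and the return code. The class name is hypothetical, and the descriptions paraphrase the return code meanings in the MQTT v3.1 specification; they are not MQ Telemetry message text.

```java
// Hypothetical helper for a hand-written MQTT v3.1 client.
final class ConnackDecoder {
    private static final int CONNACK_TYPE = 2;

    /** Describes the return code carried in a raw 4-byte CONNACK packet. */
    static String describe(byte[] packet) {
        if (packet == null || packet.length < 4) {
            throw new IllegalArgumentException("a CONNACK packet is 4 bytes long");
        }
        // The message type is held in the top four bits of the first byte.
        int type = (packet[0] & 0xF0) >> 4;
        if (type != CONNACK_TYPE) {
            throw new IllegalArgumentException("not a CONNACK packet, message type " + type);
        }
        if (packet[1] != 2) {
            throw new IllegalArgumentException("unexpected remaining length " + packet[1]);
        }
        // packet[2] is reserved in MQTT v3.1; packet[3] is the return code.
        int returnCode = packet[3] & 0xFF;
        switch (returnCode) {
            case 0: return "connection accepted";
            case 1: return "connection refused: unacceptable protocol version";
            case 2: return "connection refused: identifier rejected";
            case 3: return "connection refused: server unavailable";
            case 4: return "connection refused: bad user name or password";
            case 5: return "connection refused: not authorized";
            default: return "reserved return code " + returnCode;
        }
    }

    public static void main(String[] args) {
        byte[] refused = {0x20, 0x02, 0x00, 0x02};
        System.out.println(describe(refused));
    }
}
```

Any non-zero return code means that the server received the CONNECT packet and rejected it, which is the same inference drawn from the three reason codes in the table.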
-
Consider what inferences can be drawn from the reason codes that the client produces when the
telemetry (MQXR) service does not respond:
Option | Description
REASON_CODE_CLIENT_EXCEPTION, REASON_CODE_CLIENT_TIMEOUT | Look for an FDC file at the server; see Server-side logs. When the telemetry (MQXR) service detects the client has timed out, it writes a first-failure data capture (FDC) file. It writes an FDC file whenever the connection is unexpectedly broken.
The telemetry (MQXR) service might not have responded to the client before the timeout at the
client expired. The MQ Telemetry Java client hangs only if the application has set an
indefinite timeout. Otherwise, the client throws one of these exceptions when the timeout set for
MqttClient.Connect expires with an undiagnosed connection problem.
Unless you find an FDC file that correlates with the connection failure, you cannot infer that the
client tried to connect to the server:
-
Confirm that the client sent a connection request.
-
Does the remote socket address used by the client match the socket address defined for the
telemetry channel?
The default file persistence class in the Java SE MQTT client supplied with IBM® MQ Telemetry creates a folder with the name: clientIdentifier-tcphostNameport or clientIdentifier-sslhostNameport in the client working directory. The folder name tells you the hostName and port used in the connection attempt; see Client-side log files and client-side configuration files.
-
Can you ping the remote server address?
-
Does netstat on the server show that the telemetry channel is running on the port
the client is connecting to?
-
Check whether the telemetry (MQXR) service found a problem in the client request.
The telemetry (MQXR) service writes errors it detects into mqxr_n.log, and
the queue manager writes errors into AMQERR01.LOG.
-
Attempt to isolate the problem by running another client.
Run the sample programs on the server platform to eliminate uncertainties about the network
connection, then run the samples on the client platform.
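Before running a full client, a plain TCP connect from the client platform can show whether the socket address of the telemetry channel is reachable at all. The following is a minimal sketch using only the Java standard library; the class name, the default host, and the 5-second timeout are assumptions for illustration, and port 1883 is the conventional MQTT port rather than a value taken from your channel definition.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical probe; it opens and closes a TCP connection and sends no MQTT data.
final class ChannelProbe {
    /** Returns true if a TCP connection to host:port is established within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout and unknown host all arrive here.
            System.err.println("Cannot reach " + host + ":" + port + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass the host and port of your telemetry channel instead.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1883;
        boolean reachable = canConnect(host, port, 5000);
        System.out.println(host + ":" + port + (reachable ? " is reachable" : " is not reachable"));
    }
}
```

A successful connect shows only that something is listening on that address. It does not show that the listener is a telemetry channel, so the netstat check on the server is still needed.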
-
Other things to check:
-
Are tens of thousands of MQTT clients trying to
connect at the same time?
Telemetry channels have a queue to buffer a backlog of incoming connections. Connections are
processed at a rate in excess of 10,000 a second. The size of the backlog buffer is configurable using the
telemetry channel wizard in IBM MQ Explorer. Its default
size is 4096. Check that the backlog has not been configured to a low value.
-
Are the telemetry (MQXR) service and queue manager still running?
-
Has the client connected to a high availability queue manager that has switched its TCP/IP
address?
-
Is a firewall selectively filtering outbound or return data packets?