One of the most common errors to troubleshoot on DataPower looks like this:
Wed Mar 22 2017 22:31:49 [0x01130006][mpgw][error] mpgw(TEST_MPGW): tid(38886775)[error][9.xx.xx.xx]: Failed to establish a backside connection
In this blog post, I will explore the meaning of this error message through typical scenarios and best-practice troubleshooting techniques.
I. What does it mean?
The "Failed to establish a backside connection" message typically indicates that a connection attempt to the backend failed or timed out.
When a TCP connection is established, a three-way handshake takes place. The handshake consists of the following:
- Client sends a frame with the SYN flag set
- Server sends a frame response with the SYN/ACK flags set
- Client sends a frame with the ACK flag set
For example, if you were to run a packet capture, you would see these three frames (SYN, SYN/ACK, ACK) in sequence at the start of the connection.
Once the TCP handshake has completed, transaction data can be sent across this connection.
If there is a problem with this TCP handshake, or the backend server does not respond to a request in time, DataPower will throw the "Failed to establish" error message.
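To make the handshake behavior concrete, here is a minimal Python sketch that attempts a TCP connection and reports whether the handshake completed within a timeout. The function name `probe_handshake` and the 3-second default are my own assumptions for illustration. Also note that this runs from whatever machine executes the script, not from the DataPower appliance, so it only approximates what DataPower sees on its own network path.

```python
import socket
import time


def probe_handshake(host, port, timeout=3.0):
    """Try to open a TCP connection; return (ok, seconds_elapsed, detail).

    Hypothetical helper for illustration only, not a DataPower API.
    """
    start = time.monotonic()
    try:
        # create_connection() returns only after the three-way handshake
        # (SYN, SYN/ACK, ACK) has completed.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, "handshake completed"
    except OSError as exc:
        # Covers timeouts, refused connections, and unreachable networks.
        return False, time.monotonic() - start, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Use a local listener so the example runs without a real backend.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    print(probe_handshake("127.0.0.1", port))  # listener is up
    listener.close()
    print(probe_handshake("127.0.0.1", port))  # listener is gone
```

A failed probe that returns almost immediately usually means something actively rejected the connection, while one that takes the full timeout suggests the SYN was never answered.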
It will be helpful to understand some of the typical scenarios that lead to the "Failed to establish" error message.
II. Typical Scenarios that cause the "Failed to establish" error message
The "Failed to establish" error message can be triggered from a variety of failure scenarios, for example:
- Problems with DNS lookup: If DataPower cannot resolve your hostname to an IP address, then it will not know where to send its request.
- Problems with the routing table: If DataPower doesn't have a defined path to the next hop on your network, then the SYN packet will be sent out on the network and lost. Chris Sloan has written a great blog post covering the routing table on DataPower; see "IBM DataPower Gateways - Understanding Network Routing".
- Something downstream on the network dropping packets: If there is, say, a firewall between DataPower and the backend that has not been configured to allow traffic between DataPower and the backend port, it can simply drop the packets sent by DataPower.
- Problems with the backend itself: The backend application or device may be overloaded, which causes it to respond very slowly or not at all.
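The scenarios above tend to fail at different stages, which a small probe can help tell apart. The following Python sketch separates name resolution from the connection attempt and maps the outcome to a rough category. The function name `classify_backend_failure`, the category labels, and the hints are all assumptions of mine; the mapping is a heuristic rather than a definitive diagnosis, and the result reflects the network path from the machine running the script, not from DataPower itself.

```python
import errno
import socket


def classify_backend_failure(host, port, timeout=3.0):
    """Return (category, hint) for a connection attempt to host:port.

    Hypothetical helper; categories are rough heuristics only.
    """
    # Stage 1: DNS lookup. A failure here means no SYN is ever sent.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return "dns", f"could not resolve {host!r}: {exc}"

    # Stage 2: TCP handshake against the first resolved address.
    family, socktype, proto, _, sockaddr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except socket.timeout:
        # No SYN/ACK arrived: packets may be dropped (firewall, bad route)
        # or the backend may be too overloaded to answer.
        return "timeout", "no reply to SYN within the timeout"
    except ConnectionRefusedError:
        # A reset came back: the host is reachable but nothing is listening.
        return "refused", "host reachable, but the port is not accepting"
    except OSError as exc:
        if exc.errno in (errno.ENETUNREACH, errno.EHOSTUNREACH):
            return "unreachable", "no usable route to the backend"
        return "other", str(exc)
    else:
        return "ok", "handshake completed"
    finally:
        sock.close()


if __name__ == "__main__":
    # "backend.example.com" and port 8080 are placeholders, not real values.
    print(classify_backend_failure("backend.example.com", 8080))
```

Checking DNS before connecting matters because a resolution failure and a dropped SYN both end in the same DataPower error, yet they point to very different fixes.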
Now that we have an understanding of expected TCP connectivity behavior and knowledge of the typical causes of this error message, we can move on to troubleshooting.
III. Troubleshooting the "Failed to establish" error message
There are several possible causes for the "Failed to establish" error message. This means that the context surrounding the error message is paramount to finding the root cause.
The absolute best set of data to gather for this scenario would be:
- The Error Report from the default domain: Think of the error report as a dump that contains the current state of the device. In relation to the failed connection error message, we can find information regarding interface configuration, routing table, DNS configuration, DNS hostname caching status, service state and availability, object configurations, network connection status, and hardware states. We want the error report to come from the default domain because it will also contain logs, states, and configurations from objects in all application domains.
- Debug-Level Logs: You can create a custom log target to gather debug-level logs from the default domain. Again, we ideally want to get the logs from the default domain because the default domain contains the networking configuration and will contain more detailed logs regarding networking issues and states. Also, the default domain has visibility of all other domains, so your application domain's logs will also be included.
- A Packet Capture: The packet capture will be the most important piece of information to gather while troubleshooting network connection issues. You can set a filter on the packet capture for the hostname of the backend to which the connection attempt is made. This reduces the amount of 'noise' in the packet capture and gives a clear picture of what is happening at the network level as your connection attempt is put on the wire.
These three sets of data paint the big picture of the networking configuration and behavior on the DataPower device while the error scenario happens, and they allow for a much quicker and more complete interpretation of the cause of the "Failed to establish" error message.
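Once you have exported the logs, a short script can help you find which services are hitting the error and how often. The Python sketch below parses lines shaped like the sample at the top of this post and counts backside connection failures per service. The regular expression is inferred from that single sample line, so treat it as an assumption: other log formats or targets may need a different pattern. The names `LOG_LINE` and `backside_failures` are mine.

```python
import re
from collections import Counter

# Pattern inferred from one sample log line; an assumption, not a spec.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<code>0x[0-9a-fA-F]+)\]\[(?P<category>[^\]]+)\]\[(?P<level>[^\]]+)\] "
    r"(?P<service_type>[\w-]+)\((?P<service>[^)]+)\): "
    r"tid\((?P<tid>\d+)\)(?:\[[^\]]*\])*: "
    r"(?P<message>.*)$"
)

ERROR_TEXT = "Failed to establish a backside connection"


def backside_failures(lines):
    """Count backside connection failures per service name."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Lines that do not fit the assumed format are skipped silently.
        if match and ERROR_TEXT in match["message"]:
            counts[match["service"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Wed Mar 22 2017 22:31:49 [0x01130006][mpgw][error] mpgw(TEST_MPGW): "
        "tid(38886775)[error][9.xx.xx.xx]: "
        "Failed to establish a backside connection",
    ]
    for service, count in backside_failures(sample).items():
        print(service, count)
```

The same match object also exposes the timestamp and transaction ID (`tid`), which are useful for lining up a log entry with the corresponding frames in the packet capture.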