However, much to my surprise, if we took down any of the application servers in the cluster of this very large cell I saw an anomaly. When the plug-in was attempting to route traffic to the downed application servers there seemed to be a really long lag on the connection refused processing. In fact, I was seeing least a second to get through the TCP/IP roundtrip. This made no sense to me. One of my colleagues, Keys Botzum, took a Java application and ran it on both Windows and Solaris. The application simply tried to connect to localhost (to eliminate any DNS lookups or network latency from the test) on a port no one was listening to and looped around 20 times. On Windows the test took slightly over 20 seconds. On Solaris, less than a second (which was the behaviour I was expecting on Windows).
If you are, or planning to, use Microsoft Windows on the IHS tier be aware of this strange failure scenario on Windows. I'll try to investigate and see if there are any Windows settings to help tune this. Though the plan is to move off Windows to Redhat Linux which right now sounds like the right move to me.