Out-of-sequence messages over TCP on AIX 6.1

2012-06-23T13:43:58Z

We have an application in which several tasks exchange messages over TCP. It was originally developed on Solaris (the last version we ran it on is Solaris 5.10), and we had no issues there. The application has now been ported to AIX, where we are running into several problems we never saw on Solaris. For instance, we saw large delays in message delivery (related to Nagle and delayed ACKs, even though Nagle is turned off, but that is a different story), which we managed to work around. However, we are still seeing messages received out of sequence by tasks. Note that these are actual application messages arriving out of sequence, not just TCP packets. The tasks are multi-threaded, but all writing and reading happens in a single thread.

Are there any commonly known misconfigurations or porting errors on AIX 6.1 that can cause out-of-sequence messages? We have explored every option we could think of, to no avail. The application code appears to be thread-safe, and we cannot reproduce the problem on Solaris. Is there some subtle non-determinism on AIX that doesn't exist on Solaris? Or could this be related to the other issue we are seeing, the delayed messages due to Nagle/delayed ACKs?
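
One way to narrow this down might be to stamp each message with a sequence number at the sender and check it immediately after the socket read. Since TCP delivers bytes in order within a single connection, a mismatch at that point would suggest a framing or multi-connection problem, while a clean result would point at reordering after the read. Below is a minimal Python sketch of such a check. The wire format (4-byte sequence number, 4-byte length, payload) and all function names are assumptions made for illustration, not our actual protocol.

```python
import socket
import struct

# Hypothetical wire format (an assumption, not the application's real one):
# 4-byte big-endian sequence number, 4-byte big-endian payload length, payload.
HEADER = struct.Struct("!II")


def send_message(sock, seq, payload):
    """Send one framed message; sendall loops until every byte is written."""
    sock.sendall(HEADER.pack(seq, len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Read one framed message and return (sequence number, payload)."""
    seq, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return seq, recv_exact(sock, length)


def check_order(sock, count, first_seq=0):
    """Read `count` messages and return a list of (expected, got) mismatches."""
    mismatches = []
    expected = first_seq
    for _ in range(count):
        seq, _payload = recv_message(sock)
        if seq != expected:
            mismatches.append((expected, seq))
        expected += 1  # keep counting by position, not by what arrived
    return mismatches
```

Running `check_order` on the reader side right after the read, before messages are handed to any other thread or queue, would show whether the sequence is already broken at the socket boundary.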