The customer was running Sterling Integrator version 5263. All file transfers were failing and the application was extremely slow, with high queue depths. The client had previously brought node1 down because of thread starvation, and node2 was showing the same behavior. FTP threads were hanging indefinitely on get or delete operations, and the queues were clogged with these threads. The client restarted both nodes, but traffic kept flowing. A number of FTP processes were hanging in IBM Sterling B2B Integrator.
We focused on one particular BP. Its BP Detail showed that FTP Client Begin, CD, and LIST (NLST *) all ran at 12:02. Step 12, an Assign, ran from 12:02:33 to 12:02:33, but Step 13, the Decision Engine Service, ran from 12:32:52 to 12:32:52, a gap of about 30 minutes between the two steps. Many of the customer's other BPs showed the same 30-minute wait. We looked at the active threads in the JVM and saw that all of the hanging threads were waiting on writes to the file system. The system was trying to persist various documents to the shared file repository, and the writes were hanging for a long time; some threads had been writing a single file for over 15 minutes. This led to timeouts in the FTP sessions, because storing the data took so long before the BP could move to the next step.
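The step-to-step gap described above can be checked mechanically. The following is a minimal Python sketch, assuming the step times have been copied by hand from the BP Detail page into a list; the `steps` structure, the `find_gaps` helper, and the 5-minute threshold are hypothetical and are not part of Sterling B2B Integrator. It also assumes all steps ran on the same day.

```python
from datetime import datetime, timedelta

# Hypothetical step records copied by hand from a BP Detail page:
# (step number, service name, start time as HH:MM:SS).
# Steps 12 and 13 are the ones from the case described above.
steps = [
    (12, "Assign", "12:02:33"),
    (13, "Decision Engine Service", "12:32:52"),
]

def find_gaps(steps, threshold=timedelta(minutes=5)):
    """Return (previous step, next step, gap) for each gap above threshold."""
    gaps = []
    # Pair each step with the one that follows it.
    for (prev_no, _, prev_t), (next_no, _, next_t) in zip(steps, steps[1:]):
        start = datetime.strptime(prev_t, "%H:%M:%S")
        end = datetime.strptime(next_t, "%H:%M:%S")
        gap = end - start
        if gap > threshold:
            gaps.append((prev_no, next_no, gap))
    return gaps

for prev_no, next_no, gap in find_gaps(steps):
    print(f"Step {prev_no} -> Step {next_no}: {gap}")
```

For the two steps above, the gap is 30 minutes 19 seconds.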
We took multiple thread dumps and found that FTPBeginSessionService was hung and not coming out of the begin session. We analyzed multiple BPs and found that FTPClientBeginSessionService waits indefinitely when DelayWaitingOnIO is set to "-1". We found a technote that describes this behavior.
This is working as designed. After we changed the value of DelayWaitingOnIO to 0, the issue was resolved. The file transfer operations in the BPs now go into the "WaitingOnIO" state and, after some time, fail because the FTP server was down. After the change, there was no queue depth and files were processing.
Here are the valid values for DelayWaitingOnIO:
1) Positive integer : The number of seconds the business process waits for a response from the FTP server before going to WAITING_ON_IO state.
2) 0 : The business process goes to WAITING_ON_IO state after sending a request to the FTP server.
3) -1 : The business process waits for the response from the FTP server to complete. The business process does not go to WAITING_ON_IO state.
4) Less than -1 : The parameter value is set to 0 (default value).
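The four cases above can be restated as a small Python sketch. This is only an illustration of the documented rules, not product code; the function name `describe_delay_waiting_on_io` is hypothetical.

```python
def describe_delay_waiting_on_io(value: int) -> str:
    """Describe the documented behavior for a DelayWaitingOnIO value."""
    if value < -1:
        # Documented rule 4: values below -1 are treated as 0 (the default).
        value = 0
    if value == -1:
        # Rule 3: the BP blocks on the FTP server and never yields its thread.
        return "wait for the FTP response to complete; never enter WAITING_ON_IO"
    if value == 0:
        # Rule 2: the BP yields immediately after sending the request.
        return "enter WAITING_ON_IO right after sending the request"
    # Rule 1: positive values are a wait time in seconds.
    return f"wait up to {value} s for a response, then enter WAITING_ON_IO"

for v in (30, 0, -1, -5):
    print(v, "->", describe_delay_waiting_on_io(v))
```

The -1 case is the one that matched the hang in this case: the business process holds its thread until the FTP server answers, so an unresponsive server can tie up threads indefinitely.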