SystemAdmin

ssh timeouts in JobRunner

2013-03-20T21:40:27Z
Has anyone had trouble with ssh timeouts when using the FileTransfer mechanism in JobRunner?

I've written a simple JobRunner XML script that does the exact same ssh file transfer 4 times in a row. Sometimes, the first one fails and the other 3 succeed; other times, they all succeed. Here's what one of the 4 steps looks like:

<Step id="FileTransferViaSSH1" description="File transfer 1" type="Process" programName="FileTransfer" programType="java" active="true">
  <Parameters>
    <Parameter type="ssh"/>
    <Parameter serverName="my.remote.server.hostname"/>
    <Parameter userId="root"/>
    <Parameter userPassword="myencryptedrsapassphrase"/>
    <Parameter KeyStoreFileName="/root/.ssh/id_rsa"/>
    <Parameter from="ssh:///var/IBM/TSAM/metering/dummy.txt" to="file:///tmp/dummy.txt" action="Copy" overwrite="true"/>
  </Parameters>
</Step>

When it fails, the error looks like this:

3/20/13 16:53:34.309: INFORMATION COPY 'ssh:///var/IBM/TSAM/metering/dummy.txt' TO 'file:///tmp/dummy.txt' (overwrite: true)
3/20/13 16:53:34.309: INFORMATION Key store File Name: /root/.ssh/id_rsa
3/20/13 16:54:04.418: INFORMATION AUCCM5009E An attempt to connect to the following server failed: my.remote.server.hostname. Verify that the server is running, and then check all the required parameters: server name, port number, proxy server name, proxy port, proxy user ID and password.
3/20/13 16:54:04.421: ERROR AUCJR0013E An attempt to run the following step failed: FileTransferViaSSH1. Check the trace messages for additional details.

Notice the gap between the 2nd and 3rd messages: 30.109 seconds, almost exactly 30. That's why I think some kind of timeout is occurring. I doubt the step itself is coded wrong, because the identical step is repeated 4 times, and sometimes all 4 transfers work and sometimes 3 out of 4 work. When it does work, each step takes roughly 2 seconds, not 30.
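For anyone who wants to check the gap against their own logs, here is a minimal Python sketch that computes the elapsed time between two JobRunner log timestamps. The timestamp layout (`M/D/YY HH:MM:SS.mmm`) is inferred from the excerpt above, and the helper name `seconds_between` is made up for illustration.

```python
from datetime import datetime

# Timestamp layout inferred from the log excerpt above (an assumption):
# month/day/2-digit year, then HH:MM:SS with fractional seconds.
LOG_TIME_FORMAT = "%m/%d/%y %H:%M:%S.%f"


def seconds_between(earlier: str, later: str) -> float:
    """Return the elapsed seconds between two log timestamps."""
    start = datetime.strptime(earlier, LOG_TIME_FORMAT)
    end = datetime.strptime(later, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


# The "Key store File Name" message and the AUCCM5009E message from the failed run.
gap = seconds_between("3/20/13 16:53:34.309", "3/20/13 16:54:04.418")
print(f"{gap:.3f} s")  # prints "30.109 s"
```

A gap that lands on a round number like 30 seconds across several failed runs would point toward a fixed timeout rather than a slow but working transfer.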

I know of no way to configure an ssh time limit for TUAM. Does it just use the underlying ssh mechanism of the OS (I'm running SuSE Linux), or does it have its own Java-based ssh? It would not surprise me if the latter is the case, because I set up a similar job that ran a Perl script that invoked the OS ssh command, and it never failed. One other difference: with the Perl script, I was using an ssh key without a passphrase. With the TUAM FileTransfer, the ssh key has a passphrase, which is stored encrypted in the JobRunner XML.

In any case, it's acting as if the remote ssh daemon is "sleepy" and has to be nudged awake (silly, but that's how it looks). This could also be a network issue, which would be way over my head. I've seen cases in other environments where a network configuration issue caused communications to sometimes work and sometimes fail. But if that's the case, why did my Perl/ssh approach never fail?
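One way to test the "sleepy daemon" idea independently of both TUAM and the OS ssh client is to time how long the remote host takes to send its first line after a plain TCP connection opens. SSH servers normally send an identification line starting with `SSH-` right away. The sketch below is only a diagnostic under that assumption; the function name `time_ssh_banner`, the default port 22, and the 35-second timeout are choices made for illustration, not anything taken from TUAM.

```python
import socket
import time


def time_ssh_banner(host: str, port: int = 22, timeout: float = 35.0):
    """Connect to host:port and return (seconds_elapsed, first_line).

    Measures the time from starting the TCP connection until the server
    has sent its first line of text (normally the "SSH-..." identification).
    Raises OSError (including socket.timeout) if the connection or the
    read does not complete within `timeout` seconds.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        data = b""
        # Read until the first newline or until the peer closes the connection.
        while b"\n" not in data:
            chunk = sock.recv(256)
            if not chunk:
                break
            data += chunk
    elapsed = time.monotonic() - start
    first_line = data.split(b"\n", 1)[0].decode("ascii", errors="replace").strip()
    return elapsed, first_line
```

Calling `time_ssh_banner("my.remote.server.hostname")` a few times in a row from the TUAM server, ideally after the connection has been idle for a while, would show whether the first attempt is noticeably slower than the rest. If the first line always arrives quickly, the delay is more likely in the authentication or client side than in the network path or the daemon's initial response.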