Timeouts when using viosvrcmd on the HMC to launch ios backups
talor
This week I found that, after updating our HMC and VIOS, our VIO mksysb (backupios) backups started failing.
Our HMC is running HMC code 7.7.3.
The mksysb backups are scripted from our NIM server, where we do the following to perform the backup:
- Set up SSH keys to the HMC
Below is basically what the script does, with the error checking parts omitted:
ssh $HMCUSER@$HMC "viosvrcmd -m $SERVER --id $HOST_ID -c 'mount /mksysb'"
HMCUSER would be hscroot and we have passwordless SSH configured.
This works very well, and in a restore we can just extract the SPOT resource from the mksysb and restore our VIO server.
After the update we started getting the errors below in our backups for some (not all) of our VIO servers.
HSCL2970 The IOServer command has failed because of the following reason:
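Since the error checking was left out of the script above, here is a minimal sketch of how the script could spot this failure in the captured output. The function name has_ioserver_error is hypothetical, and the sketch assumes the HMC message has been captured into a variable; it only looks for the HSCL2970 code quoted above and does not try to parse the reason text.

```shell
# Hypothetical helper: succeeds (returns 0) when the captured viosvrcmd
# output contains the HSCL2970 error code.
has_ioserver_error() {
    printf '%s\n' "$1" | grep -q 'HSCL2970'
}

# Sample message for illustration; in a real script this would be the
# captured output of the ssh/viosvrcmd call.
out='HSCL2970 The IOServer command has failed because of the following reason:'

if has_ioserver_error "$out"; then
    echo "backup failed: HMC reported HSCL2970"
fi
```

A check like this only tells you that the HMC reported a failure; the reason text that follows the code still has to be read to confirm it was the timeout.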
So I ran the backup manually on the VIO server, and it worked fine. The error tells us that a 600-second (10-minute) limit is what's killing the backup.
We found that any backupios backup that we ran that took longer than 10 mins failed.
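To see which VIO servers are close to the limit, the elapsed time of each backup can be compared against the 600 seconds. The sketch below is one way to do that, not what our script actually does: check_duration, timed_run, and the 80% warning threshold are all hypothetical, and it assumes a shell that provides the SECONDS variable (ksh or bash).

```shell
# 600 seconds is the limit reported in the error; the 80% warning
# threshold is an arbitrary choice for illustration.
VIOSVRCMD_TIMEOUT=600

# Hypothetical helper: classify an elapsed time (in seconds) against the limit.
check_duration() {
    if [ "$1" -ge "$VIOSVRCMD_TIMEOUT" ]; then
        echo "FAIL"
    elif [ "$1" -ge $((VIOSVRCMD_TIMEOUT * 80 / 100)) ]; then
        echo "WARN"
    else
        echo "OK"
    fi
}

# Hypothetical wrapper: run any command, report how long it took and
# pass its exit status through. Relies on SECONDS from ksh or bash.
timed_run() {
    start=$SECONDS
    "$@"
    rc=$?
    elapsed=$((SECONDS - start))
    echo "elapsed=${elapsed}s status=$(check_duration "$elapsed") rc=$rc"
    return $rc
}
```

In the backup script, the ssh call that launches the backup would be passed to timed_run, so a server that reports WARN can be cleaned up before it starts failing.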
Next, I ran updateios -commit, which committed any applied updates, slightly reduced the space used in /usr, and made the backup slightly faster. I also did some housekeeping to reduce the size of rootvg, and the backups then ran in under 10 minutes.
Now knowing the fix, I logged a PMR with IBM support to find out how to increase the timeout value of the viosvrcmd command. In 7.7.2 the timeout for viosvrcmd is 3600 seconds (1 hour), and in 7.7.3 this has been decreased to 600 seconds (10 minutes).
This is the cause of our problems. IBM support also mentioned that there is a design request in to change the timeout value in a future release of HMC. This may or may not happen. Hopefully it does.
Until then, if you are having this problem your options are:
- Make sure you commit the IOS updates after you update the VIO server code.