We had a resource in Stuck Online after an unsuccessful stop request.
(Our script with kill -9 had a problem.)
Now I have been asked how an operator can resolve this situation.
Within the ISC Operator Console, there is no "reset" button.
Furthermore, the resetrsrc command didn't work either.
We just want to re-issue the normal stop command, but there is no command to set the status back to Online so that the stop action is started again.
I solved this situation by restarting the RecoveryRM...
Greetings from Muenster,
nukite8d
Re: Reaction at STUCKed resource
2011-08-10T10:23:39Z
Found this in the docs (Admin & User's Guide):
The second and more likely reason for a resource to have an OpState of Stuck
Online (if the MonitorCommand returns 1 (Online) or 6 (Pending Offline), but
the resource has an OpState of ‘Stuck Online’) is that the resource could not be
stopped by System Automation for Multiplatforms previously, and System
Automation for Multiplatforms has finally set the resource to Stuck Online. This
is the case if the execution of the StopCommand for this resource and a
subsequent reset against that resource failed to bring the resource offline.
This error cannot be recovered by System Automation for Multiplatforms and
manual intervention is required. After investigating why the resource did not
stop, an operator must stop the resource. When the OpState of the resource is
evaluated as Offline at the next execution of the MonitorCommand, System
Automation for Multiplatforms will again take control of this resource, and no
further manual steps are required.
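A minimal sketch of how an operator could check for that hand-back after stopping the resource by hand: run the resource's own MonitorCommand repeatedly until it reports Offline. The return codes 1 (Online) and 6 (Pending Offline) come from the excerpt above; 2 for Offline is an assumption on my part, and the function name and its parameters are made up for illustration, so check them against your own monitor script.

```python
import subprocess
import time

# OpState values returned by a MonitorCommand. 1 and 6 are quoted in the
# guide excerpt; OFFLINE = 2 is an assumption and should be verified.
ONLINE = 1
OFFLINE = 2
PENDING_OFFLINE = 6


def wait_until_offline(monitor_cmd, timeout_s=60.0, interval_s=2.0):
    """Run monitor_cmd repeatedly until it exits with OFFLINE or time runs out.

    monitor_cmd is an argument list for the resource's own monitor script.
    Returns True if the resource was seen Offline, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # The exit code of the monitor script is the reported OpState.
        rc = subprocess.run(monitor_cmd).returncode
        if rc == OFFLINE:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Usage would look like `wait_until_offline(["/path/to/your_monitor_script"])`, where the path is a placeholder for whatever is configured as the MonitorCommand of the stuck resource.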
But I would like to reset the state and let TSAMP try it again.
SteveIves
Re: Reaction at STUCKed resource
2011-12-15T13:09:43Z
Just discovered this question, and thinking about it has helped me (I'm new to SA MP, and it's hard to grasp some of its concepts).
I think that 'Stuck Online', like 'Failed Offline', means that due to a software error, the resource has failed to stop (or start), even after a reset. SA knows (or believes) that there is no point in running the stop or start command again.
If you were trying to start the resource, you can issue a RESET, which I think tells SA that you have resolved the issue, so SA can try to start the resource again. It runs the START command with a RESET parameter, so your script can take a different action from normal.
If (as in your case), you were stopping the resource, and your stop command fails, then it is up to you to stop the resource manually. Once the monitor command reports that it is Offline, SA starts working again.
The difference between the start and stop recovery behaviour seems to be this: when you've had trouble starting the resource but have now fixed the problem, you simply tell SA that it's OK to try the start again; SA does not expect the user/operator to start the resource manually. But when the stop fails, SA expects the user/operator to stop the resource manually, so the operators will need instructions on how to do this. This appears to be required only when your stop command is not working.
The above is just my understanding, and I may have misunderstood.