If GRS seems hung after an ISG177E or ISG178E disruption message,
use the process that follows to recover the sysplex.
- Issue D GRS and D XCF,S,ALL on
all the systems to obtain the status of each system.
- GRS auto-restart processing might be in progress if any system
has a GRS status of ACTIVE. Give GRS enough time, usually 4 to 6 minutes,
to restart without manual intervention.
- If all the systems in the GRS display show either INACTIVE or
QUIESCE, issue XCF PATHIN and PATHOUT commands
to ensure that the entire sysplex has good connectivity. If XCF on any
system cannot deliver signals, one of the systems might not have the
proper status, which keeps GRS from restarting. Recover paths as necessary
so that all systems have good connectivity.
- If the sysplex has good connectivity, yet all the systems in the
GRS display still show either INACTIVE or QUIESCE, you can restart
the ring by manually driving the GRS group notification exits. To
do this, temporarily stop a system using either the hardware console
or the QUIESCE command.
Attention: Do not IPL.
- Stop the system for one GRS TOLINT interval. Restart the system
after the GRS TOLINT interval has expired. Restarting the system should
re-drive the GRS group notification exits. If stopping and restarting
a system does not restart GRS, then GRS on that system might not be
the problem. Pick a different system and try stopping and restarting
that system.
Note:
- If you are running with an SFM policy that would take a stopped
system out of the sysplex, stop the policy before stopping the system
by using:
SETXCF STOP,POLICY,TYPE=SFM
- If you are able to restart the ring, start the SFM policy again by using:
SETXCF START,POLICY,TYPE=SFM,POLNAME=XXXXX
where XXXXX is the name of the SFM policy.
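The triage in the steps above can be summarized as a small decision function. The following Python sketch is illustrative only: the function name `next_grs_action`, the input dictionary, and the returned action strings are hypothetical, and it assumes the statuses were read by an operator from the D GRS output on each system. It treats QUIESCE and QUIESCED as the same state, which is also an assumption.

```python
def next_grs_action(grs_status, connectivity_ok):
    """Suggest the next recovery step (hypothetical helper, not an IBM tool).

    grs_status: dict of system name -> GRS status string taken from D GRS.
    connectivity_ok: True if the XCF PATHIN/PATHOUT checks look healthy.
    """
    if not grs_status:
        return "REVIEW_MANUALLY"  # nothing to evaluate

    statuses = {status.strip().upper() for status in grs_status.values()}

    # Any ACTIVE system means auto-restart might still be in progress.
    if "ACTIVE" in statuses:
        return "WAIT_FOR_AUTO_RESTART"

    # All systems INACTIVE or QUIESCE: check connectivity before stopping anything.
    if statuses <= {"INACTIVE", "QUIESCE", "QUIESCED"}:
        if not connectivity_ok:
            return "RECOVER_XCF_PATHS"
        return "STOP_AND_RESTART_ONE_SYSTEM"

    # Any other status is outside the documented procedure.
    return "REVIEW_MANUALLY"


if __name__ == "__main__":
    print(next_grs_action({"SYS1": "INACTIVE", "SYS2": "QUIESCE"}, True))
```

The function only encodes the order of the checks; waiting for the restart window and stopping a system for one TOLINT interval remain manual operator actions.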
If you have completed the steps above on all the systems and the
D GRS output still displays INACTIVE, you can restart the sysplex
using the process that follows.
- Use the hardware console or the QUIESCE command
to temporarily stop the systems until only one is remaining.
Attention: Do not IPL.
- Use D XCF,S,ALL to check system status. The
XCF display output should show only one system as ACTIVE and the other
systems as MONITOR-DETECTED STOP.
- When only one system is ACTIVE, wait one TOLINT interval for the
remaining system to restart as a one-system ring.
- After the TOLINT interval has expired, issue D GRS. The output
should show one system as ACTIVE and the other systems as QUIESCED.
- When the D GRS command displays an ACTIVE system,
start the other systems so that they join the ring.
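The check on the D XCF,S,ALL output in the steps above can be expressed as a simple predicate. This is a minimal Python sketch, assuming the per-system statuses have already been copied from the display by hand; the function name `ready_for_single_system_restart` and the dictionary layout are hypothetical.

```python
def ready_for_single_system_restart(xcf_status):
    """Return True when exactly one system is ACTIVE and all others are stopped.

    xcf_status: dict of system name -> status string from D XCF,S,ALL
    (hypothetical input format; the values are entered by the operator).
    """
    normalized = [status.strip().upper() for status in xcf_status.values()]
    active_count = normalized.count("ACTIVE")
    # Every system that is not ACTIVE must show MONITOR-DETECTED STOP.
    others_stopped = all(
        status == "MONITOR-DETECTED STOP"
        for status in normalized
        if status != "ACTIVE"
    )
    return active_count == 1 and others_stopped


if __name__ == "__main__":
    sample = {
        "SYS1": "ACTIVE",
        "SYS2": "MONITOR-DETECTED STOP",
        "SYS3": "MONITOR-DETECTED STOP",
    }
    print(ready_for_single_system_restart(sample))
```

When the predicate is true, the TOLINT wait described above can begin; if it is false, more systems still need to be stopped or a status needs to be investigated.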
If GRS is not able to restart, obtain the following data before
calling IBM® service:
- SYSLOG from all systems.
- LOGREC from all systems.
- SADUMP from any system that requires an IPL.
Use the following DUMP command and replies to obtain a dump that
includes the primary and alternate CDS data:
DUMP COMM=(your dump title)
R x,ASID=(1,6,7,A),REMOTE=(SYSLIST=*(1,6,7,A),DSPNAME,SDATA),CONT
R y,DSPNAME=('XCFAS'.*,'GRS'.*),CONT
R z,SDATA=(COUPLE,XESDATA,GRSQ,RGN,ALLNUC,CSA,PSA,SQA,SUM,TRT),END
where x, y, and z are the reply numbers.
Note: If a SADUMP of a critical system is not possible, take a console
dump.
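If the dump replies are prepared ahead of time, for example in a runbook, the reply numbers can be substituted programmatically. The following Python sketch only formats the text shown above; the function name `build_dump_replies` and its parameters are hypothetical, and it does not issue any command to the system.

```python
def build_dump_replies(title, x, y, z):
    """Format the DUMP command and its three replies as text lines.

    title: dump title text; x, y, z: reply numbers shown by the system.
    Hypothetical helper: it only builds strings for an operator to enter.
    """
    return [
        f"DUMP COMM=({title})",
        f"R {x},ASID=(1,6,7,A),REMOTE=(SYSLIST=*(1,6,7,A),DSPNAME,SDATA),CONT",
        f"R {y},DSPNAME=('XCFAS'.*,'GRS'.*),CONT",
        f"R {z},SDATA=(COUPLE,XESDATA,GRSQ,RGN,ALLNUC,CSA,PSA,SQA,SUM,TRT),END",
    ]


if __name__ == "__main__":
    for line in build_dump_replies("GRS RING HANG", 12, 13, 14):
        print(line)
```

The reply numbers change with each prompt, so they are passed in rather than hard-coded.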