z/OS MVS Planning: Global Resource Serialization


ISG177E and ISG178E recovery

SA23-1389-00

If GRS appears to be hung after an ISG177E or ISG178E disruption message, use the following process to recover the sysplex.

  1. Issue D GRS and D XCF,S,ALL on all the systems to obtain the status of each system.
  2. GRS auto-restart processing might be in progress if any system has a GRS status of ACTIVE. Give GRS enough time, usually 4 to 6 minutes, to restart without manual intervention.
  3. If all the systems in the GRS display show either INACTIVE or QUIESCE, issue the D XCF,PATHIN and D XCF,PATHOUT commands to verify that the entire sysplex has good connectivity. If XCF on any system is unable to deliver signals, one of the systems might not have the proper status, which keeps GRS from restarting. Recover paths as necessary so that all systems have good connectivity.
  4. If the sysplex has good connectivity, yet all the systems in the GRS display still show either INACTIVE or QUIESCE, you can restart the ring by manually driving the GRS group notification exits. To do this, temporarily stop a system by using either the hardware console or the QUIESCE command.
    Attention: Do not IPL.
  5. Keep the system stopped for one GRS TOLINT interval, and restart it after the interval has expired. Restarting the system should re-drive the GRS group notification exits. If stopping and restarting a system does not restart GRS, GRS on that system might not be the problem. Pick a different system and try stopping and restarting that system.
    Note:
    1. If you are running with an SFM policy that will take a stopped system out of the sysplex, stop the policy before stopping the system by using:
      SETXCF STOP,POLICY,TYPE=SFM
    2. If you are able to restart the ring, start the SFM policy using:
      SETXCF START,POLICY,TYPE=SFM,POLNAME=XXXXX
      where XXXXX is the name of the SFM policy.
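The decision points in the procedure above can be modeled as a small function. The following is a minimal Python sketch, not an IBM tool: it assumes the operator has transcribed each system's status from the D GRS output into a dictionary, and the names `next_recovery_step`, `stop_restart_plan`, and the returned action strings are hypothetical. The sketch issues no commands itself.

```python
from __future__ import annotations

# Hypothetical model of the ISG177E/ISG178E recovery decisions.
# Statuses are typed in by the operator from D GRS output; this code
# does not communicate with z/OS.
STOPPED_STATES = {"INACTIVE", "QUIESCE", "QUIESCED"}


def next_recovery_step(grs_status: dict[str, str], xcf_paths_ok: bool,
                       systems_tried: set[str]) -> str:
    """Return the next action for steps 2 through 5 of the procedure."""
    if not grs_status:
        raise ValueError("need the D GRS status of at least one system")
    statuses = {s.strip().upper() for s in grs_status.values()}
    if "ACTIVE" in statuses:
        # Step 2: auto-restart might be in progress; allow 4 to 6 minutes.
        return "WAIT"
    if not statuses <= STOPPED_STATES:
        # A status this sketch does not model: investigate manually.
        return "INVESTIGATE"
    if not xcf_paths_ok:
        # Step 3: D XCF,PATHIN / D XCF,PATHOUT showed a signaling problem.
        return "RECOVER_XCF_PATHS"
    untried = sorted(set(grs_status) - systems_tried)
    if untried:
        # Steps 4 and 5: stop and restart a system not yet tried.
        return f"STOP_AND_RESTART {untried[0]}"
    # Every system has been tried: fall back to the sysplex restart.
    return "RESTART_SYSPLEX"


def stop_restart_plan(system: str, sfm_policy: str | None) -> list[str]:
    """Ordered operator steps for one stop/restart attempt.

    sfm_policy is the name of an active SFM policy that would remove a
    stopped system from the sysplex, or None if there is no such policy.
    """
    steps = []
    if sfm_policy:
        steps.append("SETXCF STOP,POLICY,TYPE=SFM")
    steps += [
        f"Stop {system} (hardware console or QUIESCE); do not IPL",
        "Wait one GRS TOLINT interval",
        f"Restart {system}",
    ]
    if sfm_policy:
        # Reinstate the SFM policy only once the ring has restarted.
        steps.append(
            f"If the ring restarts: SETXCF START,POLICY,TYPE=SFM,POLNAME={sfm_policy}")
    return steps
```

For example, `next_recovery_step({"SYSA": "INACTIVE", "SYSB": "QUIESCE"}, True, set())` returns `"STOP_AND_RESTART SYSA"`. Choosing systems in sorted order is only a convenience of the sketch; the procedure does not prescribe which system to stop first.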

If you have completed the steps above on all the systems and the D GRS output still displays INACTIVE, you can restart the sysplex using the process that follows.

  1. Use the hardware console or the QUIESCE command to temporarily stop systems until only one system remains running.
    Attention: Do not IPL.
  2. Use D XCF,S,ALL to check systems status. The XCF display output should show only one system as ACTIVE and the other systems as MONITOR-DETECTED STOP.
  3. When only one system is ACTIVE, wait one TOLINT interval for the remaining system to restart as a one-system ring.
  4. After the TOLINT interval, issue D GRS. The output should show one system as ACTIVE and the other systems as QUIESCED.
  5. When the D GRS command displays an ACTIVE system, restart the other systems so that they join the ring.
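The two checks in this procedure, confirming that exactly one system is still running and then confirming that it has formed a one-system ring, can be expressed as follows. This is again a hypothetical Python sketch: it assumes the status words shown in the procedure (ACTIVE, MONITOR-DETECTED STOP, QUIESCED) have been copied from the D XCF,S,ALL and D GRS displays into dictionaries, and the function names are illustrative only.

```python
from __future__ import annotations


def ready_for_one_system_ring(xcf_status: dict[str, str]) -> str | None:
    """Step 2: from D XCF,S,ALL statuses, return the single ACTIVE system
    if every other system shows MONITOR-DETECTED STOP, else None."""
    norm = {name: st.strip().upper() for name, st in xcf_status.items()}
    active = [name for name, st in norm.items() if st == "ACTIVE"]
    others_stopped = all(st == "MONITOR-DETECTED STOP"
                         for name, st in norm.items() if name not in active)
    if len(active) == 1 and others_stopped:
        return active[0]
    return None


def systems_to_restart(grs_status: dict[str, str]) -> list[str]:
    """Steps 4 and 5: after one TOLINT interval, if D GRS shows exactly one
    ACTIVE system, return the QUIESCED systems that can now be restarted."""
    norm = {name: st.strip().upper() for name, st in grs_status.items()}
    active = [name for name, st in norm.items() if st == "ACTIVE"]
    if len(active) != 1:
        # No one-system ring yet: keep waiting rather than restarting others.
        return []
    return sorted(name for name, st in norm.items() if st == "QUIESCED")
```

Returning an empty list when no single ACTIVE system exists keeps the sketch conservative: it never suggests restarting the other systems before the one-system ring is up.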

If GRS is not able to restart, obtain the following data before calling IBM® service:

  1. SYSLOG from all systems.
  2. LOGREC from all systems.
  3. SADUMP from any system that requires an IPL.
Use the following DUMP command and replies to obtain a dump of the primary and alternate CDS:
DUMP COMM=(your dump title)
R x,ASID=(1,6,7,A),REMOTE=(SYSLIST=*(1,6,7,A),DSPNAME,SDATA),CONT
R y,DSPNAME=('XCFAS'.*,'GRS'.*),CONT
R z,SDATA=(COUPLE,XESDATA,GRSQ,RGN,ALLNUC,CSA,PSA,SQA,SUM,TRT),END
where x, y, and z are reply numbers.
Note: If a SADUMP of a critical system is not possible, take a console dump.
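Because the system prompts for each continuation with a new reply ID, the three replies have to be entered one at a time. The following hypothetical Python helper only formats the reply text from the example above for a reply ID that you supply; it does not issue the command, and the names `dump_command`, `dump_reply`, and `DUMP_REPLY_OPERANDS` are assumptions for illustration.

```python
# Operand text for each reply, copied from the example DUMP command.
# Every reply except the last ends in CONT so the system prompts again.
DUMP_REPLY_OPERANDS = (
    "ASID=(1,6,7,A),REMOTE=(SYSLIST=*(1,6,7,A),DSPNAME,SDATA),CONT",
    "DSPNAME=('XCFAS'.*,'GRS'.*),CONT",
    "SDATA=(COUPLE,XESDATA,GRSQ,RGN,ALLNUC,CSA,PSA,SQA,SUM,TRT),END",
)


def dump_command(title: str) -> str:
    """Initial command; the system responds with the first reply ID."""
    return f"DUMP COMM=({title})"


def dump_reply(step: int, reply_id: str) -> str:
    """Format reply number `step` (0, 1, or 2) for the given reply ID."""
    if not 0 <= step < len(DUMP_REPLY_OPERANDS):
        raise IndexError(f"step must be 0..{len(DUMP_REPLY_OPERANDS) - 1}")
    return f"R {reply_id},{DUMP_REPLY_OPERANDS[step]}"
```

For example, if the system prompts with reply ID 12, `dump_reply(0, "12")` produces `R 12,ASID=(1,6,7,A),REMOTE=(SYSLIST=*(1,6,7,A),DSPNAME,SDATA),CONT`.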





Copyright IBM Corporation 1990, 2014