z/OS Security Server RACF Diagnosis Guide


Actions to recover from a coupling facility error

GA32-0886-00

Do not issue SETXCF to force the rebuild of a structure into a coupling facility that is not available to the system, because doing so results in read-only mode. If SETXCF was issued, exit read-only mode by issuing RVARY DATASHARE. RACF® then returns to the original coupling facility.

If you encounter a situation where coupling facility recovery scenarios do not work properly, the following information might make it possible for RACF to continue servicing requests.

For example, assume a sysplex with three members: J90, J91, and J92. A coupling facility containing RACF structure IRRXCF00_B001 has been lost. Member J90 remains active.

The following messages are received at the operator console:
    IRRX016I RACF MEMBER J90 DETECTED A COUPLING FACILITY ERROR
    IXC521I REBUILD FOR STRUCTURE IRRXCF00_B001 HAS BEEN STARTED
    IRRX020I REBUILD FOR STRUCTURE IRRXCF00_B001 ON MEMBER J90 HAS BEEN INITIATED

The following message is not received at the operator console:
    IRRX008I REBUILD FOR STRUCTURE IRRXCF00_B001 HAS BEEN COMPLETED

Issuing the following command from the operator console:
    DISPLAY XCF,STRUCTURE
displays the following:
    IRRXCF00_B001 ALLOCATED REBUILDING

Issuing the following command from the operator console:
    DISPLAY GRS,CONTENTION
displays the following:
    SYSZRAC2 , minor name backup-racf-db is held on system J90 by RACFDS
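
The symptom described above (a rebuild that was initiated but never reported as completed) can be checked mechanically against captured console output. The following Python fragment is only a minimal sketch: the function name rebuild_stalled and the idea of passing console lines as a list of strings are assumptions made for illustration, not part of RACF or z/OS. It relies solely on the message identifiers IRRX020I and IRRX008I shown in this topic.

```python
def rebuild_stalled(console_lines, structure):
    """Return True if a rebuild of `structure` was initiated (IRRX020I)
    and no later completion message (IRRX008I) appears for it.

    Hypothetical helper: `console_lines` is assumed to be a list of
    console message strings in the order they were received.
    """
    pending = False
    for line in console_lines:
        if structure not in line:
            continue  # message concerns another structure, or none
        if "IRRX020I" in line:
            pending = True   # rebuild initiated on a member
        elif "IRRX008I" in line:
            pending = False  # rebuild completed
    return pending


if __name__ == "__main__":
    log = [
        "IRRX016I RACF MEMBER J90 DETECTED A COUPLING FACILITY ERROR",
        "IXC521I REBUILD FOR STRUCTURE IRRXCF00_B001 HAS BEEN STARTED",
        "IRRX020I REBUILD FOR STRUCTURE IRRXCF00_B001 ON MEMBER J90 HAS BEEN INITIATED",
    ]
    print(rebuild_stalled(log, "IRRXCF00_B001"))
```

A True result corresponds only to the message pattern in this example; the DISPLAY XCF,STRUCTURE and DISPLAY GRS,CONTENTION output should still be checked before taking any recovery action.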

In this situation, members of the sysplex might be unable to function properly because RACF is holding enqueues. A rebuild of a RACF structure has been requested but cannot proceed. The following steps might allow RACF to continue operating, although in a degraded mode.

  1. If message IXC402D has been received one or more times at the operator console, make sure that you reply "down" to every occurrence. If this allows REBUILD to complete, you do not need to continue with the following steps.
  2. Issue the following command at the operator console:
    SETXCF STOP,REBUILD,STRNAME=IRRXCF00_B001

    This command stops REBUILD and releases the enqueues. Additionally, message IRRX004A is received, which displays the following: IRRX004A MEMBER J90 IS IN READ-ONLY MODE.

  3. Issue the following command at the operator console:
    RVARY NODATASHARE

    All remaining sysplex members now operate directly from the database, without the coupling facility. Note that performance is reduced while running without the coupling facility.

  4. You might be able to further improve the situation if you have configured your sysplex appropriately. This means that you have more than one coupling facility and that, after the failure of one of them, another is still available. The following example illustrates this:

    Take, for example, two coupling facilities. All structures for the primary RACF database are assigned to one coupling facility and all structures for the backup RACF database are assigned to the other coupling facility. (In this example, assume that no alternate coupling facilities have been assigned.) If you lose one or the other of the coupling facilities in this configuration, you can still get back into data sharing mode, though it will be without a backup database.

    For example:
    • If the coupling facility containing the primary database structures goes down, issue the command:
      RVARY SWITCH
      which makes the backup database primary and deactivates the old primary database.
    • If the coupling facility containing the backup database structures goes down, issue the command:
      RVARY INACTIVE
      on the backup database.
    • Then, in either case, issue the command:
      RVARY DATASHARE
      which allows the remaining sysplex members to connect to all structures on the available coupling facility.
    The fourth step has an adverse consequence, however. Although these steps improve performance while a coupling facility is unavailable, your backup and primary databases will most likely become out of sync. This must be resolved before you can go back to normal operations with both primary and backup databases. This can be done by using IRRUT200, as documented in z/OS Security Server RACF System Programmer's Guide.
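
The choice of commands in step 4 depends only on which coupling facility was lost. The following Python fragment is a minimal sketch of that decision for the two-coupling-facility example above. The function name recovery_commands and the role strings "primary" and "backup" are assumptions made for illustration. The command strings are copied from this topic as written; the text states that RVARY INACTIVE is issued on the backup database, so in practice that database must be identified on the command.

```python
def recovery_commands(failed_role):
    """Return the step 4 commands, in order, for the example configuration.

    Hypothetical helper. `failed_role` names which database's structures
    were in the lost coupling facility: "primary" or "backup".
    """
    if failed_role == "primary":
        # Make the backup database primary; the old primary is deactivated.
        first = "RVARY SWITCH"
    elif failed_role == "backup":
        # Deactivate the backup database (issued on the backup database).
        first = "RVARY INACTIVE"
    else:
        raise ValueError("failed_role must be 'primary' or 'backup'")
    # In either case, return the remaining members to data sharing mode.
    return [first, "RVARY DATASHARE"]


if __name__ == "__main__":
    for role in ("primary", "backup"):
        print(role, recovery_commands(role))
```

The sketch only orders the commands. It does not address the resynchronization of the primary and backup databases that is required afterward.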





Copyright IBM Corporation 1990, 2014