Failure recovery processing
When a VTAM® connects to the multinode persistent session coupling facility structure in the sysplex, VTAM indicates that it wants to be notified when other connections to the structure end. The following describes the recovery processing for a persistent-enabled application program.
- First, the other VTAMs are notified of the node failure. One of the other VTAMs that is connected to the multinode persistent session structure marks the failing VTAM's persistent-enabled applications as recovery pending.
- The other VTAMs then clean up the failing VTAM's persistent-disabled application programs from the coupling facility. After the coupling facility structure is cleaned up, no recovery can occur for those application programs.
Note: From the perspective of the nonfailing partner, the sessions are still active. The sessions remain active until the HPR connection is terminated.
- For each persistent-enabled application program that resided in the failed VTAM, a timer is started with the PSTIMER value specified by that application program. The PSTIMER value is the amount of time that an application can remain in recovery pending state. The recovering application must successfully issue an OPEN ACB on one of the VTAMs before the timer expires; otherwise, VTAM cleans up the application program's session information in the MNPS coupling facility structure.
Note: Affinities associated with generic resource application programs that are multinode persistent session-capable remain in the generic resource coupling facility structure until the timer expires. If the generic resource application program is not recovered, the affinities are deleted.
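The recovery pending timer described above can be pictured as a small state machine. The following Python sketch is only an illustration of that behavior under simplifying assumptions; it is not VTAM code. The class name `MnpsApplication`, the state names, and the methods `on_owner_failure`, `on_tick`, and `open_acb` are hypothetical, and time is represented as plain seconds supplied by the caller.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    ACTIVE = "active"
    RECOVERY_PENDING = "recovery pending"
    CLEANED_UP = "cleaned up"  # session information removed from the structure


@dataclass
class MnpsApplication:
    """Hypothetical model of one persistent-enabled application program."""
    name: str
    pstimer_seconds: float            # PSTIMER value specified by the application
    state: State = State.ACTIVE
    deadline: Optional[float] = None  # time at which the timer expires

    def on_owner_failure(self, now: float) -> None:
        # A surviving VTAM marks the application recovery pending
        # and starts a timer with the application's PSTIMER value.
        self.state = State.RECOVERY_PENDING
        self.deadline = now + self.pstimer_seconds

    def on_tick(self, now: float) -> None:
        # If the timer expires first, the session information is cleaned up.
        if self.state is State.RECOVERY_PENDING and now >= self.deadline:
            self.state = State.CLEANED_UP

    def open_acb(self, now: float) -> bool:
        """Model a recovery attempt; return True if recovery can proceed."""
        self.on_tick(now)
        if self.state is not State.RECOVERY_PENDING:
            return False
        self.state = State.ACTIVE
        self.deadline = None
        return True
```

In this model, an `open_acb` call that arrives before the deadline returns the application to the active state, and one that arrives after the deadline fails because the session information has already been cleaned up.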
For recovery to occur, the application program must be persistent-enabled and the sessions must have traversed an HPR connection. If these conditions are met and there is a VTAM in the sysplex that is connected to the MNPS coupling facility structure, recovery can take place. The following describes the recovery process:
- An application is started either through an operator command or the automatic restart manager (ARM). The application opens an ACB with the same application program name as the application program being recovered. Recovery can occur on the same VTAM that the application program resided on (if that VTAM has been restarted) or on a different VTAM.
- The recovering VTAM obtains the capabilities of the application being recovered from the coupling facility structure. The capabilities of the recovering application program must match those of the application program being recovered for recovery to continue.
Note: The use of model definitions to define the application program reduces the chance of a capability mismatch. The use of model definitions for MNPS applications is required at a network node.
See z/OS Communications Server: SNA Programming for details on the actual capabilities that must match.
- To indicate that the application has been recovered, the information in the coupling facility structure is updated to reflect the new owning VTAM and the application program is marked in persistence-ENABLED state.
- The new owning VTAM determines
whether it has a CDRSC with the same name as the new application program.
If a CDRSC definition exists and VTAM determines
that the CDRSC represents the recovering application program when
it was active on its previous owning VTAM,
the sessions associated with the CDRSC are terminated and the CDRSC
becomes a shadow resource. This must be done because an application
program of the same name now exists.
If recovery occurs at a VTAM DLUS node, any sessions between the application and the DLUR-dependent LUs served by the VTAM DLUS network node are the exceptions to this processing. If the DLUS is intermediate or absent on the HPR connection path, the DLUS has only minimal session awareness of these sessions. These sessions are maintained and associated with the HPR connection that is to be rebuilt during MNPS recovery processing. Although the sessions are maintained, they are disconnected from the CDRSC representation of the application at this point in the processing and associated with the APPL RDTE that now represents the application. This allows OPEN ACB processing to continue.
- The new owning VTAM reads in all the application program data from the coupling facility into a VTAM data space.
- The new owning VTAM performs a path switch to reestablish a route for the session. During path switch processing, the recovering side of the session waits 90 seconds (or 4 minutes if both application programs are MNPS). If the connection is not reestablished, recovery is not successful. The other end of the HPR connection is cleaned up when its path switch timer expires. For HPR connections that are used by multinode persistent sessions, the minimum value for the path switch timer is four minutes. However, if a higher value is specified for the path switch timer (for VTAM, this is the HPRPST start option) at either session partner, the higher value is used.
Note: Coordinate the PSTIMER and HPRPST values so that the two timers work together effectively.
- The new owning VTAM re-creates the necessary session control blocks from the information in the coupling facility. The application program must restore the session, using the OPNDST RESTORE command, before session data traffic can resume completely. See z/OS Communications Server: SNA Programming for additional information. Some applications might reset the sessions before resuming data traffic. Such applications can reduce the performance impact of MNPS by using NIBNTRCK on OPNSEC and OPNDST when establishing sessions.
Note: Applications that use the APPCCMD API use the APPCCMD RESTORE command.
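Two checks in the recovery steps above lend themselves to a short illustration: the capability match and the four-minute minimum for the path switch timer. The following Python sketch is a simplified model, not VTAM logic. The function names and the capability keys (`cap_a`, `cap_b`) are hypothetical placeholders; the actual capabilities that must match are listed in z/OS Communications Server: SNA Programming. The sketch also assumes that each endpoint applies the minimum to its own configured value, which is one reading of the text.

```python
MNPS_MIN_PATH_SWITCH_SECONDS = 4 * 60  # minimum stated above for MNPS connections


def effective_path_switch_timer(configured_seconds: int) -> int:
    """Return the configured value, raised to the MNPS minimum if it is lower."""
    return max(MNPS_MIN_PATH_SWITCH_SECONDS, configured_seconds)


def mismatched_capabilities(saved: dict, recovering: dict) -> list:
    """Return the sorted names of capabilities that differ between the two sets.

    `saved` stands for the capabilities read from the coupling facility
    structure; `recovering` stands for those of the application program
    that issued OPEN ACB. An empty result means recovery can continue.
    """
    names = set(saved) | set(recovering)
    return sorted(n for n in names if saved.get(n) != recovering.get(n))


# Usage with placeholder capability names:
saved = {"cap_a": True, "cap_b": 10}
recovering = {"cap_a": True, "cap_b": 20}
print(mismatched_capabilities(saved, recovering))  # ['cap_b']
print(effective_path_switch_timer(120))            # 240
print(effective_path_switch_timer(600))            # 600
```

Defining both instances of the application program from the same model definition would, in this picture, make `saved` and `recovering` identical by construction.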
During recovery, you might notice a peak in storage usage on the new owning VTAM. When recovery is complete, storage usage should go back down.
A VARY INACT of an HPR connection that is being recovered by multinode persistent session processing is not processed until recovery processing has completed.