SnF system recovery

Each SnF queue is located in one of several databases managed by the SIPN. The SIPN periodically replicates these databases, and each time it does so, it records the replication time. A request or file that was put into an SnF queue can be regarded as safely stored only if the database was later replicated. In rare situations, an error might occur that causes the SIPN to recover the system by beginning a new database run. When this happens, any requests or files that were put into an SnF queue after the most recent database replication might be lost, and so must be resent with a possible duplication indicator, as illustrated in Figure 1.

Figure 1. SnF system recovery. This figure shows timelines for three database runs. Runs 02 and 03 were each the result of an SnF system recovery. Each letter indicates a time at which a request was put into an SnF queue of this database. The requests that were put into an SnF queue at the times indicated by the letters e and i were stored after the most recent database replication for the corresponding run, and so are resent and flagged as possible duplicates.

Database  Run  
                 a  b   c   d  e    
   01     01   |------+------+-----X

                                     f  g      h     
   01     02                       |------+------+--X

                                                      i  
   01     03                                        |----->


Legend:
    | = Beginning of a database run
    + = Database replication
    X = End of a database run caused by an SnF system recovery operation
  a-i = Time at which a request was put into an SnF queue of this database

After the SIPN puts a request or file into the SnF queue of the receiver, it returns the following two values to the MSIF transfer service:

SnF input time: This value indicates the date and time that the SIPN put the request or file into the SnF queue.
Last replication time: This value indicates the date and time that the SIPN most recently replicated the database that contains the SnF queue into which the request or file was put.

Both of these values are of the form:

ddrr:YYYY-MM-DDThh:mm:ss

where ddrr represents the two-digit database ID followed by the two-digit ID of the current database run.
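
As an illustration of this format, the following Python sketch splits such a value into its database ID, run ID, and timestamp. The names SnfTimestamp and parse_snf_timestamp are hypothetical and are not part of the MSIF transfer service; the sample value is invented.

```python
from datetime import datetime
from typing import NamedTuple


class SnfTimestamp(NamedTuple):
    database_id: str  # dd: two-digit database ID
    run_id: str       # rr: two-digit ID of the database run
    time: datetime    # date and time that follow the colon


def parse_snf_timestamp(value: str) -> SnfTimestamp:
    """Split a value of the form ddrr:YYYY-MM-DDThh:mm:ss into its parts."""
    # Only the first colon separates the prefix from the timestamp;
    # the timestamp itself contains further colons.
    prefix, sep, stamp = value.partition(":")
    if sep != ":" or len(prefix) != 4 or not prefix.isdigit():
        raise ValueError(f"expected ddrr:YYYY-MM-DDThh:mm:ss, got {value!r}")
    return SnfTimestamp(
        database_id=prefix[:2],
        run_id=prefix[2:],
        time=datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S"),
    )


# Invented sample value: database 01, run 02.
sample = parse_snf_timestamp("0102:2024-05-01T10:15:30")
```

Keeping the database ID and run ID as strings preserves their leading zeros, which makes them convenient as lookup keys.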

When the MSIF transfer service receives these values from the SIPN, there are two possibilities:

The current run ID is identical to the run ID that the MSIF transfer service last recorded for the database

This indicates that the SIPN did not begin a new run for this database. If a completed scenario for this run has an SnF input time that is:

Earlier than the last replication time, the corresponding request or file was safely stored in the receiver's SnF queue. The MSIF transfer service changes the condition of the completed scenario to finished.
Later than the last replication time, the MSIF transfer service cannot be sure whether the corresponding request or file was safely stored. It assigns the scenario the condition waitForReplication and retains the corresponding message payload or file. The scenario retains this condition until the MSIF transfer service receives a response for a subsequent scenario that uses the same SnF database and that contains a last replication time that is later than the SnF input time of the completed scenario. After the MSIF transfer service receives such a response, it changes the condition of the completed scenario to finished.

Note: The MSIF transfer service is not notified of the current last replication time until it or another MSIF transfer service receives a response for a subsequent transfer. For this reason, a scenario might retain the condition waitForReplication long after the replication of the SnF database has finished.
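
The rules for an unchanged run ID can be sketched in Python as follows. This is a minimal illustration, not the actual implementation: the function names and the dictionary of waiting scenarios are hypothetical, and the text does not say how equal times are treated, so the sketch assumes that a scenario with equal times keeps waiting.

```python
from datetime import datetime
from typing import Dict, List


def condition_after_response(snf_input_time: datetime,
                             last_replication_time: datetime) -> str:
    """Condition of a completed scenario when the run ID is unchanged."""
    if snf_input_time < last_replication_time:
        return "finished"
    # Assumption: equal times are treated like "later than".
    return "waitForReplication"


def release_waiting(waiting: Dict[str, datetime],
                    newer_replication_time: datetime) -> List[str]:
    """Finish waiting scenarios that a later replication now covers.

    `waiting` maps a hypothetical scenario ID to its SnF input time and is
    assumed to hold only scenarios that use the same SnF database as the
    response that carried `newer_replication_time`.
    """
    released = [sid for sid, input_time in waiting.items()
                if input_time < newer_replication_time]
    for sid in released:
        del waiting[sid]  # the payload or file no longer needs to be retained
    return released
```

In this sketch, release_waiting would be called each time a response for a subsequent scenario arrives, which mirrors why a scenario can stay in waitForReplication until further traffic for the same database occurs.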

The run ID has increased since the MSIF transfer service last recorded it

This indicates that the SIPN carried out an SnF system recovery operation and began a new run for that database. When this happens:

The MSIF transfer service requests, from the SIPN, the last replication times of all of the database runs of that database. To request the last replication times, the MSIF transfer service uses the same SAG that received the response that contained the incremented run ID.
After the MSIF transfer service receives the last replication times from the SIPN, it identifies each recently terminated transfer whose SnF input time is later than the last replication time of its database run. The corresponding requests or files might not have been successfully stored in their SnF queues, and so must be resent with a possible duplication indicator.
The MSIF transfer service reprocesses the corresponding scenarios:
- It sets the transfer state back to Requested.
- It repeats RMA traffic filtering.
- It flags the corresponding primitive as a possible duplicate.
- It re-sends the corresponding primitive. If it sends the primitive via an input channel, it uses a new input sequence number.
After the MSIF transfer service completes its reprocessing of the scenario, it sets the transfer state to Completed_Notif. If it passed a response to the sending application the first time it processed the scenario, it does not do so again.
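
The recovery steps above can be sketched in Python as follows. The Transfer class, the function names, the rma_filter and send hooks, and the sample times are all hypothetical and serve only to illustrate the selection rule and the order of the reprocessing steps. The sketch also assumes that run IDs can be compared as integers and that a transfer whose run has no known last replication time is left untouched; the text does not cover either case.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List


@dataclass
class Transfer:
    name: str
    run_id: str
    snf_input_time: datetime
    state: str = "Completed_Notif"
    possible_duplicate: bool = False


def run_id_increased(recorded: str, current: str) -> bool:
    # Assumption: two-digit run IDs compare as integers; wrap-around is ignored.
    return int(current) > int(recorded)


def transfers_to_resend(transfers: List[Transfer],
                        last_replication: Dict[str, datetime]) -> List[Transfer]:
    """Select transfers stored after the last replication of their run."""
    selected = []
    for t in transfers:
        replicated_at = last_replication.get(t.run_id)
        if replicated_at is not None and t.snf_input_time > replicated_at:
            selected.append(t)
    return selected


def reprocess(t: Transfer,
              rma_filter: Callable[[Transfer], None],
              send: Callable[[Transfer], None]) -> None:
    t.state = "Requested"        # set the transfer state back
    rma_filter(t)                # repeat RMA traffic filtering (placeholder hook)
    t.possible_duplicate = True  # flag the primitive as a possible duplicate
    send(t)                      # re-send the primitive (placeholder hook)
    t.state = "Completed_Notif"


def at(hour: int, minute: int) -> datetime:
    return datetime(2024, 5, 1, hour, minute)  # invented sample date


# Invented times loosely following Figure 1: d and h precede the last
# replication of their runs, e follows the last replication of run 01.
transfers = [
    Transfer("d", "01", at(10, 20)),
    Transfer("e", "01", at(10, 30)),
    Transfer("h", "02", at(11, 15)),
]
last_replication = {"01": at(10, 25), "02": at(11, 20)}

sent: List[Transfer] = []
if run_id_increased(recorded="02", current="03"):
    for t in transfers_to_resend(transfers, last_replication):
        reprocess(t, rma_filter=lambda _t: None, send=sent.append)

print([t.name for t in sent])  # ['e']
```

Only the transfer stored at time e is selected, because it is the only one whose SnF input time is later than the last replication time of its run.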

The receiver is not directly affected by an SnF system recovery operation; however, the receiver must be able to process the resent requests and files as possible duplicates. How the MSIF transfer service processes such transfers is described in Possible duplicate handling.