Error recovery during two-phase commit

Recovering from error conditions is a normal task associated with application programming, system administration, database administration, and system operation. Distributing databases over several remote servers increases the potential for error resulting from network or communications failures. To ensure data integrity, the database manager provides the two-phase commit process. The following information explains how the database manager handles errors during the two-phase commit process.
  • First Phase Error

    If a database communicates that it failed to prepare to commit the unit of work, the database client rolls back the unit of work during the second phase of the commit process. A prepare message is not sent to the transaction manager database in this case.

    During the second phase, the client sends a rollback message to all participating databases that successfully prepared to commit during the first phase. Each database then writes an "ABORT" record to its log file, and releases the locks that were held for this unit of work.

  • Second Phase Error

    Error handling at this stage is dependent upon whether the second phase will commit or roll back the transaction. The second phase rolls back the transaction only if the first phase encountered an error.

    If one of the participating databases fails to commit the unit of work (possibly due to a communications failure), the transaction manager database attempts to perform the commit again on the failed database. The application, however, is informed that the commit was successful through the SQLCA. Db2® ensures that the uncommitted transaction in the database server is committed. The database manager configuration parameter resync_interval is used to specify how long the transaction manager database waits between attempts to commit the unit of work. All locks are held at the database server until the unit of work is committed.

    If the transaction manager database fails, it resynchronizes the unit of work when it is restarted. The resynchronization process attempts to complete all indoubt transactions; that is, those transactions that finished the first phase, but did not complete the second phase of the commit process. The database manager associated with the transaction manager database performs the resynchronization by:
    1. Connecting to the databases that indicated they were "PREPARED" to commit during the first phase of the commit process.
    2. Attempting to commit the indoubt transactions at those databases. (If the indoubt transactions cannot be found, the database manager assumes that the database successfully committed the transactions during the second phase of the commit process.)
    3. Committing the indoubt transactions in the transaction manager database, after all indoubt transactions are committed in the participating databases.

    If one of the participating databases fails and is restarted, the database manager for this database queries the transaction manager database for the status of this transaction, to determine whether the transaction should be rolled back. If the transaction is not found in the log, the database manager assumes that the transaction was rolled back, and rolls back the indoubt transaction in this database. Otherwise, the database waits for a commit request from the transaction manager database.

    If the transaction was coordinated by a transaction processing monitor (XA-compliant transaction manager), the database always depends on the TP monitor to initiate the resynchronization.

If, for some reason, you cannot wait for the transaction manager to automatically resolve indoubt transactions, there are actions you can take to manually resolve them. This manual process is sometimes referred to as "making a heuristic decision".

Error recovery if autorestart=off

If the autorestart database configuration parameter is set to OFF, and there are indoubt transactions in either the TM or RM databases, the RESTART DATABASE command is required to start the resynchronization process. When issuing the RESTART DATABASE command from the command line processor, use different sessions. If you restart a different database from the same session, the connection established by the previous invocation is dropped, and must be restarted again. Issue the TERMINATE command to drop the connection after no more indoubt transactions are returned by the LIST INDOUBT TRANSACTIONS command.