The two-phase commit process

A distributed transaction is a transaction that runs in multiple processes, usually on several machines. Each process performs work on behalf of the transaction. This is shown in Figure 1, where each oval indicates work that is being done on a different machine and each arrow between ovals indicates a remote procedure call (RPC).

Figure 1. Example of a distributed transaction

Distributed transactions, like local transactions, must observe the ACID properties. However, maintaining these properties is more complicated for distributed transactions because a failure can occur in any process. If such a failure occurs, each process must undo any work that it has already done on behalf of the transaction.

A distributed transaction processing system maintains the ACID properties in distributed transactions by using two features:

  • Recoverable processes. Recoverable processes log their actions and therefore can restore earlier states if a failure occurs.
  • A commit protocol. A commit protocol allows multiple processes to coordinate the committing or aborting of a transaction.

Recoverable processes can store two types of information: transaction state information and descriptions of changes to data. This information allows a process to participate in a two-phase commit and ensures isolation and durability. Transaction state information must be stored by all recoverable processes. However, only processes that manage application data (such as resource managers) must store descriptions of changes to data. Not all processes that are involved in a distributed transaction need to be recoverable. In general, clients are not recoverable because they do not interact directly with a resource manager.
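To make the two kinds of logged information concrete, the following is a minimal sketch of a recoverable process, not the implementation of any particular product. The class name RecoverableStore, its methods, and the record layout are all hypothetical, and a plain Python list stands in for the stable storage that a real log would need. The sketch records transaction state and a description of each change (the previous value), and uses those records to restore the earlier state when a transaction aborts.

```python
from dataclasses import dataclass, field

# Sentinel marking "key did not exist before the change" (hypothetical helper).
_MISSING = object()


@dataclass
class RecoverableStore:
    """Hypothetical resource manager: application data plus a log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # stands in for stable storage

    def begin(self, txid):
        # Transaction state information.
        self.log.append(("state", txid, "active"))

    def write(self, txid, key, value):
        # Description of the change: log the old value before updating the data.
        self.log.append(("change", txid, key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def abort(self, txid):
        # Undo this transaction's changes, most recent first.
        for record in reversed(self.log):
            if record[0] == "change" and record[1] == txid:
                _, _, key, old = record
                if old is _MISSING:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old
        self.log.append(("state", txid, "aborted"))


store = RecoverableStore(data={"balance": 100})
store.begin("t1")
store.write("t1", "balance", 70)
store.write("t1", "fee", 5)
store.abort("t1")
print(store.data)
```

Because each change record carries the previous value, the abort step can walk the log backward and restore the data without any other source of information. A process that does not manage application data would keep only the state records.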

The most common commit protocol is the two-phase commit protocol. In each transaction, one process acts as the coordinator. The coordinator oversees the activities of the other participants in the transaction, to ensure a consistent outcome. The two-phase commit protocol involves two phases:

Prepare phase
In the prepare phase, the coordinator sends a message to each process that is in the transaction. It asks each process to prepare to commit. When a process prepares, it guarantees that it can commit the transaction and makes a permanent record of its work. After guaranteeing that it can commit, it can no longer unilaterally decide to abort. If a process cannot prepare (that is, if it cannot guarantee that it can commit the transaction), it must abort.
Resolution phase
In the resolution phase, the coordinator records the responses. If all participants are prepared to commit, the transaction commits; otherwise, the transaction aborts. In either case, the coordinator informs all participants of the result. In the case of a commit, the participants acknowledge that they have committed. Committed changes to data are made permanent. This ensures that a successful transaction is reflected as a permanent change to a database and survives hardware and software errors.
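The two phases described above can be sketched in a few lines. This is an illustrative, single-process simulation under simplifying assumptions: the Participant class, its can_commit flag, and the two_phase_commit function are hypothetical names, method calls stand in for RPCs, a Python list stands in for the coordinator's log, and failures or timeouts during the protocol are not modeled.

```python
class Participant:
    """Hypothetical process taking part in a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: known in advance for the demo
        self.state = "active"

    def prepare(self):
        # Vote yes only if commit can be guaranteed; otherwise abort.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"
        return True  # acknowledgement sent back to the coordinator

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants, coordinator_log=None):
    """Run both phases and return the outcome, "commit" or "abort"."""
    if coordinator_log is None:
        coordinator_log = []

    # Prepare phase: ask every participant to prepare to commit.
    votes = [p.prepare() for p in participants]

    # Resolution phase: record the outcome, then inform all participants.
    decision = "commit" if all(votes) else "abort"
    coordinator_log.append(decision)
    if decision == "commit":
        acks = [p.commit() for p in participants]
        coordinator_log.append(("acks", len(acks)))
    else:
        for p in participants:
            p.abort()
    return decision


ok = [Participant("orders"), Participant("billing")]
print(two_phase_commit(ok))

bad = [Participant("orders"), Participant("billing", can_commit=False)]
print(two_phase_commit(bad))
```

In the first run every participant prepares, so the outcome is a commit; in the second, one participant cannot prepare, so the coordinator tells all participants to abort. The coordinator records its decision before informing anyone, which mirrors the text's requirement that the responses be recorded in the resolution phase.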