Crash recovery process

The rpc.statd daemon on each machine notifies the rpc.statd daemon on every other machine of its activities. When the rpc.statd daemon receives notice that another machine crashed or recovered, it notifies its rpc.lockd daemon.

If a server crashes, clients with locked files must be able to recover their locks. If a client crashes, its servers must hold the client's locks while it recovers. Additionally, to preserve the overall transparency of NFS, crash recovery must occur without intervention by the applications themselves.

The crash recovery procedure is simple. If the failure of a client is detected, the server releases the failed client's locks on the assumption that the client application will request the locks again as needed. If the crash and recovery of a server are detected, the client lock manager retransmits all lock requests previously granted by the server. The server uses this retransmitted information to reconstruct its locking state during a grace period. (The grace period, 45 seconds by default, is the time period within which a server allows clients to reclaim their locks.)
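The procedure described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the rpc.lockd implementation: the class names ToyLockServer and ToyLockClient, their methods, and the rule that new (non-reclaim) requests are refused during the grace period are all hypothetical choices made for illustration. Only the 45-second default comes from the text.

```python
import time

GRACE_PERIOD_SECONDS = 45  # default grace period stated in the text


class ToyLockServer:
    """Hypothetical model of a server-side lock manager (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.locks = {}  # file path -> name of the client holding the lock
        self._grace_ends = float("-inf")  # no grace period until a restart

    def in_grace_period(self):
        return self._clock() < self._grace_ends

    def restart(self):
        # A crash loses the locking state; recovery opens a grace period.
        self.locks.clear()
        self._grace_ends = self._clock() + GRACE_PERIOD_SECONDS

    def request(self, client, path, reclaim=False):
        # Assumption: during the grace period only reclaims are accepted.
        if self.in_grace_period() and not reclaim:
            return False
        holder = self.locks.get(path)
        if holder not in (None, client):
            return False  # lock is held by a different client
        self.locks[path] = client
        return True

    def client_crashed(self, client):
        # Release the failed client's locks; it re-requests them as needed.
        self.locks = {p: c for p, c in self.locks.items() if c != client}


class ToyLockClient:
    """Hypothetical model of a client-side lock manager (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.granted = set()  # paths for which the server granted a lock

    def lock(self, server, path):
        if server.request(self.name, path):
            self.granted.add(path)
            return True
        return False

    def server_recovered(self, server):
        # Retransmit every previously granted lock request as a reclaim.
        return [p for p in sorted(self.granted)
                if server.request(self.name, p, reclaim=True)]
```

In this model, calling restart() on the server followed by server_recovered() on each client rebuilds the server's lock table from the clients' records, while client_crashed() simply drops the failed client's entries.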

The rpc.statd daemon uses the host names kept in /var/statmon/sm and /var/statmon/sm.bak to keep track of which hosts must be informed when the machine needs to recover operations.
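As a rough illustration of how the recorded host names might be inspected, the following sketch lists the entries under one of these paths. It assumes that the path is a directory containing one entry per host name; that layout is an assumption, not something the text specifies, and the function name monitored_hosts is hypothetical. Reading the real paths may also require elevated privileges.

```python
import os


def monitored_hosts(directory="/var/statmon/sm"):
    """Return the entry names found in the given directory, sorted.

    Assumption: each entry is named after a host that must be informed
    when this machine recovers. A missing path yields an empty list.
    """
    try:
        return sorted(os.listdir(directory))
    except (FileNotFoundError, NotADirectoryError):
        return []
```

For example, monitored_hosts("/var/statmon/sm.bak") would return the names recorded under the backup path, given the same assumed layout.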