Restoring an inactive clustered machine

You can restore a machine that is part of a cluster after it has rebooted or been down.

About this task

When the machine first starts mysql, the following events can occur, each of which can take some time:
  • Position recovery. The time taken depends on whether the database previously shut down cleanly and on how many sites the portal is hosting.
  • A synced node is requested, at random, to become a donor node and send an incremental state transfer.
  • If the incremental state transfer fails because the database logs have wrapped, a non-incremental state transfer (SST) is requested. This causes xtrabackup to take a backup of a donor node and tar it across the network to the joining node, using socat on port 4444. It also reads the database logs so that any new changes can be sent to the joining node, because the donor can still accept writes to the database. Monitor /var/log/syslog for updates or errors on this process. If the SST fails, check /var/lib/innobackup.backup.log on the donor node for details.
Note: Processes running on the donor node might take significantly longer to run while the state transfer is in progress.
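Monitoring /var/log/syslog during a state transfer can be easier if unrelated lines are filtered out. The following is a minimal Python sketch of such a filter, not a supported tool. The keyword list and the function names are assumptions made for illustration; adjust the keywords to whatever your syslog actually contains.

```python
import re
from typing import Iterable, List

# Assumed keywords: the real log wording is not specified in this topic,
# so treat this tuple as a starting point and edit it as needed.
KEYWORDS = ("sst", "ist", "xtrabackup", "socat", "wsrep")

# Match keywords as whole words only, ignoring case, so that "ist"
# does not match ordinary words such as "list" or "exists".
PATTERN = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def state_transfer_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention any assumed keyword, without newlines."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_file(path: str = "/var/log/syslog") -> List[str]:
    """Filter a log file; reading syslog may require elevated privileges."""
    with open(path, errors="replace") as log:
        return state_transfer_lines(log)
```

Calling scan_file() on the joining node returns the matching lines from /var/log/syslog. Passing "/var/lib/innobackup.backup.log" on the donor node usually matches fewer lines with this keyword list, so reading that file directly may be simpler.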
lsyncd / csync2 is configured in a ring. The ring sends from the 1st machine to the 2nd, then from the 2nd to the 3rd, and so on, until the last machine sends to the 1st. Each machine maintains a database in /var/lib/csync2 for itself and for the node it is sending to.
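The ring order described above can be sketched in Python to work out which machine sends to a restored node and which machine it sends to in turn. This is an illustration only: the host names and the function names are hypothetical, and the sketch does not read any lsyncd or csync2 configuration.

```python
from typing import Dict, List, Tuple


def ring_targets(hosts: List[str]) -> Dict[str, str]:
    """Map each host to the host it sends to; the last wraps to the first."""
    if len(hosts) < 2:
        raise ValueError("a ring needs at least two hosts")
    if len(set(hosts)) != len(hosts):
        raise ValueError("host names must be unique")
    return {host: hosts[(i + 1) % len(hosts)] for i, host in enumerate(hosts)}


def ring_neighbours(hosts: List[str], name: str) -> Tuple[str, str]:
    """Return (sender, receiver) for one host: who sends to it, who it sends to."""
    targets = ring_targets(hosts)
    if name not in targets:
        raise KeyError(name)
    # The sender is the single host whose target is the named host.
    sender = next(src for src, dst in targets.items() if dst == name)
    return sender, targets[name]


# Hypothetical three-machine cluster, listed in ring order.
cluster = ["node1", "node2", "node3"]
print(ring_targets(cluster))
print(ring_neighbours(cluster, "node1"))
```

For a restored machine, the sender is the node whose copy of the /var/lib/csync2 database tracks it, and the receiver is the node that the restored machine tracks.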