nodeJoin Event
This event is triggered when a node joins the cluster after a reboot, rejoins after losing its cluster membership, or is started after an extended outage. The scope of the recovery is all file systems to which the node's disks might belong. In most cases, the disk state is either ready/up, if no I/O operation has been performed, or ready/down. However, depending on prior events, the state could instead be suspended/down or unrecovered/recovering.
Recovery process
- Perform simple checks on the disks assigned to the file systems.
- Check whether a tschdisk start is already running from a prior event. If it is, kill that process so that the disks from the current nodes are included in the recovery.
- Start all disks on all nodes by running tschdisk start -a, which optimizes recovery time. This command requires all nodes in the cluster to be functioning so that all the disks in the file system can be accessed.
- Start all down disks on all active nodes by running tschdisk start -F<file containing disk list>.
- If the file system version is 5.0.2 or later, auto recovery runs mmchdisk fs-name resume -d <suspended-disk-by-auto-recovery>. If the file system version is earlier than 5.0.2, this command is not executed.
- After successful completion, for file system version 5.0.2 and later, all disks must be in the ready/up state. For file system versions earlier than 5.0.2, all disks must be in the suspended/up state.
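The version-dependent outcome of the recovery process can be sketched in Python. This is only an illustration of the rule stated in the steps above, not part of the product: the function names are hypothetical, and the sketch assumes the file system version is available as a plain dotted string such as 5.0.2.

```python
# Hypothetical helpers; they only encode the 5.0.2 rule described in the steps above.
# Assumption: the file system version is a dotted numeric string, for example "5.0.2".

RESUME_MIN_VERSION = (5, 0, 2)


def parse_version(version: str) -> tuple:
    """Turn a dotted version string into a tuple of integers for comparison."""
    return tuple(int(part) for part in version.split("."))


def runs_resume_step(fs_version: str) -> bool:
    """True if auto recovery would run the mmchdisk resume step for this version."""
    return parse_version(fs_version) >= RESUME_MIN_VERSION


def expected_final_state(fs_version: str) -> str:
    """Disk state expected after a successful nodeJoin recovery."""
    return "ready/up" if runs_resume_step(fs_version) else "suspended/up"


if __name__ == "__main__":
    for version in ("4.2.3", "5.0.2", "5.0.10"):
        print(version, expected_final_state(version))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating 5.0.10 as earlier than 5.0.2.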
For file system version 5.0.2 and later, if the administrator runs mmchdisk fs-name suspend -d <disks> and auto recovery does not resume these disks, the administrator must resume them manually.
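As a rough aid for this manual step, the following Python sketch builds the resume command for disks that an administrator suspended and that are still suspended after recovery. The function name and the input format are hypothetical. The sketch assumes that the administrator has already collected each disk's status and availability (for example, from disk listing output) and that multiple disk names passed to -d are separated by semicolons. It only composes the command text and does not run anything.

```python
# Hypothetical helper: it composes a command string only and never executes it.
# Assumptions: disk_states maps disk name -> (status, availability), gathered
# by the administrator beforehand; -d accepts a semicolon-separated disk list.

def manual_resume_command(fs_name, admin_suspended, disk_states):
    """Return the mmchdisk resume command for disks still suspended, or None."""
    still_suspended = sorted(
        disk
        for disk in admin_suspended
        if disk_states.get(disk, ("", ""))[0] == "suspended"
    )
    if not still_suspended:
        return None  # Nothing left for the administrator to resume.
    disk_list = ";".join(still_suspended)
    return f'mmchdisk {fs_name} resume -d "{disk_list}"'


if __name__ == "__main__":
    states = {
        "disk1": ("ready", "up"),
        "disk2": ("suspended", "up"),
        "disk3": ("suspended", "up"),
    }
    print(manual_resume_command("fs-name", ["disk1", "disk3", "disk2"], states))
```

Disks that the administrator did not suspend are ignored on purpose, so the generated command is limited to the disks named in the earlier suspend operation.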
If a new diskFailure event is triggered while tschdisk start is in progress, the disks are not restored to the up state until the node joins the cluster and triggers a nodeJoin event.