The system recovery procedure works only when all node canisters are in candidate status.
Ensure that the service assistant displays all of the node canisters with the 550 error code. The
550 error code is the expected node error when more than half of the nodes in the system are missing
or when the active quorum disk cannot be found. If the service assistant displays any node canisters
with error codes 550 or 578 and all the recommended actions have been completed on these nodes, you
must remove their system data.
About this task
Before you perform this task, ensure that you have read the introductory information in the
overall system recovery procedure.
You have already used the service assistant to identify the system status and the specific error.
Continue to use the service assistant to complete this procedure.
When you select Change Node in the service assistant tool, it lists all of the Storage Virtualize
nodes that have logged in to the node that is running the tool. Follow these guidelines when you
perform the recovery procedure:
- Use the system column of the node table to identify any nodes that do not belong to the system
that is being recovered. Do not remove the system data from these nodes.
- Do not remove system information from any node that has online status, unless directed to do so
by remote technical support.
- Do not remove the system data from the first node until you have verified that both of the
following conditions are met:
  - All nodes in the system are listed in the Change Node part of the service assistant and are in
  service status with error 550 or 578.
  - You have checked the extra node error data for each node to ensure that no other communication
  or hardware problem is causing the node error.
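The first of the two conditions above can be checked mechanically before you remove any system data. The following is a minimal sketch, not an IBM-provided tool. It assumes that you have transcribed the node table for the system being recovered into a hypothetical colon-separated form, one node per line as panel_name:node_status:error_code, and it reports every node that is not in Service status with error 550 or 578. The function name check_ready_for_removal and the input format are assumptions. The script cannot confirm that every node of the system is listed, or that the extra node error data shows no other problem; those checks remain manual.

```shell
#!/bin/sh
# Minimal sketch: check the preconditions for removing system data.
# Input (hypothetical format, transcribed by hand from the Change Node table),
# one node of the system being recovered per line:
#   panel_name:node_status:error_code

# Prints each node that is NOT in Service status with error 550 or 578,
# and returns 1 if at least one such node is found.
check_ready_for_removal() {
    not_ready=0
    # "|| [ -n ... ]" also processes a last line that has no trailing newline
    while IFS=: read -r panel status code || [ -n "$panel" ]; do
        [ -z "$panel" ] && continue      # ignore blank lines
        case "$status:$code" in
            Service:550|Service:578) ;;  # expected state for this procedure
            *) echo "$panel ($status, error ${code:-none})"; not_ready=1 ;;
        esac
    done
    return "$not_ready"
}
```

For example, piping the two lines node1:Service:550 and node2:Active: into check_ready_for_removal prints node2 (Active, error none) and returns a nonzero status, which means you should not start removing system data yet.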
Procedure
- In the service assistant tool, select the node with status service and error 550 or 578.
- Select Manage System.
- Click Remove System Data.
  Note: Spare nodes do not go into the 878/578 state that active nodes do. As a result, the
  Manage System screen does not show the Remove System Data button for spare nodes. To remove
  system data from spare nodes, log in to each spare node by using SSH and run the following
  commands:
      satask leavecluster -force
      satask stopservice
  If you do not remove the cluster state from the spare nodes, the T3 recovery fails because the
  new cluster cannot find the spare nodes as available candidates.
- When prompted, confirm that you want to remove the system data.
- Remove the system data for the other nodes that display a 550 or a 578 error. All nodes
previously in this system must have a node status of Candidate and have no errors listed
against them.
- Resolve any hardware errors until the error condition for all nodes in the system is None.
- Ensure that all nodes in the system to be recovered display a status of Candidate.
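When there are several spare nodes, the two service CLI commands from the note in the Remove System Data step can be scripted. The following is a minimal sketch under stated assumptions, not an IBM-provided tool: the node addresses you pass in, the SSH account in SSH_USER (which defaults to superuser here), and the function names run_on_node and clear_spare_nodes are all hypothetical. Only satask leavecluster -force and satask stopservice come from this procedure. By default the script only prints the commands; it runs them over SSH only after you set DRY_RUN to 0.

```shell
#!/bin/sh
# Minimal sketch: clear the cluster state from spare nodes before a T3 recovery.
# Assumptions (hypothetical, replace with your own values):
#   - each spare node is reachable by SSH at the address you pass in
#   - SSH_USER is an account that can run satask commands on that node
# DRY_RUN defaults to 1, so the commands are only printed until you set DRY_RUN=0.

SSH_USER=${SSH_USER:-superuser}
DRY_RUN=${DRY_RUN:-1}

# Run one command on a node, or print it instead when DRY_RUN=1.
run_on_node() {
    target=$1
    shift
    if [ "$DRY_RUN" = 1 ]; then
        echo "ssh $SSH_USER@$target $*"
    else
        ssh "$SSH_USER@$target" "$@"
    fi
}

# Run the two commands from the note on every spare node given as an argument,
# stopping at the first failure.
clear_spare_nodes() {
    for node in "$@"; do
        run_on_node "$node" satask leavecluster -force || return 1
        run_on_node "$node" satask stopservice || return 1
    done
}
```

Stopping at the first failure is a deliberate choice in this sketch: a spare node whose cluster state was not cleared would still be missing as a candidate, so it is better to investigate it before you attempt the T3 recovery.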
Results
When all nodes display a status of candidate and all error conditions are
None, you can run the system recovery procedure.
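The readiness condition in the Results can be checked the same way before you start the system recovery procedure. This is a minimal sketch that assumes the same hypothetical hand-transcribed input format, panel_name:node_status:error_code, with an empty error field or the value None meaning that no error is listed. The function name check_ready_for_recovery is an assumption, not part of any product CLI.

```shell
#!/bin/sh
# Minimal sketch: confirm that every node of the system to be recovered is a
# Candidate with no error before you start the system recovery procedure.
# Input (hypothetical format, transcribed by hand): panel_name:node_status:error_code
# An empty error_code or the value None means "no error".

# Prints each node that blocks recovery and returns 1 if there is at least one.
check_ready_for_recovery() {
    blocked=0
    while IFS=: read -r panel status code || [ -n "$panel" ]; do
        [ -z "$panel" ] && continue      # ignore blank lines
        case "$status" in
            Candidate|candidate) ;;
            *) echo "$panel: status is $status, not Candidate"; blocked=1; continue ;;
        esac
        case "$code" in
            ''|None) ;;
            *) echo "$panel: error $code is still listed"; blocked=1 ;;
        esac
    done
    return "$blocked"
}
```

A zero return status means that every listed node satisfies the condition in the Results; any printed line names a node that still needs attention.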