How can I test Storwize V7000 Node Canister failure?
anthonyv
I have received this question several times, so it's clearly something people are interested in.
The Storwize V7000 has two controllers known as node canisters. It's an active/active storage controller, in that both node canisters are processing I/O at any time and any volume can be happily accessed via either node canister.
The question then gets asked: what happens if a node canister fails, and can I test this? If a node canister fails, the second node canister handles all the I/O on its own. Your host multipathing driver will switch to the remaining paths and life will go on. We know this works because a firmware upgrade takes one node canister offline at a time, so if you have already done a firmware update, you have already tested node canister failover. But what if you want to test this discretely? There are four ways:
First up, this test assumes there is NOTHING else wrong with your Storwize V7000. We are not testing multiple failures here. You need to confirm that the Recommended Actions panel, as shown below, contains no items. If there are errors listed, fix them first.
Once we are certain our Storwize V7000 is clean and ready for test, we need to connect via the Service Assistant Web GUI. If you have not set up access to the service assistant, please read this blog post first.
So what's the process?
Firstly, log on to the service assistant on node 1 and place node 2 into service state. I chose node 2 because normally node 1 is the configuration node (the node that owns the cluster IP address). You need to confirm you're connected to node 1 (check at top right), select node 2 (from the Change Node menu), then choose Enter Service State from the drop down and hit GO.
You will get this message confirming you're placing node 2 into service state. If it looks correct, select OK.
The GUI will pause on this screen for a short period. Wait for the OK button to un-grey.
You will eventually get to this with Node 1 Active and Node 2 in Service.
Node 2 is now offline. Go and confirm that everything is working as desired on your hosts (half your paths will be offline but your hosts should still be able to access the Storwize V7000 via the other node canister).
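The host-side check above can be sketched as a small script. This is only an illustration: it assumes you have already collected your path states from whatever multipathing tool your hosts use and arranged them as (volume, node, state) tuples. The function name, the tuple layout, and the "active"/"failed" strings are all hypothetical and not something the Storwize V7000 produces.

```python
from collections import defaultdict


def check_paths(paths, offline_node):
    """Check host paths while one node canister is in service state.

    paths is an iterable of (volume, node, state) tuples, where state is
    "active" or "failed". This layout is an assumption for illustration.
    Returns (volumes_without_access, unexpected_failures).
    """
    active_by_volume = defaultdict(int)
    unexpected_failures = []
    volumes = set()
    for volume, node, state in paths:
        volumes.add(volume)
        if state == "active":
            active_by_volume[volume] += 1
        elif node != offline_node:
            # A failed path through the node that should still be up.
            unexpected_failures.append((volume, node))
    # Volumes with no active path left at all.
    no_access = sorted(v for v in volumes if active_by_volume[v] == 0)
    return no_access, unexpected_failures


# Hypothetical sample: two volumes, one path per node canister.
sample_paths = [
    ("vol0", "node1", "active"),
    ("vol0", "node2", "failed"),
    ("vol1", "node1", "active"),
    ("vol1", "node2", "failed"),
]
no_access, unexpected = check_paths(sample_paths, "node2")
```

If both returned lists are empty, every volume still has at least one active path and the only failed paths are the ones through the node you placed in service state, which is what you expect during this test.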
When your host checking is complete, you can use the same drop down to Exit Service State on node 2 and select GO.
You will get a pop up window to confirm your selection. If the window looks correct, select OK.
You will get the following panel. You will need to wait for the OK button to become available (to un-grey).
Provided both nodes now show as Active, your test is now complete.
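If you like to record your test results, the pass criteria for the whole procedure can be written down as a simple check. This is a minimal sketch, assuming you note the node states by hand from the service assistant at three points (before, during, and after the test). The function and the dictionary layout are hypothetical; only the "Active" and "Service" state names come from the panels described above.

```python
def failover_test_passed(before, during, after, tested_node):
    """Return True if the recorded node states match a clean test.

    Each argument maps a node name to the state noted for it, for
    example {"node1": "Active", "node2": "Service"}. The layout is an
    assumption for illustration.
    """
    def all_active(snapshot):
        return all(state == "Active" for state in snapshot.values())

    # During the test, every node except the tested one must stay Active.
    others_active = all(
        state == "Active"
        for node, state in during.items()
        if node != tested_node
    )
    return (
        all_active(before)
        and during.get(tested_node) == "Service"
        and others_active
        and all_active(after)
    )


healthy = {"node1": "Active", "node2": "Active"}
in_service = {"node1": "Active", "node2": "Service"}
result = failover_test_passed(healthy, in_service, healthy, "node2")
```

Here result is True only when both nodes were Active at the start, node 2 alone was in Service during the test, and both nodes returned to Active at the end.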