Shared Storage Pool Stuck or Down? Don't Panic - Raise a PMR
I have a few customers and IBMers tell me their Shared Storage Pool (SSP) failed to come up after some major disaster like
They then fiddle about and eventually, hours or even days later, send me an email.
All I can say is that I am sorry to hear about their issue, that I don't have those problems, and that I have had my share of electricity cuts (ironically while testing the uninterruptible power supply!). But I can offer some advice . . .
The VIOS SSP feature is built to be easy to operate, like a car. You have deliberately NOT been handed a large set of tools to diagnose issues or take it apart. As with a car, it was found that users with powerful tools and little knowledge do more harm than good. A lease-hire car agreement clearly states that any damage you do to the car trying to "fix it" will be paid for by YOU! What they have effectively done, in the above case, is broken down in their high-tech car on the motorway and spent the whole weekend fiddling about and looking around the engine compartment - probably scratching their heads and wondering what it all does! - and tinkering with various parts and settings. The smart money would have called in the country's car breakdown service immediately and got an expert on the case to "work the problem".
Now let me confess - I have had a few problems with SSP over the years but the SSP was not to blame - it was me. In testing, I have made some ghastly mistakes, some of them deliberately "to see what happens", on my "crash and burn" SSP. Whenever I get problems with my "sort of production" SSP, with commands (like cluster -list) that report on the SSP responding with "unable to connect to database", it is 100% a user-created problem: DNS is down, or I messed up the local /etc/hosts file on a VIOS by misspelling one VIOS hostname (a 2 should have been a 1) or by using a duplicate IP address (two VIOS with the same address due to bad editing). Note: /etc/hosts should have all the VIOS listed on every VIOS in the SSP and /etc/netsvc.conf set to local first. Basically, these are self-inflicted wounds. In fact, it often amazes me that the SSP was working on any nodes after some of my screw-ups!
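The /etc/hosts and /etc/netsvc.conf mistakes described above are easy to check for mechanically. What follows is only a minimal sketch in Python, not an IBM tool: the function names, the sample hostnames (vios1 to vios4) and the sample addresses are all made up for illustration. It assumes you pass in exactly one name per VIOS, and it is meant to be run against copies of the two files on any machine that has Python, not necessarily on the VIOS itself.

```python
from collections import defaultdict


def parse_hosts(text):
    """Return (ip, [names]) pairs from /etc/hosts-style text, ignoring comments."""
    entries = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1:]))
    return entries


def check_hosts(text, expected_nodes):
    """List problems for the expected VIOS names (hypothetical helper).

    Reports nodes with no entry, nodes listed with more than one address,
    and addresses shared by more than one expected node.
    """
    expected = set(expected_nodes)
    ips_by_node = defaultdict(set)
    nodes_by_ip = defaultdict(set)
    for ip, names in parse_hosts(text):
        for name in names:
            if name in expected:
                ips_by_node[name].add(ip)
                nodes_by_ip[ip].add(name)

    problems = []
    for node in sorted(expected):
        ips = ips_by_node.get(node, set())
        if not ips:
            problems.append(f"{node}: no entry found")
        elif len(ips) > 1:
            problems.append(f"{node}: listed with several addresses: {', '.join(sorted(ips))}")
    for ip, nodes in sorted(nodes_by_ip.items()):
        if len(nodes) > 1:
            problems.append(f"{ip}: shared by {', '.join(sorted(nodes))}")
    return problems


def local_first(netsvc_text):
    """True if the first uncommented 'hosts = ...' line starts with a local source."""
    for line in netsvc_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("hosts") and "=" in line:
            first = line.split("=", 1)[1].split(",")[0]
            # Drop any "=auth" style suffix before comparing.
            return first.split("=")[0].strip().lower().startswith("local")
    return False


# Made-up sample data: vios4 is missing and vios3 reuses the vios2 address.
SAMPLE_HOSTS = """\
127.0.0.1   loopback localhost
10.1.1.11   vios1
10.1.1.12   vios2
10.1.1.12   vios3   # bad edit: same address as vios2
"""

if __name__ == "__main__":
    for problem in check_hosts(SAMPLE_HOSTS, ["vios1", "vios2", "vios3", "vios4"]):
        print(problem)
    print("local first:", local_first("hosts = local, bind\n"))
```

To use it for real, read the contents of each VIOS's /etc/hosts and /etc/netsvc.conf into strings and pass them in with your own list of VIOS hostnames. A check like this only catches my kind of editing mistake. It does not diagnose the SSP itself.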
So perhaps I should highlight to all SSP users that, at the first sign of an SSP issue, you are not alone.
Here is What to Do Next?
From working with the VIOS Shared Storage Pool support and development team, I am always impressed by their diagnostic skills and ability to work out a tricky problem.
Cheers, Nigel Griffiths