VIOS Shared Storage Pool phase 6 - New Features
UPDATE: The official IBM VIOS 2.2.5 Shared Storage Pools (Phase 6) New Features Announcement arrives three weeks after GA, but better late than never, right! You can find it at this URL: VIO
My Summary of the Important Features
Available with VIOS 2.2.5 – Generally Available 11th Nov 2016
1. SSP VIOS nodes increased from 16 to 24 – normally that means 12 servers with dual VIOS
2. Full support for the DeveloperWorks SSP Disaster Recovery software
3. RAS (Reliable, Available & Serviceable)
• Cluster Wide Snap Automation – Useful for problem determination
• Asymmetric Network Handling – SSP stability during network issues
• Lease by Clock Tick – So you can change VIOS date & time
4. Further HMC GUI support for SSP – arrives with HMC 860
5. HMC Performance & Capacity Metrics including SSP Performance stats
6. List LUs not mapped to a VM - new lu command option
Below are the details:
1 Increased SSP VIOS nodes from 16 to 24
In Practice:
Need more than 24 VIOS using SSP? Here are a few ideas:
Nigel Notes:
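As a quick sanity check before planning any growth, you can see how many VIOS nodes your pool already has with the standard cluster commands, run as padmin (the cluster name "globular" below is just an example):
$ cluster -list
$ cluster -status -clustername globular
The second command lists every VIOS node in the cluster and its state, so counting the rows tells you how much head-room the new 24-node limit gives you.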
Result: larger SSP clusters, tested to higher I/O rates and increased flexibility.
2 Official Support for the DeveloperWorks SSP Disaster Recovery method
The code has been available for a few years from DeveloperWorks on an ad-hoc "best effort" basis from the developers and works fine in my experience and that of others I know. Here it is assumed your SAN team and infrastructure can create a point-in-time copy of the SSP LUNs at a remote location on completely different disks. On a site loss, you get your Virtual Machine disks back by starting the SSP on a different set of VIOS on alternative Power servers. The problem is that the VIOS will look at the disks and refuse to work with them, as they see them as "not their SSP". The trick is to have an SSP configuration backup file and a script which changes the names of the VIOS that are part of the SSP. Now, when that backup is restored, your new VIOS thinks it is a member of the SSP and happily joins or starts the SSP. With some preparation you can also get it to automatically map the virtual disk LUs to a newly created set of virtual machines, all ready to be started up to continue the service. See my YouTube Video on this: SSP Remote Pool Copy Activation for Disaster Recovery. I can also see changes to the "viosbr" (VIOS backup and restore) command to support this:
$ viosbr
Usage: viosbr -backup -file FileName [-frequency dail
       viosbr -backup -clustername clusterName -file FileName [-frequency dail
       viosbr -nobackup
       viosbr -view -file FileName [ [-type devType] [-detail] | [-mapping] ]
       viosbr -view -file FileName -clustername clusterName [ [-type devType] [-detail] | [-mapping] ]
       viosbr -view -list [UserDir]
       viosbr -restore -file FileName [-validate | -inter] [-type devType] [-skipdevattr]
       viosbr -restore -file FileName [-type devType] [-force]
       viosbr -restore -file FileName -skipcluster
       viosbr -restore -clustername clusterName -file FileName -skipdevattr
       viosbr -restore -clustername clusterName -file FileName [-subfile NodeFile] [-validate | -inter | -force][-type devT
       viosbr -restore -clustername clusterName -file FileName -repopvs list_of_disks [-validate | -inter | -force][-type devType][-db]
       viosbr -restore -clustername clusterName -file FileName [-subfile NodeFile] -xmlvtds
       viosbr -dr -clustername clusterName -file FileName -type devType -typeInputs name:value [,...] -repopvs list_of_disks [-db]
       viosbr -recoverdb -clustername clusterName [ -file FileName ]
       viosbr -migrate -file FileName
       viosbr -autobackup {start | stop | status} [ -type {cluster | node} ]
       viosbr -autobackup save
$
We await full documentation on this topic.
UPDATE: From what I can see, the VIOS SSP backup fix-up script is now handled by the viosbr -dr command option (which is excellent) and the tiny software fix that you used to have to install is already included with the VIOS. Both are good ideas: that function is now integrated into the normal VIOS tools.
Result: more confidence in setting up and using the remote DR for the whole pool feature.
3 RAS improvements
We can view this as the Shared Storage Pool reaching maturity. The Support Team reviews common issues that they hit with real-life customers and looks for ways to eliminate them or to reduce the damage by keeping the SSP running. Here we have the top three that are visible to SSP administrators. I know of about half a dozen other improvements internally which are not shared publicly - mostly because they require an understanding of the code itself. There are large numbers of Shared Storage Pools in use today and lots of them are in full production use. I get rather annoyed when asked "Yes, but how many are in production?", suggesting it is a tiny number. I would guess there are thousands in FULL PRODUCTION USE. It is very hard to get a hard number as it is a free feature of your VIOS, which is actually a PowerVM component. As IBM does not charge for it separately, we can't track it. I would guesstimate we are in the many tens of thousands of VIOS running SSP, with many thousands in production. Here is a hard number: I have had 15 thousand views of my SSP YouTube videos (up to 31st October 2016). The two main videos take 1.5 hours each to watch, i.e. 500 people are serious about their SSP. I would not pretend for a second that all SSP users have watched my videos, as there is internal and external training plus Technical University sessions, for a start.
3a Cluster Wide Snap Automation
It seems that the first Support response to a PMR is: “Can we have a snap?” (the problem diagnostics data).
The new clffdc command collects snap data from every node in the SSP cluster at the same time and creates a single compressed file on the node where the command was run.
In brief, the syntax is: clffdc -c TYPE [-p priority] [-v verbosity]
Example: all data and priority=medium, as padmin: clffdc -c FULL -p 2
On two VIOS nodes just installed, with a very small & simple config, clffdc -c FULL -p 2 took 5 minutes.
# oem_setup_env ; cd /hom
# ls -s csnap*full*
77624 csna
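Once the command finishes, you still have to get that single csnap file off the VIOS and up to IBM Support. A trivial sketch of that last step - the jump server name and path are invented for illustration, so substitute whatever your support process uses (ECuRep, the support portal, etc.):
# scp csnap*full* youruser@yourjumpserver:/tmp/     # copy the combined csnap off the VIOS for upload to the PMR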
Result: less VIOS / SSP administration time and better problem determination at the first attempt.
3b Asymmetric Network Handling - a new internal algorithm
All VIOS in the SSP cluster are equal, except that one or two are special - and the VIOS that are special change over time. It is an extra role they take on, which is passed on to another VIOS if, for example, that VIOS gets shut down by the administrator. One such role is the cluster managing node.
Rare condition: the cluster manager has a partial network issue, i.e. it can communicate with some nodes but not all of them [asymmetric]. In the past, failed communication from the managing node to one or more other nodes would result in all those VIOS being expelled. Now the manager node double-checks whether the nodes it can still talk to can reach the nodes it cannot. If they can, or a good number of them can, then the managing VIOS node works out that it (or its part of the network) might be the problem and hands over the managing role to an alternative node. There is nothing the SSP Admin needs to do here, but it is good to know SSP has your back when you have network problems.
Result: during flaky network issues, we get fewer unnecessarily expelled nodes = SSP stability while you fix the network.
3c Lease by Clock Tick
I did not understand this one, so a kind SSP developer explained it: these leases are related to heartbeats and data integrity. One of the key things to guard against in a cluster is a node that looks dead but isn't really dead. In other words, if a node stops responding to heartbeats for a time, the other nodes will (eventually) assume it is really dead. They expel it, which removes it from the list of nodes they need to coordinate with. If that node isn't really dead but just dormant for an extended period of time, then when it comes back:
it has outdated information and is therefore not in any fit state to touch the shared disks in the pool. So when a user on a prior release changed the date or time, the resulting time jump made the node think it had been dormant (possibly stuck in kdb (kernel debugger), possibly a victim
Once at VIOS 2.2.5 plus the 1st service pack, you can change the date or time at any . . . time!! Previously, you had to take the VIOS & its client LUs offline using the clstartstop -stop command, change the date or time and then add the node back into the SSP with clstartstop -start.
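For the record, the old procedure went roughly like this, as padmin (the cluster name "globular" and node hostname "vios1" are just examples):
$ clstartstop -stop -n globular -m vios1     # take this VIOS node out of the SSP cluster
$ chdate -hour 23 -minute 45                 # change the time (or date)
$ clstartstop -start -n globular -m vios1    # bring the node back into the SSP cluster
From 2.2.5 SP1 onwards you can simply change the date or time and skip the clstartstop steps.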
Result: I suspect many people don't know about, or forget to use, clstartstop (been there myself), so from this 2.2.5 release onwards we will not have this issue.
4 HMC 860 graphical user interface Enhanced+ adds more SSP management features
I have not got the new HMC code installed yet - it is not out for another month - so sorry, no screen capture yet. But the Enhanced+ view has been adding SSP features with every release for a few years now.
Result: Saves Admin time and it means you don't need to log on to a VIOS to see your SSP configuration or make changes.
5 HMC Performance and Capacity Metrics (PCM) for SSP
Again, I have not got this new HMC available yet, but it does come with a significant new feature: making loads of performance numbers available. This includes lots of SSP stats, so that you can see, for example, the total I/O going to/from your SSP disks. Now the bad news: the format will be JSON, which is fairly painful to use, restructure and graph. Once I have examples from my SSP I will create and publish worked examples of the data. The new HMC release might (note MIGHT) come with tools. From beta samples of the data we should get performance stats including: SSP and tier level stats
Then stats for each LUN in the SSP - which might be far too detailed for normal use, as an SSP normally has a few dozen LUNs.
Remember this is from beta code so some changes or extras are possible.
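If you want a head start on taming the JSON, one generic approach is to flatten the fields you care about into CSV with a tool like jq and then graph that in a spreadsheet or your favourite plotting tool. A minimal sketch only - the file name and every field name below are invented placeholders, because the real schema is not public yet:
$ jq -r '.sspSamples[] | [.timestamp, .readBytes, .writeBytes] | @csv' ssp_pcm_sample.json > ssp_pcm_sample.csv
Swap in the real field names once you have a sample file from your own HMC.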
Result: loads more stats to collect, graph and think about - this will give Administrators insight into what the SSP is up to and warning about the future (SSP filling up or overworked).
6 Listing LUs mapped and unmapped to Virtual Machines
If you say "mapped" when you read the word "provisioned" below, you can work out what this does!
$ lu -list -attr prov
$ lu -list -attr provisioned=true = Is mapped to a VM
Unfortunately, it does not say on which VIOS those LUs are mapped (mounted), i.e. test3 and test5 in my example.
See my nmap script for that. See the AIXpert Blog: Shared Storage Pools Hands-On Fun with Virtual Disks (LU) by Example
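The filter I expect to use most for housekeeping is the opposite one. Assuming the provisioned attribute simply takes true or false, which is what the syntax above suggests (I have only shown the true form here), that would be:
$ lu -list -attr provisioned=false = Is NOT mapped to any VM
Any LUs in that list are candidates for investigation or for releasing their space back to the pool.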
Result: Administrators performing regular maintenance can quickly determine whether they have LUs no longer mapped to a VM - this could be a mistake that needs investigating, or it might be left-over disks from deleted virtual machines whose space could be released back to the SSP.
7 Secret SSP performance booster
This is scheduled for early 2017 - I wish I could say more at this time :-) It will require hardware in the VIOS to support the function.
Result: a large performance jump for disk I/O intensive workloads with minimal SSP administrator setup.
- - - The End - - -