I got this interesting question from Andrew Bielecki (BieleckiAP060000HJ8U) from a different blog entry:
- I have 10 VIO servers with 16TB storage pool (16 x 1TB LUNs) on EMC VNX storage which we want to migrate to EMC VMAX3.
- What would be the best way to migrate this pool?
- Should we add 16 LUNs from VMAX3 and remove 16 LUNs from VNX or should we use failgrp command and mirror pool on VNX to VMAX3?
It is a good question as there are many ways to do this operation and you are thinking of two good ones.
The actual underlying disk subsystem vendor makes no difference - I assume the target disks are faster and/or more reliable.
I assume the new LUNs are the same size or larger than the older ones (i.e. no reduction in size to deal with).
1. Add + Remove one LUN at a time: Add a new LUN on the new equipment with pv -add, then pv -remove an old one. Repeat for each LUN in turn.
   - This would work but there are unpleasant side effects.
   - SSP sees these as two unconnected operations. Once you add a LUN it will (in the background) start moving the 1 MB blocks of the Logical Units (LUs) to the new LUN to optimise performance by spreading the blocks evenly across all LUNs (assuming they are all the same size). In your case roughly 1/16th of the blocks will be moved to the new LUN. If you remove a different LUN shortly afterwards, SSP sees a more urgent request to evacuate the old LUN, so all of its blocks are moved to the 15 remaining old LUNs and the relatively empty, just-added 16th LUN. I think this will result in many more block moves (large scale disk I/O operations) than are really needed.
   - I am not sure (as I have not tried a bulk LUN swap) but it may want to complete the first operation before starting the next. This could result in a long elapsed time.
   - This will result in most blocks moving more than once - some of them 16 times!
2. Add + Remove ALL LUNs in one go: Add all the new LUNs with pv -add in one command, then pv -remove all the old LUNs in one command.
   - As above, but this effectively doubles the number of LUNs, which moves half the blocks to the new LUNs, and then halves the number of LUNs, which forces a large scale reorganisation of the LU blocks.
   - Don't assume that the contents of an old LUN will be straightforwardly moved to the new LUN - it is likely to go through an algorithm to reassign blocks, twice.
3. Replace a LUN: pv -replace an existing LUN with a new LUN.
   - I think this will result in a wholesale move of all the blocks of the old LUN to the new LUN.
   - This can be done many LUNs to many LUNs in one command.
   - At least it is one move of the blocks.
4. Use a mirror: Add a mirror to the pool on the new LUNs with the failgrp -create command, and then remove the first set of LUNs with failgrp -remove.
   - First I should point out that having your Shared Storage Pool mirrored between two disk subsystems is good and normal (if you can afford the disks). This keeps the SSP and all client virtual machines running even if one disk subsystem completely fails.
   - This operation (assuming you add all 16 LUNs in one go as the mirror) will be both concurrent and safe.
   - SSP will mirror all the allocated LU blocks at the same time to the new disk subsystem, and in one go it will balance all the blocks across all 16 new LUNs, so there are no multiple moves.
   - It is safe because if there are problems with the new LUNs (perhaps it's a new disk subsystem and could do with a "burn in") having the mirror buys you a safety margin. If there is a problem you can simply drop the new mirror.
5. Use a tier: Add the new LUNs by creating a tier with tier -create, then lu -move the LUs to the new tier and finally remove the original "default" tier.
   - This uses the new (end of 2015) tier feature, but there is a sting.
   - The original tier is known as the SYSTEM tier and contains the SSP's internal metadata.
   - I don't know of a way to force the SYSTEM metadata to move to another tier without using pv -replace.
   - Not recommended - currently.
   - If you were just adding the new faster disks (not removing the original LUNs) then placing them into a new tier would be a very good idea.
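To put a rough number on the "many more block moves" worry with the one-LUN-at-a-time add and remove approach, here is a back-of-the-envelope sketch in Python. It is a toy model and not how SSP actually works internally: it assumes every add and every remove runs to completion, that the data always ends up spread evenly, and that the rebalancer moves the minimum amount of data to get there. The function names are made up for this illustration. Real life is likely to be worse than this best case.

```python
from fractions import Fraction


def add_remove_one_at_a_time(lun_count: int) -> Fraction:
    """Data moved, as a multiple of the pool's used data, when swapping
    LUNs one by one (toy model: minimum moves, always an even spread)."""
    moved = Fraction(0)
    for _ in range(lun_count):
        # After the add there are lun_count + 1 LUNs, so the new LUN
        # fills up to an even share of the data.
        even_share = Fraction(1, lun_count + 1)
        moved += even_share
        # The old LUN being removed holds the same even share, and all
        # of it has to be evacuated to the remaining LUNs.
        moved += even_share
    return moved


def single_pass_copy() -> Fraction:
    """Replace or mirror: every used block is copied exactly once."""
    return Fraction(1)


if __name__ == "__main__":
    swap = add_remove_one_at_a_time(16)
    print(f"one-at-a-time swap: {float(swap):.2f} x the used data moved")
    print(f"single pass copy:   {float(single_pass_copy()):.2f} x the used data moved")
```

For 16 LUNs this toy model moves 32/17 (about 1.88) times the used data, against 1.0 for a single pass, and that is with the most generous assumptions about the rebalancing.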
- In your case, where the Shared Storage Pool (or tier) is not mirrored, I recommend method 4: add a mirror with failgrp -create, wait for the mirror to sync and then remove the old LUNs with failgrp -remove.
- If the Shared Storage Pool (or tier) is already mirrored, I recommend method 3: replace one LUN at a time with pv -replace. As this is just 16 LUNs I would probably do it one at a time in a script (so I could triple check the hdisk names are right).
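The "triple check the hdisk names" step is the sort of thing worth automating before any command is run. The sketch below, in Python, only builds and prints the old-to-new pairing for a human to review; it does not run anything on the VIOS. The function name and the hdisk names are hypothetical, so substitute the names reported on your own VIOS.

```python
def build_replace_plan(old_disks, new_disks):
    """Pair each old hdisk with a new one, refusing ambiguous input."""
    if len(old_disks) != len(new_disks):
        raise ValueError("old and new lists must be the same length")
    if len(set(old_disks)) != len(old_disks):
        raise ValueError("duplicate name in the old disk list")
    if len(set(new_disks)) != len(new_disks):
        raise ValueError("duplicate name in the new disk list")
    overlap = set(old_disks) & set(new_disks)
    if overlap:
        raise ValueError(f"disk listed as both old and new: {sorted(overlap)}")
    return list(zip(old_disks, new_disks))


if __name__ == "__main__":
    # Hypothetical names: hdisk2.. are the old LUNs, hdisk20.. the new ones.
    old = ["hdisk2", "hdisk3", "hdisk4"]
    new = ["hdisk20", "hdisk21", "hdisk22"]
    for old_disk, new_disk in build_replace_plan(old, new):
        print(f"replace {old_disk} with {new_disk}")
```

Failing loudly on a duplicate or overlapping name is deliberate: a typo in an hdisk name is far cheaper to catch here than halfway through a replace.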
In all cases I would watch the disk status carefully and watch the disk I/O, so that I know the expected I/O is taking place and when it has finished.
Because large volumes of data are being moved, and each block has to be blocked from I/O in turn while it is actually in transit, try to avoid peak I/O times.
Note: the VIOS you are logged into might not be the one moving the data.
Before the change:
- Check your HMC logs for SSP warnings or even errors.
- Take an SSP settings backup: viosbr -backup -clustername clusterName -file FileName and get the backup off the VIOS.
- Run the VIOS advisor command and look for any missed tricks to gain performance - the advisor command is now installed by default and is called part (yes, a rather silly name).
- Get rid of any unused LUs - this reduces the I/O.
- Check the cluster status and that all VIO Servers are up to date (ncluster), check the pool use (npool) and find LUs not mapped anywhere, which are possible candidates for removal (nmap), with Nigel's special Shared Storage Pool scripts that can be found here
In VIOS 2.2..5 = SSP phase 5 the mirror status can be found in the tier -list output - look for MIRROR_STATE = SYNCED
and watch the disk I/O using your favorite performance command, which is nmon :-)
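If you capture the tier -list output to a file or a variable, the "is it synced yet?" check can be sketched in Python as below. This is an assumption-laden sketch: I am guessing that each state appears as MIRROR_STATE followed by "=" or ":" and then a single word, and the sample text in the code is made up for illustration, not real command output. Check the pattern against what your VIOS level actually prints.

```python
import re

# Assumed layout: "MIRROR_STATE", optional spaces, "=" or ":", then one word.
MIRROR_STATE_PATTERN = re.compile(r"MIRROR_STATE\s*[:=]\s*(\w+)")


def all_mirrors_synced(tier_list_output: str) -> bool:
    """True only if at least one MIRROR_STATE is reported and all are SYNCED."""
    states = MIRROR_STATE_PATTERN.findall(tier_list_output)
    return bool(states) and all(state == "SYNCED" for state in states)


if __name__ == "__main__":
    # Made-up sample text, not real tier -list output.
    sample = "TIER_NAME = SYSTEM\nMIRROR_STATE = SYNCED\n"
    print("safe to remove the old mirror:", all_mirrors_synced(sample))
```

Returning False when no MIRROR_STATE is found at all is the cautious choice: if the output format is not what the pattern expects, the check says "not yet" rather than giving a false green light.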
I hope this helps with your ongoing SSP re-organisation, cheers, Nigel