
Comments (24)

1 seb_ commented Permalink

Hi Barry! For point 7 (svc_snap -c): does "virtual-to-physical mapping" mean the virtualization matrix? If yes, is this already implemented in pre-5.1 code?
Cheers, seb

2 orbist commented Permalink

seb - the -c option actually just says "run a config backup now" rather than picking up the auto-generated daily one. The XML file generated does contain the output of the mdisk-to-vdisk mappings, but it does not have the full virtualization extent table. That table is checkpointed to the quorum disks every 10 minutes, but there is no external command provided to extract the data from the quorum disks - it's intended for internal SVC code use only, in the event that the virtualization map needs to be restored.
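For reference, a minimal sketch of forcing that backup from the CLI - treat the file name and copy step as assumptions, since they can vary by code level:

  # Collect a support snap that includes a freshly generated config backup (the -c option above)
  svc_snap -c

  # Or generate just the XML configuration backup on the config node
  svcconfig backup

  # The resulting svc.config.backup.xml can then be copied off the cluster (e.g. with scp/pscp)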

3 nixfu commented Permalink

One good question to answer in an SVC FAQ would be to explain the process of upgrading from an Entry Edition (EE) install to the full SVC edition.

The EE is a super value proposition, and I personally have been asked this question a bunch of times.

A clear explanation of how the licenses and the hardware are affected needs to be published. I don't think I have ever seen it publicly stated anywhere, or in the docs.

4 orbist commented Permalink

nixfu,

It's quite simple - there are two questions here: hardware and licensing.

Hardware-wise, you can run the EE nodes with either an EE license or a per-TB license. The code is the same, so you can simply add more nodes to the cluster if you saturate the EE I/O group. Or, if you want to swap out EE (8A4) nodes for 8G4 nodes, you follow the standard non-disruptive hardware upgrade procedure. Just like any other SVC node hardware, any model will happily cluster and co-exist as a single entity.

To upgrade the license you simply need to contact IBM and pay for the upgrade, then input the new licensed capacity using the chlicense -virtualization command.

All IBM SVC software licensing is done on trust, so if you go slightly over your license for a while (while cheques clear, or during a migration, etc.) we won't stop functions from working. You will get a few errors logged, but SVC will not stop working.
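To make that concrete, a hedged example of the licensing step - the 20 TB figure is made up, use whatever capacity you actually purchased:

  # Set the new virtualization license to 20 TB (example value)
  svctask chlicense -virtualization 20

  # Confirm the licensed capacity the cluster now believes it has
  svcinfo lslicense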

5 mdruryscs commented Permalink

I'm trying to isolate production hosts and test/dev hosts on a 4-node SVC cluster. The test/dev environment is roughly twice the size (in terms of hosts/storage/IOPS) of production. Besides keeping production on better-performing backend storage as a way to isolate the two environments, what do you think about the idea of using one I/O group for the production hosts and the other I/O group for the test/dev hosts? We are trying to keep the setup as simple as possible for the lone storage admin.

I realize you can spread both prod and test hosts across all 4 nodes and use governing/throttling at the vdisk level to make certain test/dev doesn't hog all the I/O requests. However, this doesn't seem to be a very simple thing to do in SVC, as you have to first create the vdisk that is going to be governed, then change it to actually give it a govern value (roughly the two-step sequence sketched below). Also, how on earth do you come up with an appropriate value for governing? Seems like a lot of trial and error and extra work. We love the K.I.S.S. principle around here! :-) I would love to hear your take on this scenario.
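This is the two-step sequence I mean, as I understand it from the CLI guide (names made up, so treat it as a sketch):

  # Step 1: create the vdisk - governing is not set at creation time
  svctask mkvdisk -mdiskgrp TESTDEV_GRP -iogrp 1 -size 100 -unit gb -name testdev_vd01

  # Step 2: change it to apply a governing rate, e.g. 2000 IO/s (add -unitmb to govern in MB/s instead)
  svctask chvdisk -rate 2000 testdev_vd01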

6 orbist commented Permalink

mdruryscs,

Ultimately the question needs to be: how wrong will dev/test get it?

Seriously though, if a rogue application tries to run away with all the resources in the system, then it's good to have something that limits just how much it can grab.

One option, as you mention, is vdisk throttling. Without understanding the applications, and the needs of each vdisk, it is probably quite difficult for a storage admin to set either an IO/s or MB/s limit per vdisk.

SVC as a whole should not be impacted by a single application running amok, and the cache has the partitioning concept, so no one mdisk group can consume all the cache; a rogue vdisk therefore can't take more than its share of cache resource. However, this feature was provided to protect against hardware issues on the underlying disk controllers, and any resulting overloading - it's not intended to limit an already overloaded disk system when an application goes AWOL.

While you could segregate vdisks based on mdisk groups alone, and thus spread the production and test/dev workloads between both I/O groups - given that you want to follow a K.I.S.S. approach, I'd recommend that you keep production on one I/O group and dev/test on the other. That way a 'mistake' or rogue application in dev/test is guaranteed not to impact production. I'd also ensure that the mdisk groups used by each environment are kept apart.

Hope this helps.

Barry
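As a rough sketch of that split, assuming hypothetical mdisk group and vdisk names:

  # Production vdisks created in io_grp0, backed by their own mdisk group
  svctask mkvdisk -iogrp io_grp0 -mdiskgrp PROD_GRP -size 200 -unit gb -name prod_vd01

  # Test/dev vdisks created in io_grp1, backed by a separate mdisk group
  svctask mkvdisk -iogrp io_grp1 -mdiskgrp TESTDEV_GRP -size 200 -unit gb -name test_vd01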

7 iafs commented Permalink

Hello Barry. We're trying to assess how close our SVC cluster gets to the current restrictions of the V4.3.x code. There are two parameters I couldn't find a way to measure: "Concurrent SCSI tasks (commands) per node" and "Concurrent commands per FC port". Is there a way to collect these statistics? What would happen if one of the nodes/ports actually reached the limit?

Thank you,

8 TMasteen commented Permalink

Hello Barry,

We use SVC FlashCopy and have made several fcconsistgrps. We noticed that stopping an fcconsistgrp can sometimes take a long time. Most of the time the 983003 (Stopped) message comes within 10 seconds, but sometimes it takes more than 15 minutes (staying in the Stopping state). I'm wondering what the reasons can be, or where to look to see what's causing this.

9 orbist commented Permalink

iafs,

We don't export the number of concurrent commands as a statistic. It's very unlikely that you are reaching these limits. This is basically the incoming queue of work per port and node. SVC will accept up to 10,000 commands at any point in time; 2,500 of these will be actively processed. Hitting that would mean having over 1,000 vdisks each driving 10 concurrent commands. That's going to need some pretty heavy application workload - not to mention host hardware - to even generate that number of concurrent I/Os.

The node CPU and the disk backend in general are likely to be saturated before you get close to these limits. The best thing to monitor is the CPU utilisation, trying to keep it below the recommended 75%. This recommendation comes from the fact that, should a node fail, its partner node has to take on the workload of both nodes. There is less work for the remaining node to do, as it doesn't have to mirror cache data etc. - hence 75% and not 50%.

In test I drive my nodes to 100% utilisation, and I can do this with maybe only 1,000 concurrent commands running to 640 15K RPM disks.

In summary, I wouldn't worry about these limits; concentrate more on the CPU utilisation, backend response times and node-to-node latency (the time taken for the cache to mirror between nodes).

Barry
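If you aren't already collecting the per-node statistics, here is a minimal sketch of turning them on - the interval and the dump-listing command are as I recall them for the 4.x CLI, so double-check against your code level:

  # Start per-node I/O statistics collection at a 15-minute interval
  svctask startstats -interval 15

  # List the statistics dump files generated on the config node; these cover node CPU,
  # mdisk (backend) response times and node-to-node traffic
  svcinfo lsiostatsdumps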

10 orbist commented Permalink

TMasteen,

When stopping an FC consistency group, a few things have to happen. Here is the explanation in the SVC functional specification:

Stopping

The mapping is in the process of transferring data to an older mapping.
The behaviour of the Target Virtual Disk depends on whether the background copy process had completed whilst the mapping was in the Copying state.

If the copy process had completed, then the Target Virtual Disk remains Online whilst the stopping copy process completes.

If the copy process had not completed, then data in the cache is discarded for the Target Virtual Disk. The Target Virtual Disk is taken Offline and the stopping copy process runs.

When the data has been copied, a 'Stop Complete' asynchronous event is notified. The mapping will move to state Idle/Copied if the background copy has completed, or to Stopped if it hasn't.

---

I guess the question is how complex a dependency chain you have created. Is this just src -> tgt, or are you using multi-target, with various copies triggered at different times?

I will check with the FC team on Monday and see if there is more to this than the spec describes :)
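In the meantime, you can watch the state transition from the CLI - a sketch, with a made-up group name:

  # Stop the consistency group
  svctask stopfcconsistgrp my_consist_grp

  # Poll its state; it should move from 'stopping' to 'stopped' (or to 'idle_or_copied'
  # if the background copy had already completed)
  svcinfo lsfcconsistgrp my_consist_grp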

11 Storage commented Permalink

Barry, posting a question from Twitter: how come #svc_snap is not documented in #SC26-7903-04?

12 seb_ commented Permalink

Asked myself the same thing last week (helped some colleagues with the open beta test of v5.1). I can hardly imagine that it's intended to be missing, because it is in the CLI online help, and besides that, sooner or later every customer will have to provide a data collection :o)

13 iafs commented Permalink

Hi Barry,

A question on approaches to large-scale data migration to SVC, choosing between vdisk mirroring and FlashCopy. The main criteria for the migration approach: there must be a simple and reliable way to back out, and the number and duration of host outages must be minimized. FlashCopy licensing cost aside, do you see any significant advantages of using vdisk mirroring vs. FlashCopy?

FlashCopy seems to be the better option for us, since the FC target can be accessed read/write right after the FlashCopy has started, and the FC source (an image mode vdisk in this case) remains completely intact. The entire migration could be done with only one short outage per server.

Also, thank you for answering my previous question so quickly.
-iafs

14 orbist commented Permalink

seb et al.,

svc_snap is a support collection tool, intended for use under direction from IBM. It serves little purpose for general usage by end users, hence the reason it's not included in the CLI guide.

15 orbist commented Permalink

iafs,

I don't think FlashCopy will give you what you are looking for.

Introducing SVC into an environment needs image mode vdisks to begin with - that should be your only planned interruption for the servers. If you then use FlashCopy, the target becomes the 're-striped' copy by which you migrate data around the disks, but then you need to re-map the target as your primary copy of the data, requiring another interruption to your servers.

If you use vdisk mirroring, you would create an image mode vdisk initially, map it to the server, and the server starts doing I/O again. You then add a second copy to the vdisk, striping it across a new managed disk group. Should you then commit to the migration, you remove the original (image mode) copy, leaving the new copy as the active vdisk - thus no interruption to the servers, since it's the vdisk they talk to and not the individual copies.

The same thing can be achieved using the vdisk migrate command; here you physically move between the original image mode vdisk (managed disk group) and a new managed disk group, striping as you go. This however doesn't leave the original image mode vdisk in place (which I assume is what you want to do). However, you can 'migrate back to image' to a single vdisk with a one-to-one LBA mapping of vdisk to mdisk.

I would go with the vdisk mirroring option if you want the extra benefit of being able to simply unmap and extract the original image mode vdisk back to direct mapping outside of SVC, mainly because when you do cut over to use the striped 'copy' there is no interruption to server access.
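A hedged sketch of that vdisk mirroring sequence, with made-up mdisk group, host and vdisk names:

  # 1. Bring the existing LUN in as an image mode vdisk and map it back to the host
  svctask mkvdisk -mdiskgrp IMAGE_GRP -iogrp 0 -vtype image -mdisk mdisk10 -name host_vd01
  svctask mkvdiskhostmap -host host01 host_vd01

  # 2. Add a second, striped copy in the new managed disk group; it synchronises in the background
  svctask addvdiskcopy -mdiskgrp NEW_GRP host_vd01

  # 3. Once synchronised, and you commit to the migration, drop the original image mode copy (copy 0)
  svctask rmvdiskcopy -copy 0 host_vd01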