Storage virtualization – part 3 – Performance
orbist
Cornerstone #5 The potential to increase system performance
While this topic directly corresponds with ‘Cornerstone #5’, it also contributes to #2 Simplification of storage management and #4 Increased storage utilization.
Pooling and Striping
Most enterprise level controllers, the likes of DS8000, DMX and USP, and some mid-range controllers provide a degree of array pooling. That is, a method to concatenate the capacity provided by more than one RAID array into a single pool of usable capacity. Luns are then provisioned by carving up a chunk of this pool. One of the main reasons for providing this function is to increase the performance of the provisioned luns. Storage virtualization devices intrinsically provide pooling abilities. However, they also allow the pooling of storage across multiple controller instances, and generally give a much greater degree of freedom in how luns are provisioned. Most of the large vendor products will automatically decide how to provision the luns based on the RAID type or some internal controller design point – sounds sensible. However, in some cases this may mean that luns are actually carved from only one array, or a subset of one array. So how does this add performance? It doesn’t. It aids the performance of the box by sticking to what the vendor has judged to be best (based on the design of the product), but it does not provide the same type of pooling that a virtualization device can provide.
So the key thing here is the heterogeneous striping that is possible once you have virtualized your SAN storage. All three approaches can provide you with this pooling or striping ability – but it all comes down to the heterogeneity of the chosen devices – I’ll cover interop and its importance later. Most virtualization devices provide similar striping abilities – to some degree. Here I will focus on an example using SVC – as it’s what I know best!
With SVC a pool is called a ‘managed disk group’. This group can contain up to 128 ‘managed disks’. Each of these managed disks is an array, so is itself many physical spindles. The pool is internally divided into ‘extents’ – the extent size can be anything from 16MB up to 512MB (today), but it is an attribute of the pool and is fixed for a given pool. If my understanding of the Invista documentation is correct, the equivalent extent size is variable per virtual lun. An SVC cluster can support 4M (4 x 1024 x 1024) extents – so the chosen extent size does dictate the total virtualized capacity.
When you create a ‘striped’ virtual disk, you specify the pool you want to use (you can also specify sub or super-sets of the mdisks in the pool to manually control the stripe-set). Suppose you are creating a virtual disk that is large enough to stripe across 128 managed disks, and these are themselves 4-disk RAID-10 arrays. You now have a single virtual disk that could have the random read performance of 4x128 (512) physical disks, and the random write performance of 2x128 (256) physical disks. In internal tests of this kind, using fairly old DS4300 controllers with 15K RPM drives, I have measured a single virtual disk returning over 125K random read miss operations, and over 50K random write miss operations – before the response time of the disks themselves starts to head off in the usual ‘hockey stick’ manner. Now this is of course a pretty extreme example, and in general most users will stripe across maybe 4 to 8 arrays, but that’s still 4x to 8x the standard performance on random operations. My main caveat here would be that of course the disks themselves have limits (if only they didn’t!) and as you increase the number of virtual disks being carved from a single pool, you will reach some saturation point where the combined workload of all the virtual disks matches the workload that would have been possible on the individual arrays.
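To make the arithmetic above concrete, here is a rough Python sketch. It assumes the extent sizes are binary megabytes and that a striped virtual disk takes its extents from the managed disks in simple round-robin order; the function names are mine, purely for illustration, and are not SVC commands.

```python
MIB_PER_TIB = 1024 * 1024
MAX_EXTENTS = 4 * 1024 * 1024  # 4M extents per cluster, as quoted above


def max_capacity_tib(extent_size_mib: int) -> float:
    """Total virtualized capacity (TiB) that 4M extents of this size can address."""
    return MAX_EXTENTS * extent_size_mib / MIB_PER_TIB


def stripe_extents(num_extents: int, mdisks: list) -> list:
    """Assign each extent of a virtual disk to a managed disk, round-robin (assumed policy)."""
    return [mdisks[i % len(mdisks)] for i in range(num_extents)]


def spindle_equivalents(num_arrays: int, disks_per_array: int = 4) -> tuple:
    """Spindles contributing to random reads and writes for RAID-10 arrays.

    Every disk can serve reads; each write lands on both halves of a mirror,
    so only half the spindles count towards write performance.
    """
    reads = num_arrays * disks_per_array
    writes = reads // 2
    return reads, writes


if __name__ == "__main__":
    for size in (16, 512):
        print(f"{size}MB extents -> {max_capacity_tib(size):.0f} TiB addressable")
    print(stripe_extents(5, ["mdisk0", "mdisk1"]))
    print(spindle_equivalents(128))
```

With these assumptions the smallest extent size limits a cluster to 64 TiB, the largest allows 2048 TiB, and the 128-array example gives the 512 read and 256 write spindles quoted above.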
As with any storage system the overall peak performance potential is only as great as the sum of its parts – start going over that point and you will suffer, unless you can provide some buffering (cache) to cope with the busy peaks.
As I’ve already discussed in response to Hu’s questions and in my Over-allocation post, over-allocation is probably easier to do in a virtual lun – where you already have to store a mapping table and can just update this on the fly as new data is written to the virtual device. Otherwise controllers need to implement a method of recording which extents have and have not yet been allocated, which essentially goes some way towards making it a virtualizing controller!
The specific design approach can affect the latency associated with IO requests. So let’s look at why this is the case.
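As a minimal sketch of why a mapping table makes over-allocation straightforward, the toy class below allocates a physical extent only on the first write to a virtual extent. The class and method names are hypothetical and this is not how SVC or any particular product implements it.

```python
class ThinVirtualDisk:
    """Toy over-allocated virtual disk: physical extents are assigned on first write."""

    def __init__(self, virtual_extents, free_pool_extents):
        self.virtual_extents = virtual_extents  # advertised size, in extents
        self.free = list(free_pool_extents)     # physical extents not yet allocated
        self.mapping = {}                       # virtual extent index -> physical extent id

    def write(self, virtual_extent):
        """Return the physical extent backing this write, allocating one if needed."""
        if not 0 <= virtual_extent < self.virtual_extents:
            raise IndexError("write beyond the end of the virtual disk")
        if virtual_extent not in self.mapping:
            if not self.free:
                raise RuntimeError("pool exhausted: over-allocation has caught up with you")
            self.mapping[virtual_extent] = self.free.pop(0)
        return self.mapping[virtual_extent]

    def read(self, virtual_extent):
        """Return the backing extent, or None if the extent has never been written."""
        return self.mapping.get(virtual_extent)


if __name__ == "__main__":
    vdisk = ThinVirtualDisk(virtual_extents=1000, free_pool_extents=["p0", "p1"])
    vdisk.write(7)
    vdisk.write(900)
    print(vdisk.mapping)
```

The virtual disk advertises 1000 extents but consumes only two from the pool, which is the whole point of over-allocation; the only new state beyond the mapping table is the free list.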
So why a cache?
I guess one question is why do you need a cache? Some vendors will claim there is no need for a cache when you have large caches in your enterprise controllers. While this may be partly true, not all your storage sits behind enterprise controllers with huge caches, and not all customers have them.
We have proved again and again that adding SVC above most mid-range controllers, which generally have a smaller cache than SVC, can improve performance by a noticeable amount – it’s difficult to quantify as it will depend on the cache hit ratio across the virtual environment, but something in the order of 10-20% could be expected. There is also evidence to show that even enterprise controllers with large caches are not impacted adversely by SVC, as the actual read or write operations will generally ‘hit’ the cache in the controller just as they would do without SVC fronting them. That is, SVC does not change the caching characteristics of a controller beneath it. It should be thought of more as now having multiple levels of cache, like CPUs do these days. SVC is your L1 and the controller is your L2. I’ve seen no evidence that the SVC cache is hampering underlying controller cache algorithms – usually it’s the other way round – SVC is sending too much I/O to the back-end controllers. We recently had to add a feedback path to catch slow responding controllers and ramp down the destage rate, especially with the power available to the latest 8G4 nodes. SVC does provide the usual sequential detect / pre-fetch algorithms, and again this doubles up with the same type of algorithms in the controllers, allowing the very high sequential throughput rates benchmarked by SPC-2.
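The L1 / L2 analogy can be put into a back-of-envelope model of average read latency. This is only a sketch: the hit ratios and service times in the example are made-up placeholders, not measurements of SVC or any controller, and each time is assumed to be the total host-visible latency when the read is satisfied at that level.

```python
def effective_read_latency(l1_hit, l2_hit, t_l1, t_l2, t_disk):
    """Average read latency through two cache levels.

    l1_hit: fraction of reads that hit the upper (virtualization) cache
    l2_hit: fraction of the remaining reads that hit the controller cache
    t_l1, t_l2, t_disk: latency when satisfied at each level (same units)
    """
    miss_l1 = 1.0 - l1_hit
    return l1_hit * t_l1 + miss_l1 * (l2_hit * t_l2 + (1.0 - l2_hit) * t_disk)


if __name__ == "__main__":
    # Placeholder numbers, in milliseconds, purely for illustration.
    without_upper_cache = effective_read_latency(0.0, 0.5, 0.1, 1.0, 10.0)
    with_upper_cache = effective_read_latency(0.5, 0.5, 0.1, 1.0, 10.0)
    print(without_upper_cache, with_upper_cache)
```

The model only shows the shape of the effect: any read that hits the upper cache never reaches the controller at all, while reads that miss still get whatever benefit the controller cache already provided.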
It can also be seen from the above discussion that an in-band approach does benefit from having a cache, both for improving performance and for reducing latency. Devices without a cache, however, have to rely solely on their striping abilities to enhance the performance of the existing storage systems.
As I discussed in part 2, once you have fully virtualized your storage you require copy services in the virtualization device. If you have a cache in the same device, while you do need to temporarily flush the cache to prepare to take the snap, once the point in time has been triggered there is no impact to source write operations. The cache stops the additional latency that would otherwise have been introduced by the read / merge / write operations.
One further thought with respect to online data migration. Migration services may be acting upon data, that is, be in the middle of copying or moving extents, when a new write comes in for blocks on the in-flight extent(s). Without caching, the new write will be stalled until the complete extent has been migrated.
In this part I’ve covered pooling, striping and caching. I am aware that I’ve had more of an SVC slant in this part of the discussion, and it was intentional. I wanted to show why not all in-band implementations are afflicted by additional latency. Yes, SVC does add a few microseconds of latency to some operations, but in the bigger scheme of things the cache negates most of this and can provide additional performance gains – over and above striping. The other advantages that come with sitting in the data path, and so having visibility of all the data that flows through the device, outweigh the minimal additional latency on read-miss operations.