IBM Support

Cows versus Rowers

Technical Blog Post


Abstract

Cows versus Rowers

Body

You may have read on The Register that they got their hands on an e-mail about XIV Gen4 not happening. Depending on how you read it, you can conclude either that XIV is dying, or that XIV is alive and well and has been renamed to A9000. I have very little contact with the XIV team; it's hard enough keeping up with what is going on with SVC/Storwize/Spectrum Virtualize. I'm also not important enough to get copied on e-mails from executives, so I haven't seen the e-mail myself.

 

But there was also a comment about RoW snapshots coming to Spectrum Virtualize in 2018. Again, I've no idea if that is true, but I thought I would talk about why CoW snapshots are a good thing for Spectrum Virtualize. So first, what is the difference?

CoW - Copy on Write: when you create a snapshot, it logically holds a copy of all the data on the original, but nothing has to be physically copied up front. When you want to write to the original, you first copy the old data from the original to the snapshot and then write the new data over the old.

RoW - Redirect on Write: the difference here is that a new write gets written straight away but doesn't overwrite the old data. It may get written in the middle of the snapshot, or to a completely new place. Some metadata is then updated to say where the new data is. One way of doing this is for the original copy to become the snapshot, and you start copying from the snapshot to the new version, but there are many different ways of doing it.
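To make the two definitions concrete, here is a toy sketch in Python. It is not how Spectrum Virtualize (or any real controller) implements snapshots; the class names, the single-snapshot limit and the in-memory lists are all assumptions made purely for illustration, and the CoW model assumes a snapshot with no background copy.

```python
class CowVolume:
    """Toy copy-on-write volume (hypothetical): live data never moves."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data, always at the same location
        self.snapshot = None        # block index -> preserved old data
        self.copies = 0             # extra copy operations caused by writes

    def take_snapshot(self):
        self.snapshot = {}          # assumption: nothing is copied up front

    def write(self, i, data):
        # First overwrite after the snapshot: copy the old data out first.
        if self.snapshot is not None and i not in self.snapshot:
            self.snapshot[i] = self.blocks[i]
            self.copies += 1
        self.blocks[i] = data       # then overwrite in place

    def read(self, i):
        return self.blocks[i]

    def read_snapshot(self, i):
        # Unchanged blocks are still read from the original.
        return self.snapshot.get(i, self.blocks[i])


class RowVolume:
    """Toy redirect-on-write volume (hypothetical): old data never moves."""

    def __init__(self, blocks):
        self.store = list(blocks)                 # physical locations
        self.live_map = list(range(len(blocks)))  # logical -> physical
        self.snap_map = None

    def take_snapshot(self):
        self.snap_map = list(self.live_map)       # metadata only

    def write(self, i, data):
        shared = self.snap_map is not None and self.live_map[i] == self.snap_map[i]
        if shared:
            self.store.append(data)               # redirect to a new place
            self.live_map[i] = len(self.store) - 1
        else:
            self.store[self.live_map[i]] = data   # safe to overwrite in place

    def read(self, i):
        return self.store[self.live_map[i]]

    def read_snapshot(self, i):
        return self.store[self.snap_map[i]]
```

In the CoW model the write pays for an extra copy but the live data stays where it was provisioned. In the RoW model the write is cheap but the live data ends up wherever the redirect put it, which is the point the rest of this post turns on.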

 

So you may think that from a performance point of view RoW is the way to go, and if you have a storage controller full of SSDs then you may be right. In a homogeneous controller RoW is great. If we had taken the FlashSystem 840 and added snapshot technology, then RoW would have been the clear choice. It doesn't really matter if the new and old data moves around, gets mixed together or gets fragmented; it's all really fast flash.

 

But you have to remember that Spectrum Virtualize's roots are very much in "virtualization" of storage, so its strengths come in heterogeneous SANs. You might have your FlashSystem 900 and a V3700 full of nearline drives connected to the same SVC cluster, or you might have a V7000 with a dense drawer full of nearline drives and some Tier 1 flash in the controller. Now you've got different tiers of storage, so you create your volumes on the faster tier and decide to use the slower tier for the snapshot. Suddenly RoW becomes a headache. Does it redirect to the slower tier, meaning that all subsequent reads of that data come from the slower tier? Or does it redirect to the faster tier, meaning that you have the old and new data using up space in that tier until the old data can be moved to the slow tier to free up some space? What happens if the fast tier is full?
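The dilemma can be put into a few lines of Python. This is only a sketch with made-up names and block counts (`simulate_row`, the two policy strings and the capacities are all hypothetical), and it assumes every overwrite hits a different block and that no old data has yet been migrated to the slow tier.

```python
def simulate_row(num_blocks, overwritten, policy, fast_capacity):
    """Return (live blocks now on the slow tier, fast-tier blocks in use).

    Hypothetical model: the volume starts fully on the fast tier, a snapshot
    is taken, and then `overwritten` distinct blocks are rewritten.
    """
    fast_used = num_blocks
    live_on_slow = 0
    for _ in range(overwritten):
        if policy == "redirect_to_slow":
            # New data lands on the slow tier; the old data stays on the
            # fast tier as snapshot data, so later reads of this block are slow.
            live_on_slow += 1
        elif policy == "redirect_to_fast":
            # Old and new data both occupy the fast tier until the old
            # data can be moved away.
            if fast_used >= fast_capacity:
                raise RuntimeError("fast tier full: no room to redirect")
            fast_used += 1
        else:
            raise ValueError(f"unknown policy: {policy}")
    return live_on_slow, fast_used
```

With a 100-block volume and a 110-block fast tier, redirecting 20 writes to the slow tier leaves 20 live blocks on slow storage, while redirecting them to the fast tier runs out of space after 10. Neither outcome is what you had in mind when you put the volume on the fast tier.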

 

Even within a tier, if you decide that you want your snapshot to be compressed, working out exactly what you want from a redirect is complicated. Again, does the new data end up being stored somewhere new until the data it is overwriting has been compressed and moved elsewhere? For Spectrum Virtualize there are other complications. Easy Tier needs to understand where the live copy is versus the snapshot copy so that it can make the best use of your faster tiers. You could also have provisioned your volume with two copies from two different mdiskgrps containing Distributed RAID6 to cope with double drive failures, but you may only want a single copy on Distributed RAID5 for your snapshots.

 

RoW in heterogeneous environments can work; there are just a lot of different options. Depending on the choices made, you will have some problems with the performance of some workloads, or you may need to over-provision your faster tiers to allow for redirects (and therefore lower your performance per dollar). If Spectrum Virtualize implements it, do we try to expose those options to the customer? We have always tried to keep the configuration as simple as we can (I'd argue we haven't really managed that in a lot of cases), as the point of storage virtualization is to reduce the admin costs of the SAN. There are controllers out there that have pages of options for all sorts of different things, which you can go on courses for weeks to understand, but I can't see us changing tack now and going in that direction.

 

Also remember that most workloads are read heavy, so sometimes a little bit of pain on a write is worth it to make sure that those reads are as fast as you need them to be, especially when you have multiple levels of write caching in hosts and in the SAN.
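As a rough back-of-the-envelope illustration of that trade-off, the average latency of a workload is just the read and write latencies weighted by the read/write mix. The function name and every number you might feed it are assumptions for illustration, not measurements from any real system.

```python
def average_latency(read_fraction, read_latency, write_latency):
    """Average latency of a workload, weighted by its read/write mix.

    All inputs are hypothetical; latencies can be in any unit as long as
    both use the same one.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    return read_fraction * read_latency + (1.0 - read_fraction) * write_latency
```

For example, with an assumed 80% read mix, `average_latency(0.8, 1.0, 3.0)` models fast reads with a write penalty, and `average_latency(0.8, 4.0, 1.0)` models cheap writes with reads that have drifted onto slower storage. The more read heavy the workload, the more the read latency dominates the result, and write caching makes the write term matter even less than this simple sum suggests.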

 

Again, I must repeat I have no idea what the plans are (if any) for implementing RoW in Spectrum Virtualize. But hopefully this gives you more of an insight into why CoW has a place when it comes to performance.


UID

ibm16164205