Re: Using Virtualization for DR
orbist
Chris posted an interesting solution to provide 3DC disaster recovery using only local virtualization devices.
I posted a couple of responses to this question and commented that this is a complex area of I/O flow, so I said I would describe here how SVC ensures the remote site is what we call consistent. That is, the remote site must always contain a usable copy of the data, which may be slightly behind the local site when asynchronous replication (GlobalMirror) is used. However, this does require an SVC cluster at the remote site.
SVC solves the problem by implementing the replication layer above the cache. This ensures that the remote site always has a consistent image and avoids cache coherency issues as a result of replication.
To explain a bit further, it's important to understand the 'stack' of layers inside SVC. I'm providing this here to give readers a greater understanding of how SVC is implemented and to answer Chris's integrity questions.
So far nothing unusual, and certainly no rocket science. Chris asked about integrity, with respect to keeping in-order I/Os on the backend disks. Well, that's not really how virtualization, or any caching disk controller system, works. Yes, in-order processing is guaranteed. That is, at any time an I/O request to the SVC will return the most recently written blocks for that request; however, this can come from cache or disk. The whole point of a cache is to keep the commonly referenced data in memory. Thus, something that is written often may not be destaged to disk for some time, as each 'hit' makes it rise to the top of the LRU list. As I say, this is business as usual for any caching disk controller.
The question of coherency is covered by the mirroring of the data to the partner node. If a hardware failure occurs on one node in the I/O group, the partner node contains a copy of the cache (dirty writes), so it will flush this data to disk immediately and continue to operate in write-through mode. SVC was designed around no single point of failure (SPoF), and while there is some redundancy built into a node, the pair of nodes that form an I/O group were designed to implement no SPoF at the higher node level. Power failures are handled via the battery backup, which holds the node up long enough to flush cache data to an internal disk.
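To make the cache behaviour above a bit more concrete, here is a minimal sketch in Python. It is a toy model only, not how SVC is actually coded: the `Node` class, `make_io_group` and every method name are made up for illustration, and the backend disk is just a dictionary. It shows dirty writes being mirrored to the partner node, frequently hit blocks staying at the top of the LRU list, and the surviving node flushing its cache and switching to write-through when its partner fails.

```python
from collections import OrderedDict


class Node:
    """Toy caching node (hypothetical): dirty writes sit in an LRU-ordered cache."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk            # shared backend disk, modelled as a dict
        self.cache = OrderedDict()  # block -> data, least recently used first
        self.partner = None
        self.write_through = False

    def write(self, block, data):
        if self.write_through:
            # No partner left to mirror to, so write straight to disk.
            self.disk[block] = data
            return
        self._store(block, data)
        self.partner._store(block, data)  # mirror the dirty write

    def _store(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)  # each hit moves the block to the top

    def read(self, block):
        # The most recently written data wins, whether it is in cache or on disk.
        if block in self.cache:
            self.cache.move_to_end(block)
            return self.cache[block]
        return self.disk.get(block)

    def destage_oldest(self):
        # Write the least recently used block to disk and drop both cached copies.
        block, data = self.cache.popitem(last=False)
        self.disk[block] = data
        if self.partner is not None:
            self.partner.cache.pop(block, None)
        return block

    def partner_failed(self):
        # Flush every dirty block, then stop caching writes.
        while self.cache:
            block, data = self.cache.popitem(last=False)
            self.disk[block] = data
        self.partner = None
        self.write_through = True


def make_io_group():
    """Build a pair of partner nodes in front of one shared disk."""
    disk = {}
    a, b = Node("node1", disk), Node("node2", disk)
    a.partner, b.partner = b, a
    return a, b, disk
```

In this model, a block that keeps being rewritten stays in cache while colder blocks are destaged first, so the order in which data lands on the backend disk need not match the order in which the host wrote it.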
Chris's question of integrity, or what I know as 'consistency', can be answered by considering both Fig2 and Fig1. Because the I/O is only completed back to the host after the remote site has acknowledged that the write has made it to the remote node's cache, you can guarantee that should a disaster strike the local site, the remote site always has a consistent image that will pass application and filesystem data checking.
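The write path just described can be sketched in a few lines of Python. Again, this is an illustrative toy under my own assumptions, not SVC code: `Cluster`, `cache_write`, `destage` and `host_write` are hypothetical names. The point it shows is that replication happens above the cache, so the host only sees completion once both caches hold the write, and the remote cluster's view stays consistent regardless of the order in which the local cache destages to disk.

```python
class Cluster:
    """Toy cluster (hypothetical): a write cache in front of backend disk."""

    def __init__(self):
        self.cache = {}
        self.disk = {}

    def cache_write(self, block, data):
        self.cache[block] = data
        return True  # acknowledgement: the write is now in cache

    def read(self, block):
        # Served from cache if present, otherwise from disk.
        if block in self.cache:
            return self.cache[block]
        return self.disk.get(block)

    def destage(self, block):
        # Move one block from cache to disk, in whatever order the cache picks.
        self.disk[block] = self.cache.pop(block)


def host_write(local, remote, block, data):
    """Replicate above the cache, then complete the I/O back to the host."""
    local_ack = local.cache_write(block, data)
    remote_ack = remote.cache_write(block, data)
    # The host only sees completion once both sites have acknowledged.
    return local_ack and remote_ack
```

For example, if the host writes block 0 and then block 1, and the local cache happens to destage block 1 first, the local backend disk on its own holds only block 1. A read through the remote cluster still returns both writes, which is why the remote image is consistent only when it is accessed through a cluster at that site.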
SVC provides consistency groups for all three major copy services, FlashCopy, MetroMirror and GlobalMirror. Therefore this consistency can span multiple vdisks.
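As a rough illustration of why a group matters, here is a small Python sketch. It is a simplified model under my own assumptions, not the SVC implementation: the `ConsistencyGroup` class, its methods and the vdisk names in the usage note are hypothetical. The idea it shows is that every vdisk in the group replicates, or stops replicating, at the same point, so dependent writes that span vdisks never end up half applied at the remote site.

```python
class ConsistencyGroup:
    """Toy group (hypothetical): member vdisks replicate, or stop, together."""

    def __init__(self, vdisks):
        self.remote = {name: {} for name in vdisks}  # remote copy of each vdisk
        self.replicating = True

    def write(self, vdisk, block, data):
        if vdisk not in self.remote:
            raise KeyError(f"{vdisk} is not in this group")
        if not self.replicating:
            return False  # the whole group is stopped; nothing reaches the remote
        self.remote[vdisk][block] = data
        return True

    def stop(self):
        # One switch for every member, so no vdisk runs ahead of another.
        self.replicating = False
```

With a database log on one vdisk and its data on another, stopping the group freezes both remote copies at the same point in time, rather than letting the data vdisk carry on after the log vdisk has stopped.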
However, after all that explanation, Chris's proposal is not possible without an SVC at the remote site, which would bring the solution back in line with an industry-standard 3DC layout. The SVC cache, as Chris suggests, does not guarantee 'on disk integrity' unless the cache is first flushed to disk, so without an SVC at the remote site you would not have a consistent remote on-disk image. More importantly, due to the major risk of data miscompares, connecting the same disks both to SVC and directly to a host is not supported. An interesting idea though, Chris.
Added after initial post:
I've just thought of a way this could theoretically be done using SVC if it supported both intra-cluster and inter-cluster replication of the same virtual disks (which today it doesn't).
Please note, this is not supported or in any way recommended; I just thought it was worth pointing out that technically it is possible!
SVC does support two modes of replication: intra-cluster (within a single cluster) and inter-cluster (the normal mode of operation, to a remote peer cluster). If you were to set up the sites as per Fig4, then it would technically be possible.
Yet again, let me just clarify that this is not something that is supported or is likely to be supported, but it is an interesting concept nonetheless.