I've been working with a customer and we've realised that our best practice for snapshots (or thin-provisioned FlashCopies with a background copy rate of zero) may have been slightly flawed. There is no functional impact here, but a couple of small configuration changes can have a beneficial impact on performance.
There are actually two recommendations here:
- The first is trivial to fix and can have a measurable impact. It applies to all forms of thin-provisioned and compressed volumes, but is most likely to be interesting if you have FlashCopy or Global Mirror Change Volumes configured for very busy production workloads.
- The second is harder to fix and won't have as big an impact for most customers, but I thought I'd mention it here for completeness. It is only relevant for FlashCopy or Global Mirror with Change Volumes.
Both issues are worth addressing, especially if you have any CPU cores running above 70% utilisation or you have low-latency requirements. It's important to know that whilst the configuration changes you make will probably be to the FlashCopy target volume, they can improve the performance of the production volume whilst the FlashCopy is actually running. In case you don't realise it, Global Mirror with Change Volumes uses FlashCopy between the normal volumes and the change volumes.
I hope that I manage to explain these so that you can follow along, and that these help at least a few of you out there. As always - comments and questions are welcome.
1/ Running without an emergency buffer for the snapshots
When you create a "regular" thin-provisioned volume or compressed volume, we have a best practice that says that the initial real capacity (aka rsize) should be set to 2% of the volume capacity. This 2% provides an emergency buffer in case the storage pool runs out of space, because the volume can continue to store new data in that 2% emergency buffer whilst you find a way to enlarge the storage pool. I will call it the emergency buffer for the rest of this description.
However, 2% of your total capacity can actually be quite a lot of space, so we decided that for snapshots we would recommend an emergency buffer of zero. This was because snapshot volumes normally only store a very small amount of data, and there was a high chance that the emergency buffer would end up much bigger than the amount of data you were storing. Hence the best practice of an emergency buffer of 0%.
For example - if you have a 2TB volume and you take a snapshot with an emergency buffer of 2% - that would be about 40GB of emergency buffer. It's very possible that your host workload is limited to a small region of disk, and you may only actually store 10GB of useful data (used size) on that snapshot volume. So in that example, the emergency buffer would be 4 times bigger than the data you want to store.
This all makes sense, and has been the best practice for a number of years. If you right-click in the GUI and ask it to take a snapshot, the GUI will automatically create a thin-provisioned target volume with an emergency buffer of zero percent. However, if you ask the GUI to create a volume and then create the FlashCopy mapping to that volume yourself, you will be using an emergency buffer of 2%.
So what has changed? The problem is that when you have an emergency buffer of zero - then host IO can cause you to run out of space. When you run out of space you have to wait for more space to be assigned before you can perform any more writes to new regions of the thin provisioned or compressed disk. This waiting will cause a delay to your host writes which can impact performance. If your write IO rate is low enough then the write cache will hide this impact from your hosts, but in a busy environment, this may cause the cache to fill up and that can affect host write performance.
Here is an analogy which might help to explain the scenario:
Think of a train on a track. The track is the capacity that you are using to store your data (the real capacity).
As you write user data onto the volume, the train moves along the track.
When the train reaches the end of one individual piece of track, the driver signals the engineers to lay another piece of track at the end of the line.
If you create a volume with an emergency buffer of zero pieces of track, then the train reaches the end of the track before the next piece has been laid. As such it has to stop and wait for the engineers on a regular basis.
However, if you create a volume with an emergency buffer of 5 pieces of track, then when you get to the end of piece number one - the engineer starts laying piece number 6 - but whilst he is doing that, you are rolling along piece number 2 without having to wait for the engineer.
So - what's the outcome of all this?
I would recommend that all thin-provisioned or compressed volumes have a non-zero emergency buffer. A value of 1 or 2 GiB is more than sufficient for almost everyone's needs. The busier your FlashCopy source volume is, the more important it is for the emergency buffer to be non-zero. However, it's important to note that the performance benefit doesn't keep scaling: 100 GiB won't give you better performance than 2 GiB.
How did I come up with a value of 2 GiB?
The way we calculated the 2 GiB was by looking at the peak write data rate for this customer's volumes, and working out how much emergency buffer we would need to absorb their workload for 30 seconds. 30 seconds should be plenty of time to allocate new space (or for those engineers to lay more track). I can't remember exactly what their peak workload was, but let's do the calculation the other way around:
(2 GiB × 1024 MiB/GiB) / 30 seconds ≈ 68 MiB/s
So this means that you could absorb 68 MiB/s to this volume for 30 seconds without having to allocate any more space.
If you have volumes doing 200MB/s (for example) then you could do the calculation as follows.
(200 MiB/s × 30 seconds) / 1024 MiB per GiB ≈ 5.9 GiB
I should point out that it normally only takes milliseconds to allocate new space - so the 30 second buffer is just to avoid all possibility of doubt.
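If you want to run this sizing calculation against your own workloads, it is trivial to script. Here is a minimal sketch; buffer_gib is just an illustrative helper name I made up, not a product command:

```shell
# Illustrative helper: whole GiB of emergency buffer needed to absorb a
# peak write rate (MiB/s, first argument) for some seconds (second argument).
buffer_gib() {
  # round up to a whole GiB: ceil((rate * seconds) / 1024 MiB per GiB)
  echo $(( ($1 * $2 + 1023) / 1024 ))
}

buffer_gib 68 30    # the worked example above: prints 2
buffer_gib 200 30   # 200 MiB/s for 30 seconds: 5.9 GiB rounds up, prints 6
```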
How do you tell whether you have any volumes with an emergency buffer of zero?
All you need to do is compare the current real capacity of the volume with the used capacity of the volume. If the difference between the two is less than 1 GiB (technically if it is less than 1 extent - but 1GiB is easier to explain) then you probably have an emergency buffer size of zero. This information is available in the CLI and it's also available in the GUI if you open the details of the volume.
In my example below the real capacity is around 1GiB larger than my used capacity. So I'm probably OK. However to be on the safe side I could always add another 1 GiB to this to be absolutely sure.
If you are using the CLI - there is a property called free_capacity which calculates that difference for you - so you don't need to do the subtraction in your head.
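As a concrete sketch of that check on the CLI (the volume name is a placeholder; the property names come from the detailed lsvdisk view):

```
# Detailed listing of one volume - look for used_capacity, real_capacity
# and free_capacity (real minus used) in the output
lsvdisk <volume name or id>

# A free_capacity at or near zero suggests the emergency buffer is zero
```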
So how do you increase this emergency buffer?
It's really simple - if you want to add an additional 2 GiB of emergency capacity to a volume you just run the following command:
expandvdisksize -rsize 2 -unit gb <volume name or id>
If this is a mirrored volume then you will also need to specify -copy <copy ID>
IMPORTANT: This command will add two GiB to your real size - so if you run the same command many times on the same volume, it will keep on growing.
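Putting the pieces together, here is a sketch of topping up a volume and checking the result (the volume name and copy id are placeholders):

```
# Add 2 GiB of emergency buffer
expandvdisksize -rsize 2 -unit gb <volume name or id>

# For a mirrored volume, specify which copy to grow
expandvdisksize -rsize 2 -unit gb -copy 0 <volume name or id>

# Verify: free_capacity in the detailed listing should have grown by 2 GiB
lsvdisk <volume name or id>
```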
2/ Preferred node of source and target volumes of FlashCopy
Another discovery that we made is that if the FlashCopy source and target volumes are either in different IO groups, or they have different preferred nodes, then the act of copying data from the source to the target volume requires inter-node communication.
Now inter-node communication is going on all of the time, but if you see any of the following symptoms, then there is a chance that inter-node messaging is starting to become an issue, and removing unnecessary inter-node messaging can make the system go even faster:
- High Port to Local Node Receive Queue Time.
- Zero buffer credits on ports performing inter-node messaging.
Unfortunately, finding out whether the source and target volumes of a FlashCopy map have the same preferred node requires the command line. A detailed listing of the vdisk will show you the property.
If the two volumes happen to be in the same IO group, but on different preferred nodes, then you can dynamically move the preferred node using the "movevdisk" command as long as you are using 7.3.0 or later.
If you are using older code levels, or if your two volumes are in different IO groups, then the simplest solution is to create a new volume and reconfigure Global Mirror with Change Volumes or FlashCopy to use the new volume rather than the old one. This is because, whilst it is possible to move a volume to a different IO group whilst it is part of a FlashCopy map, the FlashCopy will often still need to perform inter-node messaging back to the original IO group.
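For completeness, here is a sketch of the whole check-and-fix on the CLI (volume and node names are placeholders; movevdisk requires 7.3.0 or later):

```
# Compare the preferred_node_id property in the two detailed listings
lsvdisk <source volume>
lsvdisk <target volume>

# If they differ but the volumes are in the same IO group, move the
# target's preferred node to match the source
movevdisk -node <node name or id> <target volume>
```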