IBM Support

Data Reduction Pools

Technical Blog Post



I assume by now everyone has read Barry's blog post about Data Reduction Pools, which were released with Spectrum Virtualize v8.1.2. I thought I would talk about one of the cool features we added as part of them, along with some of the changes that might confuse anyone who spends too long looking at the XML statistics files.

 

Way back in 7.3.0 a re-architecture of the cache was released. That re-architecture included some things that we never actually enabled; one of those was a feature that allowed volumes to cache writes even when running 1-way. From a performance point of view this feature is great: when a node is removed (for an expected or unexpected reason) the cache hides the write latency of the backend. From a redundancy point of view it is rather scary: if data is cached on a single node, what happens if that node suffers a catastrophic hardware failure?

 

Ever since version 1.1.0, when there was only a single node the cache would start to flush, and all new writes would be written all the way to disk. This gets the system, as fast as possible, to a state where losing that single node doesn't cause data loss. It also means that during upgrades or service actions that require removing a node, all data is flushed before the node is removed.

 

With Data Reduction Pools things have changed. This change mainly hinges on the design of the metadata handling: in any Log Structured Array system you have the data that the host has written and metadata that records where that data is stored. In our first implementation of compression the metadata was stored inline with the data, so writes come in from the host and turn into a sequential stream of compressed data with small metadata writes mixed through. In our new implementation of compression (and thin provisioning) the metadata is written separately. This allows the cache to handle data and metadata differently.
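To make the data/metadata split concrete, here is a minimal, illustrative sketch of a Log Structured Array (the class and field names are mine, not the product's internals): host writes append to a sequential data log, while a separate forward-lookup map records where each virtual address currently lives. Keeping the map separate from the data stream is what lets a cache treat the two differently.

```python
class LogStructuredArray:
    """Toy Log Structured Array: data and metadata kept separately."""

    def __init__(self):
        self.log = []          # sequential stream of (compressed) data blocks
        self.forward_map = {}  # metadata: virtual address -> position in log

    def write(self, virtual_addr, data):
        position = len(self.log)
        self.log.append(data)                      # data write: sequential append
        self.forward_map[virtual_addr] = position  # metadata write: map update

    def read(self, virtual_addr):
        return self.log[self.forward_map[virtual_addr]]


lsa = LogStructuredArray()
lsa.write(0x100, b"host data")
lsa.write(0x100, b"overwrite")  # an overwrite appends; only the map moves
assert lsa.read(0x100) == b"overwrite"
```

Note how an overwrite never touches old data in place: the data log simply grows, and the metadata map is the single source of truth for which copy is current, which is why losing it is so costly.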

 

With a Data Reduction Pool, when running with only a single node the metadata writes are flushed as quickly as possible, while the data can continue to be cached. Losing metadata writes is really bad: it makes recovery of the system much more complicated.
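The policy can be summarised in a few lines. This is a hedged sketch of the behaviour described above, not the actual cache code (the function and its return values are illustrative): while the I/O group is running 1-way, metadata writes go straight through to the backend, while data writes may stay in the write cache.

```python
def cache_mode(write_kind, single_node):
    """Decide how a write is cached (illustrative only).

    write_kind:  'metadata' or 'data'
    single_node: True when the I/O group is running 1-way
    """
    if single_node and write_kind == "metadata":
        # Flush ASAP: losing metadata makes recovery much more complicated.
        return "write-through"
    # Data can stay cached, hiding the write latency of the backend.
    return "write-back"


assert cache_mode("metadata", single_node=True) == "write-through"
assert cache_mode("data", single_node=True) == "write-back"
assert cache_mode("metadata", single_node=False) == "write-back"
```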

 

So the first advantage this gives is that in one of those very rare instances of losing the last node along with its cache, there is a much better chance that no metadata writes were lost.

 

The second advantage is much better performance when there is only one node. This has a huge impact on upgrades in particular, giving much better performance throughout the upgrade. Before a node is removed the metadata is flushed, which should be very quick; then, when the node comes back and is ready to rejoin the cluster, the data is flushed very quickly (it is fast because it is a nice sequential stream, and it is the only data that needs to be flushed). So the periods where the cache is being flushed are shorter, and whilst the node is being upgraded we are still caching the data.

 

The third advantage is that if the system loses that last node, all the metadata has been preserved, so if you try to read data that has been lost you get a medium error. This means that working out what data needs to be reconstructed should be a lot easier.

 

When I talk about losing the cached data, this only happens if the node can't produce a hardened dump file or the hardened dump file is lost; these events are extremely rare, and they do not include things like power loss. During a power loss the node dumps all of the cache to its internal hard disk using the battery backup. When the power is restored the cached data is loaded from the internal hard disk and everything continues on as before with no data loss.

 

This splitting of the metadata and the data has made the XML statistics a little more complicated. The cache has metadata that doesn't necessarily belong to a specific volume, as metadata for lots of volumes can be stored within a single grain. As a result there are now statistics at the pool level for the various forms of metadata, along with a combined view of all data reads/writes for a Data Reduction Pool. Whilst these are helpful for our support team when debugging performance issues, most users will want to continue looking at the statistics for the volume (which also include all the data reads and writes).

 

As data reads and writes to a Data Reduction Pool are accounted for in both the pool-level statistics and the volume-copy-level statistics, there is no need to add the two together; if you do, it will look like twice the number of IOs the system has actually done. It also hopefully means any bespoke performance monitoring solutions you have should continue to work.
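The double-counting pitfall is easy to show with made-up numbers (the field names below are hypothetical, not the actual XML schema): data I/O appears at both levels, so a monitoring tool should pick one view rather than summing both.

```python
# Hypothetical counters for one Data Reduction Pool.
volume_copy_stats = [
    {"volume": "vol0", "read_ops": 500, "write_ops": 300},
    {"volume": "vol1", "read_ops": 200, "write_ops": 100},
]
# The pool-level counters already include all data I/O from the volumes above.
pool_stats = {"read_ops": 700, "write_ops": 400}

# Correct: take the totals from one level only.
total_reads = pool_stats["read_ops"]

# Wrong: summing both levels double-counts every data I/O.
double_counted = pool_stats["read_ops"] + sum(v["read_ops"] for v in volume_copy_stats)
assert double_counted == 2 * total_reads
```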

 

