Why predicting data reduction can lower costs for all-flash storage


The more data you have, the more hardware you need in the back end. This is where the economics of storage come in: if you can reduce the amount of data, you can cut costs by reducing the amount of hardware needed to store it. And if you can get an accurate estimate of which data can be reduced and by how much, you can save even more by buying only the hardware you need.

Because data reduction works differently for different kinds of data, clients are not always aware of the potential, or of whether it pays to invest in new hardware. This is where our tool can help.

As part of our work on IBM’s new FlashSystem A9000, my team at IBM Research – Haifa made two contributions. First, we worked on algorithms that help scale the performance of data reduction to very large systems. Second, we built a new tool, code-named the “Data Reduction Estimator”, to help clients identify how much they can benefit from deduplicating and compressing their data.

Deduplication, also known as “dedup”, is a data reduction method that works by storing repeated data chunks only once and pointing to that single copy from all of the other places they are used. This method is particularly well suited to flash storage, which can read data almost instantaneously but costs more than standard disk storage.
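To make the mechanism concrete, here is a minimal sketch of fixed-size-chunk deduplication in Python, assuming 4 KB chunks and SHA-256 fingerprints. It illustrates the general idea of storing each unique chunk once and keeping references everywhere else; it is not the FlashSystem A9000 implementation.

import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size for this sketch

class DedupStore:
    def __init__(self):
        self.chunks = {}    # fingerprint -> chunk bytes, stored only once
        self.volumes = {}   # volume name -> list of fingerprints (references)

    def write(self, volume, data):
        refs = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if it has not been seen before;
            # otherwise just record a reference to the existing copy.
            self.chunks.setdefault(fp, chunk)
            refs.append(fp)
        self.volumes[volume] = refs

    def read(self, volume):
        # Rebuild the volume by following the references.
        return b"".join(self.chunks[fp] for fp in self.volumes[volume])

store = DedupStore()
store.write("vm1", b"A" * 8192 + b"B" * 4096)
store.write("vm2", b"A" * 8192)  # duplicates of vm1's chunks
assert store.read("vm2") == b"A" * 8192
print(len(store.chunks), "unique chunks stored")  # 2 unique chunks behind 5 references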

New challenges for data reduction in primary storage

Compression has been around since the 1970s and is widely accepted in the industry. In fact, many of us use it every day in zip files. Deduplication made its name about 10 years ago in backup systems, where it reduced the amount of data stored by keeping only what changed from the day before. At that point it became clear that a smart deduplication system can save enormous amounts of backup space.

More recently, dedup has become a popular choice for reducing the size of primary storage. We looked carefully into the kinds of data that do or don’t benefit from deduplication. Take, for example, enterprise data like databases that store repositories of names or transactions: they contain a lot of small entries with very little repetition, so deduplication cannot reduce them by much. But the opposite is true for the world of virtual machines running in the cloud.

A virtual machine essentially moves a person’s computer into central storage, where software simulates the physical machine and runs its operating system. A large enterprise like a bank doesn’t need to give every employee a physical computer. Instead, each person can have a screen and a keyboard and run their “desktop” virtually in the cloud. With everyone running the same operating system, and very likely many of the same programs, deduplication can deliver huge cost reductions. For an organization with 1,000 such machines, it can significantly reduce the amount of storage space the system needs.
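As a simplified illustration of why these workloads behave so differently (a sketch of the underlying idea, not the actual Data Reduction Estimator), one can estimate deduplication potential by chunking the data, fingerprinting each chunk, and comparing the number of unique chunks to the total:

import hashlib
import os

CHUNK_SIZE = 4096  # hypothetical chunk size for this sketch

def dedup_ratio(data: bytes) -> float:
    # Ratio of total chunks to unique chunks: higher means more savings.
    total = 0
    unique = set()
    for i in range(0, len(data), CHUNK_SIZE):
        unique.add(hashlib.sha256(data[i:i + CHUNK_SIZE]).digest())
        total += 1
    return total / len(unique) if unique else 1.0

# Database-like data: many small, mostly unique records -> little to gain.
db_like = os.urandom(CHUNK_SIZE * 100)

# VM-like data: many machines sharing the same OS image -> big savings.
os_image = os.urandom(CHUNK_SIZE * 10)
vm_like = os_image * 10

print(f"database-like ratio: {dedup_ratio(db_like):.1f}:1")  # about 1:1
print(f"VM-like ratio:       {dedup_ratio(vm_like):.1f}:1")  # about 10:1

A real estimator has to work on far larger datasets than fit in memory, typically by sampling and by keeping only compact fingerprints rather than the data itself, but the principle is the same.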

The bottom line is that data reduction is great for reducing costs, and deduplication is a key capability when it comes to primary storage in all-flash systems. My talk at Edge, together with IBM Fellow Andy Walls, was a “Deep Dive into Deduplication in All-Flash Storage,” explaining deduplication, the related challenges, what to expect from different workloads, and how to use the tool.

Click here to learn more about the work on cloud storage at IBM Research – Haifa.

Cloud Storage Scientist, IBM Research

