Why predicting data reduction can lower costs for all-flash storage


The more data you have, the more hardware you need in the back end. This is where the economics of storage come in: if we can reduce the amount of data, we can cut costs by reducing the amount of hardware needed to store it. And if you can get an accurate estimate of what data can be reduced and by how much, you can really save by buying only the hardware you need.

Because data reduction works differently for different kinds of data, clients do not always know how much reduction to expect, or whether investing in new hardware will pay off. This is where our tool can help.

As part of our work on IBM’s new FlashSystem A9000, my team at IBM Research – Haifa made two contributions. First, we worked on algorithms to help scale the performance of data reduction for very large systems. Second, we built a new tool, code-named the “Data Reduction Estimator”, to help clients identify how much they can benefit from deduplication and compression of their data.

Deduplication, also known as “dedup”, is a data reduction method that works by saving repeated data chunks only once and then pointing to them from all of the other places they are used. This method is particularly well-suited for flash storage, which can read data almost instantaneously but costs more than standard storage.
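To make the idea concrete, here is a minimal sketch of chunk-level deduplication in Python. It is an illustration only, not how the FlashSystem A9000 implements it: the class name `DedupStore`, the fixed 4 KB chunk size, and the use of SHA-256 fingerprints are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy store that keeps each unique chunk once (hypothetical example)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> chunk bytes, stored only once
        self.files = {}   # file name -> list of fingerprints ("pointers")

    def write(self, name, data):
        refs = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).digest()
            # Keep the chunk only if this content has not been seen before.
            self.chunks.setdefault(fp, chunk)
            refs.append(fp)
        self.files[name] = refs

    def read(self, name):
        # Follow the pointers to rebuild the original data.
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def physical_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = DedupStore(chunk_size=4)
store.write("a", b"AAAABBBBAAAA")
store.write("b", b"BBBBCCCC")
print(store.physical_bytes())  # 20 logical bytes are held in 12 physical bytes
```

Both files read back unchanged, but the repeated chunks `AAAA` and `BBBB` occupy space only once.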

New challenges for data reduction in primary storage

Compression has been around since the 1970s and is widely accepted in the industry. In fact, many of us use it every day in zip files. Deduplication made its name about 10 years ago. It could reduce the size of data being backed up, allowing systems to back up only what changed from the day before. At that time it became clear that if you use a smart system for deduplication, you can save tons of space for backup.

More recently, dedup has become a popular choice for reducing the size of primary storage. We looked carefully into the kinds of data that do or don’t benefit from deduplication. Take, for example, high-level enterprise data like databases that store repositories of names or transactions, where there are a lot of small entries without very much repetition. Such data gains little from deduplication. But the opposite is true for the world of virtual machines running on the cloud.

A virtual machine essentially moves a person’s computer into a central system, where the operating system runs on simulated rather than physical hardware. A large enterprise like a bank doesn’t need to give every employee a computer. Rather, each person can have a screen and a keyboard and run their “desktop” virtually on the cloud. With each person running the same operating system, and very likely many of the same programs, we can benefit from huge cost reductions with deduplication. For an organization with 1,000 such machines, the storage space the system needs can drop significantly.
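The contrast between these two kinds of workload can be sketched with synthetic data. The Python example below is a rough illustration under stated assumptions, not the Data Reduction Estimator itself: the function `dedup_ratio`, the 4 KB chunk size, and the made-up volume sizes (100 desktops, each a shared 64-chunk "OS image" plus 4 chunks of unique data) are all hypothetical.

```python
import hashlib
import os

CHUNK = 4096  # bytes; an assumed fixed chunk size


def dedup_ratio(volumes, chunk_size=CHUNK):
    """Logical bytes divided by bytes left after chunk-level dedup."""
    seen = set()
    logical = unique = 0
    for data in volumes:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            logical += len(chunk)
            fp = hashlib.sha256(chunk).digest()
            if fp not in seen:  # first time this content appears
                seen.add(fp)
                unique += len(chunk)
    return logical / unique


# Synthetic "virtual desktops": the same OS image plus a little user data.
os_image = os.urandom(64 * CHUNK)
desktops = [os_image + os.urandom(4 * CHUNK) for _ in range(100)]

# Synthetic "database" volumes: every chunk is different.
databases = [os.urandom(68 * CHUNK) for _ in range(100)]

print(f"virtual desktops: {dedup_ratio(desktops):.1f}x")   # 14.7x
print(f"databases:        {dedup_ratio(databases):.1f}x")  # 1.0x
```

With these invented numbers, the desktops shrink by a factor of about 14.7 (6,800 logical chunks, 464 unique ones), while the database-like volumes do not shrink at all. Real workloads fall somewhere in between, which is why an estimate on the actual data is valuable.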

The bottom line is that data reduction is great for reducing costs, and deduplication is a key capability when it comes to primary storage in all-flash systems. My talk at Edge, given together with IBM Fellow Andy Walls, was titled “Deep Dive into Deduplication in All-Flash Storage” and explained deduplication, the related challenges, what to expect from different workloads, and how to use the tool.

