Why predicting data reduction can lower costs for all-flash storage


The more data you have, the more hardware you need in the back end. This is where the economics of storage come in: if we can reduce the amount of data, we can cut costs by reducing the amount of hardware needed to store it. And if you can get an accurate estimate of what data can be reduced and by how much, you can really save by buying only the hardware you need.

Because data reduction works differently for different kinds of data, clients are not always aware of the potential and of whether or not it pays to invest in new hardware. This is where our tool can help.

As part of our work on IBM’s new FlashSystem A9000, my team at IBM Research – Haifa made two contributions. First, we worked on algorithms that scale data reduction performance to very large systems. Second, we built a new tool, code-named the “Data Reduction Estimator,” to help clients identify how much they can benefit from deduplication and compression of their data.

Deduplication, also known as “dedup,” is a data reduction method that works by saving repeated data chunks only once and then pointing to them from all of the other places they are used. This method is particularly well-suited for flash storage, which can read data almost instantaneously but costs more per gigabyte than standard disk storage.
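The idea above can be sketched in a few lines of code. This is a minimal, illustrative model of fixed-size-chunk deduplication, not the FlashSystem implementation: data is split into chunks, each chunk is fingerprinted with a hash, repeated chunks are stored once, and the original data is reconstructed from an ordered list of fingerprints (the “pointers”). The chunk size and function names are assumptions for the sketch.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size; real systems often chunk at a few KB


def dedup_store(data: bytes):
    """Split data into fixed-size chunks and store each unique chunk once.

    Returns (store, recipe): `store` maps fingerprint -> chunk bytes,
    and `recipe` is the ordered list of fingerprints needed to rebuild
    the original data.
    """
    store = {}   # fingerprint -> chunk bytes, saved only once
    recipe = []  # ordered "pointers" to chunks
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # repeated chunks do not add new storage
        recipe.append(fp)
    return store, recipe


def reconstruct(store, recipe):
    """Rebuild the original data by following the pointers in order."""
    return b"".join(store[fp] for fp in recipe)


# Four chunks of data, but only two distinct ones: the store holds two chunks.
data = b"A" * (CHUNK_SIZE * 3) + b"B" * CHUNK_SIZE
store, recipe = dedup_store(data)
print(len(recipe), len(store))  # 4 logical chunks, 2 physically stored
```

In this toy example, three identical chunks collapse to a single stored copy, so the physical footprint is half the logical size; real workloads vary, which is exactly what an estimator tool has to measure.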

New challenges for data reduction in primary storage

Compression has been around since the 1970s and is widely accepted in the industry; many of us use it every day in zip files. Deduplication made its name about 10 years ago in the backup world, where it let systems save only what had changed from the day before. At that time it became clear that a smart deduplication system can save tons of backup space.

More recently, dedup has become a popular choice for reducing the size of primary storage. We looked carefully into the kinds of data that do or don’t benefit from deduplication. Take, for example, enterprise databases that store repositories of names or transactions: they hold many small entries with little repetition, so this data gains very little from deduplication. But the opposite is true in the world of virtual machines running on the cloud.

A virtual machine essentially moves a person’s computer into central storage, where software simulates a physical machine running its own operating system. A large enterprise like a bank doesn’t need to give every employee a physical computer. Rather, each person can have a screen and a keyboard and run their “desktop” virtually on the cloud. With each person running the same operating system, and very likely many of the same programs, deduplication delivers huge cost reductions. For an organization with 1,000 machines, it can significantly reduce the amount of storage space the system needs.
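To see why this adds up, here is a back-of-the-envelope savings estimate. All the numbers (disk size per machine, the fraction of each image shared across machines) are hypothetical assumptions for illustration, not measurements from any IBM system:

```python
def vdi_dedup_estimate(machines: int, disk_gb: float, shared_fraction: float):
    """Rough estimate of virtual-desktop storage with and without dedup.

    shared_fraction is the assumed portion of each machine's image
    (OS plus common applications) that is identical across machines,
    so it only needs to be stored once.
    """
    raw_gb = machines * disk_gb
    # One shared copy of the common data, plus each machine's unique data.
    deduped_gb = disk_gb * shared_fraction + machines * disk_gb * (1 - shared_fraction)
    return raw_gb, deduped_gb


# Hypothetical fleet: 1,000 desktops, 40 GB each, 70% of each image shared.
raw, deduped = vdi_dedup_estimate(1000, 40, 0.7)
print(f"raw: {raw:,.0f} GB, after dedup: {deduped:,.0f} GB "
      f"({raw / deduped:.1f}x reduction)")
```

Under these assumed numbers, 40,000 GB of logical desktops shrink to roughly 12,000 GB of physical storage, around a 3x reduction; the actual ratio depends entirely on how similar the machines really are, which is what an estimator has to determine per workload.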

The bottom line is that data reduction is great for reducing costs, and deduplication is a key capability when it comes to primary storage in all-flash systems. My talk at Edge together with IBM Fellow Andy Walls was on a “Deep Dive into Deduplication in All-Flash Storage” to explain deduplication, the related challenges, what to expect from different workloads, and how to use the tool.

Click here to learn more about the work on cloud storage at IBM Research – Haifa.

