Why predicting data reduction can lower costs for all-flash storage

The more data you have, the more hardware you need in the back end. This is where the economics of storage come in: if you can reduce the amount of data, you can cut costs by reducing the amount of hardware needed to store it. And if you can get an accurate estimate of which data can be reduced and by how much, you can really save by buying only the hardware you need.

Because data reduction works differently for different kinds of data, clients do not always know how much reduction to expect or whether it pays to invest in new hardware. This is where our tool can help.

As part of our work on IBM’s new FlashSystem A9000, my team at IBM Research – Haifa made two contributions. First, we worked on algorithms to help scale the performance of data reduction for very large systems. Second, we built a new tool, code-named the “Data Reduction Estimator”, to help clients identify how much they can benefit from deduplication and compression of their data.

Deduplication, also known as “dedup”, is a data reduction method that works by saving repeated data chunks only once and then pointing to them from all of the other places they are used. This method is particularly well-suited for flash storage: flash can read data almost instantaneously, so following pointers to shared chunks costs little, and because flash costs more than standard storage, every chunk saved matters more.
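To make the idea concrete, here is a minimal sketch of chunk-level deduplication in Python. It is an illustration only, not how the FlashSystem A9000 implements dedup: the `DedupStore` class, the fixed 4 KB chunk size, and the use of SHA-256 fingerprints are all assumptions chosen for simplicity.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes (hypothetical choice)


class DedupStore:
    """Toy store that keeps each distinct chunk once (illustrative only)."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> chunk bytes, stored only once
        self.files = {}   # name -> ordered list of fingerprints ("pointers")

    def write(self, name, data):
        pointers = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if this fingerprint has not been seen yet.
            self.chunks.setdefault(fingerprint, chunk)
            pointers.append(fingerprint)
        self.files[name] = pointers

    def read(self, name):
        # Rebuild the data by following the pointers to the shared chunks.
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


# Two "volumes" that share most of their content.
store = DedupStore()
shared = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
store.write("volume-1", shared + b"C" * CHUNK_SIZE)
store.write("volume-2", shared + b"D" * CHUNK_SIZE)
print("logical bytes:", 6 * CHUNK_SIZE, "stored bytes:", store.stored_bytes())
```

Reading a volume back means following one pointer per chunk, which is why fast random reads matter for this approach.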

New challenges for data reduction in primary storage

Compression has been around since the 1970s and is widely accepted in the industry. In fact, many of us use it every day in zip files. Deduplication made its name about 10 years ago. It could reduce the size of data being backed up, allowing systems to back up only what changed from the day before. At that time it became clear that if you use a smart system for deduplication, you can save tons of space for backup.

More recently, dedup has become a popular choice for reducing the size of primary storage. We looked carefully into the kinds of data that do or don’t benefit from deduplication. Take, for example, high-level enterprise data like databases that store repositories of names or transactions, where there are a lot of small entries without very much repetition. Such data gains little from deduplication. But the opposite is true for the world of virtual machines running on the cloud.

A virtual machine essentially takes a person’s computer and places it in central storage, where software simulates a physical machine and runs an operating system on it. A large enterprise like a bank doesn’t need to give every employee a computer. Rather, each person can have a screen and a keyboard and run their “desktop” virtually on the cloud. With each person running the same operating system, and very likely many of the same programs, we can benefit from huge cost reductions with deduplication. For an organization with 1,000 virtual machines, the shared operating system and program data need to be stored only once rather than 1,000 times, which significantly reduces the storage space the system needs.
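The contrast between database-like data and virtual desktops can be sketched with a toy estimate. This is not the Data Reduction Estimator itself, whose method is not described here. The `estimate_reduction` function, the 4 KB chunk size, the use of zlib, and the synthetic data are all assumptions made for illustration.

```python
import hashlib
import zlib


def estimate_reduction(data, chunk_size=4096):
    """Toy estimate: dedup fixed-size chunks, then compress the unique ones."""
    unique = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        unique.setdefault(hashlib.sha256(chunk).digest(), chunk)
    after_dedup = sum(len(chunk) for chunk in unique.values())
    after_both = sum(len(zlib.compress(chunk)) for chunk in unique.values())
    return {
        "original": len(data),
        "after_dedup": after_dedup,
        "after_dedup_and_compression": after_both,
    }


# Synthetic stand-in for 10 virtual desktops sharing the same 8-chunk image.
os_image = b"".join(bytes([i]) * 4096 for i in range(8))
desktops = os_image * 10

# Synthetic stand-in for database records: many small entries, no repetition.
records = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(1024))

for label, data in (("virtual desktops", desktops), ("database-like", records)):
    print(label, estimate_reduction(data))
```

In this synthetic setup the desktop data shrinks to a single copy of the shared image, while the record data has no repeated chunks for dedup to remove. Real workloads fall somewhere in between, which is why an estimate on the actual data is useful before buying hardware.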

The bottom line is that data reduction is great for reducing costs, and deduplication is a key capability when it comes to primary storage in all-flash systems. My talk at Edge, given together with IBM Fellow Andy Walls, was a “Deep Dive into Deduplication in All-Flash Storage” that explained deduplication, the related challenges, what to expect from different workloads, and how to use the tool.

Cloud Storage Scientist, IBM Research
