Getting Hot with Data Retrieval


IBM scientists and developers in Almaden, California; Tucson, Arizona; and Zurich, Switzerland recently achieved a significant breakthrough in distributed Flash caching for enterprise transaction processing. The technology was unveiled at the IBM EDGE 2012 conference in Orlando, Florida, where it was shown to improve latency by more than 5x for certain workloads. The advance promises faster document retrieval and real-time analytics.

We recently caught up with one of IBM’s storage technology scientists, Dr. Ioannis Koltsidas, in Switzerland to understand the achievement.

“In this era of Big Data, this technology can help with
real time analytics for banking transactions, medical data
and billing systems,” said Dr. Koltsidas.

Can you explain what was achieved in simple terms?

Ioannis Koltsidas: Sure. Simply put, we have created a novel caching framework called Triton that exploits synergies between storage area network (SAN) storage and servers. In complex global IT environments, it is not uncommon to have multiple servers connected to a SAN. Within these environments there is hot data, which is accessed often, and cold data, which isn’t. We’ve developed several novel technologies that enable users to access the hot data at a fraction of the SAN latency by storing it in local caches based on Flash memory.
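
To make the hot/cold idea concrete, here is a minimal sketch of a read-through cache that sits in front of a slower store. It is not the Triton implementation, whose internals the interview does not describe. The class name `ReadThroughCache`, the `backend_read` callable and the least-recently-used eviction rule are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Illustrative LRU read-through cache in front of a slower backing store."""

    def __init__(self, backend_read, capacity):
        # backend_read is a hypothetical callable: block_id -> data (the "SAN").
        self.backend_read = backend_read
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> data, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            # Cache hit: serve locally and mark the block as recently used.
            self.blocks.move_to_end(block_id)
            self.hits += 1
            return self.blocks[block_id]
        # Cache miss: fetch from the slow store and keep a local copy.
        self.misses += 1
        data = self.backend_read(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            # Evict the least recently used block to stay within capacity.
            self.blocks.popitem(last=False)
        return data


# Usage: a dict stands in for the SAN; block 1 is the "hot" block.
san = {i: f"block-{i}" for i in range(100)}
cache = ReadThroughCache(san.__getitem__, capacity=2)
for block_id in [1, 2, 1, 1, 3, 1]:
    cache.read(block_id)
print(cache.hits, cache.misses)
```

In this toy run, the frequently read block stays in the local cache while the rarely read block is evicted, so repeated reads of hot data avoid the slow path.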

What are some of the applications for this technology?

IK: I’d say that most data-intensive applications will benefit from this technology. We are especially looking at applications such as transaction processing for brokerage workloads, document retrieval and content management, as well as virtual machine storage in scale-out environments. Also, in this era of Big Data, this technology can help with real-time analytics for banking transactions, medical data and billing systems, for instance.

Using large solid-state drive arrays such as the IBM EXP30 Ultra, we can store up to 10 terabytes in the cache. So if an organization has a lot of hot data, we can make it quick to retrieve.

What specifically did you contribute?

IK: I helped design a smart way to manage the cache so that high performance and high scalability can be achieved. More specifically, I worked on an algorithm that recognizes which data is hot and which is not. It nearly knows what you want before you do, because it looks for patterns in what data is accessed and when.
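
The interview does not say how the algorithm decides what is hot, so the following is only one plausible sketch: count accesses per block and decay the counts over time, so that both how often and how recently a block was read matter. The names `HotnessTracker`, `decay` and `threshold`, and the values used, are hypothetical.

```python
from collections import defaultdict


class HotnessTracker:
    """Illustrative hot-data detector using access counts with periodic decay."""

    def __init__(self, decay=0.5, threshold=2.0):
        self.decay = decay          # fraction of each score kept per epoch
        self.threshold = threshold  # score at or above which a block is "hot"
        self.scores = defaultdict(float)

    def record_access(self, block_id):
        # Every access raises the block's score by one.
        self.scores[block_id] += 1.0

    def end_epoch(self):
        # Age all scores so blocks that stop being read cool down over time.
        for block_id in list(self.scores):
            self.scores[block_id] *= self.decay
            if self.scores[block_id] < 0.01:
                del self.scores[block_id]  # forget blocks that have gone cold

    def is_hot(self, block_id):
        return self.scores.get(block_id, 0.0) >= self.threshold


# Usage: "a" is read three times, "b" once.
tracker = HotnessTracker()
for block_id in ["a", "a", "a", "b"]:
    tracker.record_access(block_id)
hot_before = tracker.is_hot("a")   # score 3.0, above the threshold
tracker.end_epoch()
hot_after = tracker.is_hot("a")    # score decays to 1.5, below the threshold
```

A cache manager could use such a score to decide which blocks to admit to, or keep in, the Flash cache; a real system would likely track richer access patterns than this.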

What’s next for the technology?

IK: It will be generally available in 2013, but we will continue to refine the code, look to port it to different server and storage platforms, and make it available to both native and virtual environments. We also see a strong opportunity with IBM PureSystems and Netezza.

Last question: what is your personal motivation for this research?

IK: My PhD thesis focused on databases for flash storage, so this is a topic near and dear to me. As I mentioned, we are also now firmly in the era of Big Data and, as nearly any scientist will tell you, it’s always good to have strong market demand for your research.