
Fast data: The future of big data


It’s not news that big data is getting bigger by the second. Beyond sheer volume, though, there is growing pressure to act on that data faster than ever. An organization’s leaders want to gain a competitive advantage by turning raw data into actionable intelligence. How can they quickly and efficiently pull together huge volumes of data from dozens or even hundreds of isolated and disparate data sources? Much of this data is not of the traditional, structured variety; instead, it is being driven by the growth of the Internet of Things (IoT) and the collection of data from digital human interactions. According to IDC, by 2025 there will be 80 billion connected devices, up from fewer than 20 billion today, with over 150,000 new connected devices being added every minute.

To meet the demand for both speed and scale in analytics, organization leaders are moving toward human/digital interaction and cognitive applications that mine data and react to change. The first steps down this road are being taken with the adoption of technologies such as Apache Spark, along with machine learning and deep learning. But computing power is only part of the answer: accessing and managing all of this data can create a significant bottleneck.
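To make that first step concrete, a typical entry point is a small Spark job that turns raw IoT readings into per-device summaries. The sketch below is illustrative only: the input path, schema and column names (device_id, temperature) are hypothetical, not part of any product.

```python
# Minimal PySpark sketch: aggregate raw IoT readings into per-device summaries.
# The input path and column names (device_id, temperature) are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("iot-fast-data").getOrCreate()

# Read raw sensor events; header/inferSchema keep the example self-contained.
readings = (spark.read
            .option("header", "true")
            .option("inferSchema", "true")
            .csv("/data/iot/readings.csv"))

# Aggregate: one actionable summary row per device.
summary = (readings.groupBy("device_id")
           .agg(F.avg("temperature").alias("avg_temp"),
                F.count("*").alias("events")))

summary.show()
spark.stop()
```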

An organization’s big data resides in tens or hundreds of isolated systems associated with different applications or serving different lines of business. Moreover, Hadoop, the most commonly used framework for big data analytics, requires data from other systems to be copied over to the Hadoop Distributed File System (HDFS). This is a time-consuming process during which data can get stale. It is also a waste of resources, since it results in multiple copies of the data – the original plus the HDFS copy.

A solution for managing big data

The answer to these data access and management challenges is a high-performance data and file management solution designed to support big data analytics. IBM has announced the IBM All-Flash Elastic Storage Server (ESS) 5.2, whose solid-state storage improves data bandwidth performance by 60 percent over previous solutions.

Incorporating IBM Spectrum Scale, ESS spans an organization’s data lakes, creating one unified data ocean with a single namespace against which to run analytics quickly and efficiently. It supports a wide variety of network protocols and provides the ability to automatically and transparently tier data across flash, disk, tape and cloud. Another important advantage of IBM Spectrum Scale is that it provides direct access for Hadoop to underlying data storage without requiring data to be copied over into an HDFS environment.
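The practical difference shows up in how an analytics job reaches its data. The sketch below assumes a Spectrum Scale file system mounted as an ordinary POSIX path; the mount point, paths and column name are hypothetical, and the product’s actual Hadoop integration has its own configuration beyond this illustration.

```python
# Sketch of the difference in data flow (all paths are hypothetical).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("direct-access").getOrCreate()

# Traditional pattern: data must first be staged into HDFS
# (for example with a bulk copy), then read back out:
#   df = spark.read.parquet("hdfs://namenode:8020/warehouse/events")

# With a POSIX-mounted IBM Spectrum Scale file system, the same job can
# read the single authoritative copy in place: no staging copy to go
# stale, and no duplicate storage.
df = spark.read.parquet("file:///gpfs/datalake/warehouse/events")

df.groupBy("event_type").count().show()  # event_type is a hypothetical column
spark.stop()
```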

Rapid changes in the big data analytics ecosystem are being driven by open source and industry-wide improvements. IBM is partnering with Hortonworks, and IBM Spectrum Scale 4.2.3 has been certified with the Hortonworks Data Platform (HDP) 2.6/Ambari 2.5.

Managing multiple frameworks and versions requires advanced workload management. IBM Spectrum Scale software can be deployed with IBM Spectrum Conductor with Spark to provide a unique solution that optimizes performance, eases management and comes complete with Apache Spark.
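Conductor’s own scheduler configuration is product-specific, but the division of labor can be sketched generically: each application declares its resource demands, and a shared workload manager arbitrates them across tenants and framework versions. Every setting and value below is illustrative, not a recommendation or a Conductor API.

```python
# Generic sketch: a Spark application declaring resource demands that a
# shared workload manager (such as IBM Spectrum Conductor) would arbitrate.
# All values are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("managed-analytics")
         .config("spark.executor.memory", "4g")
         .config("spark.executor.cores", "2")
         .config("spark.dynamicAllocation.enabled", "true")      # let the scheduler
         .config("spark.dynamicAllocation.maxExecutors", "50")   # grow/shrink the job
         .getOrCreate())

print(spark.version)  # multiple Spark versions can coexist under one manager
spark.stop()
```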

In summary, the new IBM All-Flash Elastic Storage Server 5.2 expands the existing ESS family to provide industry-leading performance and efficiency in support of faster big data analytics and allows users to:

  • Reduce performance bottlenecks on critical IT workloads such as backup.
  • Run Hadoop and other big-data applications directly on enterprise storage.
  • Share data across applications with unified storage for file and object data.
  • Benefit from a high-availability design that delivers five nines of availability, with faster rebuilds of failed disks thanks to erasure coding with declustered RAID technology and fully redundant data pathways (the sketch after this list illustrates why declustered rebuilds are faster).
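The rebuild advantage comes down to parallelism: a traditional array funnels reconstruction through a single spare disk, while declustered RAID distributes spare capacity so every surviving disk contributes rebuild bandwidth. The toy model below makes that intuition concrete; all figures are illustrative assumptions, not ESS specifications.

```python
# Toy model of rebuild time: traditional RAID vs. declustered RAID.
# All figures are illustrative assumptions, not ESS specifications.

disk_capacity_tb = 10   # capacity of the failed disk
per_disk_mb_s = 150     # sustained rebuild throughput per disk
num_disks = 100         # disks sharing rebuild work when declustered

def rebuild_hours(capacity_tb, effective_mb_s):
    """Hours to reconstruct capacity_tb at an aggregate rate of effective_mb_s."""
    return capacity_tb * 1e6 / effective_mb_s / 3600

# Traditional RAID: reconstruction bottlenecked by writing one spare disk.
traditional = rebuild_hours(disk_capacity_tb, per_disk_mb_s)

# Declustered RAID: spare space is distributed, so every surviving disk
# contributes a share of the reconstruction bandwidth.
declustered = rebuild_hours(disk_capacity_tb, per_disk_mb_s * (num_disks - 1))

print(f"traditional: {traditional:6.1f} h")   # ~18.5 h
print(f"declustered: {declustered:6.1f} h")   # ~0.2 h
```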

To learn more about IBM Elastic Storage Server and how it can help you manage your big data assets, please visit our website or check out the datasheet.

