Fast data: The future of big data

It’s not news that big data is getting bigger by the second. But in addition to sheer volume, there is growing demand to act on that data faster than ever. An organization’s leaders want to gain a competitive advantage by turning raw data into actionable intelligence. How can they quickly and efficiently pull together huge volumes of data from dozens or even hundreds of isolated and disparate data sources? Much of this data is not of the traditional, structured variety; instead, it is being driven by the growth of the Internet of Things (IoT) and the collection of data from digital human interactions. According to IDC, by 2025 there will be 80 billion connected devices, up from fewer than 20 billion today, with more than 150,000 new devices being added every minute.

To meet the demand for both speed and analytic volume, organizations are moving toward cognitive applications that mine human and digital interaction data in order to react to change. The first steps down this road are being taken with the adoption of technologies such as Apache Spark, along with machine learning and deep learning. But computing power is only part of the answer: accessing and managing all of this data can create a significant bottleneck.

An organization’s big data resides in tens or hundreds of isolated systems associated with different applications or serving different lines of business. Moreover, Hadoop, the most commonly used framework for big data analytics, requires data from other systems to be copied over to the Hadoop Distributed File System (HDFS). This is a time-consuming process during which data can get stale.  It is also a waste of resources, since it results in multiple copies of the data – the original plus the HDFS copy.
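The resource cost of that duplication adds up quickly, because HDFS itself replicates every block it stores. As a rough back-of-the-envelope illustration (the dataset size below is a hypothetical assumption, not a figure from this announcement; 3 is HDFS's default block replication factor):

```python
# Rough illustration of the storage cost of copying data into HDFS.
# The 100 TB dataset size is hypothetical; 3 is HDFS's default replication factor.

source_tb = 100            # original dataset on enterprise storage, in TB
hdfs_replication = 3       # HDFS default block replication factor

hdfs_copy_tb = source_tb * hdfs_replication   # capacity consumed by the HDFS copy
total_footprint_tb = source_tb + hdfs_copy_tb # original plus replicated copy

print(total_footprint_tb)  # 100 TB of data ends up occupying 400 TB of capacity
```

Under those assumptions, a single analytics copy quadruples the storage footprint before any analysis has even run.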

A solution for managing big data

The solution to these data access and management challenges is a high-performance data and file management solution designed to support big data analytics. IBM has announced the IBM All-Flash Elastic Storage Server (ESS) 5.2, whose solid-state storage improves data bandwidth performance by 60 percent over previous solutions.

Incorporating IBM Spectrum Scale, ESS spans an organization’s data lakes, creating one unified data ocean with a single namespace against which to run analytics quickly and efficiently. It supports a wide variety of network protocols and provides the ability to automatically and transparently tier data across flash, disk, tape and cloud. Another important advantage of IBM Spectrum Scale is that it provides direct access for Hadoop to underlying data storage without requiring data to be copied over into an HDFS environment.
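Tiering in IBM Spectrum Scale is driven by SQL-like information lifecycle management (ILM) policy rules. The fragment below is a simplified sketch of what such a migration rule can look like; the pool names and the 30-day threshold are hypothetical, and the exact syntax should be checked against the Spectrum Scale documentation:

```
/* Illustrative sketch only: migrate files not accessed in 30 days
   from a flash pool to a capacity (disk) pool. Pool names and the
   threshold are assumptions for this example. */
RULE 'migrate-cold-data'
  MIGRATE FROM POOL 'flash'
  TO POOL 'capacity'
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
```

Because the policy engine runs inside the file system, data moves between tiers without applications needing to change the paths they use.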

Rapid changes in the big data analytics ecosystem are being driven by open source and industry-wide improvements. IBM is partnering with Hortonworks, and IBM Spectrum Scale 4.2.3 has been certified with the Hortonworks Data Platform (HDP) 2.6/Ambari 2.5.

Managing multiple frameworks and versions requires advanced workload management. IBM Spectrum Scale software can be deployed with IBM Spectrum Conductor with Spark to provide a unique solution that optimizes performance, eases management and comes complete with Apache Spark.

In summary, the new IBM All-Flash Elastic Storage Server 5.2 expands the existing ESS family to provide industry-leading performance and efficiency in support of faster big data analytics and allows users to:

  • Reduce performance bottlenecks on critical IT workloads such as backup.
  • Run Hadoop and other big-data applications directly on enterprise storage.
  • Share data across applications with unified storage for file and object data.
  • Benefit from a high-availability design delivering five nines (99.999 percent) availability, with faster rebuild of failed disks through erasure coding, declustered RAID technology and fully redundant data pathways.
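For context on what five nines means in practice, here is a quick back-of-the-envelope calculation using the generic availability formula (this is standard arithmetic, not a figure from the announcement):

```python
# "Five nines" (99.999%) availability expressed as allowed downtime per year.
availability = 0.99999
minutes_per_year = 365.25 * 24 * 60  # average year, including leap years

downtime_minutes = (1 - availability) * minutes_per_year
print(round(downtime_minutes, 2))  # ~5.26 minutes of downtime per year
```

In other words, a five-nines system can be unavailable for only a little over five minutes in an entire year.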

To learn more about IBM Elastic Storage Server and how it can help you manage your big data assets, please visit our website or check out the datasheet.
