Fast data: The future of big data


It’s not news that big data is getting bigger by the second. However, in addition to sheer volume, there is also increasing demand to act on that data faster than ever. An organization’s leaders want to gain a competitive advantage by turning raw data into actionable intelligence. How can they quickly and efficiently pull together huge volumes of data from dozens or even hundreds of isolated and disparate data sources? Much of this data is not of the traditional, structured variety but is instead being driven by the growth of the Internet of Things (IoT) and the collection of data from digital human interactions. According to IDC, by 2025 there will be 80 billion connected devices, up from fewer than 20 billion today, with over 150,000 new connected devices being added every minute.

To handle the demand for speed and the volume of analytics, organizations are moving toward cognitive applications that mine data from human/digital interactions in order to react to change. The first steps down this road are being taken with the adoption of technologies such as Apache Spark, along with machine learning and deep learning. But computing power is only part of the answer: accessing and managing all of this data can create a significant bottleneck.

An organization’s big data resides in tens or hundreds of isolated systems associated with different applications or serving different lines of business. Moreover, Hadoop, the most commonly used framework for big data analytics, requires data from other systems to be copied over to the Hadoop Distributed File System (HDFS). This is a time-consuming process during which data can get stale.  It is also a waste of resources, since it results in multiple copies of the data – the original plus the HDFS copy.
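To make the staging step concrete, a typical HDFS ingest looks something like the following sketch. The cluster names and paths here are hypothetical, and the commands assume a standard Hadoop installation; the point is simply that every run produces a second full copy of the data, which then begins to drift out of date.

```shell
# Copying an existing dataset from another Hadoop cluster into the
# analytics cluster with distcp. The source data stays where it is,
# so the organization now maintains two full copies.
hadoop distcp \
  hdfs://warehouse-cluster/data/transactions \
  hdfs://analytics-cluster/user/etl/transactions

# Loading from a local or NFS-mounted source system works the same way:
# the data is duplicated into HDFS before any analytics can run on it.
hadoop fs -put /mnt/crm-export/2017-06/ /user/etl/crm/2017-06/
```

Each such copy has to be re-run (or incrementally synchronized) whenever the source changes, which is where both the staleness and the wasted capacity come from.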

A solution for managing big data

The solution to these data access and management challenges is a high-performance data and file management solution designed to support big data analytics. IBM has announced the IBM All-Flash Elastic Storage Server (ESS) 5.2, whose solid-state storage improves data bandwidth performance by 60 percent over previous solutions.

Incorporating IBM Spectrum Scale, ESS spans an organization’s data lakes, creating one unified data ocean with a single namespace against which to run analytics quickly and efficiently.  It supports a wide variety of network protocols and provides the ability to automatically and transparently tier data across flash, disk, tape and cloud. Another important advantage of IBM Spectrum Scale is that it provides direct access for Hadoop to underlying data storage without requiring data to be copied over into an HDFS environment.

Rapid changes in the big data analytics ecosystem are being driven by open source and industry-wide improvements. IBM is partnering with Hortonworks, and IBM Spectrum Scale 4.2.3 has been certified with the Hortonworks Data Platform (HDP) 2.6/Ambari 2.5.

Managing multiple frameworks and versions requires advanced workload management. IBM Spectrum Scale software can be deployed with IBM Spectrum Conductor with Spark to provide a unique solution that optimizes performance, eases management and comes complete with Apache Spark.

In summary, the new IBM All-Flash Elastic Storage Server 5.2 expands the existing ESS family to provide industry-leading performance and efficiency in support of faster big data analytics and allows users to:

  • Reduce performance bottlenecks on critical IT workloads such as backup.
  • Run Hadoop and other big-data applications directly on enterprise storage.
  • Share data across applications with unified storage for file and object data.
  • Benefit from a high-availability design delivering five nines of availability, with faster rebuilds of failed disks through erasure-coded declustered RAID technology and fully redundant data pathways.

To learn more about IBM Elastic Storage Server and how it can help you manage your big data assets, please visit our website or check out the datasheet.
