IBM Support

IBM Spectrum Scale and Hortonworks HDP for Winning Big Data Plays

Technical Blog Post


Abstract

IBM Spectrum Scale and Hortonworks HDP for Winning Big Data Plays

Body

IBM and Hortonworks have announced an integrated analytics solution built on IBM Spectrum Scale / Elastic Storage Server (ESS) and Hortonworks Data Platform (HDP). The announcement comes a week after a series of other updates to the IBM Spectrum and Cloud Object Storage product lines. With it, HDP is certified on IBM Spectrum Scale on both Power8 and x86.

Ed Walsh, general manager for IBM Storage and Software Defined Infrastructure, said: “Every organization is becoming a digital organization. With this announcement, IBM is delivering a powerful platform to extend the use of data and for cognitive applications. This announcement shows our partner community IBM’s commitment to helping clients grow, develop, and transform the use of their own data with less complexity.”

Big Data & Analytics with Hadoop

For rapidly growing, unstructured data, Hadoop is the platform of choice for many organizations, enabling them to store, process, and analyze petabytes of information. Traditional data repositories cannot scale with unstructured big data workloads. Enterprises are adopting Hadoop to store large volumes of data and run analytics that derive valuable insights. The current market size for Hadoop is $6B and is forecast to grow to $50B by 2020.

Why Hortonworks

HDP is the secure, enterprise-ready open source Apache™ Hadoop® distribution based on a centralized architecture (YARN). HDP addresses the complete needs of data-at-rest, powers real-time customer applications and delivers robust analytics that accelerate decision making and innovation.

  • Pure play 100% open source distribution
  • Hortonworks is the #1 Apache Hadoop committer
  • More than 1,000 customers
  • ODPi compliance
  • Apache Spark is part of HDP distribution

Note: IBM Spectrum Scale is already certified and supported with IBM’s Hadoop distribution (IBM IOP/BigInsights).

Better storage for Hadoop

The default storage for Hadoop is the Hadoop Distributed File System (HDFS), which runs on storage-rich servers (that is, on storage internal to the servers).

IBM Spectrum Scale and IBM Elastic Storage Server (ESS) give Hadoop a value-added data platform: they provide enhanced features and eliminate the need to keep multiple data copies. The following table illustrates some of the ways IBM Spectrum Scale and ESS enhance Hadoop.

 

| HDFS | Benefit | IBM Elastic Storage Server (ESS) and IBM Spectrum Scale |
| --- | --- | --- |
| Clients have to copy data from enterprise storage to HDFS in order to run Hadoop jobs, because Hadoop does not run directly on standard protocols such as SMB or Object. | Reduce data center footprint | Spectrum Scale / ESS supports access to the same data through HDFS, NFS, SMB, and Object. No data copying is required to run Hadoop analytics. |
| HDFS is a shared-nothing architecture: disks and cores grow in the same ratio, which is less efficient for high-throughput jobs. | Reduce cluster sprawl | ESS is shared storage, best known for its scalability and performance. |
| Data protection is costly: the default is 3-way replication. (Erasure coding in HDFS has some limitations and is perhaps more appropriate for cold data.) | | ESS software RAID eliminates the need for 3-way replication to achieve data protection. |
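To make the data protection comparison concrete, the following is a minimal Python sketch of the raw capacity needed to store the same usable data under 3-way replication and under erasure coding. The `ProtectionScheme` class, the 1,000 TB figure, and the 8+2P and 8+3P stripe layouts are illustrative assumptions, not values taken from the announcement; check the layouts actually configured on a given system before relying on these numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionScheme:
    """Hypothetical model of a data protection layout (illustration only)."""
    name: str
    data_units: int        # strips holding user data (1 for plain replication)
    redundancy_units: int  # extra copies or parity strips per stripe

    @property
    def overhead(self) -> float:
        # Raw bytes written per byte of usable data.
        return (self.data_units + self.redundancy_units) / self.data_units

    def raw_capacity_tb(self, usable_tb: float) -> float:
        # Raw capacity required to hold usable_tb of user data.
        return usable_tb * self.overhead


# 3-way replication: one data copy plus two extra copies.
THREE_WAY_REPLICATION = ProtectionScheme("3-way replication", 1, 2)
# Assumed erasure-coded layouts: 8 data strips plus 2 or 3 parity strips.
EC_8_2 = ProtectionScheme("8+2P erasure coding", 8, 2)
EC_8_3 = ProtectionScheme("8+3P erasure coding", 8, 3)

if __name__ == "__main__":
    usable_tb = 1000.0  # assumed usable data set size
    for scheme in (THREE_WAY_REPLICATION, EC_8_2, EC_8_3):
        raw = scheme.raw_capacity_tb(usable_tb)
        print(f"{scheme.name}: {scheme.overhead:.3f}x overhead, {raw:.0f} TB raw")
```

Under these assumptions, replication needs three times the usable capacity in raw disk, while the erasure-coded layouts need 1.25 to 1.375 times. The sketch ignores spare space, metadata, and rebuild headroom, which real sizing would have to include.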

 

 


Conclusion

Bringing Hortonworks Data Platform to IBM Spectrum Scale or IBM Elastic Storage Server provides three key benefits: better storage efficiency, hybrid storage, and high performance. First, Elastic Storage Server uses erasure coding, which eliminates the need to keep multiple data copies. Second, the combined service extends on-premises storage to the cloud, delivering economic, security, and accessibility benefits. Third, Elastic Storage Server delivers high-speed data throughput.

 

 

 


UID

ibm16164949