The Power of One: IBM + Hortonworks drives advanced analytics

What is Apache Hadoop®?

Apache Hadoop offers highly reliable, scalable, distributed processing of large data sets using simple programming models. Because it can be built on clusters of commodity computers, Hadoop provides a cost-effective solution for storing and processing structured, semi-structured and unstructured data with no format requirements.
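The best-known of these programming models is MapReduce. The sketch below imitates its three stages (map, shuffle, reduce) in plain Python on a single machine. It is an illustration only and does not use any Hadoop API; the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run the three stages in sequence over an iterable of text lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

On a real cluster the map and reduce functions run in parallel across many nodes, and the framework handles the shuffle, scheduling and recovery from node failures.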

Key Big Data Use Cases for Hadoop

  1. New data formats: Use new forms of semi-structured and unstructured data, such as streaming audio and video, social media, sentiment and clickstream data, that can’t be ingested into the Enterprise Data Warehouse (EDW). This data can support more accurate analytic decisions in response to today’s new technologies such as the Internet of Things (IoT), artificial intelligence (AI), cloud and mobile.
  2. Data lake analytics: Provide a platform for real-time, self-service access and advanced analytics for data users such as data scientists, line-of-business (LOB) owners and developers. The Hadoop-based data lake is the future of data science, an interdisciplinary field that combines machine learning, statistics, advanced analysis and programming.
  3. Data offload and consolidation: Optimize your Enterprise Data Warehouse (EDW) and streamline costs by moving “cold” data (data not currently in use) to a Hadoop-based data lake. Consolidating siloed data in the data lake decreases costs, increases accessibility and drives better, more accurate decisions.
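As a rough illustration of the offload use case, the sketch below picks out “cold” tables by last-access date. The table names, the 180-day threshold and the function name are assumptions made for the example; they do not come from any IBM or Hortonworks tool.

```python
from datetime import date, timedelta


def select_cold_tables(last_access, today, max_idle_days=180):
    """Return the names of tables not accessed within max_idle_days.

    last_access maps a table name to the date it was last read.
    The 180-day default is an arbitrary, hypothetical threshold.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, accessed in last_access.items() if accessed < cutoff)


# Hypothetical warehouse catalog: table name -> last access date
catalog = {
    "orders_2015": date(2016, 1, 10),
    "orders_current": date(2018, 5, 30),
}
candidates = select_cold_tables(catalog, today=date(2018, 6, 1))
```

In practice the access dates would come from the warehouse’s own usage statistics, and the threshold would be set by business policy.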

100% Open Source

The IBM and Hortonworks partnership provides an integrated, open source Hadoop-based platform with the tools needed for advanced analytic workloads. Both companies are members of the Open Data Platform Initiative (ODPi), a multi-vendor standards association focused on advancing the adoption of Hadoop.

Enterprise-grade distribution

The combination of the Hortonworks platform with IBM Db2® Big SQL offers the benefits of Hadoop with added security, governance and machine learning capabilities. Db2 Big SQL is the first SQL-on-Hadoop solution that understands commonly used SQL syntax from other vendors and products such as Oracle, IBM Db2 and IBM Netezza®.

IBM and Hortonworks, better together

Build, govern, secure and quickly gain valuable analytic insights from your data using a single ecosystem of products and services. Benefit from combined collaboration and investment in the open source community, while removing concerns about connectivity and stability.

In the spotlight

Get started with Apache Hadoop®

IBM, in partnership with Hortonworks, offers Hortonworks Data Platform (HDP), a secure, enterprise-ready open source Hadoop distribution based on a centralized architecture. HDP, when used with IBM Db2 Big SQL, addresses a range of data-at-rest and data-in-motion use cases, provides data federation across the organization, powers real-time customer applications, and delivers robust analytics that accelerate decision-making.

Accelerate big data collection and dataflow management

Hortonworks DataFlow (HDF) for IBM, powered by Apache NiFi, is the first integrated platform that solves the challenges of collecting and transporting data from a multitude of sources. HDF for IBM enables simple, fast data acquisition, secure data transport, prioritized data flow and clear traceability of data from the edge of your network to the core data center. It uses a combination of an intuitive visual interface, a high-fidelity access and authorization mechanism and an always-on chain of custody (data provenance) framework.

Accelerated and Stable Apache Hadoop®

The best way to move forward with Hadoop is to choose an installation package that simplifies interoperability. The Open Data Platform Initiative (ODPi) is a multi-vendor standards association focused on advancing the adoption of Hadoop in the enterprise by promoting the interoperability of big data tools. ODPi simplifies and standardizes the Apache Hadoop big data ecosystem with a common reference specification called the ODPi Core.


IBM Hosted Analytics with Hortonworks

Created to address enterprise analytics and data science requirements, this offering provides a single solution to store, explore and score big data.

Db2 Big SQL

A hybrid SQL-on-Hadoop engine that provides low-latency support for ad hoc and complex queries and connects disparate sources through a single database connection.

Hortonworks on Power

Use Hortonworks Data Platform on IBM Power Systems™ to increase efficiency, maximize performance and accelerate insights.

Big Replicate

Enterprise-class replication for Apache Hadoop and object stores. Data is replicated as it streams in, so files do not need to be fully written and closed before transfer.


Db2 Big SQL demos

Explore several Db2 Big SQL demos that walk through business benefits and core features, integration with Data Server Manager to create federated connections to Db2 Warehouse on Cloud, and integration with IBM Cognos® Analytics to create dashboards and reports.

Connect more data from more sources with a data lake

Data lakes are gaining prominence as businesses incorporate more unstructured data and look to generate insights from real-time ad hoc queries and analysis. Learn more about the new types of data and sources that can be leveraged by integrating data lakes into your existing data management.

eBook: Build a better data lake

Discover best practices to follow and potential pitfalls to avoid when integrating a data lake into your existing data infrastructure. Learn how enterprise-grade security and governance can allow any business to leverage a growing diversity of data to drive innovation across the organization.

Engage with an expert

Schedule a one-on-one call with an expert to learn about the IBM Hortonworks relationship and how we can help you extend data science and machine learning across the Apache Hadoop ecosystem.