IBM and Hortonworks partnership

IBM and Hortonworks Inc. have partnered to offer an enterprise-grade Hadoop distribution with data integration and advanced querying tools. This partnership combines the best of Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) with IBM Db2® Big SQL to offer enterprise-grade scalability, security and governance with the ability to federate both data at rest and data in motion.

Together, IBM and Hortonworks help data scientists and line-of-business owners improve the discovery of insights, testing, and ad hoc and real-time querying for predictive analytics.


Easier access to data across the organization

Access structured and unstructured data residing both on premises and in the cloud.

Faster data preparation

Take less time to access and locate data, thereby speeding up data preparation and reuse efforts.

Enhanced agility

Components of the data lake can be employed as a sandbox that enables users to build and test analytics models with greater agility.

More accurate insights, stronger decisions

Track data lineage to help ensure data is trustworthy. 


Apache Hadoop

Manage large volumes and different types of data with open source Hadoop. Tap into unmatched performance, simplicity and standards compliance to use all data, regardless of where it resides. Visualize, filter and analyze large data sets in consumable, business-specific contexts.

Apache Spark

Build algorithms quickly, iterate faster and put analytics into action with Spark. Easily create models that capture insight from complex data, and apply that insight in time to drive outcomes. Access all data, work in a unified programming model and deploy your analytics anywhere.

Stream computing

Stream computing enables organizations to process data streams that are always on and never cease. This helps them spot opportunities and risks across all data in time to effect change.

Governance and metadata tools

Governance and metadata tools enable you to locate and retrieve information about data objects as well as their meaning, physical location, characteristics, and usage.



IBM Db2 Big SQL

IBM Db2 Big SQL is a hybrid SQL engine running on Hadoop that understands multiple SQL dialects from Db2, Netezza and Oracle. Enjoy low-latency support for ad hoc and complex queries while using a single database connection or query to connect disparate sources.

Try the free edition

IBM Big Replicate

IBM Big Replicate provides enterprise-class replication for Apache Hadoop and object store. With non-invasive technology that replicates data as it streams in, IBM Big Replicate eliminates the need for files to be fully written and closed before transfer.

IBM Data Science Experience

IBM Data Science Experience is a collaborative cloud-based environment where data scientists can use multiple tools such as RStudio, Jupyter and Python to activate insights.


Data lake: Taming the data dragon

Gartner predicts that, by 2019, 75 percent of analytics solutions will incorporate 10 or more data sources from second-party partners or third-party providers. Data lakes allow businesses and data scientists to use this data for mobile apps, predictive analytics, monetizing their data and more. Learn how to leverage this data infrastructure while maintaining business-critical data provisioning and preventing ungoverned data environments and usage. 

Connect more data from more sources with a data lake

Data lakes are gaining prominence as businesses begin to incorporate more unstructured data into their data strategy and management. Data lakes are also enabling organizations to generate insights from real-time ad hoc queries and analysis, while helping to reduce IT costs. Learn more about the new types of data and new sources that can be leveraged by integrating data lakes into your existing data management.

Making Sense of Big Data

Third-party data and content are revolutionizing data management platforms. Sixty-six percent of respondents in EMA/9sight big data end-user research indicated they were using two to six platforms in concert to support their big data initiatives. Learn how Hadoop, NoSQL environments and traditional databases are evolving to accommodate streaming analytics, IoT devices and more unstructured data than ever before.

Engage with an expert

A no-cost, one-on-one call with an experienced IBM Expert.

Learn about our products, solutions and services to build and grow a successful data lake.