A new chapter in the IBM and Cloudera partnership

By and Paul Rivot | August 26, 2021

The amount of data collected by large enterprises is estimated to grow 10 times each year, and 90% of this data remains unused or underutilized. Managing these data sources across various silos is time-consuming and costly. Without a cohesive governance strategy, enterprises face challenges in visibility, governance, portability and management that keep them from unlocking the business value of their data.

To help enterprises effectively manage their data needs, IBM entered into a partnership with Cloudera almost a decade ago to expand our big data capabilities. In 2019, Cloudera merged with Hortonworks to pursue a hybrid cloud vision that brought our companies even closer together.

Today IBM is excited to announce a new chapter of our partnership with Cloudera that puts us in an even stronger position to help enterprises with their data and AI needs. We are strengthening our joint development and go-to-market programs to bring the advanced analytical capabilities of IBM Cloud Pak for Data, a unified platform for data and AI, to Cloudera Data Platform. This new offering will enable use cases in data science, machine learning, business intelligence, and real-time analytics directly on data within Cloudera Data Platform. The integration brings Cloudera under the IBM data fabric, a hybrid, multicloud data architecture that helps businesses access the right data just in time at the optimum cost, with end-to-end governance, regardless of where the data is stored.

Introducing Cloudera Data Platform for IBM Cloud Pak for Data

As the name suggests, this offering combines Cloudera’s best-in-class data lake with the advanced analytical capabilities of IBM Cloud Pak for Data. Cloudera Data Platform (CDP) for IBM Cloud Pak for Data provides one of the most complete multi-function platforms in the market. Now, businesses can run edge, streaming, data engineering, ETL, data warehousing, data visualization, and machine learning use cases with a single offering.

CDP for IBM Cloud Pak for Data provides a fast path to modernizing data platforms in place, without a costly architectural reimplementation and migration.

CDP for IBM Cloud Pak for Data is hybrid and secure. It can run end to end anywhere, with full security and fine-grained, enterprise-level governance that many other platforms can't match. IBM's state-of-the-art data fabric uses AI to automate complex data management tasks and to discover, integrate, catalog, secure, and govern data across multiple environments.

Key features

  • Separation of storage and compute: CDP for IBM Cloud Pak for Data provides a data fabric with secure access to data wherever it resides, covering everything from ingestion to governance and data engineering, and serves advanced analytics and high-performance BI from a single platform.
  • SQL analytics for all your data: By using Big SQL along with Hive and Impala, CDP for IBM Cloud Pak for Data delivers warehouse-grade performance that exceeds that of alternatives in the market.
  • Run data science at scale: Use Watson Studio and CDP to build, run, and manage AI models at petabyte scale.
  • Automated AI lifecycle management: CDP for IBM Cloud Pak for Data uses the automation capabilities of IBM Watson Studio to speed up the lifecycle of your critical data science projects.
  • Streamline data engineering: Take advantage of Cloudera Streaming Analytics components such as Apache Flink, Apache Kafka, and SQL Stream Builder, and integrate them with IBM technologies like DataStage for full-breadth data engineering.
  • Real-time reporting and BI: Data can be ingested in real time with Flink and then displayed in IBM Cloud Pak for Data analytics dashboards.
  • Automated governance and cataloging: Discovered data and its associated metadata are automatically cataloged and assets are generated, removing the need for manual metadata or DDL generation.
  • Open platform: Built on open systems and non-proprietary data formats, the solution lets businesses use their data on any cloud.

In short, CDP for IBM Cloud Pak for Data:

  1. Enables data science at scale
  2. Provides a seamless single view of data with complete security and governance, without the need for data movement or replication
  3. Merges streaming and batch data sets for analytics and real-time dashboards

Together, these benefits protect your existing technology investments in Hadoop while unlocking the business value of your data.

Next steps

To learn more about CDP for IBM Cloud Pak for Data, please visit our product page. You can also book a personal consultation there.

For more details, please visit IBM Cloud Pak for Data, IBM Data Fabric, and Cloudera Data Platform or join the Cloud Pak for Data Community.