August 26, 2021
By Maryam Ashoori and Paul Rivot
3 min read

The amount of data collected by large enterprises is estimated to grow 10 times each year, and 90% of this data remains unused or underutilized. Managing these data sources across various silos is time-consuming and costly. The lack of a cohesive governance strategy can lead to challenges in visibility, governance, portability and management that prevent enterprises from unlocking the business value of their data.

To help enterprises effectively manage their data needs, IBM entered a partnership with Cloudera almost a decade ago to expand our big data capabilities. In 2019, Cloudera merged with Hortonworks to pursue a hybrid cloud vision that further brought our companies together.

Today IBM is excited to announce a new chapter of our partnership with Cloudera that puts us in an even stronger position to help enterprises with their data and AI needs. We are strengthening our joint development and go-to-market programs to bring the advanced analytical capabilities of IBM Cloud Pak for Data, a unified platform for data and AI, to Cloudera Data Platform. This new offering will enable use cases in data science, machine learning, business intelligence, and real-time analytics directly on data within Cloudera Data Platform. The integration brings Cloudera under the IBM data fabric, a hybrid, multicloud data architecture that helps businesses access the right data just in time at the optimum cost, with end-to-end governance, regardless of where the data is stored.

Introducing Cloudera Data Platform for IBM Cloud Pak for Data

As the name suggests, this offering combines Cloudera’s best-in-class data lake with the advanced analytical capabilities of IBM Cloud Pak for Data. Cloudera Data Platform (CDP) for IBM Cloud Pak for Data provides one of the most complete multi-function platforms in the market. Now, businesses can run edge, streaming, data engineering, ETL, data warehousing, data visualization, and machine learning use cases with a single offering.

CDP for IBM Cloud Pak for Data provides a fast path to modernize data platforms in place without performing a costly architectural reimplementation and migration.

CDP for IBM Cloud Pak for Data is hybrid and secure. It can run end-to-end anywhere with a full span of security and fine-grained enterprise-level governance that many other platforms can’t match. IBM’s state-of-the-art data fabric uses AI to automate complex data management tasks and universally discover, integrate, catalog, secure, and govern data across multiple environments.

Key features

  • Separation of storage and compute — CDP for IBM Cloud Pak for Data provides a data fabric with secure access to data anywhere it resides, from ingest to governance and data engineering, serving advanced analytics and high-performance BI all on one platform.
  • SQL analytics for all your data — By leveraging Big SQL as well as Hive and Impala, CDP for IBM Cloud Pak for Data provides warehouse-grade performance that exceeds the performance of alternatives in the market.
  • Run data science at scale — Use Watson Studio and CDP to build, run, and manage AI models at petabyte scale.
  • Automated AI lifecycle management — CDP for IBM Cloud Pak for Data leverages the automation capabilities of IBM Watson Studio to speed up the lifecycle of your critical data science projects.
  • Streamline data engineering — Take advantage of Cloudera Streaming Analytics components, such as Flink, Apache Kafka, and SQL Stream Builder, and integrate them with IBM technologies like DataStage to achieve full-breadth data engineering.
  • Real-time reporting and BI — Data can be ingested in real-time with Flink and then displayed in IBM Cloud Pak for Data analytics dashboards.
  • Automated governance and cataloging — Discovered data and its associated metadata are automatically cataloged, and assets are generated, removing the need for manual metadata/DDL generation.
  • Open platform — Built on open systems and using non-proprietary data formats, the solution allows businesses to leverage data on any cloud.

In short, CDP for IBM Cloud Pak for Data:

  1. Enables data science at scale
  2. Provides a seamless single view of data with complete security and governance, without the need for data movement or replication
  3. Merges stream and batch data sets for analytics and real-time dashboards

Together these benefits protect your existing technology investments in Hadoop while unlocking the business value of your data.

Next steps

To learn more about CDP for IBM Cloud Pak for Data, please visit our product page. You can also book a personal consultation there.

For more details, please visit IBM Cloud Pak for Data, IBM Data Fabric, and Cloudera Data Platform or join the Cloud Pak for Data Community.
