August 26, 2021
By Maryam Ashoori and Paul Rivot

The amount of data collected by large enterprises is estimated to grow 10 times each year, and 90% of this data remains unused or underutilized. Managing these data sources across various silos is time-consuming and costly. Without a cohesive governance strategy, enterprises run into problems with visibility, portability and management that prevent them from unlocking the business value of their data.

To help enterprises effectively manage their data needs, IBM entered a partnership with Cloudera almost a decade ago to expand our big data capabilities. In 2019, Cloudera merged with Hortonworks to pursue a hybrid cloud vision that further brought our companies together.

Today IBM is excited to announce a new chapter of our partnership with Cloudera that puts us in an even stronger position to help enterprises with their data and AI needs. We are strengthening our joint development and go-to-market programs to bring the advanced analytical capabilities of IBM Cloud Pak for Data, a unified platform for data and AI, to Cloudera Data Platform. This new offering will enable use cases in data science, machine learning, business intelligence, and real-time analytics directly on data within Cloudera Data Platform. The integration brings Cloudera under the IBM data fabric, a hybrid, multicloud data architecture that helps businesses access the right data just in time at the optimum cost, with end-to-end governance, regardless of where the data is stored.

Introducing Cloudera Data Platform for IBM Cloud Pak for Data

As the name suggests, this offering combines Cloudera’s best-in-class data lake with the advanced analytical capabilities of IBM Cloud Pak for Data. Cloudera Data Platform (CDP) for IBM Cloud Pak for Data provides one of the most complete multi-function platforms in the market. Now, businesses can run edge, streaming, data engineering, ETL, data warehousing, data visualization, and machine learning use cases with a single offering.

CDP for IBM Cloud Pak for Data provides a fast path to modernize data platforms in place without performing a costly architectural reimplementation and migration.

CDP for IBM Cloud Pak for Data is hybrid and secure. It can run end-to-end anywhere with a full span of security and fine-grained enterprise-level governance that many other platforms can’t match. IBM’s state-of-the-art data fabric uses AI to automate complex data management tasks and universally discover, integrate, catalog, secure, and govern data across multiple environments.

Key features

  • Separation of storage and compute — CDP for IBM Cloud Pak for Data provides a data fabric with secure access to data anywhere it resides, from ingest to governance and data engineering, serving advanced analytics and high-performance BI all on one platform.
  • SQL analytics for all your data — By leveraging Big SQL as well as Hive and Impala, CDP for IBM Cloud Pak for Data provides warehouse-grade performance that exceeds the performance of alternatives in the market.
  • Run data science at scale — Use Watson Studio and CDP to build, run, and manage AI models at petabyte scale.
  • Automated AI lifecycle management — CDP for IBM Cloud Pak for Data leverages the automation capabilities of IBM Watson Studio to speed up the lifecycle of your critical data science projects.
  • Streamline data engineering — Take advantage of Cloudera Streaming Analytics tools, such as Flink, Apache Kafka, and SQL Stream Builder, and integrate them with IBM technologies like DataStage to achieve full-breadth data engineering.
  • Real-time reporting and BI — Data can be ingested in real time with Flink and then displayed in IBM Cloud Pak for Data analytics dashboards.
  • Automated governance and cataloging — Discovered data and its associated metadata are automatically cataloged, and assets are generated, removing the need for manual metadata/DDL generation.
  • Open platform — Built on open systems and using non-proprietary data formats, the solution allows businesses to leverage data on any cloud.

In short, CDP for IBM Cloud Pak for Data:

  1. Enables data science at scale
  2. Provides a seamless single view of data with complete security and governance, without the need for data movement or replication
  3. Merges stream and batch data sets for analytics and real-time dashboards

Together these benefits protect your existing technology investments in Hadoop while unlocking the business value of your data.

Next steps

To learn more about CDP for IBM Cloud Pak for Data, please visit our product page. You can also book a personal consultation there.

For more details, please visit IBM Cloud Pak for Data, IBM Data Fabric, and Cloudera Data Platform or join the Cloud Pak for Data Community.
