February 7, 2024
By Ashley Bassman and Anson Kokkat

The sense of urgency has never been higher for businesses to leverage data and AI for competitive advantage. Today’s leaders are still grappling with unprecedented data challenges in scaling AI. Not only will data volumes continue to grow, but new formats of unstructured data are growing 30–60% annually¹. Data silos and data complexity are multiplying across more locations and applications, preventing data from being accessed, enriched and used effectively. To make use of large volumes of unstructured data for analytics and AI, organizations turned to Hadoop data lakes for cost-effective storage, open formats and flexibility. But as data volumes grow, these traditional data lakes are constrained by performance, governance and the complexity of maintaining them.

To scale trusted analytics and AI workloads, organizations are adopting an open data lakehouse approach, which combines the performance and governance of data warehouses with the flexibility of data lakes on low-cost object storage. IBM® and Cloudera share this vision, and with our strategic collaboration, a new era of AI possibilities unfolds.

What is Cloudera Data Platform? 

Five years ago, IBM and Cloudera strategically partnered to bring Cloudera’s data-in-motion capabilities, spanning analytics for IoT and event stream processing, monitoring and management, to IBM customers. This integration is designed to work seamlessly with our data fabric architecture, fostering a holistic approach to data and AI strategy. The Cloudera Data Platform (CDP) solution enables customers to run rapid analytics on unstructured data anywhere. It is built on a combination of Hadoop-based solutions and incorporates over 30 open-source and proprietary components.

However, given today’s challenges of maintaining, governing and transforming data for AI that is stored in Hadoop data lakes, IBM and Cloudera have built a unique integration to share and prepare data for trusted AI workloads. The IBM watsonx™ and Cloudera Data Platform (CDP) integration enables customers to augment their Hadoop data lake with warehouse-like performance, optimize for cost with simple object storage and multiple query engines, and scale AI across the enterprise with trusted data. This collaboration not only simplifies data management for Cloudera customers but also enables them to bring those data lake workloads into a modern environment that is ready for generative AI use cases.

Better together: IBM watsonx and Cloudera 

The IBM watsonx™ AI and data platform includes three core components and a set of AI assistants designed to help you scale and accelerate the impact of AI with trusted data across your business. The core components are a studio for new foundation models, generative AI and machine learning; a fit-for-purpose data store built on an open data lakehouse architecture; and a toolkit to accelerate AI workflows that are built with responsibility, transparency and explainability.

Let’s take a closer look at how these components work with CDP data to address the complexities of unstructured data and enable customers to scale AI with trust.  

1. Streamlining data for AI without compromise 

One of the critical challenges in AI implementation lies in efficiently sharing data without duplication or migration. The watsonx.data open data lakehouse, in tandem with Cloudera, addresses this challenge by allowing users to access diverse data sources and data types across the hybrid cloud within minutes. The integration supports sharing data in Apache Iceberg table format, offering a unique solution to augment existing Hadoop data lakes with warehouse-like performance and robust governance, security and lineage. CDP’s support for the Iceberg open table format ensures that a single copy of data can be shared with customers’ tools of choice, including Hive and Impala, and with watsonx.data’s various engines, including Presto, Spark, Db2 and Netezza, as well as any other query engine that supports Iceberg. Customers get the best price-performance without resorting to migration, duplication or extensive ETL processes.

2. Accelerating data insights with natural language capabilities

Watsonx and Cloudera empower users to accelerate data insights without the need for complex SQL queries. Watsonx.data introduces a conversational interface that allows CDP users to use natural language to discover, enrich and transform data. The AI functionality within the system recommends relevant data sets based on user queries and automatically generates semantics for easy identification of data. This approach not only streamlines the data exploration process but also enhances user experience by making data insights more accessible. With AI-driven recommendations, organizations can uncover hidden patterns and correlations within their data, driving informed decision-making. 

3. Bringing governed data to AI applications of choice 

Data governance, lineage and reproducibility are essential elements for any organization aiming to harness the power of AI responsibly. Watsonx and Cloudera facilitate the unification, discovery and preparation of CDP data for AI applications. CDP users can store, query and search vector embeddings in watsonx.data with integrated vector database capabilities. Watsonx.data introduces the Milvus vector store (in tech preview), enabling users to collect, curate and prepare unstructured data from CDP as vectors for retrieval augmented generation (RAG) use cases in watsonx.ai™ and reduce generative AI model hallucinations. Furthermore, the collaboration introduces the watsonx.ai studio, enabling AI builders to scale both traditional ML and new generative AI use cases powered by CDP data. Deploying responsible, transparent and explainable AI workflows for CDP customers is made possible with watsonx.governance™, ensuring that organizations can navigate the complex landscape of AI ethics and compliance seamlessly with an end-to-end toolkit for AI governance across the entire model lifecycle. 
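To make the retrieval step of a RAG workflow more concrete, the following is a minimal sketch in plain Python. It does not use the Milvus, watsonx.data or watsonx.ai APIs. The in-memory store, the three-dimensional toy vectors and the helper names (cosine_similarity, top_k, build_prompt) are all hypothetical and exist only for illustration. In practice, embeddings would come from an embedding model and the similarity search would run inside the vector store.

```python
import math

# Hypothetical in-memory "vector store": (passage text, toy embedding) pairs.
# Real embeddings come from an embedding model and have far more dimensions.
STORE = [
    ("Invoices are archived after 90 days.", [0.9, 0.1, 0.0]),
    ("The cafeteria opens at 8 am.", [0.0, 0.2, 0.9]),
    ("Archived invoices live in object storage.", [0.8, 0.3, 0.1]),
]


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k passages whose embeddings are most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(question, passages):
    """Assemble a prompt that grounds the model's answer in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Hypothetical query embedding that points toward the "invoice" passages.
    passages = top_k([1.0, 0.0, 0.0], STORE, k=2)
    print(build_prompt("Where are old invoices kept?", passages))
```

Grounding the prompt in retrieved passages is the mechanism by which RAG aims to reduce hallucinations: the model is asked to answer from the supplied context rather than from its parameters alone.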

Watch this demo on how to use CDP data for RAG use cases with watsonx.ai.

 

Embracing the future of AI with watsonx and Cloudera 

As organizations grapple with the challenges of scaling AI, the integration of watsonx and Cloudera presents a compelling solution. By addressing the complexities of data sharing, accelerating data insights through natural language capabilities and ensuring the governance of data for AI applications, this collaboration sets a new standard for the industry. In the pursuit of unlocking the true potential of AI, leaders can now leverage the combined strengths of Cloudera Data Platform and IBM watsonx. The future of AI is not just about algorithms and models; it’s about empowering organizations to extract meaningful insights from their data, responsibly and efficiently. With watsonx and Cloudera, that future is now within reach. 

Ready to get started? Try watsonx for free today.


Learn more about IBM and Cloudera

1. 2022 Strategic Roadmap for Storage, Gartner
