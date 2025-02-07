The IBM watsonx™ AI and data platform includes three core components and a set of AI assistants designed to help you scale and accelerate the impact of AI with trusted data across your business. The core components are a studio for new foundation models, generative AI and machine learning; a fit-for-purpose data store built on an open data lakehouse architecture; and a toolkit to accelerate AI workflows that are built with responsibility, transparency and explainability.

Let’s take a closer look at how these components work with CDP data to address the complexities of unstructured data and enable customers to scale AI with trust.

1. Streamlining data for AI without compromise

One of the critical challenges in AI implementation is sharing data efficiently without duplication or migration. The watsonx.data open data lakehouse, in tandem with Cloudera, addresses this challenge by allowing users to access diverse data sources and data types across the hybrid cloud within minutes. The integration supports sharing data in the Apache Iceberg table format, which lets customers augment existing Hadoop data lakes with warehouse-like performance and robust governance, security and lineage. Because CDP supports the Iceberg open table format, a single copy of data can be shared with customers’ tools of choice, including Hive, Impala and watsonx.data’s various engines, such as Presto, Spark, Db2, Netezza and any query engine that supports Iceberg. Customers can pick the engine with the best price-performance without resorting to migration, duplication or extensive ETL processes.

2. Accelerating data insights with natural language capabilities

Watsonx and Cloudera empower users to accelerate data insights without the need for complex SQL queries. Watsonx.data introduces a conversational interface that allows CDP users to use natural language to discover, enrich and transform data. The AI functionality within the system recommends relevant data sets based on user queries and automatically generates semantics for easy identification of data. This approach not only streamlines the data exploration process but also enhances user experience by making data insights more accessible. With AI-driven recommendations, organizations can uncover hidden patterns and correlations within their data, driving informed decision-making.

3. Bringing governed data to AI applications of choice

Data governance, lineage and reproducibility are essential elements for any organization aiming to harness the power of AI responsibly. Watsonx and Cloudera facilitate the unification, discovery and preparation of CDP data for AI applications. CDP users can store, query and search vector embeddings in watsonx.data with integrated vector database capabilities. Watsonx.data introduces the Milvus vector store (in tech preview), enabling users to collect, curate and prepare unstructured data from CDP as vectors for retrieval augmented generation (RAG) use cases in watsonx.ai™ and reduce generative AI model hallucinations. Furthermore, the collaboration introduces the watsonx.ai studio, enabling AI builders to scale both traditional ML and new generative AI use cases powered by CDP data. Finally, watsonx.governance™ lets CDP customers deploy responsible, transparent and explainable AI workflows, providing an end-to-end toolkit for AI governance across the entire model lifecycle so that organizations can navigate the complex landscape of AI ethics and compliance.
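To make the RAG idea above more concrete, here is a minimal sketch of the retrieval step in plain Python. It is an illustration only and does not use the Milvus or watsonx.ai APIs: the in-memory STORE list stands in for a vector store, the three-dimensional vectors are hand-written placeholders for real embeddings, and the names cosine_similarity, top_k and build_prompt are hypothetical helpers chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k passages whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(question, passages):
    """Ground the question in retrieved passages before calling a model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Assumption: toy passages with made-up 3-dimensional "embeddings".
# A real pipeline would compute embeddings with a model and keep them
# in a vector database instead of a Python list.
STORE = [
    ("Iceberg tables can be shared across query engines.", [0.9, 0.1, 0.0]),
    ("Vector stores hold embeddings of unstructured text.", [0.1, 0.9, 0.1]),
    ("Governance tracks models across their lifecycle.", [0.0, 0.1, 0.9]),
]

# Assumption: a placeholder vector standing in for the embedded question.
query_vec = [0.2, 0.8, 0.0]
passages = top_k(query_vec, STORE, k=2)
prompt = build_prompt("Where are embeddings stored?", passages)
```

The generated prompt carries the retrieved passages alongside the question, which is how RAG constrains a generative model to source data rather than its own guesses. In practice the similarity search would run inside the vector store, not in application code.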



Watch this demo on how to use CDP data for RAG use cases with watsonx.ai.