Improving AI accuracy with AI-ready unstructured and structured data with IBM watsonx.data

Author

Vice President, Product Management - watsonx.data

IBM

Today, IBM announced the general availability of the evolution of IBM watsonx.data, the only hybrid, open data lakehouse for enterprise AI and analytics.

Organizations can now simplify and scale the access, preparation and delivery of unstructured and structured data to power more accurate, relevant gen AI applications, expand self-service analytics and streamline previously complex data access, enrichment and governance.

More accurate AI than conventional RAG

Enterprise data is the best tool to power accurate, differentiated AI that is relevant to your industry and your clients and drives competitive advantage. However, 90% of enterprise data is unstructured data, which has largely remained inaccessible and underutilized for gen AI.

Now you can access, prepare and deliver your enterprise unstructured data to power 40% more accurate AI than conventional RAG with IBM watsonx.data.¹ Watsonx.data is uniquely:

Hybrid and open to access data wherever it resides and deploy across on-premises, cloud, and multi-cloud environments with interoperability with your existing ecosystem and data investments.
Workload optimized with multiple fit-for-purpose query engines, including the new open-source Apache Gluten-enhanced Spark, to balance cost and performance.
Gen AI ready with embedded data fabric capabilities—watsonx.data integration and watsonx.data intelligence—all within the data lakehouse, to avoid creating yet another data silo.

Now you can scale and automate:

Ingestion of your unstructured and structured data from a variety of new source systems including FileNet, Box, Google Docs, and more.
Semantic enrichment of your data, creating both vectorized embeddings and structured derivatives from extracted and normalized entities in your documents to power AI applications that understand positional context, relationships and calculations for more accurate, complete outputs.
Governance of your data with access controls inherited from document source systems through to the retrieval of your data for AI, with PII annotation to avoid surfacing sensitive information.
Retrieval of that data across a broad spectrum of workloads from BI to gen AI applications and agents.

All of this can be done within IBM watsonx.data to unlock enterprise unstructured data for AI and traditional analytics, such as data engineering, BI and ML.
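The enrichment, governance and retrieval steps listed above can be illustrated with a minimal, self-contained sketch. This is not the watsonx.data API: the `Chunk` class, its fields, and the `retrieve` and `total` helpers are hypothetical names invented for illustration, and the two-dimensional embeddings are toy values. It only shows the general idea of storing an embedding next to structured derivatives, filtering by inherited access controls and PII annotations, and running a calculation over structured values rather than over free text.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Chunk:
    """One enriched document chunk (all fields are illustrative assumptions)."""
    text: str
    embedding: list[float]        # vectorized embedding of the text
    entities: dict[str, float]    # structured derivative: extracted, normalized values
    allowed_groups: set[str]      # access controls inherited from the source system
    has_pii: bool = False         # PII annotation set during enrichment


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(chunks, query_embedding, user_groups, k=2):
    """Return the top-k chunks the user may see, skipping PII-flagged ones."""
    visible = [
        c for c in chunks
        if c.allowed_groups & user_groups and not c.has_pii
    ]
    ranked = sorted(
        visible,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


def total(chunks, entity):
    """Sum a structured value across chunks instead of parsing free text."""
    return sum(c.entities.get(entity, 0.0) for c in chunks)


# Toy data: embeddings and amounts are made up for the example.
chunks = [
    Chunk("Q1 invoice: 1,200 USD", [1.0, 0.0], {"amount_usd": 1200.0}, {"finance"}),
    Chunk("Q2 invoice: 800 USD", [0.9, 0.1], {"amount_usd": 800.0}, {"finance"}),
    Chunk("Employee SSN record", [0.8, 0.2], {}, {"finance"}, has_pii=True),
    Chunk("Lab safety memo", [0.0, 1.0], {}, {"engineering"}),
]

hits = retrieve(chunks, [1.0, 0.0], {"finance"})
print([c.text for c in hits])     # ['Q1 invoice: 1,200 USD', 'Q2 invoice: 800 USD']
print(total(hits, "amount_usd"))  # 2000.0
```

In this sketch the PII-flagged record and the engineering-only memo never reach the ranking step, and the invoice total comes from the extracted `amount_usd` values rather than from text generated by a model.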

Speed and scalability for complex analytical workloads

IBM watsonx.data now offers Apache Gluten-accelerated Spark as one of its multiple fit-for-purpose query engines, significantly boosting performance for compute-intensive Spark SQL workloads. Apache Gluten, a high-performance library, optimizes Apache Spark SQL workloads by offloading execution to Velox, a native C++ execution engine. This integration delivers faster query processing and enhanced resource efficiency for large-scale data analytics. Now organizations can execute complex analytical tasks with even greater speed and scalability and at lower cost.
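For readers who want a sense of what offloading to Velox involves on the Spark side, here is a small sketch of the properties that open-source Apache Gluten documents for its Velox backend. It is an assumption that these apply unchanged inside watsonx.data, which may preconfigure or hide them; the helper names `gluten_spark_conf` and `as_submit_args` and the off-heap size are hypothetical. The sketch only builds the settings and does not start Spark.

```python
def gluten_spark_conf(offheap_size="4g"):
    """Spark properties used by open-source Gluten with the Velox backend.

    Assumes a Gluten release that uses the org.apache.gluten package and
    that the Gluten jar is already on the Spark classpath.
    """
    return {
        # Load Gluten as a Spark plugin so supported operators run natively.
        "spark.plugins": "org.apache.gluten.GlutenPlugin",
        # Native execution allocates from off-heap memory.
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": offheap_size,
        # Columnar shuffle keeps data in columnar form between stages.
        "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
    }


def as_submit_args(conf):
    """Render the properties as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(as_submit_args(gluten_spark_conf("8g"))))
```

The same dictionary could be passed to `SparkSession.builder.config(key, value)` in a PySpark job. Operators that Gluten cannot offload fall back to regular Spark execution, so actual speedups depend on the workload.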

New DataStax NoSQL database adds operational and vector capabilities

IBM recently acquired DataStax, bringing a NoSQL operational vector datastore, built on Apache Cassandra, to watsonx.data. This addition to watsonx.data enhances our vector capabilities and strengthens our retrieval-augmented generation and knowledge embedding capabilities.

DataStax is optimized for read and write gen AI applications and operational workloads that demand real-time performance, high availability and scale, bringing organizations the speed, reliability and multi-modal support needed for modern AI applications.

DataStax also seamlessly connects with Langflow, soon to be available as part of IBM watsonx.ai. Langflow is an open-source tool with over 60,000 GitHub stars that enables developers to prototype, build and deploy retrieval-augmented generation and multi-agent AI applications through an intuitive low-code interface, reducing development friction and accelerating time to value.

Bringing our Think 2025 preview to general availability

We announced the closed preview of these capabilities at Think 2025, while sharing the stage with distinguished guest speakers across the Data keynote session, spotlight sessions, and techbyte demos, who are paving the way for data and AI innovation in their industries.

Lockheed Martin joined the keynote stage with Meta. Lockheed recently leveraged the transformed watsonx.data, enabling 70,000 engineers, scientists and technicians to retrieve answers and information from millions of documents using natural language. "We are rapidly accelerating our innovation and efficiency, to get solutions out of the lab and into the field, helping create a safer, more secure world," says John Clark, senior vice president of Technology and Strategic Innovation at Lockheed.

EY recently debuted groundbreaking AI-powered Global Tax Compliance Solutions that address the largest challenges facing tax departments, built with watsonx. “EY delivers tax services in over 150 countries, and almost universally in those countries, our clients struggle with data,” says Christopher Aiken, Americas Indirect Tax AI Leader at EY. “watsonx has cut down our human effort for data cleansing, enrichment and quality review by 30 to 50%.”

USAA is leveraging gen AI to drive the future of insurance and improve customer experience. “In the insurance industry, we deal with a significant amount of unstructured data,” says Ramnik Bajaj, Chief Data Analytics & AI Officer at USAA. “For instance, home inspection reports, police reports and accident images contain very little structured data. With gen AI, we have the opportunity to extract key attributes and insights from this unstructured data, making it much more accessible and useful for underwriters, adjusters and service representatives.”

Get started with watsonx.data today

You can now get started with the evolution of watsonx.data as part of the Premium Edition.

Start a free trial with USD 2,000 in free credits

¹ Based on internal testing comparing the answer correctness of AI model outputs using the watsonx.data Premium Edition retrieval layer to vector-only RAG on three common use cases with IBM proprietary datasets, using the same set of selected open-source commodity inferencing, judging and embedding models and additional variables. Results can vary.