Most enterprises have spent the past year chasing generative AI pilots. The problem is that many are stuck there. Experimental agents and RAG systems stall before reaching production—not because the models fail, but because the data behind them isn’t ready.
Edward Calvesbert, Vice President, Product Management of watsonx.data at IBM, has seen this pattern play out across industries. The core issue, in his view, is data quality and fragmentation: companies are dealing with data that is trapped in silos and often lacks the structure, metadata and governance that agents need to use it effectively.
But 2026 might be different. Calvesbert sees a few big shifts reshaping the data stack all at once. Hybrid cloud isn’t a stopgap anymore—it’s become the design pattern for enterprise scale, as companies look for more flexibility and cost control across providers. Zero copy integration (which lets you query data without duplicating it) saves both time and money while providing access to more data. And frontier models are making it easier to combine structured and unstructured data to generate new insights and power agentic workflows.
At the same time, the data platform market is consolidating around fewer vendors—but this time, they’re building on open standards instead of closed ecosystems.
IBM Think sat down with Calvesbert to dig into what “AI-ready data” actually means and what data leaders should prioritize in 2026.
Enterprises are typically missing unified access to both structured and unstructured data, with [up to] 90% of their data locked away in unstructured silos. They lack a unified knowledge and semantic layer to deliver consistent governance across data sources. And without that foundation, it’s difficult to combine new data with existing reference data to unlock insights and automation. Most critically, they lack a clear path from pilots to production with proper security, compliance, governance and cost-effectiveness at enterprise scale.
Data fragmentation prevents teams from accessing and combining information across sources and formats—and from delivering that data as reliable context and tools to models and agents. Missing enterprise readiness features like security, compliance and governance create barriers to deployment. And maintaining consistent accuracy and reliability as enterprises progress from informational use cases to analytics and agentic automation is still a significant challenge.
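To make “reliable context” concrete, here is a minimal Python sketch of packaging a query result with the provenance and governance labels an agent would need to trust it. The table, field names and labels are hypothetical, not part of any product API:

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class GovernedContext:
    """A query result packaged with the metadata an agent needs to trust it."""
    source: str          # where the data lives (hypothetical identifier)
    as_of: str           # freshness timestamp
    classification: str  # governance label, e.g. "internal" or "restricted"
    rows: list

def fetch_context(conn: sqlite3.Connection, sql: str, source: str,
                  classification: str) -> GovernedContext:
    """Run a query and attach provenance, so downstream agents receive
    context they can cite rather than anonymous raw rows."""
    rows = conn.execute(sql).fetchall()
    return GovernedContext(
        source=source,
        as_of=datetime.now(timezone.utc).isoformat(),
        classification=classification,
        rows=rows,
    )

# Demo against an in-memory table standing in for a governed source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("EMEA", 120.0), ("APAC", 95.5)])
ctx = fetch_context(
    conn,
    "SELECT region, SUM(revenue) FROM orders GROUP BY region",
    source="sales_warehouse.orders",
    classification="internal",
)
print(ctx)
```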
Zero copy means querying data where it resides, without moving or duplicating it. This solves several critical problems: it eliminates duplication costs, reduces time-to-value by removing complex ETL (extract, transform, load) processes, and avoids vendor lock-in by keeping data in open formats accessible to multiple tools. For enterprises, watsonx.data’s zero copy capabilities provide a hybrid cloud bridge between on-prem and cloud environments, enabling seamless access to operational, analytical and AI workloads without data movement.
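As a hedged sketch of what zero copy querying can look like in practice, the example below uses the open-source presto-python-client against a federated SQL engine such as the Presto engine in watsonx.data. The host, catalogs and table names are placeholders, not real endpoints:

```python
# Zero copy in practice: a federated engine queries data where it already
# lives, joining across systems in one statement with no ETL and no copies.
# Requires: pip install presto-python-client
import prestodb

conn = prestodb.dbapi.connect(
    host="presto.example.com",   # hypothetical Presto endpoint
    port=8080,
    user="analyst",
    catalog="iceberg",           # e.g., Iceberg tables on object storage
    schema="sales",
)
cur = conn.cursor()

# Join a lakehouse table with an operational database in a single query.
# Both catalogs stay in place; neither dataset is duplicated.
cur.execute("""
    SELECT o.region, c.segment, SUM(o.revenue) AS revenue
    FROM iceberg.sales.orders AS o
    JOIN postgresql.crm.customers AS c ON o.customer_id = c.id
    GROUP BY o.region, c.segment
""")
for row in cur.fetchall():
    print(row)
```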
Querying data remotely may introduce performance trade-offs and latency compared to having local copies optimized for specific workloads. Integration is achieved via open standards, but ensuring security and compliance across federated data sources adds governance complexity that organizations must carefully manage.
Real-time latency requirements, compliance mandates that keep data on-premises, and cost pressures—along with concerns about hyperscaler lock-in—are driving hybrid-by-design strategies. The infrastructure intensity of new gen AI workloads also points to the long-term need for on-prem and private cloud deployments to enable cost-efficient scalability.
Many enterprises are trapped in the walled gardens created by leading ISVs (independent software vendors) and hyperscalers. As more and more data platforms adopt and support zero copy integration and gen AI makes code and query refactoring easy, we expect a dramatic increase in the portability of workloads. This workload portability will create a dynamic marketplace where enterprises will be able to place different workloads across different engines from different vendors to achieve optimal price-performance for a broad range of SLAs (service level agreements) and selection criteria, including deploying across hybrid cloud environments.
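As a purely illustrative sketch of that marketplace dynamic, the toy Python router below places each workload on the cheapest engine that can still meet its latency SLA. The engine names and cost figures are invented:

```python
from dataclasses import dataclass

@dataclass
class Engine:
    name: str
    max_latency_s: float   # best latency this engine can commit to
    cost_per_query: float  # relative cost unit (made-up figures)

ENGINES = [
    Engine("interactive-presto", max_latency_s=2.0, cost_per_query=1.0),
    Engine("batch-spark", max_latency_s=300.0, cost_per_query=0.2),
    Engine("gpu-accelerated", max_latency_s=0.5, cost_per_query=3.0),
]

def place_workload(latency_sla_s: float) -> Engine:
    """Pick the cheapest engine that can still meet the latency SLA."""
    candidates = [e for e in ENGINES if e.max_latency_s <= latency_sla_s]
    if not candidates:
        raise ValueError("No engine can meet this SLA")
    return min(candidates, key=lambda e: e.cost_per_query)

print(place_workload(600.0).name)  # batch-spark: cheapest within a loose SLA
print(place_workload(1.0).name)    # gpu-accelerated: only option under 1s
```

The point is not the ten-line heuristic but the shape of the decision: once workloads are portable, engine selection becomes a per-query, per-SLA optimization rather than a one-time platform bet.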
Watsonx.data already provides multiple fit-for-purpose data engines (Presto, Spark, OpenSearch and Cassandra) with native C++ and JVector capabilities to deliver optimal price-performance for many operational, analytical and AI workloads. GPU-accelerated execution will enable a step-function improvement in this price-performance and will be critical to processing the ever-increasing volumes of new unstructured and AI-generated data.
The market is clearly consolidating to a smaller number of data platforms as clients look to mitigate complexity and risk and benefit from economies of skill and scale. At the same time, clients expect these platforms to support open standards at multiple levels in the stack and MCP (Model Context Protocol)-based interfaces that allow platforms to interoperate across the broader ecosystem. IBM watsonx.data is positioned as the leading open, hybrid-cloud data platform for enterprises, providing breadth of workload coverage from BI to gen AI, choice of tooling and optimal price-performance.
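For a sense of what an MCP-based interface might look like, here is a minimal sketch using the open-source Model Context Protocol Python SDK. The server name and stubbed query tool are hypothetical; a real platform would enforce governance before returning results:

```python
# Requires: pip install mcp
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("data-platform")  # hypothetical server name

@mcp.tool()
def run_query(sql: str) -> str:
    """Execute a read-only SQL query against a governed data platform."""
    # Placeholder: a real implementation would validate the statement,
    # apply row- and column-level policies, then dispatch to a query engine.
    return f"(stub) would execute: {sql}"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio to any MCP-capable agent
```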
Most data estates are still too complex and fragmented to support AI at scale. Frontier models perform well when supported by strong semantics and governance. But the uncomfortable truth is that without a converged platform providing unified access to both structured and unstructured data, organizations will struggle to move analytics and agentic automation into production at the speed and scale needed for competitive advantage.
I think most user interactions with enterprise data and databases will soon be intermediated by agents, which means that more business and technical users with diverse skills will be able to leverage the data for their own use cases and priorities. However, to finally enable this “democratization of data,” users must become fluent in agentic development and analytical tools, their strengths and weaknesses, and how to evaluate quality and performance.
GPU-accelerated data processing is around the corner, and hyperconverged infrastructure to facilitate the deployment of these new workloads will be a source of competitive advantage. Agentic data engineering pipelines will become critical for generating high-quality data products, and real-time data processing is increasingly crucial for certain high-value use cases across industries.