The biggest data trends for 2026

Most enterprises have spent the past year chasing generative AI pilots. The problem is that many are stuck there. Experimental agents and RAG (retrieval-augmented generation) systems stall before reaching production, not because the models fail but because the data behind them isn’t ready.
 
Edward Calvesbert, Vice President, Product Management of watsonx.data at IBM, has seen this pattern play out across industries. The core issue, in his view, is data quality and fragmentation: data is trapped in silos and often lacks the structure, metadata and governance that agents need to use it effectively.
 
But 2026 might be different. Calvesbert sees a few big shifts reshaping the data stack all at once. Hybrid cloud isn’t a stopgap anymore—it’s become the design pattern for enterprise scale, as companies look for more flexibility and cost control across providers. Zero copy integration (which lets you query data without duplicating it) saves both time and money while providing access to more data. And frontier models are making it easier to combine structured and unstructured data to generate new insights and power agentic workflows.

At the same time, the data platform market is consolidating around fewer vendors—but this time, they’re building on open standards instead of closed ecosystems.
 
IBM Think sat down with Calvesbert to dig into what “AI-ready data” actually means and what data leaders should prioritize in 2026.

Defining AI-ready data

When enterprises say they want “AI-ready data,” what are they usually missing?

Enterprises are typically missing unified access to both structured and unstructured data, with [up to] 90% of their data locked away in unstructured silos. They lack a unified knowledge and semantic layer to deliver consistent governance across data sources. And without that foundation, it’s difficult to combine new data with existing reference data to unlock insights and automation. Most critically, they lack a clear path from pilots to production with proper security, compliance, governance and cost-effectiveness at enterprise scale.
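
To make that concrete, here is a minimal sketch of what one entry in such a knowledge and semantic layer could look like. The field names, dataset and policy check are all illustrative assumptions, not a watsonx.data schema.

# Illustrative semantic-layer entry: one business concept mapped to both
# structured and unstructured sources, with governance metadata attached.
# All names and fields here are hypothetical.
customer_churn = {
    "business_term": "customer_churn",
    "description": "Monthly churn events joined with support call transcripts",
    "sources": [
        {"type": "structured", "location": "iceberg://sales/churn_events"},
        {"type": "unstructured", "location": "s3://support/call-transcripts/"},
    ],
    "governance": {
        "owner": "data-platform-team",
        "classification": "confidential",
        "pii_columns": ["customer_id", "email"],
        "retention_days": 730,
    },
}

CLEARANCE = ["public", "internal", "confidential"]

def agent_can_read(entry: dict, user_clearance: str) -> bool:
    """Gate access on the entry's governance classification."""
    return CLEARANCE.index(user_clearance) >= CLEARANCE.index(
        entry["governance"]["classification"]
    )

The point is that an agent consuming this entry gets structure, lineage and policy in one place, rather than discovering them piecemeal.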

What’s the most common reason gen AI projects stall or fail after early pilots?

Data fragmentation prevents teams from accessing and combining information across sources and formats, and from delivering that data as reliable context and tools to models and agents. Missing enterprise-readiness features like security, compliance and governance create barriers to deployment. And maintaining consistent accuracy and reliability as enterprises progress from informational use cases to analytics and agentic automation is still a significant challenge.

Zero copy integration

Zero copy access seems to be getting a lot of attention. For someone less technical, what is it, and what problem does it solve?

Zero copy means querying data where it resides, without moving or duplicating it. This solves several critical problems: it eliminates duplication costs, reduces time-to-value by removing complex ETL (extract, transform, load) processes, and avoids vendor lock-in by keeping data in open formats accessible to multiple tools. For enterprises, watsonx.data’s zero copy capabilities provide a hybrid cloud bridge between on-prem and cloud environments, enabling seamless access to operational, analytical and AI workloads without data movement.
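
As a sketch of what this looks like in practice, the following Python snippet runs one federated query through a Presto coordinator, joining a table in an Iceberg catalog on object storage with a table in a Postgres catalog. It assumes the open-source presto-python-client package; the host, catalogs and table names are hypothetical.

import prestodb

# Connect to a (hypothetical) Presto coordinator that already has an
# Iceberg catalog and a Postgres catalog registered.
conn = prestodb.dbapi.connect(
    host="coordinator.example.com",
    port=8080,
    user="analyst",
    catalog="iceberg",
    schema="sales",
)

cur = conn.cursor()
# One query joins data across both systems; neither dataset is copied,
# staged or ETL'd into the engine first.
cur.execute("""
    SELECT o.region, SUM(o.amount) AS revenue, COUNT(t.ticket_id) AS tickets
    FROM iceberg.sales.orders AS o
    LEFT JOIN postgresql.support.tickets AS t
      ON o.customer_id = t.customer_id
    GROUP BY o.region
""")
for region, revenue, tickets in cur.fetchall():
    print(region, revenue, tickets)

The engine pushes work down to each source and streams results back, which is what eliminates the duplication and ETL steps described above.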

Where does zero copy break down today, if at all?

Querying data remotely may introduce performance trade-offs and latency compared to having local copies optimized for specific workloads. Integration is achieved via open standards, but ensuring security and compliance across federated data sources adds governance complexity that organizations must manage carefully.

Hybrid cloud momentum

Five years ago, hybrid cloud felt transitional. Why does it now look like a long-term state?

Real-time latency requirements, compliance mandates that keep data on-premises, and cost pressures—along with concerns about hyperscaler lock-in—are driving hybrid-by-design strategies. The infrastructure intensity of new gen AI workloads also points to the long-term need for on-prem and private cloud deployments to enable cost-efficient scalability.

There seems to be growing frustration about analytics and AI costs. What’s behind that, and how will it drive the evolution of the market?

Many enterprises are trapped in the walled gardens created by leading ISVs (independent software vendors) and hyperscalers. As more data platforms adopt zero copy integration and gen AI makes code and query refactoring easier, we expect a dramatic increase in the portability of workloads. That portability will create a dynamic marketplace in which enterprises can place different workloads on different engines from different vendors to achieve optimal price-performance across a broad range of SLAs (service level agreements) and selection criteria, including deployment across hybrid cloud environments.

Watsonx.data already provides multiple fit-for-purpose data engines (Presto, Spark, OpenSearch and Cassandra) with native C++ and JVector capabilities to deliver optimal price-performance for many operational, analytical and AI workloads. GPU-accelerated execution will enable a step-function improvement in this price-performance and will be critical to processing the ever-increasing volumes of new unstructured and AI-generated data.
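
A toy sketch of the placement logic this marketplace implies: a router that sends each workload to whichever engine meets its latency SLA at the lowest scan cost. The engine names, prices and SLA fields below are invented for illustration, not watsonx.data behavior.

from dataclasses import dataclass

@dataclass
class Engine:
    name: str
    max_latency_ms: int        # best latency the engine can commit to
    cost_per_tb_scanned: float

ENGINES = [
    Engine("interactive-sql", max_latency_ms=500, cost_per_tb_scanned=8.0),
    Engine("batch-spark", max_latency_ms=60_000, cost_per_tb_scanned=2.5),
    Engine("gpu-accelerated", max_latency_ms=200, cost_per_tb_scanned=12.0),
]

def place_workload(latency_sla_ms: int, tb_scanned: float) -> Engine:
    """Pick the cheapest engine that can still meet the workload's SLA."""
    candidates = [e for e in ENGINES if e.max_latency_ms <= latency_sla_ms]
    if not candidates:
        raise ValueError("no engine meets the SLA")
    return min(candidates, key=lambda e: e.cost_per_tb_scanned * tb_scanned)

# A dashboard query with a tight SLA lands on the GPU engine; a nightly
# aggregation with a loose SLA lands on cheap batch compute.
print(place_workload(latency_sla_ms=300, tb_scanned=0.2).name)    # gpu-accelerated
print(place_workload(latency_sla_ms=120_000, tb_scanned=40).name) # batch-spark

Real selection criteria would be far richer, but once workloads are portable, even a simple cost-versus-SLA rule like this starts to shape where they run.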

Consolidation vs. fragmentation

Are data platforms consolidating or fragmenting? What does the competitive landscape look like in 2026 versus where it was a few years ago?

The market is clearly consolidating to a smaller number of data platforms as clients look to mitigate complexity and risk and benefit from economies of skill and scale. At the same time, clients expect these platforms to support open standards at multiple levels in the stack and MCP (Model Context Protocol)-based interfaces that allow platforms to interoperate across the broader ecosystem. IBM watsonx.data is positioned as the leading open, hybrid-cloud data platform for enterprises, providing breadth of workload coverage from BI to gen AI, choice of tooling and optimal price-performance.

Looking ahead

What’s one uncomfortable truth leaders need to hear about their data strategies in 2026?

Most data estates are still too complex and fragmented to support AI at scale. Frontier models perform well when supported by strong semantics and governance. But the uncomfortable truth is that without a converged platform providing unified access to both structured and unstructured data, organizations will struggle to move analytics and agentic automation into production at the speed and scale needed for competitive advantage.

Beyond the technical gaps, where do leaders most misjudge their organization’s readiness for AI—organizationally or operationally?

I think most user interactions with enterprise data and databases will soon be intermediated by agents, which means that more business and technical users with diverse skills will be able to leverage the data for their own use cases and priorities. However, to finally enable this “democratization of data,” users must become fluent with agentic development and analytical tools, their strengths and weaknesses, and how to evaluate quality and performance.
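
One way to picture that intermediation: rather than handing users raw SQL access, the platform exposes governed tools that an agent can call on their behalf. The registry, roles and query stub below are hypothetical stand-ins, not a specific product's API.

TOOLS = {}

def tool(name, description, allowed_roles):
    """Register a function as an agent-callable tool with an access policy."""
    def wrap(fn):
        TOOLS[name] = {"fn": fn, "description": description, "roles": allowed_roles}
        return fn
    return wrap

def run_query(sql, params):
    """Placeholder for a call into the data platform's query engine."""
    print("would execute:", sql, params)
    return []

@tool("revenue_by_region", "Monthly revenue totals per sales region",
      allowed_roles={"analyst", "finance"})
def revenue_by_region(month):
    return run_query(
        "SELECT region, SUM(amount) FROM sales.orders "
        "WHERE month = ? GROUP BY region",
        [month],
    )

def call_tool(name, role, **kwargs):
    """An agent invokes a tool; governance is enforced at the call site."""
    entry = TOOLS[name]
    if role not in entry["roles"]:
        raise PermissionError(f"role {role!r} may not call {name!r}")
    return entry["fn"](**kwargs)

call_tool("revenue_by_region", role="analyst", month="2026-01")

Business users interact in natural language, the agent selects and calls the tool, and the platform’s governance decides what it may touch.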

Are there any other data trends for 2026 and beyond that data leaders should keep an eye out for?

GPU-accelerated data processing is around the corner, and hyperconverged infrastructure that facilitates the deployment of these new workloads will be a source of competitive advantage. Agentic data engineering pipelines will become critical to generating high-quality data products, and real-time data processing is increasingly crucial for certain high-value use cases across industries.

Antonia Davison

Staff Writer
