Agents that scale versus agents that stall


Every organization needs problem-solvers. I’m talking about independent operators, contemptuous of the type of hand-holding that plagues inefficient enterprises and confident enough to read between the lines. When software demonstrates this type of intelligence, we say “it just works.” When it’s an employee, we say “she just gets it.” 

Then there’s the other end of the spectrum, characterized by deferrals, delays and indecision. Often a step behind or acting on outdated information, these squeaky wheels grind everything to a halt and routinely elicit one of the most frustrated phrases in office-speak, “I’ll just do it myself.” 

Millions of AI agents—and you no doubt know this if you find yourself reading this blog—will be built and deployed in the next few years. According to the IBM Institute for Business Value, 70% of surveyed executives say agentic AI is critical to their future strategy.  The question is, what type of agents are you unleashing—problem-solvers or problem-creators?   

The difference between the two comes down to a familiar foe: silos. It’s tempting to indulge optimism bias under the ideal conditions of pilot season; when it’s time for prime time—that is, enterprise-wide deployment—the complexities of big business impede progress. Tangled workflows, patchwork governance and inconsistent data access turn every agent into a one-off maintenance issue. What was supposed to drive productivity becomes a major productivity drain. Call it AI irony.

To scale, organizations must orchestrate all their agents holistically, creating a roster of consistently governed AI collaborators that easily integrate with existing tools. When orchestration works, processes align, silos dissolve and AI potential turns into real outcomes. Still, orchestration alone won’t win the AI race. Data is the differentiator. It’s the force that makes your agents—all of them, not just the POC test cases—fluent in your business and trustworthy enough to act autonomously.

After all, generic data leads to generic AI that speaks the same monotone as your competitors. Or worse, poorly managed data can turn AI into a liability that spreads errors faster and farther than any human ever could.  

It took too long for the market to recognize the importance of preparing data for AI, an oversight that’s caused ROI to be TBD and manifests itself in a slew of stats demonstrating that most organizations are still stuck in pilot season. Indeed, only 5% of organizations surveyed have integrated AI tools into workflows at scale, according to a report from MIT. 

The great data reset

A great correction is currently underway as organizations pour billions into their data initiatives. According to forthcoming survey data from the IBM Institute for Business Value, approximately 13% of IT budgets were allocated to data strategy in 2025, up from 4% in 2022. Similarly, 82% of surveyed chief data officers report that they’re hiring for roles that didn’t exist last year.

The goal, of course, is to imbue your AI with the type of proprietary, trusted data that makes your business unique. When you or your customers prompt your AI, it should return contextually relevant information that’s consistent with your organization’s goals, values and regulatory obligations. Agentic AI raises the stakes even further. When you set an agent in motion and empower it to make decisions and pursue explicit goals, you must trust that it knows your business and its culture—your data—inside and out.

For agents to succeed, they need quality data—which, according to the Data Management Association (DAMA), is data that’s accurate, complete, consistent, timely, unique and valid. IBM adds a seventh data quality dimension, homogeneity, which ensures varied data can be harmonized for consistent interpretation and enriched for semantic understanding.
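To make those dimensions concrete, here’s a minimal sketch of how a team might score a dataset against a few of them with pandas. The column names, rules and thresholds are illustrative assumptions, not a prescribed framework or product API.

```python
# Minimal sketch: scoring a dataset against a few data quality dimensions.
# Column names and rules are illustrative assumptions, not a product API.
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Return simple scores (0-1) for completeness, uniqueness and validity."""
    completeness = 1 - df.isna().mean().mean()   # share of non-null cells
    uniqueness = 1 - df.duplicated().mean()      # share of non-duplicate rows
    # Validity: a domain-specific rule, e.g. order totals must be non-negative
    validity = (df["order_total"] >= 0).mean() if "order_total" in df else 1.0
    return {
        "completeness": round(float(completeness), 3),
        "uniqueness": round(float(uniqueness), 3),
        "validity": round(float(validity), 3),
    }

if __name__ == "__main__":
    orders = pd.DataFrame({
        "order_id": [1, 2, 2, 4],
        "order_total": [120.0, None, -5.0, 89.5],
    })
    print(quality_report(orders))
```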

Maintaining data quality isn’t easy, particularly in the age of zettabytes. Manual quality assurance is time-consuming, error-prone and demands more data professionals than the market can supply amid a persistent talent shortage.

Organizations have tried to bridge the gap by precariously stacking data warehouses, data lakes and integration tools into towers that threaten to topple. Patches, dashboards and scripts add further bloat. This ad hoc approach too often leads to technical debt that compounds constantly—and unpredictably. Innovation slides to the back burner when your IT staff is stuck on mere maintenance, pouring their productivity down the crevices in your data estate.

Where do we go from here? 

Building a solid data foundation

The answer begins with a data layer that connects, enriches and governs all your data sources and serves as a wellspring for AI agents fluent in your organization’s context and voice. With that foundation, agents deliver decisions you can trust—accelerating workflows, reducing risk and driving productivity at scale.

Metadata is the language of that layer. It provides the context that makes your data easily consumable for AI or more traditional workloads, such as analytics and data engineering. Yet, manual classification doesn’t scale. Automated tagging does—because it applies structure at the speed of ingestion. It captures lineage, sensitivity and business meaning—with human oversight available when needed—to reduce risk and accelerate downstream tasks, such as retrieval and compliance. In short, it turns raw assets into governed, contextual knowledge before anyone even asks for it. 
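As a rough illustration of what applying structure at the speed of ingestion can look like, the sketch below tags incoming columns with sensitivity labels using simple pattern rules. A production system would lean on ML classifiers and a real catalog; every rule, label and field name here is a hypothetical placeholder.

```python
# Illustrative sketch: tag columns with sensitivity labels as data is ingested.
# The rules, labels and record shape are assumptions for demonstration only.
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

SENSITIVITY_RULES = {
    r"(ssn|social_security)": "restricted",
    r"(email|phone|address)": "pii",
    r"(revenue|salary|cost)": "confidential",
}

@dataclass
class ColumnTag:
    column: str
    sensitivity: str = "public"
    ingested_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def tag_columns(columns: list[str]) -> list[ColumnTag]:
    """Apply pattern-based sensitivity labels as data lands, before anyone asks."""
    tags = []
    for name in columns:
        tag = ColumnTag(column=name)
        for pattern, label in SENSITIVITY_RULES.items():
            if re.search(pattern, name, flags=re.IGNORECASE):
                tag.sensitivity = label
                break
        tags.append(tag)
    return tags

print(tag_columns(["customer_email", "annual_revenue", "region"]))
```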

Context is powerful. Ultimately, it leads to more accurate AI and more confident decision-making. However, data without the right permissions is a liability, not an asset.

Access rules shouldn’t live in spreadsheets. They should travel with the data. As assets move from a document store to a lakehouse to a fine-tuning job, permissions should move too. When policies apply themselves based on identity, role and purpose, the right people see the right data at the right time. This process reduces risk, prevents accidental exposure and keeps compliance from becoming a fire drill.
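One way to picture policies that apply themselves is attribute-based access control, where a decision is computed from identity, role and purpose at the moment of access. The sketch below is a simplified stand-in for the policy engines governance platforms provide; the roles, purposes and policy table are invented for illustration.

```python
# Simplified attribute-based access control (ABAC) check.
# Roles, purposes and the policy table are hypothetical examples.
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessRequest:
    user_role: str          # e.g. "data_scientist"
    purpose: str            # e.g. "model_fine_tuning"
    data_sensitivity: str   # travels with the asset as metadata

# Policy: which role/purpose pairs may use which sensitivity levels.
POLICY = {
    ("data_scientist", "model_fine_tuning"): {"public", "confidential"},
    ("compliance_officer", "audit"): {"public", "confidential", "pii", "restricted"},
}

def is_allowed(req: AccessRequest) -> bool:
    """Grant access only if the role/purpose pair covers the asset's sensitivity."""
    allowed_levels = POLICY.get((req.user_role, req.purpose), set())
    return req.data_sensitivity in allowed_levels

print(is_allowed(AccessRequest("data_scientist", "model_fine_tuning", "pii")))  # False
print(is_allowed(AccessRequest("compliance_officer", "audit", "pii")))          # True
```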

Strong governance is essential, but it’s only part of the equation. The architecture beneath it determines whether control scales or stalls. Open and hybrid by design is the right approach because most enterprises already span multiple clouds and on-prem environments. Separating storage and compute avoids costly migrations and the disruptions they cause. Open table formats, such as Apache Iceberg, make this possible by decoupling applications from storage, letting tools read and write data in place—wherever it resides. They also prevent lock-in to a single vendor’s database. Flexibility isn’t a luxury—it’s a safeguard against runaway costs and rigid systems that can’t adapt when priorities shift. No wonder, then, that three-quarters of organizations expect to increase their use of open-source AI technologies—including open formats—over the next few years, citing lower implementation and maintenance costs, according to a study from McKinsey.
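To show what reading data in place can look like with an open table format, here’s a minimal sketch that queries an Apache Iceberg table through the open-source PyIceberg client. The catalog name, connection properties and table identifier are placeholders for whatever your environment actually uses.

```python
# Minimal sketch: query an Apache Iceberg table in place with the open-source
# PyIceberg client. Catalog settings and table names are placeholders.
from pyiceberg.catalog import load_catalog

# Connect to a catalog; swap in your own URI, warehouse and credentials.
catalog = load_catalog(
    "analytics",
    **{
        "uri": "https://your-catalog.example.com",  # placeholder endpoint
        "warehouse": "s3://your-bucket/warehouse",  # placeholder object storage
    },
)

# Load the table where it lives -- no copy, no migration.
orders = catalog.load_table("sales.orders")

# Push down a filter and column projection, then read the result as Arrow.
recent = orders.scan(
    row_filter="order_date >= '2025-01-01'",
    selected_fields=("order_id", "order_total", "region"),
).to_arrow()

print(recent.num_rows)
```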

Unstructured data is no longer out of reach

Unstructured data remains the great untapped reservoir. Invoices, emails, logs, images—even this blog—hold insights, I hope, that rarely make it into analytics because they’re scattered across systems, locked in incompatible formats and lacking neat labels. Manual extraction is a nonstarter. It demands hours of human effort, invites mistakes and collapses under the weight of enterprise-scale data. Automation is the only way to impose order at that scale: identifying entities, capturing values and layering semantics that reflect how your business actually speaks and how it wants to show up in the market. From there, a schema emerges that machines can process and humans—and AI agents—can trust.
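As a toy illustration of that idea, the sketch below pulls a vendor, an invoice number and a total out of invoice-like text and lands them in a small schema. Real systems rely on document-understanding or entity-extraction models; the patterns and field names here are made up for demonstration.

```python
# Toy sketch: extract entities and values from invoice-like text into a schema.
# Patterns and field names are illustrative; production systems would use
# document-understanding or NER models rather than regular expressions.
import re
from dataclasses import dataclass, asdict

@dataclass
class InvoiceRecord:
    vendor: str | None
    invoice_number: str | None
    total_amount: float | None

def extract_invoice(text: str) -> InvoiceRecord:
    vendor = re.search(r"Vendor:\s*(.+)", text)
    number = re.search(r"Invoice\s*#\s*([A-Z0-9-]+)", text)
    total = re.search(r"Total:\s*\$?([\d,]+\.\d{2})", text)
    return InvoiceRecord(
        vendor=vendor.group(1).strip() if vendor else None,
        invoice_number=number.group(1) if number else None,
        total_amount=float(total.group(1).replace(",", "")) if total else None,
    )

sample = "Vendor: Acme Corp\nInvoice # INV-2025-0042\nTotal: $1,280.00"
print(asdict(extract_invoice(sample)))
```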

When this enriched data flows into a retrieval layer that blends text-to-SQL, vector retrieval and hybrid queries, agents stop guessing. They start reasoning and acting with confidence. Traditional RAG systems, by contrast, lean on vector similarity alone and often miss structured or relational context, making them ill-suited for enterprise-scale reasoning. A unified approach avoids those pitfalls, giving agents the depth and precision they need to act decisively.
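The sketch below shows one shape such a retrieval layer can take: a router that sends analytical questions down a text-to-SQL path, narrative questions to vector search and ambiguous ones to a hybrid of the two. The routing heuristic and backend stubs are hypothetical, not any particular product’s API.

```python
# Sketch of a retrieval router blending text-to-SQL, vector search and hybrid
# queries. The heuristic and backend stubs are hypothetical placeholders.
from typing import Callable

def text_to_sql(question: str) -> str:
    return f"[rows from SQL generated for: {question}]"      # stub backend

def vector_search(question: str) -> str:
    return f"[passages from vector index for: {question}]"   # stub backend

def hybrid_search(question: str) -> str:
    return text_to_sql(question) + " + " + vector_search(question)

ANALYTICAL_HINTS = ("how many", "average", "total", "trend", "by region")

def route(question: str) -> Callable[[str], str]:
    """Pick a retrieval strategy from the shape of the question."""
    q = question.lower()
    if any(hint in q for hint in ANALYTICAL_HINTS):
        return text_to_sql        # structured, aggregable question
    if "policy" in q or "describe" in q:
        return vector_search      # narrative or document question
    return hybrid_search          # ambiguous: combine both signals

question = "What was the average order total by region last quarter?"
print(route(question)(question))
```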

Turning unstructured chaos into structured clarity is a start, but intelligence is what makes that clarity useful. Without it, even the best-organized data remains inert. Data intelligence gives every asset a story—where it came from, how it changed and who is accountable for it. Cataloging and lineage aren’t just housekeeping; they’re the foundation for trust. Quality scoring ensures agents aren’t reasoning on shaky ground. Publishing data products with well-defined terms turns raw resources into consumable services that teams can rely on. When an agent cites a figure, the source should be one click away. When a definition changes, every dependent system should know before the next decision is made.
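Here’s a rough sketch of the kind of record a catalog might keep for each asset so that the source really is one click away. The fields are illustrative assumptions, not a specific catalog’s schema.

```python
# Illustrative catalog entry: every asset carries its story -- origin, lineage,
# ownership and a quality score. Field names are assumptions, not a real schema.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    owner: str                                           # who is accountable
    upstream: list[str] = field(default_factory=list)    # lineage: where it came from
    quality_score: float = 0.0                           # output of automated checks
    definition: str = ""                                 # business term it implements

revenue_product = CatalogEntry(
    name="quarterly_revenue",
    owner="finance-data-team",
    upstream=["erp.orders", "crm.accounts"],
    quality_score=0.97,
    definition="Recognized revenue per fiscal quarter, net of returns.",
)

# When the definition changes, dependent systems can be notified before the
# next decision is made.
print(revenue_product)
```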

But intelligence alone isn’t enough. IBM’s 2024 AI in Action report found that data complexity—including integration across fragmented systems—remains one of the top barriers to scaling AI. Agents and other systems that rely on data need integration that’s continuous rather than one-and-done. Integration is how data gets shaped in motion: standardized, enriched, governed and made ready for use as it flows. Pipelines should adapt with each run, learning from drift and optimizing for performance, cost and quality. Observability matters too. When integration is visible and responsive, downstream systems—including agents—don’t inherit silent errors or stale logic.
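As a simplified picture of integration in motion, the sketch below wraps a pipeline step with a drift check and structured logging so downstream consumers don’t inherit silent errors. The thresholds, metrics and normalization step are invented for illustration.

```python
# Simplified sketch: a pipeline step that checks for drift and logs observability
# metrics on every run. Thresholds and metric names are illustrative assumptions.
import logging
import statistics

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_step(batch: list[float], baseline_mean: float, drift_threshold: float = 0.2):
    """Standardize a batch, flag drift against a baseline, and emit metrics."""
    batch_mean = statistics.fmean(batch)
    drift = abs(batch_mean - baseline_mean) / max(abs(baseline_mean), 1e-9)

    log.info("rows=%d mean=%.2f drift=%.2f%%", len(batch), batch_mean, drift * 100)

    if drift > drift_threshold:
        # Surface the problem instead of passing skewed data downstream.
        log.warning("Drift %.1f%% exceeds threshold; routing batch for review", drift * 100)
        return None

    # Normalize values before handing them to downstream systems or agents.
    return [round(x / baseline_mean, 4) for x in batch]

print(run_step([102.0, 98.5, 110.2], baseline_mean=100.0))
```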

When integration and intelligence work together, the result feels familiar: it just works. Not because of luck, but because the architecture beneath it is deliberate. A data layer that connects your estate, applies meaning and carries governance through every move—agentic or otherwise—increases accuracy and drives confident decision-making. That’s how you turn a promising demo into a dependable system. That’s how you move from pilots to production without losing the plot.

Lou Foglia

Associate Creative Director

IBM

Sources

1. From AI projects to profits: How agentic AI can sustain financial returns, IBM Institute for Business Value, 9 June 2025.

2. The GenAI Divide: State of AI in Business 2025, MIT NANDA, July 2025.

3. The AI multiplier effect: Accelerate growth with decision-ready data, IBM Institute for Business Value, December 2025.

4. The Six Primary Dimensions for Data Quality Assessment, DAMA United Kingdom, October 2013.

5. Data quality dimensions, IBM, 17 October 2025.

6. Open source technology in the age of AI, McKinsey & Company, the Mozilla Foundation and the Patrick J. McGovern Foundation, April 2025.

7. AI in Action 2024, IBM, 2024.

Take the next step

IBM® watsonx Orchestrate® and IBM® watsonx.data® help organizations build AI agents shaped by their trusted data.

Discover watsonx Orchestrate
Discover watsonx.data