Data sits at the heart of modern enterprise. It shapes business strategies, informs decision-making and underpins everything from pricing models to automation. As organizations rely more heavily on big data and real-time analytics to fuel their artificial intelligence (AI) initiatives, the impact of poor data quality has become impossible to ignore.
A 2025 report by the IBM Institute for Business Value (IBV) found that 43% of chief operations officers identify data quality issues as their most significant data priority.1 And for good reason: over a quarter of organizations estimate they lose more than USD 5 million annually due to poor data quality, with 7% reporting losses of USD 25 million or more.
Yet poor data quality often goes unnoticed because its impact rarely appears at the point of failure. Instead, it surfaces downstream as lost revenue, inefficiencies, compliance risks and missed opportunities. That delay is what makes poor data quality especially dangerous. It gradually influences datasets and systems, shaping strategic decisions long before the issue and its root causes are identified.
This insidious effect becomes even more consequential in today’s AI-driven landscape, particularly with the rise of generative AI. Further research from the IBM IBV shows that data quality and governance are among the top challenges holding back AI adoption. Concerns about data accuracy or bias rank as a leading barrier to scaling AI initiatives, reported by nearly half (45%) of business leaders.
The reason is simple: AI systems inherit and amplify the quality of the data they consume. When that data is inconsistent, incomplete, biased or outdated, both models and the agents built on top of them become less accurate and more prone to spreading errors at scale. By contrast, organizations with mature data quality and governance frameworks are more likely to move AI use cases from pilot to production, sustaining value over time.
Poor-quality data occurs when datasets fail to meet the requirements of a specific business operation. Even data that appears accurate and complete can function as “bad data” if it isn’t fit for purpose, meaning it doesn’t support the use case, workflow or AI outcome it’s meant to enable.
That failure can stem from a range of issues, including inaccurate data, incomplete data fields, inconsistent data formats or missing data points. Even small human errors when inputting contact information—be it a mistyped phone number or invalid address data—can propagate downstream. These discrepancies can lead to duplicate records or missing data during early stages of data collection and data integration, weakening data analysis, reducing AI performance and, ultimately, affecting business outcomes.
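To make that concrete, a lightweight validation step at the point of entry can catch many of these errors before they spread. The sketch below is illustrative only: the field names and rules are assumptions, and production systems would apply far stricter checks.

```python
import re

# Hypothetical contact record containing common entry errors
record = {"name": "Ana Perez", "email": "ana.perez@example,com", "phone": "555-01x2"}

def validate_contact(rec: dict) -> list[str]:
    """Return a list of validation issues for a single contact record."""
    issues = []
    # Deliberately simple format checks; real systems would use stricter rules
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", rec.get("email", "")):
        issues.append("email: invalid format")
    if not re.fullmatch(r"[0-9+()\- ]{7,}", rec.get("phone", "")):
        issues.append("phone: unexpected characters or too short")
    if not rec.get("name", "").strip():
        issues.append("name: missing")
    return issues

print(validate_contact(record))
# ['email: invalid format', 'phone: unexpected characters or too short']
```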
Often, data quality issues are described using dimensions such as data accuracy, completeness, timeliness and consistency. These dimensions matter, but they do not tell the full story. Relying on them alone is like depending on a slightly miscalibrated scale—each individual reading appears reasonable, yet small errors compound and lead to poor decisions.
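One way to move beyond a purely conceptual view of these dimensions is to score them directly against a dataset. A minimal sketch using pandas, with invented column names and thresholds:

```python
import pandas as pd

# Illustrative customer table; column names and rules are assumptions
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@example.com", None, "b@example.com", "not-an-email"],
    "last_updated": pd.to_datetime(["2025-06-01", "2023-01-15", "2025-07-20", "2024-11-02"]),
})

completeness = df["email"].notna().mean()                  # share of non-missing emails
validity = df["email"].str.contains("@", na=False).mean()  # crude format check
uniqueness = 1 - df["customer_id"].duplicated().mean()     # share of non-duplicate IDs
freshness = (df["last_updated"] >= "2025-01-01").mean()    # records touched this year

print(f"completeness={completeness:.2f} validity={validity:.2f} "
      f"uniqueness={uniqueness:.2f} freshness={freshness:.2f}")
# completeness=0.75 validity=0.50 uniqueness=0.75 freshness=0.50
```

Each reading can look acceptable in isolation, which is exactly why small, compounding errors are easy to miss without tracking the scores over time.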
Common indicators of poor or low-quality data include inconsistency across data sources, missing customer data, outdated data or datasets that cannot be traced back to critical data owners. As data volume grows, these issues add up: high-quality data erodes, inefficiencies are introduced across the organization’s data management initiatives, and AI performance degrades.
Organizations looking to optimize data analytics, automation and AI face challenges that extend well beyond traditional data errors. Yesterday’s concerns, such as skewed dashboards and siloed systems, still matter. But today, the rise of agentic AI systems and autonomous workflows brings a new level of risk. These systems rely on well-governed, reliable data not just for training, but for every interaction: grounding responses, triggering actions and informing decisions across the enterprise.
While most organizations aren’t training their own large language models (LLMs), a survey from PwC shows that 79% of respondents are adopting AI agents in some form. These agents can range from simple copilots to advanced retrieval-augmented generation (RAG) applications. In these environments, data quality problems can produce unpredictable AI behavior like hallucinated outputs or cause models to drift over time.
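One common mitigation is to screen content for basic quality signals before it ever reaches the retrieval index, so stale or near-empty documents cannot ground an agent’s answers. The sketch below is a simplified illustration; the document structure, word-count threshold and freshness window are assumptions.

```python
from datetime import datetime, timezone

# Hypothetical documents headed for a RAG index
docs = [
    {"id": "kb-101", "source": "policy-portal",
     "updated": datetime(2025, 9, 1, tzinfo=timezone.utc),
     "text": "Refund policy: customers may return unused items within 30 days of purchase."},
    {"id": "kb-102", "source": None,
     "updated": datetime(2020, 3, 4, tzinfo=timezone.utc),
     "text": "TBD"},
]

def passes_quality_gate(doc: dict, min_words: int = 5, max_age_days: int = 730) -> bool:
    """Reject documents that are near-empty, untraceable or stale before indexing."""
    age_days = (datetime.now(timezone.utc) - doc["updated"]).days
    return (
        len(doc["text"].split()) >= min_words  # enough content to be meaningful
        and doc["source"] is not None          # traceable to an owning system
        and age_days <= max_age_days           # within the freshness window
    )

clean_docs = [d for d in docs if passes_quality_gate(d)]
print([d["id"] for d in clean_docs])  # the stale, near-empty kb-102 is filtered out
```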
Alongside adoption, AI spending is accelerating—forecasted to surpass USD 2 trillion in 2026, with 37% year-over-year growth—according to Gartner.2 When AI investment scales, the cost of poor data quality scales with it, meaning the margin for error narrows.
Beyond the risks to AI, data quality failures continue to create challenges such as:
Dashboards and business intelligence tools are used to guide high-stakes strategic decisions. When inaccurate or incomplete data underpins those tools, leaders may misjudge performance, misprice offerings or pursue initiatives based on flawed assumptions.
Automation and machine learning models depend on consistent, validated datasets, and they reflect and amplify the flaws in whatever data they receive. When poor-quality data enters machine learning workflows, its inaccuracies, biases and inconsistencies can propagate across downstream systems, diminishing business value and operational efficiency. One simple safeguard, a pre-training validation gate, is sketched after these examples.
Repeated exposure to inaccurate or inconsistent data erodes confidence among stakeholders. Data engineers and data teams spend more time reconciling datasets trapped in data silos than advancing initiatives. Business users begin to question insights, and customer experiences inevitably suffer.
In sensitive industries like healthcare, or those governed by regulations such as the General Data Protection Regulation (GDPR), inaccurate or poorly governed personal data introduces compliance risks. Weak data governance and insufficient data validation controls can expose organizations to audits, reputational damage and hefty fines.
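Picking up the machine learning point above: a minimal pre-training gate can refuse to fit a model until basic checks pass. The column names and thresholds in this sketch are assumptions rather than standards.

```python
import pandas as pd

def ready_for_training(df: pd.DataFrame, target: str, max_null_rate: float = 0.05) -> list[str]:
    """Return reasons a dataset should not enter a training pipeline; an empty list means OK."""
    problems = []
    if target not in df.columns:
        problems.append(f"missing target column '{target}'")
    for col, null_rate in df.isna().mean().items():
        if null_rate > max_null_rate:
            problems.append(f"column '{col}' is {null_rate:.0%} null")
    if df.duplicated().any():
        problems.append("dataset contains duplicate rows")
    return problems

# Hypothetical training frame with a sparse feature and a duplicated row
df = pd.DataFrame({"usage": [10, 12, 12, None, 3], "churned": [0, 1, 1, 0, 1]})

issues = ready_for_training(df, target="churned")
if issues:
    print("blocked:", issues)  # repair the data before any model is trained
```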
Despite its scale, the cost of poor data quality remains difficult to quantify because its effects are distributed across systems, teams and time. Issues often manifest as secondary effects: delayed workflows, reduced operational efficiency or poor business outcomes tied to flawed insights and data decay.
These inefficiencies are rarely tracked as a single metric. Rather, they are proxies for cost, with each reflecting time spent, value lost or opportunities missed. The diffusion of impact makes the resulting financial losses easy to underestimate.
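A back-of-the-envelope illustration shows how quickly these proxies add up. Every figure below is invented for the example, not a benchmark.

```python
# Illustrative proxy-based estimate; every figure here is an assumption, not a benchmark
engineers = 12                    # people who routinely reconcile or repair data
rework_hours_per_week = 6         # average hours each spends on data fixes
loaded_hourly_cost_usd = 90       # fully loaded cost per person-hour
working_weeks_per_year = 48

rework_cost = engineers * rework_hours_per_week * loaded_hourly_cost_usd * working_weeks_per_year

delayed_initiatives = 4           # projects postponed because the data could not be trusted
value_lost_per_delay_usd = 150_000

opportunity_cost = delayed_initiatives * value_lost_per_delay_usd

print(f"rework: ${rework_cost:,}  missed opportunities: ${opportunity_cost:,}")
# rework: $311,040  missed opportunities: $600,000
```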
Instead of calculating a precise dollar figure, many organizations perform data audits and track several metrics. These investigations reveal how frequently data quality issues occur and how long they persist. Common metrics include:
Recent and widely cited incidents illustrate how poor data quality translates into measurable harm for businesses.
In early 2022, Unity Technologies disclosed that inaccurate data ingestion had corrupted datasets used to train advertising-related machine learning models. Faulty data sources introduced errors into data pipelines supporting predictive targeting and bidding algorithms. Unity reported approximately USD 110 million in lost revenue tied to underperforming models, delayed initiatives and the cost of retraining affected datasets.
In 2022, Equifax issued inaccurate credit scores to millions of consumers due to incorrect data values generated by a legacy system. In some cases, errors were significant enough to influence lending decisions, exposing both consumers and lenders to financial risk.
Beyond the blow to the company’s reputation, the fallout included regulatory scrutiny, class-action litigation and a USD 725,000 settlement—one of several penalties the company faced for credit reporting and dispute-handling failures.
In 2018, Samsung Securities processed an invalid data entry while attempting to issue employee dividends, mistakenly triggering the issuance of billions of duplicate shares. Insufficient validation and human-in-the-loop controls allowed the erroneous data values to reach downstream trading systems.
Although the issue was identified within minutes, the consequences were severe: market disruption, regulatory penalties, leadership resignations and an estimated hundreds of millions of dollars in market value loss.
Traditional approaches, such as reviewing data quality exclusively within a data warehouse, no longer scale. Today’s AI systems interact with data continuously, not episodically, with many operating on streaming or event-driven inputs.
This evolution means organizations must “shift left” on data integrity: pushing detection, prevention and remediation closer to the moment data is created, rather than waiting for issues to surface downstream.
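In practice, shifting left often means validating each record or event as it is produced, rather than auditing a warehouse after the fact. A minimal sketch, with an invented event schema, of a check that runs before data is ever written downstream:

```python
REQUIRED_FIELDS = {"order_id", "amount", "currency", "created_at"}
VALID_CURRENCIES = {"USD", "EUR", "GBP"}

def validate_event(event: dict) -> list[str]:
    """Shift-left checks applied to a single event before it reaches downstream systems."""
    errors = []
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if "amount" in event and not isinstance(event["amount"], (int, float)):
        errors.append("amount is not numeric")
    if "currency" in event and event["currency"] not in VALID_CURRENCIES:
        errors.append("unknown currency code")
    return errors

# An event with a quietly wrong type: the amount arrived as a string
event = {"order_id": "A-1001", "amount": "49.99", "currency": "USD",
         "created_at": "2026-02-03T10:15:00Z"}

problems = validate_event(event)
if problems:
    print("rejected at ingestion:", problems)  # fix at the source, not downstream
```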
Having a strong data quality management program can help organizations avoid the consequences of poor data quality. It can also create a competitive advantage in an era where AI and agentic systems depend on trustworthy, real-time data.
To achieve this, organizations need more than isolated fixes. Instead, they need a scalable, repeatable approach to managing data quality. By viewing data quality as an operating model rather than a checklist, organizations can reshape how they manage ownership, control and accountability across the entire data lifecycle.
While not exhaustive, modern practices for preventing data quality issues include:
We are living in a time when AI systems are being asked to act rather than recommend. That pivot puts pressure on organizations to get data quality right from the start or risk issues that compound across business processes. Looking ahead, businesses will need to move beyond operational fixes and instead view data quality as a prerequisite for AI success, not just a safeguard against risk.
1 “The 2025 CDO Study: The AI multiplier effect.” IBM Institute for Business Value, 12 November 2025
2 “Gartner Says Worldwide AI Spending Will Total USD 1.5 Trillion in 2025.” Gartner, 17 September 2025