Data poisoning is a type of cyberattack where threat actors manipulate or corrupt the training data used to develop artificial intelligence (AI) and machine learning (ML) models.
Neural networks, large language models (LLMs) and deep learning models rely heavily on the quality and integrity of training data, which ultimately determines a model’s functions. This training data can come from various sources, such as the internet, government databases and third-party data providers. By injecting incorrect or biased data points (poisoned data) into these training datasets, malicious actors can subtly or drastically alter a model’s behavior.
For example, data manipulation through poisoning can lead to data misclassification, which reduces the efficacy and accuracy of AI and ML systems. What’s more, these attacks can introduce serious cybersecurity risks, especially in industries such as healthcare and autonomous vehicles.
Data poisoning attacks can be classified into two categories based on intent: targeted and nontargeted.
Targeted data poisoning attacks manipulate AI model outputs in a specific way. For example, cybercriminals might inject poisoned data into a chatbot or generative AI (gen AI) application such as ChatGPT to alter its responses. Similarly, in a cybersecurity scenario, an attacker might introduce poisoned data to a model designed to detect malware, causing it to miss certain threats.
Targeted attacks manipulate the behavior of the model in a way that benefits the attacker, potentially creating new vulnerabilities in the system.
Nontargeted attacks focus on degrading the general robustness of a model. Instead of attacking specific outputs, the goal is to weaken the model’s ability to process data correctly. For example, in autonomous vehicles, nontargeted data poisoning might cause the system to misinterpret inputs from its sensors, mistaking a “stop” sign as a “yield” sign. These types of attacks make AI models more susceptible to adversarial attacks, where an attacker tries to use small, often imperceptible flaws in the model's decision-making process.
Data poisoning attacks can take various forms, including label flipping, data injection, backdoor attacks and clean-label attacks. Each type targets different aspects of an AI model’s functions.
In label flipping attacks, malicious actors manipulate the labels in training data, swapping correct labels with incorrect ones. Consider Nightshade, an AI poisoning tool developed at the University of Chicago. Nightshade allows digital artists to subtly alter the pixels in their images before uploading them online. When AI companies scrape online datasets to train their generative AI models, the altered images disrupt the training process. This manipulation can cause the AI models to misclassify or behave unpredictably—sometimes mistaking images of cows for leather bags.1
Data injection introduces fabricated data points to the training dataset, often to steer the AI model’s behavior in a specific direction. A common example is the SQL injection, where attackers add “1=1” or “=” into an input field. When included in an SQL query, this malicious data alters the query’s meaning, returning all records instead of just one.2 Similarly, in machine learning models, data injection can manipulate the model’s decision-making. This can cause the model to misclassify or exhibit biases, undermining data integrity and overall model robustness.
Backdoor attacks are dangerous because they introduce subtle manipulations, such as inaudible background noise on audio or imperceptible watermarks on images. This leaves the AI system functioning normally under most conditions. However, when a specific trigger input is encountered, the model behaves in a way that benefits the attacker. In the case of open source models—where access to the training data and algorithms might be less restricted—these attacks can be especially harmful. ReversingLabs reported an increase in threats—more than 1300%—circulating through open source repositories from 2020 to 2023.3
In clean-label attacks, attackers modify the data in ways that are difficult to detect. The key characteristic is that the poisoned data still appears correctly labeled, making it challenging for traditional data validation methods to identify. These attacks use the complexity of modern machine learning and deep learning systems, which can fail to flag small, seemingly innocuous changes. Clean-label attacks are among the stealthiest, leaving AI models vulnerable to skewed outputs and degrading model functions.
While data poisoning and prompt injections target different stages of the AI lifecycle, they share a common goal: exploiting vulnerabilities in model inputs. Data poisoning manipulates training datasets, embedding corrupted or malicious data that can compromise a model’s learning process and long-term functionality. In contrast, prompt injections disguise malicious inputs as legitimate prompts, manipulating generative AI systems into leaking sensitive data, spreading misinformation or worse.
Hackers can deploy these strategies separately or in tandem to amplify their impact. For instance, an insider with access to an organization’s systems could theoretically poison a training dataset by embedding skewed or biased data, bypassing validation measures. Later, the insider could exploit the compromised system by performing a prompt injection, activating the poisoned data and triggering malicious behavior. This could include leaking sensitive information, creating a backdoor for further adversarial attacks or weakening the system’s decision-making capabilities.
Data poisoning can have a wide range of impacts on AI and ML models, affecting both their security and overall model performance.
Poisoned training datasets can cause machine learning models to misclassify inputs, undermining the reliability and functions of AI models. In consumer-facing applications, this can cause inaccurate recommendations that erode customer trust and experience. Similarly, in supply chain management, poisoned data can cause flawed forecasts, delays and errors, damaging both model performance and business efficacy. These misclassifications expose vulnerabilities in the training data and can compromise the overall robustness of AI systems.
Data poisoning can also amplify existing biases in AI systems. Attackers can target specific subsets of data—such as a particular demographic—to introduce biased inputs. This can make the AI model perform unfairly or inaccurately. For example, facial recognition models trained with biased or poisoned data might misidentify people from certain groups, leading to discriminatory outcomes. These types of attacks can affect both the fairness and accuracy of ML models across various applications, from hiring decisions to law enforcement surveillance.
Data poisoning can open the door to more sophisticated attacks, such as inversion attacks in which hackers attempt to reverse-engineer the model’s training data. Once an attacker successfully poisons the training data, they can further use these vulnerabilities to launch more adversarial attacks or trigger backdoor actions. In systems designed for sensitive tasks, such as healthcare diagnostics or cybersecurity, these security risks can be especially dangerous.
To defend against data poisoning attacks, organizations can implement strategies to help ensure the integrity of training datasets, improve model robustness and monitor AI models continuously.
A fundamental defense strategy against data poisoning is validating and sanitizing training data before use. Implementing data validation processes during the training phase can help identify and remove suspicious or corrupted data points before they negatively impact the model. This step is essential for preventing the introduction of malicious data into AI systems, especially when using open source data sources or models where integrity is harder to maintain.
Adversarial training is a proactive method of defending against data poisoning and other types of attacks. By intentionally introducing adversarial examples into training models, developers can teach the model to recognize and resist poisoned data, improving its robustness against manipulation. For high-risk applications such as autonomous vehicles or AI security, adversarial training is a crucial step in making AI and ML models more robust and trustworthy.
Once deployed, AI systems can be continuously monitored to detect unusual behavior that might indicate a data poisoning attack. Anomaly detection tools, such as pattern recognition algorithms, can help security teams identify discrepancies in both inputs and outputs and respond quickly if a system is compromised. Ongoing auditing is especially important for generative AI applications such as ChatGPT, where real-time updates to training data and model behavior can be critical in preventing misuse. If an anomaly is detected, the model can be paused or reevaluated to prevent further damage.
Implementing strict access controls is another strategy to mitigate data poisoning risks. Limiting who can modify training datasets and repositories can reduce the risk of unauthorized tampering. Also, incorporating security measures such as encryption can help protect data sources and AI systems from external attacks. In high-stakes environments, such as healthcare and cybersecurity, strict security controls can help ensure that machine learning models remain secure and trustworthy.
1 What is Nightshade, University of Chicago, 2024.
2 SQL Injection, W3 Schools.
3 Key Takeaways from the 2024 State of SSCS Report, ReversingLabs, 16 January 2024.
Gain insights to prepare and respond to cyberattacks with greater speed and effectiveness with the IBM X-Force threat intelligence index.
Discover the benefits and ROI of IBM® Security Guardium data protection in this Forrester TEI study.
Learn about strategies to simplify and accelerate your data resilience roadmap while addressing the latest regulatory compliance requirements.
Data breach costs have hit a new high. Get essential insights to help your security and IT teams better manage risk and limit potential losses.
Follow clear steps to complete tasks and learn how to effectively use technologies in your projects.
Stay up to date with the latest trends and news about data security.
Identity and access management (IAM) is a cybersecurity discipline that deals with user access and resource permissions.
Govern generative AI models from anywhere and deploy on cloud or on premises with IBM watsonx.governance.
Protect data across multiple environments, meet privacy regulations and simplify operational complexity.
IBM provides comprehensive data security services to protect enterprise data, applications and AI.