What Is a Data Breach?

By Matthew Kosinski

What is a data breach?

A data breach is any security incident in which unauthorized parties access sensitive or confidential information, including personal data (Social Security numbers, bank account numbers, healthcare data) and corporate data (customer records, intellectual property, financial information).

The terms “data breach” and “breach” are often used interchangeably with “cyberattack.” However, not all cyberattacks are data breaches. Data breaches include only those security breaches where someone gains unauthorized access to data.

For example, a distributed denial of service (DDoS) attack that overwhelms a website is not a data breach. A ransomware attack that locks up a company's customer data and threatens to leak the stolen data unless the company pays a ransom is a data breach. The physical theft of hard drives, USB flash drives or even paper files containing sensitive information is also a data breach.

An expensive problem

According to IBM’s Cost of a Data Breach 2025 report, the global average cost of a data breach is USD 4.44 million. While organizations of every size and kind are vulnerable to breaches, the severity of these breaches and the costs to remediate them can vary.

For example, the average cost of a data breach in the United States is USD 10.22 million, about 4 times the cost of a breach in India (USD 2.51 million).

Breach consequences tend to be especially severe for organizations in highly regulated fields like healthcare, finance and the public sector, where steep fines and penalties can compound the costs. For example, according to the IBM report, the average cost of a healthcare data breach in 2025 is USD 7.42 million, the highest average breach cost among industries for the 14th consecutive year.

Data breach costs arise from several factors, with IBM’s report noting four key ones: lost business, detection and escalation, post-breach response and notification.

The loss of business, revenue and customers resulting from a breach costs organizations USD 1.38 million on average. The price of detecting and escalating the breach is even higher at USD 1.47 million. Post-breach expenses—including fines, settlements, legal fees, providing free credit monitoring to affected customers and similar expenditures—cost the average breach victim USD 1.20 million.

Notification costs, which include reporting breaches to customers, regulators and other third parties, are the lowest at USD 390,000. However, reporting requirements can still be onerous and time-consuming.
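
As a quick arithmetic check, the four cost categories quoted above can be added up and expressed as shares of the total. The sketch below uses only the figures cited in this section, in USD thousands; the dictionary keys and function names are illustrative, not terms from the report.

```python
# Cost components in USD thousands, as quoted in the text above.
COST_COMPONENTS = {
    "detection_and_escalation": 1470,
    "lost_business": 1380,
    "post_breach_response": 1200,
    "notification": 390,
}


def total_cost(components):
    """Sum the component costs (USD thousands)."""
    return sum(components.values())


def shares(components):
    """Return each component's share of the total, as a percentage rounded to one decimal."""
    total = total_cost(components)
    return {name: round(100 * value / total, 1) for name, value in components.items()}


if __name__ == "__main__":
    print(total_cost(COST_COMPONENTS))  # 4440, i.e. USD 4.44 million
    for name, pct in shares(COST_COMPONENTS).items():
        print(f"{name}: {pct}%")
```

The four components sum to USD 4.44 million, which matches the global average cited earlier.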

The US Cyber Incident Reporting for Critical Infrastructure Act of 2022 (CIRCIA) requires organizations in national security, finance and other designated industries to report cybersecurity incidents affecting personal data or business operations to the Department of Homeland Security within 72 hours.
US organizations subject to the Health Insurance Portability and Accountability Act (HIPAA) must notify the US Department of Health and Human Services, affected individuals and sometimes the media if protected health information is breached.
All 50 US states also have their own data breach notification laws.
The General Data Protection Regulation (GDPR) requires companies doing business with EU citizens to notify authorities of breaches within 72 hours.
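
To make the 72-hour windows concrete, here is a minimal sketch of a deadline calculation. It assumes the clock starts when the organization becomes aware of the incident; the actual trigger and any exceptions depend on the specific rule, so treat the function names and the starting point as assumptions rather than legal guidance.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the window is counted from the moment of awareness.
REPORTING_WINDOW = timedelta(hours=72)


def reporting_deadline(detected_at: datetime) -> datetime:
    """Return the latest time a report can be filed under a 72-hour rule."""
    if detected_at.tzinfo is None:
        raise ValueError("detected_at must be timezone-aware")
    return detected_at + REPORTING_WINDOW


def hours_remaining(detected_at: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once it has passed)."""
    return (reporting_deadline(detected_at) - now).total_seconds() / 3600


if __name__ == "__main__":
    detected = datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc)
    print(reporting_deadline(detected).isoformat())
    print(hours_remaining(detected, detected + timedelta(hours=24)))
```

Using timezone-aware timestamps avoids ambiguity when the incident and the regulator are in different time zones.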

Why data breaches happen

Data breaches are caused by:

Innocent mistakes, such as an employee emailing confidential information to the wrong person.
Malicious insiders, including angry or laid-off employees who want to hurt the company or cause reputational damage, and greedy employees who want to profit off the company's data.
Hackers, malicious outsiders who commit intentional cybercrimes to steal data. Hackers can act as lone operators or part of an organized ring.

Financial gain is the primary motivation for most malicious data breaches. Hackers steal credit card numbers, bank accounts or other financial information to directly drain funds from people and companies.

Some attackers steal personally identifiable information (PII), such as Social Security numbers and phone numbers, to commit identity theft, taking out loans and opening credit cards in their victims' names. Cybercriminals can also sell stolen PII and account information on the dark web, where bank login credentials can fetch as much as USD 500.

A data breach can also be the first phase of a larger attack. For example, hackers might steal the email account passwords of corporate executives and use those accounts to conduct business email compromise scams.

Data breaches might have objectives other than personal enrichment. Unscrupulous organizations might steal trade secrets from competitors, and nation-state actors might breach government systems to steal information about sensitive political dealings, military operations or national infrastructure.

How data breaches happen

Most intentional data breaches caused by internal or external threat actors follow the same basic pattern:

Research: The threat actor identifies a target and looks for weaknesses that they can use to break into the target's system. These weaknesses can be technical, such as inadequate security controls, or human, such as employees susceptible to social engineering.
Attack: The threat actor starts an attack on the target by using their chosen method. The attacker might send a spear-phishing email, directly exploit vulnerabilities in the system, use stolen login credentials to take over an account or leverage other common data breach attack vectors.
Compromise data: Inside the system, the attacker locates the data they want and does what they came to do. Common tactics include exfiltrating data for sale or use, destroying the data or locking up data to demand a ransom.

Common data breach attack vectors

Malicious actors can use various attack vectors or methods to carry out data breaches. Some of the most common include:

Stolen or compromised credentials

According to the Cost of a Data Breach 2025 report, stolen or compromised credentials are among the top five most common initial attack vectors. They account for 10% of data breaches, and these breaches take up to 186 days to identify.

Hackers can compromise credentials by using brute force attacks to crack passwords, buying stolen credentials off the dark web or tricking employees into revealing their passwords through social engineering attacks.
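
One common defense against brute force guessing is to lock an account temporarily after repeated failed logins. The sketch below is a minimal in-memory illustration, not a production control: the class name, the threshold of five failures and the 15-minute lockout are all assumptions chosen for the example, and timestamps are passed in as plain seconds to keep the logic easy to follow.

```python
class LoginThrottle:
    """Track consecutive failed logins and lock accounts temporarily."""

    def __init__(self, max_failures=5, lockout_seconds=900):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self._failures = {}      # username -> consecutive failed attempts
        self._locked_until = {}  # username -> time (seconds) when the lock expires

    def is_locked(self, username, now):
        """Return True while the account's lockout period is still running."""
        return now < self._locked_until.get(username, 0)

    def record_failure(self, username, now):
        """Count a failed attempt and start a lockout once the limit is reached."""
        count = self._failures.get(username, 0) + 1
        self._failures[username] = count
        if count >= self.max_failures:
            self._locked_until[username] = now + self.lockout_seconds
            self._failures[username] = 0

    def record_success(self, username):
        """A successful login resets the failure counter."""
        self._failures.pop(username, None)


if __name__ == "__main__":
    throttle = LoginThrottle(max_failures=3, lockout_seconds=60)
    for t in (0, 1, 2):
        throttle.record_failure("alice", now=t)
    print(throttle.is_locked("alice", now=10))
    print(throttle.is_locked("alice", now=62))
```

A lockout slows down password guessing against one account, but it does not stop attackers who already hold valid stolen credentials.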

Social engineering attacks

Social engineering is the act of psychologically manipulating people into unwittingly compromising their own information security.

Phishing, the most common type of social engineering attack, is also the most common data breach attack vector, accounting for 16% of breaches. Phishing scams use fraudulent emails, text messages, social media content or websites to trick users into sharing credentials or downloading malware.

Ransomware

Ransomware is a type of malware that holds data hostage until a victim pays a ransom. According to the Cost of a Data Breach 2025 report, ransomware breaches cost an average of USD 5.08 million. That figure does not include ransom payments, which can run to tens of millions of dollars, so the total cost to a victim can be considerably higher.

System vulnerabilities

Cybercriminals can gain access to a target network by exploiting weaknesses in websites, operating systems, endpoints, APIs, common software like Microsoft Office and other IT assets.

Threat actors don't need to hit their targets directly. In supply chain attacks, hackers exploit vulnerabilities in the networks of a company's service providers and vendors to steal its data.

When hackers locate a vulnerability, they often use it to plant malware in the network. Spyware, which records a victim's keystrokes and other sensitive data and sends it back to a server that the hackers control, is a common type of malware used in data breaches.

SQL injection

Another method of directly breaching target systems is SQL injection, which takes advantage of weaknesses in the Structured Query Language (SQL) databases of unsecured websites.

Hackers enter malicious code into user-facing fields, such as search bars and login windows. This code causes the database to divulge private data like credit card numbers or customers' personal details.
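
The mechanism can be illustrated with a small, self-contained sketch that uses Python's built-in sqlite3 module and an in-memory database. The table, the sample rows and the function names are made up for the example. The first search function builds the query by pasting user input into the SQL text, which is the weakness described above; the second passes the input as a bound parameter, which is one widely used mitigation.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two fictional customer rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, card_number TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("alice", "4111-0000-0000-0001"), ("bob", "4111-0000-0000-0002")],
    )
    return conn


def search_unsafe(conn, name):
    # Vulnerable: user input is pasted directly into the SQL text.
    query = f"SELECT name, card_number FROM customers WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def search_safe(conn, name):
    # Safer: the driver binds the input as data, so it is never parsed as SQL.
    return conn.execute(
        "SELECT name, card_number FROM customers WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"  # input typed into a search field
    print(len(search_unsafe(conn, payload)))  # 2: every row is returned
    print(len(search_safe(conn, payload)))    # 0: no customer has that name
```

In the unsafe version, the injected text turns the WHERE clause into a condition that is always true, so the database returns every customer's record.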

Human error and IT failures

Threat actors can take advantage of employees' mistakes to gain access to confidential information. According to the Cost of a Data Breach 2025 report, human error accounts for 26% of data breaches, while IT failures account for 23%.

For example, misconfigured or outdated systems can let unauthorized parties access data they shouldn't be able to. Employees can expose data by storing it in unsecured locations, misplacing devices with sensitive information saved on their hard drives or mistakenly granting network users excessive access privileges. Cybercriminals can use IT failures, such as temporary system outages, to sneak into sensitive databases.

Physical security compromises

Threat actors may break into company offices to steal employees' devices (such as laptops and cellphones), paper documents and physical hard drives containing sensitive data. Attackers can also place skimming devices on physical credit and debit card readers to collect payment card information.

Notable data breaches

TJX

The 2007 breach of TJX Companies, the parent company of retailers TJ Maxx and Marshalls, was at that time the largest and costliest consumer data breach in US history. The data privacy of approximately 94 million customers was compromised, and the company suffered more than USD 256 million in financial losses.

Hackers gained access to the data by planting traffic sniffers on the wireless networks of two stores. The sniffers allowed the hackers to capture information as it was transmitted from the store's cash registers to back-end systems.

Yahoo

In 2013, Yahoo suffered what might be the largest data breach in history. Hackers exploited a weakness in the company's cookie system to access the names, birthdates, email addresses and passwords of all 3 billion Yahoo users.

The full extent of the breach was revealed in 2016 while Verizon was in talks to buy the company. As a result, Verizon reduced its acquisition offer by USD 350 million.

Equifax

In 2017, hackers breached the credit reporting agency Equifax and accessed the personal data of more than 143 million Americans.

Hackers exploited an unpatched weakness in Equifax's website to gain access to the network. The hackers then moved laterally to other servers to find Social Security numbers, driver's license numbers and credit card numbers. The attack cost Equifax USD 1.4 billion between settlements, fines and other costs associated with repairing the breach.

SolarWinds

In 2020, Russian threat actors executed a supply chain attack by hacking the software vendor SolarWinds. Hackers used the organization's network monitoring platform, Orion, to covertly distribute malware to SolarWinds' customers.

Russian spies gained access to the confidential information of various US government agencies, including the Treasury, Justice and State Departments, that use SolarWinds' services.

Colonial Pipeline

In 2021, hackers infected Colonial Pipeline's systems with ransomware, forcing the company to temporarily shut down the pipeline that supplies 45% of the US East Coast's fuel.

Hackers breached the network by using an employee's password that they found on the dark web. The Colonial Pipeline Company paid a USD 4.4 million ransom in cryptocurrency, but federal law enforcement recovered roughly USD 2.3 million of that payment.

23andMe

In the fall of 2023, hackers stole the data of 6.9 million 23andMe users. The breach was notable for a couple of reasons. First, because 23andMe conducts genetic testing, the attackers obtained some unconventional and highly personal information, including family trees and DNA data.

Second, the hackers breached user accounts through a technique called "credential stuffing." In this kind of attack, hackers use credentials exposed in previous leaks from other sources to break into users' unrelated accounts on different platforms. These attacks work because many people reuse the same username and password combinations across sites.
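
One signal defenders can look for is a single source failing to log in to many different accounts in a short period, which is what replaying a leaked credential list tends to produce. The sketch below is a simplified illustration of that idea only. The function name, the threshold and the log format are assumptions, and real detection would also need to consider time windows and distributed sources.

```python
from collections import defaultdict


def flag_stuffing_sources(failed_logins, max_distinct_users=3):
    """Return sources that failed logins for more than max_distinct_users usernames.

    failed_logins is an iterable of (source_ip, username) pairs.
    """
    users_by_source = defaultdict(set)
    for source_ip, username in failed_logins:
        users_by_source[source_ip].add(username)
    return {
        ip for ip, users in users_by_source.items() if len(users) > max_distinct_users
    }


if __name__ == "__main__":
    # Addresses come from ranges reserved for documentation.
    attempts = [
        ("203.0.113.7", "alice"),
        ("203.0.113.7", "bob"),
        ("203.0.113.7", "carol"),
        ("203.0.113.7", "dave"),
        ("198.51.100.2", "erin"),
        ("198.51.100.2", "erin"),
    ]
    print(flag_stuffing_sources(attempts))  # {'203.0.113.7'}
```

A user who mistypes their own password several times is not flagged, because the check counts distinct usernames rather than total failures.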

Data breach prevention and mitigation

According to the Cost of a Data Breach 2025 report, it takes an average of 241 days to identify and contain an active breach across all industries. Deploying the right security solutions can help organizations detect and respond to these breaches faster.

Standard risk management measures, such as regular vulnerability assessments, scheduled backups, timely patching and proper database configurations, can help prevent some breaches and soften the blow of those that occur.

However, many organizations today implement more advanced controls and best practices to stop more breaches and significantly mitigate the damage they cause.

Data security tools

Organizations can deploy specialized data security solutions to automatically discover and classify sensitive data, apply encryption and other protections and gain real-time insight into data usage.
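
As a rough illustration of what automated discovery and classification can involve, the sketch below scans free text for strings that look like US Social Security numbers or payment card numbers. It is a toy example under stated assumptions, not how any particular product works: the patterns and labels are illustrative, and the card check relies on the Luhn checksum to filter out digit runs that cannot be valid card numbers.

```python
import re

# Illustrative patterns only; real classifiers use many more signals.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13 to 19 digits


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text):
    """Return (label, matched_text) pairs for sensitive-looking strings."""
    findings = []
    for match in SSN_PATTERN.finditer(text):
        findings.append(("ssn_like", match.group()))
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            findings.append(("card_like", match.group()))
    return findings


if __name__ == "__main__":
    sample = "SSN 123-45-6789 and card 4111 1111 1111 1111 on file; ticket 1234567890123"
    for label, value in classify(sample):
        print(label, value)
```

In the sample text, the 13-digit ticket number matches the card pattern but fails the checksum, so it is not reported.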

Incident response plans

Organizations can mitigate breach damage by adopting formal incident response (IR) plans for detecting, containing and eradicating cyberthreats. According to the Cost of a Data Breach 2025 report, IR planning and testing was the third most popular area of security investment for 2025, cited by 35% of respondents.

AI and automation

Organizations that extensively integrate artificial intelligence (AI) and automation into security operations resolve breaches 80 days faster than those that don't, according to the Cost of a Data Breach 2025 report. The report also found that security AI and automation reduce the average breach cost by USD 1.9 million, a savings of more than 34% compared with organizations that don't use them.

Many data security, data loss prevention and identity and access management tools now incorporate AI and automation.

Employee training

Because social engineering and phishing attacks are leading causes of breaches, training employees to recognize and avoid these attacks can reduce a company's risk of a data breach. In addition, training employees to handle data properly can help prevent accidental data breaches and data leaks.

Identity and access management (IAM)

Password managers, two-factor authentication (2FA) or multifactor authentication (MFA), single sign-on (SSO) and other identity and access management (IAM) tools can protect employee accounts, VPNs and credentials from theft.

Organizations can also enforce role-based access controls and the principle of least privilege to limit employee access to only the data that they need for their roles. These policies can help stop both insider threats and hackers who hijack legitimate accounts.
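
A role-based check with least privilege can be sketched in a few lines. The roles and permission strings below are hypothetical and exist only for illustration. The key design choice is that access is denied by default, so a role gets only the permissions explicitly listed for it.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "support_agent": {"customer_contact:read"},
    "billing_analyst": {"customer_contact:read", "payment_data:read"},
    "dba": {"customer_contact:read", "payment_data:read", "payment_data:write"},
}


def is_allowed(role: str, permission: str) -> bool:
    """Deny by default: unknown roles and unlisted permissions get no access."""
    return permission in ROLE_PERMISSIONS.get(role, set())


if __name__ == "__main__":
    print(is_allowed("support_agent", "customer_contact:read"))  # True
    print(is_allowed("support_agent", "payment_data:read"))      # False
    print(is_allowed("contractor", "customer_contact:read"))     # False
```

If a support agent's account is hijacked in this model, the attacker inherits only that role's narrow permissions and cannot read payment data.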

Author

Matthew Kosinski

Staff Editor

IBM Think