What is data loss prevention (DLP)?
What is DLP?

Data loss prevention (DLP) refers to the strategies, processes, and technologies cybersecurity teams use to protect sensitive data from theft, loss, and misuse.  

Data is a competitive differentiator for many businesses, and the average corporate network houses a trove of trade secrets, customers' personal data, and other sensitive information. Hackers target this data for their own gain, yet organizations often struggle to thwart these attackers. It can be hard to keep critical data secure while hundreds, if not thousands, of authorized users access it across cloud storage and on-premises repositories every day. 

DLP strategies and tools help organizations prevent data leaks and losses by tracking data throughout the network and enforcing granular security policies. That way, security teams can ensure that only the right people can access the right data for the right reasons.

Types of data loss

Data loss events are often described as data breaches, data leakage, or data exfiltration. The terms are sometimes used interchangeably, but they have distinct meanings.

A data breach is any cyberattack or other security incident in which unauthorized parties gain access to sensitive data or confidential information. This includes personal data (for example, Social Security numbers, bank account numbers, healthcare data) or corporate data (for example, customer records, intellectual property, financial data). According to IBM's Cost of a Data Breach 2023 report, the average breach costs USD 4.45 million, a 15 percent increase in the last three years.

Data leakage is the accidental exposure of sensitive data or confidential information to the public. Data exfiltration is actual data theft—when an attacker moves or copies someone else’s data to a device under the attacker’s control.

What causes data loss?

Data loss occurs for many reasons, but the most common causes include:

  • Security vulnerabilities—weaknesses or flaws in the structure, code, or implementation of an application, device, network, or other IT asset that hackers can exploit. These include coding errors, misconfigurations, and zero-day vulnerabilities (unknown or as yet unpatched weaknesses).

  • Weak or stolen credentials—passwords that hackers can easily guess, or passwords or other credentials (for example, ID cards) that hackers or cybercriminals steal.

  • Insider threats—authorized users who put data at risk through carelessness or malicious intent. Malicious insiders are often motivated by personal gain or a grievance toward the company.

  • Malware—software created specifically to harm a computer system or its users. The best-known form of data-threatening malware is ransomware, which encrypts data so that it can’t be accessed, and demands a ransom payment for the decryption key (and sometimes a second payment to prevent the data from being exfiltrated or shared with other cybercriminals).

  • Social engineering—tactics that fool people into sharing data they shouldn’t share. This can be as artful as a phishing attack that convinces an employee to email employees’ confidential data, or as artless as leaving a malware-infected USB flash drive where someone will find it and use it.

  • Physical device theft—stealing a laptop, smartphone, or other device that grants the thief access to the network and permission to access data.

Data loss prevention strategies and policies

Organizations create formal DLP strategies to protect against all types of data loss. At the core of a DLP strategy is a set of DLP policies that define how users should handle enterprise data. DLP policies cover key data security practices like where to store data, who can access it, how to use it, and how to put security controls around it. 

Rather than drafting a single policy for all data, information security teams typically create different policies for the different types of data in their networks. This is because different kinds of data often need to be handled differently. 

For example, personally identifiable information (PII), like credit card numbers and home addresses, is usually subject to data security regulations that dictate what a company can do with it. On the other hand, the company has free rein over what it does with its intellectual property (IP). Furthermore, the people who need access to PII may not be the same people who need access to company IP. Both kinds of data need to be protected but in different ways.  

Security teams create multiple, granular DLP policies so they can apply the appropriate security standards to each type of data without interfering with the approved behavior of authorized end users. Organizations revise these policies regularly to keep up with changes to relevant regulations, the enterprise network, and business operations.
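
As a rough illustration of per-data-type policies, the sketch below models two policies as plain data and checks a role against them. The class name `DlpPolicy`, the roles, and the storage locations are hypothetical and are not taken from any particular DLP product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DlpPolicy:
    """One policy per data type (all field names are illustrative assumptions)."""
    data_type: str
    allowed_roles: frozenset      # who can access the data
    allowed_locations: frozenset  # where the data may be stored
    encrypt_at_rest: bool         # one example of a required security control


# Hypothetical policies: PII and IP are both protected, but in different ways.
POLICIES = {
    "pii": DlpPolicy("pii", frozenset({"support", "billing"}),
                     frozenset({"crm-db"}), True),
    "ip": DlpPolicy("ip", frozenset({"engineering"}),
                    frozenset({"source-repo", "design-share"}), True),
}


def may_access(role: str, data_type: str) -> bool:
    """Return True only if a policy exists for the data type and lists the role."""
    policy = POLICIES.get(data_type)
    return policy is not None and role in policy.allowed_roles
```

Here `may_access("billing", "pii")` is allowed while `may_access("billing", "ip")` is not, and data types without a policy are denied by default. A real deployment would load policies from the DLP tool rather than hard-code them.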

Why DLP solutions are important

Manually enforcing DLP policies can be challenging, if not impossible. Not only are different sets of data subject to different rules, but organizations must also monitor every piece of data throughout the network, including:

  • Data in use—data being accessed or processed—for example, data being used for analysis or calculations, or a text document being edited by an end user.

  • Data in motion—data moving through a network, such as data being transmitted by an event streaming server or a messaging app.

  • Data at rest—data in storage, like data sitting in a cloud drive.

Because DLP policy enforcement requires continuous data visibility across the organization, information security teams typically rely on specialized DLP software tools to ensure that users follow data security policies. These DLP tools can automate key functions like identifying sensitive data, tracking its use, and blocking illicit access.   

DLP solutions often work in tandem with other security controls to protect data. For example, firewalls can help stop malicious traffic into and out of networks. Security information and event management (SIEM) systems can help detect anomalous behavior that may point to data leaks. Extended detection and response (XDR) solutions enable organizations to launch robust, automated responses to data breaches.

Types of DLP solutions

There are three main types of DLP solutions: network, endpoint, and cloud DLP. Organizations may choose to use one type of solution or a combination of multiple solutions, depending on their needs and how their data is stored.

Network DLP

Network DLP solutions focus on how data moves through, into, and out of a network. They often use artificial intelligence (AI) and machine learning to detect anomalous traffic flows that might signal a data leak or loss. While network DLP tools are designed to monitor data in motion, many can also offer visibility into data in use and at rest on the network.

Endpoint DLP

Endpoint DLP tools monitor activity on laptops, servers, mobile devices, and other devices accessing the network. These solutions are installed directly on the devices they monitor, and they can stop users from performing prohibited actions on those devices. Some endpoint DLP tools can also block unapproved data transfers between devices. 

Cloud DLP

Cloud DLP solutions focus on data stored in and accessed by cloud services. They can scan, classify, monitor, and encrypt data in cloud repositories. These tools can also help enforce access control policies on individual end users and any cloud services that may access company data.

How DLP solutions help enforce DLP policy

Security teams follow a four-step process to put DLP policies into practice, and DLP tools play an important role in each step.

Data identification and classification

First, the organization catalogs all its structured and unstructured data. Structured data is data with a standardized form. It is usually clearly labeled and stored in a database. Credit card numbers are an example of structured data: They follow a fixed numeric format, most commonly 16 digits long. Unstructured data is free-form information, like text documents or images. 

Security teams typically use DLP tools to accomplish this step. These tools can often scan the entire network to find data wherever it is stored—in the cloud, on physical endpoints, on employees' personal devices, and elsewhere.  

Next, the organization classifies this data, sorting it into groups based on sensitivity level and shared characteristics. Classifying data allows the organization to apply the right DLP policies to the right kinds of data. For example, some organizations might group data based on type: financial data, marketing data, intellectual property, and so on. Other organizations might group data based on relevant regulations, such as the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), the Payment Card Industry Data Security Standard (PCI DSS), and so on.

Many DLP solutions can automate data classification. These tools can use artificial intelligence, machine learning, and pattern matching to analyze structured and unstructured data to determine what kind of data it is, whether it is sensitive, and which DLP policies should apply.
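
The pattern-matching part of automated classification can be sketched as a small rule table. This is a minimal illustration only: the rule names, the sensitivity levels, and the deliberately loose regular expressions are assumptions, and commercial tools combine such rules with machine learning and other signals.

```python
import re

# Hypothetical rules: (label, sensitivity level, pattern).
RULES = [
    # 13 to 19 digits, optionally separated by spaces or hyphens.
    ("payment_card", "high", re.compile(r"\b(?:\d[ -]?){13,19}\b")),
    # A loose email-address shape, not a full address validator.
    ("email_address", "medium", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
]

# Ordering used to pick the highest sensitivity level among matched rules.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def classify(text: str):
    """Return (labels, sensitivity) for a piece of text.

    Text that matches no rule gets an empty label list and "low" sensitivity.
    """
    matched = [(name, level) for name, level, pattern in RULES
               if pattern.search(text)]
    labels = [name for name, _ in matched]
    sensitivity = max((level for _, level in matched),
                      key=LEVELS.__getitem__, default="low")
    return labels, sensitivity
```

The returned labels could then be used to look up which DLP policies apply to the data.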

Data monitoring

After data is classified, the security team monitors how it is handled. DLP tools can use several techniques to identify and track sensitive data in use. These techniques include: 

  • Data matching, such as comparing file contents to known sensitive data.

  • Pattern matching, such as looking for data that follows a certain format. For example, a nine-digit number formatted XXX-XX-XXXX might be a Social Security number.

  • Content analysis, such as using AI and machine learning to parse an email message for confidential information.

  • Detecting labels, tags, and other metadata that explicitly identify a file as sensitive.  
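
Two of the techniques above, pattern matching and data matching, can be sketched in a few lines. The function names and the sample "known sensitive" document are hypothetical. The Luhn checksum shown here is a common way to cut down false positives when a digit run merely looks like a card number; the source does not say which checks any given DLP tool uses.

```python
import hashlib
import re

# Pattern matching: nine digits formatted XXX-XX-XXXX.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text: str) -> list:
    """Return every substring that has the XXX-XX-XXXX shape."""
    return SSN_PATTERN.findall(text)


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum.

    Non-digit characters (spaces, hyphens) are ignored.
    """
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Data matching: compare content fingerprints against known sensitive data.
def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


KNOWN_SENSITIVE = {fingerprint(b"Q3 acquisition plan")}  # hypothetical sample


def is_known_sensitive(data: bytes) -> bool:
    return fingerprint(data) in KNOWN_SENSITIVE
```

Exact hashing only catches byte-for-byte copies, so a single edited character defeats it. That limitation is one reason tools also rely on content analysis and metadata labels.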

When a DLP tool finds sensitive data being handled, it looks for policy violations, abnormal behavior, system vulnerabilities, and other signs of potential data loss, including:   

  • Data leakages, like a user trying to share a confidential file with someone outside the organization. 

  • Unauthorized users attempting to access critical data or perform unapproved actions, like editing, erasing, or copying a sensitive file. 

  • Malware signatures, traffic from unknown devices, or other indicators of malicious activity.
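
The first two checks in this list might look like the sketch below. It assumes a simplified event record and a hard-coded permission table; `FileEvent`, `ALLOWED_ACTIONS`, and the `example.com` domain are all illustrative, not part of any real DLP API.

```python
from dataclasses import dataclass
from typing import Optional

INTERNAL_DOMAIN = "example.com"  # hypothetical organization domain

# Hypothetical mapping from role to the actions it may perform on confidential files.
ALLOWED_ACTIONS = {
    "analyst": {"read"},
    "owner": {"read", "edit", "copy", "delete", "share"},
}


@dataclass
class FileEvent:
    role: str
    action: str
    label: str                       # classification label on the file
    recipient: Optional[str] = None  # email address, for share actions


def find_violations(event: FileEvent) -> list:
    """Return the names of the policy violations triggered by one event."""
    found = []
    if event.label != "confidential":
        return found  # this sketch only guards confidential files
    if event.action not in ALLOWED_ACTIONS.get(event.role, set()):
        found.append("unapproved_action")
    if event.action == "share" and event.recipient:
        domain = event.recipient.rsplit("@", 1)[-1].lower()
        if domain != INTERNAL_DOMAIN:
            found.append("external_share")
    return found
```

For example, an owner sharing a confidential file with `bob@partner.org` triggers `external_share`, while an analyst copying the same file triggers `unapproved_action`.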

Applying data protections

When DLP solutions detect policy violations, they can respond with real-time remediation efforts. Examples include:  

  • Encrypting data as it moves through the network

  • Terminating unauthorized data transfers and blocking malicious traffic

  • Warning users that they are violating policies

  • Flagging suspicious behavior for the security team to review

  • Triggering additional authentication challenges before users can interact with critical data
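
One simple way to picture these responses is a lookup from violation type to a list of actions. The violation names and the `PLAYBOOK` table below are hypothetical; the response types mirror the examples in the list above.

```python
from enum import Enum


class Response(Enum):
    ENCRYPT = "encrypt_in_transit"
    BLOCK = "block_transfer"
    WARN = "warn_user"
    FLAG = "flag_for_review"
    STEP_UP = "require_reauthentication"


# Hypothetical playbook mapping a detected violation to its responses.
PLAYBOOK = {
    "unencrypted_transfer": [Response.ENCRYPT],
    "external_share": [Response.BLOCK, Response.WARN, Response.FLAG],
    "unusual_access": [Response.STEP_UP, Response.FLAG],
}


def respond(violation: str) -> list:
    """Look up the responses for a violation.

    Anything unrecognized is flagged for the security team instead of ignored.
    """
    return PLAYBOOK.get(violation, [Response.FLAG])
```

Defaulting to a review flag is a design choice in this sketch: an unknown violation type still reaches a human rather than passing silently.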

Some DLP tools also help with data recovery, automatically backing up information so it can be restored after a loss.  

Organizations can take more proactive measures to enforce DLP policies as well. Effective identity and access management (IAM), including role-based access control policies, can restrict data access to the right people. Training employees on data security requirements and best practices can help prevent accidental data losses and leaks before they happen. 

Documenting and reporting on DLP efforts

DLP tools typically feature dashboards and reporting functions that security teams can use to monitor sensitive data throughout the network. This documentation enables the security team to track DLP program performance over time so that policies and strategies can be adjusted as needed. 

DLP tools can also help organizations comply with relevant regulations by keeping records of their data security efforts. In the event of a cyberattack or audit, the organization can use these records to prove that it followed the appropriate data handling procedures. 
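
The record keeping described here could be as simple as one structured log line per incident plus a summary for reporting. The sketch below assumes a JSON format with hypothetical field names; actual DLP products define their own log schemas.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def audit_record(user, violation, response, when=None):
    """Serialize one DLP incident as a JSON line (field names are assumptions)."""
    when = when or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": when.isoformat(),
        "user": user,
        "violation": violation,
        "response": response,
    }, sort_keys=True)


def violations_by_type(records):
    """Count incidents per violation type, for example to feed a dashboard."""
    return Counter(json.loads(line)["violation"] for line in records)
```

Appending such lines to write-once storage gives the security team a trail it can present during an audit, and the per-type counts show whether policy changes reduce incidents over time.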

DLP and regulatory compliance

DLP strategies are often tightly aligned with compliance efforts. Many organizations craft their DLP policies specifically to comply with rules like the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), and the Payment Card Industry Data Security Standard (PCI DSS).   

Different regulations impose different standards for different kinds of data. For example, HIPAA sets forth rules for protected health information, while PCI DSS dictates how organizations handle payment card data. A company that collects both kinds of data would likely need a separate DLP policy for each type to meet compliance requirements.   

Many DLP solutions include prewritten DLP policies that are aligned to the various data security standards companies may need to meet. 
