What is Personally Identifiable Information (PII)?

What is PII?

Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, email address or phone number.

As people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.

Sharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers—like serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals.

Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM’s Cost of a Data Breach 2023 report, the average cost of a data breach caused by a ransomware attack was USD 5.13 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.

Direct versus indirect identifiers

PII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.

Indirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. For example, one study estimated that 87% of the U.S. population could likely be uniquely identified by nothing more than their gender, ZIP code and date of birth.
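
To see how quickly indirect identifiers narrow things down, the sketch below counts how many records in a small, made-up table are unique on progressively larger combinations of gender, ZIP code and date of birth. The records and the `unique_fraction` helper are hypothetical illustrations, not the method behind the 87% figure above. A record that is unique on its indirect identifiers is exactly what a k-anonymity check with k = 1 would flag.

```python
from collections import Counter

# Hypothetical toy records; "name" is included only to show who would be exposed.
records = [
    {"name": "Alice", "gender": "F", "zip": "02139", "dob": "1985-03-14"},
    {"name": "Bob",   "gender": "M", "zip": "02139", "dob": "1985-03-14"},
    {"name": "Carol", "gender": "F", "zip": "02139", "dob": "1990-07-01"},
    {"name": "Dave",  "gender": "M", "zip": "94105", "dob": "1972-11-30"},
    {"name": "Erin",  "gender": "F", "zip": "02139", "dob": "1990-07-01"},
]

INDIRECT_IDENTIFIERS = ("gender", "zip", "dob")

def unique_fraction(rows, keys):
    """Return the fraction of rows whose combination of `keys` occurs exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in rows)
    unique = sum(1 for r in rows if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(rows)

# Add one indirect identifier at a time and watch uniqueness grow.
for n in range(1, len(INDIRECT_IDENTIFIERS) + 1):
    keys = INDIRECT_IDENTIFIERS[:n]
    print(keys, unique_fraction(records, keys))
```

With these five rows, no one is unique on gender alone, but adding ZIP code isolates 40% of the rows and adding date of birth isolates 60%.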

Sensitive PII versus non-sensitive PII

Not all personal data is considered PII. For example, data about a person's streaming habits isn't PII, because it would be hard, if not impossible, to identify someone based solely on what they've watched on Netflix. PII refers only to information that points to a particular person, like the kind of information you might supply to verify your identity when contacting your bank.

Among PII, some pieces of information are more sensitive than others. Sensitive PII is information that directly identifies an individual and could cause significant harm if leaked or stolen. A social security number (SSN) is a good example. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could use it to access the victim's tax records or bank accounts. Other examples of sensitive PII include:

Unique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers
Biometric data, such as fingerprints and retinal scans
Financial information, including bank account numbers and credit card numbers
Medical records

Sensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it, or taking other cybersecurity measures.

Non-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. Other examples of non-sensitive PII include:

A person's full name
Mother's maiden name
Telephone number
IP address
Place of birth
Date of birth
Geographical details (ZIP code, city, state, country, etc.)
Employment information
Email address or mailing address
Race or ethnicity
Religion

Non-sensitive PII is often publicly available—for example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.

For example, a hacker could break into someone's banking app with their phone number, email address and mother's maiden name. The email address supplies a username. Hijacking the phone number, for instance through a SIM-swapping scam, lets the hacker receive the verification code. The mother's maiden name answers the security question.

It's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.

When does personal data become PII?

Context also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated. However, individual records of anonymous geolocation data can become PII, as a recent Federal Trade Commission (FTC) lawsuit illustrates. The FTC argues that the data broker Kochava was selling geolocation data that counted as PII because "the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity."
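
The FTC's point about night-time locations can be illustrated with a minimal sketch. Assuming a hypothetical feed of timestamped latitude/longitude pings for a single "anonymous" device ID, it returns the most frequent location seen between 22:00 and 06:00. The time window, the rounding precision and the `likely_home` helper are arbitrary assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical pings for one "anonymous" device ID: (ISO timestamp, latitude, longitude).
pings = [
    ("2024-05-01T23:40:00", 40.7411, -73.9897),
    ("2024-05-02T02:15:00", 40.7412, -73.9896),
    ("2024-05-02T09:30:00", 40.7580, -73.9855),
    ("2024-05-02T13:05:00", 40.7579, -73.9857),
    ("2024-05-03T01:10:00", 40.7411, -73.9898),
]

def is_night(ts, start=22, end=6):
    """True if the timestamp's hour falls in the assumed night window (22:00 to 06:00)."""
    hour = datetime.fromisoformat(ts).hour
    return hour >= start or hour < end

def likely_home(pings, precision=3):
    """Most frequent night-time location, rounded to `precision` decimals (3 is roughly 100 m of latitude)."""
    cells = Counter(
        (round(lat, precision), round(lon, precision))
        for ts, lat, lon in pings
        if is_night(ts)
    )
    if not cells:
        return None
    return cells.most_common(1)[0][0]

print(likely_home(pings))
```

On the sample pings, all three night-time readings round to the same cell, so the function returns roughly (40.741, -73.99). Matching that point against property records is the step the FTC describes as uncovering the user's identity.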

Advances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM® and the University of Maryland have devised an algorithm that identifies specific individuals by combining anonymous location data with publicly available information from social networking sites.

Data privacy laws and PII

International privacy regulations

According to McKinsey, 75% of countries have implemented data privacy laws governing the collection, retention and use of PII. Complying with these regulations can be difficult because different jurisdictions may have different or even contradictory rules. The rise of cloud computing and remote workforces also poses a challenge. In these environments, data may be collected in one place, stored in another, and processed in a third. Different regulations may apply to the data at each stage, depending on geographical location.

Complicating things further, different regulations set different standards for what kinds of data must be protected. The European Union's General Data Protection Regulation (GDPR) requires organizations to protect all personal data, defined as "any information relating to an identified or identifiable natural person." Under the GDPR, organizations must protect sensitive and non-sensitive PII. They must also safeguard things that might not even be considered sensitive data in other contexts. This information includes political opinions, organizational affiliations and descriptions of physical characteristics.

U.S. privacy regulations

The U.S. government's Office of Management and Budget (OMB) more narrowly defines PII as

[I]nformation which can be used to distinguish or trace an individual's identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother's maiden name, etc.

As Gartner analyst Bart Willemsen put it, "In the U.S. … PII historically refers to two or three dozen identifiers like name, address, SSN, driver's license, or credit card number."

While the U.S. lacks a comprehensive federal data privacy law, federal agencies are subject to the Privacy Act of 1974, which governs how they collect, use, and share PII. Some U.S. states have their own data privacy regulations, most notably California. The California Consumer Privacy Act (CCPA) and the California Privacy Rights Act (CPRA) grant consumers certain rights over how organizations collect, store and use their PII.

Industry-specific privacy regulations

Some industries also have their own data privacy regulations. In the U.S., the Health Insurance Portability and Accountability Act (HIPAA) governs how healthcare organizations collect and protect medical records and patient PII. Similarly, the Payment Card Industry Data Security Standard (PCI DSS) is a global financial industry standard for how credit card companies, merchants and payment processors handle sensitive cardholder information.

Research suggests that organizations have struggled to navigate this varying landscape of laws and industry standards. According to ESG, 66% of companies that have undergone data privacy audits in the last three years have failed at least once, and 23% have failed three or more times. Failure to comply with relevant data privacy regulations can lead to fines, reputational damage, lost business, and other consequences for organizations. For example, Amazon was fined USD 888 million for violating the GDPR in 2021.

Protecting PII

Hackers steal PII for many reasons: to commit identity theft, for blackmail, or to sell it on the black market, where a social security number may sell for as little as USD 1 and a passport number for as much as USD 2,000. Hackers may also target PII as part of a larger attack: They may hold it hostage using ransomware or steal PII to take over executives' email accounts for use in spear phishing and business email compromise (BEC) scams.

Cybercriminals often use social engineering attacks to trick unsuspecting victims into willingly handing over PII, but they may also purchase it on the dark web or gain access as part of a larger data breach. PII can be stolen physically by rooting through a person's trash or spying on them as they use a computer. Malicious actors may also monitor a target's social media accounts, where many people unknowingly share non-sensitive PII every day. Over time, an attacker can gather enough information to impersonate a victim or break into their accounts.

For organizations, protecting PII can be complicated. The growth of cloud computing and SaaS services means that PII may be stored and processed in multiple locations instead of a single, centralized network. According to a report from ESG, the amount of sensitive data stored in public clouds is expected to double by 2024, and more than half of organizations believe that this data is not sufficiently secure.

To safeguard PII, organizations typically create data privacy frameworks. These frameworks can take different forms depending on the organization, the PII it collects, and the data privacy regulations it must follow. As an example, the National Institute of Standards and Technology (NIST) provides this sample framework:

1. Identify all PII in the organization's systems.

2. Minimize the collection and use of PII, and regularly dispose of any PII no longer needed.

3. Categorize PII according to sensitivity level.

4. Apply data security controls. Example controls may include:

Encryption: Encrypting PII in transit, at rest, and in use through homomorphic encryption or confidential computing can help keep PII secure and compliant regardless of where it is stored or handled.
Identity and access management (IAM): Two-factor or multifactor authentication can place more barriers between hackers and sensitive data. Similarly, enforcing the principle of least privilege through zero trust architecture and role-based access controls (RBAC) can limit the amount of PII hackers can access if they breach the network.
Training: Employees learn how to properly handle and dispose of PII and how to protect their own PII. Training covers areas like anti-phishing, social engineering and social media awareness.
Anonymization: Data anonymization is the process of removing the identifying characteristics of sensitive data. Common anonymization techniques include stripping identifiers from data, aggregating data or strategically adding noise to the data.
Cybersecurity tools: Data loss prevention (DLP) tools can help track data as it moves throughout the network, making it easier to detect leaks and breaches. Other cybersecurity solutions that offer high-level views of activity on the network—such as extended detection and response (XDR) tools—can also assist in tracking the use and misuse of PII.

5. Draft an incident response plan for PII leaks and breaches.
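
Steps 1 and 3 of the framework can be approximated in a few lines for free-text fields. The sketch below uses three hypothetical regular expressions to find candidate SSNs, email addresses and U.S.-style phone numbers, labels each hit with an assumed sensitivity level, and masks the sensitive hits as a crude stand-in for one step 4 control (stripping identifiers). Production discovery tools use far more robust detection and validation; the patterns, labels and helper names here are illustrative assumptions only.

```python
import re

# Hypothetical detectors; real PII discovery tools use far more robust rules and validation.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

# Assumed sensitivity levels, following the sensitive/non-sensitive split described earlier.
SENSITIVITY = {"ssn": "sensitive", "email": "non-sensitive", "phone": "non-sensitive"}

def scan(text):
    """Steps 1 and 3: find candidate PII and label each hit with its sensitivity level."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((kind, match.group(), SENSITIVITY[kind]))
    return findings

def redact(text, level="sensitive"):
    """One possible step 4 control: mask every hit whose kind has the given sensitivity level."""
    for kind, pattern in PATTERNS.items():
        if SENSITIVITY[kind] == level:
            text = pattern.sub(f"[{kind.upper()} REDACTED]", text)
    return text

note = "Customer jane.doe@example.com, SSN 123-45-6789, callback 555-867-5309."
for finding in scan(note):
    print(finding)
print(redact(note))
```

On the sample note, `scan` reports all three values, and `redact` masks only the SSN, leaving the non-sensitive email address and phone number in place.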

It's worth noting that NIST and other data privacy experts often recommend applying different controls to different data sets based on how sensitive the data is. Using strict controls for non-sensitive data may be cumbersome and not cost-effective.
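
One way to apply controls by sensitivity is a per-field policy. The sketch below assumes a hypothetical `POLICY` table, an in-code `SECRET_KEY` and made-up field names. It replaces the SSN with a keyed hash (HMAC-SHA256), generalizes date of birth and ZIP code, keeps the city as is, and drops any field without a policy entry.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would live in a key management system, not in code.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical per-field policy: the more sensitive the field, the stronger the control.
POLICY = {
    "ssn": "pseudonymize",     # sensitive: replace with a keyed hash
    "dob": "generalize_year",  # indirect identifier: keep only the year
    "zip": "generalize_zip3",  # indirect identifier: keep only the first three digits
    "city": "keep",            # low risk on its own
}

def pseudonymize(value):
    """HMAC-SHA256 yields a stable token usable for joins without exposing the raw value."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

CONTROLS = {
    "pseudonymize": pseudonymize,
    "generalize_year": lambda v: v[:4],
    "generalize_zip3": lambda v: v[:3] + "**",
    "keep": lambda v: v,
}

def apply_policy(record):
    """Apply each field's control; fields missing from POLICY are dropped (default deny)."""
    return {
        field: CONTROLS[POLICY[field]](value)
        for field, value in record.items()
        if field in POLICY
    }

row = {"name": "Jane Doe", "ssn": "123-45-6789", "dob": "1985-03-14",
       "zip": "02139", "city": "Cambridge"}
print(apply_policy(row))
```

Keyed hashing is pseudonymization rather than anonymization: anyone holding the key can recompute tokens and link them back to known SSNs, which is why the GDPR still treats pseudonymized data as personal data. Dropping unlisted fields by default also supports the minimization step of the framework.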