Personally identifiable information (PII) is any information connected to a specific individual that can be used to uncover that individual's identity, such as their social security number, full name, or email address.
As people have come to increasingly rely on information technology in their work and personal lives, the amount of PII shared with organizations has grown. For example, companies collect customers' personal data to understand their markets, and consumers readily give out their telephone numbers and home addresses to sign up for services and shop online.
Sharing PII can have its benefits, as it allows businesses to tailor products and services to the wants and needs of their customers—like serving up more relevant search results in navigation apps. However, the growing storehouses of PII accumulated by organizations attract the attention of cybercriminals. Hackers steal PII to commit identity theft, sell it on the black market, or hold it captive via ransomware. According to IBM’s Cost of a Data Breach 2022 report, 83 percent of companies have suffered more than one data breach, with the average breach costing USD 4.35 million. Individuals and information security professionals must navigate a complex IT and legal landscape to maintain data privacy in the face of these attacks.
PII comes in two types: direct identifiers and indirect identifiers. Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.
Indirect identifiers are not unique. They include more general personal details like race and place of birth. While a single indirect identifier can't identify a person, a combination can. For example, 87 percent of U.S. citizens could be identified based on nothing more than their gender, ZIP code, and date of birth.
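To see why a combination of indirect identifiers can single people out, here is a minimal Python sketch that measures how many records in a data set are unique on a given set of columns. The records are invented toy data and the `unique_fraction` helper is a hypothetical name, not part of any study or tool cited here.

```python
from collections import Counter

# Hypothetical toy records: (gender, ZIP code, date of birth). Not real data.
records = [
    ("F", "02139", "1985-03-14"),
    ("M", "02139", "1985-03-14"),
    ("F", "02139", "1990-07-01"),
    ("F", "02139", "1990-07-01"),
    ("M", "10001", "1972-11-30"),
]

def unique_fraction(rows, columns):
    """Return the share of rows whose values in `columns` match no other row."""
    keys = [tuple(row[i] for i in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)

print(unique_fraction(records, [0]))        # gender alone
print(unique_fraction(records, [1]))        # ZIP code alone
print(unique_fraction(records, [0, 1, 2]))  # gender, ZIP code, and date of birth
```

In this toy set, no record is unique on gender alone, but most become unique once all three columns are combined. The same pattern at population scale is what makes indirect identifiers risky.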
Not all personal data is considered PII. For example, data about a person's streaming habits isn't PII because it would be hard, if not impossible, to tell who someone is based solely on what they've watched on Netflix. PII only refers to information that points to a particular person — like the kind of information you might supply to verify your identity when contacting your bank.
Among PII, some pieces of information are more sensitive than others. Sensitive PII is information that directly identifies an individual and could cause significant harm if leaked or stolen. A social security number (SSN) is a good example of sensitive PII. Because many government agencies and financial institutions use SSNs to verify people's identities, a criminal who steals an SSN could easily access their victim's tax records or bank accounts. Other examples of sensitive PII include:
Sensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it, or taking other cybersecurity measures.
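As one illustration of the "other cybersecurity measures" mentioned above, the following Python sketch masks an SSN for display and replaces a value with a keyed token (pseudonymization). It is not encryption and not a complete safeguard. The function names and the example key are assumptions made for illustration; in practice the key would come from a secrets manager.

```python
import hashlib
import hmac

def mask_ssn(ssn: str) -> str:
    """Hide all but the last four digits of a nine-digit SSN."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected nine digits")
    return "***-**-" + "".join(digits[-4:])

def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed HMAC-SHA256 token (hex string).

    The same value and key always give the same token, so records can
    still be joined, but the token cannot be recomputed without the key.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Placeholder key for the example only; never hard-code a real key.
example_key = b"example-key-not-for-production"

print(mask_ssn("123-45-6789"))
print(pseudonymize("123-45-6789", example_key))
```

Masking suits display and logging, while keyed tokens suit analytics that need a stable identifier without exposing the raw value.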
Non-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a social media handle would be non-sensitive PII: It could identify someone, but a malicious actor couldn't commit identity theft armed with only a social media account name. Other examples of non-sensitive PII include:
Non-sensitive PII is often publicly available — e.g., telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII.
For example, a hacker could break into someone's bank account app with their phone number, email address, and mother's maiden name. The email gives them a username, spoofing the phone number gives them a way to receive a verification code, and the mother's maiden name gives them an answer to the security question.
It's important to note that whether something counts as sensitive or non-sensitive PII depends heavily on context. A full name on its own may be non-sensitive, but a list of people who have visited a certain doctor would be sensitive. Similarly, a person's phone number may be publicly available, but a database of phone numbers used for two-factor authentication on a social media site would be sensitive PII.
Context also determines whether something is considered PII at all. For example, aggregated anonymous geolocation data is often seen as generic personal data because the identity of any single user can't be isolated. However, individual records of anonymous geolocation data can become PII, as demonstrated by a recent Federal Trade Commission (FTC) lawsuit. The FTC argues that the data broker Kochava was selling geolocation data that counted as PII because "the company's customized data feeds allow purchasers to identify and track specific mobile device users. For example, the location of a mobile device at night is likely the user's home address and could be combined with property records to uncover their identity."
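The night-time inference described in the FTC's argument can be sketched in a few lines of Python, which shows how little processing "anonymous" location records need before they point at a home. The pings are invented toy data, and the `likely_home` helper and its night-hour defaults are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical pings for one device: (ISO timestamp, rounded lat, rounded lon).
pings = [
    ("2022-08-01T02:10:00", 47.61, -122.33),
    ("2022-08-01T13:05:00", 47.62, -122.20),
    ("2022-08-02T01:45:00", 47.61, -122.33),
    ("2022-08-02T23:30:00", 47.61, -122.33),
    ("2022-08-03T12:15:00", 47.62, -122.20),
]

def is_night(timestamp, night_start=22, night_end=6):
    """Return True if the timestamp's hour falls in the overnight window."""
    hour = datetime.fromisoformat(timestamp).hour
    return hour >= night_start or hour < night_end

def likely_home(rows):
    """Return the most frequent night-time location, or None if there is none."""
    night = [(lat, lon) for ts, lat, lon in rows if is_night(ts)]
    if not night:
        return None
    return Counter(night).most_common(1)[0][0]

print(likely_home(pings))
```

A data set that supports this kind of lookup is hard to treat as anonymous, which is the core of the argument quoted above.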
Advances in technology are also making it easier to identify people with fewer pieces of information, potentially lowering the threshold for what is considered PII in general. For example, researchers at IBM and the University of Maryland devised an algorithm for identifying specific individuals by combining anonymous location data with publicly available information from social networking sites.
According to McKinsey, 75 percent of countries have implemented data privacy laws governing the collection, retention, and use of PII. Complying with these regulations can be difficult because different jurisdictions may have different or even contradictory rules. The rise of cloud computing and remote workforces also poses a challenge. In these environments, data may be collected in one place, stored in another, and processed in a third. Different regulations may apply to the data at each stage, depending on geographical location.
Complicating things further, different regulations set different standards for what kinds of data must be protected. The European Union's General Data Protection Regulation (GDPR) requires organizations to protect all personal data, defined as "any information relating to an identified or identifiable natural person." Under the GDPR, organizations must protect sensitive and non-sensitive PII, but also things that might not even be considered sensitive data in other contexts, such as political opinions, organizational affiliations, and descriptions of physical characteristics.
The U.S. government's Office of Management and Budget (OMB) more narrowly defines PII as:
[I]nformation which can be used to distinguish or trace an individual's identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother's maiden name, etc.
As Gartner analyst Bart Willemsen put it, "In the U.S. … PII historically refers to two or three dozen identifiers like name, address, SSN, driver's license, or credit card number."
While the U.S. lacks a comprehensive federal data privacy law, federal agencies are subject to the Privacy Act of 1974, which governs how they collect, use, and share PII. Some U.S. states have their own data privacy regulations, most notably California. The California Consumer Privacy Act (CCPA) and the California Privacy Rights Act (CPRA) grant consumers certain rights over how organizations collect, store, and use their PII.
Some industries also have their own data privacy regulations. In the U.S., the Health Insurance Portability and Accountability Act (HIPAA) governs how healthcare organizations collect and protect medical records and patient PII. Similarly, the Payment Card Industry Data Security Standard (PCI DSS) is a global financial industry standard for how credit card companies, merchants, and payment processors handle sensitive cardholder information.
Research suggests organizations have struggled to navigate this varying landscape of laws and industry standards. According to ESG, 66 percent of companies that have undergone data privacy audits in the last three years have failed at least once, and 23 percent have failed three or more times. Failure to comply with relevant data privacy regulations can lead to fines, reputational damage, lost business, and other consequences for organizations. For example, Amazon was fined USD 888 million for violating the GDPR in 2021.
Hackers steal PII for many reasons: to commit identity theft, to blackmail victims, or to sell it on the black market, where a social security number can fetch as much as USD 1 and a passport number as much as USD 2,000. Hackers may also target PII as part of a larger attack: They may hold it hostage using ransomware or steal PII to take over executives' email accounts for use in spear phishing and business email compromise (BEC) scams.
Cybercriminals often use social engineering attacks to trick unsuspecting victims into willingly handing over PII, but they may also purchase it on the dark web or gain access as part of a larger data breach. PII can be stolen physically by rooting through a person's trash or spying on them as they use a computer. Malicious actors may also monitor a target's social media accounts, where many people unknowingly share non-sensitive PII every day. Over time, an attacker can gather enough information to impersonate a victim or break into their accounts.
For organizations, protecting PII can be complicated. The growth of cloud computing and SaaS means PII may be stored and processed in multiple locations instead of a single, centralized network. According to a report from ESG, the amount of sensitive data stored in public clouds is expected to double by 2024, and more than half of organizations believe this data is not sufficiently secure.
To safeguard PII, organizations typically create data privacy frameworks. These frameworks can take different forms depending on the organization, the PII it collects, and the data privacy regulations it must follow. As an example, the National Institute of Standards and Technology (NIST) provides the following sample framework:
1. Identify all PII in the organization's systems.
2. Minimize the collection and use of PII, and regularly dispose of any PII no longer needed.
3. Categorize PII according to sensitivity level.
4. Apply data security controls. Example controls may include:
5. Draft an incident response plan for PII leaks and breaches.
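The first step above, identifying PII, is often approached with pattern matching. The Python sketch below scans free text for two kinds of PII using regular expressions. The patterns and the `find_pii` helper are simplified assumptions for illustration; they are not part of the NIST guidance, and real discovery tools use broader rules plus validation to cut false positives.

```python
import re

# Illustrative patterns only: a loose email matcher and the AAA-GG-SSSS SSN layout.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits

sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(find_pii(sample))
```

A scan like this only finds PII that follows a recognizable format. Names, addresses, and context-dependent PII generally need other techniques.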
It's worth noting that NIST and other data privacy experts often recommend applying different controls to different data sets based on how sensitive the data is. Using strict controls for non-sensitive data may be cumbersome and not cost-effective.
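One way to picture sensitivity-based controls is a lookup from each data field to a sensitivity tier, and from each tier to a set of controls. The Python sketch below assumes a two-tier scheme; the field inventory, tier assignments, and control names are all hypothetical placeholders rather than NIST terminology.

```python
from enum import Enum

class Sensitivity(Enum):
    NON_SENSITIVE = "non-sensitive"
    SENSITIVE = "sensitive"

# Hypothetical inventory; a real one would come from a data discovery step.
FIELD_SENSITIVITY = {
    "ssn": Sensitivity.SENSITIVE,
    "social_media_handle": Sensitivity.NON_SENSITIVE,
    "phone_number": Sensitivity.NON_SENSITIVE,
}

# Hypothetical control names; stricter tiers get more controls.
CONTROLS = {
    Sensitivity.SENSITIVE: ["encrypt_at_rest", "restrict_access", "audit_access"],
    Sensitivity.NON_SENSITIVE: ["restrict_access"],
}

def controls_for(field: str) -> list[str]:
    """Return the controls for a field, treating unknown fields as sensitive."""
    level = FIELD_SENSITIVITY.get(field, Sensitivity.SENSITIVE)
    return CONTROLS[level]

print(controls_for("ssn"))
print(controls_for("social_media_handle"))
print(controls_for("unlisted_field"))
```

Defaulting unknown fields to the stricter tier is a design choice made here so that gaps in the inventory fail safe. A static mapping is also a simplification, since the same field can be sensitive in one context and non-sensitive in another.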