What is a primary key?

Authors

Alice Gomstyn

Staff Writer

IBM Think

Alexandra Jonker

Staff Editor

IBM Think

A primary key is a column or columns in a database table with values that uniquely identify each row or record. For example, an employee ID column could be a primary key in a table of employee information.

Primary keys play an important role in relational databases by reinforcing data integrity and enabling successful data retrieval. Additionally, primary keys can be referenced by another type of key to define relationships between tables in relational databases.

Often-cited examples of primary keys found in various databases include:

  • Driver’s license numbers
  • Employee ID numbers
  • Product ID numbers
  • Social security numbers
  • Student ID numbers

Understanding keys and relational databases

Before exploring the particulars of primary keys, it’s helpful to understand relational database systems and the roles that keys such as primary keys play within database design.

Relational databases house data structured across multiple tables, with related data points across different tables. Such databases are managed through relational database management systems (RDBMS) such as IBM Db2, Oracle Database, Microsoft SQL Server and open source database systems such as PostgreSQL and MySQL.

Structured query language (SQL) is a common programming language used for database interactions, although some database management systems support other languages.

In the context of databases, a key is a column or an ordered collection of columns used to identify rows of data in a table. Keys help organize databases by showing relationships between related tables, identifying unique records and ensuring data accuracy and integrity. These benefits support successful, efficient SQL database queries.

What are the characteristics of a primary key?

A primary key, also known as an SQL primary key, gives each record in a table a unique identifier: its primary key value. Important characteristics of primary keys include:

  • No null values
  • No duplicate values
  • May take the form of an existing column in a table (natural key) or be added as a new column (surrogate key)
  • May include a single column or multiple columns (as a composite primary key)
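
The first two characteristics can be checked directly against a database. The sketch below is illustrative only: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for any RDBMS, and the table EMP, its columns and the sample names are assumptions made for the example. SQLite rejects nulls in this kind of key column only when NOT NULL is declared, so the declaration is spelled out.

```python
import sqlite3

# In-memory SQLite database, used only as a convenient stand-in for an RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE EMP (
        ID INT NOT NULL,
        LASTNAME VARCHAR(15) NOT NULL,
        PRIMARY KEY (ID)
    )
    """
)
conn.execute("INSERT INTO EMP (ID, LASTNAME) VALUES (1, 'Haas')")


def try_insert(emp_id, lastname):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO EMP (ID, LASTNAME) VALUES (?, ?)", (emp_id, lastname)
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted_new = try_insert(2, "Thompson")    # new, non-null ID
accepted_duplicate = try_insert(1, "Kwan")  # ID 1 is already in use
accepted_null = try_insert(None, "Geyer")   # null ID

print(accepted_new, accepted_duplicate, accepted_null)  # prints: True False False
```

Only the row with a new, non-null ID is stored; the other two are rejected by the key's constraints rather than by application code.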

A primary key might be an existing column within a single table, in which case it is known as a natural key. However, sometimes there is no single column in the table that meets the primary key constraint—the primary key rule that values must be unique and not null.

In such circumstances, a new column of unique values, not derived from the table’s existing data, may be generated to serve as the primary key. This artificially generated primary key is known as a surrogate key.

Relational database management systems can commonly generate unique values for a column, which can then be used as a surrogate key. One example is the AUTO_INCREMENT attribute in MySQL.
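
As a rough illustration of a generated surrogate key, the following sketch uses SQLite through Python's sqlite3 module. SQLite's counterpart to this feature is spelled INTEGER PRIMARY KEY AUTOINCREMENT, and the CUSTOMER table and its sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE CUSTOMER (
        CUSTOMER_ID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        LASTNAME VARCHAR(15) NOT NULL
    )
    """
)

# No CUSTOMER_ID is supplied; the database generates one for each row.
for lastname in ("Lee", "Lee", "Ortiz"):
    conn.execute("INSERT INTO CUSTOMER (LASTNAME) VALUES (?)", (lastname,))

rows = conn.execute(
    "SELECT CUSTOMER_ID, LASTNAME FROM CUSTOMER ORDER BY CUSTOMER_ID"
).fetchall()
print(rows)  # prints: [(1, 'Lee'), (2, 'Lee'), (3, 'Ortiz')]
```

The two customers named Lee remain distinguishable because the generated IDs differ, even though no existing column in the data could tell them apart.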

Primary keys can also be composite keys, meaning they consist of more than a single column of values.

Composite primary keys—unique combinations of columns that can be used to identify records—can serve as an alternative to generating surrogate keys. For example, a column of customer last names might be paired with a column containing dates of birth to form a composite primary key.
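
The last-name and date-of-birth pairing could be sketched as follows, again with SQLite via Python's sqlite3 module as an assumed stand-in for an RDBMS. The table name, column names and sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE CUSTOMER (
        LASTNAME VARCHAR(15) NOT NULL,
        BIRTHDATE DATE NOT NULL,
        EMAIL VARCHAR(50),
        PRIMARY KEY (LASTNAME, BIRTHDATE)  -- composite primary key
    )
    """
)
conn.execute("INSERT INTO CUSTOMER VALUES ('Lee', '1990-04-12', 'a@example.com')")
# Same last name but a different birth date: the pair is still unique.
conn.execute("INSERT INTO CUSTOMER VALUES ('Lee', '1985-11-02', 'b@example.com')")

try:
    # Both key columns match an existing row, so the insert is rejected.
    conn.execute("INSERT INTO CUSTOMER VALUES ('Lee', '1990-04-12', 'c@example.com')")
    duplicate_pair_accepted = True
except sqlite3.IntegrityError:
    duplicate_pair_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM CUSTOMER").fetchone()[0]
print(duplicate_pair_accepted, row_count)  # prints: False 2
```

The rejected insert also shows the trade-off of this design: two real customers who share a last name and a birth date could not both be stored, which is one reason a surrogate key is sometimes preferred.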

What data types are used in primary keys?

SQL supports many data types, from timestamp (representations of dates and times) to varchar (variable-length character strings). But not every data type is a good choice for primary keys.

It’s often recommended that columns containing numeric values, particularly integers, be used as primary keys since those are typically processed more quickly by relational database management systems.

What are other important types of keys?

Other keys used in database management include:

  • Unique keys
  • Candidate keys
  • Super keys
  • Foreign keys

Unique keys

Primary keys are a subset of unique keys. Unique keys follow what’s known as a unique constraint: the rule that the values of a key are valid only if they are unique. While all primary keys are unique keys, not all unique keys are primary keys. That’s because, unlike primary keys, unique keys may include null values in many database systems.
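
Null handling in unique keys varies between database systems, so the sketch below shows only how SQLite behaves, using Python's sqlite3 module. The EMP table, its EMAIL column and the sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE EMP (
        ID INT NOT NULL,
        EMAIL VARCHAR(50) UNIQUE,  -- unique key: nulls are permitted here
        PRIMARY KEY (ID)
    )
    """
)
conn.execute("INSERT INTO EMP VALUES (1, 'haas@example.com')")
# SQLite treats each NULL as distinct, so both of these rows are accepted.
conn.execute("INSERT INTO EMP VALUES (2, NULL)")
conn.execute("INSERT INTO EMP VALUES (3, NULL)")

try:
    # A repeated non-null value violates the unique constraint.
    conn.execute("INSERT INTO EMP VALUES (4, 'haas@example.com')")
    duplicate_email_accepted = True
except sqlite3.IntegrityError:
    duplicate_email_accepted = False

null_count = conn.execute(
    "SELECT COUNT(*) FROM EMP WHERE EMAIL IS NULL"
).fetchone()[0]
print(duplicate_email_accepted, null_count)  # prints: False 2
```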

Candidate keys

Candidate keys are keys that could serve as primary keys because they contain unique values and no null values. The difference between primary keys and candidate keys is that an existing table can have multiple candidate keys, but only one primary key.
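
One way to screen single columns as possible candidate keys is to compare COUNT(*) with COUNT(DISTINCT column), since the latter skips nulls and collapses duplicates. The sketch below assumes SQLite through Python's sqlite3 module; the table, the data and the helper name is_unique_and_not_null are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ID INT, EMAIL VARCHAR(50), LASTNAME VARCHAR(15))")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [
        (1, "haas@example.com", "Haas"),
        (2, "kwan@example.com", "Kwan"),
        (3, "haas2@example.com", "Haas"),
    ],
)


def is_unique_and_not_null(column):
    """Check whether every value in the column is present and distinct.

    The column name is interpolated into the SQL text, so pass only
    trusted, hard-coded names.
    """
    total, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {column}) FROM EMP"
    ).fetchone()
    return total == distinct


candidates = [
    column
    for column in ("ID", "EMAIL", "LASTNAME")
    if is_unique_and_not_null(column)
]
print(candidates)  # prints: ['ID', 'EMAIL']
```

A check like this only describes the data currently in the table. Whether a column is a true candidate key is a design decision about what the data is allowed to contain in the future.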

Super keys

Super keys are similar to composite primary keys in that they may include multiple columns and be used to identify records. However, super keys might also include more columns or information than is strictly needed to uniquely identify a record, while composite primary keys do not include more columns or data than is necessary.
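
The distinction comes down to minimality, which can be sketched in plain Python. The sample rows and the helper names identifies_rows and is_minimal below are hypothetical, and the check applies only to the rows shown rather than to all data a table might ever hold.

```python
from itertools import combinations

# Hypothetical sample rows from an employee table.
rows = [
    {"ID": 1, "LASTNAME": "Haas", "DEPT": "A00"},
    {"ID": 2, "LASTNAME": "Kwan", "DEPT": "C01"},
    {"ID": 3, "LASTNAME": "Haas", "DEPT": "C01"},
]


def identifies_rows(columns):
    """True if the combination of columns is distinct for every row."""
    projected = {tuple(row[column] for column in columns) for row in rows}
    return len(projected) == len(rows)


def is_minimal(columns):
    """True if the columns identify rows and no smaller subset also does."""
    if not identifies_rows(columns):
        return False
    return not any(
        identifies_rows(subset)
        for size in range(1, len(columns))
        for subset in combinations(columns, size)
    )


# (ID, LASTNAME) identifies every row but carries a redundant column.
print(identifies_rows(("ID", "LASTNAME")), is_minimal(("ID", "LASTNAME")))  # prints: True False
# (LASTNAME, DEPT) identifies every row and neither column can be dropped.
print(is_minimal(("LASTNAME", "DEPT")))  # prints: True
```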

Foreign keys

A foreign key in one table refers to a specific primary key in another table in order to define a relationship between the tables. For instance, a foreign key in a table with customer order information may refer to a primary key column of unique customer IDs that’s part of another table containing customer data. This way, an order by a given customer may be linked to important information about that customer, such as their email address and date of birth.
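
A minimal sketch of the customer and order relationship follows, assuming SQLite via Python's sqlite3 module. The table names CUSTOMER and CUSTOMER_ORDER, their columns and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    """
    CREATE TABLE CUSTOMER (
        CUSTOMER_ID INT NOT NULL,
        EMAIL VARCHAR(50),
        PRIMARY KEY (CUSTOMER_ID)
    )
    """
)
conn.execute(
    """
    CREATE TABLE CUSTOMER_ORDER (
        ORDER_ID INT NOT NULL,
        CUSTOMER_ID INT NOT NULL,
        PRIMARY KEY (ORDER_ID),
        FOREIGN KEY (CUSTOMER_ID) REFERENCES CUSTOMER (CUSTOMER_ID)
    )
    """
)
conn.execute("INSERT INTO CUSTOMER VALUES (7, 'lee@example.com')")
conn.executemany("INSERT INTO CUSTOMER_ORDER VALUES (?, ?)", [(1001, 7), (1002, 7)])

# Follow the foreign key from each order back to its customer's details.
orders_with_email = conn.execute(
    """
    SELECT o.ORDER_ID, c.EMAIL
    FROM CUSTOMER_ORDER AS o
    JOIN CUSTOMER AS c ON c.CUSTOMER_ID = o.CUSTOMER_ID
    ORDER BY o.ORDER_ID
    """
).fetchall()
print(orders_with_email)  # prints: [(1001, 'lee@example.com'), (1002, 'lee@example.com')]
```

The email address is stored once, in CUSTOMER, and each order reaches it through the shared customer ID.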

The significance of referential integrity constraints in defining table relationships

To understand how primary keys and foreign keys define relationships between tables in relational databases, it helps to consider referential integrity constraints.

Referential integrity constraints, also known as referential constraints or foreign key constraints, are rules requiring that foreign key values in one table match the primary key values in another table.

For example, in an employee database, a referential integrity constraint may specify that each employee in the EMPLOYEE table must be a member of an existing company department listed in the DEPARTMENT table.

In this case, the DEPARTMENT table’s primary key is a column storing unique department numbers, while the EMPLOYEE table’s foreign key is a column storing the same unique department numbers. These matching columns mean the tables adhere to the foreign key constraint.

Since the DEPARTMENT table is home to the primary key referenced by the EMPLOYEE table’s foreign key, the EMPLOYEE table is considered dependent on the DEPARTMENT table. In turn, the DEPARTMENT table is considered the “parent table” in this relationship.

Referential integrity constraints can block erroneous data insertions into databases. Such constraints, for instance, would prevent a user from adding an employee record to the EMPLOYEE table (as pictured below) if the department number in that employee record did not exist in the DEPARTMENT table.

A diagram showing how primary key and foreign key constraints can prevent the insertion of an invalid record into a table.
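
The same scenario can be sketched in code. This example assumes SQLite through Python's sqlite3 module, where foreign key enforcement has to be switched on per connection; the column names DEPTNO, EMPNO and WORKDEPT and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute(
    """
    CREATE TABLE DEPARTMENT (
        DEPTNO CHAR(3) NOT NULL,
        PRIMARY KEY (DEPTNO)
    )
    """
)
conn.execute(
    """
    CREATE TABLE EMPLOYEE (
        EMPNO INT NOT NULL,
        WORKDEPT CHAR(3) NOT NULL,
        PRIMARY KEY (EMPNO),
        FOREIGN KEY (WORKDEPT) REFERENCES DEPARTMENT (DEPTNO)
    )
    """
)
conn.execute("INSERT INTO DEPARTMENT VALUES ('A00')")


def try_add_employee(empno, deptno):
    """Return True if the employee row is accepted, False if it is rejected."""
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?)", (empno, deptno))
        return True
    except sqlite3.IntegrityError:
        return False


valid_insert_accepted = try_add_employee(10, "A00")   # department exists
orphan_insert_accepted = try_add_employee(20, "Z99")  # no such department
print(valid_insert_accepted, orphan_insert_accepted)  # prints: True False
```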

What other concepts and practices relate to primary keys?

Indexes

Relational database management systems may create or require the creation of a unique index to correspond to each primary key in the database. In the context of database design, an index is a set of pointers that refer to rows in a table. Indexes can be used to improve performance, including the optimization of data retrieval.
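
As one concrete case, SQLite builds a unique index automatically for a primary key declared like the one below. The sketch uses Python's sqlite3 module and a hypothetical EMP table; other database systems expose their indexes through different catalog views or commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE EMP (
        ID INT NOT NULL,
        LASTNAME VARCHAR(15) NOT NULL,
        PRIMARY KEY (ID)
    )
    """
)

# Each row of PRAGMA index_list is (seq, name, unique, origin, partial).
indexes = conn.execute("PRAGMA index_list(EMP)").fetchall()
for _seq, name, is_unique, origin, _partial in indexes:
    print(name, bool(is_unique), origin)

# Ask the query planner how it would look up a single employee by ID.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT LASTNAME FROM EMP WHERE ID = 1"
).fetchall()
uses_pk_index = any(indexes[0][1] in row[-1] for row in plan)
```

The origin value "pk" marks an index that exists because of the primary key constraint, and the query plan indicates that a lookup by ID can go through that index rather than scanning the whole table.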

Data modeling

Data modeling provides a visual representation of a database schema, or a blueprint for data relationships among tables in a database. Data models can include information on the primary keys and foreign keys that are used to maintain those relationships.

SQL statements

SQL statements are commands used to interact with relational databases. Under SQL syntax, primary keys for tables may be assigned or added using CREATE TABLE or ALTER TABLE statements.

For example, consider a CREATE TABLE statement for a table of employee names called EMP using IBM’s Db2. The column names are ID (for employee ID), and FIRSTNME and LASTNAME (with 15-character maximums for each). When ID is selected as the primary key, the statement could look like this:

CREATE TABLE EMP
    (ID INT NOT NULL,
    FIRSTNME VARCHAR(15) NOT NULL,
    LASTNAME VARCHAR(15) NOT NULL,
    PRIMARY KEY (ID));

In ALTER TABLE statements (for existing tables) in Db2, the ADD PRIMARY KEY clause adds a primary key, while foreign keys are added through ADD CONSTRAINT with a FOREIGN KEY clause that references the parent table.
