Date: 2025-03-20

Primary keys play an important role in relational databases by reinforcing data integrity and enabling successful data retrieval. Additionally, primary keys can be referenced by another type of key to define relationships between tables in relational databases.
Often-cited examples of primary keys found in various databases include:
Before exploring the particulars of primary keys, it’s helpful to understand relational database systems and the roles that keys such as primary keys play within database design.
Relational databases house data structured across multiple tables, with related data points spread across different tables. Such databases are managed through relational database management systems (RDBMS) such as IBM Db2, Oracle Database, Microsoft SQL Server and open source database systems such as PostgreSQL and MySQL.
Structured query language (SQL) is a common programming language used for database interactions, although some database management systems support other languages.
In the context of databases, a key is a column or an ordered collection of columns used to identify rows of data in a table. Keys help organize databases by showing relationships between related tables, identifying unique records and ensuring data accuracy and integrity. These benefits support successful, efficient SQL database queries.
A primary key, also known as an SQL primary key, is a column or set of columns whose values uniquely identify each record in a table. Important characteristics of primary keys include uniqueness, the absence of null values and the limit of one primary key per table.
A primary key might be an existing column within a single table, in which case it is known as a natural key. However, sometimes there is no single column in the table that meets the primary key constraint—the primary key rule that values must be unique and not null.
In such circumstances, a new column of unique values, not derived from the table’s existing data, may be generated to serve as the primary key. This artificially generated primary key is known as a surrogate key.
Relational database management systems commonly feature the capability to generate unique values for a column, which can then be used as a surrogate key. One example is the AUTO_INCREMENT attribute of MySQL.
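As a rough illustration of a surrogate key, the sketch below lets the database generate the key values. It runs through Python's built-in sqlite3 module, where INTEGER PRIMARY KEY AUTOINCREMENT stands in for MySQL's AUTO_INCREMENT. The customer table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table; SQLite's AUTOINCREMENT stands in for MySQL's AUTO_INCREMENT.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        last_name   TEXT NOT NULL,
        birth_date  TEXT NOT NULL
    )
    """
)

# No customer_id is supplied; the database generates one per row,
# even when the remaining columns are identical.
people = [
    ("Nguyen", "1990-04-12"),
    ("Nguyen", "1990-04-12"),
    ("Okafor", "1985-11-02"),
]
conn.executemany(
    "INSERT INTO customer (last_name, birth_date) VALUES (?, ?)", people
)

surrogate_ids = [
    row[0]
    for row in conn.execute("SELECT customer_id FROM customer ORDER BY customer_id")
]
print(surrogate_ids)  # [1, 2, 3]
```

The first two rows hold the same name and birth date, so no natural key exists in this data, yet each row still receives a distinct identifier.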
Primary keys can also be composite keys, meaning they consist of more than a single column of values.
Composite primary keys—unique combinations of columns that can be used to identify records—can serve as an alternative to generating surrogate keys. For example, a column of customer last names might be paired with a column containing dates of birth to form a composite primary key.
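The last-name and date-of-birth pairing described above could be sketched as follows, again using Python's sqlite3 module as an assumed stand-in for a full RDBMS. The table, the email column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        last_name  TEXT NOT NULL,
        birth_date TEXT NOT NULL,
        email      TEXT,
        PRIMARY KEY (last_name, birth_date)  -- composite primary key
    )
    """
)

conn.execute("INSERT INTO customer VALUES ('Okafor', '1985-11-02', 'a@example.com')")
# Same last name, different birth date: the combination is still unique.
conn.execute("INSERT INTO customer VALUES ('Okafor', '1992-06-30', 'b@example.com')")

try:
    # Same last name AND birth date: the combination already exists.
    conn.execute("INSERT INTO customer VALUES ('Okafor', '1985-11-02', 'c@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```

Neither column is unique on its own here; only the pair is checked. In real data, two customers could share both a last name and a birth date, which is one reason a surrogate key is sometimes preferred.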
SQL supports many data types, from timestamp (representations of dates and times) to varchar (variable-length character strings). But not every data type is a good choice for primary keys.
It’s often recommended that columns containing numeric values, particularly integers, be used as primary keys since those are typically processed more quickly by relational database management systems.
Other keys used in database management include unique keys, candidate keys, super keys and foreign keys.
Primary keys are a subset of unique keys. Unique keys follow what’s known as a unique constraint—the rule that the values of a key are valid only if they are unique. While all primary keys are unique keys, not all unique keys are primary keys. That’s because, unlike primary keys, unique keys may include null values.
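A minimal sketch of the difference, assuming SQLite (through Python's sqlite3 module) and a hypothetical employee table. How unique keys treat nulls varies between database systems, so this shows one system's behavior rather than a general rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        id    INTEGER NOT NULL PRIMARY KEY,  -- primary key
        email TEXT UNIQUE                    -- unique key: nulls allowed here
    )
    """
)

# Two rows without an email are accepted by the unique constraint in SQLite.
conn.execute("INSERT INTO employee (id, email) VALUES (1, NULL)")
conn.execute("INSERT INTO employee (id, email) VALUES (2, NULL)")
conn.execute("INSERT INTO employee (id, email) VALUES (3, 'kim@example.com')")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_email_rejected = rejected(
    "INSERT INTO employee (id, email) VALUES (4, 'kim@example.com')"
)
duplicate_id_rejected = rejected(
    "INSERT INTO employee (id, email) VALUES (3, 'lee@example.com')"
)
null_email_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE email IS NULL"
).fetchone()[0]
print(duplicate_email_rejected, duplicate_id_rejected, null_email_count)  # True True 2
```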
Candidate keys are keys that could serve as primary keys because they contain unique values and no null values. The difference between primary keys and candidate keys is that an existing table can have multiple candidate keys, but only one primary key.
Super keys are similar to composite primary keys in that they may include multiple columns and be used to identify records. However, super keys might also include more columns or information than is strictly needed to uniquely identify a record, while composite primary keys do not include more columns or data than is necessary.
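The distinction between super keys and candidate keys can be sketched in plain Python. The helper names and sample rows below are hypothetical. Keys are properties of a table's design, so a check against sample data can only show that a column set is consistent with being a key, not prove it.

```python
from itertools import combinations

# Hypothetical sample rows; None would stand for a null value.
rows = [
    {"id": 1, "email": "ana@example.com", "last_name": "Silva"},
    {"id": 2, "email": "ben@example.com", "last_name": "Silva"},
    {"id": 3, "email": "cy@example.com", "last_name": "Tran"},
]


def is_super_key(rows, columns):
    """True if the column values are non-null and unique across all rows."""
    seen = set()
    for row in rows:
        values = tuple(row[c] for c in columns)
        if None in values or values in seen:
            return False
        seen.add(values)
    return True


def is_candidate_key(rows, columns):
    """True if columns form a super key and no proper subset also does."""
    if not is_super_key(rows, columns):
        return False
    return not any(
        is_super_key(rows, subset)
        for size in range(1, len(columns))
        for subset in combinations(columns, size)
    )


print(is_super_key(rows, ("id", "last_name")))      # True
print(is_candidate_key(rows, ("id", "last_name")))  # False: "id" alone suffices
print(is_candidate_key(rows, ("id",)))              # True
print(is_candidate_key(rows, ("email",)))           # True
print(is_super_key(rows, ("last_name",)))           # False: "Silva" repeats
```

Here both id and email qualify as candidate keys. A designer would pick one of them as the primary key.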
A foreign key in one table refers to a specific primary key in another table in order to define a relationship between the tables. For instance, a foreign key in a table with customer order information may refer to a primary key column of unique customer IDs that’s part of another table containing customer data. This way, an order by a given customer may be linked to important information about that customer, such as their email address and date of birth.
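The customer and order relationship described above might look like the following sketch, run through Python's sqlite3 module. The table names, column names and sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL,
        birth_date  TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),  -- foreign key
        item        TEXT NOT NULL
    );
    INSERT INTO customer VALUES (7, 'dana@example.com', '1988-03-14');
    INSERT INTO customer_order VALUES (100, 7, 'keyboard');
    """
)

# Follow the foreign key from the order back to the customer's details.
order_with_customer = conn.execute(
    """
    SELECT o.order_id, o.item, c.email, c.birth_date
    FROM customer_order AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    WHERE o.order_id = 100
    """
).fetchone()
print(order_with_customer)  # (100, 'keyboard', 'dana@example.com', '1988-03-14')
```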
To understand how primary keys and foreign keys define relationships between tables in relational databases, it helps to consider referential integrity constraints.
Referential integrity constraints, also known as referential constraints or foreign key constraints, are rules requiring that foreign key values in one table match the primary key values in another table.
For example, in an employee database, a referential integrity constraint may specify that each employee in the EMPLOYEE table must be a member of an existing company department listed in the DEPARTMENT table.
In this case, the DEPARTMENT table’s primary key is a column storing unique department numbers, while the EMPLOYEE table’s foreign key is a column storing the same unique department numbers. These matching columns mean the tables adhere to the foreign key constraint.
Since the DEPARTMENT table is home to the primary key referenced by the EMPLOYEE table’s foreign key, the EMPLOYEE table is considered dependent on the DEPARTMENT table. In turn, the DEPARTMENT table is considered the “parent table” in this table relationship.
Referential integrity constraints can block erroneous data insertions into databases. Such constraints, for instance, would prevent a user from adding an employee record to the EMPLOYEE table (as pictured below) if the department number in that employee record did not exist in the DEPARTMENT table.
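A minimal sketch of such a blocked insertion, assuming SQLite through Python's sqlite3 module rather than Db2. The column names (DEPTNO, WORKDEPT and so on) and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript(
    """
    CREATE TABLE DEPARTMENT (
        DEPTNO   TEXT NOT NULL PRIMARY KEY,
        DEPTNAME TEXT NOT NULL
    );
    CREATE TABLE EMPLOYEE (
        EMPNO    TEXT NOT NULL PRIMARY KEY,
        LASTNAME TEXT NOT NULL,
        WORKDEPT TEXT NOT NULL REFERENCES DEPARTMENT (DEPTNO)  -- foreign key
    );
    INSERT INTO DEPARTMENT VALUES ('A00', 'Planning');
    """
)

# Department A00 exists in the parent table, so this row is accepted.
conn.execute("INSERT INTO EMPLOYEE VALUES ('000010', 'Haas', 'A00')")

try:
    # Department Z99 does not exist in DEPARTMENT, so the row is refused.
    conn.execute("INSERT INTO EMPLOYEE VALUES ('000020', 'Ruiz', 'Z99')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
print(orphan_rejected, employee_count)  # True 1
```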
Relational database management systems may create or require the creation of a unique index to correspond to each primary key in the database. In the context of database design, an index is a set of pointers that refer to rows in a table. Indexes can be used to improve performance, including the optimization of data retrieval.
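One way to observe an index backing a primary key is sketched below with SQLite through Python's sqlite3 module. SQLite creates the index automatically for this kind of key. Other systems, Db2 included, may behave differently, and the DEPARTMENT columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE DEPARTMENT (
        DEPTNO   TEXT NOT NULL PRIMARY KEY,
        DEPTNAME TEXT NOT NULL
    )
    """
)

# PRAGMA index_list rows: (seq, name, unique, origin, partial).
# An origin of "pk" marks an index created to enforce the primary key.
indexes = conn.execute("PRAGMA index_list('DEPARTMENT')").fetchall()
pk_indexes = [
    (name, bool(unique))
    for _, name, unique, origin, _ in indexes
    if origin == "pk"
]
print(pk_indexes)
```

The listing should show a single unique index that the database created itself, without an explicit CREATE INDEX statement.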
Data modeling provides a visual representation of a database schema, or a blueprint for data relationships among tables in a database. Data models can include information on the primary keys and foreign keys that are used to maintain those relationships.
SQL statements are commands used to interact with relational databases. Under SQL syntax, primary keys for tables may be assigned or added using CREATE TABLE or ALTER TABLE statements.
For example, consider a CREATE TABLE statement for a table of employee names called EMP using IBM’s Db2. The column names are ID (for employee ID), and FIRSTNME and LASTNAME (with 15-character maximums for each). When ID is selected as the primary key, the statement could look like this:
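A sketch of the statement described above follows. The SMALLINT type for ID is an assumption, since the text specifies only the 15-character limits. The statement is run here through Python's sqlite3 module, not Db2, simply to confirm that it parses and that ID ends up as the primary key.

```python
import sqlite3

# Assumption: SMALLINT for ID; the text gives only the VARCHAR(15) limits.
create_emp = """
CREATE TABLE EMP (
    ID       SMALLINT NOT NULL,
    FIRSTNME VARCHAR(15),
    LASTNAME VARCHAR(15),
    PRIMARY KEY (ID)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(create_emp)

# PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk).
# A non-zero pk value marks a column that belongs to the primary key.
primary_key_columns = [
    name
    for _, name, _, _, _, pk in conn.execute("PRAGMA table_info('EMP')")
    if pk
]
print(primary_key_columns)  # ['ID']
```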
In ALTER TABLE statements (for existing tables) in Db2, a clause used to add a primary key is ADD PRIMARY KEY, while foreign keys are added through ADD CONSTRAINT in combination with a reference to the parent table.
