Natural keys

In data warehouse tables, natural keys are meaningful values that identify records, such as social security numbers that identify specific customers, calendar dates in a time dimension, or SKU numbers in a product dimension.
In some cases, natural keys are unique identifiers and can serve as primary keys. Dates in a time dimension are reliable in this way. For example, the following records contain dates as unique natural keys:
 2000-03-30 TH 14 MAR Q1_2000 2000
 2000-03-31 FR 14 MAR Q1_2000 2000

However, some natural keys are not durable enough to serve as primary keys. For example, if you have a large customer table that maintains the history of your customers' purchasing activity, that table might contain several rows that reference the same natural customer ID at different points in time. You must use a meaningless generated value, or surrogate key, to uniquely identify these rows.

Another reason for using surrogate keys to uniquely identify rows is to reduce the size or complexity of primary key columns. For example, a character or datetime key column might be effective for a 3,000-row dimension table, but long strings become unwieldy when they participate, as foreign keys, in the primary key of a billion-row fact table. Fact table loads and query processing both benefit from the use of relatively small numeric key values. For example, here are the date records from the previous example with a numeric surrogate key generated for each date:
 820 2000-03-30 TH 14 MAR Q1_2000 2000
 821 2000-03-31 FR 14 MAR Q1_2000 2000


Feedback | Information roadmap