Normalization in database design

Normalization helps you avoid redundancies and inconsistencies in your data. There are several forms of normalization.

After you define entities and decide on attributes for the entities, you normalize entities to avoid redundancy. An entity is normalized if it meets a set of constraints for a particular normal form, which this information describes. Entities can be in first, second, third, and fourth normal forms, each of which has certain rules that are associated with it. In some cases, you follow these rules, and in other cases, you do not follow them.

The rules for normal form are cumulative. In other words, for an entity to satisfy the rules of second normal form, it also must satisfy the rules of first normal form. An entity that satisfies the rules of fourth normal form also satisfies the rules of first, second, and third normal form.

In the context of logical data modeling, an instance is one particular occurrence. An instance of an entity is a set of data values for all the attributes that correspond to that entity.

For example, the following figure shows one instance of the EMPLOYEE entity.
Figure 1. One instance of an entity
Begin figure description. This figure shows one instance of an entity. End figure description.
First normal form
A relational entity satisfies the requirement of first normal form if every instance of the entity contains only one value, but never multiple repeating attributes.

Repeating attributes, often called a repeating group, are different attributes that are inherently the same. In an entity that satisfies the requirement of first normal form, each attribute is independent and unique in its meaning and its name. For example, assume that an entity contains the following attributes:

EMPLOYEE_NUMBER
JANUARY_SALARY_AMOUNT
FEBRUARY_SALARY_AMOUNT
MARCH_SALARY_AMOUNT

This situation violates the requirement of first normal form, because JANUARY_SALARY_AMOUNT, FEBRUARY_SALARY_AMOUNT, and MARCH_SALARY_AMOUNT are essentially the same attribute, EMPLOYEE_MONTHLY_SALARY_AMOUNT.

Second normal form
An entity is in second normal form if each attribute that is not in the primary key provides a fact that depends on the entire key. A violation of the second normal form occurs when a non-primary key attribute is a fact about a subset of a composite key. For example, an inventory entity records quantities of specific parts that are stored at particular warehouses. The following figure shows the attributes of the inventory entity.
Figure 2. A primary key that violates second normal form
Begin figure description. This figure shows a primary key that violates the second normal form. End figure description.

Here, the primary key consists of the PART and the WAREHOUSE attributes together. Because the attribute WAREHOUSE_ADDRESS depends only on the value of WAREHOUSE, the entity violates the rule for second normal form. This design causes several problems:

  • Each instance for a part that this warehouse stores repeats the address of the warehouse.
  • If the address of the warehouse changes, every instance referring to a part that is stored in that warehouse must be updated.
  • Because of the redundancy, the data might become inconsistent. Different instances could show different addresses for the same warehouse.
  • If at any time the warehouse has no stored parts, the address of the warehouse might not exist in any instances in the entity.

To satisfy second normal form, the information in the figure above would be in two entities, as the following figure shows.

Figure 3. Two entities that satisfy second normal form
Begin figure description. This figure shows two entities that satisfy the second normal form. End figure description.
Third normal form
An entity is in third normal form if each non-primary key attribute provides a fact that is independent of other non-key attributes and depends only on the key.

A violation of the third normal form occurs when a non-primary attribute is a fact about another non-key attribute. For example, the first entity in the following figure contains the attributes EMPLOYEE_NUMBER and DEPARTMENT_NUMBER. Suppose that a program or user adds an attribute, DEPARTMENT_NAME, to the entity. The new attribute depends on DEPARTMENT_NUMBER, whereas the primary key is on the EMPLOYEE_NUMBER attribute. The entity now violates third normal form.

Changing the DEPARTMENT_NAME value based on the update of a single employee, David Brown, does not change the DEPARTMENT_NAME value for other employees in that department. The updated version of the entity in the following figure illustrates the resulting inconsistency. Additionally, updating the DEPARTMENT_NAME in this table does not update it in any other table that might contain a DEPARTMENT_NAME column.

Figure 4. The update of an unnormalized entity. Information in the entity has become inconsistent.
Begin figure description. This figure shows an unnormalized entity. End figure description.

You can normalize the entity by modifying the EMPLOYEE_DEPARTMENT entity and creating two new entities: EMPLOYEE and DEPARTMENT. The following figure shows the new entities. The DEPARTMENT entity contains attributes for DEPARTMENT_NUMBER and DEPARTMENT_NAME. Now, an update such as changing a department name is much easier. You need to make the update only to the DEPARTMENT entity.

Figure 5. Normalized entities: EMPLOYEE, DEPARTMENT, and EMPLOYEE_DEPARTMENT
Begin figure description. This figure shows a normalized entity. End figure description.
Fourth normal form
An entity is in fourth normal form if no instance contains two or more independent, multivalued facts about an entity.

For example, Consider the EMPLOYEE entity. Each instance of EMPLOYEE could have both SKILL_CODE and LANGUAGE_CODE. An employee can have several skills and know several languages. Two relationships exist, one between employees and skills, and one between employees and languages. An entity is not in fourth normal form if it represents both relationships, as the following figure shows.

Figure 6. An entity that violates fourth normal form
Begin figure description. This figure shows an entity that violates the fourth normal form. End figure description.

Instead, you can avoid this violation by creating two entities that represent both relationships, as the following figure shows.

Figure 7. Entities that are in fourth normal form
Begin figure description. This figure shows entities that are in the fourth normal form. End figure description.

If, however, the facts are interdependent (that is, the employee applies certain languages only to certain skills) you should not split the entity.

You can put any data into fourth normal form. A good rule to follow when doing logical database design is to arrange all the data in entities that are in fourth normal form. Then decide whether the result gives you an acceptable level of performance. If the performance is not acceptable, denormalizing your design is a good approach to improving performance.