Entity normalization

After you define entities and decide on attributes for the entities, you normalize entities to avoid redundancy.

An entity is normalized if it meets a set of constraints for a particular normal form, which this section describes. Normalization helps you avoid redundancies and inconsistencies in your data. This section summarizes rules for first, second, third, and fourth normal forms of entities, and it describes reasons why you should or shouldn't follow these rules.

The following are the normal forms of an entity:

The rules for normal form are cumulative. In other words, for an entity to satisfy the rules of second normal form, it also must satisfy the rules of first normal form. An entity that satisfies the rules of fourth normal form also satisfies the rules of first, second, and third normal form.

In the context of logical data modeling, an instance is one particular occurrence. An instance of an entity is a set of data values for all of the attributes that correspond to that entity.

Example: The following figure shows one instance of the EMPLOYEE entity.
Figure 1. The EMPLOYEE entity
Begin figure description. One row in the Employee table. End figure description.

First normal form

A relational entity satisfies the requirement of first normal form if every instance of an entity contains only one value, never multiple repeating attributes. Repeating attributes, often called a repeating group, are different attributes that are inherently the same. In an entity that satisfies the requirement of first normal form, each attribute is independent and unique in its meaning and its name.

Example: Assume that an entity contains the following attributes:

EMPLOYEE_NUMBER
JANUARY_SALARY_AMOUNT
FEBRUARY_SALARY_AMOUNT
MARCH_SALARY_AMOUNT

This situation violates the requirement of first normal form, because JANUARY_SALARY_AMOUNT, FEBRUARY_SALARY_AMOUNT, and MARCH_SALARY_AMOUNT are essentially the same attribute, EMPLOYEE_ MONTHLY_SALARY_AMOUNT.

Second normal form

An entity is in second normal form if each attribute that is not in the primary key provides a fact that depends on the entire key. A violation of the second normal form occurs when a nonprimary key attribute is a fact about a subset of a composite key.

Example: An inventory entity records quantities of specific parts that are stored at particular warehouses. The following figure shows the attributes of the inventory entity.

Figure 2. Entity in violation of the second normal form
Begin figure description. Two columns in a four-column table form the key. End figure description.

Here, the primary key consists of the PART and the WAREHOUSE attributes together. Because the attribute WAREHOUSE_ADDRESS depends only on the value of WAREHOUSE, the entity violates the rule for second normal form. This design causes several problems:

  • Each instance for a part that this warehouse stores repeats the address of the warehouse.
  • If the address of the warehouse changes, every instance referring to a part that is stored in that warehouse must be updated.
  • Because of the redundancy, the data might become inconsistent. Different instances could show different addresses for the same warehouse.
  • If at any time the warehouse has no stored parts, the address of the warehouse might not exist in any instances in the entity.

To satisfy second normal form, the information in the previous figure would be in two entities, as the following figure shows.

Figure 3. Entities that satisfy the second normal form
Begin figure description. Division of preceding example into two entities. End figure description.

Third normal form

An entity is in third normal form if each nonprimary key attribute provides a fact that is independent of other non-key attributes and depends only on the key. A violation of the third normal form occurs when a nonprimary attribute is a fact about another non-key attribute.

Example: The first entity contains the attributes EMPLOYEE_NUMBER and DEPARTMENT_NUMBER. Suppose that a program or user adds an attribute, DEPARTMENT_NAME, to the entity. The new attribute depends on DEPARTMENT_NUMBER, whereas the primary key is on the EMPLOYEE_NUMBER attribute. The entity now violates third normal form.

Changing the DEPARTMENT_NAME value based on the update of a single employee, David Brown, does not change the DEPARTMENT_NAME value for other employees in that department. The updated version of the entity as shown in the previous figure illustrates the resulting inconsistency. Additionally, updating the DEPARTMENT_ NAME in this table does not update it in any other table that might contain a DEPARTMENT_NAME column.

Figure 4. Results of an update in a table that violates the third normal form
Begin figure description. Two versions of the same table, before and after an update. End figure description.

You can normalize the entity by modifying the EMPLOYEE_DEPARTMENT entity and creating two new entities: EMPLOYEE and DEPARTMENT. The following figure shows the new entities. The DEPARTMENT entity contains attributes for DEPARTMENT_NUMBER and DEPARTMENT_NAME. Now, an update such as changing a department name is much easier. You need to make the update only to the DEPARTMENT entity.

Figure 5. Employee and department entities that satisfy the third normal form
Begin figure description. Three tables solve the problem described in the previous figure. End figure description.

Fourth normal form

An entity is in fourth normal form if no instance contains two or more independent, multivalued facts about an entity.

Example: Consider the EMPLOYEE entity. Each instance of EMPLOYEE could have both SKILL_CODE and LANGUAGE_CODE. An employee can have several skills and know several languages. Two relationships exist, one between employees and skills, and one between employees and languages. An entity is not in fourth normal form if it represents both relationships, as the previous figure shows.

Figure 6. Entity in violation of the fourth normal form
Begin figure description. Three columns in a five-column table form the key. End figure description.

Instead, you can avoid this violation by creating two entities that represent both relationships, as the following figure shows.

Figure 7. Entities that satisfy the fourth normal form
Begin figure description. Two tables solve the problem described in the previous figure. End figure description.

If, however, the facts are interdependent (that is, the employee applies certain languages only to certain skills), you should not split the entity.

You can put any data into fourth normal form. A good rule to follow when doing logical database design is to arrange all the data in entities that are in fourth normal form. Then decide whether the result gives you an acceptable level of performance. If the performance is not acceptable, denormalizing your design is a good approach to improving performance.