Normalization in database design
Normalization helps you avoid redundancies and inconsistencies in your data. There are several forms of normalization.
After you define entities and decide on attributes for the entities, you normalize entities to avoid redundancy. An entity is normalized if it meets a set of constraints for a particular normal form, which this information describes. Entities can be in first, second, third, and fourth normal forms, each of which has certain rules that are associated with it. In some cases, you follow these rules, and in other cases, you do not follow them.
The rules for normal form are cumulative. In other words, for an entity to satisfy the rules of second normal form, it also must satisfy the rules of first normal form. An entity that satisfies the rules of fourth normal form also satisfies the rules of first, second, and third normal form.
In the context of logical data modeling, an instance is one particular occurrence. An instance of an entity is a set of data values for all the attributes that correspond to that entity.
- First normal form
- A relational entity satisfies the requirement of first normal form if every instance of the entity contains only one value, but never multiple repeating attributes.
Repeating attributes, often called a repeating group, are different attributes that are inherently the same. In an entity that satisfies the requirement of first normal form, each attribute is independent and unique in its meaning and its name. For example, assume that an entity contains the following attributes:
EMPLOYEE_NUMBER JANUARY_SALARY_AMOUNT FEBRUARY_SALARY_AMOUNT MARCH_SALARY_AMOUNT
This situation violates the requirement of first normal form, because JANUARY_SALARY_AMOUNT, FEBRUARY_SALARY_AMOUNT, and MARCH_SALARY_AMOUNT are essentially the same attribute, EMPLOYEE_MONTHLY_SALARY_AMOUNT.
- Second normal form
- An entity is in second normal form if each attribute that is not in the primary key provides a fact that depends on the entire key. A violation of the second normal form occurs when a non-primary key attribute is a fact about a subset of a composite key. For example, an inventory entity records quantities of specific parts that are stored at particular warehouses. The following figure shows the attributes of the inventory entity.
Here, the primary key consists of the PART and the WAREHOUSE attributes together. Because the attribute WAREHOUSE_ADDRESS depends only on the value of WAREHOUSE, the entity violates the rule for second normal form. This design causes several problems:
- Each instance for a part that this warehouse stores repeats the address of the warehouse.
- If the address of the warehouse changes, every instance referring to a part that is stored in that warehouse must be updated.
- Because of the redundancy, the data might become inconsistent. Different instances could show different addresses for the same warehouse.
- If at any time the warehouse has no stored parts, the address of the warehouse might not exist in any instances in the entity.
To satisfy second normal form, the information in the figure above would be in two entities, as the following figure shows.
- Third normal form
- An entity is in third normal form if each non-primary key attribute provides a fact
that is independent of other non-key attributes and depends only on the key.
A violation of the third normal form occurs when a non-primary attribute is a fact about another non-key attribute. For example, the first entity in the following figure contains the attributes EMPLOYEE_NUMBER and DEPARTMENT_NUMBER. Suppose that a program or user adds an attribute, DEPARTMENT_NAME, to the entity. The new attribute depends on DEPARTMENT_NUMBER, whereas the primary key is on the EMPLOYEE_NUMBER attribute. The entity now violates third normal form.
Changing the DEPARTMENT_NAME value based on the update of a single employee, David Brown, does not change the DEPARTMENT_NAME value for other employees in that department. The updated version of the entity in the following figure illustrates the resulting inconsistency. Additionally, updating the DEPARTMENT_NAME in this table does not update it in any other table that might contain a DEPARTMENT_NAME column.
You can normalize the entity by modifying the EMPLOYEE_DEPARTMENT entity and creating two new entities: EMPLOYEE and DEPARTMENT. The following figure shows the new entities. The DEPARTMENT entity contains attributes for DEPARTMENT_NUMBER and DEPARTMENT_NAME. Now, an update such as changing a department name is much easier. You need to make the update only to the DEPARTMENT entity.
- Fourth normal form
- An entity is in fourth normal form if no instance contains two or more independent, multivalued
facts about an entity.
For example, Consider the EMPLOYEE entity. Each instance of EMPLOYEE could have both SKILL_CODE and LANGUAGE_CODE. An employee can have several skills and know several languages. Two relationships exist, one between employees and skills, and one between employees and languages. An entity is not in fourth normal form if it represents both relationships, as the following figure shows.
Instead, you can avoid this violation by creating two entities that represent both relationships, as the following figure shows.
If, however, the facts are interdependent (that is, the employee applies certain languages only to certain skills) you should not split the entity.
You can put any data into fourth normal form. A good rule to follow when doing logical database design is to arrange all the data in entities that are in fourth normal form. Then decide whether the result gives you an acceptable level of performance. If the performance is not acceptable, denormalizing your design is a good approach to improving performance.