Data modeling

Logical data modeling is the process of documenting the comprehensive business information requirements in an accurate and consistent format.

Designing and implementing a successful database, one that satisfies the needs of an organization, requires a logical data model. Analysts who do data modeling define the data items and the business rules that affect those data items. The process of data modeling acknowledges that business data is a vital asset that the organization needs to understand and carefully manage.

Consider the following business facts that a manufacturing company needs to represent in its data model:

  • Customers purchase products.
  • Products consist of parts.
  • Suppliers manufacture parts.
  • Warehouses store parts.
  • Transportation vehicles move the parts from suppliers to warehouses and then to manufacturers.

These are all business facts that a manufacturing company's logical data model needs to include. Many people inside and outside the company rely on information that is based on these facts. Many reports include data about these facts.

Any business, not just manufacturing companies, can benefit from the task of data modeling. Database systems that supply information to decision makers, customers, suppliers, and others are more successful if their foundation is a sound data model.

An overview of the data modeling process

You might wonder how people build data models. Data analysts can perform the task of data modeling in a variety of ways. (This process assumes that a data analyst is performing the steps, but some companies assign this task to other people in the organization.) Many data analysts follow these steps:

  1. Build critical user views.

    Analysts begin building a logical data model by carefully examining a single business activity or function. They develop a user view, which is the model or representation of critical information that the business activity requires. (In a later stage, the analyst combines each individual user view with all the other user views into a consolidated logical data model.) This initial stage of the data modeling process is highly interactive. Because data analysts cannot fully understand all areas of the business that they are modeling, they work closely with the actual users. Working together, analysts and users define the major entities (significant objects of interest) and determine the general relationships between these entities.

  2. Add key business rules to user views.

    Next, analysts add key detailed information items and the most important business rules. Key business rules affect insert, update, and delete operations on the data.

    For example, a business rule might require that each customer entity have at least one unique identifier. Any attempt to insert or update a customer identifier that matches another customer identifier is not valid. In a data model, a unique identifier is called a primary key.
  3. Add detail to user views and validate them.

    After the analysts work with users to define the key entities and relationships, they add other descriptive details that are less vital. They also associate these descriptive details, called attributes, to the entities.

    For example, a customer entity probably has an associated phone number. The phone number is a non-key attribute of the customer entity.

    Analysts also validate all the user views that they have developed. To validate the views, analysts use the normalization process and process models. Process models document the details of how the business will use the data. You can read more about process models and data models in other books on those subjects.

  4. Determine additional business rules that affect attributes.

    Next, analysts clarify the data-driven business rules. Data-driven business rules are constraints on particular data values. These constraints need to be true, regardless of any particular processing requirements. Analysts define these constraints during the data design stage, rather than during application design. The advantage to defining data-driven business rules is that programmers of many applications don't need to write code to enforce these business rules.

    For example, assume that a business rule requires that a customer entity have either a phone number or an address, or both. If this rule doesn't apply to the data itself, programmers must develop, test, and maintain applications that verify the existence of one of these attributes.

    Data-driven business requirements have a direct relationship with the data, thereby relieving programmers from extra work.

  5. Integrate user views.

    In this last phase of data modeling, analysts combine the different user views that they have built into a consolidated logical data model. If other data models already exist in the organization, the analysts integrate the new data model with the existing one. At this stage, analysts also strive to make their data model flexible so that it can support the current business environment and possible future changes.

    For example, assume that a retail company operates in a single country and that business plans include expansion to other countries. Armed with knowledge of these plans, analysts can build the model so that it is flexible enough to support expansion into other countries.

Recommendations for logical data modeling

To build sound data models, analysts follow a well-planned methodology, which includes:

  • Working interactively with the users as much as possible.
  • Using diagrams to represent as much of the logical data model as possible.
  • Building a data dictionary to supplement the logical data model diagrams. (A data dictionary is a repository of information about an organization's application programs, databases, logical data models, users, and authorizations. A data dictionary can be manual or automated.)

Data modeling: Some practical examples

To perform the data modeling task, you begin by defining your entities, the significant objects of interest. Entities are the things about which you want to store information. For example, you might want to define an entity for employees called EMPLOYEE because you need to store information about everyone who works for your organization. You might also define an entity, called DEPARTMENT, for departments.

Next, you define primary keys for your entities. A primary key is a unique identifier for an entity. In the case of the EMPLOYEE entity, you probably need to store lots of information. However, most of this information (such as gender, birth date, address, and hire date) would not be a good choice for the primary key. In this case, you could choose a unique employee ID or number (EMPLOYEE_NUMBER) as the primary key. In the case of the DEPARTMENT entity, you could use a unique department number (DEPARTMENT_NUMBER) as the primary key.

After you decide on the entities and their primary keys, you can define the relationships that exist between the entities. The relationships are based on the primary keys. If you have an entity for EMPLOYEE and another entity for DEPARTMENT, the relationship that exists is that employees are assigned to departments.

After you define the entities, their primary keys, and their relationships, you can define additional attributes for the entities. In the case of the EMPLOYEE entity, you might define the following additional attributes:

  • Birth date
  • Hire date
  • Home address
  • Office phone number
  • Gender
  • Resume

You can read more about defining attributes later in this information.

Finally, you normalize the data.