Entity attributes in database design

When you define attributes for entities, you generally work with the data administrator to decide on names, data types, and appropriate values for the attributes.

Naming conventions for entity attributes

Most organizations have naming conventions. In addition to following these conventions, data administrators also base attribute definitions on class words. A class word is a single word that indicates the nature of the data that the attribute represents.

For example, the class word NUMBER indicates an attribute that identifies the number of an entity. Attribute names that identify the numbers of entities should therefore include the class word of NUMBER. Some examples are EMPLOYEE_NUMBER, PROJECT_NUMBER, and DEPARTMENT_NUMBER.

When an organization does not have well-defined guidelines for attribute names, the data administrators try to determine how the database designers have historically named attributes. Problems occur when multiple individuals are inventing their own naming schemes without consulting each other.

Data types for entity attributes

A data type must be specified for each attribute. Most organizations have well-defined guidelines for using the different data types. Here is an overview of the main data types that you can use for the attributes of your entities.

String data types
Data that contains a combination of letters, numbers, and special characters. String data types are listed below:
  • CHARACTER: Fixed-length character strings. The common short name for this data type is CHAR.
  • VARCHAR: Varying-length character strings.
  • CLOB: Varying-length character large object strings, typically used when a character string might exceed the limits of the VARCHAR data type.
  • GRAPHIC: Fixed-length graphic strings that contain double-byte characters.
  • VARGRAPHIC: Varying-length graphic strings that contain double-byte characters.
  • DBCLOB: Varying-length strings of double-byte characters in a large object.
  • BINARY: A sequence of bytes that is not associated with a code page.
  • VARBINARY: Varying-length binary strings.
  • BLOB: Varying-length binary strings in a large object.
  • XML: Varying-length string that is an internal representation of XML.
Numeric data types
Data that contains digits. Numeric data types are listed below:
  • SMALLINT: for small integers.
  • INTEGER: for large integers.
  • BIGINT: for bigger values.
  • DECIMAL(p,s) or NUMERIC(p,s), where p is precision and s is scale: for packed decimal numbers with precision p and scale s. Precision is the total number of digits, and scale is the number of digits to the right of the decimal point.
  • DECFLOAT: for decimal floating-point numbers.
  • REAL: for single-precision floating-point numbers.
  • DOUBLE: for double-precision floating-point numbers.
Datetime data types
Data values that represent dates, times, or timestamps. Datetime data types are listed below:
  • DATE: Dates with a three-part value that represents a year, month, and day.
  • TIME: Times with a three-part value that represents a time of day in hours, minutes, and seconds.
  • TIMESTAMP: Timestamps with a seven-part value that represents a date and time by year, month, day, hour, minute, second, and microsecond.
Examples
You might use the following data types for attributes of the EMPLOYEE entity:
  • EMPLOYEE_NUMBER: CHAR(6)
  • EMPLOYEE_LAST_NAME: VARCHAR(15)
  • EMPLOYEE_HIRE_DATE: DATE
  • EMPLOYEE_SALARY_AMOUNT: DECIMAL(9,2)

The data types that you choose are business definitions of the data type. During physical database design you might need to change data type definitions or use a subset of these data types. The database or the host language might not support all of these definitions, or you might make a different choice for performance reasons.

For example, you might need to represent monetary amounts, but Db2 and many host languages do not have a data type MONEY. In the United States, a natural choice for the SQL data type in this situation is DECIMAL(10,2) to represent dollars. But you might also consider the INTEGER data type for fast, efficient performance.

Values for key attributes

When you design a database, you must decide what values are acceptable for the various attributes of an entity. For example, you would not want to allow numeric data in an attribute for a person's name. The data types that you choose limit the values that apply to a given attribute, but you can also use other mechanisms. These other mechanisms are domains, null values, and default values.

Domains
A domain describes the conditions that an attribute value must meet to be a valid value. Sometimes the domain identifies a range of valid values. By defining the domain for a particular attribute, you apply business rules to ensure that the data makes sense. For example:
  • A domain might state that a phone number attribute must be a 10-digit value that contains only numbers. You would not want the phone number to be incomplete, nor would you want it to contain alphabetic or special characters and thereby be invalid. You could choose to use either a numeric data type or a character data type. However, the domain states the business rule that the value must be a 10-digit value that consists of numbers. Before finalizing this rule, consider if you have a need for international phone numbers, which have different formats.
  • A domain might state that a month attribute must be a 2-digit value in the range 01–12. Again, you could choose to use datetime, character, or numeric data types for this value, but the domain demands that the value must be in the range of 01 through 12. In this case, incorporating the month into a datetime data type is probably the best choice. This decision should be reviewed again during physical database design.
Null values
A null value is a special indicator that represents the absence of a value. When you are designing attributes for your entities, you will sometimes find that an attribute does not have a value for every instance of the entity. For example, you might want an attribute for a person's middle name, but you can't require a value because some people have no middle name. For these occasions, you can define the attribute so that it can contain null values.

The value can be absent because it is unknown, not yet supplied, or nonexistent. The DBMS treats the null value as an actual value, not as a zero value, a blank, or an empty string.

Just as some attributes should be allowed to contain null values, other attributes should not contain null values.

For example, For the EMPLOYEE entity, you might not want to allow the attribute EMPLOYEE_LAST_NAME to contain a null value.

Default values
A default value is a value that applies to an attribute if no other valid value is available. In some cases, you might not want a specific attribute to contain a null value, but you don't want to require that the user or program always provide a value. In this case, a default value might be appropriate. For example, assume that you don't want the EMPLOYEE_HIRE_DATE attribute to contain null values and that you don't want to require users to provide this data. If data about new employees is generally added to the database on the employee's first day of employment, you could define a default value of the current date.