Grouping data elements into hierarchies

The data elements that describe a data aggregate, the student, might be represented by the descriptive names STUSEQ#, STUNAME, CUST, LOCTN, STATUS, ABSENCE, and GRADE. We call this group of data elements the student data aggregate.

Data elements have values and names. In the student data elements example, the values are a particular student's sequence number, the student's name, company, company location, the student's status in the class, the student's absences, and grade. The names of the data aggregate are not unique—they describe all the students in the class in the same terms. The combined values, however, of a data aggregate occurrence are unique. No two students can have the same values in each of these fields.

As you group data elements into data aggregates and data structures, look at the data elements that make up each group and choose one or more data elements that uniquely identify that group. This is the data aggregate's controlling key, which is the data element or group of data elements in the aggregate that uniquely identifies the aggregate. Sometimes you must use more than one data element for the key in order to uniquely identify the aggregate.

By following the three steps explained in this topic, you can develop a conceptual data structure for a business process's data. However, you are not developing the logical data structure for the program that performs the business process. The three steps are:

  1. Separate repeating data elements in a single occurrence of the data aggregate.
  2. Separate duplicate values in multiple occurrences of the data aggregate.
  3. Group each data element with its controlling keys.

Step 1. separating repeating data elements

Look at a single occurrence of the data aggregate. The following table shows what this looks like for the class aggregate; the data element is listed with the class aggregate occurrence.

Table 1. Single occurrence of class aggregate
Data element Class aggregate occurrence
EDCNTR CHICAGO
DATE(START) 1/14/96
CRSNAME TRANSISTOR THEORY
CRS CODE 41837
LENGTH 10 DAYS
INSTRS multiple
STUSEQ# multiple
STUNAME multiple
CUST multiple
LOCTN multiple
STATUS multiple
ABSENCE multiple
GRADE multiple

The data elements defined as multiple are the elements that repeat. Separate the repeating data elements by shifting them to a lower level. Keep data elements with their controlling keys.

The data elements that repeat for a single class are: STUSEQ#, STUNAME, CUST, LOCTN, STATUS, ABSENCE, and GRADE. INSTRS is also a repeating data element, because some classes require two instructors, although this class requires only one.

When you separate repeating data elements into groups, you have the structure shown in the following figure.

In the following figure, the data elements in each box form an aggregate. The entire figure depicts a data structure. The data elements include the Course aggregate, the Student aggregate, and the Instructor aggregate.

The following figure shows these aggregates with the keys indicated with leading asterisks (*).

Figure 1. Current roster after step 1
Begin figure description. This figure is described in the surrounding text. End figure description.

The keys for the data aggregates are shown in the following table.

Table 2. Data aggregates and keys for current roster after step 1
Data aggregate Keys
Course aggregate EDCNTR, DATE, CRSCODE
Student aggregate EDCNTR, DATE, CRSCODE, STUSEQ#
Instructor aggregate EDCNTR, DATE, CRSCODE, INSTRS

The asterisks in the previous figure identify the key data elements. For the Class aggregate, it takes multiple data elements to identify the course, so you need multiple data elements to make up the key. The data elements that comprise the Class aggregate are:

  • Controlling key element, STUSEQ#
  • STUNAME
  • CUST
  • LOCTN
  • STATUS
  • ABSENCE
  • GRADE

The data elements that comprise the Instructor aggregate are:

  • Key element, INSTRS

The Course aggregate and the Instructor aggregate inherit the following keys from the root segment, Course aggregate:

  • EDCNTR
  • DATE
  • CRSCODE

After you have shifted repeating data elements, make sure that each element is in the same group as its controlling key. INSTRS is separated from the group of data elements describing a student because the information about instructors is unrelated to the information about the students. The student sequence number does not control who the instructor is.

In the example shown in the previous figure, the Student aggregate and Instructor aggregate are both dependents of the Course aggregate. A dependent aggregate's key includes the concatenated keys of all the aggregates above the dependent aggregate. This is because a dependent's controlling key does not mean anything if you do not know the keys of the higher aggregates. For example, if you knew that a student's sequence number was 4, you would be able to find out all the information about the student associated with that number. This number would be meaningless, however, if it were not associated with a particular course. But, because the key for the Student aggregate is made up of Ed Center, date, and course code, you can deduce which class the student is in.

Step 2. isolating duplicate aggregate values

Look at multiple occurrences of the aggregate—in this case, the values you might have for two classes. The following table shows multiple occurrences (2) of the same data elements. As you look at this table, check for duplicate values. Remember that both occurrences describe one course.

Table 3. Multiple occurrences of class aggregate
Data element list Occurrence 1 Occurrence 2
EDCNTR CHICAGO NEW YORK
DATE(START) 1/14/96 3/10/96
CRSNAME TRANS THEORY TRANS THEORY
CRSCODE 41837 41837
LENGTH 10 DAYS 10 DAYS
INSTRS multiple multiple
STUSEQ# multiple multiple
STUNAME multiple multiple
CUST multiple multiple
LOCTN multiple multiple
STATUS multiple multiple
ABSENCE multiple multiple
GRADE multiple multiple

The data elements defined as multiple are the data elements that repeat. The values in these elements are not the same. The aggregate is always unique for a particular class.

In this step, compare the two occurrences and shift the fields with duplicate values (TRANS THEORY and so on) to a higher level. If you need to, choose a controlling key for aggregates that do not yet have keys.

In the previous table, CRSNAME, CRSCODE, and LENGTH are the fields that have duplicate values. Much of this process is intuitive. Student status and grade, although they can have duplicate values, should not be separated because they are not meaningful values by themselves. These values would not be used to identify a particular student. This becomes clear when you remember to keep data elements with their controlling keys. When you separate duplicate values, you have the structure shown in the following figure.

Figure 2. Current roster after step 2
begin figure description. This figure is described in the surrounding text. end figure description.

Step 3. grouping data elements with their controlling keys

This step is often a check on the first two steps. (Sometimes the first two steps have already done what this step instructs you to do.)

At this stage, make sure that each data element is in the group that contains its controlling key. The data element should depend on the full key. If the data element depends only on part of the key, separate the data element along with the partial (controlling) key on which it depends.

In this example, CUST and LOCTN do not depend on the STUSEQ#. They are related to the student, but they do not depend on the student. They identify the company and company address of the student.

CUST and LOCTN are not dependent on the course, the Ed Center, or the date, either. They are separate from all of these things. Because a student is only associated with one CUST and LOCTN, but a CUST and LOCTN can have many students attending classes, the CUST and LOCTN aggregate should be above the student aggregate.

The following figure shows these aggregates and keys indicated with leading asterisks (*) and shows what the structure looks like when you separate CUST and LOCTN.

Figure 3. Current roster after step 3
Begin figure description. Data aggregates: Course points to Class. Class points to Student and Instructor. Customer/Location also points to Student. End figure description.

The keys for the data aggregates are shown in the following table.

Table 4. Data aggregates and keys for current roster after step 3
Data aggregate Keys
Course aggregate CRSCODE
Class aggregate CRSCODE, EDCNTR, DATE
Customer aggregate CUST, LOCTN
Student aggregate (when viewed from the customer aggregate in Current roster after step 3instead of from the course aggregate, in Current roster after step 2) CUST, LOCTN, STUSEQ, CRSCODE, EDCNTR, DATE
Instructor aggregate CRSCODE, EDCNTR, DATE, INSTRS

Deciding on the arrangement of the customer and location information is part of designing a database. Data structuring should separate any inconsistent data elements from the rest of the data elements.