Grouping data elements into hierarchies
The data elements that describe a data aggregate, the student, might be represented by the descriptive names STUSEQ#, STUNAME, CUST, LOCTN, STATUS, ABSENCE, and GRADE. We call this group of data elements the student data aggregate.
Data elements have values and names. In the student data elements example, the values are a particular student's sequence number, the student's name, company, company location, the student's status in the class, the student's absences, and grade. The names of the data aggregate are not unique—they describe all the students in the class in the same terms. The combined values, however, of a data aggregate occurrence are unique. No two students can have the same values in each of these fields.
As you group data elements into data aggregates and data structures, look at the data elements that make up each group and choose one or more data elements that uniquely identify that group. This is the data aggregate's controlling key, which is the data element or group of data elements in the aggregate that uniquely identifies the aggregate. Sometimes you must use more than one data element for the key in order to uniquely identify the aggregate.
By following the three steps explained in this topic, you can develop a conceptual data structure for a business process's data. However, you are not developing the logical data structure for the program that performs the business process. The three steps are:
- Separate repeating data elements in a single occurrence of the data aggregate.
- Separate duplicate values in multiple occurrences of the data aggregate.
- Group each data element with its controlling keys.
Step 1. separating repeating data elements
Look at a single occurrence of the data aggregate. The following table shows what this looks like for the class aggregate; the data element is listed with the class aggregate occurrence.
| Data element | Class aggregate occurrence |
|---|---|
| EDCNTR | CHICAGO |
| DATE(START) | 1/14/96 |
| CRSNAME | TRANSISTOR THEORY |
| CRS CODE | 41837 |
| LENGTH | 10 DAYS |
| INSTRS | multiple |
| STUSEQ# | multiple |
| STUNAME | multiple |
| CUST | multiple |
| LOCTN | multiple |
| STATUS | multiple |
| ABSENCE | multiple |
| GRADE | multiple |
The data elements defined as multiple are the elements that repeat. Separate the repeating data elements by shifting them to a lower level. Keep data elements with their controlling keys.
The data elements that repeat for a single class are: STUSEQ#, STUNAME, CUST, LOCTN, STATUS, ABSENCE, and GRADE. INSTRS is also a repeating data element, because some classes require two instructors, although this class requires only one.
When you separate repeating data elements into groups, you have the structure shown in the following figure.
In the following figure, the data elements in each box form an aggregate. The entire figure depicts a data structure. The data elements include the Course aggregate, the Student aggregate, and the Instructor aggregate.
The following figure shows these aggregates with the keys indicated with leading asterisks (*).

The keys for the data aggregates are shown in the following table.
| Data aggregate | Keys |
|---|---|
| Course aggregate | EDCNTR, DATE, CRSCODE |
| Student aggregate | EDCNTR, DATE, CRSCODE, STUSEQ# |
| Instructor aggregate | EDCNTR, DATE, CRSCODE, INSTRS |
The asterisks in the previous figure identify the key data elements. For the Class aggregate, it takes multiple data elements to identify the course, so you need multiple data elements to make up the key. The data elements that comprise the Class aggregate are:
- Controlling key element, STUSEQ#
- STUNAME
- CUST
- LOCTN
- STATUS
- ABSENCE
- GRADE
The data elements that comprise the Instructor aggregate are:
- Key element, INSTRS
The Course aggregate and the Instructor aggregate inherit the following keys from the root segment, Course aggregate:
- EDCNTR
- DATE
- CRSCODE
After you have shifted repeating data elements, make sure that each element is in the same group as its controlling key. INSTRS is separated from the group of data elements describing a student because the information about instructors is unrelated to the information about the students. The student sequence number does not control who the instructor is.
In the example shown in the previous figure, the Student aggregate and Instructor aggregate are both dependents of the Course aggregate. A dependent aggregate's key includes the concatenated keys of all the aggregates above the dependent aggregate. This is because a dependent's controlling key does not mean anything if you do not know the keys of the higher aggregates. For example, if you knew that a student's sequence number was 4, you would be able to find out all the information about the student associated with that number. This number would be meaningless, however, if it were not associated with a particular course. But, because the key for the Student aggregate is made up of Ed Center, date, and course code, you can deduce which class the student is in.
Step 2. isolating duplicate aggregate values
Look at multiple occurrences of the aggregate—in this case, the values you might have for two classes. The following table shows multiple occurrences (2) of the same data elements. As you look at this table, check for duplicate values. Remember that both occurrences describe one course.
| Data element list | Occurrence 1 | Occurrence 2 |
|---|---|---|
| EDCNTR | CHICAGO | NEW YORK |
| DATE(START) | 1/14/96 | 3/10/96 |
| CRSNAME | TRANS THEORY | TRANS THEORY |
| CRSCODE | 41837 | 41837 |
| LENGTH | 10 DAYS | 10 DAYS |
| INSTRS | multiple | multiple |
| STUSEQ# | multiple | multiple |
| STUNAME | multiple | multiple |
| CUST | multiple | multiple |
| LOCTN | multiple | multiple |
| STATUS | multiple | multiple |
| ABSENCE | multiple | multiple |
| GRADE | multiple | multiple |
The data elements defined as multiple are the data elements that repeat. The values in these elements are not the same. The aggregate is always unique for a particular class.
In this step, compare the two occurrences and shift the fields with duplicate values (TRANS THEORY and so on) to a higher level. If you need to, choose a controlling key for aggregates that do not yet have keys.
In the previous table, CRSNAME, CRSCODE, and LENGTH are the fields that have duplicate values. Much of this process is intuitive. Student status and grade, although they can have duplicate values, should not be separated because they are not meaningful values by themselves. These values would not be used to identify a particular student. This becomes clear when you remember to keep data elements with their controlling keys. When you separate duplicate values, you have the structure shown in the following figure.

Step 3. grouping data elements with their controlling keys
This step is often a check on the first two steps. (Sometimes the first two steps have already done what this step instructs you to do.)
At this stage, make sure that each data element is in the group that contains its controlling key. The data element should depend on the full key. If the data element depends only on part of the key, separate the data element along with the partial (controlling) key on which it depends.
In this example, CUST and LOCTN do not depend on the STUSEQ#. They are related to the student, but they do not depend on the student. They identify the company and company address of the student.
CUST and LOCTN are not dependent on the course, the Ed Center, or the date, either. They are separate from all of these things. Because a student is only associated with one CUST and LOCTN, but a CUST and LOCTN can have many students attending classes, the CUST and LOCTN aggregate should be above the student aggregate.
The following figure shows these aggregates and keys indicated with leading asterisks (*) and shows what the structure looks like when you separate CUST and LOCTN.

The keys for the data aggregates are shown in the following table.
| Data aggregate | Keys |
|---|---|
| Course aggregate | CRSCODE |
| Class aggregate | CRSCODE, EDCNTR, DATE |
| Customer aggregate | CUST, LOCTN |
| Student aggregate | (when viewed from the customer aggregate in Current roster after step 3instead of from the course aggregate, in Current roster after step 2) CUST, LOCTN, STUSEQ, CRSCODE, EDCNTR, DATE |
| Instructor aggregate | CRSCODE, EDCNTR, DATE, INSTRS |
Deciding on the arrangement of the customer and location information is part of designing a database. Data structuring should separate any inconsistent data elements from the rest of the data elements.