Use InfoSphere Data Architect to define and enforce data object naming standards

Data object naming standards promote a common understanding of data, sharing of data across organizational boundaries, and reduction of data redundancy through the consolidation of synonymous and overlapping data elements. As organizations search for greater efficiency, they are adopting naming standards in data management and have started to enforce them. This article examines the features of IBM InfoSphere Data Architect that enable users to define and implement object naming standards, and then demonstrates them with a real-world example.

[07 Oct 2010: This article was updated to reflect InfoSphere Data Architect 7.5.3 capabilities.--Ed.]


Wei Liu, Software Engineer, IBM

Wei Liu is a software engineer working at the IBM Seattle office in Seattle, Washington. She works on data tooling and modeling.



07 October 2010 (First published 11 January 2007)


Introduction

Organizations have recently recognized the importance of naming standards in data management and have started to enforce them. When a naming standard is enforced, data object names must use terms defined in a glossary and must follow the required term order. IBM InfoSphere Data Architect (IDA) helps users define their glossary and data object naming standards, which makes it easy to create data object names that comply with those standards. It also facilitates the transformation of logical model object names into physical model object names, and lets users validate their data objects to ensure compliance with the naming standards.

Product name change

On December 16, 2008, IBM announced that, as of Version 7.5.1, Rational Data Architect is renamed InfoSphere Data Architect to highlight its role in InfoSphere Foundation Tools.

Naming standards are enforced in many organizations to provide a data environment that promotes better communication and more informed decision-making for both internal and external stakeholders. The benefits of instituting data object naming standards include:

  • Promotion of a common understanding of the data
  • Promotion of the sharing of data across organizational boundaries
  • Reduction of data redundancy through the consolidation of synonymous and overlapping data elements

Data object naming standards

A data object naming standard, or convention, describes how data object names are formulated. A naming standard may cover a wide range of documentation aspects, as described in the international standard ISO/IEC 11179-5:

  • The scope of the naming convention; for example, an established industry name
  • The authority that establishes names
  • Semantic rules governing the source and content of the terms used in a name; for example, terms derived from data models and terms commonly used in the discipline
  • Syntactic rules covering a required term order
  • Lexical rules covering controlled term lists, name length, character set, and language
  • A rule establishing whether or not names must be unique

Some of these aspects are very organization-specific. This article concentrates on the semantic and syntactic parts of a naming standard, and assumes that a list of approved terms is already well defined in the organization.

There are two important aspects of naming a data object: content and the format of the data object name.

  • Content or semantics rules relate to the essential meaning of the terms chosen for the object name and enable meaning to be conveyed by the object name. There are three categories of terms that contribute to data object name contents: prime words, class words, and modifiers (also called qualifiers) as described in the Data element naming standard.

    Prime words:

    • Represent the business concept about which data are being collected.
    • Describe the subject area of the data.
    • Are a noun or noun phrase which describes the subject and main focus of the name.
    • Place data elements within the logical context of the information model.

    Examples: Loan, customer, employee, property

    Class words:

    • Identify a distinct category or classification of data.
    • Delineate the type of data being described by the data name.
    • Describe the major classification of data associated with a data element.

    Examples: Date, amount, rate, quantity, code, indicator, name, description, comment

    Qualifiers:

    • Further qualify or distinguish the prime and class words.
    • Provide clarity and uniqueness to the data name.
    • Modify both class and prime words.
    • Restrict the meaning of the class and prime words.

    Examples: Last, first, next, previous, beginning

  • Format or syntax rules specify the structure of a data object name. They define the pattern, that is, the number and the arrangement or sequence of parts within a name. For example, a naming standard may require that data object names use the following pattern:

    {MOD}? {PW} {MOD}? {CW} {MOD}?

    Here the separator is a space, a name contains a minimum of two and a maximum of five terms, and the terms are arranged in the following sequence:

    1. A Modifier (MOD), optional;
    2. A Prime Word (PW), required;
    3. A Modifier (MOD), optional;
    4. A Class Word (CW), required;
    5. A Modifier (MOD), optional.

    Valid data object examples are:

    • EMPLOYEE NAME (PW CW)
    • EMPLOYEE LAST NAME (PW MOD CW)
    • PERMANENT EMPLOYEE LAST NAME (MOD PW MOD CW)
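The pattern and the examples above can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how IDA works internally: the `GLOSSARY` dictionary, the one-letter role codes, and the `is_valid_name` function are assumptions made for this example, and each term is assumed to have exactly one role.

```python
import re

# Hypothetical glossary: each approved term maps to exactly one role.
# P = prime word, C = class word, M = modifier.
GLOSSARY = {
    "EMPLOYEE": "P",
    "NAME": "C",
    "LAST": "M",
    "PERMANENT": "M",
}

# {MOD}? {PW} {MOD}? {CW} {MOD}? expressed over the role codes.
PATTERN = re.compile(r"M?PM?CM?")


def is_valid_name(name: str, separator: str = " ") -> bool:
    """Return True if every term is in the glossary and the roles fit the pattern."""
    terms = name.split(separator)
    if any(term not in GLOSSARY for term in terms):
        return False
    roles = "".join(GLOSSARY[term] for term in terms)
    return PATTERN.fullmatch(roles) is not None


for candidate in ("EMPLOYEE NAME", "EMPLOYEE LAST NAME",
                  "PERMANENT EMPLOYEE LAST NAME", "LAST NAME"):
    print(candidate, "->", is_valid_name(candidate))
```

In this sketch the first three names are accepted, while LAST NAME is rejected because it has no prime word.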

A business name is an English-like, meaningful name used to describe a data object. Business names are used in conceptual or logical data models. An access or technical name describes a data object as represented in a physical database. Since database management systems usually impose specific constraints on object names, including permitted characters and name lengths, it is very common for access names to use abbreviations and a different separator from business names. For example, the EMPLOYEE LAST NAME object in a logical model is transformed to EMPL_LST_NM in the physical model.

Define a naming standard using IDA

When a naming standard is enforced, data object names must use terms defined in a glossary and must follow the required term order. IDA helps users define their data object naming standards. To define a naming standard using IDA, you specify the terms allowed in data object names in a glossary model, and the patterns of terms in the data naming standard preferences.

Create a glossary model

A glossary model is a model that describes the terms that are established, approved, and shared in an organization for data object names. Using a glossary model, you can define the name, abbreviation, alternative abbreviation, type (prime or class), if it can be used as a modifier, status, and abstract or description for terms. Glossary models are stored in IDA data design projects. You can share one glossary model among multiple data design projects.
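To make the shape of a glossary entry concrete, the properties listed above can be pictured as a simple record. The following Python sketch is only a mental model: the class and field names, the status value, and the sample abstracts are assumptions for illustration and do not reflect IDA's actual file format or API.

```python
from dataclasses import dataclass
from enum import Enum


class TermType(Enum):
    PRIME = "prime"
    CLASS = "class"


@dataclass(frozen=True)
class GlossaryTerm:
    """One glossary entry with the properties described in the text."""
    name: str
    abbreviation: str
    term_type: TermType
    is_modifier: bool = False            # can the term also be used as a modifier?
    alternative_abbreviation: str = ""
    status: str = "Approved"             # hypothetical status value
    abstract: str = ""


terms = [
    GlossaryTerm("EMPLOYEE", "EMPL", TermType.PRIME,
                 abstract="A person employed by the organization"),
    GlossaryTerm("NAME", "NM", TermType.CLASS,
                 abstract="A word or phrase that identifies something"),
    GlossaryTerm("IDENTIFIER", "ID", TermType.CLASS),
]

# Index the glossary by term name for quick lookup.
glossary = {term.name: term for term in terms}
print(glossary["EMPLOYEE"].abbreviation)
```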

To create a glossary model:

  1. Click File > New > Glossary Model from the main menu. The New Glossary Model wizard opens as shown in Figure 1.
  2. In the wizard page, you specify a destination folder and file name. You can create the glossary model using a blank template, or the Enterprise template with some populated terms. You can choose to add the glossary model to the project properties as the naming standard for that project.
  3. Click Finish.
Figure 1. New glossary model wizard

The glossary model is displayed in the Data Project Explorer under the destination folder that you specified. If you selected the option to add the model to the project properties, the model file is also displayed on the Naming Standard page in the Properties view for the project.

After your glossary is created, you can use the editor to add, remove, or modify glossary model definitions, as shown in Figure 2. Click New to add a new row, then type in the name, abbreviation, alternative abbreviation, and abstract. Select a type and status, and indicate whether the term is a modifier. You can click in any of the columns of a row or use the Properties view to edit a definition. If a glossary already exists within your organization, you can copy and paste definitions from another source such as Microsoft® Word or Microsoft Excel.

Figure 2. Glossary model editor to add, remove, and edit terms

Specify a naming pattern

The second part of a naming standard defines the pattern or structure of a name. You can specify this using data naming standard preferences. These preferences apply to all data models in your workspace.

To set preferences for naming standards:

  1. Click Window > Preferences from the main menu.
  2. Click Data > Naming Standard.
  3. On the Logical page, as shown in Figure 3, set the pattern for entity and attribute object names. You can specify whether prime words, class words, and modifiers are optional, and the order in which these elements should occur. You can also specify valid separator characters for these logical objects. By default, the separator for logical objects is a space. With IDA V7.0, you can choose the <Title Case> as the separator if your naming standard requires names with a title case format, such as EmployeeLastName.
  4. On the Physical-Table/Column page, set the pattern for table and column object names in a physical model. You can specify whether prime words, class words, and modifiers are mandatory or optional, and the order in which these elements should occur. You can also specify valid separator characters for these physical objects. By default, the separator is an underscore character.
  5. On the Physical-Other page, as shown in Figure 4, set the name pattern for physical objects other than tables and columns (for example, primary keys, foreign keys, check constraints, unique constraints, indexes, and triggers) by adding or removing variables and strings. The patterns defined on this page use variables, such as table name and column name, and don’t reference glossary terms.
  6. On the Glossary page, specify a default glossary model. The glossary models specified here are used for database objects that show up in the Database Explorer.
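As a rough illustration of step 5, a pattern built from literal strings and variables can be thought of as a template that is filled in with the table and column names. The pattern strings, the variable names, and the `object_name` function below are hypothetical and are not IDA's actual pattern syntax.

```python
# Hypothetical patterns that combine literal strings with variables.
PATTERNS = {
    "primary_key": "{table}_PK",
    "foreign_key": "{table}_{column}_FK",
    "index": "{table}_{column}_IDX",
}


def object_name(kind: str, **variables: str) -> str:
    """Fill the pattern for the given object kind with the supplied variables."""
    return PATTERNS[kind].format(**variables)


print(object_name("primary_key", table="EMPL"))
print(object_name("index", table="EMPL", column="LST_NM"))
```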
Figure 3. Naming standard preferences for logical objects
Figure 4. Naming standard preferences for physical objects other than tables and columns

By now, you've successfully created your naming standard. Once a naming standard exists, how do you ensure that data objects comply with it? The IDA naming standard compliance rule is designed for this purpose.


Validate naming standard compliance

IDA's naming standard compliance rule discovers naming standard violations. This check is based on a combination of data naming standard preferences and glossary model files. You invoke the analyze model rules by selecting:

From the Data Project Explorer:

  • A package in a logical model
  • A database in a physical model, or
  • A schema in a physical model

From the Database Explorer:

  • A database, or
  • A schema

The Analyze Model dialog prompts you to choose the rules to run, as shown in Figure 5. When you select the naming standard compliance rule, the glossary models that were added to the project properties as the project naming standard are listed on the next page of the wizard. You can add or remove glossary models at this point; the page is automatically synchronized with the Naming Standard properties page for the project. When you click Finish, IDA iterates over all objects included in your selection and checks that their names use the terms and patterns defined in your naming standard. For example, if the naming pattern is defined as:

{PW} {MOD} {CW}

and EMPLOYEE, LAST, and NAME are defined in the glossary model as a prime word, a modifier, and a class word, respectively, then the object name EMPLOYEE LAST NAME is valid.

Figure 5. Analyze model dialog prompts the user to choose rules to run against

If any violations are found, they are displayed in the Problems view, as shown in Figure 6.

Figure 6. Naming standard violations displayed as warnings in the Problems view

Here are some examples where the naming standard is violated.

  • An incomplete name:

    The pattern is: {MOD}? {PW} {MOD}? {MOD}? {CW} {MOD}?

    The object name is: CIVILIAN EMPLOYEE (MOD PW)

    Since the pattern requires a class word and “CIVILIAN EMPLOYEE” doesn't contain a class word, a warning with the following description is displayed in the Problems view:

    Entity CIVILIAN EMPLOYEE is not compliant with naming standards -- Missing required {class word} at the end. The expected naming pattern is {modifier}optional {prime word} {modifier}optional {modifier}optional {class word} {modifier}optional.

  • A name with invalid word:

    The pattern is: {MOD}? {PW} {MOD}? {MOD}? {CW} {MOD}?

    The object name is: CIVILIAN EMPLOYEE ADDRESS (MOD PW invalid word)

    Since the pattern requires a class word and “ADDRESS” is not a class word defined in the glossary model, a warning with the following description is displayed in the Problems view:

    Attribute CIVILIAN EMPLOYEE ADDRESS is not compliant with naming standards -- A class word is required and ADDRESS is not a valid class word. The expected naming pattern is {modifier}optional {prime word} {modifier}optional {modifier}optional {class word} {modifier}optional.

    When “ADDRESS” is added to the glossary model as a class word, no warning shows up for CIVILIAN EMPLOYEE ADDRESS.

  • A name with extra strings:

    The pattern: {MOD}? {PW} {MOD}? {MOD}? {CW} {MOD}?

    The object name: CIVILIAN EMPL REQUESTED DESC TEXT FULL VERIFIED (MOD PW MOD MOD CW MOD MOD)

    The pattern allows only one modifier after the required class word, but the name has two modifiers at the end. Therefore, a warning with the following description is displayed in the Problems view:

    Attribute CIVILIAN EMPL REQUESTED DESC TEXT FULL VERIFIED is not compliant with naming standards -- An extra string VERIFIED is found at the end. The expected naming pattern is {modifier}optional {prime word} {modifier}optional {modifier}optional {class word} {modifier}optional.
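The three kinds of violations above can be reproduced with a small checker that walks the pattern slot by slot. The sketch below is a simplified assumption of how such a check might work, not IDA's implementation: the glossary contents, the `find_violation` function, and the shortened message texts are made up for illustration, and the greedy slot matching is only adequate for patterns like this one.

```python
from typing import Optional

# Hypothetical glossary: term -> role ("MOD", "PW", or "CW").
GLOSSARY = {
    "CIVILIAN": "MOD", "REQUESTED": "MOD", "DESC": "MOD",
    "FULL": "MOD", "VERIFIED": "MOD",
    "EMPLOYEE": "PW", "EMPL": "PW",
    "TEXT": "CW",
}

# {MOD}? {PW} {MOD}? {MOD}? {CW} {MOD}? as (role, required) slots.
PATTERN = [("MOD", False), ("PW", True), ("MOD", False),
           ("MOD", False), ("CW", True), ("MOD", False)]

LABELS = {"MOD": "modifier", "PW": "prime word", "CW": "class word"}


def find_violation(name: str, separator: str = " ") -> Optional[str]:
    """Return a description of the first violation, or None if the name complies."""
    terms = name.split(separator)
    position = 0
    for role, required in PATTERN:
        term = terms[position] if position < len(terms) else None
        if term is not None and GLOSSARY.get(term) == role:
            position += 1  # the term fills this slot
        elif required and term is None:
            return f"Missing required {LABELS[role]} at the end"
        elif required:
            return f"{term} is not a valid {LABELS[role]}"
        # otherwise the optional slot is simply left empty
    if position < len(terms):
        return f"An extra string {terms[position]} is found at the end"
    return None


for name in ("CIVILIAN EMPLOYEE",
             "CIVILIAN EMPLOYEE ADDRESS",
             "CIVILIAN EMPL REQUESTED DESC TEXT FULL VERIFIED"):
    print(name, "->", find_violation(name))
```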


Set a data object name with content assistant

In large organizations, glossary models can easily contain thousands of entries, and choosing among thousands of terms when naming a data object is quite a challenge. The naming content assistant was added in IDA V7.0 to help users with this task.

View the naming pattern using content assistant

As one of the important parts of a naming standard, naming patterns are defined for different data objects. With IDA, this is done through the data naming standard preferences. A content assistant cue displays this information when you edit a name property in the Properties view. For example, when you set the name of a table object in a physical model, you can hover over the content assistant cue to see the naming pattern, as shown in Figure 7. In this example, the separator is an underscore, and the terms must appear in the following sequence:

  1. A Modifier, optional;
  2. A Prime Word, required;
  3. A Modifier, optional;
  4. A Modifier, optional;
  5. A Class Word, required;
  6. A Modifier, optional.
Figure 7. Content assistant cue displays the naming pattern

The content assistant cue is available for data objects for which the naming standard is defined as a combination of data naming standard preferences and glossary model files.

Display the glossary in the drop-down list

Once you know the naming pattern for an object, you need to choose terms from the glossary models that match the pattern. Press Ctrl+Space or type the separator to see a drop-down list of available terms, abbreviations, and descriptions, as shown in Figure 8.

Figure 8. Content assistant displays terms and their properties

The drop-down list narrows as you type letters, as shown in Figure 9.

Figure 9. The content assistant term list is shortened when the user types in letters

When you select a word from the list, it is added as part of the name.
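The narrowing behavior can be pictured as a prefix filter over the glossary, applied to whatever has been typed after the last separator. The following Python sketch rests on that assumption; the sample entries, some of the abbreviations, and the `suggest` function are hypothetical and do not describe how the content assistant is actually implemented.

```python
# Hypothetical glossary entries: (term, abbreviation, description).
TERMS = [
    ("EMPLOYEE", "EMPL", "A person employed by the organization"),
    ("EMPLOYER", "EMPLR", "An organization that employs people"),
    ("NAME", "NM", "A word or phrase that identifies something"),
    ("NUMBER", "NBR", "A numeric value"),
]


def suggest(typed: str, separator: str = "_"):
    """Return entries whose term starts with the fragment after the last separator."""
    fragment = typed.rsplit(separator, 1)[-1].upper()
    return [entry for entry in TERMS if entry[0].startswith(fragment)]


# Typing the separator lists every term; further letters narrow the list.
print([term for term, _, _ in suggest("EMPL_")])
print([term for term, _, _ in suggest("EMPL_N")])
print([term for term, _, _ in suggest("EMPL_NA")])
```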


Transform data objects from logical to physical model using naming standard

A logical data model defines entities and relationships between entities without consideration of the implementation platform. A physical data model, on the other hand, is a database-specific model that represents relational data objects (for example, tables, columns, primary and foreign keys, and their relationships). In a common top-down design scenario, you design a logical model, transform it into a physical model, and use the physical data model to generate data definition language (DDL) statements, which are then deployed to a database server. Since most databases restrict the characters and the length of database object names, abbreviations are usually used when naming data objects in physical models. For example, an EMPLOYEE SOCIAL-SECURITY-NUMBER object in a logical data model is transformed to EMPL_SSN in the physical model, where EMPL and SSN are the abbreviations of EMPLOYEE and SOCIAL-SECURITY-NUMBER defined in the glossary model.

When transforming a logical data model to a physical data model using IDA, entity and attribute names are transformed to table and column names according to the following rules:

  • The logical separator in a logical object name is replaced by the physical separator in its corresponding physical object name.
  • If an entity has a defined abbreviation property, the corresponding table is named according to the abbreviation property.
  • If an entity does not have a defined abbreviation property, the glossary models that are specified for the project are searched for matching terms, and their abbreviations are used in the transformed table name. If no matching terms are found, or no glossary models are specified for the project, then the entity name is used.
  • Similar naming rules apply to the transformation of logical attributes to physical columns, except that an attribute is first checked for a defined domain property, and if one is found, the corresponding column is named according to the name of the domain property.

For example, a MESSAGE ORIGINATOR IDENTIFIER attribute in a logical model is transformed into a MSG_ORITR_ID column in the physical model, where MSG, ORITR, and ID are the abbreviations of MESSAGE, ORIGINATOR, and IDENTIFIER defined in the glossary model, and spaces are replaced by underscores.
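The entity-to-table naming rules can be summarized in a short Python sketch. It is an approximation under stated assumptions, not IDA's transformation code: the `to_physical_name` function and the `ABBREVIATIONS` dictionary are hypothetical, terms without a glossary match are assumed to be kept unchanged, and the domain-property rule for attributes is left out.

```python
from typing import Mapping, Optional

# Hypothetical glossary: term -> abbreviation.
ABBREVIATIONS = {
    "MESSAGE": "MSG",
    "ORIGINATOR": "ORITR",
    "IDENTIFIER": "ID",
    "EMPLOYEE": "EMPL",
    "SOCIAL-SECURITY-NUMBER": "SSN",
}


def to_physical_name(logical_name: str,
                     abbreviation_property: Optional[str] = None,
                     glossary: Mapping[str, str] = ABBREVIATIONS,
                     logical_separator: str = " ",
                     physical_separator: str = "_") -> str:
    """Derive a physical object name from a logical object name."""
    # Rule: an explicit abbreviation property on the object wins.
    if abbreviation_property:
        return abbreviation_property
    # Rule: otherwise abbreviate each term found in the glossary;
    # terms without a match are kept unchanged (an assumption of this sketch).
    terms = logical_name.split(logical_separator)
    # Rule: the logical separator is replaced by the physical separator.
    return physical_separator.join(glossary.get(term, term) for term in terms)


print(to_physical_name("MESSAGE ORIGINATOR IDENTIFIER"))
print(to_physical_name("EMPLOYEE SOCIAL-SECURITY-NUMBER"))
```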


The Coast Guard data element naming standards example

In this section, the Coast Guard Data Element Naming Standards are taken as an example to demonstrate the use of IDA to define and enforce naming standards. The Coast Guard has an ever-increasing need to share data across organizational and functional boundaries. A cross-functional system is an information system that supports organizational processes relating the activities of several programs or functional divisions, rather than the activities of a single program. The Coast Guard Data Element Naming Standards have been developed and enforced to meet the needs that arise during development of these cross-functional systems.

Glossary

The naming standards require that an official class word list be developed, maintained, and centrally controlled by Commandant. Since the volume of required prime words is likely to number in the thousands, they require that a preliminary prime word list be developed as a restricted vocabulary for coordination and use across the organization. This list can be developed from a review or refinement of terms appearing in the names of data elements defined in existing data systems. Sample modifiers or qualifiers are listed in the Coast Guard Data Element Naming Standards. These terms can be stored in an IDA glossary model after some formatting in a text editor, followed by copying and pasting. Figure 10 shows the glossary created using the class word list, and sample prime words and modifiers from the Coast Guard Data Element Naming Standards.

Figure 10. Coast Guard glossary model

Also, IDA supports version control and configuration management. Using the Compare With tools, you can compare glossary models and versions.

Naming syntax rules

The Coast Guard naming standards include the following syntax rules:

  • Each data element name shall have one class word taken from a pre-defined class word list. Class words may be added to the standardized list upon approval by Commandant.
  • Each data element name shall have one prime word taken from a pre-defined prime word list. The prime word must be positioned as the first or second word in the data element name. Prime words may be added to the standardized list upon approval by Commandant.
  • Each data element name shall contain a sufficient number of modifiers and qualifiers to fully describe it (up to four modifiers per prime word and one modifier plus two qualifiers per class word).
  • The sequence of words in a data element name is: (1) Modifier if required + prime word + (1 to 4) modifier(s) if required + class word + (1 or 2) qualifier(s) if required. The prime word must precede the class word within a data element name. The minimum number of words in a data element name is two (prime word + class word), and the maximum number of words is nine.
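The sequence rule can be approximated as a regular expression over word roles, which also makes the two-word minimum and nine-word maximum visible (1 + 1 + 4 + 1 + 2 = 9). The role codes and the `complies` function in this Python sketch are assumptions for illustration, and qualifiers are treated like modifiers for simplicity.

```python
import re

# Optional modifier + prime word + up to 4 modifiers + class word + up to 2 qualifiers.
# M = modifier or qualifier, P = prime word, C = class word.
COAST_GUARD_PATTERN = re.compile(r"M?PM{0,4}CM{0,2}")


def complies(roles: str) -> bool:
    """Check a sequence of role codes, one letter per word, against the rule."""
    return COAST_GUARD_PATTERN.fullmatch(roles) is not None


print(complies("PC"))          # prime word + class word, the two-word minimum
print(complies("MPMMMMCMM"))   # the nine-word maximum
print(complies("CP"))          # class word before prime word
print(complies("MMPC"))        # prime word is not the first or second word
```

In this sketch the first two sequences comply and the last two do not.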

These rules are kept in the IDA data naming standard preferences, as shown in Figure 11.

Figure 11. Coast Guard naming standard preferences

Check for naming standard compliance

When the naming standards are defined, the naming standard compliance rule can be invoked for any logical or physical model, as described in the Validate naming standard compliance section. Warning messages show up for data object names that are not compliant with the naming standards; for example, names that:

  • Lack a class word
  • Lack a prime word
  • Don’t have a prime word before a class word
  • Don’t have at least two terms
  • Have more than nine terms

This corresponds to the enforceability described in the Coast Guard Data Element Naming Standards.


Conclusion

In this article, you've learned how IDA helps you to:

  1. Create a naming standard with glossary models and naming standard preferences
  2. Use the naming content assistant to create data object names that comply with the naming standard
  3. Transform logical entity and attribute names into physical table and column names using abbreviations defined in glossary models
  4. Enforce naming standards using the naming standard compliance rule

Acknowledgement

Thank you to Robin Raddatz, who is responsible for the updates made to the 2010 October 07 release of this article.
