I continue the Quality busters series, which looks at common influences on application quality from the enterprise view of the operational environment and non-functional requirements. Addressing these influences is a matter of making tradeoffs, with no single solution solving all the problems. This month I'll discuss the complexities of application configuration management.
The operations team notifies the SHEEP Web application team that they are moving several servers to a new data center. Part of this move requires changing the host name for two systems. In keeping with data center naming standards, the WebSphere® MQ queue manager name on these systems will also change. The SHEEP team replies that the change should be easy; they just have to update the configuration objects on the two systems.
Plans are made, tasks assigned, and the day of the system move arrives. The SHEEP team member assisting with the move of the systems updates the known configuration files. The day after the move, the VP of Finance complains that his sales status report from the Data Warehouse system is not updating.
Operations, the SHEEP team, and the Data Warehouse (DW) team research the cause. They discover that one of the DW programmers took advantage of the SHEEP application's sales status request and reply messages. Instead of using the SHEEP application's configuration objects, the DW programmer created his own configuration object. The teams overlooked this configuration object during the system move. Once the DW programmer updates the configuration to reflect the new location and name of the request queue, the DW system updates again.
You can customize nearly every application by setting one or more configuration values. Configuration objects store these configuration values. Configuration values identify the system environment to the program; for example, queue and queue manager names, remote system identity, user logins and passwords, locale settings, timeout intervals, and more. Configuration values also identify user settings; for example, which features are enabled or disabled, default values for screen or processing elements, default user identification, personalization, and more.
You'll find configuration objects in many formats. The following are the more popular ones.
The classic configuration object is a text file that contains key-value information. Usually, each line in the text file corresponds to one key-value pair. The key appears on the left, followed by a separator symbol (commonly an equals sign, a colon, or a space), and then the value. Sometimes a special separator line, often called a section heading, groups related pairs.
Examples of this format include the INI file found in DOS/Windows and the properties file (java.util.Properties class) in Java.
Listing 1. Example of key-value format
# Sample configuration file lines
server.queueManagerName = SHEEPQM1
server.requestQueueName = RQSTQ
server.queueTimeout = 1000
Advantages:
- Many parsers or access modules are already available.
- Very low overhead is associated with opening and reading the text file.
- The information is human-readable.
- You can edit the contents with simple text editors.
Disadvantages:
- Hierarchical information is difficult to store.
- Repeating information groups are difficult to store.
- Editing can be error prone whenever you have a large number of key-value pairs.
- Data is stored in a string representation, which requires conversion to integer or other binary representations.
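As a minimal sketch of reading this format in Java, the following uses the java.util.Properties class with the keys from Listing 1. The class name KeyValueConfig, its method names, and the fallback timeout of 5000 are assumptions made for illustration; they are not part of the SHEEP application.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only.
final class KeyValueConfig {

    // Parses key-value text in the java.util.Properties format.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Values are stored as strings, so numeric settings must be converted.
    static int getInt(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return defaultValue;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        String text = "# Sample configuration file lines\n"
                + "server.queueManagerName = SHEEPQM1\n"
                + "server.requestQueueName = RQSTQ\n"
                + "server.queueTimeout = 1000\n";
        Properties props = parse(text);
        System.out.println(props.getProperty("server.queueManagerName"));
        System.out.println(getInt(props, "server.queueTimeout", 5000));
    }
}
```

The explicit conversion in getInt is the cost noted in the last disadvantage: the file holds only strings, so the program owns every conversion.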
The XML format is growing in popularity, although XML has no standard configuration file format. It seems every application does it differently. Some applications use element attributes, while others use only element tags (see the example in Listing 2). You can name each repeating group either with id or name attributes on a commonly named element tag or with a uniquely named element tag.
Listing 2. Example of the XML format
<!-- Sample configuration file in XML -->
<config>
  <server>
    <queueManagerName>SHEEPQM1</queueManagerName>
    <requestQueueName>RQSTQ</requestQueueName>
    <queueTimeout>1000</queueTimeout>
  </server>
</config>
Advantages:
- Standard XML parsers are available.
- Hierarchical information is easy to store.
- Repeating information groups are easy to store.
- The information is human-readable.
- You can edit the contents with text editors and XML editors.
Disadvantages:
- Increased overhead is associated with opening and parsing the XML document.
- Information is difficult to read when a large number of elements are present.
- Data is stored in a string representation, which requires conversion to integer or other binary representations.
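A document shaped like Listing 2 can be read with the standard JAXP DOM parser that ships with the Java SDK. The sketch below is one possible approach, not a recommended layout; the XmlConfig class and its get method are hypothetical names, and it assumes each setting appears as element text under a unique tag name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only.
final class XmlConfig {
    private final Document doc;

    XmlConfig(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration XML", e);
        }
    }

    // Returns the text of the first element with this tag name, or null if absent.
    String get(String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        XmlConfig config = new XmlConfig(
                "<config><server>"
                + "<queueManagerName>SHEEPQM1</queueManagerName>"
                + "<queueTimeout>1000</queueTimeout>"
                + "</server></config>");
        System.out.println(config.get("queueManagerName"));
        // As with key-value files, the value is a string until converted.
        System.out.println(Integer.parseInt(config.get("queueTimeout")));
    }
}
```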
A registry is a special index object, usually in binary format, that efficiently stores configuration information in a hierarchical structure. Microsoft Windows, for example, implements a system registry.
Advantages:
- You access the registry from a simple, consistent API. This API hides the location of the registry from the application program.
- Hierarchical information is easy to store.
- Repeating information groups are easy to store.
- Data is stored in a data-type representation more appropriate for the value's usage.
- All applications running on the system access a single location.
Disadvantages:
- The information is not human-readable.
- The binary format requires special editing programs, preferably tailored to the application, to view and edit the configuration values.
- The configuration values for an application are difficult to extract and store in a form that can be saved with the application for backup and recovery.
A directory service is a set of programs and processes that provides directory lookup services. An application sends a request (through a message or remote procedure call) to the directory service, which sends a reply. The directory service can store key-value pairs in a hierarchical structure. An example is X.500 Directory Services, which applications access through LDAP (Lightweight Directory Access Protocol).
Advantages:
- Directory services are separated from the application and can reside on the same or separate computer systems.
- A consistent API is available to access the service.
- Hierarchical information is easy to store.
- By being system-independent, a single service can be the repository of shared configuration information for many applications running on many computer systems.
Disadvantages:
- The information is not human-readable.
- The service requires special edit programs to view and edit the configuration values.
- Reliability concerns arise if the directory service is not accessible due to the service not running, a broken communications connection, or something else.
- The configuration values for an application are difficult to extract in a form that can be used for application backup and recovery.
- A local configuration object still must store the naming and routing information needed by the application to identify the directory service.
The Java 2 SDK, Standard Edition, Version 1.4, introduces a new class called Preferences (java.util.prefs.Preferences). (See Resources.) The standard allows Preferences to be stored in an implementation-dependent back-end, which could be a file, an LDAP directory server, the Windows Registry, or some other storage mechanism.
Advantages and disadvantages:
- The advantages are those of whichever back-end the implementation uses, which is one of the formats previously listed.
- The disadvantages are, likewise, those of that back-end.
You might store configuration information in a database table. One approach is a table with a separate column for each configuration element and a single row in the table. Reading this row retrieves all the configuration information at once. Another approach is a table with two columns -- a key column and a value column. Each key-value pair forms a row.
Listing 3. Example of SQL configuration table
CREATE TABLE the_config (
  queue_manager_name VARCHAR(32)
    NOT NULL DEFAULT('SHEEPQM1'),
  request_queue_name VARCHAR(32)
    NOT NULL DEFAULT('RQSTQ'),
  queue_timeout INTEGER
    NOT NULL DEFAULT(1000)
);
Advantages:
- Database methods, such as SQL over JDBC, can access the data.
- Parsing of values is unnecessary since information is stored in a more appropriate data representation.
- Many applications running on many computer systems can easily access configuration information.
Disadvantages:
- The information is not directly human-readable.
- The format requires special database query tools or custom edit programs to view and edit the values.
- Reliability concerns arise if the database is not accessible.
- If a schema stores the configuration data separately from the application data, application configuration values might be difficult to extract and save for backup and recovery purposes.
- A local configuration object must store database access information.
Most operating systems provide support for environment variables or system variables. Each process, when it starts, is loaded with a copy of the system-level environment variables. The process can then change the value of these variables or define additional environment variables. A program can retrieve the value of these environment variables. As a result, environment variables provide a facility for process level management of configuration information.
Listing 4. Example of DOS script with environment variables
set QUEUE_MANAGER_NAME=SHEEPQM4
echo %QUEUE_MANAGER_NAME%
myApplication.exe
Advantages:
- You can define environment variables at the process level.
- A parent program in the process can change the environment variable, thus affecting a child program that starts afterwards.
- The information is generally human-readable since the setting of environment variables often occurs within runtime scripts.
Disadvantages:
- The assignment of environment variables often repeats in most runtime scripts. This repetition creates a maintenance problem as you must find and update all copies as needed.
- Diagnosing a problem increases in difficulty if another task in a process changes certain configuration information at runtime.
- A programmer must change information since these variables are not in a location that users can typically access.
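On the program side, a Java application can read the variable set in Listing 4 through System.getenv(). The sketch below is only one way to do it; the EnvConfig class is a hypothetical name, and the fallback value SHEEPQM1 is an assumption borrowed from Listing 1.

```java
import java.util.Map;

// Hypothetical helper for illustration only.
final class EnvConfig {

    // Returns the variable's value, or the fallback when it is unset or empty.
    // Taking the environment as a Map keeps the lookup easy to test.
    static String get(Map<String, String> env, String name, String fallback) {
        String value = env.get(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        // System.getenv() returns an unmodifiable view of this process's environment.
        String queueManager = get(System.getenv(), "QUEUE_MANAGER_NAME", "SHEEPQM1");
        System.out.println("Queue manager: " + queueManager);
    }
}
```

Because a Java program cannot change its own environment through this API, any override has to come from the launching script or parent process, which is where the maintenance concern in the first disadvantage arises.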
Finally, some configuration information can pass to the program through command-line parameters. Command-line parameters can select which configuration object to use or the method for finding it, and they can override specific values found in other configuration objects. Command-line parameters provide a facility for program-level management of configuration information.
Listing 5. Example of command-line parameters
myApplication.exe -qm:SHEEPQM4
Advantages:
- Command-line parameters are defined at the program level.
- Programmers can easily force overrides to the default sources of configuration values.
- Programming is relatively easy since command-line parameters have no external references to files or services.
Disadvantages:
- Maintenance efforts increase because you must search all runtime scripts to find parameter usage.
- Dynamically computed parameters can create diagnostic issues.
- Accessibility diminishes because the values are stored where only programmers can modify them.
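Parameters in the -name:value style of Listing 5 take only a few lines to parse. This is a sketch under that assumption; the CommandLineConfig class is hypothetical, and the choice to ignore arguments that do not match the pattern is an illustrative one that a real program might replace with an error.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only.
final class CommandLineConfig {

    // Collects arguments of the form -name:value; anything else is ignored.
    static Map<String, String> parse(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String arg : args) {
            int colon = arg.indexOf(':');
            // colon > 1 ensures at least one character of name after the dash.
            if (arg.startsWith("-") && colon > 1) {
                values.put(arg.substring(1, colon), arg.substring(colon + 1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> values = parse(new String[] {"-qm:SHEEPQM4"});
        System.out.println(values.get("qm"));
    }
}
```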
Using a combination of formats is usually a good idea. When you add the ability to override configuration values, the programmer can selectively test pieces of the program without the worry of managing configuration objects -- which might be shared by other users and developers.
A common combination approach goes like this:
- Use a search path, such as the classpath in Java environments, to search for the configuration object. If no object is found, then attempt reading from a default location.
- Override the configuration values with the values of any matching environment variables.
- Override the configuration values with the values of any matching command-line parameters.
- Log the final configuration values to assist with diagnosing program problems.
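The override order in this list can be sketched as a merge of three maps. The code assumes each source has already been read into a Map that uses the same key names; the LayeredConfig class and its methods are hypothetical, and a real application would leave passwords and similar values out of the logged output.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
final class LayeredConfig {

    // Merges sources in increasing priority: configuration object,
    // then environment variables, then command-line parameters.
    static Map<String, String> resolve(Map<String, String> fileValues,
                                       Map<String, String> envValues,
                                       Map<String, String> cliValues) {
        Map<String, String> merged = new TreeMap<>(fileValues);
        merged.putAll(envValues);
        merged.putAll(cliValues);
        return merged;
    }

    // Renders one key=value line per entry, sorted by key, for the log.
    static String describe(Map<String, String> config) {
        StringBuilder sb = new StringBuilder();
        config.forEach((k, v) -> sb.append(k).append('=').append(v).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> merged = resolve(
                Map.of("queueManagerName", "SHEEPQM1", "queueTimeout", "1000"),
                Map.of("queueManagerName", "SHEEPQM4"),
                Map.of("queueTimeout", "250"));
        System.out.print(describe(merged));
    }
}
```

Logging the merged result rather than each source shows what the program actually used, which is the information you need when diagnosing a problem.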
Configuration objects store the key-value information in one of two possible data representations: string- or data type-enabled.
The key-value text file, XML file, environment variable, and command-line parameter formats store values in a string representation. The using program must convert from the string representation to the desired internal representation. While the string representation makes it easy to edit the configuration object, it does lend itself to the entry of incorrect values. For example, a user might type the letter 'O' instead of the number '0' and a text editor cannot detect this.
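Since a text editor cannot catch the 'O'-for-'0' mistake, the program has to. The following sketch converts a string value and reports the offending key when the conversion fails; the ConfigValues class, its method name, and the wording of the message are assumptions made for illustration.

```java
// Hypothetical helper for illustration only.
final class ConfigValues {

    // Converts a string setting to an int and names the key if it is malformed.
    static int parseInt(String key, String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "Configuration value for '" + key + "' is not an integer: '" + raw + "'", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseInt("server.queueTimeout", "1000"));
        try {
            parseInt("server.queueTimeout", "1O00"); // letter O, not zero
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```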
The registry, preferences, and database formats store values in data type-specific representations. For example, numbers are stored in a numeric format, usually integer. This reduces the need for the program to convert values from one representation to another. It also reduces the likelihood of entering incorrectly formed values. However, with this representation, you need special programs to edit the configuration object.
A major decision regarding configuration objects is where to put them. You have several options:
- Script
- Program directory
- Fixed directory
- Search path
- Separate service
Real world applications often use a combination of locations involving many configuration objects.
For configuration information that is very program instance-specific, you might find it advantageous to put some information into environment variables or command-line parameters within the script that launches the program. This approach is rarely used because you must search all scripts to find out whether a changing configuration item is referenced within a script.
You might place a configuration object in the same directory where the program itself resides. Finding the configuration object is easier since the program can determine where it resides and simply check that directory. This approach has limited ability to share configuration information. Only programs in the same directory can share the configuration object; programs in another directory are not able to find it.
On systems such as UNIX, Windows, or OS/400, with a well-known and stable directory structure, you might place configuration objects in a well-known fixed directory, such as the root directory or the QGPL library. All programs on the system can access this fixed directory. As a result, many programs and applications can share the configuration object. In most business environments, however, the technical support team does not permit the addition of user objects to these fixed directories, so this approach is often discouraged because of operational support and security concerns.
Most systems provide a search path capability, such as the UNIX PATH environment variable or the Java CLASSPATH variable. By checking each directory in the search path for the configuration file, the program has more flexibility. This approach also supports testing better because a tester can put a tailored configuration object in the search path earlier. This benefit, however, is also its weakness. If, during operation, an incorrect version of the configuration object is inserted earlier in the search path, then the program will likely perform differently than expected. This can be difficult to diagnose.
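A search-path lookup can be sketched as follows. The code assumes the search path is a list of directories joined by the platform path separator, as in PATH; the ConfigLocator class and the file name sheep.properties are hypothetical.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class ConfigLocator {

    // Returns the first regular file named fileName found in the search path,
    // checking directories in the order they are listed.
    static Optional<Path> find(String searchPath, String fileName) {
        for (String dir : searchPath.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String searchPath = System.getProperty("java.class.path");
        Optional<Path> found = find(searchPath, "sheep.properties");
        // Reporting which file won helps diagnose an unexpected copy early in the path.
        System.out.println(found.map(Path::toString).orElse("not found; use default location"));
    }
}
```

Because the first match wins, printing or logging the resolved location is a cheap guard against the weakness described above.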
Finally, using approaches such as the registry, directory service, or database, you can separate the configuration object altogether from the application. The configuration information might even reside on a different computer system. However, as mentioned before, this approach requires a small local configuration object that identifies how to access the configuration service. Also, this approach has a reliability concern if the service becomes inaccessible.
A similar consideration to location is deciding the scope of the configuration object. That is, how many program components will use the configuration object. Scope includes several levels -- program, process, application, system, or enterprise. Real world applications often combine scope levels involving many configuration objects. These levels are as follows:
- Program. The configuration information is applicable to a specific instance of a program. A session identifier is one example.
- Process. The configuration information is applicable to all threads, units of execution, and program modules that operate within the life of a process. The name of a response queue associated with the process is an example.
- Application. The configuration information is applicable to all programs that comprise an application. The database connection details are an example.
- System. The configuration information is applicable to all programs, independent of the owning application, that reside on a computer system. The computer name and operations notification console are examples.
- Enterprise. The configuration information is applicable to all computer systems within the enterprise. The names of the enterprise domain name servers (DNS) are an example.
Another important consideration regarding configuration objects is how often a program retrieves values from the object. Your decision will be influenced by how often configuration values might change and by the business rules regarding how up-to-date the program must be. The more frequently a program retrieves configuration values, the more overhead the program will have. If this is the case, the architect should choose a configuration object that has lower overhead associated with value retrieval. The following are commonly encountered retrieval frequencies:
- Program startup. The program reads the configuration object once, when the program starts. Any changes to the configuration object are ignored until the program is restarted.
- Periodic refresh. The program re-reads the configuration object on a periodic basis. Any changes are detected at the next scheduled refresh from the configuration object.
- Triggered refresh. A trigger in the program can force a re-read of the configuration object. The trigger might be a signal, a special message, a detected change in the configuration object's modify date, or some other event.
- Transactional. The program re-reads the configuration object for each transaction. With this approach, the program guarantees it is using the latest configuration value.
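As one illustration of a triggered refresh, the sketch below re-reads a properties file only when its modify date has changed since the last load. The ReloadingConfig class is a hypothetical name, and the code assumes a key-value text file as the configuration object.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Properties;

// Hypothetical helper for illustration only.
final class ReloadingConfig {
    private final Path file;
    private FileTime lastLoaded;
    private Properties current = new Properties();

    ReloadingConfig(Path file) {
        this.file = file;
        refreshIfChanged(); // initial load at program startup
    }

    // Re-reads the file only if its modify date differs from the last load.
    // Returns true when a reload happened.
    synchronized boolean refreshIfChanged() {
        try {
            FileTime modified = Files.getLastModifiedTime(file);
            if (modified.equals(lastLoaded)) {
                return false;
            }
            Properties fresh = new Properties();
            try (Reader in = Files.newBufferedReader(file)) {
                fresh.load(in);
            }
            current = fresh;
            lastLoaded = modified;
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized String get(String key) {
        return current.getProperty(key);
    }
}
```

Calling refreshIfChanged() from a timer gives a periodic refresh, and calling it at the start of each transaction approximates the transactional option at the cost of one file-attribute lookup per transaction. Modify dates have limited resolution on some file systems, so two edits made in quick succession might not both be detected.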
Finally, you must make a decision about who will maintain the configuration information -- developers, the operations department, or the users. In reality, you often use a combination of all three. Developers might maintain configuration values that support the ability to diagnose the programs. Operations might maintain configuration values that represent the system infrastructure and runtime environment. Finally, users might maintain personalization, locale, and other usage-oriented configuration values.
A tailored configuration edit program is generally beneficial for operations and user personnel to use. The tailored program can ensure that configuration values are correct and meaningful before they are saved to the configuration object. This edit program might be part of the application program itself, similar to the Tools -> Options dialogue in many Windows-based applications.
With so many different approaches and formats associated with configuration objects, how do you, as an architect, decide what to use?
As this series will show, it is all a matter of making trade-offs. The discussion above showed some of the advantages and disadvantages associated with each approach and format. In the end, you might end up using several approaches.
The architect might ask some of these questions before deciding which approach to take:
- What types of configuration items do you need to store in a configuration object?
- Can you dynamically compute the configuration item?
- What is the set of valid values for each configuration item?
- Is a default value associated with the configuration item, or is a user-specified value required?
- Where is enforcement of the configuration item values located -- in the program after reading the value, or in an edit function that updates the value?
- What is the scope of the configuration item?
- What is the appropriate location, format, and retrieval frequency for the configuration item?
- Who maintains the configuration item -- a programmer, operations team, or end-user?
- Can existing tools (such as a text editor) edit the configuration object, or must you create a custom editor?
- If you create a custom editor, can you integrate it into the application, or will it be a stand-alone program?
- Is the configuration item subject to security concerns? For example, storing a database access password in a text file might not be acceptable to security audits. To meet any security requirements, how will you store the configuration item -- plain text, encrypted, or in a secured object?
- Must you synchronize the configuration item across several systems?
- How often will the configuration item be read from the configuration object?
- How should the program behave when a configuration item is not found?
- How should it behave when a configuration object is corrupted?
- Should the application repeat configuration information from another application or use the other application's configuration object directly? Repeating the information can result in replication and synchronization issues. Reusing the other application's configuration object can result in tight coupling and dependency issues.
In this column, I presented many approaches to storing and managing configuration (or customization) information, along with the advantages and disadvantages associated with each approach. This guide is meant to be representative and not exhaustive. The goal is to challenge you, as an architect, to think about the effects on the operational environment and non-functional requirements brought about by your chosen approach.
- Read the author's other articles in the Quality busters series on developerWorks.
- If you work with Windows-based initialization (INI) files, try the GetPrivateProfileString(...) and GetProfileString(...) functions in the Windows SDK.
- If you work with the System Registry, try several APIs, such as RegQueryValueEx(...) and RegOpenKeyEx(...) from the Windows SDK.
- To work with key=value pair text files, get the java.util.Properties class in the Java SDK. In version 1.4, the Java SDK introduces the java.util.prefs package, which includes the java.util.prefs.Preferences class, for working with implementation-dependent configuration objects, such as the System Registry on Windows platforms.
- Find more information about the LDAP protocol and learn to access directory services with it in this recently updated IBM Redbook, Understanding LDAP - Design and Implementation SG24-4986-01 (June 2004).
- Get an example of using XML for a configuration file in "Java configuration with XML Schema" by Marcello Vitaletti (developerWorks, November 2001).
- Visit these valuable resources on developerWorks:
- The Web Architecture zone specializes in articles covering various Web-based solutions.

Michael Russell has a bachelor's degree in physics and a master's degree in computer science. He was a logistics engineer, a technical services manager, and a certified IT architect at IBM for nearly 14 years. Michael has experience in Windows, UNIX, and OS/400 environments and is currently a Web application architect for a resort company in Orlando. He uses Web technology for entertainment through his own company, Vicki Fox Productions (http://www.VickiFox.com).