The cranky user: Storing configuration data

A crash course in data storage formats and conventions

Both computer users and application developers benefit from understanding the Big Three of storing configuration data: human readability, separation of data, and a centralized storage location.


Peter Seebach (crankyuser@seebs.plethora.net), Writer, Freelance

Peter Seebach has been using computers for years and is gradually becoming acclimated. He still doesn't know why mice need to be cleaned so often, though.



05 July 2006

Storing configuration data is a central concern of modern application development. Most users expect to be able to set their individual preferences for using an application, and that information has to be stored somewhere, preferably in some kind of reasonable format. Likewise, most applications rely on various data files to perform routine operations, and these files must be stored and retrievable. Even a simple Space Invaders clone needs some kind of system for storing high scores.

In addition to figuring out where to store all this boring but essential data, developers have, over the years, settled on various conventions for how it should be stored. UNIX® systems favor rc files (such as .foorc) stored in the user's home directory. Older Windows™ systems gave configuration files an INI extension (foo.ini) and stored them in the same directory as the program. The early Macs stored preferences in the Preferences folder within the System folder. Today, you'll find Windows config data stored in the registry, although the old INI files still show up now and again. Mac OS X has moved preferences into a subdirectory of the user's home directory (~/Library/Preferences).

What's with this registry thing?

A classic example of configuration data, interestingly, is data about where an application's files are stored. On the Mac, application bundles largely remove that question. On Windows machines, many applications can only be run if their files are in a previously recorded location. If you're lucky, the application path is stored in an .INI file that you can easily edit. If the path is stored in the registry, you generally can't just move the application to another machine. Instead, you'll have to reinstall it, which means repeating the entire procedure of patching and setting your preferences from scratch. Suffice it to say that if your system uses a single, opaque file for configuration settings, you can never have anything nice.

In fact, one of the reasons given for the Windows registry was to deal with DLL versioning, where multiple versions of shared (that is, dynamically linked) libraries create compatibility problems. The obvious solution, using file names to denote different versions of libraries, was impractical on a system limited to 8-character filenames. Had it been limited to resolving just this problem, the Windows registry would probably have gone relatively unnoticed. The confusion (and infamy) of the registry comes from its being extended into a general mechanism for handling saved data.

Wading through all these options is tough enough as a user, but what about when you start to develop or customize your own applications? In this month's column, I'll shed some light on the various formats and conventions for storing configuration data. I'll also explain the essential principles involved: human readability, separation of data for different programs, and making it easy to find configuration data.

Human-readable formats

Human-readable formats are an unequivocal win when it comes to storing configuration data. One of the most annoying aspects of the Classic Mac OS was that it made preferences almost completely opaque to users. Windows fared far better with its INI files, which had a clear and easy-to-read format and included such niceties as sections and variables, support for comments, and a convention of selecting reasonably clear names. UNIX rc files have always been a potluck; some are brilliant, some are horrible. UNIX loses points in human readability by not mandating consistent file formats. The lack of consistency is a serious problem.
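To make those INI niceties concrete, here is a minimal sketch using Python's standard configparser module, which reads a close cousin of the Windows INI format (its dialect differs from Windows in some details). The file contents, section names, and keys are invented for illustration; the point is how sections, key/value pairs, and full-line comments all read naturally as plain text.

```python
import configparser

# A hypothetical INI file for an imaginary Space Invaders clone;
# every section and key name here is made up for illustration.
SAMPLE_INI = """\
; Settings for a hypothetical Space Invaders clone
[display]
width = 640
height = 480

[sound]
; Set to no to mute the game
enabled = yes
volume = 7
"""

def load_settings(text):
    """Parse INI-style text and return a ConfigParser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)  # full-line ';' and '#' comments are skipped
    return parser

settings = load_settings(SAMPLE_INI)
print(settings.getint("display", "width"))      # 640
print(settings.getboolean("sound", "enabled"))  # True
```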

Consistency is more important, and more closely related to readability, than a lot of developers realize. It's tempting to try to invent a better file format, but it's not such a good idea. There are enough formats already without bringing more into the mix. It's one thing for an operating system to introduce a standard format for all applications on that platform; it's another for every application to introduce a new format.

Two approaches to usability

The modern Mac has done especially well by moving preferences data into its ever-so-elegant property list XML files, which actively encourage developers to use clear names for values. The code in Listing 1 shows just how well Apple understood this one.

Listing 1. Saving a preference on the Mac
[[NSUserDefaults standardUserDefaults] setObject:regCode forKey:@"RegCode"];

That's just one line of code (it's Objective-C, if you're wondering), and it does all the work for you -- opening the file, saving the value, the whole nine yards. Making that code so easy was no simple trick, but Apple scored by recognizing that it was important. The best way to ensure a usable system is to make it easy for developers to do the right thing. On the downside, you can't keep comments in property list files: whenever the system updates one, it rewrites the file and any comments are lost.
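Python's standard plistlib module is not Apple's NSUserDefaults, but it reads and writes the same XML property list format, so it makes a handy stand-in for showing both halves of that paragraph: the file is readable, and a comment added by hand does not survive a read-and-rewrite cycle. The registration code value below is made up.

```python
import plistlib

# Hypothetical preference mirroring Listing 1: a registration code.
prefs = {"RegCode": "ABCD-1234"}

# Serialize to the XML property list format (FMT_XML is the default).
xml_bytes = plistlib.dumps(prefs)
print(xml_bytes.decode("utf-8"))

# Simulate a user hand-editing the file to add a comment.
commented = xml_bytes.replace(b"<dict>", b"<dict>\n\t<!-- paid for this in 2006 -->")

# Reading and rewriting keeps the data but drops the comment,
# because the parser only keeps keys and values.
rewritten = plistlib.dumps(plistlib.loads(commented))
print(b"<!--" in rewritten)  # False
```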

The Windows registry, as you might expect, is far less friendly to humans. Not only do you have to use a special program (such as Regedit) to edit your registry settings, but the settings themselves have names that seem calculated to discourage users. For instance, Listing 2 shows the setting that disables the annoying pop-ups that encourage you to "clean up your desktop" every so often.

Listing 2. A variable name in the Windows registry
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\Desktop\CleanupWiz\NoRun

The names are egregiously long because the registry has to hold every setting for every kind of program for every user, as well as system-global settings. That leads me to my second consideration: the importance of separating data for different programs.


Separation of data

One of the most significant and consistent principles in storing configuration data is that config files should be separate from each other. You must be able to delete the preferences or settings for a single program without affecting the others in your system.
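A minimal sketch of the principle, assuming each program keeps its preferences in its own file inside a shared per-user directory. JSON is used here only because Python reads and writes it in one call; any human-readable format would do, and the program names and settings are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def save_prefs(config_dir: Path, program: str, prefs: dict) -> None:
    """Write one program's preferences to its own human-readable file."""
    (config_dir / f"{program}.json").write_text(json.dumps(prefs, indent=2))

def load_prefs(config_dir: Path, program: str) -> dict:
    """Read one program's preferences; a missing file means 'use defaults'."""
    path = config_dir / f"{program}.json"
    if not path.exists():
        return {}
    return json.loads(path.read_text())

with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)
    save_prefs(config_dir, "editor", {"tab_width": 4})
    save_prefs(config_dir, "invaders", {"high_score": 12000})

    # Resetting one program means deleting exactly one file...
    (config_dir / "invaders.json").unlink()

    # ...and the other program's settings are untouched.
    editor_prefs = load_prefs(config_dir, "editor")
    invaders_prefs = load_prefs(config_dir, "invaders")

print(editor_prefs)    # {'tab_width': 4}
print(invaders_prefs)  # {}
```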

The big losers here are Windows and PalmOS. The Windows registry -- oft-touted as a solution to all sorts of problems -- has created a nightmarish single point of failure for configuration data. A corrupted registry can easily cost you the settings and configuration data for every program on your system. Also, since detailed system internals are stored in the registry alongside user preferences, you can't simply copy your preferences over to a new system.

PalmOS has the same problem: you can't copy the Saved Preferences file from one Palm system to another without risking catastrophic failure. The registry's shortcomings probably explain the continued existence of .INI files on modern Windows systems; INI files let programs store their settings in a way that is more convenient for the user.


Storing user data

Where to store data is a tricky question. The convention of storing preferences in the application's directory works fairly well on Windows, but is inconvenient on the Mac, where applications are generally self-contained bundles. One obvious weakness of storing preferences and other changeable data with the application is that the data is no longer part of the user's files. This becomes even more important if multiple users share a machine; each user should have an individual copy of preference files. (The same issue can apply to saved games, documents, and other data. I once lost a couple of months' worth of accounting information because an accounting package kept its data files in a subdirectory of its application directory, rather than in the directory with all my other personal files.)

A centralized location for user data is preferable in most cases. The leap from a centralized directory to a centralized file seems reasonable, until you consider the most terrifying phrase in all of reliable computing: single point of failure. If any program on your computer can corrupt the One Global Database, it poses a threat to every program on your computer.

UNIX doesn't do as well here as it could. Most UNIX systems have system-wide settings in a reasonable-sounding location (such as /etc), and personal settings in home directories. But then you'll find another half-dozen application settings scattered where you'd least expect to find them. Wouldn't it be nice if application developers just stored configuration files in a standard location?
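One common way to combine a system-wide file with a personal one is to read both in order and let the personal settings win. The sketch below assumes a hypothetical program called frob whose settings live in INI-style files; temporary files stand in for the real /etc and home-directory locations.

```python
import configparser
import tempfile
from pathlib import Path

def load_layered(paths):
    """Read config files in order, so later files override earlier ones.

    ConfigParser.read() silently skips files that don't exist and returns
    the list of files it actually read.
    """
    parser = configparser.ConfigParser()
    found = parser.read(paths)
    return parser, found

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Stand-ins for a system-wide file (say /etc/frob.conf) and a
    # per-user file (say ~/.frob); "frob" is a hypothetical program.
    system_file = base / "system.conf"
    user_file = base / "user.conf"
    system_file.write_text("[ui]\ncolor = blue\nfont = fixed\n")
    user_file.write_text("[ui]\ncolor = green\n")

    parser, found = load_layered([system_file, user_file, base / "missing.conf"])
    color = parser.get("ui", "color")  # the user's setting wins
    font = parser.get("ui", "font")    # falls back to the system default
    found_names = [Path(p).name for p in found]

print(color, font, found_names)  # green fixed ['system.conf', 'user.conf']
```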


Doing it yourself

So what do you do when you are the application developer? The first thing, if at all possible, is to use a system-standard feature or library for configuration data. On Windows, you might consider creating an .INI file in the user's directory. If you put it in your program's own directory, it will get lost when the user has to reinstall the system. If you use the registry, it will also get lost when the user has to reinstall the system. On the Mac, the standard preferences system is pure gold, so don't mess with it. On UNIX-like systems, you should probably use a file, or even a subdirectory, in the user's home directory. The .programname convention is a good one in general, but if your program's name is likely to be common, use something a little more distinctive.
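The per-platform advice above might be sketched as a single lookup function. This is an illustration under assumptions, not a standard API: app_name is hypothetical, the Windows branch uses the %APPDATA% environment variable (falling back to an assumed AppData\Roaming path under the home directory if it is missing), and on the Mac you would normally let NSUserDefaults pick and manage the file rather than compute it yourself.

```python
import os
import sys
from pathlib import Path

def config_path(app_name, platform=sys.platform, home=None, env=None):
    """Return a per-user config file location for a hypothetical app.

    platform, home, and env are parameters so the logic can be tested
    without touching the real machine.
    """
    home = Path(home) if home is not None else Path.home()
    env = os.environ if env is None else env
    if platform == "win32":
        # Per-user application data (%APPDATA%), not the install directory.
        if "APPDATA" in env:
            base = Path(env["APPDATA"])
        else:
            base = home / "AppData" / "Roaming"  # assumed fallback
        return base / app_name / f"{app_name}.ini"
    if platform == "darwin":
        # Where per-user preference files live; real apps name the file
        # after their bundle identifier and let NSUserDefaults manage it.
        return home / "Library" / "Preferences" / f"{app_name}.plist"
    # Other UNIX-like systems: the traditional ".programname" dot-file.
    return home / f".{app_name}"

print(config_path("frob"))
```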

If you have to define your own format, favor human-readable formats. The Mac property list format is great, and nothing prevents you from using it elsewhere, although it does require you to have at least most of a functioning XML parser. Something similar to .INI files is easy and cheap to implement and might be a good choice for most purposes; do keep an eye on input validation, though, because a naive writer is vulnerable to injection attacks: a value containing a line break can smuggle in extra keys or even whole sections. (Of course, users can just edit the file anyway.)
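Here is a minimal sketch of an INI-style writer that validates its input before writing. The naming rule and the line-break check are assumptions chosen for this example; a real program might accept a wider character set, but the key idea is refusing any value that could start a new line, and therefore a new section or key, when the file is read back.

```python
import re

# Rules assumed for this sketch: section and key names are simple
# identifiers, and values are single-line strings.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def format_ini(sections):
    """Serialize {section: {key: value}} to INI-style text, refusing
    anything that could change the file's structure when read back."""
    lines = []
    for section, values in sections.items():
        if not _NAME_RE.fullmatch(section):
            raise ValueError(f"bad section name: {section!r}")
        lines.append(f"[{section}]")
        for key, value in values.items():
            if not _NAME_RE.fullmatch(key):
                raise ValueError(f"bad key name: {key!r}")
            text = str(value)
            # A line break inside a value would let it start a new
            # "[section]" or "key = value" line of its own.
            if "\n" in text or "\r" in text:
                raise ValueError(f"value for {key!r} contains a line break")
            lines.append(f"{key} = {text}")
        lines.append("")  # blank line after each section
    return "\n".join(lines)

print(format_ini({"user": {"name": "Pat", "volume": 7}}))

try:
    format_ini({"user": {"name": "Pat\n[admin]\nenabled = yes"}})
except ValueError as err:
    print("rejected:", err)
```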

While the need to move an application from one computer to another isn't especially common, it can be a huge source of stress when it comes up. The more you can do to make this process easy, the better. Reinstallation isn't just a hassle; it can also create serious security holes for some programs, since a freshly reinstalled copy may be missing patches that had been applied to the old one. If you make it both easier and safer for users to copy applications, you keep your users happy and make your apps more secure.


In conclusion

As it turns out, storing configuration data isn't so tricky, and avoiding the temptation to re-invent the wheel is half the battle. In this month's column, I've explained the importance of consistency in handling configuration data: consistent (and human-readable) naming conventions, consistent data formats, and a consistent storage location go a long way toward improving an application, from both the user's perspective and the developer's.

This month's action item: Try to move an installed application from one machine to another on at least two platforms; for instance, Mac and UNIX, or Linux® and Windows. How much time do you spend trying to figure out where the settings are stored?
