Speaking UNIX, Part 11: Ramble around the UNIX file system

Discover where UNIX stores important files

Many directories in the UNIX® file system serve a special purpose, and certain directories are named per long-standing convention. In this installment of the "Speaking UNIX" series, discover where UNIX stores important files.

Martin Streicher (martin.streicher@gmail.com), Chief Technology Officer, McClatchy Interactive

Martin Streicher is a freelance Ruby on Rails developer and the former Editor-in-Chief of Linux Magazine. Martin holds a Master of Science degree in computer science from Purdue University and has programmed UNIX-like systems since 1986. He collects art and toys. You can reach Martin at martin.streicher@gmail.com.



21 June 2007


I just purchased a Global Positioning System (GPS) navigation device and, after only a handful of uses, I'm hooked. Getting from Point A to Point B is now a snap. No more MapQuest. No more guessing which way is east. No more pit stops in the sticks to ask for clarifications. I simply jump in my car, specify my destination, and follow the voice prompts. Why, the GPS makes me look like a local, giving new meaning to the Buckaroo Banzai maxim, "Wherever you go, there you are."

Let's indulge my wanderlust this month and ramble around the UNIX® file system. From /bin to /var, there are lots of interesting sights to see—some well trodden and others obscure—and shortly, you'll know your way around like the locals.

What's in a file name?

The files on a UNIX machine are organized in a hierarchy. The very top of the hierarchy is /—commonly referred to as "slash" or "the root directory."

If you change your working directory to / and run ls, you'll see several subdirectories with cryptic names like etc, bin, var, home, and tmp. Although UNIX now supports long file names, most of the monikers of these top-level directories hark back some 30 years to the origin of UNIX. Similarly, by the same long-standing conventions, each directory contained in / serves a special purpose:

  • /bin is but one of many directories that contain applications and utilities. However, /bin typically contains utilities that are essential to system operation. Hence, the shells, file-manipulation commands such as cp and chmod, compression and decompression tools, and diagnostics reside in /bin.

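    /sbin also contains utilities crucial to system operation and maintenance. However, the programs in /sbin are intended for the superuser: most of them need superuser privileges to do useful work, and /sbin is often left out of ordinary users' PATH. Hence the name, commonly read as "superuser bin" or "system bin."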

  • /dev contains special device files that represent the hardware installed on your system, including terminals and USB devices (and other peripherals that are physically connected to the computer), pseudo-terminals (used to interact with X terminal windows), and hard disk drives, among others.
  • /etc (often pronounced "etsee") is dedicated to system configuration. The /etc directory contains configuration files for the system daemons, startup scripts, system parameters, and more.
  • /home contains users' home directories. For instance, if your login name is joe, the directory /home/joe acts as your personal file repository.
  • /lib is the coffer for essential system libraries. In modern UNIX, system libraries are typically shared, meaning that the libraries are not linked and included in each binary (which would waste space, at the least), but are loaded on demand when needed and shared by many applications at once. Hence, core applications and utilities installed with UNIX require the libraries in /lib to run, and you need at least a small handful of the libraries to create new executables from source code. All files here are vital, and the corruption or removal (whether intentional or accidental) of even one file can render a system useless.
  • /mnt, short for "mount," is the standard location to mount hard disk drive partitions and other devices. If you want to see which devices are currently mounted and accessible, simply run the mount command.
  • /tmp, or "temporary," is the system-wide scratch pad. Your Web server might stash session data files here, and other utilities use the space in /tmp for caching intermediate results. Files in /tmp are considered disposable. Indeed, your systems administrator probably deletes, every evening, all files older than a certain age.
  • /usr is the umbrella for a great number of files. End-user applications—from editors, games, and interfaces, to system features—are here, as is the library of man pages along with much more. Chances are that if the file is useful but not mandatory for system operation, you'll find it in /usr.
  • /var—short for "variable"—is the repository for files that typically grow in size over time. Mailboxes, log files, printer queues, and databases can be found in /var. It's commonplace also for Web sites to be kept in /var because a Web site tends to amass data preternaturally over time.
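The nightly /tmp cleanup mentioned above is often nothing more than a scheduled find command. Here's a minimal sketch of the idea; the function name purge_old_files and the seven-day cutoff are hypothetical, not any particular system's policy, and the demo uses a throwaway directory from mktemp rather than the real /tmp.

```shell
#!/bin/sh
# Hypothetical helper: remove regular files under DIR that were last
# modified more than DAYS days ago, printing each path as it goes.
purge_old_files() {
    dir=$1
    days=$2
    # -mtime +N matches files modified more than N*24 hours ago.
    find "$dir" -type f -mtime +"$days" -print -exec rm -f {} +
}

# Demo in a scratch directory instead of the real /tmp.
scratch=$(mktemp -d)
touch "$scratch/fresh.dat"                    # modified just now
touch -t 200001010000 "$scratch/stale.dat"    # timestamp set to 1 Jan 2000
purge_old_files "$scratch" 7                  # prints the path of stale.dat
ls "$scratch"                                 # only fresh.dat remains
rm -rf "$scratch"
```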

The directory names above are the most common, although some flavors of UNIX diverge slightly. (For example, on Mac OS X, which is based on FreeBSD®, the directory that contains users' home directories is named /Users rather than /home.)
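To see how your own machine compares, you can loop over the conventional names and report what each one is. This is just a sketch (describe_dir is a hypothetical name). It checks for symbolic links first because on some newer Linux distributions, /bin, /sbin, and /lib are links into /usr rather than real directories.

```shell
#!/bin/sh
# Report whether a path is a symbolic link, a directory, or missing.
describe_dir() {
    if [ -L "$1" ]; then
        echo "$1: symbolic link"
    elif [ -d "$1" ]; then
        echo "$1: directory"
    else
        echo "$1: not present"
    fi
}

# The conventional top-level directories from the list above.
for d in /bin /sbin /dev /etc /home /lib /mnt /tmp /usr /var; do
    describe_dir "$d"
done
```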

Keeping tradition

In fact, the names etc, bin, lib, and man are so entrenched in UNIX culture that directories of like purpose elsewhere on the machine traditionally get the same names. For example, if you look at an expert's home directory, you'll likely find a bin directory for personal applications and scripts and a lib directory for personal libraries.
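Setting up such a personal bin directory takes only a few commands. The sketch below uses a scratch directory in place of $HOME so that it's safe to try, and the script name hello is made up. In everyday use, you'd put the PATH line in your shell startup file instead.

```shell
#!/bin/sh
# Stand-in for $HOME so the demo doesn't touch your real home directory.
demo_home=$(mktemp -d)

# Personal directories named after the system-wide conventions.
mkdir -p "$demo_home/bin" "$demo_home/lib"

# A hypothetical personal script.
cat > "$demo_home/bin/hello" <<'EOF'
#!/bin/sh
echo "hello from my own bin"
EOF
chmod +x "$demo_home/bin/hello"

# Put the personal bin directory first so its commands are found.
# In practice: PATH="$HOME/bin:$PATH"; export PATH
PATH="$demo_home/bin:$PATH"
export PATH

hello    # prints: hello from my own bin
```

When you're done experimenting, rm -rf "$demo_home" removes the scratch directory.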

For more information about UNIX naming conventions, see Appendix A: Pick a standard, any standard.

Mimicking canon, /usr/local frequently has etc, bin, lib, and man. Historically, /usr/local has been used to store applications and data that originate at, or are germane solely to, your site. The /usr/local/bin directory stores new, locally added programs and locally modified versions of standard system utilities. For example, your systems administrator might offer the latest and greatest version of Perl in /usr/local/bin/perl, keeping /usr/bin/perl constant both for reference and because other core utilities might depend on it. Users get the newer version when /usr/local/bin precedes /usr/bin in their PATH. The /usr/local/lib directory complements /usr/local/bin.
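To see how PATH order decides which copy of a command runs, here's a sketch: the shell walks PATH from left to right and runs the first match. It reproduces the Perl arrangement in a scratch directory; local/bin and sys/bin stand in for /usr/local/bin and /usr/bin, and the command name tool is hypothetical.

```shell
#!/bin/sh
root=$(mktemp -d)
mkdir -p "$root/local/bin" "$root/sys/bin"

# Two versions of the same (hypothetical) command.
printf '#!/bin/sh\necho "tool 2.0 (local)"\n' > "$root/local/bin/tool"
printf '#!/bin/sh\necho "tool 1.0 (system)"\n' > "$root/sys/bin/tool"
chmod +x "$root/local/bin/tool" "$root/sys/bin/tool"

# Local directory first: the newer version wins.
(
    PATH="$root/local/bin:$root/sys/bin:$PATH"
    command -v tool    # path under local/bin
    tool               # tool 2.0 (local)
)

# System directory first: the reference version wins.
(
    PATH="$root/sys/bin:$root/local/bin:$PATH"
    tool               # tool 1.0 (system)
)
```

The subshells keep each PATH change from leaking into your current shell.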

The /usr/local directory might even be an entirely separate partition (even a partition mounted through the Network File System from a Network Attached Storage [NAS] device), which makes system restores and resurrections simpler. If something happens to a system, the administrator can overwrite the operating system's files without worrying about destroying local data.

Even packages mirror the root directory. Consider MySQL: If configured with the option --prefix=/usr/local/mysql, it installs into its own root directory, /usr/local/mysql, and creates the subdirectories /usr/local/mysql/bin, /usr/local/mysql/lib, and so on:

$ ls -1 -F /usr/local/mysql
bin/
configure*
data/
docs/
include/
lib/
man/
...

Alternatively, if you want to install MySQL's pieces in /usr/local/bin, /usr/local/lib, and the rest, use --prefix=/usr/local.

Other points of interest

Because this is the 25-cent tour, let's swing by a few other attractions.

/etc

The /etc directory is the place to look for configuration files, which usually end with the suffix .conf. A larger package might have its own subdirectory to collect all the configuration files for that package. One case in point is Apache; in particular, Apache V2.2 has reorganized its configuration files to be more modular and less monolithic.

Another notable subdirectory is /etc/init.d, which holds the many startup scripts that run when your system boots. If you want to cleanly restart a daemon, say, after changing its configuration, look in /etc/init.d for an eponymous script. For example, to restart the Postfix mail transport agent (MTA), you'd run the following (restarting a daemon usually requires superuser privileges, hence sudo):

$ sudo /etc/init.d/postfix restart

/etc/init.d also contains scripts to drop to single-user mode, to restart and shut down the machine, and to disable logins.
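Most scripts in /etc/init.d share the same basic shape: they take an action such as start, stop, or restart as their first argument and dispatch on it with a case statement. Here's a stripped-down sketch of that shape, written as a shell function so you can try it without installing anything. The daemon name mydaemon is hypothetical, the echo lines stand in for the real commands, and production init scripts typically do much more (tracking process IDs and reporting status, for example).

```shell
#!/bin/sh
# Skeleton of an /etc/init.d-style control script for a hypothetical
# daemon. The echo lines stand in for the real start and stop commands.
mydaemon_ctl() {
    case "$1" in
        start)
            echo "Starting mydaemon"
            ;;
        stop)
            echo "Stopping mydaemon"
            ;;
        restart)
            # A clean restart is simply a stop followed by a start.
            mydaemon_ctl stop
            mydaemon_ctl start
            ;;
        *)
            echo "Usage: mydaemon_ctl {start|stop|restart}" >&2
            return 1
            ;;
    esac
}

mydaemon_ctl restart    # prints Stopping mydaemon, then Starting mydaemon
```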

/var

As mentioned earlier, /var keeps files that tend to grow and shrink in size over time. Like /, /var is divided into subdirectories, each with its own scheme:

  • /var/spool/mail is where you find your and other users' incoming mail. Your mailbox is simply a flat (contiguous, non-indexed) file (unless your systems administrator is using the maildir format). Incoming mail is appended to the end of the file. Mail you discard is deleted from the file; and when you read a new message, the status field of the message is changed and rewritten in place. You can read and write your own mailbox, but permissions prevent you from accessing other users' mailboxes. (It's recommended that you not edit your mailbox directly.)
  • /var/log maintains the suite of system log files, or those files that record system activity. Logs track everything from mail traffic to failed login attempts. Each daemon usually has its own log file, which makes it easy to hunt down issues when a service fails. Because system activity can be revealing, logs here are typically restricted and available to superusers only.

If your system provides a centralized fax service, /var/spool queues those requests, too.
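Because a traditional mailbox in /var/spool/mail is plain text, and each message in it begins with a line starting with "From " (From followed by a space, not the From: header), ordinary tools can inspect it. The sketch below counts messages that way in a made-up mailbox; count_messages is a hypothetical name, and the count can be off for a real mailbox if a line in some message body starts with "From " and wasn't escaped on delivery. It doesn't apply to maildir, which stores each message in its own file.

```shell
#!/bin/sh
# Count messages in an mbox-format mailbox: each message begins with a
# line starting with "From " (note the space; "From:" headers don't match).
count_messages() {
    grep -c '^From ' "$1"
}

# A tiny, made-up mailbox for demonstration.
mbox=$(mktemp)
cat > "$mbox" <<'EOF'
From alice@example.com Thu Jun 21 09:15:02 2007
From: alice@example.com
Subject: lunch?

Noon works for me.

From bob@example.com Thu Jun 21 10:02:44 2007
From: bob@example.com
Subject: re: lunch?

See you there.
EOF

count_messages "$mbox"    # prints 2
rm -f "$mbox"
```

To try it on your own mailbox, run count_messages /var/spool/mail/$USER (on some systems, the path is /var/mail/$USER).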

/usr/man

The core man pages for your UNIX system reside in /usr/man (on many modern systems, /usr/share/man). Extended collections of man pages can also be found in /usr/local/man and in a package's man directory, such as /usr/local/mysql5/man.

Because man pages can be found in as many places as executables, the man program supports a MANPATH environment variable that works just like PATH: a colon-separated list of directories, searched from left to right. To search more than one location for a specific page, define MANPATH as a series of man page directories:

MANPATH="/usr/man"
MANPATH="/usr/local/man:$MANPATH"
MANPATH="/usr/local/mysql/man:$MANPATH"
MANPATH="$HOME/man:$MANPATH"
export MANPATH

Here, $HOME/man is searched first (it's leftmost), followed by /usr/local/mysql/man, and so on. By the way, the first four commands above could be simplified to the single statement:

MANPATH="$HOME/man:/usr/local/mysql/man:\
/usr/local/man:/usr/man"

Yet keeping the additions separate lets you reorder entries and add new directories quickly. Moreover, if you have a lot of paths, editing a single long MANPATH (or PATH) assignment becomes tedious.
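Keeping one directory per line also makes it easy to wrap the pattern in a small function that skips directories that don't exist and never adds the same one twice. The function manpath_prepend is hypothetical; the same logic works for PATH.

```shell
#!/bin/sh
# Hypothetical helper: put DIR at the front of MANPATH, but only if DIR
# exists and is not already listed.
manpath_prepend() {
    [ -d "$1" ] || return 0            # skip directories that don't exist
    case ":${MANPATH:-}:" in
        *":$1:"*) return 0 ;;          # already listed; keep current order
    esac
    if [ -n "${MANPATH:-}" ]; then
        MANPATH="$1:$MANPATH"
    else
        MANPATH="$1"
    fi
    export MANPATH
}

# Same order as the four assignments above: $HOME/man ends up first.
MANPATH=""                             # start from scratch for this demo
manpath_prepend /usr/man
manpath_prepend /usr/local/man
manpath_prepend /usr/local/mysql/man
manpath_prepend "$HOME/man"
echo "$MANPATH"
```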

Include files

Include files (or header files) define constants, macros, and other structures used in the operating system or in a particular library. Rather than redefine a specific structure, you simply "include" the header file in your code (a simple form of code reuse) and code to the header file's specifications. (Sections 2 and 3 of the man pages, which cover system calls and library functions, respectively, document those specifications, including the header files each call needs; try man 2 signal, for instance.)

Akin to bin and lib, include is a common directory name. If a package has a development kit and you've installed the package in its own root directory, look in the include subdirectory for the header files.

Or, if you've installed a package into the common /usr/local/{bin,lib,include} directories, look for a package's header files in a subdirectory of /usr/local/include named after the package. This is an exception to storing everything in a common pool. Why? Header files are not uniquely named, so installing everything in one place would cause collisions, with one package overwriting another's header files.

If you build applications from source code (something you'll explore in-depth in an upcoming installment) and header files are in a non-standard location, you might need to add the -I option to your compiler commands. As an example, if your ImageMagick header files reside in /opt/include/magick, add -I/opt/include/magick to the compiler's switches.
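If you're not sure where a package put its headers, find can locate them and tell you what to pass to -I. The sketch below assumes you know the header's file name; the function include_flag_for and the sample tree (a stand-in for /opt/include/magick) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: search ROOT for a header named NAME and print a
# -I flag for the directory holding the first match that find reports.
include_flag_for() {
    name=$1
    root=$2
    hit=$(find "$root" -type f -name "$name" -print | head -n 1)
    if [ -z "$hit" ]; then
        echo "include_flag_for: $name not found under $root" >&2
        return 1
    fi
    printf '%s\n' "-I$(dirname "$hit")"
}

# Demo tree standing in for /opt/include/magick (file name is made up).
demo=$(mktemp -d)
mkdir -p "$demo/opt/include/magick"
touch "$demo/opt/include/magick/api.h"

include_flag_for api.h "$demo/opt"    # prints -I followed by .../opt/include/magick
rm -rf "$demo"
```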

Know it like the back of your hand

This concludes today's UNIX tour. You can now navigate the alleys and back streets of UNIX a bit more easily. If you do get lost, just say "Home, Home, Home" (don't get tricked by Betelgeuse)—or type cd. Remember that you can also use find and locate to find most anything, including executables, libraries, and include files.

Good afternoon, ladies and gentlemen. The next tour leaves in 30 days.

Appendix A: Pick a standard, any standard

While the software that ships with your UNIX operating system has a proper place in the file system (stored in /bin or /lib, say), locally added software might be found in any number of locations. Some systems administrators place local software in /usr/local, while others use /opt (short for "optional," because the software isn't required to run the system). Further, some administrators put all executables in /usr/local/bin or /opt/bin, all libraries in /usr/local/lib or /opt/lib, and so on.

Another approach—and a paradigm I prefer—is to create a root directory for each locally added package, especially if the package is large. For example, I install MySQL V5 into /usr/local/mysql5.0 and Apache V2.2 into /usr/local/apache2.2. Each package installer creates its own bin, lib, and man directories within the package root.

A disadvantage of this approach is that each end user must add many bin directories to his or her PATH environment variable. That requirement isn't particularly onerous, and you can ease it further by expanding the default PATH set in the system-wide shell startup file. For example, the system-wide Bash startup script, /etc/profile, might contain:

PATH="/bin:/usr/bin:/usr/local/bin"

PATH="$PATH:/usr/local/mysql5.0/bin"
PATH="$PATH:/usr/local/perl6/bin"
PATH="$PATH:/usr/local/Zend/bin"

export PATH
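If packages come and go often, the profile could build those entries with a loop rather than one line per package. This is a sketch under an assumption you should check: it appends every */bin directory under the given root, which may pick up more than you intend. The function name add_package_bins is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append ROOT/<package>/bin to PATH for every
# package directory under ROOT that has a bin subdirectory.
add_package_bins() {
    for dir in "$1"/*/bin; do
        # If nothing matches, the shell leaves the pattern unexpanded;
        # the -d test skips it.
        [ -d "$dir" ] || continue
        case ":$PATH:" in
            *":$dir:"*) ;;                 # already on PATH
            *) PATH="$PATH:$dir" ;;
        esac
    done
    export PATH
}

PATH="/bin:/usr/bin:/usr/local/bin"
add_package_bins /usr/local
echo "$PATH"
```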

However, storing a package in its own "container" is quite advantageous:

  • It's obvious which package provides a specific application. Following this classification system, you can find the name of the package using the which command:
    $ which mysql
    /usr/local/mysql5.0/bin/mysql
  • You can retain multiple versions of the same package in parallel.

    For example, if you want to offer Perl V5.6 and Perl V5.8, install the former into /usr/local/perl5.6 and the latter into /usr/local/perl5.8. Each user can choose a Perl version by altering the PATH variable.

  • You can retain multiple versions in parallel but default to a particular version by using a symbolic link. Simply create a symbolic link to the version of the package you want to offer.

    For instance, assume that you offer the two versions of Perl mentioned above. If you want Perl V5.8 to be the default, create a symbolic link to /usr/local/perl5.8 and name it perl:

    $ ls -1 -d /usr/local/perl*
    /usr/local/perl5.6
    /usr/local/perl5.8
    
    $ sudo ln -s /usr/local/perl5.8 \
      /usr/local/perl
    
    $ ls -1 -d -F /usr/local/perl*
    /usr/local/perl@
    /usr/local/perl5.6/
    /usr/local/perl5.8/

    An end user can now add /usr/local/perl/bin to his or her PATH variable to run the perl command. If you eventually need or want to switch to a newer or older version of Perl, you can simply delete the symbolic link and recreate it to point to a different directory.

    Symbolic links are invaluable for maintenance tasks such as this. You can maintain variants, reroute paths, and build collections for convenient access. For example, you can populate the traditional /usr/local/bin directory with links to commands in other packages, as in ln -s /usr/local/perl/bin/perl /usr/local/bin/perl. (Yes, you can create a symbolic link that traverses another symbolic link.)
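The switch described above can also be done with a single command instead of deleting and recreating the link. The sketch below mirrors the Perl example in a scratch directory with empty stand-in version directories. It relies on ln -sfn: without -n, ln would follow the existing link to perl5.8 and create the new link inside that directory. The -n option is supported by GNU and BSD ln, but check your system's ln man page, as not every ln has it; readlink, which prints a link's target, is likewise widely available but not universal.

```shell
#!/bin/sh
# Scratch stand-in for /usr/local with two "installed" Perl versions.
local_root=$(mktemp -d)
mkdir -p "$local_root/perl5.6/bin" "$local_root/perl5.8/bin"

# Make 5.8 the default.
ln -s "$local_root/perl5.8" "$local_root/perl"
readlink "$local_root/perl"    # .../perl5.8

# Switch the default to 5.6. -f replaces the existing link; -n stops ln
# from following that link into perl5.8/ and creating a link in there.
ln -sfn "$local_root/perl5.6" "$local_root/perl"
readlink "$local_root/perl"    # .../perl5.6
```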
