
System Administration Toolkit: Problems and pitfalls

It's a trap

Chris Herborth (chrish@pobox.com), Freelance Writer, Author
Chris Herborth is an award-winning Senior Technical Writer with more than 10 years of experience writing about operating systems and programming. When he's not playing with his son Alex or hanging out with his wife Lynette, Chris spends his spare time designing, writing, and researching (that is, playing) video games.

Summary:  Avoid common pitfalls and traps to help keep your systems running smoothly. Knowing the right way of dealing with full disks, or a crippled system, is nearly as important as having tools in your arsenal to make sure you're prepared to react quickly to missing files or an insecure system. This article focuses on some of the most common problems and issues facing UNIX® administrators and ways to achieve a safe and effective resolution.


Date:  14 Nov 2006
Level:  Intermediate

About this series

The typical UNIX® administrator has a key range of utilities, tricks, and systems he or she uses regularly to aid in the process of administration. There are key utilities, command-line chains, and scripts that are used to simplify different processes. Some of these tools come with the operating system, but a majority of the tricks come through years of experience and a desire to ease the system administrator's life. The focus of this series is on getting the most from the available tools across a range of different UNIX environments, including methods of simplifying administration in a heterogeneous environment.


Deleting open log files

In the course of your administrative duties, you might notice a system starting to run low on disk space. If it's a critical system, it won't be easy to take the machine down and add more storage, and you're probably already using the quota system to keep individual users from soaking up too much disk space. It's only natural to start looking around for things that can be deleted or archived to another system or to offline storage.

Log files are frequently an early target in this process, as the busy /tmp and /var file systems often have limited space. (Listing 1 shows /tmp and /var on my iBook, which isn't running any busy services.) Certain services, such as Web servers, Java™ 2 Platform, Enterprise Edition (J2EE) Web applications, and databases, can create enormous logs, especially if someone has configured them in debug mode.


Listing 1. /tmp and /var can accumulate a lot of data, even on a personal workstation

chrish@Bender [530]$ sudo du -sh /tmp/ /var/
 44K    /tmp/
1.0G    /var/

After verifying that nobody needs the log data, you fire off a quick rm command and blow the logs away. But you don't recover any disk space. If you're not familiar with the semantics of the UNIX file system, you might think the machine needs a reboot and a potentially time-consuming file system integrity check (using the fsck command in single-user mode).

On a standard UNIX file system, you can delete a file while it's still open for reading or writing. The file's name is removed from the file system immediately, but the operating system doesn't reclaim its storage until the last process using it closes the file. Programs that create temporary files often rely on this: they create the file, open it, and delete it right away. When the program exits, whether normally or by crashing, its open files are closed and the space is freed automatically, so the programmer doesn't need to worry about removing the file later.

This matters because your space-consuming log file is being held open by the server that writes to it. Deleting it just removes the name from the file system, and it won't recover any drive space until the process exits or closes the file.
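To see this behavior for yourself, here's a minimal shell sketch (the demo_open_delete function is just an illustration, not a standard tool). It holds a temporary file open on descriptor 3, deletes it, and then shows that the name is gone while the descriptor is still usable.

```shell
#!/bin/sh
# Illustration only: deleting an open file removes its name, not its storage.
demo_open_delete() {
    tmpfile=$(mktemp) || return 1
    exec 3>"$tmpfile"        # hold the file open for writing on descriptor 3
    rm -f "$tmpfile"         # the directory entry disappears...
    echo "log line" >&3      # ...but the process can still write to the file
    if [ -e "$tmpfile" ]; then
        echo "name still present"
    else
        echo "name removed, file still open"
    fi
    exec 3>&-                # closing the last descriptor frees the blocks
}

demo_open_delete
```

If lsof is installed on your system, sudo lsof +L1 lists open files whose link count has dropped below one; that's a quick way to find deleted log files that are still holding disk space, along with the processes that hold them.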

To get around this problem, you could restart the service that owns the log file, although the service interruption might cause a rebellion among your users. Another option is to rename the log file and then tell the process to reload its configuration files. Renaming doesn't change which file the process has open, so work already in progress keeps writing to the renamed file, and new requests are logged to a fresh file created under the old name once the process reopens its logs. After that, you can compress, move, or delete the renamed file and actually get the space back.
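Here's a minimal sketch of that rename-and-signal approach as a shell function. The rotate_log name, its arguments, and the timestamp suffix are all hypothetical, and it assumes the daemon really does reopen its log file when it receives a hangup signal, which you should confirm in its documentation.

```shell
#!/bin/sh
# Hypothetical helper: rotate a log file that a running daemon holds open.
# Assumes the daemon reopens its log when it receives SIGHUP.
rotate_log() {
    log=$1      # path the daemon writes to
    pid=$2      # process ID of the daemon
    rotated="$log.$(date +%Y%m%d-%H%M%S)"
    # Renaming within the same file system keeps the same inode,
    # so the daemon keeps writing to the renamed file for now.
    mv "$log" "$rotated" || return 1
    # Ask the daemon to reopen its files; it recreates $log.
    kill -HUP "$pid" || return 1
    echo "$rotated"
}

# Example: rotate_log /var/log/myd.log "$(cat /var/run/myd.pid)"
```

Once the daemon has reopened its log, it no longer holds the renamed file, so compressing or deleting that file frees the space.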

By convention, most server processes (all of the useful ones) reset themselves and reload their configuration files when you send a hangup signal (signal 1 or HUP). Listing 2 shows one way of sending the hangup signal to all of the Web server processes currently running.


Listing 2. Telling the Web server to reload its configuration file and reset its files

chrish@Bender [507]$ ps -A | grep httpd | grep -v grep | \
awk '{ print $1; }' | xargs -L 1 sudo kill -HUP
Password:

That's a bit of a mouthful, so let's look at each part of the pipeline. The ps and grep commands search the full process list for httpd processes (and skip the grep process that's doing the searching). Next, awk reduces each line to the process ID, which you feed into xargs. The xargs command takes each process ID (the -L 1 option makes it read one line at a time) and uses sudo kill -HUP to send that process a hangup signal.
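If you do this often, a small function avoids the grep -v grep dance. The sketch below (pids_named and hup_all are hypothetical names) asks ps for just the PID and command name of every process and lets awk compare the name exactly, so the search never matches itself. It assumes a ps that supports the POSIX -A and -o options.

```shell
#!/bin/sh
# Print the PIDs of processes whose command name is exactly $1.
pids_named() {
    ps -A -o pid= -o comm= | awk -v name="$1" '
        {
            cmd = $2
            sub(/.*\//, "", cmd)    # strip a leading path (BSD and Mac OS X ps)
            if (cmd == name) print $1
        }'
}

# Send a hangup signal to every process with that name.
hup_all() {
    pids=$(pids_named "$1")
    if [ -z "$pids" ]; then
        echo "no $1 processes found" >&2
        return 1
    fi
    kill -HUP $pids             # unquoted on purpose: one argument per PID
}

# Example: sudo sh -c '. ./hup.sh; hup_all httpd'
```

Where it's available, pkill -HUP httpd does much the same job. Also check your server's documentation before signaling every matching process: Apache, for example, expects signals to go only to the parent process, whose PID it records in its PidFile, and handles its children itself.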


Deleting critical files

A sure-fire way to mess up a working system is to accidentally delete some critical files. Shared libraries, executables, or vital system configuration files are especially vulnerable to this sort of accident.

One way to avoid this problem is to never log in to the system as the root user (see the Logging in as root section). Regular users can't destroy vital system files unless you've been mucking with the standard permissions.

Another way is to make the directories read-only by removing the write bit (see Listing 3).


Listing 3. Making important directories read-only

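chrish@Bender [541]$ cd /etc
chrish@Bender [542]$ sudo find . -depth -type d -exec chmod -w {} +

The find command's -depth option does a depth-first search (BSD-derived systems such as Mac OS X also accept -d, but GNU find doesn't take it in that position), -type d limits the results to directories, and -exec runs chmod on them to remove the write bit, rendering each directory read-only. Unlike piping the names through xargs, -exec also copes with directory names that contain spaces. This stops processes without root privileges from creating new files and, more importantly, from deleting existing files; keep in mind that root bypasses permission bits, so it won't stop an errant rm run as root. It also won't stop people with write permission on the files themselves (that is, you) from editing existing files.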

Be very careful with this! A poorly designed application that expects a writable directory could start failing with surprising error messages. Most programs confine their automatic file creation and deletion to /tmp and /var. Keep in mind that you'll have to put the write bit back (same process, but use u+w instead of -w in the command from Listing 3) when installing new software that needs to drop a configuration file or something similar into the read-only directory.
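Here's a sketch of the same idea wrapped in a pair of shell functions (lock_dirs and unlock_dirs are hypothetical names). It uses a-w rather than plain -w, because when you don't say who a change applies to, chmod leaves any bits covered by your umask untouched. Remember that root ignores permission bits, so this only guards against mistakes made without root privileges.

```shell
#!/bin/sh
# Remove every write bit from each directory under $1 (depth-first).
lock_dirs() {
    find "$1" -depth -type d -exec chmod a-w {} +
}

# Give the owner write permission back, for example before installing software.
unlock_dirs() {
    find "$1" -type d -exec chmod u+w {} +
}

# Example: sudo sh -c '. ./lockdirs.sh; lock_dirs /etc'
```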


Coping with a crippled system

There are many ways to cripple a system, but most of them are going to require access to the system console to help repair things.

If the system has been crippled by runaway processes (see the "Monitoring a Slow System" article in this series; there's a link in the Resources section) that consume all of the available process slots, or eat up so much memory that the machine spends all of its time swapping to disk, you'll need to kill the offending processes or, if you can't even log in and run kill, restart the machine.
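For a quick look at which processes are eating the machine before you reach for kill, a sketch like this can help (top_procs is a hypothetical helper; the "Monitoring a Slow System" article covers this in much more depth). It uses ps's POSIX -o output fields, sorts numerically on the second column, and keeps the top N lines.

```shell
#!/bin/sh
# Show the N processes with the highest value of a ps field.
#   $1: how many lines to show (default 5)
#   $2: pcpu (CPU percentage, the default) or vsz (virtual size in KB)
top_procs() {
    n=${1:-5}
    field=${2:-pcpu}
    ps -A -o pid= -o "$field=" -o comm= | sort -k2,2 -n -r | head -n "$n"
}

# Example: top_procs 10 vsz    # the ten biggest processes by virtual size
```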

If you can access the system but can't kill the offending process for whatever reason, shutting down to single-user mode stops all non-essential services and any user-run processes.

To switch the machine to single-user mode, use telinit on System V-based UNIXes (sudo telinit 1) or shutdown on operating systems descended from the Berkeley Software Distribution (BSD) (sudo shutdown now).

When you're done fixing things in single-user mode, the easiest way to get back to normal is to restart the machine; again, depending on your system's heritage, you'll use either telinit (telinit 6) or shutdown (shutdown -r now) to reboot.

In the worst case, your system might be damaged enough that you'll have to boot from the operating system installation media or a rescue disc. These almost always provide a minimal single-user environment that you can use to run disk checks (fsck), check for system compromises, or restore damaged files from backup.

You do keep backups, right? A good backup strategy saves you a lot of work when things get messed up, and it makes clumsy users (who can't seem to stop deleting their important files) very happy to see you.


Keeping sequenced files and archives

Having a file sometimes just isn't good enough; you need the previous version or the version from last week. This could be something easy, such as someone in human resources overwriting the only copy of the paycheck processing file, or it might be something more exciting, such as vital system configuration files.

Keeping incremental backups of important system and user files is a key way to prevent this sort of disaster. Have you deleted or overwritten an important file? No problem; all you need to do is grab last night's version from the incremental backup.

Listing 4 (which I've called newer-archive.sh on my system) shows you a simple shell script that creates an archive of files that are newer than a specified file. You can use this to create incremental backups of files that are newer than the last incremental backup.


Listing 4. A simple incremental archive script

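#!/bin/sh
#
# Make an incremental archive containing files that have been
# modified since the last archive was created.
#
# Usage:
#
# newer-archive.sh -o new-file.tar.gz -nt old-file files...

old_file=""
new_file=""

# GNU tar: read the list of names from stdin (-T -), create (-c) a
# gzip-compressed (-z) archive, store sparse files efficiently (-S), and
# write it to the named file (-f). --no-recursion keeps tar from adding
# everything under a directory whose timestamp happens to be newer.
archiver="tar --no-recursion -T - -czSf"

while [ $# -gt 0 ] ; do
    case $1 in
        -o)
            new_file=$2
            shift
            ;;
        -nt)
            old_file=$2
            shift
            ;;
        *)
            break    # everything from here on is a file or directory
            ;;
    esac
    shift
done

if [ -z "$new_file" ] || [ -z "$old_file" ] || [ $# -eq 0 ] ; then
    echo "Usage: $0 -o new-file.tar.gz -nt old-file files..." >&2
    exit 1
fi

# List every non-directory newer than the baseline file and feed the
# names to tar (names containing newlines aren't supported).
for path in "$@" ; do
    find "$path" ! -type d -newer "$old_file"
done | $archiver "$new_file"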

The -o option specifies the output file, and the -nt option specifies the file to use as a baseline; any files newer than this one are added to the archive. After the options, list the files or directories you want to check; the script searches each of them for newer files and adds those to the output archive.

You can modify this to work with any sort of archiver, assuming you can find a way to feed it a list of files to archive through a pipe. You might also need to tweak the tar options specified here if your system doesn't have GNU tar installed.

You can combine this script with the date command (see Listing 5) to create an archive with the current date and time in it.


Listing 5. Using date to specify a backup archive name

chrish@Bender [525]$ sudo ~/bin/newer-archive.sh \
-o incremental-$(date +%Y-%m-%d-%H.%M.%S).tar.gz \
-nt incremental-2006-09-06-11.15.03.tar.gz /Users

By giving date a + format string (year-month-day-hour.minute.second), you create a filename that incorporates the current date and time. The command uses the last incremental backup as the reference file (it's a month old, which is far too long between backups) and backs up all new or modified user data.
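Digging up the name of the last incremental archive by hand gets old fast. Because the year-month-day-hour.minute.second names from Listing 5 sort in time order, a small helper can find the newest one for you. This is a sketch: latest_incremental and the /backups directory in the usage comments are hypothetical.

```shell
#!/bin/sh
# Print the newest incremental-*.tar.gz archive in a directory
# (default: the current one); return 1 if there isn't one.
latest_incremental() {
    dir=${1:-.}
    latest=""
    for f in "$dir"/incremental-*.tar.gz; do
        [ -e "$f" ] || continue    # no match: the pattern stays unexpanded
        latest=$f                  # globs expand in sorted order
    done
    [ -n "$latest" ] && echo "$latest"
}

# Usage:
#   base=$(latest_incremental /backups) || exit 1
#   sudo ~/bin/newer-archive.sh \
#       -o /backups/incremental-$(date +%Y-%m-%d-%H.%M.%S).tar.gz \
#       -nt "$base" /Users
```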

Another option is to use the RCS ci and co commands to keep a change history for each file. Use ci to check in a file. This creates a history file (ci filename creates filename,v, which holds the file's history and its older revisions), and, with the -u option, leaves a read-only copy of the file in place. Use co -l to check out the file and make it writable again. After you're done making changes, check the file back in with a meaningful change log message (see Listing 6).


Listing 6. Using RCS to keep track of file versions

chrish@Bender [536]$ ci -u points.txt 
points.txt,v  <--  points.txt
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> important points to cover in the article
>> .
initial revision: 1.1
done
chrish@Bender [537]$ dir points.txt
-r--r--r--   1 chrish  chrish  170 Oct  6 14:34 points.txt
chrish@Bender [538]$ co -l points.txt
points.txt,v  -->  points.txt
revision 1.1 (locked)
done
chrish@Bender [539]$ vi points.txt
chrish@Bender [540]$ ci -u points.txt
points.txt,v  <--  points.txt
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> added another important point
>> .
done

The ci command's -u option automatically checks out a read-only version of the file when you check it in. The co command's -l option locks the file so that you (and only you) can edit it.
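If you edit the same configuration files over and over, you could wrap that cycle in a shell function. The rcs_edit function below is a hypothetical sketch, not part of RCS: it checks the file in once if it has no history yet, locks and checks it out with co -l, opens it in $EDITOR (falling back to vi), and checks it back in with ci -u.

```shell
#!/bin/sh
# Hypothetical wrapper around the RCS check-out/edit/check-in cycle.
rcs_edit() {
    file=$1
    dir=$(dirname "$file")
    base=$(basename "$file")
    # RCS keeps history in file,v, or in an RCS subdirectory if one exists.
    if [ ! -f "$file,v" ] && [ ! -f "$dir/RCS/$base,v" ]; then
        ci -u "$file" || return 1       # first check-in: prompts for a description
    fi
    co -l "$file" || return 1           # lock it and make it writable
    ${EDITOR:-vi} "$file" || return 1   # edit with your preferred editor
    ci -u "$file"                       # prompts for a log message
}

# Example: rcs_edit points.txt
```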

RCS only works well with plain text files; if you need to keep older versions of binary files, look into something more powerful, such as Subversion (see Resources).


Creating users or groups

On most systems, adding a new user or group seems like a simple matter of editing the /etc/passwd file (and possibly the shadow password file, which actually contains the passwords) or /etc/group file using your favorite text editor. It's easy to remember, the file format isn't that challenging, and it'll be quick.

There are a number of reasons why you want to avoid doing this, and they're the reasons why most UNIX systems, especially modern ones, ship with a tool for creating new users and groups.

Editing these vital system files can cause havoc. Sure, they're simple, but it's still easy to get distracted and mess something up. Maybe your editor of choice locks files while you're working, which could prevent anyone from logging in while you're editing.

You're also left with a pile of work after adding a user by hand. You need to create a new home directory, fill it with the standard home directory goodies, add the user to all of the appropriate groups, and create system-level things, such as a mail spool for the new user.

Why give yourself more work? The user and group creation tools are there to save you time and effort (and make sure nothing gets messed up, which helps you maintain your guru-like reputation).

Most standard UNIX systems have adduser (or useradd) and addgroup (or groupadd) commands available to the administrator. Many Linux® distributions have handy graphical tools (such as Fedora Core's User Manager), and FreeBSD's comprehensive sysinstall utility also handles user and group creation. On Mac OS X, you'll use the Accounts preferences to create users, and you'll use the NetInfo Manager to create new groups.
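As an example of letting the tool do the work, here's a sketch for systems whose useradd supports -m (create the home directory), -s (login shell), and -G (supplementary groups), as the Linux and Solaris versions do; BSD systems and Mac OS X use the tools mentioned above instead. The add_login_user function and its DRY_RUN variable are hypothetical, and you'd run it as root (through sudo, of course).

```shell
#!/bin/sh
# Hypothetical helper: create a login account with useradd.
#   $1: user name   $2: comma-separated groups (optional)   $3: shell
# Set DRY_RUN=1 to print the command instead of running it.
add_login_user() {
    user=$1
    groups=$2
    shell=${3:-/bin/sh}
    set -- useradd -m -s "$shell"       # -m creates the home directory
    if [ -n "$groups" ]; then
        set -- "$@" -G "$groups"        # supplementary groups
    fi
    set -- "$@" "$user"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"                       # just show what would run
    else
        "$@"
    fi
}

# Example: add_login_user alex staff,wheel /bin/bash
```

After creating the account, set its initial password with passwd username.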


Logging in as root

As you know, root has all the power in a UNIX system. The root user can do anything and, as they say, "with great power comes great responsibility." And yet, some people insist on logging in as root all the time, even if they're not doing anything that requires all of this power.

Always, always create (and use!) a regular user account for yourself on any system where you have root access. When you need to do something that requires root privileges, use the system's su (see Listing 7) or sudo (see Listing 8) command, whichever is available on your system, to temporarily become root.


Listing 7. Temporarily becoming root with the su command

chrish@Bender [514]$ su -
Password:
# 

Listing 8 uses the sudo command to run a command as root.


Listing 8. Running a command as root with the sudo command

chrish@Bender [517]$ sudo id
Password:
uid=0(root) gid=0(wheel) groups=0(wheel), 1(daemon), 2(kmem), 3(sys), 4(tty), 
29(certusers), 5(operator), 80(admin), 20(staff)

Why avoid running as root? One errant rm command, or accidentally unpacking a tarball into the wrong spot, and your system might be damaged enough to require some major repair work.
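You can also make your own scripts refuse to run as root, so routine jobs stay under your regular account and only specific commands go through su or sudo. A minimal sketch (refuse_root is a hypothetical helper):

```shell
#!/bin/sh
# Return 1, with a message, if the current user is root (UID 0).
refuse_root() {
    if [ "$(id -u)" -eq 0 ]; then
        echo "${0##*/}: run this as a regular user, not root" >&2
        return 1
    fi
    return 0
}

# At the top of a script:  refuse_root || exit 1
```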


Securing systems

The systems on your network need to be secured; there's no doubt about that. Leaving a server or router with the default passwords is practically an invitation for unscrupulous (or even just curious) people to start poking around. This could result in a damaged system, either intentionally or accidentally messed up by the intruder, or worse, a compromised system, secretly modified to distribute spam, stolen software, or who knows what.

A good policy for securing your systems is to start by denying everything instead of allowing everything. Specifically, turn off every network service you don't actually need, block every incoming network port except for services you really want exposed to the random populace of the Internet (or your local area network (LAN) if you're behind a firewall), and remove all users who aren't actually using the system.

From there, you can re-deploy services, open network ports, and add users as necessary. This might seem like extra work, but it lets you know exactly what's going on with the system.
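Finding out what's currently listening is a good first step toward turning things off. The sketch below is a hypothetical listening_ports filter for netstat -an output; it assumes the local address is in the fourth column and the connection state in the last, which holds for typical Linux and BSD netstat output, but check yours.

```shell
#!/bin/sh
# Read `netstat -an` output on stdin and print each listening TCP port once.
listening_ports() {
    awk '$1 ~ /^tcp/ && $NF == "LISTEN" {
            addr = $4
            sub(/.*[:.]/, "", addr)    # keep the port after the last : or .
            print addr
         }' | sort -n -u
}

# Example: netstat -an | listening_ports
```

On newer Linux systems, ss -tln reports similar information if netstat isn't installed, but in a different layout, so this filter won't parse its output as written.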

Having a reasonable password policy, if you can, also enhances security. A lot of corporate environments have password policies that actually encourage bad password behavior by requiring frequent password changes. If a user has a strong password, making them change it too often increases the likelihood that they'll either forget their new password (thus creating work for your helpdesk) or write it down and keep it near the machine. You'd be amazed at the number of people who keep a list of their highly secure current passwords on a sticky note under the keyboard or mouse pad.

Tools such as crack, a powerful dictionary-based password tester (see Resources for a Wikipedia entry), can also help you weed out weak passwords by testing a password file or other password store against words, common misspellings of words (such as l33t speak), and other word-mangling algorithms (see Listing 9).


Listing 9. Using crack to check for weak passwords

chrish@Bender [503]$ cp /etc/passwd .
chrish@Bender [504]$ sudo crack passwd

Note that you're operating on a copy of the password file, not the live file (your system's passwords might actually be in /etc/shadow or in a centralized store on the network). You never know when things might go wrong!
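Before running a full cracker, it's worth a quick check for accounts with no password at all. This sketch (empty_passwords is a hypothetical name) reads a passwd- or shadow-format file, where the second colon-separated field holds the password or its hash, and prints every account whose field is empty. On systems with a shadow file, point it at a copy of that file, since /etc/passwd itself will just show an x.

```shell
#!/bin/sh
# Print accounts whose password field (field 2) is empty in a
# passwd- or shadow-format file.
empty_passwords() {
    awk -F: '$2 == "" { print $1 }' "$1"
}

# Example: sudo cp /etc/shadow ./shadow.copy && empty_passwords ./shadow.copy
```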


Summary

This article showed you several ways of dealing with a range of system administration traps, and it showed you how the most obvious solution to a problem might not be the right one. You should now be able to deal with common disasters without losing your cool, and you'll be able to prepare in advance for problems by keeping incremental backups and having a secure system. You'll also save time by taking advantage of the tools available on your UNIX system of choice.


Resources

Learn

Get products and technologies

  • IBM trial software: Build your next development project with software for download directly from developerWorks.

Discuss


