Speaking UNIX, Part 8: UNIX processes

Learn how UNIX multitasks

On UNIX® systems, each system and end-user task is contained within a process. The system creates new processes all the time and processes die when a task finishes or something unexpected happens. Here, learn how to control processes and use a number of commands to peer into your system.

Martin Streicher (martin.streicher@gmail.com), Chief Technology Officer, McClatchy Interactive

Martin Streicher is a freelance Ruby on Rails developer and the former Editor-in-Chief of Linux Magazine. Martin holds a Masters of Science degree in computer science from Purdue University and has programmed UNIX-like systems since 1986. He collects art and toys. You can reach Martin at martin.streicher@gmail.com.



03 April 2007


At a recent street fair, I was mesmerized by the one-man band. Yes, I am easily amused, but I was impressed nonetheless. Combining harmonica, banjo, cymbals, and a kick drum -- at mouth, lap, knees, and foot, respectively -- the veritable solo symphony gave a rousing performance of the Led Zeppelin classic "Stairway to Heaven" and a moving interpretation of Beethoven's Fifth Symphony. By comparison, I'm lucky if I can pat my head and rub my tummy in tandem. (Or is it pat my tummy and rub my head?)

Lucky for you, the UNIX® operating system is much more like the one-man band than your clumsy columnist. UNIX is exceptional at juggling many tasks at once, all the while orchestrating access to the system's finite resources (memory, devices, and CPUs). In lay terms, UNIX can readily walk and chew gum at the same time.

This month, let's probe a little deeper than usual to examine how UNIX manages to do so many things simultaneously. While spelunking, let's also glimpse the internals of your shell to see how job-control commands, such as Control-C (terminate) and Control-Z (suspend), are implemented. Headlamps on! To the bat cave!

A real multitasker

On UNIX (and most modern operating systems, including Microsoft® Windows®, Mac OS X, FreeBSD, and Linux®), each computing task is represented by a process. UNIX runs many tasks seemingly at the same time because each process receives a little slice of CPU time in a (conceptually) round-robin fashion.

A process is something of a container, bundling a running application, its environment variables, the state of the application's input and output, and the state of the process, including its priority and accumulated resource usage. Figure 1 pictures a process.

Figure 1. A conceptual model of a UNIX process
A representation of a UNIX process

If it helps, you can think of a process as its own sovereign nation, with borders, resources, and gross domestic product.

Each process also has an owner. Tasks you initiate -- your shell and commands, say -- are typically owned by you. System services might be owned by special users or by the superuser, root. For example, to enhance security, the Apache HTTP Server is typically owned by a dedicated user named www, which provides access to the files the Web server needs, but no others.

Ownership of a process might change but is otherwise strictly exclusive. A process can have only one owner at any given time.
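
To make the container idea a little more concrete, here is a minimal sketch in which a process inspects a few of the facts the system keeps about it. Python is used only because its os module wraps the underlying system calls closely; the function name process_snapshot and the dictionary keys are made up for this illustration.

```python
import os


def process_snapshot():
    """Collect a few facts the kernel tracks for the calling process."""
    cpu = os.times()  # CPU time accumulated so far by this process
    return {
        "pid": os.getpid(),                          # this process's ID
        "ppid": os.getppid(),                        # ID of the process that spawned it
        "uid": os.getuid(),                          # numeric ID of the owner
        "nice": os.getpriority(os.PRIO_PROCESS, 0),  # scheduling priority (0 means "me")
        "env_vars": len(os.environ),                 # number of environment variables
        "cpu_seconds": cpu.user + cpu.system,        # accumulated resource usage
    }


if __name__ == "__main__":
    for key, value in process_snapshot().items():
        print(f"{key:12} {value}")
```

The values differ on every run and every machine, which is rather the point: each process carries its own copy of this bookkeeping.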

Finally (and simplifying for this introduction), each process has privileges. Typically, a process's privileges are commensurate with those of its owner. (For instance, if you can't access a particular file from your command-line shell, programs you launch from the shell inherit the same limitation.) An exception to this inheritance rule, where a process might acquire greater privileges than its owner, is an application with the special setuid or setgid bit enabled, as shown by ls.

The setuid bit can be set using chmod u+s. setuid permissions look like this:

$ ls -l /usr/bin/top
-rwsr-xr-x     1 root  wheel     83088 Mar 20  2005 top

The setgid bit can be set using chmod g+s:

$ ls -l /usr/bin/wall
-r-xr-sr-x   1 root  tty  19388 Mar 20  2005 /usr/bin/wall

A setuid program, such as top, runs with the privileges of the user who owns the file. Hence, when you run top, the resulting process is promoted to the privileges of root. Similarly, a setgid program runs with the privileges associated with the group owner of the file.

For instance, on Mac OS X, the wall utility -- short for "write all," because it writes a message to every physical or virtual terminal device -- is setgid tty (as shown above). When you log in and are assigned a terminal device to type in (the terminal becomes standard input for your shell), you're made the owner of the device, and tty becomes the group owner. Because wall runs with the privileges of group tty, it can open and write to every terminal.
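
If you'd rather test for these bits in a program than squint at ls output, the following sketch decodes a file mode. The mode values 0o104755 and 0o102555 are simply the numeric equivalents of the two permission strings in the listings above; the helper names special_bits and describe are hypothetical.

```python
import os
import stat


def special_bits(mode):
    """Return the names of the setuid/setgid bits present in a file mode."""
    bits = []
    if mode & stat.S_ISUID:
        bits.append("setuid")
    if mode & stat.S_ISGID:
        bits.append("setgid")
    return bits


def describe(path):
    """Return the ls-style permission string and special bits for a real file."""
    mode = os.stat(path).st_mode
    return stat.filemode(mode), special_bits(mode)


if __name__ == "__main__":
    # Regular file, rwsr-xr-x: the "s" in the owner column is the setuid bit.
    print(stat.filemode(0o104755), special_bits(0o104755))
    # Regular file, r-xr-sr-x: the "s" in the group column is the setgid bit.
    print(stat.filemode(0o102555), special_bits(0o102555))
```

To check a file on your own system, call describe() with its path; which programs carry these bits varies from one UNIX flavor to the next.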

Taking inventory

Like all other system resources, your UNIX system has a finite, albeit large pool of processes. (In practice, a system almost never runs out of processes.) Each new task -- say, launching vi or running xclock -- is immediately allocated a process from the pool. On UNIX systems, you can view one or more processes using the ps command.

For example, if you want to see all the processes you own, type ps -w --user username:

$ ps -w --user mstreicher

You can view the entire list of processes using ps -a -w -x. (The format and specific flags of the ps command vary from UNIX flavor to UNIX flavor. See the online documentation for your system to find specifics.) -a selects all processes running on a tty device; -x further selects all processes not associated with a tty, which typically includes all the perpetual system services, such as the Apache HTTP server, the cron job scheduler, and so on; and -w shows a wide format, useful for seeing the command line or full pathname of the application associated with each process.

ps has a legion of features, and some versions of ps even allow you to customize the output. For example, here is a useful custom process listing:

$ ps --user mstreicher -o pid,uname,command,state,stime,time 
  PID USER     COMMAND          S STIME     TIME
14138 mstreic  sshd: mstreicher S 09:57 00:00:00
14139 mstreic  -bash            S 09:57 00:00:00
14937 mstreic  ps --user mstrei R 10:23 00:00:00

-o formats output according to the order of the named columns. pid, uname, and command are process ID, user name, and command, respectively. state reflects the process state, such as sleeping (S) or running (R). (More on process state in a moment.) stime shows when the command started, and time shows how much CPU time the process has consumed.

Daddy, where do processes come from?

On UNIX, some processes run from system boot to shutdown, but most processes come and go rapidly, as tasks start and complete. At times, a process can die a premature, even horrible death (say, due to a crash). Where do new processes come from?

Each new UNIX process is the spawn of an existing process. Further, each new process -- let's call it the "child" process -- is a clone of its "parent" process, at least for an instant, until the child continues execution independently. (If each new process is the offspring of an existing process, that begs the quandary, "Where does the first process come from?" See the sidebar below for the answer.)

The chicken and the egg

Some debates are perennial: To be or not to be? Coke or Pepsi? PC or Mac? Then, of course, there's the age-old quandary, "Which came first: the chicken or the egg?"

If each new UNIX process is spawned from an existing, running process, where does the first process come from? The answer: The UNIX kernel spawns the first process during the boot sequence.

The first process is called, appropriately enough, init, and the genealogy of all other system processes can be traced back to init. In fact, init's process number is 1. You can find the status of init by typing ps -l 1:

F S UID PID PPID C PRI NI ADDR  SZ WCHAN  TTY TIME CMD
4 S   0   1    0 0  68  0    - 373 select ?   0:02 init [2]

As you can see, the owner (UID) of init is 0 (root). Unlike every other process in the system, init doesn't have a parent process -- the Parent Process ID (PPID) is 0.

Figures 2-4 detail the spawning, uh, process:

  1. In Figures 2 and 3, Process A is running a program represented by the blue box. It runs the instructions numbered 10, 11, 12, and so on. Process A has its own data, its own copy of the program, its set of open files, and its own collection of environment variables, which were initially captured when Process A sprang into existence.
    Figure 2. Process A running code
    Running code within a process
  2. In UNIX, the fork() system call (so named because it's a call, or request, for operating system assistance) is used to spawn a new process. When Process A executes fork() in Instruction 13, the system immediately creates an exact clone of Process A, named Process Z. Z has the same environment variables as A, the same memory contents, the same program state, and the same files open. The state of Processes A and Z immediately after Process A spawns Process Z is shown in Figure 3.
    Figure 3. Process A spawns a clone of itself
    A representation of the spawning process
  3. At inception, Process Z begins execution at the same place where Process A left off. That is, after inception, Process Z begins execution at Instruction 14. Process A continues execution at the same instruction.
  4. Typically, the programming logic at Instruction 14 tests whether the current process is the child or parent process -- that is, Instruction 14 in Process Z and Instruction 14 in Process A each determine whether its own process is the progeny or the progenitor. To differentiate, the fork() system call returns 0 in the progeny but returns the process ID of Process Z to the progenitor.
  5. After the previous test, Process A and Process Z diverge, each taking a separate code path, as if both came to a fork in the road and each took a distinct branch. The process of spawning a new process is more often called forking, given the metaphor of two travelers reaching a fork in the road. Hence, the system call is named fork().
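
The five steps above can be sketched in a few lines. Python's os.fork() is a thin wrapper around the fork() system call, so the return-value test looks the same as it would in C. The function name fork_once and the pipe used to carry the child's report back to the parent are illustrative choices, not part of the figures.

```python
import os


def fork_once():
    """Fork, let the child report who it is, and return what the parent saw."""
    read_fd, write_fd = os.pipe()          # lets the child send a note to the parent
    pid = os.fork()                        # "Instruction 13": one process becomes two

    if pid == 0:
        # Progeny: fork() returned 0.
        try:
            os.close(read_fd)
            note = f"{os.getpid()} {os.getppid()}"
            os.write(write_fd, note.encode())
        finally:
            os._exit(0)                    # leave at once; never fall into parent code

    # Progenitor: fork() returned the child's process ID.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_pid, child_ppid = map(int, reader.read().split())
    os.waitpid(pid, 0)                     # collect the child's exit status
    return {"fork_returned": pid, "child_pid": child_pid, "child_ppid": child_ppid}


if __name__ == "__main__":
    result = fork_once()
    print("parent is", os.getpid())
    print(result)
```

The value fork() hands to the parent matches the process ID the child reports for itself, and the child's parent process ID is the parent's own ID.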

After the fork, Process A might continue running the same application. However, Process Z might immediately choose to metamorphose to another application. The latter operation of changing what program is running within a process is called execution (the exec family of system calls performs it), but you can think of it as reincarnation: Although the process ID remains the same, the instructions within the process are replaced entirely with those of the new program. Figure 4 shows the state of Process Z some time later.

Figure 4. Process Z is now independent of its progenitor, Process A
A representation of the spawning process
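
Here is a hedged sketch of that reincarnation. The child forks, then replaces itself with a fresh Python interpreter whose only job is to print its own process ID; the parent compares that number with the one fork() returned. The name fork_and_exec and the choice of program to execute are assumptions made for the demonstration.

```python
import os
import sys


def fork_and_exec():
    """Fork a child, have it exec a new program, and compare process IDs."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()

    if pid == 0:
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)           # the new program's stdout goes to the pipe
            # Replace this process image; on success execv() never returns.
            os.execv(sys.executable,
                     [sys.executable, "-c", "import os; print(os.getpid())"])
        finally:
            os._exit(127)                  # reached only if the exec failed

    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        reported_pid = int(reader.read())  # what the new program says its PID is
    _, status = os.waitpid(pid, 0)
    return pid, reported_pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    forked, reported, exit_code = fork_and_exec()
    print("fork() returned", forked)
    print("exec'd program reports", reported, "and exited with", exit_code)
```

The two process IDs agree: the program changed, but the process did not.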

Forking around

You can experience forking right from the comfort of your private command line. To begin, open a new xterm. (You likely now realize that xterm is its own process and that, within xterm, the shell is a separate process spawned by xterm.) Next, type:

ps  -o pid,ppid,uname,command,state,stime,time

You should see something like this:

  PID  PPID USER     COMMAND          S STIME     TIME
16351 16350 mstreic  -bash            S 11:23 00:00:00
16364 16351 mstreic  ps -o pid,ppid,u R 11:24 00:00:00

According to the PPID fields in this list, the ps command is a child of the bash shell. (The hyphen in -bash indicates that the shell instance is a login shell.) To run ps, bash forks to create a new process; the new process reincarnates itself using execution, turning into a new instance of ps.

Here's another experiment to try. Type:

sleep 10 & sleep 10 & sleep 10 & ps  -o pid,ppid,uname,command,state,stime,time

You should see something like this:

$ sleep 10 & sleep 10 & sleep 10 & ps  -o pid,ppid,uname,command,state,stime,time
  PID  PPID USER     COMMAND          S STIME     TIME
16351 16350 mstreic  -bash            S 11:23 00:00:00
16843 16351 mstreic  sleep 10         S 11:42 00:00:00
16844 16351 mstreic  sleep 10         S 11:42 00:00:00
16845 16351 mstreic  sleep 10         S 11:42 00:00:00
16846 16351 mstreic  ps -o pid,ppid,u R 11:42 00:00:00

The command line spawns four new processes. Typing ampersand (&) after each sleep command runs each of those commands in the background, or in parallel with the shell. ps is another spawned process, but it's running in the foreground, preventing the shell from running another command until it terminates. Again, all four processes are the spawn of the shell, as shown by the values of PPID. The three sleep commands are marked S, because none of the processes consumes CPU time while sleeping.

For convenience, the shell keeps track of all background processes it spawns. Type jobs to see a list:

$ sleep 10 & sleep 10 & sleep 10 & 
[1] 16843
[2] 16844
[3] 16845

$ jobs
[1]   Running                 sleep 10 &
[2]   Running                 sleep 10 &
[3]   Running                 sleep 10 &

Here, the three jobs are labeled 1, 2, and 3 for convenience. The numbers 16843, 16844, and 16845 are the process IDs of each respective process. Thus, background task 1 is process ID 16843.
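
A program can keep the same kind of ledger. The sketch below starts three sleep commands without waiting for any of them, records each under a small label, and only later collects the results. It assumes a sleep command is on your PATH; the function names and the label-to-process dictionary are illustrative, not how any particular shell is written.

```python
import subprocess


def start_background_jobs(count, seconds="1"):
    """Start `count` sleep commands without waiting, much like `sleep N &`."""
    return {label: subprocess.Popen(["sleep", seconds])
            for label in range(1, count + 1)}


def wait_for_all(jobs):
    """Block until every job has ended; return {label: exit status}."""
    return {label: proc.wait() for label, proc in jobs.items()}


if __name__ == "__main__":
    jobs = start_background_jobs(3)
    for label, proc in jobs.items():
        print(f"[{label}] {proc.pid}")     # label and process ID, as the shell prints them
    print(wait_for_all(jobs))              # all three sleep at once, so this takes about 1s
```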

You can manipulate your background jobs from the command line using these labels. For instance, to terminate a command, type kill %N, where N is the command's label. To move a command from the background to the foreground, type fg %N:

$ sleep 10 & sleep 10 & sleep 10 &
[7] 17741
[8] 17742
[9] 17743

$ kill %7
$ jobs
[7]   Terminated              sleep 10
[8]-  Running                 sleep 10 &
[9]+  Running                 sleep 10 &

$ fg %8
sleep 10

Running multiple commands simultaneously and asynchronously from the command line is a great way to juggle your own set of tasks. A long-running job -- say, number crunching or a large compilation -- is perfect to place in the background. To capture the output of each background command, consider redirecting the output to a file, using the redirection operators >, >&, >>, and >>&. Whenever a background command finishes, the shell prints an alert message before the next prompt:

$ whoami
mstreicher
[8]-  Done                    sleep 10
[9]+  Done                    sleep 10
$
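
The same pattern, output sent to a file and a notice when the job ends, looks roughly like this in a program. The helper names run_in_background and wait_and_report are hypothetical, and the polling loop is a simplification: a real shell learns of a finished child from the system rather than by asking repeatedly.

```python
import os
import subprocess
import sys
import tempfile
import time


def run_in_background(argv, log_path):
    """Start argv with stdout and stderr appended to log_path, without waiting."""
    with open(log_path, "ab") as log:
        # The child keeps its own copy of the open file after we close ours.
        return subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT)


def wait_and_report(proc, poll_interval=0.05):
    """Poll until the job ends, then return a shell-style status word."""
    while proc.poll() is None:
        time.sleep(poll_interval)          # the caller could do other work here
    return "Done" if proc.returncode == 0 else f"Exit {proc.returncode}"


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    job = run_in_background([sys.executable, "-c", "print('crunching finished')"], path)
    print(f"[1] {job.pid}")
    print(f"[1]+  {wait_and_report(job)}")
    with open(path) as log:
        print(log.read(), end="")
    os.remove(path)
```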

To the great process pool in the sky

Some processes live forever (such as init), and some processes reincarnate themselves into a new form (such as your shell). Ultimately, most processes die of natural causes -- a program runs to completion.

Additionally, you can place a process in a kind of suspended animation, where it waits to be reanimated. And as the previous example shows, you can terminate a process prematurely with kill.

If a command is running in the foreground and you want to suspend it, press Control-Z:

$ sleep 10
(Press Control-Z)
[1]+  Stopped                 sleep 10

$ ps -o pid,ppid,uname,command,state,stime,time
  PID  PPID USER     COMMAND          S STIME     TIME
18195 16351 mstreic  sleep 10         T 12:44 00:00:00

The shell has suspended the command and assigned it a label for convenience. You can use this label as before to terminate the job or return it to the foreground. You can also use the bg command to resume the process in the background:

$ bg %1
[1]+ sleep 10 &

If a command is running in the foreground and you want to terminate it, press Control-C:

$ sleep 10
(Press Control-C)
$ jobs
$

Your shell makes suspending and terminating a process easy, but a little voodoo is working beneath the shell's innocent facade. Internally, your shell uses UNIX signals to affect the state of processes. A signal is an event, and it's used to alert a process. The operating system originates many signals, but you can send signals from one process to another, or even have a process signal itself.

UNIX includes a wide variety of signals, most of which have a special purpose. For example, if you send signal SIGSTOP to a process, the process suspends. (For a complete list of signals, type kill -l or, on Linux, man 7 signal.) You send signals with the kill command:

$ sleep 20 &
[1] 19988

$ kill -SIGSTOP 19988

$ jobs
[1]+  Stopped                 sleep 20

Initially, the sleep command started in the background with process ID 19988. After sending SIGSTOP, the process changed state, becoming suspended or stopped. Sending another signal, SIGCONT, reanimates the process, and it resumes where it left off.
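
You can watch the same state changes from inside a program. This sketch forks a child that runs sleep, suspends it with SIGSTOP, revives it with SIGCONT, and finally ends it with SIGTERM, the signal kill sends by default. The function name and the wording of the reported states are illustrative, and the sketch assumes a sleep command is on your PATH.

```python
import os
import signal


def stop_continue_terminate():
    """Suspend, resume, and terminate a child; return the states observed."""
    pid = os.fork()
    if pid == 0:
        try:
            # Make sure SIGTERM has its default effect, then become `sleep 30`.
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            os.execvp("sleep", ["sleep", "30"])
        finally:
            os._exit(127)                  # reached only if the exec failed

    observed = []

    os.kill(pid, signal.SIGSTOP)                   # suspend the child
    _, status = os.waitpid(pid, os.WUNTRACED)      # WUNTRACED also reports stopped children
    if os.WIFSTOPPED(status):
        observed.append("stopped by " + signal.Signals(os.WSTOPSIG(status)).name)

    os.kill(pid, signal.SIGCONT)                   # reanimate it
    os.kill(pid, signal.SIGTERM)                   # then ask it to terminate
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        observed.append("terminated by " + signal.Signals(os.WTERMSIG(status)).name)

    return observed


if __name__ == "__main__":
    for state in stop_continue_terminate():
        print(state)
```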

In other words, each time you press Control-Z, the foreground process receives SIGTSTP, the terminal's catchable counterpart of SIGSTOP. The bg command sends SIGCONT. And Control-C sends SIGINT, which asks the process to terminate immediately.

Some signals can be blocked by a process, and applications can be designed to explicitly "catch" signals and react to each event in a special way. For instance, the system service xinetd, which launches other network services on demand, re-reads its configuration files upon the receipt of SIGHUP. On Linux, sending signals to init can change the system runlevel and even initiate system shutdown. (Here's a question: What's the difference between kill %1 and kill 1?)
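
Catching a signal takes only a few lines. The sketch below imitates the reload-on-SIGHUP convention with a toy Service class. The class, its method names, and the counter are invented for illustration and say nothing about how xinetd is actually written. Note that Python only allows handlers to be installed from a program's main thread.

```python
import os
import signal
import time


class Service:
    """A toy service that re-reads its configuration when it receives SIGHUP."""

    def __init__(self):
        self.reload_requested = False
        self.reloads = 0

    def on_sighup(self, signum, frame):
        # Keep handlers tiny: just note that the signal arrived.
        self.reload_requested = True

    def run_once(self):
        if self.reload_requested:
            self.reload_requested = False
            self.reloads += 1              # a real service would re-read its files here


def demo():
    """Send SIGHUP to this very process and return the number of reloads."""
    service = Service()
    previous = signal.signal(signal.SIGHUP, service.on_sighup)
    try:
        os.kill(os.getpid(), signal.SIGHUP)
        deadline = time.monotonic() + 1.0
        while not service.reload_requested and time.monotonic() < deadline:
            time.sleep(0.01)               # give the handler a moment to run
        service.run_once()
    finally:
        # Put back whatever handler was installed before the demo.
        signal.signal(signal.SIGHUP,
                      previous if previous is not None else signal.SIG_DFL)
    return service.reloads


if __name__ == "__main__":
    print("reloads:", demo())
```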

A process can even signal itself. Imagine that you're writing a game and want to give the user five seconds to respond. Your code can set a five-second timer and continue, say, redrawing the screen. When the timer runs out, SIGALRM is sent back to your process. Bzzzzt! Time's up!
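
A minimal sketch of that game timer follows. The names play_turn and TimesUp are hypothetical, the loop body is a stand-in for redrawing the screen, and the demonstration uses a fraction of a second rather than five seconds so it finishes quickly.

```python
import signal
import time


class TimesUp(Exception):
    """Raised by the SIGALRM handler when the timer expires."""


def _on_alarm(signum, frame):
    raise TimesUp()


def play_turn(seconds, player_ready=lambda: False):
    """Keep 'redrawing' until the player responds or the timer runs out."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # ask for SIGALRM after `seconds`
    frames = 0
    try:
        while not player_ready():
            frames += 1                    # stand-in for redrawing the screen
            time.sleep(0.01)
        return "answered", frames
    except TimesUp:
        return "time's up", frames
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)    # cancel the timer if still pending
        signal.signal(signal.SIGALRM,
                      previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    outcome, frames = play_turn(0.2)
    print(outcome, "after", frames, "redraws")
```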

(Here's the answer to the question: kill %1 kills your background job labeled 1. kill 1 sends the default signal, SIGTERM, to process ID 1, which is init. Only the superuser can do that and, depending on the system, it can bring down the entire machine.)

Still other signals are transmitted from the operating system to processes in special circumstances. A memory violation can spur SIGSEGV, killing the process instantly while leaving a core dump behind. One special signal, SIGKILL, can't be blocked or caught, and it kills a process immediately.
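
You can ask the system which signals it will let you catch. In this sketch, the hypothetical helper can_install_handler tries to install a do-nothing handler and reports whether the system accepted it; the previous handler is restored right away.

```python
import signal


def can_install_handler(sig):
    """Return True if this process is allowed to catch `sig`."""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except OSError:
        return False                       # the system refused: the signal can't be caught
    # Restore whatever was there before the experiment.
    signal.signal(sig, previous if previous is not None else signal.SIG_DFL)
    return True


if __name__ == "__main__":
    for sig in (signal.SIGTERM, signal.SIGKILL, signal.SIGSTOP):
        print(f"{sig.name:8} catchable: {can_install_handler(sig)}")
```

SIGSTOP is in the same club as SIGKILL: neither can be caught, which is what makes them reliable tools of last resort.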

As with many other resources in UNIX, you can only signal processes that you own. This prevents you from terminating important system services and the processes of other users. The superuser, root, can signal any process.
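
A program can test this rule without harming anything. Signal number 0 is a special case: the system performs its usual permission and existence checks but delivers nothing. The function name may_signal is an illustrative choice.

```python
import os


def may_signal(pid):
    """Return True if this process is permitted to signal `pid`."""
    try:
        os.kill(pid, 0)                    # signal 0: checks only, nothing is delivered
    except PermissionError:
        return False                       # the process exists but isn't yours
    except ProcessLookupError:
        return False                       # no such process
    return True


if __name__ == "__main__":
    print("myself:", may_signal(os.getpid()))
    print("init  :", may_signal(1))        # the answer depends on who runs this
```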

More magic demystified

UNIX has many moving parts. It has system services, devices, memory managers, and more. Luckily, most of these complex machinations are hidden from view or are made convenient to use through user interfaces, such as the shell and windowing tools. Better yet, if you want to dive in, specialized tools, such as top, ps, and kill, all are readily available.

Now that you know how processes work, you can become your own one-person band. Just one request: Freebird!
