IBM Support

How To Determine Number of Instances of a Running Script

Question & Answer


Question

Why does ps command output show multiple instances of a script name?

Answer

Introduction
Example script using the ps command
Why the example script does not work
An alternative approach
Conclusion

Introduction

A script often needs to determine how many instances of itself, or of some other script, are currently running, usually to ensure that only one instance runs at a time. A common mistake is to count how many times the script name appears in the output of the ps command. If the script has been started only once, it would seem that the ps command should list the script name only once. That is true in most cases. However, Unix creates new processes with the fork and exec functions, and the ps command takes a static snapshot of the process table at a single instant in time, so counting names in ps output is not a reliable way to count the running instances of a script.

Example script using the ps command

Here is a simple script which uses the output from the ps command to count the number of instances of itself that are currently running.

# cat count_1.ksh
#!/bin/ksh
numberOfInstances=$(ps -ef | grep $0 | egrep -v "grep|vi|more|pg" | wc -l)
print "number of instances = $numberOfInstances"
sleep 120

If the script is not already running, it will generate the expected output below.

# ./count_1.ksh
number of instances =        1

However, if the ps command is executed repeatedly in a loop, the output will sometimes contain one instance of the script name and at other times two or more. A modified version of the script demonstrates this. Even though only one instance of the script is running, it will occasionally report that two instances of itself are running. The output below is included as an example. If you run this script on your machine, it might take many more iterations through the loop before the numberOfInstances variable is greater than 1.

# cat count_2.ksh
#!/bin/ksh
while true
do
   numberOfInstances=$(ps -ef | grep $0 | egrep -v "grep|vi|more|pg" | wc -l)
   print "number of instances = $numberOfInstances"
done

# ./count_2.ksh
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        2
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        2
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        1
number of instances =        2
...

Why the example script does not work

Each time a script runs an external command, a new process is forked to run that command. Immediately after the fork, the name of the new process is the same as the name of the parent process, which is the name of the script. The new process is a child of the process running the script, so its PPID is the PID of the script.

After the fork, an exec is performed to overlay the new child process with the code for the command to be executed. At this time, the name of the new child process will be changed to the name of the actual command. If the ps command captures the process table at the moment a fork has executed but before the exec, two instances of a process with the same name will show up in the ps command output. Because a fork and exec are normally executed very quickly, one right after the other, it is rare for the ps command to capture a process that has been forked but not yet exec'ed.

Here is the line in the script that runs the ps command. The $ and parentheses denote command substitution. All commands enclosed within the parentheses of a command substitution are executed, and the whole expression is then replaced with the resulting output. So in this command line, the numberOfInstances variable is assigned the output generated by the command pipeline, which starts with the ps command. The shell implements command substitution by creating a subshell process that is a child of the shell running the script.

                                  command substitution
                   _________________________|____________________________
                  |                                                      |
numberOfInstances=$(ps -ef | grep $0 | egrep -v "grep|vi|more|pg" | wc -l)
                    |        |         |                            |
                    1        2         3                            4

When the shell executes this line, five child processes are created: one for the subshell that performs the command substitution, and one for each of the four commands inside the command substitution parentheses. Each child process is first created with the fork() function and is then overlaid with the actual command by an exec() function. Because fork and exec do not happen at the same instant, the ps command in the command line above can capture a forked process in the process table before its exec takes place. This is rare for normal commands, but more common for the fork that handles command substitution. The subshell process that handles the command substitution exists until all of the commands running within it have terminated, and it keeps the name of the parent process that created it. So while a command substitution is in progress, the ps command is more likely to capture the subshell child process.

How commands in a pipe are executed

To understand why some of these child processes might show up in the output from the ps command, it is important to understand that commands in a pipe are executed simultaneously, not sequentially as we might expect. In other words, the ps command, the grep and egrep commands, and the wc command are all running at the same time. Looking at the first two commands, the ps command and the grep command are started at almost the same time. As the ps command collects the information in the process table, it sends that information down the pipe to the grep command. The grep command reads the information from ps as it becomes available, one line at a time, and sends its output down the pipe to the egrep command, which is already running. While each of these commands is running, if no input is currently available for reading, the command sleeps until more input is available or until it reads end of file, which indicates that the command on the left (the one sending its output through the pipe) has finished. When a command reads end of file, it finishes processing its input, writes its output, and then terminates. So while the ps command is running, child processes are being created for each of the other commands within the command substitution, and these child processes (some of which might have the same name as the parent process or script) can show up in the ps command output.
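The simultaneous execution described above can be observed with a small test. The following is only a sketch, written for a POSIX sh; the marker file and the variable names are made up for this illustration. The left side of the pipe waits two seconds before writing anything, yet the right side has already started and recorded that fact after one second.

```shell
#!/bin/sh
# Sketch: the right-hand side of a pipe starts before the left-hand
# side has finished. "marker" and "early" are illustrative names only.
marker=$(mktemp)

# Left side sleeps 2 seconds before writing. Right side records that it
# has started, then reads from the pipe until end of file.
{ sleep 2; echo "data from left"; } |
{ echo "right side started" > "$marker"; cat > /dev/null; } &

# After 1 second the left side is still sleeping, but the right side
# has already written the marker file.
sleep 1
early=$(cat "$marker")
echo "after 1 second: $early"

# Wait for the background pipeline to finish, then clean up.
wait
rm -f "$marker"
```

If the commands in the pipe ran one after another, the marker file would still be empty when it is read.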

The effect of command substitution

To demonstrate how command substitution in the example script listed above will cause a child process to be created with the same name as the name of the script, you can perform the following test. Modify the script to include a sleep command as the first command that will be executed inside the command substitution. Also, remove the wc -l command so that the $processes variable will contain the actual ps command output.

# cat count_3.ksh
#!/bin/ksh
processes=$(sleep 120 ; ps -ef | grep $0 | egrep -v "grep|vi|more|pg")
print $processes

# ps -f
     UID    PID   PPID   C    STIME    TTY  TIME CMD
    root 335896 360478   1 16:26:48  pts/2  0:00 ps -f
    root 360478 323818   0 11:15:25  pts/2  0:00 -ksh
# ./count_3.ksh

The script waits for 120 seconds before printing this output:

root 348198 360478 0 16:28:17 pts/2 0:00 /bin/ksh ./count_3.ksh

While the script is sleeping for 120 seconds, run the ps command in another terminal session on the same machine:

# ps -ef | grep count_3.ksh | grep -v grep
    root 348198 360478   0 16:28:17  pts/2  0:00 /bin/ksh ./count_3.ksh
    root 377072 348198   0 16:28:17  pts/2  0:00 /bin/ksh ./count_3.ksh

Notice there are two count_3.ksh processes in the process table. One of these is the child of the other and was created by the shell to handle the command substitution. The sleep command embedded within the command substitution slows this down so that we are able to easily see the two processes with the same name.
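The same effect can be checked without the ps command. The sketch below assumes a shell that forks a child process for command substitution, as described above; some shells avoid that fork, in which case the two numbers printed can be identical. The variable name subshell_pid is chosen only for this example.

```shell
#!/bin/sh
# Sketch: find the PID of the process that runs a command substitution.
# The inner sh prints its parent PID, which is the PID of the process
# performing the substitution. The trailing ":" keeps the inner sh from
# being the last command, so that a shell which execs the last command
# directly still forks a separate process for it.
subshell_pid=$(sh -c 'echo $PPID'; :)

echo "script PID:   $$"
echo "subshell PID: $subshell_pid"
```

When the two numbers differ, the second one belongs to the extra process that carries the script name while the command substitution is running.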

Removing command substitution

It might seem that capturing the ps output into a file, and thus removing the potential side effect of command substitution, would help to solve the problem. Here is a modified version of the script which does this.

# cat count_4.ksh
#!/bin/ksh

# Capture the process table into a file. If this is the first and only running instance
# of the script, the script name will show up only once in the ps -ef output. However, if
# one or more instances of the script are already running, the number of times the script
# name will show up in the ps -ef output is indeterminate, due to the effect of child
# processes that are created by commands running inside the already running script.
ps -ef > /tmp/processList.out

numberOfInstances=$(grep $0 /tmp/processList.out | egrep -v "grep|vi|more|pg" | wc -l)
print "number of instances = $numberOfInstances"

while true
do
   ps -ef > /tmp/processList.out
   numberOfInstances=$(grep $0 /tmp/processList.out | egrep -v "grep|vi|more|pg" | wc -l)
   if (( numberOfInstances > 1 ))
   then
      print "number of instances = $numberOfInstances"
      exit 1
   fi
done

# ./count_4.ksh
number of instances =        1
(the script will run indefinitely as long as only one instance of the script is running)
^C
#

If the script is running only once, the captured ps -ef output file will show one instance in the process table, so the numberOfInstances variable will always contain 1, as we would expect, and the loop will continue indefinitely. If multiple instances of the script are running, the numberOfInstances variable will be greater than 1, and the loop will terminate. However, child processes created by commands running inside the previously running script(s) will sometimes have the same name as the script, and the numberOfInstances variable will still incorrectly count them. If a script is attempting to ensure that N, or no more than N, instances of itself are running, this method will still be unreliable.
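A ps-based count can be made less sensitive to these child processes by discarding every matching process whose parent is also a matching process. The following is a minimal sketch of that idea, not a tested production filter. The function name count_top_level is hypothetical, and the code assumes the ps -ef column layout shown in this document, with the PID in column 2 and the PPID in column 3.

```shell
#!/bin/sh
# count_top_level NAME
# Reads ps -ef style output on standard input and prints the number of
# processes whose command line contains NAME and whose parent is not
# itself such a process. Lines for grep and awk are skipped, because
# their own command lines can contain NAME.
count_top_level() {
    awk -v name="$1" '
        index($0, name) > 0 && $0 !~ /grep|awk/ {
            parent[$2] = $3          # remember PID -> PPID
        }
        END {
            n = 0
            for (p in parent)
                if (!(parent[p] in parent))
                    n++              # parent is not a matching process
            print n
        }'
}
```

A script could call it as ps -ef | count_top_level "$0". Fed the two count_3.ksh lines shown earlier, the function prints 1, because process 377072 has 348198 as its parent. Note that an instance started by another instance of the same script would also be discarded.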

An alternative approach

A number of alternatives could be used to overcome the problem of child processes created by commands running within a script. One method would allow the use of the ps command, but code would be needed to check the PPID of each process in the ps command output to see if the process was a child of the script process, and if so it could be discarded. Here is another alternative approach that does not rely on ps command output. The following script is an example that can be used to ensure that only one instance of the script is running. To test this example script, run it in one terminal session, and then run it a number of times in a second terminal session.

# cat test.ksh
#!/bin/ksh

# Check to see if the lastRunningPID soft link exists. If it does, use
# the name it points to, to extract the pid. See if this pid exists in
# the /proc file system. If it does, it indicates that this script is
# already running, in which case we simply exit because we do not want
# to have the script running more than once. If the pid does not exist
# in /proc, it indicates that the earlier instance is no longer running,
# so we just remove the stale soft link so that we can recreate it later
# and have it point to the correct pid.
if [ -L /tmp/lastRunningPID ]
then
   pid=$(ls -l /tmp/lastRunningPID | sed 's/.*-> //')
   if [ -d /proc/$pid ]
   then
      print "$$: Process $pid is already running. $$ is exiting."
      exit 0
   fi
   rm /tmp/lastRunningPID
fi

# The first time this script runs when no other instances of the
# script are running, create a soft link named lastRunningPID
# that points to the pid for the process. For this example, we
# create the soft link in /tmp but it could be created in any
# directory where the account used to run the script has access.
ln -s $$ /tmp/lastRunningPID
print "$$ is running now..."
ls -l /tmp/lastRunningPID

# Sleep so that we can test by running this script again in another
# terminal session. While the script is sleeping, the soft link will
# exist and point to its pid. If the script is executed again while
# we are sleeping, the link will be detected and the pid will be cross
# checked in the /proc directory to confirm that the process is still
# running. If it is, the second instance of this script will simply
# terminate.
sleep 120

# We are finished now so we will exit. At this point, no other
# instances of the script will be running so we will remove the
# soft link. The next time the script is executed, it will create
# a new soft link that points to its pid.
print "$$ is terminating..."
rm /tmp/lastRunningPID
exit 0

Terminal session 1

We run the script for the first time. Only one instance of the script will be running at this point until we run it again in session 2.

# ./test.ksh
155714 is running now...
lrwxrwxrwx   1 root     system            6 Dec 26 15:39 /tmp/lastRunningPID -> 155714

The script eventually finishes and exits. No other instances should be running at this point.

Terminal session 2

We run the script again, multiple times. Each time the script detects that it is already running so it exits.

# ./test.ksh
364578: Process 155714 is already running. 364578 is exiting.
# ./test.ksh
364580: Process 155714 is already running. 364580 is exiting.
# ./test.ksh
364582: Process 155714 is already running. 364582 is exiting.
# ./test.ksh
364586: Process 155714 is already running. 364586 is exiting.
# ./test.ksh
364590: Process 155714 is already running. 364590 is exiting.

The script has now completed in Session 1 so no instances of the script are now running. We run the script again so that it will be the only instance running at this point. Notice that a new pid is now being used.

# ./test.ksh
155718 is running now...
lrwxrwxrwx   1 root     system            6 Dec 26 15:42 /tmp/lastRunningPID -> 155718
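In test.ksh the soft link is tested first and created afterwards, so two instances started at nearly the same moment could both pass the test. One possible variation, sketched below, lets the ln -s command itself act as the test, because ln -s fails when the link name already exists. This is an illustration only: the function name acquire_lock is hypothetical, and it uses kill -0 in place of the /proc check in test.ksh, which only works for processes the caller is allowed to signal.

```shell
#!/bin/sh
# acquire_lock LINK
# Returns 0 if this process now owns LINK, non-zero if another live
# process already owns it.
acquire_lock() {
    lock=$1
    # ln -s refuses to replace an existing name, so creating the link
    # is also the test for whether another instance holds it.
    if ln -s "$$" "$lock" 2>/dev/null; then
        return 0
    fi
    # The link exists: extract the pid it points to, as test.ksh does.
    holder=$(ls -l "$lock" | sed 's/.*-> //')
    # kill -0 sends no signal. It only reports whether the pid exists
    # and can be signalled by this user.
    if kill -0 "$holder" 2>/dev/null; then
        return 1
    fi
    # Stale link left by an instance that is gone: replace it. This
    # cleanup step can still race with another instance doing the same.
    rm -f "$lock"
    ln -s "$$" "$lock" 2>/dev/null
}
```

A script would call acquire_lock /tmp/lastRunningPID near the top, exit if it fails, and remove the link before terminating, as test.ksh does.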

Conclusion

Because of the way fork and exec operate, there will always be child processes that are created while a script is executing, and these child processes will sometimes have the same name as the name of the script. Whether or not these child processes show up in the ps command output is determined by many factors, including the speed of the machine running the script, the number of processors in that machine, and the operating system level on that machine. All of these factors can cause split second timing differences, and thus affect the output from a ps command on a particular machine. Because sometimes one or more of these child processes will have the same process name as the parent process script, using the ps command to count the number of times a script is running is not reliable.





Document Information

More support for:
AIX

Software version:
5.3, 6.1, 7.1

Operating system(s):
AIX

Document number:
670139

Modified date:
17 June 2018

UID

isg3T1011020
