The process is one of the most fundamental concepts of the Linux operating system. This article covers the basics of Linux processes.
A process is an instance of a program running in Linux. This is the basic definition that you might have heard before. It is simple enough to understand, but let's elaborate a bit for beginners.
Let's quickly create a hello world program in C :
#include <stdio.h>
int main(void) {
    printf("\n Hello World\n");
    // Simulate a wait for some time (unsigned counter, so the loop terminates)
    for (unsigned int i = 0; i < 0xFFFFFFFF; i++);
}
Compile the code above :
$ gcc -Wall hello_world.c -o hello_world
Run the executable :
$ ./hello_world
The command above executes the hello world program. Since the program busy-waits for a while, quickly switch to another terminal and check for a process named 'hello_world' :
$ ps -aef | grep hello_world
himanshu 2260 2146 95 20:38 pts/0 00:00:13 ./hello_world
So we see that a process named 'hello_world' is running on the system. Now run the same program in parallel from two or three terminals and run the above command again. I ran the program in parallel from three different terminals, and here is the output of the above command :
$ ps -aef | grep hello_world
himanshu 2320 2146 99 20:43 pts/0 00:00:03 ./hello_world
himanshu 2321 2261 67 20:43 pts/1 00:00:02 ./hello_world
himanshu 2322 2287 72 20:43 pts/2 00:00:00 ./hello_world
So each instance of the hello_world program created a separate process. Hence we say that a process is a running instance of a program.
Identifiers associated with a process
Each process has the following identifiers associated with it:
Process Identifier (PID)
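Each process has a unique identifier associated with it, known as the process ID. No two processes running on the system share a PID at the same time, although the kernel may later reuse the PID of a process that has terminated. For example, if you run the ps command (such as 'ps -ef') on your Linux box, you will see something like: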
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 19:43 ? 00:00:00 /sbin/init
root 2 0 0 19:43 ? 00:00:00 [kthreadd]
root 3 2 0 19:43 ? 00:00:00 [migration/0]
root 4 2 0 19:43 ? 00:00:00 [ksoftirqd/0]
root 5 2 0 19:43 ? 00:00:00 [watchdog/0]
root 6 2 0 19:43 ? 00:00:00 [migration/1]
root 7 2 0 19:43 ? 00:00:00 [ksoftirqd/1]
root 8 2 0 19:43 ? 00:00:00 [watchdog/1]
root 9 2 0 19:43 ? 00:00:00 [events/0]
root 10 2 0 19:43 ? 00:00:00 [events/1]
The above output is from my Linux box. The second column (PID) gives the process ID of the process described in that row. You may notice a similar-looking column, PPID. This gives the process ID of the parent of that process. Every process in a Linux system has a parent, except those the kernel starts itself, which show a PPID of 0 (such as init and kthreadd above).
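A process can also ask the kernel for these two identifiers itself. The following is a minimal sketch using the standard getpid() and getppid() calls; the function name show_ids() is only an illustrative choice and is not part of any library.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Print this process's ID and its parent's ID; returns the PID.
 * show_ids() is a hypothetical helper name used for illustration. */
pid_t show_ids(void)
{
    pid_t pid = getpid();    /* ID of the calling process */
    pid_t ppid = getppid();  /* ID of the process that created it */

    printf("PID = %ld, PPID = %ld\n", (long)pid, (long)ppid);
    return pid;
}
```

If show_ids() is called from the main() of a program started from a shell, the PPID it prints should normally be the PID of that shell, which is the same relationship shown by the PID and PPID columns of ps.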
User and group Identifiers (UID and GID)
The second category of identifiers associated with a process is the user and group identifiers. These can be further classified into :
Real user ID and real group ID
These identifiers give information about the user and group to which a process belongs. Any process inherits these identifiers from its parent process.
Effective user ID, effective group ID and supplementary group ID
Ever got an error like "Permission denied"? This is a common error, and it usually occurs when a process does not have sufficient permissions to carry out a task. These three IDs are what the kernel uses to decide whether a process is permitted to perform an operation. Usually the effective user ID is the same as the real user ID. If it is different, the process is running with privileges other than the ones it has by default (i.e. inherited from its parent).
If a process is running with effective user ID 0, it has special privileges. Processes with an effective user ID of zero are known as privileged processes, as they run as the superuser. These processes bypass the permission checks that the kernel applies to unprivileged processes.
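The real and effective IDs can be read from within a program. The sketch below uses the standard getuid(), geteuid(), getgid() and getegid() calls; the two function names are hypothetical helpers chosen for this example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Print the real and effective IDs.
 * Returns 1 if the effective user ID differs from the real one. */
int runs_with_changed_uid(void)
{
    uid_t ruid = getuid();   /* real user ID */
    uid_t euid = geteuid();  /* effective user ID, used for permission checks */

    printf("real UID = %ld, effective UID = %ld\n", (long)ruid, (long)euid);
    printf("real GID = %ld, effective GID = %ld\n",
           (long)getgid(), (long)getegid());
    return ruid != euid;
}

/* Returns 1 if the process runs with effective user ID 0 (superuser). */
int is_privileged(void)
{
    return geteuid() == 0;
}
```

For a program started normally from a shell, both user IDs are usually the same and runs_with_changed_uid() returns 0.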
The init process
In Linux every process has a parent process. One might ask where it all starts, since some process has to be created first. That process is 'init', the very first user-space process that the Linux kernel starts after the system boots. Every user-space process started from then on is a child of init, either directly or indirectly. The init process is special in the sense that it cannot be killed the way other processes can. It terminates only when the Linux system is shut down. The init process always has process ID 1.
Zombie and orphan processes
Suppose there are two processes, a parent and its child. In practice, two scenarios can arise:
The parent dies or gets killed before the child.
In the above scenario, the child process becomes an orphan process (as it has lost its parent). In Linux, the init process comes to the rescue of orphan processes and adopts them. This means that after a child has lost its parent, the init process becomes its new parent process.
The child dies and parent does not perform wait() immediately.
Whenever the child terminates, its termination status is available to the parent through the wait() family of calls. So the kernel waits for the parent to retrieve the termination status of the child before it completely wipes out the child process. If the parent does not perform the wait() immediately (in order to fetch the termination status), the terminated child process becomes a zombie process. A zombie process is one that is waiting for its parent to fetch its termination status. Although the kernel releases all the resources that the zombie process was holding before it terminated, some information such as its termination status and its process ID is still stored by the kernel. Once the parent performs the wait() operation, the kernel clears this information too.
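The second scenario can be sketched in a few lines of C. The example below forks a child that exits straight away and then reaps it with waitpid(), which is one of the wait() family of calls. The function name run_and_reap() and its parameter are illustrative assumptions, not something defined by Linux.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits with `code`, then reap it.
 * Returns the child's exit status, or -1 on error.
 * run_and_reap() is a hypothetical helper name.
 */
int run_and_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;               /* fork failed */

    if (pid == 0)
        _exit(code);             /* child: terminate immediately */

    /* Parent: until waitpid() returns, the terminated child is a zombie. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* The zombie entry is gone now; decode the termination status. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If a sleep() is placed before the waitpid() call, running ps from another terminal during the pause should show the child in the zombie state (state Z, usually marked as defunct).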
Daemon process
A process that needs to run for a long period of time and does not require a controlling terminal is usually programmed to become a daemon process. For example, monitoring software such as a key-logger is usually written as a daemon. A daemon process has no controlling terminal.
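One common way of getting rid of the controlling terminal is to fork, let the original process exit, and call setsid() in the child. The following is only a simplified sketch of that idea. The function name become_daemon() is hypothetical, and real daemons often do more (for example a second fork, or simply calling the daemon() function that glibc provides).

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Detach the caller from its controlling terminal.
 * Returns 0 in the daemon process, -1 on error.
 * The process that called the function exits.
 */
int become_daemon(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);                /* original process ends; child carries on */

    if (setsid() < 0)            /* new session without a controlling terminal */
        return -1;

    if (chdir("/") < 0)          /* do not keep any directory busy */
        return -1;

    /* Point stdin, stdout and stderr at /dev/null. */
    int fd = open("/dev/null", O_RDWR);
    if (fd < 0)
        return -1;
    dup2(fd, STDIN_FILENO);
    dup2(fd, STDOUT_FILENO);
    dup2(fd, STDERR_FILENO);
    if (fd > STDERR_FILENO)
        close(fd);
    return 0;
}
```

After a successful call, the surviving process is the leader of a new session, and ps should show '?' in its TTY column.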
Memory layout of a process
The memory of a process can broadly be divided into the following segments :
Stack segment : The stack contains all the data that is local to a function, like variables, pointers etc. Each function call gets its own stack frame. Stack memory is dynamic in the sense that it grows with each function being called.
Heap segment : The heap contains memory that is dynamically requested by the program for its variables.
Data segment : All the global and static variables become part of this segment.
Text segment : All the program instructions, hard-coded strings and constant values are a part of this memory area.
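If we extend the above hello world program to something like :
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int a; // uninitialized global
int main(void) {
    unsigned int i = 0;
    char *ptr = (char*)malloc(15);
    memset(ptr, 0, 15);
    memcpy(ptr, "Hello World", 11);
    printf("\n %s \n", ptr);
    // Simulate a wait for some time
    for(i=0; i<0xFFFFFFFF; i++);
    free(ptr);
}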
In the example above :
- The variable 'a' goes into the data segment (specifically into the BSS segment, which contains all the uninitialized globals).
- The variables 'i' and 'ptr' lie in the stack segment. Each function call, such as memset, memcpy, printf and free, gets its own stack frame when it is called.
- The constant values like "Hello World", '15', '11', '0', '0xFFFFFFFF' and all the instructions are part of the text segment.
- The 15 bytes of memory requested through malloc are allocated on the heap. So the pointer 'ptr' holds the address of a memory location on the heap.
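Note that each process has its own heap inside its own virtual address space, so corrupting the heap affects only the process that did it. Overusing heap memory, however, consumes system memory and can still affect other programs running on the system.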
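One way to get a feel for these segments is to print an address from each of them. The sketch below does that. All the names in it (uninit_global, init_global, print_layout) are made up for this illustration, and the actual addresses differ from system to system and from run to run.

```c
#include <stdio.h>
#include <stdlib.h>

int uninit_global;          /* uninitialized global: BSS part of the data segment */
int init_global = 42;       /* initialized global: data segment */

/* Print one address per region. Returns 0 on success, -1 if malloc fails. */
int print_layout(void)
{
    int local = 0;                        /* local variable: stack */
    const char *literal = "Hello World";  /* string literal: read-only area */
    char *heap_buf = malloc(15);          /* dynamic allocation: heap */

    if (heap_buf == NULL)
        return -1;

    printf("string literal : %p\n", (void *)literal);
    printf("init_global    : %p\n", (void *)&init_global);
    printf("uninit_global  : %p\n", (void *)&uninit_global);
    printf("heap_buf       : %p\n", (void *)heap_buf);
    printf("local          : %p\n", (void *)&local);

    free(heap_buf);
    return 0;
}
```

On a typical Linux system the two globals usually end up close to each other, while the stack address is usually far away from both the heap and the globals.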
Linux process environment
The environment in Linux is a list of 'variable=value' entries that is used for a variety of purposes. Programs, scripts, shells etc use this information for their smooth operation. For example, the home directory of the user who is presently logged in can be accessed through the 'HOME' environment variable. The list of these environment variables along with their values can be viewed using the 'env' command, which prints one 'variable=value' entry per line.
A typical session has a long list of environment variables available. A user can add an environment variable using the 'export' command. In C, the extern variable 'char **environ' can be used to access this list in a program. Functions like getenv(), setenv() etc are available to manipulate the process environment.
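Here is a small sketch of how a C program might read its environment through environ and getenv(). The helper names count_env() and env_or() are hypothetical and exist only for this example.

```c
#include <stdlib.h>

extern char **environ;   /* NULL-terminated array of "variable=value" strings */

/* Count the entries in the process environment. */
size_t count_env(void)
{
    size_t n = 0;
    for (char **p = environ; p != NULL && *p != NULL; p++)
        n++;
    return n;
}

/* Return the value of `name`, or `fallback` if the variable is not set. */
const char *env_or(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}
```

For instance, env_or("HOME", "/") gives the home directory when HOME is set and "/" otherwise. A variable added with setenv() becomes visible to getenv() and to environ from that point on in the same process.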
Manipulating Linux resource limits
Any process in Linux can acquire resources like open files, memory etc, and there is a per-process limit on each of these resources. Each resource has a soft limit and a hard limit associated with it. The soft limit is the value that the kernel actually enforces, and the process can change it at any time, while the hard limit is the cap up to which the soft limit can be raised. The shell provides the 'ulimit' command to manipulate these resource limits from the command line. The system calls getrlimit() and setrlimit(), on the other hand, can be used to work with these limits from within C code.
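As a rough sketch of the C side, the function below changes the soft limit on the number of open file descriptors (RLIMIT_NOFILE) while staying within the hard limit. The function name set_soft_nofile() is an assumption made for this example. Only getrlimit(), setrlimit() and struct rlimit come from the system headers.

```c
#include <sys/resource.h>

/*
 * Set the soft limit on open file descriptors, clamped to the hard limit.
 * Returns 0 on success, -1 on error.
 * set_soft_nofile() is a hypothetical helper name.
 */
int set_soft_nofile(rlim_t wanted)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;

    /* rlim_cur is the soft limit, rlim_max is the hard limit.
     * The soft limit may not be raised past the hard limit. */
    lim.rlim_cur = wanted > lim.rlim_max ? lim.rlim_max : wanted;

    return setrlimit(RLIMIT_NOFILE, &lim);
}
```

From the shell, 'ulimit -n' reports the soft limit for the same resource and 'ulimit -Hn' reports the hard limit.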