Using file descriptors

A file descriptor is a small non-negative integer used by a process to identify an open file.

The number of file descriptors available to a process is limited by the OPEN_MAX value defined in the sys/limits.h file. The number of file descriptors is also controlled by the -n flag of the ulimit command. The open, pipe, creat, and fcntl subroutines all generate file descriptors. File descriptors are generally unique to each process, but they can be shared by child processes created with a fork subroutine or copied by the fcntl, dup, and dup2 subroutines.

File descriptors are indexes to the file descriptor table in the u_block area maintained by the kernel for each process. The most common ways for processes to obtain file descriptors are through open or creat operations or through inheritance from a parent process. When a fork operation occurs, the descriptor table is copied for the child process, which allows the child process equal access to the files used by the parent process.

File descriptor tables and system open file tables

The file descriptor and open file table structures track each process' access to a file and ensure data integrity.
Table                     Description

file descriptor table
        Translates an index number (file descriptor) in the table to an open file. File descriptor tables are created for each process and are located in the u_block area set aside for that process. Each entry in a file descriptor table has two fields: the flags area and the file pointer. There are up to OPEN_MAX file descriptors. The structure of the file descriptor table is as follows:

        struct ufd
        {
                struct file *fp;
                int flags;
        } *u_ufd;

system open file table
        Contains entries for each open file. A file table entry tracks the current offset referenced by all read or write operations to the file and the open mode (O_RDONLY, O_WRONLY, or O_RDWR) of the file.

The open file table structure contains the current I/O offset for the file. The system treats each read/write operation as an implied seek to the current offset. Thus if x bytes are read or written, the pointer advances x bytes. The lseek subroutine can be used to reassign the current offset to a specified location in files that are randomly accessible. Stream-type files (such as pipes and sockets) do not use the offset because the data in the file is not randomly accessible.
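The way the offset advances can be observed with a short sketch. The helper name demo_offsets and the scratch path passed to it are hypothetical, introduced here only for illustration. The helper writes 5 bytes, then repositions to byte 1 with lseek and reads 2 bytes, reporting the offset after each step.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: records the current offset after a 5-byte write and
 * again after seeking to byte 1 and reading 2 bytes.
 * Returns 0 on success, -1 on failure. */
int demo_offsets(const char *path, off_t *after_write, off_t *after_read)
{
        char buf[2];
        int ok = 0;
        int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

        if (fd < 0)
                return -1;
        unlink(path);                   /* file persists until fd is closed */

        if (write(fd, "hello", 5) == 5) {
                /* lseek with offset 0 and SEEK_CUR reports the offset
                 * without moving it */
                *after_write = lseek(fd, 0, SEEK_CUR);
                if (lseek(fd, 1, SEEK_SET) == 1 && read(fd, buf, 2) == 2) {
                        *after_read = lseek(fd, 0, SEEK_CUR);
                        ok = 1;
                }
        }
        close(fd);
        return ok ? 0 : -1;
}
```

With a regular file, the first reported offset should equal the number of bytes written, and the second should equal the seek position plus the number of bytes read.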

Managing file descriptors

Because files can be shared by many users, it is necessary to allow related processes to share a common offset pointer and have a separate current offset pointer for independent processes that access the same file. The open file table entry maintains a reference count to track the number of file descriptors assigned to the file.

Multiple references to a single file can be caused by any of the following:
  • A separate process opening the file
  • Child processes retaining the file descriptors assigned to the parent process
  • The fcntl or dup subroutine creating copies of the file descriptors

Sharing open files

Each open operation creates a system open file table entry. Separate table entries ensure each process has separate current I/O offsets. Independent offsets protect the integrity of the data.

When a file descriptor is duplicated, the copies refer to the same open file table entry. Processes that hold such copies share the same offset, and interleaving can occur, in which bytes are not read or written sequentially.
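The difference between a separate open operation and a duplicated descriptor can be sketched as follows. The helper name compare_offsets and its path argument are assumptions made for this example. After 4 bytes are written through the original descriptor, the helper reports the offset seen through a dup copy and through an independently opened descriptor.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes 4 bytes through fd, then reports the offset
 * seen through a dup'ed descriptor and through a second open of the file.
 * Returns 0 on success, -1 on failure. */
int compare_offsets(const char *path, off_t *dup_off, off_t *open_off)
{
        int rc = -1;
        int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
        /* dup: same open file table entry, therefore a shared offset */
        int fd_dup = (fd < 0) ? -1 : dup(fd);
        /* second open: new open file table entry, independent offset */
        int fd_open = (fd < 0) ? -1 : open(path, O_RDONLY);

        if (fd >= 0 && fd_dup >= 0 && fd_open >= 0 &&
            write(fd, "data", 4) == 4) {
                *dup_off = lseek(fd_dup, 0, SEEK_CUR);
                *open_off = lseek(fd_open, 0, SEEK_CUR);
                rc = 0;
        }
        if (fd_open >= 0)
                close(fd_open);
        if (fd_dup >= 0)
                close(fd_dup);
        if (fd >= 0) {
                close(fd);
                unlink(path);
        }
        return rc;
}
```

The duplicated descriptor should report the advanced offset, while the separately opened descriptor should still be at the start of the file.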

Duplicating file descriptors

File descriptors can be duplicated between processes in the following ways: the dup or dup2 subroutine, the fork subroutine, and the fcntl (file descriptor control) subroutine.

dup and dup2 subroutines

The dup subroutine creates a copy of a file descriptor. The duplicate is created at an empty space in the user file descriptor table that contains the original descriptor. A dup operation increments the reference count in the file table entry by 1 and returns the index number of the file descriptor where the copy was placed.

The dup2 subroutine scans for the requested descriptor assignment and closes the requested file descriptor if it is open. It allows the process to designate which descriptor entry the copy will occupy, if a specific descriptor-table entry is required.
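A minimal sketch of the two subroutines side by side follows. The helper name dup_demo is hypothetical, and /dev/null is used only as a convenient file to open. The helper records the descriptor numbers returned by open, dup, and dup2, then closes all of them.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens /dev/null, copies the descriptor with dup and
 * with dup2, records the three descriptor numbers, then closes everything.
 * Returns 0 on success, -1 on failure. */
int dup_demo(int wanted, int *orig, int *from_dup, int *from_dup2)
{
        int fd = open("/dev/null", O_WRONLY);
        int a, b;

        if (fd < 0)
                return -1;
        a = dup(fd);            /* system picks the lowest free entry */
        b = dup2(fd, wanted);   /* caller picks the entry; if that entry
                                 * is open, it is closed first */
        *orig = fd;
        *from_dup = a;
        *from_dup2 = b;

        if (a >= 0)
                close(a);
        if (b >= 0 && b != fd)
                close(b);
        close(fd);
        return (a >= 0 && b >= 0) ? 0 : -1;
}
```

In a single-threaded process, the number returned by dup is greater than the original, because every lower entry was already in use when the original was opened. The number returned by dup2 is the requested entry.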

fork subroutine
The fork subroutine creates a child process that inherits the file descriptors assigned to the parent process. If the child process then uses an exec subroutine to run a new program, inherited descriptors that had the close-on-exec flag set by the fcntl subroutine are closed.
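Inheritance across fork can be illustrated with the following sketch. The helper name offset_after_child_write and its path argument are assumptions for this example. The child writes 5 bytes through the inherited descriptor, and the parent then queries the offset through its own copy.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes 5 bytes through the descriptor it
 * inherited; the parent then reads back the shared offset.
 * Returns the offset seen by the parent, or -1 on failure. */
off_t offset_after_child_write(const char *path)
{
        int status;
        off_t pos = -1;
        pid_t pid;
        int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);

        if (fd < 0)
                return -1;
        unlink(path);                   /* file persists until fd is closed */

        pid = fork();
        if (pid == 0)                   /* child: fd is valid here too */
                _exit(write(fd, "child", 5) == 5 ? 0 : 1);

        if (pid > 0 && waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
                pos = lseek(fd, 0, SEEK_CUR);   /* reflects the child's write */

        close(fd);
        return pos;
}
```

Because parent and child refer to the same open file table entry, the offset seen by the parent reflects the bytes written by the child.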

fcntl (file descriptor control) subroutine
The fcntl subroutine manipulates file structure and controls open file descriptors. It can be used to make the following changes to a descriptor:
  • Duplicate a file descriptor (identical to the dup subroutine).
  • Get or set the close-on-exec flag.
  • Set nonblocking mode for the descriptor.
  • Append future writes to the end of the file (O_APPEND).
  • Enable the generation of a signal to the process when it is possible to do I/O.
  • Set or get the process ID or the group process ID for SIGIO handling.
  • Close all file descriptors.
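A few of the operations in this list can be sketched with the standard F_GETFD, F_SETFD, F_GETFL, F_SETFL, and F_DUPFD commands. The wrapper names below (set_cloexec, set_nonblocking, dup_at_or_above) are hypothetical and exist only for this example.

```c
#include <fcntl.h>
#include <unistd.h>

/* Sets the close-on-exec flag while keeping the other descriptor flags.
 * Returns 0 on success, -1 on failure. */
int set_cloexec(int fd)
{
        int flags = fcntl(fd, F_GETFD);

        if (flags == -1)
                return -1;
        return (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) ? -1 : 0;
}

/* Sets nonblocking mode while keeping the other file status flags.
 * Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
        int flags = fcntl(fd, F_GETFL);

        if (flags == -1)
                return -1;
        return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) ? -1 : 0;
}

/* Duplicates fd onto the lowest free entry numbered min_fd or higher.
 * With min_fd equal to 0, this behaves like dup(fd).
 * Returns the new descriptor, or -1 on failure. */
int dup_at_or_above(int fd, int min_fd)
{
        return fcntl(fd, F_DUPFD, min_fd);
}
```

Reading the current flags before setting them avoids clearing flags that were already set on the descriptor.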

Preset file descriptor values

When the shell runs a program, it opens three files with file descriptors 0, 1, and 2. The default assignments for these descriptors are as follows:

Descriptor Explanation
0 Represents standard input.
1 Represents standard output.
2 Represents standard error.

These default file descriptors are connected to the terminal, so that if a program reads file descriptor 0 and writes file descriptors 1 and 2, the program collects input from the terminal and sends output to the terminal. As the program uses other files, file descriptors are assigned in ascending order.

If I/O is redirected using the < (less than) or > (greater than) symbols, the shell's default file descriptor assignments are changed. For example, the following changes the default assignments for file descriptors 0 and 1 from the terminal to the appropriate files:
prog < FileX > FileY

In this example, file descriptor 0 now refers to FileX and file descriptor 1 refers to FileY. File descriptor 2 has not been changed. The program does not need to know where its input comes from or where its output is sent, as long as file descriptor 0 represents the input file and 1 and 2 represent output files.

The following sample program illustrates the redirection of standard output:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void redirect_stdout(char *);

int main(void)
{
       printf("Hello world\n");       /* this printf goes to
                                       * standard output */
       fflush(stdout);
       redirect_stdout("foo");        /* redirect standard output */
       printf("Hello to you too, foo\n");
                                      /* this printf goes to file foo */
       fflush(stdout);
       return 0;
}

void
redirect_stdout(char *filename)
{
        int fd;
        if ((fd = open(filename,O_CREAT|O_WRONLY,0666)) < 0)
                                        /* open a new file */
        {
                perror(filename);
                exit(1);
        }
        close(1);                       /* close old
                                         * standard output */
        if (dup(fd) != 1)               /* dup new fd to
                                         * standard output */
        {
                fprintf(stderr,"Unexpected dup failure\n");
                exit(1);
        }
        close(fd);                      /* close original, new fd,
                                         * no longer needed */
}

Within the file descriptor table, a new file descriptor is assigned the lowest descriptor number available at the time of the request. However, a specific descriptor number can be requested within the file descriptor table by using the dup2 subroutine.
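The lowest-available rule can be checked with a small sketch. The helper name lowest_slot_is_reused is hypothetical, and /dev/null is used only as a convenient file to open. The sketch assumes a single-threaded process, so that no other thread takes a descriptor between the calls.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens two descriptors, closes the first, and opens
 * a third. Returns 1 if the third open reused the number of the first,
 * 0 if it did not, -1 on failure. */
int lowest_slot_is_reused(void)
{
        int a = open("/dev/null", O_RDONLY);
        int b = open("/dev/null", O_RDONLY);
        int c;
        int reused;

        if (a < 0 || b < 0) {
                if (a >= 0)
                        close(a);
                if (b >= 0)
                        close(b);
                return -1;
        }
        close(a);                       /* a is now the lowest free number */
        c = open("/dev/null", O_RDONLY);
        reused = (c == a);

        if (c >= 0)
                close(c);
        close(b);
        return reused;
}
```

This is the same property that the redirection example relies on when it closes descriptor 1 and then calls dup.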

File descriptor resource limit

The number of file descriptors that can be allocated to a process is governed by a resource limit. The default value is set in the /etc/security/limits file and is typically set at 2000. The limit can be changed by the ulimit command or the setrlimit subroutine. The maximum size is defined by the constant OPEN_MAX.
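As a rough sketch, the limit can be read and changed from a program with the getrlimit and setrlimit subroutines and the RLIMIT_NOFILE resource. The wrapper names get_fd_limits and set_fd_soft_limit are hypothetical. The sketch changes only the soft limit and refuses values above the hard limit.

```c
#include <sys/resource.h>

/* Reads the soft and hard limits on open file descriptors.
 * Returns 0 on success, -1 on failure. */
int get_fd_limits(rlim_t *soft, rlim_t *hard)
{
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
                return -1;
        *soft = rl.rlim_cur;
        *hard = rl.rlim_max;
        return 0;
}

/* Sets the soft limit, leaving the hard limit unchanged.
 * Returns 0 on success, -1 on failure or if wanted exceeds the hard limit. */
int set_fd_soft_limit(rlim_t wanted)
{
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
                return -1;
        if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
                return -1;
        rl.rlim_cur = wanted;
        return (setrlimit(RLIMIT_NOFILE, &rl) != 0) ? -1 : 0;
}
```

A change made this way applies to the calling process and to processes it creates afterwards, in the same way that a ulimit command applies to the shell that runs it.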