While formal introductions to networking commonly refer to the Open Systems Interconnection (OSI) model, this introduction to the basic networking stack in Linux uses the four-layer model known as the Internet model (see Figure 1).
Figure 1. The Internet model of a network stack
At the bottom of the stack is the link layer. The link layer refers to the device drivers that provide access to the physical layer, which can be any of several media, such as serial links or Ethernet. Above the link layer is the network layer, which is responsible for directing packets to their destinations. The next layer, the transport layer, is responsible for peer-to-peer communication (for example, within a host). While the network layer manages communication between hosts, the transport layer manages communication between endpoints within those hosts. Finally, there's the application layer, which is commonly the semantic layer that understands the data being moved. For example, the Hypertext Transfer Protocol (HTTP) moves requests and responses for Web content between a server and a client.
Practically speaking, the layers of the networking stack go by much more recognizable names. At the link layer, you find Ethernet, the most common high-speed medium. Older link-layer protocols include serial protocols such as the Serial Line Internet Protocol (SLIP), Compressed SLIP (CSLIP), and the Point-to-Point Protocol (PPP). The most common network layer protocol is the Internet Protocol (IP), but other protocols exist at the network layer that satisfy other needs, such as the Internet Control Message Protocol (ICMP) and the Address Resolution Protocol (ARP). At the transport layer are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). Finally, the application layer includes many familiar protocols, including the standard Web protocol, HTTP, and the e-mail protocol, Simple Mail Transfer Protocol (SMTP).
Now on to the architecture of the Linux network stack and how it implements the
Internet model. Figure 2 provides a high-level view of the Linux network stack. At
the top is the user space layer, or application layer, which defines the users of the
network stack. At the bottom are the physical devices that provide connectivity to
the networks (serial or high-speed networks such as Ethernet). In the middle, or
kernel space, is the networking subsystem that is the focus of this article.
Through the interior of the networking stack flow socket buffers
(sk_buffs) that move packet data between sources and
sinks. You'll see the sk_buff structure shortly.
Figure 2. Linux high-level network stack architecture
First, here's a quick overview of the core elements of the Linux networking subsystem, followed by more detail in later sections. At the top (see Figure 2) is the system call interface. This simply provides a way for user-space applications to gain access to the kernel's networking subsystem. Next is a protocol-agnostic layer that provides a common way to work with the underlying transport-level protocols. Next are the actual protocols, which in Linux include the built-in protocols of TCP, UDP, and, of course, IP. Next is another agnostic layer that permits a common interface to and from the individual device drivers that are available, followed at the end by the individual device drivers themselves.
The system call interface can be described from two perspectives. When a
networking call is made by the user, it is multiplexed through the system call
interface into the kernel. This ends up as a call to
sys_socketcall in ./net/socket.c, which then further
demultiplexes the call to its intended target. The other perspective of the system
call interface is the use of normal file operations for networking I/O. For
example, typical read and write operations may be performed on a networking socket
(which is represented by a file descriptor, just like a normal file). Therefore,
while there exist a number of operations that are specific to networking (creating
a socket with the socket call, connecting it to a
destination with the connect call, and so on), there
are also a number of standard file operations that apply to networking objects
just as they do to regular files. In the end, the syscall interface provides the
means to transfer control between the user-space application and the kernel.
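The second perspective, plain file operations on a socket, can be illustrated from user space. The following is a minimal sketch, not code from the kernel: the function name roundtrip_with_file_ops is made up for this example, and it assumes a connected AF_UNIX socket pair (created with socketpair) so that it runs without any network configuration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send msg into one end of a connected socket pair
 * and read it back from the other end using only read() and write().
 * Returns the number of bytes read into out, or -1 on failure. */
ssize_t roundtrip_with_file_ops(const char *msg, char *out, size_t out_len)
{
    int fds[2];
    size_t msg_len;
    ssize_t n;

    assert(msg != NULL && out != NULL);
    msg_len = strlen(msg);

    /* Networking-specific call: creates two connected sockets. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    /* Ordinary file operations applied to socket descriptors. */
    if (write(fds[0], msg, msg_len) != (ssize_t)msg_len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    n = read(fds[1], out, out_len);

    /* Sockets are closed like any other file descriptor. */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Only the socketpair call is specific to networking here; the data transfer and cleanup use the same calls that would be used on a regular file.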
The sockets layer is a protocol-agnostic interface that provides a set of common functions to support a variety of different protocols. The sockets layer supports not only the typical TCP and UDP protocols, but also IP, raw Ethernet, and other transport protocols, such as the Stream Control Transmission Protocol (SCTP).
Communication through the network stack takes place with a socket. The socket
structure in Linux is struct sock, which is defined in
linux/include/net/sock.h. This large structure contains all of the required state
of a particular socket, including the particular protocol used by the socket and
the operations that may be performed on it.
The networking subsystem knows about the available protocols through a special
structure that defines each protocol's capabilities. Each protocol maintains a structure
called proto (found in linux/include/net/sock.h). This
structure defines the particular socket operations that can be performed from the
sockets layer to the transport layer (for example, how to create a socket, how to
establish a connection with a socket, how to close a socket, and so on).
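The idea of a per-protocol table of operations can be sketched in user-space C. This is a toy model and not the kernel's definition: struct toy_proto, struct toy_sock, and every function below are hypothetical names invented for illustration, and the real proto structure carries far more methods and state.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock;

/* Hypothetical, heavily reduced stand-in for a protocol's method table. */
struct toy_proto {
    const char *name;
    int  (*connect)(struct toy_sock *sk);
    void (*close)(struct toy_sock *sk);
};

/* Hypothetical socket: per-socket state plus a pointer to its protocol. */
struct toy_sock {
    const struct toy_proto *prot;
    int connected;
    int handshakes;   /* number of simulated connection handshakes */
};

/* Stream-style protocol: connecting involves a (simulated) handshake. */
static int toy_stream_connect(struct toy_sock *sk)
{
    sk->handshakes++;
    sk->connected = 1;
    return 0;
}

/* Datagram-style protocol: connecting only records the peer. */
static int toy_dgram_connect(struct toy_sock *sk)
{
    sk->connected = 1;
    return 0;
}

static void toy_common_close(struct toy_sock *sk)
{
    sk->connected = 0;
}

const struct toy_proto toy_stream_proto = {
    "toy_stream", toy_stream_connect, toy_common_close
};
const struct toy_proto toy_dgram_proto = {
    "toy_dgram", toy_dgram_connect, toy_common_close
};

/* Sockets-layer entry points: they never name a protocol directly and
 * simply dispatch through the table attached to the socket. */
int toy_sock_connect(struct toy_sock *sk)
{
    assert(sk != NULL && sk->prot != NULL);
    return sk->prot->connect(sk);
}

void toy_sock_close(struct toy_sock *sk)
{
    assert(sk != NULL && sk->prot != NULL);
    sk->prot->close(sk);
}
```

The design point is that the generic layer stays the same when a protocol is added; only a new table of function pointers has to be supplied.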
The network protocols section defines the particular networking protocols that
are available (such as TCP, UDP, and so on). These are initialized at start of day
in a function called inet_init in
linux/net/ipv4/af_inet.c (as TCP and UDP are part of the
inet family of protocols). The
inet_init function registers each of the built-in
protocols using the proto_register function. This
function is defined in linux/net/core/sock.c, and, in addition to adding the
protocol to the active protocol list, it also optionally allocates one or more
slab caches if required.
You can see how individual protocols identify themselves through the
proto structure in files tcp_ipv4.c, udp.c, and raw.c
in linux/net/ipv4/. Each of these protocol structures is mapped by type and
protocol into the inetsw_array, which maps the built-in
protocols to their operations. The structure of
inetsw_array and its relationships is shown in Figure
3. Each of the protocols in this array is initialized at start of day into
inetsw through a call to
inet_register_protosw from
inet_init. Function
inet_init also initializes the various
inet modules, such as the ARP, ICMP, IP,
TCP, and UDP modules.
Figure 3. Structure of the Internet protocol array
Note from Figure 3 that the
proto structure defines the transport-specific methods,
while the proto_ops structure defines the general
socket methods. Additional protocols can be added to the
inetsw protocol switch through a call to
inet_register_protosw. For example, SCTP adds
itself through a call to sctp_init in
linux/net/sctp/protocol.c. For more information about the SCTP, check out the
Resources section.
Data movement for sockets takes place using a core structure called the socket
buffer (sk_buff). An sk_buff
contains packet data and also state data that cover multiple layers of the
protocol stack. Each packet sent or received is represented with an
sk_buff. The sk_buff
structure is defined in linux/include/linux/skbuff.h and shown in Figure 4.
Figure 4. Socket buffer and its relationship to other structures
As shown, multiple sk_buffs may be chained together for
a given connection. Each sk_buff identifies the device
structure (net_device) to which the packet is being
sent or from which the packet was received. As each packet is represented with an
sk_buff, the packet headers are conveniently located
through a set of pointers (th,
iph, and mac for the Media
Access Control, or MAC, header). Because sk_buffs
are central to socket data management, a number of support functions have been
created to manage them. Functions exist for sk_buff
creation and destruction, cloning, and queue management.
Socket buffers are designed to be linked together for a given socket and include a multitude of information, including the links to the protocol headers, a timestamp (when the packet was sent or received), and the device associated with the packet.
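To make the header pointers concrete, here is a toy user-space model of a socket buffer. It is a sketch under assumptions, not the kernel structure: struct toy_skb and the toy_skb_* functions are hypothetical, the model keeps only a single linear data area, and the header sizes used in the usage note (14, 20, and 20 bytes) are the minimum Ethernet, IPv4, and TCP header lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced model of a socket buffer. */
struct toy_skb {
    struct toy_skb *next;          /* link to the next buffer in a queue */
    unsigned char *head;           /* start of the allocated area */
    unsigned char *data;           /* start of the valid data */
    unsigned char *tail;           /* end of the valid data */
    unsigned char *end;            /* end of the allocated area */
    unsigned char *th, *iph, *mac; /* transport, IP, and MAC headers */
};

/* Allocate a buffer of size bytes, leaving headroom bytes free in front
 * of the data so headers can be prepended later without copying. */
struct toy_skb *toy_skb_alloc(size_t size, size_t headroom)
{
    struct toy_skb *skb;

    if (headroom > size)
        return NULL;
    skb = calloc(1, sizeof(*skb));
    if (skb == NULL)
        return NULL;
    skb->head = malloc(size);
    if (skb->head == NULL) {
        free(skb);
        return NULL;
    }
    skb->data = skb->tail = skb->head + headroom;
    skb->end = skb->head + size;
    return skb;
}

void toy_skb_free(struct toy_skb *skb)
{
    if (skb != NULL) {
        free(skb->head);
        free(skb);
    }
}

/* Extend the data area at the tail; returns where the new bytes go. */
unsigned char *toy_skb_put(struct toy_skb *skb, size_t len)
{
    unsigned char *old_tail = skb->tail;

    assert((size_t)(skb->end - skb->tail) >= len);
    skb->tail += len;
    return old_tail;
}

/* Extend the data area at the front; returns the new start of data. */
unsigned char *toy_skb_push(struct toy_skb *skb, size_t len)
{
    assert((size_t)(skb->data - skb->head) >= len);
    skb->data -= len;
    return skb->data;
}

/* Number of valid bytes currently in the buffer. */
size_t toy_skb_len(const struct toy_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}
```

A sender would first place the payload with toy_skb_put, then have each layer prepend its header with toy_skb_push on the way down (for example, 20 bytes recorded in th, 20 bytes in iph, and 14 bytes in mac). The payload is never moved; only the data pointer changes.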
Below the protocols layer is another agnostic interface layer that connects protocols to a variety of hardware device drivers with varying capabilities. This layer provides a common set of functions to be used by lower-level network device drivers to allow them to operate with the higher-level protocol stack.
First, device drivers may register or unregister themselves with the kernel through
a call to register_netdevice or
unregister_netdevice. The caller first fills out the
net_device structure and then passes it in for
registration. The kernel calls its init function (if
one is defined), performs a number of sanity checks, creates a
sysfs entry, and then adds the new device to the device
list (a linked list of devices active in the kernel). You can find the
net_device structure in
linux/include/linux/netdevice.h. The various functions are implemented in
linux/net/core/dev.c.
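The registration sequence described above can be sketched as a toy model in user-space C. Everything here is hypothetical (struct toy_net_device, toy_register_netdevice, and the other toy_ names are invented for illustration), the sysfs step is omitted, and the sanity checks are reduced to a name check and a duplicate check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced model of a network device structure. */
struct toy_net_device {
    const char *name;
    int (*init)(struct toy_net_device *dev);   /* optional */
    int initialized;
    struct toy_net_device *next;
};

/* Head of the linked list of registered devices. */
static struct toy_net_device *toy_dev_base;

struct toy_net_device *toy_dev_get_by_name(const char *name)
{
    struct toy_net_device *d;

    for (d = toy_dev_base; d != NULL; d = d->next)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

/* Example driver init routine used by the usage note below. */
int toy_example_init(struct toy_net_device *dev)
{
    dev->initialized = 1;
    return 0;
}

int toy_register_netdevice(struct toy_net_device *dev)
{
    assert(dev != NULL);

    /* 1. Call the driver's init function, if one is defined. */
    if (dev->init != NULL) {
        int err = dev->init(dev);
        if (err != 0)
            return err;
    }

    /* 2. Sanity checks (reduced to two checks in this sketch). */
    if (dev->name == NULL || dev->name[0] == '\0')
        return -1;
    if (toy_dev_get_by_name(dev->name) != NULL)
        return -1;

    /* 3. The sysfs entry is omitted. 4. Add the device to the list. */
    dev->next = toy_dev_base;
    toy_dev_base = dev;
    return 0;
}

int toy_unregister_netdevice(struct toy_net_device *dev)
{
    struct toy_net_device **pp;

    for (pp = &toy_dev_base; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == dev) {
            *pp = dev->next;
            dev->next = NULL;
            return 0;
        }
    }
    return -1;
}
```

A caller fills out a toy_net_device (for example, with the name "toy0" and toy_example_init as its init routine) and passes it to toy_register_netdevice; a second device with the same name is rejected by the duplicate check.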
To send an sk_buff from the protocol layer to a
device, the dev_queue_xmit function is used. This
function enqueues an sk_buff for eventual transmission
by the underlying device driver (with the network device being defined by the
net_device or sk_buff->dev
reference in the sk_buff). The
dev structure contains a method, called
hard_start_xmit, that holds the driver function for
initiating transmission of an sk_buff.
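The split between enqueuing a packet and handing it to the driver can be modeled in a few lines. This is a toy sketch, not kernel code: the toy_ types and functions are hypothetical, the queue is a plain FIFO with no locking or queuing discipline, and the driver method is assumed to return 0 when it accepts a packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet and FIFO queue. */
struct toy_pkt {
    int id;
    struct toy_pkt *next;
};

struct toy_queue {
    struct toy_pkt *head, *tail;
    int len;
};

struct toy_dev {
    struct toy_queue txq;
    /* Driver-supplied method; returns 0 if the packet was accepted. */
    int (*hard_start_xmit)(struct toy_pkt *p, struct toy_dev *dev);
    int sent;
    int last_id;
};

static void toy_enqueue(struct toy_queue *q, struct toy_pkt *p)
{
    p->next = NULL;
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->len++;
}

static void toy_drop_head(struct toy_queue *q)
{
    struct toy_pkt *p = q->head;

    q->head = p->next;
    if (q->head == NULL)
        q->tail = NULL;
    p->next = NULL;
    q->len--;
}

/* Protocol side: queue the packet on its device for later transmission. */
void toy_dev_queue_xmit(struct toy_dev *dev, struct toy_pkt *p)
{
    toy_enqueue(&dev->txq, p);
}

/* Later: hand queued packets to the driver, oldest first. A packet the
 * driver refuses stays at the head of the queue. Returns the number of
 * packets the driver accepted. */
int toy_run_tx_queue(struct toy_dev *dev)
{
    int n = 0;

    assert(dev != NULL && dev->hard_start_xmit != NULL);
    while (dev->txq.head != NULL) {
        if (dev->hard_start_xmit(dev->txq.head, dev) != 0)
            break;
        toy_drop_head(&dev->txq);
        n++;
    }
    return n;
}

/* Example driver method: pretends to place the packet on hardware. */
int toy_driver_xmit(struct toy_pkt *p, struct toy_dev *dev)
{
    dev->sent++;
    dev->last_id = p->id;
    return 0;
}
```

The protocol layer only ever calls toy_dev_queue_xmit; which hardware-specific routine eventually runs is decided by the function pointer stored in the device.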
Receiving a packet is performed conventionally with
netif_rx. When a lower-level device driver receives a
packet (contained within an allocated sk_buff), the
sk_buff is passed up to the network layer through a
call to netif_rx. This function then queues the
sk_buff to an upper-layer protocol's queue for further
processing through netif_rx_schedule. You can find the
dev_queue_xmit and netif_rx
functions in linux/net/core/dev.c.
Recently, a new application program interface (NAPI, short for "New API") was introduced into the
kernel to allow drivers to interface with the device-agnostic layer
(dev). Some drivers use NAPI, but the large majority
still use the older frame reception interface (by a rough factor of six to one).
NAPI can yield better performance under high loads by avoiding taking an interrupt
for each incoming frame.
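The benefit under load can be illustrated with a small counting model. This is a simplified simulation under stated assumptions, not a description of the actual NAPI implementation: frames are assumed to arrive in bursts, the per-frame model takes one interrupt per frame, and the polled model takes one interrupt per burst and then drains the burst in polls of at most budget frames. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Per-frame model: every received frame raises its own interrupt. */
unsigned long toy_interrupts_per_frame(const unsigned *bursts, size_t n)
{
    unsigned long irqs = 0;
    size_t i;

    for (i = 0; i < n; i++)
        irqs += bursts[i];
    return irqs;
}

struct toy_poll_stats {
    unsigned long irqs;    /* interrupts taken */
    unsigned long polls;   /* poll rounds executed */
};

/* Polled model: the first frame of a burst raises one interrupt, after
 * which the remaining frames are collected by polling, with at most
 * budget frames handled per poll round. */
struct toy_poll_stats toy_interrupts_polled(const unsigned *bursts,
                                            size_t n, unsigned budget)
{
    struct toy_poll_stats st = { 0, 0 };
    size_t i;

    assert(budget > 0);
    for (i = 0; i < n; i++) {
        unsigned pending = bursts[i];

        if (pending == 0)
            continue;
        st.irqs++;
        while (pending > 0) {
            unsigned done = pending < budget ? pending : budget;

            pending -= done;
            st.polls++;
        }
    }
    return st;
}
```

For bursts of 1, 64, and 200 frames, the per-frame model counts 265 interrupts, while the polled model with a budget of 64 counts 3 interrupts and 6 poll rounds. With single-frame bursts the two models take the same number of interrupts, which matches the point that the gain appears under high load.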
At the bottom of the network stack are the device drivers that manage the physical network devices. Examples of devices at this layer include the SLIP driver over a serial interface or an Ethernet driver over an Ethernet device.
At initialization time, a device driver allocates a
net_device structure and then initializes it with its
necessary routines. One of these routines, called
dev->hard_start_xmit, defines how the upper layer
should enqueue an sk_buff for transmission. This
routine takes an sk_buff. The operation of this
function is dependent upon the underlying hardware, but commonly the packet
described by the sk_buff is moved to a hardware ring or
queue. Frame receipt, as described in the device agnostic layer, uses the
netif_rx interface or
netif_receive_skb for a NAPI-compliant network driver.
A NAPI driver puts constraints on the capabilities of the underlying hardware. See
the Resources section for more details.
After a device driver configures its interfaces in the
dev structure, a call to
register_netdevice makes it available for use. You can
find the drivers specific to network devices in linux/drivers/net.
The Linux source code is a great way to learn about the design of device drivers for a multitude of device types, including network device drivers. What you'll find is a variation in design and usage of the available kernel APIs, but each is useful for instruction or as a starting point for a new device driver. The remaining code in the network stack is common and usable unless you require a new protocol. Even then, the implementations of TCP (for a stream protocol) or UDP (for a message-based protocol) serve as useful models for starting out with new development.
Resources
Learn
- "Introduction to the Internet Protocols" at www.linuxjunkies.org gives a quick introduction to TCP/IP, UDP, and ICMP.
- "Kernel command using Linux system calls" (developerWorks, March 2007) covers the Linux system call interface, which is an important layer in the Linux kernel with user-space support from the GNU C Library (glibc) that enables function calls between user space and the kernel.
- "Access the Linux kernel using the /proc filesystem" (developerWorks, March 2006) looks at the /proc file system, a virtual file system that provides a novel way for user-space applications to communicate with the kernel. This article demonstrates /proc, as well as loadable kernel modules.
- Linux, like BSD, is a great operating system if you're interested in networking protocols. "Better networking with SCTP" (developerWorks, February 2006) covers one of the most interesting networking protocols, SCTP, which operates like TCP but adds a number of useful features such as messaging, multi-homing, and multi-streaming.
- "Anatomy of the Linux slab allocator" (developerWorks, May 2007) covers one of the most interesting aspects of memory management in Linux, the slab allocator. This mechanism originated in SunOS, but it's found a friendly home inside the Linux kernel.
- A NAPI driver has advantages over drivers using the older packet processing framework, from better interrupt management to packet throttling. You can read more about NAPI's interface and design at OSDL.
- Check out Tim's book GNU/Linux Application Programming for more information on programming Linux in user space.
- In Tim's book BSD Sockets Programming from a Multi-Language Perspective, learn about sockets programming using the BSD Sockets API.

M. Tim Jones is an embedded software architect and the author of GNU/Linux Application Programming, AI Application Programming, and BSD Sockets Programming from a Multilanguage Perspective. His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and networking protocols development. Tim is a Consultant Engineer for Emulex Corp. in Longmont, Colorado.





