Containers effectively partition the resources managed by a single operating system into isolated groups to better balance the conflicting demands on resource usage between those groups. In contrast to virtualization, neither instruction-level emulation nor just-in-time compilation is required. Containers can run instructions native to the core CPU without any special interpretation mechanisms. None of the complexities of paravirtualization or system call thunking are required either.
By providing a way to create and enter containers, an operating system gives applications the illusion of running on a separate machine while at the same time sharing many of the underlying resources. For example, the page cache of common files—glibc for example—may effectively be shared because all containers use the same kernel and, depending on the container configuration, frequent the same libc library. This sharing can often extend to other files in directories that do not need to be written to.
The savings realized by sharing these resources, while also providing isolation, mean that containers have significantly lower overhead than true virtualization.
Container technology has existed for a long time. Solaris Zones and BSD jails are examples of containers on non-Linux operating systems. Container technologies for Linux have a similarly extensive heritage: Linux-Vserver, OpenVZ, and FreeVPS. While each of these technologies has matured, these solutions have not made significant strides towards integrating their container support into the mainstream Linux kernel.
In Serge Hallyn's "Secure Linux containers cookbook" (developerWorks, February 2009), discover how to truly strengthen lightweight containers with SELinux and Smack policy. And see Resources for more on these technologies.
In contrast, the Linux Resource Containers project (developed and maintained by IBM's Daniel Lezcano; see Resources for the code) seeks to implement containers by contributing to the mainstream Linux kernel. At the same time, these contributions may be useful to the more mature container projects by offering them a common kernel back end. This article offers a quick introduction to using the tools created by the LXC project.
To get the most out of this article, you should be comfortable using the command line to run programs like make, gcc, and patch. You should also be familiar with the task of expanding tarballs (*.tar.gz files).
The LXC project consists of a Linux kernel patch and userspace tools. The userspace tools rely on the new features added to the kernel by the patch in order to offer a simplified set of tools to manipulate containers.
Before being able to use LXC, you need to download Linux kernel source code, apply an appropriate LXC patch, then build, install, and boot it. Then the LXC tools must be downloaded, built, and installed.
I used a patched Linux 2.6.27 kernel. (See Resources for links.) While the lxc patch probably will not apply cleanly to your favorite distribution's kernel source, Linux versions after 2.6.27 may contain significant portions of the functionality presented in the patch; hence, using the latest patch and mainline kernel source is highly recommended.
Also, instead of downloading and patching kernel source code, you may
retrieve the code using
git clone git://git.kernel.org/pub/scm/linux/kernel/git/daveh/linux-2.6-lxc.git
Directions on how to patch, configure, build, install, and boot a kernel can be found at kernelnewbies.org (see Resources for a link).
LXC requires some specific kernel configurations. The easiest way to
properly configure the kernel for LXC is to use
make menuconfig, then select Container
support. This in turn selects a set of other configuration options
depending on which features your kernel supports.
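As a rough guide, the options pulled in by Container support cover namespaces and control groups. The fragment below is illustrative only; the exact symbols and defaults depend on your kernel version and patch:

```
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_USER_NS=y
CONFIG_CGROUPS=y
CONFIG_CPUSETS=y
CONFIG_VETH=y
```

You can verify your running kernel's configuration afterwards by inspecting /proc/config.gz or the config file under /boot, if your distribution provides one.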
In addition to a kernel that supports containers, you will need tools that make starting and managing containers a simple task. The primary tools used for container management in this article come from liblxc (see Resources for a link, and also see libvirt for an alternative). This section discusses:
- The liblxc tool
- The iproute2 tool
- How to configure networking
- How to populate a container filesystem (by building a custom Debian one or by running an ssh container)
- How to connect to a container filesystem (SSH, VNC, VT: tty, VT: GUI)
Download and expand liblxc (see Resources), and then, from within the liblxc directory:
./configure --prefix=/
make
make install
If you're comfortable building a source RPM, one is available (see Resources).
To manage your network interfaces within containers, you need version 2.6.26 or later of the iproute2 package (see Resources). If your Linux distribution lacks this package, download, configure, make, and install it using the instructions from the source tarball.
Another key component of many functional containers is network access. Bridging (connecting Ethernet segments so that they appear to be a single Ethernet segment) is currently the best method of connecting a container to the network. To prepare to use LXC, we will create a bridge (see Resources) and use it to connect our real network interface with the container's network interface.
To create a bridge named br0:
brctl addbr br0
brctl setfd br0 0
Bring up the bridge interface with your IP from a pre-existing network (10.0.2.15 in this example):
ifconfig br0 10.0.2.15 promisc up
Add your pre-existing network interface (eth0 in this example) to the bridge and remove its direct association with its IP address:
brctl addif br0 eth0
ifconfig eth0 0.0.0.0 up
Any interface added to the bridge will respond to that IP address. Finally, ensure that your default route sends packets to your gateway with:
route add -net default gw 10.0.2.2 br0
Later, when you configure the container, you specify br0 as a link to the outside world.
In addition to networking, containers often need their own filesystem. There are several methods to populate a container filesystem depending on your needs. I'll discuss two:
- Building a custom Debian container
- Running an ssh container
Building a custom Debian container is rather simple using the debootstrap command:
debootstrap sid rootfs http://debian.osuosl.org/debian/
If you're making a large number of containers, you can save time by downloading the packages into a tarball first, like so:
debootstrap --make-tarball sid.packages.tgz sid http://debian.osuosl.org/debian/
As an example, this produces a .tar file that is about 71MB in size (52MB compressed), while a root directory consumes nearly 200MB. Then, to start building the root directory in rootfs:
debootstrap --unpack-tarball sid.packages.tgz sid rootfs
(The debootstrap manpage has more information on building smaller or more suitable containers.)
This will result in an environment (see Resources) that is highly redundant with respect to the host container.
Running an ssh container lets you dramatically reduce the disk space unique to a container's filesystem. For example, this method uses mere kilobytes to enable running multiple ssh daemons on port 22 of different containers (see Resources for an example). The container does this by using read-only bind mounts of the critical root directories such as /bin, /sbin, /lib, etc. to share the sshd package contents from the existing Linux system. A network namespace is used, and barebones read-write contents are created.
The techniques used to generate such lightweight containers are primarily those used to generate chroot environments. The difference lies in the read-only bind mounts and the use of namespaces to enhance the isolation of the chroot environment to the point that it becomes an effective container.
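For illustration, the read-only bind mounts for such a container might be described in an fstab like the one below. The directory list is an assumption for this sketch, not necessarily the exact set the ssh container example uses:

```
# host dir   container dir   fstype  options   dump  pass
/bin         rootfs/bin      none    ro,bind   0     0
/sbin        rootfs/sbin     none    ro,bind   0     0
/lib         rootfs/lib      none    ro,bind   0     0
/usr         rootfs/usr      none    ro,bind   0     0
```

Directories the container must write to, such as /etc and /var, get their own small read-write copies instead of bind mounts.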
Next, you need to select a method for connecting to the container. Several methods are available depending on how you choose to configure it:
- SSH (text)
- VNC (GUI)
- VT: tty (text)
- VT: X (GUI)
Connecting via SSH is good if you do not need a GUI interface to your container. In this case, a simple ssh connection may suffice (see "Running an ssh container" above). This method has the benefit of relying on IP addressing to enable the creation of nearly arbitrary numbers of containers.
If your ssh connection takes a long time to reach the password prompt, the Avahi multicast DNS/Service Discovery daemon may be timing out during DNS lookups.
Connecting via Virtual Network Computing (VNC) lets you add a GUI interface for your container.
Use vnc4server to start an X server that serves only VNC clients. You will need vnc4server installed in the container to run it from the container's /etc/rc.local file, like so:
echo '/usr/bin/vnc4server :0 -geometry 1024x768 -depth 24' >> rootfs/etc/rc.local
This creates an X display with 1024-by-768 resolution and 24-bit color when the container starts. Then connecting is as simple as pointing a VNC client at the container's IP address and display :0.
Connecting via VT: tty (text) is useful if your container shares ttys with
its host. In this case, you may use Linux Virtual Terminals (VT) to connect to your
container. The simplest use of VTs starts a login on one of these tty
devices, which tend to correspond with Linux VTs. The login process is
getty. To use VT 8:
echo '8:2345:respawn:/sbin/getty 38400 tty8' >> rootfs/etc/inittab
Once the container is started, it will run getty on tty8, allowing users to log in to the container. You can use a similar trick to restart the container using the LXC tools.
This technique does not enable a graphical interface to the container. Furthermore, since only one process at a time can attach to tty8, further configuration would be needed to enable multiple containers.
Connecting via VT: X allows you to run a GUI on one of the host's virtual terminals. To do this, run a display manager on a free VT; for example, configure the GNOME Display Manager (gdm) to use VT 9 and add a respawn entry for it to the container's inittab. While this enables a graphical interface, it still uses one of a limited number of Linux virtual terminals.
Now that you are running a suitable kernel, have installed the LXC utilities, and have a working environment, it's time to learn to manage instances of that environment. (Hint: Much of this is covered in greater detail in the LXC README.)
LXC uses the cgroup filesystem to manage containers. You must mount this filesystem before using LXC:
mount -t cgroup cgroup /cgroup
You may mount the cgroup filesystem anywhere you like; LXC will use the first cgroup filesystem listed in /etc/mtab.
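To see which mount LXC would pick up, you can mimic its lookup by scanning the mount table for the first entry of type cgroup. This small helper is an illustrative sketch, not part of the LXC tools:

```shell
#!/bin/sh
# Print the mount point of the first cgroup filesystem listed in the
# given mount table (defaults to /etc/mtab), mimicking LXC's lookup.
first_cgroup_mount() {
    awk '$3 == "cgroup" { print $2; exit }' "${1:-/etc/mtab}"
}
```

For example, after mount -t cgroup cgroup /cgroup, first_cgroup_mount should print /cgroup.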
The rest of this article shows you some LXC basics and some miscellaneous points and discusses low-level access.
For the basics in using LXC tools, we'll look at:
- Creating a container
- Getting information about (or listing) existing containers
- Starting system and application containers
- Signalling processes running in a container
- Pausing, resuming, stopping, and destroying a container
Creating a container associates a name with a configuration file. The name will be used to manage a single container:
lxc-create -n name -f configfile
This allows multiple containers to simultaneously use the same configuration file. Within the configuration file, you specify attributes of the container such as its host name, networking, root filesystem, and fstab. After running the lxc-sshd script (which creates a configuration for you), the ssh container configuration looks like:
lxc.utsname = my_ssh_container
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.ipv4 = 10.0.2.16/24
lxc.network.name = eth0
lxc.mount = ./fstab
lxc.rootfs = ./rootfs
Regardless of the configuration file, containers started with the LXC tools have their own view of the processes in the system, their own mount tree, and their own view of interprocess communication (IPC) resources available.
Apart from these, when a container starts, any type of resource omitted from the configuration is assumed to be shared with the host. This allows administrators to compactly specify the critical differences between the container and its host and enables portability of the configurations.
Listing information about existing containers is crucial to managing them. To show the state of a specific container:
lxc-info -n name
To show the processes that are part of a container:
lxc-ps --name name
LXC differentiates between two types of containers: system and application containers. System containers resemble virtual machines. In contrast to true virtualization, they have lower overhead at the cost of decreased isolation. This is a direct consequence of the fact that the same Linux kernel is utilized by every container. To resemble a virtual machine, a system container starts at the same place a Linux distribution starts: by running the init program:
lxc-start -n name init
In contrast to a system container, an application container only creates separate namespaces needed to isolate a single application. To start an application container:
lxc-execute -n name cmd
To send a signal to all processes running inside a container:
lxc-kill -n name -s SIGNAL
Pausing a container is conceptually similar to sending the
SIGSTOP signal to all the processes in a
container. However, sending spurious
signals can confuse some programs. So LXC uses the Linux process freezer
available through the cgroup interface:
lxc-freeze -n name
To resume a frozen container:
lxc-unfreeze -n name
Stopping a container causes all the processes started in the container to die and cleans up the container:
lxc-stop -n name
Destroying a container removes the configuration files and metadata associated with the name:
lxc-destroy -n name
Here are a few miscellaneous activities (some related to monitoring) you might like to know.
To view and adjust the priority of a container:
lxc-priority -n name
lxc-priority -n name -p priority
To continually watch state and priority changes of a container:
lxc-monitor -n name
Hit Ctrl-C to stop monitoring the container.
You may also wait for a container to enter one of a set of states, where the states are separated by '|':
lxc-wait -n name -s states
To wait for any state except RUNNING:
lxc-wait -n name -s 'STOPPED|STARTING|STOPPING|ABORTING|FREEZING|FROZEN'
This will, of course, return immediately. Barring unforeseen errors, you can expect lxc-wait to return only when the container has entered one of the given states.
LXC uses the cgroup filesystem to manage containers, and it's possible to read and manipulate parts of the cgroup filesystem through the LXC tools. The CPU usage of each container can be managed by reading and adjusting the container's cpu.shares, like so:
lxc-cgroup -n name cpu.shares
lxc-cgroup -n name cpu.shares howmany
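Note that cpu.shares values are relative weights, not percentages: when every container is busy, a container's slice of the CPU is its shares divided by the total shares. The following hypothetical helper, not part of LXC, illustrates the arithmetic:

```shell
#!/bin/sh
# Percentage of CPU a container receives when all containers are busy,
# given its cpu.shares value ($1) and the sum of all containers' shares ($2).
cpu_percent() {
    echo $(( 100 * $1 / $2 ))
}
```

For example, a container holding 2048 of 3072 total shares gets about 66 percent of the CPU, and one holding 1024 gets about 33 percent.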
Now that this basic guide has shown you how to get started with the Linux Containers tools, you can start crafting your own effective resource partitions.
This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002.
- See the companion article by Serge Hallyn, "Secure Linux containers cookbook" (developerWorks, February 2009), and discover how to truly strengthen lightweight containers with SELinux and Smack policy.
- kernelnewbies.org has instructions on how to patch, configure, build, install, and boot a Linux kernel.
- Other container technologies include Solaris Zones, BSD jails, Linux-Vserver, OpenVZ, and FreeVPS.
- Learn about creating a network bridge to prepare to use LXC.
- The environment created by debootstrap is highly redundant with respect to the host container.
- This ssh example uses mere kilobytes to enable running multiple ssh daemons on port 22 of different containers.
Get products and technologies
- The Linux Resource Containers project on SourceForge.net is a repository of code for application container implementation in the Linux kernel and a staging area for code that may be sent to the linux-kernel mailing list.
- Get the 2.6.27 Linux kernel and the lxc patch to the 2.6.27 Linux kernel.
- Download liblxc to get the primary tools used for container management in this article. A source RPM is also available.
- libvirt is a toolkit and Linux virtualization API meant to provide common routines for managing virtual machine and container instances. libvirt works regardless of whether these instances use Xen, QEMU, or KVM for virtualization, or LXC or OpenVZ for containers.
- Get version 2.6.26 or later of the iproute2 package.