Level: Introductory Paul Larson (pl@us.ibm.com), Software Engineer, Linux Technology Center, IBM
17 Feb 2004 The long-awaited 2.6 kernel is finally here. The IBM Linux Technology Center's Paul Larson takes a look behind the scenes at the tools, tests, and techniques -- from revision control and regression testing to bugtracking and list keeping -- that helped make 2.6 a better kernel than any that have come before it.
In the three years of active development leading up to the recent
release of the new 2.6 Linux kernel, some interesting changes took
place in the way the Linux kernel is developed and tested. In many ways,
the methods used to develop the Linux kernel are much the same today as
they were three years ago. However, several key changes have improved overall stability and quality.
Source code management
Historically, there never was a formal source code management or
revision control system for the Linux kernel. It's true that many
developers did their own revision control, but there was no
official Linux CVS archive that Linus Torvalds checked code into and
others could pull from. This lack of revision control often left gaping
holes between releases, where nobody really knew which changes were in,
whether they were merged properly, or what new things to expect in the
upcoming release. Often, things were broken in ways that could
have been avoided had more developers been able to see changes as they were made.
The lack of formal revision control and source code management
led many to suggest the use of a product called BitKeeper. BitKeeper is a
source control management system that many kernel developers had already
been using successfully for their own kernel development work. Shortly
after the first 2.5 kernels were released, Linus Torvalds began using
BitKeeper on a trial basis to see if it would fit his needs. Today,
BitKeeper is used to manage the Linux kernel source code for both the main
2.4 and 2.5 kernels. To most users, who may have little or no concern for
kernel development, this may seem insignificant. However, there are
several ways that users can benefit from the changes that the use of
BitKeeper has brought about in the methods used to develop the Linux
kernel.
One of the key benefits that BitKeeper has provided is in merging
patches. When multiple patches are applied to the same base of code, and
some of those patches affect the same parts, merging problems are to be
expected. A good source code management system can do some of the more
tedious parts of this automatically, which makes merging patches faster
and allows greater throughput for patches going into the kernel. As the
community of Linux kernel developers expands, revision control is
important for helping keep track of all the changes. Since a single
person is responsible for integrating these changes into the main Linux
kernel, tools such as BitKeeper are essential to ensure that patches aren't
forgotten and are easily merged and managed.
Having a live, central repository for the latest changes to the Linux
kernel is invaluable. Every change or patch that is accepted into the
kernel is tracked as a changeset. End users and developers can keep their
own copy of the source repository and update it at will with the
latest changesets using a simple command. For developers, this means the
ability to always be working with the latest copy of the code. Testers
can use these logical changesets to determine which change caused a
problem, shortening the time needed for debugging. Even end users who want to
use the latest kernels can benefit from a live, central repository
directly, since they now have the ability to update as soon as a feature or
bugfix they need goes into the kernel. Any user can also provide
immediate feedback and bug reports on code as it is being merged into the
kernel.
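The idea of using changesets to find the change that caused a problem can be sketched as a binary search. This is only an illustration, not a BitKeeper feature or command: the changeset names are made up, and `is_bad` is a hypothetical stand-in for "build the kernel at this changeset and run the failing test." The sketch assumes the changesets are ordered oldest to newest, that the newest one shows the problem, and that the problem persists once it has been introduced.

```python
from typing import Callable, Sequence


def first_bad_changeset(changesets: Sequence[str],
                        is_bad: Callable[[str], bool]) -> str:
    """Return the earliest changeset for which is_bad() is true.

    Assumes the list is ordered oldest to newest, the last entry is
    known to be bad, and every changeset after the culprit is bad too.
    """
    lo, hi = 0, len(changesets) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changesets[mid]):
            hi = mid        # problem already present: look earlier
        else:
            lo = mid + 1    # still good: the culprit came later
    return changesets[lo]


# Hypothetical history of eight changesets; the bug arrives in cs6.
history = [f"cs{n}" for n in range(1, 9)]
culprit = first_bad_changeset(history, lambda cs: int(cs[2:]) >= 6)
```

With eight changesets, at most three builds are needed instead of eight, which is where the time savings for testers come from.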
Parallel development
As the Linux kernel has grown, become more complex, and gained the
attention of more developers who tend to specialize in the development of
particular aspects of the kernel, another interesting change has come
about in the methods used to develop Linux. During the development of the
2.3 kernel version, there were a few other kernel trees besides the main
one released by Linus Torvalds.
During the course of development of 2.5, there was an explosion
of kernel trees. Some of this parallelization of development was made
possible through the use of source code management tools because of the
ability to keep parallel lines of development synchronized. Some of the
parallelization of development was necessary to allow others to test large
changes before they were accepted. There were kernel maintainers who
kept their own trees that focused on specific components and goals such
as memory management, NUMA features, scalability improvements, and
architecture-specific code, and even some trees that collected and tracked lots
of small bug fixes.
Figure 1. The Linux 2.5 development tree
The advantage to this parallel development model is that it allows
developers of large changes, or large amounts of similar changes towards a
particular goal, the freedom to develop in a controlled environment
without affecting the stability of the kernel for everyone else. When
developers are ready, they can release patches against the current version
of the Linux kernel that implement all of the changes they have made so
far. Testers in the community can then easily test those changes and
provide feedback. As pieces are proven to be stable, those pieces can be
merged into the main Linux kernel individually, or even all at once.
Testing in the Bazaar
Historically, the approach to testing the Linux
kernel has centered around the open source development model. Since
the code is open to review by other developers as soon as it is released,
there was never a formal verification cycle performed as is common in
other forms of software development. The philosophy behind this approach,
called "Linus's Law" in "The Cathedral and the Bazaar" (see Resources for a reference to that work), is "Given
enough eyeballs, all bugs are shallow." In other words, heavy peer review
should catch most of the really large problems.
In reality though, the kernel has many complex interactions. Even
with abundant peer review, many serious bugs can slip through.
Additionally, end users can, and often do, download and use the latest
kernels as they are released. At the time 2.4.0 was released, many in the
community were calling for a more organized testing effort to complement
the strengths of ad-hoc testing and code review. Organized testing
includes the use of test plans, repeatability in the testing process, and
the like. The use of all three methods leads to better code quality than
the original two methods alone.
Linux Test Project
One of the first contributors to bringing organized testing to Linux was
the Linux Test Project (LTP). This project is aimed at improving the
quality of Linux through more organized testing methods. Part of this
test project includes the development of automated test suites. The main
test suite developed by the LTP is also called the Linux Test Project.
At the time the 2.4.0 kernel was released, the LTP test suite only had
around 100 tests. As Linux was growing and maturing through 2.4 and 2.5
kernels, the LTP test suite was growing and maturing as well.
Today, the Linux Test Project contains well over 2000 tests, and the
number of tests is still growing!
Code coverage analysis
New tools are now being used that instrument the
kernel in such a way that code coverage analysis can be performed.
Coverage analysis tells us which lines of code in the kernel are executed
while a given test is running. More importantly, coverage analysis
exposes which areas of the kernel are not being tested at all. This
data is important because it shows which new tests should be
written to test those areas of the kernel, leading to a kernel that is
more thoroughly tested.
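As a rough sketch of what coverage analysis reports, the untested areas are simply the instrumented lines that no test run ever executed. The data layout here (a mapping from file name to line numbers) and the file names are assumptions made for illustration; they do not reflect the output format of any particular kernel coverage tool.

```python
from typing import Dict, List, Set

Coverage = Dict[str, Set[int]]  # file name -> line numbers


def untested_lines(instrumented: Coverage, runs: List[Coverage]) -> Coverage:
    """Return, per file, the instrumented lines that no run executed."""
    covered: Coverage = {}
    for run in runs:
        for path, lines in run.items():
            covered.setdefault(path, set()).update(lines)

    gaps: Coverage = {}
    for path, lines in instrumented.items():
        missing = lines - covered.get(path, set())
        if missing:
            gaps[path] = missing
    return gaps


# Hypothetical data: two test runs over two instrumented files.
instrumented = {"mm/example.c": {10, 11, 12}, "fs/example.c": {5, 6}}
runs = [
    {"mm/example.c": {10}},
    {"mm/example.c": {11}, "fs/example.c": {5, 6}},
]
gaps = untested_lines(instrumented, runs)
```

Here line 12 of the first file is never reached, so it is a candidate for a new test.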
Nightly kernel regression testing
During the 2.5 development cycle, another project undertaken by the
Linux Test Project involved using the LTP test suite to perform nightly
regression testing of the Linux kernel. The use of BitKeeper created a
live, central repository for pulling snapshots of the Linux kernel at any
time. Before the use of BitKeeper and snapshots came about, testers had
to wait for releases before testing could begin. Now, testers can test
the changes as they are being made.
Another advantage of using automation tools to perform
regression tests nightly is that fewer changes have been introduced since the last test. If a new regression bug is found, it is
often easy to detect which change is likely to have caused it.
Also, since the change is very
recent, it is still fresh on the minds of the developers -- hopefully
making it easier for them to remember and fix the relevant code.
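In outline, spotting a regression between two nightly runs means looking for tests that passed the night before and fail now. The sketch below assumes results are available as a simple mapping from test name to "PASS" or "FAIL"; that format and the test names are illustrative only and are not the LTP's actual output.

```python
from typing import Dict, List


def find_regressions(previous: Dict[str, str],
                     current: Dict[str, str]) -> List[str]:
    """Return tests that passed in the previous run but fail in this one.

    Tests that were already failing, or that are new in this run, are
    not counted as regressions.
    """
    return sorted(
        name
        for name, status in current.items()
        if status == "FAIL" and previous.get(name) == "PASS"
    )


# Hypothetical results from two consecutive nights.
last_night = {"test_a": "PASS", "test_b": "PASS", "test_c": "FAIL"}
tonight = {"test_a": "PASS", "test_b": "FAIL", "test_c": "FAIL",
           "test_d": "FAIL"}
regressions = find_regressions(last_night, tonight)
```

Because only one day of changes separates the two runs, the list of suspect changesets for `test_b` is short.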
Perhaps there should be a corollary to Linus's Law stating that some bugs are shallower than others, because those are
exactly the ones that nightly kernel regression testing weeds out. The
ability to do this daily, during the development cycle and before actual
releases are made, enables the testers who only look at full releases to
spend their eyeball time only on more serious and time-consuming bugs.
Scalable Test Platform
Another group called the Open Source Development Labs (OSDL) has also made
significant contributions to Linux testing. Some time after the 2.4
kernel had been released, the OSDL created a system called the Scalable
Test Platform (STP). The STP is an automated test platform that allows
developers and testers to run tests made available through the system on
hardware at OSDL. Developers can even test their own patches against
kernels using this system. The STP simplifies the
testing process since it takes care of building the kernel, setting up
the test, running the test, and gathering results. Results are then
archived for future comparisons. Another benefit of this system is that
many people do not have access to large systems such as SMP machines with
8 processors. Through STP, anyone can run tests on large systems such as
these.
Tracking bugs
One of the biggest improvements in organized testing of the Linux kernel
that has happened since the release of 2.4 is bug tracking.
Historically, bugs found in the Linux kernel were reported to the Linux
kernel mailing list, to more component- or architecture-specific mailing
lists, or directly to the individual that maintains the section of code
where the bug was found. Deficiencies in this system were quickly
revealed as the number of people developing and testing Linux increased.
In the past, bugs were often missed, forgotten, or ignored unless the
person reporting the bug was incredibly persistent.
Now, a bug tracking system has been installed at OSDL (see Resources for a link) for reporting and
tracking bugs against the Linux kernel. The system is configured so that the maintainer of a component is notified when a bug against
that component has been reported. The maintainer can then either accept
and fix the bug, reassign the bug if it turns out to actually be a bug in
another part of the kernel, or reject it if it turns out to be something
such as a misconfigured system. Bugs reported to a mailing list run the
risk of being lost as more and more e-mail pours onto the list. In a bug
tracking system, however, there is always a record of every bug and the
state it is in.
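The workflow described above can be modeled in a few lines. This is a toy model, not the OSDL system or Bugzilla's API: the class names, the state names ("NEW", "ACCEPTED", "REJECTED"), and the maintainer names are all hypothetical, and a list of strings stands in for e-mail notification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Bug:
    summary: str
    component: str
    state: str = "NEW"  # hypothetical state names, not Bugzilla's own


class BugTracker:
    """Toy tracker: every bug keeps a record and a current state."""

    def __init__(self, maintainers: Dict[str, str]) -> None:
        self.maintainers = maintainers      # component -> maintainer
        self.bugs: List[Bug] = []           # reports are never dropped
        self.notifications: List[str] = []  # stands in for e-mail

    def _notify(self, bug: Bug) -> None:
        owner = self.maintainers[bug.component]
        self.notifications.append(f"{owner}: {bug.summary}")

    def report(self, summary: str, component: str) -> Bug:
        bug = Bug(summary, component)
        self.bugs.append(bug)
        self._notify(bug)
        return bug

    def accept(self, bug: Bug) -> None:
        bug.state = "ACCEPTED"

    def reassign(self, bug: Bug, component: str) -> None:
        bug.component = component
        bug.state = "NEW"
        self._notify(bug)  # the new component's maintainer is told

    def reject(self, bug: Bug) -> None:
        bug.state = "REJECTED"  # for example, a misconfigured system


tracker = BugTracker({"scsi": "alice", "mm": "bob"})
bug = tracker.report("oops under memory pressure", "scsi")
tracker.reassign(bug, "mm")
tracker.accept(bug)
```

Unlike a message on a busy mailing list, the report stays in `tracker.bugs` with its current state no matter how many other reports arrive.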
Volumes of information
In addition to these automated methods of information management,
an amazing amount of information was gathered and tracked by various
members of the open source community during the development of what
would become the 2.6 Linux kernel.
For instance, a status list was created at the Kernel Newbies site
to keep track of new kernel features that had been
proposed. The list contains items sorted by status, which kernel
they had been included in if they were complete, and how far along they
were if they were still incomplete. Many of the items on the list contain
links to a Web site for large projects, or to a copy of an e-mail message
explaining the feature in the case of smaller items.
Kernel version history
Many of us are familiar with the Linux kernel version numbering system by
now, but Andries Brouwer reminds us of how atypical it really is.
The first public release of Linux was version 0.02 in October 1991.
Two months later, in December 1991, Linus released version 0.11, the first
stand-alone kernel capable of operating without Minix.
After the release of 0.12 one month later, the version number jumped to
0.95 in March as a reflection of the system's growing
maturity. Nonetheless, the 1.0.0 milestone didn't come until two years
later, in March 1994.
The chronology of the two "streams" of kernel development dates from about
this time. Even-numbered kernels (such as 1.0, 2.2, 2.4, and now 2.6) are
stable, "production" models.
Meanwhile the odd-numbered kernel versions (1.1, 2.3) are cutting-edge or
"development" kernels. Until recently, work on a new development kernel
followed the release of a stable kernel only by a few months. However,
work on 2.5 started some ten months after 2.4 was finished.
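The even/odd convention described here is easy to express in code. The helper below is only a sketch with a made-up function name, and it applies solely to the numbering scheme as this article describes it (version 1.0 through 2.6); the pre-1.0 releases did not follow it.

```python
def kernel_series(version: str) -> str:
    """Classify a kernel version by its minor number.

    An even minor number means a stable, "production" series; an odd
    one means a "development" series.
    """
    _major, minor = (int(part) for part in version.split(".")[:2])
    return "stable" if minor % 2 == 0 else "development"


series = {v: kernel_series(v) for v in ("2.4.22", "2.5.75", "2.6.0")}
```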
So when can we expect kernel 2.7? It's hard to say, but there is already
a thread to discuss it at KernelTrap.
Until that happens, you may enjoy reading more about the History of Linux in this
article from Ragib Hasan.
The "post-halloween document," meanwhile, told users what
to expect from the upcoming 2.6 kernel (see Resources for a link). The post-halloween document
mostly discussed major changes that users would notice and system
utilities that would need to be updated in order to take advantage of
them. Linux distributors and even end users wanting an early
peek at what would be in the 2.6 kernels were the main audience for this
information, which allowed them to
determine if there were programs they should upgrade in order to
take advantage of new features.
The Kernel Janitors project kept (and in fact is still keeping) a list of
smaller bugs and cleanups that
needed to be fixed. Many of these bugs or cleanups are caused by a
larger patch going into the kernel that requires changes to many parts of
the code, such as something that would affect device drivers. Those who
are new to kernel development can work on items from this list, allowing
them a chance to benefit the community while learning how to write
kernel code on smaller projects.
In yet another pre-release project, John Cherry tracked the number of
errors and warnings found during the kernel compile for every version
of the kernel that was released. These counts dropped consistently
over time, and releasing the results in a systematic way made it
obvious how much progress was being made. In many cases, these
warnings and error messages could be used in the same way the Kernel
Janitors list is used, as compile warnings are often attributable to
minor bugs that require little effort to fix.
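A minimal sketch of how such compile statistics could be gathered is to count diagnostic lines in a saved build log. This is not John Cherry's actual tooling: it assumes gcc-style messages of the form "file:line: warning: ..." or "file:line: error: ...", and the log contents below are invented for the example.

```python
import re
from collections import Counter

# Matches the "warning:" or "error:" tag that follows a file:line prefix.
_DIAGNOSTIC = re.compile(r":\s*(warning|error):")


def compile_stats(log: str) -> Counter:
    """Count warning and error lines in a compiler log."""
    counts = Counter({"warning": 0, "error": 0})
    for line in log.splitlines():
        match = _DIAGNOSTIC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical build log excerpt.
sample_log = """\
drivers/example.c:12: warning: unused variable 'x'
drivers/example.c:40: error: 'y' undeclared
  CC      kernel/example.o
fs/example.c:7:3: warning: implicit declaration of function 'baz'
"""
stats = compile_stats(sample_log)
```

Recording these two numbers for every release is enough to plot the downward trend described above.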
Finally, there was Andrew Morton's "must-fix" list. Since he
had been chosen as the maintainer of the post-release 2.6 kernel,
he exercised his prerogative to
outline those problems he believed to be the highest priority
for resolution before the release of the final 2.6 kernel. The must-fix
list contained references to bugs in the kernel Bugzilla system, features
that needed to be finished, and other known issues that many felt should
block the release of 2.6 until resolved. This information helped to set
the roadmap for what steps needed to be taken before the new release was
made; it also provided valuable information to those who were curious
about how close the much-anticipated 2.6 release was to being made.
Some of these resources have obviously ceased to be maintained since the
release of the 2.6 kernel late last year. Others have found that their
work has not ended after that major release, and continue to post
updates. It will be interesting to see which are picked up again, and what
additional innovations are made, once we again approach a major release.
Conclusion
When most people think about a new stable version of the kernel, the first
question is usually, "What's new in this release?" Below the surface of
features and fixes though, there is a process that is being refined over
time.
Open source development is thriving in the Linux community.
The looseness of the confederacy of coders who work on the kernel and
other aspects of Linux allows the group to adapt successfully. In many
ways, the way that Linux is developed and tested -- and specifically, the
way this has evolved over time -- has had more impact on the reliability
of the new kernel than many of the individual enhancements and bug fixes
have had.
Resources
- If you haven't yet (or if the last time you did was quite a while
ago), (re)read "The Cathedral
and the Bazaar" by Eric S. Raymond. This essay and others by Mr.
Raymond have also been published in book form (O'Reilly &
Associates, 1999).
- Read the Wikipedia definition of Linus's Law, first
formulated in ESR's "The
Cathedral and the Bazaar."
- BitKeeper is a source control
management system. The Linux kernel BitKeeper is hosted at BitMover.
- The Kernel Tracker system
for posting bugs against the mainline Linux kernels is based on Bugzilla.
- During 2.5 development, the
"post-halloween" document kept users informed as to what they could
expect from the 2.6 kernel. The Kernel Janitors project began
keeping a list of smaller bugs and cleanups (Kernel Janitors is still
being updated).
- Andrew Morton has maintained the 2.6 kernel since its official release.
Here you can see an example of the must-fix
list he was keeping prior to the 2.6 release.
- To keep up-to-date on the latest kernel news now that 2.6 is out, you
can monitor or subscribe to the (danger: high-volume!) Linux
Kernel Mailing List. Or those people who actually have lives can
monitor (or subscribe to) Kernel Traffic,
which could be likened to a sort of expert annotated digest of the LKML.
- LinuxHQ is also an
excellent source of resources and information about the kernel and kernel
programming.
- Open Source Development Labs (OSDL) is a consortium dedicated to
accelerating the adoption of Linux. Upon the release of the 2.6 kernel,
they issued a press
release outlining their contributions to the new kernel and providing
more information about themselves. You can view their kernel testing
results (including John Cherry's Linux 2.6 Compile
Statistics) from the OSDL Linux
Stability page. The OSDL is funded by member companies including IBM,
Red Hat Linux, SUSE LINUX, and many others.
- OSDL, IBM, and many others also contribute to the Linux Test Project (or LTP).
- Regression testing makes sure that new code doesn't break old code.
Read about it -- and other forms of testing -- in Testing Craft's Grand Index of [testing]
Techniques.
- You can read many articles about how software
testing is done at IBM in the IBM Systems Journal.
- Internally, IBM's Linux
Technology Center works directly with the Linux development community.
- The Linux at IBM
site features Linux news and information from throughout IBM.
- Prior to the 2.6 release, IBM developerWorks featured a "Towards
Linux 2.6" (developerWorks, September 2003) look at some of the coming attractions of the new kernel,
including the new scheduler and the Native Posix Threading Library (NPTL).
- Also read the article "Putting
Linux reliability to the test" (developerWorks, December 2003).
- Find more resources for Linux developers in the developerWorks Linux zone.
- Browse for books on these and other technical topics.
About the author
Paul Larson works on the Linux Test team in the Linux Technology Center at IBM. Some of the projects he has been working on over the past year include the Linux Test Project, 2.5/2.6 kernel stabilization, and kernel code coverage analysis. He can be reached at pl@us.ibm.com.