In the three years of active development leading up to the recent release of the new 2.6 Linux kernel, some interesting changes took place in the way the Linux kernel is developed and tested. In many ways, the methods used to develop the Linux kernel are much the same today as they were three years ago. However, several key changes have improved overall stability as well as quality.
Historically, there was never a formal source code management or revision control system for the Linux kernel. It's true that many developers did their own revision control, but there was no official Linux CVS archive that Linus Torvalds checked code into and others could pull from. This lack of revision control often left gaping holes between releases, where nobody really knew which changes were in, whether they were merged properly, or what to expect in the upcoming release. Often, things were broken in ways that could have been avoided had more developers been able to see changes as they were made.
The lack of formal revision control and source code management led many to suggest the use of a product called BitKeeper. BitKeeper is a source control management system that many kernel developers had already been using successfully for their own kernel development work. Shortly after the first 2.5 kernels were released, Linus Torvalds began using BitKeeper on a trial basis to see if it would fit his needs. Today, BitKeeper is used to manage the Linux kernel source code for both the main 2.4 and 2.5 kernels. To most users, who may have little or no concern for kernel development, this may seem insignificant. However, there are several ways that users can benefit from the changes that the use of BitKeeper has brought about in the methods used to develop the Linux kernel.
One of the key benefits that BitKeeper has provided is in merging patches. When multiple patches are applied to the same base of code, and some of those patches affect the same parts, merging problems are to be expected. A good source code management system can do some of the more tedious parts of this automatically, which makes merging patches faster and allows greater throughput for patches going into the kernel. As the community of Linux kernel developers expands, revision control is important for helping keep track of all the changes. Since a single person is responsible for integrating these changes into the main Linux kernel, tools such as BitKeeper are essential to ensure that patches aren't forgotten and are easily merged and managed.
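The automatic merging described above rests on the idea of a three-way merge: comparing each contributor's changes against a common base version. As a rough illustration only (this is not BitKeeper's actual algorithm, and it is simplified to line-aligned files of equal length), a sketch in Python might look like this:

```python
# Hypothetical sketch of the three-way merge idea behind SCM tools:
# given a common base and two modified copies, changes that touch
# different lines combine automatically; overlapping changes are
# flagged as conflicts for a human to resolve.

def three_way_merge(base, ours, theirs):
    """Merge two lists of lines against a common base.

    Returns (merged_lines, conflict_line_indices)."""
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:            # both sides agree (or neither changed)
            merged.append(o)
        elif b == o:          # only "theirs" changed this line
            merged.append(t)
        elif b == t:          # only "ours" changed this line
            merged.append(o)
        else:                 # both changed the same line: conflict
            merged.append(b)
            conflicts.append(i)
    return merged, conflicts

base   = ["int x;", "int y;", "return x;"]
ours   = ["int x;", "long y;", "return x;"]     # patch A touches line 2
theirs = ["int x;", "int y;", "return x + y;"]  # patch B touches line 3
print(three_way_merge(base, ours, theirs))
# -> (['int x;', 'long y;', 'return x + y;'], [])
```

Real tools diff the files first so that insertions and deletions line up, but the core rule is the same: non-overlapping changes merge automatically, and only overlapping ones need the maintainer's attention.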
Having a live, central repository for the latest changes to the Linux kernel is invaluable. Every change or patch that is accepted into the kernel is tracked as a changeset. End users and developers can keep their own copy of the source repository and update it at will with the latest changesets using a simple command. For developers, this means the ability to always be working with the latest copy of the code. Testers can use these logical changesets to determine which change caused a problem, shortening the time needed for debugging. Even end users who want to use the latest kernels can benefit from a live, central repository directly, since they now have the ability to update as soon as a feature or bugfix they need goes into the kernel. Any user can also provide immediate feedback and bug reports on code as it is being merged into the kernel.
As the Linux kernel has grown, become more complex, and gained the attention of more developers who tend to specialize in particular aspects of the kernel, another interesting change has come about in the methods used to develop Linux. During the development of the 2.3 kernel version, there were a few other kernel trees besides the main one released by Linus Torvalds.
During the course of 2.5 development, there was an explosion of kernel trees. Some of this parallel development was made possible by source code management tools, which can keep parallel lines of development synchronized. Some of it was necessary so that others could test large changes before they were accepted. There were kernel maintainers who kept their own trees focused on specific components and goals, such as memory management, NUMA features, scalability improvements, and architecture-specific code; some trees even collected and tracked large numbers of small bug fixes.
Figure 1. The Linux 2.5 development tree
The advantage to this parallel development model is that it allows developers of large changes, or large amounts of similar changes towards a particular goal, the freedom to develop in a controlled environment without affecting the stability of the kernel for everyone else. When developers are ready, they can release patches against the current version of the Linux kernel that implement all of the changes they have made so far. Testers in the community can then easily test those changes and provide feedback. As pieces are proven to be stable, those pieces can be merged into the main Linux kernel individually, or even all at once.
Historically, the approach to testing the Linux kernel has centered around the open source development model. Since the code is open to review by other developers as soon as it is released, there was never a formal verification cycle performed as is common in other forms of software development. The philosophy behind this approach, called "Linus's Law" in "The Cathedral and the Bazaar" (please see Resources for a reference to that work) is "Given enough eyeballs, all bugs are shallow." In other words, heavy peer review should catch most of the really large problems.
In reality, though, the kernel has many complex interactions. Even with abundant peer review, many serious bugs can slip through. Additionally, end users can, and often do, download and use the latest kernels as they are released. At the time 2.4.0 was released, many in the community were calling for a more organized testing effort to complement the strengths of ad-hoc testing and code review. Organized testing includes the use of test plans, repeatability in the testing process, and the like. Combining all three methods leads to better code quality than ad-hoc testing and code review alone.
One of the first contributors to bringing organized testing to Linux was the Linux Test Project (LTP). This project is aimed at improving the quality of Linux through more organized testing methods. Part of this test project includes the development of automated test suites. The main test suite developed by the LTP is also called the Linux Test Project. At the time the 2.4.0 kernel was released, the LTP test suite only had around 100 tests. As Linux was growing and maturing through 2.4 and 2.5 kernels, the LTP test suite was growing and maturing as well. Today, the Linux Test Project contains well over 2000 tests, and the number of tests is still growing!
New tools are now being used that instrument the kernel in such a way that code coverage analysis can be performed. Coverage analysis tells us which lines of code in the kernel are executed while a given test is running. More importantly, coverage analysis exposes which areas of the kernel are not being tested at all. This data is important because it shows which new tests should be written to test those areas of the kernel, leading to a kernel that is more thoroughly tested.
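As a toy illustration of what such analysis yields (the data format below is an assumption for the sake of the example, not the output of any particular coverage tool), suppose each test run reports the set of source-line numbers it executed:

```python
# Hypothetical sketch: given per-test sets of executed line numbers
# (the kind of data a coverage tool reports), compute which lines of
# a source file were never exercised by any test -- the candidates
# for new test development.

def untested_lines(total_lines, coverage_runs):
    """Return sorted line numbers never executed by any run.

    total_lines   -- number of lines in the source file
    coverage_runs -- iterable of sets of executed line numbers
    """
    executed = set()
    for run in coverage_runs:
        executed |= run
    return sorted(set(range(1, total_lines + 1)) - executed)

# Example: a 10-line file exercised by two test runs
runs = [{1, 2, 3, 7}, {1, 4, 5}]
print(untested_lines(10, runs))  # -> [6, 8, 9, 10]
```

Lines 6, 8, 9, and 10 were never reached, so those are exactly the areas where new tests would improve how thoroughly the code is tested.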
During the 2.5 development cycle, another project undertaken by the Linux Test Project involved using the LTP test suite to perform nightly regression testing of the Linux kernel. The use of BitKeeper created a live, central repository for pulling snapshots of the Linux kernel at any time. Before the use of BitKeeper and snapshots came about, testers had to wait for releases before testing could begin. Now, testers can test the changes as they are being made.
Another advantage of using automation tools to perform nightly regression tests is that fewer changes have been introduced since the last test run. When a new regression bug is found, it is often easy to pinpoint which change is likely to have caused it.
Also, since the change is very recent, it is still fresh in the minds of the developers -- hopefully making it easier for them to remember and fix the relevant code. Perhaps there should be a corollary to Linus's Law stating that some bugs are shallower than others, because those are exactly the ones that nightly kernel regression testing weeds out. The ability to do this daily, during the development cycle and before actual releases are made, enables testers who only look at full releases to spend their eyeball time on the more serious and time-consuming bugs.
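The narrowing-down that nightly testing makes possible is essentially a binary search over the day's ordered changesets. A hedged sketch of that idea (the `is_good` predicate is a stand-in for "build the kernel at this changeset and run the regression test"; it assumes the tree goes from good to bad exactly once):

```python
# Hypothetical sketch of isolating a regression to one changeset by
# binary search, the same idea later popularized by "git bisect".
# Assumes results are monotonic: good, good, ..., bad, bad.

def first_bad(is_good, n):
    """Search n ordered changesets (indices 0..n-1); return the index
    of the first changeset whose build fails the regression test, or
    None if every changeset passes."""
    lo, hi = 0, n - 1
    if is_good(hi):
        return None           # the newest changeset passes: no regression
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(mid):
            lo = mid + 1      # regression introduced after mid
        else:
            hi = mid          # mid is bad; first bad is at or before mid
    return lo

# Example: 10 changesets, regression introduced at changeset 6
print(first_bad(lambda i: i < 6, 10))  # -> 6
```

With a day's worth of changesets this takes only a handful of test runs, which is why tracking logical changesets shortens debugging so dramatically compared to testing whole releases.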
Another group, the Open Source Development Labs (OSDL), has also made significant contributions to Linux testing. Some time after the 2.4 kernel was released, the OSDL created a system called the Scalable Test Platform (STP). The STP is an automated test platform that allows developers and testers to run the tests made available through the system on hardware at OSDL. Developers can even test their own patches against kernels using this system. The STP simplifies the testing process, since it takes care of building the kernel, setting up the test, running the test, and gathering results, which are then archived for future comparisons. Another benefit of this system is that it gives access to large machines, such as SMP systems with 8 processors, that many people would not otherwise have; through STP, anyone can run tests on systems of that size.
One of the biggest improvements in organized testing of the Linux kernel that has happened since the release of 2.4 is bug tracking. Historically, bugs found in the Linux kernel were reported to the Linux kernel mailing list, to more component- or architecture-specific mailing lists, or directly to the individual that maintains the section of code where the bug was found. Deficiencies in this system were quickly revealed as the number of people developing and testing Linux increased. In the past, bugs were often missed, forgotten, or ignored unless the person reporting the bug was incredibly persistent.
Now, a bug tracking system has been installed at OSDL (see Resources for a link) for reporting and tracking bugs against the Linux kernel. The system is configured so that the maintainer of a component is notified when a bug against that component has been reported. The maintainer can then either accept and fix the bug, reassign the bug if it turns out to actually be a bug in another part of the kernel, or reject it if it turns out to be something such as a misconfigured system. Bugs reported to a mailing list run the risk of being lost as more and more e-mail pours onto the list. In a bug tracking system, however, there is always a record of every bug and the state it is in.
In addition to these automated methods of information management, an amazing amount of information was gathered and tracked by various members of the open source community during the development of what would become the 2.6 Linux kernel.
For instance, a status list was created at the Kernel Newbies site to keep track of new kernel features that had been proposed. The list contains items sorted by status, which kernel they had been included in if they were complete, and how far along they were if they were still incomplete. Many of the items on the list contain links to a Web site for large projects, or to a copy of an e-mail message explaining the feature in the case of smaller items.
The "post-halloween document," meanwhile, told users what to expect from the upcoming 2.6 kernel (see Resources for a link). The post-halloween document mostly discussed major changes that users would notice and system utilities that would need to be updated in order to take advantage of them. Linux distributors and even end users wanting an early peek at what would be in the 2.6 kernels were the main audience for this information, which allowed them to determine if there were programs they should upgrade in order to take advantage of new features.
The Kernel Janitors project kept (and in fact is still keeping) a list of smaller bugs and cleanups that needed to be fixed. Many of these bugs or cleanups are caused by a larger patch going into the kernel that requires changes to many parts of the code, such as something that would affect device drivers. Those who are new to kernel development can work on items from this list, allowing them a chance to benefit the community while learning how to write kernel code on smaller projects.
In yet another pre-release project, John Cherry tracked the number of errors and warnings found during the kernel compile for every version of the kernel that was released. These compile statistics consistently dropped over time, and releasing these results in a systematic way made it obvious how much progress was being made. In many cases, some of these warnings and error messages could be used in the same way the Kernel Janitors list is used, as compile warnings are often attributable to minor bugs that require little effort to fix.
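A script for extracting such compile statistics could be quite small. The sketch below assumes gcc-style `file:line: warning:` diagnostics and is purely illustrative; it is not the tool actually used to produce those reports:

```python
# Hypothetical sketch: tally gcc-style warnings and errors from a
# kernel build log, the kind of per-release statistic described above.
# The log-line formats here are assumptions, not the real tool's input.

import re

# Matches "file.c:42: warning:" and "file.c:7:15: error:" style prefixes
DIAG = re.compile(r":\d+:(?:\d+:)? (warning|error):")

def tally(log_lines):
    """Count warning and error diagnostics in an iterable of log lines."""
    counts = {"warning": 0, "error": 0}
    for line in log_lines:
        m = DIAG.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

log = [
    "drivers/net/foo.c:42: warning: unused variable 'x'",
    "fs/bar.c:7:15: error: 'baz' undeclared",
    "  CC      kernel/sched.o",
]
print(tally(log))  # -> {'warning': 1, 'error': 1}
```

Run against the build log of each release, counts like these give exactly the kind of trend line the article describes: a per-release number that should fall steadily toward zero.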
Finally, there was Andrew Morton's "must-fix" list. Since he had been chosen as the maintainer of the post-release 2.6 kernel, he exercised his prerogative to outline the problems he believed to be the highest priority to resolve before the release of the final 2.6 kernel. The must-fix list contained references to bugs in the kernel Bugzilla system, features that needed to be finished, and other known issues that many felt should block the 2.6 release until resolved. This information helped set the roadmap for the steps that needed to be taken before the new release; it also provided valuable information to those who were curious about how close the much-anticipated 2.6 release was to being made.
Some of these resources have obviously ceased to be maintained since the release of the 2.6 kernel late last year. Others have found that their work has not ended after that major release, and continue to post updates. It will be interesting to see which are picked up again, and what additional innovations are made, once we again approach a major release.
When most people think about a new stable version of the kernel, the first question is usually, "What's new in this release?" Below the surface of features and fixes though, there is a process that is being refined over time.
Open source development is thriving in the Linux community. The looseness of the confederacy of coders who work on the kernel and other aspects of Linux allows the group to adapt successfully. In many ways, the way that Linux is developed and tested -- and specifically, the way this has evolved over time -- has had more impact on the reliability of the new kernel than many of the individual enhancements and bug fixes have had.
- If you haven't yet (or if the last time you did was quite a while ago), (re)read "The Cathedral and the Bazaar" by Eric S. Raymond. This essay and others by Mr. Raymond have also been published in book form (O'Reilly & Associates).
- Read the Wikipedia definition of Linus's Law, first formulated in ESR's "The Cathedral and the Bazaar."
- BitKeeper is a source control management system. The Linux kernel BitKeeper repository is hosted at BitMover.
- The Kernel Tracker system for posting bugs against the mainline Linux kernels is based on Bugzilla.
- During 2.5 development, the "post-halloween" document kept users informed as to what they could expect from the 2.6 kernel.
- The Kernel Janitors project began keeping a list of smaller bugs and cleanups (Kernel Janitors is still active).
- Andrew Morton has maintained the 2.6 kernel since its official release. Here you can see an example of the must-fix list he was keeping prior to the 2.6 release.
- To keep up-to-date on the latest kernel news now that 2.6 is out, you can monitor or subscribe to the (danger: high-volume!) Linux Kernel Mailing List. Or those people who actually have lives can monitor (or subscribe to) Kernel Traffic, which could be likened to a sort of expert annotated digest of the LKML.
- LinuxHQ is also an excellent source of resources and information about the kernel and kernel development.
- Open Source Development Labs (OSDL) is a consortium dedicated to accelerating the adoption of Linux. Upon the release of the 2.6 kernel, they issued a press release outlining their contributions to the new kernel and providing more information about themselves. You can view their kernel testing results (including John Cherry's Linux 2.6 Compile Statistics) from the OSDL Linux Stability page. The OSDL is funded by member companies including IBM, Red Hat Linux, SUSE LINUX, and many others.
- OSDL, IBM, and many others also contribute to the Linux Test Project (or LTP).
- Regression testing makes sure that new code doesn't break old code. Read about it -- and other forms of testing -- in Testing Craft's Grand Index of [testing].
- You can read many articles about how software testing is done at IBM in the IBM Systems Journal.
- Internally, IBM's Linux Technology Center works directly with the Linux development community.
- The Linux at IBM site features Linux news and information from throughout IBM.
- Prior to the 2.6 release, IBM developerWorks featured a "Towards Linux 2.6" (developerWorks, September 2003) look at some of the coming attractions of the new kernel, including the new scheduler and the Native POSIX Threading Library (NPTL).
- Also read the article "Putting Linux reliability to the test" (developerWorks, December 2003).
Find more resources for Linux developers in the developerWorks Linux zone.
- Browse for books on these and other technical topics.
Paul Larson works on the Linux Test team in the Linux Technology Center at IBM. Some of the projects he has been working on over the past year include the Linux Test Project, 2.5/2.6 kernel stabilization, and kernel code coverage analysis. He can be reached at firstname.lastname@example.org.