While browsing past articles, I suddenly realized that the "Advanced filesystem implementor's guide" series has been around for nearly a year! Don't worry, this series will be wrapping up soon as I cover IBM's JFS and EVMS (enterprise volume management) technologies for Linux. But since this is an IBM site, I thought it would be best to cover technologies developed by IBM only after I had first covered all other new filesystem technologies for Linux.
Before we get to JFS and EVMS, let me share an official update about the current state of affairs in the Linux filesystem world. We've been through a lot of 2.4 kernels; some of them have been decent, and others have been not-so-decent. And along with the kernel, XFS, ext3, and ReiserFS have been under very active development. During this time, lots of Gentoo Linux users have used various combinations of XFS, ext3, and ReiserFS filesystems with varying results. And in general, when a Gentoo Linux user has a problem with one of the new journaled filesystems, I usually hear about it. So, what filesystems have been most popular? Which have been the most reliable? In this article, I'll share my experiences along with feedback and status updates from the ReiserFS, ext3, and XFS development teams.
Over the past few months, XFS has turned out to be a popular Linux filesystem choice. Based on feedback from Gentoo Linux users, people tend to like XFS because of its generally good overall performance and its robust feature set. However, the 1.0.x release of XFS has suffered from one serious problem. You'll recall that "metadata only" journaling filesystems like XFS and ReiserFS can cause data corruption if a file's metadata is updated but some unforeseen circumstance -- such as a crash -- prevents the new data from hitting the disk. In the case of ReiserFS, an affected file will contain stale or garbage data blocks, and in the case of XFS, the file will contain blocks consisting entirely of binary zeros. It turns out that XFS 1.0.x had the unfortunate tendency of frequently mangling recently modified files if your server happened to crash or unexpectedly lose power. Those who happened to be using XFS on a rugged server were generally fine, but those who were running XFS on a system that was suffering from some kind of software or hardware stability problem faced the risk of losing a good deal of data.
Fortunately, the SGI XFS guys dramatically reduced the incidence of this problem in XFS 1.1. The problem manifested itself much more often with XFS 1.0 because certain kinds of metadata updates were required to be recorded to the filesystem in the order that they occurred. These in-order metadata updates, called "synchronous" metadata updates, also had the effect of flushing all previous pending metadata updates to disk. Here's where the problem arose. If some of these early flushes of metadata also had corresponding data blocks that needed to get flushed, then it was possible that the new data blocks wouldn't hit the disk for up to 30 seconds after the metadata was recorded. This created a relatively large window for data loss to occur.
If the system rebooted or died during this window (after the metadata was flushed but before the corresponding data was written to disk), then both the old and new data could be lost. Here's why this could happen: the metadata update would erase any reference to the original data block(s), but would point to data block(s) on disk that were never filled with data. When the server started up again after the crash, the XFS code would look at the journal, realize the situation, and fill those incomplete data blocks with binary zeros as a security precaution. Unfortunately, the data would be lost for good.
This problem could be particularly troublesome in situations where files were regularly overwritten with completely new data. In these situations, the early flushing of metadata could cause the entire contents of the file to be lost if the system happened to die at the wrong time. This particular scenario bit the gentoo.org server a couple of times, resulting in data loss. Since our mailman mailing list software would overwrite its own configuration file with new data every few minutes, it was a prime candidate to fall prey to the scenario I describe above.
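To make the overwrite scenario concrete, here is a minimal Python sketch of one common application-level way to narrow that window: write the new contents to a temporary file, force the data to disk, and only then rename it over the old file. This is an illustration under my own assumptions, not something mailman did and not a fix from SGI; the function name atomic_overwrite is hypothetical, and how much protection the pattern buys depends on the filesystem and the kernel.

```python
import os
import tempfile


def atomic_overwrite(path, data):
    """Replace the contents of `path` with `data` (bytes) via a temporary file.

    Hypothetical helper for illustration; not taken from mailman or XFS.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # stays within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the kernel to write the new data blocks out now, instead of
            # leaving them in memory while metadata changes go ahead.
            os.fsync(tmp.fileno())
        # Swap the new file into place. Until this call, the old file and
        # its data blocks are still what `path` refers to.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The point of the design is ordering: the data is pushed to disk before any metadata change makes it the live copy, which is the opposite of the metadata-first sequence described above.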
The moral of the story is this: the SGI guys have dramatically improved the situation in XFS 1.1, and if you're running XFS 1.0, then you should definitely plan to upgrade to XFS 1.1 in the near future. XFS 1.1 also includes many additional fixes. Oh, and when SGI reduced XFS's dependence on synchronous metadata updates, it also had the effect of speeding up one of XFS 1.0.x's weak spots -- file deletion. Yay!
In the near future, we can also expect to see a new release of XFS that is better suited for Intel's Itanium platform. Right now, XFS for Linux requires that the XFS filesystem block size is the same size as the platform's memory page size. This often makes it impossible to move disks from x86 systems to Itanium systems, since the Itanium can use a page size up to 64 K, while the x86 is stuck at 4 K. In addition, a filesystem block size of 64 K is a suboptimal choice for most tasks, and the current code would force some Itanium systems to use this filesystem block size. When this block size issue is fixed, it will not only make it easy to migrate XFS filesystems from x86 to ia64, but it will also provide the added benefit of allowing system administrators to choose an XFS filesystem block size that corresponds to their needs.
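As a rough illustration of the block size constraint, the following Python sketch compares the memory page size of the running system with the block size reported for a mounted filesystem. The function names are hypothetical, and f_bsize is only what the kernel reports through statvfs, which I'm assuming matches the on-disk block size for the filesystem in question.

```python
import os


def block_size_is_usable(fs_block_size, page_size):
    """Model the constraint described above: the XFS block size must
    equal the platform's memory page size. Hypothetical helper."""
    return fs_block_size == page_size


def check_mount(mount_point="/"):
    """Return (page size, reported block size, whether they match)."""
    # Memory page size of the running system, in bytes.
    page_size = os.sysconf("SC_PAGE_SIZE")
    # Block size reported for the filesystem at mount_point, in bytes.
    block_size = os.statvfs(mount_point).f_bsize
    return page_size, block_size, block_size_is_usable(block_size, page_size)
```

For example, block_size_is_usable(4096, 65536) is False, which corresponds to the case of a disk formatted on x86 being moved to an Itanium system that uses 64 K pages.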
The ReiserFS filesystem is arguably the most ambitious journaled filesystem development project because it's not just a port of an existing filesystem to the Linux kernel (as XFS and JFS are), nor is its design based upon that of an earlier filesystem, as ext3's is. In contrast, ReiserFS has been designed completely from scratch and boasts some very impressive performance numbers when it comes to the handling of small files. So, how has ReiserFS fared in terms of stability and general filesystem robustness since its introduction to the 2.4 kernel?
Since its introduction, ReiserFS has had an unusually high number of stability and corruption problems. There are a number of kernels that have been total nightmares for ReiserFS users, including 2.4.3, 2.4.9, and even the relatively recent 2.4.16. However, while some of these issues have been caused by bugs in the ReiserFS filesystem code itself, a surprising number of them have been unwanted side-effects caused by changes made to other parts of the kernel. One unfortunate thing about the Linux kernel development process is that no matter how carefully you test your own code, it's possible for some other kernel developer to sneak in a relatively untested change that causes your code to break. All too often, communication between developers only happens after these unwanted side-effects have been introduced and released to the unsuspecting Linux computing public. I think it's fair to say that there are a good number of disheartened ReiserFS users out there who have found themselves in this unfortunate no-win situation.
But there is good news, my friends. In the last few months, things have started looking a lot better for ReiserFS. For one, the kernel sources have started to stabilize around the 2.4.17 release. In addition, the guys at Namesys (the developers of ReiserFS) have been able to fix quite a few obscure bugs in their code over the past few months. And the news gets even better -- it appears that kernel 2.4.18 has a very solid ReiserFS implementation. And 2.4.18 isn't exactly a spring chicken -- at the time this article was written, it was nearly 3 months old and there still haven't been any major problems found in the code. In fact, due to a lack of incoming bug reports, Namesys has reassigned the Release Manager to a new job of improving ReiserFS performance.
So, it appears that ReiserFS and the 2.4 kernel have finally resolved their differences. For me personally, this is heartening news; I'm very eager to start using ReiserFS again and I plan to use it as my root filesystem when I next reload my development workstation. I'm sure there are many other ex-ReiserFS users who will be moving back to ReiserFS now that things have calmed down in kernel-land. Frankly, it's quite hard to live without ReiserFS once you've seen how its small file performance can boost the performance of certain applications.
So, what can we expect to see from ReiserFS in the near future? According to Hans Reiser and his team of developers, there are some very nice improvements that are scheduled to appear in the 2.4.20_pre1 kernel, including Chris Mason's data journaling (like ext3's "data=journal" mode!) support, new block allocation code that scales much better, and several improvements in large file performance, resulting in an improvement of up to 15% when reading large files from IDE drives. Beyond these immediate and significant improvements, we are likely to soon see ReiserFS support the equivalent of ext3's "data=ordered" mode. At that point, ReiserFS will offer equivalent data integrity features to those found in the ext3 filesystem. I'm very happy to see that the ReiserFS development team is making data integrity (not just metadata integrity) such a high priority.
So, what about ext3? In general, ext3 has been quite stable and hasn't suffered from any major issues. For this reason, ext3 has gained a reputation as being a very reliable and robust journaled filesystem choice. While some may consider the filesystem to be "boring" because it doesn't sport any major improvements over ext2 besides a very good journaling implementation, "boring" is a good thing in the world of filesystems. It means that the filesystem is very good at simply doing its job without fuss or incident. In addition, despite ext3's scalability limitations when compared to ReiserFS, XFS, and JFS, ext3 has shown itself to be very fast and well-tuned for the typical kinds of filesystem operations performed by most servers and workstations. It's clear that the ext3 developers have met their goal of creating a high-quality journaling filesystem that Linux users can upgrade to easily and confidently.
With kernel 2.4.19_pre5, synchronous mounts of ext3 filesystems and "chattr +S"'d files now perform about ten times faster than they did previously. In the near future, expect to see the addition of an option for synchronous updates of specific directory trees, which is a feature that will be of use primarily to mailer programs. Besides that, we can expect to see regular small bug fixes and performance improvements to the code, but nothing major; ext3 is already quite refined, and the code now appears to be in maintenance mode.
Thanks for joining me in this article, and please join me next time as we take a look at JFS!
- Read Daniel's previous articles, where he described:
- You can learn more about XFS at SGI's XFS page. Read the FAQ. Join the mailing list.
- The Namesys Web page is the place to learn more about ReiserFS.
- The ReiserFS mailing list is an excellent source for current, more in-depth ReiserFS information. Be sure to check out the ReiserFS mailing list archive, too.
- Find out more about using ext3 with 2.4 kernels at Andrew Morton's ext3 for 2.4 page.
- Find more Linux articles in the developerWorks Linux zone.
Residing in Albuquerque, New Mexico, Daniel Robbins is the Chief Architect of Gentoo Linux, an advanced ports-based Linux for x86, PowerPC, Sparc, and Sparc64 systems. He has also served as a contributing author for several books, including Samba Unleashed and SuSE Linux Unleashed. Daniel enjoys spending time with his wife, Mary, and his daughter, Hadassah. You can contact Daniel at drobbins@gentoo.org.





