Common threads: Advanced filesystem implementor's guide, Part 11

Filesystem update

Daniel Robbins (drobbins@gentoo.org), President/CEO, Gentoo Technologies, Inc.
Residing in Albuquerque, New Mexico, Daniel Robbins is the Chief Architect of Gentoo Linux, an advanced ports-based Linux for x86, PowerPC, Sparc, and Sparc64 systems. He has also served as a contributing author for several books, including Samba Unleashed and SuSE Linux Unleashed. Daniel enjoys spending time with his wife, Mary, and his daughter, Hadassah. You can contact Daniel at drobbins@gentoo.org.

Summary:  In the "Advanced filesystem implementor's guide" series, Daniel Robbins shows you how to use the latest filesystem technologies in Linux 2.4. Along the way, he shares valuable practical implementation advice, performance information, and important technical notes so that your new filesystem experience is as pleasant as possible. In this article, Daniel gives an update on the status of the XFS, ReiserFS, and ext3 filesystems, sharing his experiences as Chief Architect of Gentoo Linux. In addition, he outlines how these various filesystems will continue to improve over the next six months to a year.

Date:  01 Jun 2002
Level:  Introductory

While browsing past articles, I suddenly realized that the "Advanced filesystem implementor's guide" series has been around for nearly a year! Don't worry, this series will be wrapping up soon as I cover IBM's JFS and EVMS (enterprise volume management) technologies for Linux. But since this is an IBM site, I thought it would be best to cover technologies developed by IBM only after I had first covered all other new filesystem technologies for Linux.

Before we get to JFS and EVMS, let me share an official update about the current state of affairs in the Linux filesystem world. We've been through a lot of 2.4 kernels; some of them have been decent, and others have been not-so-decent. And along with the kernel, XFS, ext3, and ReiserFS have been under very active development. During this time, lots of Gentoo Linux users have used various combinations of XFS, ext3, and ReiserFS filesystems with varying results. And in general, when a Gentoo Linux user has a problem with one of the new journaled filesystems, I usually hear about it. So, what filesystems have been most popular? Which have been the most reliable? In this article, I'll share my experiences along with feedback and status updates from the ReiserFS, ext3, and XFS development teams.

What's up with XFS?

Over the past few months, XFS has turned out to be a popular Linux filesystem choice. Based on feedback from Gentoo Linux users, people tend to like XFS because of its generally good overall performance and its robust feature set. However, the 1.0.x release of XFS has suffered from one serious problem. You'll recall that "metadata only" journaling filesystems like XFS and ReiserFS can cause data corruption if a file's metadata is updated but some unforeseen circumstance -- such as a crash -- prevents the new data from hitting the disk. In the case of ReiserFS, an affected file will contain stale or garbage data blocks, and in the case of XFS, the file will contain blocks consisting entirely of binary zeros. It turns out that XFS 1.0.x had the unfortunate tendency of frequently mangling recently modified files if your server happened to crash or unexpectedly lose power. Those who happened to be using XFS on a rugged server were generally fine, but those who were running XFS on a system that was suffering from some kind of software or hardware stability problem faced the risk of losing a good deal of data.

Fortunately, the SGI XFS guys dramatically reduced the incidence of this problem in XFS 1.1. The problem manifested itself much more often with XFS 1.0 because certain kinds of metadata updates were required to be recorded to the filesystem in the order that they occurred. These in-order metadata updates, called "synchronous" metadata updates, also had the effect of flushing all previous pending metadata updates to disk. Here's where the problem arose. If some of these early flushes of metadata also had corresponding data blocks that needed to get flushed, then it was possible that the new data blocks wouldn't hit the disk for up to 30 seconds after the metadata was recorded. This created a relatively large window for data loss to occur.

Technical note

With XFS 1.1, a filesystem's metadata is only updated synchronously (in-order) in two cases:

  • If the filesystem needs to allocate new space and there's a pending transaction to free that same space.
  • When XFS processes transactions for files opened with the O_SYNC (synchronous) option; in this case, writes to this file will cause any of the filesystem's other pending metadata changes to be flushed to disk.

Fortunately, the vast majority of a typical server's I/O operations are asynchronous in nature.
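To make the second case concrete, here is a minimal sketch of how an application asks for synchronous writes from Python. The function name, file name, and payload are made up for illustration; os.O_SYNC is the standard POSIX flag as Python's os module exposes it on Unix-like systems.

```python
import os


def write_synchronously(path: str, payload: bytes) -> int:
    """Write payload to path with O_SYNC and return the bytes written."""
    # O_SYNC asks the kernel not to return from write() until the data
    # has been handed to the storage device, which is what makes the
    # filesystem treat these updates synchronously.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        written = 0
        # os.write() may write fewer bytes than requested, so loop.
        while written < len(payload):
            written += os.write(fd, payload[written:])
        return written
    finally:
        os.close(fd)
```

Every write through such a descriptor pays the cost of waiting for the disk, which is why most applications leave the flag off.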

If the system rebooted or died during this window (after the metadata was flushed but before the corresponding data was written to disk), then both the old and new data could be lost. Here's why this could happen: the metadata update would erase any reference to the original data block(s), but would point to data block(s) on disk that were never filled with data. When the server started up again after the crash, the XFS code would look at the journal, realize the situation, and fill those incomplete data blocks with binary zeros as a security precaution. Unfortunately, the data would be lost for good.
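The sequence of events is easier to follow in a toy model. The sketch below is not XFS code and makes no attempt to mirror its on-disk format; the ToyDisk class, its method names, and the 8-byte block size are all invented for illustration. It only models the ordering problem: metadata reaches the disk first, the data is still cached, and a crash lands in between.

```python
BLOCK_SIZE = 8  # tiny blocks keep the example readable (hypothetical value)


class ToyDisk:
    """Toy model of one file whose metadata is a single block pointer."""

    def __init__(self):
        self.blocks = {}        # block number -> data that reached the disk
        self.file_block = None  # on-disk metadata: which block the file uses
        self.cached_data = {}   # data written by the app but still in memory
        self.next_free = 0

    def overwrite(self, data: bytes) -> None:
        """Overwrite the file; the metadata is flushed before the data."""
        new_block = self.next_free
        self.next_free += 1
        self.cached_data[new_block] = data
        # Early metadata flush: the old block is no longer referenced and
        # the file now points at a block that has no data on disk yet.
        if self.file_block is not None:
            self.blocks.pop(self.file_block, None)
        self.file_block = new_block

    def flush_data(self) -> None:
        """The delayed data write that closes the window."""
        self.blocks.update(self.cached_data)
        self.cached_data.clear()

    def crash_and_recover(self) -> None:
        """Lose everything in memory, then zero-fill unwritten blocks."""
        self.cached_data.clear()
        if self.file_block is not None and self.file_block not in self.blocks:
            self.blocks[self.file_block] = bytes(BLOCK_SIZE)

    def read(self) -> bytes:
        return self.blocks[self.file_block]


disk = ToyDisk()
disk.overwrite(b"old conf")
disk.flush_data()            # the old contents are safely on disk
disk.overwrite(b"new conf")  # metadata flushed, data still cached
disk.crash_and_recover()     # crash inside the window
print(disk.read())           # zeros: neither old nor new contents survive
```

If flush_data() runs before the crash instead, read() returns the new contents, which is the common case when the window is short.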

This problem could be particularly troublesome in situations where files were regularly overwritten with completely new data. In these situations, the early flushing of metadata could cause the entire contents of the file to be lost if the system happened to die at the wrong time. This particular scenario bit the gentoo.org server a couple of times, resulting in data loss. Since our mailman mailing list software would overwrite its own configuration file with new data every few minutes, it was a prime candidate to fall prey to the scenario I describe above.
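For programs that rewrite a file in full, as in the mailman case above, one common application-level defense is to write the new contents to a temporary file, force them to disk, and only then rename the temporary file over the original. This is a general technique rather than anything specific to XFS or to mailman, and how completely it closes the window depends on the filesystem. The function name below is my own; the calls are standard Python library functions.

```python
import os
import tempfile


def replace_file_safely(path: str, payload: bytes) -> None:
    """Replace path with payload so a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the rename
    # below stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            # Push the new data to disk before any name points at it.
            os.fsync(tmp.fileno())
        # Swap the new file into place in a single rename.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the rename completed; remove the leftover.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Flush the directory entry too, so the rename itself is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note that mkstemp() creates the file with owner-only permissions, so a real program would probably need to restore the original file's mode and ownership before the rename.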

The moral of the story is this: the SGI guys have dramatically improved the situation in XFS 1.1, and if you're running XFS 1.0, then you should definitely plan to upgrade to XFS 1.1 in the near future. XFS 1.1 also includes many additional fixes. Oh, and when SGI reduced XFS's dependence on synchronous metadata updates, it also had the effect of speeding up one of XFS 1.0.x's weak spots -- file deletion. Yay!

In the near future, we can also expect to see a new release of XFS that is better suited for Intel's Itanium platform. Right now, XFS for Linux requires that the XFS filesystem block size be the same as the platform's memory page size. This often makes it impossible to move disks from x86 systems to Itanium systems, since the Itanium can use a page size up to 64 K, while the x86 is stuck at 4 K. In addition, a filesystem block size of 64 K is a suboptimal choice for most tasks, and the current code would force some Itanium systems to use this filesystem block size. When this block size issue is fixed, it will not only make it easy to migrate XFS filesystems from x86 to ia64, but it will also provide the added benefit of allowing system administrators to choose an XFS filesystem block size that corresponds to their needs.
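If you want to see where a particular machine stands, the sketch below reads the host's page size and a mounted filesystem's block size and applies the "block size must equal page size" rule described above. The function names are my own, and f_bsize is simply what statvfs() reports for the mount, so treat the result as a rough check rather than an authoritative answer about an XFS volume.

```python
import os


def page_size() -> int:
    """Memory page size of the running system, in bytes."""
    return os.sysconf("SC_PAGE_SIZE")


def filesystem_block_size(path: str) -> int:
    """Block size that statvfs() reports for the filesystem holding path."""
    return os.statvfs(path).f_bsize


def usable_under_page_size_rule(fs_block_size: int, host_page_size: int) -> bool:
    """Apply the restriction that block size must equal the page size."""
    return fs_block_size == host_page_size


if __name__ == "__main__":
    host = page_size()
    fs = filesystem_block_size("/")
    print(f"page size: {host} bytes, block size of /: {fs} bytes")
    print("same size:", usable_under_page_size_rule(fs, host))
```

Under this rule, a filesystem made with 4 K blocks on x86 fails the check on a host running with 64 K pages, which is the migration problem in a nutshell.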


ReiserFS

The ReiserFS filesystem is arguably the most ambitious journaled filesystem development project because it's not just a port of an existing filesystem to the Linux kernel (as XFS and JFS are), nor is its design based upon that of an earlier filesystem, as ext3's is. In contrast, ReiserFS has been designed completely from scratch and boasts some very impressive performance numbers when it comes to the handling of small files. So, how has ReiserFS fared in terms of stability and general filesystem robustness since its introduction to the 2.4 kernel?

Since its introduction, ReiserFS has had an unusually high number of stability and corruption problems. There are a number of kernels that have been total nightmares for ReiserFS users, including 2.4.3, 2.4.9, and even the relatively recent 2.4.16. However, while some of these issues have been caused by bugs in the ReiserFS filesystem code itself, a surprising number of them have been unwanted side-effects caused by changes made to other parts of the kernel. One unfortunate thing about the Linux kernel development process is that no matter how carefully you test your own code, it's possible for some other kernel developer to sneak in a relatively untested change that causes your code to break. All too often, communication between developers only happens after these unwanted side-effects have been introduced and released to the unsuspecting Linux computing public. I think it's fair to say that there are a good number of disheartened ReiserFS users out there who have found themselves in this unfortunate no-win situation.

But there is good news, my friends. In the last few months, things have started looking a lot better for ReiserFS. For one, the kernel sources have started to stabilize around the 2.4.17 release. In addition, the guys at Namesys (the developers of ReiserFS) have been able to fix quite a few obscure bugs in their code over the past few months. And the news gets even better -- it appears that kernel 2.4.18 has a very solid ReiserFS implementation. And 2.4.18 isn't exactly a spring chicken -- at the time this article was written, it was nearly 3 months old and there still haven't been any major problems found in the code. In fact, due to a lack of incoming bug reports, Namesys has reassigned the Release Manager to a new job of improving ReiserFS performance.

So, it appears that ReiserFS and the 2.4 kernel have finally resolved their differences. For me personally, this is heartening news; I'm very eager to start using ReiserFS again and I plan to use it as my root filesystem when I next reload my development workstation. I'm sure there are many other ex-ReiserFS users who will be moving back to ReiserFS now that things have calmed down in kernel-land. Frankly, it's quite hard to live without ReiserFS once you've seen how its small file performance can boost the performance of certain applications.

So, what can we expect to see from ReiserFS in the near future? According to Hans Reiser and his team of developers, there are some very nice improvements that are scheduled to appear in the 2.4.20_pre1 kernel, including Chris Mason's data journaling (like ext3's "data=journal" mode!) support, new block allocation code that scales much better, and several improvements in large file performance, yielding up to a 15% performance improvement when reading large files from IDE drives. Beyond these immediate and significant improvements, we are likely to soon see ReiserFS support the equivalent of ext3's "data=ordered" mode. At that point, ReiserFS will offer equivalent data integrity features to those found in the ext3 filesystem. I'm very happy to see that the ReiserFS development team is making data integrity (not just metadata integrity) such a high priority.


Ext3

So, what about ext3? In general, ext3 has been quite stable and hasn't suffered from any major issues. For this reason, ext3 has gained a reputation as being a very reliable and robust journaled filesystem choice. While some may consider the filesystem to be "boring" because it doesn't sport any major improvements over ext2 besides a very good journaling implementation, "boring" is a good thing in the world of filesystems. It means that the filesystem is very good at simply doing its job without fuss or incident. In addition, despite ext3's scalability limitations when compared to ReiserFS, XFS, and JFS, ext3 has shown itself to be very fast and well-tuned for the typical kinds of filesystem operations performed by most servers and workstations. It's clear that the ext3 developers have met their goal of creating a high-quality journaling filesystem that Linux users can upgrade to easily and confidently.

With kernel 2.4.19_pre5, synchronous mounts of ext3 filesystems and "chattr +S"'d files now perform about ten times faster than they did previously. In the near future, expect to see the addition of an option for synchronous updates of specific directory trees, which is a feature that will be of use primarily to mailer programs. Besides that, we can expect to see regular small bug fixes and performance improvements to the code, but nothing major; ext3 is already quite refined, and the code now appears to be in maintenance mode.
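To check which of your ext3 filesystems are affected, you can look at their mount options. The sketch below parses text in the usual "device, mount point, type, options" layout used by /etc/fstab and /proc/mounts, and reports for each ext3 entry whether the sync option is set and which data= journaling mode, if any, is listed. The function name and the sample lines are invented for illustration, and whether a data= option shows up at all depends on how the filesystem was mounted and on the kernel.

```python
def ext3_mount_report(mounts_text: str) -> dict:
    """Map each ext3 mount point to its sync flag and data= mode (or None)."""
    report = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expect: device, mount point, filesystem type, options, ...
        if len(fields) < 4 or fields[0].startswith("#") or fields[2] != "ext3":
            continue
        mount_point = fields[1]
        options = fields[3].split(",")
        # Pick out the journaling mode if one is given, e.g. data=ordered.
        data_mode = next(
            (opt.split("=", 1)[1] for opt in options if opt.startswith("data=")),
            None,
        )
        report[mount_point] = {"sync": "sync" in options, "data": data_mode}
    return report


# Made-up sample in /proc/mounts layout.
SAMPLE = """\
/dev/hda1 / ext3 rw,data=ordered 0 0
/dev/hda2 /var/spool/mail ext3 rw,sync 0 0
/dev/hda3 /home reiserfs rw 0 0
"""

for mount_point, info in ext3_mount_report(SAMPLE).items():
    print(mount_point, info)
```

On a live system you could pass in the contents of /proc/mounts instead of the sample string.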

Thanks for joining me in this article, and please join me next time as we take a look at JFS!


