Advanced filesystem implementor's guide, Part 2
Using ReiserFS and Linux 2.4
This content is part of the series: Common threads
In this article, I'll show you how to get ReiserFS running under a 2.4 series kernel. I'll also share lots of technical information on a variety of topics, including the best 2.4 kernels to use with ReiserFS, performance considerations, and more. Since I'll be covering installation first, I recommend that you read this article in its entirety before following the installation instructions. That way, you'll have all the technical notes in the back of your head as you start getting ReiserFS running on your systems, allowing you to make any necessary adjustments along the way.
The search for a good kernel
To enable ReiserFS on your system, you'll first need to find a suitable kernel. If you've been following 2.4 kernel development, then you know that this process is trickier than it sounds. At the time this article was written, the latest kernel was 2.4.6-pre2; however, I recommend that you stick with either 2.4.4 (a stock Linus kernel) or use 2.4.4-ac9 (a slightly improved Alan Cox kernel) for your ReiserFS system. From my testing, 2.4.5 seems to be quite flaky and I can't recommend this kernel for production use; let's hope that 2.4.6 is a lot better.
If you want to use something other than the 2.4.4 or 2.4.4-ac9 kernel for your production ReiserFS system, be sure to do the necessary research to make sure that ReiserFS (and the kernel in general) is stable. Of course, if you're setting up ReiserFS on a test server, feel free to use any kernel you'd like, as long as there's no important data at risk.
There are two good reasons to be careful about kernel stability issues in general and ReiserFS stability in particular. First, since ReiserFS is an "experimental" kernel feature, you shouldn't assume that a newer kernel's ReiserFS implementation will work perfectly right out of the tarball. The second issue (and maybe the more important one these days) is that the majority of 2.4 kernel releases and patches have been a bit on the flaky side, so for the time being you need to tread carefully. Theoretically, all 2.4 releases should be production-ready, since 2.4 is supposed to be a stable series; however, in reality they aren't (yet), and you are strongly encouraged to approach new, untested kernels with caution.
This information is not meant to scare you away from using ReiserFS or Linux 2.4, but is rather meant to snap some sense into those more adventurous types. Don't hop from bleeding-edge kernel to bleeding-edge kernel on your important systems; if you do so, you will get burned. When you use an unproven kernel, you're not just risking a system lock-up; you're risking data loss and filesystem corruption, something you definitely want to avoid. Even if the ReiserFS implementation itself is solid, it's very possible that major bugs in other parts of the kernel could cause filesystem corruption to occur.
If you don't have a good source for up-to-date kernel stability information, I recommend that you regularly visit Linux Weekly News (see Related topics later in this article) to keep up-to-date with any potential kernel problems (the information is updated every Thursday). Now that I've hopefully convinced my more adventurous readers to stick with 2.4.4 or 2.4.4-ac9 for production ReiserFS configurations, let's continue.
The stock kernel
OK, we're going to cover three possible options for getting a production-ready ReiserFS system up and running. The first is to simply use a stock 2.4.4 Linux kernel. The second option is to use the 2.4.4 kernel along with the ReiserFS bigpatch, which includes special patches that make ReiserFS quota-compatible and more compatible with locally-running NFS servers. Third, we can use the 2.4.4 kernel with the ac9 patch (producing 2.4.4-ac9), with or without the bigpatch. Generally, I recommend using 2.4.4-ac9 with the bigpatch since the bigpatch doesn't have any negative effects and you might need it, and ac9 performs significantly better than the stock kernel. However, if you have an aversion to ac kernels, the stock 2.4.4 will do just fine. I'll walk you through the process of setting up 2.4.4-ac9 with the bigpatch, but if for some reason you'd rather not install one or both of these patches, simply skip that particular step. Now, let's begin.
First, grab the 2.4.4 kernel sources from kernel.org and enter your /usr/src directory. Move any existing linux directory or symlink out of the way by renaming it (if a directory) or simply deleting it (if a symlink). Then:
# cd /usr/src
# cat /path/to/linux-2.4.4.tar.bz2 | bzip2 -d | tar xvf -
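If you'd rather script the "move it out of the way" step, here's a minimal sketch of how it could be done. The function name move_old_linux and the linux.old backup name are my own inventions for illustration, not anything the kernel sources expect; the logic simply follows the rule described above (rename a directory, delete a symlink).

```shell
# Hypothetical helper: clear the way for a fresh kernel tree.
# $1 is the source directory (the article uses /usr/src).
move_old_linux() {
    src=${1:-/usr/src}
    if [ -L "$src/linux" ]; then
        # A symlink only points at another tree, so deleting it loses nothing.
        rm "$src/linux"
    elif [ -d "$src/linux" ]; then
        # A real directory is renamed so the old sources survive.
        # (Assumes no linux.old exists yet; the backup name is arbitrary.)
        mv "$src/linux" "$src/linux.old"
    fi
}
```

You would call it as "move_old_linux /usr/src" before unpacking. The symlink test comes first because "[ -d ]" follows symlinks and would otherwise mistake a link to a directory for the directory itself.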
The ac9 patch and bigpatch
If you're planning to just use stock 2.4.4, you have all the sources you need and can skip the rest of the patches. However, I recommend that you go on to apply the ac9 patch and the bigpatch.
To apply the ac9 patch, grab the Alan Cox ac9 patch from kernel.org. Then type:
# cd /usr/src/linux
# bzip2 -dc /path/to/patch-2.4.4-ac9.bz2 | patch -p1
Once the stock kernel is ready, head over to DiCE and grab DiCE's ReiserFS bigpatch. Again, this step is optional but recommended, especially if you will be running an NFS server on this system or need quotas (and if you won't be, this patch won't hurt anyway). To apply the bigpatch, do this:
# cd /usr/src/linux
# bzip2 -dc /path/to/bigpatch-2.4.4.diff.bz2 | patch -p1
Once any optional fixes and the bigpatch are applied, you're ready to configure your kernel for ReiserFS.
Note: If you need additional instructions on how to compile a Linux kernel, check out the Compiling the Linux kernel free tutorial on developerWorks. A brief summary follows.
Kernel configuration is quite easy. First, type "make menuconfig". Under the "Code maturity level options" section, make sure that the "Prompt for development and/or incomplete code/drivers" option is enabled. Then, head over to the "File systems" section, and enable "ReiserFS support". You should configure ReiserFS support to be compiled directly into your kernel (not as a module); also, it is generally not a good idea to enable the "Have reiserFS do extra internal checking" option. Now, save your settings, compile your kernel ("make dep; make bzImage; make modules; make modules_install") and configure your boot loader to load the new ReiserFS-enabled kernel.
Important: It's always a good idea to save your current kernel and configure your boot loader so that you can boot with this kernel, just in case your new kernel doesn't work.
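Before you kick off a long compile, it can be worth double-checking that the configuration choices described above actually made it into your .config file. The following is only a sketch: check_reiserfs_config is a name I made up, and I'm assuming the option names as they normally appear in a 2.4 .config (CONFIG_EXPERIMENTAL, CONFIG_REISERFS_FS, and CONFIG_REISERFS_CHECK), so verify them against your own tree.

```shell
# Hypothetical helper: sanity-check a kernel .config for the ReiserFS
# settings recommended in this article. $1 is the path to the .config file.
check_reiserfs_config() {
    cfg=${1:-/usr/src/linux/.config}

    # "Prompt for development and/or incomplete code/drivers" must be on
    grep -q '^CONFIG_EXPERIMENTAL=y$' "$cfg" ||
        { echo "development-code prompt is off"; return 1; }

    # ReiserFS must be built in (=y), not compiled as a module (=m)
    grep -q '^CONFIG_REISERFS_FS=y$' "$cfg" ||
        { echo "ReiserFS is not compiled into the kernel"; return 1; }

    # Extra internal checking should stay disabled
    if grep -q '^CONFIG_REISERFS_CHECK=y$' "$cfg"; then
        echo "extra internal checking is enabled"
        return 1
    fi

    echo "ReiserFS settings look right"
}
```

Run it as "check_reiserfs_config /usr/src/linux/.config" after saving your settings; it prints the first problem it finds and returns a non-zero status, so it is easy to chain in front of the "make" commands.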
Installing the tools
Before you reboot, we need to get the "reiserfsprogs" tools installed, which include "mkreiserfs", "resize_reiserfs" (useful for LVM users), and "fsck.reiserfs". You can grab the latest version of "reiserfsprogs" (currently at "3.x.0j") from the Namesys.com download page. Once the tools are downloaded, you can compile and install "reiserfsprogs" by following these steps:
# cd /tmp
# tar xzvf reiserfsprogs-3.x.0j.tar.gz
# cd reiserfsprogs-3.x.0j
# ./configure
./configure output will appear here
# make
make output will appear here
# make install
make install output will appear here
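Since you're about to reboot into a kernel that depends on these tools, a quick check that they really landed on your PATH doesn't hurt. Here's a minimal sketch; missing_tools is a hypothetical helper name, and the tool names passed to it are simply the ones listed above.

```shell
# Hypothetical helper: print each named command that is NOT on the PATH.
# Prints nothing when every tool is found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, "missing_tools mkreiserfs resize_reiserfs fsck.reiserfs" should print nothing if the install went well. Run it as the same user (normally root) who will be creating the filesystems, since PATH settings can differ between accounts.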
Now that the tools are installed, you can create any new partitions (using "fdisk" or "cfdisk") or LVM logical volumes (using "lvcreate") as necessary and reboot your system. If you're creating standard partitions, you can simply label the partition as a "Linux native file system" (83).
Creating and mounting the filesystem
Once rebooted, you'll be able to create a ReiserFS filesystem on an empty partition as follows:
# mkreiserfs /dev/hdxy
In the above example, /dev/hdxy should be a device node corresponding to a free partition. Mount it as you would any other filesystem:
# mount /dev/hdxy /mnt/reiser
And, if you'd like to add a ReiserFS filesystem to your /etc/fstab file, simply set the "freq" and "passno" fields to "0", as follows:
/dev/hdc1 /home reiserfs defaults 0 0
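The reasoning behind the two zeros is that "freq" is only consulted by "dump", while a "passno" of 0 tells the boot-time "fsck" run to skip the filesystem. If you maintain several ReiserFS entries, a small check like the following can catch a stray non-zero value. It's only a sketch: check_reiserfs_fstab is a name I invented, and it assumes the standard six-column fstab layout.

```shell
# Hypothetical helper: list ReiserFS mount points whose "freq" (field 5)
# or "passno" (field 6) is not 0. $1 defaults to /etc/fstab.
check_reiserfs_fstab() {
    awk '
        # Skip comment lines; missing fields count as 0, as they do in fstab
        $1 !~ /^#/ && $3 == "reiserfs" && ($5 + 0 != 0 || $6 + 0 != 0) {
            print $2          # report the offending mount point
            bad = 1
        }
        END { exit bad + 0 }  # non-zero exit status if anything was reported
    ' "${1:-/etc/fstab}"
}
```

Calling "check_reiserfs_fstab" with no argument reads /etc/fstab; it prints nothing and returns success when every ReiserFS entry uses "0 0".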
From this point forward, your ReiserFS filesystems should act identically to their ext2 counterparts, except that you'll no longer need to worry about long "fsck"s, and overall performance will be much better -- especially for small files.
ReiserFS technical notes
I've been using 2.4.4's ReiserFS in a production environment for over a month (on the cvs.gentoo.org development server) with no corruption problems at all. 2.4.4 and 2.4.4-ac9 have been rock solid. Our server performs a good amount of disk IO, since it's the home of our cvs repository, our "dev-wiki", the gentoo.org mail server, our mailman-based mailing lists, and a bunch of other things.
While ReiserFS will outperform an ext2 filesystem in almost every type of application, there are a few areas where ReiserFS currently has some rough edges. Thankfully, these issues really aren't hard limitations of ReiserFS, but rather areas that the Namesys developers haven't had time to code or optimize just yet.
Yes, it's true; ReiserFS does not yet have a "dump" and "restore" implementation. If you want to use ReiserFS and happen to be a "dump" fan, you'll have to find some alternate way of backing up your data. In reality, this turns out to be a non-issue, since 2.4 kernels are incompatible with "dump" and "restore" in the first place. For more information on the dump/kernel 2.4 incompatibility, read the posting by Linus Torvalds (see Related topics), where he says that "Dump was a stupid program in the first place. Leave it behind."
While ReiserFS generally blows the socks off ext2, ReiserFS does have a few special-case performance weaknesses. The first is sparse file performance, which is significantly worse than that of ext2. This will change at some point, when the Namesys developers get around to optimizing that part of ReiserFS for ReiserFS 4. Until then, ext2 is a better solution for applications that place heavy demands on sparse files.
You may also run into problems with code that performs bunches of stat() calls on large numbers of files. One application that seems to trigger this performance defect (which only exists with the ReiserFS implementation in 2.4 series kernels, and not 2.2 kernels) is the "mutt" mailer (see Related topics) when it is used to read large maildir-style mailboxes. Apparently, mutt stats each mail file twice, which tends to hurt performance more than normal. The ReiserFS development team is aware of this particular problem and has identified its cause, and you should expect a solution to be included in ReiserFS 4, if not sooner.
Fortunately, there are a couple of easy general performance tweaks you can use to make these problems less severe. The first is to mount your ReiserFS filesystem with the "noatime" mount option (a mount option that's available for other filesystems as well as ReiserFS). As you probably know, UNIX systems record an atime, or access time, for each object on the filesystem, and update it every time the object is read. For most people, the atime stamp isn't very useful and hardly any applications (none I can think of) rely on the atime for any critical task. For this reason, it can usually be safely turned off, which gives a nice all-around performance boost. Generally, unless you specifically know that you need atime support, you should be mounting your filesystems with the noatime option. Use an /etc/fstab entry like this:
/dev/hdc1 /home reiserfs noatime 0 0
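To confirm that an option such as "noatime" actually took effect after a mount or remount, you can look at the kernel's own view of the mount table. The sketch below assumes that /proc/mounts is available and uses the usual fstab-like column layout; the has_mount_option name is mine, not a standard tool.

```shell
# Hypothetical helper: succeed if mount point $1 is mounted with option $2.
# $3 is the mount table to read (defaults to /proc/mounts).
has_mount_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            # Field 4 holds the comma-separated option list
            n = split($4, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == opt) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "${3:-/proc/mounts}"
}
```

A typical use would be "has_mount_option /home noatime && echo yes". The option is matched as a whole word, so asking for "atime" will not be fooled by an entry that says "noatime".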
In my first ReiserFS article, I mentioned that ReiserFS has a special feature called "tail packing". In ReiserFS lingo, "tails" are files that are smaller than a filesystem block (4k) or the trailing portions of files that don't quite fill up a complete filesystem block. ReiserFS has really excellent small-file performance because it is able to incorporate these tails into its b*tree (its primary organizational data structure) so that they are really close to the stat-data (ReiserFS' equivalent of an i-node). However, since tails don't fill up a complete block, they can waste a lot of disk space (relatively speaking, of course). To solve this problem, ReiserFS uses its "tail packing" functionality to squish tails into as small a space as possible. Generally, this allows a ReiserFS filesystem to hold around 5% more than an equivalent ext2 filesystem.
More about notail
However, tail packing also has its disadvantages. For one, it does give you a small but significant performance hit. Fortunately, the ReiserFS guys anticipated that some people would be willing to sacrifice around 5% of their disk capacity for a little extra performance, so they created the "notail" mount option. When a filesystem is mounted with this option, tail packing will be turned off, giving you greater speed and less storage capacity. In general, filesystem performance freaks mount their filesystems with both "notail" and "noatime" enabled, producing a noticeable performance improvement:
/dev/hdc1 /home reiserfs noatime,notail 0 0
Even if you want to save some disk space, there are times when temporarily mounting your filesystem with the "notail" option can be a good thing. In particular, most boot loaders have problems loading kernels that were created on a ReiserFS filesystem with tail packing enabled. If you're using a LILO earlier than version 21.6, you'll have this problem. You will also have problems with the most recent versions of GRUB, which will not be able to load its stage1 and stage1_5 files, although it will have no problems loading the actual kernel. If you're already experiencing this problem, you can fix it by mounting the filesystem with the "notail" option, moving the files to another filesystem, and then moving them back. When they're recreated, they won't have tails. Also, remember that you can easily remount a filesystem (with new options) without unmounting it. This particular example remounts the root filesystem with the "notail" option. This command is useful if you normally want to use tail packing but also need your boot loader to load auxiliary files like kernels from the root filesystem:
# mount / -o remount,notail
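The "move the files to another filesystem and back" fix described above can be sketched as a small helper. Everything here is illustrative: recreate_file is a name I made up, and the scratch directory must live on a different filesystem for the trick to make sense. I copy rather than move so that a failure part-way through always leaves at least one intact copy behind.

```shell
# Hypothetical helper: rewrite one file so that it is created afresh.
# $1 = file to rewrite
# $2 = scratch directory, assumed to be on a DIFFERENT filesystem
recreate_file() {
    file=$1
    scratch=$2
    tmp="$scratch/$(basename "$file").$$"

    cp -p "$file" "$tmp" || return 1   # copy out, keeping mode and timestamps
    rm "$file" || return 1             # remove the original (tailed) file
    cp -p "$tmp" "$file" || return 1   # copy back; the file is created anew
    rm "$tmp"                          # clean up the scratch copy
}
```

With the filesystem already remounted with "notail", you might run something like "recreate_file /boot/vmlinuz /mnt/other", where both paths are placeholders for your own kernel image and scratch location. Depending on your boot loader, you may need to reinstall it afterwards, since the file's location on disk changes.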
If you're using qmail with ReiserFS, there are a few important resources that you should know about. First, you should apply this patch to your qmail 1.03 sources. It fixes a problem that qmail has with non-synchronous "link()" and "unlink()" calls, which happens to be a problem with ext2 as well as ReiserFS. Next, you owe it to yourself to check out Jedi's qmail tuning page, which contains lots of good advice on how to maximize qmail performance. Finally, be sure to check out Jedi's ReiserSMTP package. ReiserSMTP contains a GPL plug-in replacement for the SMTP portion of qmail; Jedi's replacement has been specially tuned for ReiserFS, and should provide you with double the incoming mail performance, thanks to its new queue-handling routines.
I've found ReiserFS to be a truly incredible filesystem, offering oodles of small file performance and great (normally better than ext2) regular file performance. Thanks to ReiserFS, my developers can complete Gentoo Linux "cvs" updates in only fifteen seconds, where they used to take around two minutes with ext2. ReiserFS makes our developers' lives more pleasant, and allows our cvs server to handle large amounts of simultaneous IO without thrashing our hard drives and negatively affecting interactive performance.
Yet despite all this, the most exciting thing about ReiserFS is what it will become in the future. Hans Reiser has a very aggressive and innovative plan for ReiserFS, including plans to extend the filesystem so that it can be used as a full-fledged high-performance database, complete with transaction support and advanced querying features. This means that we can expect ReiserFS to be more than "just another high-performance filesystem"; rather, it will open up new possibilities and approaches, allowing us to solve traditional storage problems in new and innovative ways. With Namesys on board, we can expect future Linux development to be quite exciting indeed -- and that's definitely a good thing.
Related topics
- Read Daniel's other articles in this series.
- The Namesys Web page is the place to learn more about ReiserFS.
- The ReiserFS mailing list is an excellent source for current, more in-depth ReiserFS information.
- Linux Weekly News is a great resource for keeping up with the latest kernel developments.
- You can find a very nice detailed look at the meta-data differences between UFS, ext2, ReiserFS, and more in Juan I. Santos Florido's Journal File Systems review in Linux Gazette.
- Check out the mutt e-mail client, which can be used to read large maildir-style mailboxes.
- Read Linus Torvalds' recent comments on dump and restore.
- Browse more Linux resources on developerWorks.
- Browse more Open source resources on developerWorks.