In my previous article, I introduced you to Linux 2.4's software RAID functionality, showing you how to set up linear, RAID-0, and RAID-1 volumes. In this article, we look at what you need to know in order to use RAID-1 to increase availability in a production environment. This requires a lot more understanding and knowledge than just setting up RAID-1 on a test server or at home -- specifically, you'll need to know exactly what RAID-1 will protect you against, and how to keep your RAID volume up and running in case of a disk failure. In this article, we'll cover these topics, starting with an overview of what RAID-1, 4, and 5 can and can't do for you, and ending with a complete test simulation of a failed RAID 1 drive replacement -- something that you should actually do (with this article as your guide) if at all possible. After going through the simulation, you'll have all the experience you need to handle a RAID-1 failure in a real-world environment.
The fault-tolerant features of RAID are designed to protect you from the negative impacts of a spontaneous complete drive failure. That's a good thing. But RAID isn't a perfect fix for every kind of reliability problem. Before implementing a fault-tolerant form of RAID (1,4,5) in a production environment, it's extremely important that you know exactly what RAID will and will not do for you. When we're in a situation where we're depending on RAID to perform, we don't want to make any false assumptions about what it does. Let's start by dispelling common myths about RAID 1, 4, and 5.
A lot of people think that if they place all their important data on a RAID 1/4/5 volume, then they won't have to perform regular backups. This is completely false -- here's why. RAID 1/4/5 helps to protect against unplanned downtime caused by a random drive failure. However, it offers no protection against accidental or malicious data corruption. If you type "cd /; rm -rf *" as root on a RAID volume, you'll lose a lot of very important data in a matter of seconds, and the fact that you have a 10 drive RAID-5 configuration will be of little significance. Also, RAID won't help you if your server is physically stolen or if there's a fire in your building. And of course, if you don't implement a backup strategy, you won't have an archive of past data -- if someone in your office deletes a bunch of important files, you won't be able to recover them. That alone should be enough to convince you that, in most circumstances, you should plan and implement a backup strategy before even thinking about tackling RAID-1, 4, or 5.
Another mistake is to implement software RAID on a system composed of low-quality hardware. If you're putting together a server that's going to do something important, it makes sense to purchase the highest-quality hardware that's still comfortably within your budget. If your system is unstable or improperly cooled, you'll run into problems that RAID can't solve. On a similar note, RAID obviously can't give you any additional uptime in the case of a power outage. If your server is going to be doing anything relatively important, make sure that it's been equipped with an uninterruptible power supply (UPS).
Next, we move on to filesystem issues. The filesystem exists "on top" of your software RAID volume. This means that using software RAID does not allow you to escape filesystem issues, such as long and potentially problematic fscks if you happen to be using a non-journalled or flaky filesystem. So, software RAID isn't going to make the ext2 filesystem more reliable; that's why it's so important that the Linux community has ReiserFS, as well as JFS and XFS in the works. Software RAID and a reliable journalling filesystem make a great combination.
RAID - intelligent implementation
Hopefully, the previous section dispelled any RAID myths that you might have had. When you implement RAID-1, 4, or 5, it's very important that you view the technology as something that will enhance uptime. When you implement one of these RAID levels, you're protecting yourself against a very specific situation -- a spontaneous complete (single or multiple) drive failure. If you experience this situation, software RAID will allow the system to continue running, while you make arrangements to replace the failed drive with a new one. In other words, if you implement RAID 1,4, or 5, you'll be reducing your risk of having a long, unplanned downtime due to a complete drive failure. Instead, you can have a short planned downtime -- just enough time to replace the dead drive. Obviously, this means that if having a highly-available system isn't a priority for you, then you shouldn't be implementing software RAID, unless you plan to use it primarily as a way to boost file I/O performance.
A smart system administrator uses software RAID for a specific purpose -- to improve the reliability of an already very reliable server. If you're a smart sysadmin, you've already covered the basics. You've protected your organization against catastrophe by implementing a regular backup plan. You've hooked your server up to a UPS, and have the UPS monitoring software up and running so that your server will shut down safely in the case of an extended power outage. Maybe you're using a journalling filesystem such as ReiserFS to reduce fsck time and increase filesystem reliability and performance. And hopefully, your server is well-cooled and is composed of high-quality hardware, and you've paid close attention to security issues. Now, and only now, should you consider implementing software RAID-1, 4 or 5 -- by doing so, you'll potentially give your server a few more percentage points of uptime by guarding it against a complete drive failure. Software RAID is that added layer of protection that makes an already rugged server even better.
Now that you've read about what RAID can and can't do, I hope you have reasonable expectations and the right attitude. In this section, I'll walk you through the process of simulating a disk failure, and then bringing your RAID volume back out of degraded mode. If you're have the ability to set up a RAID-1 volume on a test machine and follow along with me, I highly recommend that you do so. This kind of simulation can be fun. And having a little fun right now will help to ensure that when a drive really fails, you'll be calm and collected, and know exactly what to do.
OK, our first step is to set up a RAID-1 volume; refer to my previous article if you need a refresher on how to do this. To perform this test, it's essential that you set up your RAID-1 volume so that you can still boot your Linux system with one hard drive unplugged, because this is how we're going to simulate a drive failure.
Once you've set up your volume, you'll see something like Listing 1 if you cat /proc/mdstat.
Note that I'm using devfs, and that's why you see the extremely long device names listed above. I'm actually using /dev/hda5 and /dev/hde1 as my RAID-1 disks. At the moment, the kernel software RAID code is synchronizing the drives so that they're exact mirrors of each other. If your RAID-1 volume is at this point, you can go ahead and create a filesystem on the volume, and then mount it somewhere. Copy some files over to it, and then set up your /etc/fstab so that the volume (/dev/md0) will be mounted when your system boots. Here's the line I added to my fstab; yours may differ slightly:
/dev/md0 /mnt/raid1 reiserfs defaults 0 0 |
OK; we're almost ready to simulate a drive failure, but not quite. First, cat
/proc/mdstat again, and wait until all your volume's disks are synchronized.
When they are, your /proc/mdstat will look like Listing 2.
OK, now that the resync is complete, we're ready for the simulation. Go ahead and shut down your machine and power it down. Then, open it up and unplug one of the hard disks that make up your RAID-1 array. Of course, you won't want to unplug the disk that contains your Linux root partition -- we'll need to boot Linux again! OK, now that the hard drive is unplugged, bring the machine back up. Once you log in, you should find that /dev/md0 is mounted and that you're still able to use the volume. When you cat /proc/mdstat, you'll see the major change:
# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md0 : active raid1 ide/host0/bus0/target0/lun0/part5[0]
4610496 blocks [2/1] [U_]
unused devices: <none>
|
Here, you can see that my /dev/md0 volume is running in degraded mode. I unplugged drive /dev/hde, so /dev/hde1 wasn't found when the kernel booted and tried to autostart my array. Fortunately, the kernel found /dev/hda5, and /dev/md0 was able to start in degraded mode. As you can see, the /dev/hde1 partition isn't listed in /proc/mdstat, and one of the RAID disks is marked as "down" ("[U_]" instead of "[UU]"). But hey, since /dev/md0 is still going, software RAID-1 is doing what it's supposed to do -- keeping our data available.
Right now, we're experiencing a simulated drive failure. If the drive that currently doesn't have power actually failed while the system was running, this is the kind of situation we'd be in. Our RAID-1 volume would be running in degraded mode, meaning that our volume is still available but without any redundancy. At a convenient time, we'd want to shut down the system, replace the failed drive, and start the system back up again. Our RAID-1 volume would still be running in degraded mode at this point.
Once we have the new drive in the machine, we'd want to create a RAID autodetect ("FD") partition of the appropriate size on our new disk. An additional reboot may be needed so that Linux can reread the disk's partition tables. Once the new partition is visible to the system, we're ready to restore our degraded RAID-1 array -- then, we'll have some redundancy again.
Of course, we're only performing a simulation. To practice adding a partition back into our RAID array, we can do one of two things, depending on what kind of scenario you'd like to prepare or. You can either shut down your machine, plug the drive in, boot it up, and add the old partition back to the array, or you can shut down your machine, plug the drive in, boot up, wipe the drive, create a new RAID autodetect ("FD") partition to add the array (of the correct size, of course -- at least as big as the partition it's replacing) and then add this brand-new partition to the array. The second choice would be closer to what would happen in the event of a real drive failure, while the first would simulate something like a failed disk controller or bad cable situation -- where one of your mirror drives was temporarily unavailable, causing /dev/md0 to run in degraded mode, and requiring one of the partitions to be added back to the volume after the problem was remedied. Whichever simulation you choose to do, the "fix" is the same -- after the new partition is ready, we need to manually add it back to the /dev/md0 volume.
Before we add the partition back to our array, this would be a good time to take a look at our kernel boot messages. If you type "dmesg | more", you'll be able to view the kernel boot messages. You should see a bunch of text similar to Listing 3.
Now would be a good time to carefully read these messages, because they'll help you to understand the process that the kernel uses to autostart /dev/md0, giving you another valuable insight into the inner workings of Linux software RAID. If you read the kernel output listed above, you'll find that my kernel found /dev/hda5 and /dev/hde1, but hde1 was out of sync with hda5. So, the kernel started up /dev/md0 in degraded mode, using /dev/hda5 and not touching /dev/hde1 at all. Now, it's time to add our original (or newly created) partition to our volume. Here's how.
First, if your replacement partition has a new device name, update /etc/raidtab so that it reflects this new information. Then, add the new partition to the volume using the following command, replacing /dev/hde1 with the device name of the partition you're adding:
# raidhotadd /dev/md0 /dev/hde1 |
Your hard drive lights should begin glowing as reconstruction begins. Go ahead
and cat /proc/mdstat to check the status of the RAID-1 reconstruction that's
now in progress (see Listing 4).
In a matter of minutes, your RAID-1 volume will be back to normal (see Listing 5).
Voila! We've successfully recovered from a simulated drive failure, and you're ready to start using RAID-1 in a production environment. You can now affix your homemade "RAID-1 certified" sticker to your forehead and begin flapping your arms and running around the office to the delight of your coworkers. Actually, maybe that isn't such a great idea. See you next article :)
- Read Part 1 in Daniel's series on RAID, where he introduces Linux 2.4's software RAID functionality and shows how to set up linear,
RAID-0, and RAID-1 volumes
-
The Software-RAID HOWTO is
another excellent resource for information related to Linux Software RAID
- You may want to check out the
Boot+Root+RAID+Lilo Software RAID HOWTO if you'd like to learn how
to create a root RAID filesystem
- For updated versions of raidtools-0.90, keep an eye on people.redhat.com
- Find a recent kernel in The Linux Kernel Archives
- Take an IBM developerWorks tutorial showing you how to recompile and install a new kernel from sources
- Find the raidtools program
- Snag the latest version of raidtools
- Check out more tips on Software Raid solutions for Linux
Daniel Robbins lives in Albuquerque, New Mexico. He is the President/CEO of Gentoo Technologies, Inc., the Chief Architect of the Gentoo Project and a contributing author to several books published by MacMillan: Caldera OpenLinux Unleashed, SuSE Linux Unleashed, and Samba Unleashed. Daniel has been involved with computers in some fashion since the second grade, when he was first exposed to the Logo programming language as well as a potentially dangerous dose of Pac-Man. This probably explains why he has since served as a Lead Graphic Artist at SONY Electronic Publishing/Psygnosis. Daniel enjoys spending time with his wife Mary, and his new baby daughter Hadassah. You can contact Daniel at drobbins@gentoo.org.




