Introduction to emerging file systems
This content is part of the series: Speaking UNIX
It's an old adage but nonetheless accurate: "In UNIX®, everything is a file". Conceptually, every UNIX resource is a simple file that you can open, read from, write to, or both. Your experiment data is a file, as is your shell startup script, the UNIX kernel, your home directory, each network socket, and the /bin/ls executable, among others.
However, not every file is identical. Your experiment data may be a highly customized database. A shell script is a text file, albeit with one important distinction: The leading #! line (often called the shebang) dictates which application interprets the file. The kernel is a binary, as is each executable, both with predictable, specific formats that system tools can manipulate. A directory is a specialized index for cataloging contents.
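To see the shebang in action, here is a minimal sketch that writes a two-line script into a temporary directory, marks it executable, and runs it directly. Because the first line reads #!/bin/sh, the kernel hands the file to /bin/sh. The file name hello.sh is hypothetical, and the sketch assumes that your temporary directory permits executing files.

```shell
#!/bin/sh
# Demonstrates the shebang line: when a file is executed directly, the
# kernel reads "#!" plus an interpreter path from its first line and
# runs that interpreter with the script's path as an argument.
dir=$(mktemp -d)
script="$dir/hello.sh"    # hypothetical file name

# Line 1 names the interpreter; line 2 is ordinary sh code.
printf '%s\n' '#!/bin/sh' 'echo "interpreted by /bin/sh"' > "$script"
chmod +x "$script"

# Running the file itself (rather than "sh hello.sh") triggers the
# shebang lookup.
output=$("$script")
echo "$output"
rm -r "$dir"
```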
Further, not every file is stored in the same way. The structure and size of a file depend on the underlying file system, the subsystem that persists the file data to a physical device. The same data (say, two copies of the same file) is organized differently on disparate file systems. Each file system dictates its own storage strategy, which may be designed to meet a specific criterion. For example, one file system may be optimized for speed, another for efficient use of space, and yet another for durability against data loss or corruption. Typically, there is no right or wrong file system to deploy: You must analyze your storage needs and pick the file system or combination of file systems that suits your purpose. UNIX and a battery of vendors offer a great number of file systems, so you're sure to find something advantageous.
Table 1 lists just a few file systems available for UNIX (and in many cases, Linux®).
Table 1. UNIX file systems
| File system | Description |
|---|---|
| Zettabyte file system (ZFS) | An essentially limitless file system (a zettabyte is roughly 2^70 bytes) that you can construct from common, off-the-shelf storage media. ZFS can also compress data as it writes, stretching your media even further. |
| Network file system (NFS) | A stalwart feature of UNIX, NFS makes remote file systems appear to be local. NFS is ideal for sharing data. |
| Journaling file system (JFS) | JFS, like many other file systems, keeps a log, or journal, of changes to file system metadata. After a crash, the journal is "replayed" to restore the file system to a consistent state. |
| B-tree file system (Btrfs) | One of the newest file systems and intended to be the default file system for upcoming versions of Linux, Btrfs maintains checksums of both data and metadata to bulletproof persistence. Btrfs also provides snapshots and can mix devices of almost any size. |
A file system of your own
Commonly, a file system requires media to be tailored to it, so changing from one file system to another typically entails a backup, formatting physical drives, and a lengthy restore and validation process. Obviously, such changes are radical and are rightly the sole purview of the systems administrator.
However, a recent extension to many UNIX and Linux kernels allows any user to create and use a variety of file systems with no special privileges. The extension, dubbed FUSE (short for Filesystem in Userspace), lets you create new file systems with relatively little effort. File systems built on FUSE can mount remote directories via SFTP, automatically scan files for viruses, and treat a collection of read-only CD-ROM discs as local, resident data files.
Let's install FUSE on a common operating system platform and explore some of the novel file systems now available to any user. Modern UNIX and Linux kernels typically include support for FUSE, so building the software is similar to building any utility code. You must download the package, unpack it, run a configuration script to detect the capabilities of your system, and build and install the code. Listing 1 shows the code for installing FUSE.
Listing 1. Installing FUSE
```
$ wget http://downloads.sourceforge.net/project/\
fuse/fuse-2.X/2.8.4/fuse-2.8.4.tar.gz
$ tar xzf fuse-2.8.4.tar.gz
$ cd fuse-2.8.4
$ ./configure
$ make
$ sudo make install
```
FUSE version 2.8.4 was the latest version at publication time. Be sure to download the latest and greatest code from the FUSE project home page (see Related topics for a link). Once installed, the FUSE package provides a number of libraries required to build new FUSE file systems and an application named fusermount to mount and unmount FUSE file systems. The fusermount command has a few special options but otherwise accepts the options you would typically provide to the vanilla mount command.
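Before you build, it can help to confirm that the kernel side of FUSE is present. The following sketch assumes Linux, where /proc/filesystems lists the file system types the running kernel currently knows about; kernel_has_fuse is a hypothetical helper name, not part of the FUSE package. A negative answer is not conclusive, because FUSE is often a loadable module that appears in the list only after it is loaded (for example, with sudo modprobe fuse).

```shell
#!/bin/sh
# kernel_has_fuse (hypothetical helper): succeed if the kernel lists
# the "fuse" file system type. Each line of /proc/filesystems holds a
# type name, optionally preceded by "nodev", so compare the last field.
# An optional argument lets you point it at another file.
kernel_has_fuse() {
    list=${1:-/proc/filesystems}
    awk '$NF == "fuse" { found = 1 } END { exit !found }' "$list"
}
```

For example, kernel_has_fuse && echo 'FUSE is available' prints a message only when the fuse entry is present.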
With the foundational code installed, you can continue to install a FUSE file system.
The first FUSE file system to try is SSHFS, a file system based on SFTP. With SSHFS, you can mount any remote directory as a local file system as long as your server supports SSH, which most do. Before you continue, ensure that your system has Glib 2.0 and Gthread 2.0. If your machine has GNOME, you likely have these libraries; otherwise, install them from source or via your operating system's native package manager. (Debian Linux systems have Aptitude. Red Hat Linux systems provide yum.) Listing 2 shows the commands to set up SSHFS.
Listing 2. Installing SSHFS
```
$ wget -O sshfs-fuse-2.2.tar.gz http://sourceforge.net/projects/\
fuse/files/sshfs-fuse/2.2/sshfs-fuse-2.2.tar.gz/download
$ tar xzf sshfs-fuse-2.2.tar.gz
$ cd sshfs-fuse-2.2
$ ./configure
$ make
$ sudo make install
```
If you run Ubuntu or another variant of UNIX or Linux, your distribution may provide pre-built binaries of all these packages. For example, on Ubuntu, you can install all the FUSE software listed so far with this command:
$ sudo apt-get install libglib2.0-dev fuse-utils libfuse2 sshfs
If your dependencies are up to date, you may also be able to install SSHFS with the command sudo apt-get install sshfs alone.
When the FUSE libraries and SSHFS are installed, you can mount any remote directory that you can access via SSH as a local file system, as Listing 3 shows. You provide the name of the remote system and your login credentials, the remote directory you want to mount, and a local mount point, which can be any local directory.
Listing 3. Mount a remote SSH-accessible directory as a file system
```
$ ssh email@example.com ls
bin            Documents   lib       Media   Pictures  Sites   tmp
Desktop        Downloads   Library   Movies  projects  Source
Documentation  Dropbox     local     Music   Public    src
$ cd ~
$ mkdir -p mounts/example
$ sshfs email@example.com:/home/me mounts/example
$ cd mounts/example
$ ls
bin            Documents   lib       Media   Pictures  Sites   tmp
Desktop        Downloads   Library   Movies  projects  Source
Documentation  Dropbox     local     Music   Public    src
```
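Listing 3 mounts onto a freshly created, empty directory, and that matters: the FUSE 2.x library refuses by default to mount over a directory that already contains entries unless you add the nonempty mount option. A minimal sketch of a helper that creates the mount point and checks that it is empty follows; prepare_mount_point is a hypothetical name.

```shell
#!/bin/sh
# prepare_mount_point (hypothetical helper): create a mount point if
# needed and fail if it already has entries, because FUSE 2.x will not
# mount over a non-empty directory without -o nonempty.
prepare_mount_point() {
    dir=$1
    mkdir -p "$dir" || return 1
    # "ls -A" lists everything except "." and "..", so any output
    # means the directory is not empty.
    if [ -n "$(ls -A "$dir")" ]; then
        echo "mount point $dir is not empty" >&2
        return 1
    fi
}
```

Usage: prepare_mount_point ~/mounts/example && sshfs email@example.com:/home/me ~/mounts/example runs sshfs only when the mount point is ready.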
As you can see, the contents of the remote directory are now available as readily as local files. If you're a developer and use multiple systems for your work, SSHFS lets local commands, such as make, operate on remote files.
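To unmount a FUSE file system, you can type fusermount -u /some/mount/point. Or, because FUSE file systems behave just like normal file systems, you can type umount /some/mount/point, although the plain umount command may require root privileges.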
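If you move between systems with and without fusermount, a small wrapper can pick the command for you. This sketch assumes that fusermount, when installed, is the preferred way for an unprivileged user to unmount, and falls back to umount otherwise (which may need root privileges); unmount_fuse is a hypothetical name.

```shell
#!/bin/sh
# unmount_fuse (hypothetical helper): unmount a FUSE file system with
# fusermount -u when that command exists, otherwise with umount.
unmount_fuse() {
    mount_point=$1
    if command -v fusermount >/dev/null 2>&1; then
        fusermount -u "$mount_point"
    else
        umount "$mount_point"
    fi
}
```

For example, unmount_fuse ~/mounts/example detaches the SSHFS mount from Listing 3. Remember to cd out of the mount point first, because a busy mount point cannot be unmounted.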
Another remote file system variant is called S3FS. Rather than use your own remote server and SFTP, S3FS mounts an Amazon Simple Storage Service (Amazon S3) bucket as a local file system. S3FS stores files "natively and transparently in S3," and you can enable local caching to minimize downloads. The maximum file size is 5 GB. As with SSHFS, you can download and build the source and use it right away.
Note: The S3FS package depends on the libcurl library development headers and the libxml2 library.
Listing 4 shows the commands to build and install S3FS.
Listing 4. Installing S3FS
```
$ wget http://s3fs.googlecode.com/files/s3fs-r191-source.tar.gz
$ tar xzf s3fs-r191-source.tar.gz
$ cd s3fs
$ make
$ sudo make install
```
Assuming that you already have an Amazon S3 account, which provides an access key and a secret access key, you can mount any bucket you own with a command like this:
```
$ s3fs bucket_name -o use_cache=/tmp -o accessKeyId=your_access_key \
  -o secretAccessKey=your_secret_access_key /mnt
```
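Passing the secret access key with -o exposes it to anyone who can run ps on the machine. Newer s3fs releases can instead read credentials from a password file, by default ~/.passwd-s3fs, that contains accessKeyId:secretAccessKey and is readable only by you. Whether the r191 build from Listing 4 supports this file is an assumption to check against its documentation; write_s3fs_passwd is a hypothetical helper.

```shell
#!/bin/sh
# write_s3fs_passwd (hypothetical helper): store S3 keys in a private
# password file instead of on the command line.
# ASSUMPTION: your s3fs build reads ~/.passwd-s3fs in the form
# accessKeyId:secretAccessKey (newer releases do; older ones may not).
write_s3fs_passwd() {
    key_id=$1
    secret=$2
    file=${3:-"$HOME/.passwd-s3fs"}

    # umask 077 inside a subshell creates a new file as owner-only
    # without changing the caller's umask.
    (umask 077 && printf '%s:%s\n' "$key_id" "$secret" > "$file") || return 1

    # An existing file keeps its old mode, so tighten it explicitly;
    # newer s3fs releases reject password files others can read.
    chmod 600 "$file"
}
```

With the file in place, the mount command shrinks to s3fs bucket_name /mnt -o use_cache=/tmp.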
With S3FS, you can keep an entire Git repository right on Amazon S3 or use
rsync to put backups on Amazon's storage system.
Peer into any archive
Another great use of FUSE is archivemount, a utility that mounts many forms of archives, including tarballs (tar archives, often gzip-compressed), as a local file system. Assuming that you have libarchive available on your system, you can build archivemount quickly, as Listing 5 shows.
Listing 5. Installing archivemount
```
$ wget http://www.cybernoia.de/software/archivemount/archivemount-0.6.1.tar.gz
$ tar xzf archivemount-0.6.1.tar.gz
$ cd archivemount-0.6.1
$ ./configure
$ make
$ sudo make install
```
As an example, you can use
archivemount to mount its
own tarball as a file system. Listing 6 shows the code.
Listing 6. Mounting a tarball as a file system
```
$ tar tfz archivemount-0.6.1.tar.gz
archivemount-0.6.1/
archivemount-0.6.1/README
...
archivemount-0.6.1/archivemount.c
archivemount-0.6.1/CHANGELOG
$ mkdir -p ~/mnt/tgz
$ archivemount archivemount-0.6.1.tar.gz ~/mnt/tgz
$ cd ~/mnt/tgz/archivemount-0.6.1
$ ls
aclocal.m4         archivemount.c  configure     depcomp      Makefile.in
archivemount.1     CHANGELOG       configure.ac  install-sh   missing
archivemount.1.in  config.h.in     COPYING       Makefile.am  README
```
As with SSHFS, archivemount provides seamless, local file system-like access to the tarball. By the way, if you modify, delete, or add any files in the tarball's "file system," all changes are saved to the original tarball when you unmount the FUSE file system. The command fusermount -u ~/mnt/tgz unmounts the FUSE file system from the given mount point. Listing 7 shows a new file being written into the tarball "file system."
Listing 7. Writing a file via archivemount
```
$ cd ~/mnt/tgz/archivemount-0.6.1
$ echo 'Remember this for later.' > NOTES
$ ls
aclocal.m4         CHANGELOG     COPYING      Makefile.in
archivemount.1     config.h.in   depcomp      missing
archivemount.1.in  configure     install-sh   NOTES
archivemount.c     configure.ac  Makefile.am  README
$ # Leave the mount point first: fusermount -u fails on a busy mount.
$ # (This assumes the tarball sits in your home directory.)
$ cd ~
$ fusermount -u ~/mnt/tgz
$ tar tfz archivemount-0.6.1.tar.gz
archivemount-0.6.1/
archivemount-0.6.1/README
...
archivemount-0.6.1/CHANGELOG
archivemount-0.6.1/NOTES
```
The final tar command in Listing 7 shows that NOTES was indeed added to the tarball.
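You can make that check scriptable. The sketch below lists the members of a gzip-compressed tarball with tar tzf and looks for an exact path, so you can confirm that a file written through archivemount survived the unmount; archive_has_member is a hypothetical helper name.

```shell
#!/bin/sh
# archive_has_member (hypothetical helper): succeed if a gzip-compressed
# tarball contains the given member path. grep -F matches literally,
# -x requires the whole line to match, and -q suppresses output.
archive_has_member() {
    archive=$1
    member=$2
    tar tzf "$archive" | grep -Fqx -- "$member"
}
```

For the example above: archive_has_member archivemount-0.6.1.tar.gz archivemount-0.6.1/NOTES && echo 'NOTES saved'.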
One caveat about expanding archives with archivemount: The author of the software warns against writing into an archive, because if the write operation fails, all changes are lost. In general, though, a small number of write operations work fine. Nonetheless, you'll likely find the read-only features of archivemount the most useful.
If you want to review what's mounted via FUSE, simply run the typical mount command and look for file systems marked fuse, as shown in Listing 8.
Listing 8. FUSE-mounted file systems
```
$ mount
/dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
...
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
archivemount on /home/strike/mnt/tgz type fuse.archivemount (rw,nosuid,nodev,user=strike)
```
The testbed for this article was Ubuntu version 10 running on a Dell® desktop PC.
In the mount transcript in Listing 8, you can see the archivemount file system at the bottom and, at the top, the devices and virtual file systems that a typical Linux machine mounts.
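Scanning the full mount listing by eye gets tedious on a busy machine. The following sketch filters mount output down to FUSE mount points, assuming the Linux output format shown in Listing 8 and mount points without spaces. It matches the types fuse and fuse.NAME but not fusectl, which is FUSE's control file system rather than a mounted FUSE file system; fuse_mount_points is a hypothetical name.

```shell
#!/bin/sh
# fuse_mount_points (hypothetical helper): read mount(8) output in the
# form "DEVICE on MOUNT_POINT type TYPE (OPTIONS)" from standard input
# and print the mount point of every FUSE file system. Assumes no
# spaces in device names or mount points.
fuse_mount_points() {
    awk '$2 == "on" && $4 == "type" && ($5 == "fuse" || $5 ~ /^fuse\./) { print $3 }'
}
```

Run it as mount | fuse_mount_points; for the lines shown in Listing 8, it prints /home/strike/mnt/tgz.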
A file system for your secrets
Based on the previous examples, you can imagine that FUSE can wrap many networked services and file formats to provide seamless access to remote or packaged files. Indeed, there are many working implementations. For example, one FUSE file system mounts Wikipedia as a local file system, where you can edit articles locally. Other packages provide much the same features for the Blogger, Flickr, and Google Mail services.
Another use of FUSE is mirroring, or maintaining copies of files in parallel.
One canonical example is the FUSE-centric Encrypted File System (EncFS). Given a
source directory and a target directory, all files written to the source directory are
automatically encrypted and saved to the target directory.
The rlog library, a flexible message logging library, is a prerequisite for EncFS. Install it from source or from your distribution's repository, and then continue to build EncFS, as shown in Listing 9.
Listing 9. Installing EncFS
```
$ wget http://encfs.googlecode.com/files/encfs-1.6-1.tgz
$ tar xzf encfs-1.6-1.tgz
$ cd encfs-1.6-1
$ ./configure
$ make
$ sudo make install
```
After the package is installed, you are ready to go. You must create two new directories: one for the original files and one for the encrypted files. The encfs utility maps the two together, as shown in Listing 10.
Listing 10. Creating the directory tandem for encryption
```
$ mkdir ~/data
$ mkdir ~/encrypted_data
$ encfs ~/encrypted_data ~/data
Creating new encrypted volume.
Please choose from one of the following options:
 enter "x" for expert configuration mode,
 enter "p" for preconfigured paranoia mode,
 anything else, or an empty line will select standard mode.
?> p

Paranoia configuration selected.

Configuration finished. The file system to be created has
the following properties:
Filesystem cipher: "ssl/aes", version 2:2:1
Filename encoding: "nameio/block", version 3:0:1
Key Size: 256 bits
Block Size: 1024 bytes, including 8 byte MAC header
Each file contains 8 byte header with unique IV data.
Filenames encoded using IV chaining mode.
File data IV is chained to filename IV.
File holes passed through to ciphertext.

-------------------------- WARNING --------------------------
The external initialization-vector chaining option has been
enabled. This option disables the use of hard links on the
file system. Without hard links, some programs may not work.
The programs 'mutt' and 'procmail' are known to fail. For
more information, please see the encfs mailing list.
If you would like to choose another configuration setting,
please press CTRL-C now to abort and start over.

Now you will need to enter a password for your file system.
You will need to remember this password, as there is absolutely
no recovery mechanism. However, the password can be changed
later using encfsctl.

New Encfs Password:
Verify Encfs Password:
```
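In general, you can choose "paranoia" mode, unless you rely on hard links, which that mode disables (see the warning in Listing 10). Next, type a password to protect your files and confirm it by entering it again. You can then work in ~/data as usual; as Listing 11 shows, EncFS encrypts each new file automatically.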
Listing 11. EncFS automatically encrypts new files
```
$ cd ~/data
$ vi secrets.txt
[Create file with your secrets.]
$ ls
secrets.txt
$ ls ~/encrypted_data
LKs2bi5sfttNlyExybM6eNck
```
Each file that you create is reflected in the encrypted directory, and even the name of the file is encrypted. When you're finished, simply unmount the unencrypted directory, as shown in Listing 12. The plaintext view of your files disappears, leaving only the encrypted files.
Listing 12. Unmounting the encrypted directory
```
$ fusermount -u ~/data
$ ls ~/data
$ ls ~/encrypted_data
LKs2bi5sfttNlyExybM6eNck
```
To view your unencrypted files, connect a mount point to the previously encrypted set of files. Type your password when prompted to authenticate yourself:
```
$ encfs ~/encrypted_data ~/data
EncFS Password:
$ ls ~/data
secrets.txt
```
Once the directory is remounted, your original files are available and viewable as plain text. Any change to a file causes it to be re-encrypted.
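One hazard deserves a check: if EncFS is not mounted, ~/data is an ordinary directory, and anything you save there is stored as plain text. A minimal sketch of a guard follows; it reads Linux mount output and succeeds only if the given directory is a FUSE mount. It assumes absolute paths without spaces, and is_fuse_mounted is a hypothetical name.

```shell
#!/bin/sh
# is_fuse_mounted (hypothetical helper): read mount(8) output in the
# form "DEVICE on DIR type TYPE (OPTIONS)" from standard input and
# succeed only if DIR (an absolute path) is mounted with type "fuse"
# or "fuse.NAME". Assumes no spaces in paths.
is_fuse_mounted() {
    dir=$1
    awk -v dir="$dir" '
        $2 == "on" && $3 == dir && ($5 == "fuse" || $5 ~ /^fuse\./) { found = 1 }
        END { exit !found }'
}
```

For example, mount | is_fuse_mounted "$HOME/data" && vi ~/data/secrets.txt opens the editor only while the encrypted view is mounted.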
FUSE: It's not a toy!
Although the previous examples are fairly simple, do not underestimate the power of FUSE. Some high-powered file storage solutions also run in user space. For example, ZFS-Fuse is an implementation of ZFS in user space. MooseFS is a fault-tolerant, network-distributed file system. It spreads data over several physical servers, but the user treats the amalgam as one UNIX-like file system resource. And GlusterFS is a high-performance, distributed file system based on FUSE.
Describing a GlusterFS deployment is beyond the scope of this article, but a brief introduction should demonstrate the capabilities that you can build with FUSE, eschewing kernel modifications and patches. Specifically, GlusterFS can:
- Span any number of machines.
- Aggregate disparate, heterogeneous storage devices and physical file systems—what GlusterFS refers to as bricks—into a unified pool or namespace.
- Assemble file system features—what GlusterFS calls translators—in an ad hoc fashion to build customized storage solutions. One translator provides replication, while another implements connectivity via TCP/IP.
- Boost performance with read-ahead, write-behind, and a number of other tricks.
GlusterFS version 3 greatly simplifies initial setup of the software, which is only a little more complicated than, say, establishing SSH keys across a subnetwork of systems. But better yet, GlusterFS requires no kernel updates or special customization. Because it is written with FUSE, it operates outside of the milieu of the kernel and can be extended by any developer.
Light the FUSE!
FUSE is an incredible technology. With it, you can develop new file systems in any number
of popular programming languages, including Python, Ruby, Lua,
the Java™ language, and more. Moreover, creating and deploying a new file
system does not mandate changes to the kernel of all running systems. You can learn
more about how FUSE works in tandem with the kernel in another developerWorks
article titled "Develop
your own filesystem with FUSE: No kernel programming required". Refer to the
Related topics section for additional reading material and links
to the projects mentioned here.
A great number of FUSE file systems are now available, and more appear each day. ZFS-Fuse and GlusterFS are two examples of very sophisticated solutions, yet archivemount and SSHFS are no less useful. Each FUSE file system addresses a particular requirement. If you have special demands, you might consider cooking up your own file system with FUSE.

Related topics
- Speaking UNIX: Check out other parts in this series.
- Zettabyte file system: Read more about the fantastic ZFS, originally created by Sun and now maintained by Oracle.
- Network file system: Learn more about NFS and its long history. The original version of NFS dates back to 1989.
- Journaled file system: Many file systems provide journaling to prevent loss of data. The original JFS was created by IBM in 1990.
- B-tree file system: Wikipedia provides a good introduction to Btrfs and offers links to many other sources of code and technical information.
- MooseFS: MooseFS is a fault-tolerant, network-distributed file system. It works on any system with a FUSE implementation, including Mac OS X.
- libarchive: Download the source for libarchive from its repository on Google Code. libarchive is required to build and use archivemount.
- archivemount: Download the source code for archivemount from its project page. The software mounts archive files as a local file system.