Speaking UNIX

Introduction to emerging file systems



It's an old adage but nonetheless accurate: "In UNIX®, everything is a file". Conceptually, every UNIX resource is a simple file that you can open, read from, write to, or both. Your experiment data is a file, as is your shell startup script, the UNIX kernel, your home directory, each network socket, and the /bin/ls executable, among others.

However, not every file is identical. Your experiment data may be a highly customized database. A shell script is a text file, albeit with one important distinction: The leading #! (often called the shebang) line dictates which application interprets the file. The kernel is a binary, as is each executable, both with predictable, specific formats that system tools can manipulate. A directory is a specialized index for cataloging contents.

Further, not every file is stored in the same way. The structure and size of a file depend on the underlying file system—the subsystem that persists the file data to a physical device. The same data—say, two copies of the same file—is organized differently on disparate file systems. Each file system dictates its own storage strategy, which may be designed to meet a specific criterion. For example, one file system may be optimized for speed, another for efficient use of space, and yet another for durability against data loss or corruption. Typically, there is no right or wrong file system to deploy: You must analyze your storage needs and pick the file system or combination of file systems that suits your purpose. UNIX (and a battery of vendors) offers a great number of file systems, so you're sure to find something advantageous.

Table 1 lists just a few file systems available for UNIX (and in many cases, Linux®).

Table 1. UNIX file systems
  • Zettabyte file system (ZFS): An essentially infinite file system (a zettabyte is 2^70 bytes) that you can construct from common, off-the-shelf storage media. ZFS can also compress data on write, maximizing media even further.
  • Network file system (NFS): A stalwart feature of UNIX, NFS makes remote file systems appear to be local. NFS is ideal for sharing data.
  • Journaling file system (JFS): JFS, among many other alternatives, retains a log, or journal, of changes made to each file. In the event of a crash or corruption, the journal is "replayed" to restore the file.
  • B-tree file system (Btrfs): One of the newest file systems and intended to be the default file system for upcoming versions of Linux, Btrfs maintains data and metadata checksums to bulletproof persistence. Btrfs also provides snapshots and can mix devices of almost any size.

A file system of your own

Commonly, a file system requires media to be tailored to it, so changing from one file system to another typically entails a backup, formatting physical drives, and a lengthy restore and validation process. Obviously, such changes are radical and are rightly the sole purview of the systems administrator.

However, a recent extension to many UNIX and Linux kernels allows any user to create and use a variety of file systems with no special privileges. Dubbed FUSE, short for Filesystem in Userspace, the software lets you create new file systems with relatively little effort. File systems derived from FUSE can mount remote directories via SFTP; automatically scan files for viruses; and treat a collection of read-only CD-ROM discs as local, resident data files.

Let's install FUSE on a common operating system platform and explore some of the novel file systems now available to any user. Modern UNIX and Linux kernels typically include support for FUSE, so building the software is similar to building any utility code. You must download the package, unpack it, run a configuration script to detect the capabilities of your system, and build and install the code. Listing 1 shows the code for installing FUSE.

Listing 1. Installing FUSE
$ wget\
$ tar xzf fuse-2.8.4.tar.gz
$ cd fuse-2.8.4

$ ./configure
$ make
$ sudo make install

FUSE version 2.8.4 was the latest version at publication time. Be sure to download the latest and greatest code from the FUSE project home page (see Related topics for a link). After compilation, the FUSE package installs a number of libraries required to build new FUSE file systems and an application named fusermount to mount and unmount FUSE file systems. Fusermount has a few special options but otherwise accepts options you would typically provide to the vanilla mount command.
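The mount/unmount cycle is the same for every FUSE file system, so it is worth sketching once up front. In the sketch below, the mount point and the commented helper invocation are placeholders, not commands from this article:

```shell
# Create a mount point, then mount any FUSE file system onto it; each FUSE
# file system ships its own mount helper (sshfs, archivemount, and so on):
mkdir -p "$HOME/mounts/example"
#   sshfs remote:/dir "$HOME/mounts/example" -o ro    # example helper invocation
# Unmount with fusermount; the -u flag mirrors the umount command:
fusermount -u "$HOME/mounts/example" 2>/dev/null || echo "nothing mounted"
```

Because the mount point is an ordinary directory, nothing special is needed to prepare it, and it remains a plain, empty directory again after unmounting.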

Do-it-yourself NFS

With the foundational code installed, you can continue to install a FUSE file system. The first FUSE file system to try is SSHFS, a file system based on SFTP. Via SSHFS, you can mount any remote directory as a local file system as long as your server supports SSH, which most do. Before you can continue, ensure that your system has Glib 2.0 and Gthread 2.0. If your machine has GNOME, you likely have these libraries; otherwise, install them from source or via your operating system's native package manager. (Debian systems have Aptitude; Red Hat systems provide rpm and yum; SUSE systems use YaST.) Listing 2 shows the code to set up SSHFS.

Listing 2. Installing SSHFS
$ wget\
$ tar xzf sshfs-fuse-2.2.tar.gz
$ cd sshfs-fuse-2.2
$ ./configure
$ make
$ sudo make install

If you run Ubuntu or another variant of UNIX or Linux, your distribution may provide pre-built binaries of all software packages. For example, on Ubuntu, you can install all the FUSE software listed so far with the command:

$ apt-get install libglib2.0-dev fuse-utils libfuse2 sshfs

If your dependencies are up to date, you may also be able to install SSHFS with the simple command apt-get install sshfs.

When the FUSE libraries and SSHFS are installed, you can mount any remote directory you can access via SSH as a local file system (see Listing 3). Combining ssh and mount, you provide the name of the remote system and your login credentials, the remote directory you want to mount, and a local mount point, which can be any local directory.

Listing 3. Mount a remote SSH-accessible directory as a file system
$ ssh ls
bin            Documents  lib      Media   Pictures  Sites   tmp
Desktop        Downloads  Library  Movies  projects  Source
Documentation  Dropbox    local    Music   Public    src

$ cd ~
$ mkdir -p mounts/example
$ sshfs mounts/example
$ cd mounts/example
$ ls
bin            Documents  lib      Media   Pictures  Sites   tmp
Desktop        Downloads  Library  Movies  projects  Source
Documentation  Dropbox    local    Music   Public    src

As you can see, the contents of the remote directory are now available as readily as local files. If you're a developer and use multiple systems for your work, SSHFS makes local commands—think cp, make, and others—operate on remote files.

To unmount a FUSE file system, you can type fusermount -u /some/mount/point. Or, because FUSE file systems behave just like normal file systems, you can type umount /some/mount/point.

Another remote file system variant is S3FS. Rather than using your own remote server and SFTP, S3FS mounts an Amazon Simple Storage Service (Amazon S3) bucket as a local file system. S3FS stores files "natively and transparently in S3," and you can enable local caching to minimize downloads. The maximum file size is 5GB. As with SSHFS, you can download and build the source and use it right away.

Note: The S3FS package depends on the cURL library development headers and the LibXML2 library.

Listing 4 shows the code to install S3FS.

Listing 4. Installing S3FS
$ wget
$ tar xzf s3fs-r191-source.tar.gz
$ cd s3fs
$ make
$ sudo make install

Assuming that you already have an Amazon S3 account, which provides an access key and a secret access key, you can mount any bucket you own with a command like this:

$ s3fs bucket_name -ouse_cache=/tmp -o accessKeyId=your_access_key \
  -o secretAccessKey=your_secret_access_key /mnt
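Keys passed with -o are visible to other users in the process list, so s3fs can also read credentials from a password file. The ~/.passwd-s3fs name and key:secret format below follow the s3fs documentation; the keys themselves are placeholders:

```shell
# Store the credentials in a file instead of on the command line:
echo 'your_access_key:your_secret_access_key' > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs      # s3fs rejects credentials readable by others
# s3fs bucket_name /mnt -o use_cache=/tmp   # now no keys appear in ps output
```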

With S3FS, you can keep an entire Git repository right on Amazon S3 or use rsync to put backups on Amazon's storage system.
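The backup idea can be sketched as follows. On a real system, DEST would be the s3fs mount point; here, both paths are scratch directories so you can exercise the pattern without an S3 account:

```shell
# Mirror a project tree into a destination; with s3fs mounted at DEST this
# becomes an off-site backup. rsync copies only what changed between runs.
SRC=$(mktemp -d); DEST=$(mktemp -d)       # stand-ins for ~/projects and /mnt
echo "important data" > "$SRC/notes.txt"
if command -v rsync >/dev/null; then
    rsync -a --delete "$SRC/" "$DEST/"    # -a keeps metadata; --delete mirrors removals
else
    cp -a "$SRC/." "$DEST/"               # one-shot fallback for this sketch
fi
ls "$DEST"                                # → notes.txt
```

Run from cron, the same two lines give you an incremental nightly backup onto Amazon's storage.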

Peer into any archive

Another great use of FUSE is archivemount, a utility that mounts many forms of archives, including tarballs (gzip-compressed tar archives), as a local file system. Assuming that you have libarchive available on your system, you can build archivemount quickly, as Listing 5 shows.

Listing 5. Installing archivemount
$ wget
$ tar xzf archivemount-0.6.1.tar.gz
$ cd archivemount-0.6.1
$ ./configure
$ make
$ sudo make install

As an example, you can use archivemount to mount its own tarball as a file system. Listing 6 shows the code.

Listing 6. Mounting a tarball as a file system
$ tar tfz archivemount-0.6.1.tar.gz

$ mkdir -p ~/mnt/tgz
$ archivemount archivemount-0.6.1.tar.gz ~/mnt/tgz
$ cd ~/mnt/tgz/archivemount-0.6.1
$ ls
CHANGELOG       aclocal.m4      archivemount.c  depcomp
COPYING         archivemount.1  configure       install-sh
README          missing

As before with SSHFS, archivemount provides seamless, local file system-like access to the tarball. By the way, if you modify, delete, or add any files to the "file system" of the tarball, all changes are saved to the original tarball when you unmount the FUSE file system. The command fusermount -u ~/mnt/tgz unmounts the FUSE file system from the given mount point. Listing 7 provides a look into a tarball "file system."

Listing 7. Writing a file via archivemount
$ cd ~/mnt/tgz/archivemount-0.6.1
$ echo 'Remember this for later.' > NOTES
$ ls
CHANGELOG       aclocal.m4      archivemount.c  depcomp
COPYING         archivemount.1  configure       install-sh
NOTES           README          missing
$ fusermount -u ~/mnt/tgz

$ tar tfz archivemount-0.6.1.tar.gz

The final tar command in Listing 7 confirms that NOTES was indeed added to the tarball. One caveat about writing into archives with archivemount: The author of the software warns that if a Write operation fails, all changes to the archive are lost. In general, though, a small number of Write operations works fine. Nonetheless, you'll likely find the read-only features of archivemount the most compelling.
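Given that caveat, a defensive habit is to mount a copy of the archive rather than the original. The sketch below builds a scratch tarball so the pattern can be tried even without archivemount installed; the commented lines show where the real mount would go:

```shell
# Work on a copy so a failed write inside the mount never corrupts the original.
WORK=$(mktemp -d)
echo "v1" > "$WORK/file.txt"
tar czf "$WORK/orig.tar.gz" -C "$WORK" file.txt
cp "$WORK/orig.tar.gz" "$WORK/copy.tar.gz"    # mount the copy, not the original
# archivemount "$WORK/copy.tar.gz" ~/mnt/tgz  # edit inside the mount, then
# fusermount -u ~/mnt/tgz                     # changes land in copy.tar.gz only
tar tzf "$WORK/copy.tar.gz"                   # → file.txt
```

If the write-back succeeds, simply replace the original with the copy; if it fails, the original archive is untouched.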

If you want to review what's mounted via FUSE, simply run the typical mount command and look for systems marked fuse, as shown in Listing 8.

Listing 8. FUSE-mounted file systems
$ mount
/dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
archivemount on /home/strike/mnt/tgz type fuse.archivemount (rw,nosuid,nodev,user=strike)

The testbed for this article was Ubuntu version 10 running on a Dell® desktop PC. In the mount transcript in Listing 8, you can see the archivemount file system at the bottom and the typical physical devices mounted by a Linux machine at the top.

A file system for your secrets

Based on the previous two examples, you can imagine that FUSE can wrap many networked services and file formats to provide seamless access to remote or packaged files. Indeed, there are many working implementations. WikipediaFS mounts Wikipedia as a local file system, where you can edit articles locally. Other packages provide much the same features for the Blogger, Flickr, and Google Mail services.

Another use of FUSE is mirroring, or maintaining copies of files in parallel. One canonical example is the FUSE-based Encrypted File System (EncFS). Given a backing directory and a mount point, every file written to the mount point is automatically encrypted and saved to the backing directory. librlog, a flexible message-logging library, is a prerequisite for EncFS. Install it from source or from your distribution's repository, and then continue to build EncFS, as shown in Listing 9.

Listing 9. Installing EncFS
$ wget
$ tar xzf encfs-1.6-1.tgz
$ cd encfs-1.6-1
$ ./configure
$ make
$ sudo make install

After the package is installed, you are ready to go. You must create two new directories—one for the original files and one for the encrypted files. The encfs utility maps the two together, as shown in Listing 10.

Listing 10. Creating the directory tandem for encryption
$ mkdir ~/data
$ mkdir ~/encrypted_data
$ encfs ~/encrypted_data ~/data
Creating new encrypted volume.
Please choose from one of the following options:
 enter "x" for expert configuration mode,
 enter "p" for preconfigured paranoia mode,
 anything else, or an empty line will select standard mode.
?> p

Paranoia configuration selected.

Configuration finished.  The file system to be created has
the following properties:
Filesystem cipher: "ssl/aes", version 2:2:1
Filename encoding: "nameio/block", version 3:0:1
Key Size: 256 bits
Block Size: 1024 bytes, including 8 byte MAC header
Each file contains 8 byte header with unique IV data.
Filenames encoded using IV chaining mode.
File data IV is chained to filename IV.
File holes passed through to ciphertext.

-------------------------- WARNING --------------------------
The external initialization-vector chaining option has been
enabled.  This option disables the use of hard links on the
file system. Without hard links, some programs may not work.
The programs 'mutt' and 'procmail' are known to fail.  For
more information, please see the encfs mailing list.
If you would like to choose another configuration setting,
please press CTRL-C now to abort and start over.

Now you will need to enter a password for your file system.
You will need to remember this password, as there is absolutely
no recovery mechanism.  However, the password can be changed
later using encfsctl.

New Encfs Password:
Verify Encfs Password:

In general, you can choose "paranoia" mode. Next, type a password to protect your files and confirm the password by entering it again. Listing 11 shows the code.

Listing 11. EncFS automatically encrypts new files
$ cd ~/data
$ vi secrets.txt
[Create file with your secrets.]
$ ls
$ ls ~/encrypted_data

Each file that you create is reflected in the encrypted directory, and even the name of the file is encrypted. When you're finished, simply unmount the unencrypted directory, as shown in Listing 12. The plaintext view disappears, leaving only the encrypted files behind.

Listing 12. Unmounting the encrypted directory
$ fusermount -u ~/data
$ ls ~/data
No such file or directory
$ ls ~/encrypted_data

To view your unencrypted files, connect a mount point to the previously encrypted set of files. Type your password when prompted to authenticate yourself:

$ encfs ~/encrypted_data ~/data
EncFS Password:

$ ls ~/data

Once remounted, your original files are available and viewable as plain text. Any change to a file causes it to be re-encoded.

FUSE: It's not a toy!

Although the previous examples are fairly simple, do not underestimate the power of FUSE. Some high-powered file storage solutions also run in user space. For example, ZFS-Fuse is an implementation of ZFS in user space. MooseFS is a fault-tolerant, network-distributed file system. It spreads data over several physical servers, but the user treats the amalgam as one UNIX-like file system resource. And GlusterFS is a high-performance, distributed file system based on FUSE.

Describing a GlusterFS deployment is beyond the scope of this article, but a brief introduction should demonstrate the capabilities that you can build with FUSE, eschewing kernel modifications and patches. Specifically, GlusterFS can:

  • Span any number of machines.
  • Aggregate disparate, heterogeneous storage devices and physical file systems—what GlusterFS refers to as bricks—into a unified pool or namespace.
  • Assemble file system features—what GlusterFS calls translators—in an ad hoc fashion to build customized storage solutions. One translator provides replication, while another implements connectivity via TCP/IP.
  • Boost performance with read-ahead, write-behind, and a number of other tricks.

GlusterFS version 3 greatly simplifies initial setup of the software, which is only a little more complicated than, say, establishing SSH keys across a subnetwork of systems. But better yet, GlusterFS requires no kernel updates or special customization. Because it is written with FUSE, it operates outside of the milieu of the kernel and can be extended by any developer.

Light the FUSE!

FUSE is an incredible technology. With it, you can develop new file systems in any number of popular programming languages, including Python, Ruby, Lua, C/C++, the Java™ language, and more. Moreover, creating and deploying a new file system does not mandate changes to the kernel of all running systems. You can learn more about how FUSE works in tandem with the kernel in another developerWorks article titled "Develop your own filesystem with FUSE: No kernel programming required". Refer to the Related topics section for additional reading material and links to the projects mentioned here.

A great number of FUSE file systems are now available and more appear each day. ZFS-Fuse and GlusterFS are two examples of very sophisticated solutions, yet archivemount and SSHFS are no less useful. All of the FUSE file systems address a particular requirement. If you have special demands, you might consider cooking up your own file system with FUSE and a little bit of code.


Related topics

  • Speaking UNIX: Check out other parts in this series.
  • Zettabyte file system: Read more about the fantastic ZFS, originally created by Sun and now maintained by Oracle.
  • Network file system: Learn more about NFS and its long history. The original version of NFS dates back to 1989.
  • Journaled file system: Many file systems provide journaling to prevent loss of data. The original JFS was created by IBM in 1990.
  • B-tree file system: Wikipedia provides a good introduction to Btrfs and offers links to many other sources of code and technical information.
  • MooseFS: MooseFS is a fault-tolerant, network-distributed file system. It works on any system with a FUSE implementation, including Mac OS X.
  • libarchive: Download the source for libarchive from its repository on Google Code. libarchive is required to build and use archivemount.
  • archivemount: Download the source code for archivemount from its project page. The software mounts archive files as a local file system.


