Put virtual filesystems to work
VFS abstraction proves surprisingly effective in practice
This content is part of the series: Server clinic
"You won't understand how useful it is until you've tried it for yourself." When someone offers me that line, my usual reaction is to think the speaker doesn't understand the feature in question well enough to explain it.
In the case of virtual filesystems, though, my own experience tells me that the speaker is right. Jeffrey Hobbs is a Senior Developer with ActiveState Corp., and both of us have seen how unexpectedly powerful working with a virtual filesystem (VFS) can be.
Simple idea, big consequences
The idea behind a VFS is simple: it represents as a filesystem something that is not a filesystem. Filesystem here means a "conventional Linux-like filesystem": a tree or hierarchy of directly accessible directories and (ordinary) files. The concept should intrigue anyone working with Linux, of course, simply because so much of Linux's own character comes from the representation of devices, tables, and other objects within the UNIX filesystem. UNIX is founded on the principle that everything, or at least plenty of things, are files; VFS generalizes this to view as much as possible as a filesystem.
Note: Linux kernel engineers also speak of VFS, but in a different sense. This month's column is not about the Linux virtual filesystem switch, which dispatches filesystem drivers for ext2, ext3, reiserfs, and so on.
One way to think about VFS is that it's another example of a technology or concept that is "blurring the line between OSes and high-level language environments," in the words of independent developer Jean-Claude Wippler. Among other things, exposure in an application development language of a "system service" makes portability easier, for the operating system simply vanishes from view.
So, what kinds of things aren't filesystems, but are useful to represent that way? Lots of them: files accessible through FTP, HTTP, WebDAV, or other networking protocols; the contents of a .zip, CVS (concurrent versions system), or other archive file; database tables; projections of real filesystems restricted by security or other constraints; and many more.
You can easily see how such resources will map onto filesystems in a natural way. Suppose an example.zip archives these files:
This archive was probably created as the direct (partial) image of an existing filesystem tree, and is naturally represented by the rooted tree:
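The mapping from a flat archive listing to a rooted tree can be sketched in a few lines. This is only an illustration in Python, using the standard zipfile module; the member names are hypothetical stand-ins, not the contents of the example.zip discussed above.

```python
import io
import zipfile

# Hypothetical member names, chosen only for illustration.
MEMBERS = ["docs/readme.txt", "docs/guide/intro.txt", "src/main.tcl"]

def build_archive(names):
    """Create an in-memory ZIP whose members have the given names."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "contents of " + name)
    buffer.seek(0)
    return buffer

def as_tree(archive):
    """Turn the flat member list of a ZIP into a nested dict (a rooted tree).

    Directories map to dicts; ordinary files map to None.
    """
    root = {}
    for name in archive.namelist():
        parts = [part for part in name.split("/") if part]
        if not parts:
            continue
        node = root
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        if name.endswith("/"):
            # Explicit directory entry.
            node.setdefault(parts[-1], {})
        else:
            node[parts[-1]] = None
    return root

with zipfile.ZipFile(build_archive(MEMBERS)) as archive:
    tree = as_tree(archive)
```

Once the archive has this shape, "list a directory" and "open a file" have obvious meanings, which is all a VFS driver needs to supply.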
Plenty of industrial-strength products rely on VFS. The architecture of such DB2 features as its journalling filesystem rests on a VFS model. The well-known Zope application server provides a slightly more challenging example of a VFS. Zope's "acquisition" concept maps programmatic objects to URLs. In Zope, invocation of a method such as:
context.myproject.object1.method1(year = "1999")
corresponds to an HTTP request for the URL:
Do you see the benefit? VFS is also like UNIX's "everything is a file" concept, in that it's easy to understand and imitate the idea, but it might take years to appreciate how much it simplifies application design.
Consider an example. Suppose you have written a text editor; it provides means for accessing individual files, reading them, modifying them, and writing them back to storage. If you drop in a filesystem "virtualizer," you suddenly can use all the same code to navigate an FTP or ZIP archive, select individual items, modify them, and save them. Do you have a browser or backup utility or security scanner or version-control system that serves you well when it operates on local files? Virtualize its filesystem access and it immediately acts on .tar files, old tape reels, and corporate resources accessible only through a virtual private network (VPN). Vendors like to sell such add-ons for thousands of dollars. VFS gives them for free.
Or almost free. Programmers don't have anything new to learn; they just keep doing the same filesystem opens, closes, and so on they've always used. And that's the point: all the code looks as it always has. The only difficulty is that few language run-time libraries currently support full-blown VFS. Among the common difficulties is that drivers are often read-only, either because write capabilities require more delicate programming, or simply because write operations don't have a place in such protocols as HTTP.
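To make the editor argument concrete, here is a minimal sketch in Python. The class names LocalFS and ZipFS and the helper first_line_upper are hypothetical, invented for this illustration; they are not part of any library. The point is that the "editor" function is written once and never learns which backend it is talking to. The ZIP backend is deliberately read-only, like many of the drivers just described.

```python
import io
import os
import tempfile
import zipfile

class LocalFS:
    """An ordinary directory on disk."""
    def __init__(self, root):
        self.root = root
    def listdir(self):
        return sorted(os.listdir(self.root))
    def read(self, name):
        with open(os.path.join(self.root, name), encoding="utf-8") as handle:
            return handle.read()
    def write(self, name, text):
        with open(os.path.join(self.root, name), "w", encoding="utf-8") as handle:
            handle.write(text)

class ZipFS:
    """A read-only view of a ZIP archive with the same interface."""
    def __init__(self, file_or_path):
        self.archive = zipfile.ZipFile(file_or_path)
    def listdir(self):
        return sorted(self.archive.namelist())
    def read(self, name):
        return self.archive.read(name).decode("utf-8")
    def write(self, name, text):
        raise PermissionError("this sketch mounts ZIP archives read-only")

def first_line_upper(fs, name):
    """'Editor' logic: it only ever talks to the fs object."""
    return fs.read(name).splitlines()[0].upper()

# Build an in-memory archive to stand in for a real .zip file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("notes.txt", "hello from zip\n")
buffer.seek(0)
zipped = ZipFS(buffer)
zip_result = first_line_upper(zipped, "notes.txt")

# The same function works unchanged against a real directory.
with tempfile.TemporaryDirectory() as workdir:
    local = LocalFS(workdir)
    local.write("notes.txt", "hello from disk\nsecond line\n")
    local_result = first_line_upper(local, "notes.txt")
```

A language with VFS built into its core goes one step further than this sketch: the application calls the ordinary file routines, and no explicit fs object appears in the code at all.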
Who has VFS?
The language with the best support for VFS is Tcl, Hobbs' specialty at ActiveState. In other languages, including Java and Perl, existing implementations of VFS are "impure" in that they provide new methods such as Perl's vfsopen to supplement core library entry points. In Tcl's release 8.4, in contrast, "Tcl's filesystem is completely virtual filesystem aware," as the community page on VFS explains. Among other things, this means that virtual file resources can be used syntactically any place ordinary files are recognized. Classic Tcl allowed this:
image create photo -file myimage.gif
Tcl 8.4 extends this naturally to:
image create photo -file ftp://myserver.com/myimage.gif
Independent consultant Matt Newman first implemented a robust VFS for Tcl in the late 1990s to facilitate development he did on behalf of large financial corporations. Late in 2000, Vince Darley, a scientist with Eurobios, prepared an ambitious rewrite of Tcl's filesystem application programming interface (API) that, among other advantages, gives VFS hooks.
Most working Tcl application developers first apply VFS in terms of the examples above -- a slick FTP-savvy editor, for instance. It's interesting to note, though, that the primary motivation of Tcl's VFS pioneers, including Wippler (mentioned in a previous "Server clinic" column on application deployment, see Related topics), was in "inside-out" VFS. That is, rather than using VFS technology to access existing external resources, it manages a special-purpose database, internal to the application, as a complete filesystem. They've achieved remarkable simplifications in application development and deployment through this mechanism. In this perspective, VFS is a way to unify and consolidate different operating environments -- development, quality assurance, customer site -- so that applications operate consistently across all of them. Transparent filesystem generalization to outside files is just a bonus. Listing 1 shows a couple of code examples adapted from the Tcl community page that hint at the power of that bonus.
Listing 1. Using VFS in Tcl
package require vfs::urltype
vfs::urltype::Mount ftp
# With VFS activated, normal Tcl file commands
# can copy files even to and from FTP servers.
file copy ftp://foo.bar.com/pub/Readme .
file copy myfile ftp://user:email@example.com/private.txt

package require vfs::zip
vfs::zip::Mount foo.zip foo.zip
# foo.zip is now part of the normal filesystem hierarchy.
cd foo.zip
# Within subdir1 on foo.zip, list all items.
set listing [glob -dir subdir1 *]
Most languages have facilities for accessing FTP or ZIP. In that sense, VFS is like object orientation, high-level languages, or run-time libraries: it provides nothing new, and you can do in your own programming everything VFS does. With VFS, though, resource management is far easier and more rational. It's not a hard concept; it's just a good one.
VFS simplifies a range of interesting problems. Tcl programmers are currently working on such problems as:
- Centralized compressed archives of versioned dynamically-loadable libraries
- Intelligent network agents
- Advanced "mixed-media" backup managers
I've been experimenting with the programmability of "active filesystems." Think of these as analogies to the "properties" some languages support. Properties are attributes whose access can involve side effects. Writing:
result = thing.property
might not just retrieve the value of "property" from memory, but might involve a calculation that pulls data across a network, checks system status, and so on. Similarly, when I write:
file copy report.pdf backup
this doesn't just copy a file to a directory, but, depending on circumstances, generates the file and does other operations.
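The analogy can be sketched side by side in Python. Everything named here (Thing, status, ActiveFS, file_copy, report.txt) is hypothetical and exists only to illustrate the idea; it is not the active-filesystem code described in this column, and the destination is a plain dictionary standing in for a backup directory.

```python
class Thing:
    """An object whose attribute read triggers work instead of a memory fetch."""
    def __init__(self):
        self.lookups = 0

    @property
    def status(self):
        # Side effect: every read is counted, as a stand-in for a
        # network fetch or a system status check.
        self.lookups += 1
        return "checked %d time(s)" % self.lookups

class ActiveFS:
    """Maps file names to functions that produce the contents when read."""
    def __init__(self):
        self._generators = {}
        self.log = []

    def register(self, name, generator):
        self._generators[name] = generator

    def read(self, name):
        # The file does not exist until somebody asks for it.
        self.log.append("generated " + name)
        return self._generators[name]()

def file_copy(source_fs, name, destination):
    """Copy by reading from the (possibly active) source into a plain dict."""
    destination[name] = source_fs.read(name)

thing = Thing()
result = thing.status

reports = ActiveFS()
reports.register("report.txt", lambda: "total: %d" % sum([3, 4, 5]))
backup = {}
file_copy(reports, "report.txt", backup)
```

In both halves the caller writes an innocent-looking read, and the object behind it decides how much work that read implies.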
Property-oriented programming is recognized as hazardous for its density of side effects, and certainly it achieves nothing that can't be done, in principle, with explicit accessor methods. Still, it's a style that fits certain situations, and I'm finding that active filesystems are at least occasionally easier to code than the makefiles and scripts they replace.
Tcl is way ahead of other languages in its VFS sophistication; it has tackled difficult problems of character encoding, I/O architecture, and performance constraints that are only now becoming evident in other language communities. You don't have to work in Tcl, though, to use the idea of VFS. Would your own programming benefit from more uniformity in the way it accesses file-like resources? Most modern languages provide some way you can extend built-in constructs so you can code your I/O in a more consistent and powerful manner.
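As one example of what that can look like outside Tcl, here is a small Python sketch of a single entry point that dispatches on a URL-style scheme. It follows the "impure" style described earlier, since it adds a new function rather than changing the built-in open. The names vfs_open, register_scheme, and the mem scheme are assumptions made up for this illustration.

```python
import io
import os
import tempfile
from urllib.parse import urlsplit

_OPENERS = {}

def register_scheme(scheme, opener):
    """Associate a scheme such as 'mem' with a function returning a stream."""
    _OPENERS[scheme] = opener

def vfs_open(location):
    """Return a readable text stream for a plain path or a registered scheme."""
    parts = urlsplit(location)
    if parts.scheme in _OPENERS:
        return _OPENERS[parts.scheme](parts)
    # Anything unrecognized is treated as an ordinary local path.
    return open(location, encoding="utf-8")

# A toy in-memory "filesystem" registered under the hypothetical mem scheme.
MEMORY = {"/greeting.txt": "hello from memory\n"}
register_scheme("mem", lambda parts: io.StringIO(MEMORY[parts.path]))

def count_lines(location):
    """Application code: identical for local files and virtual resources."""
    with vfs_open(location) as stream:
        return sum(1 for _ in stream)

memory_lines = count_lines("mem://store/greeting.txt")

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "notes.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("first\nsecond\n")
    local_lines = count_lines(path)
```

The trade-off is the one noted above: callers must remember to use the new entry point, whereas a fully VFS-aware run-time makes the ordinary file commands do the dispatching.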
An impressively large number of programmers have contributed to Tcl's VFS capabilities, let alone those of other languages. Among those who particularly deserve mention besides the developers cited earlier are Andreas Kupries of ActiveState for his work with I/O channels, and Kevin Kenny of GE, who has studied "bootstrapping" issues that VFS raises.
Related topics
- Check out the other installments of Server clinic.
- The Wiki page on VFS is the best place to begin reading about Tcl's VFS capabilities.
- VFS is a major part of the technical infrastructure that gives Zope its renowned power.
- Apache provides VFS as a component of Jakarta.
- Starkit is the culmination of Wippler's research in VFS and related areas. A Starkit is an application that builds in interesting solutions to common problems of portability, persistence, and deployment.