All about pseudo, Part 1: Being root without being root

Pseudo make me a file system

The pseudo project provides the ability for non-root users to run software installations that might otherwise require root privileges, without actually endangering the stability of the host system. This article, first in a series, introduces the pseudo project and the reasons it exists.

Peter Seebach, member of technical staff, Wind River Systems

Peter Seebach has been messing about with source code since before it was fashionable. He has worked on everything from language standardization to mouse drivers.



10 May 2011

The announcement of the Yocto project (see Resources) refers to a component called pseudo that allows non-root users to run project builds that might otherwise require root privileges. The pseudo project grew out of internal build system needs of Wind River Linux but is now available as an open source project hosted on GitHub (see Resources).

This series looks at pseudo, from design to implementation, with some in-depth explanations of how it works. In this first installment, I explore some of the problems pseudo was intended to solve, and the reasons why we felt a new project was a better fit than working on one of the existing projects.

Why not fakeroot?

The most frequently asked question about pseudo is, "Why not fakeroot?" The existing fakeroot utility is certainly available as open source, it's in use in Debian, and perhaps most importantly, it is existing code. (Curious about fakeroot? See Resources.)

If a million forum posters are eagerly awaiting my long rant about how much I hate fakeroot, they will be sadly disappointed. The underlying problem is not a question of the code quality of fakeroot; it's a question of design choices and use cases. While in theory we could have retrofitted the kinds of features we wanted onto fakeroot, it seemed like a bad idea.

The most significant distinction is simply the lifespan of the data. Fakeroot is designed to handle a single build. You start a fakeroot daemon (faked), you run your build talking to that daemon, you complete your build, and you shut the daemon down. All the data that describes files is stored in memory during the build, and discarded when the daemon terminates. In the Wind River Linux build system, we were trying to use a persistent database that would survive for weeks or months, containing data about dozens of packages. While fakeroot had some features for preserving a file database, they were secondary, and several possible race conditions or other failure modes could leave the database lost or corrupted.

Pronounced "sudo"

Actually, no; we pronounce "pseudo" following the English pronunciation [soo-doh]. However, my original plan was to insist that we pronounce it as "sudo" [soo-doo]. This was to be my revenge against the people who named PostgreSQL, a product name I still have trouble remembering how to pronounce.

In our use case, one common failure mode was that build interruptions could result in faked still running. Since it saved its database only on exit, that meant the database wasn't saved. Our initial workaround was to set up a timeout after which faked would exit and save its database automatically; this turned out to create additional issues later.

The fakeroot design relies on running a server and then telling clients how to find the server. When you combine this with the server timeout feature, though, it's easy to have problems where the daemon exits during a long build, and then there's no way for the clients to talk to the (now-exited) server. We wanted a way for clients to restart the daemon when they needed it.

Another consideration is chroot(2) support. In earlier versions of our system we also used fakechroot (see Resources) in conjunction with fakeroot, and a locally developed fakepasswd library that allowed *pwent() calls to refer to a target file system password database, but we really wanted to converge on a single combined solution.

In short, the issue isn't that fakeroot was bad at doing what it did, but that what it did wasn't what we wanted done.


Design goals

Originally, we formed the design goals for pseudo around the needs of the Wind River Linux build system, which we call the Linux Distribution Assembly Tool (LDAT). We had a small list of key features:

  1. Crashed or failed servers go away "cleanly" — no spare System V shared memory segments to clean up.
  2. Crashes should not destroy previously available data in the database; it's permissible to fail to record a new file in the event of a crash, but previously recorded files should stay recorded.
  3. Crashes should not break an ongoing build — the build should, if possible, recover (restarting the server if needed).
  4. The database should have both device and inode numbers (more on this later) and file names in it whenever possible.
  5. Diagnostics should be clear, informative, and plentiful in the event of problems.
  6. Performance should be livable; slower than fakeroot would be acceptable, but it had to be possible to complete builds in a reasonable amount of time.
  7. Eventual inclusion of fakechroot and fakepasswd functionality. (This didn't make the initial release, but we've added it since.)
  8. There should be portability to all the Linux systems we support for our build system; anything beyond that would be nice, but not required.

The rest of this article looks at some of the design decisions we made in order to meet these goals.

The essential technique for using pseudo is the same as that used for fakeroot in its default use case. The pseudo client is a dynamic library; the library name is placed in the environment variable LD_PRELOAD, and the dynamic linker loads any library named there before loading any other libraries. This library provides its own implementations of the "emulated" system calls, such as chmod(2), chown(2), or stat(2). When the user application tries to call any of these functions (most system calls are implemented as functions in the system C library, which do the actual system call magic), the call is routed to the pseudo client library code, rather than to the underlying C library.

On the first call into any of these functions, the pseudo client library populates a table of function pointers that then point to the "real" (libc) implementations of these functions. These functions can then be called by pseudo within wrappers, as needed. This is a tricky bit of code, and the next article in this series explains it in more detail.
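The real interception happens in C through LD_PRELOAD. As a rough analogy only, the following Python sketch shows the shape of the idea: a table of "real" functions that is filled in on first use, and wrappers that consult recorded ownership before reporting what the real call returned. The class and function names here are hypothetical and are not part of pseudo.

```python
import os
import tempfile


class FakeOwnership:
    """Toy stand-in for the client library: wraps chown() and stat()."""

    def __init__(self):
        self._real = {}    # table of "real" functions, filled lazily
        self._owners = {}  # (st_dev, st_ino) -> (uid, gid) we pretend to have set

    def _resolve(self):
        # Populate the table on the first call into any wrapper.
        if not self._real:
            self._real["stat"] = os.stat

    def chown(self, path, uid, gid):
        self._resolve()
        st = self._real["stat"](path)
        # Record the requested owner instead of changing the file on disk.
        self._owners[(st.st_dev, st.st_ino)] = (uid, gid)

    def stat(self, path):
        self._resolve()
        st = self._real["stat"](path)
        # Prefer the recorded owner; fall back to what the disk says.
        uid, gid = self._owners.get((st.st_dev, st.st_ino), (st.st_uid, st.st_gid))
        return {"mode": st.st_mode, "uid": uid, "gid": gid}


def demo():
    fake = FakeOwnership()
    with tempfile.NamedTemporaryFile() as f:
        fake.chown(f.name, 0, 0)  # no privileges needed
        return fake.stat(f.name)
```

Calling demo() as an ordinary user reports the temporary file as owned by uid 0 and gid 0, even though nothing on disk changed.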

Right now, pseudo makes no attempt to handle statically linked programs, and thus far, it hasn't mattered.

Database stability

We needed a fast and reliable database, with decent lookups, good performance, and most importantly, something that we could integrate into our daemon process that did not require a separate installation and setup. After studying our intended usage and needs, we chose SQLite.

To the best of my knowledge, since we first started pseudo over two years ago there have been no bugs caused by problems with SQLite. The combination of stability, speed, availability, and licensing terms has been unbeatable. Yes, there are tasks for which SQLite is not a good fit, but for what we're doing, it's been perfect. Of all the design decisions I've made in the course of working on pseudo, "use SQLite" is the single most rewarding choice.

The locking strategy we came up with is to rely on having only a single server working with a given database, and for that server to fully serialize calls. This reduced a large number of possible problem cases, without a huge penalty on performance. (There have been some surprises, even with everything serialized; I'll talk about those in the third article in this series.)

The database is actually split into two databases: a file database that records the pseudo environment (files, ownership, modes), and a log database that records events. By separating these, we intended to reduce the performance impact of logging on queries of the file database; they are not merely separate tables, but two separate databases.

Even though we've cleaned it up somewhat, the database code is still full of ugly special cases and not-quite-successful attempts to introduce generality. However, it works. When the server starts up, it always checks to see whether its databases exist, and if they don't, it creates them. As a result of a major cleanup we did a year ago, it also has a second table in each database showing the current "version" of the database, with migrations to introduce new fields or flags that we've added since the original design.
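To make the create-on-startup and version-table idea concrete, here is a minimal sketch using Python's sqlite3 module. The table names, columns, and the two migrations are assumptions made up for illustration; they are not pseudo's actual schema.

```python
import sqlite3

# Hypothetical migrations: entry i upgrades the schema from version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE files (dev INTEGER, ino INTEGER, path TEXT, "
    "uid INTEGER, gid INTEGER, mode INTEGER)",
    "ALTER TABLE files ADD COLUMN rdev INTEGER",
]


def open_db(path):
    """Open a database, creating it if needed, and apply missing migrations."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS migrations (version INTEGER PRIMARY KEY)")
    row = db.execute("SELECT MAX(version) FROM migrations").fetchone()
    current = row[0] if row[0] is not None else 0
    for version in range(current, len(MIGRATIONS)):
        with db:  # commit the version record after each migration
            db.execute(MIGRATIONS[version])
            db.execute("INSERT INTO migrations (version) VALUES (?)", (version + 1,))
    return db


def schema_version(db):
    """Return the highest migration version recorded in the database."""
    return db.execute("SELECT MAX(version) FROM migrations").fetchone()[0]
```

Opening a fresh database runs both migrations; opening it again finds version 2 recorded and does nothing. The same helper could be called once for a file database and once for a log database, each with its own migration list.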

Device and inode numbers were not quite enough

The original fakeroot design identified files only by their device and inode numbers. (This pair should be, at any given time, unique for each file on a system.) In our early build system, a moderately recurring theme was a failure mode where, running under fakeroot, we would get odd failures; for example, an attempt to remove a plain file might fail with an error message to the effect that rmdir() had failed because the file was not a directory.

A usage error, not an inherent bug in fakeroot, caused this problem. A program running in the fakeroot environment would record a directory in the fakeroot database, and then a program running outside the fakeroot environment would remove it from the disk. When the inode number was later reused, fakeroot would mistakenly believe that the inode referred to a directory instead of a file.

In pseudo, we adopted multiple layers of defense against this. The first layer of defense is that queries to the database always include the file type and mode bits from the real file on disk, not just the device and inode. If the file type bits show a file and directory mismatch, the database entry is invalidated and a log message is produced.

The second layer is that we record paths for files whenever possible (and it is nearly always possible). This enables pseudo to report both the current path name and the previous path name used.

Together these two defenses allowed a third layer of defense in that, when running builds, we received warning diagnostics about such mismatches. Most of the time we could track down the error almost immediately, because we just had to look for where the old file path was deleted, and find out why it wasn't running inside the pseudo environment. Thanks to the combination of these three layers, mysterious database corruption issues that had been occurring somewhat frequently became rare enough to merit special attention.
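The first two layers can be sketched in a few lines. This is an illustration under assumptions, not pseudo's code: the schema is invented, and "invalidating" an entry is modeled here as deleting the row.

```python
import sqlite3
import stat


def make_db():
    """Create an in-memory file database keyed by device and inode."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (dev INTEGER, ino INTEGER, path TEXT, mode INTEGER, "
        "uid INTEGER, gid INTEGER, PRIMARY KEY (dev, ino))"
    )
    return db


def record(db, dev, ino, path, mode, uid, gid):
    """Store (or overwrite) the emulated ownership for one file."""
    with db:
        db.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?, ?)",
                   (dev, ino, path, mode, uid, gid))


def lookup(db, dev, ino, path, real_mode, log):
    """Return (uid, gid) for a file, or None if there is no trustworthy entry."""
    row = db.execute(
        "SELECT path, mode, uid, gid FROM files WHERE dev = ? AND ino = ?",
        (dev, ino)).fetchone()
    if row is None:
        return None
    old_path, old_mode, uid, gid = row
    if stat.S_IFMT(old_mode) != stat.S_IFMT(real_mode):
        # The inode now belongs to a different kind of file: the entry is stale.
        log.append(f"type mismatch for inode {ino}: was {old_path}, now {path}")
        with db:
            db.execute("DELETE FROM files WHERE dev = ? AND ino = ?", (dev, ino))
        return None
    return uid, gid
```

If a directory is recorded at inode 42 and a later lookup reports that inode 42 is now a regular file, the lookup returns None and the log names both the old and the new path, which is the diagnostic that makes the outside-the-environment deletion easy to find.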

Client and server communications

Client processes communicate with the pseudo daemon through a UNIX-domain socket (that's a socket in the file system) rather than using TCP or UDP. This solves two problems. First, it allows multiple pseudo daemons to coexist peacefully, and second, it makes it easy for clients to reliably find the server they're supposed to use.

In fact, instead of starting a server before starting any clients, by default you start clients that automatically launch a server if they need one and none is available. This change in design helped remove many lines of shell code that attempted to determine whether or not a faked process was available, and if so, whether it was the active process for a given build. With pseudo's design, it doesn't matter if there is an existing server; client processes will use the existing server if there is one, or start a new one if they need to.

Communications take the form of a standard message that encodes information about a query, or a response to a query. In all cases, the client sends a message to the server and gets a response. The client will wait for a response, but if the other end of the socket goes away, the client then tries again to start the server.
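The start-on-demand and retry behavior might look roughly like the following sketch. It assumes a Unix-like system, uses a thread in place of a real daemon, and invents a trivial message format; none of the names come from pseudo itself.

```python
import os
import socket
import tempfile
import threading


def start_server(sock_path):
    """Hypothetical stand-in for launching the daemon: answers one request."""
    if os.path.exists(sock_path):
        os.unlink(sock_path)  # remove a stale socket left by a dead server
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(sock_path)
    listener.listen(1)  # ready to accept before we return to the client

    def serve():
        conn, _ = listener.accept()
        with conn:
            conn.sendall(b"ack:" + conn.recv(1024))
        listener.close()  # this toy server then "goes away"

    threading.Thread(target=serve, daemon=True).start()


def request(sock_path, message, attempts=3):
    """Send one message; if no server answers, start one and try again."""
    for attempt in range(attempts):
        if attempt > 0:
            start_server(sock_path)
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            client.connect(sock_path)
            client.sendall(message)
            reply = client.recv(1024)
            if reply:
                return reply
        except OSError:
            pass  # no socket, connection refused, or connection dropped
        finally:
            client.close()
    raise RuntimeError("server unavailable")
```

The first request finds no socket and starts a server. Because this toy server exits after one reply, a second request meets a dead socket and restarts it, which mirrors the recovery path described above.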

Thus far, the biggest shortcoming of the current client/server design is that I inexplicably forgot to have a version number field in the interface. This only caused a real problem once (under extremely specific circumstances, which are described in the third article in this series), but it was still a serious mistake.

Recovering from server failures

One of the implications of the client library starting the server on demand is that the client library can also restart the server on demand, for example, if the server crashes. During early testing, I ran pseudo with the server modified to crash randomly on about one syscall in three, with the intent of carefully testing this functionality.

Some time later, while investigating some performance problems I was having with pseudo builds, I happened to check dmesg, and discovered that the pseudo daemon was being killed by segmentation faults. It turned out that I'd introduced (unintentionally, this time) a bug that was triggered by a common circumstance and caused the server to die with a segmentation fault. The bug had been there for at least two weeks, and I hadn't noticed. I feel that this shows the robustness of the design, although in retrospect, maybe a bit more logging would have been useful.


Why open source?

There was a brief discussion of whether we ought to try to make pseudo a closed-source product. The engineers who advocated creating pseudo wanted it to be open source, and the managers agreed with our arguments.

The fundamental reason that this is an open source project, and not a closed-source secret bit of technology, is that we have no interest in being in the business of selling or supporting it. Although this software has to work for our product to work, it's not our product, and we don't want it to be. Think about the power outlets in a hotel; you would have a hard time pitching a hotel room without working power to most modern travelers, but that doesn't mean you want to be in the business of metering and selling electricity.

Making pseudo available as open source lets other projects, such as Yocto (see Resources), use it. Developers who don't naturally think in terms of cooperation are likely to move elsewhere rather than working heavily with open source projects. If my first inclination weren't "release it as open source", I probably wouldn't be thinking about embedded Linux build systems in the first place.


Coming up next

In the next part of this series, I discuss in more technical detail how pseudo works and why we made some specific technical choices.

Note: The pseudo project is ongoing; therefore, development changes may occur following the publication of this article.

Resources

Learn

Get products and technologies

  • IBM trial software: Innovate your next open source development project using trial software, available for download or on DVD.

Discuss

  • developerWorks community: Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
