The announcement of the Yocto project (see Resources) refers to a component called pseudo that allows non-root users to run project builds that might otherwise require root privileges. The pseudo project grew out of internal build system needs of Wind River Linux but is now available as an open source project hosted on GitHub (see Resources).
This series looks at pseudo, from design to implementation, with some in-depth explanations of how it works. In this first installment, I explore some of the problems pseudo was intended to solve, and the reasons why we felt a new project was a better fit than working on one of the existing projects.
Why not fakeroot?
The most often asked question related to pseudo is, "Why not fakeroot?" The existing fakeroot utility is certainly available as open source, it's in use in Debian, and perhaps most importantly, it is existing code. (Curious about fakeroot? See Resources.)
If a million forum posters are eagerly awaiting my long rant about how much I hate fakeroot, they will be sadly disappointed. The underlying problem is not the code quality of fakeroot; it's a question of design choices and use cases. While in theory we could have retrofitted the features we wanted onto fakeroot, it seemed like a bad idea.
The most significant distinction is simply the lifespan of data. Fakeroot is designed to handle a single build. You start a fakeroot daemon (faked), you run your build talking to that daemon, you complete your build, and you shut the daemon down. All the data that describes files is stored in memory during the build and discarded when the daemon terminates. In the Wind River Linux build system, we were trying to use a persistent database that would survive for weeks or months, containing data about dozens of packages. While fakeroot had some features for preserving a file database, they were secondary, and there were several possible race conditions and other failure modes in which the database could be lost or corrupted.
In our use case, one common failure mode was that build interruptions could leave faked still running. Since it saved its database only on exit, that meant the database wasn't saved. Our initial workaround was to set up a timeout after which faked would exit and save its database automatically; this turned out to create additional issues later.
The fakeroot design relies on running a server and then telling clients how to find the server. When you combine this with the server timeout feature, though, it's easy to have problems where the daemon exits during a long build, and then there's no way for the clients to talk to the (now-exited) server. We wanted a way for clients to restart the daemon when they needed it.
Another consideration is chroot(2) support. In earlier versions of our system, we also used fakechroot (see Resources) in conjunction with fakeroot, along with a locally developed fakepasswd library that allowed *pwent() calls to refer to a target file system password database, but we really wanted to converge on a single combined tool.
In short, the issue isn't that fakeroot was bad at doing what it did, but that what it did wasn't what we wanted done.
Originally, we formed the design goals for pseudo around the needs of the Wind River Linux build system, which we call the Linux Distribution Assembly Tool (LDAT). We had a small list of key features:
- Crashed or failed servers go away "cleanly" — no spare System V shared memory segments to clean up.
- Crashes should not destroy previously available data in the database; it's permissible to fail to record a new file in the event of a crash, but previously recorded files should stay recorded.
- Crashes should not break an ongoing build — the build should, if possible, recover (restarting the server if needed).
- The database should have both device and inode numbers (more on this later) and file names in it whenever possible.
- Diagnostics should be clear, informative, and plentiful in the event of problems.
- Performance should be livable; slower than fakeroot would be acceptable, but it had to be possible to complete builds in a reasonable amount of time.
- Eventual inclusion of fakechroot and fakepasswd functionality. (This didn't make the initial release, but we've added it since.)
- There should be portability to all the Linux systems we support for our build system; anything beyond that would be nice, but not required.
The rest of this article looks at some of the design decisions we made in order to meet these goals.
Using the dynamic linker
The essential technique for using pseudo is the same as that used for fakeroot in its default use case. The pseudo client is a dynamic library; the library name is placed in the environment variable LD_PRELOAD, and the dynamic linker loads any library named there before loading any other libraries. This library provides its own implementations of the "emulated" system calls, such as stat(2). When the user application tries to call any of these functions (most system calls are implemented as functions in the system C library, which do the actual system call magic), the call is routed to the pseudo client library code rather than to the underlying C library.
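Seen from the launching side, the preload step amounts to adjusting one environment variable. The following Python snippet is a minimal sketch of that idea, not the way pseudo itself is started; the helper name preload_env and the library path in the usage comment are assumptions made for illustration.

```python
import os


def preload_env(library, base_env=None):
    """Return a copy of the environment with `library` first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic linker accepts a space- or colon-separated list. Listing
    # our library first means its symbols are found before the others.
    entries = [e for e in existing.replace(":", " ").split() if e != library]
    env["LD_PRELOAD"] = " ".join([library] + entries)
    return env


# Hypothetical usage: run a child process with a wrapper library preloaded.
# import subprocess
# subprocess.run(["ls", "-l"], env=preload_env("/path/to/libpseudo.so"))
```

Because the variable is inherited by child processes, everything a build spawns picks up the same wrapper library without further setup.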
On the first call into any of these functions, the pseudo client library populates a table of function pointers that then point to the "real" (libc) implementations of these functions. These functions can then be called by pseudo within wrappers, as needed. This is a tricky bit of code, and the next article in this series explains it in more detail.
There is no attempt in pseudo right now to try to handle static linking, and thus far, it hasn't mattered.
We needed a fast and reliable database, with decent lookups, good performance, and most importantly, something that we could integrate into our daemon process that did not require a separate installation and setup. After studying our intended usage and needs, we chose SQLite.
To the best of my knowledge, in the more than two years since we started pseudo, there have been no bugs caused by problems with SQLite. The combination of stability, speed, availability, and licensing terms has been unbeatable. Yes, there are tasks for which SQLite is not a good fit, but for what we're doing, it's been perfect. Of all the design decisions I've made in the course of working on pseudo, "use SQLite" is the single most rewarding choice.
The locking strategy we came up with is to rely on having only a single server working with a given database, and for that server to fully serialize calls. This reduced a large number of possible problem cases, without a huge penalty on performance. (There have been some surprises, even with everything serialized; I'll talk about those in the third article in this series.)
The database is actually split into two databases: a file database that records the pseudo environment (files, ownership, modes), and a log database that records events. By separating these, we intended to keep logging from slowing down queries of the file database; they are not merely separate tables, but two separate databases.
Even though we've cleaned it up some, the database code is still full of ugly special cases and not-quite-successful attempts to introduce generality. However, it works. When the server starts up, it always checks to see whether its databases exist, and if they don't, it creates them. As a result of a major cleanup we did a year ago, each database also has a second table showing the current "version" of the database, with migrations to introduce new fields or flags that we've added since the original design.
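The create-if-missing and versioned-migration pattern described above can be sketched with Python's sqlite3 module. This is an illustration under stated assumptions rather than pseudo's actual schema: the table names, column names, and migration lists are all invented for the example.

```python
import sqlite3

# Hypothetical migration lists: entry i upgrades a database from version i
# to version i + 1. Table and column names are invented for this sketch.
FILE_MIGRATIONS = [
    "CREATE TABLE files (dev INTEGER, ino INTEGER, path TEXT, "
    "uid INTEGER, gid INTEGER, mode INTEGER)",
    "ALTER TABLE files ADD COLUMN rdev INTEGER DEFAULT 0",
]
LOG_MIGRATIONS = [
    "CREATE TABLE logs (stamp INTEGER, op TEXT, path TEXT, result TEXT)",
]


def migrate(db, migrations):
    """Apply the migrations this database has not seen; return its version."""
    db.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = db.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0  # MAX() yields NULL for a brand-new database
    for version in range(current, len(migrations)):
        db.execute(migrations[version])
        # Record the new version once the migration has been applied.
        db.execute("INSERT INTO schema_version VALUES (?)", (version + 1,))
        db.commit()
    return max(current, len(migrations))


def open_databases(file_path, log_path):
    """Open the two separate databases, creating and upgrading as needed."""
    files = sqlite3.connect(file_path)  # connect() creates a missing file
    logs = sqlite3.connect(log_path)
    migrate(files, FILE_MIGRATIONS)
    migrate(logs, LOG_MIGRATIONS)
    return files, logs
```

Running migrate a second time against an up-to-date database does nothing, which is what lets a server call it unconditionally at startup.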
Device and inode numbers were not quite enough
The original fakeroot design identified files only by their device and inode numbers. (This pair should be, at any given time, unique for each file on a system.) In our early build system, a moderately recurring theme was a failure mode where, running under fakeroot, we would get odd failures; for example, an attempt to remove a plain file might fail with an error message to the effect that the operation failed because the file was not a directory.
A usage error, not an inherent bug in fakeroot, caused this problem. A program running in the fakeroot environment would record a directory in the fakeroot database, and then a program running outside the fakeroot environment would remove it from the disk. When the inode number was later reused, fakeroot would mistakenly believe that the inode referred to a directory instead of a file.
In pseudo, we adopted multiple layers of defense against this. The first layer of defense is that queries to the database always include the file type and mode bits from the real file on disk, not just the device and inode. If the file type bits show a file and directory mismatch, the database entry is invalidated and a log message is produced.
The second layer is that we record paths for files whenever possible (and it is nearly always possible). This enables pseudo to report both the current path name and the previous path name used.
Together, these two defenses allowed a third layer of defense: when running builds, we received warning diagnostics about such mismatches. Most of the time we could track down the error almost immediately, because we just had to look for where the old file path was deleted and find out why that step wasn't running inside the pseudo environment. Thanks to the combination of these three layers, mysterious database corruption issues that had been occurring somewhat frequently became rare enough to merit special attention.
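The first two layers can be sketched as a lookup that compares file types and reports both paths on a mismatch. This is a minimal illustration in Python with an in-memory SQLite table; the schema, the function names, and the wording of the diagnostic are assumptions, not pseudo's actual code.

```python
import sqlite3
import stat


def make_file_db():
    """Create an in-memory file database keyed by device and inode."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (dev INTEGER, ino INTEGER, path TEXT, "
               "uid INTEGER, gid INTEGER, mode INTEGER, "
               "PRIMARY KEY (dev, ino))")
    return db


def record(db, dev, ino, path, uid, gid, mode):
    """Store the emulated ownership and mode, along with the path."""
    db.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?, ?)",
               (dev, ino, path, uid, gid, mode))
    db.commit()


def lookup(db, dev, ino, real_mode, path, log=print):
    """Return (uid, gid, mode), or None if unknown or stale.

    `real_mode` is the st_mode reported for the real file on disk.
    """
    row = db.execute("SELECT path, uid, gid, mode FROM files "
                     "WHERE dev = ? AND ino = ?", (dev, ino)).fetchone()
    if row is None:
        return None
    old_path, uid, gid, mode = row
    if stat.S_IFMT(mode) != stat.S_IFMT(real_mode):
        # The inode now belongs to a different kind of file: drop the stale
        # entry and report both the current and the previously recorded path.
        log(f"file type mismatch: {path!r} reuses inode recorded for {old_path!r}")
        db.execute("DELETE FROM files WHERE dev = ? AND ino = ?", (dev, ino))
        db.commit()
        return None
    return uid, gid, mode
```

The old path in the diagnostic is what makes the problem traceable: it names the file that was removed outside the emulated environment.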
Client and server communications
Client processes communicate with the pseudo daemon through a UNIX-domain socket (that's a socket in the file system) rather than using TCP or UDP. This solves two problems. First, it allows multiple pseudo daemons to coexist peacefully, and second, it makes it easy for clients to reliably find the server they're supposed to use.
In fact, instead of starting a server before starting any clients, by default you start clients that automatically launch a server if they need one and none is available. This change in design helped remove many lines of shell code that attempted to determine whether or not a faked process was available, and if so, whether it was the active process for a given build. With pseudo's design, it doesn't matter if there is an existing server; client processes will use the existing server if there is one, or start a new one if they need one.
Communications take the form of a standard message that encodes information about a query, or a response to a query. In all cases, the client sends a message to the server and gets a response. The client will wait for a response, but if the other end of the socket goes away, the client then tries again to start the server.
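The request-and-retry behavior described here can be sketched as a small client loop over a UNIX-domain socket. This is an illustration, not pseudo's client code: the function name, the start_server callback, and the retry count are all assumptions, and the callback stands in for whatever mechanism actually launches the daemon.

```python
import socket


def send_request(sock_path, payload, start_server, attempts=3):
    """Send one request and return the reply, restarting the server if needed.

    `start_server` is a hypothetical callback that launches a server
    listening on `sock_path` and returns once it accepts connections.
    """
    for attempt in range(attempts):
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
                sock.connect(sock_path)
                sock.sendall(payload)
                reply = sock.recv(4096)
                if reply:  # an empty read means the server went away
                    return reply
        except OSError:
            pass  # no socket, stale socket, or the connection was reset
        if attempt + 1 < attempts:
            start_server(sock_path)
    raise ConnectionError(f"no reply from server at {sock_path}")
```

Because the socket lives at a path in the file system, the same path both identifies which server a client should use and tells the client where to start one if none is there.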
Thus far, the biggest shortcoming of the current client/server design is that I inexplicably forgot to have a version number field in the interface. This only caused a real problem once (under extremely specific circumstances, which are described in the third article in this series), but it was still a serious mistake.
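To show what such a version field buys, here is a sketch of a fixed-layout message that carries one, written with Python's struct module. The field list and layout are hypothetical and are not pseudo's wire format; the point is only that a receiver can reject a message it does not understand instead of misreading it.

```python
import struct

PROTOCOL_VERSION = 1
# Hypothetical layout, in network byte order:
# version, operation code, result, device, inode, mode.
MESSAGE = struct.Struct("!HHiQQI")


def encode(op, result, dev, ino, mode):
    """Pack one message, stamping it with the sender's protocol version."""
    return MESSAGE.pack(PROTOCOL_VERSION, op, result, dev, ino, mode)


def decode(data):
    """Unpack one message, refusing versions this code does not speak."""
    version, op, result, dev, ino, mode = MESSAGE.unpack(data)
    if version != PROTOCOL_VERSION:
        raise ValueError(f"unsupported protocol version {version}")
    return op, result, dev, ino, mode
```

Without the leading field, a client and server built from different revisions would have no way to notice that they disagree about the layout.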
Recovering from server failures
One of the implications of the client library starting the server on demand is that the client library can also restart the server on demand, for example, if the server crashes. During early testing, I ran pseudo with the server modified to crash at random on about one syscall in three, with the intent of carefully testing this functionality.
Some time later, while investigating some performance problems I was having with pseudo builds, I happened to check dmesg, and discovered that the pseudo daemon was being killed by segmentation faults. It turned out that I'd introduced (unintentionally, this time) a bug that was triggered by a common circumstance and it caused the server to die with a segmentation fault. The bug had been there for at least two weeks, and I hadn't noticed. I feel that this shows the robustness of the design, although in retrospect maybe a bit more logging would have been useful.
Why open source?
There was a brief discussion of whether we ought to try to make pseudo a closed-source product. The engineers advocating the creation of pseudo wanted it to be open source, and the managers agreed with our arguments.
The fundamental reason that this is an open source project, and not a closed-source secret bit of technology, is that we have no interest in being in the business of selling or supporting it. Although this software has to work for our product to work, it's not our product, and we don't want it to be. Think about the power outlets in a hotel; you would have a hard time pitching a hotel room without working power to most modern travelers, but that doesn't mean you want to be in the business of metering and selling electricity.
Making pseudo available as open source lets other projects, such as Yocto (see Resources), use it. Developers who don't naturally think in terms of cooperation are likely to move elsewhere rather than working heavily with open source projects. If my first inclination weren’t "release it as open source", I probably wouldn't be thinking about embedded Linux build systems in the first place.
Coming up next
In the next part of this series, I discuss in more technical detail how pseudo works and why we made some specific technical choices.
Note: The pseudo project is ongoing; therefore, development changes may occur following the publication of this article.
Resources
- Pseudo not fast enough? There's always fakeroot for your root emulation needs.
- If all you need is the chroot system call, fakechroot can do that.
- The pseudo project was developed entirely to meet internal needs, but was released as open source anyway.
- The Yocto project aims to provide key infrastructure for people working on building embedded Linux.