With December upon us, rife with rumors of labor disputes (again!) at the North Pole, it seems about time to talk about the ELF standard. ELF (an acronym for Executable and Linking Format) is a standard for object modules, libraries, executables, and core files. Many UNIX® and UNIX-like systems use ELF, and the ELF standard has contributed substantially to the development of compiler toolchains and debugging tools for a variety of systems.
In principle, all a computer needs to have to run code is the code itself and some idea of where to put it. This can be handled simply by convention; the old .COM format used in MS-DOS simply asserted that the file would be loaded at byte offset 256 of a memory segment and that execution would start right there, at the first byte of the file, which had better contain some sort of executable instruction. Older systems tended towards very simple file formats because the overhead of object loading was significant and the files were pretty small.
Executable formats and object formats are closely related; both are intended to store machine code in a way that allows other programs to identify the code and refer to it. Object formats need to provide enough information to allow a linker to assemble an executable from them; executable formats may not need this, but often provide it anyway. Both might provide additional information about the source the machine code came from, for use in debugging.
These considerations apply to code compiled into machine language (for instance, C and Fortran output). Some interpreted languages (such as Perl), or languages that target a virtual machine (such as Java™) have support for machine-code plugins, which are linked dynamically. So in some cases, the standardization of the object format and related ABI standardization might affect programs written in other languages.
The ELF specification has dramatically reduced the hassle of cross-compilation; many tools used for manipulating object modules are now conveniently portable across systems. Furthermore, having a standardized binary format makes the toolchain easier to develop and test. Other formats did provide some of these features, but ELF seems to have provided a more unified approach and a bit more future-proofing. 32-bit and 64-bit ELF files can coexist and can be easily told apart, making it easier to work through transitions to 64-bit systems.
Before I talk about the history of ELF, it is worth noting how ELF differs from libtool. libtool provides a portable front-end: you type the same command on every system. ELF provides a portable back-end: all eight of your compilers and linkers can use the same code and work on the same object modules, even if their interfaces are totally different. libtool and ELF are both nifty -- but only one of them enables us to make bad, holiday-related puns!
Object formats have been evolving gradually for a long time, and they have sometimes been shared between compilers. In particular, GCC has made a point of trying to support various native object modules. Three fairly widespread executable formats have been the a.out format, the COFF format, and the ELF format. The a.out format was so named because it was the format in which the UNIX linker created executables -- by default naming them a.out.
The COFF format (an acronym for Common Object File Format) was developed for use in System V Release 3 (SVR3). It's been fairly widely adopted; indeed, modern Microsoft® Windows® executables are a variant of COFF! The ELF format was introduced with SVR4.
As you might guess, COFF and ELF have a lot of similarities. ELF is a somewhat more mature standard despite being newer; the design of ELF addresses limitations encountered in previous specifications.
In the 1990s, a group of vendors got together and released a formal version of the ELF standard for public use, hoping that everyone would use this standard format and benefit from it. For the most part, everyone has.
ELF is used as the default format on Linux® and BSD systems. Microsoft is still using PE-COFF (the "P" stands for "portable") and Apple is still using Mach-O executables. Nonetheless, the ELF specification has been widely adopted and is being used on a variety of systems. Conversion from a.out to ELF binaries happened on both BSD and Linux systems during the 1990s, and many systems can still load a.out executables.
The vendor group behind the standard, the Tool Interface Standards (TIS) Committee, announced availability of its standards in March of 1993; committee members included compiler developers (such as Watcom and Borland), CPU vendors (such as IBM and Intel®), and OS vendors (such as IBM and Microsoft). (Microsoft's PE-COFF was another standard they documented.) In 1995, the committee had more members and released the ELF 1.2 specification; it disbanded shortly thereafter, having done what it set out to do.
This left some vendors with the need to convert their systems from other binary formats to ELF. The conversion provided a lot of people with migration headaches. Some systems shipped with utilities to convert files from one binary format to another, for instance, to convert ELF to COFF or COFF to ELF. Variants such as ECOFF (used on some MIPS systems) or XCOFF (used by AIX® on the RS/6000®) also showed up. AIX 5 still uses XCOFF, although it has some utilities for manipulating ELF object files.
One of the key objectives of an object format is to allow consistent tool development and usage. Files in the ELF format are compatible with tools that manipulate ELF files and in some cases, this holds true across architectures. You do not need a special PowerPC® version of the nm utility (which prints lists of the objects in an object module) to read a PowerPC object module -- an x86 version of nm will read it just fine.
The ELF format doesn't support debugging symbols -- or does it? The DWARF (Debug With Arbitrary Record Format) format specifies a way to attach debugging information to ELF files. Some pundits have speculated that there is some significance to the name other than the obvious acronym, but we can safely dismiss such fantastic notions. The DWARF format was not immediately accepted by some vendors, and a successor, DWARF-2, is what's actually most widely used. Some vendors still use the older stabs debugging format; other formats have also been seen.
In summary, what ELF offers is a somewhat unified model of how to represent chunks of code, chunks of data, and relationships between them. Some of ELF's features have been hacked onto other, simpler formats, but in ELF they were designed in from the beginning. For instance, ELF's support for meta-information about executables (such as what kind of executable a given file is) is quite nice; some formats have required you to try to execute something and hope it's a native binary.
To portably describe object modules, ELF files start with a file header, which is accompanied by zero or more program headers and zero or more section headers. The file header also contains the information needed to find and read the program and section headers; these headers might change a little depending on architecture, so the file header has to indicate their size.
The file header starts with four magic numbers: 0x7F, 0x45, 0x4c, and 0x46, in that order. (This corresponds to an ASCII DEL, and the word "ELF.") This is not a four-byte magic number; it is four one-byte magic numbers. If it were a four-byte magic number, there would be endianness problems and the format wouldn't be all cool and portable.
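To make the endianness point concrete, here is a minimal sketch in Python. The helper name is_elf is my own invention for illustration, not part of any ELF library. It checks the four magic bytes one at a time, then shows what happens if the same bytes are misread as a single four-byte integer.

```python
import struct

# The four one-byte magic numbers: DEL, 'E', 'L', 'F'.
ELF_MAGIC = bytes([0x7F, 0x45, 0x4C, 0x46])


def is_elf(data: bytes) -> bool:
    """Return True if data begins with the four ELF magic bytes (hypothetical helper)."""
    return data[:4] == ELF_MAGIC


# Misread as one 32-bit integer, the same bytes yield two different values
# depending on the reader's byte order. Comparing byte by byte avoids that.
as_little_endian = struct.unpack("<I", ELF_MAGIC)[0]
as_big_endian = struct.unpack(">I", ELF_MAGIC)[0]

print(is_elf(b"\x7fELF\x02\x01\x01"))
print(hex(as_little_endian), hex(as_big_endian))
```

A byte-wise comparison like this behaves identically on any host, which is the property the format is after.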
The next few bytes indicate the ELF specification revision, the architecture "class" (32-bit or 64-bit, which affects the size of fields in program and section headers), the data-storage format (either big-endian or little-endian two's complement), and information about what ABI to use. There is some divergence in documentation about the remainder of these bytes: some of them are padding, but some implementations document extensions here which may or may not be in use.
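As a rough sketch of how those bytes might be decoded, the following Python reads the identification block. The index names (EI_CLASS and friends) and the value tables follow the System V gABI as I understand it; the Ident type and decode_ident function are hypothetical names made up for this example.

```python
from typing import NamedTuple

# Byte positions inside the 16-byte identification block (assumed from the gABI).
EI_CLASS, EI_DATA, EI_VERSION, EI_OSABI = 4, 5, 6, 7

CLASS_NAMES = {1: "32-bit", 2: "64-bit"}
DATA_NAMES = {1: "little-endian", 2: "big-endian"}


class Ident(NamedTuple):
    elf_class: str
    data: str
    version: int
    osabi: int  # left as a raw number here


def decode_ident(e_ident: bytes) -> Ident:
    """Decode the bytes that follow the magic numbers (illustrative only)."""
    if len(e_ident) < 16 or e_ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF identification block")
    return Ident(
        CLASS_NAMES.get(e_ident[EI_CLASS], "invalid"),
        DATA_NAMES.get(e_ident[EI_DATA], "invalid"),
        e_ident[EI_VERSION],
        e_ident[EI_OSABI],
    )


# A hand-built block: 64-bit, little-endian, version 1, ABI byte 0, then padding.
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
print(decode_ident(sample))
```

Every field here is a single byte, so no byte-order decision is needed until after the block has been read.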
Various tools, ranging from linkers to program loaders, use the ELF file header to figure out what a file is and what to do with it. Users rarely interact with it directly. Developers mostly use it through tools, ranging from the file command to various types of linkers. PowerPC users will be very glad to note that the header distinguishes between endianness and processor architecture.
The ET_CORE file type is only partially specified. Such a file is a core file, and while it uses the ELF format at least partially (for instance, an ELF core file should contain sections with standard section headers), there is no detailed specification of how to handle things like register dumps.
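Here is a small sketch of how a tool might read the file type and machine fields once it knows the byte order. I am assuming the standard layout in which two 16-bit fields, e_type and e_machine, immediately follow the 16-byte identification block. The function name and the deliberately short name tables are mine; a real tool knows far more machine codes.

```python
import struct

ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}
# A tiny subset of machine codes, for illustration only.
EM_NAMES = {3: "Intel 80386", 20: "PowerPC", 21: "PowerPC 64-bit", 62: "AMD x86-64"}


def type_and_machine(header: bytes):
    """Return (file type, machine) names from the start of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # The data-encoding byte tells us how to read every multi-byte field.
    order = {1: "<", 2: ">"}.get(header[5])
    if order is None:
        raise ValueError("unknown data encoding")
    e_type, e_machine = struct.unpack_from(order + "HH", header, 16)
    return (
        ET_NAMES.get(e_type, hex(e_type)),
        EM_NAMES.get(e_machine, "unknown ({})".format(e_machine)),
    )


# Hand-built header: 32-bit, big-endian executable with machine code 23.
sample = b"\x7fELF" + bytes([1, 2, 1, 0]) + bytes(8) + struct.pack(">HH", 2, 23)
print(type_and_machine(sample))
```

Because byte order and machine are stored separately, the same machine code can appear in either byte order, which is what makes bi-endian processors such as PowerPC easy to describe.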
Program headers are used only in shared libraries and executable programs -- section headers, by contrast, are used in everything. Program headers are needed to create a process image; there might be multiple program headers containing information about a program. Program headers might contain hints about needed dynamic libraries, instructions on where to find a suitable loader, and so on. Dynamically linked executables specify an interpreter (for instance, /lib64/ld-linux-x86-64.so.2 on an x86_64 Linux system or the more prosaic /lib/ld.so.1 on a generic PowerPC Linux). Static executables have simpler program headers because they're self-contained files. For instance, this output from NetBSD's readelf program shows the program headers of a static binary:
Program Headers:
  Type  Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
  LOAD  0x000000 0x08048000 0x08048000 0x3ecb14 0x3ecb14 R E 0x1000
  LOAD  0x3ed460 0x08438460 0x08438460 0x1e63c  0x289608 RW  0x1000
  NOTE  0x000094 0x08048094 0x08048094 0x00018  0x00018  R   0x4
This indicates two segments to be loaded: one read-only segment of executable code and one read-write segment of initialized data. The NOTE segment is purely informative. The offset and file size values indicate the location and size of the data in the object file (or executable); the memory size and address indicate where to store it at runtime and how much space to allocate. In some cases, the amount of space reserved at runtime may be much larger than the space taken up on disk, as with the second LOAD entry here; this typically reflects a lot of memory which is to be initialized to all zeroes.
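The gap between file size and memory size is easy to compute. The sketch below packs the second LOAD entry from the readelf output above into a 32-bit program header and unpacks it again. I am assuming the standard 32-bit layout of eight 4-byte fields and a little-endian file (the addresses look like i386). Phdr32, parse_phdr32, and zero_fill are hypothetical names for this example.

```python
import struct
from typing import NamedTuple

PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4  # execute, write, read permission bits


class Phdr32(NamedTuple):
    p_type: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int
    p_memsz: int
    p_flags: int
    p_align: int


def parse_phdr32(raw: bytes, byte_order: str = "<") -> Phdr32:
    """Unpack one 32-byte program header entry."""
    return Phdr32(*struct.unpack(byte_order + "8I", raw[:32]))


def zero_fill(ph: Phdr32) -> int:
    """Bytes the loader must supply as zeroes beyond what is in the file."""
    return ph.p_memsz - ph.p_filesz


# The read-write LOAD entry from the listing, rebuilt by hand.
raw = struct.pack("<8I", PT_LOAD, 0x3ED460, 0x08438460, 0x08438460,
                  0x1E63C, 0x289608, PF_R | PF_W, 0x1000)
data_segment = parse_phdr32(raw)
print(hex(zero_fill(data_segment)))
```

For this entry the zero-filled region is far larger than the initialized data stored on disk, which is the situation the paragraph above describes.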
Section headers serve a function similar to that of program headers, but they describe the file for linkers and other tools rather than for the program loader. Each one describes a section, such as a block of actual data or code (code is often called "text" just to confuse you). Many specific types are provided for, such as symbol tables, string tables, or even "array of constructors." Nonetheless, the essential contents are similar.
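String tables are the simplest of these section types to illustrate. As I understand the format, a section header stores its name as a byte offset into a string table, and the name runs from that offset to the next NUL byte. The sketch below uses a hand-built table rather than a real file; strtab_lookup is a name invented for this example, and the type codes listed are a small assumed subset.

```python
# A few section type codes, assumed from the gABI; 14 is the "array of constructors".
SHT_NAMES = {
    1: "SHT_PROGBITS",
    2: "SHT_SYMTAB",
    3: "SHT_STRTAB",
    8: "SHT_NOBITS",
    14: "SHT_INIT_ARRAY",
}


def strtab_lookup(strtab: bytes, offset: int) -> str:
    """Return the NUL-terminated name that starts at offset in a string table."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")


# Hand-built section-name string table; offset 0 is the empty name.
shstrtab = b"\x00.text\x00.data\x00.shstrtab\x00"

for name_offset in (1, 7, 13):
    print(name_offset, strtab_lookup(shstrtab, name_offset))
```

Storing names as offsets keeps every section header the same size, whatever the length of the name.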
One of the technical shortcomings of ELF as it originally existed was its assumption that a given architecture would have only a single application binary interface, or ABI. In fact, even among commercial System V UNIX systems this doesn't always hold -- Solaris binaries might not be compatible with a "pure" SVR4 interface, for instance. This has been addressed by the addition of a specification for which ABI a given ELF file is built for. This evolved in the wild, to some extent -- the exact set of ABIs available has been developing. In some cases, you might find surprises. NetBSD uses the System V ABI type because NetBSD uses the System V ABI, even though the specification allows for a distinct NetBSD ABI.
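A sketch of how the ABI byte can be interpreted follows. The numeric values are the ones I believe the current gABI assigns, and the table is intentionally incomplete; osabi_name is a hypothetical helper.

```python
EI_OSABI = 7  # position of the ABI byte in the identification block

# Partial table of ABI identifiers (assumed values, not exhaustive).
OSABI_NAMES = {
    0: "System V",
    2: "NetBSD",
    3: "Linux/GNU",
    6: "Solaris",
    9: "FreeBSD",
}


def osabi_name(e_ident: bytes) -> str:
    """Name the ABI recorded in an ELF identification block."""
    value = e_ident[EI_OSABI]
    return OSABI_NAMES.get(value, "unknown ({})".format(value))


# A file marked with ABI byte 0 reports "System V", whichever OS built it.
sysv_marked = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8)
print(osabi_name(sysv_marked))
```

This is why a NetBSD binary can legitimately report System V here: the byte records the ABI the file claims, not the operating system that built it.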
Additional details are often available, and on most UNIX-like systems, the file command can give you a fair amount of information about a binary. For instance, here's the output of file on three different binaries: an x86_64 binary, a Cell Broadband Engine (Cell BE) SPE (Synergistic Processing Element) binary, and a Cell BE PPE (PowerPC Processing Element) binary.
$ file bin/c
bin/c: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped
$ file hello
hello: ELF 32-bit MSB executable, version 1 (SYSV), statically linked, not stripped
$ file matrix_mul
matrix_mul: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), not stripped
Note that the matrix_mul program (a Cell Broadband Engine PPE executable) has an identified CPU type, but the hello program (an SPE executable) does not because the machine type used is reserved and thus has no defined value.
The subtle differences in documentation between various systems highlight variances in ELF implementations, although sometimes the documentation is just wrong. The Linux ELF man page documents a macro which it describes as "Start of architecture identification." This description matches the FreeBSD man page too, but does not match the actual implementation on my Fedora Core system; there's no such macro. In fact, machine architecture is stored in the e_machine field of the ELF header; the "brand" feature, used only on FreeBSD (so far as I can tell), seems to consist of replacing the last eight characters of the 16-character ident block with the string "FreeBSD". I am not sure what to make of this, but the following comment in the code for the GRUB bootloader seems insightful:
#define EI_BRAND 8 /* start of OS branding (This is obviously illegal against the ELF standard.) */
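Checking for such a brand by hand is straightforward. This sketch takes the offset from the GRUB definition just quoted and treats the last eight bytes of the ident block as a NUL-padded string, which is an assumption on my part. The brand function is a hypothetical helper, not anything FreeBSD or GRUB provides.

```python
from typing import Optional

EI_BRAND = 8    # offset from the GRUB definition quoted above
EI_NIDENT = 16  # total size of the identification block


def brand(e_ident: bytes) -> Optional[str]:
    """Return the brand string in the tail of the ident block, or None if it is empty."""
    raw = e_ident[EI_BRAND:EI_NIDENT].split(b"\x00", 1)[0]
    return raw.decode("ascii", errors="replace") if raw else None


# Hand-built ident blocks: one branded, one with ordinary zero padding.
branded = b"\x7fELF" + bytes([1, 1, 1, 0]) + b"FreeBSD\x00"
plain = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8)

print(brand(branded), brand(plain))
```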
The Linux environment for the Cell Broadband Engine embeds ELF executables for the SPE as objects which can be linked in with PPE executables. It's pretty easy to find the embedded executables by looking for the ELF magic numbers and extracting them for further study. (The e_machine field is set to 0x17, or 23, which is "reserved.")
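Looking for the magic numbers can be sketched in a few lines. The function below only reports candidate offsets in a blob of bytes; it is a hypothetical helper, and a real extractor would still have to parse each header to learn how long the embedded file is. Four matching bytes can also turn up by coincidence in ordinary data.

```python
from typing import List

ELF_MAGIC = b"\x7fELF"


def find_elf_offsets(blob: bytes) -> List[int]:
    """Return every offset in blob where the ELF magic bytes occur."""
    offsets = []
    pos = blob.find(ELF_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(ELF_MAGIC, pos + 1)
    return offsets


# Stand-in for a PPE executable with one embedded SPE executable.
blob = b"\x7fELF" + b"outer" + bytes(3) + b"\x7fELF" + b"inner"
print(find_elf_offsets(blob))
```

The first hit at offset 0 is the containing file itself; any later hits are the candidates worth extracting.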
ELF provides a consistent and well-documented way for dynamic linking of shared libraries. Code which references a library has an internal table of stub functions which load the library (if necessary), look up the relevant function in that library's symbol table, then patch themselves into jumps into that library. This is exceptionally reliable, but does impose some overhead.
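The self-patching behavior is easier to see in a toy model. The Python below is not how a real dynamic linker works internally; it only imitates the idea with a dictionary standing in for the table of stubs and another standing in for a library's symbol table. Every name in it is made up for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a shared library's symbol table.
FAKE_LIBRARY: Dict[str, Callable[[int], int]] = {
    "double": lambda x: 2 * x,
}

lookups: List[str] = []  # records each symbol resolution so the laziness is visible


def make_stub(table: Dict[str, Callable[[int], int]], name: str) -> Callable[[int], int]:
    """Build a stub that resolves name on first call, then patches itself out."""
    def stub(arg: int) -> int:
        lookups.append(name)
        target = FAKE_LIBRARY[name]  # the symbol lookup
        table[name] = target         # patch the table slot to jump straight there
        return target(arg)
    return stub


jump_table: Dict[str, Callable[[int], int]] = {}
jump_table["double"] = make_stub(jump_table, "double")

first = jump_table["double"](21)  # goes through the stub and resolves the symbol
second = jump_table["double"](4)  # goes directly to the library function
print(first, second, lookups)
```

Only the first call pays for the lookup; after that the table entry points straight at the library function, which is where the overhead mentioned above comes from and why it is paid once per function.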
One innovative solution is statically linked shared libraries. This solution is based on defining a consistent and fixed mapping for entry points into a given shared library which can then be loaded at the same address every time. The result is many of the benefits of shared libraries (smaller code, easy patches), with some of the benefits of static libraries (such as performance).
This technique was used in COFF and is not as widely found in ELF, but BSD/OS used it with ELF binaries. It does impose some additional work on library maintainers and has not been widely imitated, but the performance difference was noticeable.
The ELF specification doesn't directly support this, but makes it reasonably easy to pursue features like this.
The widespread adoption of ELF hardly comes as a surprise, given that it is freely available. The 1995 specification release said:
The TIS Committee grants you a non-exclusive, worldwide, royalty-free license to use the information disclosed in this Specification to make your software TIS-compliant; no other license, express or implied, is granted or intended hereby.
It's fascinating to me that some questions about this were raised recently in a court case (see Resources).
In any case, the long and short of this tale is that anyone who is working on a binary toolchain can acquire a complete and well-tested specification at no cost and even find a fairly substantial chunk of freely available or open source software which can manipulate this binary format. This is a good thing.
The widely used GNU toolchains manipulate ELF binaries and are often used in targeting a new architecture. For instance, in the recent Cell Broadband Engine SDK downloads (Resources), the GNU toolchain was used not only with the GCC port for the Cell BE, but also with the XL C compiler.
The ELF specification has not particularly revolutionized computing nor made possible things that were previously impossible. What it has done, though, is provide a general improvement in the quality of available tools and a reduction in the cost of developing tools. It's nice to have one less thing to worry about.
The ELF specification is fairly mature with support for 32-bit and 64-bit systems. There's not yet much pressure for a replacement -- 64-bit ELF could survive quite a long time and the standard is flexible enough to meet real-world needs. It would be nice to see slightly wider adoption; there's no obvious reason for some popular desktop operating systems to use unique binary formats.
This overview of ELF and DWARF, originally aimed at Linux on S/390 and zSeries® computers, applies to nearly every system imaginable. Think about getting this in your stocking.
This HOWTO for ELF is from back when Linux was just switching.
Fed up with software bloat (that uncomfortable feeling after a big holiday meal)? Try creating a 45-byte ELF executable (sorry, it's not portable).
The ELF binary format is a key component of how NetBSD systems can run Linux binaries.
Although Wikipedia is often criticized for containing much that is apocryphal and/or wildly inaccurate, it is even more often "a good starting place" for learning more on a given subject. The entries on ELF, on Object files, on Magic numbers, and on System V may be of interest to you.
Need to know ELF at its lowest, most visceral level? Then the 64-bit PowerPC ELF Application Binary Interface Supplement is a good friend to cultivate.
The Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux (same as previous "software bloat" link) makes for excellent reading, but David Wheeler's Program Library HOWTO suggests it is a better learning tool than a real-world application.
The paper ELF: From The Programmer's Perspective was penned by Hongjiu Lu in 1995 and has continued to enthrall and enlighten programmers worldwide ever since.
The imaginatively titled The Executable and Linking Format (ELF) by Michael L. Haungs offers another overview of ELF.
Dangerous ELF: the eminent writing team of SiuL+Hacky in About Introducing Your Own Code and the well-known grugq (using what we feel sure is a licensed version of Adobe Acrobat) in Cheating the ELF: Subversive dynamic linking to libraries demonstrate that not every program that benefits from ELF is beneficial to the user.
For when you really, truly need to program in assembly language and support PIC under ELF at the same time.
Read more on the History of UNIX from Alan Filipski.
The GRUB bootloader has to know a surprising amount about ELF to load kernels.
Dissecting shared libraries (developerWorks, January 2005) details ELF as a format shared libraries are stored in.
GNU C/C++ toolchain for Linux on POWER (developerWorks, May 2005) demonstrates the ELF-based Linux on POWER™ object files.
Introduction to embedded software development for the IBM PowerPC 970FX processor (developerWorks, September 2005) has a section on the PowerPC 64-bit ELF ABI.
Get products and technologies
Give yourself another holiday gift by checking out the downloads available for the Cell Broadband Engine Architecture.
Experimenting with the emerging technologies from alphaWorks can make you an expert in all types of tomorrow's technologies.
Need to find out about how ELF technology can affect your project? Post your query on a developerWorks forum.