Standards and specs: An unsung hero: The hardworking ELF

Seebach takes on the ELF object module format, a key to interoperability

Peter Seebach, Freelance author, Plethora.net
Peter Seebach has been using ELF executables since 1991 or so. He would like to see more standards named after fantasy creatures.

Summary:  The ELF object module format has had wide-ranging effects on software development for multiple platforms. Peter Seebach looks at the history of the ELF specification and why it's been so useful.


Date:  20 Dec 2005
Level:  Introductory

With December upon us, rife with rumors of labor disputes (again!) at the North Pole, it seems about time to talk about the ELF standard. ELF (Executable and Linking Format) is a standard for object modules, libraries, executables, and core files. Many UNIX® and UNIX-like systems use ELF, and the ELF standard has contributed substantially to the development of compiler toolchains and debugging tools for a variety of systems.

In principle, all a computer needs to run code is the code itself and some idea of where to put it. This can be handled simply by convention: the old .COM format used in MS-DOS loaded the file at offset 256 (0x100) of a memory segment and started execution at the file's first byte, which had better contain some sort of executable instruction. Older systems tended toward very simple file formats because the overhead of object loading was significant and the files were pretty small.

Can't tell the players without a program

System V UNIX is the commercial variant of AT&T's UNIX project, since sold off and licensed too often for me to keep track of. It was the successor to System III. System V UNIX was more commercialized than Berkeley-style UNIX and was often the basis for standards which Berkeley users later grudgingly adopted. COFF and ELF were both System V formats, while a.out appeared in V7 UNIX. As the names obviously suggest, V7 is much older than System V. (To quote Alan Filipski, "The different versions of the UN*X brand operating system are numbered in a logical sequence: 5, 6, 7, 2, 2.9, 3, 4.0, III, 4.1, V, 4.2, V.2, and 4.3." He would doubtless be glad to know that we are currently up to V.4.2 and 4.4/2.)

A couple more useful terms to keep in mind for this article, if you haven't seen them before:

Endianness: When a series of bytes is treated as a larger number, the first byte could be the most significant or the least significant. x86 processors are little-endian (the first byte is the least significant). Power Architecture™ machines in general can control this in software, but they tend to run big-endian most of the time.
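To see the difference concretely, the following C sketch reads the same four bytes both ways. The function names are illustrative and not taken from any library. Because each value is assembled with shifts rather than by casting a pointer, the result does not depend on the byte order of the machine running the code, which is the kind of approach a portable object-file tool needs.

```c
#include <stdint.h>

/* Interpret four bytes as a 32-bit value, least significant byte first. */
uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Interpret the same four bytes with the most significant byte first. */
uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}
```

For the bytes 01 02 03 04, read_u32_le returns 0x04030201 and read_u32_be returns 0x01020304, on any host.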

Magic number: A magic number is a special number which is put at the beginning of a file to indicate what the file's contents will be. For instance, an ELF binary starts with 0x7F, followed by the letters "ELF".

Magic file: The UNIX file utility has a huge database of magic numbers; in fact, modern implementations have not only simple magic numbers, but also rules for further parsing to discern specific version numbers, size of images, host architecture of ELF binaries, and so on. This database is often called /etc/magic, even though many systems now store it in /usr/share.

Executable formats and object formats are closely related; both are intended to store machine code in a way that allows other programs to identify the code and refer to it. Object formats need to provide enough information to allow a linker to assemble an executable from them; executable formats may not need this, but often provide it anyway. Both might provide additional information about the source the machine code came from, for use in debugging.

These considerations apply to code compiled into machine language (for instance, C and Fortran output). Some interpreted languages (such as Perl) and languages that target a virtual machine (such as Java™) support machine-code plugins, which are linked dynamically. So in some cases, the standardization of the object format and related ABI standardization might affect programs written in other languages.

The ELF specification has dramatically reduced the hassle of cross-compilation; many tools used for manipulating object modules are now conveniently portable across systems. Furthermore, having a standardized binary format makes the toolchain easier to develop and test. Other formats did provide some of these features, but ELF seems to have provided a more unified approach and a bit more future-proofing. 32-bit and 64-bit ELF files can coexist and can be easily told apart, making it easier to work through transitions to 64-bit systems.

Before I talk about the history of ELF, it is worth noting the division of labor. libtool provides a portable front end: you type the same command on every system. ELF provides a portable back end: all eight of your compilers and linkers can use the same code and work on the same object modules, even if their interfaces are totally different. libtool and ELF are both nifty -- but only one of them enables us to make bad, holiday-related puns!

History

Object formats have been evolving gradually for a long time, and they have sometimes been shared between compilers; in particular, GCC has made a point of trying to support various native object formats. Three fairly widespread executable formats have been the a.out format, the COFF format, and the ELF format. The a.out format was so named because it was the format in which the UNIX linker created executables -- by default naming them a.out.

The COFF format (an acronym for Common Object File Format) was developed for use in System V Release 3 (SVR3). It's been fairly widely adopted; indeed, modern Microsoft® Windows® executables are a variant of COFF! The ELF format was introduced with SVR4.

As you might guess, COFF and ELF have a lot of similarities. ELF is a somewhat more mature standard despite being newer; the design of ELF addresses limitations encountered in previous specifications.

In the 1990s, a group of vendors got together and released a formal version of the ELF standard for public use, hoping that everyone would use this standard format and benefit from it. For the most part, everyone has.

ELF is used as the default format on Linux® and BSD systems. Microsoft is still using PE-COFF (the "P" stands for "portable") and Apple is still using Mach-O executables. Nonetheless, the ELF specification has been widely adopted and is being used on a variety of systems. Conversion from a.out to ELF binaries happened on both BSD and Linux systems during the 90s and many systems can still load a.out executables.

The committee announced availability of its standards in March of 1993; committee members included compiler developers (such as Watcom and Borland), CPU vendors (such as IBM and Intel®), and OS vendors (such as IBM and Microsoft). (Microsoft's PE-COFF was another standard they documented.) In 1995, the committee had more members and released the ELF 1.2 specification; the committee disbanded shortly thereafter, having done what they wanted to do.

This left some vendors with the need to convert their systems from other binary formats to ELF, and the conversion gave a lot of people migration headaches. Some systems shipped with utilities to convert files from one binary format to another, for instance, to convert ELF to COFF or COFF to ELF. Variants such as ECOFF (used on some MIPS systems) or XCOFF (used by AIX® on the RS/6000®) also showed up. AIX 5 still uses XCOFF, although it has some utilities for manipulating ELF object files.


Technical considerations

One of the key objectives of an object format is to allow consistent tool development and usage. Files in the ELF format are compatible with tools that manipulate ELF files, and in some cases this holds true across architectures. You do not need a special PowerPC® version of the nm utility (which lists the symbols in an object module) to read a PowerPC object module -- an x86 version of nm will read it just fine.

The ELF format doesn't support debugging symbols -- or does it? The DWARF (Debug With Arbitrary Record Format) format specifies a way to attach debugging information to ELF files. Some pundits have speculated that there is some significance to the name other than the obvious acronym, but we can safely dismiss such fantastic notions. The DWARF format was not immediately accepted by some vendors, and a successor, DWARF-2, is what's actually most widely used. Some vendors still use the older stabs debugging format; other formats have also been seen.

In summary, what ELF offers is a somewhat unified model of how to represent chunks of code, chunks of data, and relationships between them. Some of ELF's features have been hacked onto other, simpler formats, but ELF was designed with them from the beginning. For instance, ELF's support for meta-information about executables (such as what kind of executable a given file is) is quite nice; some formats have required you to try to execute something and hope it's a native binary.


The header

To portably describe object modules, ELF files start with a file header and zero or more program or section headers. The file header also contains the information needed to read the program or section headers; these headers might change a little depending on architecture, so the file header has to indicate their size.
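As a sketch of how a reader uses that size information, the C function below computes where entry i of a header table begins. The struct and function names are hypothetical; the comments name the real file-header fields (e_phoff, e_phentsize, e_phnum, and their section-header counterparts) that would supply the values. Stepping by the entry size recorded in the file, rather than by a compiled-in sizeof, is what lets one tool walk tables whose entries differ in size between 32-bit and 64-bit files.

```c
#include <stdint.h>

/* Where a header table lives, as described by the ELF file header. */
struct table_desc {
    uint64_t offset;   /* e_phoff (or e_shoff): file offset of the table */
    uint16_t entsize;  /* e_phentsize (or e_shentsize): bytes per entry */
    uint16_t count;    /* e_phnum (or e_shnum): number of entries */
};

/* File offset of entry i, or -1 if i is past the end of the table or the
   entry would extend beyond a file of file_size bytes. */
int64_t table_entry_offset(const struct table_desc *t, uint16_t i,
                           uint64_t file_size)
{
    uint64_t start;

    if (i >= t->count)
        return -1;
    start = t->offset + (uint64_t)i * t->entsize;
    if (start + t->entsize > file_size)
        return -1;                 /* truncated or corrupt file */
    return (int64_t)start;
}
```

For a 64-bit file whose program header table starts at offset 64 with 56-byte entries, entry 2 begins at 64 + 2 * 56 = 176.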

The file header starts with four magic numbers: 0x7F, 0x45, 0x4C, and 0x46, in that order. (These correspond to an ASCII DEL and the letters "ELF".) This is not a four-byte magic number; it is four one-byte magic numbers. If it were a four-byte magic number, there would be endianness problems and the format wouldn't be all cool and portable.
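Checking the magic numbers one byte at a time can be sketched in C as follows. On Linux, <elf.h> defines the same bytes as ELFMAG0 through ELFMAG3; they are spelled out here so the example stands alone, and has_elf_magic is an illustrative name.

```c
#include <stddef.h>

/* Check the four one-byte magic numbers at the start of an ELF file.
   Each byte is compared individually, so host byte order never matters. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };
    size_t i;

    if (len < sizeof magic)
        return 0;               /* too short to be an ELF file */
    for (i = 0; i < sizeof magic; i++)
        if (buf[i] != magic[i])
            return 0;
    return 1;
}
```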

The next few bytes indicate, in order, the architecture "class" (32-bit or 64-bit, which affects the size of fields in program and section headers), the data-storage format (either big-endian or little-endian two's complement), the ELF specification revision, and information about what ABI to use. There is some divergence in documentation about the remainder of these bytes: some of them are padding, but some implementations document extensions here which may or may not be in use.
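The following C sketch decodes those identification bytes. The offsets and values follow the names used in <elf.h> (EI_CLASS at byte 4, EI_DATA at byte 5, EI_VERSION at byte 6, EI_OSABI at byte 7), but struct elf_ident and decode_ident are our own illustrative names. In the 1.2 specification, byte 7 starts the padding; later revisions of the System V ABI assign it to EI_OSABI, which is one source of the documentation divergence.

```c
#include <stddef.h>

/* Byte offsets within the 16-byte e_ident array, named as in <elf.h>. */
enum { EI_CLASS = 4, EI_DATA = 5, EI_VERSION = 6, EI_OSABI = 7, EI_NIDENT = 16 };

struct elf_ident {
    int bits;        /* 32 or 64, from EI_CLASS; 0 if unknown */
    int big_endian;  /* 1 for ELFDATA2MSB, 0 for ELFDATA2LSB, -1 if unknown */
    int version;     /* EI_VERSION; 1 (EV_CURRENT) in practice */
    int osabi;       /* EI_OSABI; 0 means System V (or unset) */
};

/* Decode the identification bytes that follow the magic number.
   Returns 0 on success, -1 if the buffer is too short or not ELF. */
int decode_ident(const unsigned char *buf, size_t len, struct elf_ident *out)
{
    if (len < EI_NIDENT || buf[0] != 0x7F || buf[1] != 'E' ||
        buf[2] != 'L' || buf[3] != 'F')
        return -1;

    switch (buf[EI_CLASS]) {
    case 1:  out->bits = 32; break;    /* ELFCLASS32 */
    case 2:  out->bits = 64; break;    /* ELFCLASS64 */
    default: out->bits = 0;  break;
    }
    switch (buf[EI_DATA]) {
    case 1:  out->big_endian = 0;  break;  /* ELFDATA2LSB */
    case 2:  out->big_endian = 1;  break;  /* ELFDATA2MSB */
    default: out->big_endian = -1; break;
    }
    out->version = buf[EI_VERSION];
    out->osabi = buf[EI_OSABI];
    return 0;
}
```

Because the class byte is available before any multi-byte field is read, a single tool can tell 32-bit and 64-bit files apart and pick the right header layout for each.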

Various tools, from linkers to program loaders, use the ELF file header to figure out what a file is and what to do with it. Users rarely interact with it directly; developers mostly encounter it through tools ranging from the file utility to linkers of various kinds. PowerPC users will be very glad to note that the header distinguishes between endianness and processor architecture.

The ET_CORE file type is only partially specified. Such a file is a core file and while it uses the ELF format at least partially (for instance, an ELF core file should contain sections with standard section headers), there is no detailed specification of how to handle things like register dumps.
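The e_type field is where a core file identifies itself as ET_CORE, and the neighboring e_machine field carries the processor architecture. Both are 16-bit values, so reading them correctly depends on the byte order declared in e_ident. Here is a C sketch; the helper names are hypothetical, while the offsets (16 and 18, directly after the 16-byte e_ident array in both 32-bit and 64-bit headers) and the ET_* values come from the specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit field in the byte order declared by e_ident[EI_DATA]. */
static uint16_t read_half(const unsigned char *p, int big_endian)
{
    return big_endian ? (uint16_t)((p[0] << 8) | p[1])
                      : (uint16_t)((p[1] << 8) | p[0]);
}

/* e_type is at offset 16 and e_machine at offset 18 in both header
   classes. Each returns -1 if the header is too short or declares an
   unknown byte order (EI_DATA, byte 5, must be 1 or 2). */
int elf_type(const unsigned char *hdr, size_t len)
{
    if (len < 20 || (hdr[5] != 1 && hdr[5] != 2))
        return -1;
    return read_half(hdr + 16, hdr[5] == 2);
}

int elf_machine(const unsigned char *hdr, size_t len)
{
    if (len < 20 || (hdr[5] != 1 && hdr[5] != 2))
        return -1;
    return read_half(hdr + 18, hdr[5] == 2);
}

/* Names for the e_type values defined by the specification. */
const char *elf_type_name(int type)
{
    switch (type) {
    case 0:  return "ET_NONE";
    case 1:  return "ET_REL (relocatable object)";
    case 2:  return "ET_EXEC (executable)";
    case 3:  return "ET_DYN (shared object)";
    case 4:  return "ET_CORE (core file)";
    default: return "unknown or OS/processor-specific";
    }
}
```

A big-endian header with the bytes 00 04 at offset 16 reports type 4 (ET_CORE) whether the tool runs on a big-endian or a little-endian host.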

Program headers are used only in shared libraries and executable programs -- section headers, by contrast, are used in everything. Program headers are needed to create a process image; there might be multiple program headers containing information about a program. Program headers might contain hints about needed dynamic libraries, instructions on where to find a suitable loader, and so on. Dynamically linked executables specify an interpreter (for instance, /lib64/ld-linux-x86-64.so.2 on an x86_64 Linux system or the more prosaic /lib/ld.so.1 on a generic PowerPC Linux). Static executables have simpler program headers because they're self-contained files. For instance, this output from NetBSD's readelf program shows the program headers of a static binary:



Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x08048000 0x08048000 0x3ecb14 0x3ecb14 R E 0x1000
  LOAD           0x3ed460 0x08438460 0x08438460 0x1e63c  0x289608 RW  0x1000
  NOTE           0x000094 0x08048094 0x08048094 0x00018  0x00018  R   0x4

This indicates two segments to be loaded: one read-only, executable segment of code and one read-write segment of initialized data. The NOTE entry is purely informative. The offset and file size values indicate the location and size of the data in the object file (or executable); the virtual address and memory size indicate where to place it at runtime and how much space to allocate. In some cases, the amount of space reserved at runtime may be much larger than the space taken up on disk; this typically reflects a large amount of memory that is initialized to all zeroes.
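The arithmetic behind that last point is easy to check against the readelf table. The C sketch below models just the loading-related fields of a program header; the struct is a hypothetical subset (the real Elf32_Phdr also has p_type, p_paddr, p_flags, and p_align), and the function names are illustrative.

```c
#include <stdint.h>

/* Loading-related fields of a 32-bit program header (a subset). */
struct load_segment {
    uint32_t offset;   /* p_offset: where the bytes start in the file */
    uint32_t vaddr;    /* p_vaddr: where they go in memory */
    uint32_t filesz;   /* p_filesz: bytes copied from the file */
    uint32_t memsz;    /* p_memsz: bytes reserved in memory */
};

/* Bytes the loader must zero-fill after copying the file contents. */
uint32_t zero_fill_bytes(const struct load_segment *seg)
{
    return seg->memsz > seg->filesz ? seg->memsz - seg->filesz : 0;
}

/* First address past the end of the segment's memory image. */
uint32_t segment_end(const struct load_segment *seg)
{
    return seg->vaddr + seg->memsz;
}
```

With the values shown above, the first LOAD entry needs no zero fill, while the second needs 0x289608 - 0x1e63c = 0x26afcc bytes (2,535,372, roughly 2.4 MiB) of zero-initialized data, traditionally called .bss.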

Section headers serve a similar function for linkers and other tools, describing the sections that hold actual data or code (code is often called "text" just to confuse you). Many specific section types are provided for, such as symbol tables, string tables, or even "array of constructors." Nonetheless, the essential contents are similar.
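String tables deserve a quick illustration, because section names are stored this way: a section header's sh_name field is not a string but a byte offset into a table of NUL-terminated strings. A minimal C lookup might look like this (strtab_lookup is our own name); it rejects offsets past the end of the table and strings that run off its end.

```c
#include <stddef.h>

/* Return the NUL-terminated name starting at offset in a string table of
   size bytes, or NULL if the offset is out of range or the string is
   not terminated inside the table. */
const char *strtab_lookup(const char *tab, size_t size, size_t offset)
{
    size_t i;

    if (offset >= size)
        return NULL;
    for (i = offset; i < size; i++)
        if (tab[i] == '\0')
            return tab + offset;  /* terminator found: the name is valid */
    return NULL;                  /* ran off the end without a NUL */
}
```

Given a table containing "\0.text\0.data", offset 1 yields ".text" and offset 7 yields ".data". Offset 0 yields the empty string, which the specification uses for "no name", and offset 8 yields "data", since an index may point into the middle of a string.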


Variants

One of the technical shortcomings of ELF as it originally existed was its assumption that a given architecture would have only a single application binary interface, or ABI. In fact, even among commercial System V UNIX systems this doesn't always hold -- Solaris binaries might not be compatible with a "pure" SVR4 interface, for instance. This has been addressed by adding a header field that records which ABI a given ELF file is built for. To some extent this evolved in the wild, and the exact set of ABIs available is still developing. In some cases, you might find surprises: NetBSD marks its binaries with the System V ABI type because NetBSD uses the System V ABI, even though the specification allows for a distinct NetBSD ABI.

Additional details are often available, and on most UNIX-like systems, the file command can give you a fair amount of information about a binary. For instance, here's the output of file on three different binaries: an x86_64 binary, a Cell Broadband Engine (Cell BE) SPE (Synergistic Processing Element) binary, and a Cell PPE (PowerPC Processing Element) binary.

$ file bin/c
bin/c: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, 
dynamically linked (uses shared libs), not stripped
$ file hello
hello: ELF 32-bit MSB executable, version 1 (SYSV), statically linked, not stripped
$ file matrix_mul
matrix_mul: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), 
for GNU/Linux 2.6.0, dynamically linked (uses shared libs), not stripped

Note that the matrix_mul program (a Cell Broadband Engine PPE executable) has an identified CPU type, but the hello program (an SPE executable) does not because the machine type used is reserved and thus has no defined value.

The subtle differences in documentation between various systems highlight variances in ELF implementations, although sometimes the documentation is just wrong. The Linux ELF man page refers to the macro EI_BRAND, defining it as "Start of architecture identification." This description matches the FreeBSD man page too, but does not match the actual implementation on my Fedora Core system; there's no such macro. In fact, machine architecture is stored in the e_machine field of the ELF header; the "brand" feature, used only on FreeBSD (so far as I can tell), seems to consist of replacing the last eight characters of the 16-character ident block with the string "FreeBSD." I am not sure what to make of this, but the following comment in the code for the GRUB bootloader seems insightful:

#define EI_BRAND        8       /* start of OS branding (This is
                                   obviously illegal against the ELF
                                   standard.) */

The Linux environment for the Cell Broadband Engine embeds ELF executables for the SPE as objects which can be linked in with PPE executables. It's pretty easy to find the embedded executables by looking for the ELF magic numbers and extracting them for further study. (The e_machine field is set to 0x17, or 23, which is "reserved.")
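A sketch of that search in C follows; find_elf_images is a hypothetical helper, not part of any Cell SDK tool. The outer PPE executable itself matches at offset 0, and any four bytes of data could match by accident, so a real extractor would go on to validate each candidate header, for example by checking its e_machine field.

```c
#include <stddef.h>
#include <string.h>

/* Find every offset in buf where the four ELF magic bytes occur.
   Stores up to max offsets in out and returns the total number of
   matches (which may exceed max). Each match is only a candidate. */
size_t find_elf_images(const unsigned char *buf, size_t len,
                       size_t *out, size_t max)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };
    size_t found = 0;
    size_t i;

    for (i = 0; i + sizeof magic <= len; i++) {
        if (memcmp(buf + i, magic, sizeof magic) == 0) {
            if (found < max)
                out[found] = i;
            found++;
        }
    }
    return found;
}
```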


Shared library magic

ELF provides a consistent and well-documented way for dynamic linking of shared libraries. Code which references a library has an internal table of stub functions which load the library (if necessary), look up the relevant function in that library's symbol table, then patch themselves into jumps into that library. This is exceptionally reliable, but does impose some overhead.
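The stub-and-patch idea can be modeled in a few lines of C. This is a toy, not how any real dynamic linker is written: the symbol table, the function names, and the use of a null pointer to stand in for a stub's initial jump to the resolver are all our own simplifications. It does show where the overhead goes: the first call through a slot pays for a symbol lookup, and later calls pay only for an indirect call.

```c
#include <stddef.h>
#include <string.h>

typedef int (*int_fn)(int);

static int twice(int x) { return 2 * x; }
static int square(int x) { return x * x; }

/* Stand-in for a shared library's symbol table. */
static const struct { const char *name; int_fn fn; } symtab[] = {
    { "twice", twice },
    { "square", square },
};

int resolver_calls = 0;   /* counts trips through the slow path */

/* One entry in the caller's table of stubs. */
struct slot {
    const char *name;     /* symbol this slot refers to */
    int_fn target;        /* NULL until resolved */
};

static int_fn resolve(const char *name)
{
    size_t i;

    resolver_calls++;
    for (i = 0; i < sizeof symtab / sizeof symtab[0]; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].fn;
    return NULL;
}

/* Call through a slot, looking the symbol up and patching the slot on
   first use; later calls go straight to the patched target. */
int call_slot(struct slot *s, int arg)
{
    if (s->target == NULL) {
        s->target = resolve(s->name);
        if (s->target == NULL)
            return -1;    /* unresolved symbol (toy error value) */
    }
    return s->target(arg);
}
```

Calling through a slot for "square" twice and a slot for "twice" once triggers exactly two lookups.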

One innovative solution is statically linked shared libraries. This solution is based on defining a consistent and fixed mapping for entry points into a given shared library which can then be loaded at the same address every time. The result is many of the benefits of shared libraries (smaller code, easy patches), with some of the benefits of static libraries (such as performance).

This technique was used in COFF and is not as widely found in ELF, but BSD/OS used it with ELF binaries. It does impose some additional work on library maintainers and has not been widely imitated, but the performance difference was noticeable.

The ELF specification doesn't directly support this, but makes it reasonably easy to pursue features like this.


Licensing and impact

The widespread adoption of ELF hardly comes as a surprise, given that it is freely available. The 1995 specification release said:

The TIS Committee grants you a non-exclusive, worldwide, royalty-free license to use the information disclosed in this Specification to make your software TIS-compliant; no other license, express or implied, is granted or intended hereby.

It's fascinating to me that some questions about this were raised recently in a court case (see Resources).

In any case, the long and short of this tale is that anyone who is working on a binary toolchain can acquire a complete and well-tested specification at no cost and even find a fairly substantial chunk of freely available or open source software which can manipulate this binary format. This is a good thing.

The widely used GNU toolchains manipulate ELF binaries and are often used in targeting a new architecture. For instance, in the recent Cell Broadband Engine SDK downloads (see Resources), the GNU toolchain was used not only with the GCC port for the Cell BE, but also with the XL C compiler.

The ELF specification has not particularly revolutionized computing nor made possible things that were previously impossible. What it has done, though, is provide a general improvement in the quality of available tools and a reduction in the cost of developing tools. It's nice to have one less thing to worry about.

The ELF specification is fairly mature with support for 32-bit and 64-bit systems. There's not yet much pressure for a replacement -- 64-bit ELF could survive quite a long time and the standard is flexible enough to meet real-world needs. It would be nice to see slightly wider adoption; there's no obvious reason for some popular desktop operating systems to use unique binary formats.


Resources

Learn

Get products and technologies

Discuss

  • Need to find out about how ELF technology can affect your project? Post your query on a developerWorks forum.
