UNIX tools for exploring object files

Learn more about your system

The programs that run on a UNIX® system follow a careful design known as the object file format. Learn more about the object file format and the tools that you can use for exploring object files found on your system.

Bill Zimmerly, Freelance Writer and Knowledge Engineer, Author

Bill Zimmerly is a knowledge engineer, a low-level systems programmer with expertise in various versions of UNIX and Microsoft® Windows®, and a free thinker who worships at the altar of Logic. Bill is also known as an unreasonable person. Unreasonable as in, "Reasonable people adapt themselves to the world. Unreasonable people attempt to adapt the world to themselves. All progress, therefore, depends on unreasonable people" (George Bernard Shaw). Creating new technologies and writing about them are his passions. He resides in rural Hillsboro, Missouri, where the air is fresh, the views are inspiring, and good wineries are all around. There's nothing quite like writing an article on UNIX shell scripting while sipping on a crystal-clear glass of Stone Hill Blush. You can contact him at bill@zimmerly.com.



21 November 2006

The modern art of computer programming combines a special kind of human personality with a special set of tools to produce a rather ghostly product -- software -- that other human beings find useful. Computer programmers are detail-oriented folks who are able to deal with the difficulties of computers. Computers are exacting in their demands and don't tolerate deviation from these demands at all. No doubt about it, computers are difficult to program no matter what your personality, and many tools have been created to assist you in making the task easier.

In UNIX® and Linux®, everything is a file. You could say that the very sine qua non of UNIX and Linux programming is writing code to deal with files. Many types of files make up the system, but object files have a special design that provides for flexible, multipurpose use.

Object files are roadmaps that contain mnemonic symbols with attached addresses and values. The symbols are used for naming various sections of code and data, both initialized and uninitialized. They are also used for locating embedded debugging information and, just like the semantic Web, are fully readable by programs.

Tools of the trade

The tools of the computer programming trade begin with a code editor, such as vi or Emacs, with which you can type and edit the instructions you want the computer to follow to carry out the required tasks, and end with the compilers and linkers that produce the machine code that actually accomplishes these goals.

High-level tools, known as Integrated Development Environments (IDEs), integrate the functionality of individual tools with a common look and feel. An IDE can vastly blur the lines between editor, compiler, linker, and debugger. So for the purpose of studying and learning the system with greater depth, it's often advisable to work with the tools separately before working with the integrated suite. (Note: IDEs are sometimes called Integrated Debugging Environments, too.)

The compiler transforms the text that you create in the code editor into an object file. The object file was originally known as an intermediate representation of code, because it served as the input to link editors (in other words, linkers) that finish the task and produce an executable program as output.

The transformation process that proceeds from code to executable is well-defined and automated, and object files are an integral link in the chain. During the transformation process, the object files serve as a map to the link editors, enabling them to resolve the symbols and stitch together the various code and data sections into a unified whole.

History

Many notable object file formats exist in the world of computer programming. The DOS family includes the COM, OBJ, and EXE formats. UNIX and Linux use a.out, COFF, and ELF. Microsoft® Windows® uses the portable executable (PE) format and Macintosh uses PEF, Mach-O, and others.

Originally, each type of computer had its own unique object file format but, with the advent of UNIX and other operating systems designed to be portable among different hardware platforms, some common file formats ascended to the level of a common standard. Among these are the a.out, COFF, and ELF formats.

Understanding object files requires a set of tools that can read the various portions of the object file and display them in a more readable format. This article discusses some of the more important aspects of those tools. But first, you must create a workbench and put a victim -- er, a patient -- on it.

The workbench

Fire up an xterm session, and let's begin to explore object files by creating a clean workbench. The following commands create a useful place to play with object files:

cd
mkdir src
cd src
mkdir hw
cd hw

Then, by using your favorite code editor, type the program shown in Listing 1 in the $HOME/src/hw directory, and call it hw.c.

Listing 1. The hw.c program
#include <stdio.h>

int main(void)
{
  printf("Hello World!\n");
  return 0;
}

This simple "Hello World" program serves as a patient to study with the various tools available in the UNIX arsenal. Instead of taking any shortcuts to creating the executable (and there are many shortcuts), you'll take your time to build and examine just the object file output.

File formats

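The normal output of a C compiler is assembler code for whatever processor you specify as the target. The assembler code is the input to the assembler, and the assembler's output is then linked into a file that, by default, carries the name of the grandfather of all object file formats: a.out. The name itself stands for Assembler Output. On current systems the file keeps this historical name but is usually written in a newer format; on Linux, for example, it is an ELF file, which is why readelf can read it later in this article. To create the a.out file, type the following command in the xterm window: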

cc hw.c

Note: If you experience any errors or the a.out file wasn't created, you might need to examine your system or source file (hw.c) for errors. Check also to see whether cc is defined to run your C/C++ compiler.

Modern C compilers combine the compile and assemble steps into one step. You can invoke switches to see just the assembler output of the C compiler. By typing the following command, you can see what the assembler output from the C compiler looks like:

cc -S hw.c

This command has generated a new file -- hw.s -- that contains the assembler input text that you typically would not have seen, because the compiler defaults to producing the a.out file. As expected, the UNIX assembler program can assemble this type of input file to produce the a.out file.

UNIX-specific tools

Assuming that all went well with the compile and you have an a.out file in the directory, let's examine it. Useful tools for examining object files include the following:

  • nm: Lists symbols from object files.
  • objdump: Displays detailed information from object files.
  • readelf: Displays information about ELF object files.

The first tool on the list is nm, which lists the symbols in an object file. If you type the nm command with no arguments, you'll notice that it defaults to looking for a file named a.out. If the file isn't found, the tool complains. If, however, the tool did find the a.out file that your compiler created, it presents a listing similar to Listing 2.

Listing 2. Output of the nm command
08049594 A __bss_start
080482e4 t call_gmon_start
08049594 b completed.4463
08049498 d __CTOR_END__
08049494 d __CTOR_LIST__
08049588 D __data_start
08049588 W data_start
0804842c t __do_global_ctors_aux
0804830c t __do_global_dtors_aux
0804958c D __dso_handle
080494a0 d __DTOR_END__
0804949c d __DTOR_LIST__
080494a8 d _DYNAMIC
08049594 A _edata
08049598 A _end
08048458 T _fini
08049494 a __fini_array_end
08049494 a __fini_array_start
08048478 R _fp_hw
0804833b t frame_dummy
08048490 r __FRAME_END__
08049574 d _GLOBAL_OFFSET_TABLE_
         w __gmon_start__
08048308 T __i686.get_pc_thunk.bx
08048278 T _init
08049494 a __init_array_end
08049494 a __init_array_start
0804847c R _IO_stdin_used
080494a4 d __JCR_END__
080494a4 d __JCR_LIST__
         w _Jv_RegisterClasses
080483e1 T __libc_csu_fini
08048390 T __libc_csu_init
         U __libc_start_main@@GLIBC_2.0
08048360 T main
08049590 d p.4462
         U puts@@GLIBC_2.0
080482c0 T _start

The sections that contain executable code are known as text sections or segments. Likewise, there are data sections or segments for containing non-executable information or data. Another type of section, known by the BSS designation (historically, Block Started by Symbol), holds uninitialized data; it takes up no room in the file itself and is filled with zeros when the program is loaded.

For each symbol, the nm command prints the symbol's value (in hexadecimal by default), then a one-character code for the symbol type, and then the symbol's name. Codes that you commonly see include A for absolute, which means that the value will not change by further linking; B for a symbol found in the BSS section; and C for common symbols that reference uninitialized data.

Object files contain many different parts that are divided into sections. Sections can contain executable code, symbol names, initialized data values, and many other types of data. For detailed information on all of these types of data, consider reading the UNIX man page on nm, where each type is described by the character codes shown in the output of the command.
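Because nm prints one symbol per line, its output is easy to slice with ordinary text tools. The following is a minimal sketch of a filter that keeps only the symbols with a given type code; the helper name nm_by_type is hypothetical, and the sample lines are copied from Listing 2 so that the sketch runs without an object file at hand.

```shell
# nm_by_type LETTER: read nm output on stdin and print the names of the
# symbols whose type code is exactly LETTER (a hypothetical helper).
nm_by_type() {
  awk -v code="$1" '
    NF == 3 && $2 == code { print $3 }   # line has value, type, name
    NF == 2 && $1 == code { print $2 }   # line has no value: type, name
  '
}

# A few lines copied from Listing 2.
sample=$(cat <<'EOF'
08049594 A __bss_start
08049594 b completed.4463
08049588 D __data_start
         w __gmon_start__
         U __libc_start_main@@GLIBC_2.0
08048360 T main
         U puts@@GLIBC_2.0
080482c0 T _start
EOF
)

printf '%s\n' "$sample" | nm_by_type U   # undefined symbols
printf '%s\n' "$sample" | nm_by_type T   # global symbols in a text section
```

With a real file you would write nm a.out | nm_by_type U. Note that the undefined list names puts rather than printf, which suggests that the compiler replaced the printf call in hw.c with a simpler puts call, something GCC commonly does for a constant string that ends in a newline.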

Details, details . . .

Even a simple Hello World program contains a vast array of details when it reaches the object file stage. The nm program is good for listing symbols and their types and values but, for examining in greater detail the contents of those named sections of the object file, more powerful tools are necessary.

Two of these more powerful tools are the objdump and readelf programs. By typing the following command, you can see an assembly listing of every section in the object file that contains executable code. Isn't it amazing how much code the compiler actually generates for such a tiny program?

objdump -d a.out

This command produces the output you see in Listing 3. Each section of executable code is run when a particular event becomes necessary, including events like the initialization of a library and the main starting entry point of the program itself.

For a programmer who is fascinated by the low-level details of programming, this is a powerful tool for studying the output of compilers and assemblers. Details, such as those shown in this code, reveal a lot about how the native processor itself operates. When you study this output hand-in-hand with the processor manufacturer's technical documentation, you can glean valuable insights into how such things work, because the output comes from a functioning program and is clearly laid out.
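The full disassembly is long, so it is often easier to study one function at a time. The following is a minimal sketch of a filter for that purpose. The helper name only_func is hypothetical, and the sketch assumes the layout that GNU objdump uses, in which each function starts with a line containing <name>: and ends with a blank line; other versions of objdump may format their output differently.

```shell
# only_func NAME: read "objdump -d" output on stdin and print just the
# block that belongs to the function NAME (a hypothetical helper).
only_func() {
  awk -v label="<$1>:" '
    index($0, label) { printing = 1 }   # start at the function label
    printing && /^$/ { exit }           # stop at the first blank line after it
    printing         { print }
  '
}

# Assumes the a.out file built earlier is in the current directory.
if [ -f a.out ]; then
  objdump -d a.out | only_func main
fi
```

Replacing main with another name from the nm listing, such as _start, shows the code that runs before main is called.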

Likewise, the readelf program can list the contents of the object file with similar lucidity. You can see this by typing the following command:

readelf --all a.out

This command produces the output shown in Listing 4. The ELF header shows a nice summary of all the section entries in the file. Before enumerating the contents of those headers, you can see how many there are. This information can be useful when exploring a rather large object file.
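When the full report is more than you need, readelf -h prints only the ELF header. That header begins with a four-byte magic number, the byte 0x7f followed by the letters E, L, and F, which is how tools recognize an ELF file in the first place. The following is a minimal sketch that checks for the magic number before calling readelf; the helper name is_elf is hypothetical.

```shell
# is_elf FILE: succeed if FILE starts with the four-byte ELF magic number,
# 0x7f followed by the letters E, L, and F (a hypothetical helper).
is_elf() {
  magic=$(od -An -c -N 4 "$1" 2>/dev/null | tr -d ' \n')
  [ "$magic" = '177ELF' ]    # od -c prints the 0x7f byte as octal 177
}

# Assumes the a.out file built earlier is in the current directory.
if is_elf a.out; then
  readelf -h a.out           # print only the ELF file header
else
  echo "a.out is missing or is not an ELF file"
fi
```

On a system whose compiler writes some other format, the check fails and readelf is never run.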

As you can see from this output, a huge amount of useful detail resides in the simple a.out Hello World file -- version information, histograms, multiple tables of various symbol types, and so on. Yes, one can spend a great deal of time learning about executable programs by exploring object files with just the few tools presented here.

In addition to all these sections, the compiler can place debugging information in the object files, and such information can be displayed as well. Type the following command and take some time to see what the compiler is telling you (if you're a debugging program, that is):

readelf --debug-dump a.out | less

This command produces the output shown in Listing 5. Debugging tools, such as GDB, read in this debugging information, and you can get the tools to display more descriptive labels (for example) than raw address values when disassembling code while it's running under the debugger.

Executable files are object files

In the UNIX world, executable files are object files, and you can examine them as you did the a.out file. It is a useful exercise to change to the /bin or /usr/local/bin directory and run nm, objdump, and readelf over some of your most commonly used commands, such as pwd, ps, cat, or rm. Often when you're writing a program that requires a certain functionality that one of the standard tools has, it's useful to see how those tools actually do their work by simply running objdump -d <command> over it.
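The exercise above can be sketched as follows, assuming the GNU versions of nm and objdump. The sketch uses cat as the target because command -v then returns the full path of the program on disk; for a name such as pwd, which many shells also provide as a built-in, it may return just the bare name. System programs are often stripped of their ordinary symbol table, in which case plain nm reports no symbols, so the sketch asks for the dynamic symbols with -D instead.

```shell
target=$(command -v cat)   # full path of the cat program, wherever it is installed
echo "Examining $target"

nm -D "$target" | head -n 20        # dynamic symbols: the library routines the program uses
objdump -d "$target" | head -n 40   # the first lines of the disassembly
```

The head commands only keep the output short; pipe to less instead when you want to browse all of it.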

If you're inclined to work on compilers and other language tools, you'll find that time spent studying the various object files that make up your computer's system is time well spent. A UNIX operating system has many layers, and the layers exposed by the tools that examine its object files are close to the hardware. You can get a real feel for the system in this way.

Conclusion

Exploring object files can greatly deepen your knowledge of the UNIX operating system and provide greater insight into how the software is actually assembled from source code. I encourage you to study the output of the object file tools described in this article by running them over the programs found in the /bin or /usr/local/bin directories on your system, and to seek out the system documentation that your hardware manufacturer provides.

