On an IBM AIX system, the addresses of global symbols in C, C++, and Fortran programs are stored in a fixed-size data structure called the table of contents (TOC). TOC overflow occurs if the application contains more global symbols than can fit in the TOC. This article explains how the TOC works on AIX and the different mechanisms that can be used in the presence of TOC overflow. It discusses the performance impact of each mechanism and provides guidance for application developers on the best ways to use these mechanisms, based on the characteristics of their applications.


Lu Yang (luyang@ca.ibm.com), C/C++ Test Teamlead for AIX and Linux on Power, IBM

Lu Yang is the C/C++ test team lead for IBM AIX and Linux on IBM Power. She holds an MSc in Computing Science and has published one journal article.



Christopher Barton, PhD (kbarton@ca.ibm.com), Technical lead, XL C/C++ & XL Fortran for AIX and Linux on Power, IBM

Kit Barton is the technical lead for the C/C++ and Fortran compilers on IBM AIX and Linux on IBM Power. He holds a PhD in Computing Science and has published several technical articles.



06 November 2012

On IBM® AIX® systems, the addresses of global symbols in C, C++, and Fortran programs are stored in a data structure called the table of contents (TOC). To access global symbols, the address of the global symbol must be retrieved from the TOC. The default TOC data structure has a fixed size. Therefore, it can contain only the addresses for a fixed number of global symbols. For large applications, it is common to have more global symbols than can be stored in the default TOC, resulting in TOC overflow.

This article explains how the TOC works on AIX. It describes the different mechanisms that can be used in the presence of TOC overflow and the performance implications of each mechanism. It also discusses the trade-offs between the mechanisms to help application developers choose the best one based on the characteristics of their applications. This information will benefit those who develop large applications and tune their performance. Anyone who has worked with applications that encounter TOC overflow will also benefit from the description of how the TOC works, why TOC overflow occurs, and how to deal with it.

TOC overview

Figure 1 shows a diagram of the TOC data structure. The TOC contains addresses of all global variables and functions used in the program. Access to a global symbol is achieved by locating the address of the global symbol from the TOC and then using the address to access the symbol. The location of global symbols in the TOC is determined by the linker during the final linking stage after the compilation. During the linking step, the linker scans the code and looks for references to global symbols. As it finds references, it determines whether the symbol is already located in the TOC. If it is not, the linker assigns a TOC entry to contain the address of the global symbol.

Figure 1. Table of contents (TOC) data structure
Data structure with references to global symbols

The C source code in Listing 1 contains two global variables, A1 and A2. As the linker is scanning the final code, it will find a reference to symbol A1. If it has not been assigned a TOC entry, it will be placed in the TOC. The linker will then modify the reference to A1 to use the correct offset in the TOC. Similarly, when the linker finds a reference to A2, it will first assign a TOC entry for A2 and then modify all references to A2 to use the correct TOC entry.

Listing 1. Example with two global variables (t2.c)
int A1;
int A2;
int main() {
    return A1+A2;
}

The AIX application binary interface (ABI) reserves register r2 to point to the TOC. The compiler and the linker cooperate to generate the correct sequence of instructions to load a global symbol. The compiler generates two adjacent instructions:

  1. Load the address of the symbol from its TOC entry, which is addressed as an offset from the base register r2.
  2. Load the value from the memory location obtained in the previous step.

Because the compiler does not know the location of a global symbol inside the TOC, it places a 0 as the offset for the first instruction and relies on the linker to write the correct offset once the layout of the TOC is known. Listing 2 shows the file generated by the compiler for the source code shown in Listing 1. The first instruction loads the address of A1 to gr3 (the address computed based on gr2 plus the offset that will be updated by the linker at the linking stage).

The second instruction loads the value of A1 to gr3. The value of A2 is obtained the same way, by the following two instructions. The last instruction adds the value of A1 and A2, and then stores the result in gr3.

Note:
In this example, the gr# notation provided in the listing file represents the same registers as specified in the disassembled binary (r# in the disassembled binary).

Listing 2. Compiler listing file t2.lst (generated by xlc -qlist -o t2 t2.c)
     | 000000                           PDEF     main
   11|                                  PROC
   12| 000000 lwz      80620004   1     L4A       gr3=.A1(gr2,0)
   12| 000004 lwz      80630000   1     L4A       gr3=A1(gr3,0)
   12| 000008 lwz      80820008   1     L4A       gr4=.A2(gr2,0)
   12| 00000C lwz      80840000   1     L4A       gr4=A2(gr4,0)
   12| 000010 add      7C632214   1     A         gr3=gr3,gr4

Listing 3 shows the disassembled binary code (after the final linking) for the example in Listing 1. The first instruction loads the address at location r2+68 and places it in r3. Here, we see that the linker has replaced the offset inserted by the compiler with the final offset (68) corresponding to the location of A1 in the TOC. The second instruction loads the value of A1 into r3. The value of A2 is obtained the same way by the following two instructions. The last instruction adds the contents of r3 and r4 and stores the result in r3.

Listing 3. objdump output of t2 (generated by objdump -d t2)
    10000380 <.main>:
   10000380:       80 62 00 44     l       r3,68(r2)
   10000384:       80 63 00 00     l       r3,0(r3)
   10000388:       80 82 00 48     l       r4,72(r2)
   1000038c:       80 84 00 00     l       r4,0(r4)
   10000390:       7c 63 22 14     cax     r3,r3,r4

The IBM® PowerPC® architecture uses an instruction with a signed 16-bit offset for indirect address calculations. As shown in the previous example, this limits the size of the TOC to 64k bytes. Therefore, a maximum of 16k entries can be located in the TOC in 32-bit mode (addresses are 4 bytes) and 8k in 64-bit mode (addresses are 8 bytes). If a program contains more TOC entries than the TOC can hold, the linker will abort and report TOC overflow. In this case, an alternative mechanism must be used.


Ways to handle TOC overflow

There are two basic ways that you can deal with TOC overflow:

  • Reduce the number of global symbols in your programs, thereby reducing the number of TOC entries required.
  • Increase the size of the TOC. This requires using more than one instruction to obtain the address of a global symbol from the TOC.

Reduce the number of TOC entries

If possible, you can change the source code to reduce the number of global symbols. This can be done by removing unnecessary global symbols, marking them as static, or grouping symbols together and placing them in structures. The compiler can perform some of these tasks automatically when link-time optimizations are used. For more details on these approaches, see the blog posts "Dealing with TOC overflow: The traditional approach" and "TOC overflow: Getting help from the XL compilers" listed in Resources.

The -qminimaltoc compiler option is designed specifically to reduce the number of TOC entries required by the program. This option causes the compiler to create a separate table for each source file. This table contains the address of each global symbol used in the source file. The address of the table is placed as an entry in the TOC (Figure 2). This option is effective only if the TOC entries are spread across multiple source files. If a single source file contains enough global symbols to cause TOC overflow, this option will have no effect.

Figure 2. Minimal TOC
Symbol TOC entries replaced by file-based table

Although the -qminimaltoc option reduces the number of entries in the TOC, it increases the time required to access a global symbol because it introduces another indirect reference. Thus, you can introduce significant performance overhead by using this option. Another drawback of this approach is that it can increase the memory requirements for the application because the address for each global symbol will be located in the table for every source file where it is used. This can result in a lot of duplication of addresses across all of the tables. See Resources for a link to compiler documentation, where you can find more information.

Enable a large TOC

If it is not possible to reduce the number of global symbols to eliminate TOC overflow, additional options must be used by the compiler or linker to enable a larger TOC. Figure 3 shows the TOC of a 32-bit program with more than 16k global symbols. In this case, an extended TOC is created in addition to the base TOC to increase the total capacity, as required. As a result, the TOC becomes consecutive 64k regions. The base TOC is the first 64k region, followed by one or more additional 64k regions, thus forming the extended TOC. Accessing an entry in the extended TOC requires two operations:

  1. Compute the base address of the TOC region that contains the entry for the global symbol.
  2. Load the entry from the proper offset within the TOC region.

On PowerPC, you can do the first computation efficiently by using the "add immediate shifted (addis)" instruction. The second operation uses a load instruction, similar to the regular TOC.

Figure 3. Large TOC
Data structure exceeds 64K TOC entries

Again, because of the 16-bit immediate field on PowerPC, a large TOC contains up to 64k TOC regions. With a maximum of 64 KB in each TOC region, a large TOC can be as big as 64k regions * 64 KB = 4 GB. This creates a limit of 1G global symbols in a 32-bit environment (4-byte entries) and 512M in a 64-bit environment (8-byte entries). The examples that follow show how a symbol is accessed in a large TOC.

There are two options to enable the large TOC model:

  • Using the linker, through the -bbigtoc option
  • Using the compiler and linker, through new extensions to the -qpic=large option

We will explain how a global symbol is accessed by these two approaches in a way similar to how we did for a regular TOC, using Listing 4 as an example. First, we show the compiler listing file and then the disassembled binary.

Listing 4 shows an example program that contains TOC overflow.

Listing 4. Example that contains TOC overflow (t17000.c)
int A1;
int A2;
…
int A17000;
int main() {
  A17001=A1+A2+…+A17000;
}

Listing 5 shows the corresponding listing file, using the -bbigtoc option. The -bbigtoc option is a linker option, not a compiler option, so there is no change to the code produced by the compiler. Therefore, the code in Listing 5 is very similar to the code generated without TOC overflow in Listing 2.

Listing 5. t17000.lst with -bbigtoc (generated by xlc -qlist -bbigtoc -o t17000_bigtoc t17000.c)
     | 000000                           PDEF     main
…
17001|                                  PROC
17002| 02FFD8 lwz      8082FFF8   1     L4A       gr4=.A16382(gr2,0)
17002| 02FFDC lwz      80840000   1     L4A       gr4=A16382(gr4,0)
17002| 02FFE0 add      7C632214   1     A         gr3=gr3,gr4
17002| 02FFE4 lwz      8082FFFC   1     L4A       gr4=.A16383(gr2,0)
17002| 02FFE8 lwz      80840000   1     L4A       gr4=A16383(gr4,0)
17002| 02FFEC add      7C632214   1     A         gr3=gr3,gr4
17002| 02FFF0 lwz      80820000   1     L4A       gr4=.A16384(gr2,0)
17002| 02FFF4 lwz      80840000   1     L4A       gr4=A16384(gr4,0)
17002| 02FFF8 add      7C632214   1     A         gr3=gr3,gr4
17002| 02FFFC lwz      80820004   1     L4A       gr4=.A16385(gr2,0)
17002| 030000 lwz      80840000   1     L4A       gr4=A16385(gr4,0)
17002| 030004 add      7C632214   1     A         gr3=gr3,gr4
…

As mentioned earlier, when the address of a global symbol is placed in the extended TOC, two instructions are needed to retrieve it: one to compute the base address of the TOC region that holds the entry and a second to load the entry from its offset within that region. As shown in Listing 5, the compiler emitted only one instruction to get the address of the symbol. Therefore, as Listing 6 shows, the linker replaces that load instruction with a branch to out-of-line code. The out-of-line code contains the two instructions that retrieve the entry from the extended TOC and then branches back to the instruction that follows the original load.

Note:
The linker cannot change the number of instructions in the instruction stream because doing so would change the relative offsets used in branch instructions.

Listing 6. objdump output of t17000 with -bbigtoc (generated by objdump -d t17000_bigtoc)
…
10030298:       80 82 80 04     l       r4,-32764(r2)
1003029c:       80 84 00 00     l       r4,0(r4)
100302a0:       7c 63 22 14     cax     r3,r3,r4
100302a4:       80 82 80 00     l       r4,-32768(r2)
100302a8:       80 84 00 00     l       r4,0(r4)
100302ac:       7c 63 22 14     cax     r3,r3,r4
100302b0:       48 00 1d d4     b       10032084 <@FIX0+0x4>
100302b4:       80 84 00 00     l       r4,0(r4)
100302b8:       7c 63 22 14     cax     r3,r3,r4
100302bc:       48 00 1d d4     b       10032090 <@FIX0+0x10>
100302c0:       80 84 00 00     l       r4,0(r4)
100302c4:       7c 63 22 14     cax     r3,r3,r4
…
10032080 <@FIX0>:
10032080:       48 00 00 00     b       10032080 <@FIX0>
10032084:       3c 82 00 01     cau     r4,r2,1
10032088:       80 84 80 00     l       r4,-32768(r4)
1003208c:       4b ff e2 28     b       100302b4 <.main+0x2ff34>
10032090:       3c 82 00 01     cau     r4,r2,1
10032094:       80 84 80 04     l       r4,-32764(r4)
10032098:       4b ff e2 28     b       100302c0 <.main+0x2ff40>

You can see in Listing 6 that, when using -bbigtoc to access a symbol in the extended TOC, the load instruction is replaced with a branch instruction to out-of-line code. For example:

100302b0:       48 00 1d d4     b       10032084 <@FIX0+0x4>

In the branch target section, two instructions are used to obtain the address of the symbol:

10032084:       3c 82 00 01     cau     r4,r2,1
10032088:       80 84 80 00     l       r4,-32768(r4)
1003208c:       4b ff e2 28     b       100302b4 <.main+0x2ff34>

In this example, we have two TOC regions: the base region (R0) and the extended region (R1). R0 is the 64 KB window that r2 can reach with a single signed 16-bit displacement, and R1 immediately follows R0. The first instruction, cau r4,r2,1 (compute address upper, the older mnemonic for add immediate shifted, addis), sets r4 to r2 + (1<<16), that is, r2 + 65536. The second instruction is a load with displacement -32768 (0x8000 interpreted as a signed 16-bit value), so it reads the TOC entry at r4 - 32768 = r2 + 32768, which is the first address beyond the reach of a single displacement from r2, and leaves the address of the symbol in r4.

The -qpic=large option

In the XL C/C++ Version 12.1 and XL Fortran Version 14.1 compilers, the -qpic=large option has been extended to cooperate with the linker to generate more efficient code when using a large TOC.

Listing 7 shows the code generated by the compiler for the example in Listing 4 that contains TOC overflow. With the -qpic=large option, the compiler always generates two instructions to get the address of a symbol because that is what is required to access a large TOC. Thus, the linker does not need to insert a branch to out-of-line code to compute the correct offsets.

Listing 7. t17000.lst with -qpic=large (generated by xlc -qlist -qpic=large -o t17000_pic_large t17000.c)
     | 000000                           PDEF     main
…
    0| 03FFCC addis    3C820000   1     LAU       gr4=.A16382(gr2,0)
    0| 03FFD0 lwz      8084FFF8   1     L4A       gr4=.A16382(gr4,0)
17002| 03FFD4 lwz      80840000   1     L4A       gr4=A16382(gr4,0)
17002| 03FFD8 add      7C632214   1     A         gr3=gr3,gr4
    0| 03FFDC addis    3C820000   1     LAU       gr4=.A16383(gr2,0)
    0| 03FFE0 lwz      8084FFFC   1     L4A       gr4=.A16383(gr4,0)
17002| 03FFE4 lwz      80840000   1     L4A       gr4=A16383(gr4,0)
17002| 03FFE8 add      7C632214   1     A         gr3=gr3,gr4
    0| 03FFEC addis    3C820000   1     LAU       gr4=.A16384(gr2,0)
    0| 03FFF0 lwz      80840000   1     L4A       gr4=.A16384(gr4,0)
17002| 03FFF4 lwz      80840000   1     L4A       gr4=A16384(gr4,0)
17002| 03FFF8 add      7C632214   1     A         gr3=gr3,gr4
    0| 03FFFC addis    3C820000   1     LAU       gr4=.A16385(gr2,0)
    0| 040000 lwz      80840004   1     L4A       gr4=.A16385(gr4,0)
17002| 040004 lwz      80840000   1     L4A       gr4=A16385(gr4,0)
17002| 040008 add      7C632214   1     A         gr3=gr3,gr4

You can see in Listing 8 that with -qpic=large, two instructions are always used to access the TOC, even when TOC overflow does not occur. This is seen when accessing the first two global symbols at offset -32764 and -32768. These symbols reside in the base TOC but still require an additional instruction to compute the address. When using -bbigtoc, this address would be computed with a single instruction instead.

Listing 8. objdump output of t17000 with -qpic=large (generated by objdump -d t17000_pic_large)
1004024c:       3c 82 00 00     cau     r4,r2,0
10040250:       80 84 80 04     l       r4,-32764(r4)
10040254:       80 84 00 00     l       r4,0(r4)
10040258:       7c 63 22 14     cax     r3,r3,r4
1004025c:       3c 82 00 00     cau     r4,r2,0
10040260:       80 84 80 00     l       r4,-32768(r4)
10040264:       80 84 00 00     l       r4,0(r4)
10040268:       7c 63 22 14     cax     r3,r3,r4
1004026c:       3c 82 00 01     cau     r4,r2,1
10040270:       80 84 80 00     l       r4,-32768(r4)
10040274:       80 84 00 00     l       r4,0(r4)
10040278:       7c 63 22 14     cax     r3,r3,r4
1004027c:       3c 82 00 01     cau     r4,r2,1
10040280:       80 84 80 04     l       r4,-32764(r4)
10040284:       80 84 00 00     l       r4,0(r4)
10040288:       7c 63 22 14     cax     r3,r3,r4

Performance considerations when dealing with TOC overflow

The -qminimaltoc option is used to reduce the number of TOC entries required. It will place all global symbols located in a source file into a separate data structure. This option should be used to compile specific source files that contain only performance-insensitive global symbols. Even if it does not remove TOC overflow completely, the -qminimaltoc option can still be useful to reduce overall pressure on the TOC. However, it should be used carefully, because it can have a significant impact on performance when used to compile files that contain performance-sensitive global symbols.

When using -bbigtoc, there is no additional overhead to access global symbols that reside in the base TOC. However, accessing global symbols in the extended TOC requires several additional instructions to be executed, including a branch to out-of-line code, which can have an impact on the instruction cache performance. In addition, the code to handle the offset calculations is generated at link time, thus the compiler may miss opportunities to optimize it.

The -qpic=large option can also have an impact on performance, because it always uses two instructions to get the address of a symbol, regardless of whether there is TOC overflow. However, the two instructions used by -qpic=large (addis followed by a load) have a short latency compared to the sequence generated by -bbigtoc.


Summary

When TOC overflow occurs, the best approach is to reduce the number of global symbols by modifying the source code or relying on compiler optimizations. If specific files contain only performance-insensitive global symbols, the -qminimaltoc option should be considered to compile these files. However, due to the large performance overhead, this option should be used with discretion.

As we mentioned in the previous section, the -qpic=large option can also have a performance impact, because it always uses two instructions to get the address of a symbol, regardless of whether there is TOC overflow. On the other hand, when using -bbigtoc, there is no additional overhead to access global symbols that reside in the base TOC. However, accessing global symbols in the extended TOC requires several additional instructions to be executed, including a branch to out-of-line code, and that can have an impact on the instruction cache performance. The execution time for this sequence of instructions is much higher than for the two instructions generated by -qpic=large.

In theory, if all performance-sensitive global symbols could be placed in the base TOC, -bbigtoc would be an ideal choice. In practice, this is not possible, because users cannot control which symbols are located in the base TOC and which are located in the extended TOC. Also, as the source code for applications evolves and new global symbols are added, the placement of all global symbols within the TOC can change. Therefore, for typical application development, the -qpic=large option in the XL C/C++ V12.1 and XL Fortran V14.1 compilers is the preferred solution, because it provides the best balance for accessing symbols in both the base and the extended TOCs.

Resources

Learn

Article title: An overview of the TOC on AIX (published 06 November 2012)