Level: Intermediate
Jonathan Bartlett (johnnyb@eskimo.com), Director of Technology, New Medio
28 Feb 2007
The ABI, or Application Binary Interface, is the set of
conventions that allow programs written in different languages or compiled by different
compilers to call each other's functions. This article, the last in a four-part
series, discusses the PowerPC® ABI for 64-bit ELF (UNIX-like) systems and how to
write and call functions using it. Knowing in detail how the 64-bit PowerPC ABI works
will help you write 64-bit programs for the POWER5™ and other PowerPC-based processors
more effectively, whether you program in assembly language or not. There is also a
32-bit ABI that is not covered in this article.
The simplified ABI
The previous article, "Programming with the PowerPC branch processor," briefly discussed the "simplified" ABI, which lets you write functions that meet certain criteria with a minimum of fuss. The criteria a function must meet to use the simplified ABI are:
- It must not call any other function.
- It may modify only registers 3 through 12 (although see exceptions under Non-volatile register save areas, below).
- It may modify only register fields cr0, cr1, cr5, cr6, and cr7.
There are a few additional restrictions if your code uses the PowerPC
vector processing extensions as well, but that is beyond the scope of this article.
Interestingly, you need not declare in any way when you are using the
simplified ABI, because it is a fully-compatible subset of the normal
ABI for functions that do not need stack frames,
discussed in the next section.
When a function is called using the PowerPC ABI, the caller passes the parameters to the function in registers. Register 3 holds the first fixed-point parameter, register 4 the second, and so on through register 10. Likewise, floating-point values are passed through floating-point registers 1 through 13. When the function is completed, the return value is passed back through register 3, and the function exits using the blr instruction.
To demonstrate the simplified PowerPC ABI, let's look at a function that takes one parameter, squares it, and returns it. Here is the function in assembly language (enter as my_square.s):
Listing 1. Function to square a number using the simplified ABI
###FUNCTION ENTRY POINT DECLARATION###
.section .opd, "aw"
.align 3
.global my_square
my_square: #this is the name of the function as seen
.quad .my_square, .TOC.@tocbase, 0
#Tell the linker that this is a function reference
.type my_square, @function
###FUNCTION CODE HERE###
.text
.my_square: #This is the label for the code itself (referenced in the "opd")
#Parameter 1 -- number to be squared -- in register 3
#Multiply it by itself, and store it back into register 3
mulld 3, 3, 3
#The return value is now in register 3, so we just need to leave
blr
Previously, you were using the .opd
section for declaring the program's entry point, but here you're also
using it to declare a function. These are called official
procedure descriptors, and they contain the information the
linker needs to combine position-independent code from different
shared object files together. The most important field is the first
one, which is the address of the start of the code for the procedure.
The second field is the TOC pointer used for the function. The third field is an environment pointer for languages that use one, but is normally just set to zero. Notice that the only symbol definition that is exported globally is the official procedure descriptor.
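As a rough mental model, an official procedure descriptor can be pictured as a C structure of three doublewords. The sketch below is illustrative only: the names `opd_entry`, `code_address`, `toc_base`, `environment`, and `opd_entry_size` are hypothetical and not part of the ABI, and real descriptors are emitted by the assembler and linker, not built by hand in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one .opd entry: three 8-byte fields, in the
   same order as the .quad directive in Listing 1. */
struct opd_entry {
    uint64_t code_address; /* .my_square: start of the function's code */
    uint64_t toc_base;     /* .TOC.@tocbase: TOC pointer for the function */
    uint64_t environment;  /* environment pointer, normally zero */
};

/* Size of one descriptor in bytes (three doublewords). */
size_t opd_entry_size(void) {
    return sizeof(struct opd_entry);
}
```

Seen this way, the globally exported symbol my_square names the descriptor, while the code itself sits behind the first field.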
The C language prototype for this function is:
Listing 2. C prototype for number-squaring function
typedef long long int64;
int64 my_square(int64 val);
Here is the C code for using the function (enter as my_square_tester.c):
Listing 3. C code for calling the my_square function
#include <stdio.h>
/* make declarations easier to write */
typedef long long int64;
int64 my_square(int64);
int main() {
int64 a = 32;
printf("The square of %lld is %lld.\n", a, my_square(a));
return 0;
}
The simple way to compile and run this code is to do the following:
Listing 4. Compiling and running my_square_tester
gcc -m64 my_square.s my_square_tester.c -o my_square_tester
./my_square_tester
The -m64 flag tells the compiler to use 64-bit
instructions, compile using the 64-bit ABI and libraries, and use the
64-bit ABI for linking. It then takes care of all of the linking
issues for you (and there are several -- you can see the full linking command line by
appending -v to the command line).
As you can see, writing functions using the simplified PowerPC ABI is
very straightforward. The issues come in when the functions don't
meet these criteria.
The stack
Now let's get into the more complicated parts of the ABI. The most
important part of any ABI is the details of how to make use of the
stack, which is the area of memory that holds
local function data.
The need for a stack
The best way to see why stacks are needed is to look at recursive functions. For simplicity, let's look at the recursive implementation of the factorial function:
Listing 5. Factorial function
typedef long long int64;
int64 factorial(int64 num) {
//BASE CASE
if (num == 0) {
return 1;
//RECURSIVE CASE
} else {
return num * factorial(num - 1);
}
}
This may be easy enough to understand conceptually, but let's examine it concretely. What is going on here? What happens, for instance, if you try to find the value of the factorial of 4? Let's follow the sequence:
First, the function will be called, and num will be set equal
to 4. Then, because num is greater than 0, factorial will be called again, this time with 3. Now, in the new
call to factorial, num is set to
3. However, this num refers to a different memory location than the previous one, even
though both calls share the same variable name and the same code.
This is because each time a function is called, it has an activation record (also
called a stack frame) associated with it. The activation record contains all of
the call-specific data for the function, including parameters and local variables. This
is how recursive functions keep from trashing the values of the variables in other,
active function calls. Each call gets its own activation record, so each time it is
called the variables get their own storage space within that activation record. Only
when the function call is completely finished is the space for the activation
record released for reuse (more on this later).
So, with 3 as the value of num, we go through the
function again, then with 2, then with 1, then with 0. However, with
0, the function has reached its base case. The
base case is the point where it ceases to call itself, and instead
returns. So, with 0 as num, it returns 1 as the
result. The previous function call picks up where it left off
(calling factorial(0)) and multiplies the result,
1, with the value in its own num, also 1. This is
returned, and the next function waiting is reactivated. This
one multiplies the result, 1, with its value of num, which
is 2, and the result, 2, is then returned. The next waiting function call
is then reactivated, and the previous result is multiplied by this function's value of num, which is 3, resulting in 6. This number is returned to our original function, whose value of num is 4. This is multiplied with the previous result to get 24.
As you can see, each time a function calls another function, its own
values and state are suspended while the next function invocation
occurs. This is true for all functions, not just recursive ones. If
that function again calls other functions, its state is likewise
suspended. When a function returns, the function that called it is
revived and it continues from there. So, as we progress, the "live"
function calls stack up on top of each other with each function call, and
then are removed from the stack with every function return. The
result looks like this (factorial will be abbreviated as fac):
fac(4) [active]
fac(4) [suspended], fac(3) [active]
fac(4) [suspended], fac(3) [suspended], fac(2) [active]
fac(4) [suspended], fac(3) [suspended], fac(2) [suspended], fac(1) [active]
fac(4) [suspended], fac(3) [suspended], fac(2) [suspended], fac(1) [suspended], fac(0) [active]
fac(4) [suspended], fac(3) [suspended], fac(2) [suspended], fac(1) [active]
fac(4) [suspended], fac(3) [suspended], fac(2) [active]
fac(4) [suspended], fac(3) [active]
fac(4) [active]
As you can see, the suspended function activation records "stack up",
and then, when each function returns, it gets taken off of the stack.
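The stacking behavior above can be made explicit in C by managing the activation records by hand instead of relying on recursion. This is only a sketch of the idea, not how a compiler lays out real stack frames: the names `activation_record`, `MAX_DEPTH`, and `factorial_explicit_stack` are made up for illustration, and the input is assumed to be between 0 and 20 so the result fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 21 /* enough records for inputs 0 through 20 */

/* One activation record: the call-specific data for a single call. */
struct activation_record {
    int64_t num; /* this call's own copy of the parameter */
};

/* Computes num! with an explicit stack of activation records. */
int64_t factorial_explicit_stack(int64_t num) {
    struct activation_record stack[MAX_DEPTH];
    int top = 0; /* number of suspended calls */
    int64_t result;

    assert(num >= 0 && num < MAX_DEPTH);

    /* "Call" phase: suspend each call by pushing its record. */
    while (num != 0) {
        stack[top].num = num;
        top++;
        num--;
    }

    /* Base case: fac(0) returns 1. */
    result = 1;

    /* "Return" phase: pop each record and resume that call. */
    while (top > 0) {
        top--;
        result = stack[top].num * result;
    }
    return result;
}
```

For an input of 4, the first loop pushes records for 4, 3, 2, and 1, and the second loop pops them in reverse order, mirroring the trace above.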
The stack layout
To implement this idea, each program is allocated a range of memory
called the program stack. All PowerPC programs start off
with a pointer to this stack in register 1. In the PowerPC ABI,
register 1 always points to the top of the stack.
This makes it easy for functions to know where their activation record
is -- they are simply defined in terms of the stack pointer. If a
function is executing, then the stack pointer is pointing to the top
of the whole stack, which is also the top of that function's activation record. Because activation records are implemented on a stack, they are often referred to as stack frames, but both terms are equivalent.
Now, when the "top of the stack" is referred to, that is a conceptual
designation. Physically, in memory, the stack grows downward, from
large-numbered memory addresses to small-numbered ones. Therefore, register
1 will have a pointer to the conceptual top of the stack, and
references to stack positions that have positive offsets will
actually be below the top of the stack
conceptually, and negative offsets will be conceptually above. So,
0(1) refers to the conceptual top of the stack, 4(1) refers to four bytes down from the top (conceptually), 24(1) is even lower conceptually, and 100(1) is lower still.
Now that you understand how the stack looks conceptually and physically, let's look at what exactly the individual stack frames hold. Here is the layout of the stack according to the 64-bit PowerPC ABI, from a physical memory standpoint (stack offsets, where given, refer to the beginning of this location in memory):
Table 1. Stack frame layout
| Contains | Size | Beginning stack offset |
|---|---|---|
| Floating point non-volatile register save area | Varies | Varies |
| General non-volatile register save area | Varies | Varies |
| VRSAVE | 4 bytes | Varies |
| Alignment padding | 4 or 12 bytes | Varies |
| Vector non-volatile register save area | Varies | Varies (must be quadword-aligned) |
| Local variable storage | Varies | Varies |
| Parameters for function calls | Varies (minimum 64 bytes) | 48(1) |
| TOC save area | 8 bytes | 40(1) |
| Link editor area | 8 bytes | 32(1) |
| Compiler area | 8 bytes | 24(1) |
| Link Register save area | 8 bytes | 16(1) |
| Condition Register save area | 8 bytes | 8(1) |
| Pointer to top of previous stack frame | 8 bytes | 0(1) |
I won't concern you with the floating point, VRSAVE, Vector, or
alignment space. Those topics deal with floating point and vector processing and are outside the scope of this article. All stack values must be doubleword (8-byte) aligned, and the whole
frame should be quadword (16-byte) aligned. All parameters must be
doubleword-aligned.
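The sizes in Table 1 can be turned into a small frame-size calculation. The following C sketch assumes a function with no floating-point or vector save areas, and it assumes every parameter and saved register occupies one doubleword. The function name `frame_size` and its parameters are hypothetical helpers, not anything defined by the ABI.

```c
#include <stdint.h>

#define FIXED_AREA_BYTES     48u /* back pointer, CR, LR, compiler, link editor, TOC */
#define MIN_PARAM_AREA_BYTES 64u /* parameter area is never smaller than this */
#define DOUBLEWORD_BYTES     8u
#define FRAME_ALIGNMENT      16u /* whole frame is quadword-aligned */

/* Rounds value up to the next multiple of alignment. */
static uint64_t round_up(uint64_t value, uint64_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

/* Frame size in bytes for a function that passes at most max_call_params
   doubleword parameters to any callee, needs local_bytes of local storage,
   and saves saved_gprs general-purpose non-volatile registers. */
uint64_t frame_size(uint64_t max_call_params, uint64_t local_bytes,
                    uint64_t saved_gprs) {
    uint64_t param_area = max_call_params * DOUBLEWORD_BYTES;
    if (param_area < MIN_PARAM_AREA_BYTES) {
        param_area = MIN_PARAM_AREA_BYTES;
    }
    uint64_t total = FIXED_AREA_BYTES + param_area
                   + round_up(local_bytes, DOUBLEWORD_BYTES)
                   + saved_gprs * DOUBLEWORD_BYTES;
    return round_up(total, FRAME_ALIGNMENT);
}
```

With no locals and no saved registers this gives 112 bytes, the minimum frame. Adding one 8-byte local variable gives 120 bytes, which rounds up to 128.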
Now, let's look at what each part of the stack frame does.
Non-volatile register save areas
The first part of the stack
frame is the non-volatile
register save area.
Registers in the PowerPC ABI are divided into
three basic classes: dedicated, volatile, and non-volatile.
Dedicated registers are registers that have a predefined, permanent
function, like the stack pointer (register 1) and the TOC pointer
(register 2). Registers 3-12 are volatile registers, which means that
any function can modify them freely without having to restore their previous value. However, this means that any
time a function calls another function, it should assume that
registers 3-12 will be overwritten by that function.
On the other hand,
registers 13 and above are considered non-volatile registers. This
means that a function can use them provided their value is
restored before returning from the function. Therefore,
before using a non-volatile register in a function, its value must be
saved in the function's stack frame, and then restored before the
function returns. Likewise, a function may also assume that the values it
assigns to non-volatile registers will not be modified (or at least will
be restored) when it makes calls to other functions. A function may use as little or as much memory in this save area as needed.
Now you can see why our earlier rules for the simplified ABI required that only registers 3 through 12 be used: the others are non-volatile and require stack space to save them. However, the ABI provides a way to work around this limitation. A function that does not call other functions is free to use the 288 bytes that are physically below the stack pointer. Therefore, functions using the simplified ABI can still save, use, and restore non-volatile registers by using negative offsets from the stack pointer.
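The register classes described in this section can be summarized as a small lookup in C. This sketch follows the classification given in the text. Register 0 is not mentioned there, so treating it as volatile is an assumption based on the prologue code, which uses it as scratch without saving it. The names `reg_class`, `classify_gpr`, and `must_save_before_use` are hypothetical, and the ABI document remains the authority on special cases.

```c
#include <assert.h>
#include <stdbool.h>

enum reg_class { REG_DEDICATED, REG_VOLATILE, REG_NONVOLATILE };

/* Classifies general-purpose register reg (0 through 31). */
enum reg_class classify_gpr(int reg) {
    assert(reg >= 0 && reg <= 31);
    if (reg == 1 || reg == 2) {
        return REG_DEDICATED;   /* stack pointer and TOC pointer */
    }
    if (reg == 0 || (reg >= 3 && reg <= 12)) {
        return REG_VOLATILE;    /* register 0 is an assumption, see above */
    }
    return REG_NONVOLATILE;     /* registers 13 and above, per the text */
}

/* True if a function must save the register before modifying it
   and restore it before returning. */
bool must_save_before_use(int reg) {
    return classify_gpr(reg) == REG_NONVOLATILE;
}
```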
Local variable storage
The local variable storage area is a general-purpose area for saving function-specific data. Often this is not needed because of the large
number of registers available for use in the PowerPC architecture. However, this space is often used for local arrays. This area can be any size
needed by the function.
Parameters for function calls
Function parameters are handled a little differently from other local
data. The PowerPC ABI actually puts the storage space for the
function parameters in the calling
function's stack space. Now, as you saw earlier, function
calls actually pass their parameters through registers. However,
space must still be reserved for parameters in case the values need to
be saved, especially since the parameters are passed using volatile registers. This space is also used for
overflow: if there are more parameters than registers available for
use, then they need to go in the stack space. Since this parameter
area is shared by all functions called from the current one, when a function sets up its stack space, it has to reserve space for the largest number of parameters it will use in a function call.
So that a function can know where its parameters are, parameters are stored from the
bottom of memory to the top. The first parameter is in 48(1), while the second parameter is in 56(1). This way, the function being called can know the exact offset of each parameter, no matter how big the parameter list area is. Remember, the parameter list area is defined for all of the calls made by a function, and therefore will likely be bigger than necessary for any individual function call.
Now, since the save area for the parameters passed to a function is actually in the
calling function's stack frame, when a function establishes its own stack frame, the
offsets to the parameter list have to be adjusted to account for the function's own
stack frame size. So, let's say that function func1 calls
function func2 with three parameters, and func2 has a 112-byte stack frame. If func2 wants to access the memory for its first parameter, it would refer to it as 160(1), because it has to go past its own stack frame (112 bytes) and then reach the first parameter slot in the caller's frame (another 48 bytes).
Thankfully, functions rarely have to access their parameter save area because most parameters are passed by register, not in the parameter save area. However, space must be allocated for them even if there is nothing stored there. Functions must assume that for the first eight parameters, they are only passed by register, but they will still have a save area available if they need to be stored by the program. This space must also be a minimum of 64 bytes large.
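The offset arithmetic for the parameter save area can be written out as two small C helpers. This is a sketch that assumes every parameter occupies one doubleword; the function names `param_offset_in_caller` and `param_offset_in_callee` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PARAM_AREA_OFFSET 48 /* parameter save area starts at 48(1) */
#define DOUBLEWORD_BYTES  8

/* Offset of parameter slot index (0-based) from the caller's stack
   pointer, before the callee has created its own frame. */
int64_t param_offset_in_caller(int index) {
    assert(index >= 0);
    return PARAM_AREA_OFFSET + (int64_t)index * DOUBLEWORD_BYTES;
}

/* Offset of the same slot from the callee's stack pointer, after the
   callee has allocated a frame of callee_frame_size bytes. */
int64_t param_offset_in_callee(int index, int64_t callee_frame_size) {
    return callee_frame_size + param_offset_in_caller(index);
}
```

For the func2 example, param_offset_in_callee(0, 112) gives 160, matching 160(1).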
TOC, link editor, and compiler areas
The TOC save
area, compiler area, and linker area are all reserved for system use, and are
not modified by programmers, but the programmer must reserve space for them.
Link register save area
The link register save area is different from the other parts of the ABI. When a function begins, it actually saves the link register in the
calling function's stack frame, not its own, and then only if it needs to save it. Most functions that call other functions will need it, though.
Condition register save area
The condition register save area is needed if any of the non-volatile fields of the condition register are modified. The non-volatile fields are cr2, cr3, and cr4. The condition register should be saved in its area of the stack before any of these fields are modified, and then restored before returning.
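The rule for when the condition register must be saved can be expressed as a simple check. In this C sketch, the set of fields a function modifies is passed as a bitmask in which bit N stands for field crN. That numbering is an illustrative convention chosen here, not the hardware bit layout, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* True if condition register field crN is non-volatile (cr2, cr3, cr4). */
bool cr_field_is_nonvolatile(int field) {
    assert(field >= 0 && field <= 7);
    return field >= 2 && field <= 4;
}

/* modified_fields: bit N set means the function writes field crN.
   Returns true if the condition register must be saved and restored. */
bool must_save_condition_register(unsigned modified_fields) {
    for (int field = 0; field <= 7; field++) {
        if ((modified_fields & (1u << field)) != 0
                && cr_field_is_nonvolatile(field)) {
            return true;
        }
    }
    return false;
}
```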
Pointer to the previous stack frame
The final item in the stack frame is a pointer to the previous stack frame, often called the back pointer.
Writing a function that uses the stack
Functions create the stack frame during the beginning of the function (called the function prologue) and tear it down at the end of a function (called the function epilogue).
A function's prologue usually performs the following steps:
- Reserve stack space and save the old stack pointer, using stdu 1, -SIZE_OF_STACK(1) (where SIZE_OF_STACK is the size of the stack frame for this function). This saves the old stack pointer and allocates stack memory atomically.
- If this function will call another function, or use the link register in any way, save the link register: copy it with mflr 0, then store it into the link register save area of the function that called this one, using std 0, SIZE_OF_STACK+16(1).
- Save all non-volatile registers that will be used during this function (including the condition register, if any of its non-volatile fields will be used).
The function's epilogue follows the reverse sequence, restoring what had been saved, and then destroying the stack frame using ld 1, 0(1), which loads the previous stack pointer back into the stack pointer register.
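To see what the prologue and epilogue do to the stack pointer and link register, it can help to simulate them on a tiny model machine. The C sketch below is a simplified simulation, not real PowerPC code: memory is a small array of doublewords, the stack pointer is a byte offset into it, and the names `machine`, `prologue`, and `epilogue` are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 1024

struct machine {
    uint64_t memory[STACK_BYTES / 8]; /* simulated stack, doubleword cells */
    uint64_t sp;                      /* register 1, as a byte offset */
    uint64_t lr;                      /* link register */
};

static void store_dw(struct machine *m, uint64_t addr, uint64_t value) {
    assert(addr % 8 == 0 && addr < STACK_BYTES);
    m->memory[addr / 8] = value;
}

static uint64_t load_dw(const struct machine *m, uint64_t addr) {
    assert(addr % 8 == 0 && addr < STACK_BYTES);
    return m->memory[addr / 8];
}

/* Models: stdu 1, -SIZE(1) ; mflr 0 ; std 0, SIZE+16(1) */
void prologue(struct machine *m, uint64_t frame_size) {
    uint64_t old_sp = m->sp;
    assert(frame_size % 16 == 0 && frame_size <= old_sp);
    m->sp = old_sp - frame_size;                 /* stack grows downward */
    store_dw(m, m->sp, old_sp);                  /* back pointer at 0(1) */
    store_dw(m, m->sp + frame_size + 16, m->lr); /* LR into caller's frame */
}

/* Models: ld 0, SIZE+16(1) ; mtlr 0 ; ld 1, 0(1) */
void epilogue(struct machine *m, uint64_t frame_size) {
    m->lr = load_dw(m, m->sp + frame_size + 16);
    m->sp = load_dw(m, m->sp);                   /* follow the back pointer */
}
```

Starting with a stack pointer of 512 and a frame size of 112, the prologue moves the stack pointer to 400 and the epilogue returns it to 512, restoring the link register even if a nested call overwrote it in between.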
Now, let's return to the function that we originally implemented without a stack and, as an example, see what it would look like with a stack (enter as my_square.s and compile and run as before):
Listing 6. Function to square a number using a stack
###FUNCTION ENTRY POINT DECLARATION###
.section .opd, "aw"
.align 3
.global my_square
my_square: #this is the name of the function as seen
.quad .my_square, .TOC.@tocbase, 0
.type my_square, @function
###FUNCTION CODE HERE###
.text
.my_square: #This is the label for the code itself (Referenced in the "opd")
##PROLOGUE##
#Set up stack frame & back pointer (112 bytes -- minimum stack)
stdu 1, -112(1)
#Save LR (optional)
mflr 0
std 0, 128(1)
#Save non-volatile registers (we don't have any)
##FUNCTION BODY##
#Parameter 1 -- number to be squared -- in register 3
mulld 3, 3, 3
#The return value is now in register 3, so we just need to leave
##EPILOGUE##
#Restore non-volatile registers (we don't have any)
#Restore LR (not needed in this function, but here anyway)
ld 0, 128(1)
mtlr 0
#Restore stack frame atomically
ld 1, 0(1)
#Return
blr
That's exactly the same code as before, just wrapped with prologue and epilogue code. As
mentioned, this code is simple enough that it doesn't need prologue and epilogue code and
is perfectly fine using the simplified ABI. However, it is a good example of how to set up and tear down a stack frame.
Now, let's return to the factorial function. This function, since it calls itself, makes very good use of stack frames. Let's look at how the factorial function would work in assembly language (enter as factorial.s):
Listing 7. The factorial function in assembly language
###ENTRY POINT###
.section .opd, "aw"
.align 3
.global factorial
factorial:
.quad .factorial, .TOC.@tocbase, 0
.type factorial, @function
###CODE###
.text
.factorial:
#Prologue
#Reserve Space
#48 (save areas) + 64 (parameter area) + 8 (local variable) = 120 bytes.
#aligned to 16-byte boundary = 128 bytes
stdu 1, -128(1)
#Save Link Register
mflr 0
std 0, 144(1)
#Function body
#Base Case? (register 3 == 0)
cmpdi 3, 0
bt- eq, return_one
#Not base case - recursive call
#Save local variable
std 3, 112(1)
#NOTE - it could also have been stored in the parameter save area.
# parameter 1 would have been at 176(1)
#Subtract One
subi 3, 3, 1
#Call the function (branch and set the link register to the return address)
bl factorial
#Linker word
nop
#Restore local variable (but to a different register -
#register 3 is now the return value from the last factorial
#function)
ld 4, 112(1)
#Multiply by return value
mulld 3, 3, 4
#Result is in register 3, which is the return value register
factorial_return:
#Epilogue
#Restore Link Register
ld 0, 144(1)
mtlr 0
#Restore stack
ld 1, 0(1)
#Return
blr
return_one:
#Set return value to 1
li 3, 1
#Return
b factorial_return
To test it from C, enter the following (enter as factorial_caller.c):
Listing 8. Program to call factorial function
#include <stdio.h>
typedef long long int64;
int64 factorial(int64);
int main() {
int64 a = 10;
printf("The factorial of %lld is %lld\n", a, factorial(a));
return 0;
}
Compile and run as follows:
Listing 9. Compiling and running factorial
gcc -m64 factorial.s factorial_caller.c -o factorial
./factorial
There are a few features of this factorial function that are interesting. First of all, we are making use of the local variable storage space. We are saving the current parameter in 112(1). Now, since this is a function parameter, we could have saved an extra doubleword of stack space and stored it in the caller's parameter area.
Another interesting thing in the program is the nop instruction
after the function call. That is required by the ABI. That extra instruction
allows the linker to insert additional code if necessary during the linking process.
For example, if you have a program that has enough symbols to warrant multiple TOCs
(TOCs were discussed in "Assembly
language for Power Architecture, Part 2: The art of loading and storing on PowerPC"), the linker will emit an instruction (or multiple instructions using a branch) to swap around TOCs for you.
Finally, notice that the branch target for the function call is not the code that starts it, but the .opd entry point descriptor. The linker will take care of converting this to point to the correct code. However, this will let the linker know additional information about the function, including which TOC it is using, so it can emit the code to swap these around if necessary.
Creating dynamic libraries
Now that you know how to make functions, you can put them together into a library. You
actually don't need to write any additional code; you just need to compile it all together. To combine the factorial and my_square functions into a single library (let's call it libmymath.so), just enter the following:
Listing 10. Compiling shared libraries
gcc -m64 -shared factorial.s my_square.s -o libmymath.so
This instructs the compiler to produce a shared object called libmymath.so. To link this into executables, you need to enable both the compile-time linker and the run-time dynamic linker to find it. To compile the factorial calling function to use the shared object, compile and run like this:
Listing 11. Using the shared library
#-L tells what directories to search, -l tells what libraries to find
gcc -m64 factorial_caller.c -o factorial -L. -lmymath
#Tell the dynamic linker what additional directories to search
export LD_LIBRARY_PATH=.
#Run the program
./factorial
Of course, you can get rid of all of those directory flags if the library is installed in a standard library location.
As mentioned in "Assembly language for Power Architecture, Part 2: The art of loading and storing on PowerPC," the TOC, or table of contents, of an application only has 64KB worth of space for holding global data references. So, what happens when several shared objects are loaded into the same application space and the table of contents gets too big? This is what the .TOC.@tocbase reference is for in the official procedure descriptor. The linker can manage several TOCs in a single application. The .TOC.@tocbase instructs the linker to put the address of the TOC for that function in that spot. Then, when the linker is setting up references to functions, it compares the TOC of the current function to the TOC of the function it's calling. If they are the same, it leaves the call alone. If they are different, it actually modifies your code to swap TOC references on function call and return. This is one of the main reasons for the official procedure descriptors, and also one of the main reasons for the extra nop instruction that follows a function call.
Because of this, you never have to worry about running out of global symbol space from linking in too many shared objects.
Conclusion
The simplified 64-bit ABI is a breeze to use in programs, and the full ABI is not much harder. The most difficult part is determining the different offsets of the different parts of the stack frame, knowing where each piece should go, and what size it should be.
Creating reusable libraries in assembly language is fast and easy. To convert functions that use the 64-bit ABI into shared libraries, all that is needed is a few extra compiler flags and you're ready to go.
Hopefully, this series of articles has demonstrated the ease and power of PowerPC programming. Perhaps in your next project, you'll consider tapping the full resources of the POWER5 chip by using its assembly language!
About the author
Jonathan Bartlett is the author of the book Programming from the Ground Up, an introduction to programming using Linux assembly language. He is the lead developer at New Medio, developing Web, video, kiosk, and desktop applications for clients.