The PowerPC® architecture describes two computation modes, a 32-bit computation mode and a 64-bit computation mode. Generally, processors that implement only 32-bit computation mode are referred to as 32-bit PowerPC processors. Processors that implement both computation modes are referred to as 64-bit PowerPC processors. If a processor implements the 64-bit computation mode it also must implement the 32-bit computation mode.
As 64-bit PowerPC processors become more widely available, it becomes desirable to make applications run in the 64-bit computation mode, providing access to larger address space and faster 64-bit arithmetic.
Porting PowerPC 32-bit software to the PowerPC 970FX's 64-bit computation mode
This article has three main sections. The first section discusses the changes required in C code to port from the 32-bit computation mode to the 64-bit computation mode, the second covers assembly language programs, and the third covers programs that execute with supervisor-level privileges.
Because a 64-bit PowerPC processor will support 32-bit code, it is not generally necessary to port an application merely to run it on a 64-bit processor; the original binary code can usually run unmodified (and with no performance penalty). However, this does not provide access to the larger address space and faster 64-bit arithmetic available in the 64-bit computation mode (it is possible to write assembly language functions that will execute in 64-bit computation mode using 32-bit tools).
Note that a single installation of GCC will not target both computation modes. The compiler, assembler, and linker are all different for the 64-bit ELF ABI, so a developer wishing to target both will need two complete toolchains installed.
Changes required in C/C++ language programs
This section discusses issues specific to porting C language software to the 64-bit computation mode of the PowerPC 970FX.
Most software programs written in the C language can be easily migrated to the 64-bit computation mode of the PowerPC 970FX processor. In the C language, only the long and pointer data types change size between the 32-bit and 64-bit ABIs (from 32 bits to 64 bits). This change affects structure element alignment and structure padding, which is important if the data is accessed in multiple contexts (for example, C language and assembly). Programs that simply dump data structures to disk may also be affected, as may the results of some assignment and arithmetic operations. For example, in the following code sample, a compiler targeting the 64-bit ABI will insert padding in order to align the field_two element of the structure:
Listing 1. A sample structure
typedef struct sample_one {
int field_one;
long field_two;
int field_three;
} sample_one_type;
A compiler may also insert padding at the end of a structure (structure-end padding is required in case the structure is used as an element of an array). Such padding is not required if the above structure is compiled for the 32-bit ABI. Care must be taken if a structure size or field alignment is constrained by external requirements. If the exact layout of data matters, these changes can break compatibility. The exact layout of a structure may be constrained if the data will be transmitted over the network; for instance, a TCP header structure should have its exact layout preserved.
In the above example, the padding inserted by the compiler can be
removed by moving field_three above field_two, or by changing the type of field_two to int.
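The effect of reordering the fields can be measured directly. The following is a minimal sketch, assuming a compiler that applies the usual natural-alignment rules; the sample_reordered type and the helper functions are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Layout from Listing 1: in the 64-bit ABI the compiler pads after
   field_one (to align field_two) and again after field_three. */
typedef struct sample_one {
    int  field_one;
    long field_two;
    int  field_three;
} sample_one_type;

/* Hypothetical reordering: both int fields come first, so field_two is
   already aligned when long is 8 bytes wide. */
typedef struct sample_reordered {
    int  field_one;
    int  field_three;
    long field_two;
} sample_reordered_type;

/* Bytes of compiler-inserted padding: total size minus the field sizes. */
size_t padding_original(void)
{
    return sizeof(sample_one_type) - (2 * sizeof(int) + sizeof(long));
}

size_t padding_reordered(void)
{
    return sizeof(sample_reordered_type) - (2 * sizeof(int) + sizeof(long));
}

/* Offset of field_two in the original layout. */
size_t field_two_offset(void)
{
    size_t off = offsetof(sample_one_type, field_two);
    assert(off >= sizeof(int));   /* field_one always comes first */
    return off;
}
```

With a 4-byte int and an 8-byte long, padding_original() is 8 and padding_reordered() is 0. With a 4-byte long, as in the 32-bit ABI, both are 0.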
Another example of a problem is related to the change in size of pointer data types, as the following sample code illustrates.
Listing 2. Pointer to integer conversion
int test;
void *p;
test=(int)p;
The above sample code running in 32-bit computation mode will execute correctly, since the size of the pointer variable is the same as the size of the integer variable. In 64-bit computation mode, the cast discards the upper 32 bits of the pointer, so the code will not behave correctly if the pointer value does not fit in 32 bits (that is, if it is 2^32 or greater). GCC will produce a warning message when compiling the above code even though an explicit cast is used. Any pointer arithmetic should be performed using variables of type long regardless of the computation mode. Pointers should only be assigned to other pointers or to variables of type long. In general, the use of pointers as well as the long data type might need to be examined when porting 32-bit PowerPC programs. Care must also be taken when displaying long or pointer values, since the number of characters required to display these values is different.
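The advice about using long for pointer values, and about display width, can be sketched as follows. This is only an illustration: the function names are hypothetical, and the compile-time check encodes the assumption that long and pointers have the same size, which holds for both PowerPC ELF ABIs but not for every platform.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: long is as wide as a pointer (true for ILP32 and LP64). */
_Static_assert(sizeof(long) == sizeof(void *),
               "long must be as wide as a pointer");

/* Store a pointer in a long without truncation. */
long pointer_to_long(void *p)
{
    return (long)p;
}

/* Recover the pointer from the long. */
void *long_to_pointer(long v)
{
    return (void *)v;
}

/* Hex digits needed to display a long or a pointer:
   8 when long is 4 bytes, 16 when long is 8 bytes. */
int hex_digits(void)
{
    return (int)(2 * sizeof(long));
}

/* Format a long as zero-padded hex sized for the current ABI.
   Returns the number of characters written, as snprintf does. */
int format_long_hex(char *buf, size_t size, long v)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%0*lx", hex_digits(), (unsigned long)v);
}
```

A buffer sized for 8 hex digits plus a terminator is too small once the code is rebuilt for the 64-bit ABI, which is why the width is derived from sizeof(long) here rather than hard-coded.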
The use of function descriptors by the PowerPC 64-bit ELF ABI requires that changes be made to code that manipulates function addresses. In order to access function addresses, the following code can be used:
Listing 3. Function descriptors
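int function_name(int arg1, int arg2);
typedef struct function_descriptor {
    void *addr;          /* address of the function's code */
    unsigned long toc;   /* TOC pointer */
    unsigned long env;   /* environment pointer */
} f_desc_t;
unsigned long function_address;
/* Inside a function body: the symbol names the descriptor, so read addr from it. */
function_address=(unsigned long)(((f_desc_t *)function_name)->addr);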
When using GCC, there are a number of compiler options (shown in Table 1) that can help flush out some of the errors associated with porting 32-bit PowerPC code. In general, using the GCC -Wall and -Werror flags will help identify questionable sections of code.
Table 1. GNU GCC compiler options useful in porting 32-bit software
| Option Name | Description |
| -Wpointer-arith | Warns about code that depends on the size of a function type or of void |
| -Wconversion | Warns about implicit parameter conversions and implicit negative to unsigned conversions |
| -Wformat | Checks calls to printf() and scanf() to ensure that the supplied arguments have types appropriate to the format string |
| -Wsign-compare | Warns when signed/unsigned comparisons could produce incorrect result |
| -Wpadded | Warns when padding is included in a structure |
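As an illustration of the kind of code two of these options are concerned with, the following sketch shows forms that avoid the warnings. The function names are hypothetical, and the comments describe what the options are meant to flag rather than the exact diagnostics of any particular GCC release.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* -Wformat: a long argument needs %ld. A %d conversion here would be
   flagged, and the mismatch matters once long is wider than int. */
int format_long(char *buf, size_t size, long value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%ld", value);
}

/* -Wsign-compare: comparing a signed index against the unsigned result
   of strlen() is flagged. Using size_t for the index avoids the mixed
   comparison. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        if (s[i] == c)
            n++;
    }
    return n;
}
```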
Changes required for assembly language programs
This section discusses issues associated with porting 32-bit assembly language software to the 64-bit computation mode.
All global symbols in the PowerPC 64-bit ELF ABI have to be accessed
through the TOC pointer. In high-level languages, TOC accesses are done by
the compiler. In assembly language, the TOC must be used explicitly. The
following examples assume that the GNU assembler is being used --
assemblers produced by different vendors might have different syntax
requirements. The following example also assumes that a C language
pre-processor is used to process the macros before the assembler program
is invoked. The following are examples of macros (TOC_ENTRY, GET_SYM_ADDR)
followed by example code that loads into General Purpose Register 4 the
address of the global_symbol_name variable
using the TOC pointer:
Listing 4. TOC pointer
#define TOC_ENTRY(name,symbol) \
    .section ".toc","aw"; \
    name: .tc symbol[TC],symbol
#define GET_SYM_ADDR(ra,name) \
    ld ra,name@toc(r2)
TOC_ENTRY(.LC0,global_symbol_name)
GET_SYM_ADDR(r4,.LC0)
In the above example, the TOC_ENTRY macro must not be placed in the
same section as the GET_SYM_ADDR macro. The TOC_ENTRY macro should be placed in the beginning of the file, since this macro defines an entry in
the .toc section. The TOC register has to be
initialized in the startup code. The fact that all functions have a
function descriptor with a TOC pointer can be used to initially set up the
TOC register. The following sample code illustrates this concept:
Listing 5. Using LOAD_64BIT_VAL
LOAD_64BIT_VAL(r2,my_main)
ld r2,8(r2)
In the above example, the symbol my_main is
the name of a function. The symbol my_main
refers to the address of the function descriptor, and the symbol .my_main (note the dot in front of the function name)
refers to the address of the function.
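The displacement of 8 in the ld r2,8(r2) instruction follows from the layout of the function descriptor shown in Listing 3. The sketch below checks that layout in C; the helper function names are hypothetical, and the numeric values in the comments assume 8-byte pointers and an 8-byte long.

```c
#include <assert.h>
#include <stddef.h>

/* Function descriptor layout from Listing 3. */
typedef struct function_descriptor {
    void *addr;          /* entry point: what .my_main refers to */
    unsigned long toc;   /* TOC pointer for the function         */
    unsigned long env;   /* environment pointer                  */
} f_desc_t;

/* Byte offset of the TOC pointer inside a descriptor. With 8-byte
   pointers this is 8, the displacement used by "ld r2,8(r2)". */
size_t toc_offset(void)
{
    assert(offsetof(f_desc_t, addr) == 0);   /* addr is the first field */
    return offsetof(f_desc_t, toc);
}

/* Total descriptor size: 24 bytes when pointers and long are 8 bytes. */
size_t descriptor_size(void)
{
    return sizeof(f_desc_t);
}
```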
Loading a 64-bit value into a General Purpose Register
Because General Purpose Registers in the PowerPC 970FX are 64 bits wide, the assembly code required to load a value into a register needs to be changed. The following macro can be used to load an arbitrary 64-bit value into a General Purpose Register:
Listing 6. Defining LOAD_64BIT_VAL
#define LOAD_64BIT_VAL(ra,value) \
    addis ra,r0,value@highest; \
    ori ra,ra,value@higher; \
    sldi ra,ra,32; \
    oris ra,ra,value@h; \
    ori ra,ra,value@l
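To see why five instructions are enough, the macro can be modeled in C, one statement per instruction. This is a sketch of the arithmetic only, not code that would be used on the target; load_64bit_val is a hypothetical name for the model.

```c
#include <assert.h>
#include <stdint.h>

/* Model of LOAD_64BIT_VAL: build a 64-bit value from four 16-bit pieces. */
uint64_t load_64bit_val(uint64_t value)
{
    uint16_t highest = (uint16_t)(value >> 48);  /* value@highest */
    uint16_t higher  = (uint16_t)(value >> 32);  /* value@higher  */
    uint16_t h       = (uint16_t)(value >> 16);  /* value@h       */
    uint16_t l       = (uint16_t)value;          /* value@l       */

    /* addis ra,r0,highest: sign-extend the immediate, shift left by 16 */
    uint64_t sign = (highest & 0x8000u) ? 0xFFFFFFFFFFFF0000ULL : 0;
    uint64_t ra = (sign | highest) << 16;

    ra |= higher;              /* ori  ra,ra,higher                      */
    ra <<= 32;                 /* sldi ra,ra,32 (discards the sign bits) */
    ra |= (uint64_t)h << 16;   /* oris ra,ra,h                           */
    ra |= l;                   /* ori  ra,ra,l                           */

    assert(ra == value);       /* the sequence reproduces the input */
    return ra;
}
```

The sign extension performed by addis is harmless here because the later shift by 32 moves those bits out of the register.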
Assembly language function prolog and epilog code needs to be changed when porting applications from the 32-bit PowerPC ABI to the PowerPC 64-bit ELF ABI. One convenient method of handling assembly language function prolog and epilog code is to create macros. The following are examples of macros for assembly language function prolog and epilog code:
Listing 7. Function prolog and epilog
#define function_prolog(fn) \
    .section ".text"; \
    .align 2; \
    .globl fn; \
    .section ".opd","aw"; \
    .align 3; \
    fn:; \
    .quad .fn,.TOC.@tocbase,0; \
    .previous; \
    .size fn,24; \
    .globl .fn; \
    .fn:
#define function_epilog(fn) \
    .long 0; \
    .byte 0,12,0,0,0,0,0,0; \
    .type .fn,@function; \
    .size .fn,.-.fn
Function prolog code is placed at the beginning of every assembly function, and function epilog code is placed at the end of every assembly function. The above sample function prolog creates code for the function descriptor. The function epilog code creates a default function traceback table.
When creating a data object using assembly language, a data object prolog and epilog are also required. The following is an example of code that can be used for creating a data object prolog and epilog:
Listing 8. Data prolog and epilog
#define data_global_prolog(dn) \
    .section ".toc","aw"; \
    .tc dn[TC],dn; \
    .section ".data"; \
    .align 3; \
    .globl dn; \
    dn:
#define data_global_epilog(dn) \
    .type dn,@object; \
    .size dn,.-dn
Assembly language function calls
When using assembly language, function calls must be handled differently due to PowerPC 64-bit ELF ABI requirements. When calling a function from assembly language, the following syntax should be used (note the dot in front of the function name):
Listing 9. Calling a function
b .my_main
Changes required for supervisor-level software
The implementation of memory management in the 64-bit PowerPC processors is significantly different from previous implementations. Code that manipulates MMU structures will have to be rewritten in order to execute correctly on the PowerPC 970FX. Code executing with translation disabled (either the MSR DR or the MSR IR bits set to 0) must set the MSR HV (Hypervisor) bit to 1. If the MSR HV bit is not set to 1 when translation is disabled, a storage access will cause an exception.
A new instruction, mtmsrd, must be used when
updating the MSR register. The mtmsr
instruction that was previously used in 32-bit PowerPC implementations
updates only the lower 32 bits of the MSR register, but the MSR register
is 64 bits wide on 64-bit PowerPC chips. The rfi instruction is no longer available, and the rfid instruction should be used instead. The rfid instruction is used in restoring the MSR
register and execution context after an exception or interrupt (see Resources for a link to the original App Note and
complete listings on new, changed, and deprecated instructions).
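The difference between the two instructions can be illustrated with a simplified C model. It assumes only what is stated above (mtmsr changes the lower 32 bits, mtmsrd changes all 64) plus the fact that SF is the most significant MSR bit. Real hardware places further restrictions on which bits may be altered, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_SF (1ULL << 63)   /* SF is the most significant MSR bit */

/* Simplified model of mtmsr: only the lower 32 bits change. */
uint64_t msr_after_mtmsr(uint64_t msr, uint64_t new_value)
{
    uint64_t result = (msr & 0xFFFFFFFF00000000ULL)
                    | (new_value & 0x00000000FFFFFFFFULL);
    assert((result >> 32) == (msr >> 32));   /* upper half is untouched */
    return result;
}

/* Simplified model of mtmsrd: all 64 bits come from the new value. */
uint64_t msr_after_mtmsrd(uint64_t msr, uint64_t new_value)
{
    (void)msr;   /* the previous contents are fully replaced */
    return new_value;
}
```

In this model an attempt to clear SF with mtmsr leaves the bit set, because SF lies in the upper half of the register.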
Since the dcbi instruction has been removed
from the PowerPC architecture, any code using this instruction will have
to be examined. In most instances, the dcbi
instruction was used to invalidate memory that was written to by an
external device. The presence of memory coherency in PowerPC 970FX should
eliminate the need for the dcbi instruction.
Any use of the dcba instruction, which was
typically used to optimize software performance by placing the data in the
cache, should be removed.
The absence of the no-memory coherency storage attribute does not have
a direct impact on software. Software that uses the dcbf instruction could be potentially optimized,
since all cachable memory accesses are coherent with respect to main
storage. The use of a memory coherency protocol removes the need for the
dcbf instruction in most cases.
The PowerPC 970FX processor requires that the tlbie instruction (which invalidates a TLB entry) not be executed by multiple processors at the same time. This requirement necessitates the use of an atomic test and set instruction sequence before the execution of the tlbie instruction. The following is an example of a test and set instruction sequence:
Listing 10. Test and set sequence
..again:
    lwarx r5,r0,r3
    cmpwi cr0,r5,0x0000
    bne ..again
    stwcx. r4,r0,r3
    bne ..again
    blr
In the above example, General Purpose Register 3 contains the address against which the test and set instruction sequence is executed, and General Purpose Register 4 should contain a non-zero value. Typically, the processor initialization software will set a memory location used by the test and set instruction sequence to 0. The following instruction sequence should be used when invalidating a TLB entry:
Listing 11. Invalidating a TLB entry
bl .test_and_set
tlbie r6,0
eieio
tlbsync
ptesync
isync
addi r6,r0,0x0000
stw r6,0x0000(r3)
The tlbie instruction must be executed in 64-bit computation mode. In the above example, r3 contains the address of the memory location used by the test and set function, and r6 contains bits 16 through 51 of the effective address that will be invalidated in the TLB. The stw instruction at the end of the sequence stores 0 to that memory location, releasing the lock acquired by the test and set instruction sequence.
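For readers more familiar with C than with lwarx and stwcx., the locking pattern of Listings 10 and 11 can be approximated with C11 atomics. This is a sketch of the same idea under the assumption that a C11 compiler with stdatomic.h is available; it is not the code a supervisor-level program would actually use around tlbie, and release_lock is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>

/* Spin until the lock word is 0, then atomically store a non-zero value,
   mirroring the lwarx/stwcx. loop of Listing 10. */
void test_and_set(atomic_int *lock, int value)
{
    int expected = 0;
    assert(value != 0);   /* 0 means "free", so it cannot mark ownership */
    while (!atomic_compare_exchange_weak(lock, &expected, value)) {
        expected = 0;     /* a failed exchange overwrites expected */
    }
}

/* Store 0 to the lock word, as the final stw of Listing 11 does. */
void release_lock(atomic_int *lock)
{
    atomic_store(lock, 0);
}
```

The weak form of the compare-exchange may fail spuriously, much as stwcx. can fail when the reservation is lost, which is why both versions retry in a loop.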
Porting PowerPC 32-bit software to PowerPC 970FX 32-bit computation mode
User-level 32-bit PowerPC applications do not require any changes in order to be executed in the 32-bit computation mode on the PowerPC 970FX processor. Thirty-two-bit computation mode is chosen by setting the SF bit in the MSR register to 0. In order to exploit some of the 64-bit capabilities of the PowerPC 970FX processor, changes in the original 32-bit PowerPC user-level application programs can be made. For example, if 32-bit software is performing 64-bit scalar arithmetic operations, the arithmetic operations can be written in assembly language, and such assembly functions can be called from high-level languages. The values passed to the function performing 64-bit arithmetic, and the return value, must be contained in the lower 32 bits of the General Purpose Registers.
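One way to respect the lower-32-bit constraint is to pass each 64-bit operand as two 32-bit halves. The sketch below models that data flow in C. The function name add64_halves and its signature are hypothetical, and the body stands in for what the assembly routine would do with a single 64-bit add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two 64-bit values that arrive as 32-bit halves, and return the
   result as 32-bit halves as well. */
void add64_halves(uint32_t a_hi, uint32_t a_lo,
                  uint32_t b_hi, uint32_t b_lo,
                  uint32_t *r_hi, uint32_t *r_lo)
{
    assert(r_hi != NULL && r_lo != NULL);

    /* Reassemble the operands, as the assembly routine would do in
       64-bit wide registers. */
    uint64_t a = ((uint64_t)a_hi << 32) | a_lo;
    uint64_t b = ((uint64_t)b_hi << 32) | b_lo;
    uint64_t r = a + b;   /* one 64-bit add in the assembly version */

    /* Split the result so that each half fits in 32 bits. */
    *r_hi = (uint32_t)(r >> 32);
    *r_lo = (uint32_t)r;
}
```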
The process of porting current 32-bit software targeted for the 32-bit computation mode of the PowerPC 970FX is primarily limited to changes in programs that will execute in supervisor mode.
Code executing at exception vector locations will be executed in 64-bit computation mode. As a result of an exception, the PowerPC 970FX processor sets the MSR SF bit to 1, indicating that the processor is executing in 64-bit computation mode. The code at the exception vector location must save a single register in a temporary register (for example SPRG1), change the value of MSR SF bit to 0, and restore the saved register. This must be done even before a branch to the exception handler code is executed. If, for example, an exception handler is located at address 0xFF000000, the branch instruction executing in 64-bit computation mode will sign extend the address to 0xFFFFFFFFFF000000 and an incorrect branch destination address will be generated, most likely resulting in another exception.
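The sign extension described above can be reproduced with a few lines of C. This is a worked example of the arithmetic only, and sign_extend_32 is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit address to 64 bits: if bit 31 is set, the upper
   32 bits of the result are all ones. */
uint64_t sign_extend_32(uint32_t addr)
{
    uint64_t result = addr;
    if (addr & 0x80000000u)
        result |= 0xFFFFFFFF00000000ULL;
    assert((uint32_t)result == addr);   /* low 32 bits are preserved */
    return result;
}
```

For the address used in the text, sign_extend_32(0xFF000000u) yields 0xFFFFFFFFFF000000, while an address below 0x80000000 is unchanged.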
Certain instructions dealing with the MMU must be executed in 64-bit
computation mode (tlbie, tlbiel). The tlbie and
tlbiel instructions must be written in assembly
language in such a way that they can be called from code executing in
32-bit computation mode.
The behavior of some supervisor-level instructions, such as slbie and slbmte, is
affected by the processor computation mode.
Code that manipulates the MSR register must be altered so that the new
mtmsrd instruction is used. Supervisor-mode
code should use the rfid instruction instead of
the rfi instruction.
This excerpt covers the major issues of porting 32-bit code to run in the 64-bit computation mode. As you can see, the potential pitfalls are fairly well isolated. Portable C code will probably recompile with no changes at all.
The first stop if you need more information than this is the original article in the IBM Technology Group Library (see Resources). If the information you need isn't in this article, there are many other articles related to the PowerPC 970FX; or you might also want to consider posting to the Power Architecture technology forum.
Resources
- The "Developing Embedded Software For The IBM PowerPC 970 FX Processor" Application Note (in PDF format), from which this article was excerpted, is about twice as long and provides additional overview and technical material about porting. In particular, it covers ABI compatibility, VMX/AltiVec, user- and supervisor-level instructions, and more (IBM, July 2004).
- The "PowerPC Microprocessor Family: Programming Environments Manual for 64 and 32-bit Microprocessors" (also in PDF format) is 10.7 megabytes (760 pages) of goodness, including more details on PowerPC's register set, operand conventions, addressing modes and instruction set summary, cache model and memory coherency, exceptions, memory management, multiple-precision shifts, floating-point models, and synchronization programming examples than you'll find in any other place (IBM, June 2003).
- Both of the above, and many others, can be found in the IBM Microelectronics Technology Group Library, which contains much documentation on PowerPC and other processors and cores, including Application Notes, Data Sheets, User Manuals, and more.
- One early adopter of 64-bit PowerPC code was 64-bit PPC Linux, supported on a range of 64-bit PowerPC hardware.
- Apple's copy of the GCC documentation discusses PowerPC compiler options.
- Everything you ever wanted to know about GCC, including source, can be found on the GCC home page.
- RISCWatch is best known for low-level debugging, but it can also be used to debug C and C++ application programs.
- "The RS/6000 64-bit Solution", an early white paper on 64-bit RS/6000 systems, discusses a number of still-relevant topics (IBM).
- Have experience you'd be willing to share with Power Architecture zone readers? Article submissions on all aspects of Power Architecture technology from authors inside and outside IBM are welcomed. Check out the Power Architecture author FAQ to learn more.
- Have a question or comment on this story, or on Power Architecture technology in general? Post it in the Power Architecture technical forum or send in a letter to the editors.
- Get a subscription to the Power Architecture Community Newsletter when you Join the Power Architecture community.
- All things Power are chronicled in the developerWorks Power Architecture editors' blog, which is just one of many developerWorks blogs.
- Find more articles and resources on Power Architecture technology and all things related in the developerWorks Power Architecture technology content area.
- Download an IBM PowerPC 405 Evaluation Kit to demo a SoC in a simulated environment, or just to explore the fully licensed version of Power Architecture technology.