From the stacks: Making the transition to 64 bits

Porting PowerPC software from 32-bit mode to 64-bit mode on the PowerPC 970FX

Developers porting applications to the 64-bit computing mode of the 970FX processor may face a number of issues; this excerpt from a longer Technical Library article covers some of the issues faced when porting existing 32-bit code to the new computing model -- or when embarking on new 64-bit development.

The PowerPC® architecture describes two computation modes, a 32-bit computation mode and a 64-bit computation mode. Generally, processors that implement only 32-bit computation mode are referred to as 32-bit PowerPC processors. Processors that implement both computation modes are referred to as 64-bit PowerPC processors. If a processor implements the 64-bit computation mode it also must implement the 32-bit computation mode.

As 64-bit PowerPC processors become more widely available, it becomes desirable to make applications run in the 64-bit computation mode, providing access to larger address space and faster 64-bit arithmetic.

Porting PowerPC 32-bit software to the PowerPC 970FX's 64-bit computation mode

This article has three main sections. The first section discusses the changes required in C code to port from the 32-bit computation mode to the 64-bit computation mode, the second covers assembly language programs, and the third covers programs that execute with supervisor-level privileges.

Because a 64-bit PowerPC processor also supports 32-bit code, it is generally not necessary to port an application merely to run it on a 64-bit processor; the original binary can usually run unmodified and with no performance penalty. However, unported code does not gain access to the larger address space and faster 64-bit arithmetic available in the 64-bit computation mode. (It is possible to write assembly language functions that execute in 64-bit computation mode using 32-bit tools.)

Note that a single installation of GCC will not target both computation modes. The compiler, assembler, and linker are all different for the 64-bit ELF ABI, so a developer wishing to target both will need two complete toolchains installed.


Changes required in C/C++ language programs

This section discusses issues specific to porting C language software to the 64-bit computation mode of the PowerPC 970FX.

There's more where this came from

This article is an excerpt from the application note "Developing Embedded Software For The IBM PowerPC 970FX Processor," which is listed in the Resources section, below. This excerpt covers porting 32-bit software to run in the 64-bit computation mode of the PowerPC 970FX.

The full article also describes the 64-bit ELF ABI, development tools, and the VMX (aka "AltiVec") vector processing instructions, as well as significant changes to memory management, the larger virtual memory pages for the 64-bit PowerPC architecture, new user-level and supervisor-level instructions specific to 64-bit PowerPC processors, and 64-bit PowerPC endianness (64-bit PowerPC processors are big-endian only, rather than open-endian).

Data type changes

Most software programs written in the C language can be easily migrated to the 64-bit computation mode of the PowerPC 970FX processor. In the C language, only the long and pointer data types change size between the 32-bit and 64-bit ABIs (from 32 bits to 64 bits). The change in size of long and pointer data types affects structure element alignment and structure padding, which is important if the data is accessed in multiple contexts (for example, C language and assembly). Programs that simply dump data structures to disk may also be affected. The data type change also affects the results of some assignment and arithmetic operations. For example, in the following code sample, the compiler will insert padding in order to align the field_two element of the structure:

Listing 1. A sample structure
typedef struct sample_one {
int     field_one;
long    field_two;
int     field_three;
} sample_one_type;

A compiler may also insert padding at the end of a structure (structure-end padding is required in case the structure is used as an element of an array). Such padding is not required if the above structure is compiled for the 32-bit ABI. Care must be taken if a structure size or field alignment is constrained by external requirements. If the exact layout of data matters, these changes can break compatibility. The exact layout of a structure may be constrained if the data will be transmitted over the network; for instance, a TCP header structure should have its exact layout preserved.

In the above example, the padding inserted by the compiler can be removed by moving field_three above field_two, or by changing the type of field_two to int.
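
The effect of reordering can be checked directly with sizeof and offsetof. The following is a minimal sketch, not part of the original application note: the reordered structure and the helper function names are hypothetical, introduced only for illustration.

```c
#include <stddef.h>

/* The structure from Listing 1, unchanged. */
typedef struct sample_one {
    int  field_one;
    long field_two;
    int  field_three;
} sample_one_type;

/* Hypothetical variant: field_three moved above field_two. */
typedef struct sample_one_reordered {
    int  field_one;
    int  field_three;
    long field_two;
} sample_one_reordered_type;

/* Padding bytes = total size minus the sum of the member sizes. */
size_t padding_original(void)
{
    return sizeof(sample_one_type) - (2 * sizeof(int) + sizeof(long));
}

size_t padding_reordered(void)
{
    return sizeof(sample_one_reordered_type)
           - (2 * sizeof(int) + sizeof(long));
}

/* Where the compiler actually placed field_two in the original layout. */
size_t field_two_offset(void)
{
    return offsetof(sample_one_type, field_two);
}
```

On a target where int is 4 bytes and long is 8 bytes with 8-byte alignment, as in the 64-bit ABI, padding_original() should report 8 bytes and padding_reordered() should report 0; on a 32-bit target both should report 0.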

Pointer arithmetic

Another example of a problem is related to the change in size of pointer data types, as the following sample code illustrates.

Listing 2. Pointer to integer conversion
int test;
void *p;
test=(int)p;

The above sample code will execute correctly in 32-bit computation mode, since a pointer is the same size as an int. In 64-bit computation mode, the cast discards the upper 32 bits of the pointer, so the code will not behave correctly if the pointer value is 2^32 or greater. The GNU GCC compiler will produce a warning message when compiling the above code even though an explicit cast is used. Any pointer arithmetic should be performed using variables of type long regardless of the computation mode. Pointers should only be assigned to other pointers or to variables of type long. In general, the use of pointers as well as the long data type might need to be examined when porting 32-bit PowerPC programs. Care must also be taken when displaying long or pointer values, since the number of characters required to display these values is different.
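
The truncation can be modeled on any host by treating an address as a 64-bit number. The sketch below is illustrative only: the function names are hypothetical, and it uses uintptr_t from the C99 header stdint.h, which the original article does not mention (the article recommends long, which is the same width as a pointer in both PowerPC ABIs).

```c
#include <stdint.h>

/* Round-trips a pointer through an integer type wide enough to hold it.
 * uintptr_t is an assumption of this sketch; the article itself uses long. */
int roundtrip_via_uintptr(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits == p;   /* 1 when the pointer survives intact */
}

/* Models the cast in Listing 2 on a 64-bit target: only the low
 * 32 bits of the address value are kept. */
uint32_t truncate_to_32(uint64_t address)
{
    return (uint32_t)address;
}

/* Returns 1 when truncation to 32 bits would change the address. */
int loses_bits_in_32(uint64_t address)
{
    return (uint64_t)truncate_to_32(address) != address;
}
```

An address such as 0xFFFFFFFF still fits, while 0x100000000 (2^32) is the first value that no longer does.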

Function pointers

The use of function descriptors by the PowerPC 64-bit ELF ABI requires that changes be made to code that manipulates function addresses. In order to access function addresses, the following code can be used:

Listing 3. Function descriptors
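int function_name(int arg1, int arg2);
typedef struct function_descriptor {
	void *addr;
	unsigned long toc;
	unsigned long env;
} f_desc_t;
unsigned long function_address;
/* function_name refers to the descriptor; addr holds the code address */
function_address=(unsigned long)(((f_desc_t *)function_name)->addr);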

Compiler options

When using the GNU GCC compiler, there are a number of compiler options (shown in Table 1) that can be used to help flush out some of the errors that can be associated with porting 32-bit PowerPC code. In general, using the GNU GCC -Wall and -Werror flags will help identify questionable sections of code.

Table 1. GNU GCC compiler options useful in porting 32-bit software
Option Name       Description
-Wpointer-arith   Warns about code that depends on size of function pointer or void *
-Wconversion      Warns about implicit parameter conversions and implicit negative to unsigned conversions
-Wformat          Checks calls to printf() and scanf() in order to ensure that supplied arguments have appropriate format string
-Wsign-compare    Warns when signed/unsigned comparisons could produce incorrect result
-Wpadded          Warns when padding is included in a structure
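
As an illustration of what -Wformat checks, the following sketch formats long and pointer values with conversion specifiers that match their types in both computation modes. The helper names are hypothetical; passing a long to a %d conversion instead is the kind of mismatch the option is meant to report.

```c
#include <stdio.h>

/* Formats a long with %ld. Returns the number of characters that the
 * value needs (excluding the terminating NUL), or a negative value on
 * error, exactly as snprintf() does. */
int format_long(char *buf, size_t size, long value)
{
    return snprintf(buf, size, "%ld", value);
}

/* Formats a pointer with %p, which expects a void * argument.
 * The textual form of %p output is implementation-defined. */
int format_pointer(char *buf, size_t size, const void *p)
{
    return snprintf(buf, size, "%p", (void *)p);
}
```

Because the return value is the length actually required, callers can use it to size display fields rather than assuming a fixed 32-bit width.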

Changes required for assembly language programs

This section discusses issues associated with porting 32-bit assembly language software to the 64-bit computation mode.

TOC programming

All global symbols in the PowerPC 64-bit ELF ABI have to be accessed through the TOC pointer. In high-level languages, TOC accesses are generated by the compiler. In assembly language, the TOC must be used explicitly. The examples in this section assume that the GNU assembler is being used (assemblers produced by other vendors might have different syntax requirements) and that a C language preprocessor expands the macros before the assembler is invoked. The following macros (TOC_ENTRY, GET_SYM_ADDR) are followed by example code that uses the TOC pointer to load the address of the global_symbol_name variable into General Purpose Register 4:

Listing 4. TOC pointer
#define TOC_ENTRY(name,symbol)  \
		.section        ".toc","aw";            \
		name:           .tc symbol[TC],symbol
#define GET_SYM_ADDR(ra,name)   \
		ld              ra,name@toc(r2)
TOC_ENTRY(.LC0,global_symbol_name)
GET_SYM_ADDR(r4,.LC0)

In the above example, the TOC_ENTRY macro must not be placed in the same section as the GET_SYM_ADDR macro. The TOC_ENTRY macro should be placed in the beginning of the file, since this macro defines an entry in the .toc section. The TOC register has to be initialized in the startup code. The fact that all functions have a function descriptor with a TOC pointer can be used to initially set up the TOC register. The following sample code illustrates this concept:

Listing 5. Using LOAD_64BIT_VAL
LOAD_64BIT_VAL(r2,my_main)
ld               r2,8(r2)

In the above example, the symbol my_main is the name of a function. The symbol my_main refers to the address of the function descriptor, and the symbol .my_main (note the dot in front of the function name) refers to the address of the function.

Loading a 64-bit value into a General Purpose Register

Because General Purpose Registers in the PowerPC 970FX are 64 bits wide, the assembly code required to load a value into a register needs to be changed. The following macro can be used to load an arbitrary 64-bit value into a General Purpose Register:

Listing 6. Defining LOAD_64BIT_VAL
#define LOAD_64BIT_VAL(ra,value) \
	addis           ra,r0,value@highest;            \
	ori             ra,ra,value@higher;             \
	sldi            ra,ra,32;                       \
	oris            ra,ra,value@h;                  \
	ori             ra,ra,value@l
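
To see why this five-instruction sequence yields the full value, it can help to model each step in C. The sketch below is an illustration only, not part of the original note: the function names are hypothetical, and the four helpers stand in for the assembler's @highest, @higher, @h, and @l operators.

```c
#include <stdint.h>

/* 16-bit fields corresponding to @highest, @higher, @h and @l. */
static uint16_t part_highest(uint64_t v) { return (uint16_t)(v >> 48); }
static uint16_t part_higher(uint64_t v)  { return (uint16_t)(v >> 32); }
static uint16_t part_h(uint64_t v)       { return (uint16_t)(v >> 16); }
static uint16_t part_l(uint64_t v)       { return (uint16_t)v; }

/* Step-by-step model of LOAD_64BIT_VAL(ra, value). */
uint64_t model_load_64bit_val(uint64_t value)
{
    /* addis ra,r0,value@highest: the immediate is shifted left by 16
     * and sign-extended to 64 bits (r0 here means the constant 0). */
    uint64_t ra = (uint64_t)part_highest(value) << 16;
    if (part_highest(value) & 0x8000u)
        ra |= 0xFFFFFFFF00000000ULL;

    /* ori ra,ra,value@higher: fills bits 15..0. */
    ra |= part_higher(value);

    /* sldi ra,ra,32: moves the upper half into place; any
     * sign-extension bits from the addis step are shifted out. */
    ra <<= 32;

    /* oris ra,ra,value@h: fills bits 31..16 (zero-extended immediate). */
    ra |= (uint64_t)part_h(value) << 16;

    /* ori ra,ra,value@l: fills bits 15..0. */
    ra |= part_l(value);

    return ra;
}
```

The shift in the middle is what makes the sequence safe for values whose top bit is set: the sign extension introduced by addis never reaches the final result.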

Function prolog and epilog

Assembly language function prolog and epilog code needs to be changed when porting applications from the 32-bit PowerPC ABI to the PowerPC 64-bit ELF ABI. One convenient method of handling assembly language function prolog and epilog code is to create macros. The following are examples of macros for assembly language function prolog and epilog code:

Listing 7. Function prolog and epilog
#define function_prolog(fn) \
		.section        ".text";                \
		.align          2;                      \
		.globl          fn;                     \
		.section        ".opd","aw";            \
		.align          3;                      \
		fn:;                                    \
		.quad           .fn,.TOC.@tocbase,0;    \
		.previous;                              \
		.size           fn,24;                  \
		.globl          .fn;                    \
		.fn:
#define function_epilog(fn) \
		.long           0;                      \
		.byte           0,12,0,0,0,0,0,0;       \
		.type           .fn,@function;          \
		.size           .fn,.-.fn

Function prolog code is placed at the beginning of every assembly function, and function epilog code is placed at the end of every assembly function. The above sample function prolog creates code for the function descriptor. The function epilog code creates a default function traceback table.

Assembly data objects

When creating a data object using assembly language, a data object prolog and epilog are also required. The following is an example of code that can be used for creating a data object prolog and epilog:

Listing 8. Data prolog and epilog
#define data_global_prolog(dn)          \
		.section        ".toc","aw";            \
		.tc             dn[TC],dn;              \
		.section        ".data";                \
		.align          3;                      \
		.globl          dn;                     \
		dn:
#define data_global_epilog(dn) \
		.type           dn,@object;             \
		.size           dn,.-dn

Assembly language function calls

When using assembly language, function calls must be handled differently due to PowerPC 64-bit ELF ABI requirements. When calling a function from assembly language, the following syntax should be used (note the dot in front of the function name):

Listing 9. Calling a function
b		.my_main

Changes required for supervisor-level software

The implementation of memory management in the 64-bit PowerPC processors is significantly different from previous implementations. Code that manipulates MMU structures will have to be rewritten in order to execute correctly on the PowerPC 970FX. Code executing with translation disabled (either the MSR DR or the MSR IR bits set to 0) must set the MSR HV (Hypervisor) bit to 1. If the MSR HV bit is not set to 1 when translation is disabled, a storage access will cause an exception.

Introducing the hypervisor

The 64-bit PowerPC architecture introduced the concept of hypervisor resources, which have a higher privilege level than typical supervisor resources. In the PowerPC 970FX, all system-level software should be executed in hypervisor mode.

A new instruction, mtmsrd, must be used when updating the MSR register. The mtmsr instruction that was previously used in 32-bit PowerPC implementations updates only the lower 32 bits of the MSR register, but the MSR register is 64 bits wide on 64-bit PowerPC chips. The rfi instruction is no longer available, and the rfid instruction should be used instead. The rfid instruction is used in restoring the MSR register and execution context after an exception or interrupt (see Resources for a link to the original App Note and complete listings on new, changed, and deprecated instructions).
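
The practical consequence of the mtmsr limitation can be shown with a simplified C model. This is a sketch under stated assumptions: the function names are hypothetical, the model ignores any bits that the real instructions treat specially, and it assumes that SF is the most significant bit of the 64-bit MSR.

```c
#include <stdint.h>

/* Assumption of this sketch: SF is the most significant MSR bit. */
#define MSR_SF (1ULL << 63)

/* Simplified model of mtmsr: only the lower 32 bits of the MSR are
 * replaced; the upper 32 bits keep their previous contents. */
uint64_t model_mtmsr(uint64_t msr, uint64_t rs)
{
    return (msr & 0xFFFFFFFF00000000ULL) | (rs & 0x00000000FFFFFFFFULL);
}

/* Simplified model of mtmsrd: all 64 bits are replaced. */
uint64_t model_mtmsrd(uint64_t msr, uint64_t rs)
{
    (void)msr;  /* the previous contents do not matter in this model */
    return rs;
}
```

Under that assumption, code that tries to switch computation modes with mtmsr leaves SF untouched, which is one reason the 64-bit instruction is required.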

Since the dcbi instruction has been removed from the PowerPC architecture, any code using this instruction will have to be examined. In most instances, the dcbi instruction was used to invalidate memory that was written to by an external device. The presence of memory coherency in PowerPC 970FX should eliminate the need for the dcbi instruction. Any use of the dcba instruction, which was typically used to optimize software performance by placing the data in the cache, should be removed.

The absence of the no-memory coherency storage attribute does not have a direct impact on software. Software that uses the dcbf instruction could potentially be optimized, since all cacheable memory accesses are coherent with respect to main storage. The use of a memory coherency protocol removes the need for the dcbf instruction in most cases.

The PowerPC 970FX processor requires that the tlbie instruction (which invalidates a TLB entry) not be executed by multiple processors at the same time. This requirement necessitates the use of an atomic test and set instruction sequence before the execution of the tlbie instruction. The following is an example of a test and set instruction sequence:

Listing 10. Test and set sequence
..again:        lwarx           r5,r0,r3
		cmpwi           cr0,r5,0x0000
		bne             ..again
		stwcx.          r4,r0,r3
		bne             ..again
		blr

In the above example, General Purpose Register 3 contains the address against which the test and set instruction sequence is executed, and General Purpose Register 4 should contain a non-zero value. Typically, the processor initialization software will set a memory location used by the test and set instruction sequence to 0. The following instruction sequence should be used when invalidating a TLB entry:

Listing 11. Invalidating a TLB entry
	bl              .test_and_set
	tlbie           r6,0
	eieio
	tlbsync
	ptesync
	isync
	addi            r6,r0,0x0000
	stw             r6,0x0000(r3)

The tlbie instruction must be executed in 64-bit computation mode. In the above example, r3 contains the address of the memory location used by the test and set function, and r6 contains bits 16 through 51 of the effective address that will be invalidated in the TLB. The stw instruction at the end of the sequence stores 0 to that memory location, releasing the lock acquired by the test and set instruction sequence.
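
The locking protocol around tlbie can also be expressed in portable C. The following is a rough analogue using C11 atomics, not the article's code: all function names are hypothetical, the invalidate callback merely stands in for the privileged tlbie sequence, and the default sequentially consistent ordering is used for simplicity.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One attempt at the test and set: if *lock is 0, store value into it.
 * Returns true when the lock was acquired. */
bool try_test_and_set(atomic_uint *lock, unsigned value)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, value);
}

/* Spins until the lock is acquired, like the ..again loop. */
void test_and_set(atomic_uint *lock, unsigned value)
{
    while (!try_test_and_set(lock, value)) {
        /* busy-wait until the holder stores 0 */
    }
}

/* Releases the lock by storing 0, like the final stw. */
void release_lock(atomic_uint *lock)
{
    atomic_store(lock, 0);
}

/* Serializes a caller-supplied invalidation routine. The callback is a
 * placeholder; portable C cannot issue tlbie itself. */
void invalidate_with_lock(atomic_uint *lock, void (*invalidate)(void))
{
    test_and_set(lock, 1);   /* non-zero value marks the lock as held */
    invalidate();
    release_lock(lock);
}
```

As in the assembly version, the lock word must start at 0 and the value stored to acquire it must be non-zero.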


Porting PowerPC 32-bit software to PowerPC 970FX 32-bit computation mode

User-level 32-bit PowerPC applications do not require any changes in order to be executed in the 32-bit computation mode on the PowerPC 970FX processor. Thirty-two-bit computation mode is chosen by setting the SF bit in the MSR register to 0. In order to exploit some of the 64-bit capabilities of the PowerPC 970FX processor, changes in the original 32-bit PowerPC user-level application programs can be made. For example, if 32-bit software is performing 64-bit scalar arithmetic operations, the arithmetic operations can be written in assembly language, and such assembly functions can be called from high-level languages. The values passed to the function performing 64-bit arithmetic, and the return value, must be contained in the lower 32 bits of the General Purpose Registers.
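
One way to satisfy the 32-bit register constraint is to pass each 64-bit operand as two 32-bit halves. The sketch below shows such an interface in C; the function name and parameter layout are hypothetical, and the body is a plain C stand-in for what would be an assembly routine using 64-bit instructions on the 970FX.

```c
#include <stdint.h>

/* Hypothetical interface: every argument fits in the lower 32 bits of
 * a register, and the 64-bit result is returned as two 32-bit halves
 * through pointers. */
void add64_via_halves(uint32_t a_hi, uint32_t a_lo,
                      uint32_t b_hi, uint32_t b_lo,
                      uint32_t *r_hi, uint32_t *r_lo)
{
    /* Reassemble the operands; an assembly version would do this
     * inside 64-bit registers. */
    uint64_t a = ((uint64_t)a_hi << 32) | a_lo;
    uint64_t b = ((uint64_t)b_hi << 32) | b_lo;

    uint64_t sum = a + b;   /* single 64-bit add, wraps modulo 2^64 */

    /* Split the result so each half again fits in 32 bits. */
    *r_hi = (uint32_t)(sum >> 32);
    *r_lo = (uint32_t)sum;
}
```

The caller never handles a value wider than 32 bits, which keeps the routine callable from code compiled for the 32-bit ABI.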

The process of porting current 32-bit software targeted for the 32-bit computation mode of the PowerPC 970FX is primarily limited to changes in programs that will execute in supervisor mode.

Code executing at exception vector locations will be executed in 64-bit computation mode. As a result of an exception, the PowerPC 970FX processor sets the MSR SF bit to 1, indicating that the processor is executing in 64-bit computation mode. The code at the exception vector location must save a single register in a temporary register (for example SPRG1), change the value of MSR SF bit to 0, and restore the saved register. This must be done even before a branch to the exception handler code is executed. If, for example, an exception handler is located at address 0xFF000000, the branch instruction executing in 64-bit computation mode will sign extend the address to 0xFFFFFFFFFF000000 and an incorrect branch destination address will be generated, most likely resulting in another exception.
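
The sign extension described above is easy to reproduce in C. This small sketch, with a hypothetical function name, shows how a 32-bit address with its top bit set turns into a different 64-bit address.

```c
#include <stdint.h>

/* Sign-extends a 32-bit address to 64 bits: bit 31 of the input is
 * copied into bits 63..32 of the result. */
uint64_t sign_extend_32(uint32_t address)
{
    uint64_t wide = address;
    if (address & 0x80000000u)
        wide |= 0xFFFFFFFF00000000ULL;
    return wide;
}
```

For the handler address in the example, sign_extend_32(0xFF000000) gives 0xFFFFFFFFFF000000, while an address below 0x80000000 is unchanged.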

Certain instructions dealing with the MMU must be executed in 64-bit computation mode (tlbie, tlbiel). The tlbie and tlbiel instructions must be written in assembly language in such a way that they can be called from code executing in 32-bit computation mode.

The behavior of some supervisor-level instructions, such as slbie and slbmte, is affected by the processor computation mode.

Code that manipulates the MSR register must be altered so that the new mtmsrd instruction is used. Supervisor-mode code should use the rfid instruction instead of the rfi instruction.


Going further

This excerpt covers the major issues of porting 32-bit code to run in the 64-bit computation mode. As you can see, the potential pitfalls are fairly well isolated. Portable C code will probably recompile with no changes at all.

If you need more information, the first stop is the original article in the IBM Technology Group Library (see Resources). If the information you need isn't in that article, there are many other articles related to the PowerPC 970FX; you might also consider posting to the Power Architecture technology forum.

Resources

Published: October 19, 2004