SystemAdmin

Pinned topic Optimization of __lwarx

2010-05-11T09:45:03Z

I have the following function:
#include <builtins.h>

int variable = 0;

void set() {
    while (true) {
        while (__lwarx(&variable) == 1);

        if (__stwcx(&variable, 1)) {
            return;
        }
    }
}
Compiling it with "-qthreaded -O3" produces an endless loop, as can easily be seen in the generated assembler code:
(dbx) listi set
0x100000800 (set()) 3862ff58 addi r3,-168(r2)
0x100000804 (set()+0x4) 7c001828 lwarx r0,r0,r3
0x100000808 (set()+0x8) 2c000001 cmpi cr0,0x0,r0,0x1
0x10000080c (set()+0xc) 41820024 beq 0x100000830 (set()+0x30)
0x100000810 (set()+0x10) 38000001 li r0,0x1
0x100000814 (set()+0x14) 7c00192d stwcx. r0,r0,r3
0x100000818 (set()+0x18) 4082ffec bne 0x100000804 (set()+0x4)
0x10000081c (set()+0x1c) 4e800020 blr
0x100000820 (set()+0x20) 60000000 ori r0,r0,0x0
0x100000824 (set()+0x24) 60000000 ori r0,r0,0x0
0x100000828 (set()+0x28) 60000000 ori r0,r0,0x0
0x10000082c (set()+0x2c) 60210000 ori r1,r1,0x0
0x100000830 (set()+0x30) 41820000 beq 0x100000830 (set()+0x30)
0x100000834 (set()+0x34) 38000001 li r0,0x1
0x100000838 (set()+0x38) 7c00192d stwcx. r0,r0,r3
0x10000083c (set()+0x3c) 4d820020 beqlr
0x100000840 (set()+0x40) 4bffffc4 b 0x100000804 (set()+0x4)
Basically the compiler translates this to:
if (__lwarx(&variable) == 1) while (true);
I did not expect the call to __lwarx to be optimized out of the loop, especially because spinning on a variable is one of its intended use cases. The documentation is quite vague:
... This has the same effect as inserting __fence built-in functions before and after the __ldarx built-in function and can inhibit compiler optimization of surrounding code (see __alignx for a description of the __fence built-in function). ...
My question: Should this be considered a compiler bug?
  • SystemAdmin

    Re: Optimization of __lwarx

    ‏2010-05-11T16:45:20Z  
    The problem is that the C and C++ languages, as they are standardized today, do not provide any mechanism for performing atomic operations. So you are limited to compiler extensions such as these builtins, for which there is no standard right or wrong behavior; you have to rely on the behavior currently implemented by the compiler.

    In this specific situation, the __lwarx builtin expects a pointer to volatile, but you are calling it with a pointer to a non-volatile int. That allows the compiler to move the load out of the loop, because it can assume the loaded value will not change while the loop runs. The fix is to declare "variable" as volatile in your source. If what you have is a pointer, cast it to (volatile int *) before calling __lwarx.
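
    A minimal sketch of that fix, assuming the builtins take a volatile int * as described above. The wrapper names load_reserved and store_conditional, the function set_via_pointer, and the non-XL fallback branch are hypothetical additions so that the sketch also compiles with other compilers; the fallback is not atomic and only illustrates the shape of the code.

```cpp
#if defined(__IBMCPP__) || defined(__xlC__)
#include <builtins.h>
// XL C/C++ on POWER: forward to the real builtins.
static inline int load_reserved(volatile int *p) { return __lwarx(p); }
static inline int store_conditional(volatile int *p, int v) { return __stwcx(p, v); }
#else
// Hypothetical stand-ins for other compilers. These are NOT atomic and are
// NOT equivalent to lwarx/stwcx; they only let the sketch compile and run.
static inline int load_reserved(volatile int *p) { return *p; }
static inline int store_conditional(volatile int *p, int v) { *p = v; return 1; }
#endif

// The variable is now volatile, so every iteration must reload it.
volatile int variable = 0;

void set() {
    while (true) {
        // Spin until the value is no longer 1.
        while (load_reserved(&variable) == 1) { }

        // Try to store 1; a nonzero result means the store succeeded.
        if (store_conditional(&variable, 1)) {
            return;
        }
    }
}

// Pointer variant: the caller only has an int *, so cast it to
// (volatile int *) before handing it to the builtins.
void set_via_pointer(int *p) {
    volatile int *vp = (volatile int *)p;
    while (true) {
        while (load_reserved(vp) == 1) { }
        if (store_conditional(vp, 1)) {
            return;
        }
    }
}
```

    With the volatile qualifier in place the compiler has to keep the load inside the inner loop, so the loop can observe a store made by another thread.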

    In the long term, the C++0x standard will provide mechanisms for thread communication in a portable way, without relying on volatile variables or compiler builtins.
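
    For comparison, here is a rough sketch of the same spin-then-set loop written against the std::atomic interface that was eventually standardized in C++11. The names flag, set_flag and clear_flag are hypothetical, and the acquire/release orderings are an assumption about lock-like use; the original code does not state which ordering it needs.

```cpp
#include <atomic>

// Hypothetical flag: 0 means free, 1 means taken.
std::atomic<int> flag{0};

void set_flag() {
    int seen = flag.load(std::memory_order_relaxed);
    for (;;) {
        if (seen == 1) {
            // Still taken: reload and keep spinning. The atomic load
            // cannot be moved out of the loop by the compiler.
            seen = flag.load(std::memory_order_relaxed);
            continue;
        }
        // Try to replace the value we saw with 1. On failure (including a
        // spurious one) 'seen' is refreshed with the current value.
        if (flag.compare_exchange_weak(seen, 1,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed)) {
            return;
        }
    }
}

void clear_flag() {
    flag.store(0, std::memory_order_release);
}
```

    On POWER a compiler would typically implement compare_exchange_weak with a lwarx/stwcx. pair, so the generated code can end up close to the hand-written version while staying portable.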