Using inline assembly with IBM XL C/C++ compiler for Linux on z Systems, Part 3

Basic features


IBM XL C/C++ for Linux on z Systems Version 1.1, released in 2015, supports incorporating assembler instructions directly into C/C++ programs (inline assembly). This gives advanced users greater flexibility to access instructions at the chip level. With inline assembly, software engineers can handcraft assembler code for the most performance-sensitive parts of C/C++ programs, which can further accelerate the applications to the full extent of the programmers' ingenuity.

The objective of this article is to introduce the basics of the inline assembly feature supported by the IBM XL compiler for Linux on z Systems. Advanced features are discussed in the article, Advanced features of inline assembly for Linux on z Systems. This article is limited to assembler instructions that involve general registers; vector registers and floating-point registers are treated as a separate topic. The target audience is advanced software engineers who want to go beyond the optimizations provided by the Linux on z Systems compiler and fine-tune the most performance-sensitive code sections of high-performing applications.

Assembler instructions and inline assembly statements

Inline assembly statements are platform dependent

Each inline assembly statement encapsulates zero or more assembly instructions. Assembler instructions, being instructions at hardware level, are proprietary to the architecture. Instructions on different platforms might be completely different even when the operation they carry out is similar in nature. For instance, an arithmetic addition instruction acting on the contents of two registers on IBM Power Architecture® accepts three operands R1, R2, and R3.

caxo. R3, R2, R1

This operation adds the values stored in register R1 and register R2, and then saves the result to register R3. It also updates the contents of the condition register and the fixed-point exception register about the status of the operation.

On IBM z Systems, however, the corresponding instruction only accepts two operands as follows:

AR R2, R1

This instruction adds the value stored in register R1 to that in register R2 and then stores the result back to register R2. The instruction sets the 2-bit condition code in the program status word (PSW) to reflect the status of the operation, including overflow. There is no condition register on z Systems processors.

Note that the ARK and AGRK instructions on z Systems accept three operands when the distinct-operands facility is installed.

To ensure portability of the applications, code sections with inline assembly statements must be guarded with macros specific to z Systems. The list of all macros defined by the IBM XL compiler can be obtained by compiling any C source file with the -qshowmacros -P options. This emits all macro definitions to the preprocessed output.
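
As a minimal sketch of such a guard, the following function takes the inline assembly path only when the macro __s390x__ is defined and falls back to plain C otherwise. The macro name and the function name negate_guarded are assumptions made for illustration: GCC and Clang predefine __s390x__ for 64-bit z Systems targets, but the macros defined by a particular XL compiler release should be confirmed with -qshowmacros.

```c
/* Hypothetical helper, not taken from the original article. */
int negate_guarded(int v) {
#if defined(__s390x__)   /* assumed target macro; confirm with -qshowmacros */
    /* LCR (Load Complement) negates a 32-bit register and sets the
       condition code, hence the "cc" entry in the clobber list. */
    asm ("LCR %0, %0\n" : "+r"(v) : : "cc");
#else
    /* Portable fallback for every other platform. */
    v = -v;
#endif
    return v;
}
```

Calling negate_guarded(5) is expected to return -5 on either path, so the same source can be built on z Systems and elsewhere.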

Formation of an inline assembly statement

Inline assembly statements are embedded in C/C++ programs by users to instruct the compiler to inline the specified assembler instructions into the generated code. The major components of an inline assembly statement are as follows:

  • A keyword: asm, __asm, or __asm__ to mark the start of the assembly statement
  • An optional keyword volatile to inform the compiler of the volatileness of the assembly block
  • Zero or more assembler instructions, also called code-format-strings, to be inlined into the code
  • An output operand list to receive the output of the assembler instructions
  • An input operand list containing the input to the assembler instructions
  • An optional clobber list to inform the compiler about the registers, the condition code, and the memory affected by the assembler instructions

Figure 1 generalizes the major components of an inline assembly statement. You can find detailed information about each major component in the later sections of this article.

Figure 1. Generalization of an inline assembly statement
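
The components listed above can also be laid out in source form. The following sketch is illustrative only: the function name subtract_annotated and the guard macro __s390x__ are assumptions, and the plain C branch exists solely so that the code builds on other platforms.

```c
/* Hypothetical helper that labels each component of one statement. */
int subtract_annotated(int x, int y) {
#if defined(__s390x__)      /* assumed target macro */
    __asm__ volatile (      /* keyword, followed by the optional volatile */
        "SR %0, %1\n"       /* code format string: x = x - y              */
        : "+r"(x)           /* output operand list                        */
        : "r"(y)            /* input operand list                         */
        : "cc"              /* clobber list: SR sets the condition code   */
    );
#else
    x -= y;                 /* portable fallback */
#endif
    return x;
}
```

The three colons separate the code format string from the output operand list, the input operand list, and the clobber list, in that order.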

Operands for assembler instructions of inline assembly statements

An inline assembly statement contains zero or more assembler instructions. Operands for the instructions, if any, must be either literals or parameters. When an operand is a parameter, it must come from the combined operand list. The parameter is identified with a % sign and its positional number, starting from zero, on the list. Figure 2 is a visualization of how operands are identified based on the output and input operand lists.

Figure 2. The combined operand list

Inline assembly demands that all connections to the rest of the program be established through the output and input operand lists. Direct reference to an external symbol without going through the operand lists is not supported.
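
To make the numbering concrete, the following sketch uses two output operands and one input operand, so the combined operand list has positions 0, 1, and 2. The function name copy_and_sum and the guard macro __s390x__ are assumptions for illustration. LR (Load Register) copies one 32-bit general register to another and does not change the condition code.

```c
/* Hypothetical helper: outputs occupy %0 and %1, the input occupies %2. */
int copy_and_sum(int src) {
    int first, second;
#if defined(__s390x__)                 /* assumed target macro */
    asm ("LR %0, %2\n"                 /* first  (%0) <- src (%2) */
         "LR %1, %2\n"                 /* second (%1) <- src (%2) */
         : "=r"(first), "=r"(second)   /* positions 0 and 1 */
         : "r"(src));                  /* position 2 */
#else
    first = src;                       /* portable fallback */
    second = src;
#endif
    return first + second;
}
```

Because the input operand list follows the output operand list, src is addressed as %2 even though it is the first (and only) input.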

Compiler adds instructions to support user’s assembler instructions

Instead of requiring users to manipulate the assembler instructions in the code directly, the inline assembly feature lets users apply the required assembler instructions to the symbols in the operand lists without the burden of arranging supporting operations. The compiler prepares the operations needed to support the assembler instructions. For example, if the intended operation is to add two variables, the user can call an add instruction (for instance, AR) with the two variables serving as the two operands. Other tasks, such as selecting registers for the job, loading the values into the registers, and storing the values back to memory after the operation completes, are handled by the compiler.

In the following listings, there are two versions of the same program: example01.c does not have an inline assembly statement and example02.c does have an inline assembly statement.

Listing 1. example01.c, a C program without an inline assembly statement
#include <stdio.h>
int main () {
   int array[] = { 1, 2, 3 };
   printf ( "array = [ %d, %d, %d ]\n", array[0], array[1], array[2]);
   printf ( "array = [ %d, %d, %d ]\n", array[0], array[1], array[2]);
   return 0;
}

Lines 4 and 5 of the example01.c program are identical. They are used in this example as markers to demonstrate the preparation done by the compiler when it handles inline assembly statements. The example02.c program is created by adding an inline assembly statement between lines 4 and 5 of example01.c. The statement uses an add instruction (AR) to add the value of array[2] to that of array[1].

Listing 2. example02.c, a C program with an inline assembly statement
#include <stdio.h>
int main () {
   int array[] = { 1, 2, 3 };
   printf ( "array = [ %d, %d, %d ]\n", array[0], array[1], array[2]);
   asm ("AR %0, %1 \n" 
       :"+r"(array[1]) 
       :"r"(array[2]) );
   printf ( "array = [ %d, %d, %d ]\n", array[0], array[1], array[2]);
   return 0;
}

The AR instruction stores the sum of its operands in the first operand: it adds %1 (array[2]) to %0 (array[1]) and then stores the result back to %0 (array[1]). At run time, example02.c should print the array with the second element (array[1]) being 5, as demonstrated in Listing 3.

Listing 3. Running example02.c
xlc -o example02 ./example02.c
./example02
array = [ 1, 2, 3 ]
array = [ 1, 5, 3 ]         <- The 2nd element of the array becomes 5, which is the sum of 2+3

Comparing the assembly code generated in the two cases clarifies what the compiler does to support the inline assembly statement on lines 5 to 7 of example02.c. The assembly files of example01.c and example02.c can be produced by compiling the programs with -S.

Listing 4. Creating the assembly files with -S
xlc -c -S example01.c example02.c

Figure 3 compares the assembly files generated by the compiler for programs example01.c and example02.c. As evident in Figure 3, the difference between example01.s (on the left side of Figure 3) and example02.s (on the right side of Figure 3) reveals that the compiler arranges extra operations before inlining instruction AR %r0, %r1 to the code. It selects general registers (r0 and r1) and loads them with proper values before calling the AR instruction. It also stores the computed value back to the array after running the AR instruction. These supporting instructions are required for the inline assembly statement to succeed.

Figure 3. Supporting operations added by the compiler to support AR

Refer to Table 1 for details about the operations in the highlighted snippet.

Table 1. Operations by the compiler for the inline assembly statement
C code | Assembler code | Operation
printf ( "array = [ %d, %d, %d ]\n", array[0], array[1], array[2]); | BRASL %r14,printf | Preparation completed, call printf on line 4
(Instructions added by the compiler) | L %r1,184(,%r15) | Load value of array[2] to register r1
 | MVC 168(4,%r15),180(%r15) | Copy value of array[1] to location r15+168
 | L %r0,168(,%r15) | Load value of array[1], now at location r15+168, to r0
asm ("AR %0, %1 \n" :"+r"(array[1]) :"r"(array[2]) ); | #GS00000 | Start inlining user's assembler instruction
 | AR %r0, %r1 | Inline user's assembler instruction
 | #GE00000 | End inlining user's assembler instruction
(Instructions added by the compiler) | ST %r0,168(,%r15) | Store r0 (array[1]) back to location r15+168
 | MVC 180(4,%r15),168(%r15) | Copy value at location r15+168 to location r15+180 (array[1] is updated)
printf ( "array = [ %d, %d, %d ]\n", array[0], array[1], array[2]); | LA %r2,16(,%r13) | Prepare to call printf the second time

Constraints, modifiers, output and input operand lists, and clobber list

Each output operand consists of a constraint paired with a C/C++ expression in parentheses. An output operand must be accompanied by a modifier. An input operand also consists of a constraint paired with a C/C++ expression in parentheses, but it does not require a modifier. The clobber list informs the compiler whether the assembler instructions affect any entities that are not listed in the input and output operand lists. The output, input, and clobber lists might be empty. When a list has two or more members, the members are separated by commas.

The constraints

A constraint is a string literal describing the kind of operand it accompanies. The constraint must match the type of operand that the assembler instruction expects. Some of the most frequently used constraints are described in this article. The list of supported constraints might grow in future releases of the compiler. Refer to the compiler manual for the most up-to-date and exhaustive list of supported constraints for each product.

a, d, and r constraints for general registers

Constraints a, d, and r are used for assembler instructions that expect a general register as an operand. In a C/C++ program, the a, d, and r constraints represent integer symbols. For example, in Listing 5, the inline assembly statements of the example03.c program use the LPGFR instruction to load the positive value of an integer variable into itself, effectively turning any integer variable into its absolute value. Because variable a is an integer and the LPGFR instruction accepts general registers as its operands, using any of a, d, and r as a constraint with LPGFR is legal.

Listing 5. example03.c - usage of a, d, and r constraints for general registers
#include<stdio.h>
int abs_a(int a){
    asm (" LPGFR %0, %0\n" :"+a"(a) );        //a constraint
    return a;
}
int abs_d(int a){
    asm (" LPGFR %0, %0\n" :"+d"(a) );        //d constraint
    return a;
}
int abs_r(int a){
    asm (" LPGFR %0, %0\n" :"+r"(a) );        //r constraint
    return a;
}

int main() {
    int x = -5;
    printf( "Absolute value of %d is %d (a constraint)\n", x, abs_a(x) );
    printf( "Absolute value of %d is %d (d constraint)\n", x, abs_d(x) );
    printf( "Absolute value of %d is %d (r constraint)\n", x, abs_r(x) );
    x = 12;
    printf( "Absolute value of %d is %d (a constraint)\n", x, abs_a(x) );
    printf( "Absolute value of %d is %d (d constraint)\n", x, abs_d(x) );
    printf( "Absolute value of %d is %d (r constraint)\n", x, abs_r(x) );
}

During execution, program example03.c prints out the absolute values of -5 and 12.

Note: On IBM z/Architecture®, r0 cannot be used in addressing. For that reason, the constraint "a" (abbreviation for address) can be used for all general registers except r0.
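
The following sketch illustrates why the restriction matters. LA (Load Address) computes displacement plus base without accessing memory or changing the condition code; if r0 were used as the base register, the base would be treated as zero and the result would be wrong. The function name next_byte and the guard macro __s390x__ are assumptions for illustration.

```c
/* Hypothetical helper: returns the address one byte past p. */
const char *next_byte(const char *p) {
    const char *q;
#if defined(__s390x__)      /* assumed target macro */
    /* "a" keeps the compiler from choosing r0 for %1, which is used
       as a base register in the address expression 1(%1). */
    asm ("LA %0, 1(%1)\n" : "=r"(q) : "a"(p));
#else
    q = p + 1;              /* portable fallback */
#endif
    return q;
}
```

With the "r" constraint in place of "a", the compiler would be free to select r0 for the pointer, and the instruction would then compute the constant 1 rather than p + 1.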

I, J, and K constraints for constant values up to 2 bytes in length

I, J, and K constraints can be used for assembler instructions that expect an immediate operand. In a C/C++ program, I, J, and K constraints represent integer constants or character constants of up to 16 bits. For example, in Listing 6, the inline assembly statements of program example04.c use each of the constraints I, J, and K as a 1-byte character when dealing with MVI (Move Immediate), and as a 2-byte integer when working with AHI (Add Halfword Immediate). This usage of I, J, and K is legal because the MVI instruction expects a 1-byte immediate value and the AHI instruction expects a 2-byte constant integer.

Listing 6. example04.c with I, J and K constraints
#include<stdio.h>
int main() {
    char text[]="ibm";
    asm("MVI %0,%1\n":"=m"(text[0]):"I"('I'));         //I for 1-byte char
    asm("MVI %0,%1\n":"=m"(text[1]):"J"('B'));         //J for 1-byte char
    asm("MVI %0,%1\n":"=m"(text[2]):"K"('M'));         //K for 1-byte char
    printf ("Expected IBM , got %s\n", text);

    int x = 0;
    asm("AHI %0,%1\n":"+r"(x):"I"(0x1FFF));         //I for 2-byte int
    asm("AHI %0,%1\n":"+r"(x):"J"(0x1FFF));         //J for 2-byte int
    asm("AHI %0,%1\n":"+r"(x):"K"(0x1FFF));         //K for 2-byte int
    printf ("Expected 0x5FFD, got 0x%X\n", x);
    return 0;
}

When run, program example04.c prints out the expected values.

g, i, and n constraints for constant values up to 4 bytes in length

g, i, and n constraints can be used for assembler instructions that expect an immediate operand. In a C/C++ program, g, i, and n constraints can represent integer or character constants of up to 32 bits. For example, in Listing 7, the inline assembly statements of program example05.c use each of the constraints g, i, and n as a 1-byte character when dealing with MVI, as a 2-byte integer when working with AHI, and as a 4-byte integer when operating with AFI (Add Full-word Immediate). This usage of constraints g, i, and n is valid because the MVI instruction expects a 1-byte immediate value, the AHI instruction a 2-byte constant integer, and the AFI instruction a 4-byte constant integer.

Listing 7. example05.c to demonstrate usage of g, i, and n constraints
#include<stdio.h>
int main() {
    char text[]="xlc";
    asm("MVI %0,%1\n":"=m"(text[0]):"i"('X'));        //i for 1-byte char
    asm("MVI %0,%1\n":"=m"(text[1]):"n"('L'));        //n for 1-byte char
    asm("MVI %0,%1\n":"=m"(text[2]):"g"('C'));        //g for 1-byte char
    printf ("Expected XLC, got %s\n", text);

    int x = 0;
    asm("AHI %0,%1\n":"+r"(x):"i"(0x1FFF));        //i for 2-byte int
    asm("AHI %0,%1\n":"+r"(x):"n"(0x1FFF));        //n for 2-byte int
    asm("AHI %0,%1\n":"+r"(x):"g"(0x1FFF));        //g for 2-byte int
    printf ("Expected 0x5FFD, got 0x%X\n", x);

    x = 0;
    asm("AFI %0,%1\n":"+r"(x):"i"(0x1FFFFFF));        //i for 4-byte int
    asm("AFI %0,%1\n":"+r"(x):"n"(0x1FFFFFF));        //n for 4-byte int
    asm("AFI %0,%1\n":"+r"(x):"g"(0x1FFFFFF));        //g for 4-byte int
    printf ("Expected 0x5FFFFFD, got 0x%X\n", x);
    return 0;
}

When run, program example05.c prints out the expected values.

Q, g, m, and o for memory constraints

Q, g, m, and o constraints are used for memory operands of assembler instructions that expect operands in the form D(X, B), where D is the displacement, X the index register, and B the base register. In a C/C++ program, Q, g, m, and o constraints can be used to represent integer symbols. For example, in Listing 8, the inline assembly statements of program example06.c use each of the constraints Q, g, m, and o as a memory constraint when working with the ST (Store) instruction to update elements of an array.

Listing 8. example06.c to demonstrate usage of Q, g, m and o constraints
#include <stdio.h>
int main () {
   int a[] = { 1, 2, 3, 4 };
   int b[] = { 10, 20, 30, 40 };
   printf ( "a = [ %d, %d, %d, %d ]\n", a[0], a[1], a[2], a[3] );

   asm ("ST %1,%0\n":"=Q"(a[0]):"r"(b[0]));        //Q as memory constraint
   asm ("ST %1,%0\n":"=g"(a[1]):"r"(b[1]));        //g as memory constraint
   asm ("ST %1,%0\n":"=m"(a[2]):"r"(b[2]));        //m as memory constraint
   asm ("ST %1,%0\n":"=o"(a[3]):"r"(b[3]));        //o as memory constraint

   printf ( "a = [ %d, %d, %d, %d ]\n", a[0], a[1], a[2], a[3] );
   return 0;
}

0, 1, 2, … , 9 as matching constraints

0, 1, …, 9 are matching constraints used to advise the compiler to allocate the same register for both the input operand and the numbered output operand. As such, the matching constraints can only be used with the input operands. This is essential when one of the operations uses the result of a previous one as its input. Without a matching constraint, the compiler does not know that the same register must be used for both the output and input operands. An example usage of matching constraints is explained in detail in the article, Advanced features of inline assembly for Linux on z Systems.
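
As a brief sketch ahead of that detailed discussion, the following function ties input a to output operand 0 with the matching constraint "0", so the register that receives sum already holds a when AR executes. The function name add_matching and the guard macro __s390x__ are assumptions for illustration.

```c
/* Hypothetical helper: sum = a + b using a matching constraint. */
int add_matching(int a, int b) {
    int sum;
#if defined(__s390x__)      /* assumed target macro */
    asm ("AR %0, %2\n"      /* %0 starts out holding a; add b (%2) */
         : "=r"(sum)        /* %0: write-only output */
         : "0"(a),          /* %1: placed in the same register as %0 */
           "r"(b)           /* %2 */
         : "cc");           /* AR sets the condition code */
#else
    sum = a + b;            /* portable fallback */
#endif
    return sum;
}
```

Note that the matching operand still occupies position 1 in the combined operand list, which is why b is referenced as %2.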

The modifiers

Modifiers give the compiler additional information about the corresponding operand. The list of supported modifiers might grow in future releases of the compiler. Refer to the compiler manual for an exhaustive list of modifiers for each product. For the current compiler for Linux on z Systems, the following modifiers are supported.

  • Modifier "=" indicates that the operand is write-only for this instruction. The previous value is discarded and replaced by output data.
  • Modifier "+" indicates that the operand is both read and written by the instruction.
  • Modifier "&" indicates that the operand might be modified before the instructions have finished using the input operands. The compiler therefore must not place it in a register that also holds an input operand.
  • Modifier "%" declares the instruction to be commutative for this operand and the following one. This means that the order of the operand having the "%" modifier and the next can be swapped safely when generating the instruction. Accordingly, the "%" modifier cannot be specified on the last operand. The purpose of the "%" modifier is to give the compiler an opportunity to optimize the code. If the compiler determines that swapping the two operands yields a performance gain, it is free to swap them.
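
The "&" modifier is easiest to see with a statement that contains two instructions. In the following sketch, the output is written by the first instruction while input b is still needed by the second one, so the output is marked "=&r". The function name add_earlyclobber and the guard macro __s390x__ are assumptions for illustration.

```c
/* Hypothetical helper: sum = a + b computed in two steps. */
int add_earlyclobber(int a, int b) {
    int sum;
#if defined(__s390x__)      /* assumed target macro */
    asm ("LR %0, %1\n"      /* sum = a  (output written here ...)         */
         "AR %0, %2\n"      /* sum += b (... while b is still to be read) */
         : "=&r"(sum)       /* "&": never share a register with an input  */
         : "r"(a), "r"(b)
         : "cc");           /* AR sets the condition code */
#else
    sum = a + b;            /* portable fallback */
#endif
    return sum;
}
```

Without "&", the compiler would be allowed to assign sum and b to the same register, in which case LR would overwrite b before AR reads it.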

The "=" (write-only) and "+" (read-and-write) modifiers are important. Incorrect specification of these two modifiers might lead to unexpected results. To demonstrate the impact of those modifiers, the C program example08a.c in Listing 9 is written with the "=" modifier where "+" should be used.

Listing 9. example08a.c with incorrect output modifier
#include <stdio.h>
int main () {
   int a = 10, b = 200;
   printf ("INITIAL: a = %d, b = %d\n", a, b );        // Modifier "=" is used for a
   asm ("AR %0, %1\n" 
            :"=r"(a) 
            :"r"(b));
   printf ("RESULT : a = %d, b = %d\n", a, b );
   return 0;
}

The intent of the program is to assign the sum of variables a and b to a. It uses the AR instruction for that purpose. The AR instruction adds the values of its two operands and then stores the sum back to the first operand. As such, it both reads and writes the first operand. By specifying the "=" modifier, the user tells the compiler that the first operand (variable a) is only written. This misleads the compiler, and the example08a.c program produces an incorrect result.

Listing 10. Result of using incorrect output modifier
xlc -o example08a ./example08a.c  
./example08a
INITIAL: a = 10, b = 200
RESULT : a = 200, b = 200        <- a is supposed to be 210

Using the proper modifier "+" will produce the correct result. Figure 4 exhibits the difference in the codes generated by the compiler for both cases with the incorrect modifier usage on the left side and the proper modifier usage on the right side.

Figure 4. Difference in the generated codes when using different modifiers

The difference in the assembly codes generated reveals that using write-only modifier "=" indeed misleads the compiler. As seen on the left panel, the compiler omits a load to register r0 [ L %r0,172(,%r15) ] before calling the AR instruction [ AR %r0, %r1 ] on it. As a result, r0 will participate in the AR instruction not with the value of variable a, but with whatever happens to be in it. This explains why the "=" modifier produces incorrect results. On the other hand, using the correct modifier "+", as shown in the right panel, will avoid this problem. The compiler loads the proper value to register r0 [ L %r0,172(,%r15) ] before using it.
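
For reference, the corrected statement can be sketched as a small function that uses the "+" modifier. The function name add_read_write and the guard macro __s390x__ are assumptions for illustration; the "cc" clobber is added here because AR sets the condition code.

```c
/* Hypothetical helper: the statement from example08a.c with "+" instead of "=". */
int add_read_write(int a, int b) {
#if defined(__s390x__)      /* assumed target macro */
    asm ("AR %0, %1\n"
         : "+r"(a)          /* a is read and then written by AR */
         : "r"(b)
         : "cc");
#else
    a += b;                 /* portable fallback */
#endif
    return a;
}
```

With the values from Listing 9, add_read_write(10, 200) is expected to return 210.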

The only way to avoid using improper modifiers is to refer to the definitions of the corresponding assembler instructions before using them. For more details, you can refer to z/Architecture Principles of Operation (IBM Publication No. SA22-7832-10). You can also find a link to the official version in the References section.

The output operand list

The output operand list consists of zero or more output operands. When there is no output operand, the list is empty. In this case, the output operand list is reduced to only the corresponding ":" in the inline assembly statement. Output operands on the list are separated by commas. Each operand consists of a mandatory modifier, which is either "+" or "=", a constraint, and a C/C++ expression enclosed in parentheses. The C/C++ expression is used as the output operand for the assembler instructions in the inline assembly statement. Output operands must be modifiable l-values.

The input operand list

The input operand list consists of zero or more input operands. When there is no input operand, the list is empty. In this case, the input operand list is reduced to only the corresponding ":" in the inline assembly statement. Input operands on the list are separated by commas. Each operand consists of a constraint and a C/C++ expression enclosed in parentheses. The value of the C/C++ expression is used as the input operand for the assembler instructions in the inline assembly statement.

The clobber list

The clobber list is a comma-separated list of (a) memory, (b) the condition code, and (c) register names. Each value on the clobber list must be enclosed in double quotation marks. The purpose of the clobber list is to inform the compiler about the effects that the assembler instructions might have on entities not listed in the output operand list or the input operand list.

(a) Specifying memory on the clobber list

If the assembler instructions belonging to an inline assembly statement can read from or write to entities other than those listed in the input and output operand lists, memory must be added to the clobber list of the statement. An example is an assembler instruction that accesses the memory pointed to by an input operand. The memory clobber ensures that the compiler does not move the assembler instructions across other memory references and that the data used after the assembly statement completes is valid. Adding memory to the clobber list, however, can cause many unnecessary reloads and reduce the benefits of hardware prefetching. For that reason, memory should be added to the clobber list only when it is needed, to prevent an avoidable performance penalty.
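
A minimal sketch of this situation follows. The ST instruction writes through a pointer that is passed as an input operand, so the modified variable itself never appears in the operand lists. The function name store_and_read and the guard macro __s390x__ are assumptions for illustration.

```c
/* Hypothetical helper: stores v into slot through a pointer, then reads slot. */
int store_and_read(int v) {
    int slot = 0;
    int *p = &slot;
#if defined(__s390x__)      /* assumed target macro */
    /* slot is not an operand; "memory" tells the compiler that the
       statement may have changed it, so slot is reloaded afterward. */
    asm volatile ("ST %1, 0(%0)\n" : : "a"(p), "r"(v) : "memory");
#else
    *p = v;                 /* portable fallback */
#endif
    return slot;
}
```

Without the memory clobber, the compiler would be entitled to assume that slot still holds 0 after the statement.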

(b) Specifying condition code on the clobber list

Many assembler instructions such as Compare, Add, and Subtract update the condition code. The user should notify the compiler about this fact by adding "cc" to the clobber list. For the complete list of assembler instructions altering the condition code, refer to z/Architecture Principles of Operation.
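
As an illustration, the following sketch compares two integers with CR and then copies the resulting condition code into a variable with IPM (Insert Program Mask) followed by a 28-bit logical right shift. The function name compare_cc and the guard macro __s390x__ are assumptions; the fallback branch mirrors the condition code values that CR sets (0 for equal, 1 for first operand low, 2 for first operand high).

```c
/* Hypothetical helper: returns the condition code of comparing a with b. */
int compare_cc(int a, int b) {
    int cc;
#if defined(__s390x__)      /* assumed target macro */
    asm ("CR %1, %2\n"      /* compare a with b; sets the condition code     */
         "IPM %0\n"         /* insert condition code and program mask in %0  */
         "SRL %0, 28\n"     /* shift right so only the condition code is left */
         : "=r"(cc)
         : "r"(a), "r"(b)
         : "cc");           /* the statement alters the condition code */
#else
    cc = (a == b) ? 0 : (a < b) ? 1 : 2;   /* portable fallback */
#endif
    return cc;
}
```

Because the compiler itself relies on the condition code for branches, omitting "cc" here could let it reuse a condition code value that the statement has already overwritten.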

(c) Register names on the clobber list

If the assembler instruction uses or updates registers that are not listed in the output and input operand lists, the user must list all impacted registers in the clobber list. Based on the information from the clobber list, the compiler will facilitate the operations of the inline assembly statement. If a register is used by the assembly statement without being listed on the clobber list, its usage will not be known by the compiler. The compiler might use the register for other purposes. As a result, the computed value might be incorrect.
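
The following sketch uses r1 as a scratch register inside the assembler instructions and therefore lists it on the clobber list. The function name triple and the guard macro __s390x__ are assumptions, and the clobber spelling "r1" and the "%%r1" escape (which emits a literal %r1) follow GCC conventions; the exact register spelling accepted by the XL compiler should be checked in the compiler manual.

```c
/* Hypothetical helper: v = 3 * v, using r1 as a scratch register. */
int triple(int v) {
#if defined(__s390x__)      /* assumed target macro */
    asm ("LR %%r1, %0\n"    /* scratch = v */
         "AR %0, %%r1\n"    /* v = 2 * v   */
         "AR %0, %%r1\n"    /* v = 3 * v   */
         : "+r"(v)
         :
         : "r1", "cc");     /* r1 is overwritten; AR sets the condition code */
#else
    v *= 3;                 /* portable fallback */
#endif
    return v;
}
```

Listing r1 as clobbered prevents the compiler from assigning v, or any live value, to r1 across the statement.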

Note that the compiler reserves some registers for its own operations. Clobbering the reserved registers is not allowed. Table 2 lists the designated usage of some registers when using IBM XL compilers.

Table 2. Designated usage of registers by IBM XL compilers on z Systems
Register name | Special usage | Can be used by user?
r2, r3 | Parameters, return values | Yes, with care
r4, r5, r6 | Parameters | Yes, with care
r13 | Base register for literal pool | No
r14 | Return address | Yes, with care
r15 | Stack pointer | No

Example usage of the clobber list is discussed in detail in the article, Advanced features of inline assembly for Linux on z Systems.

Conclusion

Inline assembly provides an avenue for users to incorporate assembler instructions directly into C/C++ programs. This feature allows advanced users to further improve the performance of applications by handcrafting the assembler instructions for particular sections of the code. IBM XL compilers perform highly sophisticated tasks to optimize the generated code at each level of optimization. For that reason, accelerating performance with inline assembly requires that the user know in depth how the target code executes. Careful analysis of the performance effects of the embedded assembler instructions, together with thorough planning and testing, is a prerequisite for achieving a performance gain.

This article only discusses the basics of inline assembly for Linux on z Systems. Advanced features are explained in the Advanced features of inline assembly for Linux on z Systems article.

Acknowledgements

I would like to thank Visda Vokhshoori and Nha-Vy Tran for their advice during the composition of this article.

Resources

References


ArticleTitle=Using inline assembly with IBM XL C/C++ compiler for Linux on z Systems, Part 3: Basic features
publish-date=08122015