Debugging optimized code

Using architecture sections and run-time architecture checks

Creating an application that is backwards compatible with earlier hardware models and yet as efficient as possible on the latest hardware models can be a difficult problem to solve. In this article, we will explore the various solutions to this problem, including the use of Architecture Sections and Run-Time Architecture Checks introduced in XL C/C++ V2R1M1.

Using the z/OS XL C/C++ compiler, applications can support earlier System z architectures by compiling source files with the -qARCH option, such as -qARCH=5. However, compiling with the -qARCH option instructs the compiler to generate code for the specified architecture level only, so it cannot exploit instructions introduced in newer architectures.

How does one provide a backwards compatible binary and exploit higher level architecture functionality?

Prior to the introduction of Architecture Sections, the solution was to create an architecture-specific source file as follows:

  1. Compile main application code with -qARCH(N)
  2. Compile architecture-tuned source files with -qARCH(X), where X > N.
  3. Link and Execute

Consider the following example:

main.c

#define _POSIX_SOURCE
#include <sys/utsname.h>

#include <stdlib.h>
#include <stdio.h>

// Machine type that uname reports for the IBM z196, the first ARCH(9) system
#define ARCH9 2817

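// Defined in popcnt.c, which is compiled with -qARCH=9
unsigned long callArch9Builtin(unsigned long op);

// Counts bits that have the value of 1 in a byte
unsigned long myBytePopcount(unsigned char op)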
{
    unsigned long count = 0;
    for (int i = 0; i < 8; i++) {

        if ((op & 1) == 1) {
            count++;
        }
        op >>= 1;
    }
    return count;
}

int main()
{
    struct utsname runOn;
    uname(&runOn);
    int archLevel = atoi(runOn.machine);

    unsigned char input = 55;
    unsigned long output;

    // Check architecture level
    if (archLevel >= ARCH9) {
        // External call to ARCH(9) compiled function
        output = callArch9Builtin(input)&0xFF;
    } else {
        output = myBytePopcount(input);
    }
    printf("%lu\n", output);

    return 0;
}

popcnt.c

#include <builtins.h>

// Compiled with -qARCH=9; uses a z196 hardware built-in function
unsigned long callArch9Builtin(unsigned long op) {
    return __popcnt(op);
}

Compile and Execution:

xlc -qARCH=9 -c -o popcnt.o popcnt.c
xlc -qARCH=5 -c -o main.o main.c
xlc -o a.out popcnt.o main.o
./a.out

The above example counts the number of one bits (the population count) in a byte. At run time, main.c checks the architecture level using the uname function. If the system is detected to be an ARCH(9) (IBM z196) or higher system, the external function callArch9Builtin is called, which uses the __popcnt built-in function (BIF) to calculate the population count. If an architecture level lower than ARCH(9) is detected, the user-defined function myBytePopcount is called instead.
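A note on the & 0xFF mask applied to the result of __popcnt: the mask suggests, and the sketch below assumes, that the built-in returns one count per byte of its operand, with each count stored in the corresponding byte of the result. The following portable C code emulates that behavior so it can be tried on any platform. The names emulated_popcnt and byte_popcount are hypothetical stand-ins for illustration, not the real built-in, and a 64-bit operand is assumed.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for the assumed per-byte behavior of
   __popcnt: each byte of the result holds the number of one bits found
   in the matching byte of the operand. */
uint64_t emulated_popcnt(uint64_t op)
{
    uint64_t result = 0;

    for (int byte = 0; byte < 8; byte++) {
        uint64_t value = (op >> (8 * byte)) & 0xFFu;
        uint64_t count = 0;

        while (value != 0) {
            count += value & 1u;
            value >>= 1;
        }
        result |= count << (8 * byte);
    }
    return result;
}

/* Population count of a single byte: keep only the lowest count byte,
   which is what the & 0xFF mask in main.c does. */
unsigned long byte_popcount(unsigned char op)
{
    return (unsigned long)(emulated_popcnt(op) & 0xFFu);
}
```

With the article's input of 55 (binary 00110111), byte_popcount returns 5. Under this assumption, the mask would matter whenever the operand is wider than one byte, because the higher bytes of the result carry their own counts.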

While this solution solves the problem of creating an application that is backwards compatible from ARCH(5) and up and makes use of a hardware built-in function to improve efficiency, it does have a number of drawbacks.

Drawbacks of this approach:

  • In order to support multiple architectures in one binary, architecture-specific code must be placed in separate source files. In the above example, the ARCH(9) code is in its own source file, popcnt.c, while main.c is compiled with ARCH(5). While it may not be obvious in a small example, this approach reduces code readability, increases the number of source files to maintain, can increase code duplication, and adds the overhead of an external function call.
  • Because it relies on uname, this code works only in a POSIX-enabled environment on z/OS.
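A further weakness of the uname-based check is that main.c ignores the return value of uname and passes the machine string straight to atoi, which cannot report a parse failure. The sketch below shows one more defensive way to read the machine type. The helper names parse_machine_type and current_machine_type are hypothetical, and the code assumes the machine field holds a plain decimal number such as "2817"; anything else is treated as unknown so that the caller can take the fallback path.

```c
#ifndef _POSIX_SOURCE
#define _POSIX_SOURCE   /* exposes uname on z/OS, as in main.c */
#endif
#include <sys/utsname.h>
#include <errno.h>
#include <stdlib.h>

/* Parses a machine string such as "2817". Returns -1 if the string is
   not a plain positive decimal number. */
long parse_machine_type(const char *machine)
{
    char *end = NULL;
    long value;

    if (machine == NULL || *machine == '\0') {
        return -1;
    }
    errno = 0;
    value = strtol(machine, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0) {
        return -1;
    }
    return value;
}

/* Returns the machine type reported by uname, or -1 if it is unavailable
   or not numeric. */
long current_machine_type(void)
{
    struct utsname info;

    if (uname(&info) < 0) {
        return -1;
    }
    return parse_machine_type(info.machine);
}
```

A caller would compare the result against its threshold and use the portable code path whenever -1 is returned.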

Architecture sections

The first attempt made use of two source files, resulting in two compilation units, one compiled with ARCH(9) and the other compiled with ARCH(5).

The XL C/C++ V2R1M1 and z/OS V2R2 XL C/C++ compilers introduce a new feature, Architecture Sections, which enables the user to generate code for multiple architectures in one compilation unit.

Architecture sections are blocks of code identified by:

#pragma arch_section(architecture_level) …statement…

The architecture_level specified in an arch_section must be greater than the architecture option provided.

The following example modifies the main function of the previous example to add an architecture section:

main.c

#define _POSIX_SOURCE
#include <sys/utsname.h>

#include <builtins.h>

#include <stdlib.h>
#include <stdio.h>

// Machine type that uname reports for the IBM z196, the first ARCH(9) system
#define ARCH9 2817

int main()
{
    unsigned char input = 55;
    unsigned long output = 0;

    struct utsname runOn;
    uname(&runOn);
    int archLevel = atoi(runOn.machine);

    if (archLevel >= ARCH9)
    #pragma arch_section(9)
    {
        output = __popcnt(input)&0xFF;
    } else {
        // myBytePopcount is unchanged from the previous listing
        output = myBytePopcount(input);
    }

    printf("%lu\n", output);
}

Compile and Execution:

xlc -qARCH=5 -c -o main.o main.c
xlc -o a.out main.o
./a.out

The block within the architecture section generates ARCH(9) instructions, but the code outside of the architecture section generates ARCH(5) instructions.

There are a few gotchas that you should be aware of when using the #pragma arch_section directive:

  • The #pragma arch_section directive can only be used at the statement level.
  • The #pragma arch_section directive cannot precede a function definition.
  • Architecture-specific types, such as vector data types (for example, vector unsigned int), can be declared outside of an architecture section. However, they cannot be referenced outside of an architecture section if the architecture compiler option or architecture section doesn't support them.
  • IBM extensions such as Decimal and Vector, when used within a #pragma arch_section block, still require their associated options to be specified at compiler invocation.

There remains one potential improvement and that relates to the dependency on the uname library.

Run-Time architecture check

The XL C/C++ V2R1M1 compiler adds support for a run-time architecture check library. The library provides a set of Built-In Functions (BIFs) that perform a fast and efficient safety check at run-time prior to entering an architecture section.

Using the library begins with a call to:

void __builtin_cpu_init ( void )

This function detects the hardware level and should be called once in your application.

Once the above function is called, two functions are available that can be used to determine the hardware that the application is running on.

  1. int __builtin_cpu_supports(const char* feature)

This built-in function returns a positive integer if the runtime CPU supports the specified feature, or returns 0 otherwise. If the __builtin_cpu_init function has not been called, this built-in function returns 0. The supported feature arguments are:

  • "longdisplacement"
  • "etf2"
  • "etf3"
  • "dfp"
  • "prefetch"
  • "storeclockfast"
  • "loadstoreoncond"
  • "popcount"
  • "interlocked"
  • "tx"
  • "dfpzoned"
  • "vector128"
  • "5" through "11"

Alternatively, if a specific architecture is needed, the following function can be used:

  2. int __builtin_cpu_is(const char* cpumodel)

This built-in function returns a positive integer if the runtime CPU supports the specified CPU model (architecture level), or returns 0 otherwise. The argument must be one of "5", "6", "7", "8", "9", "10", or "11".
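Because __builtin_cpu_init should be called once per application, one option is to wrap the initialization and the feature query in a small helper that remembers its answer. The sketch below is a minimal illustration under stated assumptions: cpu_has_popcount is a hypothetical helper name, the __MVS__ guard is added here only so that the code also compiles with other compilers (where it reports no support), and the caching is not thread-safe.

```c
#ifdef __MVS__
#include <builtins.h>
#endif

/* Hypothetical helper: detects the hardware on the first call and caches
   the answer, so later calls cost only a comparison. */
int cpu_has_popcount(void)
{
    static int cached = -1;          /* -1 means "not checked yet" */

    if (cached < 0) {
#ifdef __MVS__
        __builtin_cpu_init();        /* detect the hardware level once */
        cached = __builtin_cpu_supports("popcount") ? 1 : 0;
#else
        cached = 0;                  /* no check available: use the fallback */
#endif
    }
    return cached;
}
```

The helper returns 1 when the feature is reported and 0 otherwise, so it can be used directly as the condition that guards an architecture section.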

We can now apply these functions to our prior solution by replacing the uname library with the Run-Time Architecture checks as shown below:

main.c

#include <builtins.h>
#include <stdio.h>

// Counts bits that have the value of 1 in the low-order byte of op
unsigned long myPopCountOnByte(unsigned long op)
{
    unsigned long count = 0;
    for (int i = 0; i < 8; i++) {
    
        if ((op & 1) == 1) {
            count++;
        }
        op >>= 1;
    }
    return count;
}

int main()
{
    unsigned char input = 55;
    unsigned long output;

    __builtin_cpu_init(); 

    if (__builtin_cpu_supports("popcount"))
    #pragma arch_section(9)
    {
        output = __popcnt(input)&0xFF;
    }
    else
    {
        output = myPopCountOnByte(input);
    }
    printf("%lu\n", output);

    return 0;
}

Compile and Execution:

xlc -c -qARCH=5 -o main.o main.c
xlc -o a.out main.o
./a.out
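When many values are processed, the run-time check can be made once, outside the loop, so that the whole loop sits inside the architecture section. The following sketch applies the same pattern to a buffer of bytes. The function names are hypothetical, and the __MVS__ guard is an assumption added so the source also builds with other compilers, where only the portable loop is compiled. The sketch further assumes that the caller has already called __builtin_cpu_init once at startup; per the description above, __builtin_cpu_supports returns 0 otherwise and the portable loop is used.

```c
#include <stddef.h>
#ifdef __MVS__
#include <builtins.h>
#endif

/* Portable fallback: counts the one bits in a single byte. */
static unsigned long popcount_byte_portable(unsigned char op)
{
    unsigned long count = 0;

    while (op != 0) {
        count += op & 1u;
        op >>= 1;
    }
    return count;
}

/* Hypothetical example: total number of one bits in a buffer. */
unsigned long popcount_buffer(const unsigned char *buf, size_t len)
{
    unsigned long total = 0;
    size_t i;

#ifdef __MVS__
    /* One check for the whole buffer instead of one check per byte. */
    if (__builtin_cpu_supports("popcount"))
    #pragma arch_section(9)
    {
        for (i = 0; i < len; i++) {
            total += __popcnt(buf[i]) & 0xFF;
        }
        return total;
    }
#endif

    for (i = 0; i < len; i++) {
        total += popcount_byte_portable(buf[i]);
    }
    return total;
}
```

For the three bytes 55, 255, and 0, popcount_buffer returns 13 on either path.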

Conclusion

I hope that the above examples demonstrate the potential use cases of Architecture Sections and Run-Time Architecture checks.

With architecture sections, you no longer need to maintain code customized for a specific hardware model in separate source files. Additionally, with run-time architecture checks, you have a compiler built-in that is fast and efficient in determining the hardware level.

For more information about the architecture section and run-time architecture checks, please visit the resource section below.

Acknowledgements

The author thanks the following individuals who helped make this article possible: Zibi Sarbinowski, Visda Vokhshoori, and Kobi Vinayagamoorthy.


Downloadable resources


Related topics


ArticleTitle=Debugging optimized code
publish-date=09082015