___Sh4DoW___

Pinned topic Opencl segfault on Power7

‏2011-01-11T16:26:43Z |
Hi, we have a problem with Power7 and OpenCL 0.2.

We are using a Power7 PS701 with Red Hat 5.5. When we started Linux for the first time, the machine was in Power6 mode, and after installing OpenCL 0.2 we ran the IBM OpenCL tests without any problem.
Now we want to use the machine in Power7 mode, so we installed the latest toolchain (4.0) and compiled the latest kernel with it. Now, when we run the IBM OpenCL examples (for example: perlin_nois),
we get a segfault during execution of the program.

After setting: export LD_LIBRARY_PATH=/usr/lib/CL/debug
this is the output:

ppc$ make
gcc -Wall -I. -I../../clu -I../../common -g -O3 -m32 -c -o perlin.o ../src/perlin.c
gcc -Wall -I. -I../../clu -I../../common -g -O3 -m32 -c -o clock.o ../../common/clock.c
gcc -Wall -I. -I../../clu -I../../common -g -O3 -m32 -c -o perlin_host.o ../src/perlin_host.c
gcc -Wall -I. -I../../clu -I../../common -g -O3 -m32 -c -o clu.o ../../clu/clu.c
gcc -lOpenCL -m32 -lm -lstdc++ perlin.o clock.o perlin_host.o clu.o -o perlin

ppc$ ./perlin
04211:WARN Cannot parse cpumap. Assuming one node
04211:ERROR numa_node_to_cpus failed for node 0: 0(Success)
Compiling and Creating kernel...
Local Work Group Size = NULL
Global Work Size 256 x 1024
Compute Device Data
Segmentation fault

The same program works without problems if we go back to Power6 mode with the old kernel.
We also tried a 2.6.32 kernel built with toolchain 2.1 and got the same segfault.
(http://www.ibm.com/developerworks/wikis/display/LinuxP/RHEL+5.5+Checklist+for+POWER7#RHEL5.5ChecklistforPOWER7-Step4%3AConfirmyouarereallyinPOWER7mode)
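For anyone who wants to check the processor mode from a program rather than by eye, here is a minimal sketch. It assumes that the "cpu" field of /proc/cpuinfo reports the architected mode (for example a value beginning with POWER7 or POWER6); that field format is an assumption and is not taken from the checklist above. The function name parse_cpu_line is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Extract the value of a "cpu : <value>" line from /proc/cpuinfo-style text.
   Assumption: the architected mode appears in the "cpu" field.
   Returns 0 and fills out on success, -1 if the line is not a "cpu" field. */
int parse_cpu_line(const char *line, char *out, size_t outlen)
{
    const char *p;
    size_t n;

    if (line == NULL || out == NULL || outlen == 0)
        return -1;
    if (strncmp(line, "cpu", 3) != 0)
        return -1;

    /* Only whitespace may sit between the key and ':' (rejects "cpu MHz"). */
    p = line + 3;
    while (*p == ' ' || *p == '\t')
        ++p;
    if (*p != ':')
        return -1;
    ++p;
    while (*p == ' ' || *p == '\t')
        ++p;

    /* Copy the value without the trailing newline, truncating if needed. */
    n = strcspn(p, "\n");
    if (n >= outlen)
        n = outlen - 1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```

Feeding each line of /proc/cpuinfo to this function (for instance via fgets) and comparing the first result with strncmp against "POWER7" would give a quick yes/no answer, under the assumption stated above.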

Does anyone know anything more about this issue?
  • SystemAdmin

    Re: Opencl segfault on Power7

    ‏2011-01-11T19:11:33Z  
    The V0.2 OpenCL release does not support Power 7, but it will be supported on RHEL 6.0 in an upcoming release.

    Given the untested environment, there was a Linux kernel issue introduced in 2.6.32.16 that manifests itself in a cpumap error similar to what you are seeing. I believe upgrading your kernel to 2.6.32.19 or higher should address the issue.
  • ___Sh4DoW___

    Re: Opencl segfault on Power7

    ‏2011-01-11T23:29:46Z  
    But the documentation for the OpenCL Development Kit for Linux on Power 0.2 says:
    1.1 Tested Configurations
    The OpenCL Development Kit for Linux on Power has been tested on the IBM BladeCenter QS22 and JS22/JS23/JS43 running Red Hat® Enterprise Linux 5.4 and 5.5 as well as the IBM Power 755 server running Red Hat Enterprise Linux 5.5.

    So it seems that it should work.

    Where did you read that V0.2 does not support Power7?
  • SystemAdmin

    Re: Opencl segfault on Power7

    ‏2011-01-11T23:39:42Z  
    The Power7 device was a tested configuration on RHEL 5.5, but it is enabled as a VMX/Altivec device and doesn't leverage the VSX extension. As far as OpenCL is concerned, the Power7 looks like a Power6 processor. Without full Power7 support, the full performance of the processor will not be realized.

    We believe you are experiencing the NUMA problem gbello referred to, in which the cpumap is invalid, causing OpenCL to make poor decisions when binding kernels to compute units.
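To make "invalid cpumap" more concrete, here is a minimal sketch of how such a map can be interpreted. It assumes the sysfs cpumap is a comma-separated list of hexadecimal words (for example "00000000,0000ffff", an illustrative value, not one taken from this system). The function name count_cpus_in_cpumap is hypothetical, and this is not how the OpenCL runtime itself parses the file.

```c
#include <ctype.h>

/* Count the CPUs whose bits are set in a sysfs-style cpumap string.
   Assumption: the text consists of hexadecimal digits separated by commas,
   optionally ending in a newline.
   Returns -1 if any other character is found. */
int count_cpus_in_cpumap(const char *text)
{
    const char *p;
    int count = 0;

    for (p = text; *p != '\0'; ++p) {
        unsigned char c = (unsigned char)*p;
        int nibble;

        if (c == ',' || c == '\n')
            continue;                 /* word separator or trailing newline */
        if (!isxdigit(c))
            return -1;                /* not a valid cpumap */

        /* Convert one hex digit to its value, then count its set bits. */
        nibble = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        while (nibble != 0) {
            count += nibble & 1;
            nibble >>= 1;
        }
    }
    return count;
}
```

Running this on the contents of /sys/devices/system/node/node0/cpumap and comparing the result with the number of online CPUs is one rough way to spot a map that cannot be parsed or that looks empty.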
  • ___Sh4DoW___

    Re: Opencl segfault on Power7

    ‏2011-01-12T09:36:39Z  
    I don't understand whether you are telling me that the current version of IBM OpenCL 0.2 doesn't support the full capabilities of the Power7 (it sees the Power7 as a Power6), or that the problem is only with RHEL 5.5 and there will be no problem with the new version 6.
    Or do I have to wait for a new version of IBM OpenCL that will have full Power7 support?

    By the way, I have even tried the latest kernel, 2.6.37, and I get the same error.
  • SystemAdmin

    Re: Opencl segfault on Power7

    ‏2011-01-12T12:47:28Z  
    IBM OpenCL 0.2 and its kernel compiler are not optimized to take advantage of the Power7 features. These include the Vector Scalar eXtension (VSX) with its additional instructions (scalar and vector) and its enhanced register file. In addition, the kernel compiler does not do instruction scheduling for Power7. Optimized support for Power7 is planned for the next release of the IBM OpenCL SDK. That release is not planned to be tested on RHEL 5.5, because the first Red Hat release optimized and tested for Power7 is RHEL 6. See http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=/liaam/liaamdistros.htm

    Since you are running with a newer kernel, we'll have to look further into the problem to understand what the issue is. Any chance you can upgrade to RHEL 6?

    Dan B.
  • ___Sh4DoW___

    Re: Opencl segfault on Power7

    ‏2011-01-13T09:10:33Z  
    Thank you very much for all the useful information.
    Do you have any idea when the new version will be released?

    I will wait for the new release to test the real power of Power7 with OpenCL and RHEL 6.

    Ivan.
  • SystemAdmin

    Re: Opencl segfault on Power7

    ‏2011-01-13T16:13:15Z  
    Ivan,
    Our targeted release date for the next version is the end of the quarter.
  • SystemAdmin

    Re: Opencl segfault on Power7

    ‏2011-01-13T17:00:50Z  
    Interesting that you are seeing a cpumap issue when running a later kernel. The code below will indicate whether you are seeing the cpumap issue mentioned above. It may or may not identify other issues.

    The program checks that the affinity mask size returned by the kernel is at least as large as the system cpumap.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <stdio.h>
    #include <errno.h>
    #include <string.h>

    int main(void)
    {
        /* Ask the kernel directly how many bytes it uses for a CPU affinity mask. */
        cpu_set_t numaset;
        long len = syscall(__NR_sched_getaffinity, 0, sizeof(numaset), &numaset);
        printf("kernel sched_getaffinity syscall length returned is %ld (in bytes)\n", len);

        /* Estimate the mask size from the sysfs cpumap: each 32-bit word is
           printed as 8 hex digits plus one separator, so 9 characters = 4 bytes. */
        long bytes = -1;
        FILE *cpumap;
        cpumap = fopen("/sys/devices/system/node/node0/cpumap", "r");
        if (cpumap != NULL) {
            char buffer[1024];
            size_t size;
            size = fread(buffer, 1, sizeof(buffer), cpumap);
            bytes = (long)(size / 9) * 4;
            printf("size of file /sys/devices/system/node/node0/cpumap entries is %ld (in bytes)\n", bytes);
            fclose(cpumap);
        } else {
            printf("file /sys/devices/system/node/node0/cpumap failed to open with errno %d (%s)\n", errno, strerror(errno));
        }

        /* The kernel mask must be able to hold every CPU listed in the cpumap. */
        if (bytes >= 0 && len >= bytes) {
            printf("PASS test\n");
        } else {
            printf("FAIL test\n");
        }

        return 0;
    }
  • ___Sh4DoW___

    Re: Opencl segfault on Power7

    ‏2011-01-14T14:38:58Z  
    I'm running kernel version 2.6.37 and the test passes:

    kernel sched_getaffinity syscall length returned is 16 (in bytes)
    size of file /sys/devices/system/node/node0/cpumap entries is 16 (in bytes)
    PASS test
  • SystemAdmin

    Re: Opencl segfault on Power7

    ‏2011-01-14T21:25:33Z  
    It sounds like you have encountered a different issue. Perhaps the kernel level you are trying is incompatible with supporting libraries such as libNUMA. What does "numactl --hardware" report?
  • ___Sh4DoW___

    Re: Opencl segfault on Power7

    ‏2011-01-16T17:21:20Z  
    available: 1 nodes (0)
    node 0 size: 31616 MB
    node 0 free: 28696 MB
    node distances:
    node 0
    0: 10

    Ivan
  • SystemAdmin

    Re: Opencl segfault on Power7

    ‏2011-01-27T15:07:49Z  
    Hi Ivan,

    It is difficult to say for sure why you are seeing the NUMA messages, as we don't have a configuration similar to yours to reproduce the issue. Perhaps these messages are not directly related to the segfault but are instead just a side effect of the NUMA calls on a non-NUMA system.

    Is perlin the only sample that is failing for you?

    Is it possible for you to capture the stack trace at the time of the segfault? This can be done by running perlin under GDB.

    Have you tried running on RHEL 6.0?
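If running under GDB is inconvenient, a rough alternative is to have the program print its own call stack when it receives SIGSEGV. The sketch below uses glibc's backtrace() and backtrace_symbols_fd(); it is a different technique from the GDB approach suggested above, and the function name install_segv_backtrace is hypothetical. Note that backtrace() is not formally async-signal-safe, so treat the output as a debugging aid only.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* SIGSEGV handler: write the current call stack to stderr, then exit. */
static void segv_handler(int sig)
{
    void *frames[64];
    int depth = backtrace(frames, 64);

    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    _exit(128 + sig);
}

/* Install the handler. Returns 0 on success, -1 on failure. */
int install_segv_backtrace(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_segv_backtrace() at the start of main() in the sample, and linking with -rdynamic so that function names appear, should show roughly where the crash occurs. Frames inside a library built without symbols will appear only as addresses.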