PeterTh
4 Posts

Kernel execution fails on Power7, no errors reported

‏2011-07-21T15:51:48Z |
I have a rather straightforward kernel and associated host program (used to measure memory latency) which executes fine using the IBM platform on Cell PPEs and SPEs, and on AMD, nVidia and Intel platforms. On Power7 no errors are reported for any host function call, but the kernel does not appear to run (the result is 0 and the kernel execution takes 0 ns). I checked the generated binary (dumped from OpenCL) and it seems to be correct.
Other OpenCL programs work on our Power7 machine so it doesn't appear to be a problem with the setup.

Is there any known problem with the Power7 implementation that could cause this? If not, I'm happy to provide more details or the complete program.
  • SystemAdmin
    131 Posts

    Re: Kernel execution fails on Power7, no errors reported

    ‏2011-07-21T16:02:44Z  
    Try running with our 'debug' runtime library that might show if there are any errors that didn't get caught --

    export LD_LIBRARY_PATH=/usr/lib/CL/debug:/usr/lib64/CL/debug

    If that doesn't show anything, then your best bet is to post your code here and we'll take a look at it.

    .bri.
  • SystemAdmin
    131 Posts

    Re: Kernel execution fails on Power7, no errors reported

    ‏2011-07-21T16:05:32Z  
    PeterTh,

    Have you tried running your program with the debug library? Perhaps an error is being issued that is not being seen. If the debug library doesn't help, then the complete program will be helpful in debugging the problem.

    Dan B.
  • PeterTh
    4 Posts

    Re: Kernel execution fails on Power7, no errors reported

    ‏2011-07-25T09:25:07Z  
    I tried using the debug runtime, and the only difference is that it takes a bit longer to run. The code is part of a larger suite of microbenchmarks, which could be a bit hard to compile/run in its current state, so I isolated the parts needed and tar'd it here:
    http://www.dps.uibk.ac.at/~petert/web/ocl/mem_latency.tar

    It should compile with
    
    g++ main.cpp ../cllib/clcommon.cpp ../cllib/clinfo.cpp -I../cllib -DUNIX -lOpenCL -o mem_latency
    


    The code in the cllib part in particular is quite messy, but it's not really relevant. The functionality in the main.cpp is very straightforward, and as I said it works on every other platform.

    I'm looking forward to finding out what is going on here.
  • SystemAdmin
    131 Posts

    Re: Kernel execution fails on Power7, no errors reported

    ‏2011-07-25T11:36:35Z  
    cllib/clcommon.cpp is corrupt - looks like it's a tar file as well? Something's not right there.

    $ file cllib/clcommon.cpp
    cllib/clcommon.cpp: POSIX tar archive (GNU)
    $ tar -tvf cllib/clcommon.cpp
    -rwxr--r-- petert/dps 1535 2011-07-11 05:40 cllib/clcommon.h
    -rwxr--r-- petert/dps 16657 2011-07-21 04:45 cllib/clinfo.cpp
    -rwxr--r-- petert/dps 3430 2011-07-25 04:03 cllib/clinfo.h
    -rwxr--r-- petert/dps 3380 2011-07-21 03:50 mem_latency/constant_latency.cl
    -rwxr--r-- petert/dps 3243 2011-07-21 03:50 mem_latency/global_latency.cl
    -rwxr--r-- petert/dps 4604 2011-07-21 03:51 mem_latency/local_latency.cl
    -rwxr--r-- petert/dps 15647 2011-07-25 03:53 mem_latency/main.cpp
    -rwxr-xr-x petert/dps 55077 2011-07-25 04:10 mem_latency/mem_latency
    thx.
    .bri.
  • PeterTh
    4 Posts

    Re: Kernel execution fails on Power7, no errors reported

    ‏2011-07-25T12:05:32Z  
    Sorry about that, I have no idea how it happened.

    I regenerated the file at http://www.dps.uibk.ac.at/~petert/web/ocl/mem_latency.tar and it should work now. (I actually downloaded, extracted and compiled it to test)
  • SystemAdmin
    131 Posts

    Re: Kernel execution fails on Power7, no errors reported

    ‏2011-07-25T15:38:50Z  
    You have a mismatch in data sizes with iterations. You pass it into run_benchmark as an unsigned long:

    unsigned long size, _MEM_TYPE mt, _DATA_LAYOUT layout, unsigned long iterations, // benchmark settings

    but then point to it for the kernel as a uint:

    err |= clSetKernelArg(kernel, 2, sizeof(cl_uint), &iterations);

    In a 64-bit build on Power7 (big-endian), unsigned long is 8 bytes, so the kernel just sees the first 4 bytes at that address, which are the top half of the data: 0.

    s/long/int/ in run_benchmark and/or build as a 32bit program - either one makes it work.
    Also, you declare iterations as unsigned but then use -1 in several places, which could cause you problems.

    .bri.
  • PeterTh
    4 Posts

    Re: Kernel execution fails on Power7, no errors reported

    ‏2011-07-27T08:49:48Z  
    Well, that was stupid. I shouldn't directly use C types.

    Thanks a lot for your help.