8 replies Latest Post - ‏2012-09-20T14:29:53Z by SystemAdmin
lswierczewski
IBM Cell + OpenCL kernel - problem

2012-09-19T16:26:43Z
I have an IBM Blade QS22.

The OpenCL environment detects the Cell device:


DEVICE_NAME = CPU Cell Broadband Engine, altivec supported 32bit
DEVICE_VENDOR = IBM
DEVICE_VERSION = OpenCL 1.1 BUILD 201103161830
DRIVER_VERSION = 0.3
CL_DEVICE_EXTENSIONS = cl_khr_byte_addressable_store cl_ext_device_fission cl_ext_migrate_memobject cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
DEVICE_MAX_COMPUTE_UNITS = 4
DEVICE_MAX_CLOCK_FREQUENCY = 3200
DEVICE_GLOBAL_MEM_SIZE = 2593128448
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 32768
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 128
CL_DEVICE_LOCAL_MEM_SIZE = 524288
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 1296039936
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 1296039936


When I try to execute the program I get: CL_INVALID_KERNEL_ARGS.
Example kernel:


__kernel void benchmark_mips_units(__global float *q_register, __global unsigned long *q)
{
    unsigned long int idx = 512*get_global_id(0) + get_global_id(1);
    unsigned long int i;
    unsigned long int count;
    unsigned long int j;

    for(j=0; j < 10; j++)
    {
        i = 13371194527;

        while(i != 1)
        {
            if(i%2 == 0)
            {
                i /= 2; count++;
            }
            else
            {
                i = 3*i + 1; count++;
            }
        }
    }

    q_register[idx] = count;
}
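
To cross-check the value the kernel should write, the inner loop can be reproduced on the host in plain C. This is only a minimal sketch, not code from the olib repository: the names collatz_steps and kernel_expected are made up here, and it assumes count is meant to start at zero (the kernel as posted never initializes it). uint64_t is used because 13371194527 does not fit in 32 bits, while OpenCL C's unsigned long is always 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Count the steps the kernel's inner while loop takes to bring `start`
 * down to 1. uint64_t matches OpenCL C's 64-bit `unsigned long` even on
 * a 32-bit host, where C's own `unsigned long` may be only 32 bits. */
uint64_t collatz_steps(uint64_t start)
{
    uint64_t i = start;
    uint64_t count = 0;          /* assumption: counting starts at zero */

    assert(start >= 1);          /* 0 would never reach 1 */
    while (i != 1) {
        if (i % 2 == 0)
            i /= 2;
        else
            i = 3 * i + 1;       /* wrap-around for huge i is not checked */
        count++;
    }
    return count;
}

/* The kernel repeats the loop `repeats` times and never resets count,
 * so under the zero-start assumption it stores repeats * steps. */
uint64_t kernel_expected(uint64_t start, unsigned repeats)
{
    return (uint64_t)repeats * collatz_steps(start);
}
```

Under that assumption, kernel_expected(13371194527ULL, 10) is the number every q_register[idx] should hold, which gives a simple way to compare devices.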


The code works on NVIDIA and AMD graphics cards. It fails only on the IBM Cell.

The code is in a Google Code repository: http://code.google.com/p/olib/.

You can download the repository, go to the samples directory and execute:


make benchmark_ocl


The compiled program works on everything except the IBM Cell.

Where is the problem?
  • lswierczewski
    ‏2012-09-19T16:33:58Z  in response to lswierczewski
    Sorry. To build for the Cell you need to change the device type in the code to CL_DEVICE_TYPE_CPU (the default is GPU).

    I forgot about that.

    I do not know how to edit posts (I do not see the option).
  • SystemAdmin
    ‏2012-09-19T16:46:09Z  in response to lswierczewski
    not sure which kernel, but if it's one that's set like this:

    clSetKernelArg(OpenCLVectorAdd, 0, sizeof(cl_mem), (void*)&GPUVector1);
    clSetKernelArg(OpenCLVectorAdd, 1, sizeof(unsigned long long int), &GPU_q);

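    the sizeof() for the 2nd argument needs to be sizeof(cl_mem): it's the size of the variable (the cl_mem object), NOT the size of what it points to. cl_mem is a pointer-sized handle, so passing sizeof(unsigned long long int) probably works on 64-bit, but will fail on 32-bit.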
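
The size mismatch can be illustrated without any OpenCL installation. The sketch below is hypothetical: mem_handle is a stand-in for cl_mem (which the OpenCL headers declare as a pointer to an opaque struct), and buffer_arg_size_ok and mistaken_arg_size are names invented for this example, not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cl_mem: a pointer to an opaque struct. */
typedef struct mem_object_stub *mem_handle;

/* Returns 1 if arg_size is what a runtime would expect for a buffer
 * argument (the size of the handle itself), 0 otherwise. */
int buffer_arg_size_ok(size_t arg_size)
{
    /* Sanity check: the handle is an ordinary pointer. */
    assert(sizeof(mem_handle) == sizeof(void *));
    return arg_size == sizeof(mem_handle);
}

/* The size the original call passed for argument 1. */
size_t mistaken_arg_size(void)
{
    return sizeof(unsigned long long int);
}
```

On a 32-bit build the handle is 4 bytes while unsigned long long int is at least 8, so buffer_arg_size_ok(mistaken_arg_size()) is 0; on a typical 64-bit build both are 8 bytes and the mistake goes unnoticed. Assuming GPU_q is a cl_mem, the corrected call would be clSetKernelArg(OpenCLVectorAdd, 1, sizeof(cl_mem), (void*)&GPU_q).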
    also, with the IBM implementation, you can use the 'debug' opencl libraries and on some errors you'll see more debug information:

    export LD_LIBRARY_PATH=/usr/lib/CL/debug:/usr/lib64/CL/debug:$LD_LIBRARY_PATH

    .bri.
  • lswierczewski
    ‏2012-09-19T18:06:04Z  in response to lswierczewski
    It works!

    I have a question: does OpenCL use the SPEs in the Cell?

    For the Blade QS22 I get only:

    
    DEVICE_MAX_COMPUTE_UNITS = 4
    


    That corresponds to the two PPE cores (+ HT).
    • SystemAdmin
      ‏2012-09-19T18:12:03Z  in response to lswierczewski
      glad that works.

      the device you specified was the CPU device. if you want the SPEs, you need to use CL_DEVICE_TYPE_ACCELERATOR.

      .bri.
      • lswierczewski
        ‏2012-09-19T20:05:07Z  in response to SystemAdmin
        Can OpenCL not use the SPEs and the PPE at the same time?
        • SystemAdmin
          ‏2012-09-19T20:30:57Z  in response to lswierczewski
          They are 2 different device_types since they have vastly different capabilities. So no, you can't execute 1 NDRange or task that will run across both SPEs and PPEs at the same time. But you can create 2 devices, and run different tasks on each.

          .bri.
          • lswierczewski
            ‏2012-09-20T12:40:54Z  in response to SystemAdmin
            I have a small problem.

            The program returns:

            
            Segmentation fault
            


            The problem is in this line (e.g. in file opencl/olib_quantum_ocl.cpp, in function dqft_ocl):

            
            clEnqueueReadBuffer(GPUCommandQueue, GPUVector2, CL_TRUE, 0, q * sizeof(complex_float), q_register, 0, NULL, NULL);
            


            q_register is an array of size q * sizeof(complex_float) bytes, the same size as GPUVector2.
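
Since a blocking read writes q * sizeof(complex_float) bytes into q_register, one thing worth ruling out is a host array that is smaller than that. The sketch below shows one way to compute and allocate the size safely. It is an assumption-laden illustration: complex_float_sketch is a guessed two-float layout (the real complex_float lives in the olib sources), and register_bytes and alloc_register are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout only; the real complex_float is defined in olib. */
typedef struct { float re; float im; } complex_float_sketch;

/* Bytes needed for q elements, or 0 if q * sizeof(...) would overflow. */
size_t register_bytes(size_t q)
{
    if (q > SIZE_MAX / sizeof(complex_float_sketch))
        return 0;
    return q * sizeof(complex_float_sketch);
}

/* Allocate a zeroed host array of q elements; NULL on failure or q == 0. */
complex_float_sketch *alloc_register(size_t q)
{
    /* The byte count above relies on the struct having no padding. */
    assert(sizeof(complex_float_sketch) == 2 * sizeof(float));
    if (q == 0 || register_bytes(q) == 0)
        return NULL;
    return calloc(q, sizeof(complex_float_sketch));
}
```

With helpers like these, the same register_bytes(q) value would be passed both to the buffer creation and to the read call, so the host array and the device buffer cannot drift apart in size.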

            The function call is probably correct (?).

            You can download the new version of the code from the repository.

            In samples/shor_algorithm.cpp you must change line 145 from:

            
            # define USE_DEVICE OLIB_CL_DEVICE_GPU
            


            to:

            
            # define USE_DEVICE OLIB_CL_DEVICE_CPU
            


            and

            
            make shor_algorithm_ocl_float
            


            Run:

            
            ./shor_algorithm_ocl_float 45
            


            The program sometimes fails with a segmentation fault and sometimes returns a correct result.
            • SystemAdmin
              ‏2012-09-20T14:29:53Z  in response to lswierczewski
              can you run with the 'debug' opencl libraries to see if you see more debug information:

              export LD_LIBRARY_PATH=/usr/lib/CL/debug:/usr/lib64/CL/debug:$LD_LIBRARY_PATH
              i don't see anything offhand that looks like it would cause a core-dump. tho, it might not be with the ReadBuffer, it might be with the kernel. When you do the EnqueueNDRange, we just put it on the queue, and it might not run right away, depending on what else is going on. when you call the EnqueueReadBuffer with the block=TRUE, then you wait there and we run the kernel and the readbuffer. if the kernel has issues and crashes, then that's probably where you'll see it.

              you can gdb (again, use the debug library per above) and see if you can tell where the seg fault is happening.
              .bri.