9 replies Latest Post - ‏2010-07-11T13:43:55Z by PedroGonnet
SystemAdmin

Strange behaviour (bug?) when using O3 spu-gcc

‏2006-06-30T14:50:54Z |
When using SIMD in a simple loop to sum an array of unsigned ints, I find that with spu-gcc at -O3 the code only works correctly if there is a printf of the sum variable inside the loop. With -O0, or with the XLC compiler, the problem does not occur.

The important part is pasted here:
"""
int const array_size = 8;
// 16-byte aligned so the buffer can be viewed as vectors
unsigned int *data_local = memalign(16, array_size * sizeof(unsigned int));
for (int i = 0; i < array_size; ++i) {
    data_local[i] = i;
}

vector unsigned int *data_vec = (vector unsigned int *)data_local;
vector unsigned int tmp;
unsigned long long sum = 0;
// Each iteration adds two vectors (8 uints) lane by lane
for (int i = 0; i < array_size / 4; i += 2) {
    tmp = spu_add (data_vec[i], data_vec[i + 1]);

    // Heisenbug: with spu-gcc, this printf is necessary
    // when optimisation is on.
    //printf ("sum=%lld\n", sum);
    sum += (unsigned long long)((unsigned int *)(&tmp))[0];
    sum += (unsigned long long)((unsigned int *)(&tmp))[1];
    sum += (unsigned long long)((unsigned int *)(&tmp))[2];
    sum += (unsigned long long)((unsigned int *)(&tmp))[3];
}

printf ("%lld Calculated sum %lld\n", speid, sum);
"""

When the printf is uncommented, the sum is correctly reported as 28. When it is disabled, the sum is reported as 25284896.
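For reference, the expected value can be checked with a plain scalar loop. This is only a minimal sketch, not part of the original test case; the helper name `scalar_sum` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n unsigned ints without any SIMD, giving a
 * reference value to compare the vectorised result against. */
unsigned long long scalar_sum(const unsigned int *data, size_t n)
{
    unsigned long long sum = 0;

    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

For the eight values 0..7 used in the test case this gives 0 + 1 + ... + 7 = 28, which matches the result reported with the printf in place.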

I have made the full source to this test case, including makefile and PPU code, available at http://icculus.org/~jcspray/sums_minimal.tar.gz

Hopefully someone can shed some light on this.
Updated on 2010-07-11T13:43:55Z at 2010-07-11T13:43:55Z by PedroGonnet
  • SystemAdmin

    Re: Strange behaviour (bug?) when using O3 spu-gcc

    ‏2006-06-30T15:51:12Z  in response to SystemAdmin
    Correction: the problem is also present with XLC, though only at -O4, not at -O3.

    It now seems more likely that the problem is in my code, but I don't know where.
  • SystemAdmin

    Re: Strange behaviour (bug?) when using O3 spu-gcc

    ‏2006-06-30T17:50:17Z  in response to SystemAdmin
    In this line:

    tmp = spu_add (data_vec, data_veci + 1);

    For the first parameter, do you mean "data_vec[i]" instead of what you wrote?
    • SystemAdmin

      Re: Strange behaviour (bug?) when using O3 spu-gcc

      ‏2006-06-30T17:56:50Z  in response to SystemAdmin
      Oops, it would seem that the forum is stripping the brackets from the text "data_vec[i]".
  • SystemAdmin

    Re: Strange behaviour (bug?) when using O3 spu-gcc

    ‏2006-07-03T07:54:05Z  in response to SystemAdmin
    I have the same problem with DMA transfers.

    My post is here:

    http://www-128.ibm.com/developerworks/forums/dw_thread.jsp?forum=739&thread=119731&cat=46

    I "solved" it with an 'fflush(stdout)' call rather than a printf.

    This may be a bug...

    /miKeL a.k.a.mc2
  • SystemAdmin

    Re: Strange behaviour (bug?) when using O3 spu-gcc

    ‏2006-07-03T10:27:57Z  in response to SystemAdmin
    I have tried making all the variables in the problematic function volatile. This changes the output value (from millions to 128), but it remains incorrect in the absence of a printf or fflush.

    Note that there is no DMA whatsoever going on in this test case.
    • SystemAdmin

      Re: Strange behaviour (bug?) when using O3 spu-gcc

      ‏2006-07-04T05:57:46Z  in response to SystemAdmin
      I believe your problem lies in the casting in these lines:

      sum += (unsigned long long)((unsigned int *)(&tmp))[0];
      sum += (unsigned long long)((unsigned int *)(&tmp))[1];
      sum += (unsigned long long)((unsigned int *)(&tmp))[2];
      sum += (unsigned long long)((unsigned int *)(&tmp))[3];

      for which the compiler gives a warning related to type punning. Conversions between vectors and scalars are illegal (see SPU_language_extensions_2.1.pdf, section 1.3.5). Instead, you should use the following code:

      sum += spu_extract(tmp, 0);
      sum += spu_extract(tmp, 1);
      sum += spu_extract(tmp, 2);
      sum += spu_extract(tmp, 3);

      I tried it, and it appears to fix your problem.
      Roberto
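The same idea can be tried off the SPU with a portable stand-in. The sketch below assumes the GCC/Clang `vector_size` extension in place of the SPU's `vector unsigned int`, and subscripts the vector value directly where SPU code would call `spu_extract`. The names `vec4u` and `sum_lanes` are made up for illustration.

```c
#include <assert.h>

/* Stand-in for the SPU's "vector unsigned int": four lanes in 16 bytes.
 * This is the GCC/Clang vector extension, not the SPU language extensions. */
typedef unsigned int vec4u __attribute__((vector_size(16)));

/* Hypothetical helper: adds up the four lanes by subscripting the vector
 * value itself, so no pointer to the vector is cast to a scalar type. */
unsigned long long sum_lanes(vec4u v)
{
    unsigned long long sum = 0;

    /* The sketch assumes 32-bit unsigned int, i.e. exactly four lanes. */
    assert(sizeof(v) == 4 * sizeof(unsigned int));
    for (int lane = 0; lane < 4; ++lane) {
        sum += v[lane];
    }
    return sum;
}
```

Adding the vectors {0, 1, 2, 3} and {4, 5, 6, 7} lane by lane and passing the result to `sum_lanes` should give 28, the value the original test case expects.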
      • SystemAdmin

        Re: Strange behaviour (bug?) when using O3 spu-gcc

        ‏2006-07-04T08:52:25Z  in response to SystemAdmin
        Thank you, that does indeed fix the problem. The compiler warning was the somewhat opaque "warning: dereferencing type-punned pointer will break strict-aliasing rules"

        I knew there had to be a neater way of getting the scalars out of the vector... :-)
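Some background on that warning: with optimisation on, the compiler may assume that an access through an `unsigned int *` never touches an object of vector type, so reads through the cast pointer can be reordered relative to the store into `tmp`. Where an intrinsic such as `spu_extract` is not available, copying the bytes with `memcpy` is a well-defined alternative. The sketch below again assumes the GCC/Clang `vector_size` extension as a stand-in vector type; `vec4u` and `unpack_lanes` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Stand-in vector type (GCC/Clang extension), not the SPU type. */
typedef unsigned int vec4u __attribute__((vector_size(16)));

/* Hypothetical helper: copies the vector's bytes into a scalar array.
 * memcpy is well defined regardless of strict-aliasing assumptions. */
void unpack_lanes(vec4u v, unsigned int out[4])
{
    /* The sketch assumes 32-bit unsigned int, i.e. exactly four lanes. */
    assert(sizeof(v) == 4 * sizeof(unsigned int));
    memcpy(out, &v, sizeof(v));
}
```

The scalars can then be summed from `out` as usual; compilers typically optimise such a small fixed-size `memcpy` into plain register moves.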
        • PedroGonnet

          Re: Strange behaviour (bug?) when using O3 spu-gcc

          ‏2010-07-10T10:18:58Z  in response to SystemAdmin
          I am experiencing the same or similar behaviour with spu-gcc (GCC) 4.1.1, and the problem won't go away.

          I have a nested for-loop where I compute the pairwise interactions between two particles using "vector float" types and SIMD intrinsics. I have a non-vectorized version of the code that runs correctly on both the PPU and the SPU; the two versions differ only in the use of SIMD instructions.

          If I count the number of pairwise interactions computed per time step, I get a different result in the SIMD version than in the non-SIMD version. In an attempt to track down this bug, I inserted a "printf"-statement inside the inner loop. With this "printf" (or using "fflush" as suggested above), the code runs correctly.

          Using "-O0" and "-O1" also fixes the problem (without "printf" or "fflush"), but as this is the innermost loop of a large computation, this is not really an option.

          The code does not use any strange casts as in the example above and I have no idea what may be going wrong.

          I have attached the relevant source files. The function that fails is "dopair" (line 339 of runner_spu.c) and the "printf"/"fflush" statement that makes it work is on line 452. I am trying to reduce the problem to a smaller example, but I'm not quite there yet.

          Cheers, Pedro.
          • PedroGonnet

            Re: Strange behaviour (bug?) when using O3 spu-gcc

            ‏2010-07-11T13:43:55Z  in response to PedroGonnet
            I think I may have found the cause of my bug... My code inadvertently had outstanding DMA transfers on the data it was working on. If the interactions were computed too quickly (e.g. when optimizations were on), the code would work on old/stale data, resulting in the different behavior.

            Sorry for posting this non-bug!

            Cheers, Pedro
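The effect described above can be reproduced deterministically with a toy model. The sketch below is not SPU code: it simulates an asynchronous transfer whose bytes only arrive when the wait is called, and every name in it (`fake_dma`, `dma_start`, `dma_wait`, `sum_after_transfer`) is hypothetical. On a real SPU the wait would typically be done with `mfc_write_tag_mask` and `mfc_read_tag_status_all` from `spu_mfcio.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of an asynchronous transfer: dma_start only records the
 * request, and the bytes arrive when dma_wait is called. */
typedef struct {
    void *dst;
    const void *src;
    size_t size;
    int pending;
} fake_dma;

void dma_start(fake_dma *d, void *dst, const void *src, size_t size)
{
    assert(d != NULL);
    d->dst = dst;
    d->src = src;
    d->size = size;
    d->pending = 1;
}

void dma_wait(fake_dma *d)
{
    assert(d != NULL);
    if (d->pending) {
        memcpy(d->dst, d->src, d->size);  /* the data "arrives" here */
        d->pending = 0;
    }
}

/* Sums the local buffer, waiting for the transfer first only if asked to. */
unsigned int sum_after_transfer(int wait_first)
{
    unsigned int remote[4] = {1, 2, 3, 4};
    unsigned int local[4] = {0, 0, 0, 0};  /* stale contents */
    unsigned int sum = 0;
    fake_dma d;

    dma_start(&d, local, remote, sizeof(local));
    if (wait_first) {
        dma_wait(&d);
    }
    for (int i = 0; i < 4; ++i) {
        sum += local[i];
    }
    dma_wait(&d);  /* always drain before the buffers go out of scope */
    return sum;
}
```

In this model, `sum_after_transfer(1)` sees the transferred data, while `sum_after_transfer(0)` sums the stale buffer. On real hardware the outcome of the unsynchronised version depends on timing, which is consistent with a printf or a lower optimisation level hiding the problem.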