Topic
17 replies Latest Post - ‏2008-11-19T00:30:52Z by SystemAdmin
SystemAdmin
SystemAdmin
10114 Posts
ACCEPTED ANSWER

Pinned topic cryptography - md5 hash

‏2008-09-16T03:00:28Z |
Hi, has anyone seen the blackhat paper by nick breese on http://www.security-assessment.com/ ?

He initially claimed to be able to do >1billion md5 hash /second and later revised this number to 80million/second. I have checked his code, available here and this version indeed runs at over >1billion md5hash /second.

However, this code hashes the same value over and over again. The moment i start to increment the hash data and use this hash data, the iterations immediately drop to ~80million. My additional code is negligible.

Does anyone know the reason for this disparity?
Updated on 2008-11-19T00:30:52Z at 2008-11-19T00:30:52Z by SystemAdmin
  • SystemAdmin
    SystemAdmin
    10114 Posts
    ACCEPTED ANSWER

    Re: cryptography - md5 hash

    ‏2008-09-16T08:27:41Z  in response to SystemAdmin
    Hi,

    How did you measure these results?
    The code you referred does not do any iterations if I compile it with -DSPEEDTEST.
    • SystemAdmin
      SystemAdmin
      10114 Posts
      ACCEPTED ANSWER

      Re: cryptography - md5 hash

      ‏2008-09-16T08:36:36Z  in response to SystemAdmin
      The way I did it was to include the line #define SPEEDTEST in spu_md5.c

      Also include some code in ppu_md5.c to calculate the time. Not the most elegant ways but they work
      time(&start);
      ...
      time(&end);
      double dif = difftime (end,start);
      printf("\nSuccessfully executed in %.2lf seconds.\n", dif);
    • SystemAdmin
      SystemAdmin
      10114 Posts
      ACCEPTED ANSWER

      Re: cryptography - md5 hash

      ‏2008-09-16T08:39:41Z  in response to SystemAdmin
      Nick breese claims a compiler optimisation caught him out. Is this possible?

      thanks
      • SystemAdmin
        SystemAdmin
        10114 Posts
        ACCEPTED ANSWER

        Re: cryptography - md5 hash

        ‏2008-09-16T11:06:23Z  in response to SystemAdmin
        Interesting application. It looks very much like compiler optimizations are the problem. Looking at the generated assembly, defining SPEEDTEST causes the md5_process function to be optimized out entirely! This is because it has no side-effects. I've attached a modified version. Note that it must be compiled with the following additional option to force inlining of the huge function:
        --param max-inline-insns-single=2000

        My changes are commented (my comments prefixed with 'jmt'). In summary they are:
        • Add volatile variables where necessary to prevent dead code stripping
        • Trivial instruction for very slight pipelining improvement
        • Calculate another four values in parallel in order to fill pipeline stalls with useful work.

        At a quick glance there doesn't seem to be much to do to improve the instruction mix. It would be great to shift some work onto the odd pipeline, but I can't see any obvious way to do that.

        Interestingly, a bonus from doing two calculations in parallel is that this happens to push the constant-initialization code out of the loop, which gives a slight performance boost. In my experience spu-gcc is not very good at promoting such instructions out of loops, even if there are spare registers available. It seems pot luck whether it does so or not.

        The modified code calculates 8 values on each of 2.5M iterations on 6 SPUs, making 120M calculations per second. I'm 99% certain the published performance values are meaningless as they are not actually timing full execution of the algorithm.

        Hope this is of interest to the forum...
        Jonny
        • SystemAdmin
          SystemAdmin
          10114 Posts
          ACCEPTED ANSWER

          Re: cryptography - md5 hash

          ‏2008-09-16T14:12:14Z  in response to SystemAdmin
          "Trivial instruction for very slight pipelining improvement" = "Trivial instruction reordering for very slight pipelining improvement"

          Also, improved instruction selection for rotate_left and the and/or-with-complement operations, which nearly doubles the speed.

          I'm fairly certain that there are no compiler optimizations removing/merging redundant code now, but that may merit further checking.

          Performance now 225M iterations/sec, code actually attached this time.
          • SystemAdmin
            SystemAdmin
            10114 Posts
            ACCEPTED ANSWER

            Re: cryptography - md5 hash

            ‏2008-09-16T14:18:27Z  in response to SystemAdmin
            Why aren't my attachments showing up?

            Code posted at:
            http://www.dur.ac.uk/j.m.taylor/spu_md5.c
            • SystemAdmin
              SystemAdmin
              10114 Posts
              ACCEPTED ANSWER

              Re: cryptography - md5 hash

              ‏2008-09-17T01:06:04Z  in response to SystemAdmin
              Hi, thanks. great analysis
            • SystemAdmin
              SystemAdmin
              10114 Posts
              ACCEPTED ANSWER

              Re: cryptography - md5 hash

              ‏2008-10-29T08:01:04Z  in response to SystemAdmin
              The slight revival of this thread reminded me of a bit of extra comparison people might find interesting. I wanted to compare the PS3 performance to that of a high-end desktop. I was very surprised to find that a Mac Pro (running all 8 cores) is pretty much exactly the same speed as the PS3 (running all 6 SPUs) for this task.

              I had assumed that this task was something the PS3 would excel at. While that is true, it's also a task that the Xeon is fairly well suited to! The problem doesn't require a large working register set, and only miminal memory access, and is sufficiently parallel on an instruction-by-instruction basis to keep the CPU well fed.

              This is in contrast to my own scientific code which runs about 3 times as fast on the PS3 than on the Mac Pro. In the MD5 case, though, it seems that the PS3 wins but only on purchase cost...
              • SystemAdmin
                SystemAdmin
                10114 Posts
                ACCEPTED ANSWER

                Re: cryptography - md5 hash

                ‏2008-10-30T01:04:46Z  in response to SystemAdmin
                yup, i think performance really depends on coding style. I am still experimenting and little things can cause large pipeline stalls even when using compiler optimisation.

                I am currently looking at bitsliced implementations of DES, has anyone tried this yet?
                • SystemAdmin
                  SystemAdmin
                  10114 Posts
                  ACCEPTED ANSWER

                  Re: cryptography - md5 hash

                  ‏2008-10-31T18:57:58Z  in response to SystemAdmin
                  I hadn't, but you got me curious. As well as the large number of registers being potentially useful for a bitsliced implementation, it strikes me that the presence of a single-instruction spu_sel could potentially give the SPUs quite an advantage over an x86 machine when it comes to implementing the S-boxes.

                  As I already had some code which deals with a different instruction-minimization problem, I had a go at applying it to the case of the S-boxes. I can get the sequence down to an average of about 45 operations per S-box, which is better than the value of 51 quoted here (which doesn't use a 'select' primitive): http://www.darkside.com.au/bitslice/

                  I'm optimistic that better search code could get it down significantly further. Since in some cases my code generates a worse sequence than his, despite the availability of an extra primitive, my code is clearly not doing such an effective job of searching the parameter space...
                  • SystemAdmin
                    SystemAdmin
                    10114 Posts
                    ACCEPTED ANSWER

                    Re: cryptography - md5 hash

                    ‏2008-11-18T05:08:40Z  in response to SystemAdmin
                    Marc Bevand has kindly posted his bitsliced DES sbox optimiser for PS3 at http://www.epitech.eu/v4/perso/~bevand_m/

                    Like you, he achieves 45 instructions on average per sbox. Marc Bevand achieves <40 instructions here http://www.hyperelliptic.org/SPEED/slides/Osvik_cell-speed.pdf

                    Theoratically, without optimisation and just 'selb', the PS3 only needs 12+ 4x(8+4+2+1) = 72 gates so this decrease to <40 is not too surprising

                    Thanks jmt for your prompt responses so far. I better get down to understanding this instruction minimisation problem!!
                    • SystemAdmin
                      SystemAdmin
                      10114 Posts
                      ACCEPTED ANSWER

                      Re: cryptography - md5 hash

                      ‏2008-11-18T10:04:36Z  in response to SystemAdmin
                      Very interesting, thanks for posting those links.

                      I had a skim through the PDF by Osvik & Tromer, but couldn't find any reference to <40 instructions. Could you point me to the bit where that is mentioned?

                      When I have a chance I will post my code, but unfortunately at the moment my own (unrelated) research is taking up all my available time. I'm doing a few overnight runs at the moment, though, playing with a modified algorithm that I think should potentially yield slightly shorter sequences...
  • SystemAdmin
    SystemAdmin
    10114 Posts
    ACCEPTED ANSWER

    Re: cryptography - md5 hash

    ‏2008-10-22T14:19:55Z  in response to SystemAdmin
    Question about this code. I'm new to this so bear with me. The comment says it is processing four values: 1234aaa, 2345aaa, 3456aaa, and 4567aaa. But the hex doesn't include the aaa in each value.

    vec_uint4 data = { 0x34333231, 0x35343332, 0x36353433, 0x37363534 };

    Obviously there is something here I am not understanding. If someone could explain this a learner....I would be grateful.
    • SystemAdmin
      SystemAdmin
      10114 Posts
      ACCEPTED ANSWER

      Re: cryptography - md5 hash

      ‏2008-10-23T00:33:17Z  in response to SystemAdmin
      Hi, this is the ascii value of 1234, 2345, 3456 etc..

      He stores it backwards as 4321, 5432,6543 etc because of big/little endian conversion (Which i feel actually isnt neccessary and just confuses things)
      • SystemAdmin
        SystemAdmin
        10114 Posts
        ACCEPTED ANSWER

        Re: cryptography - md5 hash

        ‏2008-10-23T13:11:47Z  in response to SystemAdmin
        Thanks for the reply? Do you think there is a reason he hard-coded the aaa into the vector?

        datachunk[1]=(vec_uint4){0x80616161,0x80616161,0x80616161,0x80616161};
        • SystemAdmin
          SystemAdmin
          10114 Posts
          ACCEPTED ANSWER

          Re: cryptography - md5 hash

          ‏2008-10-24T00:51:00Z  in response to SystemAdmin
          hi, i think the code is very simple, purely for speed testing purpose. Taken in this context, I think hard-coding aaa is just for convenience. :)