nasica88

Low latency memcpy performance

2013-06-26T06:51:27Z

I am testing low-latency memcpy performance on PowerLinux. I do not own the source code, but in this test one process basically writes to a shared memory segment and another reads from it, coordinated by semaphore signals.
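For readers who want something concrete to experiment with, the pattern described above can be sketched roughly as follows. This is not the customer's test case, whose source is unavailable. The names `struct channel`, `PAYLOAD_SIZE`, and `shm_handoff_once` are made up for illustration, and the sketch assumes Linux with an anonymous shared mapping and a process-shared POSIX semaphore. The real test may well use System V shared memory or semaphores instead.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <errno.h>
#include <semaphore.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PAYLOAD_SIZE 4096   /* hypothetical message size */

/* Hypothetical layout of the shared region. */
struct channel {
    sem_t ready;                /* posted by the writer once data is in place */
    char  data[PAYLOAD_SIZE];   /* buffer both processes can see */
};

/* One writer-to-reader handoff. The parent copies `msg` into shared memory
 * and posts the semaphore; the child waits, copies the data out, and checks it.
 * Returns 0 if the child saw exactly what the parent wrote, -1 otherwise. */
int shm_handoff_once(const char *msg, size_t len)
{
    if (len > PAYLOAD_SIZE)
        return -1;

    struct channel *ch = mmap(NULL, sizeof *ch, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (ch == MAP_FAILED)
        return -1;

    if (sem_init(&ch->ready, 1 /* shared between processes */, 0) != 0) {
        munmap(ch, sizeof *ch);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        sem_destroy(&ch->ready);
        munmap(ch, sizeof *ch);
        return -1;
    }

    if (pid == 0) {
        /* Reader: block until the writer signals, then copy the data out. */
        char local[PAYLOAD_SIZE];
        int rc;
        do {
            rc = sem_wait(&ch->ready);
        } while (rc != 0 && errno == EINTR);
        if (rc != 0)
            _exit(2);
        memcpy(local, ch->data, len);
        _exit(memcmp(local, msg, len) == 0 ? 0 : 1);
    }

    /* Writer: copy into the shared buffer, then signal the reader. */
    memcpy(ch->data, msg, len);
    sem_post(&ch->ready);

    int status = 0;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w < 0 && errno == EINTR);

    sem_destroy(&ch->ready);
    munmap(ch, sizeof *ch);
    return (w == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A call such as `shm_handoff_once("hello", 6)` performs a single handoff. On older glibc versions the semaphore functions live in libpthread, so linking with `-pthread` is the safe choice.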

I found that adding the following kernel parameters to the 'append' line in /boot/etc/yaboot.conf improved performance considerably.

" nohz=off intel_idle.max_cstate=0 processor.max_cstate=0 cgroup_disable=memory nmi_watchdog=0 divider=4 nosoftlockup mce=ignore_ce"

As you can see, many of these are available only on Intel; I actually got this set of parameters from x86 people. I suspect I could get even better performance with kernel parameters specific to PowerPC.
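When experimenting with boot options like these, it helps to confirm what the running kernel was actually booted with. The sketch below reads `/proc/cmdline` and reports which options are present. The function names `cmdline_has_param` and `report_boot_params` are hypothetical. Note that an option appearing on the command line only shows that the boot loader passed it; it does not show that the kernel acts on it.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `param` appears as a whole whitespace-separated token in
 * `cmdline`, 0 otherwise. "nohz=off" does not match "nohz=offx". */
int cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    if (plen == 0)
        return 0;

    while (*p) {
        p += strspn(p, " \t\n");              /* skip separators */
        size_t tlen = strcspn(p, " \t\n");    /* length of the next token */
        if (tlen == plen && strncmp(p, param, plen) == 0)
            return 1;
        p += tlen;
    }
    return 0;
}

/* Print, for each requested option, whether it is on the running kernel's
 * command line. Returns 0 on success, -1 if /proc/cmdline cannot be read. */
int report_boot_params(const char *const *params, size_t n)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    for (size_t i = 0; i < n; i++)
        printf("%-28s %s\n", params[i],
               cmdline_has_param(buf, params[i]) ? "present" : "absent");
    return 0;
}
```

Passing an array such as `{"nohz=off", "nmi_watchdog=0"}` to `report_boot_params` after a reboot shows whether the yaboot.conf edit took effect.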

Does anyone have experience with these, or any recommendations?

  • sjmunroe

    Re: Low latency memcpy performance

    2013-06-27T18:52:33Z in response to nasica88

    nohz=off is valid for POWER, but the rest don't apply. Beyond that, I need to know more about what POWER hardware you have (the number of sockets and frames affects NUMA) and what Linux distribution you are running. A back-level distro may not have a memcpy optimized for your specific POWER (P6/P7/P7+) chip.

    Also, do you know about the Advance Toolchain and SDK for PowerLinux?

    http://www-304.ibm.com/webapp/set2/sas/f/lopdiags/sdklop.html

    • nasica88

      Re: Low latency memcpy performance

      2013-06-27T23:57:56Z in response to sjmunroe

      I got a much better response time with "nohz=off highres=off cgroup_disable=memory nmi_watchdog=0 divider=4". I saw a clear improvement after adding highres=off.

      I have a 7R2 with 16 cores at 4.2 GHz, not partitioned (one LPAR spanning the whole machine, I mean), running RHEL 6.4 with AT6.0-4.

      Updated on 2013-07-02T01:36:33Z by nasica88
      • sjmunroe

        Re: Low latency memcpy performance

        2013-07-01T19:11:16Z in response to nasica88

        I need specific and clear details to be of any help.

        Are these measurements on Intel or on the 7R2? It is still not clear what you are measuring and how you are measuring the results.

        From context, you have some shared memory and semaphores. You have not described how the code and data are distributed across the 2 nodes (6-8 cores per node) of the 7R2.

        Which kind of semaphore: POSIX or IPC (System V)? How much time are you spending in the kernel? If they are POSIX semaphores, are you using a trylock-style call such as sem_trywait (to stay out of the kernel)?
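To make the trylock idea concrete, here is a minimal sketch of a spin-then-block wait on a POSIX semaphore. The helper name `sem_wait_spin` and the `max_spins` and `spun_only` parameters are invented for illustration. It assumes Linux with glibc, where an uncontended `sem_trywait` is normally handled in user space while a blocking `sem_wait` may enter the kernel. System V semaphores behave differently, since every `semop` call is a system call.

```c
#include <errno.h>
#include <semaphore.h>

/* Try to take `sem` without blocking, up to `max_spins` times, and fall back
 * to a blocking sem_wait() only if it is still unavailable.
 * If `spun_only` is non-NULL it is set to 1 when the semaphore was acquired
 * during the spin phase and to 0 when the blocking path was used.
 * Returns 0 on success, -1 on error (errno set by the failing call). */
int sem_wait_spin(sem_t *sem, unsigned max_spins, int *spun_only)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (sem_trywait(sem) == 0) {
            if (spun_only)
                *spun_only = 1;
            return 0;
        }
        /* EAGAIN: not available yet. EINTR: interrupted, just retry. */
        if (errno != EAGAIN && errno != EINTR)
            return -1;
    }

    if (spun_only)
        *spun_only = 0;

    int rc;
    do {
        rc = sem_wait(sem);     /* may sleep in the kernel */
    } while (rc != 0 && errno == EINTR);
    return rc;
}
```

The spin count trades CPU time for latency: a reader that spins burns a core while it waits, which only pays off when the writer runs on a different core and posts quickly.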

        As far as I know, only the nohz boot option of the list you gave is applicable on POWER.

        Also, did you recompile and link your test case with the Advance Toolchain? Just installing the AT will not change the behavior of existing applications.

        export PATH=/opt/at6.0/bin:$PATH

        then rebuild your application

        • nasica88

          Re: Low latency memcpy performance

          2013-07-01T22:25:35Z in response to sjmunroe

          These measurements were on the 7R2.

          I used AT with the proper path, as you suggested.

          We are not binding the processes to any core or socket with the taskset command, either on the 7R2 or on x86.
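For anyone who wants to try binding, the sketch below shows roughly what taskset does for a process, using `sched_setaffinity`. The helper names `first_allowed_cpu` and `pin_self_to_cpu` are hypothetical, and the code assumes Linux with glibc. Which cores to pick for the writer and the reader is left open, since that depends on how the cores map to the sockets of the machine.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* cpu_set_t, the CPU_* macros, sched_setaffinity */
#endif
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread is currently allowed
 * to run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on error. */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0 /* the caller */, sizeof set, &set);
}
```

When the source cannot be changed, as with this test case, launching the existing binaries under `taskset -c` with a chosen CPU list has a similar effect without a rebuild.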

          However, I cannot answer the rest of your questions, because I do not have access to the source code of the customer's test case.