10 replies. Latest post: 2013-04-30T15:32:37Z by mauricfo
Lambzee

Pinned topic cgroups setup for Java 7 performance on PowerLinux P7+ system

2013-04-24T20:28:03Z

Hi,

I saw this page (https://www.ibm.com/developerworks/community/wikis/home?lang=en#%21/wiki/W51a7ffcf4dfd_4b40_9d82_446ebc23c550/page/Java+Performance+on+POWER7), which contains the following comment:

"Most applications benefit from SMT. However, some applications do not scale with an increased number of logical CPUs on an SMT enabled system. One way to address such an application scalability issue is to change to a lower SMT mode with fewer logical CPUs; for examples, changing from SMT4 to SMT2. Another way is the usage of cgroups as described in the following section.

2.3.2. cgroups

This section needs more detail. If interested, please ask on the PowerLinux Community Message Board. "

If anyone has information on this, please share.
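A minimal sketch of the cgroups approach, assuming the cgroup v1 cpuset controller is mounted at /cgroup/cpuset (the usual RHEL 6 libcgroup location; check with `mount -t cgroup`) and that the script runs as root. It confines one JVM to one core's logical CPUs and one NUMA node. The group name `jvm4`, the CPU list `16-19`, node `0`, and the helper names are hypothetical and have to match the machine's real topology. In cgroup v1, `cpuset.cpus` and `cpuset.mems` must be set before any task can join the group.

```python
import os
import subprocess


def create_cpuset(root, name, cpus, mems):
    """Create (or reuse) a cgroup v1 cpuset group limited to `cpus` and `mems`.

    cpuset.cpus and cpuset.mems must both be written before a task can be
    attached to the group.
    """
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)
    for fname, value in (("cpuset.cpus", cpus), ("cpuset.mems", mems)):
        with open(os.path.join(path, fname), "w") as f:
            f.write(value)
    return path


def launch_in_cpuset(path, cmd):
    """Start `cmd` inside the cpuset group at `path`.

    The child writes its own PID to the group's `tasks` file after fork and
    before exec, so the JVM and every thread it creates later inherit the
    group's CPU and memory-node restrictions.
    """
    def join_group():
        with open(os.path.join(path, "tasks"), "w") as f:
            f.write(str(os.getpid()))

    return subprocess.Popen(cmd, preexec_fn=join_group)
```

For example, `launch_in_cpuset(create_cpuset("/cgroup/cpuset", "jvm4", "16-19", "0"), ["java", "-version"])`. Joining the group between fork and exec matters: writing an already-running JVM's PID to `tasks` would move only that one thread. From the shell, libcgroup's `cgexec -g cpuset:jvm4 java ...` performs the same launch step.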

----------------------------------------------------------------------------------------------------------------

Our P7+ Power system with RHEL 6.4 installed does not scale as expected as the number of sockets or the amount of memory increases. Please help us understand and address these Java performance scaling issues.

Thanks in advance

Regards

Harsha

Updated on 2013-04-25T12:01:37Z by Bill_Buros
  • Bill_Buros

    2013-04-25T12:00:52Z in response to Lambzee

    We can help, but we'll need more details.

    What type of Power7+ system? Which Java? What's your application doing? What specifically are the scaling issues you are trying to address?

    • Lambzee

      2013-04-25T14:04:45Z in response to Bill_Buros

      The P7+ system is an IBM Power 760 with 48 cores and 256 GB of memory, running RHEL 6.4 (Santiago) in a dedicated, HMC-managed partition. On this system under test (SUT) I run SERT (the Server Efficiency Rating Tool), which measures Java performance per watt for a CPU workload (stresses the CPU), a Memory workload (stresses memory and the processor caches), and a Storage workload (stresses storage). The performance score should scale as I increase the number of sockets (1, 2, 4, 8), the amount of memory (4, 8, 32, 128, 256, 512, 1024 GB), or the number of storage disks.

      With 87.7% of memory configured as huge pages (H) and one JVM per core, the memory allocated per JVM is expected to be X = H / (number of JVMs in the SERT run). At present, when I run the Memory workload with that per-JVM size, the SUT kills all running processes, and I am forced to reduce the per-JVM size to Y = 69.3% of X before the run completes without hanging. The SERT suite defines the coefficient of variation (CV) as the variation of the performance score across the individual JVM instances, and the SPEC committee has set its limit at 10%. Currently I am seeing a CV of 12-16%. (Hardware prefetch is turned off.)
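To make the sizing rule concrete, here is a small worked calculation using the figures from this post: 256 GB of RAM (taken as 256 × 1024 MB), 87.7% of it as huge pages, 48 JVMs, and the 16 MB huge page size. Rounding the pool down to whole 16 MB pages is an assumption about how the huge page count was derived. The sketch also reports how much ordinary memory is left over, since pages reserved in a static huge page pool are not available for normal allocations such as the OS, sshd, or each JVM's non-heap memory.

```python
def hugepage_sizing(total_mib, hugepage_fraction, num_jvms, page_mib=16, heap_fraction=1.0):
    """Split a static huge page pool evenly across JVMs (all sizes in MiB)."""
    # Whole huge pages that fit in the requested fraction of RAM (rounded down).
    pages = int(total_mib * hugepage_fraction // page_mib)
    pool_mib = pages * page_mib                     # H
    per_jvm_mib = pool_mib / num_jvms               # X = H / number of JVMs
    return {
        "pages": pages,
        "pool_mib": pool_mib,
        "per_jvm_mib": per_jvm_mib,
        "heap_mib": per_jvm_mib * heap_fraction,    # e.g. Y = 0.693 * X
        "normal_mib": total_mib - pool_mib,         # ordinary pages left for everything else
    }


sizing = hugepage_sizing(256 * 1024, 0.877, 48, heap_fraction=0.693)
for key, value in sizing.items():
    print(f"{key:12s} {value:10.1f}")
```

With these inputs the pool is 14,368 pages (229,888 MB), X is about 4,789 MB, Y is about 3,319 MB, and only 32,256 MB (about 672 MB per JVM, before the OS takes its share) remains in ordinary pages, assuming the heaps themselves come from huge pages via -Xlp. Whether that shortfall is what triggers the process kills here is a hypothesis to check, not a conclusion.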

      We are using IBM J9 Java 7 SR3:

      /usr/SERT/j9SR3/ibm-java-ppc64-70/bin/java -version
      java version "1.7.0"
      Java(TM) SE Runtime Environment (build pxp6470sr3-20121025_01(SR3))
      IBM J9 VM (build 2.6, JRE 1.7.0 Linux ppc64-64 20121024_126071 (JIT enabled, AOT enabled)
      J9VM - R26_Java726_SR3_20121024_1635_B126071
      JIT  - r11.b02_20120924_26343a
      GC   - R26_Java726_SR3_20121024_1635_B126071
      J9CL - 20121024_126071)
      JCL - 20121019_01 based on Oracle 7u6-b17

      A similar run on AIX 7.1 on an IBM Power 760 with 48 cores and 128 GB of memory does not show this CV problem. Any help is appreciated.
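For reference, the coefficient of variation is the standard deviation of the per-JVM scores divided by their mean. A minimal sketch follows; whether SERT uses the population or the sample standard deviation is an assumption here (this uses the population form), and the scores are made-up numbers, not results from this system.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """Population standard deviation of the per-JVM scores divided by their mean."""
    return pstdev(scores) / mean(scores)


# Hypothetical per-JVM scores, for illustration only.
scores = [90.0, 100.0, 110.0]
cv = coefficient_of_variation(scores)
print(f"CV = {cv:.1%}, within the 10% limit: {cv <= 0.10}")  # prints: CV = 8.2%, within the 10% limit: True
```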


      Regards

      Harsha

      Updated on 2013-04-25T14:31:32Z by Bill_Buros
      • Bill_Buros

        2013-04-25T15:04:17Z in response to Lambzee

        Interesting. We do not use the SPEC.org SERT benchmark in our performance work, so I'm not familiar with the tuning approaches that apply here.

        That said, I'm still trying to work out which tuning considerations you need help with. Optimizing a SPEC benchmark on a platform is usually a challenging, iterative process. What's your current focus? Understanding why all the processes are killed at the 87% 16MB hugepage threshold? Or understanding the variability of the scores across the 48 JVMs?

        I assume you're binding JVMs to cores and SMT is on? We recently completed a WebSphere + Java SPECjEnterprise2010 publication, and the details are explained in this article. There will be a number of Java settings to check, and Linux settings as well.

        You might also install the newer service packs for Java 7. Are you using the 64-bit JVM or the 32-bit JVM?

        The primary tool we start with is lpcpu, which is run while the workload is running. If you wish to gather that data and send it to me, you can send the tarball to wmb at us.ibm.com.

        http://ibm.co/download-lpcpu

         

         

        • Lambzee

          2013-04-25T16:29:08Z in response to Bill_Buros

          Thanks for the article. I will try to apply its OS and JVM settings. I assume the processes are killed at the 87.7% 16MB page setting because the Java application processes use more memory than expected, so processes such as ssh get killed when memory runs out.

          My current focus is understanding the variability of the scores across the 48 JVMs. Yes, I am running SMT4 and binding processes to cores, as shown below.

          Could you please point me to the latest Java 7 SR4 for PowerLinux? I usually download it from http://w3.hursley.ibm.com/java/jim/ibmsdks/latest/linuxppc64/index.html (I am using 64-bit Java, as the java -version output above shows).

          These are the JVM arguments used (SMT4, one JVM per core, four threads per JVM). The CV problem persists with the gencon GC policy as well.

           -DtotalHostHardwareThreads=192 -Xms3345m -Xmx3345m -Xlp -Xgcpolicy:balanced -Xmn401m -Xcompressedrefs -Xnoloa -Xgcthreads4 -Xverbosegclog:/usr/SERT/SERT-1.0.0-PwrLnx/GC/sert43_%pid_%uid_%d_%m_%y_%H_%M_%S.log org.spec.chauffeur.client.ClientJvm -director localhost:36398 -jvmid 4 -numJvms 48 -hostId perf144145
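The arguments above do not show the binding itself. A minimal sketch of one way to pin each JVM to one core with `taskset -c`, assuming SMT4 and the common POWER7 Linux numbering in which core c owns logical CPUs 4c to 4c+3 (verify the real layout with `lscpu` or /proc/cpuinfo before relying on it). The helper names are hypothetical, and the argument list is an abbreviated copy of the one above. It also checks that 48 cores × 4 threads matches `-DtotalHostHardwareThreads=192`.

```python
THREADS_PER_CORE = 4   # SMT4
NUM_CORES = 48


def cpus_for_core(core, threads_per_core=THREADS_PER_CORE):
    """Logical CPU range for one core, assuming consecutive numbering per core."""
    first = core * threads_per_core
    return f"{first}-{first + threads_per_core - 1}"


def bound_command(jvm_id, jvm_args):
    """Prefix a java command line with taskset so JVM `jvm_id` runs on core `jvm_id`."""
    return ["taskset", "-c", cpus_for_core(jvm_id), "java"] + jvm_args


# Sanity check against the -DtotalHostHardwareThreads value used above.
assert NUM_CORES * THREADS_PER_CORE == 192

# Abbreviated argument list from the post, for JVM 4 of 48.
args = ["-Xms3345m", "-Xmx3345m", "-Xlp", "-Xgcthreads4",
        "org.spec.chauffeur.client.ClientJvm", "-jvmid", "4", "-numJvms", "48"]
print(" ".join(bound_command(4, args)))
```

This prints a command starting with `taskset -c 16-19 java`. Note that taskset sets CPU affinity only; it does not restrict which NUMA node a JVM's memory comes from.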
           

          I will run lpcpu and share the data later tomorrow.

           

          Regards

          Harsha

          • Bill_Buros

            2013-04-25T17:16:34Z in response to Lambzee

            The normal place for customers to download Java is the official website: http://www.ibm.com/developerworks/java/jdk/

          • mauricfo

            2013-04-29T13:52:39Z in response to Lambzee

            Hi Harsha,

            The scenario you described (processes getting killed under high memory usage)
            looks like the kernel's Out-of-memory (OOM) killer in action.

            You can check it with dmesg (lines similar to 'Out of Memory: Killed process [PID] [process name].').
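If you want to script that check, here is a small sketch that pulls the PID and process name out of OOM-killer lines in `dmesg` output. The exact wording of these kernel messages varies between kernel versions, so the pattern below (matching "Killed process <PID>" followed by an optional name in parentheses) is an assumption to adjust against your own log.

```python
import re
import subprocess

# Assumed pattern; the exact OOM-killer wording differs between kernel versions.
KILLED = re.compile(r"Killed process (\d+)(?:[^(\n]*\(([^)\n]+)\))?")


def oom_kills(log_text):
    """Return (pid, name) pairs for 'Killed process' lines; name is None if absent."""
    return [(int(m.group(1)), m.group(2)) for m in KILLED.finditer(log_text)]


def read_dmesg():
    """Kernel ring buffer as text (may require root on some systems)."""
    return subprocess.run(["dmesg"], stdout=subprocess.PIPE,
                          universal_newlines=True, check=True).stdout
```

Typical use: `for pid, name in oom_kills(read_dmesg()): print(pid, name)`. If the killed names include sshd or other system processes rather than only java, that matches the symptom described above.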

            I'd suggest you try the following:
            1) enable memory overcommit
            2) make the memory-allocating process the one that gets killed (rather than other, unrelated processes, e.g., ssh)

            # echo 1 > /proc/sys/vm/overcommit_memory
            # echo 1 > /proc/sys/vm/oom_kill_allocating_task

            For more information about both tunings, see these two documentation pages:
            1) https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-captun.html
            2) http://man7.org/linux/man-pages/man5/proc.5.html (see overcommit_memory and oom_kill_allocating_task)
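A small sketch for checking the two tunables before and after the change. The value meanings follow proc(5): overcommit_memory 0 is heuristic overcommit, 1 always overcommits, and 2 is strict accounting. The `root` parameter exists only so the functions can be pointed at a test directory. Values written under /proc/sys do not survive a reboot; the usual way to persist them is `vm.overcommit_memory` and `vm.oom_kill_allocating_task` entries in /etc/sysctl.conf.

```python
import os

VM_ROOT = "/proc/sys/vm"
OVERCOMMIT_MODES = {0: "heuristic overcommit", 1: "always overcommit", 2: "strict (no overcommit)"}


def read_vm(name, root=VM_ROOT):
    """Read an integer VM tunable such as overcommit_memory."""
    with open(os.path.join(root, name)) as f:
        return int(f.read().strip())


def write_vm(name, value, root=VM_ROOT):
    """Same effect as `echo value > /proc/sys/vm/name` (needs root on a real system)."""
    with open(os.path.join(root, name), "w") as f:
        f.write(f"{value}\n")


def describe(root=VM_ROOT):
    """One-line summary of the two OOM-related tunables."""
    mode = read_vm("overcommit_memory", root)
    kill_allocating = read_vm("oom_kill_allocating_task", root)
    return (f"overcommit_memory={mode} ({OVERCOMMIT_MODES.get(mode, 'unknown')}), "
            f"oom_kill_allocating_task={kill_allocating}")
```

On the live system, `print(describe())` shows the current settings, and `write_vm("overcommit_memory", 1)` has the same effect as the first echo command above.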

             

            With these settings, I'd expect the system to stop killing processes as long as memory usage stays under the system's total available memory,
            and otherwise to kill only the Java processes that are allocating memory (and, fortunately, not your SSH daemon or connection).

            Hope it helps.
            Mauricio

            Updated on 2013-04-29T13:53:34Z by mauricfo