Understanding Linux on Power performance
Added by wburos, last edited by billburos on Oct 12, 2009


For discussions or questions...

To start a discussion or get a question answered, consider posting on the Linux for Power Architecture forum.






Introduction


Installing additional software on your system

When running with Linux on Power, we recommend adding several components to the system installation:

  • sysstat
  • gcc/gfortran and their prerequisites
  • oprofile
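
One way to confirm that these components are installed is to probe the PATH for a representative command from each. The sketch below is only an illustration: the helper name `missing_tools` is made up here, and the command names you pass to it (for example `sar` from sysstat or `opcontrol` from oprofile) are assumptions that should be adjusted to what your distro actually ships.

```shell
#!/bin/sh
# missing_tools (hypothetical helper): print each named command that is
# not found on the PATH, and return non-zero if any are missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return $rc
}
```

For example, `missing_tools sar gcc gfortran opcontrol` prints one line per command that still needs to be installed and prints nothing when all of them are present.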

To extend what's available from the basic distro packages, we recommend downloading some additional packages from the web:

  • nmon - use the latest version (look for nmon 12a for Linux). nmon runs on SLES 9, SLES 10, SLES 11, RHEL 4, and RHEL 5.
  • The trial IBM compilers - see this page for some pointers
  • The latest IBM Java - available for download as Java 6, Java 5, and Java 1.4.2



Understand the system

There are three aspects to understanding your system:

  1. The operating system and Linux software
  2. The hardware system - processors, memory, disk, etc.
  3. The Power logical partition


Understand your operating system

In general, the newer versions of the operating system and software stack have improved support of the latest Power hardware.

Release      Kernel          gcc level               Page size  16MB pages  16GB pages     SMT steal cycles
RHEL 4.7     2.6.9-78.EL     gcc 3.4.6               4KB
RHEL 5.1     2.6.18-53.el5   gcc 4.1.2 (6/26/2007)   64KB       Supported   Not available
RHEL 5.2     2.6.18-92.el5   gcc 4.1.2 (11/24/2007)  64KB       Supported   Not available
RHEL 5.3     2.6.18-128.el5  gcc 4.1.2 (7/04/2008)   64KB       Supported   Not available  New form
RHEL 5.4     2.6.18-164.el5  gcc 4.1.2 (7/04/2008)   64KB       Supported   Not available  New form

SLES 10 sp1
SLES 10 sp2  2.6.16.60-0.21  gcc 4.1.2 (1/15/2007)   4KB        Supported   Not available  New form
SLES 11      2.6.27.19-5     gcc 4.3.2               64KB       Supported   Supported      New form

Examples are shown below.

Red Hat Enterprise Linux (RHEL)

For RHEL 5.2:

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.2 (Tikanga)

# gcc --version
gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42)

# uname -r
2.6.18-92.el5

# /usr/bin/time -v sleep 0 2>&1 | grep Page
Page size (bytes): 65536

For RHEL 5.3:

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)

# gcc --version
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44)

# uname -r
2.6.18-128.el5

# /usr/bin/time -v sleep 0 2>&1 | grep Page
Page size (bytes): 65536


Novell SUSE Linux Enterprise Server (SLES)

For SLES 10 sp2:

# cat /etc/SuSE-release
 SUSE Linux Enterprise Server 10 (ppc)
 VERSION = 10
 PATCHLEVEL = 2

# gcc --version
 gcc (GCC) 4.1.2 20070115 (SUSE Linux)


# uname -r
 2.6.16.60-0.21-ppc64

# /usr/bin/time -v sleep 0 2>&1 | grep Page
 Page size (bytes): 4096

For SLES 11:

# cat /etc/SuSE-release
 SUSE Linux Enterprise Server 11 (ppc64)
 VERSION = 11
 PATCHLEVEL = 0

# gcc --version | grep gcc
 gcc (SUSE Linux) 4.3.2 [gcc-4_3-branch revision 141291]

# uname -r
 2.6.27.19-5-ppc64

# /usr/bin/time -v sleep 0 2>&1 | grep Page
 Page size (bytes): 65536
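
The individual checks above can be gathered into one pass. This is a minimal sketch, not a supported tool: the helper names `bytes_to_kb` and `show_levels` are invented for illustration, and it assumes `getconf PAGESIZE` is available, which should report the same base page size that `/usr/bin/time -v` prints.

```shell
#!/bin/sh
# bytes_to_kb (hypothetical helper): turn a byte count into a "<n>KB" label.
bytes_to_kb() {
    echo "$(( $1 / 1024 ))KB"
}

# show_levels (hypothetical helper): print the distro release, kernel level,
# gcc level, and base page size, skipping anything that is not present.
show_levels() {
    for f in /etc/redhat-release /etc/SuSE-release; do
        if [ -r "$f" ]; then
            echo "release:   $(head -n 1 "$f")"
        fi
    done
    echo "kernel:    $(uname -r)"
    if command -v gcc >/dev/null 2>&1; then
        echo "gcc:       $(gcc --version | head -n 1)"
    fi
    # Assumption: getconf PAGESIZE matches the "Page size (bytes)" line
    # reported by /usr/bin/time -v.
    echo "page size: $(bytes_to_kb "$(getconf PAGESIZE)")"
}
```

Running `show_levels` on each system under test gives a compact record that can be compared against the table of releases earlier in this section.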


Understand your hardware system

Industry terminology for cores, processors, chips, sockets, CPUs can be ambiguous.

  • Even respected organizations like SPEC.org have worked over the years trying to clarify the terms used by marketing teams and technical teams. For example, the SPECmpi2007 group published run rules which describe these terms for use with respect to SPECmpi workloads.

For Power systems, performance analysts generally follow these conventions:

  • The controlling HMC refers to "processors", which are also often called "cores" on Power.
  • Each Power 5 and Power 6 processor (or core) supports two simultaneous multithreading (SMT) hardware threads.
  • Power systems documentation typically avoids the term "sockets", since it is ambiguous given the packaging approaches used across the diverse Power systems.
  • Linux sees "CPUs", each individually controlled by the scheduler, with one CPU per hardware SMT thread (up to two hardware threads per core).
  • The Linux scheduler for Power systems knows how to efficiently schedule the two SMT threads of each Power core.
  • If SMT is on, the Linux CPUs are numbered sequentially (0, 1, 2, 3, ...).
  • Neither Linux CPU of a core (for example, 0 or 1) is the "real core"; the two hardware threads are peers. We hear this misconception surprisingly often.
  • With SMT on, the processor core is kept busier, because one hardware thread can run while the other is waiting on something.
  • If SMT is off, the Linux CPUs are numbered with even numbers (0, 2, 4, ...).

It is usually recommended that you run with SMT on.
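
The numbering convention described above can be turned into a rough check. The sketch below is an assumption-based heuristic, not an official method: the helper names `cpu_numbers` and `smt_from_numbering` are made up here, and the guess relies only on the rule stated in this section (sequential numbers with SMT on, even numbers only with SMT off). It reads `/proc/cpuinfo`-style text, so it only makes sense on a Power partition.

```shell
#!/bin/sh
# cpu_numbers (hypothetical helper): read /proc/cpuinfo-style text on stdin
# and print the Linux CPU numbers, one per line.
cpu_numbers() {
    awk -F: '/^processor/ { print $2 + 0 }'
}

# smt_from_numbering (hypothetical helper): read CPU numbers on stdin and
# guess the SMT state from the numbering pattern described in the text.
smt_from_numbering() {
    awk '
        { n++; if ($1 % 2 == 1) odd = 1 }
        END {
            if (odd)        print "on"       # an odd CPU number exists
            else if (n > 1) print "off"      # several CPUs, all even
            else            print "unknown"  # zero or one CPU: cannot tell
        }'
}
```

On a running partition this would be used as `cpu_numbers < /proc/cpuinfo | smt_from_numbering`. A partition with a single online CPU cannot be classified this way, which is why the sketch reports "unknown" in that case.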


Understand your partition

Here's an example of a Power 6 partition running SLES 11. There are quite a few fields, all of which help describe the details of the defined partition. For performance purposes, a handful of fields are particularly important, and we'll look at those more closely.

The values from "lparcfg" simply reflect how the partition is defined. To change these values, you'll need to modify the partition definition with the HMC or the IVM. For more details, see the IBM Redbook "Virtualizing an Infrastructure with System p and Linux" (http://www.redbooks.ibm.com/abstracts/sg247499.html?Open). After a change, the partition needs to be shut down and restarted from the HMC.

# cat /proc/ppc64/lparcfg 
lparcfg 1.8 
serial_number=IBM,0210AB920
system_type=IBM,9406-525
partition_id=4
BoundThrds=1
CapInc=1
DisWheRotPer=2375000
MinEntCap=40
MinEntCapPerVP=10
MinMem=2048
MinProcs=1
partition_max_entitled_capacity=200
system_potential_processors=2
DesEntCap=40
DesMem=4096
DesProcs=2
DesVarCapWt=128

partition_entitled_capacity=40
group=32772
system_active_processors=2
pool=0
pool_capacity=200
pool_idle_time=0
pool_num_procs=0
unallocated_capacity_weight=0
capacity_weight=128
capped=0
unallocated_capacity=0
cmo_enabled=0
purr=300918736623
partition_active_processors=2
partition_potential_processors=2
shared_processor_mode=1


Key fields typically checked by performance teams:

  • system_potential_processors=2 - the number of physical cores on the underlying system
  • system_active_processors=2 - the number of physical cores that are active on the underlying system
  • partition_potential_processors=2 - the number of virtual processors (cores) defined in the partition
  • partition_active_processors=2 - the number of virtual processors (cores) that are active
  • partition_max_entitled_capacity=200 - 100 equals 1 processor, so the maximum here is 2 processors
  • partition_entitled_capacity=40 - 40 equals 40% of 1 processor
  • capped=0 - 0 means this partition can use unused CPU cycles from other partitions, up to the partition_max_entitled_capacity for this partition
  • shared_processor_mode=1 - unused CPU cycles are shared with other partitions on the system

So in this example, if SMT is on when Linux is running, this partition will see four CPUs, running on the two processor cores assigned to the partition. The CPU cycles not used in this partition are "shared" with other partitions, and this partition will get assigned at least 40% of 1 processor and can use up to 2 full processors if available.
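
The arithmetic in this example can be reproduced directly from the lparcfg fields. The following is a minimal sketch under stated assumptions: the helper names `lparcfg_get` and `lpar_summary` are invented for illustration, the file is assumed to consist of `name=value` lines as in the listing above, and the CPU count assumes two SMT threads per virtual processor with SMT on.

```shell
#!/bin/sh
# lparcfg_get (hypothetical helper): print the value of one field.
# usage: lparcfg_get FIELD [FILE]
lparcfg_get() {
    awk -F= -v key="$1" '$1 == key { print $2; exit }' "${2:-/proc/ppc64/lparcfg}"
}

# lpar_summary (hypothetical helper): express the capacity fields in units
# of processors (100 = 1 processor) and count the CPUs Linux would see.
# usage: lpar_summary [FILE]
lpar_summary() {
    f="${1:-/proc/ppc64/lparcfg}"
    ent=$(lparcfg_get partition_entitled_capacity "$f")
    max=$(lparcfg_get partition_max_entitled_capacity "$f")
    vps=$(lparcfg_get partition_active_processors "$f")
    awk -v e="$ent" -v m="$max" -v v="$vps" 'BEGIN {
        printf "entitled capacity: %.2f processors\n", e / 100
        printf "maximum capacity:  %.2f processors\n", m / 100
        # Assumption: SMT is on, with two hardware threads per core.
        printf "Linux CPUs (SMT on): %d\n", v * 2
    }'
}
```

With the listing above saved to a file, `lpar_summary that_file` reports 0.40 processors entitled, 2.00 processors maximum, and 4 Linux CPUs, which matches the figures in the paragraph. With no argument, both helpers read /proc/ppc64/lparcfg.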

For Performance Benchmarks

For consistent performance benchmarking, we recommend defining the partition so that shared_processor_mode is 0, capped is 1, and the capacity values correlate to the processors assigned.

Later, in production, you should seriously consider sharing CPU cycles across partitions to make more effective use of the whole system.
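
Before a benchmark run, the two settings named above can be verified from inside the partition. This is a hedged sketch rather than an established tool: the helper name `benchmark_check` is made up, it assumes the `name=value` layout shown in the lparcfg listing, and it checks only shared_processor_mode and capped, not the capacity values.

```shell
#!/bin/sh
# benchmark_check (hypothetical helper): report any lparcfg setting that
# differs from the benchmarking recommendation; exit non-zero if any do.
# usage: benchmark_check [FILE]
benchmark_check() {
    awk -F= '
        BEGIN { shared = "missing"; capped = "missing" }
        $1 == "shared_processor_mode" { shared = $2 }
        $1 == "capped"                { capped = $2 }
        END {
            bad = 0
            if (shared != "0") {
                print "shared_processor_mode=" shared " (recommended: 0)"
                bad = 1
            }
            if (capped != "1") {
                print "capped=" capped " (recommended: 1)"
                bad = 1
            }
            if (!bad) print "partition matches the benchmarking recommendations"
            exit bad
        }' "${1:-/proc/ppc64/lparcfg}"
}
```

Run against the example partition shown earlier, this would flag both fields, since that partition is uncapped and in shared processor mode. Changing either value still requires editing the partition definition on the HMC or IVM.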


Understand your application




 