Performance Rocks - Best Practices

Aug 2014. This article's relevance and currency may need to be improved due to out-of-date information. Please consider helping to improve the contents or ask questions on the DeveloperWorks Forums.

 

Best Practices for Performance

We recommend POWER7 and POWER7+ systems. POWER7+ provides faster clock speeds and significantly larger L3 caches (10 MB per core).

Now POWER8 too!

Linux is optimized for all sizes of Power systems and partitions: from small LPARs (partitions in PowerVM terminology) and small full systems all the way up to 256 POWER7 cores in a single physical system (1024 hardware threads).

If you need to gather performance data, we recommend the Linux Performance Customer Profiler Utility (aka lpcpu).

 

Recommended Distros

We recommend using the latest distro versions and releases: SLES 11 SP3 and RHEL 6.5.

SLES 11 SP3 is the latest SUSE recommended version and release level for POWER7 and now POWER7+ systems.

RHEL 6.5 is the latest Red Hat recommended version and release level for POWER7 and POWER7+ systems.

  • RHEL 6.3 issue: If you are seeing strange (incorrect) CPU values, in particular random CPUs seemingly running 100% busy, you may need a kernel update from Red Hat. This is only seen on the initial RHEL 6.3 kernel. See the blog post on CPU utilization metrics for RHEL 6.3.

    The "bad" kernel:  2.6.32-279.el6.ppc64

     
  • Good reading from Red Hat: The Red Hat Performance Tuning Guide
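For the RHEL 6.3 issue above, a quick check of the running kernel can be sketched as follows. This is only a sketch: the function name is_affected_kernel is made up for illustration, and it matches only the exact kernel release string listed above.

```shell
#!/bin/sh
# Return success only for the initial RHEL 6.3 kernel release named in this article.
# is_affected_kernel is a hypothetical helper, not part of any package.
is_affected_kernel() {
    case "$1" in
        2.6.32-279.el6.ppc64) return 0 ;;
        *) return 1 ;;
    esac
}

# Compare the running kernel release against the affected one.
running="$(uname -r)"
if is_affected_kernel "$running"; then
    echo "Kernel $running is the initial RHEL 6.3 kernel; consider a kernel update."
else
    echo "Kernel $running is not the affected kernel."
fi
```

If the check reports the affected kernel, the CPU utilization numbers from the usual tools should be treated with suspicion until the kernel is updated.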

Fedora 19 is also now available if you feel inclined to play around with it. Fedora is not intended for business-critical production use since it is a community development platform, but it is a nice preview of upcoming technologies. Fedora 19 is already in progress.

 

32-bit vs 64-bit applications. Linux on POWER supports both 32-bit and 64-bit applications. There is no performance requirement to move to 64-bit applications when running on Power systems. Future versions of Linux are expected to more strongly encourage 64-bit-only applications.
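If you are unsure whether an existing binary was built as 32-bit or 64-bit, the ELF header tells you. The sketch below reads the class byte of the ELF header with od; elf_bits is a made-up helper name, and the "file" command gives the same answer in a friendlier form.

```shell
#!/bin/sh
# Print 32 or 64 for an ELF binary, based on the EI_CLASS byte (offset 4)
# of the ELF header: 1 means 32-bit, 2 means 64-bit.
# elf_bits is a hypothetical helper for illustration.
elf_bits() {
    magic="$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')"
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    class="$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')"
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example: inspect the system shell (any binary path works here).
elf_bits /bin/sh || true
```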

 

Related Information

If you are new to the combination of Linux and POWER systems for porting, check our New to PowerLinux Tuning? page.

We maintain a PowerLinux Performance FAQs page. The FAQs page will help explain various acronyms and point to related descriptions of Power concepts.

For more information on IBM Power Systems and performance:

 



Recommended Software Packages

For more information on a new system, see the Setup and Understand Your System article. This page provides a simple set of instructions and easy to use scripts to help you make sure you have everything we recommend on a new system. See the list below on recommended Performance Tools.

We recommend using the IBM POWER Linux Tools Repository which provides a zypper/yum repository for SLES and RHEL releases. Follow the instructions there to install the packages and tools for your operating system.

 

In other words, download the repo setup package and install the recommended pieces. This example is for Red Hat; similar steps exist for SLES.

rpm -i ibm-power-repo-1.2.1-0.ppc.rpm

yum install ibm-power-managed-rhel6



yum install advance-toolchain-at6*

yum install ibm-sdk*

  • If you are installing a new system, we alternatively recommend using the latest IBM Installation Toolkit to install the tools available for POWER7 on your system.

 

We recommend installing the latest IBM Advance Toolchain for Power. This package nicely installs some additional tools.

 

We recommend you consider using the IBM SDK for Linux on Power, an Eclipse based IDE. Check out the recent Think Power Linux blog post. The IBM SDK requires the Advance Toolchain as a prerequisite.

 

If you need to gather performance data, we recommend the Linux Performance Customer Profiler Utility (aka lpcpu). This tool encourages the use of oprofile, the sysstat tools, perf, and others. It is a tarball which can be downloaded to x86 and Power systems. On x86, you can format the results of the profiler output if desired.

 

For more information on the contents of the YUM repos, see:

Blades:
HMC/IVM:
Non-Managed:

 

IBM Java

We recommend downloading and using the latest IBM Java for Power systems. IBM Java 6 SR7 is the minimum level that is specifically tuned and tailored to exploit POWER7. IBM provides Java 6 (currently SR14) and Java 7 (currently SR5) from the IBM Java download site. You can download 32-bit and 64-bit versions.

POWER7+ systems provide for significantly increased L3 caches, which can dramatically improve some Java workloads.

 

 

IBM XL compilers

For the ultimate performance on Power7 servers, consider leveraging the IBM XL C/C++ and Fortran compilers. They are strongly optimized for Power7 servers and have more advanced levels of optimization available to them (compared to GCC). If your application might benefit from the extreme optimizations of -O4 and -O5, targeted specifically at Power7, the IBM XL compilers should be a good boost.

For more information on the IBM XL compilers, see the following portals. Note that both the IBM XL Fortran and C/C++ compilers come with the tuned MASS libraries.

 

gcc - Advance Toolchain

When building with gcc, we recommend gcc -O3 optimization. The default is -O0 (dash oh zero), which isn't interesting for performance - that mode is more tailored for quick compiles and debugging.

For applications targeted at Power7 servers, we recommend the gcc options: -O3  -mtune=power7 -mcpu=power7
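As a small sketch of how the recommended options might be wired into a build, the helper below prints the flag set for a given CPU name. The function name power_cflags and the file names my_app and my_app.c are placeholders, not anything shipped by gcc or the Advance Toolchain.

```shell
#!/bin/sh
# Print the gcc optimization flags recommended above for a given CPU.
# power_cflags is a hypothetical helper; the default CPU is power7.
power_cflags() {
    cpu="${1:-power7}"
    echo "-O3 -mtune=$cpu -mcpu=$cpu"
}

# Show the compile command that would be used (my_app.c is a placeholder).
echo "gcc $(power_cflags power7) -o my_app my_app.c"
```

In a Makefile the same idea is usually expressed by setting CFLAGS once, so that every object file is built with the same tuning.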

We recommend considering Advance Toolchain (with newer versions of gcc and improved CPU-tuned libraries) available from and supported by IBM.

  • As an example, see the release notes for the Advance Toolchain 5.0 release.  The latest Advance Toolchain libraries provide an accelerated tcmalloc library.
  • The Advance Toolchain 6.0 release provides a newer GCC version and more POWER7 optimized libraries.

Check out the article on Improving performance with Advance Toolchain.

 

Recommended Performance Tools

If you need to gather performance data, we recommend the Linux Performance Customer Profiler Utility - lpcpu.

PowerLinux supports the normal suite of Linux performance tools: vmstat, mpstat, iostat, sar, top, etc. We recommend that you be familiar with these. mpstat, iostat and sar are a part of the "sysstat" rpm package; vmstat and top come from the procps package.
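A first look at a system with these tools might be sketched like this. The wrapper run_if_present is a made-up helper that skips tools which are not installed; the one-second intervals are only to keep the example short, and longer intervals (for example 5 seconds) are more typical in practice.

```shell
#!/bin/sh
# Run a tool only if it is installed, and keep going if it fails.
# run_if_present is a hypothetical helper for illustration.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" || echo "$1 exited with status $?"
    else
        echo "skipping $1: not installed"
    fi
}

run_if_present vmstat 1 2          # memory, run queue and CPU summary
run_if_present mpstat -P ALL 1 1   # per-CPU utilization
run_if_present iostat -x 1 2       # extended per-device I/O statistics
run_if_present sar -u 1 1          # CPU utilization sample
```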

The command ppc64_cpu will tell you useful key information for your system. Check the CPU frequencies, SMT mode, etc. This is a part of the powerpc-utils rpm package.
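For example, the SMT mode can be queried with "ppc64_cpu --smt". The sketch below turns that output into a thread count; smt_threads is a made-up helper, and the output strings it parses ("SMT=4", "SMT is off") are an assumption about the powerpc-utils version in use.

```shell
#!/bin/sh
# Convert the one-line output of "ppc64_cpu --smt" into threads per core.
# Assumes output of the form "SMT=4" or "SMT is off"; other forms are
# reported as unknown. smt_threads is a hypothetical helper.
smt_threads() {
    case "$1" in
        SMT=*) echo "${1#SMT=}" ;;
        "SMT is off") echo 1 ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Only query the hardware when ppc64_cpu is actually installed.
if command -v ppc64_cpu >/dev/null 2>&1; then
    smt_threads "$(ppc64_cpu --smt 2>/dev/null)" || true
    ppc64_cpu --cores-present || true
    # ppc64_cpu --frequency    # CPU frequencies; may need root privileges
fi
```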

Binding processes to CPUs can be done in a number of ways. The easiest way is with the command "taskset".
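A minimal sketch of binding with taskset follows. The helper names current_affinity and run_pinned are made up for illustration; the taskset options used (-c for a CPU list, -p for an existing PID) are the standard util-linux ones.

```shell
#!/bin/sh
# Print the CPU list a running process is currently allowed to use.
# taskset -cp prints "pid N's current affinity list: 0-7"; keep the list only.
current_affinity() {
    taskset -cp "$1" | sed 's/.*: *//'
}

# Start a command bound to the given CPU list (for example "0-3" or "0,4").
run_pinned() {
    cpus="$1"
    shift
    taskset -c "$cpus" "$@"
}

if command -v taskset >/dev/null 2>&1; then
    echo "this shell may run on CPUs: $(current_affinity $$)"
    # run_pinned 0-3 ./my_app     # my_app is a placeholder
fi
```

With SMT enabled, consecutive logical CPU numbers are usually hardware threads of the same core, so the CPU list should be chosen with the SMT mode in mind.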

Looking at the NUMA topology of the server is easy: see the command "numactl".
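As a sketch, "numactl --hardware" lists the nodes with their CPUs and memory, and its first line can be parsed for the node count. numa_node_count is a made-up helper, and my_app is a placeholder.

```shell
#!/bin/sh
# Read "numactl --hardware" output on stdin and print the number of nodes,
# taken from the leading "available: N nodes (...)" line.
# numa_node_count is a hypothetical helper for illustration.
numa_node_count() {
    sed -n 's/^available: \([0-9][0-9]*\) nodes.*/\1/p'
}

if command -v numactl >/dev/null 2>&1; then
    echo "NUMA nodes: $(numactl --hardware 2>/dev/null | numa_node_count)"
fi

# To keep a process and its memory on one node:
# numactl --cpunodebind=0 --membind=0 ./my_app
```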

The first recommended deeper analysis tool is "oprofile". This is the standard oprofile. There is a lot of documentation available for oprofile.

Important note in May 2013: The operf version of oprofile will be the recommended approach with oprofile going forward.    The legacy "opcontrol" version of oprofile is not expected to be updated and maintained in the future.
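A script that wants to follow this advice can prefer operf and fall back to the legacy interface only when operf is missing. This is a sketch; oprofile_frontend is a made-up helper and my_app is a placeholder.

```shell
#!/bin/sh
# Report which oprofile front end is installed, preferring operf over
# the legacy opcontrol. oprofile_frontend is a hypothetical helper.
oprofile_frontend() {
    if command -v operf >/dev/null 2>&1; then
        echo operf
    elif command -v opcontrol >/dev/null 2>&1; then
        echo opcontrol
    else
        echo none
    fi
}

case "$(oprofile_frontend)" in
    operf)     echo "profile with: operf ./my_app, then: opreport --symbols" ;;
    opcontrol) echo "only legacy opcontrol found; consider a newer oprofile" ;;
    none)      echo "oprofile is not installed" ;;
esac
```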

There are plenty of hardware counter tools available, but these are generally not needed until later in the analysis process. Search for "perf" tools in this context. By the way, oprofile also can easily profile with specific hardware counters.
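A common first use of hardware counters is cycles per instruction (CPI), which is simply cycles divided by completed instructions. The sketch below computes it from two counts such as those reported by "perf stat -e cycles,instructions"; cpi is a made-up helper and the numbers are illustrative, not measurements.

```shell
#!/bin/sh
# Compute cycles per instruction from two counter values.
# cpi is a hypothetical helper; it fails if the instruction count is zero.
cpi() {
    awk -v c="$1" -v i="$2" 'BEGIN { if (i > 0) printf "%.2f\n", c / i; else exit 1 }'
}

# Counts would normally come from something like:
#   perf stat -e cycles,instructions ./my_app
cpi 3000 1500
```

A CPI that is high for the workload is a hint to look further into stalls, which is where the more detailed counter events come in.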

For information on optimizing malloc, see the malloc paper on tuning and optimization techniques.

There are common and easy performance tools available - as on any Linux system. Check out Red Hat's documentation on "profilers".

  • valgrind (part of the IBM SDK)
  • systemtap
  • performance counters for Linux (pcl) aka "perf"
  • ftrace

 



Tuning Guide information for PowerLinux

 

The Power Instruction Set Architecture

The PowerISA 2.06B (Power7) document is available at https://www.power.org/wp-content/uploads/2012/07/PowerISA_V2.06B_V2_PUBLIC.pdf

 

 

Power7 Performance Analysis with hardware counters

https://www.power.org/events/Power7/

Comprehensive PMU Event Reference - POWER7

There are currently 557 events that can be measured using the POWER7 Performance Monitor Unit instrumentation. These events can be measured using tools like hpmcount (AIX) and perf (Linux). This document provides details on each event and how it is triggered. Each entry lists the event name, a brief event description and a detailed description. This is a reference document for anyone interested in characterizing the performance of an application on POWER7 systems.

Commonly Used Metrics for Performance Analysis - POWER7

The first step in optimizing an application is characterizing how well the application runs on a POWER7 system.

First, this paper briefly covers the POWER7 execution pipeline and the PMU hardware. Then it introduces some AIX and Linux tools that can be used to collect hardware events. Finally, the paper discusses several useful sets of metrics that can characterize how applications run on POWER7 subsystems.

While not an exhaustive list, these metrics do cover many common areas of concern such as the CPI stack, address translation and memory fabric utilization.

For a practical article on CPI Breakdown, see this DeveloperWorks CPI Breakdown article.

 

Evaluate performance for Linux on POWER. Learn to evaluate Linux on POWER® performance issues that focus on compiled language (such as C or C++) environments. This article explains the POWER7® CPI model and demonstrates the use of commonly available Linux® tools to show potential CPU stalls, pipeline hazards, and performance issues. Analyze and optimize an algorithm for POWER7 in the final section.