Performance Rocks - Best Practices
Aug 2014. This article's relevance and currency may need to be improved due to out-of-date information. Please consider helping to improve the contents or ask questions on the DeveloperWorks Forums.
Best Practices for Performance
We recommend POWER7 and POWER7+ systems. POWER7+ provides faster clock speeds and significantly larger L3 caches (10 MB per core).
POWER8 systems are now available as well.
Linux is optimized for all sizes of Power systems and partitions: small LPARs (partitions in PowerVM terminology), small full systems, and all the way up to 256 POWER7 cores in a single physical system (1024 hardware threads).
We recommend leveraging the latest distro versions and releases: SLES 11 SP3 and RHEL 6.5.
SLES 11 SP3 is the latest version and release level recommended by SUSE for POWER7 and now POWER7+.
- Good reading from SUSE: The SLES 11 System Analysis and Tuning Guide
RHEL 6.5 is the latest Red Hat recommended version and release level for POWER7 and POWER7+ systems.
RHEL 6.3 issue: If you are seeing strange (incorrect) CPU values, in particular random CPUs seemingly running 100% busy, you may need a kernel update from Red Hat. This is only seen on the initial RHEL 6.3 kernel. See the blog post on CPU utilization metrics for RHEL 6.3.
The "bad" kernel:
- Good reading from Red Hat: The Red Hat Performance Tuning Guide
Fedora 19 is also now available if you feel inclined to experiment with it. Fedora is not intended for business-critical production use since it is a community development platform, but it is a good preview of upcoming technologies. The next Fedora release is already in progress.
32-bit vs 64-bit applications: Linux on POWER supports both 32-bit and 64-bit applications. There is no performance requirement to move to 64-bit applications when running on Power systems. Future versions of Linux are expected to more strongly encourage 64-bit-only applications.
If you are new to the combination of Linux and POWER systems for porting, check our New to PowerLinux Tuning? page.
We maintain a PowerLinux Performance FAQs page. The FAQs page will help explain various acronyms and point to related descriptions of Power concepts.
For more information on IBM Power Systems and performance:
- IBM Power Systems Technical Guide (facts and features)
- IBM Power Systems performance benchmarks (a summary of published results)
- IBM Power Systems Performance Reports (the comprehensive list of official published performance results)
Recommended Software Packages
For more information on a new system, see the Setup and Understand Your System article. That page provides a simple set of instructions and easy-to-use scripts to help you make sure you have everything we recommend on a new system. See the list of recommended Performance Tools below.
We recommend using the IBM POWER Linux Tools Repository which provides a zypper/yum repository for SLES and RHEL releases. Follow the instructions there to install the packages and tools for your operating system.
In other words, download the repository setup package, then install the recommended pieces. This example is for Red Hat; similar steps exist for SLES.
rpm -i ibm-power-repo-1.2.1-0.ppc.rpm
yum install ibm-power-managed-rhel6
yum install advance-toolchain-at6*
yum install ibm-sdk*
- If you are installing a new system, we alternatively recommend using the latest IBM Installation Toolkit to install the tools available for POWER7 on your system.
We recommend installing the latest IBM Advance Toolchain for Power. This package nicely installs some additional tools.
- A related technical report is available on understanding how Optimized Libraries work with the Auxiliary Vector.
If you need to gather performance data, we recommend the Linux Performance Customer Profiling Tool (aka lpcpu). This tool gathers data with oprofile, the sysstat tools, perf, and others. It is a tarball that can be downloaded to x86 and Power systems. On x86, you can format the results of the profiler output if desired.
For more information on the contents of the YUM repos.. see..
We recommend downloading and using the latest IBM Java for Power systems. IBM Java 6 SR7 is the minimum level that is specifically tuned and tailored to exploit POWER7. IBM provides Java 6 (currently SR14) and Java 7 (currently SR5) from the IBM Java download site. You can download 32-bit and 64-bit versions.
POWER7+ systems provide significantly larger L3 caches, which can dramatically improve some Java workloads.
Check out the Tuning Guide advice for Java performance on POWER7.
Redbook: POWER7 and POWER7+ Optimization and Tuning Guide
- A related IBM WebSphere and IBM Java performance paper: SPECjEnterprise2010: A performance case study
IBM XL compilers
For the ultimate performance on POWER7 servers, consider using the IBM XL C/C++ and Fortran compilers. They are strongly optimized for POWER7 servers and offer more advanced levels of optimization than GCC. If your application might benefit from the extreme optimizations of -O4 and -O5, targeted specifically at POWER7, the IBM XL compilers should provide a good boost.
For more information on the IBM XL compilers, see the following portals. Note that both the IBM XL Fortran and C/C++ compilers come with the tuned MASS libraries.
gcc - Advance Toolchain
When building with gcc, we recommend the -O3 optimization level. The default is -O0 (dash oh zero), which performs no optimization and is intended for quick compiles and debugging rather than performance.
For applications targeted at POWER7 servers, we recommend the gcc options: -O3 -mtune=power7 -mcpu=power7
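As a minimal sketch of how these options might be wired into a build script, the helper below emits the POWER7 flags only when the machine reports ppc64 and falls back to plain -O3 elsewhere. The function name power_cflags and the source file app.c are illustrative assumptions, not part of any toolchain.

```shell
# Hypothetical helper: choose gcc flags from the machine architecture.
# Usage: power_cflags [arch]   (arch defaults to the output of `uname -m`)
power_cflags() {
    arch="${1:-$(uname -m)}"
    case "$arch" in
        ppc64)
            # Options recommended in this article for POWER7 targets.
            echo "-O3 -mtune=power7 -mcpu=power7"
            ;;
        *)
            # Not a ppc64 system: keep only the generic optimization level.
            echo "-O3"
            ;;
    esac
}

# Show the resulting build line; app.c is a placeholder source file.
echo "gcc $(power_cflags) -o app app.c"
```

Keeping the flags in one place makes it easier to switch the target later, for example when moving to a newer Advance Toolchain.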
We recommend considering Advance Toolchain (with newer versions of gcc and improved CPU-tuned libraries) available from and supported by IBM.
- As an example, see the release notes for the Advance Toolchain 5.0 release. The latest Advance Toolchain libraries provide an accelerated tcmalloc library.
- The Advance Toolchain 6.0 release provides a newer GCC version and more POWER7 optimized libraries.
Check out the article on Improving performance with Advance Toolchain.
Recommended Performance Tools
PowerLinux supports the normal suite of Linux performance tools: vmstat, mpstat, iostat, sar, top, etc. We recommend that you be familiar with these. mpstat, iostat, and sar are part of the "sysstat" rpm package; vmstat and top are part of the "procps" package.
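A quick baseline sample with these tools might look like the following sketch. The helper name run_if_present is a hypothetical convenience so the commands are skipped cleanly when a package is missing, and the interval and count (1 second, 2 samples) are illustrative values only.

```shell
# Hypothetical helper: run a tool only if it is installed.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" || echo "note: $1 exited with status $?"
    else
        echo "skipped: $1 is not installed"
    fi
}

# Two samples, one second apart (illustrative values).
run_if_present vmstat 1 2          # run queue, memory, swap, and CPU summary
run_if_present mpstat -P ALL 1 2   # per-CPU utilization
run_if_present iostat -x 1 2       # extended per-device I/O statistics
run_if_present sar -u 1 2          # CPU utilization over the interval
```

On a real workload, longer intervals and more samples usually give a more representative picture.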
The ppc64_cpu command reports key information about your system, such as the CPU frequencies and the SMT mode. It is part of the powerpc-utils rpm package.
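The queries mentioned here can be sketched as follows. The wrapper function show_power_cpu_info is a hypothetical name; it only guards the calls so the snippet does nothing harmful on a system without powerpc-utils.

```shell
# Hypothetical wrapper around a few ppc64_cpu queries.
show_power_cpu_info() {
    if command -v ppc64_cpu >/dev/null 2>&1; then
        ppc64_cpu --smt           || true   # current SMT mode
        ppc64_cpu --cores-present || true   # cores present in the partition
        ppc64_cpu --frequency     || true   # measured frequency (may need root)
    else
        echo "ppc64_cpu not found (it ships in powerpc-utils on Power systems)"
    fi
}

show_power_cpu_info
```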
Binding processes to CPUs can be done in a number of ways. The easiest way is with the command "taskset".
Looking at the NUMA topology of the server is easy with the "numactl" command.
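A minimal sketch of both steps, binding a command with taskset and then listing the NUMA topology with numactl, is shown below. The helper name run_pinned and the choice of CPU 0 are assumptions made for illustration; the helper runs the command unpinned if the requested CPUs cannot be used.

```shell
# Hypothetical helper: run a command bound to a CPU list, e.g. "0" or "0-3".
run_pinned() {
    cpus="$1"
    shift
    # Probe first so an unusable CPU list does not prevent the command from running.
    if command -v taskset >/dev/null 2>&1 && taskset -c "$cpus" true 2>/dev/null; then
        taskset -c "$cpus" "$@"
    else
        echo "note: cannot pin to CPU(s) $cpus; running unpinned" >&2
        "$@"
    fi
}

run_pinned 0 echo "hello from a pinned process"

# Print the NUMA nodes, their CPUs, and memory sizes when numactl is installed.
if command -v numactl >/dev/null 2>&1; then
    numactl --hardware || true
fi
```

Comparing the CPU list passed to taskset against the node layout from numactl helps keep a process and its memory on the same node.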
The first recommended deeper analysis tool is "oprofile". This is the standard oprofile. There is a lot of documentation available for oprofile.
Important note in May 2013: The operf version of oprofile will be the recommended approach with oprofile going forward. The legacy "opcontrol" version of oprofile is not expected to be updated and maintained in the future.
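An operf-based run could be sketched roughly as below. The function name profile_with_operf and the OPERF_SESSION_DIR variable are hypothetical; the session directory is passed explicitly to both tools so the report reads the data that was just collected.

```shell
# Hypothetical wrapper: profile one command with operf, then print a per-symbol report.
profile_with_operf() {
    session_dir="${OPERF_SESSION_DIR:-./oprofile_data}"
    if ! command -v operf >/dev/null 2>&1; then
        echo "operf not found; install an oprofile release that includes it"
        return 0
    fi
    if operf --session-dir="$session_dir" "$@"; then
        # --symbols lists samples per function.
        opreport --session-dir="$session_dir" --symbols \
            || echo "opreport found no usable samples"
    else
        echo "operf could not profile: $*"
    fi
}

# Example (./app is a placeholder for your application):
# profile_with_operf ./app
```

Unlike the legacy opcontrol flow, operf needs no daemon setup and profiles just the command it launches.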
There are plenty of hardware counter tools available, but these are generally not needed until later in the analysis process. Search for "perf" tools in this context. oprofile can also profile with specific hardware counters.
For information on optimizing malloc, see the malloc paper on tuning and optimization techniques.
There are common and easy performance tools available - as on any Linux system. Check out Red Hat's documentation on "profilers".
- valgrind (part of the IBM SDK)
- performance counters for Linux (pcl) aka "perf"
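As a small illustration of the perf tool listed above, the sketch below counts cycles and instructions for a single command. The function name count_events is hypothetical, and whether the counters are accessible depends on privileges and kernel settings on the system in question.

```shell
# Hypothetical wrapper: count cycles and instructions for one command.
count_events() {
    if ! command -v perf >/dev/null 2>&1; then
        echo "perf not found; skipping"
        return 0
    fi
    # perf stat writes its summary to stderr, so merge it into stdout.
    perf stat -e cycles,instructions "$@" 2>&1 \
        || echo "perf stat failed (check privileges or perf_event_paranoid)"
}

# Example (./app is a placeholder for your application):
# count_events ./app
```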
Tuning Guide information for PowerLinux
- Taking advantage of DSCR on Power systems
- Compilers and Optimization Tools for C/C++/Fortran
- Empirical Performance Analysis using the IBM SDK for PowerLinux
- Deeper Empirical Analysis for improving performance
The Power Instruction Set Architecture
The PowerISA 2.06B (Power7) document is available at https://www.power.org/wp-content/uploads/2012/07/PowerISA_V2.06B_V2_PUBLIC.pdf
Power7 Performance Analysis with hardware counters
Comprehensive PMU Event Reference - POWER7
There are currently 557 events that can be measured using the POWER7 Performance Monitor Unit instrumentation. These events can be measured using tools like hpmcount (AIX) and perf (Linux). This document provides details on each event and how it is triggered. Each entry lists the event name, a brief event description and a detailed description. This is a reference document for anyone interested in characterizing the performance of an application on POWER7 systems.
Commonly Used Metrics for Performance Analysis - POWER7
The first step in optimizing an application is characterizing how well the application runs on a POWER7 system.
First, this paper briefly covers the POWER7 execution pipeline and the PMU hardware. Then it introduces some AIX and Linux tools that can be used to collect hardware events. Finally, the paper discusses several useful sets of metrics that can characterize how applications run on POWER7 subsystems.
While not an exhaustive list, these metrics do cover many common areas of concern such as the CPI stack, address translation and memory fabric utilization.
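For orientation, the top-level number that a CPI stack breaks down is cycles per instruction: total cycles divided by completed instructions. The sketch below assumes the two totals have already been collected (for example from the cycles and instructions events of perf stat); the function name cpi and the sample values are illustrative only.

```shell
# Hypothetical helper: compute cycles per instruction from two counter totals.
# Usage: cpi <cycles> <instructions>
cpi() {
    awk -v cycles="$1" -v instr="$2" 'BEGIN {
        if (instr == 0) { print "n/a"; exit }   # avoid dividing by zero
        printf "%.2f\n", cycles / instr
    }'
}

cpi 3000000 1500000   # prints 2.00
```

A lower CPI generally means the pipeline is stalling less; the CPI stack metrics then attribute the remaining cycles to specific causes.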
For a practical article on CPI Breakdown, see this DeveloperWorks CPI Breakdown article.
Evaluate performance for Linux on POWER. Learn to evaluate Linux on POWER® performance issues that focus on compiled language (such as C or C++) environments. This article explains the POWER7® CPI model and demonstrates the use of commonly available Linux® tools to show potential CPU stalls, pipeline hazards, and performance issues. Analyze and optimize an algorithm for POWER7 in the final section.