The 8.5 release of IBM® Rational® Developer for Power Systems Software™ introduced a new component called Performance Advisor, which provides a rich set of features for performance tuning C and C++ applications on IBM® AIX® and IBM® PowerLinux™.
Performance Advisor is both easy to use and very powerful. If you are new to performance tuning, you will find that Performance Advisor is a great way to get started, because the user interface is simple and the tool provides plenty of feedback and guidance. If you are an experienced performance tuner, you will find a rich set of tools that you can use to effectively isolate and fix performance issues.
Rational Developer for Power Systems Software (often called RD Power or RDp unofficially) is already well known for its development and debugging tools, and Performance Advisor is well integrated with those. You can use Performance Advisor as a stand-alone tool, or you can seamlessly integrate it into your existing code, build, test, and debug cycle.
This tutorial takes you through a day in the life of a Performance Advisor user.
Before we begin the tutorial, let's take a look under the hood to see how Performance Advisor works.
Performance Advisor gathers data from several sources. The raw application performance data comes from low-level operating system tools that sample the state of the processor and memory at regular intervals. The debug information generated by the compiler allows this data to be matched back to the original source code. XLC compilers can generate XML report files that provide information on optimizations that were performed during compilation. And finally, the application's build and runtime systems are analyzed to determine whether there are any potential environmental problems.
All of this data is automatically gathered, correlated, analyzed, and presented to you in a way that is quick to access and easy to understand. This makes it much easier to determine the best strategies for optimizing your application.
The main source of performance data on AIX is the tprof command (on Linux, the equivalent tool is OProfile). While your application is running, tprof wakes up approximately every 10 milliseconds to record a sample of the processor's instruction pointer, which contains the memory address of the currently executing instruction. Each sample is called a tick. After the performance run is complete, the debug information generated by the compiler is used to map each tick to its corresponding source code line. From this correlated data, you can tell which parts of the program execute most often. These parts are called hot and are usually the best place to start looking for opportunities to optimize the program code.
You can run tprof directly from the AIX command line, but it can be quite difficult to use this way. It is highly configurable and takes many command line options. The raw data files it generates are very large, and they can be time-consuming to analyze manually. Performance Advisor uses tprof to gather the raw performance data, but this is done transparently. Therefore, as an end user, you never have to deal with these low-level tools directly.
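To make the idea of ticks concrete, here is a minimal C++ sketch of how sampled instruction addresses could be attributed to functions. It is only an illustration under simplifying assumptions, not how tprof or Performance Advisor is implemented: the Symbol type, the countTicks function, and the rule that a function extends up to the start of the next one are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical symbol-table entry: a function name and its start address.
struct Symbol {
    std::uint64_t start;
    std::string name;
};

// Count how many sampled instruction addresses (ticks) land in each function.
// Assumes `symbols` is sorted by start address and that each function extends
// up to the start of the next one.
std::map<std::string, int> countTicks(const std::vector<Symbol>& symbols,
                                      const std::vector<std::uint64_t>& samples) {
    assert(std::is_sorted(symbols.begin(), symbols.end(),
                          [](const Symbol& a, const Symbol& b) { return a.start < b.start; }));
    std::map<std::string, int> ticks;
    for (std::uint64_t addr : samples) {
        // Find the first symbol that starts after addr, then step back one.
        auto it = std::upper_bound(
            symbols.begin(), symbols.end(), addr,
            [](std::uint64_t a, const Symbol& s) { return a < s.start; });
        if (it == symbols.begin()) continue;  // address is below every known symbol
        --it;
        ++ticks[it->name];
    }
    return ticks;
}
```

The functions with the highest tick counts in the returned map are the hot ones. A real profiler resolves ticks down to individual source lines by using the compiler's debug information, which this sketch does not attempt.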
Call stack sampling data is also collected from low-level system tools. This data comes from the procstack command on AIX and from the OProfile command on Linux.
The application's call stack is sampled at regular intervals, and all of the application's currently executing functions are recorded. You can then explore the runtime call paths leading to and from any function of interest by using a graphical viewer. With this information, you can answer questions such as, "Is this function hot because it takes too long to execute or because it is called too often?"
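As a rough illustration of what can be derived from sampled call stacks, the following C++ sketch counts the immediate callers of a function of interest. The CallStack alias and the countCallers function are hypothetical names chosen for this example; the sketch does not reflect the actual output format of procstack or OProfile.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One sampled call stack, outermost caller first,
// for example {"main", "render", "sqrt"}.
using CallStack = std::vector<std::string>;

// For every sample in which `target` appears, credit its immediate caller.
std::map<std::string, int> countCallers(const std::vector<CallStack>& samples,
                                        const std::string& target) {
    assert(!target.empty());
    std::map<std::string, int> callers;
    for (const CallStack& stack : samples) {
        for (std::size_t i = 1; i < stack.size(); ++i) {
            if (stack[i] == target) {
                ++callers[stack[i - 1]];
                break;  // count each sample once (first occurrence; recursion is ignored)
            }
        }
    }
    return callers;
}
```

If one caller accounts for most of the samples, the call path through that caller is the natural place to look first.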
The performance of your application also depends on how it was built and on the environment where it runs. Often, a simple change to the build or runtime environment can have a large impact on an application's performance without actually having to change the application's source code.
Performance Advisor analyzes the build and runtime hosts and scores them based on several criteria, including hardware level, OS level, compiler version, and build options. It then generates a report called the System Scorecard, which provides recommendations for improving the system configuration for better application performance. This is a good place to start if you are looking for some low-hanging fruit.
XLC compilers can produce XML report files that describe optimizations that were performed during compilation. These reports are not strictly required, but if they are generated, more information will be available for analysis. One of the most interesting things that these reports reveal is the location of function calls that were inlined during compilation.
You can use the Performance Advisor to analyze and compare the performance of an application across several machines running AIX or PowerLinux, but to keep things simple, we will focus on performance tuning on a single AIX machine.
A sample application called RayTracer is provided in the Downloads section, and we use it throughout this article. RayTracer is a small C++ application that generates image files depicting various geometric shapes. We will use Performance Advisor to incrementally improve the performance of this application and to compare its performance to a baseline as we go.
You can download the RayTracer sample program and follow along with the demonstration, but keep in mind that performance results are dependent on the system where the application runs, so you will get different numbers from the ones shown in the upcoming examples.
You will need access to an AIX server with XLC 11.1 and the Rational Developer for Power Systems Software server component installed.
- Begin by switching to the Performance Advisor perspective (Window > Open Perspective > Other > Performance Advisor).
Figure 1. Performance Advisor perspective
The first thing that you need is a connection to the remote machine where the application will be built and run.
- In the Remote Systems view, under New Connection, right-click the AIX node, select New Connection, and then follow the wizard.
Figure 2. Creating a new connection
Next, we need a remote C++ project that we will use to edit and build our application.
- Extract the RayTracer source code to a folder somewhere on the remote machine.
- From the main menu, select New > Remote C/C++ Project.
Figure 3. Creating a remote C/C++ project
- Follow the wizard to set up the project. (There are a few different types of remote projects in Rational Developer for Power Systems Software, but Performance Advisor supports all of them.)
Figure 4. The new remote C/C++ project wizard
The application must be compiled with debug information enabled to collect line-level performance data. For both XLC and GCC compilers, this is done by passing the -g option. Additionally, if you are using XLC, you will need the -qlistfmt=xml=all option to generate XML transformation reports during a build.
For a more detailed overview of compiler options used by Performance Advisor, please see the documentation. The makefile provided with RayTracer uses the correct options for XLC on the AIX platform.
We want to be able to launch the RayTracer application from within the IDE.
- From the main menu, select Run > Run Configurations.
Figure 5. The Run Configurations dialog window
- In the dialog window, double-click Remote Compiled Application, and then browse to the location of the RayTracer executable file.
If you click the Run button, you will see that RayTracer is launched on the remote machine and the console output shows locally in the Console View.
Now comes the fun part: performance tuning the application.
The main view in Performance Advisor is the Performance Explorer view. From here, you can start performance runs, organize the data, and analyze the results.
Performance runs are organized by using two artifacts: Sessions and Activities. Each Activity represents a single performance run of the application, and a Session is just a list of Activities. There are two types of Activities, System Scorecard and Hotspot Detection. Both will be covered in the upcoming steps.
- To create a Session, click the New Session toolbar button at the top of the Performance Explorer view to open the New Performance Tuning Session wizard.
Figure 6. New Session Toolbar Button
Figure 7. New Performance Tuning Session wizard, first page
You can use the New Performance Tuning Session wizard to configure complex scenarios, such as tuning large applications across several machines. For this tutorial, however, we will set up a simple scenario where we build and tune our small RayTracer application on a single machine.
The first page of the wizard asks for the following information:
- Name of the session
- Build host: Rational Developer for Power Systems Software 8.5 introduces a new feature that allows a remote project to be synchronized across more than one host. For projects using this feature, you would select the build host at this point. This tutorial uses only one host, so the default is correct.
- Runtime host: Performance Advisor supports a scenario where you build on one host but execute the performance run on a different host. This is useful for organizations that have dedicated performance testing machines, or when you want to test your application on a different machine from the one where you built it without copying all of the project files there. For this tutorial, leave the runtime host the same as the build host.
- Temporary data directory: During performance data collection, some temporary files are created. You need to provide a folder to store these files during the run (they are automatically deleted after the run). Click the Use Default button to pick a default location under your home directory.
- After you have provided the information, click Next.
Figure 8. New Performance Tuning Session wizard, second page
On the second page of the wizard, you provide the locations of the executables and shared libraries that make up the application. This information is used to provide more accurate recommendations (this data can also be updated after the session is created).
- Browse to the location of the RayTracer executable file, and add it to the list of the executables.
- Click Next.
Figure 9. New Performance Tuning Session wizard, third page
The third page prompts you to create a System Scorecard Activity along with the new Session. This is a good idea if you are performance tuning on a particular remote machine for the first time. On this page, you can specify a minimum and a preferred Power platform version. If you do this, the recommendations that are generated later will be focused on the platforms that you care about most.
- For this simple example, just leave the defaults and click Finish.
The new Session and Activity show in the Performance Explorer view.
Figure 10. Performance Explorer with the new Session and Activity
The System Scorecard Activity starts in the new state, meaning that it is ready to run. The bottom panel of the Performance Explorer is used to run Activities.
- Select the System Scorecard Activity, and click the Begin Data Collection button.
Figure 11. Running a System Scorecard Activity
The Activity goes into the running state. In the background, Performance Advisor is analyzing the runtime host and the executable. When this process is finished, the Activity goes into the complete state. The time of completion now appears next to the Activity.
- Double-click the Activity to open the Scorecard Viewer.
Figure 12. The System Scorecard viewer
Here (Figure 12), we can see that we are actually missing some key best practices. It turns out that RayTracer was initially built without turning on compiler optimization.
- Let's find out what we can do about this by clicking the link that says 2 recommendations to open the Recommendations view.
The Recommendations view shows automatically generated recommendations for the currently selected Activity. Each recommendation indicates a Confidence Level, and the higher the confidence level, the more likely that following the recommendation will have a positive impact on performance.
Figure 13. The Recommendations view
Performance Advisor has determined that the application should be rebuilt using the -O compiler option. This is a quick and easy way to get better performance.
Increasing optimization levels will also probably lengthen the time that it takes for a build to finish. Performance Advisor will selectively recommend optimization level increases specifically for the hottest parts of your application. That way, you get the most benefit by optimizing the hot parts of the application while avoiding spending time optimizing parts that have little effect on overall performance.
Performance Advisor has recommended that we rebuild RayTracer with a higher compiler optimization level. But before we do that, it's a good idea to establish a baseline for comparison. That means executing a performance run of the application before making any changes.
- Right-click the Session, and select New Activity.
- Create a new Hotspot Detection Activity, name it Hotspot Detection 1, and use the launch configuration that you created previously.
Figure 14. The New Activity window
The new Activity will appear in the Performance Explorer view.
Figure 15. Performance Explorer with hotspot detection
- Select the new Activity, and click the button that says Launch Program and Collect Data.
- When the Activity is complete, right-click it, and select Set as Baseline.
Figure 16. Setting the baseline
Figure 17. The Baseline Activity
- Open the makefile, add -O2 to the compiler options, and rebuild the application.
Figure 18. The makefile editor
Now let's see what effect this has on the application's performance.
- Create another Hotspot Detection Activity, name it Hotspot Detection 2, and run it.
- When that has finished, right-click it, and select Compare with Baseline.
The Hotspots Comparison browser shown in Figure 19 will open.
Figure 19. Hotspots Comparison browser
This viewer compares the results of two performance runs. At the top of the viewer, it says that the application got approximately 2 times faster. That's a big result from such a small change!
Now let's try making a change to the application itself. But before we can do that, we need to look at the performance data and figure out what change we should make.
- Double-click the Hotspot Detection 2 Activity to open the Hotspots Browser.
Figure 20. Hotspots browser
The left pane of the Hotspots Browser shows the Process Hierarchy Tree. This tree shows all of the processes and threads that were sampled during the performance run.
Processes that correspond to the application being profiled are isolated under the My Application node. All other processes running on the system at the same time as the application show up under the Other Processes node.
You can expand the My Application node to drill down and examine the processes, threads, and modules that make up the application. RayTracer is single-threaded, so only one thread is shown in this example. For multithreaded or multiprocess applications, each thread and process can be examined individually or as a group.
You can select a node in the Process Hierarchy Tree to see the functions that were sampled in that level of the hierarchy. By default, the functions are sorted by the amount of time that each function takes from the profile. The hottest functions are at the top of the list and make a good starting point for performance tuning.
By looking at this data, we can see that the sqrt function is taking up a significant percentage of the application's execution time. But sqrt is a library function, so we can't directly alter it. Besides, it's not our job to optimize library code. So instead, let's try to find out how sqrt is used by our application.
- Right-click on the sqrt entry in the function hotspots table, and select Show callers/callees, which will open the Invocations Browser.
Figure 21. Invocations browser
The Invocations Browser is focused on one function at a time, and it shows a graphical representation of all of the sampled call stacks that include that function.
The Invocations Browser is very flexible. You can zoom in and out, isolate specific call paths, and focus on different parts of the application. Here, we can see that the hottest function that calls sqrt is MyShape::sphere_find_intersection. Let's look at the code for that function.
- Right-click the sphere_find_intersection node, and select Open Source.
Figure 22. Open Source option
The Performance Source Viewer opens on the source code for this function.
The viewer displays your source code along with line-level performance data (see Figure 23). To the left of the code, you can see how much execution time each individual line of code takes from the total profile.
Figure 23. Performance source code viewer
Next to the Performance Source Viewer is the standard Outline View. When the Performance Source Viewer is open, the Outline View shows a breakdown of the code blocks within each function in the file. You can use this to find hot blocks of code, such as hot loops.
Figure 24. The standard Eclipse outline view
- Back in the Performance Source Viewer, click the first toolbar button to jump to the hottest line of code in the file.
It's no surprise that the hottest line contains a call to the sqrt function.
Figure 25. Go to the hottest line
It looks like there's an opportunity for a simple optimization. The value returned by the sqrt function is not used in the first branch of the following if statement. Changing the code so that sqrt is called only in the else branch might have a positive effect on performance, so let's try making that change.
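The kind of change described here can be sketched in C++ as follows. This is a hypothetical stand-in written for illustration, not the actual RayTracer source: the function names, the quadratic coefficients b and c, and the -1.0 return value for a miss are all assumptions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the pattern described above. b and c are
// quadratic coefficients of a ray/sphere intersection test. Each function
// returns the nearest hit distance, or -1.0 when the ray misses.

// Before: sqrt is evaluated even when the ray misses and the result is unused.
double nearestHitBefore(double b, double c) {
    double disc = b * b - 4.0 * c;
    double root = std::sqrt(disc);  // wasted work (and NaN) when disc < 0
    if (disc < 0.0) {
        return -1.0;
    } else {
        return (-b - root) / 2.0;
    }
}

// After: sqrt is called only in the branch that uses its result.
double nearestHitAfter(double b, double c) {
    double disc = b * b - 4.0 * c;
    if (disc < 0.0) {
        return -1.0;
    } else {
        double root = std::sqrt(disc);
        return (-b - root) / 2.0;
    }
}

// Sanity check that the refactoring preserves behavior for a hit and a miss.
void checkSameResults() {
    assert(nearestHitBefore(-4.0, 3.0) == nearestHitAfter(-4.0, 3.0));
    assert(nearestHitBefore(1.0, 1.0) == nearestHitAfter(1.0, 1.0));
}
```

How much a change like this helps depends on how often the first branch is taken, which is why it is worth measuring with another performance run rather than assuming a benefit.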
The code cannot be directly edited in the Performance Source Viewer because the line-level performance data would not line up correctly after a change.
- Click the Switch to Editor toolbar button.
Figure 26. The remote C/C++ editor
- Change the code so that sqrt is called only in the else branch, and then rebuild the application.
- Create another Hotspot Detection Activity, name it Hotspot Detection 3, and run it.
- When it is complete, compare it to the previous Activity by right-clicking it and selecting Compare with Previous.
Figure 27. Select Compare with previous
As the screen capture in Figure 28 shows, the code change has had a positive impact on performance: the application is now 2.353 times faster than the previous run.
Figure 28. Comparison browser
The function impact table shows that sqrt has had a significant impact on the change. But there are three functions above sqrt that have even bigger impacts. By looking at the Invocations Browser (Figure 29), we can see that these functions are all called by sqrt, so by reducing the calls to sqrt, we also reduce the calls to those functions.
Figure 29. Invocations browser
Now let's compare Hotspot Detection 3 to the baseline.
Figure 30. Comparison browser
The application is 4.785 times faster as a combined result of the two changes. Not bad!
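As a rough consistency check on these figures, assume that each reported speedup is the ratio of the earlier run's execution time to the later run's execution time. Under that assumption, the speedups of successive changes multiply. Here T_1, T_2, and T_3 are hypothetical names for the execution times of Hotspot Detection 1, 2, and 3:

```latex
\[
\frac{T_1}{T_3} = \frac{T_1}{T_2} \cdot \frac{T_2}{T_3}
\quad\Longrightarrow\quad
\frac{T_1}{T_2} = \frac{4.785}{2.353} \approx 2.03
\]
```

This agrees with the roughly twofold speedup reported for the first change.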
We can go one step further and compare the line-level performance for sphere_find_intersection before and after the change.
- Open the Performance Source viewer on sphere_find_intersection in Hotspot Detection 2.
Looking at the viewer, you can see the original code from before the change (along with the original performance data).
- Now open the viewer on the same file in Hotspot Detection 3.
- Dock the two viewers next to each other to see both at the same time.
Figure 31. Comparing the source code
At this point, you might be thinking, "But I edited that file. How can I see the code from before the change was made?"
Performance Advisor includes a feature called Automatic Source Tracking. Every time you execute a performance run, the states of all of the source files in the project are saved as a snapshot. If you go back to view performance data from previous runs, you will see exactly what the code looked like at that time. This enables you to view detailed comparisons of line-level performance data between any two performance runs. This feature is completely transparent and works automatically. It does not interfere with any version control system that you might already be using, such as IBM® Rational Team Concert™.
- Let's go back a bit and take another look at the comparison between Hotspot Detection 1 and Hotspot Detection 2.
Figure 32. Comparison browser
Several of the top functions have no speedup information. In fact, they say "Not detected in Activity Hotspot Detection 2." What's going on here?
The only difference between the two runs is that we turned on compiler optimization in the second run. One of the main optimizations performed by the compiler is function inlining.
- Open the Performance Source viewer on the RayTrace.cpp file in Hotspot Detection 1.
The viewer shows nothing terribly interesting. There is some line-level data, but all it really tells us is that there are no hot lines of code in this file.
Figure 33. RayTrace.cpp from Hotspot Detection 1
- Now open the same file in Hotspot Detection 2.
Figure 34. RayTrace.cpp from Hotspot Detection 2
There's a lot more information in the viewer for the second run. The little arrow icons on the left side of the viewer indicate lines of code that contain function calls that were inlined by the compiler.
- Hover the mouse cursor over an arrow to get a pop-up window that shows which functions were inlined.
Clicking an arrow icon expands the code to show the source of the inlined function in place, alongside the rest of the code.
Figure 35. Exploring function inlining
The reason that several of the top functions in the Hotspots Comparison browser were not detected in the second run is that the compiler fully inlined those functions into their call sites. This had a large positive impact on the application's performance. The result of this impact can be explored at a high level of detail by using the Performance Source viewer's support for function inlining.
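To illustrate why an inlined function can disappear from a profile, consider the following C++ sketch. The Vec3 type and both functions are hypothetical and are not taken from the RayTracer source; whether a compiler actually inlines a given call depends on the compiler and the optimization options in use.

```cpp
#include <cassert>

// Hypothetical vector type; not taken from the RayTracer source.
struct Vec3 {
    double x, y, z;
};

// A small, frequently called helper: a typical candidate for inlining.
inline double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// With optimization enabled, the compiler may replace the dot() call below
// with the arithmetic itself. Samples are then attributed to
// lengthSquaredSum, and dot() no longer appears as a separate function.
double lengthSquaredSum(const Vec3* v, int n) {
    assert(n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += dot(v[i], v[i]);
    }
    return sum;
}
```

In an unoptimized build, dot() would typically show up as its own entry in the function list, which matches the "not detected" pattern seen when comparing an unoptimized run against an optimized one.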
Performance Advisor provides a rich set of tools for performance tuning C/C++ applications on AIX and PowerLinux. This tutorial provided a quick overview of the most basic features.
Here is a list of some of the more advanced features not covered in this article:
- The Recommendations view is always available. Recommendations are generated for every Activity and are an excellent way to get guidance on where to look next for performance opportunities.
- Complex tuning scenarios are supported. You can build and performance test your application across many AIX and PowerLinux servers, and then compare the results.
- You do not have to be sitting in front of your computer to run a performance test. Performance Advisor allows you to schedule performance runs for when you are offline. For example, you can schedule a performance run for overnight and analyze the data the next day.
- Performance Advisor comes with a set of shell scripts that can be used to execute a performance run on servers that do not have Rational Developer for Power Systems Software installed or on servers that you do not have direct access to. Simply give the scripts to your clients to run, and then import the resulting data for analysis.
- Performance data can be shared between team members by using the import and export capabilities.
- To learn more, start with the developerWorks page, and then review the Rational Developer for Power Systems Software product information, features and benefits page, as well as the product line overview.
- For documentation, see these information centers:
- Rational Developer for Power Systems Software
- XL C/C++ for AIX
- oprofile page on SourceForge.net
- tprof command page (AIX 7.1)
- procstack command page (AIX 7.1)