Optimizing AIX 5L performance: Monitoring your CPU, Part 3

Controlling thread usage and CPU binding

Part 3 of this series focuses on arguably the least understood area of Central Processing Unit (CPU) performance tuning: controlling thread usage and CPU binding. This article addresses key tools and utilities you can use to analyze threads and administer your processes.

Ken Milberg, Future Tech UNIX Consultant, Technology Writer, and Site Expert, Future Tech

Ken Milberg is a Technology Writer and Site Expert for techtarget.com and provides Linux technical information and support at searchopensource.com. He is also a writer and technical editor for IBM Systems Magazine, Open Edition. Ken holds a bachelor's degree in computer and information science and a master's degree in technology management from the University of Maryland. He is the founder and group leader of the NY Metro POWER-AIX/Linux Users Group. Through the years, he has worked for both large and small organizations and has held diverse positions from CIO to Senior AIX Engineer. Today, he works for Future Tech, a Long Island-based IBM business partner. Ken is a PMI certified Project Management Professional (PMP), an IBM Certified Advanced Technical Expert (CATE, IBM System p5 2006), and a Solaris Certified Network Administrator (SCNA). You can contact him at kmilberg@gmail.com.



15 May 2007


About this series

This three-part series focuses on the various aspects of Central Processing Unit (CPU) performance and monitoring. The first installment of the series provides an overview of how to efficiently monitor your CPU, discusses the methodology for performance tuning, and gives considerations that can impact performance, either positively or negatively. Though the first part of the series goes through some commands, the second installment focuses much more on the detail of actual CPU systems monitoring and analyzing trends and results. The third installment focuses on proactively controlling thread usage and other ways to tune your CPU to maximize performance. Throughout this series, I'll also expound on various best practices of AIX® CPU performance tuning and monitoring.

Introduction

This article covers threads, processes, and CPU binding. It also discusses how to use several of the tools illustrated in prior installments to make changes to your systems. The most important commands used to tune the CPU scheduler and the various methods of binding threads that are available on AIX Version 5.3 are also covered.

A junior administrator might consider process management nothing more than monitoring active processes and possibly killing runaway or zombie processes. You'll find out that there is a lot more to process management than using the kill command, or even nice. The fundamental question that needs to be answered before moving forward is how processes relate to threads. The answer is surprisingly simple. The process is the entity that AIX uses to control the use of system resources, while threads control the actual processor time consumption, because each kernel thread is a single sequential flow of control. Each process is made up of one or more threads. Controlling thread usage is where you can make a difference. To do this, you need to understand the tools that allow you to work with threads to improve your CPU performance, which is the scope of this final part of the series.

Thread monitoring

In this section, I discuss the tools and commands that are available to help you monitor and analyze thread usage. While AIX Version 4 introduced the usage of threads to control processor time consumption, it was in AIX 5L™ where system management tools really evolved to help you monitor and analyze the thread usage. One such tool is procmon, which was introduced in AIX Version 5.3.

Procmon displays a dynamically updated list of processes that enables you to gather information about what is running on your system. Where it really stands out compared to other monitoring tools is that it allows you to run commands to facilitate process and thread management. Some of the critical information that it gathers with respect to performance tuning includes:

  • The actual amount of CPU time the process is using
  • The amount of memory and I/O that the process is using
  • The nice values of the process and their priorities

You can even kill jobs and renice them on the fly. Figure 1 gives a nice graphical representation of overall performance. To launch the Performance Workbench Platform, use: # perfwb.

Figure 1. Procmon partition performance tab

There is also a process table view, which can actually show you a list of threads in a sorted table. You just select Show threads metrics (see Figure 2).

Figure 2. Procmon processes tab

Other menus allow you to either kill processes or renice them (see Figure 3).

Figure 3. Procmon processes tab

So what exactly is nice? The nice command allows you to adjust the priority of a given process. The default nice value of every process is 20. Using the renice command (either through Procmon or from the command line) causes the system to assign either a higher or a lower priority to a given process. When you do this, you change the priority value of a thread (which has a default value of 40) by changing the nice value of its process.

When you use the -l flag with ps, you will see your nice information (see Listing 1).

Listing 1. nice information
# ps -l
       F S  UID   PID  PPID   C PRI NI ADDR    SZ    WCHAN    TTY  TIME CMD
  200001 A    0 12972 45770   0  60 20 dea6   764           pts/1  0:00 ksh
  200001 A    0 33816 12972   3  61 20 36168   440           pts/1  0:00 ps
  240001 A  207 45770 40374   0  60 20 258ec   744           pts/1  0:00 ksh

Let's start a new ksh with nice, changing the priority of the process: # nice --10 ksh (see Listing 2).

When you look at the process table again, you'll see that the priority of this process has changed from its default, as has the priority of the child process that was forked from it (ps).

Listing 2. A new ksh using nice
# ps -l
       F S  UID   PID  PPID   C PRI NI ADDR    SZ    WCHAN    TTY  TIME CMD
  200001 A    0 12972 45770   0  60 20 dea6   764           pts/1  0:00 ksh
  200001 A    0 17246 12972   0  50 10 68a1f   748           pts/1  0:00 ksh
  200001 A    0 18450 17246   1  50 10 51bb1   380           pts/1  0:00 ps
  240001 A  207 45770 40374   0  60 20 258ec   744           pts/1  0:00 ksh
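
The PRI and NI columns in Listings 1 and 2 fit a simple pattern for threads that have used no recent CPU time (a C value of 0): the priority value is the default thread priority of 40 plus the nice value. The following is only a minimal sketch of that arithmetic, not the scheduler's full calculation. The function name idle_priority is hypothetical and used purely for illustration.

```shell
#!/bin/sh
# Sketch only: priority value of a thread with no recent CPU usage.
# BASE_PRIORITY=40 is the default thread priority quoted in this article.
BASE_PRIORITY=40

# idle_priority NICE -> prints base priority + nice value
# (hypothetical helper name, not an AIX command)
idle_priority() {
    echo $(( BASE_PRIORITY + $1 ))
}

idle_priority 20   # default nice value, as in the ksh rows of Listing 1
idle_priority 10   # nice value after "nice --10 ksh", as in Listing 2
```

The two calls print 60 and 50, which match the PRI column of the idle ksh rows in the two listings.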

You can also use the renice command (illustrated previously with Procmon in Figure 3) to dynamically reassign a priority to a running process.

Back to ps. For a more granular look at your threads, use the -m flag together with -o THREAD (see Listing 3).

Listing 3. Using the -mo flag for a more granular look at your threads
# ps -mo THREAD
    USER   PID  PPID    TID ST  CP PRI SC    WCHAN        F     TT BND COMMAND
    root 12800 45770      - A    0  60  1        -   200001  pts/1   - -ksh
       -     -     -  56759 S    0  60  1        -    10400      -   - -
    root 44648 12800      - A    1  60  1        -   200001  pts/1   - ps -mo THREAD
       -     -     -  64905 R    1  60  1        -        0      -   - -
kmilberg 45770 40374      - A    0  60  1        -   240001  pts/1   - -ksh
       -     -     -  54005 S    0  60  1        -    10400      -   - -
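
In this output, each process row is followed by one row per kernel thread, with dashes in the USER and PID columns and the thread ID in the TID column. As a rough sketch of how you might summarize that, the following filter counts thread rows per process. The function name threads_per_process is hypothetical, and the sample input is copied from Listing 3 so that the block runs on systems without the AIX ps command.

```shell
#!/bin/sh
# Sketch only: count kernel threads per process in "ps -mo THREAD" output.
# threads_per_process is a hypothetical helper name, not an AIX command.
threads_per_process() {
    awk '
        # Process row: USER is not "-" and PID is numeric (this skips the header).
        $1 != "-" && $2 ~ /^[0-9]+$/ { pid = $2; order[++n] = pid; count[pid] = 0; next }
        # Thread row: USER is "-" and the TID column (field 4) is numeric.
        $1 == "-" && $4 ~ /^[0-9]+$/ { count[pid]++ }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    '
}

# Sample rows taken from Listing 3.
threads_per_process <<'EOF'
    USER   PID  PPID    TID ST  CP PRI SC    WCHAN        F     TT BND COMMAND
    root 12800 45770      - A    0  60  1        -   200001  pts/1   - -ksh
       -     -     -  56759 S    0  60  1        -    10400      -   - -
kmilberg 45770 40374      - A    0  60  1        -   240001  pts/1   - -ksh
       -     -     -  54005 S    0  60  1        -    10400      -   - -
EOF
```

On an AIX system, the same filter could be fed live data with ps -mo THREAD | threads_per_process.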

Though most administrators usually use ps only when doing ps -ef, if you play around a bit more with its features, you will see that there is a lot more to ps than meets the eye.

Changing the priority of threads

Now that you know how to change the priority of processes, how do you do this with threads? This section shows how you can change some of the CPU scheduling parameters, which are used to calculate the priority value for each thread. You do this by using schedo (schedtune in AIX Version 5.2 and earlier).

First, let's make sure you have the filesets (see Listing 4).

Listing 4. Checking for the filesets
# lslpp -lI bos.perf.tune
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  bos.perf.tune             5.2.0.10  COMMITTED  Performance Tuning Support

Path: /etc/objrepos
  bos.perf.tune             5.2.0.10  COMMITTED  Performance Tuning Support

Now let's report back all the CPU parameters, as shown in Listing 5.

Listing 5. Reporting back all the CPU parameters
# schedo -a
              %usDelta = 100
          affinity_lim = 7
         big_tick_size = 1
      fixed_pri_global = 0
             force_grq = 0
idle_migration_barrier = 4
               maxspin = 16384
              pacefork = 10
               sched_D = 16
               sched_R = 16
             timeslice = 1
         v_exempt_secs = 2
         v_min_process = 2
           v_repage_hi = 0
         v_repage_proc = 4
            v_sec_wait = 1

Start with fixed_pri_global. The default setting is 0. When a CPU is ready to dispatch a thread, the global run queue is checked before any of the others. When the thread completes its time slice on the CPU, it gets put back on that CPU's run queue. This helps maintain processor affinity (I'll get to this in a little bit). To improve overall thread performance, there is an environment variable called RT_GRQ that you can set to ON. This automatically places the thread on the global run queue. All fixed priority threads will be placed on the global run queue if you change fixed_pri_global from its default of 0 to 1. You do this with: # schedo -o fixed_pri_global=1.

Let's get back to threads. The actual priority of a user process varies over time, depending on the amount of CPU time that the process has used most recently. The parameters that you need to look at are sched_R and sched_D: sched_R controls how heavily recent CPU usage weighs on a thread's priority, and sched_D controls how quickly that usage decays. Both values are expressed in units of 1/32 and each has a default value of 16. Further, when a thread is created, its CPU usage value is zero. The more time that it spends on the CPU, the more the usage increments. Essentially, the scheduler ages the usage with the following formula: CPU usage = CPU usage*(D/32).

In this instance, if the D parameter is set to 32, the thread usage does not decrease. The default value (16) allows the usage to decrease over time, giving the thread more time on the CPU.
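
To make the aging formula concrete, here is a small worked sketch in shell arithmetic. It applies CPU usage = CPU usage*(D/32) repeatedly to a starting value. The function name decay_cpu_usage, the starting usage of 100, and the use of truncating integer division are all assumptions made for illustration, not a description of the kernel's internals.

```shell
#!/bin/sh
# Sketch only: one aging step of the formula CPU usage = CPU usage * (D / 32).
# decay_cpu_usage USAGE D  (hypothetical helper; integer division truncates)
decay_cpu_usage() {
    echo $(( $1 * $2 / 32 ))
}

# Default sched_D of 16: usage halves at every step (100 -> 50 -> 25 -> 12).
usage=100
for step in 1 2 3; do
    usage=$(decay_cpu_usage "$usage" 16)
    echo "sched_D=16 step $step: $usage"
done

# sched_D of 32: the factor is 32/32, so the usage never decreases.
echo "sched_D=32 step 1: $(decay_cpu_usage 100 32)"
```

With the default value, half of the accumulated usage is forgiven at each aging step, which is why a thread that stops consuming CPU gradually regains a better priority.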

Each CPU has a dedicated run queue. A run queue is a list of runnable threads, sorted by thread priority value. There are 256 thread priorities (zero to 255). There is also an additional global run queue where new threads are placed.

The schedo command is more commonly used to change the length of the scheduler time slice. To change the time slice, use the schedo -o timeslice=value option. Increasing the time slice might improve system throughput, due to reduced context switching. Before changing this, make sure you run vmstat long enough to determine that there really is a considerable amount of context switching going on.
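
One way to judge context switching over a sampling period is to average the cs column of vmstat. The following is a minimal sketch of such a filter. The function name avg_context_switches is hypothetical, and the sample rows are made-up values used only so that the block runs anywhere; they are not measurements from a real system.

```shell
#!/bin/sh
# Sketch only: average the "cs" (context switches) column of vmstat output.
# avg_context_switches is a hypothetical helper name.
avg_context_switches() {
    awk '
        col == 0 {                               # header row not found yet
            for (i = 1; i <= NF; i++) if ($i == "cs") col = i
            next
        }
        $col ~ /^[0-9]+$/ { sum += $col; n++ }  # numeric data rows only
        END { if (n) printf "%d\n", sum / n; else print "no data" }
    '
}

# Made-up sample in the vmstat column layout (illustrative values only).
avg_context_switches <<'EOF'
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 1  0 90000 12000   0   0   0   0    0   0 110  500 200  1  1 98  0
 1  0 90000 12000   0   0   0   0    0   0 120  600 300  2  1 97  0
 2  0 90000 12000   0   0   0   0    0   0 130  700 400  3  2 95  0
EOF
```

Against a live system you might run something like vmstat 2 30 | avg_context_switches. What counts as "considerable" depends on your workload, so compare the result with a baseline taken during normal operation.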

CPU binding

In this section, I introduce the topic of CPU binding, which restricts a process to running on a specific processor. The general term for this is processor affinity. Processor affinity has many uses, some of which even apply to debugging. For example, you can bind threads to a given processor to find the root cause of a hanging program. More generally, it is used to spread the workload across the processors of an SMP box. The command that you use is bindprocessor. Assuming that simultaneous multithreading (SMT) is enabled (it is by default), each hardware thread of a physical processor is listed as a separate processor when you run the bindprocessor command. On POWER5 chips, there are two hardware threads on each processor. With shared processor logical partitions (LPARs), this command binds to virtual CPUs, so you must be very careful because it could cause problems for applications that are predisposed to run on a specific CPU. Let's first check to see if SMT is enabled (see Listing 6).

Listing 6. Checking to see if SMT is enabled
# smtctl

SMT is currently enabled.

Listing 7 shows the output of a two-way box with SMT enabled.

Listing 7. Output of a two-way box with SMT enabled
# bindprocessor -q
The available processors are:  0 1 2 3

If you want to bind a process to a particular CPU, it's as simple as this:

# bindprocessor 12741 2
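
Before choosing a processor number, it can help to see how many logical processors bindprocessor -q reports. The following is a small sketch that counts them from the command's output. The function name count_logical_cpus is hypothetical, and the sample line is the one shown in Listing 7.

```shell
#!/bin/sh
# Sketch only: count the logical processors listed by "bindprocessor -q".
# count_logical_cpus is a hypothetical helper name, not an AIX command.
count_logical_cpus() {
    awk -F: '/available processors/ {
        # Everything after the colon is a whitespace-separated list of CPU numbers.
        n = split($2, cpu, " ")
        print n
    }'
}

# Sample line from Listing 7 (two-way box with SMT enabled).
logical=$(echo "The available processors are:  0 1 2 3" | count_logical_cpus)
echo "logical processors: $logical"

# With two hardware threads per processor, as on POWER5 with SMT enabled:
echo "physical processors: $(( logical / 2 ))"

# On AIX you could feed it live output:  bindprocessor -q | count_logical_cpus
# To remove a binding later:             bindprocessor -u 12741
```

For the Listing 7 line, this reports four logical processors and two physical processors, which matches the two-way box described above.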

Processor affinity also occurs naturally. When a thread is running on a CPU and gets interrupted, it usually gets placed back on the same CPU because the processor's cache might still have lines belonging to the thread. If it were to get dispatched to a different CPU, it might have to get information from RAM, which would slow down the processing time dramatically.

You can also bind threads programmatically using the bindprocessor subroutine, though I would be very cautious when attempting to do so. When applied to a process, it binds all kernel threads in that process to a processor, which forces these threads to run on that specific processor until they are unbound.

Another important command used in programming is gprof. The gprof command produces an execution profile of your compiled programs, whether written in C, Pascal, FORTRAN, or even COBOL. It reports on the flow of control through all the subroutines of your program and provides you with the amount of CPU time consumed by each subroutine, which is very useful when troubleshooting how processes consume CPU resources. The profile data is taken from the call graph profile file (gmon.out by default). So what's different in AIX Version 5.3? AIX Version 5.3 lets you give the profiling output files a user-specified name by setting special environment variables, and it adds profiling support for threads along with options that affect the type of profiling data that is collected.

Summary

In this article, I've discussed the importance of controlling thread usage and CPU binding. You've looked at the key tools and utilities used to analyze threads and administer your processes. Further, you've tuned your kernel using schedo, learned about processor affinity, and seen how to bind processes to CPUs. This three-part series on CPU monitoring first introduced the overall concepts of tuning, then went into monitoring and data collection, and concluded with systems tuning and administration. While most of you might be more familiar with tuning memory subsystems, I hope this series illustrated the importance of CPU monitoring and tuning.
