About this series
This three-part series focuses on the various aspects of Central Processing Unit (CPU) performance and monitoring. The first installment of the series provides an overview of how to efficiently monitor your CPU, discusses the methodology for performance tuning, and gives considerations that can impact performance, either positively or negatively. Though the first part of the series goes through some commands, the second installment focuses much more on the detail of actual CPU systems monitoring and analyzing trends and results. The third installment focuses on proactively controlling thread usage and other ways to tune your CPU to maximize performance. Throughout this series, I'll also expound on various best practices of AIX® CPU performance tuning and monitoring.
This article covers threads, processes, and CPU binding. It also discusses how to use several of the tools illustrated in prior installments to make changes to your systems. The most important commands used to tune the CPU scheduler and the various methods of binding threads that are available on AIX Version 5.3 are also covered.
A junior administrator might consider process management nothing more than monitoring active processes and possibly killing runaway or zombie processes. You'll find out that there is a lot more to process management than using the kill command or nice. The fundamental question that needs to be answered before moving forward is how processes relate to threads. The answer is surprisingly simple. The process is the entity that AIX uses to control the use of system resources, while the threads control the actual time consumption, as each kernel thread is a single sequential flow of control. Each process is made up of one or more threads. Controlling thread usage is where you can make a difference. To do this, you need to understand the tools that allow you to work with threads to improve your CPU performance, which is the scope of this final part of the series.
In this section, I discuss the tools and commands that are available to help you monitor and analyze thread usage. While AIX Version 4 introduced the usage of threads to control processor time consumption, it was in AIX 5L™ where system management tools really evolved to help you monitor and analyze the thread usage. One such tool is procmon, which was introduced in AIX Version 5.3.
Procmon displays a list of processes (changing dynamically while your system changes) that enables you to gather information about what is running on your system. Where it really stands out compared to other monitoring tools is that it allows you to run commands to facilitate process and thread management. Some of the critical information that it gathers with respect to performance tuning includes:
- The actual amount of CPU time the process is using
- The amount of memory and I/O that the process is using
- The nice values of the process and their priorities
You can even kill jobs and renice them on the fly. Figure 1 gives a nice graphical representation of overall performance. To launch the Performance Workbench Platform, use:
Figure 1. Procmon partition performance tab
There is also a process table view, which can actually show you a list of threads in a sorted table. You just select Show threads metrics (see Figure 2).
Figure 2. Procmon processes tab
Other menus allow you to either kill processes or renice them (see Figure 3).
Figure 3. Procmon processes tab
So what exactly is nice? The nice command allows you to adjust the priority of a given process. Every process has a default nice value of 20. Using the renice command (either through Procmon or from the command line) causes the system to assign either a higher or lower priority to a given process. When you do this, you actually change the priority value of a thread (default value of 40) by changing the nice value of its process.
When you use the -l flag with ps, you will see your nice information (see Listing 1).
Listing 1. nice information
# ps -l
       F S UID   PID  PPID C PRI NI  ADDR  SZ WCHAN   TTY TIME CMD
  200001 A   0 12972 45770 0  60 20  dea6 764       pts/1 0:00 ksh
  200001 A   0 33816 12972 3  61 20 36168 440       pts/1 0:00 ps
  240001 A 207 45770 40374 0  60 20 258ec 744       pts/1 0:00 ksh
Let's start a new ksh using nice, changing the priority of the shell:
# nice --10 ksh
When you look at the process table again (see Listing 2), you'll see that the priority of this process has changed from its default, as has the priority of the child process that was forked from it.
Listing 2. A new ksh using nice
# ps -l
       F S UID   PID  PPID C PRI NI  ADDR  SZ WCHAN   TTY TIME CMD
  200001 A   0 12972 45770 0  60 20  dea6 764       pts/1 0:00 ksh
  200001 A   0 17246 12972 0  50 10 68a1f 748       pts/1 0:00 ksh
  200001 A   0 18450 17246 1  50 10 51bb1 380       pts/1 0:00 ps
  240001 A 207 45770 40374 0  60 20 258ec 744       pts/1 0:00 ksh
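To compare nice values across several processes without reading the full listing, the PRI and NI columns can be pulled out of ps -l output. The following is a minimal sketch rather than a standard tool: show_nice is a hypothetical helper name, the sample rows are copied from Listing 2, and the column lookup assumes the header line matches the one shown in the listings.

```shell
# show_nice: read `ps -l`-style output on stdin and print PID, PRI and NI.
# Hypothetical helper; columns are located by header name, not by position.
show_nice() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        { print $col["PID"], $col["PRI"], $col["NI"] }
    '
}

# Sample rows taken from Listing 2; on a live system use: ps -l | show_nice
show_nice <<'EOF'
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
200001 A 0 12972 45770 0 60 20 dea6 764 pts/1 0:00 ksh
200001 A 0 17246 12972 0 50 10 68a1f 748 pts/1 0:00 ksh
EOF
```

In the sample, the second shell (PID 17246) shows both PRI and NI lowered by 10 relative to its parent, which is the effect of nice --10.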
You can also use the renice command (illustrated previously with Procmon in Figure 3) to dynamically reassign a priority to a running process. If you want a more granular look at your threads with ps, you would use the -mo flag (see Listing 3).
Listing 3. Using the -mo flag for a more granular look at your threads
# ps -mo THREAD
    USER   PID  PPID   TID ST CP PRI SC WCHAN      F    TT BND COMMAND
    root 12800 45770     -  A  0  60  1     - 200001 pts/1   - -ksh
       -     -     - 56759  S  0  60  1     -  10400     -   - -
    root 44648 12800     -  A  1  60  1     - 200001 pts/1   - ps -mo THREAD
       -     -     - 64905  R  1  60  1     -      0     -   - -
kmilberg 45770 40374     -  A  0  60  1     - 240001 pts/1   - -ksh
       -     -     - 54005  S  0  60  1     -  10400     -   - -
Though most administrators usually use ps only when doing ps -ef, if you play around a bit more with its features, you will see that there is a lot more to ps than meets the eye.
Changing the priority of threads
Now that you know how to change the priority of processes, how do you do this with threads? This section shows how you can change some of the CPU scheduling parameters, which are used to calculate the priority value for each thread. You do this by using schedo (schedtune in AIX Version 5.2 and earlier).
First, let's make sure you have the filesets (see Listing 4).
Listing 4. Checking for the filesets
# lslpp -lI bos.perf.tune
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  bos.perf.tune       184.108.40.206  COMMITTED  Performance Tuning Support

Path: /etc/objrepos
  bos.perf.tune       220.127.116.11  COMMITTED  Performance Tuning Support
Now let's report back all the CPU parameters, as shown in Listing 5.
Listing 5. Reporting back all the CPU parameters
# schedo -a
              %usDelta = 100
          affinity_lim = 7
         big_tick_size = 1
      fixed_pri_global = 0
             force_grq = 0
idle_migration_barrier = 4
               maxspin = 16384
              pacefork = 10
               sched_D = 16
               sched_R = 16
             timeslice = 1
         v_exempt_secs = 2
         v_min_process = 2
           v_repage_hi = 0
         v_repage_proc = 4
            v_sec_wait = 1
One parameter worth noting is fixed_pri_global. The default setting is 0. When a CPU is ready to dispatch a thread, the global run queue is checked before any of the others. When the thread completes its running slice on the CPU, it gets put back on that CPU's run queue. This helps maintain processor affinity (I'll get to this in a little bit). To improve overall thread performance, there is an environment variable called RT_GRQ that you can set to ON. This automatically places the thread on the global run queue. All fixed priority threads will be placed on the global run queue if you change the default from 0 to 1. You do this with:
# schedo -o fixed_pri_global=1
Let's get back to threads. The actual priority of a user process varies over time, depending on the amount of CPU time that the process has used most recently. The parameters that you need to look at are sched_R and sched_D. The values for both are expressed in units of 1/32, and each has a default value of 16. Further, when a thread is created, its CPU usage value is zero. The more time that it spends on the CPU, the more the usage increments. Essentially, the scheduler ages the usage using the following formula:
CPU usage = CPU usage*(D/32).
In this instance, if the D parameter is set to 32, the thread usage does not decrease. The default value (16) allows the usage to decrease over time, giving the thread more time on the CPU.
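To make the effect of D concrete, the aging formula can be applied repeatedly to a starting usage value. This is only an illustrative sketch using integer arithmetic: age_cpu_usage is a hypothetical helper, the starting value of 100 is arbitrary, and the real scheduler's bookkeeping involves more than this single formula.

```shell
# age_cpu_usage USAGE D: apply the aging formula usage * (D / 32) once.
# Hypothetical helper; integer arithmetic, so fractions are truncated.
age_cpu_usage() {
    echo $(( $1 * $2 / 32 ))
}

usage=100                      # arbitrary starting value for illustration
for pass in 1 2 3; do
    usage=$(age_cpu_usage "$usage" 16)
    echo "pass $pass: usage=$usage"
done
```

With D set to 16, the value is halved on every pass. Calling age_cpu_usage with D set to 32 returns the input unchanged, which matches the case where usage never decreases.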
Each CPU has a dedicated run queue. A run queue is a list of runnable threads, sorted by thread priority value. There are 256 thread priorities (zero to 255). There is also an additional global run queue where new threads are placed.
Schedo is more commonly used to change the length of the scheduler time slice. To change the time slice, use the schedo -o timeslice=value option. Increasing the time slice might improve system throughput, due to reduced context switching. Before changing this, make sure you run vmstat enough to determine that there really is a considerable amount of context switching going on.
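One way to judge whether context switching is high is to average the cs column over several vmstat samples. The sketch below is a hypothetical helper, not a standard tool: the name avg_cs is made up for illustration, and it assumes that the vmstat column header contains a field named cs.

```shell
# avg_cs: read vmstat output on stdin and print the mean of the "cs" column.
# Hypothetical helper; the column is found by scanning for the "cs" header.
avg_cs() {
    awk '
        !c { for (i = 1; i <= NF; i++) if ($i == "cs") c = i; next }
        $c ~ /^[0-9]+$/ { sum += $c; n++ }
        END { if (n) printf "%d\n", sum / n }
    '
}

# Typical use: five samples, two seconds apart.
# vmstat 2 5 | avg_cs
```

The average covers every sample line that vmstat prints, so taking the same measurement before and after a timeslice change gives a like-for-like comparison.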
In this section, I introduce the topic of CPU binding, which restricts a process to running on a specific processor. The general term for this is processor affinity. Processor affinity has many purposes, some of which are even used during debugging. For example, you can bind threads to a given processor to find the root cause of a hanging program. It is generally used when trying to spread the workload across your system, in an SMP box, for example. The command that you use is the bindprocessor command. Assuming that simultaneous multithreading (SMT) is enabled (it is by default), each hardware thread of the physical processor is listed as a separate processor when running the bindprocessor command. On POWER5 chips, there are two hardware threads on each processor. With shared processor logical partitions (LPARs), using this command binds to virtual CPUs, so you must be very careful because it could cause problems for applications that are predisposed to run on a specific CPU. Let's first check to see if SMT is enabled (see Listing 6).
Listing 6. Checking to see if SMT is enabled
# smtctl
SMT is currently enabled.
Listing 7 shows the output of a two-way box with SMT enabled.
Listing 7. Output of a two-way box with SMT enabled
# bindprocessor -q
The available processors are: 0 1 2 3
If you want to bind a process to a particular CPU, it's as simple as this:
# bindprocessor 12741 2
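Before binding, it can be worth confirming that the target processor number actually appears in the bindprocessor -q list. The following sketch shows one way to do that check. cpu_is_available is a hypothetical helper, and it assumes the output has the form shown in Listing 7, with the processor numbers following a colon.

```shell
# cpu_is_available CPU: read `bindprocessor -q` output on stdin and succeed
# only if CPU appears in the list that follows the colon. Hypothetical helper.
cpu_is_available() {
    awk -v want="$1" '
        { sub(/^.*:/, "")
          for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
        END { exit !found }
    '
}

# Demonstration with the text from Listing 7.
if echo "The available processors are: 0 1 2 3" | cpu_is_available 2; then
    echo "CPU 2 can be used"
fi

# On AIX this could guard the bind shown above:
# bindprocessor -q | cpu_is_available 2 && bindprocessor 12741 2
```

A binding made this way can later be removed with bindprocessor -u followed by the process ID.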
Processor affinity also occurs naturally. When a thread is running on a CPU and gets interrupted, it usually gets placed back on the same CPU because the processor's cache might still have lines belonging to the thread. If it were to get dispatched to a different CPU, it might have to get information from RAM, which would slow down the processing time dramatically.
You can also bind threads using subroutines, though I would be very cautious when attempting to do so. Binding a process this way binds all of its kernel threads to a processor, which forces these threads to run on that specific processor until they are unbound.
Another important thread command used in programming is gprof. The gprof command produces an execution profile of your compiled programs, whether in C, Pascal, FORTRAN, or even COBOL. gprof reports on your flow of control through all the subroutines of your program and provides you with the amount of CPU time consumed by each subroutine. This is very useful when troubleshooting how processes consume CPU resources. You use gprof to profile your program and determine which functions are using the CPU. The profile data is taken from the call graph profile file (gmon.out by default). So what's different in AIX Version 5.3? AIX Version 5.3 allows the profiling output files to have a user-specified name, set through special environment variables. It also adds profiling support for threads, along with options that affect the type of profiling data that is collected.
In this article, I've discussed the importance of controlling thread usage and CPU binding. You've looked at the key tools and utilities used to analyze threads and administer your processes. Further, you've tuned your kernel using schedo, learned all about processor affinity, and figured out how to bind processes to CPUs. This three-part series on CPU monitoring first introduced the overall concepts of tuning, then went into monitoring and data collection, and concluded with systems tuning and administration. While most of you might be more familiar with tuning memory subsystems, I hope this series illustrated the importance of CPU monitoring and tuning.