The most important part of tuning your memory subsystem does not involve actual tuning. Before tuning your system, you must have a strong understanding of what is actually going on in the host system. To do that, an AIX® administrator must know which tools to use and how to analyze the data that he or she captures. To reiterate what I discussed in other recently published tuning documents (see Resources), you cannot properly tune a system without first monitoring the host, whether it's running as a logical partition (LPAR) or on its own physical server. There are many commands that allow you to capture and analyze data, so you'll need to understand what they are and which ones are most suitable for the intended job. After you capture your data, you need to analyze the results. What initially looks like a Central Processing Unit (CPU) problem can turn out to be a memory or I/O problem, provided you use the right tools to capture data and understand how to do the analysis. Only when this is properly done can you really consider making changes in your system. Just as a medical doctor cannot treat an illness without knowledge of your history and the symptoms you are experiencing, you also need to come up with a diagnosis before tuning your subsystems. Tuning your memory subsystem when you have a CPU or I/O bottleneck will not help you, and it might even hurt the health of the host.
This article helps you understand the importance of getting the diagnosis correct as well. You will see that performance tuning is much more than actual tuning itself. Some of the tools you will be looking at are generic monitoring tools that are available on all flavors of UNIX, while others were written specifically for AIX. I will point out some of the tools that have been optimized for AIX Version 5.3 and the new ones developed specifically for AIX 5.3 systems.
I can't reiterate enough the importance of generating baseline data. The time to be monitoring your system is not when you get that ticket from the Help Desk complaining about poor performance. Data should be captured on your servers as soon as they are put into production. If you do this, you can be proactive in your tuning, with the objective of actually finding the problem before the user points it out to you. How can you determine whether the data you are looking at substantiates a performance issue without having data from when performance on the box was acceptable? This is all part of appropriate performance tuning methodology: capturing data effectively and properly analyzing the results and the trends. Let's get on with it.
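One minimal way to start building a baseline is to timestamp the output of a sampling command and append it to a dated log file. The sketch below assumes a POSIX shell. The function name stamp_lines and the log directory /var/perf/baseline are hypothetical names chosen for illustration, not AIX conventions.

```shell
# stamp_lines: prefix every line read from stdin with the current date and time.
# The function name is illustrative; it is not a standard utility.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Possible use (not run here): sample once a minute for a day and append the
# result to a per-day log. The directory is an assumed location.
#   vmstat 60 1440 | stamp_lines >> /var/perf/baseline/vmstat.$(date +%Y%m%d).log

# Small demonstration on canned input so the function can be tried anywhere.
printf 'sample one\nsample two\n' | stamp_lines
```

Keep in mind that a timestamp added this way records when a line arrives on the pipe. If the sampling command buffers its output, that can be later than the moment the sample was taken.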
UNIX generic memory monitoring
In this section, I provide an overview of generic UNIX tools available on all
UNIX distributions: ps, sar, and vmstat. Most of these tools allow
you to quickly troubleshoot a performance problem, but they are not really geared
for historical trending and analysis.
Most administrators tend to shy away from ever using the
ps command to troubleshoot a possible memory
bottleneck. In fact, I would add that many UNIX administrators don't even know
that you can use
ps to help you determine the cause of
a memory problem. The most commonly used function of
ps is to look at the processes running on your systems (see
Listing 1).
Listing 1. Using ps to look at the processes running on your system
# ps -ef | more
     UID   PID  PPID   C    STIME    TTY  TIME CMD
    root     1     0   0   May 03      -  0:03 /etc/init
    root 11244 19154   0                  0:00 <defunct>
    root 11384     1   0   May 03      -  0:00 /usr/lib/errdemon
    root 12434 16618   0   May 03      -  0:29 /usr/opt/ifor/bin/i4llmd -b -n wc clwts -l /var/ifor/llmlg
    root 13218 16618   0   May 03      -  0:00 /usr/sbin/rsct/bin/IBM.AuditRMd
    root 13440     1   0   May 03      -  0:00 /usr/ccs/bin/shlap
    root 13690 13954   0   May 03      -  0:00 dtlogin <:0> -daemon
    root 13954     1   0   May 03      -  0:00 /usr/dt/bin/dtlogin -daemon
As you can see, there is not much here that can help you determine a memory
bottleneck. The command in Listing 2 shows you the memory
usage for each active process running on your system, sorted in a nice format.
It invokes ps the old-fashioned Berkeley Software
Distribution (BSD) way, without the dash. What I like about this command is that
you don't have to call up any GUI-type tools to quickly get a sense of what is
going on from a memory perspective (see Listing 2).
Listing 2. Memory usage for each active process
# ps gv | head -n 1; ps gv | egrep -v "RSS" | sort +6b -7 -n -r
   PID  TTY STAT  TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
 15256    - A    64:15  755  2572  2888    xx  2356   316  0.9  0.0 /usr/lpp/
 22752    - A     0:08  261  1960  1980 32768   465    20  0.0  0.0 dtwm
 14654    - A     0:00  324  1932  1932    xx   198     0  0.0  0.0 /usr/sbin
 20700    - A     0:07  271  1868  1896 32768    95    28  0.0  0.0 /usr/dt/b
 20444    - A     0:03  203  1736  1824 32768   551    88  0.0  0.0 dtfile
 17602    - A     0:00  274   948  1644 32768   817   696  0.0  0.0 sendmail:
 13218    - A     0:00   74  1620  1620    xx   116     0  0.0  0.0 /usr/sbin
Let's briefly identify what some of this information means.
- RSS—The amount of RAM used for the text and data segments per process. PID 15256 is using 2888k.
- %MEM—The actual amount of the RSS / Total RAM. Watch for processes that consume 40-70 percent of %MEM.
- TRS—The amount of RAM used for the text segment of a process in kilobytes.
- SIZE—The actual amount of paging space allocated for this process (text and data).
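To make the relationship between RSS and %MEM concrete, here is a minimal sketch that recomputes each process's share of RAM from the RSS column. It assumes the column order shown in Listing 2 (PID in field 1, RSS in field 7). The function name rss_report is hypothetical, and the 4194304 KB (4096MB) total is an assumed RAM size used only for illustration.

```shell
# rss_report TOTAL_KB: read "ps gv"-style lines on stdin and print, for each
# process, the PID, the RSS in KB, and RSS as a percentage of TOTAL_KB.
# Assumes the column order of Listing 2 (PID is field 1, RSS is field 7).
rss_report() {
    awk -v total="$1" '
        $1 ~ /^[0-9]+$/ {                 # skip the header line
            printf "%s %d %.2f\n", $1, $7, ($7 / total) * 100
        }'
}

# Demonstration on three rows copied from Listing 2.
rss_report 4194304 <<'EOF'
PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS %CPU %MEM COMMAND
15256 - A 64:15 755 2572 2888 xx 2356 316 0.9 0.0 /usr/lpp/
22752 - A 0:08 261 1960 1980 32768 465 20 0.0 0.0 dtwm
17602 - A 0:00 274 948 1644 32768 817 696 0.0 0.0 sendmail:
EOF
```

With the assumed 4096MB of RAM, the three rows come out at 0.07, 0.05, and 0.04 percent, which is why none of these processes registers in the %MEM column. On a live system you would pipe ps gv into the function instead of using canned rows.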
While this command provides a lot of useful information, I don't usually start
with this unless one of my trusted administrators has already diagnosed that there
is a memory issue of some kind on the system. You should start with the old
vmstat. You should actually use
vmstat to identify the cause of your bottleneck, even
before you have determined that it might be memory related.
vmstat reports back information about kernel threads,
CPU activity, virtual memory, paging, blocked I/O disks, and related information
(see Listing 3). For me, it's the quickest and dirtiest way
of finding out what is going on.
Listing 3. Using vmstat to identify the cause of a bottleneck
# vmstat 1 4

System Configuration: lcpu=4 mem=4096MB

kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b    avm  fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 1  2 136583  127   0   4  57  44   92   0 345 2223 605 30 40 29  1
 2  7 136587  118   0   2 230   0  245   0 329 3451 526 20 37 10 33
 1  6 136587  157   0   3  67   0  678   0 334 3304 536 25 32 20 23
 3  8 136587  111   0   5  61   0  693   0 329 3341 511 19 26 35 20
Let's first define what the columns mean:
- avm—The amount of active virtual memory (in 4k pages) you are using, not including file pages.
- fre—The size of your memory free list. In most cases, I don't
worry when this is small, as AIX loves using every last drop of memory and does
not return it as fast as you might like. The lower threshold for the free list is
set by the minfree parameter of the
vmo command. At the end of the day, the paging information is more important.
- pi—Pages paged in from the paging space.
- po—Pages paged out to the paging space.
CPU and I/O:
- r—The average number of runnable kernel threads over the timing interval you have specified.
- b—The average number of kernel threads that are in the virtual memory waiting queue over your timing interval. If r is not higher than b, that is usually a symptom of a CPU problem, which could be caused by either an I/O or memory bottleneck.
- us—User time.
- sy—System time.
- id—Idle time.
- wa—Waiting on I/O.
Let's return to the
vmstat output and what is wrong
with your system. First a disclaimer: Please do not go to senior management with a
detailed analysis and recommended tuning strategy based on a five-second
vmstat output. You have to work a little harder before
you can properly diagnose the ills of your system. You should use
vmstat when you have a production performance issue and
need to know as soon as possible what is going on in your system so that you can
either alert people of what the problem might be or take immediate action if it is
possible and appropriate.
Now then, back to the output. What is going on? Several things, actually. On
first glance, you might think you have a CPU bottleneck, as the system is
definitely working hard and there is little idle time. As you look at things more
carefully though, you'll see that while the CPU might be breathing heavy, there
are other things going on—for instance, paging. There is a lot of
paging out going on (po), which usually occurs when you are short of real memory.
In the output, even your free list (fre) has dropped dangerously low, probably
below the threshold for minfree, which you set using
vmo. What about the I/O? When you are seeing blocked
processes or high values on waiting on I/O (wa), it usually signifies either real
I/O issues where you are waiting for file accesses or an I/O condition associated
with paging due to a lack of memory on your system. In this case, it seems to be
the latter. You are having VMM issues, which seem to be causing blocked processes
and the waiting on I/O condition. You might benefit by either tuning your memory
parameters or possibly doing a dynamic LPAR (DLPAR) operation and adding more RAM
to your LPAR.
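The reading above can be backed with a few numbers. The following sketch, assuming the 17-column vmstat layout of Listing 3, averages pi, po, and wa over the samples, records the lowest fre value, and counts the intervals in which blocked threads (b) outnumber runnable threads (r). The function name vmstat_summary is hypothetical.

```shell
# vmstat_summary: read vmstat output on stdin and summarize the data rows.
# Assumes the 17-column layout of Listing 3:
#   r b avm fre re pi po fr sr cy in sy cs us sy id wa
vmstat_summary() {
    awk '
        NF == 17 && $1 ~ /^[0-9]+$/ {     # data rows only; header lines are skipped
            n++
            pi += $6; po += $7; wa += $17
            if (($2 + 0) > ($1 + 0)) blocked++   # more blocked than runnable threads
            if (n == 1 || ($4 + 0) < minfre) minfre = $4 + 0
        }
        END {
            if (n == 0) { print "no samples"; exit }
            printf "samples=%d avg_pi=%.2f avg_po=%.2f avg_wa=%.2f min_fre=%d b_gt_r=%d\n",
                   n, pi / n, po / n, wa / n, minfre, blocked
        }'
}

# Demonstration on the four data rows of Listing 3.
vmstat_summary <<'EOF'
 r  b    avm  fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 1  2 136583  127   0   4  57  44   92   0 345 2223 605 30 40 29  1
 2  7 136587  118   0   2 230   0  245   0 329 3451 526 20 37 10 33
 1  6 136587  157   0   3  67   0  678   0 334 3304 536 25 32 20 23
 3  8 136587  111   0   5  61   0  693   0 329 3341 511 19 26 35 20
EOF
```

For the Listing 3 samples this prints samples=4 avg_pi=3.50 avg_po=103.75 avg_wa=19.25 min_fre=111 b_gt_r=4. Page-outs far exceed page-ins, and blocked threads outnumber runnable ones in every interval, which is consistent with the memory-driven diagnosis. Four samples are still far too few to act on.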
Let's drill down deeper. You can use the
ps command
that you looked at earlier to try to identify the offending processes. What I'd
like to do at this point is run a
sar to see if the
condition continues to show with another tool. It is a good idea to use multiple
tools to further help with the diagnosis to make sure it is right.
While I don't like
sar as much as other tools (you
need too many flags and have to enter too many commands prior to diagnosing a
problem), it allows you to collect data in real time and to view data that was
previously captured (using
sadc). Most of the older
tools allow you to do one or the other.
sar has been
around for almost as long as UNIX itself and everyone has used it at one time or
another. Use of the
-r flag provides some VMM
information (see Listing 4).
Listing 4. Using sar with the -r flag to obtain VMM information
# sar -r 1 5

System Configuration: lcpu=4 mem=4096MB

06:18:05    slots cycle/s fault/s  odio/s
06:18:06  1048052    0.00  387.25    0.00
06:18:07  1048052    0.00  112.97    0.00
06:18:08  1048052    0.00   45.00   79.21
06:18:09  1048052    0.00  216.00    0.00
06:18:10  1048052    0.00    8.00    0.00

Average   1048052       0      79      16
So what does this actually mean?
- cycle/s—Reports back the number of page replacement cycles per second.
- fault/s—Provides the number of page faults per second.
- Slots—Provides the number of free pages on the paging spaces.
- odio/s—Provides the number of non paging disk I/Os per second.
You're seeing a lot of page faults per second here, but not much else. You're also seeing that there are 1048052 4k pages available on your paging space, which comes out to 4GB. Time to drill down further using more specific AIX tools.
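The 4GB figure is simple arithmetic: each slot is a 4k page, so the slot count times 4 gives kilobytes, and dividing by 1048576 gives gigabytes. A minimal sketch, using a hypothetical helper named pages_to_gb:

```shell
# pages_to_gb PAGES: convert a count of 4 KB pages to gigabytes
# (1 GB = 1048576 KB). The helper name is illustrative.
pages_to_gb() {
    awk -v pages="$1" 'BEGIN { printf "%.2f\n", pages * 4 / 1048576 }'
}

pages_to_gb 1048052    # free paging-space slots reported by sar -r in Listing 4
```

For the 1048052 slots in Listing 4, this prints 4.00.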
Specific AIX memory monitoring
In this section, I provide an overview of the specific AIX tools available to
you: svmon, topas, procmon, and nmon. Most of these tools allow you
to both quickly troubleshoot a performance problem and capture data for historical
trending and analysis.
svmon is an analysis utility. It is used specifically for the VMM. It provides a
lot of information, including real, virtual, and paging space memory used. The
-G flag gives you a global view for memory utilization
on your host (see Listing 5).
Listing 5. Using svmon with the -G flag
# svmon -G
               size      inuse       free        pin    virtual
memory      1048576    1048416        160      79327     137750
pg space    1048576        524

               work       pers       clnt      lpage
pin           79327          0          0          0
in use       137764     910652          0          0
The size column reports the total size of RAM in 4k pages. The inuse column reports the pages in RAM used by processes, plus the number of persistent pages that belonged to a terminated process and are still resident in RAM. The free column reports the number of pages on the free list. The pin column reports the number of pages pinned in physical memory (RAM); these pages cannot be paged out.
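Page counts are easier to judge as percentages. The sketch below reads svmon -G output and reports the RAM size in MB, the share of RAM in use, and how that use splits between working (work) and persistent (pers) pages. It assumes the layout of Listing 5 and 4k pages. The function name svmon_summary is hypothetical.

```shell
# svmon_summary: read "svmon -G" output on stdin and report RAM size in MB,
# the share of RAM in use, and the split between working and persistent pages.
# Assumes the layout of Listing 5 and a 4 KB page size.
svmon_summary() {
    awk '
        $1 == "memory"            { size = $2 + 0; inuse = $3 + 0 }
        $1 == "in" && $2 == "use" { work = $3 + 0; pers = $4 + 0 }
        END {
            if (size == 0) { print "no memory line found"; exit }
            printf "ram_mb=%d inuse_pct=%.2f work_pct=%.2f pers_pct=%.2f\n",
                   size * 4 / 1024, inuse / size * 100,
                   work / size * 100, pers / size * 100
        }'
}

# Demonstration on the output shown in Listing 5.
svmon_summary <<'EOF'
               size      inuse       free        pin    virtual
memory      1048576    1048416        160      79327     137750
pg space    1048576        524

               work       pers       clnt      lpage
pin           79327          0          0          0
in use       137764     910652          0          0
EOF
```

For Listing 5 this prints ram_mb=4096 inuse_pct=99.98 work_pct=13.14 pers_pct=86.85. In this sample, persistent pages occupy about 87 percent of RAM, while working pages occupy about 13 percent.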
The paging space (pg space) line reports the actual use of paging space (in 4k
pages). It's important to make the distinction between this and what is reported
by vmstat. The vmstat avm column shows ALL the
virtual memory that is accessed, even if it is not paged out. I also like to look
at the working and persistent numbers. These values show the number of both
the working and persistent pages in RAM. Why is this important? As you might
remember, in Part 1 I discussed some of the differences between working and
persistent storage. Computational memory is used while your processes are working
on actual computation. These processes use working segments, which are temporary
(transitory) and only exist until the process terminates or the page is stolen. File
memory uses persistent segments and has an actual permanent storage location on the
disk. Data files or executable programs are mapped to persistent segments rather
than working segments. Given the alternative, you would much rather have file
memory paged to disk than computational memory. In this situation, computational
memory is unfortunately paged out more than file memory. Perhaps a little tuning
of the vmo parameters might help shift the balance in
your favor. Another useful feature of svmon is that you can display memory
statistics for a given process. Listing 6 provides an example.
Listing 6. Using svmon to display memory statistics for a given process
# svmon -P | grep -p 15256
-------------------------------------------------------------------------------
     Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd LPage
   15256 X                12102     3221        0    12022      N     N     N
From here you can determine that this process is not using paging space. Using
the ps command I discussed earlier, in conjunction with
svmon, positions you to find the offending memory resource hog.
Let's use something a little more user friendly:
topas. topas is a
nice little performance monitoring tool that can be used for a number of purposes
(see Figure 1).
Figure 1. The topas tool
As you can see, running
topas gives you a list of your process information, CPU,
I/O, and VMM activity. From this view you can see that there is very little paging
space used on the system. I like to use this command for quickly troubleshooting
an issue, especially when I want a little more than
vmstat on my screen. I see
topas as a graphical type of
vmstat. With recent improvements, it can now
capture data for historical analysis.
What about procmon? First released in AIX Version 5.3,
it not only provides overall CPU performance statistics, but it also allows you to
take action on the actual running processes. You might already know that you can
act on a process on the fly, but I bet you didn't know that you can drill down into the
svmon utility from procmon.
Though I would say this is more of a tool people use for CPU analysis, there are
also nice hooks into
svmon that can help you in a
pinch. This view sets options for using the
svmon utility from
procmon, which allows you to pull your
information in a nicer format (see Figure 2).
Figure 2. View setting options for using the svmon utility from procmon
You can also export
procmon data to a file, which
makes it a nice little data collection tool.
My favorite of all performance tools is actually a non-supported IBM tool called
nmon. Similar in some respects to
topas, the data that you collect from
nmon is either available from your screen (similar to
topas) or available through reports that you can
capture for trending and analysis. What this tool provides that others simply do
not is the ability to view pretty-looking charts from a Microsoft® Excel
spreadsheet, which can be handed off to senior management or other technical teams
for further analysis. This is done with the use of yet another unsupported tool, the
nmon analyzer, which provides the hooks into
nmon. Figure 3 shows an example of
the kind of output that one can expect from an
nmon analysis.
Figure 3. nmon analysis output
There are many different types of nmon views you can see using this tool, which provide all sorts of CPU, I/O, and memory utilization information.
In this article, you looked at the various tools that are available to capture data for memory analysis. You also spent some time troubleshooting a system that had some performance problems that you were able to pin (pardon the pun) on virtual memory. I can't reiterate enough that tuning is actually a small part of appropriate tuning methodology. Without capturing data and taking the time to properly analyze your system, you will basically be doing the same thing as a doctor throwing antibiotics at a sick patient without even examining him or her.
There are many different types of performance monitoring tools available to you. Some are tools that you can run from the command line to quickly enable you to gauge the health of your system. Some are more geared to long-term trending and analysis. Some tools even provide you with graphically formatted data that can be handed off to non-technical staff. Regardless of which tool you use, you must also spend the time to learn about what the information you are looking at really means. Don't jump to conclusions based on a small sampling of data. Also, do not rely on only one tool. To substantiate your results, you really should look at a minimum of two tools while performing your analysis. I also briefly discussed tuning methodology and the importance of establishing a baseline while the system is behaving normally. After you examine your data and tune, you must continue to capture data and analyze the results of any changes that are made. Further, you should only make one change at a time, so you can really determine the effect of each individual change.
Resources
- Upcoming articles in this series:
- Optimizing AIX 5L™ performance: Tuning disk performance
- Optimizing AIX 5L performance: Tuning your memory settings
- Optimizing AIX 5L performance: Monitoring your CPU
- Virtual Memory Management - Tuning Parameter lru_file_repage: This document provides details on using the lru_file_repage parameter.
- AIX memory affinity support: Visit the IBM System p and AIX InfoCenter to learn more about AIX memory affinity support.
- "nmon performance: A free tool to analyze AIX and Linux performance" (Nigel Griffiths, developerWorks, February 2006): This article provides excellent coverage of the nmon tool.
- "nmon analyser -- A free tool to produce AIX performance reports" (Steven Atkins, developerWorks, April 2006): Read this article to get the latest information on the nmon analyser tool.
- IBM Redbooks: Database Performance Tuning on AIX is designed to help system designers, system administrators, and database administrators design, size, implement, maintain, monitor, and tune a Relational Database Management System (RDBMS) for optimal performance on AIX.
- Power Architecture: High-Performance Architecture with a History: Read this white paper.
- "Power to the People; A history of chip making at IBM" (developerWorks, December 2005): This article covers the IBM power architecture.
- "Processor Affinity on AIX" (developerWorks, November 2006): Using process affinity settings to bind or unbind threads can help you find the root cause of troublesome hang or deadlock problems. Read this article to learn how to use processor affinity to restrict a process and run it only on a specified central processing unit (CPU).
- "CPU Monitoring and Tuning" (March, 2002): Learn how standard AIX tools can help you determine CPU bottlenecks.
- IBM Redbooks: AIX 5L Practical Performance Tools and Tuning Guide is a comprehensive guide about performance monitoring and tuning tools that are provided with AIX 5L Version 5.3.
- "AIX 5L Version 5.3: What's in it for you?" (developerWorks, June 2005): Learn what features you can benefit from in AIX 5L Version 5.3.
- Operating System and Device Management: This document from IBM provides users and system administrators with complete information that can affect your selection of options when performing such tasks as backing up and restoring the system, managing physical and logical storage, and sizing appropriate paging space.
- IBM Redbooks: The AIX 5L Differences Guide Version 5.3 Edition focuses on the differences introduced in AIX 5L Version 5.3 when compared to AIX 5L Version 5.2.
- Check out other articles and tutorials written by Ken Milberg:
- Popular content: See what AIX and UNIX content your peers find interesting.
- AIX and UNIX: The AIX and UNIX developerWorks zone provides a wealth of information relating to all aspects of AIX systems administration and expanding your UNIX skills.
- New to AIX and UNIX?: Visit the New to AIX and UNIX page to learn more about AIX and UNIX.
- AIX 5L Wiki: Discover a collaborative environment for technical information related to AIX.
- Safari bookstore: Visit this e-reference library to find specific technical resources.
- developerWorks technical events and webcasts: Stay current with developerWorks technical events and webcasts.
- Podcasts: Tune in and catch up with IBM technical experts.
- Future Tech: Visit Future Tech's site to learn more about their latest offerings.