You can trace through applications to figure out why they hang or don't run. And you can use the same techniques to learn more about an application and understand the performance implications of certain configurations. This article uses Apache as its example, because it's popular and most readers are familiar with it. Each system call that Apache has to make incurs delay in serving the Web page and, by tracing the Web server under various configurations, you can determine the impact of the configuration.
As an application executes, it makes system calls to the underlying operating system whenever it needs to open a file, send a packet, or otherwise use system resources. Tracing the application means that you view the system calls as they're being called, which gives you insight into the application's behavior. Under Solaris and the IBM AIX® operating system (AIX), this is done with the truss command, and Linux® uses strace. Listing 1 shows an example of the pwd command being traced.
Listing 1. Tracing the
pwd command
-bash-3.00$ truss pwd
...
getcwd("/export/home/sean", 1025) = 0
/export/home/sean
write(1, " / e x p o r t / h o m e".., 18) = 18
_exit(0)
|
After removing the output at the beginning relating to loading the application, you can see that three system calls are made:
getcwdreturns the current working directory. The output shows that the string "/export/home/sean" was returned in the buffer.writedisplays the given string. Because the system call is displayed after it has been executed, the output is shown in the line before. Also note that the result of thewritesystem call is the number of characters written -- in this case, 17 plus one carriage return._exitexits the program with an error code of 0, traditionally associated with a successful completion.
Although this is a simple example, it illustrates the degree of transparency enjoyed through application tracing. For more in-depth information on tracing, see the Resources section.
A brief look at Apache configuration
The Apache Web server is configured through a file called httpd.conf. Listing 2 shows part of a simple configuration.
Listing 2. Sample httpd.conf
DocumentRoot "/var/apache/htdocs"
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
<Directory "/var/apache/htdocs">
Options Indexes FollowSymLinks MultiViews
</Directory>
|
The first line defines where the HTML files can be found. Requests are mapped to this directory. If a request is made for /project/charter.html, the contents of /var/apache/htdocs/project/charter.html are served. The rest of httpd.conf consists of two Directory stanzas. Anything between <Directory ...> and </Directory> applies to the named directory and all subdirectories. In this case, the first stanza applies two settings to the root directory, and the second refers to /var/apache/htdocs.
If multiple stanzas apply to a single request, the commands are combined, with the most specific directory taking precedence. For example, the request for /project/charter.html would be served out of /var/apache/htdocs/project/charter.html. /var/apache/htdocs is a subdirectory of /, so Options Indexes FollowSymLinks MultiViews is taken from the second stanza and AllowOverride None from the first.
Many things can be configured, and each has performance implications. The rest of the article focuses on quantifying the affect of these changes.
Before you make any tweaks, it's important to get an idea of how the system is currently running. Start Apache with the -X parameter, which forces it into a single-process debugging mode. Doing so ensures that the request always goes to the process being traced and eliminates the overhead associated with regular inter-process communication.
After the daemon has had a chance to start, find the process ID by running ps -ef and looking for the httpd daemon. Once you find it, attach to the process with truss -c -p PID. The -c option counts the system calls rather than displaying them one by one, and -p indicates that the tracer is to attach to a running process.
Make a request to a document using a Web browser. After the page has loaded, go back to the truss application and press Ctrl-C to stop the counting. For a static HTML page, you should see something like Listing 3 (to make the numbers more interesting, this example made the same request 100 times).
Listing 3. System call baseline
sunbox# truss -c -p 15026
(make the Web request 100 times)
^C
syscall seconds calls errors
read .009 200
write .020 200
close .020 200
time .004 300
alarm .018 1100
fcntl .009 300
sigaction .007 400
munmap .007 100
llseek .001 100
pollsys .005 100
mmap64 .008 100
stat64 .007 100
open64 .006 100
accept .019 100
getsockname .002 100
setsockopt .002 100
-------- ------ ----
sys totals: .149 3600 0
usr time: .120
elapsed: 8.960
|
truss returns a list of the system calls, the total time spent executing the call, the number of calls, and any errors found. At the end of the report, the total time spent in system calls is returned, along with the time spent executing the application. The elapsed time is meaningless for these purposes, because it refers to the time from when truss was started to when it ended, and it has no relation to the Web request.
Listing 3 shows the simplest case. After the connection request from the Web browser arrives, the accept system call completes the connection. The Web server uses the read call to get the contents of the request, which are internally mapped to a file on disk. The Web server first verifies that the file exists with stat64, opens the file for reading with open64, and maps the contents of the file to memory with mmap64. This file is then sent back to the client with write, the log file is generated with another write, and the server performs a final read from the browser. The other calls in the list are overhead and won't change significantly, as the configuration is changed.
It's tempting to say that 100 requests took 0.269s (0.149 + 0.120) and that the server should be able to serve some 370 pages per second (100/0.269). These numbers must be taken with a grain of salt, because they only represent the time the process spent on the CPU, not the real time it took to execute (also known as the wall clock time). Many more factors come into play, such as disk and network speed, other things running on the computer, and the fact that the daemon is running in debugging mode. You must also consider the overhead of system call tracing.
The techniques used in this article focus on relative timing of operations and using application tracing to eliminate wasted operations. If you need to know the number of pages per second your Web server can handle, the Resources section has links to software that can help you determine this.
Apache gives the administrator the ability to delegate configuration authority to individual users through the .htaccess mechanism. .htaccess is a file containing additional configuration directives that the Web server looks for if the directory with the request has been configured with AllowOverride from within httpd.conf. Listing 4 shows the previous configuration with AllowOverride Limit configured, which allows users to require usernames and passwords to access a Web page.
Listing 4. Sample httpd.conf with AllowOverride configured
DocumentRoot "/var/apache/htdocs"
<Directory />
Options FollowSymLinks
AllowOverride Limit
</Directory>
<Directory "/var/apache/htdocs">
Options Indexes FollowSymLinks MultiViews
</Directory>
|
Restarting the httpd daemon and rerunning the tests gave the results shown in Listing 5.
Listing 5. Results of 100 requests with AllowOverride Limit turned on
sunbox# truss -c -t write,read,open64,stat64,mmap64 -p 21136
^C
syscall seconds calls errors
read .012 200
write .021 200
mmap64 .007 100
stat64 .007 100
open64 .022 500 400
-------- ------ ----
sys totals: .072 1100 400
usr time: .141
elapsed: 16.660
|
At first glance, the time in system calls has gone down, but that's because the tracing was limited to a few interesting system calls with the -t option. Most syscalls stayed the same, but now there are 500 calls to open64, 400 of which returned an error. The time spent executing open64 also increased (from 0.006s to 0.22s), as did the userland part (0.12s to 0.141s).
The time increased because Apache must now do additional work to process the request, even though no overrides were configured. The mere configuration of AllowOverride Limit added noticeable overhead. The question remains, what caused all those errors? To answer this, trace a single Web request, as shown in Listing 6.
Listing 6. Determining the cause of the failed
open64 calls
sunbox# truss -t open64 -p 21136
open64("/.htaccess", O_RDONLY) Err#2 ENOENT
open64("/var/.htaccess", O_RDONLY) Err#2 ENOENT
open64("/var/apache/.htaccess", O_RDONLY) Err#2 ENOENT
open64("/var/apache/htdocs/.htaccess", O_RDONLY) Err#2 ENOENT
open64("/var/apache/htdocs/test.html", O_RDONLY) = 5
|
Listing 6 shows when the request came in and that Apache checked every directory leading up to /var/apache/htdocs and tried to open the .htaccess file, which doesn't exist because AllowOverride was configured on the root directory. Apache had to check each subdirectory for overrides in the .htaccess file and process them. This adds latency in terms of extra system calls, more userland overhead, and not to mention extra disk activity. A couple of tenths of a second over 100 requests might seem insignificant but, on a busy server, it adds up.
Now that you understand the scoping of overrides, the ideal solution is to not allow overrides and force everything to be configured in httpd.conf. Failing that, limit the scope of the configuration to the directories that need it. In the example being studied, placing the AllowOverride Limit in the second Directory stanza would add only one extra open64 to look for .htaccess in /var/apache/htdocs. Searching all the parent directories is wasteful, because no files will be served out of them in this configuration.
When the Web server receives a request, the only thing known about the client is the IP address, such as 129.42.42.212 for IBM.com. However, the Web server doesn't know that this address is from IBM.com; for this, it has to do a reverse DNS lookup. Doing so takes time and, if the name is needed before the request can be sent, delays the response to the client. In the past, Apache performed these reverse lookups by default, but this behavior has been changed.
There is another case where the reverse DNS lookup is required. When configuring an access restriction based on hostname (rather than IP address), Apache must first reverse-resolve the IP address into a hostname and then resolve that hostname back to an IP address to make sure they match. Because the reverse domain lookup can be anything the owner of the address block wants, the second lookup is necessary to ensure that no spoofing is taking place. Can application tracing determine the impact of DNS resolution?
To test this, remove the AllowOverride Limit from the previous example and add Allow from ibm.com instead of the default Allow from all. Then, change the DNS server to return something.ibm.com for your workstation, ensuring the initial reverse check will succeed and that the subsequent forward lookup must go over the Internet. In a sample run, the request to the protected Web server took 15 seconds. By contrast, using an IP address instead of ibm.com took less than half a second. Listing 7 shows some of the system calls counted when using DNS for security.
Listing 7.
truss output of a Web request that has been restricted by hostname
bash-3.00# truss -c -p 26089
^C
syscall seconds calls errors
read .000 5
write .000 3
open .000 2
close .001 10
time .000 3
stat .000 5
alarm .000 8
fcntl .000 7 2
sigaction .000 3
sysconfig .000 5
pollsys .001 5
door_info .000 2
stat64 .000 1
open64 .000 2
so_socket .001 5
accept .000 1
connect .002 5
recvfrom .000 2
send .001 5
getsockname .000 1
setsockopt .000 1
-------- ------ ----
sys totals: .011 81 2
usr time: .004
|
truss reports that the process used far less time than was experienced from the client end (0.015s versus 15s). This is because most socket operations are made asynchronously in that the socket is polled to see if data is present rather than having the application block (wait) for the response. As such, the application isn't consuming CPU time while it's waiting for results. This explains the discrepancy between the time reported by truss and that from the client end.
truss wasn't oblivious to the change -- 0.015s is still an order of magnitude higher than the first simple case investigated in this article. From the system call count, there were several calls to read, write, close, and stat that didn't appear before. Along with send, connect, so_socket, and pollsys, these system calls are used to make name requests. Because name resolution can come from a variety of sources, including the local file system and name-cache daemons, several places must be checked. On subsequent calls, the request time was under a second, because the DNS information was cached.
In the end, the bulk of the delay came from the remote name server and the forward resolution. This fact underscores the need for DNS caching and fast DNS servers, if you plan to rely on name lookups. However, the best solution is to come up with an alternative way of addressing the problem. One easy solution is to specify IP address blocks (such as Allow from 10.0.0.0/8); this approach is much faster than a DNS lookup. Because both the reverse and forward lookups must match, there is a high probability that what you can do with a name, you can do with a netblock. Alternatively, Apache can integrate with a multitude of authentication systems, so user-based authentication is another alternative.
If you're familiar with socket programming, you might be looking for gethostbyname and other such calls, which are used to perform the host-lookup function. These are library calls, provided by the system libraries such as libc and libsocket, found in /usr/lib. Library calls wrap one or more system calls, along with extra logic. You can think of them as a programmer-friendly interface to system calls. For instance, the gethostbyname library call goes through many steps to find the name based on the configuration of the server, such as checking /etc/hosts or looking in Network Information System (NIS) or DNS. These steps involve smaller building blocks that the kernel provides, which are the system calls. In most cases up to this point, the difference has been small. truss can also look at library calls, but doesn't provide the same level of detail as it can for system calls.
The ability to watch and count system calls is helpful not only for troubleshooting, but also for understanding how application configuration affects performance. Each interaction with the rest of the system invokes one or more system calls, and each call adds overhead. This isn't to say that system calls are bad -- a program that doesn't interact with the rest of the world isn't very interesting! Rather, you can monitor system-call usage as an application is changed to give you an idea of how changes affect overall performance.
Learn
- "Solve application problems with tracing" (developerWorks, March 2006): This article focuses on using tracing as a troubleshooting tool.
- Performance tuning: The Apache Web server comes with a document on performance tuning. This article looked at two of the big items to tune, but the Apache people point out several more.
- AIX and UNIX: Want more? The developerWorks AIX and UNIX zone hosts hundreds of informative articles and introductory, intermediate, and advanced tutorials.
- developerWorks
technical events and webcasts: Stay current with developerWorks technical events and webcasts.
- Podcasts: Tune in and catch up with IBM technical experts.
Get products and technologies
- IBM trial software: Build your next development project with software for download directly from developerWorks.
- Siege: This tool simulates load on your Web server and helps you determine how much it can handle.
- mod_dtrace: This Apache module provides additional probes to Solaris's DTrace. DTrace takes
trussone step further and gives you a variety of statistics about the resource usage of running applications.
Discuss
-
Participate in the AIX and UNIX forums, developerWorks
blogs, and get involved in the developerWorks community.

Sean Walberg has been working with Linux and UNIX systems since 1994 in academic, corporate, and Internet Service Provider environments. He has written extensively about systems administration over the past several years. You can contact him at sean@ertw.com.
Comments (Undergoing maintenance)





