
Maximizing Java Performance on AIX: Part 4: What goes in

Amit Mathur (amitmat@us.ibm.com), Senior Technical Consultant and Solutions Enablement Manager, IBM, Software Group
Amit Mathur works in the IBM Solutions Development group, working primarily with IBM ISVs on enablement and performance of their applications on IBM eServer platforms, and helping ISVs and customers become self-sufficient through education and articles on developerWorks. Amit has more than fourteen years' experience leading software support and development in C/C++, Java and databases on UNIX and Linux platforms. He holds a Bachelor of Engineering degree in Electronics and Telecommunication from India. You can reach Amit at amitmat@us.ibm.com.
Sumit Chawla (sumitc@us.ibm.com), IBM Certified IT Architect and Technical Lead, Java Enablement, IBM, Software Group
Sumit Chawla leads the Java Enablement initiative for IBM eServer (for AIX, Windows, and Linux platforms), assisting Independent Software Vendors for IBM Servers. Sumit has a Master of Science degree in Computer Science, with almost 10 years of experience in the IT industry, and is certified by IBM as an Application Architect. He is a frequent contributor to the developerWorks eServer zone. You can contact him at sumitc@us.ibm.com.

Summary:  This 5-part series provides several tips and techniques that are commonly used for tuning Java™ applications for optimum performance on AIX®. This article deals with situations where I/O or networks may become bottlenecks.


Date:  03 May 2004
Level:  Intermediate

Introduction

This is the fourth article in the 5-part series about Java on AIX Performance Tuning. If you have not done so already, we strongly recommend that you review Part 1 of this series before proceeding further.

This article talks about two additional areas that can become performance bottlenecks:

  • network
  • disk I/O

Usually these show up as problems specific to AIX and need to be tuned independently of the Java application. As a result, this article takes a break from the format used in Parts 2 and 3, and instead concentrates on how to gather the information you need for a tuning exercise. Only a handful of tips are provided in this article, but we hope that the overall discussion of performance tools, combined with those tips, will give you enough information to get started on a performance tuning exercise.


I/O and network as bottlenecks

The purpose of this article is to deal with situations where I/O or networking may become bottlenecks.

If you have been going through each of the articles in this series, we hope you have started to see how each of the smaller components fits into the big picture. We have attempted to categorize the tips based on common areas where they occur, but this categorization is by no means exclusive. For network and I/O, you may not see the actual cause so easily, but you may end up feeling the effects on your application. Only a good knowledge of your application can guide you to the cause. As an example, earlier in this series, we discussed the importance of making sure that the heap does not page. The maximum heap size, specified using the -Xmx switch, should be less than the total physical memory installed on the system (shown by "bootinfo -r" or "lsattr -El sys0 -a realmem"; see "AIX Commands you should not leave home without" for more of these commands).
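The heap-size rule above can be checked with a few lines of Java. This is only a sketch: the class and method names are illustrative, the physical memory figure is supplied by hand (for example, the realmem value, which we assume is reported in KB), and the 4 GB default in main is an assumed value, not a measurement.

```java
final class HeapFitCheck {

    /** Returns true when the maximum heap is smaller than physical memory. */
    static boolean heapFitsInRealMemory(long maxHeapBytes, long realMemKb) {
        return maxHeapBytes < realMemKb * 1024L;
    }

    public static void main(String[] args) {
        // realmem in KB, copied by hand from "lsattr -El sys0 -a realmem";
        // 4194304 KB (4 GB) is only an assumed default for illustration.
        long realMemKb = args.length > 0 ? Long.parseLong(args[0]) : 4194304L;

        // Upper bound the JVM will try to use for the heap (reflects -Xmx).
        long maxHeapBytes = Runtime.getRuntime().maxMemory();

        System.out.println("max heap (bytes): " + maxHeapBytes);
        System.out.println("fits in real memory: "
                + heapFitsInRealMemory(maxHeapBytes, realMemKb));
    }
}
```

Run it with the same -Xmx setting as your application, passing the realmem value as the only argument. A result of false is a strong hint that the heap can page; a result of true does not rule it out, because other processes also need memory.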

Tools like topas and iostat show you the usage of various disks, but in most cases the culprit is either a GC cycle or a known piece of functionality, and locating the cause should be quite straightforward if you know your application. Tools like filemon will even tell you which files were being accessed, taking the guesswork out of the tuning exercise. If your Java application performance is being affected by a misconfigured system, it is time to change focus and look at system performance tuning instead. For example, resolutions for disk bottlenecks can range from distributing the data intelligently to getting a faster set of disks. This topic is beyond the scope of the current article; you should refer to Redbooks like Understanding IBM eServer pSeries Performance and Sizing for more on this topic.

Configuring network buffers and tuning other network parameters can have a significant impact on network-intensive applications. A good reference for network tunable parameters is the Network Tunable Parameters section of the Performance Management Guide. Some of the popular tweaks involve thewall, sockthresh, sb_max, somaxconn, tcp_sendspace, tcp_recvspace, rfc1323 and so on. This information is specific neither to AIX nor to Java, but especially for network-intensive applications, this should be the first stop for performance tuning.

The rest of this section provides a quick introduction to some common tools, and shows how to detect Java-specific problems. For more details, please refer to the AIX 5L Performance Tools Handbook and Understanding IBM eServer pSeries Performance and Sizing.

vmstat

The versatile vmstat command should already be a good friend of yours. For I/O work, look at the wa (I/O Wait) value in the cpu section. If it is high, a disk bottleneck may exist, and iostat can then be used to look at the disk usage more closely.

iostat

iostat is the ideal tool for determining if a system has an I/O bottleneck. It shows the read and write rates to all disks. This makes it a great tool to determine if you need to "spread out" the disk workload across multiple disks. The tool also reports the same CPU activity as vmstat does.

Start with a simple iostat -s when your application is running, to see what the system overall is doing. It will print something like this:

  
  
    tty:      tin         tout   avg-cpu:  % user    % sys     % idle    % iowait
              0.3        232.9              13.8     19.1       27.4      39.6     

    Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
    hdisk0          28.7     291.4      35.0     176503   2744795
    hdisk1           0.0       0.4       0.0       3537         0
    hdisk7           1.7      34.9       9.8       8920    341112
    hdisk14         24.5     1206.1      36.2    1188404  10904509
    hdisk18          0.0       1.2       0.1      10052      2046
    hdisk8           2.1      36.8      10.5      10808    357910

Look at the %iowait figure to see if your system is spending too much time waiting for I/O to complete. If your system is paging, this is the figure to watch. But note that this figure in isolation is not sufficient to determine what is happening with the system. As an example, if you are writing sequential files within your application, a higher %iowait is normal.

The %tm_act column shows the percentage of time that a particular disk was active. The output above shows a very interesting scenario: %iowait was close to 40%, but %tm_act was nowhere near 100%, hovering instead just below 30%. The system on which this was taken had Fibre Channel attached storage, and the bottleneck turned out to be the route to the SAN storage. It looks very easy once you figure it out!
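The reasoning in this scenario (high %iowait, yet no disk near saturation) can be expressed as a small check. The following is a sketch only: the class and method names are made up for illustration, the parser assumes the six-column disk rows shown above, and the 30% and 70% thresholds are assumed values that you would choose for your own system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IostatCheck {

    /** Extracts the "% tm_act" value of each hdisk row from iostat output. */
    static Map<String, Double> parseTmAct(String iostatOutput) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String line : iostatOutput.split("\n")) {
            String[] fields = line.trim().split("\\s+");
            // Disk rows look like: hdisk0 28.7 291.4 35.0 176503 2744795
            if (fields.length == 6 && fields[0].startsWith("hdisk")) {
                result.put(fields[0], Double.parseDouble(fields[1]));
            }
        }
        return result;
    }

    /** True when iowait is high although the busiest disk is below busyLimit. */
    static boolean bottleneckLikelyOutsideDisks(double ioWaitPct,
                                                Map<String, Double> tmAct,
                                                double waitLimit,
                                                double busyLimit) {
        double busiest = 0.0;
        for (double value : tmAct.values()) {
            busiest = Math.max(busiest, value);
        }
        return ioWaitPct >= waitLimit && busiest < busyLimit;
    }

    public static void main(String[] args) {
        // Two of the disk rows from the sample output above.
        String sample =
              "hdisk0          28.7     291.4      35.0     176503   2744795\n"
            + "hdisk14         24.5     1206.1      36.2    1188404  10904509\n";
        Map<String, Double> tmAct = parseTmAct(sample);
        // 39.6 is the % iowait from the sample; 30.0 and 70.0 are assumed limits.
        System.out.println(bottleneckLikelyOutsideDisks(39.6, tmAct, 30.0, 70.0));
    }
}
```

A positive result does not identify the bottleneck; it only suggests looking beyond the individual disks, for example at adapters or the path to external storage.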

You can also use iostat -at <interval> <count> or iostat -sat ..., which will give you tps and KBps (as well as read and write rates) for the adapters. The -s flag adds overall system statistics.

netstat

For network tuning, netstat is the ideal tool with which to start your investigation. netstat -m can be used to look at mbuf memory usage, which will tell you something about socket and network memory usage. If the extendednetstats option has been enabled (using the command no -o extendednetstats=1), netstat -m shows more detailed information, but this can have a performance impact and should be used only for diagnostic purposes. When using netstat -m, the interesting information is printed at the top, like this:

      67 mbufs in use:
      64 mbuf cluster pages in use
      272 Kbytes allocated to mbufs
      0 requests for mbufs denied
      0 calls to protocol drain routines
      0 sockets not created because sockthresh was reached


and at the end of the output, something like:

      Streams mblk statistic failures:
      0 high priority mblk failures
      0 medium priority mblk failures
      0 low priority mblk failures


The AIX 5L Performance Tools Handbook gives a good description of which parameters to tweak if you see failures in the netstat -m output. You may also want to try netstat -i x (replace x with the interval to collect data) to look at network usage and possible dropped packets. For network-intensive applications, this is the first stop to check if "all is well".

netpmon

netpmon uses the trace facility to obtain a detailed picture of network activity during a time interval. It also displays process CPU statistics that show:

  • the total amount of CPU time used by this process
  • the CPU usage for the process as a percentage of total time
  • the total time that this process spent executing network-related code

To start off the exercise, try the following:

netpmon -o /tmp/netpmon.log; sleep 20; trcstop

This runs the netpmon command for 20 seconds, stops it using trcstop, and writes the output to /tmp/netpmon.log. Looking at the generated data, you can see that the example we chose is a good one for a Java performance tuning article:

      Process CPU Usage Statistics:
      -----------------------------
                                                         Network
      Process (top 20)             PID  CPU Time   CPU %   CPU %
      ----------------------------------------------------------
      java                       12192    2.0277   5.061   1.370
      UNKNOWN                    13758    0.8588   2.144   0.000
      gil                         1806    0.0699   0.174   0.174
      UNKNOWN                    18136    0.0635   0.159   0.000
      dtgreet                     3678    0.0376   0.094   0.000
      swapper                        0    0.0138   0.034   0.000
      trcstop                    18460    0.0121   0.030   0.000
      sleep                      18458    0.0061   0.015   0.000
 

Another useful part of the trace output is the adapter usage:


                            ----------- Xmit -----------   -------- Recv ---------
    Device                   Pkts/s  Bytes/s  Util  QLen   Pkts/s  Bytes/s   Demux
    ------------------------------------------------------------------------------
    token ring 0             288.95    22678  0.0% 518.498   552.84    36761  0.0222
    ...
    DEVICE: token ring 0
    recv packets:           11074
      recv sizes (bytes):   avg 66.5    min 52      max 1514    sdev 15.1   
      recv times (msec):    avg 0.008   min 0.005   max 0.029   sdev 0.001  
      demux times (msec):   avg 0.040   min 0.009   max 0.650   sdev 0.028  
    xmit packets:           5788
      xmit sizes (bytes):   avg 78.5    min 62      max 1514    sdev 32.0   
      xmit times (msec):    avg 1794.434 min 0.083   max 6443.266 sdev 2013.966


Let's say you decide that there's too much information, or you wish to see something more specific. Let's try the following command:

      netpmon -O so -o /tmp/netpmon_so.txt; sleep 20; trcstop

The "-O so" makes netpmon concentrate on socket level traffic. Now we can zoom into the java process information:

    PROCESS: java   PID: 12192
    reads:                  2700
      read sizes (bytes):   avg 8192.0  min 8192    max 8192    sdev 0.0    
      read times (msec):    avg 184.061 min 12.430  max 2137.371 sdev 259.156
    writes:                 3000
      write sizes (bytes):  avg 21.3    min 5       max 56      sdev 17.6   
      write times (msec):   avg 0.081   min 0.054   max 11.426  sdev 0.211  


Useful? Let's go a step ahead, and find out activity at the thread level. Add "-t" to the command, like this:

      netpmon -O so -t -o /tmp/netpmon_so_thread.txt; sleep 20; trcstop
 

Now, the generated output contains thread-specific information, something like the following:

            THREAD TID: 114559
    reads:                  9
      read sizes (bytes):   avg 8192.0  min 8192    max 8192    sdev 0.0    
      read times (msec):    avg 988.850 min 19.082  max 2106.933 sdev 810.518
    writes:                 10
      write sizes (bytes):  avg 21.3    min 5       max 56      sdev 17.6   
      write times (msec):   avg 0.389   min 0.059   max 3.321   sdev 0.977  


Now, you can take a javadump and see what this thread is, and decide whether it is working as expected or not. Especially for applications that have multiple network connections, netpmon allows a comprehensive view of activity to be captured.

filemon

filemon can be used to identify the files that are being used most actively. This tool gives a very comprehensive view of file access, and can be useful for drilling down once vmstat/iostat confirm disk to be a bottleneck. This tool also uses the trace facility, so it is used in a manner similar to netpmon:

filemon -o /tmp/filemon.log; sleep 60; trcstop

The generated log file is quite large. Some sections that may be useful are:

    Most Active Files
    ------------------------------------------------------------------------
      #MBs  #opns   #rds   #wrs  file                 volume:inode
    ------------------------------------------------------------------------
      25.7     83   6589      0  unix                 /dev/hd2:147514
      16.3      1   4175      0  vxe102               /dev/mailv1:581
      16.3      1      0   4173  .vxe102.pop          /dev/poboxv:62
      15.8      1      1   4044  tst1                 /dev/mailt1:904
       8.3   2117   2327      0  passwd               /dev/hd4:8205
       3.2    182    810      1  services             /dev/hd4:8652
    ...
    ------------------------------------------------------------------------
    Detailed File Stats
    ------------------------------------------------------------------------

    FILE: /var/spool/mail/v/vxe102  volume: /dev/mailv1 (/var/spool2/mail/v)  inode: 581
    opens:                  1
    total bytes xfrd:       17100800
    reads:                  4175    (0 errs)
      read sizes (bytes):   avg  4096.0 min    4096 max    4096 sdev     0.0
      read times (msec):    avg   0.543 min   0.011 max  78.060 sdev   2.753
    ...
 

This tool is covered in references quoted earlier, and a more detailed look is beyond the scope of the current article.

Java-Specific Tips

The tips generic to Java, for avoiding I/O and Network bottlenecks, boil down to good designs, and are already quite well documented in several places. Have a look at NI004 and NI005 though.


Characteristics-based Tuning Tips

Now we look at various characteristics of typical applications. You should locate the behavior that resembles that of your application (either by design or through observation) and apply the corresponding tips.


Network-Intensive Applications

For applications that are network-intensive, you should use netstat to make sure that there are no dropped packets etc. The netstat and netpmon sections in the AIX 5L Performance Tools Handbook describe various tweaks that can be done if failures are observed during monitoring, so they are not being repeated here.

If you suspect network throughput to be a bottleneck, NI001 can be useful to find if there is a problem. Also, if you do not use IPv6 at all, NI002 can be used as well.

If you are looking at a difference in application performance between AIX and another platform, and you suspect it to be due to some socket options that you are setting, have a look at NI004.

RMI Applications

If your application is an RMI client or server, you may observe some GC cycles in the verbosegc output that are otherwise unaccounted for. For example, this is an excerpt from the verbosegc output of an RMI application:

<GC(4057): GC cycle started Thu Apr 15 11:14:28 2004
<GC(4057): freed 254510616 bytes, 55% free (453352000/810154496), in 1189 ms>
  <GC(4057): mark: 991 ms, sweep: 198 ms, compact: 0 ms>
  <GC(4057): refs: soft 0 (age >= 32), weak 2, final 330, phantom 0>
 <GC(4057): stop threads time: 10, start threads time: 260>
<GC(4058): GC cycle started Thu Apr 15 11:15:29 2004
<GC(4058): freed 267996504 bytes, 56% free (454445800/810154496), in 1243 ms>
  <GC(4058): mark: 1041 ms, sweep: 202 ms, compact: 0 ms>
  <GC(4058): refs: soft 0 (age >= 32), weak 0, final 253, phantom 0>
<GC(4059): GC cycle started Thu Apr 15 11:16:31 2004
<GC(4059): freed 248113752 bytes, 56% free (455754152/810154496), in 1386 ms>
  <GC(4059): mark: 1095 ms, sweep: 291 ms, compact: 0 ms>
  <GC(4059): refs: soft 0 (age >= 32), weak 0, final 263, phantom 0>


These GC cycles are being triggered almost exactly 60 seconds apart, and are not being triggered due to Allocation Failure. After making sure that the application does not call System.gc() directly, NI003 could be applicable here.
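To spot this pattern in a longer log, the gaps between the "GC cycle started" timestamps can be computed mechanically. The sketch below assumes the timestamp format shown in the excerpt above; the class and method names are illustrative, and the parsing has not been checked against other verbosegc formats.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcIntervals {

    private static final Pattern START = Pattern.compile("GC cycle started (.+)$");

    // Matches timestamps such as "Thu Apr 15 11:14:28 2004".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("EEE MMM d HH:mm:ss uuuu", Locale.ENGLISH);

    /** Returns the number of seconds between consecutive GC cycle start lines. */
    static List<Long> secondsBetweenCycles(List<String> lines) {
        List<Long> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : lines) {
            Matcher m = START.matcher(line.trim());
            if (!m.find()) {
                continue; // not a "GC cycle started" line
            }
            // Collapse repeated spaces (for example, a padded single-digit day).
            String stamp = m.group(1).trim().replaceAll("\\s+", " ");
            LocalDateTime started = LocalDateTime.parse(stamp, FORMAT);
            if (previous != null) {
                gaps.add(Duration.between(previous, started).getSeconds());
            }
            previous = started;
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<String> excerpt = Arrays.asList(
                "<GC(4057): GC cycle started Thu Apr 15 11:14:28 2004",
                "<GC(4058): GC cycle started Thu Apr 15 11:15:29 2004",
                "<GC(4059): GC cycle started Thu Apr 15 11:16:31 2004");
        System.out.println(secondsBetweenCycles(excerpt));
    }
}
```

Gaps that cluster tightly around one value, such as 60 seconds, point to a timer-driven collection rather than to allocation pressure.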

For RMI-intensive applications, NI005 should be considered, but note the caveat mentioned with that tip.


Disk-Intensive Applications

Using iostat and filemon, you should be able to find out the cause for the bottleneck. The solution is usually either a tweak of the application design to stop relying on disk access, or a tweak of the system to optimize disk access. Since both of these tweaks are beyond the scope of this article, we recommend getting familiar with iostat and filemon. The information in the previous section should get you started on that path.


General Collection of Tips

The text below refers to command-line arguments to Java (specified before the class/jar file names) as "switches". For example, the line java -mx2g hello has a single switch, -mx2g.

NI001 check speed of net connection

An FTP session can be established between the two systems that need the connection speed analyzed, and the following ftp command can be executed:

ftp> put "|dd if=/dev/zero bs=32k count=1000" /dev/null
200 PORT command successful.
150 Opening data connection for /dev/null.
1000+0 records in.
1000+0 records out.
226 Transfer complete.
32768000 bytes sent in 130.4 seconds (245.4 Kbytes/s)
local: |dd if=/dev/zero bs=32k count=1000 remote: /dev/null
 

The above quick test attempts to transmit 1000 blocks of zeroes, each of size 32 KB, and gives an easy way to determine the throughput of the connection between the two AIX boxes. The above example shows the throughput to be 245.4 KBps, which would point to network problems since both AIX boxes were using 100 Mbps Full-duplex network adapters. If the above test showed, say, 1.140 E+4 Kbytes/s, it would have been a good hint to concentrate on the application instead. You can vary the block-size and the count to more closely mimic the behavior of your application.
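The arithmetic behind this comparison is simple enough to sketch in Java. The class and method names below are illustrative; the link ceiling ignores protocol overhead, so real transfers will always come in somewhat below it.

```java
final class FtpThroughput {

    /** Throughput in Kbytes/s (1 Kbyte = 1024 bytes), as ftp reports it. */
    static double kbytesPerSecond(long bytes, double seconds) {
        return bytes / 1024.0 / seconds;
    }

    /** Theoretical ceiling of a link in Kbytes/s, ignoring protocol overhead. */
    static double linkKbytesPerSecond(double megabitsPerSecond) {
        return megabitsPerSecond * 1_000_000.0 / 8.0 / 1024.0;
    }

    public static void main(String[] args) {
        // Figures from the ftp session above: 32768000 bytes in 130.4 seconds.
        double measured = kbytesPerSecond(32768000L, 130.4);
        // 100 Mbps adapter, as in the example.
        double ceiling = linkKbytesPerSecond(100.0);

        System.out.printf("measured: %.1f Kbytes/s%n", measured);
        System.out.printf("ceiling:  %.1f Kbytes/s%n", ceiling);
        System.out.printf("used:     %.1f%% of the link%n", 100.0 * measured / ceiling);
    }
}
```

With the figures above, the measured rate is about 2% of what a 100 Mbps link can carry, whereas 1.140 E+4 Kbytes/s would be above 90% of it.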

NI002 IPv4 stack

If you do not wish to use IPv6 in your application, you can set the property preferIPv4Stack to true, as follows:

java -Djava.net.preferIPv4Stack=true <classname>


NI003 remote GC

If your application is an RMI client or server, you can use the sun.rmi.dgc.client.gcInterval and/or sun.rmi.dgc.server.gcInterval properties defined at http://java.sun.com/j2se/1.4.2/docs/guide/rmi/sunrmiproperties.html with IBM Java as well. Both of these properties default to 60 seconds, and based on your application needs, the interval may be increased to reduce the performance impact of redundant GC cycles.

Note: The warning at the top of this link, as well as the risks associated with not freeing up distributed objects, apply equally well to IBM Java.

NI004 socket buffer sizes

If you are setting send and receive buffer sizes, note that the calls to setSendBufferSize(int) are used only as a hint. So if a performance difference is observed between platforms, you should add a call to getSendBufferSize() and see whether the hint is being picked up by the platform in question. In a recently reported performance issue on AIX, the application was calling setSendBufferSize(4096) from within its code. AIX was using the hint and setting the buffer size as requested, while other platforms were ignoring this call. As a result, the performance was being perceived as bad on AIX! Removing this call from the code more than quadrupled the performance of the application on AIX.

In general, you may want to leave out calls to tweak the TCP/IP stack from your application, as the AIX Network stack is finely-tuned out of the box.
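If your code does set a buffer size, a quick probe like the following shows whether the platform honors the hint. It is a minimal sketch: the class and method names are illustrative, and it uses an unconnected socket purely to read back the value, so it says nothing about throughput.

```java
import java.io.IOException;
import java.net.Socket;

final class SendBufferProbe {

    /**
     * Requests a send buffer size on an unconnected socket and returns the
     * size that the platform reports afterwards.
     */
    static int effectiveSendBuffer(int requestedBytes) throws IOException {
        try (Socket socket = new Socket()) {
            socket.setSendBufferSize(requestedBytes); // only a hint
            return socket.getSendBufferSize();        // what is actually in effect
        }
    }

    public static void main(String[] args) throws IOException {
        int requested = 4096; // the value from the reported issue
        int effective = effectiveSendBuffer(requested);
        System.out.println("requested " + requested + ", platform reports " + effective);
    }
}
```

Running the probe on each platform you are comparing makes it clear whether a difference in buffer size could explain a difference in performance.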

NI005 connection pooling

For RMI intensive applications, enabling the thread pooling allows reuse of the existing connection rather than creating a new one for each RMI call. To enable thread pooling, set the following property:

java -Dsun.rmi.transport.tcp.connectionPool=true <classname>

and the same can be disabled as follows:

java -Dsun.rmi.transport.tcp.noConnectionPool=true <classname>

Note: It is advisable to use thread pooling only for RMI-intensive applications. The latest versions of Java on AIX (1.3.1 SR7 onwards, and 1.4.1 SR2 onwards) disable thread pooling by default.


Conclusion

This article described the common tools and techniques for dealing with network and disk I/O bottlenecks.

The next article concludes the series with general observations and pointers to useful references.


