On Solaris on Sparc we're seeing a scenario of high CPU with the majority threads doing work in similar thread stacks (see below) with the top of the stack sometimes in montReduce, squareToLen, multiplyToLen, subN. Obviously the scenario is a number of new TLS connections are incoming but a bug identified as 8153189 causes high CPU. However, on Solaris on Sparc platform it appears there is no fix available even though there is a fix available for Solaris on x86 (however it is not enabled by default you have to use -XX parameters to enable the fix. See earlier link). I am still waiting on Java to confirm the fix status.
This particular scenario is playing out in the tiers between IHS and WAS. The workaround is to minimize the frequency of TLS handshakes by setting the IHS configs to maximize settings so the connections persist and are not destroyed and to reconfigure WAS to have unlimited requests per connection. See addendum at the end of this post:
"WebContainer : 123” daemon prio=3 tid=0x00123456 nid=0xfffa runnable [0x1234568a0]
at com.ibm.crypto.provider.RSACore.a(Unknown Source)
at com.ibm.crypto.provider.RSACore.rsa(Unknown Source)
at com.ibm.crypto.provider.RSACipher.a(Unknown Source)
at com.ibm.crypto.provider.RSACipher.engineDoFinal(Unknown Source)
at javax.crypto.Cipher.doFinal(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
Servers > Application servers > $SERVER > Web container settings > Web container transport chains > * > HTTP Inbound Channel > Select "Use persistent (keep-alive) connections" and "Unlimited persistent requests per connection" (and then restart the server)
Modified by polozoff
As applications grow over time they tend to add features and functions and then one day they run out of native heap as more and more Java classes are piled in.
I am promoting this link written by one of my colleages that covers how to use -Xmcrs and setting it to 200M or higher. The fact I have seen this twice in the past month tells me this is a growing (pun intended) problem.
Edit: Oct 30, 2015 and Nov 17, 2015
Native OOM (NOOM) landscape continues to shift. An argument to offset the heap to a different area in the address space is much better (-Xgc:preferredHeapBase). With this new argument, one can place the Java heap allocated past the initial 4g of address space, allowing all native code to use almost all of the lower 4g of space.
The Xmcrs option is not used anymore because of this. Xmcrs is still used. Here is the latest technote that supersedes the link I provided previously. More on troubleshooting out of memory errors.
Testing requires a tool and for one of my projects I'm using JMeter. I'm testing an https based site and was just having a hard time figuring out what was going on. I kept seeing an error in the JMeter "view results tree" that just said "ensure browser is set to accept the jmeter proxy certificate". I started researching that phrase and got nowhere quickly.
However, in the jmeter/bin subdirectory I found the jmeter.log file. In there I found a java.security.NoSuchAlgorithmException and referencing the SunX509 KeyManagerFactory. Ah ha, yes, I'm running the IBM JRE and not the Sun version.
Unfortunately changing the jmeter.properties proxy.cert.factory=IbmX509 (and of course uncommenting it) had no effect and I got the same SunX509 exception. I decided to try it at the command line as:
jmeterw -Dproxy.cert.factory=IbmX509 and voila the problem went away!
Modified by polozoff
For as long as I can remember the most debated Java topic has been the difference in opinion on the heap size minimum = maximum with lots of urban myths and legends that having them equal was better. In a conversation with a number of colleagues and Chris Bailey who has lead the Java platform for many years he clarified the settings for the IBM JVM based on generational vs non-generational policy settings.
"The guidance [for generational garbage collection policy] is that you should fix the nursery size: -Xmns == -Xmnx, and allow the tenured heap to vary: -Xmos != -Xmox. For non generational you only have a tenured heap, so -Xms != -Xmx applies.
The reason being that the ability to expand the heap adds resilience into the system to avoid OutOfMemoryErrors. If you're then worried about the potential cost of expansion/shrinkage that this introduces by causing compactions, then that can be mitigated by adjusting -Xmaxf and -Xminf to make expand/shrink a rare event."
A link to Chris Bailey's presentation on generational garbage collection http://www.slideshare.net/cnbailey/tuning-ibms-generational-gc-14062096
[edit to correct typo, added tags]
Modified by polozoff
I'm working on a Liberty server (this is the latest beta I downloaded a couple of days ago) and using the installUtility I'm getting the following error.
# bin/installUtility install adminCenter-1.0
Establishing a connection to the configured repositories...
This process might take several minutes to complete.
CWWKF1219E: The IBM WebSphere Liberty Repository cannot be reached. Verify that your computer has network access and firewalls are configured correctly, then try the action again. If the connection still fails, the repository server might be temporarily unavailable.
I then found out about a command to help try and figure out what is wrong
bin]# ./installUtility find --type=addon --verbose=debug
[6/25/15 10:57:53:093 CDT] Establishing a connection to the configured repositories...
This process might take several minutes to complete.
[6/25/15 10:57:53:125 CDT] Failed to connect to the configured repository:
IBM WebSphere Liberty Repository
[6/25/15 10:57:53:125 CDT] Reason: The connection to the default repository failed with the
following exception: RepositoryBackendIOException: Failed to read
[6/25/15 10:57:53:128 CDT] com.ibm.ws.massive.RepositoryBackendIOException: Failed to read properties file https://public.dhe.ibm.com/ibmdl/export/pub/software/websphere/wasdev/downloads/assetservicelocation.props
Caused by: java.net.SocketException: java.lang.ClassNotFoundException: Cannot find the specified class com.ibm.websphere.ssl.protocol.SSLSocketFactory
Will update when I have more details on why I'm getting the ClassNotFoundException.
and that resolves the issue. A defect has been raised to have the script use the Java we supply instead of the machine's.
[Edited Aug 25 to add
I also needed to update /etc/host.conf to enable hosts file lookup and then add entries for
to /etc/hosts file
The Aug 2015 beta seems to have made a number of fixes to installUtility so if you're on an older beta get the latest.]
Someone takes a javacore during what looks to be a hung app server and notices it contains lots of threads in socketRead. This is symptomatic of a slow back end whether it is a database, Web service, etc. An application is as strong as its weakest link. If the backend the application depends on is unable to respond in a timely manner then there is nothing that can be tuned at the application layer except for aggressive timeouts to protect the application from getting stuck. Hangs like these typically happen under high load/traffic conditions. It is important that the group that maintains the backend is aware of an issue with their tier and they need to fix it.
This is the page to follow if there seem to be any Maximo performance or stability problems.
Report scheduler enhancements
in Maximo v7.5. As with any online transaction application most enterprises need to pull reports from their environment. Reports tend to be (a) scheduled to repeat and (b) heavy users of CPU and memory. Therefore having more control on the report scheduler is a good thing to look at in Maximo v7.5.
I am often asked specifically what metrics should be monitored in WebSphere Application Server. This list can be considered a starting point. The WAS admin will frequently use "Custom" PMI setting and disable any metrics not in this list.
Suggested: Pool Size, Active Thread
Optional: ActiveTime (avg. time in
For WESB, per-mediation:
Database - JDBC Data Source Connection
Messaging - JMS Queue Connection
Factory Connection Pools.
PrepStmtCacheDiscardCount (for JDBC)
Transactions (per application server)
Memory Management. (Java GC).
Suggested: Heap Size, % Free after GC,
% Time spent in GC.
Yesterday at my "Performance Testing and Analysis" talk a question was asked is it better to have one large machine and virtualize the environment or to have lots of small machines?
Both strategies work. If deciding to go with large capacity frames then you want to have at least 3 frames. This way if one frame is taken out of service there are at least two other frames running. Otherwise, if one builds out only 2 large frames and one is taken out of service then the remaining frame becomes a single point of failure. I don't like SPOFs and so would have at least 3. This is if one frame can take the entire production load during the outage. That information has to be culled from the performance testing to see if there is enough capacity in one frame to carry the entire production load. If not then more likely than not there will need to be additional frames to be able to ensure that even if half the infrastructure is taken out of service the remaining frames can carry the entire production workload.
Likewise, having lots of smaller machines also works. Odds are less likely for a massive hardware outage with smaller machines so as one fails it can be safely taken out of service and replaced without impacting the production workload.
Which strategy is better? In my opinion they are both valid strategies as long as the proper capacity planning is conducted to ensure that when an outage occurs (and think worst case scenario here) that the remaining infrastructure is able to continue processing the production workload without impacting the SLA (Service Level Agreement including the non-functional requirements [i.e. response time, resource utilization, etc]). You do have a defined SLA, right?
A few years ago I blogged about how adding JSESSIONID logging to the access log helps identify which cluster member a user was pinned to. It turns out this also helps troubleshoot another interesting problem.
A WebSphere Application Server administrator noted that the session count on one of their JVMs in the cluster was getting far higher session counts than any other JVM in the cluster. So much so it was like a 3:1 imbalance in total number of sessions in the JVM. We applied the JSESSIONID logging and captured all the session ids. Through various Unix utilities (cut, sort, uniq, etc) we ended up with a prime suspect. One session was calling the /login page 10-20 times per second and had eclipsed every other session by over 10x the number of requests.
Why did we go down this path? We were able to see through the PMI data that the session manager in WebSphere Application Server was invalidating sessions. So we knew it wasn't an issue in the product of not deleting sessions. Also, with one JVM in the cluster creating more sessions than the other JVMs is suspicious. I would have expected to have seen higher load across the cluster. In addition, they have seen the behaviour move around the cluster every few months. That lead me to believe this was like a replay attack. Someone at some point captured a response with a JSESSIONID and was then using that JSESSIONID over and over again until some event caused it to capture a new JSESSIONID (most likely from a failover event as the cluster went through a rolling restart). That behaviour was curious! The fact it was smart enough to realize the HTTP header content changed and adapted was interesting.
So next time you see one or more JVMs with considerably higher session counts than the other JVMs in the same cluster you can use the same troubleshooting methodology to track down who the suspect is. Especially if your application is Internet-facing meaning anyone can start pinging your application.
Modified by polozoff
I have had a number of conversations with various colleagues lately around "optimal performance." It kind of reminds me of an old post I had about predicting capacity and application performance for a non-existent application (hint: it can't be done).
One scenario we were discussing was an application that was currently in one of its testing cycles. And the question came up, well, could we have another team test the same application and see if they could get better performance or at least validate the performance we are seeing is consistent? That seemed an unusual request. It turns out after some questioning that one team was not confident in the people they had or the results they were seeing. My answer to that was they should pull in the right resources to help. Having a second performance team test the same application made little sense. Just standing up a test environment would have been a big effort not to mention getting floor space in the data center.
So how do you determine if the performance your application sees is optimal performance?
The answer is two fold. First, like my previous post of when is enough performance tuning really enough is driven by business requirements (SLAs) and having the appropriate application monitoring tools in place to verify the application is meeting those SLAs.
The second is a little greyer. While SLAs should hold a lot of the requirements one that is commonly not in scope is capacity. It is difficult to predict capacity until performance testing has been completed. While the application may meet response time and transaction throughput requirements we may discover that, especially in high volume environments, the necessary hardware infrastructure capacity to support the application is going to cost a lot of money (or more money than was expected). Much like the conclusion in my previous article that in this case there may be a reason to continue performance tuning (including application development to reduce resource utilization) to try and reduce the infrastructure cost impact. However, at some point in time a decision will have to be made if the performance tuning effort is exceeding the cost differential of the increased capacity needed to support the application.
Modified by polozoff
Here is another excerpt from our performance cookbook that will be published in the near future.
Excessive Direct Byte Buffers
Excessive native memory usage by java.nio.DirectByteBuffers is a classic problem with any generational garbage collector such as gencon (which is the default starting in IBM Java 6.26/WAS 8), particularly on 64-bit. DirectByteBuffers (DBBs) (http://docs.oracle.com/javase/6/docs/api/java/nio/ByteBuffer.html) are Java objects that allocate and free native memory. DBBs use a PhantomReference which is essentially a more flexible finalizer and they allow the native memory of the DBB to be freed once there are no longer any live Java references. Finalizers and their ilk are generally not recommended because their cleanup time by the garbage collector is non-deterministic.
This type of problem is particularly bad with generational collectors because the whole purpose of a generational collector is to minimize the collection of the tenured space (ideally never needing to collect it). If a DBB is tenured, because the size of the Java object is very small, it puts little pressure on the tenured heap. Even if the DBB is ready to be garbage collected, the PhantomReference can only become ready during a tenured collection. Here is a description of this problem (which also talks about native classloader objects, but the principle is the same):
If an application relies heavily on short-lived class loaders, and nursery collections can keep up with any other allocated objects, then tenure collections might not happen very frequently. This means that the number of classes and class loaders will continue increasing, which can increase the pressure on native memory... A similar issue can arise with reference objects (for example, subclasses of java.lang.ref.Reference) and objects with finalize() methods. If one of these objects survives long enough to be moved into tenure space before becoming unreachable, it could be a long time before a tenure collection runs and "realizes" that the object is dead. This can become a problem if these objects are holding on to large or scarce native resources. We've dubbed this an "iceberg" object: it takes up a small amount of Java heap, but below the surface lurks a large native resource invisible to the garbage collector. As with real icebergs, the best tactic is to steer clear of the problem wherever possible. Even with one of the other GC policies, there is no guarantee that a finalizable object will be detected as unreachable and have its finalizer run in a timely fashion. If scarce resources are being managed, manually releasing them wherever possible is always the best strategy. (http://www.ibm.com/developerworks/websphere/techjournal/1106_bailey/1106_bailey.html)
Essentially the problem boils down to either:
There are too many DBBs being allocated (or they are too large), and/or
The DBBs are not being cleared up quickly enough.
It is very important to verify that the volume and rate of DBB allocations are expected or optimal. If you would like to determine who is allocating DBBs (problem #1), of what size, and when, you can run a DirectByteBuffer trace. Test the overhead of this trace in a test environment before running in production.
One common cause of excessive DBB allocations is the default WAS WebContainer channelwritetype value of async. In this mode, all writes to servlet response OutputStreams (e.g. static file downloads from the application or servlet/JSP responses) are sent to the network asynchronously. If the network and/or the end-user do not keep up with the rate of network writes, the response bytes are buffered in DBB native memory without limit. Even if the network and end-user do keep up, this behavior may simply create a large volume of DBBs that can build up in the tenured area. You may change channelwritetype to sync to avoid this behavior although servlet performance may suffer, particularly for end-users on WANs.
If you would like to clear up DBBs more often (problem #2), there are a few options:
Specifying MaxDirectMemorySize will force the DBB code to run System.gc() when the sum of outstanding DBB native memory would be more than $bytes. This option may have performance implications. When using this option with IBM Java, ensure that -Xdisableexplicitgc is not used. The optimal value of $bytes should be determined through testing. The larger the value, the more infrequent the System.gcs will be but the longer each tenured collection will be. For example, start with -XX:MaxDirectMemorySize=1024m and gather throughput, response time, and verbosegc garbage collection overhead numbers and compare to a baseline. Double and halve this value and determine which direction is better and then do a binary search for the optimal value.
Explicitly call System.gc. This is generally not recommended. When DBB native memory is freed, the resident process size may not be reduced immediately because small allocations may go onto a malloc free list rather than back to the operating system. So while you may not see an immediate drop in RSS, the free blocks of memory would be available for future allocations so it could help to "stall" the problem. For example, Java Surgery can inject a call to System.gc into a running process: https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=7d3dc078-131f-404c-8b4d-68b3b9ddd07a
In most cases, something like -XX:MaxDirectMemorySize=1024m (and ensuring -Xdisableexplicitgc is not set) is a reasonable solution to the problem.
A system dump or HPROF dump may be loaded in the IBM Memory Analyzer Tool & the IBM Extensions for Memory Analyzer DirectByteBuffer plugin may be run to show how much of the DBB native memory is available for garbage collection. For example:
=> Sum DirectByteBuffer capacity available for GC: 1875748912 (1.74 GB)
=> Sum DirectByteBuffer capacity not available for GC: 72416640 (69.06 MB)
There is an experimental technique called Java surgery which uses the Java Late Attach API (http://docs.oracle.com/javase/6/docs/technotes/guides/attach/index.html) to inject a JAR into a running process and then execute various diagnostics: https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=7d3dc078-131f-404c-8b4d-68b3b9ddd07a
This was designed initially for Windows because it does not usually have a simple way of requesting a thread dump like `kill -3` on Linux. Java Surgery has an option with IBM Java to run the com.ibm.jvm.Dump.JavaDump() API to request a thread dump (Oracle Java does not have an equivalent API, although Java Surgery does generally work on Oracle Java):
$ java -jar surgery.jar -pid 16715 -command JavaDump
The past few weeks meeting with various WebSphere Application Server-based customers reminded me of the importance of the basic and fundamental performance tuning tasks. The InfoCenter provides information on tunings at the OS level, TCP/IP, JVM, etc. I have visited no less than 3 different environments running WebSphere Application Server without these base tunings. Just by applying the base tunings to the OS and JVM we saw as much as 99% less garbage collection, improved response time, throughput and less CPU utilization with the same production loads. The best part of following these instructions is the administrator does not need to be a performance guru to realize these gains. These improvements also help save money requiring less capacity going forward.
Links to the v6.1
"Tuning Performance" in the InfoCenter
In the "Tuning the JVM" section I have never been disappointed with the "Option 1" settings. Options 2 and 3 require the ability to place the application under load/stress test. If you do not have a load/stress test environment (i.e. you have to test in production) then stick with "Option 1".
Notice that "Tuning Performance" has several sections for both application developers and WebSphere administrators. This is because we all know that to realize the best performance gains one has to optimize the application code. Runtime tuning can realize 5-15% but application code improvements can see 300% and higher performance improvements.
I am writing this blog entry to remind people that data collection should occur in all phases of an application's life cycle in production. This includes collecting data like javacores and heapdumps in production when the application is running in a nominal, steady state condition. This provides valuable data for when problems do occur to be able to compare the data when negative behaviour is occurring to when things were not bad. The data also helps to feed trend analysis in conjunction with application monitoring tools in terms of understanding what users are doing under both conditions.
Collect data often and be prepared for that day when all of the sudden the production environment is exhibiting distress.