Recently I came across a performance problem at a client that likely affects other Portal installations as well, particularly those using JSF (JavaServer Faces). The client reported that a newly migrated Portal performed significantly worse than its predecessor.
1. Javacores are often the single debug source providing the best clues. On Unix/Linux, a javacore is taken with a "kill -3 pid" command, where "pid" is the process ID of the Portal JVM (see the example commands just after this list). In this case, the javacores showed several WebContainer threads (the threads that process most inbound HTTP requests) sitting in the following classes:
java/io/UnixFileSystem.getLastModifiedTime or java/io/UnixFileSystem.getBooleanAttributes0
The key observation is that these threads are blocked on file I/O against the Unix filesystem. Note that they all appear to be waiting on file attributes, not actually reading or writing file content. That will be important later.
2. This system is an AIX LPAR running in a new frame without competing LPARs (at this time). It has twin VIO servers handling file I/O to a fast SAN over optical cable.
3. The Unix/Linux process statistics ("top", "topas", etc.) show that the OS is not swapping and that there is plenty of buffer space for file caching.
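For reference, here is roughly how a javacore is captured on AIX/Linux (this assumes the Portal JVM shows up in the process list under the server name "WebSphere_Portal"; adjust the grep pattern for your environment):

# ps -ef | grep WebSphere_Portal | grep -v grep
# kill -3 <pid>

The javacore file (javacore.<date>.<time>.<pid>.<seq>.txt) is normally written to the working directory of the process, which for WebSphere is usually the profile directory.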
I initially suspected latency in the SAN or the VIO servers on AIX. But since the javacores showed waits only on the file attribute lookups, not on the file reads and writes themselves, I realized that Unix/Linux already had the file contents and attributes cached in memory buffers; the JVM was only checking the attributes to make sure nothing had updated the files on the SAN.
Realizing that the files were not changing but that Java was blocking on the attribute lookups, I quickly concluded that we had frequent class reloading because the shared class cache was full. While the SAN is fast, getting attributes from the SAN was still slower than loading class contents that were already sitting in the Unix file buffers. Looking carefully at the javacores, I also noticed that the class loaders were initiating most of the file I/O, which further pointed to class loading.
Upon inspection, the Java shared class cache was indeed exhausted. To check this, you must run the java command from the same JVM that the WebSphere Portal process uses. You can get its exact location from the javacore; look for the "CMDLINE" string, which shows the full command used to start Portal. Let's assume "/opt/IBM/WebSphere/AppServer/java_1.7_64/bin/java".
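For example, assuming the javacore was written to the profile directory, something like this will pull out that command line:

# grep CMDLINE javacore.*.txt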
Run the following command to get all the cache names:
# /opt/IBM/WebSphere85/AppServer/java_1.7_64/bin/java -Xshareclasses:listAllCaches -fullversion
Listing all caches in cacheDir /tmp/javasharedresources/
Cache name                  level         persistent  OS shmid  OS semid  last detach time
Compatible shared caches
webspherev85_1.7_64_system  Java7 64-bit  yes                                       In use
In this case the shared cache is named "webspherev85_1.7_64_system".
Running the next command will show the statistics on that cache:
# /opt/IBM/WebSphere85/AppServer/java_1.7_64/bin/java -Xshareclasses:name=webspherev85_1.7_64_system,printStats -fullversion
Near the bottom of the report you will see a line like this:
Cache is 61% full
This is the key metric. In the case of the customer with performance issues, the cache was 100% full.
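If you just want that one line, something like the following works (the 2>&1 is there because the shared-class utilities may write their report to stderr):

# /opt/IBM/WebSphere85/AppServer/java_1.7_64/bin/java -Xshareclasses:name=webspherev85_1.7_64_system,printStats -fullversion 2>&1 | grep "Cache is"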
So, the solution was to ensure that the shared class cache was sufficiently large to avoid class-reload thrashing!! This became more obvious once I realized that JSF loads an incredible number of classes, which can quickly overwhelm the class cache.
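In practice the fix is simply a larger -Xscmx value in the generic JVM arguments of the application server running Portal. The value below is purely illustrative; size it for your own class volume:

-Xscmx300M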
One can quickly see the size of the cache by listing the shared cache directory, "/tmp/javasharedresources/":
# ls -latr /tmp/javasharedresources
drwxrwxrwx 2 root system 512 Mar 18 13:09 .
-rw-r----- 1 root system 157286400 Mar 18 13:12 C260M3A64P_webspherev85_1.7_64_system_G21
drwxrwxrwt 32 bin bin 67584 Mar 18 13:15 ..
So, you can see a 150 MB cache (157286400 bytes) in that file, which corresponds to a -Xscmx150M parameter in the generic JVM arguments of the AppServer defined for Portal.
Note also that this is a statically allocated file. If you want to change its size, stop ALL Java processes that use the shared cache, then delete this file. It will be recreated by the first Java process to start, at the size dictated by -Xscmx; subsequent Java processes cannot change the size of the cache file.
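As an alternative to deleting the file by hand, the JVM's shared-class utility can remove the cache for you. A rough sequence (assuming the cache name from above and that every JVM using it is stopped) looks like this:

# /opt/IBM/WebSphere85/AppServer/java_1.7_64/bin/java -Xshareclasses:name=webspherev85_1.7_64_system,destroy -fullversion

Then set the new -Xscmx value in the generic JVM arguments and restart; the first JVM to start recreates the cache at the new size.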
It's also important that "/tmp" reside on a very fast disk, preferably an in-memory (RAM-backed) filesystem and not something with latency like a SAN.
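A quick way to check what "/tmp" actually sits on (AIX shown; on Linux use "df -h /tmp" instead):

# df -g /tmp
# mount | grep tmp

If "/tmp" cannot be made fast, the shared cache can also be relocated with the -Xshareclasses cacheDir sub-option, e.g. "-Xshareclasses:cacheDir=/some/fast/path" (the path here is only a placeholder).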