Technical Blog Post
Anatomy of the application server JVM in WebSphere Application Server
Sometimes servers do not start. The cause might be a bad build, a corrupted configuration, or a server that simply stopped working. This blog helps you understand the anatomy of the Application Server start up process. It provides you with troubleshooting and debugging techniques to determine why the server is not starting or why the server is not stopping.
Starting the server
Let's see what happens when a server is started using the startServer.sh (or startServer.bat) server1 command.
Two Java virtual machines (JVM) are actually launched. The first JVM is the systems management server launch utility. Its job is to locate the appropriate configuration, for example the server.xml file, and spawn the second JVM, which is the actual server process.
Note: Some default properties are set in the various systemlaunch.properties files.
Spawning is achieved by constructing a command, such as java -Dxxx -classpath x;y;z etc, and executing it. This server launching process waits until it receives a status back from the server process unless you specify the -nowait parameter.
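The spawn-and-wait pattern described above can be sketched in a few lines. This is only an illustration, not the actual launch utility: the function names are hypothetical, and the sketch waits for the child process to exit, whereas the real launcher waits for an initialization status from the server.

```python
import os
import subprocess
import sys


def build_command(java, system_props, classpath, main_class, args):
    """Assemble a command of the form: java -Dk=v -classpath a:b Main args."""
    cmd = [java]
    cmd += [f"-D{key}={value}" for key, value in system_props.items()]
    # os.pathsep is ';' on Windows and ':' on UNIX-based platforms.
    cmd += ["-classpath", os.pathsep.join(classpath)]
    cmd.append(main_class)
    cmd += list(args)
    return cmd


def launch(cmd, wait=True):
    """Spawn the child process; block for its status unless wait is False."""
    child = subprocess.Popen(cmd)
    if not wait:
        return None  # comparable to -nowait: return without a status
    return child.wait()
```

Calling launch(cmd, wait=False) returns immediately, which mirrors what the -nowait parameter does for the launcher.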
Starting in WebSphere Application Server V8.0, the design of the server start up process changed slightly. When an application server starts for the first time after installing an update -- that is, a fix pack or an interim fix -- the server launch application performs any post-install configuration by invoking the runConfigActions.sh(.bat) command in a separate process before spawning the server process.
Here is an example output from this launch process:
ADMU0116I: Tool information is being logged in file D:\WebSphere\AppServer\profiles\AppSrv01\logs\server1\startServer.log
ADMU0128I: Starting tool with the AppSrv01 profile
ADMU3100I: Reading configuration for server: server1
ADMU3200I: Server launched. Waiting for initialization status. <= At this point the server launching JVM is waiting for status back from the server JVM
ADMU3000I: Server server1 open for e-business; process id is 3004
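If you want to check this output from a script, a small parser is enough. The sketch below assumes that launcher messages start with an ID of the form ADMU, four digits, and a severity letter (I, W, or E), and that the process ID appears in the ADMU3000I message exactly as in the example above. The function name is hypothetical.

```python
import re

# Assumed message layout: "ADMU3000I: text". Severity letters are an assumption.
MESSAGE = re.compile(r"^(ADMU\d{4}[IWE]): (.*)$")
PID = re.compile(r"process id is (\d+)")


def parse_launcher_output(text):
    """Return ([(message_id, message_text), ...], pid or None)."""
    messages = []
    pid = None
    for line in text.splitlines():
        match = MESSAGE.match(line.strip())
        if not match:
            continue
        message_id, body = match.groups()
        messages.append((message_id, body))
        if message_id == "ADMU3000I":
            found = PID.search(body)
            if found:
                pid = int(found.group(1))
    return messages, pid
```

A missing ADMU3000I message, that is, a pid of None, suggests that the launcher never received a successful status back from the server JVM.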
Stopping the server
The server is stopped using the <PROFILE_HOME>\bin\stopServer server1 command.
As a result, a new JVM is created to read the configuration and send a message to the server to shut down. By default, the stopServer utility does not return control to the command line until the server is fully stopped; specify the -nowait option to return immediately. A user ID and password are required to stop a secure application server.
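As a rough sketch, the stop command for a secure server could be assembled as follows. The helper name is hypothetical, and the script path is whatever applies to your profile. The sketch relies on the -username, -password, and -nowait options of the stopServer command.

```python
def build_stop_command(script, server, username=None, password=None, nowait=False):
    """Build the argument list for a stopServer invocation."""
    cmd = [script, server]
    if username is not None:
        cmd += ["-username", username]
    if password is not None:
        cmd += ["-password", password]
    if nowait:
        # Return control to the caller without waiting for a full stop.
        cmd.append("-nowait")
    return cmd
```

Passing the list to a process-spawning call, rather than joining it into one string, avoids quoting problems when the password contains special characters.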
Application Server start up fails but only the startServer.log file is created
Possible causes of the problem:
- Java could be corrupted.
- Invalid JVM arguments are set on the application server JVM.
- The startServer launcher could be crashing. Check the native log files.
- The Java shared class cache is corrupted (Java class sharing).
- A java.lang.UnsatisfiedLinkError occurred.
- Non-root user problem, which is a permissions issue.
- The ulimit value was set too low on the system.
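For the ulimit cause, a quick check from a script might look like the sketch below. It assumes that the limit in question is the open-file limit (ulimit -n) and that you supply your own minimum value; the article does not name a specific limit or threshold. It runs on UNIX-based platforms only.

```python
import resource


def open_file_limit():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_too_low(soft, minimum):
    """Return True if the soft limit is below the minimum you expect."""
    if soft == resource.RLIM_INFINITY:
        return False  # unlimited can never be too low
    return soft < minimum
```

Run the check as the same user that starts the server, because limits are per user and per session.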
Things to check when Java is corrupted
Try running the “java -fullversion” command from the WAS_HOME/java/jre/bin directory. If this command fails, then Java likely has a problem. When you start the server, you might see messages such as:
- “The system cannot find the path specified.” Check the Java classpath.
- “No public JRE found”
- java.lang.main() method could not be created
Check the native_stdout.log and native_stderr.log files for entries. Any entries in these log files indicate that Java has a problem initializing. If these checks point to a corrupted Java installation, you can, as a workaround, copy the entire WAS_HOME/java directory from another working system. However, make sure that the level of Java and the WebSphere Application Server versions match. Back up the failing system before you copy over the directory.
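The two checks above are easy to script. The following is a minimal sketch with hypothetical function names; you pass in the path to the java executable and to each native log file yourself.

```python
import os
import subprocess


def java_runs(java_bin):
    """Return True if 'java -fullversion' exits with status 0."""
    try:
        result = subprocess.run(
            [java_bin, "-fullversion"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The executable is missing, not runnable, or did not respond.
        return False
    return result.returncode == 0


def has_entries(log_path):
    """Return True if the log file exists and is not empty."""
    return os.path.isfile(log_path) and os.path.getsize(log_path) > 0
```

If java_runs returns False, or has_entries returns True for either native log, the Java installation is the first thing to investigate.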
Invalid JVM arguments are set on the application server JVM
- You might have specified some Java arguments to the application server JVM due to an application requirement or another stack product requirement.
- Check the minimum and maximum heap size values that are set.
- Take a look at the application server server.xml file and see if there are any invalid JVM arguments specified. If invalid arguments exist, try removing them and then see if the server starts. If the server starts, then the problem is with one of the removed JVM arguments.
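To review these settings without opening the file by hand, you could parse server.xml. The sketch below assumes that the JVM settings are stored in a jvmEntries element with initialHeapSize, maximumHeapSize, and genericJvmArguments attributes. The sample XML is a simplified, made-up fragment, so verify the names against your own file.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment; a real server.xml contains much more.
SAMPLE = """<server>
  <processDefinitions>
    <jvmEntries initialHeapSize="1024" maximumHeapSize="512"
                genericJvmArguments="-Xshareclasses:none -Xfoo"/>
  </processDefinitions>
</server>"""


def review_jvm_entries(xml_text):
    """Return (findings, generic_args) for every jvmEntries element."""
    findings = []
    generic_args = []
    root = ET.fromstring(xml_text)
    for jvm in root.iter("jvmEntries"):
        initial = int(jvm.get("initialHeapSize", "0"))
        maximum = int(jvm.get("maximumHeapSize", "0"))
        # A value of 0 is treated here as "not set" (an assumption).
        if initial > 0 and maximum > 0 and initial > maximum:
            findings.append(
                f"initial heap {initial} exceeds maximum heap {maximum}"
            )
        generic_args += jvm.get("genericJvmArguments", "").split()
    return findings, generic_args
```

The sketch only lists the generic arguments; deciding which one is invalid still means removing them one at a time and restarting the server.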
When the Java class cache is corrupted
It is possible that the Java cache is corrupted. A class cache is an area of shared memory of a fixed size that persists beyond the lifetime of any JVM that is using it. A JVM does not own the cache and there is no master and slave JVM concept; instead, any number of JVMs can read and write to the cache concurrently. A cache is deleted either when it is explicitly destroyed using a JVM utility or when the operating system restarts. A cache cannot persist beyond an operating system restart. Its purpose is to reduce the virtual memory footprint and improve JVM start up time. By default, this option is enabled starting with SDK 1.5 on all IBM platforms.
Run the clearClassCache.bat (or clearClassCache.sh) command from the WAS_HOME\profiles\profile_name\bin directory to clear the Java cache of this WebSphere Application Server node from the common, system-level cache location. Alternatively, you can delete the content of the following directories:
- On UNIX-based platforms: /tmp/javasharedresources
- On Windows: C:\Documents and Settings\<userid>\Local Settings\Application Data\javasharedresources
However, keep in mind that deleting the Java cache from system level deletes the cache for every Java instance on the system.
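Because a system-level delete affects every Java instance, it can help to look at what is in the cache directory first. This sketch only lists files and deletes nothing; the function name is hypothetical, and you pass in the directory that applies to your platform.

```python
import os


def cache_entries(cache_dir):
    """Return a sorted list of (path, size_in_bytes) for files under cache_dir."""
    entries = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            entries.append((path, os.path.getsize(path)))
    return sorted(entries)
```

An empty list means that there is nothing to delete, or that the directory does not exist for the current user.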
You can disable the Java class cache feature permanently using the -Xshareclasses:none argument. To set it, complete these steps in the administrative console:
- Click Servers > Application Servers > server_name > Java and Process Management > Process Definition > Java Virtual Machine.
- Under Generic JVM arguments, specify -Xshareclasses:none
- Save and synchronize the changes with nodes.
When the OSGi cache is corrupted
The Equinox OSGi framework is used to manage class loading and relationships between the server component bundles. In some cases, the cached bundle data, which is maintained on a per profile basis and has a separate cache at the WAS_HOME level for installation-wide processes, can become out-of-sync with the actual binaries on the server. You can use the osgiCfgInit.sh(bat) script to clear and recreate the OSGi cache.
You should run the osgiCfgInit script on the command line from the WAS_HOME/bin or user_install_root/bin directory. The behavior of the script depends on the directory from which you run the script. If you run the script from a profile level bin directory, the script clears the OSGi cache for all servers within that profile. If you run the script from the WAS_HOME/bin directory, the script clears the OSGi cache for all servers within the default profile.
Avoid trouble: Before you run the osgiCfgInit script, stop the server on which the script will be run. If you run this script on a server that is active, the server might have problems trying to read or update the cache after the script is finished.
After you apply a fix or fix pack with the Update Installer or with IBM Installation Manager, servers (deployment manager, node agent, and application servers) might fail to start. No SystemOut.log file is generated to indicate a reason. The startServer.log file shows:
!MESSAGE Error reading configuration: /home/WebSphere/AppServer/profiles/Dmgr01/configuration/org.eclipse.osgi/.manager/.fileTableLock (Permission denied)
java.io.FileNotFoundException: /opt/WebSphere/AppServer/profiles/Dmgr01/configuration/org.eclipse.osgi/.manager/.fileTableLock (Permission denied)
at java.io.FileOutputStream.openAppend(Native Method)
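A log excerpt like this one can be detected automatically. The sketch below searches the text of a startServer.log file for paths under org.eclipse.osgi that are followed by "(Permission denied)". The pattern is derived only from the excerpt above, so treat it as an assumption about the message format.

```python
import re

# Matches a whitespace-free path that contains org.eclipse.osgi and is
# followed by "(Permission denied)", as in the excerpt above.
LOCK_ERROR = re.compile(r"(\S*org\.eclipse\.osgi\S*)\s+\(Permission denied\)")


def osgi_permission_errors(log_text):
    """Return the distinct OSGi cache paths reported with 'Permission denied'."""
    return sorted(set(LOCK_ERROR.findall(log_text)))
```

Each returned path points at a file whose ownership or permissions you should compare with the user that starts the server.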
If you see similar errors, run the osgiCfgInit.sh(bat) script before you start any server JVM for the first time after you install a fix pack. Known issues with the OSGi cache arise when both the root user and a non-root user manage the WebSphere Application Server file systems.
A java.lang.UnsatisfiedLinkError is thrown when the JVM cannot find the appropriate native library that is required for WebSphere Application Server to start.
Here are some causes:
- The user who starts the server does not have the permissions required to load native libraries (.so).
- Environment settings that were edited on the machine cause java.library.path to be set incorrectly.
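Both causes can be narrowed down by checking each directory on the library path as the user who starts the server. This is a minimal sketch with a hypothetical function name; you supply the java.library.path value yourself, for example from the server's logged system properties.

```python
import os


def unreadable_library_dirs(library_path, sep=os.pathsep):
    """Return (directory, reason) for entries that are missing or inaccessible."""
    problems = []
    for entry in library_path.split(sep):
        if not entry:
            continue  # skip empty segments such as a trailing separator
        if not os.path.isdir(entry):
            problems.append((entry, "missing"))
        elif not os.access(entry, os.R_OK | os.X_OK):
            problems.append((entry, "no read/search permission"))
    return problems
```

The check covers directories only; a native library file inside a readable directory can still have restrictive permissions of its own.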
Startup issues when a SystemOut.log file is generated
- The server start up process takes a long time.
- The server start up process hangs.
- The server start up process fails with errors.
- The server starts fine, but has errors.
- Port conflict issues occur during the start up process.
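For the port conflict case, you can test whether another process already holds a port before you start the server. The sketch below tries to bind the port on one host address; the function name is hypothetical, and the port numbers to test are the ones in your own server configuration.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding host:port fails, which usually means a conflict."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use"; a privileged port bound by a
            # non-root user also fails here and is reported the same way.
            return True
    return False
```

Run the check while the server is stopped; otherwise the server's own ports are reported as in use.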
You can ignore the following issues:
- FileNotFound exceptions for the variables.xml and virtualHosts.xml files can be ignored in the startServer.log file.
- Most of the warning messages can be ignored including FFDC messages.
Server stops by itself (graceful shutdown)
The process to get a thread dump or Javacore during a server shut down is documented at this URL: http://www.ibm.com/support/docview.wss?uid=swg21304559
When you set the -Dcom.ibm.ws.runtime.dumpShutdown=true property, a thread dump is triggered during the server shut down process. To set the property in the administrative console, complete these steps:
- Click Servers > Application Servers > server_name > Server Infrastructure > Java and Process Management > Process Definition > Java Virtual Machine > Custom Properties > New.
- Specify com.ibm.ws.runtime.dumpShutdown for the property name and true for the value.
For platforms where an IBM Software Development Kit is used, a Javacore is generated in the working directory of the application server. For all other platforms, a thread dump is written to the native_stdout.log file for the application server. On Solaris and HP platforms, thread dumps are written to the native_stdout.log file, as is the verbosegc output. In addition to the thread dump, the stack trace of the current thread that is processing the shut down is included in the SystemErr.log for the application server.
This information should help to determine the source of the problem that is causing the Application Server to shut down gracefully.
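On platforms with an IBM Software Development Kit, you can locate the most recent Javacore files with a short script. The sketch assumes that Javacore file names start with "javacore" and end with ".txt", which is the usual default naming but can be changed by JVM dump options. The function name is hypothetical.

```python
import glob
import os


def newest_javacores(working_dir, limit=5):
    """Return up to 'limit' Javacore paths from working_dir, newest first."""
    paths = glob.glob(os.path.join(working_dir, "javacore*.txt"))
    paths.sort(key=os.path.getmtime, reverse=True)
    return paths[:limit]
```

Compare the timestamp of the newest file with the shut down time in the SystemOut.log file to confirm that the dump belongs to the shut down you are investigating.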
I hope this blog helps you in understanding the anatomy of the WebSphere Application Server start up process and provides you with some troubleshooting / debugging tips for all currently supported releases.