IBM Support

How nodeagent monitors WebSphere Application Server

Technical Blog Post


Abstract

How nodeagent monitors WebSphere Application Server

Body

 

This document explains how the monitoring policy works in WebSphere Application Server (WAS) and what is the recommended way to start the application server in parallel.

There are multiple ways to monitor the application server. For example, using JMX programming, using 3rd party tools, and using other sources. This document explains only how to use nodeagent monitoring for application server and monitoring servers using the Windows service.

 

1. When I reboot my machine, I want to start all the servers (including Dmgr and nodeagent) automatically. What is the recommended way to do to that?

Create wasservice for Deployment Manager server and nodeagent using the WASServiceCmd and set it to automatic. Don't enable Application Server.

Set the monitoring policy of application server to Running.

Nodeagent can monitor application server process. It can start the server if the server is down) during nodeagent startup or can restart the hung server or start the server when it goes down abnormally.

Note: Never create wasservice for the application server and set to automatic. During machine reboot the application server might try to get started before nodeagent starts. This can cause the server to fail to register with nodeagent LSD and fails to start. This is the only reason why we don't recommend the server to be started automatically by the operating server process like WASService. Review the dW Answer item "What is the recommended way to start WebSphere Application Server, Dmgr and nodeagent automatically?" for more information.

 

2. How can we start the application servers in parallel? In other words, can I start all application servers at the same time (not in sequence)?

Yes, you can do it using the com.ibm.websphere.management.nodeagent.bootstrap.maxthreadpool custom property.

Set the property under System Administration > Node agent > nodeagent_name > Java and process management > Process definition > Java virtual machine > Custom properties.

Use this property to control the number of threads that can be included in a newly created thread pool. A dedicated thread is created to start each application server Java virtual machine (JVM). The JVMs with dedicated threads in this thread pool are the JVMs that are started in parallel whenever the node agent starts.

You can specify an integer from 0 - 5 as the value for this property. If the value you specify is greater than 0, a thread pool is created with that value as the maximum number of threads that can be included in this newly created thread pool. The following table lists the supported values for this custom property and their effect.

Property threadpool.maxsize is set to 0 or not specified - The node agent starts up to five JVMs in parallel.
Property threadpool.maxsize is set to 1                  - The node agent starts the JVMs serially.
Property threadpool.maxsize value between 2 and 5        - The node agent starts a number of JVMs equal to the specified value in parallel.

Note: With this property you can only start a maximum of 5 servers at a time.

 

3. Why does logging off of Windows system crash WebSphere Application Server?

You should never logoff the system when the windows service is not created for the process. It won't provide any footprint about the crash. It's almost like killing the java process from the Task Manager. When you logoff it kills the server process. It can be Dmgr, nodeagent, or application server process. You must create the windows service for the server process if you want the process to be running when you logoff the windows service user.

To create windows service using wasservicehelper, see the topic "Using the WASServiceHelper utility to create Windows services for application servers" in the product documentation.

Notes:

  1. If you want the process to be running (without the windows service), lock the computer, don't logoff.
  2. If you restart the system, the process will be killed irrespective of windows service.
  3. It's recommended not to create windows service for the application servers. Nodeagent should be monitoring the application server using the monitoring policy. For more information, please refer to the product documentation.

 

4. How does nodeagent monitor the application server and how does it know the previous state of the application server?

When the nodeagent monitors the application server (with the monitoring policy created as mentioned in question 1) it saves the server state information in the monitoring.state file. It will maintain the previous server state and the application server PID. In case of an application server crash or hang, the nodeagent will get the previous state of the server from the monitoring.state file and then try to start the application server automatically.

Note: If you notice StringIndexOutOfBoundsException or any other exception in the NodeAgent.loadNodeState stack (nodeagent Systemout.log file), it means the monitored.state file is corrupted. You must stop all servers, delete the file and then start the nodeagent again. For example:

    Caused by: java.lang.StringIndexOutOfBoundsException
    at java.lang.String.substring(String.java:1115)
    at com.ibm.ws.management.nodeagent.NodeAgent.loadNodeState(NodeAgent .java:3210)

 

5. My application servers were monitored by the nodeagent. When the server was hung, why didn't the nodeagent restart the server?

Before I answer this question, please review the product documentation section "Monitoring policy settings" that explains how the monitoring policy works in WAS.

Nodeagent PidWaiter sends the signal every ping time out interval to get the status of the application server. If the PidWaiter does not get the response back from Application Server then AppServer is considered hung. Once the application server is identified as unresponsive/hung the nodeagent PidWaiter sends a SIGTERM to the process, which does not guarantee the process is immediately stopped. It sends the signal wait for the process to normally shutdown. If the server doesn't respond to any request, the server just stays hung forever.

If you want the server to be killed when it's hung or doesn't respond to the nodeagent ping, then you need to set "com.ibm.server.allow.sigkill" property to true in the nodeagent custom property. Please review section "Java virtual machine settings" in the product documentation for more information.

 

[{"Business Unit":{"code":"BU004","label":"Hybrid Cloud"},"Product":{"code":"","label":""},"Component":"","Platform":[{"code":"","label":""}],"Version":"","Edition":""}]

UID

ibm11081005