Diagnosing the Integration Server

Introduction

This chapter contains information for the server administrator who troubleshoots the Integration Server or maintains diagnostic data from the server. Diagnostic data is the configuration, operational, and logging information from the Integration Server. This information is useful in situations where the server becomes unresponsive and unrecoverable.

To facilitate the troubleshooting process, the Integration Server provides the following features:

  • Diagnostic port. A special port that uses a dedicated thread pool.
  • Diagnostic utility. A special service that extracts important diagnostic data from the Integration Server.
  • Safe mode switch. A method of starting the Integration Server in which the server does not connect to any external resource.
  • Thread dump. A facility that generates a log containing information about the threads and processes currently running within the Java Virtual Machine (JVM), to help diagnose issues with the Integration Server.

Configuring the Diagnostic Port

The diagnostic port is a special port that uses threads from a dedicated thread pool to process requests submitted via HTTP. It behaves like a typical HTTP port, except that the server uses the diagnostic thread pool instead of the server thread pool.

By maintaining a separate thread pool, this port improves the troubleshooting capability when the server becomes unresponsive. For example, when the server reaches its maximum number of threads, you cannot open the Integration Server Administrator. This prevents you from accessing information that might help you determine why the threads are not available. It also prevents you from freeing up other server resources. Using the threads from the diagnostic thread pool, the diagnostic port enables you to open the Integration Server Administrator.

When you install the Integration Server, it automatically creates the diagnostic port on port 9999. If another port is already using 9999, the server disables the diagnostic port when you start the Integration Server. To enable the diagnostic port in that case, you must change its port number. For instructions about how to edit port configurations, see Editing a Port. Only one diagnostic port can exist on each Integration Server.
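Before starting the server, it can help to check whether something is already listening on the diagnostic port. The following Python sketch shows one way to do that. The helper name port_in_use is hypothetical and is not part of the Integration Server; it only tests whether a TCP connection to the given host and port succeeds.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

For example, port_in_use("localhost", 9999) checks the default diagnostic port on the local machine (the host name here is an assumption). A True result before the Integration Server starts suggests that another process holds the port.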

Diagnostic Thread Pool Configuration

Through the Integration Server Administrator, you can configure the number of threads in the diagnostic thread pool. The server adds threads to the pool as needed until it reaches the maximum allowed. If the server reaches the maximum number, it waits until processes complete and return threads to the pool before beginning more processes.

You can also set the thread priority for the diagnostic thread pool. The diagnostic thread priority determines the order of execution when the JVM receives requests from different threads: the larger the number, the higher the priority, and the JVM runs the thread with the higher priority. Therefore, by assigning a higher priority to the threads in the diagnostic thread pool, you can take advantage of the dedicated thread pool and improve access to the Integration Server Administrator.

For more information about how to configure the diagnostic thread pool, see Working with Extended Configuration Settings.

Diagnostic Port Access

Only users in the Administrators group can access the server through the diagnostic port. You can access the Integration Server Administrator via http://<hostname>:<diagnostic port> where hostname is the machine that hosts the Integration Server and diagnostic port is the diagnostic port number. After prompting you for a username and password, the server displays the Integration Server Administrator. Because you can access the diagnostic port only through HTTP, data and passwords are passed in clear text (unencrypted).

The diagnostic port allows access only to services defined with the Administrators ACL. Software AG recommends that you do not change the default access settings.

Note: Software AG strongly recommends that you discourage any external user access to the diagnostic port and utility. LDAP users should not access the diagnostic port.

Using the Diagnostic Utility

You use the diagnostic utility to collect configuration, operation, and logging data from the Integration Server. You can also use the diagnostic utility to view the list of fixes applied to the installed packages and Integration Server. The diagnostic utility is an Integration Server service called wm.server.admin:getDiagnosticData. It is accessible only by members of the Administrators group. Although you run the utility via the diagnostic port to troubleshoot, it can also be used with any HTTP or HTTPS port to collect diagnostic data periodically. You can access the service through the Get Diagnostics button in Integration Server Administrator.

The diagnostic utility creates a temporary diagnostics_hostname_port_yyyyMMddHHmmss.zip file in the Integration Server_directory\instances\instance_name\logs directory and writes to the .zip file as it collects information. The .zip file also contains a config\PackagesAndUpdates.txt file, which lists the packages and package updates for the Integration Server.
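To locate or match these archives in a script, you can reproduce the file name pattern. The following Python sketch assumes that yyyyMMddHHmmss is a Java-style date pattern (four-digit year, then two-digit month, day, hour in 24-hour form, minute, and second). The function name and the sample values are hypothetical.

```python
from datetime import datetime


def diagnostics_filename(hostname: str, port: int, when: datetime) -> str:
    """Build a name following diagnostics_hostname_port_yyyyMMddHHmmss.zip."""
    # %Y%m%d%H%M%S is the strftime equivalent of the assumed yyyyMMddHHmmss pattern
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"diagnostics_{hostname}_{port}_{stamp}.zip"


# Hypothetical host, port, and time
print(diagnostics_filename("ishost", 5555, datetime(2024, 1, 31, 14, 5, 9)))
# prints diagnostics_ishost_5555_20240131140509.zip
```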

If there are problems creating the .zip file, such as insufficient space in the file system, the utility returns a text file instead. In the text file, the configuration and operation data are separated into distinct sections for easier reading. Unlike the .zip file, the text file does not contain logging data.

Diagnostic Utility Performance

The diagnostic utility can execute slowly when collecting large amounts of log data from the Integration Server. To improve performance, you can limit the amount of data the diagnostic utility returns by specifying a maxLogSize value or setting the watt.server.diagnostic.logperiod parameter.

The maxLogSize parameter of the wm.server.admin:getDiagnosticData service sets the size limit for log files written to the diagnostic_data.zip file. If a log file exceeds the specified maxLogSize, the diagnostic utility omits it from the .zip file but records it in a diagnosticwarning.txt file. This file lists all of the log files that exceed the maxLogSize value. It is located in the logs directory of the .zip file.
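The per-file rule described above can be pictured with a short Python sketch. It is only an illustration of the stated behavior, not the utility's implementation: the function name, the file names, and the sizes are hypothetical, and "exceeds" is read as strictly greater than maxLogSize.

```python
from typing import Dict, List, Tuple


def split_by_max_log_size(log_sizes: Dict[str, int], max_log_size: int) -> Tuple[List[str], List[str]]:
    """Split log files into (included, omitted) lists by size in bytes."""
    included: List[str] = []
    omitted: List[str] = []  # names that would be listed in diagnosticwarning.txt
    for name, size in sorted(log_sizes.items()):
        if size > max_log_size:
            omitted.append(name)
        else:
            included.append(name)
    return included, omitted


# Hypothetical log files and sizes in bytes
included, omitted = split_by_max_log_size({"server.log": 500, "error.log": 2000}, 1000)
print(included, omitted)  # prints ['server.log'] ['error.log']
```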

Note: You can use the maxLogSize parameter only when running the diagnostic utility from a browser. You cannot limit the log size when you run the diagnostic utility from Integration Server Administrator. For more information, see Running the Diagnostic Utility Service.

Use the watt.server.diagnostic.logperiod parameter to specify the log period, in hours. By default, it is set to 6 hours. When this property is set to 0, the utility does not return any log files. It returns only the configuration and run-time data files. The logging information the utility returns depends on how you store the logs. If you save the logs to a database, the diagnostic utility returns exactly the log entries that fall within the specified number of hours. If you save the logs to the file system, it returns not only the entries within the specified number of hours but the entire log for that day. For instructions about how to set server configuration parameters, see Working with Extended Configuration Settings.

Use the watt.server.diagnostic.logFiles.maxMB parameter to specify the size limit for including audit log files in the diagnostic archive. While collecting each audit log file, Integration Server calculates the total size of the log files for the requested log period. If the total size of the log files for a particular audit log exceeds the value of watt.server.diagnostic.logFiles.maxMB for the log period, Integration Server does not include that audit log file in the diagnostic archive. Consider the examples below.

Example 1

  • watt.server.diagnostic.logFiles.maxMB is 250 and watt.server.diagnostic.logperiod is 6.
  • There are two WMSESSION log files that cover the previous six hours.
  • The total size of the two WMSESSION log files is greater than 250 MB.

RESULT: No session audit log data will be included in the diagnostic data archive.

Example 2

  • watt.server.diagnostic.logFiles.maxMB is 300 and watt.server.diagnostic.logperiod is 8.
  • There is one WMSERVICE log file that covers the previous eight hours.
  • The size of the WMSERVICE log file is less than 300 MB.

RESULT: Service audit log data will be included in the diagnostic data archive.
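The decision in both examples reduces to comparing a total size against the limit. The following Python sketch restates that rule under two assumptions: "exceeds" means strictly greater than watt.server.diagnostic.logFiles.maxMB, and the individual file sizes shown are invented for illustration because the examples give only totals. The function name is hypothetical.

```python
from typing import Iterable


def include_audit_log(file_sizes_mb: Iterable[float], max_mb: float) -> bool:
    """Return True if one audit log's files for the log period fit within max_mb."""
    # The limit applies to the combined size of all files of that audit log
    return sum(file_sizes_mb) <= max_mb


# Example 1: two WMSESSION files totaling more than 250 MB (sizes are made up)
print(include_audit_log([150, 120], 250))  # prints False

# Example 2: one WMSERVICE file smaller than 300 MB (size is made up)
print(include_audit_log([280], 300))  # prints True
```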

Running the Diagnostic Utility from Integration Server Administrator

About this task

Complete the following procedure to run the diagnostic utility from Integration Server Administrator.

To run the diagnostic utility from Integration Server Administrator

Procedure

  1. Open Integration Server Administrator if it is not already open.
  2. Click About in the upper right-hand part of the screen.
  3. In the Software area, click Get Diagnostics.
  4. You can choose to perform one of the following:
    1. Open the diagnostic data file.
    2. Save the diagnostic data file to the client file system.
    3. Cancel and exit this operation.
    Note: Whether you open or save the diagnostic data file, the file is opened from or saved to the client system. Integration Server also automatically saves a copy to the Integration Server_directory\instances\instance_name\logs directory of the host machine where Integration Server is running.

Running the Diagnostic Utility Service

About this task

Complete the following procedure to run the diagnostic utility without using Integration Server Administrator. For example, you would use this method if you wanted to use the maxLogSize parameter to limit the size of the .zip file.

To run the diagnostic utility without using Integration Server Administrator

Procedure

  1. Start your web browser.
  2. Type the following URL:
    http://<hostname>:<port>/invoke/wm.server.admin/getDiagnosticData

    where <hostname> is the IP address or name of the machine and <port> is the port number where the Integration Server is running.

    Note: You can limit the byte size of the log files included in the .zip file by adding the maxLogSize parameter to the URL as follows:
    http://<hostname>:<port>/invoke/wm.server.admin/getDiagnosticData?maxLogSize=number_of_bytes
                            
  3. Log on to the Integration Server with a username and password that has administrator privileges.
  4. You can choose to perform one of the following:
    1. Open the diagnostic data file.
    2. Save the diagnostic data file to the file system.
    3. Cancel and exit this operation.
    Note: Whether you open or save the diagnostic data file, the file is opened from or saved to the client system. Integration Server also automatically saves a copy to the Integration Server_directory\instances\instance_name\logs directory of the host machine where Integration Server is running.
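If you collect diagnostic data periodically, you might want to build the service URL in a script rather than typing it. The following Python sketch only assembles the URL shown in step 2, with the optional maxLogSize parameter. The function name and the sample host and port are hypothetical. Fetching the URL still requires credentials for a user with administrator privileges, which this sketch does not handle.

```python
from typing import Optional
from urllib.parse import urlencode


def diagnostic_url(hostname: str, port: int, max_log_size: Optional[int] = None) -> str:
    """Build the getDiagnosticData URL, optionally limiting log file size in bytes."""
    url = f"http://{hostname}:{port}/invoke/wm.server.admin/getDiagnosticData"
    if max_log_size is not None:
        # Appends ?maxLogSize=number_of_bytes as described in the note in step 2
        url += "?" + urlencode({"maxLogSize": max_log_size})
    return url


# Hypothetical host; 9999 is the default diagnostic port
print(diagnostic_url("localhost", 9999, max_log_size=1048576))
# prints http://localhost:9999/invoke/wm.server.admin/getDiagnosticData?maxLogSize=1048576
```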

Starting the Integration Server in Safe Mode

If Integration Server is having trouble starting because it or one of its packages cannot connect to an external resource, you can stop Integration Server and then start it in safe mode. When you start Integration Server in safe mode, it does not connect to any external resources, including databases. As a result, when Integration Server is in safe mode, it writes audit logging data associated with the IS Core Audit and Process Audit functions to flat files on the Integration Server instead of to an external database. In addition, when in safe mode, Integration Server loads only the WmRoot package; all other packages are inactive. When you restart Integration Server after you diagnose and correct the problem, Integration Server resumes audit logging for IS Core Audit and Process Audit functions to the external database and loads all enabled packages.

Important: Use safe mode for diagnostic or troubleshooting purposes only. Do not run any regular Integration Server tasks or Designer while in safe mode. Doing so can produce unpredictable results.

If Integration Server could not connect to a Broker or database, check the appropriate connection parameters and modify them as necessary. For instructions, see the webMethods Audit Logging Guide.

If a package such as Trading Networks Server or the webMethods SAP Adapter could not connect to an external resource, open Integration Server Administrator and go to the Packages > Management > Activate Inactive Packages page. In the Inactive Packages list, select the package and click Activate Package. Integration Server puts the package into the state it would have been in if you had started Integration Server normally. For example, if the package would have been enabled, Integration Server loads and enables it. Check and modify the connection parameters using the instructions in the appropriate guide.

Starting Integration Server in Safe Mode

About this task

To start Integration Server in safe mode

Procedure

  1. Stop the Java process associated with the Integration Server (for example, in Windows Task Manager).
  2. At the command line, go to the home directory of the server instance (Integration Server\profiles\IS_instance_name) and enter one of the following commands to start the server.
    System     Command
    Windows    bin\console.bat -safeboot (other switches)
    UNIX       bin/console.sh -safeboot (other switches)

    For information about other switches, see Starting a Server Instance from the Command Prompt. When you open the Integration Server Administrator, it will display a message indicating that the server is running in safe mode.
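If you start the server from a script, the choice between the two commands can be made programmatically. The following Python sketch only assembles the command from step 2; it does not start the server. The function name is hypothetical, and any extra switches must come from Starting a Server Instance from the Command Prompt.

```python
from typing import List, Sequence


def safeboot_command(windows: bool, extra_switches: Sequence[str] = ()) -> List[str]:
    """Return the safe mode start command, relative to the instance home directory."""
    # Script paths are taken from step 2; run them from the instance home directory
    script = r"bin\console.bat" if windows else "bin/console.sh"
    return [script, "-safeboot", *extra_switches]


print(safeboot_command(windows=False))  # prints ['bin/console.sh', '-safeboot']
```

The resulting list can be passed to subprocess.run with cwd set to the home directory of the server instance.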

When the Server Automatically Places You in Safe Mode

If the Integration Server detects a problem with the master password or outbound passwords at startup, it will automatically place you in safe mode. You will see the following message in the upper left corner of the Server Statistics screen of the Integration Server Administrator:

SERVER IS RUNNING IN SAFE MODE. Master password sanity check failed -- invalid master password provided.

These problems can be caused by a corrupted master password file, a corrupted outbound password file, or by simply mistyping the master password when you are prompted for it. If you suspect you have mistyped the password, shut down the server and restart it, this time entering the correct password. If this does not correct the problem, refer to When Problems Exist with the Master Password or Outbound Passwords at Startup... for instructions.

Generating Thread Dumps

If Integration Server or a subsystem becomes slow or unresponsive, or users are unable to log into Integration Server, you can generate thread dumps to help you diagnose the problem. A thread dump can help you locate thread contention issues that can cause thread blocks or deadlocks.

You can generate thread dumps of the following:

  • The JVM in which the Integration Server is running
  • Individual threads running on Integration Server

Based on the information you obtain from these thread dumps, you might be able to correct the problem.

If you detect a problem with a thread that is associated with a user-written Java service or a flow service, you have the option of canceling or killing the thread.

When you cancel a thread, Integration Server frees up resources that are held by the thread and returns the thread to the thread pool. If Integration Server cannot cancel the thread, it gives you the option of killing the thread. When you kill a thread, Integration Server terminates the thread and adds a new one to the thread pool. For more information about canceling and killing service threads, see Canceling and Killing Threads Associated with a Service.

The following examples describe how you might use the JVM thread dump and individual thread dumps to diagnose and fix problems.

Scenario 1: A Service Is Running Longer than Expected
  1. A Flow service has been running for a very long time. You suspect the service is in an infinite loop, or is waiting for external resources that are not available.
  2. You log in through the Integration Server primary port and navigate to the Statistics > System Threads screen.
  3. On the System Threads screen you see threads that are associated with the service in question. You look at individual dumps of those threads.
  4. Using the information provided in the dumps, you determine that the threads are experiencing contention issues.
  5. You cancel the threads. This action allows the service to complete.

Scenario 2: The Server Is Unresponsive, Users Cannot Log In Through the Primary Port
  1. Integration Server is unresponsive and no one can log in through the primary port.
  2. You log in through the diagnostic port and navigate to the Statistics > System Threads screen.
  3. On the System Threads screen you see multiple threads for the same service. You look at individual dumps of those threads.
  4. Based on the information provided in the dumps, you try canceling the threads. The problem persists.
  5. You try killing the threads. The problem persists.
  6. You perform a JVM dump.
  7. Using the information provided in the JVM dump, you determine the cause of the problem and are able to resolve it.

The following procedures show how to generate dumps for individual threads and for the JVM.

Generating a Dump of an Individual Thread

About this task

To view information about an individual thread

Procedure

  1. Open the Integration Server Administrator if it is not already open.
  2. From the Server menu in the Navigation panel, click Statistics.
  3. In the System Threads field, in the Current column, click the number of current threads.

    The System Threads screen displays. This screen contains a list of all the threads that are currently running on the server.

  4. To view a dump of a particular thread, in the Name column for that thread, click the thread name.

    The server displays a dump of that thread.

Generating a Dump of the JVM

About this task

To generate a JVM thread dump

Procedure

  1. Open the Integration Server Administrator if it is not already open.
  2. From the Server menu in the Navigation panel, click Statistics.
  3. Under Usage, click the Current number of System Threads.

    The Server > Statistics > System Threads page is displayed.

    The System Threads table lists thread names and displays information about them.

  4. Click Generate JVM Thread Dump.

    The Server > Statistics > System Threads > Thread Dump page is displayed, showing the dump.

  5. Click Return to System Threads to return to the System Threads page.

For information about server recovery when a hardware or software problem forces you to restart the server, see Starting and Stopping the Server.