IBM Support

Analyzing data: Clustering WebSphere Process Server V6.1

Troubleshooting


Problem

This document helps you analyze clustering problems for WebSphere Process Server, versions 6.1.0 and 6.1.2.

Resolving The Problem




This document describes methods for analyzing problems in a clustered WebSphere Process Server environment and how to resolve them using IBM tools.





Tools and resources for data analysis and problem determination

The following list of tools and information resources can help to solve common problems:
  • WebSphere Process Server support page

    The WebSphere Process Server support page is a starting point for current support information. It includes fixes, technotes, and other product-specific resources. You can search for additional documentation regarding common problems and how to solve them.

  • IBM Support Assistant

    IBM Support Assistant helps you search for answers to the problem. It is a free, local, software-serviceability workbench tool that can help you answer questions about a wide range of IBM software product families. In addition to detailed search functionality on IBM support databases, this tool includes problem analysis and determination tools. You can also open and track a PMR (Problem Management Record) for IBM support.

  • developerWorks WebSphere Zone

    The WebSphere Zone on developerWorks is a resource for product-related discussions, tutorials, and other information about WebSphere products and other product families. The forum provides a discussion board on common problems and solutions.




Diagnostic guide for identifying and solving a problem

Follow these steps to determine the problem and to find an appropriate solution using tools and resources provided by IBM:
  1. Identify the situation in which you first saw the problem. A problem can occur at various times (during product installation, configuration, or run time), and it can affect a WebSphere Process Server-related service application or component. Problems in custom applications can also be analyzed by using the product-related logging mechanisms.

    Example

    In this example, you start a deployment environment and, when you review the startup phase, you see that one of the WebSphere Process Server-related applications (BPCECollector) did not start:

    BPCECollector_<deploymentEnvironmentName>.<clusterName>

    When you try to start the application using the administrative console, the following error message is displayed:



  2. After you isolate the root problem, review the appropriate log files to gather the root cause and details of the problem. WebSphere Process Server provides a rich set of available log files depending on the component or activities (installation, migration, or run-time logging).

    Example

    The error message that is displayed in the administrative console front end does not provide details on the root cause but refers to the location (name of node and server) of the related log files. If no location is provided, find out to which server or cluster the application is mapped:

    Applications > Enterprise Applications > applicationName > Target specific application status

    If the application is mapped to a cluster, you must review the log files for all of the servers that are members of the cluster to identify the server that has the problem. Locate the log file directory and review the log files that apply to the problem:

    <profileName>/logs/<serverName>
      /SystemOut.log
      /SystemErr.log

    Note: To reproduce the problem, make a note of the time stamp for a particular task. In a large log file, this helps you and IBM support to quickly find the log entries about the problem.
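When an application is mapped to a cluster, a small script can narrow down which member logged the error. The following is a minimal sketch, not an IBM-provided tool. It assumes the `<profileName>/logs/<serverName>/SystemOut.log` layout shown above and assumes that error-level message IDs consist of a four- or five-character prefix, four digits, and a trailing `E` (as in J2CA0138E). The function name `error_lines_by_server` is hypothetical.

```python
import re
from pathlib import Path

# Assumption: an error-level message ID is a 4-5 character prefix,
# four digits, and the severity letter E (for example J2CA0138E).
ERROR_ID = re.compile(r"\b[A-Z][A-Z0-9]{3,4}\d{4}E\b")


def error_lines_by_server(profile_root):
    """Map each server name to the error-level lines in its SystemOut.log."""
    hits = {}
    for log in sorted(Path(profile_root).glob("logs/*/SystemOut.log")):
        server = log.parent.name  # directory name is the server name
        with log.open(encoding="utf-8", errors="replace") as fh:
            lines = [ln.rstrip("\n") for ln in fh if ERROR_ID.search(ln)]
        if lines:
            hits[server] = lines
    return hits
```

Calling `error_lines_by_server` with the profile directory of each node returns only the servers whose logs contain error-level entries, which is usually a shorter list than the full cluster membership.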

    To gather the problem description and error code, start with the SystemOut.log file at the point when the error message was reported on the administrative console front end:

    ActivationSpe E J2CA0138E: The MessageEndpoint activation failed for ActivationSpec eis/BPCCEIConsumerActivationSpec and MDB Application BPCECollector#EventConsumer, due to the following exception: javax.resource.ResourceException:

    CWSIV0961E: The authorization exception com.ibm.wsspi.sib.core.exception.SINotAuthorizedException:
    CWSIT0010E: A client request for messaging engine default.Messaging.000-CommonEventInfrastructure_Bus in bus CommonEventInfrastructure_Bus failed with reason:
    CWSIT0020E: An unexpected exception occurred in messaging engine default.Messaging.000-CommonEventInfrastructure_Bus in bus
    CommonEventInfrastructure_Bus while creating a connection, exception: com.ibm.wsspi.sib.core.exception.SINotAuthorizedException:

    CWSIP0302E: A user admin is not authorized to access
     the messaging engine default.Messaging.000-CommonEventInfrastructure_Bus on bus
    CommonEventInfrastructure_Bus was thrown while attempting to create a connection to the bus CommonEventInfrastructure_Bus using the activation specification [...]

    Caused by: com.ibm.wsspi.sib.core.exception.SINotAuthorizedException: CWSIT0010E: A client request for messaging engine
    default.Messaging.000-CommonEventInfrastructure_Bus in bus CommonEventInfrastructure_Bus failed with reason: [...]

    CWSIP0302E: A user admin is not authorized to access
     the messaging engine default.Messaging.000-CommonEventInfrastructure_Bus on bus
    CommonEventInfrastructure_Bus [...]

    The error message indicates a configuration problem regarding the security settings on the messaging engine and the Common Event Infrastructure (CEI) bus. A client (in this case the BPCECollector application) tried to access the infrastructure component with a user name that does not have the necessary authorization privileges.

    Note: It is important to analyze the entire stack trace. Due to the architecture of the product and component interaction styles, some exceptions might be caused by another problem that occurred in subsequent components. You can find the root component and exception by looking at the last "Caused by" statement in the trace. For all problems, note the error codes at the beginning of an error message; those codes help in identifying related technotes and problem descriptions in the product documentation.

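The two checks described in the preceding note (collecting the error codes and locating the last "Caused by" statement) can be sketched as follows. This is an illustrative helper under stated assumptions, not part of the product. The pattern for message IDs (prefix, four digits, severity letter E, W, or I) is an assumption derived from the codes shown in this document, and the function names are hypothetical.

```python
import re

# Assumption: message IDs follow the shape seen in this document,
# for example J2CA0138E, CWSIT0010E, or FFDC0009I.
MESSAGE_ID = re.compile(r"\b[A-Z][A-Z0-9]{3,4}\d{4}[EWI]\b")


def message_ids(trace):
    """Return the distinct message IDs in order of first appearance."""
    seen = []
    for code in MESSAGE_ID.findall(trace):
        if code not in seen:
            seen.append(code)
    return seen


def last_caused_by(trace):
    """Return the last line that starts with 'Caused by:', or None."""
    causes = [
        ln.strip()
        for ln in trace.splitlines()
        if ln.strip().startswith("Caused by:")
    ]
    return causes[-1] if causes else None
```

The list returned by `message_ids` can be used directly as search terms, and `last_caused_by` points to the exception that the note identifies as the root cause.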
  3. If you cannot immediately troubleshoot the problem by using the log files, use the recommended tools and resources to search for an appropriate problem description.

    Note: Use IBM Support Assistant to improve the troubleshooting process. The tool searches various locations (product documentation, technote database, developerWorks repository, and many other resources) for known problems.

    Example

    Searching IBM Support Assistant for the error codes J2CA0138E and CWSIT0010E returns technotes that might address the problem. One of the technotes describes the problem:

    J2CA0138E and CWSIT0010E issued when no user authorized to connect to CEI bus

    Following the instructions in the technote solves the problem in the cluster configuration.

  4. If you cannot find the root cause of the problem, you can review the detailed trace if it is enabled:

    <profileName>/logs/<serverName>/trace.log

    The SystemOut.log file often records the generation of an FFDC (First Failure Data Capture) file. FFDC files are the most detailed log files and are needed by IBM support. They cover the entire stack trace of a logged problem, not just an extract, as in the SystemOut.log file:

    ServiceLogger I com.ibm.ws.ffdc.IncidentStreamImpl open
    FFDC0009I: FFDC opened incident stream file <ffdcFileName>.txt
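To collect the FFDC files that a log refers to, the FFDC0009I entries can be extracted with a short script. The sketch below assumes that the message text matches the excerpt above and that the file name ends in `.txt` on the same line. The function name `ffdc_files` is hypothetical.

```python
import re

# Matches the FFDC0009I message shown above and captures the file name.
# Assumption: the name ends in ".txt" and does not wrap to another line.
FFDC_OPEN = re.compile(
    r"FFDC0009I: FFDC opened incident stream file (.+\.txt)"
)


def ffdc_files(log_text):
    """Return the distinct FFDC file names in order of first appearance."""
    names = []
    for name in FFDC_OPEN.findall(log_text):
        if name not in names:
            names.append(name)
    return names
```

Passing the contents of SystemOut.log to `ffdc_files` yields the list of incident files to review or to send to IBM support.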



General and recurring problems with clustering

This section describes common problem areas in a clustered environment and recommendations for preventing problems in those areas:

Product installation and maintenance
  • Before installing WebSphere Process Server, review the hardware and software requirements to avoid performance bottlenecks. See the product documentation for the mandatory steps for preparing the operating system.
  • Install WebSphere Process Server on UNIX-based systems as the root user to avoid problems with file-system permissions. Note that installing the product as a non-root user has limitations regarding profile creation.

  • Install the latest available and recommended fixes on all WebSphere Process Server installations, and keep all of the installations at the same fix-pack level to avoid inconsistencies.

  • Before you apply a fix pack to an existing clustered environment, read the installation instructions carefully for additional configuration steps that must be run on existing profiles. Before applying a fix pack on a production system, test the upgrade on a nonproduction copy of the clustered production environment. Implement a backup strategy for critical environments so you can roll back a failed upgrade. Refer to the appropriate fix-pack documentation, which documents the necessary backup steps.

Deployment environment creation
  • Whenever possible, create a deployment environment using the administrative console approach. One advantage is that the latest fix pack can be applied to the product installation before the deployment environment configuration. In addition, using the administrative console provides full control for configuring the deployment environment.

  • Create a database account with administrative privileges during the deployment environment setup. You cannot delay the database setup for a deployment environment.

Infrastructure and performance
  • An environment that is distributed over multiple computers needs a stable network infrastructure to perform well over time. Review the network and firewall configuration on all computers to eliminate communication limitations and port conflicts.

  • Enable tracing in the production environment only if necessary, and restrict detailed tracing to only some components. For advice, see step 8 of the document "Collect troubleshooting data for clustering problems in WebSphere Process Server V6.1". Massive tracing activity affects the overall performance because of the large number of hard-disk write operations.

  • Tuning the product is an important step before activating a clustered production environment. This includes the performance tuning of custom applications and the database back end.

Database and performance
  • Check the connectivity of the database if it is hosted on a remote system. Make sure that the database can be reached from each computer that uses it. Access credentials and port numbers are known problems that can cause installation, profile creation, and run-time problems.

  • Use the latest available database JDBC drivers from your vendor, and use type 4 database drivers for better performance. Review the supported databases for WebSphere Process Server before creating a clustered environment.

  • Measure or calculate the number of database connections you need at peak times, and size the connection pools for the corresponding databases accordingly.
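As a rough illustration of the connection calculation in the last item, the sketch below adds up the connections that the pools could open at most and compares the total with a database-side limit. It assumes that every cluster member keeps its own pool for each data source. The function names and the numbers in the usage note are hypothetical and are not recommended values.

```python
def peak_connections(members, max_pool_size_per_member):
    """Upper bound on connections one data source can open in a cluster."""
    return members * max_pool_size_per_member


def fits_database_limit(pools, database_limit):
    """Sum the upper bounds and compare them with the database limit.

    pools: iterable of (cluster members, maximum pool size) pairs,
           one pair per data source that points at the same database.
    Returns (total, True if total does not exceed the limit).
    """
    total = sum(peak_connections(m, size) for m, size in pools)
    return total, total <= database_limit
```

For example, two data sources on a two-member cluster with maximum pool sizes of 50 and 20 can open up to 2 × 50 + 2 × 20 = 140 connections, so `fits_database_limit([(2, 50), (2, 20)], 150)` reports that a database limit of 150 is sufficient.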

Cluster startup
  • If you use a deployment environment pattern (Remote Messaging or Remote Messaging and Remote Support), you might create multiple clusters that depend on one another. Make sure that you start the infrastructure and clusters in the right order to avoid startup problems:


    1. Database, Lightweight Directory Access Protocol (LDAP), and Web server
    2. Deployment manager (if needed)
    3. Node agents
    4. Messaging infrastructure cluster
    5. Support cluster
    6. Application deployment cluster

  • Note that starting or restarting a full deployment environment requires time for it to be fully up and running. Until that time, no requests are handled. You can use the ripplestart mechanism, which restarts servers in sequence and ensures that at least one server in the cluster is online to handle requests.
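The start order listed above can be encoded so that a planned start sequence is checked before it is run. This is a minimal sketch. The component labels are taken from the list, while the function name `out_of_order` and the idea of checking a plan in a script are assumptions for illustration only.

```python
# Start order as given in the list above, earliest first.
START_ORDER = [
    "database, LDAP, and Web server",
    "deployment manager",
    "node agents",
    "messaging infrastructure cluster",
    "support cluster",
    "application deployment cluster",
]


def out_of_order(planned):
    """Return (should_start_first, started_too_early) pairs in a plan.

    planned: list of labels from START_ORDER in the intended start sequence.
    An unknown label raises KeyError.
    """
    rank = {name: i for i, name in enumerate(START_ORDER)}
    problems = []
    for i, first in enumerate(planned):
        for second in planned[i + 1:]:
            if rank[first] > rank[second]:
                problems.append((second, first))
    return problems
```

An empty result means that the plan respects the order. Any pair in the result names a component that should have been started before the other one.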



Document Information

Modified date:
15 June 2018

UID

swg21313681