Best practices for managing the on demand router
Added by finn, last edited by CarrieMiller on Oct 23, 2009  (view change)

Overview and features

IBM WebSphere Virtual Enterprise improves the quality of service for the users of your site while decreasing the total cost of ownership of your middleware server environment. WebSphere Virtual Enterprise
improves quality of service by dynamically monitoring the demand for applications and ensuring that sufficient resources are available to meet that demand. It decreases the total cost of ownership by
streamlining hardware resources and automating common administrative tasks. Automating administrative tasks has the additional benefit of improving the consistency and continuous availability
of your middleware applications.

The on demand router (ODR) is a key component that enables many features of WebSphere Virtual Enterprise. The ODR is the entry point into the WebSphere Virtual Enterprise environment for HTTP
(including SOAP over HTTP) and Session Initiation Protocol (SIP) traffic. This document provides best practice information for managing the ODR.

Hardware sizing requirements for ODRs

Although the ODR provides many advanced features, it is very efficient and scalable. The ODR is designed to be completely asynchronous. In particular, rather than letting threads block while waiting for network I/O to complete, the ODR reassigns these threads to other requests or responses that are ready to be processed. Because of this design, the ODR can scale to a very large number of concurrent connections.

As a general rule, the system requirements for an ODR are similar to those of a Web server with a WebSphere plug-in. However, you should consider several factors when you are determining the number and size of systems needed to host the ODRs. In particular, you should consider the following factors:

  • The maximum number of concurrent requests
  • The average request and response payload
  • The percentage of inbound and outbound SSL connections
  • The percentage of responses that are cached by the ODR (if any)

You can use the following recommended procedure to estimate the number and size of ODR systems that are required:

  1. Let PeakTP equal the peak throughput for your environment in units of requests/second. If the ODRs are fronting multiple cells, be sure to consider all cells in this calculation. For the following example, PeakTP equals 27000 requests/second.
  2. Let AvgPL equal the average payload size of the HTTP request body plus the HTTP response body for your environment.
  3. Let MaxOdrTP be the maximum throughput of an ODR for the average payload (AvgPL) by referring to the following chart. For example, if AvgPL is 5 kilobytes, then MaxOdrTP is 15158 requests/second.
    [Chart: ODR message size ramp up]
  4. The value of MaxOdrTP is the maximum throughput for an average payload of 5 kilobytes as measured on a 32-bit 4-way Xeon MP. Let MyMaxOdrTP equal the adjusted value for the platform that you use to host the ODR. The following chart can be used to compare various platforms. For example, if the ODR will be on a Xeon EM64T dual core, then MyMaxOdrTP = 15158 * 22733/18475 = 18651.
    [Chart: ODR platform comparison]
  5. Let NumODRs equal the maximum of 2 (for high availability) and PeakTP/MyMaxOdrTP (rounded up). For example, with PeakTP equal to 27000 requests/second and MyMaxOdrTP equal to 18651 requests/second, then two ODRs are sufficient to handle the peak workload. You can provision additional ODRs for future growth or increased high availability.
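
The arithmetic in steps 1 through 5 can be sketched in a few lines of Python. This is only an illustration of the procedure above: the function name and parameter names are hypothetical, and the numeric inputs are the example values quoted in the steps (the two platform figures are the chart values used in step 4).

```python
import math

def estimate_num_odrs(peak_tp, max_odr_tp, platform_score, baseline_score, min_odrs=2):
    """Return (MyMaxOdrTP, NumODRs) for the sizing procedure described above.

    All names here are illustrative; they are not part of any product API.
    """
    # Step 4: scale the chart throughput by the ratio of the target platform
    # to the baseline platform on which the chart was measured.
    my_max_odr_tp = max_odr_tp * platform_score / baseline_score
    # Step 5: never fewer than min_odrs (high availability), otherwise enough
    # ODRs to carry the peak throughput, rounded up.
    num_odrs = max(min_odrs, math.ceil(peak_tp / my_max_odr_tp))
    return my_max_odr_tp, num_odrs

# Example values from the text: PeakTP = 27000, MaxOdrTP = 15158,
# platform ratio 22733/18475.
my_max, num = estimate_num_odrs(27000, 15158, 22733, 18475)
print(int(my_max), num)
```

With the example values, the adjusted throughput truncates to 18651 requests/second and two ODRs are enough, which matches the worked example in step 5.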

Common configuration tasks for the ODR

This section discusses several steps to consider when configuring the ODR. Not all of these steps are compulsory.

HTTP trusted security proxies

In the standard HTTP topology, a Web server forwards requests to the ODR, which then routes the request to an application server. In this topology, a separate Transmission Control Protocol (TCP) connection exists from the client to the Web server, and another TCP connection exists from the ODR to the application server. When the WebSphere plug-in that is running in a Web server process forwards a request to the ODR, the plug-in adds special HTTP headers that provide information about the original connection from the client. These special headers are often called private headers, and their names are of the form $WSxx. For example, the $WSRH header contains the name of the remote host, which is the host name of the client that connects to the Web server. If the ODR receives private headers from an address that is not in the trusted security proxies list, the ODR ignores the private headers on the request and generates its own. Generating its own private headers is necessary to prevent a rogue client from forging values for these headers.

Therefore, in this topology, you must add the host name or IP addresses of each Web server to the list of "Trusted security proxies" (one per line) on the ODR settings panel. If you specify the host name of the Web server, first make sure that you are able to resolve the host name from the ODR server by invoking the ping <hostname> command.

If you do not add the Web server's host name or address to the list of trusted security proxies, you might experience the following issues:

  • 404 (Not Found) errors that are returned by the ODR when the ODR is listening on a non-standard HTTP port. The private header that contains the original port to which the client connected on the Web server (for example, port 80) is ignored; instead, the ODR listening port is used, resulting in a virtual host mismatch.
  • Unexpected rule processing by the ODR. When you are configuring routing or service policy rules in the ODR, several of the HTTP operands attempt to retrieve information that pertains to the client's connection. However, if the private headers are ignored, the client is considered to be the Web server instead of the real client, resulting in unexpected values being returned by these operands.
  • Unpredictable application behavior. With the servlet API, applications can retrieve information about the client connection, for example, the client's host name or IP address. Unpredictable behavior can occur if the Web server is considered to be the client rather than the true client.
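
The trust decision described above can be sketched as follows. This is a simplified model and not the ODR's implementation: the function name and the addresses are hypothetical, and only the $WSRH header named in the text is used.

```python
def resolve_remote_host(peer_address, headers, trusted_proxies):
    """Return the remote host that a router should attribute to a request.

    peer_address    -- address of the directly connected peer (for example, a Web server)
    headers         -- dict of request headers, possibly including the private $WSRH header
    trusted_proxies -- set of addresses listed as trusted security proxies
    """
    # Private headers are honored only when they arrive from a trusted proxy.
    if peer_address in trusted_proxies and "$WSRH" in headers:
        return headers["$WSRH"]
    # Otherwise the forwarded value is ignored and the directly connected
    # peer is treated as the client.
    return peer_address

# Hypothetical addresses: 192.0.2.10 is the Web server, 203.0.113.7 is unknown.
trusted = {"192.0.2.10"}
print(resolve_remote_host("192.0.2.10", {"$WSRH": "client.example.com"}, trusted))
print(resolve_remote_host("203.0.113.7", {"$WSRH": "forged.example.com"}, trusted))
```

The second call shows why an unlisted Web server causes the issues above: its headers are discarded, so the Web server itself appears to be the client.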

The plugin-cfg.xml file

In the standard topology, WebSphere Virtual Enterprise can automatically generate and propagate the plugin-cfg.xml to the Web server machine. This plug-in file is different from the file that is generated by WebSphere Application Network Deployment because the target servers are the ODRs instead of the application servers.

Generating the plugin-cfg.xml file

WebSphere Virtual Enterprise supports two methods of automatically generating the plug-in:

  • ODR-generated plug-in: One or more on demand routers can be configured to generate the plug-in. Although this method is still supported, the following method is now recommended as a best practice.
  • Highly available plug-in generator service: This method is recommended because it runs as a single highly-available service. See Generating the plug-in configuration in a high availability environment in the WebSphere Virtual Enterprise information center for more information.

Propagating the plugin-cfg.xml file

WebSphere Virtual Enterprise supports the following methods of propagating the plugin-cfg.xml file to the Web server machine each time it is automatically updated:

  • Web server on a managed node: In this method, the file is copied into the WebSphere repository and is then propagated through node synchronization to the appropriate node. To copy the file to the repository, you can invoke the propagatePluginCfg.py jython script (available in WebSphere Virtual Enterprise Version 6.1.1 or later) in wsadmin with the following arguments:
    copyToRepository <localPath> <repoPath> [<repoPath>]
    

    For example, the following cell custom properties generate a plug-in that routes to all ODRs in the cell and propagate it to the Web servers on node1 and node2.

    Name | Value
    ODCPluginCfgOdrList_1 | MyCell:.
    ODCPluginCfgOutputPath_1 | /tmp/plugin-cfg.xml
    ODCPluginCfgUpdateScript_1 | /WAS/bin/wsadmin.sh -f propagatePluginCfg.py -lang jython copyToRepository /tmp/plugin-cfg.xml cells/MyCell/node/node1/servers/webServer/plugin-cfg.xml cells/MyCell/node/node2/servers/webServer/plugin-cfg.xml
  • Script-based propagation: An arbitrary script can be run each time a new plugin-cfg.xml file is generated. This script can copy the plugin-cfg.xml file to the appropriate location.
    Tip
    When you use this option, have the script first copy the file to a temporary location on the Web server machine and then rename it to its final name. This eliminates the timing window in which the plug-in could read a partially copied file, which would cause a parsing error and a failure to load the new plug-in file.
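
The copy-then-rename technique from the tip can be sketched in standard Python as follows. This is a minimal illustration that assumes the script can write directly to the directory that the plug-in reads (for example, through a mounted file system); the function name and the ".tmp" suffix are hypothetical, and a real propagation script might use a remote copy tool instead.

```python
import os
import shutil

def install_plugin_cfg(source_path, target_path):
    """Copy source_path next to target_path, then rename it into place."""
    # Write the new content under a temporary name in the same directory,
    # so that the final rename stays on one file system.
    temp_path = target_path + ".tmp"
    shutil.copyfile(source_path, temp_path)
    # os.replace swaps the file in a single step; a reader sees either the
    # old file or the complete new file, never a partial copy.
    os.replace(temp_path, target_path)
```

Keeping the temporary file in the target directory matters because a rename across file systems falls back to a copy, which would reintroduce the timing window.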

Configuring routing, service, and logging policies

Routing, service, and logging policies each contain an ordered list of rules. Each rule consists of a condition (or boolean expression) and an action. If the condition is true for a particular request, the action is performed and the remaining rules are skipped for the request. This section describes the various types of rules and the order in which they are processed by the ODR.
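
The first-match behavior described above can be modeled with a short Python sketch. The rule representation (a condition function paired with an action string) and all names are hypothetical; the sketch only illustrates that evaluation stops at the first rule whose condition is true.

```python
def first_matching_action(rules, request, default_action=None):
    """Return the action of the first rule whose condition holds for request."""
    for condition, action in rules:
        if condition(request):
            # Remaining rules are skipped once one rule matches.
            return action
    return default_action

# Hypothetical ordered rule list keyed on the request URI.
rules = [
    (lambda req: req["uri"].startswith("/A/"), "route-to-clusterA"),
    (lambda req: req["uri"].startswith("/"), "route-anywhere"),
]
print(first_matching_action(rules, {"uri": "/A/index.html"}))
print(first_matching_action(rules, {"uri": "/B"}))
```

Because order decides the outcome, more specific conditions generally belong earlier in the list than general ones.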

The following diagram shows the various types of rules and the order in which they are processed by the ODR. The remainder of this section describes each of these types of rules in more detail.

ODR routing rules

The first rules that are processed when an HTTP or SOAP request arrives are the ODR routing rules. These rules should not be confused with the application routing rules that are defined in an application work class. The ODR routing rules can only be created and managed with the wsadmin tool.

To list the ODR routing rules for the myODR ODR on the myODRNode node from a Jython script:

AdminTask.listRoutingRules('-odrname myODR -nodename myODRNode -protocol HTTP')

To add an ODR routing rule:

AdminTask.addRoutingRule('-odrname myODR -nodename myODRNode -protocol HTTP -priority 0 '
    '-expression "uri likein (\'/A\',\'/A/*\')" -actionType permit '
    '-routingLocations "cluster=cell1/clusterA,cluster=cell2/clusterA" -multiclusterAction Failover')

This rule matches all requests with a URI equal to "/A" or beginning with "/A/". The rule then attempts to route to clusterA in cell1, and if unable to do so, a failover occurs to clusterA in cell2.

Condition

The condition of an ODR routing rule is specified by the expression option. The allowable operands and operators are the same as for an expression that is associated with an application's SOAP work class. Because SOAP flows over HTTP, the SOAP operands also include the HTTP operands.

Tip
To build a valid value for the expression option with the administrative console, navigate to the Routing Policies tab under any application, expand Work classes for SOAP requests, click Add Rule and then Build subexpression to open the expression builder. Use this panel to build the value for the expression option. Then, copy and paste the value for use in a script to set an ODR routing rule. Remember that the AND, OR, and NOT operators can be used to build complex expressions.

Action

An ODR routing rule supports three types of actions, as specified by the actionType option:

  • reject - The 'errorcode' option must specify the error code to use when rejecting the request.
  • redirect - The 'redirectURL' option must specify the URL to which to redirect the request.
  • permit, permitsticky, permitMM, or permitstickyMM

For the permit-type actions, two options affect how the request is handled by the ODR: sticky and MM (maintenance mode).

The sticky option, when enabled, causes the ODR to keep affinity between the client and application server even when the application does not explicitly request affinity. This sticky option is sometimes referred to as active affinity. You should typically only use a sticky action (permitsticky or permitstickyMM) when routing to an application that was not written to run in a cluster. Any Java 2 Platform, Enterprise Edition (J2EE) application that uses the standard servlet APIs for accessing session state is suitable for running in a cluster and does not require the sticky option. Using the sticky option when it is not needed can prevent the ODR from being able to appropriately balance load across the cluster.

The MM option stands for Maintenance Mode and implies that the request should be routed to an application server that is in maintenance mode, or to an application server on a node that is in node maintenance mode. Application servers can be in one of three server maintenance modes: normal, affinity, and break. If an application server is on a node that is in maintenance mode and its server maintenance mode is normal, the application server is treated as if it is in affinity mode. The following table shows how the ODR routes requests with respect to maintenance mode and the routing action. For example, the first row shows that if an application server is on a node with maintenance mode disabled and the server maintenance mode is normal, it can receive any request (with or without affinity) that matches a rule with a permit or permitsticky action, but it does not receive requests that match a permitMM or permitstickyMM action.

Maintenance mode state | Action: permit, permitsticky | Action: permitMM, permitstickyMM
Node MM is disabled; server MM is normal | Requests with or without affinity | No requests
Node MM is enabled; server MM is normal or affinity | Requests with affinity | Requests without affinity
Server MM is break | No requests | Requests with or without affinity

See Setting maintenance mode in the WebSphere Virtual Enterprise V6.1.1 information center for information on setting server maintenance mode.
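
The maintenance mode table above can be encoded as a small lookup, sketched here in Python. This is an interpretation of the table rather than ODR code: all names are hypothetical, and the mapping from node and server state to a table row (in particular, treating server mode "affinity" on a node with maintenance mode disabled as the second row) is an assumption.

```python
# Requests each effective state may receive, per action family (from the table).
ROUTING_TABLE = {
    "normal":   {"permit": {"affinity", "no_affinity"}, "permitMM": set()},
    "affinity": {"permit": {"affinity"},                "permitMM": {"no_affinity"}},
    "break":    {"permit": set(),                       "permitMM": {"affinity", "no_affinity"}},
}

def effective_mode(node_mm_enabled, server_mode):
    """Map node and server maintenance state to a table row (assumed mapping)."""
    if server_mode == "break":
        return "break"
    # A normal server on a node in maintenance mode is treated as affinity mode.
    if node_mm_enabled or server_mode == "affinity":
        return "affinity"
    return "normal"

def can_receive(node_mm_enabled, server_mode, action, has_affinity):
    """Return True if a server in this state may receive the request."""
    family = "permitMM" if action.endswith("MM") else "permit"
    kind = "affinity" if has_affinity else "no_affinity"
    return kind in ROUTING_TABLE[effective_mode(node_mm_enabled, server_mode)][family]

print(can_receive(False, "normal", "permitsticky", False))  # first row, permit column
print(can_receive(False, "normal", "permitMM", True))       # first row, MM column
```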

routingLocations and multiclusterAction

ODR rules with a permit action must also specify the routingLocations and multiclusterAction options. The routingLocations option specifies one or more clusters (or standalone application servers) and the multiclusterAction option specifies how to select between the clusters or standalone servers. If you want to load balance between multiple clusters, the recommended multiclusterAction is WLOR (Weighted Least Outstanding Requests) instead of WRR (Weighted Round Robin). The WLOR algorithm provides much faster failover when an entire cluster becomes unavailable.

The value of the routingLocations option is a comma-separated list of cluster, server, and module elements. Consider the following example:

cluster=cell1/cluster1,server=cell1/node1/server1,module=cell1/*/*/*

This routingLocations contains three elements: (1) cluster 'cluster1' in cell 'cell1', (2) the stand-alone server named 'server1' on node 'node1' in cell 'cell1', and (3) all clusters in cell 'cell1' to which any module is deployed.
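
To make the structure of a routingLocations value concrete, the following Python sketch splits the example above into its elements. It is a hypothetical helper for illustration only, not part of wsadmin or the ODR, and it checks only the element kinds named in the text.

```python
def parse_routing_locations(value):
    """Split a routingLocations string into (kind, path_parts) tuples."""
    elements = []
    for item in value.split(","):
        # Each element has the form kind=path, where kind is cluster, server, or module.
        kind, _, target = item.strip().partition("=")
        if kind not in ("cluster", "server", "module") or not target:
            raise ValueError("unrecognized routingLocations element: %r" % item)
        elements.append((kind, target.split("/")))
    return elements

for kind, parts in parse_routing_locations(
        "cluster=cell1/cluster1,server=cell1/node1/server1,module=cell1/*/*/*"):
    print(kind, parts)
```

The path of a cluster element has two parts (cell and cluster), a server element has three (cell, node, and server), and the module element in the example has four.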

Tip
Using "module=" for the routingLocations option allows you to indirectly specify a group of servers. In particular, it specifies the group of servers that are currently running the application AND which match the request. If the name of the cluster changes, the rule does not need to change.

For example, suppose that application A has a context root of /A, edition E1 is deployed to cluster C1, edition E2 is deployed to cluster C2, and a request arrives with a URI of /A. If the routingLocations option specifies module=*/*/*/*, the group of servers includes the servers in clusters C1 and C2 that are currently running any edition of application A. If the routingLocations option instead specifies "module=*/A/E1/*", the group of servers includes only those servers in cluster C1 that are currently running edition E1.

If no ODR routing rules are matched for a request, the default behavior is to load balance across all clusters that are running an application that can process the request. More specifically, the default behavior is equivalent to matching a rule with the following options:

-actionType permit -routingLocations module=*/*/*/* -multiclusterAction WLOR

Post processing

After ODR routing rule processing is complete, the ODR attempts to locate the application (or more accurately, the Web module) to which the request should be routed. The set of web modules considered are those that are associated with the matching rule. The web modules associated with a rule depend upon the value of the routingLocations option as follows:

  • A cluster specification includes all web modules deployed to the cluster;
  • A server specification includes all web modules deployed to the server;
  • A module specification includes the matching web modules.

If no ODR routing rule is matched, the default behavior is to consider all Web modules when attempting to locate a Web module to handle the request.

If no Web module is found for the request, the generic server cluster (GSC) routing policy rules are evaluated. The default action for the default work class of the GSC routing policies is to reject the request with a 404 (Not Found) error.

For more information on managing the ODR routing rules with wsadmin, see Rules for ODR routing policy administrative tasks. For more information on GSC routing, see Defining routing policies for generic server clusters.

Application routing rules

After a request has been mapped to a specific application, the application routing rules for that application are processed. These rules are those associated with work classes under the Routing policy tab of an application. There are both HTTP and SOAP work classes. If the HTTP request is also a SOAP request (SOAP over HTTP), then the SOAP work classes are used; otherwise, the HTTP work classes are used.

Each application can have multiple work classes, but it can have only one default work class. There are two methods of performing different actions based upon virtual host or URI:

  • Create a separate work class and add the URI patterns from one or more of the application's Web modules.
  • Create a rule in the default work class by using the "Virtual Host" and/or "URI" operands.

The choice of which method to use is largely a matter of preference; however, creating a rule in the default work class is more straightforward and flexible.

Condition

The condition (or expression) depends upon which work class is used. The operands available for a SOAP (SOAP over HTTP) work class are a superset of the operands available for an HTTP work class. The subexpression builder in the administrative console aids in building these subexpressions. Keep in mind that you can create complex expressions by applying the AND, OR, and NOT operators to these subexpressions.

Action

The application routing rules (like the ODR routing rules) support three types of actions: permit, reject, and redirect. Reject and redirect actions are the same as for ODR routing rules. The permit action type, however, specifies the edition of the application to which to route, rather than a list of clusters to which to route as is the case for ODR routing rules.

Recommendation
Set the deactivate.checkRoutingRules and uninstall.checkRoutingRules cell custom properties to true. Setting these custom properties prevents you from inadvertently deactivating or uninstalling an edition of an application while an application routing rule still references the edition. If a request matches a routing rule that refers to an inactive or nonexistent edition, the request is rejected by the ODR. See Application edition manager custom properties for more information on these two properties.

Application service policies

After the proper edition of the application has been located, the service policy rules for that edition of the application are processed. These rules are associated with work classes under the Service policy tab of an application.

Condition

The operands available for building application service policy rules are the same as for building application routing rules as discussed earlier.

Action

The action associated with an application service policy rule is a transaction class which maps to a single service policy. The service policy specifies the quality of service to associate with the request.

Custom logging rules

The custom logging rules are processed just before the ODR sends the response back to the client. The response may have been received from an application server or generated by the ODR itself. The ODR generates its own response when it finds the response in its cache or when an error condition prevents the ODR from sending the request to an application server.

The custom logging rules must be managed through wsadmin scripting. The manageODR.py script is available in WebSphere Virtual Enterprise V6.1.1 and is the recommended method. For WebSphere Virtual Enterprise V6.1.0.5, see Configuring custom logs in the WebSphere Virtual Enterprise information center for more information.

Condition

The operands that are available for building an expression for a custom logging rule include the "HTTP request operands", "SOAP request operands", and "HTTP response operands" as specified in the information center.

The HTTP response operands include:

  • response.code - the numeric HTTP response code
  • targetserver - the name of the application server which serviced the request in cell/node/server format
  • response.time - the number of milliseconds between when the ODR received the request and sent the response back to the client
  • service.time - the number of milliseconds that the ODR waited for a response from an application server

Action

The action associated with a custom logging rule specifies the name of the log file and the format of the entry. See Configuring custom logs in the WebSphere Virtual Enterprise information center for more information, especially the "Custom logging parameters" table which specifies what can be logged.

Examples

The following example adds a rule to ODR odr on node odr1. The condition is that the response time is longer than 5 seconds. The action is to write a log entry to a file named slow.log where the entry consists of a time stamp, the first line of the HTTP request, the response code, the target server, the response time, and the service time.

wsadmin.sh -f manageODR.py -lang jython insertCustomLogRule odr1:odr 1 "response.time > 5000" "slow.log %t %r %s %Z %R %T"

Note that the response time includes the time spent in the ODR and is therefore always longer than the service time.

The following logging condition is true if the request's URI is either /A or begins with /A/, and the HTTP response code is neither 200 nor 304. If the condition is true, an entry is logged to the A-error.log file.

wsadmin.sh -f manageODR.py -lang jython insertCustomLogRule odr1:odr 2 \
      "(URI likein ('/A','/A/*')) AND ( NOT response.code in (200,304)) " \
      "A-error.log %t %r %s %Z"

The following table summarizes the allowable operands for each policy:

Policy | Allowable operands
ODR routing rules | For protocol HTTP, HTTP request operands; for protocol SOAP, HTTP request operands + SOAP request operands
Application routing rules | For HTTP work classes, HTTP request operands; for SOAP work classes, HTTP request operands + SOAP request operands
Application service policy rules | For HTTP work classes, HTTP request operands; for SOAP work classes, HTTP request operands + SOAP request operands
Custom logging rules | HTTP request operands + SOAP request operands + HTTP response operands

In summary, there are three types of operands associated with various policies.

  • HTTP request operands - retrieve information that is available when an HTTP request is received by the ODR.
  • SOAP request operands - retrieve SOAP-specific information that is available at request arrival time if the request is a SOAP over HTTP request.
  • HTTP response operands - retrieve information that is available after the response is received from an application server.

Multi-cluster routing

The ODR supports multi-cluster routing for clusters that are in the same or different cells. There are several reasons that multiple clusters might be needed, such as:

  • Cell is unavailable: An entire cell could be down for hardware or software maintenance or failure. This includes a disaster recovery scenario. Maintaining multiple cells (and thus multiple clusters) provides continuous availability when a cell is down for any reason.
  • High application demand: A single cluster cannot span multiple core groups in WebSphere Application Server Network Deployment, and the recommended maximum number of Java virtual machines in a single core group is 40 to 50. If your application has sufficiently high demand, the application could outgrow a single cluster. You can create multiple clusters in a single cell or in multiple cells.

There are two methods of configuring multi-cluster routing:

  • ODR routing rules: The preferred method because of its flexibility. See the previous "ODR routing rules" section for more information.
  • Cell custom properties: The original method, which is still supported by the ODR. Custom properties allow a load balancing or failover policy between clusters to be configured on a cell or application basis. This method does not, however, allow you to select a cluster based upon a generic condition or boolean expression. For more information on this method, see Configuring the on demand router for multi-cluster failover and load balancing routing in the WebSphere Virtual Enterprise information center.

Edition-aware routing

Edition-aware routing is the ability of the ODR to route to the correct edition of an application at the correct time. The ODR supports both implicit and explicit edition-aware routing.

Implicit edition-aware routing occurs when a new edition of an application is rolled out. You can roll out a new edition in the edition control center in the administrative console or with scripting. In this case, the ODR communicates with the deployment manager and the application servers to ensure continuous availability during the rollout process. Because an atomic rollout guarantees that only one edition of the application is concurrently servicing requests, the ODR queues requests during the short switch-over period between edition availability.

Explicit edition-aware routing occurs when multiple editions of an application are deployed to different deployment targets (for example, clusters). In this case, routing rules can be explicitly configured to tell the ODR which edition of the application should handle the request. WebSphere Virtual Enterprise supports two methods of configuring explicit edition-aware routing:

  • Application routing rules - This is the easiest and recommended method for configuring edition-aware routing within the same cell. See the previous section on "Application routing rules" for more information on this method.
  • ODR routing rules - This is the only method which supports edition-aware routing between multiple cells. See the previous section on "ODR routing rules" for more information on this method.

Restricting routing to nodes, servers, and applications

  • Node and server maintenance mode
    Node and server maintenance mode can be used to control how the ODR routes requests, as shown in the maintenance mode table in the "ODR routing rules" section. Node maintenance mode is bi-modal with respect to ODR routing: it is either enabled or disabled. Server maintenance mode is tri-modal: normal, affinity, or break. If an application server is in maintenance mode or is on a node that is in maintenance mode, the ODR typically restricts routing to that application server or node, respectively.

    In particular, if a node is in maintenance mode, the ODR does not route requests without affinity to application servers on that node; typically, only requests with affinity are routed there. Node maintenance mode does not affect the routing of requests with affinity.

  • Application quiesce

Autonomic request flow manager (ARFM)

As previously discussed, the ODR supports a rich set of configuration options for controlling where a request is routed. This section discusses configuration of ARFM, which controls when and if the ODR routes a request.

The ARFM is a core component of WebSphere Virtual Enterprise. Its primary purpose is to provide service level assurance and differentiation by making prioritization decisions about which incoming requests to service and when. These decisions are based on dynamic analysis of the incoming traffic, the available server capacity, and the service goals defined by the user. ARFM manages available server capacity by queuing lower priority requests in order to leave sufficient resources for higher priority requests.

Note that when referring to server capacity, ARFM considers the entire set of application servers that are capable of serving the request (for example, a cluster), not the current capacity or load on a single server (unless only one server is capable of serving a request for the application in question).

  • Service policies

Service policies are the core concept which dictates how ARFM decides which traffic to queue when insufficient resources exist to process all traffic immediately. ARFM keeps track of the historical behavior of traffic for a given service policy so that it can make predictions about incoming requests. These statistics include how long a request is expected to require on the back end and how much processor is required by a request. Using these two factors, ARFM can determine the optimal number of requests that can be accepted for each service policy while meeting the desired service goal, and to what extent a request can be queued for some period of time and still meet its service goal.

Due to this need to calculate accurate historical statistics, it is very important that traffic associated with a particular service policy has consistent behavior. If some requests for a particular policy require significantly more processor on the back end than the historical average, ARFM potentially overloads the backend system. Conversely, if requests take less processor than the average, the system is underloaded (incoming requests are queued unnecessarily). Similarly, if a request takes more time than expected, ARFM is forced to hold back other requests.

In addition to ensuring that all traffic associated with a specific service policy has similar processor and response time metrics, it is important that the goal of the policy is not set too tightly. As stated earlier, ARFM manages traffic flow by queuing requests that it determines can safely be queued for some period of time and still meet their service goal. If a service policy is configured such that requests are barely meeting their service goal even when they are not queued, then ARFM will not be able to queue those requests to prevent overload, without breaching the goal. This makes it difficult for ARFM to make good prioritization decisions since restricting any traffic would result in a goal breach. By giving ARFM some headroom to work with, it can queue traffic for short periods of time without breaching the goal.

Because ARFM is dependent on properly defined service policies, it is best to disable ARFM until your service policies are configured, to avoid conditions in which it underutilizes the available resources. You can disable ARFM by running the disableARFM.py wsadmin script located in the <WAS_HOME>/bin directory. After service policies are configured, ARFM can be enabled by invoking the enableARFM.py script. This setting takes effect dynamically; no restart is required. Note that disabling ARFM also disables CPU overload protection, because ARFM will not queue traffic under any circumstances.

  • Memory overload protection

Memory overload protection (MOP) may best be thought of as heap overload protection. In particular, it prevents heap exhaustion from occurring in an application server's JVM due to a high request rate. MOP is disabled by default. To enable MOP, simply enter a percentage that is less than 100 on the ARFM panel as shown below. The default percentage is 100, which means that MOP is disabled.

When MOP is enabled, the ODR dynamically monitors the heap utilization in application servers to which it is routing. When the heap utilization nears the maximum configured heap percentage (80% on the panel above), the ODR begins to reject some HTTP requests that do not have affinity. MOP does not reject requests with affinity.

MOP does not currently directly protect against heap exhaustion that can occur from memory-to-memory replication when replicating session state. If another application server is started or stopped, it may still be possible that heap exhaustion can occur when session data is replicated to other running application servers. Because application servers in a dynamic cluster might be automatically started or stopped by the application placement controller (APC), it is recommended that you set MOP to 60% to leave room for replicated session data and increase your JVM's maximum heap setting by 20%. Changing these settings provides sufficient temporary heap space for replicated session data during application server starts and stops, while rarely using more than 60% of the JVM maximum heap. Even though the heap utilization temporarily goes above 60% when this occurs, MOP diverts traffic to other servers until the heap utilization has fallen below 60% again.
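
The sizing recommendation can be worked through with a short calculation. This sketch assumes that "increase by 20%" means 20% of the current maximum heap. The function name and the 1000 MB starting value are illustrative.

```python
def mop_heap_plan(current_max_heap_mb, mop_percent=60, heap_increase_percent=20):
    """Apply the recommended MOP percentage and heap increase to a heap size.

    Returns the new maximum heap, the heap level at which MOP starts to
    divert traffic, and the space left over for replicated session data.
    """
    new_max = current_max_heap_mb * (100 + heap_increase_percent) / 100
    mop_limit = new_max * mop_percent / 100
    return {
        "max_heap_mb": new_max,
        "mop_limit_mb": mop_limit,
        "reserve_mb": new_max - mop_limit,
    }


plan = mop_heap_plan(1000)
print(plan)
```

For a server that currently has a 1000 MB maximum heap, the plan is a 1200 MB maximum heap with MOP acting at 720 MB, which leaves 480 MB for temporary spikes from replicated session data.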

Note also that the ODR only rejects a request if there are no application servers in the cluster that can accept the request. In other words, the ODR checks all application servers before rejecting the request.
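
The behavior described in this section can be summarized as a small decision function. This is a simplified sketch and not the ODR implementation: the function and parameter names are hypothetical, a strict threshold comparison stands in for "nears the maximum", and choosing the server with the lowest heap utilization is an assumption made for the example.

```python
def mop_route(has_affinity, affinity_server, heap_percent_by_server, mop_percent):
    """Return the server to route to, or None when the request is rejected."""
    if has_affinity:
        # MOP does not reject requests with affinity.
        return affinity_server

    # Every server in the cluster is checked before a request is rejected.
    below_limit = {
        server: pct
        for server, pct in heap_percent_by_server.items()
        if pct < mop_percent
    }
    if not below_limit:
        return None

    # Assumption for this sketch: prefer the server with the most free heap.
    return min(below_limit, key=below_limit.get)


print(mop_route(False, None, {"server1": 85, "server2": 40}, 80))
print(mop_route(False, None, {"server1": 85, "server2": 90}, 80))
```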

Troubleshooting

There are a number of logs/files which are helpful in determining the source of routing problems or configuration issues:

  • SystemOut.log/SystemErr.log: Located in the logs directory, these files always exist and may contain indications of exceptions (such as an Out Of Memory condition) which could be causing the observed errors.
  • FFDC: First Failure Data Capture. This is a collection of exceptions and failures that have occurred in the system. Sifting through the recent entries can be helpful in determining if something has gone drastically wrong with the system.
  • Trace.log: Located in the logs directory. This file contains any specifically enabled trace. See below for information on what trace should be enabled and when.
  • Proxy.log: Located in the ODR logs directory. Traces the requests that are forwarded from the ODR to a back end server. The response codes seen in this log represent the response codes returned by the back end server.
  • Local.log: Located in the ODR logs directory. Traces requests which were handled by the ODR directly. The response codes seen in this log represent the response code returned by the ODR itself.
  • dumpODRState.jacl: A wsadmin script that is located in the bin directory. When invoked, provides various information about the state of the ODR including an output of the current routing information (target.xml file) and an indication of whether or not requests are being queued by ARFM.
  • Target.xml: Included in the output of the dumpODRState.jacl script. This content can also be obtained by enabling "com.ibm.ws.odc.*=all" tracing. In this case, the file is written to <WAS_HOME>/profiles/<profilename>/installedFilters/wlm/<process>/target.xml. This file contains the current state of the cell including server state, application state, and routing rule information. This is the data used by the ODR to make routing decisions.
  • Custom logging: When enabled, custom logging traces information about requests that meet specific parameters such as a specific URL or client host name.
  • Binary trace (available in WebSphere Virtual Enterprise Version 6.1.1.0+): Located in the logs directory, these files contain always-enabled trace which can be used to determine the source of routing failures. The files are named "btrace.*" and "objects".
  • odrDebug.py (available in WebSphere Virtual Enterprise Version 6.1.1.0): A wsadmin script located in the bin directory. This script can be used to force the ODR to dump debug information when it encounters a routing failure such as a 503 or 404. The debug information is written to the trace.log file. This is automatically done in WebSphere Virtual Enterprise Version 6.1.1.0+.

ODR problems typically fall into one of the following categories:

  • Request returns a 404
  • Request returns a 503
  • Request times out/returns very slowly
  • Request loses affinity/session
  • Request is routed to the wrong application edition/cluster/etc.

Isolating the problem tier

For the first three types of issues, problem determination should start by verifying that the application servers that are hosting the application are responding correctly. You can verify that the servers are responding by sending a request directly to the application server, instead of through the ODR or through the Web server in front of the ODR.

For 503 or 404 errors, it is also possible to determine the source of the problem by checking the local.log and proxy.log files. Error codes in the local.log file indicate that the problem exists within the ODR tier.

If the application cannot be accessed directly, or the proxy.log file shows that the error codes are coming from the application server, the problem needs to be approached as an application server or application issue.

If the application is accessible when making a request directly to the application server, it is also important to verify that there is no issue between the Web server and the ODR, if a Web server is present. You can check for this issue by sending a request directly to the ODR.

If these tests confirm that the issue exists in the ODR tier, you should determine what is causing the ODR to mishandle requests.
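
The isolation steps above can be summarized as a short decision sketch. The function name, parameter names, and return strings are hypothetical. The inputs are the results of the manual tests described in this section, for a request that fails when it is sent through the normal entry point.

```python
def isolate_tier(app_direct_ok, odr_direct_ok, web_server_present):
    """Map the results of the direct-request tests to the tier to investigate.

    app_direct_ok: a request sent directly to the application server succeeds.
    odr_direct_ok: a request sent directly to the ODR succeeds.
    web_server_present: a Web server sits in front of the ODR.
    """
    if not app_direct_ok:
        return "application server or application"
    if not odr_direct_ok:
        return "ODR"
    if web_server_present:
        return "between the Web server and the ODR"
    return "not reproduced by the direct tests"


print(isolate_tier(app_direct_ok=True, odr_direct_ok=False, web_server_present=True))
```

The local.log and proxy.log files remain the quicker check for 404 and 503 errors; the sketch only covers the direct-request tests.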

Analyzing 404 errors

A 404 error code means that the ODR was unable to map a request URI to any installed application. This error could be from routing rules that were configured incorrectly, virtual host definitions being wrong, the application not being installed, or incorrect data in the routing information for the ODR, which is represented in the target.xml file.

The most common source of this issue is that the virtual host the application is installed to does not include the host name and port for the ODR. Similarly, if the ODR is being fronted by a Web server, then the host and port for the Web server should be listed in the virtual host for the application and the ODR should include the Web server in its trusted proxy list.
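
As a minimal sketch of this check, the following function compares the host aliases of a virtual host with the host name and port of each front end (the ODR and, if present, the Web server). The host names and ports are made-up examples, and the alias list has to be copied by hand from the virtual host definition. An alias host of "*" is treated as matching any host name.

```python
def missing_host_aliases(aliases, front_ends):
    """Return the front ends that no host alias of the virtual host covers.

    aliases and front_ends are collections of (host, port) pairs.
    """
    normalized = {(host.lower(), port) for host, port in aliases}

    def covered(host, port):
        return (host.lower(), port) in normalized or ("*", port) in normalized

    return [(host, port) for host, port in front_ends if not covered(host, port)]


# Example values only; substitute the aliases and hosts from your own cell.
aliases = {("*", 9080), ("odr1.example.com", 80)}
front_ends = [("odr1.example.com", 80), ("web1.example.com", 80)]
print(missing_host_aliases(aliases, front_ends))
```

In this example, the Web server host and port are reported as missing, so requests that arrive through the Web server would not match the virtual host.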

If the above tests indicate that the URI should have been matched to an installed application, gather the following files from the ODR log directory. You do not need to enable trace:

btrace.*
objects
Output from dumpODRState.jacl

Analyzing 503 errors

A 503 error code means that the ODR mapped the request URI to an application (module), but was unable to locate a server running that module. This error could be because the routing rules caused the request to be mapped to an unanticipated edition of the application that is not currently running. A second possibility is that incorrect data exists in the routing information for the ODR, with regard to which servers are running and which modules are running on those servers. If the previous tests indicate that a server should be available for the request URI, gather the following files from the ODR log directory. You do not need to enable trace:

btrace.*
objects
Output from dumpODRState.jacl

Analyzing request timeout or latency

Delayed request handling, particularly if the ODR is not experiencing a high level of processor utilization, typically indicates that the autonomic request flow manager (ARFM) is queuing requests in an attempt to prevent overloading backend application servers, or to allow higher priority traffic to be serviced. You can validate this issue by clicking Runtime Operations > Reports on the administrative console panel and graphing the queue length of the service policy that is associated with the requests. If queuing is occurring, and investigation of the back end application servers indicates that this queuing is in error (the servers are not at capacity), a quick solution is to disable ARFM queuing (see the previous ARFM section).

To address the root cause of the queuing, you should evaluate the defined service policies to ensure that they are categorizing traffic so service time or CPU consumption for requests in the same service policy does not greatly vary. If the service policies seem to be properly configured, collect the ARFM mustgather. See Must gather: for more information.

You can also use the arfmMustGather.py script in the <WAS_HOME>/bin directory to enable the ARFM trace settings as appropriate for the ODR, nodeagent, application server, and so on.

If no queuing is occurring, then the following trace should be enabled and the requests sent through the system in order to capture logging for a slow or timed out request:

*=info: com.ibm.ws.classify.*=all: com.ibm.ws.dwlm.client.*=all: com.ibm.ws.odc.*=all: com.ibm.ws.odr.route.Server=all: com.ibm.ws.proxy.*=all: com.ibm.ws.xd.dwlm.client.*=all: com.ibm.ws.xd.filter.HttpSessionAffinitiesResponseFilter=all: Web-SphereProxy=all: com.ibm.ws.wsgroup.*=all: com.ibm.ws.odr.*=all:HTTPChannel=all: GenericBNF =all

The trace.log should be collected from the ODR.

Request loses affinity/session

When the session data for a user is lost, or affinity to a specific server is not maintained when you expect it to be, a good starting point is to monitor the HTTP headers that are exchanged between the browser and the ODR. Tools such as LiveHTTPHeaders for Firefox show the headers that are sent and received, which lets you confirm whether the expected session cookie is present and what value it has.
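
If you capture the Cookie request header with such a tool, a few lines of Python can confirm whether the session cookie is present. This is a minimal sketch: it assumes the default servlet session cookie name, JSESSIONID, which your configuration may override, and the header values shown are made up.

```python
from http.cookies import SimpleCookie


def session_cookie_value(cookie_header, name="JSESSIONID"):
    """Return the value of the session cookie in a Cookie header, or None."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get(name)
    return morsel.value if morsel is not None else None


# Example header values copied from a header capture (made up here).
print(session_cookie_value("lang=en; JSESSIONID=0000abc123"))
print(session_cookie_value("lang=en"))
```

A result of None for a request that should belong to an existing session indicates that the browser did not send the cookie, so the ODR had no affinity information to act on.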

Assuming the cookie is sent as expected on the request, the next step is to ensure that the server to which the request has affinity is available. This means that the server must be running and must not be in maintenance mode with break affinity set. You can check and change this state in the "All Servers" view of the administrative console.

If this does not turn up a configuration issue, then the next step is to collect trace from the ODR for support to analyze. The following trace should be enabled on the ODR and requests should be sent through the ODR to recreate the issue:

*=info: com.ibm.ws.classify.*=all: com.ibm.ws.dwlm.client.*=all: com.ibm.ws.odc.*=all: com.ibm.ws.odr.route.Server=all: com.ibm.ws.proxy.*=all: 
com.ibm.ws.xd.dwlm.client.*=all: com.ibm.ws.xd.filter.HttpSessionAffinitiesResponseFilter=all: Web-SphereProxy=all: com.ibm.ws.wsgroup.*=all: 
com.ibm.ws.odr.*=all:HTTPChannel=all: GenericBNF =all

Collect the trace.log and target.xml files from the ODR. (If you cannot locate the target.xml file, use the dumpODRState.jacl script to obtain the file).

Request routed to wrong application edition

The ODR makes routing decisions based on the URI of the request as well as other pieces of data referenced by configured routing rules. If a request seems to be going to the wrong application, or the wrong edition of an application, verify the following configuration items.

First, it is important to confirm that the desired edition of the application is the active and running edition. You can validate the edition in Edition Control Center. You also must confirm that the routing policies for all editions of the application are defined to permit requests to the desired (active) application edition.

Another common source of undesirable routing behavior is when multiple applications are registered for the same URI. This configuration is valid if the applications are associated with distinct virtual hosts, but if they are not, then the ODR makes an arbitrary decision about the application to which the request is sent, assuming no additional routing rules are defined to make the determination.
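
One way to look for this condition is to list each application with its virtual host and URI, and then group the entries. The following sketch does the grouping. The application names, virtual hosts, and URIs are made-up examples, and the list has to be compiled by hand from your own configuration.

```python
from collections import defaultdict


def uri_conflicts(registrations):
    """Find URIs that more than one application registers on one virtual host.

    registrations is an iterable of (application, virtual_host, uri) tuples.
    Returns a dict that maps (virtual_host, uri) to the sorted application names.
    """
    by_key = defaultdict(set)
    for application, virtual_host, uri in registrations:
        by_key[(virtual_host, uri)].add(application)
    return {key: sorted(apps) for key, apps in by_key.items() if len(apps) > 1}


# Example data only.
registrations = [
    ("StoreA", "default_host", "/shop"),
    ("StoreB", "default_host", "/shop"),
    ("StoreC", "admin_host", "/shop"),
]
print(uri_conflicts(registrations))
```

In this example, StoreA and StoreB conflict because they share both the virtual host and the URI. StoreC does not conflict, because it is associated with a distinct virtual host.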

If both the application edition and ODR routing rules seem to be defined correctly for the URI in question, the next step is to enable ODR tracing and gather logs showing the incorrect processing of a request. The trace that should be enabled prior to recreating the scenario is:

*=info: com.ibm.ws.classify.*=all: com.ibm.ws.dwlm.client.*=all: com.ibm.ws.odc.*=all: com.ibm.ws.odr.route.Server=all: 
com.ibm.ws.proxy.*=all: com.ibm.ws.xd.dwlm.client.*=all: com.ibm.ws.xd.filter.HttpSessionAffinitiesResponseFilter=all: 
Web-SphereProxy=all: com.ibm.ws.wsgroup.*=all: com.ibm.ws.odr.*=all:HTTPChannel=all: GenericBNF =all

The trace.log and target.xml files should be collected from the ODR. (If you cannot locate the target.xml file, use the dumpODRState.jacl script to obtain one).

