Let's look at Dispatch Timeout Handling in WebSphere Application Server for z/OS
Mike Stephen
If you run WebSphere Application Server on z/OS, you are likely aware of the many 'timer' settings that can affect the workload running in the server.
This blog entry focuses on dispatch timeout handling and on the tradeoffs between the settings that control how the environment behaves when dispatch timeouts occur.
Let's first look at an overview of the dispatch process in WebSphere Application Server on z/OS.
1. Request Received by Control Region (CR)
The HTTP request is received by the CR. The CR works with WLM to classify the work and a WLM enclave is created for the request.
2. Request Placed on WLM Queue
The CR places the request on the WLM work queue in preparation for dispatch into the Servant Region. A dispatch timer is started for the request.
The dispatch timer is central to this discussion. The issue is what happens if the work does not complete within the timeout value for the dispatch.
3. Request Dispatched to Thread in Servant Region (SR)
When a thread in the servant region is available to take work, WLM will dispatch the request from the work queue to the worker thread.
If no threads are available, the request remains in the WLM queue. A hung thread is not eligible for work dispatch. That is why having hung threads build up in a servant region is a problem: eventually no eligible threads remain and WLM can no longer dispatch work to the servant.
4. Request Processing
The work begins execution. How long the work takes to complete is a function of the application design. Some requests are very short-lived; others take longer because they perform more complex processing.
The goal is to have all work complete within the defined dispatch timer value.
However, some work fails to complete within the dispatch timer value. There are many different reasons why a request may not complete in time.
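The four steps above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not how WebSphere or WLM is implemented: the `Request` and `Servant` classes, the tick-based clock, and every number are hypothetical, and a thread marked hung is treated as never completing (the worst case).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    name: str
    work_time: int       # ticks of processing the request needs (made-up values)
    queued_at: int = 0   # tick at which its dispatch timer started


@dataclass
class Servant:
    threads: int             # worker threads in the servant region
    dispatch_timeout: int    # ticks allowed from queueing to completion
    queue: deque = field(default_factory=deque)    # stands in for the WLM work queue
    running: list = field(default_factory=list)    # (request, finish_tick) pairs
    hung: list = field(default_factory=list)       # requests whose timer expired mid-work
    completed: list = field(default_factory=list)  # names of requests that finished in time

    def eligible_threads(self):
        # A thread doing normal work or marked hung cannot take new work.
        return self.threads - len(self.running) - len(self.hung)

    def submit(self, request, now):
        # Step 2: queue the request and start its dispatch timer.
        request.queued_at = now
        self.queue.append(request)

    def tick(self, now):
        # Step 4: finish work that is done; mark work whose timer expired as hung.
        still_running = []
        for request, finish_tick in self.running:
            if now >= finish_tick:
                self.completed.append(request.name)
            elif now - request.queued_at >= self.dispatch_timeout:
                self.hung.append(request)
            else:
                still_running.append((request, finish_tick))
        self.running = still_running
        # Step 3: dispatch queued work only while an eligible thread exists.
        while self.queue and self.eligible_threads() > 0:
            request = self.queue.popleft()
            self.running.append((request, now + request.work_time))


def simulate():
    # Two threads, a 5-tick dispatch timeout, four requests queued at tick 0.
    servant = Servant(threads=2, dispatch_timeout=5)
    for name, work_time in [("fast", 2), ("stuck-1", 100), ("stuck-2", 100), ("late", 1)]:
        servant.submit(Request(name, work_time), now=0)
    for now in range(10):
        servant.tick(now)
    return servant
```

Running `simulate()` leaves "fast" completed, both "stuck" requests marked hung, and "late" still sitting in the queue with no eligible thread to take it, which is the build-up problem described in step 3.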
Dispatch Timeout Processing Overview
At a high level, processing is:
Understanding the Nature of Hung Threads
As noted earlier, some threads may be marked as 'hung' but eventually complete. Others are marked 'hung' and never complete:
The distinction is important:
When considering thread timeout behavior and the settings that are appropriate, there are three aspects of the runtime thread environment to keep in mind:
With respect to #1 - neither a delayed thread nor a hung thread is desirable. But of the two, the delayed thread is somewhat easier to manage, depending on the duration of the delay and the frequency of occurrence (#2).
With respect to #2 - a timer expiration event that occurs rarely implies a different response than timer expiration events that occur frequently or, worse still, for every request. The former may be due to a rare combination of factors; the latter suggests a more systemic, structural problem.
With respect to #3 - depending on the frequency of timeout events and business impact of those timeouts, an investigation of the underlying cause will be called for. Timeouts may occur for a variety of reasons: insufficient system resources; network delays; DB2 tuning issues; or perhaps poor application design.
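One way to put a number on the frequency question is to compute the share of requests that outlived the dispatch timer. The sketch below assumes you already have a list of elapsed times from your own monitoring; the function names and the 1% cut-off are hypothetical and are not taken from any WebSphere setting or tool.

```python
def timeout_rate(elapsed_seconds, dispatch_timeout):
    """Fraction of requests whose elapsed time exceeded the dispatch timeout."""
    if not elapsed_seconds:
        return 0.0
    expired = sum(1 for elapsed in elapsed_seconds if elapsed > dispatch_timeout)
    return expired / len(elapsed_seconds)


def classify(rate, rare_threshold=0.01):
    """Label a timeout rate; the 1% cut-off is an arbitrary illustration."""
    if rate == 0:
        return "none"
    return "rare" if rate <= rare_threshold else "frequent"
```

A "rare" result points toward an unusual combination of factors, while a "frequent" one suggests looking for a systemic cause. Where the line falls is a judgment call for each installation.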
Dispatch Timeout vs. Transaction Timeout
There is a difference between the timer maintained for dispatched requests and the timer maintained for transactions created by the application.
The important point to keep in mind is that 'transaction' has a specific meaning in a discussion of timeout values. A request and a transaction are two separate things, and each has its own timer.
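The idea of two independent timers can be pictured with a minimal sketch. Everything here is hypothetical: the `Timer` class, the labels, the start times, and the limits are invented for illustration, and the sketch says nothing about what the server actually does when either timer expires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timer:
    label: str
    started_at: int   # when this timer was started (arbitrary time units)
    limit: int        # how long it may run before it is considered expired

    def expired(self, now):
        return now - self.started_at >= self.limit


def expired_labels(timers, now):
    """Return the labels of the timers that have expired at time `now`."""
    return [timer.label for timer in timers if timer.expired(now)]


# One request, and one transaction the application begins partway through it.
TIMERS = [
    Timer("dispatch", started_at=0, limit=30),
    Timer("transaction", started_at=5, limit=10),
]
```

With these made-up values, `expired_labels(TIMERS, 15)` reports only the transaction timer, while the dispatch timer for the same request is still running. Each timer starts at its own moment and expires on its own schedule.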
A big THANK YOU goes out to Don Bagwell for working with the entire WebSphere Application Server z/OS team, Level 2 and Development, to get this information into consumable documentation for our customers and support teams.
There are several White Papers that go into more detail on WebSphere Application Server for z/OS Timeout settings: