A J2EE server running on z/OS is implemented by multiple address spaces, or processes. There is one control region, at least one servant region, and optionally one control region adjunct, as shown in Figure 1.
The control region accepts client requests (HTTP, IIOP) and classifies them into different service classes according to predefined rules. For instance, HTTP requests are classified based on their URLs, which gives you the ability to set goal-oriented service levels for different application components deployed on the same application server. After the workload classification, the control region inserts the requests into queues managed by IBM Workload Manager (WLM) for z/OS. A servant region picks up the requests from these queues and processes them in the servant region address space.
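The flow described above can be illustrated with a small sketch. It is a toy model only, not WebSphere or WLM code: the URI prefixes, the service class names (CBHI, CBLO, CBDEF), and the class and function names are all assumptions chosen for illustration.

```python
from collections import deque

# Hypothetical classification rules: URI prefix -> service class.
# Neither the prefixes nor the class names come from a real configuration.
RULES = [
    ("/shop/checkout", "CBHI"),
    ("/reports", "CBLO"),
]
DEFAULT_CLASS = "CBDEF"


def classify(uri):
    """Return the service class of the first rule whose prefix matches the URI."""
    for prefix, service_class in RULES:
        if uri.startswith(prefix):
            return service_class
    return DEFAULT_CLASS


class ControlRegion:
    """Accepts requests, classifies them, and queues them per service class."""

    def __init__(self):
        self.queues = {}  # service class -> queue of waiting requests

    def accept(self, uri):
        service_class = classify(uri)
        self.queues.setdefault(service_class, deque()).append(uri)
        return service_class


class ServantRegion:
    """Takes queued requests and processes them in its own address space."""

    def __init__(self):
        self.processed = []

    def take(self, queues, service_class):
        queue = queues.get(service_class)
        if not queue:  # no queue yet, or nothing waiting
            return None
        uri = queue.popleft()
        self.processed.append((service_class, uri))
        return uri


control = ControlRegion()
for uri in ["/shop/checkout/pay", "/reports/monthly", "/index.html"]:
    control.accept(uri)

servant = ServantRegion()
first = servant.take(control.queues, "CBHI")
```

The point of the sketch is the division of labor: the control region never runs the request itself, and the servant only sees work that has already been classified and queued.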
The control region is controlled by the user, which means you can start or stop the control region with commands. Servant regions are not directly controlled by the user. Instead, WLM dynamically starts and maintains the servant regions as required to meet the workload’s performance goal. IBM WebSphere Application Server leverages WLM’s queuing manager service to maintain the queues for passing requests from the control region to the servant regions. This will be discussed more later. (For the purpose of this article, the control region adjunct is not significant and will not be discussed further.)
Figure 1. WebSphere Application Server for z/OS architecture
Before we look at details of this configuration, here are a few reasons for the split address space architecture of WebSphere Application Server for z/OS:
Business-oriented workload management:
- The architecture provides the capability to differentiate a higher priority query (for example, 95% finished within 0.05 seconds) from a lower priority data analysis operation (for example, 80% finished within 5 seconds).
- WLM is capable of dynamically managing the number of servant regions in order to achieve the overall system throughput goal.
Separation of transaction and application execution:
- “Trusted code” runs in the control region, while user application code runs in the servant region, which completely isolates application execution from transaction management.
- Transaction recovery is handled by the control region.
- Additional exception handling mechanisms are available in WebSphere Application Server. For example, transaction timeout handling: when the control region detects transaction timeout, it can be set up to restart servant regions to clean up the system resources, preventing them from being locked by an application hang.
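The percentile goals mentioned above (for example, 95% finished within 0.05 seconds) can be made concrete with a short check. This is only a sketch of the arithmetic behind such a goal; the function name and the sample response times are hypothetical, and WLM's own goal evaluation is more involved.

```python
def meets_percentile_goal(response_times, percent, target_seconds):
    """Return True if at least `percent` of the requests finished within `target_seconds`."""
    if not response_times:
        return True  # no completed requests, so nothing missed the goal
    within = sum(1 for t in response_times if t <= target_seconds)
    # Compare as integers to avoid floating-point rounding on the percentage.
    return within * 100 >= percent * len(response_times)


# Hypothetical samples: 19 of 20 queries are fast, 7 of 10 analysis runs are fast.
query_times = [0.01] * 19 + [0.2]
analysis_times = [1.0] * 7 + [9.0] * 3

query_ok = meets_percentile_goal(query_times, 95, 0.05)
analysis_ok = meets_percentile_goal(analysis_times, 80, 5)
```

With these samples the query workload meets its goal exactly (95% within 0.05 seconds), while the analysis workload misses its goal because only 70% of its requests finish within 5 seconds.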
The remainder of this article looks at the integration between WebSphere Application Server for z/OS and Workload Manager for z/OS in detail.
Enclaves and WebSphere Application Server transaction classification
The concept of an enclave was first introduced in MVS™ 5.2.0 and is mostly applicable to "new workloads" from WebSphere Application Server, IBM DB2® DDF, IBM HTTP Server, and so on. An enclave provides the ability to manage the performance of a business transaction across multiple address spaces and systems inside a Sysplex. In other words, the subsystem (for example, WebSphere Application Server) creates an enclave (a kind of token) to represent each work request, after which the underlying z/OS dispatchable units (task control blocks (TCBs) or service request blocks (SRBs)) for the transaction execution inside different address spaces can be encapsulated within the enclave boundary.
These are some important features of enclaves:
- The whole transaction execution can be scheduled to meet the enclave’s performance goal, independent of the performance goal of the address space where the dispatchable unit runs.
- The resource consumption (CPU, I/O) for this business transaction can also be charged to the enclave itself.
- With enclaves, it’s possible to assign a performance goal to the whole business transaction spanning from WebSphere Application Server to other subsystems, and have an integrated view of the transaction’s resource consumption across address space boundaries.
There are two types of enclaves. A dependent enclave is a logical extension of an existing address space and simply inherits the service class of its originator address space. By contrast, an independent enclave is classified separately. Figure 2 illustrates the concept of dependent and independent enclaves. WebSphere Application Server for z/OS uses independent enclaves, so these will be our focus.
Figure 2. Dependent and independent enclaves
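As a rough illustration of the two enclave types and of resource accounting across address spaces, consider the following sketch. It is a conceptual model only and does not use any WLM service; the class names, address space names, and service class names are assumptions.

```python
class AddressSpace:
    def __init__(self, name, service_class):
        self.name = name
        self.service_class = service_class


class Enclave:
    """Toy enclave: independent if it has its own service class, dependent otherwise."""

    def __init__(self, owner, service_class=None):
        self.owner = owner                # originating address space
        self.own_class = service_class    # None models a dependent enclave
        self.cpu_seconds = {}             # address space name -> CPU charged

    @property
    def service_class(self):
        # A dependent enclave inherits the class of its originator address space.
        if self.own_class is not None:
            return self.own_class
        return self.owner.service_class

    def charge(self, address_space, cpu_seconds):
        """Charge CPU used in any address space to the enclave itself."""
        used = self.cpu_seconds.get(address_space.name, 0.0)
        self.cpu_seconds[address_space.name] = used + cpu_seconds

    def total_cpu(self):
        return sum(self.cpu_seconds.values())


# Hypothetical address spaces and classes.
control = AddressSpace("CONTROL", "STCHI")
servant = AddressSpace("SERVANT", "STCMED")

independent = Enclave(control, service_class="CBHI")
dependent = Enclave(control)

independent.charge(control, 0.25)
independent.charge(servant, 0.5)
```

The independent enclave keeps its own service class (CBHI here) wherever its work runs, and the CPU consumed in both address spaces adds up on the enclave rather than on the address spaces.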
Using an independent enclave
If no service class is specified for a J2EE application (not even a default service class set for the WebSphere Application Server type workload), then the enclave will be classified in the SYSOTHER service class, which is a system-defined service class with WLM’s discretionary service goal that tells the system to, basically, “do the best that you can.” The result is that the workload can only get serviced when there is no competition for resources. It is not advisable, therefore, to have a WebSphere Application Server workload classified as SYSOTHER because, in a constrained environment, it won’t be able to compete for resources with other “higher priority” workloads. Even worse, if the work request into WebSphere Application Server does not finish within a certain period of time, it is likely to hit various timeout parameters, such as HTTP timeout, transaction timeout, and so on. When this happens, the control region will terminate the servant region by default, to clean up the environment and prevent important system resources from being locked. The servant region will be restarted by WLM to continue the work. If CPU resources are very tight during that period, then the servant region will have a hard time getting started. While all this is happening, there is the potential in an active system for requests to queue up and another timeout parameter to be exceeded, causing another servant region restart, again impacting application availability.
You can classify incoming HTTP, IIOP, and message-driven bean work requests with different service classes using an XML workload classification document, which assigns transaction classes (TCLASS) to the incoming requests. The TCLASS value is passed to WLM, which ultimately associates the TCLASS with a service class and, if specified, a report class.
See the WebSphere Application Server V6.1 Information Center for guidance on using the XML workload classification document and classifying in the WLM.
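The two-stage mapping (request to TCLASS, then TCLASS to service class and report class) can be sketched as follows. This is not the XML classification syntax and not WLM's rule format; the URI patterns, the TCLASS names, and the service and report class names are hypothetical stand-ins. The fallback to SYSOTHER follows the behavior described earlier for work with no service class and no default.

```python
from fnmatch import fnmatchcase

# Stage 1: hypothetical stand-in for the XML workload classification document.
URI_TO_TCLASS = [
    ("/shop/checkout/*", "TCHI"),
    ("/reports/*", "TCLO"),
]
DEFAULT_TCLASS = "TCDEF"

# Stage 2: hypothetical stand-in for the WLM classification rules.
# TCLASS -> (service class, report class or None)
TCLASS_TO_WLM = {
    "TCHI": ("CBHI", "RCHI"),
    "TCDEF": ("CBDEF", None),
}


def transaction_class(uri):
    """Stage 1: assign a TCLASS to an incoming request by URI pattern."""
    for pattern, tclass in URI_TO_TCLASS:
        if fnmatchcase(uri, pattern):
            return tclass
    return DEFAULT_TCLASS


def wlm_classes(tclass, default_service_class=None):
    """Stage 2: map a TCLASS to (service class, report class)."""
    if tclass in TCLASS_TO_WLM:
        return TCLASS_TO_WLM[tclass]
    if default_service_class is not None:
        return (default_service_class, None)
    # No rule and no default for the workload type: discretionary SYSOTHER.
    return ("SYSOTHER", None)
```

In this sketch TCLO deliberately has no stage 2 rule, so `wlm_classes("TCLO")` ends up in SYSOTHER unless a default service class is supplied, which is exactly the situation the article advises against.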
The recommended practice of not having too many service classes in a system still holds true, even for WebSphere Application Server workloads. From a z/OS perspective, because WLM keeps sampling and adjusting system resources to help all service classes meet their performance goals, too many active service classes will slow down the adjustment process (WLM makes one adjustment during every 10-second interval). Too many service classes can therefore leave some “hungry” workloads without the attention they need in time. Typically, the number of active service class periods in a system should not exceed 30. Traditionally, z/OS users apply the philosophy of classifying workloads into fewer service classes and combining similar workloads as appropriate.
Although you have the capability to classify different WebSphere Application Server applications or components into different performance goals, you shouldn’t overuse this mechanism by defining very fine-grained classifications.
Be aware that not all Web applications are able to use URI-based classification. Examples of such applications are those that use frameworks such as Struts, Ajax, and so on, where the URLs requested from the client would be identical for different kinds of requests. These types of applications prevent you from classifying different business transactions based on URI.
WebSphere Application Server is designed to process a large number of parallel short-time tasks, and work queues and thread pools are two components that are used often to handle parallel workload.
Across all platforms, WebSphere Application Server implements a thread pool to process requests from clients. WebSphere Application Server for z/OS combines a work queue mechanism with the thread pool to serve the client requests. The control region of WebSphere Application Server for z/OS inserts requests into the work queues associated with servant regions, which then select requests from the work queues they are bound to and process them in their internal thread pools.
The work queue mechanism is inherited from the WLM queuing manager service. In addition, WebSphere Application Server for z/OS also leverages the WLM application environment to dynamically manage the servant region address spaces in order to meet the overall performance goals of the applications.
In terms of the WLM queuing manager service, as shown in Figure 3, the control region is the queuing manager and the servant region is the server address space managed by WLM.
Figure 3. Workload Manager queuing manager service
In Figure 3, ApplEnv (application environment) represents a way to group similar server programs together and dynamically start or stop server address spaces (WebSphere Application Server servant regions) as needed to handle workload changes and guarantee that the performance goal is met. WebSphere Application Server for z/OS takes advantage of the dynamic application environment, which you do not need to manually define in WLM because WebSphere Application Server handles this task automatically.
For a J2EE server, the name of the dynamic application environment is specified via the administrative console as the Cluster transition name; navigate from the admin console to Application server => Custom Properties.
By default, a J2EE server is configured for a single servant region. Multiple servant region instances can be enabled and the minimum and maximum number can be set via the admin console, as shown in Figure 4.
Figure 4. Application server instances
If multiple servant regions are enabled, be sure to specify the maximum number of instances for the servant region to prevent service degradation due to an excessive number of servant regions starting. Meanwhile, by specifying a minimum number of instances, you can have a number of servant regions active during the entire lifecycle of the WebSphere Application Server.
The WLM queuing manager service depends on the application environment. As shown in Figure 3, the control region inserts the enclaves representing the work requests to the WLM work queues associated with service classes within the application environment. Within each application environment, there will be only one work queue mapping to one service class; the work queue will just follow the service class’s name. There can be more work queues serving a single service class, but they are separated into different application environments (servers).
Work queue and servant region relationship
As mentioned above, within a WebSphere Application Server application environment, the number of work queues equals the number of service classes classified for the applications running on the WebSphere Application Server.
Next, let’s look at the relationship between work queues and servant regions, and their impact on the overall performance in the context of these two scenarios (where M = number of servant regions):
- Single servant region: M = 1
- Multiple servant regions: M > 1
You can use the WLM work queue viewing tool, WLMQUE, to monitor the active server address spaces (WebSphere Application Server servant region), the service classes work queues, the number of current requests sitting in the work queue, the total number of requests taken from the queues by the servant region, the number of requests directly sent to the servant region because of affinity, and so on. (Screen captures from the WLMQUE tool are included below for illustrative purposes.)
Single servant region: M = 1
By default, a J2EE server on z/OS has one servant region. In this scenario, the servant region does not bind to any specific service class, which means all work requests will be selected by the single servant region and get serviced there. However, if a performance bottleneck occurs that could be alleviated by using multiple servant regions, the WLM will not be able to start additional servant regions because the multiple servant region feature is not enabled. Thus, such an application server environment with a single servant region is called an “unmanaged” application environment.
In this scenario, all requests with different service classes will all be executed in the servant region concurrently. Within the servant region, the worker thread (TCB) will be running with the service class of the enclave representing the requests.
Figure 5. WLMQUE panel showing single servant region
Figure 5, from the WLMQUE tool, shows there is one ApplEnv named Y6C001 which belongs to a WebSphere Application Server server Y6SRVA1. Note these key columns in the ApplEnv heading row:
- Dyn indicates whether the application environment is dynamic or static.
- Qlen displays the number of work requests currently in the queues, ready to be dispatched to a servant region. If you put the system under enough load, you will observe requests waiting in the queues.
- WorkQue with a value of ******** indicates that this is an unmanaged application environment.
- SvAS is the ID of the WebSphere Application Server servant region’s address space, which is "0097" in the figure.
- WUQue is the total number of requests taken from the work queue.
- AffQue is the number of work requests sent directly to the servant region because of client affinities (discussed later).
Multiple servant regions: M > 1
When multiple servant regions are enabled in WebSphere Application Server, the behavior is different from the single servant region scenario. Basically, at any given time, any servant region will only bind to a single specific service class work queue and select requests from there.
To illustrate, multiple servant region instances have been enabled for a server with the following setup:
- The number of servant region instances is set to a minimum of 2 and a maximum of 4.
- There is one application deployed on the server and the incoming workload is classified into three service classes: CBDEF, CBHI, and CBLO.
- The default service class for the WebSphere Application Server application is CBDEF.
Figure 6. WLMQUE showing multiple servant regions
Figure 6 shows the WLMQUE tool after the server has started. The minimum of two servant regions has been started, with ASIDs 00A4 and 00AA. Notice the differences from the single servant region scenario. For example, the WorkQue column now has a name of "CBDEF," which is the default service class for WebSphere Application Server type workload in this testing environment.
When a new request comes in with a different service class of CBHI, WLM starts a new servant region to bind to the new work queue of CBHI. The request is passed through the CBHI work queue and then executed in the new servant region with ASID 00A9, as shown in Figure 7.
Figure 7. WLMQUE showing multiple servant regions with new request
If WLM can start a new servant region (because the maximum number of servant regions has not already been reached), then it will. WLM does not simply convert any servant region from CBDEF to CBHI; it will only convert a servant region’s bound service class work queue to another one when it has no other choice (for example, if the number of current servant regions has reached the maximum limit). Also, because WLM can help dynamically maintain the number of servant regions to meet workload changes, it might also start new servant regions during production to help the workloads achieve the performance goal.
If WLM can start a new servant region, the processing of requests that will be serviced by that servant region will be delayed until the servant region is started. If you cannot accommodate the delay or overhead of starting new servant regions dynamically -- especially if the deployed applications are complex -- then an alternative solution is to set the minimum number of servant regions equal to the maximum.
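The binding behavior described above (reuse a bound servant, otherwise start a new one, and convert an existing one only as a last resort) can be modeled in a few lines. This is a simplified sketch, not WLM's actual algorithm: the class and method names are hypothetical, the service class CBNEW is invented for the example, and the choice of which servant to convert is an assumed policy.

```python
from collections import Counter


class ApplicationEnvironment:
    """Toy model of servant regions and the work queue each one is bound to."""

    def __init__(self, min_servants, max_servants, default_class):
        self.max_servants = max_servants
        # Servants started with the server bind to the default service class queue.
        self.bindings = [default_class] * min_servants

    def ensure_servant_for(self, service_class):
        """Make sure some servant is bound to `service_class`; return the action taken."""
        if service_class in self.bindings:
            return "reuse"
        if len(self.bindings) < self.max_servants:
            self.bindings.append(service_class)
            return "start"
        # Assumed policy: rebind one servant from the queue that has the most servants.
        donor_class = Counter(self.bindings).most_common(1)[0][0]
        self.bindings[self.bindings.index(donor_class)] = service_class
        return "convert"


env = ApplicationEnvironment(min_servants=2, max_servants=4, default_class="CBDEF")
actions = [env.ensure_servant_for(c) for c in ["CBHI", "CBLO", "CBNEW", "CBDEF"]]
```

With a minimum of 2 and a maximum of 4, the first two new service classes each trigger a servant start, the third forces a conversion because the maximum has been reached, and a later CBDEF request can still reuse the remaining CBDEF servant. Setting `min_servants` equal to `max_servants` removes the "start" case entirely, which corresponds to the alternative mentioned above.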
How many service classes?
In general, do not classify WebSphere Application Server applications into more service classes than the maximum number of servant regions. For example, if you set your maximum servant region number to 5, then do not classify your application into more than five service classes.
Let’s assume your WebSphere Application Server is maintaining the maximum number of servant regions, all bound to different service class work queues. A new request comes in and stays in the new service class work queue. All the servant regions are now bound to existing work queues, and no servant region can immediately convert to the new service class work queue. The duration of the conversion cannot be timed precisely; it might take seconds or minutes. The work requests can only stay in the queue, postponed for an uncertain length of time, and could potentially hit some timeouts in the meantime.
As a general practice, always set N <= M, where N = number of mapped service classes, including a service class for unclassified requests, and M = number of servant regions.
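The N <= M rule is easy to check mechanically when reviewing a configuration. The helper below is a hypothetical sketch: it counts the distinct service classes, always including the one used for unclassified requests, and compares the count with the maximum number of servant regions.

```python
def service_class_count_ok(mapped_classes, default_class, max_servants):
    """Return (ok, n) where n is the number of distinct service classes in use."""
    # The default class for unclassified requests always counts toward N.
    distinct = set(mapped_classes) | {default_class}
    n = len(distinct)
    return (n <= max_servants, n)


# Hypothetical configurations.
fits = service_class_count_ok(["CBHI", "CBLO"], "CBDEF", max_servants=4)
too_many = service_class_count_ok(["CBHI", "CBLO"], "CBDEF", max_servants=2)
```

In the second case three service classes compete for two servant regions, so at least one work queue will have to wait for a conversion.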
Every servant region has a requirement for real storage. If there is a limited amount of real storage on the system, then you can’t afford to have many servant regions. Make sure that there is enough storage to support the maximum number of servant regions that are needed for the expected workload throughput. Because the number of service classes for your application should be no greater than the number of servant regions, you might not be able to afford classifying applications into many service classes.
In an ideal world, highly scalable systems would have stateless applications. In the real world, however, users need to store some information on the server side (for example, shopping cart information in a typical retail application). J2EE applications typically store this kind of data in in-memory session objects for better performance and for a better user experience. When user state is stored in session objects, this generally implies that a given user’s requests are all directed to the same server instance. The practice of sending all HTTP requests for a given session to a specific J2EE server is referred to as session affinity. In WebSphere Application Server for z/OS, where a single J2EE server is backed by multiple servant regions in which the in-memory session object is stored, there is obviously a need to ensure that the request is serviced by the appropriate servant region.
Session affinity is implemented in WebSphere Application Server by leveraging WLM’s extended capability of temporal affinity. By specifying a region token of the server region in the IWM4QIN interface, the queuing manager is able to directly send the requests to the servant regions, rather than inserting the requests into the service class work queues.
In the WLMQUE tool (Figure 8), the requests with client affinity that are directly sent to the server region are reported in the AffQue column. The column WUQue reports only the number of requests taken from the service class work queue that are bound to the server region.
Figure 8. WLMQUE showing session affinity
WLM is able to dynamically maintain a pool of server regions to meet the performance goal only when the majority of the requests are queued in these service class work queues and selected by the server regions. When client affinity is to be honored, the work requests are directly routed to the server regions and out of WLM’s control.
Additionally, when there is affinity information in some servant regions, WLM is prevented from terminating the servant regions by using the IWMTAFF interface, which marks the servant regions as being needed by subsequent work requests. In the z/OS SDSF panel (Figure 9), the servant region with temporal affinity is indicated by TEMP-AFF.
Figure 9. z/OS SDSF panel
When the queuing manager directly sends the work requests with affinity to the servant regions, it will not go through the service class work queues specified for the requests. For example, it’s possible that an HTTP request classified as CBHI is directed to a servant region bound to the CBDEF work queue, because the request comes with a cookie that had session affinity established with a servant region previously bound to the CBDEF work queue.
If you’re an application designer, be aware that session affinity overrides service class mapping. If you want to classify applications into different service classes and separate the requests into different server regions for processing, then you need to remember that session affinity might lead subsequent requests on different service classes to the same server region where session objects were created before. This could prevent you from exploiting “service class mapping” within an application.
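The way session affinity bypasses the service class work queues can be sketched as a small routing model. It is illustrative only and does not call any WLM interface; the class name, session IDs, request IDs, and servant IDs are hypothetical.

```python
class QueuingManager:
    """Toy model: requests with session affinity skip the service class queues."""

    def __init__(self):
        self.session_owner = {}  # session id -> servant id holding the session object
        self.work_queues = {}    # service class -> requests waiting in that queue
        self.direct = {}         # servant id -> requests routed directly by affinity

    def session_created(self, session_id, servant_id):
        """Record which servant region created the in-memory session object."""
        self.session_owner[session_id] = servant_id

    def route(self, request_id, service_class, session_id=None):
        owner = self.session_owner.get(session_id)
        if owner is not None:
            # Affinity wins: the request's own service class queue is not used.
            self.direct.setdefault(owner, []).append(request_id)
            return ("affinity", owner)
        self.work_queues.setdefault(service_class, []).append(request_id)
        return ("queue", service_class)


manager = QueuingManager()
first = manager.route("r1", "CBDEF")        # no session yet: goes through the queue
manager.session_created("s1", "00A4")       # a CBDEF-bound servant creates the session
second = manager.route("r2", "CBHI", "s1")  # classified CBHI, but carries the session
```

The second request is classified as CBHI yet lands on the servant that owns session s1, and nothing is ever placed on the CBHI work queue.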
Even distribution of HTTP requests
When multiple servant regions are bound to the same service class work queue, WebSphere Application Server distributes the new incoming requests without session affinity to a “hot” servant, by default. The hot servant strategy provides better performance because just-in-time compiled code is ready to run for the most used application methods, the necessary pages are loaded into memory which reduces I/O overhead, the data is already cached, and so on.
The hot servant strategy has the potential to lead to an unequal distribution of HTTP sessions among the servant regions. This depends on the rate at which requests that do not have session affinity arrive into the system and the number of free TCBs in the hot servant. If the hot servant has a free TCB, the default behavior is to send a request that has no session affinity yet to that servant region, where its affinity is then established. Because requests cannot be redirected to an idle servant region after affinity is established, the hot servant strategy has the potential to result in performance degradation due to excessive queuing, shortage of real storage, or garbage collection.
Avoiding the hot servant drawback involves ensuring even distribution of HTTP requests that do not yet have their affinity pinned to a servant region, which should result in a balanced distribution of session affinity among the servant regions. In order to ensure even distribution of HTTP requests in WebSphere Application Server by the WLM, the WLMStatefulSession parameter for the server should be set to true (Figure 10).
Figure 10. Ensure even distribution of HTTP requests
Additionally, you might also need to make some changes to the classification mapping file and WLM, as described in the WebSphere Application Server Information Center.
You can use the WLMQUE tool to monitor the even distribution behavior. In the panel shown in Figure 11, there are two servant regions bound to the same service class work queue, CBDEF. After a certain amount of priming, you can see that the numbers of session affinity (indicated in the Aff column) are pretty even across the two available servant regions.
Figure 11. WLMQUE showing session affinity
Be aware that WLM even distribution does not balance the HTTP requests without session affinity equally across the servant regions in a simple round-robin way. What it actually guarantees is the equal distribution of session affinities among servant regions.
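The difference between the two strategies can be shown with a deliberately simplified simulation. It assumes that new sessions arrive slowly enough that the hot servant always has a free TCB, and that even distribution places each new session on the servant currently holding the fewest session affinities. Both assumptions, along with the function name and servant IDs, are illustrative rather than a description of WLM internals.

```python
def place_sessions(new_sessions, servants, even_distribution):
    """Return the number of session affinities each servant ends up holding."""
    counts = {servant: 0 for servant in servants}
    for _ in range(new_sessions):
        if even_distribution:
            # Pick the servant with the fewest affinities so far (ties: first listed).
            target = min(servants, key=lambda servant: counts[servant])
        else:
            # Hot servant: assumed to always have a free TCB in this sketch.
            target = servants[0]
        counts[target] += 1
    return counts


servants = ["00A4", "00AA"]
hot = place_sessions(10, servants, even_distribution=False)
even = place_sessions(10, servants, even_distribution=True)
```

Under these assumptions the hot servant strategy pins all ten sessions to one servant region, while even distribution leaves five affinities on each. What is balanced is the affinity count, not the individual requests.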
Thread pool management
As mentioned earlier, WebSphere Application Server for z/OS has a thread pool inside each servant region to serve the client requests. This thread pool management is a bit different from distributed versions of WebSphere Application Server.
Placeholder for requests
When a request comes in, the WebSphere Application Server control region receives the request from the network and creates an enclave to represent the request. Before the request is processed by the servant region, it’s put inside the WLM work queue. Unlike distributed WebSphere Application Server, WebSphere Application Server for z/OS does not need to use the threads to hold the requests, using the WLM queue as the placeholder instead.
For a distributed WebSphere Application Server system handling a relatively heavy workload, you might need a larger number of threads to hold the client requests, and the system overhead will be higher because of intensive context switching. In this respect, WebSphere Application Server for z/OS makes managing the number of threads much easier. You only need to set a proper number of worker threads for workload execution, without having to consider the number of client requests.
In WebSphere Application Server, request processing is performed on a worker thread. Large system benchmarking has shown that for an application that is well designed and implemented, having a number of worker threads that is two to three times the number of the CPUs can saturate the processor resource of the LPAR. Depending on the nature of the application workload, you can indirectly control the number of threads in the servant region using the administrative console (Figure 12).
Figure 12. Controlling worker threads in the administrative console
In the WebSphere Application Server admin console, you can set these workload profiles:
- ISOLATE: 1 thread
- CPUBOUND: (number of CPUs - 1), minimum value = 3
- IOBOUND: (number of CPUs * 3), minimum = 5, maximum = 30; this is the default profile for a server.
- LONGWAIT: 40
Following are some typical usage patterns for the various workload profile settings. The most appropriate values for any given environment might be different than those shown here and can be inferred only by testing and analysis of performance data.
Consider, for example, the default profile IOBOUND. The number of worker threads is three times your number of CPUs, but it’s no less than 5 and no larger than 30. For a typical WebSphere Application Server application that uses a database, IOBOUND is enough for the application to drive the system to maximum utilization.
If an application is CPU intensive (for example, an application that involves lots of XML processing), then you do not need many threads to saturate the processor. A single worker thread might be able to push one logical CPU to 100% utilization in the LPAR, so the total number of threads is set to the number of CPUs minus 1 (with a minimum of 3).
For some applications, the worker thread is suspended during long-wait remote calls to outside systems, such as ERP, CRM, or some legacy applications. These applications need more threads to improve system utilization and overall throughput by serving more requests concurrently. For such applications, LONGWAIT is an appropriate setting which, by default, results in 40 threads.
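The thread counts for the four workload profiles listed above reduce to simple arithmetic. The function below is only a sketch of those formulas as stated in this article; the function name is hypothetical, and the server computes the value itself rather than exposing a function like this.

```python
def worker_threads(profile, cpus):
    """Worker threads per servant region for a workload profile, per the list above."""
    if profile == "ISOLATE":
        return 1
    if profile == "CPUBOUND":
        return max(cpus - 1, 3)            # (CPUs - 1), but at least 3
    if profile == "IOBOUND":
        return min(30, max(5, cpus * 3))   # (CPUs * 3), clamped to 5..30
    if profile == "LONGWAIT":
        return 40
    raise ValueError(f"unknown workload profile: {profile}")


# Thread counts for a hypothetical 4-CPU LPAR.
four_way = {p: worker_threads(p, 4) for p in ["ISOLATE", "CPUBOUND", "IOBOUND", "LONGWAIT"]}
```

On a 4-CPU LPAR this gives 1, 3, 12, and 40 threads respectively. Note that the minimum and maximum clamps dominate on very small or very large LPARs: IOBOUND yields 5 threads on a single CPU and never more than 30.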
How do you choose the right workload profile and set the proper number of worker threads to achieve the best throughput? The answer depends on some considerations.
Create enough worker threads for best throughput
One way to confirm whether better throughput can be achieved by increasing the number of threads is to check the RMF reports for a load test run. If the system is not fully utilized and the QMPL time is high (when compared to the execution time), then there is potential to achieve more throughput by increasing the number of threads.
In the context of WebSphere Application Server for z/OS, the amount of time requests spend in the work queue waiting to be processed by servant regions will be reported as a kind of execution delay, as shown by the QMPL column in the RMF Workload Activity report (Figure 13).
Figure 13. RMF Workload Activity report segment
If the QMPL delay is very high, then you have a shortage of servant region worker threads and cannot process the requests waiting in work queues fast enough. But if the CPU resource is already saturated because of a very intensive workload, then adding more worker threads will not improve the overall system throughput: in the RMF report, the QMPL delay decreases but the CPU delay increases.
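The reasoning above can be written down as a rough decision helper for reading load test results. The thresholds (95% CPU busy as "saturated", queue delay of at least half the execution time as "high") are arbitrary assumptions for illustration, not RMF or WLM values, and the inputs are whatever comparable figures you take from your own reports.

```python
def thread_tuning_hint(cpu_busy_percent, qmpl_delay, execution_time,
                       saturated_percent=95.0, high_delay_ratio=0.5):
    """Suggest whether more worker threads are likely to improve throughput."""
    if cpu_busy_percent >= saturated_percent:
        # More threads would only trade queue delay for CPU delay.
        return "cpu-saturated"
    if qmpl_delay >= high_delay_ratio * execution_time:
        # Spare CPU and long queue waits: more threads (or servants) may help.
        return "add-threads"
    return "no-change"


# Hypothetical measurements from three load test runs.
hints = [
    thread_tuning_hint(60.0, qmpl_delay=0.8, execution_time=1.0),
    thread_tuning_hint(98.0, qmpl_delay=0.8, execution_time=1.0),
    thread_tuning_hint(60.0, qmpl_delay=0.1, execution_time=1.0),
]
```

The CPU check comes first on purpose: a high queue delay on a saturated system is a symptom of the CPU shortage, not of too few threads.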
If you have determined that increasing the number of threads will be helpful for throughput, here are some approaches you can take:
- Adjust number of worker threads for each servant region by changing the workload profile. This approach is preferred when garbage collection for the servant region JVM is reasonable and acceptable.
- Add more servant regions by changing the minimum and maximum number of servant regions. This approach is useful when the storage requirements for single request processing are high, resulting in excessive garbage collection. This can also be achieved through WLM’s capability to automatically maintain the number of servant regions.
Other resources to consider
Only adjust the number of threads if you know that there are enough spare system resources. Considering any related subsystems (like DB2, CICS®, and so on), a large number of total threads might cause a resource shortage in those subsystems. For example:
- A large number of servant regions with a large number of worker threads might require a number of connections that exceeds the total number of connections a DB2 system can support.
- A servant address space can only have 100 EXCI pipes in total (APAR PQ92943 for CICS Transaction Server V2 raised the 100 pipe limit to 250 using the new SYS1.PARMLIB LOGONLIM parameter). If your applications talk to multiple CICS regions through different connection pools, and there are a large number of threads inside each servant region, then your calls to CICS resources might randomly fail because of EXCI pipe shortage.
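Before raising thread or servant counts, it can help to estimate the demand they place on downstream subsystems. The sketch below is back-of-the-envelope arithmetic under stated assumptions: one DB2 connection per worker thread, and one EXCI pipe per worker thread for each CICS region the application talks to. Real demand depends on connection pool settings and application behavior, and the function names are hypothetical.

```python
EXCI_PIPE_LIMIT = 100  # per servant address space, the default limit cited above


def peak_db2_connections(max_servants, threads_per_servant, connections_per_thread=1):
    """Worst-case DB2 connections if every worker thread is busy at once."""
    return max_servants * threads_per_servant * connections_per_thread


def exci_pipes_per_servant(threads_per_servant, cics_regions):
    """Worst-case EXCI pipes in one servant region.

    Assumption: each worker thread may hold one pipe to each CICS region.
    """
    return threads_per_servant * cics_regions


# Hypothetical server: 4 servant regions, LONGWAIT profile (40 threads), 3 CICS regions.
db2_demand = peak_db2_connections(max_servants=4, threads_per_servant=40)
pipe_demand = exci_pipes_per_servant(threads_per_servant=40, cics_regions=3)
pipes_at_risk = pipe_demand > EXCI_PIPE_LIMIT
```

In this hypothetical case the server could ask for up to 160 DB2 connections, and a single servant region could need up to 120 EXCI pipes, which is above the 100-pipe default and would explain intermittent CICS call failures under load.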
This article demonstrated how IBM WebSphere Application Server for z/OS V6.x works with IBM Workload Manager services to deliver technical value in different areas. Examples from actual installations were shared to illustrate how your J2EE applications can also leverage these features and avoid potential risks caused by some usage patterns.
I would like to thank Ms. Renuka Chekkala for reviewing this article and for her support and mentoring through several customer cases. Also, thanks to Mike Cox, David Follis, Robert Vaupel, and Frank Chu for technically reviewing this article and for providing excellent comments and suggestions.
- MVS Programming: Workload Management Services
- Redbook: System Programmer's Guide to: Workload Manager
- Controller and Servant WLM classifications
- Workload management (WLM) tuning tips for z/OS
- Handling workload management and server failures
- Multiple servant regions
- Configuring an application server to use the WLM even distribution of HTTP requests function
- WebSphere Application Server for z/OS product information
- IBM Workload Manager for z/OS
- IBM developerWorks WebSphere on System z zone