Common WebSphere Virtual Enterprise features in a distributed environment
Added by courtneymauney, last edited by courtneymauney on Sep 22, 2009  (view change)

Scenario Overview

The objective of this scenario is to establish an environment in which an IBM® WebSphere® Application Server Network Deployment Version 6.1 cell is constructed with multiple clusters and multiple applications. The basic premise of the scenario is to use WebSphere Virtual Enterprise Version 6.1 features to manage and protect service level agreements for the applications installed into these clusters in a distributed environment.

Audience

This document provides a general overview of the scenario and environment details. The document is intended for technical sales, marketing, or the interested customer who wants to learn more about some of the common features found in WebSphere Virtual Enterprise.

Environment Details

The configuration was established to identify a series of benchmarks and points of failure within a typical environment. The environment consisted of the following installations:

  • WebSphere Application Server Network Deployment Server Version 6.1.0.25
  • WebSphere Virtual Enterprise Version 6.1.1 (formerly WebSphere Extended Deployment Operations Optimization)
  • Multiple custom test applications

The environment was configured with multiple applications, multiple dynamic clusters, two on demand routers (ODRs), and global security enabled for selected tests.

Operational Characteristics

Implicit components used:

  • Application placement controller (APC)
  • Autonomic request flow manager (ARFM)
  • Dynamic workload manager (DWLM)

Console operation and management:

  • CPU utilization graphs
  • Queuing graphs
  • Application service time graphs
  • Application average response time graphs
  • Heap usage graphs
  • Server weight graphs
  • Task management
  • Deployment manager repository management
  • Application edition management
  • Server and node management
  • Service policy management
  • Health policy management

Configuration components:

  • Global security
  • Ports
    • Non-default ports used for WebSphere Application Server and federation
      • Starting port = 19060
      • Federation starting port = 20000
    • Standard ports for Virtual Enterprise and the on demand routers
  • Core groups
    • Different applications in each dynamic cluster containing more than 50 server processes
    • A mesh topology configuration

      Note: A mesh topology consists of core group bridges that are all connected with each other as opposed to a linked topology in which a core group bridge connects only to a subset of the other bridges. Running the coregroupsplit.py script creates a mesh topology by default.

  • High availability deployment manager
    • Two physically different systems with a shared configuration repository
    • Failover
    • Repository checkpoint
  • Application-level service policies (typical)
    • A different application in each dynamic cluster
    • Configured in a high complexity configuration with seven applications
    • Configured in a moderate complexity configuration with three applications
    • Configured at the application level in which specific applications were configured and profiled for specific average response times
  • Application edition manager
    • Editions rolled out using different options
    • Editions rolled out while running work load requests over an extended period of time
  • Multi-cluster routing policies (MCRP)
    • Configured with the most commonly used routing rules for different applications
    • Verified service policies for different applications containing different MCRP rules
  • Conditional operation for an extended period of time
    • Continuous high load
    • Managed operation with peak and off-peak periods of operation
    • Extended continuous operation with high load peak
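The difference between the mesh and linked core group bridge topologies mentioned in the note above can be sketched by counting bridge-to-bridge connections. This is only an illustration: the bridge names are hypothetical, and the chain layout is just one possible form of a linked topology, not necessarily the one a given cell would use.

```python
from itertools import combinations


def mesh_links(bridges):
    """Mesh: every bridge is connected to every other bridge."""
    return list(combinations(bridges, 2))


def chain_links(bridges):
    """One possible linked layout: each bridge connects only to its neighbor."""
    return list(zip(bridges, bridges[1:]))


# Hypothetical core group bridge names, for illustration only.
bridges = ["cgb1", "cgb2", "cgb3", "cgb4"]

mesh = mesh_links(bridges)
chain = chain_links(bridges)

print(len(mesh), "connections in a mesh")    # n * (n - 1) / 2
print(len(chain), "connections in a chain")  # n - 1
```

The mesh grows quadratically with the number of bridges, which is the trade-off to keep in mind when a cell is split into many core groups.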

Implementation Diagram

The following diagram shows the maximum number of dynamic clusters configured within the cell and the relationship of the application servers in regard to the physical nodes.

Application Placement Diagram

The following diagram shows the maximum number of applications that were configured within the cell and the relationship of the applications to the dynamic clusters and the physical nodes.

Hardware Specifications

| Machine | OS | # of CPUs | CPU speed | CPU type | RAM (GB) | Function |
| IBM, 8844-PBN | Red Hat Enterprise Linux 5 | 4 | 2.3 GHz | PPC64 | 4 | Node |
| IBM, 8844-PBN | Red Hat Enterprise Linux 5 | 4 | 2.3 GHz | PPC64 | 4 | Node |
| IBM, 8844-PBN | Red Hat Enterprise Linux 5 | 4 | 2.3 GHz | PPC64 | 4 | Node |
| IBM, 8844-PBN | Red Hat Enterprise Linux 5 | 4 | 2.3 GHz | PPC64 | 4 | Node |
| IBM, 8844-PBN | Red Hat Enterprise Linux 5 | 4 | 2.3 GHz | PPC64 | 4 | Node |
| IBM, 8844-PBN | Red Hat Enterprise Linux 5 | 4 | 2.3 GHz | PPC64 | 4 | Node |
| Intel® Xeon® | SUSE Linux Enterprise Server 10 | 4 | 3800 | EMT64 | 4 | Node |
| Intel Xeon | SUSE Linux Enterprise Server 10 | 4 | 3800 | EMT64 | 4 | Node |
| IBM, 8844-PBN | Red Hat Enterprise Linux 5 | 4 | 2.3 GHz | PPC64 | 4 | ODR |
| IBM, 8844-PBN | Red Hat Enterprise Linux 5 | 4 | 2.3 GHz | PPC64 | 4 | ODR |
| IBM, 8844-PBN | Red Hat Enterprise Linux 5 | 4 | 2.3 GHz | PPC64 | 4 | Deployment manager (primary) |
| IBM, 8844-PBN | Red Hat Enterprise Linux 5 | 4 | 2.3 GHz | PPC64 | 4 | Deployment manager (secondary) |

Verification of the Environment

24 Hour Baseline Testing Under Continuous Load

Validation criteria
  • High continuous load
  • Service policies
  • Application edition rollout
  • MCRP rules for one or more applications
  • High availability deployment manager failover under high load
Operational profile
  • Dynamic clusters and applications
    | Dynamic cluster name | Installed applications | Off-peak periods of operation | Peak periods of operation (1400 peak) |
    | DC1 | veApp1, veApp2 | 20 | +180 |
    | DC2 | veApp2, veApp3 | 20 | +180 |
    | DC3 | veApp3, veApp4 | 20 | +180 |
    | DC4 | veApp4, veApp5 | 20 | +180 |
    | DC5 | veApp5, veApp6 | 20 | +180 |
    | DC6 | veApp6, veApp7 | 20 | +180 |
    | DC7 | veApp7, veApp1 | 20 | +180 |
  • Multi-cluster routing policy by ODR
    • ODR 1:
      • MCRP@3tVe1$veApp2
        wrr@3tVe1$DC1
      • MCRP@3tVe1$veApp2-edition02
        wrr@3tVe1$DC1
      • MCRP@3tVe1$veApp2-edition03
        wrr@3tVe1$DC1
      • MCRP@3tVe1$veApp3
        wrr@3tVe1$DC2
      • MCRP@3tVe1$veApp3-edition02
        wrr@3tVe1$DC2
      • MCRP@3tVe1$veApp3-edition03
        wrr@3tVe1$DC2
      • MCRP@3tVe1$veApp4
        wlor@3tVe1$DC3,3tVe1$DC4
      • MCRP@3tVe1$veApp5
        wlor@3tVe1$DC4,3tVe1$DC5
      • MCRP@3tVe1$veApp6
        wrr@3tVe1$DC5,3tVe1$DC6
    • ODR 2:
      • MCRP@3tVe1$veApp2
        wrr@3tVe1$DC2
      • MCRP@3tVe1$veApp2-edition02
        wrr@3tVe1$DC2
      • MCRP@3tVe1$veApp2-edition03
        wrr@3tVe1$DC2
      • MCRP@3tVe1$veApp3
        wrr@3tVe1$DC3
      • MCRP@3tVe1$veApp3-edition02
        wrr@3tVe1$DC3
      • MCRP@3tVe1$veApp3-edition03
        wrr@3tVe1$DC3
      • MCRP@3tVe1$veApp4
        wlor@3tVe1$DC3,3tVe1$DC4
      • MCRP@3tVe1$veApp5
        wlor@3tVe1$DC4,3tVe1$DC5
      • MCRP@3tVe1$veApp6
        wrr@3tVe1$DC5,3tVe1$DC6
  • Service policies by application
    | Service policy | Priority | Response time goal | Application name |
    | SP1 | Highest | Average response 250 milliseconds | veApp1 SP |
    | SP2 | Very high | Average response 500 milliseconds | veApp2 SP |
    | SP3 | High | Average response 900 milliseconds | veApp3 SP |
    | SP4 | Medium | Average response 1300 milliseconds | veApp4 SP |
    | SP5 | Low | Average response 1800 milliseconds | veApp5 SP |
    | SP6 | Very low | Average response 2500 milliseconds | veApp6 SP |
    | SP7 | Lowest | Average response 3500 milliseconds | veApp7 SP |
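The multi-cluster routing policy entries listed above follow a regular name and value pattern, so they can be split into their parts with a few lines of code. The following is a minimal sketch that assumes the layout seen in this scenario (MCRP@cell$application for the name, policy@cell$cluster[,cell$cluster...] for the value). The function name parse_mcrp is hypothetical, and the policy identifiers wrr and wlor are treated as opaque strings.

```python
def parse_mcrp(name, value):
    """Split one MCRP entry into cell, application, policy, and clusters."""
    prefix, _, app_ref = name.partition("@")
    if prefix != "MCRP":
        raise ValueError(f"not an MCRP entry: {name!r}")
    cell, _, application = app_ref.partition("$")
    policy, _, targets = value.partition("@")
    # Each target has the form cell$cluster; keep only the cluster name.
    clusters = [target.partition("$")[2] for target in targets.split(",")]
    return {
        "cell": cell,
        "application": application,
        "policy": policy,
        "clusters": clusters,
    }


# Two entries copied from the ODR 1 list in this scenario.
single = parse_mcrp("MCRP@3tVe1$veApp2-edition02", "wrr@3tVe1$DC1")
multi = parse_mcrp("MCRP@3tVe1$veApp4", "wlor@3tVe1$DC3,3tVe1$DC4")

print(single)
print(multi)
```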

Note: Virtual Enterprise services applications so as to optimize response time across all applications. When servicing all applications equally would degrade the applications that have been designated as higher priority, Virtual Enterprise rebalances the load in favor of the higher priority applications, based on the service policy rules. In the test environment, tuning parameters on the applications were set to create a specific load on the system. A baseline run at peak load, representing the worst case, established the test profile used throughout the test. The average response time values were established during this baseline run for the given hardware and simulated conditions, which are determined by the target work load for each request, the CPU load levels, and the rate at which sessions begin and end. The values appropriate for a specific production environment depend on the hardware, the back-end peripheral systems, and the concurrently installed applications, and might not match those identified in this test scenario.

Validation

Starting with an off-peak load of 140 virtual users (20 per application) with a ramp of 100 seconds, new editions of applications were rolled out prior to ramping up to a peak load. An atomic rollout and a group rollout were performed. Validation checkpoints were included in the JMeter load script for both the MCRP rules and the new editions of the applications that were rolled out.

After the new editions were rolled out successfully, the load was ramped to a peak of 1400 concurrent virtual users (200 per application), with CPU levels averaging 99% for the nodes and 75% for the ODRs. Within 30 minutes of the ramp, the service policies were within range, with average response times meeting or beating the configured response time goals.
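The load figures quoted above follow directly from the operational profile. The short calculation below reproduces them; the only assumption added here is that the 100 second ramp is linear, which the scenario does not state.

```python
APPLICATIONS = 7              # veApp1 through veApp7
OFF_PEAK_PER_APP = 20         # virtual users per application, off-peak
PEAK_INCREMENT_PER_APP = 180  # additional virtual users per application at peak
RAMP_SECONDS = 100            # off-peak ramp duration

off_peak_total = APPLICATIONS * OFF_PEAK_PER_APP
peak_per_app = OFF_PEAK_PER_APP + PEAK_INCREMENT_PER_APP
peak_total = APPLICATIONS * peak_per_app

# Assumption: a linear ramp, so users are added at a constant rate.
ramp_rate = off_peak_total / RAMP_SECONDS

print(f"off-peak total: {off_peak_total} virtual users")
print(f"peak per application: {peak_per_app} virtual users")
print(f"peak total: {peak_total} virtual users")
print(f"off-peak ramp rate: {ramp_rate} users per second")
```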

CPU values

| Time | Node1 | Node2 | Node3 | Node4 | Node5 | Node6 | Node7 | Node8 | ODR1 | ODR2 |
| 19:41:36 | 100 | 98 | 100 | 100 | 100 | 100 | 100 | 100 | 74 | 75 |

Average response times

| Time | SP1 | SP2 | SP3 | SP4 | SP5 | SP6 | SP7 |
| 21:09:07 | 155.92 | 271.35 | 441.71 | 613.51 | 933.22 | 1751.86 | 3040.99 |
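To see how the measured averages compare with the configured goals, the sample above can be checked against the service policy table for this run. The sketch below uses only values from this scenario; the headroom helper and its percentage format are illustrative choices, not part of the product.

```python
# Response time goals in milliseconds, from the service policy table.
GOALS_MS = {"SP1": 250, "SP2": 500, "SP3": 900, "SP4": 1300,
            "SP5": 1800, "SP6": 2500, "SP7": 3500}

# Average response times in milliseconds, sampled at 21:09:07.
MEASURED_MS = {"SP1": 155.92, "SP2": 271.35, "SP3": 441.71, "SP4": 613.51,
               "SP5": 933.22, "SP6": 1751.86, "SP7": 3040.99}


def headroom(goals, measured):
    """Percentage by which each measured average is below its goal."""
    return {sp: round(100 * (goals[sp] - measured[sp]) / goals[sp], 1)
            for sp in goals}


# A policy is breached when its measured average exceeds its goal.
breaches = [sp for sp in GOALS_MS if MEASURED_MS[sp] > GOALS_MS[sp]]
margins = headroom(GOALS_MS, MEASURED_MS)

print("breached policies:", breaches)
for sp, pct in margins.items():
    print(f"{sp}: {pct}% below goal")
```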

During the run, the primary deployment manager was stopped, and failover to the secondary deployment manager occurred. The primary deployment manager was then restarted, and the secondary deployment manager was stopped to fail back over to the primary. The operation occurred as expected. In this configuration, the deployment managers were accessed through the ODRs using port 19080.

Application Edition Rollout

Validation criteria
  • Application edition rollout
  • Continuous load
Operational profile

At the close of the 24 hour high load test, the run was shifted back to an off-peak level. New editions of applications were then rolled out. See the preceding 24 hour run details for the operational profile.

Validation

After the off-peak load of 140 virtual users (20 per application) leveled off, new editions of applications were rolled out.

Note: When a rollout is performed through an ODR to reach a deployment manager in a high availability configuration, no messages might be displayed until the end of the procedure, because the proxy server buffers the response payload. By default, the proxy server buffers 32 KB of payload before it flushes the response to the client. The data containing the rollout updates does not exceed 32 KB, so it is not flushed until the end. To avoid this behavior, access the deployment manager directly.
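The buffering behavior described in the note can be illustrated with a toy model. This is not the proxy server's implementation; the class name, the flush rule (flush once the buffer reaches the threshold, and again on close), and the message text are all assumptions made for the illustration.

```python
class BufferedResponse:
    """Toy model of a response buffer with a flush threshold."""

    def __init__(self, threshold=32 * 1024):
        self.threshold = threshold  # bytes held before flushing
        self.buffer = bytearray()
        self.flushes = []           # size of each flush sent to the client

    def write(self, data: bytes):
        self.buffer.extend(data)
        if len(self.buffer) >= self.threshold:
            self._flush()

    def close(self):
        # Whatever is still buffered is sent when the response ends.
        if self.buffer:
            self._flush()

    def _flush(self):
        self.flushes.append(len(self.buffer))
        self.buffer.clear()


response = BufferedResponse()
for step in range(1, 21):
    # Hypothetical progress messages; far less than 32 KB in total.
    response.write(f"rollout step {step} complete\n".encode())

flushes_before_close = len(response.flushes)
response.close()

print("flushes before the response ended:", flushes_before_close)
print("flushes after the response ended:", len(response.flushes))
```

Because the progress messages never reach the threshold, the client sees nothing until the response is closed, which matches the delayed output described above.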

The following sequence of events occurred when the application edition rollouts were performed:

  1. Failover occurred to the secondary deployment manager, which runs with all servers active.
  2. Performed a rollout on application veApp2 Edition 03 and restarted the primary deployment manager.
    • Rollout type: group
    • Group size: 4
    • Reset strategy: soft
    • Response time: 120 seconds
  3. Performed a rollout on application veApp3 Edition 03.
    • Rollout type: group
    • Group size: 2
    • Reset strategy: soft
    • Response time: 120 seconds
  4. Failover occurred to the primary deployment manager by accessing the deployment manager directly.
  5. Performed a rollout on application veApp1 Edition 03.
    • Rollout type: atomic
    • Reset strategy: soft
    • Response time: 120 seconds
  6. Performed a rollout on application veApp1 Base Edition.
    • Rollout type: group
    • Group size: 4
    • Reset strategy: hard
    • Response time: 120 seconds
  7. Performed a rollout on application veApp2 Base Edition.
    • Rollout type: atomic
    • Reset strategy: hard
    • Response time: 120 seconds
  8. Validated application veApp1 Edition 02, and then canceled the validation.
    • Verified that the validation dynamic clusters DC1 and DC7 were removed.
  9. Validated application veApp1 Edition 02.
    1. Created and started validation clusters DC1 and DC7.
    2. Added the clientipv4 = '9.42.94.182' rule for the client system, and verified that routing to the correct validation server occurred.
    3. Modified the routing to point to 9.42.94.181, restarted the browser, and verified that routing to the correct application edition occurred.
    4. Updated the rule to point back to 9.42.94.182, and verified that routing to the validation server occurred.
    5. Canceled the validation.
    6. Removed the servers.
  10. Stopped dynamic cluster DC2. Managed the work load request to allow for maintenance of the applications.
  11. Deactivated application veApp2 Base Edition.
  12. Activated application veApp2 Edition 02.
  13. Restarted dynamic cluster DC2.
  14. Performed a rollout on application veApp2 Base Edition.
    • Rollout type: atomic
    • Reset strategy: hard
    • Response time: 120 seconds
  15. Reinstalled applications veApp2 Edition 02 and veApp2 Edition 03 using application veApp2 Base Edition as a clone.
  16. Performed the rollout with the cloned editions again.
  17. Placed application veApp1 Edition 02 into validation mode.
  18. Performed a rollout on application veApp2 Edition 02.
    • Rollout type: atomic
    • Reset strategy: hard
    • Response time: 120 seconds
  19. Performed a rollout on application veApp2 Edition 03.
    • Rollout type: atomic
    • Reset strategy: hard
    • Response time: 120 seconds
  20. Canceled the validation of application veApp1 Edition 02.
  21. Stopped work load requests for applications veApp2 and veApp3.
  22. Stopped application veApp2 as it was still running on dynamic cluster DC1.
  23. Placed dynamic cluster DC2 in manual mode, and stopped all servers in the cluster.
  24. Deactivated application veApp2 Edition 03.
  25. Activated application veApp2 Base Edition.
  26. Modified the routing rule to point to the base edition.
  27. Started application veApp2, which would start on dynamic cluster DC1.
  28. Placed dynamic cluster DC2 in automatic mode and started the cluster.
  29. Restarted work load requests for applications veApp2 and veApp3.
  30. Stopped work load requests for applications veApp2 and veApp3.
  31. Stopped dynamic cluster DC2.
  32. Deactivated applications veApp2 Base Edition and veApp3 Edition 03.
  33. Activated applications veApp2 Edition 02 and veApp3 Edition 02.
  34. Updated the routing policies for both applications.
  35. Placed dynamic cluster DC2 in automatic mode and restarted the cluster.
  36. Restarted the work load requests for both applications.
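Several of the steps above use a group rollout with a group size of 2 or 4. As a rough sketch of what the group size implies, the following code splits a cluster into batches of that size. The server names are hypothetical, and the product decides for itself which servers go into each group, so only the batching arithmetic is shown here.

```python
def rollout_groups(servers, group_size):
    """Split the servers into consecutive batches of at most group_size."""
    return [servers[i:i + group_size]
            for i in range(0, len(servers), group_size)]


# Hypothetical cluster members, for illustration only.
servers = [f"DC1_server{n:02d}" for n in range(1, 11)]

groups = rollout_groups(servers, 4)
for number, group in enumerate(groups, start=1):
    print(f"group {number}: {', '.join(group)}")
```

With ten servers and a group size of 4, the rollout would proceed in three passes, the last one covering the remaining two servers.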

Application-level Service Policies

Validation criteria
  • Optimized Service policy
  • High continuous load
Operational profile
  • Dynamic clusters, applications, and virtual users
    | Dynamic cluster | Applications | Virtual users |
    | DC1 | veApp1 | 200 |
    | DC2 | veApp2 | 200 |
    | DC3 | veApp3 | 200 |
    | DC4 | veApp4 | 200 |
    | DC5 | veApp5 | 200 |
    | DC6 | veApp6 | 200 |
    | DC7 | veApp7 | 200 |
  • Service policies by application
    • Starting point:
      | Service policy | Priority | Response time goal | Application name |
      | SP1 | Highest | Average response 250 milliseconds | veApp1 SP |
      | SP2 | Very high | Average response 500 milliseconds | veApp2 SP |
      | SP3 | High | Average response 900 milliseconds | veApp3 SP |
      | SP4 | Medium | Average response 1300 milliseconds | veApp4 SP |
      | SP5 | Low | Average response 1800 milliseconds | veApp5 SP |
      | SP6 | Very low | Average response 2500 milliseconds | veApp6 SP |
      | SP7 | Lowest | Average response 3500 milliseconds | veApp7 SP |
    • Final adjustment:
      | Service policy | Priority | Response time goal | Application name |
      | SP1 | Highest | Average response 250 milliseconds | veApp1 SP |
      | SP2 | Very high | Average response 500 milliseconds | veApp2 SP |
      | SP3 | High | Average response 900 milliseconds | veApp3 SP |
      | SP4 | Medium | Average response 1300 milliseconds | veApp4 SP |
      | SP5 | Low | Average response 1700 milliseconds | veApp5 SP |
      | SP6 | Very low | Average response 2100 milliseconds | veApp6 SP |
      | SP7 | Lowest | Average response 2500 milliseconds | veApp7 SP |
Validation

After ramping to a peak load of 1400 virtual users, adjustments were made to optimize target service policy levels. Making such adjustments is generally referred to as profiling service policies. After the service policies were optimized, the work load continued for a peak operating period of eight hours.
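The effect of profiling can be summarized by comparing the starting and final response time goals from the operational profile. The sketch below lists only the policies whose goals changed; the values are taken from the two tables in this section.

```python
# Response time goals in milliseconds before profiling.
STARTING_MS = {"SP1": 250, "SP2": 500, "SP3": 900, "SP4": 1300,
               "SP5": 1800, "SP6": 2500, "SP7": 3500}

# Response time goals in milliseconds after the final adjustment.
FINAL_MS = {"SP1": 250, "SP2": 500, "SP3": 900, "SP4": 1300,
            "SP5": 1700, "SP6": 2100, "SP7": 2500}

# Keep only the policies whose goal was changed during profiling.
changes = {sp: (STARTING_MS[sp], FINAL_MS[sp])
           for sp in STARTING_MS if STARTING_MS[sp] != FINAL_MS[sp]}

for sp, (before, after) in changes.items():
    print(f"{sp}: {before} ms -> {after} ms ({before - after} ms tighter)")
```

Only the three lowest priority policies were tightened; the goals for SP1 through SP4 were left at their starting values.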

Note: Virtual Enterprise services applications so as to optimize response time across all applications. When servicing all applications equally would degrade the applications that have been designated as higher priority, Virtual Enterprise rebalances the load in favor of the higher priority applications, based on the service policy rules. In the test environment, tuning parameters on the applications were set to create a specific load on the system. A baseline run at peak load, representing the worst case, established the test profile used throughout the test. The average response time values were established during this baseline run for the given hardware and simulated conditions, which are determined by the target work load for each request, the CPU load levels, and the rate at which sessions begin and end. The values appropriate for a specific production environment depend on the hardware, the back-end peripheral systems, and the concurrently installed applications, and might not match those identified in this test scenario.

CPU utilization

| Time | Node1 | Node2 | Node3 | Node4 | Node5 | Node6 | Node7 | Node8 | ODR1 | ODR2 |
| 15:41:42 | 97 | 98 | 100 | 99 | 98 | 98 | 100 | 99 | 71 | 68 |

Average response times

| Time | SP1 | SP2 | SP3 | SP4 | SP5 | SP6 | SP7 |
| 15:46:52 | 141.6 | 329 | 567.11 | 664.65 | 869.39 | 1168.77 | 2266.41 |

Health Policy Operation Under Continuous Load

Validation criteria
  • Create conditions to activate health policies
Operational profile
  • Dynamic clusters, applications, and virtual users
    | Dynamic cluster | Application | Virtual users |
    | DC1 | veApp1 | 800 to 3000 |
  • Excessive response time, 5000 milliseconds: supervised
  • Excessive request timeout, 5%: supervised
  • Memory leak standard: automatic
  • Excessive memory usage 60%, five minutes: supervised
  • Storm drainage: supervised
Validation

Using an application that has a configurable work load and delay, the load is ramped to 800 virtual users. As the load ramps, the operations within the test application take progressively longer to complete.

The following task is generated:

The average response time limit specified by policy Default_Excessive_Response_Time was exceeded by server Dc1_xdblade13b01 on node xdblade13b01.
The limit is 5000 ms and the current value is 6168 ms.

Originated Time
8/26/09 11:59:57

Task ID
1633492109 (Planned-Approval)

Submitter
HealthController:xdblade13b01:Dc1_xdblade13b01 (xdblade13b06:nodeagent)

Severity
Severe

State
New

When this task is accepted, a new server is started, and requests begin to route to the new server after the operation completes. After all outstanding requests are completed, the affected server is restarted. No further action is taken on the task at this time. This sequence continues for a short period as tasks are initiated for the servers and accepted for both health policies.

Next, a new task is generated and accepted for the excessive request timeout as shown in the following example:

The request timeout limit specified by policy Default_Excessive_Request_Timeout was exceeded by server Dc1_xdblade13b01 on node xdblade13b01.
The limit is 5.00 % and the current timeout fraction is 27.00 %.

Originated Time
8/26/09 12:20:02

Task ID
1303977417 (Planned-Approval)

Submitter
HealthController:xdblade13b01:Dc1_xdblade13b01 (xdblade13b06:nodeagent)

Severity
Severe

State
New

The request timeout limit specified by policy Default_Excessive_Request_Timeout was exceeded by server Dc1_xdblade13b01 on node xdblade13b01.
The limit is 5.00 % and the current timeout fraction is 27.00 %.

Originated Time
8/26/09 12:20:02

Task ID
1303977417 (Planned-Approval)

Submitter
HealthController:xdblade13b01:Dc1_xdblade13b01 (xdblade13b06:nodeagent)

Severity
Severe

State
In progress

At the close of the test, a new test for memory usage is ramped up. The same application is used but with different test parameters set to use and leak memory at higher rates. Using the same load rate, the health policy for the excessive memory usage will initiate at 60% usage before a memory leak condition is identified. In this test, a memory leak will automatically restart a server while the excessive memory usage will only notify the administrator for action, because the health policy for the memory leak is set to automatic mode and the health policy for the excessive memory usage is set to supervised mode. As memory is consumed, the tasks are initiated.
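The difference between supervised and automatic mode can be sketched as a small decision function. This is a simplified model rather than the health controller's logic: it ignores the five minute window on the memory usage policy, treats memory leak detection as a yes or no input, and uses reaction strings invented for the illustration. The limits, modes, and observed values come from this scenario.

```python
def reaction(mode, breached):
    """Return what happens when a policy in the given mode is evaluated."""
    if not breached:
        return "none"
    if mode == "automatic":
        return "restart server automatically"
    return "raise task for administrator approval"


# Policy name -> (reaction mode, breach test). Limits are from the
# operational profile for this test.
POLICIES = {
    "Default_Excessive_Response_Time": ("supervised", lambda ms: ms > 5000),
    "Default_Excessive_Request_Timeout": ("supervised", lambda pct: pct > 5.0),
    "Default_Excessive_Memory_Usage": ("supervised", lambda pct: pct > 60),
    "Default_Memory_Leak": ("automatic", lambda suspected: bool(suspected)),
}

# Values reported in the tasks for this test.
observed = {
    "Default_Excessive_Response_Time": 6168,    # milliseconds
    "Default_Excessive_Request_Timeout": 27.0,  # percent of requests
    "Default_Excessive_Memory_Usage": 91,       # percent of maximum heap
    "Default_Memory_Leak": True,                # leak suspected
}

results = {name: reaction(mode, check(observed[name]))
           for name, (mode, check) in POLICIES.items()}

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```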

Server Dc1_xdblade13b06 (/3tVe1/xdblade13b06): The health policy this server is a member of, Default_Excessive_Memory_Usage, has breached its configured condition, and has a severity of level 'critical'.
See the runtime tasks panels for more detail.

Server Dc1_xdblade13b01 (/3tVe1/xdblade13b01): The health policy this server is a member of, Default_Excessive_Memory_Usage, has breached its configured condition, and has a severity of level 'critical'.
See the runtime tasks panels for more detail.

Server Dc1_xdblade13b01 (/3tVe1/xdblade13b01): The health policy this server is a member of, Default_Memory_Leak, has breached its configured condition, and has a severity of level 'warning'.
See the runtime tasks panels for more detail.

A memory leak is suspected by policy Default_Memory_Leak for server Dc1_xdblade13b01 on node xdblade13b01.

Originated Time
8/26/09 13:45:22

Task ID
1747574254 (Planned-Executing)

Submitter
HealthController:xdblade13b01:Dc1_xdblade13b01 (xdblade13b06:nodeagent)

Severity
Warning

State
Succeeded

Status
Completed: WXDH1008I: The restart operation for server 3tVe1/xdblade13b01/Dc1_xdblade13b01 succeeded.

The memory consumption limit specified by policy Default_Excessive_Memory_Usage was exceeded by server Dc1_xdblade13b06 on node xdblade13b06.
The limit is 60 % and the current heap size is 91 % of the maximum of 262144 KB.

Originated Time
8/26/09 13:20:07

Task ID
125491236 (Planned-Approval)

Submitter
HealthController:xdblade13b06:Dc1_xdblade13b06 (xdblade13b06:nodeagent)

Severity
Critical

State
Succeeded

Status
Completed: WXDH1008I: The restart operation for server 3tVe1/xdblade13b06/Dc1_xdblade13b06 succeeded.

At the close of that test, the test for the storm drainage condition is started. The same application is used, but with different test parameters set and a higher virtual user rate. For this test, 3000 virtual users are used to demonstrate a condition in which the response time drops dramatically on one server, causing a large number of requests to be routed to that server. Having the large number causes the condition to surface more quickly.

After the test is fully ramped to 3000 virtual users, a leveling off period of 30 minutes is used. After the leveling off period ends, a condition is set within the application on a specific server, causing the response time to decrease dramatically. The situation is much like one in which a server loses connectivity to an LDAP server: as soon as a login attempt is made, the session is closed and an error is encountered within the application. As a result, the response is quicker than that of the application during normal operation. After this condition occurs and enough time passes for a sufficient number of requests to be routed to the server targeted for the storm drainage, the task is initiated and accepted to restart the server. Detection of a storm drainage condition takes into account the average server time, the CPU utilization, and other server factors.
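The core symptom of the condition, one server answering far faster than its peers, can be illustrated with a deliberately simple heuristic. This is not the detection used by the product, which, as stated above, also weighs CPU utilization and other server factors. The function name, the 10% drop ratio, and the response times of the two healthy servers are assumptions; only the 6.995732948204556 ms value comes from the task shown in this test.

```python
from statistics import median


def suspect_storm_drain(avg_response_ms, drop_ratio=0.1):
    """Flag servers whose average response time is far below the median."""
    typical = median(avg_response_ms.values())
    return [server for server, ms in avg_response_ms.items()
            if ms < typical * drop_ratio]


samples = {
    "Dc1_xdblade13b01": 410.0,              # hypothetical healthy value
    "Dc1_xdblade13b06": 395.0,              # hypothetical healthy value
    "Dc1_xdblade48b08": 6.995732948204556,  # value reported in the task
}

suspects = suspect_storm_drain(samples)
print("suspected storm drain servers:", suspects)
```

A real detector would need more than this single signal, because a fast response time alone can also mean that a server is simply lightly loaded.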

Server Dc1_xdblade48b08 (/3tVe1/xdblade48b08): The health policy this server is a member of, Default_Storm_Drain, has breached its configured condition, and has a severity of level 'minor'.
See the runtime tasks panels for more detail.
A storm drain condition is suspected by policy Default_Storm_Drain for server Dc1_xdblade48b08 on node xdblade48b08.
The average response time has dropped to 6.995732948204556 ms.

Originated Time
8/26/09 17:10:48

Task ID
2021220492 (Planned-Approval)

Submitter
HealthController:xdblade48b08:Dc1_xdblade48b08 (xdblade13b06:nodeagent)

Severity
Minor

State
New

A storm drain condition is suspected by policy Default_Storm_Drain for server Dc1_xdblade48b08 on node xdblade48b08.
The average response time has dropped to 6.995732948204556 ms.

Originated Time
8/26/09 17:10:48

Task ID
2021220492 (Planned-Approval)

Submitter
HealthController:xdblade48b08:Dc1_xdblade48b08 (xdblade13b06:nodeagent)

Severity
Minor

State
Succeeded

Status
Completed: WXDH1008I: The restart operation for server 3tVe1/xdblade48b08/Dc1_xdblade48b08 succeeded.

High Availability Deployment Manager Repository Checkpoint

Validation criteria
  • Failover during activity
  • Repository checkpoints
  • Repository checkpoints in automatic mode
  • Repository restore
Operational profile
  • Multiple dynamic clusters
  • Multiple applications
  • High availability deployment manager
Validation

After the automatic checkpoint was enabled and the checkpoint depth was set to 20, a series of changes was made to the existing service policies. The following entries were created:

| Delta-1249948022415 | 1 | DELTA | 1249948022415 | Autosave delta image |
| Delta-1249995070177 | 1 | DELTA | 1249995070177 | Autosave delta image |
| Delta-1249995173456 | 3 | DELTA | 1249995173456 | Autosave delta image |

The following message was displayed when an attempt to restore Delta-1249948022415 was made:

Message: When restoring a Checkpoint please stop all processes except for the Deployment Manager.

The cell was stopped and the primary deployment manager was started. The following items were noted:

  • Leaving automatic mode set caused a number of Delta files to be created when updates were made as the cell was stopped.
  • With automatic mode set, the restore operations must be attempted in the reverse of the order in which the checkpoints were created, or the restore operations will fail.

| Delta-1249996271016 | 1 | DELTA | 1249996271016 | Autosave delta image |
| Delta-1249996272046 | 1 | DELTA | 1249996272046 | Autosave delta image |
| Delta-1249996272440 | 1 | DELTA | 1249996272440 | Autosave delta image |
| Delta-1249996272918 | 1 | DELTA | 1249996272918 | Autosave delta image |
| Delta-1249996273366 | 1 | DELTA | 1249996273366 | Autosave delta image |
| Delta-1249996273866 | 1 | DELTA | 1249996273866 | Autosave delta image |
| Delta-1249996274367 | 1 | DELTA | 1249996274367 | Autosave delta image |
| Delta-1249996274904 | 1 | DELTA | 1249996274904 | Autosave delta image |
| Delta-1249996275462 | 1 | DELTA | 1249996275462 | Autosave delta image |
| Delta-1249996276160 | 1 | DELTA | 1249996276160 | Autosave delta image |
| Delta-1249996276830 | 1 | DELTA | 1249996276830 | Autosave delta image |
| Delta-1249996277410 | 1 | DELTA | 1249996277410 | Autosave delta image |
| Delta-1249996278089 | 1 | DELTA | 1249996278089 | Autosave delta image |
| Delta-1249996279066 | 1 | DELTA | 1249996279066 | Autosave delta image |
| Delta-1249996279796 | 1 | DELTA | 1249996279796 | Autosave delta image |
| Delta-1249996280662 | 1 | DELTA | 1249996280662 | Autosave delta image |
| Delta-1249996281670 | 1 | DELTA | 1249996281670 | Autosave delta image |
| Delta-1249996282594 | 1 | DELTA | 1249996282594 | Autosave delta image |
| Delta-1249996283754 | 1 | DELTA | 1249996283754 | Autosave delta image |
| Delta-1249996285211 | 1 | DELTA | 1249996285211 | Autosave delta image |
| Delta-1249996286610 | 1 | DELTA | 1249996286610 | Autosave delta image |
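The numeric suffix of each Delta name appears to be a creation time in milliseconds since the UNIX epoch. That reading is an inference, not something the product documentation is quoted as saying here, but it is consistent with the data: 1249996272046 decodes to 13:11:12 UTC on August 11, 2009, which is 09:11:12 EDT and within seconds of the timestamp in the properties file shown later in this section. Under that assumption, the checkpoints can be dated and put into either creation order with a small helper. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def delta_millis(name):
    """Return the numeric suffix of a checkpoint name such as Delta-123."""
    return int(name.rsplit("-", 1)[1])


def created_at(name):
    """Decode the suffix as epoch milliseconds (assumption) into UTC time."""
    millis = delta_millis(name)
    seconds, remainder = divmod(millis, 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(milliseconds=remainder)


def by_creation(names, newest_first=False):
    """Sort checkpoint names by their embedded creation time."""
    return sorted(names, key=delta_millis, reverse=newest_first)


deltas = ["Delta-1249996272046", "Delta-1249996271016", "Delta-1249996272440"]

oldest_first = by_creation(deltas)
newest_first = by_creation(deltas, newest_first=True)
stamp = created_at("Delta-1249996272046").strftime("%Y-%m-%d %H:%M:%S")

print("oldest first:", oldest_first)
print("newest first:", newest_first)
print("Delta-1249996272046 created at", stamp, "UTC")
```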

The following message was displayed when Delta-1249996271016 was checked and then restored:

Message: Log out from the administrative console and then log in again after a checkpoint restoration.
This prevents you from experiencing problems or abnormal behavior resulting from workspace issues.

Delta-1249996272046 was then checked:

#XDAgent catalog nodes information
#Tue Aug 11 09:11:10 EDT 2009
masterRepositoryHost=xdblade48b13
catalogports=7062,7062,7062,7062,7062,7062,7062,7062,7062,7062
masterRepositoryPort=7061
masterRepositoryAdminPort=20060
catalog-hosts=xdblade48b07,xdblade48b06,xdblade48b05,xdblade48b04,xdblade48b03,xdblade13b06,xdblade13b01,xdblade48b11,xdblade48b08,xdblade48b10

The following tasks were then completed:

  1. Restored Delta-1249996272046.
  2. Logged off from the administrative console.
  3. Selected the remaining Deltas and restored them.
  4. Disabled the automatic updates.
  5. Updated the depth to 200.
  6. Completed a full checkpoint.
  7. Deleted the checkpoints that resulted from disabling the automatic checkpoint.
  8. Completed a full checkpoint.
  9. Modified multiple application and service policy settings, and restored the full checkpoint. As a result, the following message was displayed:
    Message: Your workspace has been auto-refreshed from the master configuration. You can disable auto-refresh in your user preferences.
    
  10. Reverted the service policy settings back to the original settings.
  11. Completed a full checkpoint.
  12. Enabled tracing for the autonomic request flow manager, and restored the full checkpoint. The trace remained.
  13. Deleted the ODR node group, and restored the full checkpoint. The ODR node group was restored.
  14. Removed two servers from ClusterGroup, and restored the full checkpoint. The servers were added back to ClusterGroup and were recreated.
  15. Deleted the base editions for veApp1 and veApp2, and restored the full checkpoint. The base editions for veApp1 and veApp2 were restored.
  16. Reset the trace setting for the autonomic request flow manager to "*=info".
  17. Restarted the cell.
  18. Started an off-peak load against applications veApp1 through veApp7.
  19. Failed over to the secondary deployment manager. A 503 error message was displayed immediately after the primary deployment manager was stopped, and the current console session was no longer valid.
  20. Logged on to the administrative console. The environment continued to operate as expected.

Managed Reliability and Stability Run Across Five Days

Validation criteria
  • No failed requests
  • No catastrophic failures
  • Tasks are successfully operated throughout the test period
Operational profile
  • Various loads with peak and off-peak periods of operation
  • Application editions rolled out daily during off-peak periods of operation
  • Active service policies
  • A concurrent peak of virtual users at 1000
  • A medium level for the rate of applications ramped:
    • 5 seconds for veApp1, veApp2, veApp3, veApp4
    • 12 seconds for veApp5
    • 33 seconds for veApp7
  • Unequal and inverted load levels between peak and off-peak periods of operation
  • Higher loads during peak periods of operation for high-priority applications
  • Higher loads during off-peak periods of operation for low-priority applications
  • Dynamic clusters and applications
    Dynamic cluster name  Application name  Maximum virtual users  Off-peak period of operation  Peak period of operation
    DC1                   veApp1            200                    10                            +190
    DC2                   veApp2            200                    10                            +190
    DC3                   veApp3            200                    10                            +190
    DC4                   veApp4            200                    20                            +180
    DC5                   veApp5            100                    20                            +80
    DC6                   veApp6            50                     20                            +30
    DC7                   veApp7            50                     20                            +30
  • Service policies by application
    Service policy  Priority   Response time goal                  Application name
    SP1             Highest    Average response 250 milliseconds   veApp1 SP
    SP2             Very high  Average response 500 milliseconds   veApp2 SP
    SP3             High       Average response 900 milliseconds   veApp3 SP
    SP4             Medium     Average response 1300 milliseconds  veApp4 SP
    SP5             Low        Average response 1800 milliseconds  veApp5 SP
    SP6             Very low   Average response 2500 milliseconds  veApp6 SP
    SP7             Lowest     Average response 3500 milliseconds  veApp7 SP
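
The load profile above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original test harness. It assumes that each ramp value is the interval in seconds between successive virtual users being added to an application, and that the "+" column is the number of users added on top of the off-peak load. No ramp value is listed for veApp6, so it is left out of the ramp calculation.

```python
# Load profile for the five-day run, transcribed from the table above.
# Each entry: (application, off-peak virtual users, users added at peak).
LOAD_PROFILE = [
    ("veApp1", 10, 190),
    ("veApp2", 10, 190),
    ("veApp3", 10, 190),
    ("veApp4", 20, 180),
    ("veApp5", 20, 80),
    ("veApp6", 20, 30),
    ("veApp7", 20, 30),
]

# Ramp values from the operational profile. Assumption: seconds between
# successive virtual users. veApp6 has no value in the source, so it is omitted.
RAMP_SECONDS_PER_USER = {
    "veApp1": 5, "veApp2": 5, "veApp3": 5, "veApp4": 5,
    "veApp5": 12, "veApp7": 33,
}


def total_users(profile):
    """Return (off-peak total, peak total) of concurrent virtual users."""
    off_peak = sum(off for _, off, _ in profile)
    peak = sum(off + added for _, off, added in profile)
    return off_peak, peak


def ramp_minutes(profile, rates):
    """Return minutes needed to ramp each application from off-peak to peak."""
    return {
        app: added * rates[app] / 60
        for app, _, added in profile
        if app in rates
    }


if __name__ == "__main__":
    off_peak, peak = total_users(LOAD_PROFILE)
    print(f"off-peak users: {off_peak}, peak users: {peak}")
    for app, minutes in ramp_minutes(LOAD_PROFILE, RAMP_SECONDS_PER_USER).items():
        print(f"{app}: about {minutes:.1f} minutes to reach peak")
```

Under these assumptions, the per-application maximums add up to the stated concurrent peak of 1000 virtual users, and each application with a listed ramp value reaches its peak in roughly 15 to 17 minutes.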

Note: Virtual Enterprise services applications so as to optimize response time across all applications. When servicing all applications equally would degrade the applications that are designated as higher priority, Virtual Enterprise rebalances the load in favor of the higher-priority applications, according to the service policy rules. In our test environment, the applications have tuning parameters that are set to create a desired load on the system and achieve a specific condition. A baseline run is then completed to establish the test profile that is used for the duration of the test. The profile is established at peak operation to represent a worst-case scenario. The average response time values are established during this baseline run for a given set of hardware and simulated conditions, which are based on the target workload for each request, the CPU load levels, and the rate at which sessions begin and end. The values used in a specific production environment depend on the hardware, the back-end peripheral systems, and the concurrently installed applications, and might not match those identified in this test scenario.

Validation

Starting with a steady off-peak load, a number of new application editions were rolled out. After the rollouts completed, the load was moved to the peak operating load. For five days, peak operation occurred for approximately eight hours a day, with approximately 16 hours of off-peak operation. New application editions were rolled out nightly, and a series of operational characteristics was monitored throughout the run. The operational characteristics that were monitored include, but are not limited to, the following:

  • CPU utilization across all nodes
  • Average response and service times
  • Server weights
  • Queued requests
  • Task management and APC tasks

Peak CPU values

Time      Node1  Node2  Node3  Node4  Node5  Node6  Node7  Node8  ODR1  ODR2
15:20:39  95     99     100    100    99     99     96     99     88    85

Off-peak CPU values

Time      Node1  Node2  Node3  Node4  Node5  Node6  Node7  Node8  ODR1  ODR2
19:13:26  55     42     95     94     92     92     92     97     78    78

Peak service policy values

Time      SP1    SP2    SP3     SP4     SP5     SP6     SP7
10:10:07  75.24  82.76  116.63  173.71  242.89  267.76  607.68
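
One way to read the peak service policy values is to compare them with the response time goals listed in the operational profile. The sketch below does that. It assumes that the sampled values are average response times in milliseconds, which the source does not state explicitly, and the function names are illustrative only.

```python
# Response time goals in milliseconds, from the service policy table.
GOALS_MS = {
    "SP1": 250, "SP2": 500, "SP3": 900, "SP4": 1300,
    "SP5": 1800, "SP6": 2500, "SP7": 3500,
}

# Peak values sampled at 10:10:07.
# Assumption: these are average response times in milliseconds.
PEAK_MS = {
    "SP1": 75.24, "SP2": 82.76, "SP3": 116.63, "SP4": 173.71,
    "SP5": 242.89, "SP6": 267.76, "SP7": 607.68,
}


def fraction_of_goal(goals, observed):
    """Return each observed value as a fraction of its goal (1.0 means at goal)."""
    return {sp: observed[sp] / goals[sp] for sp in goals}


def policies_over_goal(goals, observed):
    """Return the service policies whose observed value exceeds the goal."""
    return [sp for sp in goals if observed[sp] > goals[sp]]


if __name__ == "__main__":
    for sp, frac in fraction_of_goal(GOALS_MS, PEAK_MS).items():
        print(f"{sp}: {PEAK_MS[sp]} ms is {frac:.0%} of the {GOALS_MS[sp]} ms goal")
    print("over goal:", policies_over_goal(GOALS_MS, PEAK_MS))
```

If the millisecond assumption holds, every service policy in this sample stays below its goal.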

Important: CPU overload protection was set at 90% throughout the test. CPU overload protection operates on a weighted average, which means that if you set the CPU overload protection value to 90, you might see higher CPU rates on particular systems that have higher weights. Even though the weighted average across all systems settles at the CPU overload protection level, the priority of the autonomic request flow manager is to achieve the best response time for the available power of the overall cell. Peak operation during these tests simulated a worst-case scenario.
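
The effect of a weighted average can be illustrated with a small calculation. This is only a sketch: the utilization figures and weights below are hypothetical, and the actual weighting that Virtual Enterprise applies is not described in this document. The point is simply that a weighted average can stay under the threshold while individual nodes run above it.

```python
def weighted_average(utilizations, weights):
    """Return the weighted average of per-node CPU utilization (percent)."""
    if len(utilizations) != len(weights):
        raise ValueError("utilizations and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(u * w for u, w in zip(utilizations, weights)) / total_weight


def nodes_above(utilizations, threshold):
    """Return the indexes of nodes whose utilization exceeds the threshold."""
    return [i for i, u in enumerate(utilizations) if u > threshold]


if __name__ == "__main__":
    threshold = 90                   # CPU overload protection value from the test
    cpu = [97.0, 95.0, 70.0, 60.0]   # hypothetical per-node CPU utilization
    weights = [3.0, 3.0, 1.0, 1.0]   # hypothetical node weights

    avg = weighted_average(cpu, weights)
    print(f"weighted average: {avg:.2f}% (threshold {threshold}%)")
    print("nodes above threshold:", nodes_above(cpu, threshold))
```

With these made-up numbers, the two heavily weighted nodes run above 90% while the weighted average remains below it.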

Reliability and Stability Run Across Three Days

Validation criteria
  • No failed requests
  • No catastrophic failures
  • Tasks complete successfully throughout the test period
Operational profile
  • Various loads with peak and off-peak periods of operation
  • Active service policies
  • An increased queue length of 3000
  • A concurrent peak of virtual users at 1000
  • A low ramp rate for application load
  • Equal load levels between peak and off-peak periods of operation across applications
  • Higher loads in peak periods of operation for high-priority applications
  • Higher loads in off-peak periods of operation for low-priority applications
  • Dynamic clusters and applications
    Dynamic cluster name  Application name  Maximum virtual users  Off-peak period of operation  Peak period of operation
    DC1                   veApp1            410                    10                            +400
    DC2                   veApp2            410                    10                            +400
    DC3                   veApp3            410                    10                            +400
  • Service policies by application
    Service policy  Priority  Response time goal                  Application name
    SP1             Highest   Average response 400 milliseconds   veApp1 SP
    SP2             Medium    Average response 800 milliseconds   veApp2 SP
    SP3             Lowest    Average response 1200 milliseconds  veApp3 SP
Validation

Starting with a steady off-peak load, the load was quickly moved to the peak operating load. For three days, peak operation occurred for 10 or more hours a day, with approximately 14 hours of off-peak operation. The ramp to peak occurred within approximately 16 minutes, or 2.5 seconds per virtual user. The operational characteristics that were monitored include, but are not limited to, the following:

  • CPU utilization across all nodes
  • Average response and service times
  • Server weights
  • Queued requests
  • Task management and APC tasks
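
The ramp figure quoted for this run can be reproduced from the load table. The sketch below assumes that the three applications ramp in parallel and that each adds its 400 peak users one at a time, 2.5 seconds apart. The function name is illustrative only.

```python
def ramp_duration_seconds(users_added, seconds_per_user):
    """Return the time needed to add the given number of users at a fixed interval."""
    if users_added < 0 or seconds_per_user < 0:
        raise ValueError("inputs must be non-negative")
    return users_added * seconds_per_user


if __name__ == "__main__":
    users_added_per_app = 400   # "+400" column in the load table
    seconds_per_user = 2.5      # ramp rate quoted in the validation text

    seconds = ramp_duration_seconds(users_added_per_app, seconds_per_user)
    print(f"ramp to peak: {seconds:.0f} seconds, about {seconds / 60:.1f} minutes")
```

Under these assumptions the ramp takes 1000 seconds, which is between 16 and 17 minutes and roughly in line with the figure quoted above.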

Peak CPU values

Time      Node1  Node2  Node3  Node4  Node5  Node6  Node7  Node8  ODR1  ODR2
17:52:41  85     80     95     93     93     94     98     97     79    77

Off-peak CPU values

Time      Node1  Node2  Node3  Node4  Node5  Node6  Node7  Node8  ODR1  ODR2
07:12:14  17     18     50     50     47     49     41     47     34    35

Peak service policy values

Time      SP1     SP2     SP3
17:58:56  196.15  452.71  631.06

Peak queue values

Time      app1    app2    app3
17:59:20  297.81  359.94  377.72

Important: CPU overload protection was set to 90% throughout the test. CPU overload protection operates on a weighted average, which means that if you set the CPU overload protection value to 90, you might see higher CPU rates on particular systems that have higher weights. Even though the weighted average across all systems settles at the CPU overload protection level, the priority of the autonomic request flow manager is to achieve the best response time for the available power of the overall cell. Peak operation during these tests simulated a worst-case scenario.

Memory Usage for the ODRs

[Figure: memory usage for ODR 1]

[Figure: memory usage for ODR 2]

References

The following information is available for reference:

WebSphere Application Server Network Deployment (All operating systems), Version 6.1

WebSphere Virtual Enterprise Version 6.1.1

