Performance Requirements for a Clustered Installation

To determine the performance requirements for a Sterling B2B Integrator environment, you must understand the work to be done and the scheduling constraints. There are two distinct classes of business processes: those without deadlines and those with deadlines.

Capacity for processes without deadlines depends on their arrival rate plus an appropriate margin. In a cluster environment, if meeting the workload depends on the combined capacity of all of the nodes and one node fails, the system will no longer meet the workload.
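As a rough illustration of sizing for processes without deadlines, the following Python sketch converts an arrival rate into a node count and then adds spare nodes so that the workload is still met if a node fails. It assumes that you have measured a per-node throughput figure in your own load tests. The function name, the 25% default margin, and the single spare node are illustrative assumptions, not product settings.

```python
import math


def nodes_required(arrival_rate_per_hour: float,
                   node_throughput_per_hour: float,
                   margin: float = 0.25,
                   spare_nodes: int = 1) -> int:
    """Estimate how many nodes are needed to absorb an arrival rate.

    margin: headroom above the arrival rate (0.25 means 25%).
    spare_nodes: extra nodes so the workload is still met if that many fail.
    """
    if node_throughput_per_hour <= 0:
        raise ValueError("node throughput must be positive")
    # Rate the cluster must sustain, including the safety margin.
    required_rate = arrival_rate_per_hour * (1 + margin)
    # Nodes needed to carry that rate when every node is healthy.
    working_nodes = math.ceil(required_rate / node_throughput_per_hour)
    # Add spares so that losing a node does not drop capacity below the need.
    return working_nodes + spare_nodes
```

For example, `nodes_required(4000, 1500)` returns 5. An arrival rate of 4,000 documents per hour plus a 25% margin requires 5,000 per hour, which takes four nodes at 1,500 per hour, plus one spare. Real throughput rarely scales perfectly with node count, so treat the result as a lower bound.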

Processes with deadlines are considerably more complicated to plan for. Usually, you must consider three scenarios:
  • The process under normal circumstances.
  • The process when everything is running late. How much lateness can be tolerated?
  • The recovery scenario, in which a series of complex processes that failed because of a processing problem are rerun in addition to the normal workload.

In the recovery process, the system needs to meet deadlines for current work while reprocessing business processes that have failed. Determine what the largest jobs are and consider the workload imposed by running them outside their allotted schedules. If additional processing capacity is required, a node can be added to a running cluster environment.
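You can check the recovery scenario with simple arithmetic. The sketch below assumes that workloads are expressed in hypothetical node-hours of processing. It treats each rerun job as indivisible, meaning it runs on one node from start to finish, and computes an optimistic lower bound on the elapsed time needed to finish the normal work plus the reruns. The function names and units are assumptions for illustration.

```python
def min_elapsed_hours(normal_node_hours: float,
                      rerun_node_hours: list[float],
                      nodes: int) -> float:
    """Optimistic lower bound on elapsed hours for normal work plus reruns.

    Assumes work spreads perfectly across nodes, except that each rerun
    job is indivisible, so elapsed time can never be shorter than the
    longest rerun job.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    total = normal_node_hours + sum(rerun_node_hours)
    longest = max(rerun_node_hours, default=0.0)
    return max(total / nodes, longest)


def deadline_at_risk(normal_node_hours: float,
                     rerun_node_hours: list[float],
                     nodes: int,
                     window_hours: float) -> bool:
    """True if even the optimistic bound does not fit in the window."""
    return min_elapsed_hours(normal_node_hours, rerun_node_hours, nodes) > window_hours
```

Because the bound is optimistic, a result that exceeds the available window means the deadline cannot be met without more capacity. A result inside the window is necessary but not sufficient. You can model the running-late scenario by shrinking `window_hours` by the expected delay.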

Careful scheduling spreads the workload across the time period (day, week, or month) more effectively than processing every incoming document immediately on arrival.
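To illustrate why scheduling helps, the following greedy sketch splits each hour's arrivals into two parts. Urgent work is processed in the hour it arrives. Deferrable work is placed in the earliest hour that still has room under a target hourly capacity. The urgent/deferrable split, the `cap` parameter, and the work units are assumptions for illustration; this is not how the product's scheduler works internally.

```python
def schedule(urgent: list[float], deferrable: list[float], cap: float) -> list[float]:
    """Greedy sketch of spreading deferrable work under an hourly cap.

    urgent[h]: work that must run in hour h.
    deferrable[h]: work arriving in hour h that may run in hour h or later.
    Returns the resulting load per hour.
    """
    load = [float(u) for u in urgent]  # urgent work runs in its arrival hour
    backlog = 0.0
    for hour in range(len(load)):
        backlog += deferrable[hour]            # new deferrable work joins the queue
        room = max(cap - load[hour], 0.0)      # spare capacity left this hour
        done = min(room, backlog)
        load[hour] += done
        backlog -= done
    if backlog > 0:
        raise ValueError("cap too low to clear deferrable work within the period")
    return load
```

With `urgent=[10, 10, 2, 2]`, `deferrable=[20, 0, 0, 0]`, and `cap=15`, processing everything immediately would peak at 30 units in the first hour. The schedule returns `[15.0, 15.0, 12.0, 2.0]`, which halves the peak.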

Capacity Planning for a Clustered Installation

Capacity planning must account for the following resources:

  • Disk space for locally stored documents
  • Disk space in the database (which is significantly affected by the archive policy):
    • For documents
    • For tracking information
    • For the persisted state of running business processes
  • Database processing power
  • Database connections (along with memory and other database resources)
  • Node memory
  • Node processing power
  • Network bandwidth:
    • To and from the adapters
    • To and from the database
    • Between cluster nodes
  • Number of nodes in the cluster
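You can estimate how the archive policy affects database disk space with a back-of-the-envelope formula: daily document volume multiplied by the number of days the data stays in the database before it is archived or purged. The sketch below assumes that you know your average document size and a per-document tracking overhead. The field names are hypothetical, and the sketch ignores the persisted state of running business processes, indexes, and database-specific storage overhead.

```python
from dataclasses import dataclass


@dataclass
class StorageInputs:
    docs_per_day: int
    avg_doc_bytes: int            # average stored size of one document
    tracking_bytes_per_doc: int   # hypothetical tracking overhead per document
    retention_days: int           # days data stays in the database before archive/purge


def database_bytes(s: StorageInputs) -> int:
    """Estimate database space for documents plus tracking data under a retention policy."""
    bytes_per_day = s.docs_per_day * (s.avg_doc_bytes + s.tracking_bytes_per_doc)
    return bytes_per_day * s.retention_days
```

For example, consider 10,000 documents per day at 50,000 bytes each, with 2,000 bytes of tracking data per document and 30-day retention. The estimate is 15,600,000,000 bytes (about 15.6 GB). Halving the retention period halves the estimate, which shows how strongly the archive policy drives database sizing.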

Scaling Vertically in a Clustered Installation

Like single-node performance, cluster performance can be scaled vertically by upgrading each node with features such as:

  • More CPUs

    More CPUs allow more concurrent threads to run effectively, increasing the amount of concurrent work the node can accomplish.

  • Faster CPUs

    Faster CPUs speed up the execution of individual business processes, but have less effect on the amount of concurrent activity that the node can support.

  • More memory

    Adding memory improves performance in every dimension, provided that the parameters affecting caching and document persistence are adjusted to use it. The number of threads a node can support is related to its processing power and its memory.

  • More input/output capacity

    Increasing input/output capacity can greatly improve performance if the File System adapter is used heavily or if local document storage is turned on. In those situations, limited I/O capacity causes poor adapter performance.

Scaling Horizontally in a Clustered Installation

Cluster performance can also be increased by adding nodes to the cluster.

Scaling in this fashion has the following advantages and disadvantages:
  • Advantages
    • Very high scalability when coupled with parallel database technology.
    • Nodes can be added to a running cluster without any outage.
    • Increased redundancy in the environment.
    • More options for partitioning the adapter and input/output workload.
    • Greater cost-effectiveness when the existing nodes are already fully configured and upgrading them further would mean replacing them.
    • High performance, even on I/O-limited, PC-based hardware.
  • Disadvantages
    • Adding nodes is always less efficient than enhancing the existing nodes, because of cluster overhead.
    • Additional management overhead for each machine added to the environment.

Clusters can be expanded while running, and new nodes do not have to use hardware identical to the existing nodes (although it is strongly recommended that they be as similar as possible). However, all nodes must run the same JVM version.

For example, you could buy two nodes, each configured with about half of its maximum CPU, RAM, and I/O cards. After about a year, you upgrade the boxes to their maximum configurations. Later, you need still more capacity. By this time, the system vendor has usually moved on to new models. Rather than replacing your system, you add a node that uses the vendor's new technology. Because threads and other performance parameters can be tuned for each node individually, the new node can be more or less powerful than the existing nodes. Once tuned, faster nodes automatically take on more of the workload.
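You can sketch the mixed-hardware example numerically. The following Python code uses an idealized model with two assumptions. First, each node's share of the workload is proportional to its tuned capacity. Second, cluster overhead reduces total capacity by a fixed, hypothetical percentage for each node beyond the first. Both the linear overhead model and the 5% default are illustrative, not measured Sterling B2B Integrator behavior.

```python
def workload_shares(node_capacities: list[float]) -> list[float]:
    """Fraction of the workload each node would take if shares track capacity."""
    total = sum(node_capacities)
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return [c / total for c in node_capacities]


def cluster_capacity(node_capacities: list[float],
                     overhead_per_extra_node: float = 0.05) -> float:
    """Sum of node capacities, reduced by a hypothetical linear cluster overhead."""
    extra_nodes = max(len(node_capacities) - 1, 0)
    efficiency = max(1.0 - overhead_per_extra_node * extra_nodes, 0.0)
    return sum(node_capacities) * efficiency
```

Consider two original nodes rated at 100 units each plus a new node rated at 200. For this cluster, `workload_shares` returns `[0.25, 0.25, 0.5]`, and `cluster_capacity` returns about 360 units rather than 400, reflecting the cluster overhead.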

Defining Service Level Objectives for a Clustered Installation

Once the system processing power is known, you must determine the level of service that you want to achieve with your clustered installation. Factors in this decision include:
  • Required degree of fault tolerance and availability
  • Normal workload requirements
  • Exception/recovery processing requirements

If you are considering clustering for failover or high availability, the entire system must be examined to find single points of failure. For more information, see Managing Single Points of Failure.

There are multiple ways to handle failover; choose the option that best matches the service levels you require. Failover can be accomplished by building redundancy into some or all of the layers of the environment, so that one instance of a given system can take over for another upon failure. Address all of the following layers:
  • Platform (including hardware, networks, file systems and services like DNS)
  • Database
  • Application (including the application server and its components)
  • Interface (including web components)

It is recommended that the file systems be redundant, to eliminate them as a single point of failure. It is also recommended that the database be clustered or mirrored to provide redundancy at that layer. Depending on the number of users accessing the web tier, this layer can also be clustered, or multiple web servers can be deployed to provide load balancing and failover.

At the application tier, failover can be accomplished in more than one way. You can implement clustering, with a separate node on each of two servers. Alternatively, you can implement a standby strategy, in which a backup system is activated when the primary system becomes unavailable. In some circumstances, the standby server can run other applications while the primary system is available. It then needs to handle the main system's workload only when the primary system fails.