High availability (load balancing)

You can use network load balancers to manage client requests across servers in a Datacap system.

Load balancing is a method for scaling a system horizontally by distributing the work across many computer nodes in a "farm." It also provides high availability by redirecting clients to a working node if a node fails. A load balancer or content switch presents a single address for communication with multiple servers for one or more Datacap applications. You configure the load balancer to send requests that are directed to each pooled (balanced) address to one of the servers in the farm, by using round-robin scheduling or another method. Configure the Datacap Server connection timeout to be longer than the processing time for the longest batch. Typically, 1 hour is sufficient.
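
As a rough illustration of round-robin scheduling with failover, the following Python sketch rotates through a farm and skips nodes that are marked as down. It is a toy model, not how any particular load balancer is implemented. The class, its methods, and the host names are hypothetical.

```python
class RoundRobinFarm:
    """Toy model of a load-balanced pool (hypothetical, for illustration only)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        """Record that a server failed its health check."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Return a known server to the rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server in rotation."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers in the farm")


farm = RoundRobinFarm(["dcap1", "dcap2", "dcap3"])  # hypothetical host names
first_pass = [farm.pick() for _ in range(4)]
farm.mark_down("dcap2")  # simulate a node failure
after_failure = [farm.pick() for _ in range(3)]
```

In this model, new requests never reach the failed node, which mirrors the redirect-on-failure behavior described above. Sessions that were already open on the failed node are outside the scope of the sketch.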

If possible, grant network access to the back-end server addresses to simplify initial setup and subsequent problem solving. Test your system without any load balancing at first. Then add load balancing to one component at a time, and reconfigure as needed. Test each balanced address, including failover to each back-end server, before you test the next component. If policy requires that you disable connections to the back-end servers, be prepared to re-enable them for troubleshooting.
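
A basic reachability test for the balanced address and each back-end address could look like the following Python sketch. It only checks that a TCP port accepts connections, not that a Datacap login succeeds. The function names and the host names in the comment are hypothetical.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


def check_endpoints(endpoints, probe=can_connect):
    """Probe each (label, host, port) entry and return {label: reachable}."""
    return {label: probe(host, port) for label, host, port in endpoints}


# Example call (hypothetical host names; 2402 is the default Datacap Server port):
# check_endpoints([
#     ("balanced", "datacap-vip.example.com", 2402),
#     ("backend-1", "datacap1.example.com", 2402),
#     ("backend-2", "datacap2.example.com", 2402),
# ])
```

Running such a check against the back-end addresses first, and then against the balanced address, follows the one-component-at-a-time approach described above.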

Datacap Server

Clients access the Datacap Server by using a TCP/IP socket-based protocol. You configure the Datacap Server name or IP address and port in Datacap Application Manager. The Datacap Server normally listens on port 2402, but you can change the port number in Datacap Server Manager. If you change it there, you must also configure the new port number in Datacap Application Manager.

If a load-balanced server fails, all new client requests are directed to a different server. The old session becomes invalid: users who were logged in to the failed server receive an error message, and the user or client must log in again. Any outstanding server requests are terminated, and batches that were in process through that server remain in a running status.

TCP/IP sessions are inherently persistent, so there is no need for the load balancer to persist Datacap Server sessions. However, you might want to configure the load balancer to persist sessions based on the client IP address to force all threads from any Rulerunner server to use the same Datacap Server.

For high availability, the best practice is to configure multiple Datacap Server instances in a farm by using a network load balancer or content switch. Enter the single virtual (balanced) address for Datacap Server in Datacap Application Manager. Configure the load balancer with persistent (sticky) sessions based on the source IP address.
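
To illustrate what persistence by source IP address means, the following Python sketch maps each client address to one server by hashing it. This is only one possible approach; a real load balancer might instead keep a table of recent client addresses, so refer to your load balancer documentation for the actual mechanism. The function name and host names are hypothetical.

```python
import hashlib


def pick_by_source_ip(client_ip, servers):
    """Map a client IP to one server; the same IP always maps to the same server."""
    if not servers:
        raise ValueError("server list is empty")
    # Hash the address so clients spread across the farm deterministically.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["dcap1", "dcap2", "dcap3"]  # hypothetical Datacap Server hosts
chosen = pick_by_source_ip("10.0.0.15", servers)
chosen_again = pick_by_source_ip("10.0.0.15", servers)  # same server as before
```

Because all threads on one Rulerunner server share a source address, they all resolve to the same Datacap Server in this model. Note that with simple hashing the mapping changes whenever the server list changes.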

Datacap Web Client

The servers where the Datacap Web Client is located can be farmed. Designate one or more IP addresses or ports on your network for your Datacap Web Client site home page. Client browsers connect to this load balancer port by using the HTTP or HTTPS protocol. Configure your load balancer to redirect those requests to individual web servers by using round-robin scheduling or another method. Ports 80 and 443 are standard, but you can configure an alternate port in the Microsoft IIS Manager. The Datacap Web Client uses session cookies, so you must configure the load balancer to persist sessions based on the client's IP address. Set the load balancer session timeout to match the IIS session timeout. If a server fails, users who were connected to the failed server receive an error message and must log in again.

Datacap Web Services

Datacap Web Services servers can be farmed. Clients connect to an address and port for the load balancer and are directed to a specific server. Sessions must be persisted by the load balancer, and the session timeout must match the web service session timeout. There are different methods to achieve session persistence; refer to the manual for your specific load balancer for complete details. Failure of a Datacap Web Services server generates an error for requests that are in progress, and the requested operations might not be completed.

Datacap Report Viewer

The Datacap Report Viewer IIS servers can be farmed. Clients connect to an address and port for the load balancer and are directed to a specific server. Sessions must be persisted based on IP address, and the session timeout must match the IIS server session timeout. If a Datacap Report Viewer server fails, all existing sessions end.

Fingerprint Service

The Fingerprint Service servers can be farmed if the fingerprints are static during normal system operation. Updates and deletions of fingerprints are not synchronized automatically between servers. If changes are made to the set of fingerprints, the Fingerprint Service servers must be restarted, or their contents must be programmatically reset, to keep them synchronized. You must configure the load balancer to persist sessions based on the client's IP address.

Rulerunner

The Rulerunner servers independently poll Datacap servers for pending work. These servers do not require or benefit from load balancing.

Datacap Navigator

Datacap Navigator is a plug-in for IBM® Content Navigator. For information about configuring a load-balanced environment, see Getting IBM Content Navigator up and running. Refer to the steps that are marked High Availability Clusters.

Table 1. Load balancing options for Datacap servers

Datacap server                       Load balanced                          Protocol
-----------------------------------  -------------------------------------  -------------
Datacap Server (application server)  Yes, persistent sessions by client IP  TCP/IP socket
Datacap Web Client server            Yes, persistent sessions by client IP  HTTP, HTTPS
Datacap Web Services server          Yes, persistent sessions by client IP  HTTP, HTTPS
Datacap Report Viewer server         Yes, persistent sessions by client IP  HTTP, HTTPS
Fingerprint Service server           Yes, persistent sessions by client IP  HTTP, HTTPS
Rulerunner server                    No