Preparing to Install Connect:Direct for UNIX in a Cluster Environment

Connect:Direct® for UNIX supports clustering software, which allows two or more computers to appear to other systems as a single system. All computers in the cluster are accessible through a single IP address. Connect:Direct for UNIX can be installed in two types of clustering environments: high-availability and load-balancing cluster environments.

High-Availability Cluster Environments

Consider the following information when planning to use Connect:Direct for UNIX in a high-availability cluster environment.

Supported High-Availability Cluster Environments

Connect:Direct for UNIX is certified to operate in the following high-availability cluster environments:

  • IBM high-availability cluster multiprocessing (HACMP) environment
  • Hewlett-Packard MC/ServiceGuard
  • SunCluster versions 2.2, 3.0, and 3.2
  • Veritas InfoScale Availability (formerly Veritas Cluster Server)
  • Red Hat High Availability Add-On

If you plan to install Connect:Direct for UNIX in a high-availability cluster environment, complete the following tasks:

  • Install the clustering software on each computer in the cluster, including setting up a logical host or application package.
  • Create a user with the same name and numeric user ID on each cluster node (see the sketch following this list).
  • Create a Connect:Direct subdirectory on a shared file system on a shared volume group.
  • Ensure that the shared file system is owned by the IBM® Connect:Direct user.
  • Install IBM Connect:Direct on the shared file system.
  • Perform the procedures necessary to define the high-availability settings and configure the cluster environment.
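
For illustration, the following shell sketch shows one way to carry out the user and shared file-system steps above on Linux nodes. The user and group name cdadmin, the numeric IDs 501, and the directory /sharedvg/cdunix are example values only, and the equivalent commands differ on other UNIX platforms; substitute the values and commands that apply to your cluster.

    # Run on every node in the cluster: create the same group and user
    # with identical numeric IDs (example values shown).
    groupadd -g 501 cdadmin
    useradd -u 501 -g 501 -m cdadmin

    # Run on the node that currently owns the shared volume group:
    # create the Connect:Direct subdirectory on the shared file system
    # and make the Connect:Direct user its owner.
    mkdir -p /sharedvg/cdunix
    chown cdadmin:cdadmin /sharedvg/cdunix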

Limitations of High-Availability Clusters

When running Connect:Direct for UNIX in a high-availability cluster environment, be aware of the following limitations:

  • If a failure occurs, all Processes being held will be restarted when IBM Connect:Direct is restarted. This includes Processes that are held by the operator as well as Processes held in error. This could cause a security risk.
  • When an IBM Connect:Direct ndmsmgr process associated with an IBM Connect:Direct Process is killed, the Process is not automatically restarted and is put in the Held in Error state. It must be manually restarted; otherwise, the IBM Connect:Direct Process is restarted when the cluster restart occurs.

Important: IBM offers standard support for Connect products that are installed according to the current documented installation instructions. Customers who encounter problems with a nonstandard installation that cannot be reproduced in a standard environment are provided "best endeavor" support: we address questions and issues as usual, but if we determine that an issue is caused by the nonstandard installation environment, you will be required to reinstall into a supported environment.

Load-Balancing Cluster Environments

In a load-balancing cluster environment, an incoming session is distributed to one of the Connect:Direct for UNIX instances based on criteria defined in the load balancer. Generally, from the point of view of the nodes behind the load balancer, only incoming (SNODE) sessions are affected by the load balancer. PNODE (outgoing) sessions operate the same way as non-cluster Connect:Direct for UNIX PNODE sessions.

SNODE Server Considerations for Load-Balancing Clusters

Consider the following when planning and setting up the Connect:Direct for UNIX SNODE servers in a load-balancing cluster:

  • The servers used for the Connect:Direct for UNIX instances behind the load balancer must all have access to common shared disk storage because of the following:
    • Any copy statement source and destination files for SNODE processes must reside in directories accessible to all servers.
    • All nodes must have access to a common SNODE work area, and that area must be on a cluster file system or on a Network File System version 4 (NFSv4) or later resource. This includes Amazon Elastic File System (EFS), which is mounted by using the NFSv4 protocol. NFSv3 is not supported.
    • All servers must be of the same platform type (for example, all Solaris SPARC or all Linux Intel) and the same Connect:Direct for UNIX version and maintenance level.
  • The system clocks on all servers must be synchronized in order for copy checkpoint/restart and run task synchronization to work.
  • The administrator user ID used to install Connect:Direct for UNIX must be defined on each server and must have the same numeric user ID and group ID on each server (a verification sketch follows this list).
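
The following commands are one way to spot-check these SNODE server requirements on a Linux server behind the load balancer; the mount point /cdshared and the administrator name cdadmin are example values, and the equivalent checks differ on other UNIX platforms.

    # Confirm that the shared SNODE work area is an NFSv4-or-later mount
    # (look for vers=4 in the output); a cluster file system is also acceptable.
    nfsstat -m | grep -A 1 /cdshared

    # Confirm that the system clock is being synchronized
    # (chrony shown here; ntpq -p is the equivalent check for ntpd).
    chronyc tracking

    # Confirm that the administrator ID resolves to the same UID and GID
    # as on the other servers behind the load balancer.
    id cdadmin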

SNODE Setup for Load-Balancing Clusters

Consider the following when planning and setting up the Connect:Direct for UNIX SNODEs in a load-balancing cluster:

  • One Connect:Direct for UNIX node should be installed on each server behind the load balancer.
  • Each node should be installed by the same user ID.
  • Each node should have the same Connect:Direct for UNIX node name.
  • Each node should have the same node-to-node connection listening port.
  • A directory should be established for the shared SNODE work area used by the Connect:Direct for UNIX nodes behind the load balancer. This directory should be owned by the Connect:Direct for UNIX administrator ID and must be accessible to all of the servers behind the load balancer.
  • Each node should specify the same path for the shared SNODE work area. Specify this path in the snode.work.path parameter of the ndm.path record in the initialization parameter file, as shown in the example following this list.
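
For example, if the shared SNODE work area were /cdshared/snode_work, each node's initialization parameter file (initparm.cfg) would carry the same value in its ndm.path record, along the lines of the sketch below. The installation path and work path shown are placeholders, other installer-generated entries in the record are omitted, and you should verify the exact record syntax against the initialization parameters reference for your release.

    ndm.path:\
        :path=/opt/cdunix/ndm/:\
        :snode.work.path=/cdshared/snode_work/: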

Limitations of Load-Balancing Clusters

When running Connect:Direct for UNIX in a load-balancing cluster environment, be aware of the following limitations:

  • If an incoming session fails and is restarted by the PNODE, then the restarted session may be assigned to any of the instances behind the load balancer and will not necessarily be established with the original SNODE instance.
  • When shared SNODE work areas are configured and the run task is on the SNODE, then at restart time Connect:Direct for UNIX cannot determine whether the original task is still active, because the restart session may be with a different server. If you set the global run task restart parameter to yes in the initialization parameter file, a task could be restarted even though it is still active on another machine. Therefore, exercise caution when specifying restart=y (a conservative example follows this list).
  • Each SNODE instance that receives a session for a given Process creates a TCQ entry for the Process. Each SNODE instance has its own TCQ file, and these files are not shared among SNODE instances. Only the work files created in the shared work area are shared among SNODE instances.
  • When a Process is interrupted and restarted to a different SNODE instance, the statistics records for that Process are distributed between the two SNODE instances involved. As a result, you cannot select all the statistics records for a Process.
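
Given the run task restart behavior described above, one conservative approach when shared SNODE work areas are in use is to leave global run task restart turned off and restart interrupted tasks deliberately. The sketch below shows how that setting might appear in each node's initialization parameter file; treat the record and parameter names as an assumption to be verified against the initialization parameters reference for your release.

    runtask.parms:\
        :restart=n: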