System administrators want to provide services that are always available and that give a consistent quality of service as more people use them. In other words, the services should have load-balancing and failover capabilities. Lotus Instant Messaging and Web Conferencing (Sametime) provides a way for you to achieve these goals for the Lotus Instant Messaging (Sametime) Connect client with a feature called Community Services clustering. This article is motivated by several customer support experiences that have highlighted some misconceptions about the correct way to set up, deploy, and troubleshoot Community clustering. It outlines the basics of Community clustering in Lotus Instant Messaging and Web Conferencing versions 6.5.1 and 3.1, including some best practices, a discussion of limitations, and ways to circumvent those limitations. Meeting Services clustering, the main component of the Lotus Web Conferencing Management Server (Sametime Enterprise Meeting Server) product, provides failover and load-balancing for Lotus Web Conferencing and is not addressed here.
This article is for Lotus Instant Messaging and Web Conferencing administrators or anyone involved in the deployment planning of Community Services clustering. It assumes that you have used the Connect client and are familiar with basic Lotus Instant Messaging and Web Conferencing administration tasks and features. One final note: the information in this article applies to versions 6.5.1 and 3.1 of the Lotus Instant Messaging and Web Conferencing product, but the URL links provided are only for version 3.1. When the Lotus Instant Messaging and Web Conferencing 6.5.1 documentation is available on-line, you can search it on the Lotus Documentation Web site.
Introducing Community Services clustering
A single Lotus Instant Messaging server can provide Connect client functionality to about 10,000 simultaneous users. Even though this number may satisfy the requirements of most small- to medium-sized deployments, the single server installation has a major weakness--a single point of failure. If anything goes wrong with the Lotus Instant Messaging server, or if the server requires restarting because of a scheduled maintenance task, then the entire Lotus Instant Messaging community is unavailable. This weakness can be partially overcome by adding more Lotus Instant Messaging servers to the deployment. However, if any of those servers must be taken off-line, the users who are connected to them must manually edit their connection settings to connect to a working server. If users can correctly edit their connection settings before and after a server outage, they can continue to use Lotus Instant Messaging Community Services. However, this solution shifts the burden of responsibility from system administrators to end users. And, when this burden is shouldered by an end user, it inevitably leads to a swarm of help desk incidents. Community clustering is a superior solution to the problems caused by manual client configuration and a single point of failure.
Community clusters are groups of Lotus Instant Messaging servers that are addressable through a single network address. Any of the servers in a cluster can go off-line without affecting the availability of Connect client services and without requiring users to reconfigure their clients. This feature was first available in Sametime 2.5 and has been refined in each release since. You cannot configure clustering with the Lotus Instant Messaging and Web Conferencing Administration client, so you must set it up manually using a Notes client and a text editor. Instructions for installing and deploying a Community cluster are found in the Lotus Instant Messaging and Web Conferencing Administrator's Guide. In addition, check the Lotus Instant Messaging and Web Conferencing release notes for any updated information.
Community clusters require a dispatcher service to function properly. The IBM WebSphere Edge Server and a rolling DNS service are examples of common dispatcher services. They work by redirecting clients from their own network address to one of a list of network addresses on which Lotus Instant Messaging Community Servers are running. The list of network addresses is configurable on a dispatcher without bringing it off-line. Therefore, you can add and remove Lotus Instant Messaging servers without affecting the availability of Community Services. You configure a Connect client only once to connect to the network address of a dispatcher service. If a Lotus Instant Messaging server goes off-line, its network address can be removed from the dispatcher's address list, and users are no longer directed to it.
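The dispatcher behavior described above can be illustrated with a small simulation. This is a minimal sketch only: the class name, method names, and IP addresses are hypothetical, and it models a simple rotation rather than the actual logic of WebSphere Edge Server or any DNS product.

```python
class RoundRobinDispatcher:
    """Toy stand-in for a dispatcher service (hypothetical, not a Sametime API)."""

    def __init__(self, servers):
        self._servers = list(servers)  # addresses currently in rotation
        self._next = 0                 # counter used to rotate through them

    def add_server(self, address):
        # Put a Community server into rotation without stopping the dispatcher.
        if address not in self._servers:
            self._servers.append(address)

    def remove_server(self, address):
        # Take a server out of rotation, for example before maintenance.
        if address in self._servers:
            self._servers.remove(address)

    def next_server(self):
        # Return the address the next connecting client is directed to.
        if not self._servers:
            raise RuntimeError("no Community servers in rotation")
        address = self._servers[self._next % len(self._servers)]
        self._next += 1
        return address


# Example addresses are placeholders.
dispatcher = RoundRobinDispatcher(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
first_three = [dispatcher.next_server() for _ in range(3)]
dispatcher.remove_server("10.0.0.12")   # server taken off-line
after_removal = [dispatcher.next_server() for _ in range(4)]
```

After the removal, new connections are spread over the two remaining addresses only, while the clients themselves keep pointing at the single dispatcher address.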
Figure 1. Basic Community cluster design
Best practices for deploying Community clusters
The most important and easiest best practice, and one that a surprising number of administrators do not follow when deploying Community clusters, is to read all available documentation. First, read the Lotus Instant Messaging and Web Conferencing Administrator's Guide and then check the release notes. Next, read the excellent article written about IBM's Community cluster deployment, "The hitchhiker's guide to Sametime deployment at IBM." Finally, search both the Lotus Support Services Web site and the IBM Redbooks Web site using search terms such as Sametime community cluster. Simply reading the available documentation will greatly increase your chances of a smooth Lotus Instant Messaging and Web Conferencing deployment.
You should also keep in mind that Community clusters provide the best scalability when used in conjunction with remote Community multiplexers, or remote muxes. These processes are deployed between the dispatcher service and the Community servers of a Community cluster (see Figure 2). They do not handle Community protocol processing tasks, but instead serve as concentration points for thousands of network connections to a Community server. They are free from the burden of servicing connection data and simply funnel the connections to a separate machine running the full suite of Community Services. You can deploy multiple remote muxes in a clustered environment to increase the number of users that the server can handle. The Lotus Instant Messaging and Web Conferencing Administrator's Guide has guidelines for the expected load-handling capabilities of multiple remote muxes in a clustered environment. Deploying remote muxes is a best practice that should not be overlooked: it off-loads connection switching from the Community server and greatly increases the number of connections that a Lotus Instant Messaging server can service.
Figure 2. Community cluster with remote muxes
Another best practice to consider in large Community cluster deployments is the use of multiple Community clusters deployed by geographical region, a technique sometimes called geographical clustering. It places the servers of one cluster in a separate physical location from the servers of another cluster with a slower WAN connection between them. Users are then connected to the cluster that is physically closest to them. For example, a company that has offices in London and Boston might deploy a cluster in London and a separate cluster in Boston. Both of these clusters would be part of one Lotus Instant Messaging community, but connections from Boston employees would be directed to the Boston cluster and likewise for London employees. The network traffic from Boston employees should travel mostly among other Boston employees, and likewise for the London employees. As a result, data seldom traverses the WAN connection, and overall network throughput is increased for the entire community. The only drawback to geographical clustering occurs when users in one location send and receive most of their data to users in a separate location. Consequently, most of the network traffic flows over the slower WAN connection. This behavior introduces latency to the data flow, and the overall user experience seems slower.
In addition to deploying multiple clusters, you should always set the Home Sametime Server field for every Lotus Instant Messaging user to an appropriate value. A full description of this field is given in the Lotus Instant Messaging and Web Conferencing Administrator's Guide. This value should typically be the name of the cluster that is geographically closest to the user. If more than one cluster is deployed in the same geographical region, you may set this value to the name of the cluster that is least loaded with other users. The field tells the Community clusters where the servicing of a user's Community data should take place, and thus where most processing time should be spent. If the field is not set appropriately, delays can occur in processing instant messages or other Community data.
When deploying multiple Community clusters, be aware of a bug in the Lotus Web Conferencing Meeting Center that does not properly route Web conferences to the correct cluster. This problem arises only when more than one Community cluster is defined on a Lotus Web Conferencing server, and it affects only Web conferencing capabilities. When a user attempts to join a Web conference, the Meeting Room Client is directed to the first Community cluster defined on the Lotus Web Conferencing server instead of the cluster defined in the user's Home Sametime Server field. This bug negates the benefits of deploying multiple Community clusters because all Web conferencing users are directed to the same Community cluster. This bug has been fixed in Lotus Instant Messaging and Web Conferencing 6.5.1, and the fix is available for earlier releases of the product through Lotus Support.
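The rule for choosing a Home Sametime Server value (nearest cluster first, then the least-loaded cluster within that region) can be sketched as follows. This is an illustration under stated assumptions: the `Cluster` record, the cluster names, and the user counts are hypothetical, "closest" is approximated by matching a region label, and the fallback for a region with no cluster is a choice made for this sketch, not documented product behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str            # value that would go in the Home Sametime Server field
    region: str          # coarse location label, e.g. an office city
    assigned_users: int  # users whose home server is already this cluster


def pick_home_cluster(user_region, clusters):
    """Return the cluster name to record as a user's home server.

    Prefers clusters in the user's own region and, among those, the one
    with the fewest assigned users. If the region has no cluster, falls
    back to the least-loaded cluster overall (an assumption of this sketch).
    """
    if not clusters:
        raise ValueError("no clusters defined")
    local = [c for c in clusters if c.region == user_region]
    candidates = local or list(clusters)
    # Tie-break on name so the result is deterministic.
    return min(candidates, key=lambda c: (c.assigned_users, c.name)).name


# Hypothetical deployment: two clusters in Boston, one in London.
clusters = [
    Cluster("boston-a", "Boston", 4000),
    Cluster("boston-b", "Boston", 2500),
    Cluster("london-a", "London", 3000),
]
home = pick_home_cluster("Boston", clusters)  # "boston-b"
```

In a real deployment the resulting name would be written to each user's directory entry by whatever directory tooling is already in use; that step is outside the scope of this sketch.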
The most important limitations of Community clusters relate to their dependence on Domino clustering. You must install Community clusters on clustered Domino servers, which are limited to six servers per cluster; therefore, a Community cluster can have a maximum of six Lotus Instant Messaging servers. Also, Community clusters rely on real-time replication between servers in a Domino cluster. This type of replication requires efficient network connectivity between servers as opposed to slow WAN connections that exist between geographically dispersed server locations. In other words, if a company has a large number of employees and they are not concentrated in certain geographic regions, then it may be difficult to design an efficient cluster deployment. And finally, Domino clusters require additional maintenance and administration beyond that of a non-clustered Domino environment. Administering Domino clusters can be a complicated task, especially in the case of companies whose only Domino servers are Lotus Instant Messaging and Web Conferencing servers.
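The six-server limit lends itself to a quick back-of-the-envelope check during planning. The sketch below is only a rough estimate: it reuses the figure of about 10,000 simultaneous users quoted earlier for a single server as a default, which may not hold for clustered servers or servers fronted by remote muxes, and the idea of reserving spare servers for failover is an assumption of this example. Use the sizing guidance in the product documentation for real deployments.

```python
import math

MAX_SERVERS_PER_CLUSTER = 6   # Domino cluster limit described above
USERS_PER_SERVER = 10_000     # rough single-server figure; an assumption here


def plan_clusters(concurrent_users, spare_servers_per_cluster=1,
                  users_per_server=USERS_PER_SERVER):
    """Estimate (clusters, total_servers) for a number of concurrent users.

    spare_servers_per_cluster reserves capacity in each cluster so that a
    server can go off-line without overloading the rest (hypothetical policy).
    """
    if concurrent_users <= 0 or users_per_server <= 0:
        raise ValueError("user counts must be positive")
    if not 0 <= spare_servers_per_cluster < MAX_SERVERS_PER_CLUSTER:
        raise ValueError("spare servers must leave room for working servers")
    usable_per_cluster = MAX_SERVERS_PER_CLUSTER - spare_servers_per_cluster
    servers_for_load = math.ceil(concurrent_users / users_per_server)
    clusters = math.ceil(servers_for_load / usable_per_cluster)
    total_servers = servers_for_load + clusters * spare_servers_per_cluster
    return clusters, total_servers


estimate = plan_clusters(60_000)  # (2, 8) with one spare per cluster
```

With one spare per cluster, 60,000 users no longer fit in a single six-server cluster, which is the point at which a second cluster (for example, one per region) becomes necessary.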
Deploying Community clusters does not help the scalability and availability of Web conferencing in Lotus Web Conferencing. If a company requires load-balancing and failover capabilities for Web conferencing, it must deploy Meeting Services clustering and a Lotus Web Conferencing Management Server (Sametime Enterprise Meeting Server). This requirement confuses some administrators because they assume that Community clustering affects the entire Lotus Instant Messaging and Web Conferencing product, which is a false assumption. Meeting Services clustering requires an entirely different deployment plan and in very large deployments can include racks of servers for Meeting Services clusters and separate racks of servers for Community Services clusters. Each of these types of clusters can have separate connectivity and hardware requirements.
Community clusters require a dispatcher service to route traffic efficiently among servers in the cluster. When the dispatcher service is a rolling DNS scheme, Connect clients use simple DNS lookups to retrieve the network address of a Lotus Instant Messaging server in the cluster. This method can cause problems if the client is not configured properly. For example, most clients are configured to cache DNS lookup results and rarely purge the cache. This behavior becomes a problem when the Connect client requests a server address from the dispatcher through DNS and always retrieves the same cached address. If that address is for a server that has gone off-line, the client is not directed to a working server. The dispatcher is supposed to return a different server address each time it is queried by a client, but because the client has cached the DNS response, it never consults the dispatcher service for a new address until the cache expires. The Lotus Instant Messaging and Web Conferencing Administrator's Guide provides an explanation of how to troubleshoot DNS caching problems for both Connect clients for the desktop and Connect clients for browsers. These problems can become difficult to troubleshoot because clients may have different operating systems and different cache expiration times.
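The caching problem is easier to see in a small simulation. The following sketch is illustrative only: both classes, the host names, and the one-hour expiration time are hypothetical, and time is passed in explicitly so the behavior is deterministic. It does not model any particular operating system's resolver.

```python
class RollingDNS:
    """Hands out cluster addresses in rotation, like a rolling DNS scheme."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self._i = 0

    def resolve(self):
        address = self.addresses[self._i % len(self.addresses)]
        self._i += 1
        return address


class CachingClient:
    """Client-side resolver that reuses an answer until its cache expires."""

    def __init__(self, dns, ttl_seconds):
        self.dns = dns
        self.ttl = ttl_seconds
        self._cached = None
        self._expires_at = 0.0

    def lookup(self, now):
        # Ask the dispatcher again only when there is no unexpired answer.
        if self._cached is None or now >= self._expires_at:
            self._cached = self.dns.resolve()
            self._expires_at = now + self.ttl
        return self._cached


dns = RollingDNS(["st1.example.com", "st2.example.com"])
client = CachingClient(dns, ttl_seconds=3600)

first = client.lookup(now=0)             # st1.example.com, now cached
dns.addresses.remove("st1.example.com")  # st1 goes off-line, removed from DNS
stale = client.lookup(now=60)            # still st1.example.com (cached)
fresh = client.lookup(now=3600)          # cache expired: st2.example.com
```

Between the outage and the cache expiration, the client keeps trying the off-line server even though the dispatcher would already hand out a working address, which is why shortening or purging the client cache is the usual remedy.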
Community clusters are worth the effort
Community clusters can provide valuable load-balancing and failover capabilities to the Community Services of a Lotus Instant Messaging server. Moreover, they do not add significant administration overhead to a Lotus Instant Messaging deployment if certain best practices are followed and some important limitations are understood. Best practices include reading all available documentation, taking advantage of the Lotus Instant Messaging and Web Conferencing forum (http://www.lotus.com/ldd/stforum.nsf), and considering the deployment of remote muxes. If a Community cluster deployment spans multiple geographic regions, another best practice is to create multiple Community clusters with at least one cluster per region. Limitations of Community clustering include its reliance on Domino clustering and its sensitivity to DNS cache expiration times on clients. Also remember that Web conferencing does not benefit from the load-balancing and failover capabilities of a Community cluster. If Web conferencing failover and load-balancing are a concern, then deploy a Lotus Web Conferencing Management Server alongside the Community clusters.
- View on-line or download the Lotus Instant Messaging and Web Conferencing documentation, including the Administrator's Guide and Release Notes, from the Lotus Documentation Web site.
- For more information about Lotus Instant Messaging deployment, see the technical articles "The hitchhiker's guide to Sametime deployment at IBM" and "Life in the fast lane: IBM moves to Sametime 3."
- For technotes and other Lotus Instant Messaging and Web Conferencing support information, see the Lotus Support Services Web site.
Isaac Hands works for IBM on the Sametime Development team in Lexington, Kentucky. His seven-month-old son, Will, enjoys bouncing in a large plastic saucer, rolling across the floor to chew on the fringes of an area rug, and repeating the phrase, "DaDaDaDaDa." Isaac hopes to teach Will some new tricks by the time he writes another article for LDD. In fact, Isaac plans to teach Will how to write his next article for developerWorks: Lotus, so stay tuned.