Microsoft Failover Clustering support (Windows)

Microsoft Failover Clustering supports clusters of servers on Windows operating systems. It automatically detects and responds to server or application failure, and can balance server workloads.

Introduction

Microsoft Failover Clustering is a feature of Windows server operating systems. It is the software that supports the connection of two servers (up to four servers in Datacenter Server) into a cluster for high availability and easier management of data and applications. Failover Clustering can also automatically detect and recover from server or application failures. It can be used to move server workloads to balance machine utilization and to provide for planned maintenance without downtime.

The following Db2® database products have support for Microsoft Failover Clustering:
  • Db2 Connect server products (Db2 Connect Enterprise Edition, Db2 Connect Application Server Edition, Db2 Connect Unlimited Edition for System i® and Db2 Connect Unlimited Edition for System z®).
  • Db2 Advanced Enterprise Server Edition
  • Db2 Enterprise Server Edition
  • Db2 Workgroup Server Edition

Db2 Failover Clustering components

A cluster is a configuration of two or more nodes, each of which is an independent computer system. The cluster appears to network clients as a single server.

Figure 1. Example of a Failover Clustering configuration

The nodes in a Failover Clustering cluster are connected by one or more shared storage buses and one or more physically independent networks. The network that connects only the servers but does not connect the clients to the cluster is referred to as a private network. The network that supports client connections is referred to as the public network. There are one or more local disks on each node. Each shared storage bus attaches to one or more disks. Each disk on the shared bus is owned by only one node of the cluster at a time. The Db2 software resides on the local disk. Db2 database files (for example, tables, indexes, and log files) reside on the shared disks. Because Microsoft Failover Clustering does not support the use of raw partitions in a cluster, it is not possible to configure Db2 to use raw devices in a Microsoft Failover Clustering environment.

The Db2 resource

In a Microsoft Failover Clustering environment, a resource is an entity that is managed by the clustering software. For example, a disk, an IP address, or a generic service can be managed as a resource. Db2 integrates with Microsoft Failover Clustering by creating its own resource type called Db2 Server. Each Db2 Server resource manages a Db2 instance, and in a partitioned database environment, each Db2 Server resource manages a database partition. The name of the Db2 Server resource is the instance name, although in the case of a partitioned database environment, the name of the Db2 Server resource consists of both the instance name and the database partition (or node) number.

Pre-online and post-online scripts

You can run scripts both before and after a Db2 resource is brought online. These scripts are referred to as pre-online and post-online scripts. Pre-online and post-online scripts are .BAT files that can run Db2 and system commands.

When multiple instances of Db2 might be running on the same machine, you can use the pre-online and post-online scripts to adjust the configuration so that all of the instances can be started successfully. In the event of a failover, you can use the post-online script to perform manual database recovery. The post-online script can also be used to start any applications or services that depend on Db2.

The Db2 group

Related or dependent resources are organized into resource groups. All resources in a group move between cluster nodes as a unit. For example, in a typical Db2 single partition cluster environment, there is a Db2 group that contains the following resources:
  1. Db2 resource. The Db2 resource manages the Db2 instance (or node).
  2. IP Address resource. The IP Address resource allows client applications to connect to the Db2 server.
  3. Network Name resource. The Network Name resource allows client applications to connect to the Db2 server by using a name rather than using an IP address. The Network Name resource has a dependency on the IP Address resource. The Network Name resource is optional. (Configuring a Network Name resource can affect the failover performance.)
  4. One or more Physical Disk resources. Each Physical Disk resource manages a shared disk in the cluster.
Note: When you use MSCS, it is recommended that you use drive letters when you determine the path for Db2 databases and table space containers. If you are unable to use drive letters, or if the db2mscs utility is unable to obtain the drive letter of the disk resource, you must use INSTPROF_PATH in the db2mscs.cfg file to specify a path on MSCS disks.
Note: The Db2 resource is configured to depend on all other resources in the same group so the Db2 server can be started only after all other resources are online.
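
The dependency rule in the preceding note can be illustrated with a short Python sketch. It is only a model of the ordering, not how the cluster service is implemented: the resource labels are illustrative, and the dependency table is an assumption built from the list above (the Network Name resource depends on the IP Address resource, and the Db2 resource depends on every other resource in the group).

```python
from graphlib import TopologicalSorter

# Each key depends on the resources in its value set.
# The labels are illustrative; they are not names that Db2 or the
# cluster software creates.
dependencies = {
    "IP Address": set(),
    "Physical Disk": set(),
    "Network Name": {"IP Address"},
    "Db2 Server": {"IP Address", "Network Name", "Physical Disk"},
}


def online_order(deps):
    """Return one valid order in which the resources can come online."""
    return list(TopologicalSorter(deps).static_order())


order = online_order(dependencies)
print(order)
```

Whatever order the sorter picks for the independent resources, the Db2 Server entry is always last, which mirrors the rule that the Db2 server starts only after all other resources in the group are online.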

Failover configurations

Two types of configuration are available:
  • Active-passive
  • Mutual takeover

In a partitioned database environment, the clusters do not all need to have the same type of configuration. You can have some clusters that are set up to use active-passive, and others that are set up for mutual takeover. For example, if your Db2 instance consists of five workstations, you can have two machines set up to use a mutual takeover configuration, two to use an active-passive configuration, and one machine that is not configured for failover support.

Active-passive configuration

In an active-passive configuration, one machine in the Microsoft Failover Clustering cluster provides dedicated failover support, and the other machine participates in the database system. If the machine for the database system fails, the database server on it is started on the failover machine. If, in a partitioned database environment, you are running multiple logical nodes on a machine and it fails, the logical nodes are started on the failover machine. Figure 2 shows an example of an active-passive configuration.

Figure 2. Active-passive configuration

Mutual takeover configuration

In a mutual takeover configuration, both workstations participate in the database system (that is, each machine has at least one database server that is running on it). If one of the workstations in the Microsoft Failover Clustering cluster fails, the database server on the failing machine is started to run on the other machine. In a mutual takeover configuration, a database server on one machine can fail independently of the database server on another machine. Any database server can be active on any machine at any given point in time. Figure 3 shows an example of a mutual takeover configuration.

Figure 3. Mutual takeover configuration
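
The difference between the two configurations can be sketched as a small Python model. This is a simplified illustration for a two-machine cluster, not the clustering software's actual placement logic; the machine and database server names are hypothetical.

```python
def fail_over(placement, failed_machine, machines):
    """Return the placement after failed_machine goes down.

    placement maps each database server to the machine that hosts it.
    Servers on the failed machine are restarted on the surviving machine.
    """
    survivors = [m for m in machines if m != failed_machine]
    if not survivors:
        raise RuntimeError("no surviving machine in the cluster")
    target = survivors[0]
    return {
        server: (target if host == failed_machine else host)
        for server, host in placement.items()
    }


machines = ["machine_a", "machine_b"]

# Active-passive: machine_b hosts nothing until a failure occurs.
active_passive = {"db2_instance_1": "machine_a"}

# Mutual takeover: each machine hosts at least one database server.
mutual_takeover = {
    "db2_instance_1": "machine_a",
    "db2_instance_2": "machine_b",
}

print(fail_over(active_passive, "machine_a", machines))
print(fail_over(mutual_takeover, "machine_a", machines))
```

In the active-passive case the standby machine takes over the only database server. In the mutual takeover case the surviving machine ends up running its own database server plus the one that moved.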

Windows Server 2008 Failover Clustering support

To configure partitioned Db2 database systems to run on Windows Server 2008 (see footnote 1) failover clusters:
  1. Follow the same procedures as described in the white paper Implementing IBM® Db2 9.7 Enterprise Server edition with Microsoft Failover Clustering (see footnote 2), which is available on the developerWorks® website.
  2. Due to changes in the Failover Clustering feature of Windows Server 2008, the following additional setup might be required:
    • In Windows Server 2008 failover clusters, the Windows cluster service is run under a special Local System account, whereas in Windows Server 2003, the Windows cluster service is run under an administrator's account. This affects the operations of the Db2 resource (db2server.dll), which is run under the context of the cluster service account.

      If the DB2_EXTSECURITY registry variable is set to YES on a Windows failover cluster, the DB2ADMNS and DB2USERS groups must be domain groups.

      When a multiple partition instance is running on a Windows failover cluster, the INSTPROF path must be set to a network path (for example, \\NetName\DB2MSCS-DB2\DB2PROFS). This is done automatically if you use the db2mscs command to cluster the Db2 database system.

      When the Windows Server 2008 failover cluster is formed, a computer object that represents the new cluster is created in the Active Directory. For example, if the name of the cluster is MYCLUSTER, then a computer object MYCLUSTER is created in the Active Directory. If a user clusters a multiple partition instance and the DB2_EXTSECURITY registry variable is set to YES (the default setting), then this computer object must be added to the DB2ADMNS group. You must do this addition so that the Db2 resource DLL can access the \\NetName\DB2MSCS-DB2\DB2PROFS path. For example, if the Db2 Administrators group is MYDOMAIN\DB2ADMNS, the computer object MYCLUSTER must be added to this group. Lastly, after you add the computer object to the DB2ADMNS group, you must reboot both nodes in the cluster.

    • In Windows Server 2008 Failover Clustering, the cluster fileshare resource is no longer supported. The cluster file server is used instead. The file share (a regular file share) is based on the cluster file server resource. Microsoft requires that the cluster file servers created in the cluster use Domain Name System (DNS) for name resolution. When you are running multiple partition instances, a file server resource is required to support the file share. The values of the NETNAME_NAME, NETNAME_VALUE, and NETNAME_DEPENDENCY parameters that are specified in the db2mscs.cfg file are used to create the file server and file share resources. The NetName is based on an IP address, and this NetName must be in DNS. For example, if a db2mscs.cfg file contains the following parameters, a file share \\MSCSV\DB2MSCS-DB2 is created:
      ...
      NETNAME_NAME  = MSCSN
      NETNAME_VALUE = MSCSV
      ...
      The name MSCSV must be registered in DNS. Otherwise, the FileServer or the file share that is created for the Db2 cluster fails when DNS resolution is not successful.
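
As a rough pre-check before running db2mscs, the DNS requirement can be tested with a short Python sketch. This is an illustration under stated assumptions, not a Db2 tool: the helper names parse_cfg and resolves_in_dns are hypothetical, the parser handles only simple KEY = VALUE lines, and the share name DB2MSCS-DB2 is copied from the example above rather than derived.

```python
import socket


def parse_cfg(text):
    """Parse KEY = VALUE lines; skip any line that has no '='."""
    params = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip().upper()] = value.strip()
    return params


def resolves_in_dns(name, resolver=socket.gethostbyname):
    """Return True if the resolver can turn name into an address."""
    try:
        resolver(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


# Fragment taken from the db2mscs.cfg example above.
cfg_text = """
NETNAME_NAME  = MSCSN
NETNAME_VALUE = MSCSV
"""

params = parse_cfg(cfg_text)
netname = params["NETNAME_VALUE"]
# The share name is copied from the example, not computed.
share_path = rf"\\{netname}\DB2MSCS-DB2"
print(share_path)
```

Calling resolves_in_dns(netname) on a cluster node reports whether the name can currently be resolved. The resolver parameter exists only so that the check can be exercised without a live DNS server.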
1 The procedure that is described in this white paper is not materially different for Windows Server 2012.
2 The procedure that is described in this white paper is not materially different for Db2 11.5.