AIX, HP-UX, Linux, Oracle Solaris, and Mac OS X operating systems

Configuring the backup-archive client in a cluster environment

The backup-archive client is designed to manage the backup of cluster drives by placing the backup-archive client within the context of the cluster's resource groups.

About this task

This approach has the advantage of backing up data from local resources (rather than accessing the data across the network), which maximizes the performance of the backup operation and manages the backup data relative to the resource group. The backup-archive client can therefore always back up data on cluster resources as if the data were local and maximize backup performance. This ensures that critical data is backed up across system failures.

For example, consider an active/active cluster environment that has three physical hosts named NodeA, NodeB, and NodeC.

The nodes have the following qualities:

  • NodeA owns the cluster resources with file systems /A1 and /A2
  • NodeB owns the cluster resources with file systems /B1 and /B2
  • NodeC owns the cluster resources with file systems /C1 and /C2
Note: NodeA might also have two non-clustered volumes, /fs1 and /fs2, that must be backed up.

For best backup performance, you might want all nodes in the cluster to perform the backups of the shared file systems that they own. When a node failover occurs, the backup tasks of the failed node shift to the node to which the failover occurred. For example, when NodeA fails over to NodeB, the backup of /A1 and /A2 moves to NodeB.

Ensure that the following prerequisites are met before you configure the backup-archive client to back up cluster and non-cluster volumes:

  • A separate backup-archive client scheduler process must be run for each resource group being protected. In normal conditions, each node would have two scheduler processes: one for the cluster resources, and one for the local file systems. After a failure, additional scheduler processes are started on a node in order to protect the resources that have moved over from another node.
  • The backup-archive client password files must be stored on cluster disks so that after a failure, the generated backup-archive client password is available to the takeover node.
  • The file systems to be protected as part of a resource group are defined using the backup-archive client domain option. The domain option is specified in the dsm.sys file, which should also be stored on a cluster disk so that it can be accessed by the takeover node.
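For example, the configuration files for the resource group that is owned by NodeA can be kept together on the shared volume /A1. The layout below is only a sketch of the file placement that is used later in this procedure; the /A1/tsm directory name is an assumption for this example, not a requirement:

  /A1/tsm/dsm.opt          user-options file with the domain statement for /A1 and /A2
  /A1/tsm/pwd/             directory that is referenced by the passworddir option
  /A1/tsm/dsmsched.log     schedule log (schedlogname)
  /A1/tsm/errorlog.log     error log (errorlogname)
  /A1/tsm/startsched       script that starts the client acceptor daemon for this resource group

If the dsm.sys file is also placed on a cluster disk, it can be stored in the same directory.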

Follow the steps below to configure the backup-archive client in a cluster environment.

Procedure

  1. Register backup-archive client node definitions on the Tivoli® Storage Manager server. All nodes in the cluster must be defined on the Tivoli Storage Manager server. If you define multiple cluster resources in a cluster environment to fail over independently, then unique node names must be defined per resource group. For the sample three-way active/active cluster configuration, define three nodes (one per resource group), as follows:
     tsm: IBM>register node nodeA nodeApw domain=standard
     tsm: IBM>register node nodeB nodeBpw domain=standard
     tsm: IBM>register node nodeC nodeCpw domain=standard
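     As an optional check that is not part of the original procedure, you can confirm the registrations from the administrative command line:
     tsm: IBM>query node nodeA
     tsm: IBM>query node nodeB
     tsm: IBM>query node nodeC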
  2. Configure the backup-archive client system-options file. The dsm.sys file on each node in the cluster must contain a separate server stanza for each cluster resource group that the node can back up. Ensure that the server stanzas are identical in the system-options files on each node. Alternatively, you can place the dsm.sys file in a shared cluster location. The server stanzas that are defined to back up clustered volumes must have the following special characteristics:
    • The nodename option must refer to the client node name registered on the Tivoli Storage Manager server. If the client node name is not defined, the node name defaults to the host name of the node, which might conflict with other node names used for the same client system.
      Important: Use the nodename option to explicitly define the client node.
    • The tcpclientaddress option must refer to the service IP address of the cluster node.
    • The passworddir option must refer to a directory on the shared volumes that are part of the cluster resource group.
    • The errorlogname and schedlogname options must refer to files on the shared volumes that are part of the cluster resource group to maintain a single continuous log file.
    • All include-exclude statements must refer to files on the shared volumes that are part of the cluster resource group.
    • If you use the inclexcl option, it must refer to a file path on the shared volumes that are part of the cluster group.
    • The stanza names identified with the servername option must be identical on all systems.
  3. Other backup-archive client options can be set as needed. In the following example, all three nodes, NodeA, NodeB, and NodeC, must have the following three server stanzas in their dsm.sys file:
    Servername        server1_nodeA
    nodename          NodeA
    commmethod        tcpip
    tcpport           1500
    tcpserveraddress  server1.example.com
    tcpclientaddress  nodeA.example.com
    passwordaccess    generate
    passworddir       /A1/tsm/pwd
    managedservices   schedule
    schedlogname      /A1/tsm/dsmsched.log
    errorlogname      /A1/tsm/errorlog.log
    
    Servername        server1_nodeB
    nodename          NodeB
    commmethod        tcpip
    tcpport           1500
    tcpserveraddress  server1.example.com
    tcpclientaddress  nodeB.example.com
    passwordaccess    generate
    passworddir       /B1/tsm/pwd
    managedservices   schedule
    schedlogname      /B1/tsm/dsmsched.log
    errorlogname      /B1/tsm/errorlog.log
    
    Servername        server1_nodeC
    nodename          NodeC
    commmethod        tcpip
    tcpport           1500
    tcpserveraddress  server1.example.com
    tcpclientaddress  nodeC.example.com
    passwordaccess    generate
    passworddir       /C1/tsm/pwd
    managedservices   schedule
    schedlogname      /C1/tsm/dsmsched.log
    errorlogname      /C1/tsm/errorlog.log
  4. Configure the backup-archive client user-options file. The options file (dsm.opt) must reside on the shared volumes in the cluster resource group. Define the DSM_CONFIG environment variable to refer to this file. Ensure that the dsm.opt file contains the following settings:
    • The value of the servername option must be the name of the server stanza in the dsm.sys file that defines the parameters for backing up clustered volumes.
    • Define the clustered file systems to be backed up with the domain option.
      Note: Define the domain option in the dsm.opt file, or specify the option in the schedule or on the backup-archive client command line, to restrict clustered operations to cluster resources and non-clustered operations to non-clustered resources.

    In the example, nodes NodeA, NodeB, and NodeC set up their corresponding dsm.opt file and DSM_CONFIG environment variable as follows:

    NodeA: 
    
    1) Set up the /A1/tsm/dsm.opt file:
    
    servername server1_nodeA
    domain     /A1 /A2
    
    2) Issue the following command or include it in your user profile:
    
    export DSM_CONFIG=/A1/tsm/dsm.opt
    
    NodeB:
    
    1) Set up the /B1/tsm/dsm.opt file:
    
    servername server1_nodeB
    domain     /B1 /B2
    
    2) Issue the following command or include it in your user profile:
    
    export DSM_CONFIG=/B1/tsm/dsm.opt
    
    NodeC:
    
    1) Set up the /C1/tsm/dsm.opt file:
    
    servername server1_nodeC
    domain     /C1 /C2
    
    2) Issue the following command or include it in your user profile:
    
    export DSM_CONFIG=/C1/tsm/dsm.opt
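    As an optional sanity check that is not part of the original procedure, you can confirm on each node that DSM_CONFIG resolves to the intended options file and that the expected domain is in effect. For example, on NodeA:
    export DSM_CONFIG=/A1/tsm/dsm.opt
    dsmc query options | grep -i domain
    The exact output format of the query options command can vary by client level, so treat this only as a quick verification sketch.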
  5. Set up the schedule definitions for each cluster resource group. After the basic setup is completed, define the automated schedules to back up cluster resources to meet the backup requirements. The procedure illustrates the schedule setup by using the built-in Tivoli Storage Manager scheduler. If you are using a vendor-acquired scheduler, refer to the documentation provided by the scheduler vendor.
    • Define a schedule in the policy domain where cluster nodes are defined. Ensure that the schedule's startup window is large enough to restart the schedule on the failover node in case of a failure and fallback event. This means that the schedule's duration must be set to longer than the time it takes to complete the backup of the cluster data for that node, under normal conditions.

      If the reconnection occurs within the start window for that event, the scheduled command is restarted. This scheduled incremental backup reexamines files sent to the server before the failover. The backup then "catches up" to where it stopped before the failover situation.

      In the following example, the clus_backup schedule is defined in the standard domain to start the backup at 12:30 A.M. every day with the duration set to two hours (which is the normal backup time for each node's data).
      tsm: IBM>define schedule standard clus_backup action=incr
      starttime=00:30 startdate=TODAY  Duration=2
    • Associate the schedule with all of the backup-archive client nodes that are defined to back up cluster resources, as follows:
      tsm: IBM>define association standard clus_backup nodeA
      tsm: IBM>define association standard clus_backup nodeB
      tsm: IBM>define association standard clus_backup nodeC
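      As an optional check that is not part of the original procedure, you can verify the schedule and its node associations from the administrative command line:
      tsm: IBM>query schedule standard clus_backup
      tsm: IBM>query association standard clus_backup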
  6. Set up the scheduler service for backup. On each client node, a scheduler service must be configured for each resource that the node is responsible for backing up, under normal conditions. The DSM_CONFIG environment variable for each resource scheduler service must be set to refer to the corresponding dsm.opt file for that resource. For the sample configuration, the following shell scripts must be created to allow dsmcad processes to be started, as needed, from any node in the cluster.
    NodeA: /A1/tsm/startsched
    #!/bin/ksh
    export DSM_CONFIG=/A1/tsm/dsm.opt
    dsmcad
    NodeB: /B1/tsm/startsched
    #!/bin/ksh
    export DSM_CONFIG=/B1/tsm/dsm.opt
    dsmcad
    NodeC: /C1/tsm/startsched
    #!/bin/ksh
    export DSM_CONFIG=/C1/tsm/dsm.opt
    dsmcad
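    Each script must be executable by the user that starts the client acceptor daemon (typically root). A minimal sketch for NodeA, run on the node that currently owns the /A1 resource group:
    chmod +x /A1/tsm/startsched
    /A1/tsm/startsched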
  7. Define the backup-archive client to the cluster application. The Tivoli Storage Manager scheduler service for each cluster client node must be defined as a resource to the cluster application so that it participates in failover processing. This is required so that the backup of a failed resource can continue from the node that takes over that resource; otherwise, the backup of the failed resource would be incomplete. The sample scripts in step 6 can be associated with the cluster resources to ensure that they are started on nodes in the cluster as the disk resources being protected move from one node to another. The actual steps that are required to set up the scheduler service as a cluster resource are specific to the cluster software; refer to your cluster application documentation for additional information. A generic wrapper is sketched after this paragraph.
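     Because the resource definition is specific to the cluster software, the following ksh script is only a sketch of a generic start/stop wrapper that a cluster resource manager could call for the NodeA resource group. The script name, its location in /A1/tsm, and the process-matching logic are assumptions for this example, not part of the product:
     #!/bin/ksh
     # /A1/tsm/clustsm - sketch of a start/stop wrapper for the scheduler of the NodeA resource group
     export DSM_CONFIG=/A1/tsm/dsm.opt
     case "$1" in
       start)
         # Start the client acceptor daemon, which manages the scheduler for this resource group
         dsmcad
         ;;
       stop)
         # Simplistic stop: this match ends every dsmcad on the node, including one that protects
         # local file systems; a production script would track the PID of the dsmcad it started.
         pids=$(ps -ef | grep '[d]smcad' | awk '{print $2}')
         [ -n "$pids" ] && kill $pids
         ;;
     esac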
  8. Ensure that the password for each node is generated and cached correctly in the location specified using the passworddir option. This can be validated by performing the following steps:
    1. Validate that each node can connect to the Tivoli Storage Manager server without the password prompt. You can do this by running the backup-archive client command line interface and issuing the following command on each node:
      #dsmc query session

      If you are prompted for the password, enter it so that the command completes, and then rerun the command. The second time, the command should run without prompting for the password. If you are still prompted, check your configuration.

    2. Validate that the other nodes in the cluster can start sessions with the Tivoli Storage Manager server on behalf of the failed-over node. You can do this by running the same commands, as described in the previous step, on the takeover nodes. For example, to validate that NodeB and NodeC can start a session as NodeA in a failover event without being prompted for the password, issue the following commands on NodeB and NodeC (a combined check for all resource groups is sketched at the end of this step):
      #export DSM_CONFIG=/A1/tsm/dsm.opt
      #dsmc query session

      You should not be prompted for the password. If you are prompted, the password was not stored correctly in the shared location. Check the passworddir option setting that is used for NodeA and follow the configuration steps again.

    3. Ensure that the schedules are run correctly by each node. You can trigger a schedule by setting the schedule's start time to now. Remember to reset the start time after testing is complete.
      tsm: IBM>update sched standard clus_backup starttime=now
    4. Perform a failover and fallback between nodeA and nodeB while nodeA is in the middle of a backup and the schedule's start window is still open. Verify that the incremental backup continues to run and finishes successfully after the failover and fallback.
    5. Issue the following command to cause the password for a node (nodeA) to expire. Ensure that backups continue normally under normal cluster operations, as well as during failover and fallback:
      tsm: IBM>update node nodeA forcep=yes
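      The following ksh sketch combines the session checks from this step into one loop. Run it on a node and it checks every resource group whose shared volume is currently mounted on that node; the script itself is an illustration only and is not part of the original procedure:
      #!/bin/ksh
      # Sketch: verify that the cached password works for each resource group options file
      for optfile in /A1/tsm/dsm.opt /B1/tsm/dsm.opt /C1/tsm/dsm.opt
      do
        if [ -f "$optfile" ]; then                  # skip volumes that are not mounted here
          export DSM_CONFIG=$optfile
          echo "Checking session with DSM_CONFIG=$optfile"
          dsmc query session </dev/null || echo "Password prompt or error for $optfile"
        fi
      done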
  9. Configure the backup-archive client to back up local resources.
    1. Define client nodes on the Tivoli Storage Manager server. Local resources should never be backed up or archived using node names defined to back up cluster data. If local volumes that are not defined as cluster resources are backed up, separate node names (and separate client instances) must be used for both non-clustered and clustered volumes.

      In the following example, assume that only NodeA has local file systems /fs1 and /fs2 to be backed up. To manage the local resources, register a node NodeA_local on the Tivoli Storage Manager server:
      tsm: IBM>register node nodeA_local nodeA_localpw domain=standard

    2. In the system-options file (dsm.sys) on each node that must back up local resources, add a separate stanza with the following special characteristics:
      • The value of the tcpclientaddress option must be the local host name or IP address. This is the IP address used for primary traffic to and from the node.
      • If the client backs up and restores non-clustered volumes without being connected to the cluster, the value of the tcpclientaddress option must be the boot IP address. This is the IP address used to start the system (node) before it rejoins the cluster:
        Example stanza for NodeA_local: 
        
        Servername        server1_nodeA_local
        nodename          nodeA_local
        commmethod        tcpip
        tcpport           1500
        tcpserveraddress  server1.example.com
        tcpclientaddress  nodeA_host.example.com
        passwordaccess    generate
        managedservices   schedule
    3. Define the user options file dsm.opt in a path that is on a non-clustered resource.
      • The value of the servername option must be the name of the server stanza in the dsm.sys file that defines the parameters for backing up non-clustered volumes.
      • Use the domain option to define the non-clustered file systems to be backed up.
      Note: Define the domain option in the dsm.opt file, or specify the option in the schedule or on the backup-archive client command line, to restrict backup-archive operations to non-clustered volumes.

      In the following example, nodeA uses the /home/admin/A1.dsm.opt file and sets the DSM_CONFIG environment variable to refer to /home/admin/A1.dsm.opt.

      Contents of /home/admin/A1.dsm.opt
      servername server1_nodeA_local
      domain     /fs1 /fs2
      
      
      export DSM_CONFIG=/home/admin/A1.dsm.opt
    4. Define and set up a schedule to perform the incremental backup for non-clustered file systems.
      tsm: IBM>define schedule standard local_backup action=incr 
      starttime=00:30 startdate=TODAY  Duration=2
      Associate the schedule with all of the backup-archive client nodes that are defined to back up non-clustered resources:
      tsm: IBM>define association standard local_backup nodeA_local
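      Under normal conditions, each node also runs a second scheduler process for its local file systems, as noted in the prerequisites earlier in this topic. A minimal sketch of a start script for NodeA, assuming a script location of /home/admin/startsched_local:
      #!/bin/ksh
      export DSM_CONFIG=/home/admin/A1.dsm.opt
      dsmcad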
  10. Restore cluster file system data. All volumes in a cluster resource are backed up under the target node that is defined for that cluster resource. If you need to restore data that resides on a cluster volume, restore it from the client node that owns the cluster resource at the time of the restore. To restore the data, the backup-archive client must use the same user options file (dsm.opt) that was used during the backup. No additional setup is required to restore data on cluster volumes.
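      For example, to restore files that were backed up from /A1, run the following commands on the node that currently owns that resource group. This is only a sketch; the restored file specification is an assumption for illustration:
      export DSM_CONFIG=/A1/tsm/dsm.opt
      dsmc restore "/A1/*" -subdir=yes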
  11. Restore local file system data. The non-clustered volumes are backed up under the separate node name that is set up for non-clustered operations. To restore this data, the backup-archive client must use the same user options file (dsm.opt) that was used during the backup. In the example, set the DSM_CONFIG environment variable to refer to /home/admin/A1.dsm.opt before you perform a client restore for the local node nodeA_local.
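      A corresponding sketch for the local node, again with an illustrative file specification:
      export DSM_CONFIG=/home/admin/A1.dsm.opt
      dsmc restore "/fs1/*" -subdir=yes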