Electronic vaulting using deduplicated remote copy storage pools
Added by byrnetucson, last edited by byrnetucson on Oct 06, 2011


The information in this article is provided and maintained by IBM.
Comments are welcome, but the information is not editable.



This paper describes an electronic vaulting solution in which deduplicated data that is stored in Tivoli Storage Manager storage pools is replicated to a remote site. The server database is replicated to a remote standby server using the DB2 HADR function. No specific hardware is required for this solution.

Solution benefits

Many Tivoli Storage Manager customers use tape copy storage pools and send the copy storage pool tapes to a remote site for disaster recovery. To get the same disaster recovery protection but avoid the logistics and security exposure of physically moving tapes, customers are interested in an electronic vaulting solution. With electronic vaulting, data is stored in Tivoli Storage Manager storage pools and replicated to a remote site for disaster recovery and failover. Remote electronic vaulting can also potentially reduce recovery time objectives (RTO) and recovery point objectives (RPO) compared to the traditional tape-transport approach. Bandwidth consumption and storage requirements can be reduced by transferring data with deduplication, wherein data chunks are not sent if they already reside at the remote location.

From a Tivoli Storage Manager perspective, this solution is not a high availability clustering solution where a failover occurs almost imperceptibly to Tivoli Storage Manager users and administrators. The solution is intended as a site disaster recovery solution that promises less down time than PTAM (pickup truck access method).

Solution overview

This solution provides deduplicated replication of Tivoli Storage Manager data with no dependency on specific hardware. Following are key elements of the solution.

  • Use client-side deduplication to store data to the Tivoli Storage Manager server. This ensures that data is deduplicated at the time it is stored in a sequential-access primary storage pool, conserving network bandwidth for storage pool backup to the copy pool volumes. Using client-side deduplication also minimizes later data transfer for reclamation between the primary and secondary sites.
  • Back up the storage pool data to a deduplicated copy storage pool at a secondary site using synchronous writes to network-attached disk. Synchronous write ensures that data has been written to the copy storage pool volumes before the server database is updated to show that the data has been copied. Synchronous write prevents synchronization problems between the data itself and the metadata in the server database.
  • Use DB2 HADR to replicate the Tivoli Storage Manager database, giving improved RPO and RTO as compared to using database backup. DB2 HADR is a log shipping facility, so replication means that log changes on the primary DB2 database are continuously applied to a standby, "warm" DB2 database at the secondary site.

When a failure occurs at the primary site, recovery requires the following steps:

  1. Start the Tivoli Storage Manager server at the secondary site using the replicated DB2 database. Thus the secondary server is a warm, not hot, standby.
  2. Switch the clients to the Tivoli Storage Manager server at the secondary site either by Domain Name System (DNS) redirection or by updates to client option files. Requests from clients to access existing objects are handled by accessing them from the copy storage pool at the secondary site. The clients can also store new objects using the secondary server.
  3. To resume operations at the primary location after an extended outage it should be possible to prime the replacement primary database using the same steps that were originally used to establish the secondary site. After HADR synchronization of the Tivoli Storage Manager database it should be possible to fail back to the primary site server and restore primary storage pool content from the copy storage pools.

This solution uses a copy storage pool on shared network disk instead of using the replication functions of a storage system to replicate primary storage pools. The use of a copy storage pool on NAS disk has the following advantages:

  • The backup stgpool operation is asynchronous with client backups to a primary storage pool. Therefore client operations are not as heavily affected by network performance between the primary and secondary systems.
  • The backup stgpool operation validates primary storage pool object states. For example primary disk errors are not automatically replicated to the secondary site.

Note: Tivoli Storage Manager provides both client-side and server-side deduplication capabilities. HADR can be successfully used with deduplication configurations other than the one proposed in this paper. See the Data deduplication best practices for Tivoli Storage Manager V6.2 white paper for recommendations on determining an appropriate deduplication strategy for your environment.

Configuration and operations overview

Configure a pair of servers, one at the primary site and one at the secondary site. In this example configuration, host TSMAWP05 is at the primary site and host TSMAWP06 is at the secondary or standby site.

Item                     Primary site                   Secondary site (standby)
Server name              TSMAWP05                       TSMAWP06
Database instance name   tsminst1                       tsminst1
Database (TSMDB1)        HADR primary node              HADR standby node
Server status            Running                        Not running
Primary storage pools    Active                         Not active
Copy storage pools       Network-attached file system   Network-attached file system
                         accessible to both servers     accessible to both servers

During normal operation the Tivoli Storage Manager server runs on host TSMAWP05. Changes made to the server database, TSMDB1, during normal server operation are replicated to a standby copy of the TSMDB1 database on host TSMAWP06 using the DB2 HADR feature. The deduplicated primary storage pool of the TSMAWP05 server is backed up both to a local, normal copy storage pool, and to a deduplicated copy storage pool on a NAS device.

It is important to have adequate bandwidth between the primary and secondary sites for both the copy storage pool data and the HADR database traffic. Using HADR can degrade performance of the database on your primary system. The degradation depends heavily on the latency between the two sites, so very long distances between primary and secondary sites might not be practical.
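
As a rough planning aid for the bandwidth question, the following sketch estimates how long a given amount of deduplicated copy storage pool data takes to cross a link. The function name and the input figures are hypothetical and are not taken from this paper; the estimate ignores protocol overhead, HADR log traffic, and latency effects.

```sh
#!/bin/sh
# Rough estimate of copy storage pool transfer time.
# Arguments (hypothetical planning inputs, not values from this paper):
#   $1 = deduplicated data to send, in GB (10^9 bytes)
#   $2 = usable link bandwidth, in Mbit/s
transfer_hours() {
    awk -v gb="$1" -v mbps="$2" 'BEGIN {
        seconds = (gb * 8 * 1000) / mbps   # GB -> Mbit, then divide by Mbit/s
        printf "%.2f\n", seconds / 3600
    }'
}

# Example: 450 GB of new data over a 1000 Mbit/s link prints 1.00 (hours)
transfer_hours 450 1000
```

If the result approaches the length of your backup storage pool window, either the link or the amount of data sent per run probably needs to be revisited.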


 

When a failure occurs at the primary site, operations can fail over to the secondary site. The standby status of the TSMAWP06 server's TSMDB1 database is changed, and the server is started. Backup-archive clients can restore and back up data by connecting to the TSMAWP06 server at the secondary site. The server gets the client data from the deduplicated copy storage pool on the NAS device. Optionally, the primary storage pool can be restored to a disk on the secondary server.


 

Additional licensing requirements to use HADR

Licensing terms for the Tivoli Storage Manager server include the use of the Bundled DB2 Edition software.

  • To use HADR to replicate from one Tivoli Storage Manager server (with Bundled DB2) to another Tivoli Storage Manager server (with Bundled DB2), no additional DB2 licensing is required.
  • To use HADR to replicate from a Tivoli Storage Manager server (with Bundled DB2) to a non-bundled DB2 instance, you as the Licensee must purchase 100 Processor Value Units (PVUs) to obtain a license for DB2 Enterprise Edition (bundled). Contact your sales representative.

Recommendations for the shared copy storage pool destination

You need a disk-based network-attached storage file system that can be shared between the two nodes.
The main purpose of this shared file system is to hold the output volumes from the backup storage pool operation. The file system must be reliably accessible from both nodes; otherwise, restores from the secondary site will fail. Consider the following items when selecting a network file system solution:

  • Reliability and data integrity
    See the recommendations for disk subsystems in the Tivoli Storage Manager documentation.
    The Tivoli Storage Manager server uses the O_SYNC flag on writes to request that the file system actually writes to disk before returning success from the operation. Do not use file system options that defeat O_SYNC via caching. For example, an NFS client must be configured with option 'hard' and an NFS server must be configured with export option 'sync'. These options will result in slower performance, but consider the slower performance in light of the fact that less data is written over time because the storage pool is deduplicated.
  • Write speed
    Typically there will be more writes to this file system than reads, because actual disaster recovery situations should be rare but backup storage pool operations should occur frequently. Writes should also complete before HADR applies the corresponding database log entries on the secondary system. The philosophy is that if the system crashes, you are better off with missing database entries than with missing copy storage pool data.
  • Cost
    When you consider cost, also consider the value of the data. Use high-quality NAS appliances that guarantee proper O_SYNC operation.
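
The NFS options named in the reliability item can be checked mechanically. The following is a minimal sketch, assuming comma-separated option strings in the style used by Linux mount and export tables; the function names are hypothetical, and other platforms might present the options differently.

```sh
#!/bin/sh
# Succeeds only if a comma-separated NFS client mount option string
# contains "hard" and does not contain "soft".
nfs_client_opts_ok() {
    case ",$1," in
        *,soft,*) return 1 ;;
    esac
    case ",$1," in
        *,hard,*) return 0 ;;
    esac
    return 1
}

# Succeeds only if a comma-separated NFS server export option string
# contains "sync" and does not contain "async".
nfs_export_opts_ok() {
    case ",$1," in
        *,async,*) return 1 ;;
    esac
    case ",$1," in
        *,sync,*) return 0 ;;
    esac
    return 1
}

# Example with hypothetical option strings
nfs_client_opts_ok "rw,hard,intr" && echo "client options look safe"
nfs_export_opts_ok "rw,sync"      && echo "export options look safe"
```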

Configuring the systems

Configure the systems by completing the following steps:

  1. Installing the Tivoli Storage Manager servers
  2. Synchronizing time on the servers
  3. Priming the HADR database
  4. Configuring HADR
  5. Starting HADR
  6. Configuring storage pools and policy
  7. Configuring backup-archive clients

Installing the Tivoli Storage Manager servers

  1. Installing the Tivoli Storage Manager server on tsmawp05 (primary site)
    Use the procedures at "Installing Tivoli Storage Manager using the installation wizard". Then use the procedures at "Taking the first steps after you install Tivoli Storage Manager". From this topic, follow the link to the topic "Configuring Tivoli Storage Manager using the configuration wizard", which directs you to start the dsmicfgx program to configure the server instance. If you do not have a GUI available, use "Configuring the server instance manually".
  2. Installing the Tivoli Storage Manager server on tsmawp06 (secondary site)
    Use the procedures at "Installing Tivoli Storage Manager using the installation wizard". Then use the procedures at "Taking the first steps after you install Tivoli Storage Manager". From this topic, follow the link to the topic "Configuring Tivoli Storage Manager using the configuration wizard", which directs you to start the dsmicfgx program to configure the server instance. If you do not have a GUI available, use "Configuring the server instance manually".
    Important:
    • Create the instance user ID and all the directories (database, active log, and archive log) the same as you did on the primary server.
    • Do not start the Tivoli Storage Manager server.

Synchronizing time on the servers

The clocks for the primary and secondary systems must be synchronized using facilities such as Network Time Protocol (NTP).

Tivoli Storage Manager server operations can be affected by time changes. Set the time zone on the secondary system to match the primary system before you start the Tivoli Storage Manager server on the secondary system during a failover operation.

The distance between the nodes will affect selection of HADR modes SYNC, NEARSYNC, and ASYNC and related recovery point objectives.

Priming the HADR database

You must initialize or prime the TSMDB1 database on the standby system so that subsequent log updates that occur on the primary TSMDB1 database can be applied to the standby TSMDB1 database. You do the priming using the DB2 backup db utility.

  1. Back up the database on tsmawp05
    tsm:server1> halt
    su - tsminst1
    db2 backup db tsmdb1 to /space/mx/hadrtest
    

    Do not start the server (do not issue the dsmserv command).

  2. Restore the tsmawp05 database to the server on tsmawp06
    Stop the Tivoli Storage Manager server if it is running.
    su - tsminst1
    db2 drop db tsmdb1
    db2 restore db tsmdb1 from /space/mx/hadrtest
    
    Do not start the server (do not issue the dsmserv command).

Configuring HADR

You must set the following DB2 database configuration parameters on each HADR node.

Parameter          Description
hadr_local_host    Local host name
hadr_local_svc     Local TCP/IP port to be assigned to HADR process
hadr_remote_host   Remote host name that the peer HADR resides on
hadr_remote_inst   Remote database instance that the peer TSMDB1 database resides in
hadr_remote_svc    Remote port of the peer HADR process
hadr_syncmode      How primary log writes are synchronized with standby
hadr_timeout       Time HADR process waits before communication attempt with peer is considered as failed

For more details on these database configuration parameters see:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?nav=/2_2_6_6

Determining available TCP port to be assigned for use by HADR on tsmawp05

Find an available port to be assigned to the HADR process. Use the available port as the value of hadr_local_svc and, for the peer, hadr_remote_svc. Issue the command:

cat /etc/services

Observe ports currently being used by tsminst1 DB2 instance:

  DB2_tsminst1      60000/tcp
  DB2_tsminst1_1    60001/tcp
  DB2_tsminst1_2    60002/tcp
  DB2_tsminst1_END  60003/tcp

For this scenario 60010 was chosen because it was not in use in /etc/services and was an easy number to remember.

Tip: Other applications on the system might not put entries in /etc/services, yet are configured to use the port that you select. For example, other instances of HADR might exist on the same system. The "netstat -an" command can show you which ports are currently in use. Some trials might be necessary to find a port number that is not in use on the system.
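
The two checks described above (the /etc/services entries and the ports shown by netstat -an) can be combined in a small helper. This is a sketch only: the function names are hypothetical, and the netstat matching assumes that the local address is the fourth column and ends in ".port" or ":port", which can vary by platform.

```sh
#!/bin/sh
# Succeeds if the TCP port in $1 has an entry in services-format text
# read from standard input (for example, the contents of /etc/services).
port_in_services() {
    awk -v port="$1" '
        /^[[:space:]]*#/ { next }              # skip comment lines
        $2 == (port "/tcp") { found = 1 }      # field 2 looks like 60000/tcp
        END { exit found ? 0 : 1 }
    '
}

# Succeeds if the TCP port in $1 appears as a local port in
# "netstat -an" output read from standard input.
port_in_netstat() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $4 ~ ("[.:]" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Typical use on each node (not run here):
#   port_in_services 60010 < /etc/services && echo "60010 is registered"
#   netstat -an | port_in_netstat 60010    && echo "60010 is in use"
```

A port that passes neither check is a reasonable candidate, although only an actual HADR start confirms that it is usable.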

Determining available TCP port to be assigned for use by HADR on tsmawp06

Find an available port to be assigned to the HADR process. This will be the value of hadr_local_svc and, for the peer, hadr_remote_svc. Issue the command:

cat /etc/services

Observe ports currently being used by tsminst1 DB2 instance:

  DB2_tsminst1      60000/tcp
  DB2_tsminst1_1    60001/tcp
  DB2_tsminst1_2    60002/tcp
  DB2_tsminst1_END  60003/tcp

For this scenario 60010 was chosen because it was not in use in /etc/services, was an easy number to remember, and was consistent with the port selected for the primary server.

Tip: Other applications on the system might not put entries in /etc/services, yet are configured to use the port that you select. For example, other instances of HADR might exist on the same system. The "netstat -an" command can show you which ports are currently in use. Some trials might be necessary to find a port number that is not in use on the system.

Configuring HADR on the primary node, tsmawp05

Issue the following sequence of commands to configure HADR on the primary node. Ensure that you are running under the DB2 instance that TSMDB1 is contained in.

su - tsminst1

db2 update db cfg for tsmdb1 using hadr_local_host  tsmawp05.storage.tucson.ibm.com
db2 update db cfg for tsmdb1 using hadr_local_svc   60010
db2 update db cfg for tsmdb1 using hadr_remote_host tsmawp06.storage.tucson.ibm.com
db2 update db cfg for tsmdb1 using hadr_remote_inst tsminst1
db2 update db cfg for tsmdb1 using hadr_remote_svc  60010
db2 update db cfg for tsmdb1 using hadr_syncmode    SYNC
db2 update db cfg for tsmdb1 using hadr_timeout     120

Configuring HADR on the standby node, tsmawp06

Issue the following sequence of commands to configure HADR on the standby node. Ensure that you are running under the DB2 instance that TSMDB1 is contained in.

su - tsminst1

db2 update db cfg for tsmdb1 using hadr_local_host  tsmawp06.storage.tucson.ibm.com
db2 update db cfg for tsmdb1 using hadr_local_svc   60010
db2 update db cfg for tsmdb1 using hadr_remote_host tsmawp05.storage.tucson.ibm.com
db2 update db cfg for tsmdb1 using hadr_remote_inst tsminst1
db2 update db cfg for tsmdb1 using hadr_remote_svc  60010
db2 update db cfg for tsmdb1 using hadr_syncmode    SYNC
db2 update db cfg for tsmdb1 using hadr_timeout     120
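
Because the two nodes use the same seven settings with the host names swapped, the command lists can be generated rather than typed twice, which reduces the risk of a host name typo. The following sketch only prints the commands; the function name print_hadr_cfg is hypothetical, and the instance and database names are the ones used in this scenario.

```sh
#!/bin/sh
# Print (do not run) the db2 commands that configure HADR for one node.
#   $1 = local host name   $2 = remote host name   $3 = HADR port
#   $4 = sync mode (defaults to SYNC)   $5 = timeout in seconds (defaults to 120)
print_hadr_cfg() {
    local_host=$1 remote_host=$2 port=$3 mode=${4:-SYNC} timeout=${5:-120}
    for kv in \
        "hadr_local_host $local_host" \
        "hadr_local_svc $port" \
        "hadr_remote_host $remote_host" \
        "hadr_remote_inst tsminst1" \
        "hadr_remote_svc $port" \
        "hadr_syncmode $mode" \
        "hadr_timeout $timeout"
    do
        echo "db2 update db cfg for tsmdb1 using $kv"
    done
}

# Primary node, tsmawp05
print_hadr_cfg tsmawp05.storage.tucson.ibm.com tsmawp06.storage.tucson.ibm.com 60010
# Standby node, tsmawp06: the same call with the host names swapped
print_hadr_cfg tsmawp06.storage.tucson.ibm.com tsmawp05.storage.tucson.ibm.com 60010
```

Review the printed commands, then run them as the instance user (tsminst1) on the corresponding node.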

Starting HADR

  1. Start HADR on tsmawp06, the secondary server. Issue the following commands:
    db2 start hadr on db tsmdb1 as standby
    
    db2pd -hadr -db tsmdb1, observe Role Standby, State Disconnected
    
  2. Start HADR on tsmawp05, the primary server. Issue the following commands:
    db2 start hadr on db tsmdb1 as primary
    
    cd /home/tsminst1
    /opt/tivoli/tsm/server/bin/dsmserv -q &
    
    db2pd -hadr -db tsmdb1, observe Role Primary, State Peer, LogGapRunAvg not 0
    

If you receive message SQL1766W when starting HADR: The message SQL1766W might be issued when you run the start HADR commands because by default the Tivoli Storage Manager database is configured with the DB2 database parameter LOGINDEXBUILD set to OFF. This setting reduces log space required by the Tivoli Storage Manager server. However, it will likely cause a longer restart process (possibly hours) during failover to the standby system. For more information on the effects of setting LOGINDEXBUILD to ON see technote.

Configuring storage pools and policy

Configure the tsmawp05 storage pools and policy.

To support deduplication, the copy storage pools must use a device class with device type FILE. The copy storage pool is on network-attached storage (NAS) disk shared between the primary and secondary systems.

The Tivoli Storage Manager administrative commands that are issued to set up the device class, storage pools, and policy in this scenario are:

define devclass filesloc devtype=file format=drive maxcap=2g mountl=2 directory=/lspace/devclassfiles/tsminst1 shared=no

define devclass filesnet devtype=file format=drive maxcap=2g mountl=2
   directory=/space/mx/hadrtest/tsmawp05_tsmawp06/tsminst1/files_net shared=no

define stgpool dedup filesloc pooltype=primary maxscratch=999 dedup=yes

define stgpool dedupcopy filesnet pooltype=copy maxscratch=999 dedup=yes

update copygroup standard standard standard type=backup destination=dedup
activate policyset standard standard

Backing up the deduplicated storage pool

Back up the primary storage pool to the copy storage pool:

backup stgpool dedup dedupcopy wait=yes

Ensure that this operation occurs regularly by scheduling it. For example, use the DEFINE SCHEDULE command, or add it to a maintenance script that is created using the Tivoli Storage Manager Administration Center.
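
One possible way to schedule the operation is an administrative schedule. The following sketch prints a DEFINE SCHEDULE command for a daily run; the schedule name dedup_copy and the 20:00 start time are hypothetical, so choose values that fit your maintenance window and verify the parameters against the Administrator's Reference for your server level.

```sh
#!/bin/sh
# Print an administrative DEFINE SCHEDULE command for a daily storage pool backup.
#   $1 = schedule name (hypothetical)   $2 = primary storage pool
#   $3 = copy storage pool              $4 = start time, HH:MM
print_backup_schedule() {
    printf 'define schedule %s type=administrative cmd="backup stgpool %s %s" active=yes starttime=%s period=1 perunits=days\n' \
        "$1" "$2" "$3" "$4"
}

print_backup_schedule dedup_copy dedup dedupcopy 20:00
```

Issue the printed command from an administrative command-line session on the primary server.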

Configuring backup-archive clients

  1. Register nodes on the Tivoli Storage Manager server with the DEDUPLICATION parameter set to allow the client-side data deduplication. For example:
    register node somenode somepass deduplication=clientorserver
    
  2. Set required options in the backup-archive client dsm.opt file
    TCPSERVERADDRESS tsmawp05.storage.tucson.ibm.com
    
    TCPPORT 1500
    
    DEDUPL YES
    
    NODENAME somenode
    
  3. Run a typical backup operation on the backup-archive client to verify the setup.
    C:\Program Files\Tivoli\TSM\baclient>dsmc inc c:\tsmdrivers\v620d062\drm*
    -tcpserv=tsmawp05.storage.tucson.ibm.com -node=somenode -pass=somepass -dedupl=yes
    

Backup-archive client reroute (virtual IP) definition

The Tivoli Storage Manager backup-archive client has no native mechanism to reroute to a different Tivoli Storage Manager server IP address. Automatic rerouting therefore requires Tivoli System Automation (TSA) or another cluster mechanism that supports a virtual IP address (VIP). Without such a mechanism, for disaster recovery you must either manually update the client dsm.opt file to change the tcpserveraddress option, or specify the -tcpserveraddr option on the command line.
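
The manual update of the client options file can be scripted on clients where a shell is available. This is a minimal sketch, assuming the option is spelled out in full as TCPSERVERADDRESS (abbreviated spellings are not matched); the function name is hypothetical.

```sh
#!/bin/sh
# Rewrite the TCPSERVERADDRESS line of a client options file read on
# standard input, leaving every other line unchanged.
#   $1 = host name of the server that clients should now contact
retarget_client_opts() {
    awk -v new="$1" '
        toupper($1) == "TCPSERVERADDRESS" { print $1, new; next }
        { print }
    '
}

# Typical use (not run here); keep a copy of the original file:
#   cp dsm.opt dsm.opt.bak
#   retarget_client_opts tsmawp06.storage.tucson.ibm.com < dsm.opt.bak > dsm.opt
```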

Commands to use to fail over to the secondary site

  1. Shut down the Tivoli Storage Manager server on tsmawp05:
    tsm:server1> halt
    db2start
    db2 start hadr on db tsmdb1 as standby
    db2pd -hadr -db tsmdb1, observe Role Standby, State Catchup
    
  2. Take over TSMDB1 on tsmawp06:
    db2 takeover hadr on db tsmdb1 by force
    
    cd /home/tsminst1
    /opt/tivoli/tsm/server/bin/dsmserv -q &
    
    db2pd -hadr -db tsmdb1, observe Role Primary, State Peer
    

Configuration reports

You can get information from DB2 about the HADR configuration and status.

HADR configuration information

On the primary system, tsmawp05, issue the command:

 db2 get db cfg for tsmdb1 | grep HADR

Typical results:

$ db2 get db cfg for tsmdb1 | grep HADR
HADR database role = PRIMARY
HADR local host name (HADR_LOCAL_HOST) = tsmawp05.storage.tucson.ibm.com
HADR local service name (HADR_LOCAL_SVC) = 60010
HADR remote host name (HADR_REMOTE_HOST) = tsmawp06.storage.tucson.ibm.com
HADR remote service name (HADR_REMOTE_SVC) = 60010
HADR instance name of remote server (HADR_REMOTE_INST) = tsminst1
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = SYNC
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 0
$

On the secondary system, tsmawp06, issue the command:

db2 get db cfg for tsmdb1 | grep HADR

Typical results:

$ db2 get db cfg for tsmdb1 | grep HADR
HADR database role = STANDBY
HADR local host name (HADR_LOCAL_HOST) = tsmawp06.storage.tucson.ibm.com
HADR local service name (HADR_LOCAL_SVC) = 60010
HADR remote host name (HADR_REMOTE_HOST) = tsmawp05.storage.tucson.ibm.com
HADR remote service name (HADR_REMOTE_SVC) = 60010
HADR instance name of remote server (HADR_REMOTE_INST) = tsminst1
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = SYNC
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 0
$
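
The two reports should mirror each other: the local host and port on one node are the remote host and port on the other, and the synchronization mode matches. The following sketch compares two saved reports. The function names and the file names in the usage comment are hypothetical, and the parsing assumes the "(PARAMETER) = value" layout shown above.

```sh
#!/bin/sh
# Extract the value of one parameter, for example HADR_LOCAL_HOST, from
# "db2 get db cfg" output read on standard input.
cfg_value() {
    sed -n "s/.*($1) = //p" | sed 's/[[:space:]]*$//'
}

# Succeeds if the HADR settings in report file $1 mirror those in file $2.
hadr_cfg_mirrored() {
    a=$1 b=$2
    [ -n "$(cfg_value HADR_LOCAL_HOST < "$a")" ] &&
    [ "$(cfg_value HADR_LOCAL_HOST  < "$a")" = "$(cfg_value HADR_REMOTE_HOST < "$b")" ] &&
    [ "$(cfg_value HADR_REMOTE_HOST < "$a")" = "$(cfg_value HADR_LOCAL_HOST  < "$b")" ] &&
    [ "$(cfg_value HADR_LOCAL_SVC   < "$a")" = "$(cfg_value HADR_REMOTE_SVC  < "$b")" ] &&
    [ "$(cfg_value HADR_REMOTE_SVC  < "$a")" = "$(cfg_value HADR_LOCAL_SVC   < "$b")" ] &&
    [ "$(cfg_value HADR_SYNCMODE    < "$a")" = "$(cfg_value HADR_SYNCMODE    < "$b")" ]
}

# Typical use (not run here):
#   db2 get db cfg for tsmdb1 | grep HADR > primary.txt    # on tsmawp05
#   db2 get db cfg for tsmdb1 | grep HADR > standby.txt    # on tsmawp06
#   hadr_cfg_mirrored primary.txt standby.txt && echo "HADR settings mirror each other"
```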

HADR status information

On the primary system, tsmawp05, issue the command:

db2pd -hadr -db tsmdb1

Typical results:

$ db2pd -hadr -db tsmdb1

Database Partition 0 -- Database TSMDB1 -- Active -- Up 11 days 00:06:06

HADR Information:
Role State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
Primary Peer Sync 0 1147

ConnectStatus ConnectTime Timeout
Connected Thu Mar 25 17:08:58 2010 (1269551338) 120

LocalHost LocalService
tsmawp05.storage.tucson.ibm.com 60010

RemoteHost RemoteService RemoteInstance
tsmawp06.storage.tucson.ibm.com 60010 tsminst1

PrimaryFile PrimaryPg PrimaryLSN
S0000119.LOG 2311 0x0000000252E533C0

StandByFile StandByPg StandByLSN
S0000119.LOG 2311 0x0000000252E533C0
$

On the secondary system, tsmawp06, issue the command:

db2pd -hadr -db tsmdb1

Typical results:

$ db2pd -hadr -db tsmdb1

Database Partition 0 -- Database TSMDB1 -- Standby -- Up 11 days 00:02:58

HADR Information:
Role State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
Standby Peer Sync 0 36

ConnectStatus ConnectTime Timeout
Connected Thu Mar 25 14:08:58 2010 (1269551338) 120

LocalHost LocalService
tsmawp06.storage.tucson.ibm.com 60010

RemoteHost RemoteService RemoteInstance
tsmawp05.storage.tucson.ibm.com 60010 tsminst1

PrimaryFile PrimaryPg PrimaryLSN
S0000119.LOG 2311 0x0000000252E533C0

StandByFile StandByPg StandByLSN StandByRcvBufUsed
S0000119.LOG 2311 0x0000000252E533C0 0%
$
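
When scripting around HADR, it can help to reduce the db2pd report to the Role and State values. The following sketch assumes the layout shown above, in which the values appear on the line directly after the "Role State ..." header; the function names are hypothetical, and the polling helper is defined but not run here.

```sh
#!/bin/sh
# Print "Role State" from "db2pd -hadr" output read on standard input.
hadr_role_state() {
    awk '
        grab { print $1, $2; exit }                   # line after the header row
        $1 == "Role" && $2 == "State" { grab = 1 }    # header row found
    '
}

# Poll the local database until it reports State Peer.
#   $1 = number of attempts (defaults to 30), 10 seconds apart
wait_for_peer() {
    tries=${1:-30}
    while [ "$tries" -gt 0 ]; do
        state=$(db2pd -hadr -db tsmdb1 | hadr_role_state)
        case $state in *" Peer") return 0 ;; esac
        tries=$((tries - 1))
        sleep 10
    done
    return 1
}

# Typical use as the instance user (not run here):
#   db2pd -hadr -db tsmdb1 | hadr_role_state
```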

Instance directory listings

Note: The following examples show that the database and log directories are contained within the instance directory for Tivoli Storage Manager. This is for illustrative purposes only, and is not meant to imply that locating database and logs in the instance directory is a best practice. For best performance and reliability for a real server, allocate each database and log directory to its own disk or file system.

For the primary system, tsmawp05:

# pwd
/home/tsminst1
# ls -l
total 384
-rwxr----- 1 tsminst1 tsmsrvrs 423 Mar 10 12:11 .profile
-rw------- 1 tsminst1 tsmsrvrs 848 Mar 15 14:13 .sh_history
-rw------- 1 root system 151 Mar 10 17:20 TSM.PWD
drwxr-xr-x 3 tsminst1 tsmsrvrs 256 Mar 10 15:03 activelog
drwxr-xr-x 4 tsminst1 tsmsrvrs 256 Mar 10 15:03 archfaillog
drwxr-xr-x 3 tsminst1 tsmsrvrs 256 Mar 10 15:03 archlog
drwxr-xr-x 3 tsminst1 tsmsrvrs 256 Mar 10 15:02 dbpaths
-rw-r--r-- 1 tsminst1 tsmsrvrs 184 Mar 10 16:50 devconfig.txt
-rw-r--r-- 1 tsminst1 tsmsrvrs 27 Mar 10 15:02 dsmserv.dbid
-rw-r--r-- 1 tsminst1 tsmsrvrs 302 Mar 25 14:03 dsmserv.opt
-rw-r--r-- 1 tsminst1 tsmsrvrs 60 Mar 25 14:05 dsmserv.v6lock
-rw-r--r-- 1 tsminst1 tsmsrvrs 189 Mar 25 14:05 logattr.chk
-rw-r--r-- 1 tsminst1 tsmsrvrs 122 Mar 10 15:51 nodelock
-rw-r--r-- 1 tsminst1 tsmsrvrs 4660 Mar 15 13:55 smit.log
drwxrwsr-t 21 tsminst1 tsmsrvrs 4096 Mar 11 15:36 sqllib
-rw-r--r-- 1 tsminst1 tsmsrvrs 1470 Mar 23 17:18 tsmdbmgr.log
-rw-r--r-- 1 tsminst1 tsmsrvrs 29 Mar 10 17:09 tsmdbmgr.opt
drwxrwxr-x 3 tsminst1 tsmsrvrs 256 Mar 10 15:01 tsminst1
-rw-r--r-- 1 tsminst1 tsmsrvrs 138757 Apr 01 17:41 volhist.txt
#

For the secondary system, tsmawp06:

# pwd
/home/tsminst1
# ls -l
total 344
-rwxr----- 1 tsminst1 tsmsrvrs 423 Mar 23 14:08 .profile
-rw-r--r-- 1 root system 151 Mar 23 15:08 TSM.PWD
drwxr-xr-x 2 tsminst1 tsmsrvrs 256 Mar 23 14:00 activelog
drwxr-xr-x 2 tsminst1 tsmsrvrs 256 Mar 23 14:01 archfaillog
drwxr-xr-x 2 tsminst1 tsmsrvrs 256 Mar 23 14:01 archlog
drwxr-xr-x 3 tsminst1 tsmsrvrs 256 Mar 23 17:25 dbpaths
-rw-r--r-- 1 tsminst1 tsmsrvrs 27 Mar 25 11:19 dsmserv.dbid
-rw-rw-rw- 1 tsminst1 tsmsrvrs 303 Mar 25 14:07 dsmserv.opt
-rw-r--r-- 1 tsminst1 tsmsrvrs 189 Mar 25 11:19 logattr.chk
-rw-r--r-- 1 tsminst1 tsmsrvrs 369 Mar 25 11:31 nodelock
drwxrwsr-t 20 tsminst1 tsmsrvrs 4096 Mar 23 17:24 sqllib
-rw-r--r-- 1 tsminst1 tsmsrvrs 315 Mar 23 15:04 tsmdbmgr.log
-rw-rw-rw- 1 tsminst1 tsmsrvrs 29 Mar 23 14:28 tsmdbmgr.opt
drwxrwxr-x 3 tsminst1 tsmsrvrs 256 Mar 23 17:24 tsminst1
-rw-r--r-- 1 tsminst1 tsmsrvrs 135961 Mar 25 11:37 volhist.txt
#

The /etc/services file for tsmawp05 and tsmawp06

$ cat /etc/services
.
.
DB2_tsminst1 60000/tcp
DB2_tsminst1_1 60001/tcp
DB2_tsminst1_2 60002/tcp
DB2_tsminst1_END 60003/tcp

Scenario for takeover on the secondary system

In this scenario, the primary system is halted, and the secondary system takes over. Later, the system that had the role of primary server is restored and resumes its role.

Commands on the primary system

(tsmawp05) TSM:SERVER1>halt
(tsmawp05) db2start
(tsmawp05) db2 start hadr on db tsmdb1 as standby

Commands on the secondary system

db2 takeover hadr on db tsmdb1 by force
cd /home/tsminst1
/opt/tivoli/tsm/server/bin/dsmserv -q &

Now try to restore a user's file.

C:\Program Files\Tivoli\TSM\baclient>dsmc res c:\tsmdrivers\v620d062\drm* c:\dump\xxx\
-tcpserv=tsmawp06.storage.tucson.ibm.com -node=somenode -pass=somepass -sub=yes

The restore operation failed because the primary volume, which was only on the failed primary system, could not be mounted. Issue the following UPDATE VOLUME command so that the server obtains the data from the deduplicated copy storage pool that has been shared.

Update vol /lspace/devclassfiles/tsminst1/00000195.BFS access=unavail
 
C:\Program Files\Tivoli\TSM\baclient>dsmc res c:\tsmdrivers\v620d062\drm* c:\dump\xxx\
-tcpserv=tsmawp06.storage.tucson.ibm.com -node=somenode -pass=somepass -sub=yes

Restore succeeded.

C:\Program Files\Tivoli\TSM\baclient>dsmc inc c:\tsmdrivers\v620d042\drm*
-tcpserv=tsmawp06.storage.tucson.ibm.com -node=somenode -pass=somepass -dedupl=yes

Backup succeeded.

C:\Program Files\Tivoli\TSM\baclient>dsmc res c:\tsmdrivers\v620d042\drm* c:\dump\xxx\
-tcpserv=tsmawp06.storage.tucson.ibm.com -node=somenode -pass=somepass -sub=yes

restore stgpool dedup
anr1238i files restored: 334 bytes restore: 10334633 ...
anr1341i scratch volume .. 195.BFS has been deleted from storage pool BACKUPPOOL

C:\Program Files\Tivoli\TSM\baclient>dsmc res c:\tsmdrivers\v620d042\drm\* c:\dump\xxx\
-tcpserv=tsmawp06.storage.tucson.ibm.com -node=somenode -pass=somepass -sub=yes

Restore succeeded.

Resuming normal operations at the primary site

At some point after the disaster, possibly a couple of weeks later, you might plan to resume operations at the primary site. The steps include the following:

  1. After reinstalling Tivoli Storage Manager on the replacement machine, repeat the HADR initialization at the primary site, including restoration of an offline backup from the secondary site, which is currently acting as the HADR primary.
  2. Use HADR commands to flip roles.

Alternatives for restoring the primary system's primary storage pools when returning to the primary site are:

  • Copy the secondary system's primary storage pool volumes to the primary system by using FTP or another method. To do this, the primary storage pool file paths must be the same on both systems.
  • Restore the primary system's storage pool from the copy storage pool.

Other Considerations

Setting DEDUPREQUIRESBACKUP NO

The Tivoli Storage Manager manuals recommend keeping the default value, YES; see the topic "Protecting data in primary storage pools set up for data deduplication". In addition to the deduplicated copy storage pool, use a second, non-deduplicated copy storage pool that is local. Take the volumes for the local copy storage pool offsite to a vault, following traditional DRM-type procedures. This provides multiple tiers of protection: if the HADR-based solution fails for any reason, this second level of protection backs it up.

Is Tivoli Storage Manager BACKUP DB still necessary?

Yes. The Tivoli Storage Manager BACKUP DB command is still required. Also still required is saving backup copies of the volume history and device configuration files. The server database cannot be restored without a volume history file and a device configuration file.

  • For day to day operations the Tivoli Storage Manager BACKUP DB TYPE=FULL provides relief for the Tivoli Storage Manager server's recovery log space.
  • HADR replicates whatever the primary server does. You need the ability to recover from an accidental DELETE FILESPACE * command, for example.
  • HADR itself might fail. It might be possible to get both the primary and secondary databases into states where neither of them can be started as the primary HADR node. Recovery from this situation requires restoring the primary database from the Tivoli Storage Manager database backup, issuing the offline commands db2 backup db and db2 restore db to prime the secondary, and then issuing the HADR commands to restart HADR.

Troubleshooting HADR Operations

If you are having problems getting HADR to start on the primary or standby system, the first place to look for detailed related error messages is in the db2diag.log file for the primary or standby server's DB2 instance.
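
A quick first pass over the log is to pull out the HADR-related lines. The following sketch uses a plain text search; the function name is hypothetical, and the path in the usage comment assumes the default diagnostic directory under the instance home, which might differ on your system (check the DIAGPATH database manager configuration parameter).

```sh
#!/bin/sh
# Print HADR-related lines, with line numbers, from db2diag.log text
# read on standard input. The match is case-insensitive.
hadr_diag_lines() {
    grep -in 'hadr'
}

# Typical use as the instance user (not run here):
#   hadr_diag_lines < /home/tsminst1/sqllib/db2dump/db2diag.log | tail -20
```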

Applying Tivoli Storage Manager Server maintenance

  • On primary server:
    1. Shut down the Tivoli Storage Manager server. This will deactivate the database and stop DB2.
    2. Upgrade Tivoli Storage Manager to the new fix pack level using COI installation.
    3. Start the Tivoli Storage Manager server.
    4. Issue the following command to verify that HADR has resumed:
      db2pd -hadr -db tsmdb1
      

  • On standby server:
    1. Keep HADR enabled and running until the primary server is upgraded to the new level.
    2. Because the standby server does not have the Tivoli Storage Manager server running, you must deactivate and stop DB2 manually using DB2 commands:
      db2 deactivate db tsmdb1
      db2stop
      
    3. Upgrade Tivoli Storage Manager to the new fix pack level using COI installation.
    4. Issue the following commands:
      db2start
      db2 activate db tsmdb1
      db2pd -hadr -db tsmdb1
      

      The last command verifies that HADR operation has resumed.
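
The standby-side DB2 commands above can be wrapped in two small shell functions, sketched here under the assumption that they run as the DB2 instance owner on the standby server. The function names are made up for illustration. The db2pd tool is the DB2 monitoring utility that reports the HADR role and state.

```shell
#!/bin/sh
# Sketch: function names are illustrative; run as the DB2 instance owner.
DB=tsmdb1   # database name used throughout this paper

# The db2 command line processor exits with 4 or higher on errors;
# lower nonzero values indicate warnings, which are tolerated here.
db2_ok() {
    db2 "$@" || [ $? -lt 4 ]
}

# Before the upgrade: deactivate the database and stop DB2.
standby_db2_down() {
    db2_ok deactivate db "$DB" && db2stop
}

# After the upgrade: start DB2, activate the database, and print the
# HADR status so that the role and state can be checked.
standby_db2_up() {
    db2start && db2_ok activate db "$DB" && db2pd -hadr -db "$DB"
}
```

Run standby_db2_down, apply the fix pack, and then run standby_db2_up. The upgrade itself stays a manual step between the two calls.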

Allocating additional disk to the database in a DB2 HADR environment

The new directory for the database must have the same name and same size on both servers.

  • On the primary server (tsmawp05)
    1. Shut down the Tivoli Storage Manager server. This deactivates the database and stops DB2.
    2. Reboot the system to pick up the new disk LUN.
    3. Configure the new file system on the newly added disk.
    4. Start the Tivoli Storage Manager server.
    5. Issue the following command to verify that HADR has resumed:
      db2pd -hadr -db tsmdb1
      
  • On the standby server (tsmawp06)
    1. Keep HADR enabled and running until the primary server has restarted.
    2. Because the standby server does not have the Tivoli Storage Manager server running, you must deactivate and stop DB2 manually using DB2 commands. Issue the commands:
      db2 deactivate db tsmdb1
      db2stop
      
    3. Reboot the system to pick up the new disk LUN.
    4. Configure the new file system on the newly added disk.
    5. Issue the commands:
      db2start
      db2 activate db tsmdb1
      db2pd -hadr -db tsmdb1
      

      The last command verifies that HADR operation has resumed.

  • On the primary server (tsmawp05)
    Extend the database space to the newly defined file system using the Tivoli Storage Manager server administrative command EXTEND DBSPACE.
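
Before extending the database, it can help to confirm that the new directory has the same name and size on both servers. The following sketch prints one comparable line per server. The directory path and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: NEW_DB_DIR is a hypothetical path; use the same path on both servers.
NEW_DB_DIR="${NEW_DB_DIR:-/tsmdb_ext01}"

# Print the total size, in KB, of the file system that holds a directory.
fs_total_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $2 }'
}

# Print "path size_in_kb" so that the output from the primary and the
# standby can be compared line for line.
describe_db_dir() {
    [ -d "$1" ] || { echo "missing: $1" >&2; return 1; }
    echo "$1 $(fs_total_kb "$1")"
}
```

Run describe_db_dir "$NEW_DB_DIR" on tsmawp05 and on tsmawp06. If both lines match, issue EXTEND DBSPACE with that directory from an administrative session on the primary server.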

Message ANR0227S occurs at failover to secondary node

Message ANR0227S states, "Incorrect database opened. Server cannot start." This error occurs because the database ID file is not in sync with the database. To correct the error, complete one of the following tasks:

  • Copy the dsmserv.dbid file from the Tivoli Storage Manager instance directory for the primary HADR node to the instance directory for the secondary HADR node.
  • Delete or rename the dsmserv.dbid file that is located in the Tivoli Storage Manager instance directory for the secondary HADR node. Then start the server in the foreground with the -S option:
     dsmserv -S
    

    After the server is started successfully, it can be halted and started normally.
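
The second task can be sketched as a small shell function that renames the file rather than deleting it, so that the original stays available. The function name and the backup suffix are arbitrary choices. Pass the Tivoli Storage Manager instance directory of the secondary HADR node as the argument.

```shell
#!/bin/sh
# Sketch: function name and the .bak suffix are arbitrary choices.

# Rename dsmserv.dbid in the given instance directory, adding a
# timestamp so that repeated runs do not overwrite earlier copies.
set_aside_dbid() {
    inst_dir=$1
    if [ -f "$inst_dir/dsmserv.dbid" ]; then
        mv "$inst_dir/dsmserv.dbid" \
           "$inst_dir/dsmserv.dbid.bak.$(date +%Y%m%d%H%M%S)"
    else
        echo "no dsmserv.dbid in $inst_dir" >&2
        return 1
    fi
}
```

After the rename succeeds, start the server in the foreground with dsmserv -S as described above.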

Reference materials

Data Deduplication Best Practices for Tivoli Storage Manager V6.2
http://www.ibm.com/developerworks/wikis/display/tivolistoragemanager/Data+deduplication+best+practices+for+Tivoli+Storage+Manager+V6.2

Tivoli Storage Manager Version 6.2 Information Center
http://publib.boulder.ibm.com/infocenter/tsminfo/v6r2/index.jsp

DB2 Version 9.7 for Linux, UNIX, and Windows Information Center
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp

High Availability and Disaster Recovery Options for DB2 on Linux, UNIX, and Windows Redbook
http://www.redbooks.ibm.com/abstracts/sg247363.html

Performing a HADR failover operation
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.ha.doc/doc/t0011835.html

An Expert's Guide to DB2 Technology BLOG: How HADR behaves in a real failure scenario
http://it.toolbox.com/blogs/db2luw/how-hadr-behaves-in-a-real-failure-scenario-5151

DB2 HADR Update, Dale McInnis, Session Number 2297a, IBM Information on Demand 2009 Conference, 10/25/2009, "DB2 HADR 2009 IOD.pdf"

DB2 HADR Best Practices
http://www.ibm.com/developerworks/data/bestpractices/hadr/

DB2 HADR Simulator
http://www.ibm.com/developerworks/wikis/display/data/HADR_sim

Related Materials

IBM Data Deduplication Strategy and Operations white paper
http://www.ibm.com/developerworks/wikis/download/attachments/116981890/IBM+Data+Deduplication.pdf?version=1

Contributors to this Paper

Don Moxley, Development - Author
Randy Larson, Advanced Technical Support - Author
Diem Nguyen - CET Test
Clare Byrne, Holly King - Information Development
Robert Elder - Performance

Hello,
Thank you very much for publishing this nice document on TSM Vaulting and TSM DB replication.

I am testing TSM HADR failover. I stopped HADR on the primary and used the following command to take over on the standby. It worked successfully.

db2 takeover hadr on db tsmdb1 by force

Now I want to convert this back to standby, but it fails:

db2 start hadr on db tsmdb1 as standby
SQL1767N Start HADR cannot complete. Reason code="1".

Whenever I restore the primary database to the DR server, I can start it as standby with no issues.

Is there something missing in my steps? I don't want to restore the primary database to the secondary server after every failover test. I would like to know the proper steps to take over on the standby and then convert it back to standby using HADR commands.

The TSM version is 6.2.1. I even tried upgrading DB2 to Fix Pack 7, but the problem remains.

Thanks,
TSM_Man

Posted by TSM_man at Dec 01, 2010 23:07 | Permalink

It looks a bit too clumsy to me.

Posted by Mita201 at Dec 06, 2010 06:27 | Permalink

> Now I want to convert this back to standby but it fails

Hello TSM_Man, for this simple db failback test where you are trying to return to the 'primary' system, did you do the following?

secondary system:

halt the tsm server
db2start
db2 start hadr on db tsmdb1 as standby
db2pd -hadr -db tsmdb1 to verify that this side has achieved role standby state localcatchup

primary system:

db2 takeover hadr on db tsmdb1 by force
start the tsm server

It is the flip of the original failover.
If all goes well you should note hadr role standby state peer on the standby system after the tsm server has started on the primary.

Posted by moxley2 at Dec 10, 2010 11:11 | Permalink

Hello,

With the halt command, both TSM and DB2 are stopped, which is not really what you want. You only want TSM to stop while DB2 continues running, so that you can do a takeover faster without issuing a db2start first.
At the TSM Symposium I talked to one of the developers and raised this issue. He said there is a way to stop only the TSM server process and leave DB2 running, using a "special" undocumented halt command (something with force).
They actually use this method in development, where they have multiple servers running against the same database and don't want to shut down the db if a TSM server is halted.
Do you know what this command is? (The developer could not remember it off the top of his head at the time.) It would make the HADR setup much easier to work with.

Posted by OttoSchakenbos at Mar 15, 2012 08:59 | Permalink
