DB2 best practices: DB2 instance recovery for IBM Smart Analytics System

Recovering from data loss or corruption is achieved by restoring from a database backup. But, what if your instance configuration has been compromised, or what if you use database partitioning? How do you recover the DB2® instance environment efficiently and effectively without having to restore the entire server? This article aims to answer these questions by showing you how to implement a DB2 instance backup and recovery strategy for a partitioned database environment. In particular, it focuses on the IBM® Smart Analytics System environment with configurations based on System x® and Power Systems™ servers.

Ilker Ender (iender@ca.ibm.com), DB2 Data Warehouse Qualtiy Assurance, IBM

Ilker Ender photoIlker Ender is a Certified Advanced Database Administrator for DB2 V9 and a Certified DB2 UDB Problem Determination Expert. He has over six years of experience with the DB2 Quality Assurance Team in the IBM Toronto Software Lab. Ilker is the backup and restore focal point and high availability disaster recovery (HADR) expert for the DB2 QA Team for DB2 Data Warehouse Edition.



Garrett Fitzsimons (fitzgarr@ie.ibm.com), Smart Analytics System Best Practices, IBM

Garrett Fitzsimons photoGarrett Fitzsimons is a Best Practices Specialist based in the IBM Dublin lab. He has worked with traditional database and data warehouse applications since 1990 on a variety of platforms in several countries. Garrett has worked on a number of large scale implementations of ERP and Data warehousing systems using IBM DB2, Oracle, and Microsoft technologies. He is currently working on providing best practice guidelines for the IBM Smart Analytics System.



21 October 2010

Also available in Chinese Russian

Introduction

This article describes how to back up your files and environment settings so that in the event of a DB2 instance becoming compromised, you are able to restore and reference every aspect of the configuration that supports the instance. The article is part of a series of best practices papers that covers backup and recovery in an IBM Smart Analytics System environment. Refer to the Resources section for a link to the series of best practices papers.

The sample environment that this article is based on is an IBM Smart Analytics System 5600 with one administration node and two data nodes. The Downloads section provides links to example scripts that contain the commands needed to back up the files, environment, and configuration settings as recommended in this article. Edit the examples as necessary so that they are compatible with your environment. Always test the commands in a non-production environment.

A DB2 instance may become unavailable for a number of reasons: file permissions mistakenly changed, files deleted, or file systems being unavailable. In order to be ready to react quickly and efficiently to any of these type scenarios, it is recommended that you put in place a plan for instance recovery. Your plan should identify the files and settings in your environment that relate to the DB2 instance, and define a backup strategy that will provide the files needed to recover from any type of instance failure that may arise.

The relationship between an instance and a database is not absolute. Dropping an instance that has been compromised does not drop the associated database. The data is not lost. You can drop and recreate an instance and catalog the associated databases, as long as you are able to recover the instance configuration and environment correctly.

By implementing a backup and recovery strategy for a DB2 instance in a partitioned database environment, you reduce the potential for downtime by avoiding the need to recover an entire server in order to recover an instance configuration.

Figure 1 illustrates the DB2 architecture in a partitioned database environment. The DB2 software executes on each node and supports the database partitions on that node. The partition group labeled IBMTEMPGROUP spans across the administration node and the two data nodes. The SDPG and IBMCATGROUP partition groups are in the administration node. The PDGP partition group spans the data nodes, which is where the table spaces are placed.

Figure 1. DB2 instance recovery architecture
1 admin node and 2 data nodes. Partition group IBMTEMPGROUP spans all. PDGP spans the data nodes. SDPG and IBMCATGROUP are in the admin node.

Instance configuration

An IBM Smart Analytics System implements database partitioning to provide large-scale parallelism and linear scalability. In this environment, it is the database (not the instance) that is partitioned. The instance is created once on the administration node and the db2home directory is then shared out to the participating nodes (servers) on which the DB2 software is running. For the partitioned database to function properly, you must configure the operating system environment, including users and groups, in the same manner on each data node. The partitioned database is managed through the administration node, which coordinates access to and manages the structure of the database.

An IBM Smart Analytics System has multiple database partitions per data node. Each database partition is allocated processor, memory, and I/O resources, and maintains separate database containers, logs, indexes, and its own database configuration manager. Queries are submitted to the coordinator function, which retrieves the data needed to satisfy the query.

The test cluster described in this article is comprised of three nodes: beluga-bvn-05 is the administration node, and beluga-bvn-06 and beluga-bvn-07 are the data nodes. The administration node has a single database partition, and each data node has four database partitions. So the cluster contains a total of nine database partitions.

The instance is created on all nodes in the cluster and the database is then created across all the database partitions defined in the db2nodes.cfg file. Listing 1 shows the sample db2nodes.cfg file in the db2home/sqllib directory. It depicts the configuration of the test cluster.

Listing 1. Sample db2nodes.cfg
0  beluga-bvn-05 0
1  beluga-bvn-06 0
2  beluga-bvn-06 1
3  beluga-bvn-06 2
4  beluga-bvn-06 3
5  beluga-bvn-07 0
6  beluga-bvn-07 1
7  beluga-bvn-07 2
8  beluga-bvn-07 3

In an IBM Smart Analytics System environment, the DB2 instance home directory is the /shared_db2home directory on the administration node. This file system is shared out to the other nodes, through GPFS or NFS. The other nodes mount the file system locally as /db2home. Because the /db2home file system for each node is a mount point to the administration node, the backup of the directory needs to take place only on the administration node.


Instance backup strategy

The backup strategy for your DB2 environment should cover the operating system, the DB2 instance, and the DB2 database. In this way, any recovery scenario can be addressed by a specific backup. Targeted recoveries are efficient in that they minimize downtime.

Incorporate the instance backup process into your overall backup schedule. Back up the instance configuration before and after changes are made to the configuration of the environment. To back up an instance, record all details related to the following:

  • Versions of software
  • Licenses
  • Users and groups
  • Files that contain configuration settings
  • The instance directories

The files, settings, and variables listed in this section represent a complete list of those you need to back up. The files are small in size and contain valuable information you can use after a recovery to reconcile and validate that everything is as it should be. In addition, the details contained in the backups may be requested by IBM support if you require further assistance.

Each of the four tables in this section describes one of the four categories of files and configuration settings that you need to back up. Tables 1 and 2 represent data you need to back up on each node. Tables 3 and 4 represent data you need to back up on the administration node. You can execute both backups as the DB2 instance owner. The backups are represented in the two scripts in the Downloads section.

Table 1. Back up operating system files
Back up these files on the operating system of each node that contains information referenced by the DB2 software. In the event of an instance failure, you may be able to avoid an operating system restore by re-applying the configuration of these files.
File nameDescription
/etc/servicesContains DB2 FCM network settings
/etc/exportsContains details on exported file systems
/etc/passwdUser information
/etc/hostsHost name and IP address of other nodes in cluster.
/etc/groupGroup information
/opt/tivoli/tsm/client/api/bin64/dsm.optBackup storage manager setup (TSM was used in test)
/opt/tivoli/tsm/client/api/bin64/dsm.sysIf TSM is integrated with DB2 log archiving and backup, copy the configuration.
Table 2. Capture operating system information
Capture the output from these commands to take a snapshot of the operating system environment on each node at the time of the backup. You can use this information to verify the environment after you restore.
CommandDescription
db2level > db2level.outDB2 product level
oslevel > oslevel.outOperating System level
df > df_g.outDisk space status
mount > mount.outMount status
ulimit -a > ulimit.outUlimit settings
crontab -l > crontab.outCrontab schedule
id $Instance > id.outInstance ID's in use
~/sqllib/java/jdk64/jre/bin/java -version > jdk.outJava JDK version
~/sqllib/java/jdk64/jre/bin/java com.ibm.db2.jcc.DB2Jcc -version > jcc.outJCC version
Table 3. Create a tar file of the DB2 home directory
When you restore an instance, you need this directory on the administration node and the files contained in it. The directory also contains backup and other scripts specific to your environment. Using the tar command ensures that links are maintained.
CommandDescription
tar -cvf 2010-07-29.beluga-bvn-05.tar $HOME/*Creates a tar file of the DB2HOME/directory on the administration node
Table 4. Output and save DB2 Manager and related configuration
Execute these commands on the administration node in order to get a snapshot of the instance configuration.
CommandDescription
db2 get admin cfg > admin.cfgGet db2 administration configuration
db2set -all > db2setList DB2 environment registry variable settings
db2licm -l > db2licm.license.informationGet DB2 license information
db2cfexp db2cfexp.bak backupGet DB2 config export data
db2 list node directory > db2.list.node.directoryGet list of node directories
db2 list database directory > db2.list.database.directoryGet list of databases
db2 list dcs directory > db2.list.dcs.directoryList DCS directory
cp /sqllib/.rhosts .rhosts.baktrusted hosts on Linux platforms.
db2 get dbm cfg > dbm.cfgGet database manager configuration
$HOME/.profileCopy instance owner profile
$HOME/sqllib/db2nodes.cfg Copy database node and partitions configuration file

Instance Failure Scenarios

The most likely causes of instance failure are human error, service failure, or file corruption. The reasons for failure may vary from a changed environment variable to a modified entry in an operating system configuration file that is referenced by DB2 software. Recovery scenarios can be categorized by whether the cause of the problem is known or unknown. If the cause of the problem is known, then the correction of the file or configuration setting will correct the problem. If the cause is unknown, then you should perform a full recovery of the instance. In a production environment, speed of recovery rather than speed of discovery takes precedent.

Understand what recovery scenarios are likely to occur in your environment and then put in place a backup strategy that can be used to recover the instance regardless of the failure scenario. Following are descriptions of four typical failure scenarios you should have a recovery plan in place for:

  • The db2home directory on the administration node is removed or the instance is inadvertently dropped. The db2home directory on the administration node is shared by all other nodes in the partitioned database. If the contents of the home directory are compromised or removed, the instance will be unavailable.
  • The instance configuration becomes corrupt or compromised. An instance may become unavailable due to configuration or DB2 binary files being compromised, corrupted, or deleted. Examples of such files include .rhosts, db2nodes.cfg, and the global registry. The inadvertent modification of environment variables can also cause this type of scenario.
  • The shared db2home file system on the data, standby, user, or InfoSphere Warehouse application node is removed. The instance will become unavailable on the node in which the directory has been removed.
  • Permissions on the instance directory are inadvertently changed. DB2 may become unavailable if file or directory permissions are changed on the db2home directory.

Instance recovery procedure

This section describes the procedures to verify the configuration of your operating system, and to drop and re-create an instance. The backup files referred to are the ones that you create using the sample backup scripts provided in the Downloads section. When verifying settings, ensure that you compare the settings on the failed node with the backup files for that node. You need root access to edit operating system files and settings. Use the DB2 instance owner for all DB2 related tasks.

  1. Verify the configuration of the operating system on the affected node:
    1. Verify that the instance user and fenced user exist in /etc/passwd.
    2. Verify that the related DB2 group account exists in /etc/group.
    3. Compare /etc/services and /etc/hosts against the respective backup copies and correct or restore as necessary.
    4. List the related software packages on the system and compare them against what is shown in the backup file. Isolate and determine the reason for any changes.
    5. Compare and verify user limits (ulimit), mount points (/etc/fstab), operating system level (oslevel), and crontab entries. Determine the reason for any changes.
  2. Drop the corrupt instance on the affected node:
    1. Use the db2ilist command to determine if the instance still exists in the instance list. If the instance is intact, check for mounts and network issues. Use db2iupdt to correct permission issues.
    2. Drop the partially corrupted instance with the command: db2idrop <instance>
    3. Use the command db2iset to remove the related profile information and configuration parameters. For example: db2iset -d bcuinst2
  3. Create and configure the instance on the affected node:
    1. Create the instance on the administration node as user root using the db2icrt command. Issue the command from the product directory where the DB2 software is installed. On the test cluster for this article, the directory was /opt/IBM/dwe/db2/V9.7/instance. The command issued was:
      db2icrt -u bcufenc2 bcuinst2
      where bcufenc2 and bcuinst2 are the instance owner user ID's.
    2. Compare, and if necessary restore, the database manager configuration parameters to those in the dbm.cfg.out backup file.
    3. Compare, and if necessary replace, the $DB2HOME/sqllib/db2nodes.cfg file with the backup version.
    4. Compare, and if necessary re-apply, the registry variables with those in the db2set.out backup file. For example, you may have to set the DB2COMM variable to tcpip as follows:
      db2set DB2COMM=tcpip
    5. Compare, and if necessary replace, the $DB2HOME/sqllib/userprofile file using the backed up version.
  4. Add license files, catalog the database on the affected node, and start DB2:
    1. Restore the db2ese.lic file from the backup set of files and issue the command:
      db2licm -a db2ese.lic
      If the license file is not found, the db2start command returns an SQL8000N error.
    2. Catalog the nodes and database directory according to what is shown in the db2.list.database.directory and db2.list.node.directory files in the backup set. For example, to do this on the test cluster you would issue these commands:
      catalog tcpip node beluga-bvn-05 remote beluga-bvn-05 server 50000
      catalog database BCUDB as BCUDB at node beluga-bvn-05
    3. Start DB2. If you are using the DB2 High Availability (HA) feature, refer to the shutdown and startup procedures for your environment.

Recovery from known failures

As described in this section, if you know the cause of an instance failure, a complete recovery may not be necessary. You can fix network and mount issues as well as permission problems with specific recovery steps.

  • The instance owner home directory on a non-administration node has been removed.

    The instance home directory on nodes that are not the administration node are mounted file systems. Try these steps to recover from this type scenario:

    1. Mount the file system on the affected node using the mount command:
      mount /db2home
    2. Start the database manager as the instance owner. For example:
      db2start

      Note: In a high availability environment the command may be different. Consult your user guide.

  • Permissions on instance directory have been inadvertently changed.

    A chmod or cp command issued within the db2home directory, in particular in the sqllib directory, can cause the instance to become unavailable because the DB2 software might fail when attempting to read, write, or execute a file. For example, on the test cluster for this article, a chmod command was issued on the administration node. This caused the db2start command to fail due to file permission errors. Listing 2 shows the failed startup.

    Listing 2. Changing permissions can cause failure
    bcuinst2@beluga-bvn-05:~>cd ~/sqllib
    bcuinst2@beluga-bvn-05:~>chmod -R 444 *
    bcuinst2@beluga-bvn-05:~> db2start
    -bash: /db2home/bcuinst2/sqllib/adm/db2start: Permission denied
    SQL6031N  Error in the db2nodes.cfg file at line number "<line>". 
    Reason code "<reason-code>".
    Explanation: 
    The statement cannot be processed because of a problem with the
    db2nodes.cfg file, as indicated by the following reason codes:
    1 Cannot access the sqllib directory of the instance.

    In this type scenario, follow these steps to re-apply ownership and permission values:

    1. Each file in ~/sqllib should be owned by the instance user ID and db2 instance group. This can be achieved by issuing the following command on the administration node:
      chown db2inst2:db2igrp /db2home/bcuinst2/sqllib/*
      where the instance and group accounts are bcuinst2 and db2igrp.
    2. Issue the db2iupt command to update the db2home directory and replace the common files based on the installed DB2 software. To update an instance with db2iupdt, you must first stop all processes that are running for the instance. The db2iupt command replaces files under the ~/sqllib directory. Listing 3 shows how this step was achieved on the test cluster.
      Listing 3. Using db2iupdt to update an instance based on the installed DB2 software.
      beluga-bvn-05:/opt/IBM/dwe/db2/V9.7/instance # ./db2iupdt bcuinst2
      DBI1070I  Program db2iupdt completed successfully.
      
      bcuinst2@beluga-bvn-05:~> db2start
      08/10/2010 15:09:16     1   0   SQL1063N  DB2START processing was successful.
      08/10/2010 15:09:17     5   0   SQL1063N  DB2START processing was successful.
      08/10/2010 15:09:17     2   0   SQL1063N  DB2START processing was successful.
      08/10/2010 15:09:17     3   0   SQL1063N  DB2START processing was successful.
      08/10/2010 15:09:17     7   0   SQL1063N  DB2START processing was successful.
      08/10/2010 15:09:17     6   0   SQL1063N  DB2START processing was successful.
      08/10/2010 15:09:18     4   0   SQL1063N  DB2START processing was successful.
      08/10/2010 15:09:18     8   0   SQL1063N  DB2START processing was successful.
      08/10/2010 15:09:30     0   0   SQL1063N  DB2START processing was successful.
      SQL1063N  DB2START processing was successful.

Conclusion

You should implement a DB2 instance backup and recovery strategy so that you are prepared to quickly and efficiently recover from scenarios where the instance has become compromised, and as a result is unavailable. By putting together a backup and recovery strategy for an instance, you can potentially avoid having to do a complete operating system restore when faced with a compromised instance. This can significantly reduce the amount of downtime you experience in such circumstances.


Downloads

DescriptionNameSize
DB2 Instance Backup Scriptdb2_instance_backup.zip10KB
DB2 Instance Backup OS Scriptdb2_instance_backup_os.zip10KB

Resources

Learn

Get products and technologies

  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=555190
ArticleTitle=DB2 best practices: DB2 instance recovery for IBM Smart Analytics System
publish-date=10212010