This article describes how to back up your files and environment settings so that in the event of a DB2 instance becoming compromised, you are able to restore and reference every aspect of the configuration that supports the instance. The article is part of a series of best practices papers that covers backup and recovery in an IBM Smart Analytics System environment. Refer to the Resources section for a link to the series of best practices papers.
The sample environment that this article is based on is an IBM Smart Analytics System 5600 with one administration node and two data nodes. The Downloads section provides links to example scripts that contain the commands needed to back up the files, environment, and configuration settings as recommended in this article. Edit the examples as necessary so that they are compatible with your environment. Always test the commands in a non-production environment.
A DB2 instance may become unavailable for a number of reasons: file permissions mistakenly changed, files deleted, or file systems being unavailable. In order to be ready to react quickly and efficiently to any of these type scenarios, it is recommended that you put in place a plan for instance recovery. Your plan should identify the files and settings in your environment that relate to the DB2 instance, and define a backup strategy that will provide the files needed to recover from any type of instance failure that may arise.
The relationship between an instance and a database is not absolute. Dropping an instance that has been compromised does not drop the associated database. The data is not lost. You can drop and recreate an instance and catalog the associated databases, as long as you are able to recover the instance configuration and environment correctly.
By implementing a backup and recovery strategy for a DB2 instance in a partitioned database environment, you reduce the potential for downtime by avoiding the need to recover an entire server in order to recover an instance configuration.
Figure 1 illustrates the DB2 architecture in a partitioned database environment. The DB2 software executes on each node and supports the database partitions on that node. The partition group labeled IBMTEMPGROUP spans across the administration node and the two data nodes. The SDPG and IBMCATGROUP partition groups are in the administration node. The PDGP partition group spans the data nodes, which is where the table spaces are placed.
Figure 1. DB2 instance recovery architecture
An IBM Smart Analytics System implements database partitioning to provide large-scale parallelism and linear scalability. In this environment, it is the database (not the instance) that is partitioned. The instance is created once on the administration node and the db2home directory is then shared out to the participating nodes (servers) on which the DB2 software is running. For the partitioned database to function properly, you must configure the operating system environment, including users and groups, in the same manner on each data node. The partitioned database is managed through the administration node, which coordinates access to and manages the structure of the database.
An IBM Smart Analytics System has multiple database partitions per data node. Each database partition is allocated processor, memory, and I/O resources, and maintains separate database containers, logs, indexes, and its own database configuration manager. Queries are submitted to the coordinator function, which retrieves the data needed to satisfy the query.
The test cluster described in this article is comprised of three nodes: beluga-bvn-05 is the administration node, and beluga-bvn-06 and beluga-bvn-07 are the data nodes. The administration node has a single database partition, and each data node has four database partitions. So the cluster contains a total of nine database partitions.
The instance is created on all nodes in the cluster and the database is then created across all the database partitions defined in the db2nodes.cfg file. Listing 1 shows the sample db2nodes.cfg file in the db2home/sqllib directory. It depicts the configuration of the test cluster.
Listing 1. Sample db2nodes.cfg
0 beluga-bvn-05 0 1 beluga-bvn-06 0 2 beluga-bvn-06 1 3 beluga-bvn-06 2 4 beluga-bvn-06 3 5 beluga-bvn-07 0 6 beluga-bvn-07 1 7 beluga-bvn-07 2 8 beluga-bvn-07 3
In an IBM Smart Analytics System environment, the DB2 instance home directory is the /shared_db2home directory on the administration node. This file system is shared out to the other nodes, through GPFS or NFS. The other nodes mount the file system locally as /db2home. Because the /db2home file system for each node is a mount point to the administration node, the backup of the directory needs to take place only on the administration node.
Instance backup strategy
The backup strategy for your DB2 environment should cover the operating system, the DB2 instance, and the DB2 database. In this way, any recovery scenario can be addressed by a specific backup. Targeted recoveries are efficient in that they minimize downtime.
Incorporate the instance backup process into your overall backup schedule. Back up the instance configuration before and after changes are made to the configuration of the environment. To back up an instance, record all details related to the following:
- Versions of software
- Users and groups
- Files that contain configuration settings
- The instance directories
The files, settings, and variables listed in this section represent a complete list of those you need to back up. The files are small in size and contain valuable information you can use after a recovery to reconcile and validate that everything is as it should be. In addition, the details contained in the backups may be requested by IBM support if you require further assistance.
Each of the four tables in this section describes one of the four categories of files and configuration settings that you need to back up. Tables 1 and 2 represent data you need to back up on each node. Tables 3 and 4 represent data you need to back up on the administration node. You can execute both backups as the DB2 instance owner. The backups are represented in the two scripts in the Downloads section.
Table 1. Back up operating system files
|/etc/services||Contains DB2 FCM network settings|
|/etc/exports||Contains details on exported file systems|
|/etc/hosts||Host name and IP address of other nodes in cluster.|
|/opt/tivoli/tsm/client/api/bin64/dsm.opt||Backup storage manager setup (TSM was used in test)|
|/opt/tivoli/tsm/client/api/bin64/dsm.sys||If TSM is integrated with DB2 log archiving and backup, copy the configuration.|
Table 2. Capture operating system information
|DB2 product level|
|Operating System level|
|Disk space status|
|Instance ID's in use|
|Java JDK version|
Table 3. Create a tar file of the DB2 home directory
|Creates a tar file of the DB2HOME/directory on the administration node|
Table 4. Output and save DB2 Manager and related configuration
|Get db2 administration configuration|
|List DB2 environment registry variable settings|
|Get DB2 license information|
|Get DB2 config export data|
|Get list of node directories|
|Get list of databases|
|List DCS directory|
|trusted hosts on Linux platforms.|
|Get database manager configuration|
|Copy instance owner profile|
|Copy database node and partitions configuration file|
Instance Failure Scenarios
The most likely causes of instance failure are human error, service failure, or file corruption. The reasons for failure may vary from a changed environment variable to a modified entry in an operating system configuration file that is referenced by DB2 software. Recovery scenarios can be categorized by whether the cause of the problem is known or unknown. If the cause of the problem is known, then the correction of the file or configuration setting will correct the problem. If the cause is unknown, then you should perform a full recovery of the instance. In a production environment, speed of recovery rather than speed of discovery takes precedent.
Understand what recovery scenarios are likely to occur in your environment and then put in place a backup strategy that can be used to recover the instance regardless of the failure scenario. Following are descriptions of four typical failure scenarios you should have a recovery plan in place for:
- The db2home directory on the administration node is removed or the instance is inadvertently dropped. The db2home directory on the administration node is shared by all other nodes in the partitioned database. If the contents of the home directory are compromised or removed, the instance will be unavailable.
- The instance configuration becomes corrupt or compromised. An instance may become unavailable due to configuration or DB2 binary files being compromised, corrupted, or deleted. Examples of such files include .rhosts, db2nodes.cfg, and the global registry. The inadvertent modification of environment variables can also cause this type of scenario.
- The shared db2home file system on the data, standby, user, or InfoSphere Warehouse application node is removed. The instance will become unavailable on the node in which the directory has been removed.
- Permissions on the instance directory are inadvertently changed. DB2 may become unavailable if file or directory permissions are changed on the db2home directory.
Instance recovery procedure
This section describes the procedures to verify the configuration of your operating system, and to drop and re-create an instance. The backup files referred to are the ones that you create using the sample backup scripts provided in the Downloads section. When verifying settings, ensure that you compare the settings on the failed node with the backup files for that node. You need root access to edit operating system files and settings. Use the DB2 instance owner for all DB2 related tasks.
- Verify the configuration of the operating system on the affected node:
- Verify that the instance user and fenced user exist in /etc/passwd.
- Verify that the related DB2 group account exists in /etc/group.
- Compare /etc/services and /etc/hosts against the respective backup copies and correct or restore as necessary.
- List the related software packages on the system and compare them against what is shown in the backup file. Isolate and determine the reason for any changes.
- Compare and verify user limits (ulimit), mount points (/etc/fstab), operating system level (oslevel), and crontab entries. Determine the reason for any changes.
Drop the corrupt instance on the affected node:
- Use the
db2ilistcommand to determine if the instance still exists in the instance list. If the instance is intact, check for mounts and network issues. Use
db2iupdtto correct permission issues.
- Drop the partially corrupted instance with the command:
- Use the command
db2isetto remove the related profile information and configuration parameters. For example:
db2iset -d bcuinst2
- Use the
- Create and configure the instance on the affected node:
Create the instance on the administration node as user root using the
db2icrtcommand. Issue the command from the product directory where the DB2 software is installed. On the test cluster for this article, the directory was /opt/IBM/dwe/db2/V9.7/instance. The command issued was:
db2icrt -u bcufenc2 bcuinst2
bcuinst2are the instance owner user ID's.
- Compare, and if necessary restore, the database manager configuration parameters to those in the dbm.cfg.out backup file.
- Compare, and if necessary replace, the $DB2HOME/sqllib/db2nodes.cfg file with the backup version.
- Compare, and if necessary re-apply, the registry variables with those in the db2set.out backup file.
For example, you may have to set the
- Compare, and if necessary replace, the $DB2HOME/sqllib/userprofile file using the backed up version.
- Create the instance on the administration node as user root using the
Add license files, catalog the database on the affected node, and start DB2:
- Restore the db2ese.lic file from the backup set of files and issue the
db2licm -a db2ese.lic
If the license file is not found, the
db2startcommand returns an SQL8000N error.
Catalog the nodes and database directory according to what is shown in the
db2.list.database.directory and db2.list.node.directory files in the backup set.
For example, to do this on the test cluster you would issue these
catalog tcpip node beluga-bvn-05 remote beluga-bvn-05 server 50000
catalog database BCUDB as BCUDB at node beluga-bvn-05
- Start DB2. If you are using the DB2 High Availability (HA) feature, refer to the shutdown and startup procedures for your environment.
- Restore the db2ese.lic file from the backup set of files and issue the command:
Recovery from known failures
As described in this section, if you know the cause of an instance failure, a complete recovery may not be necessary. You can fix network and mount issues as well as permission problems with specific recovery steps.
- The instance owner home directory on a non-administration node has been removed.
The instance home directory on nodes that are not the administration node are mounted file systems. Try these steps to recover from this type scenario:
- Mount the file system on the affected node using the mount command:
- Start the database manager as the instance owner. For example:
Note: In a high availability environment the command may be different. Consult your user guide.
- Mount the file system on the affected node using the mount command:
- Permissions on instance directory have been inadvertently changed.
cpcommand issued within the db2home directory, in particular in the sqllib directory, can cause the instance to become unavailable because the DB2 software might fail when attempting to read, write, or execute a file. For example, on the test cluster for this article, a
chmodcommand was issued on the administration node. This caused the
db2startcommand to fail due to file permission errors. Listing 2 shows the failed startup.
Listing 2. Changing permissions can cause failure
bcuinst2@beluga-bvn-05:~>cd ~/sqllib bcuinst2@beluga-bvn-05:~>chmod -R 444 * bcuinst2@beluga-bvn-05:~> db2start -bash: /db2home/bcuinst2/sqllib/adm/db2start: Permission denied SQL6031N Error in the db2nodes.cfg file at line number "<line>". Reason code "<reason-code>". Explanation: The statement cannot be processed because of a problem with the db2nodes.cfg file, as indicated by the following reason codes: 1 Cannot access the sqllib directory of the instance.
In this type scenario, follow these steps to re-apply ownership and permission values:
Each file in ~/sqllib should be owned by the instance user ID and db2 instance group.
This can be achieved by issuing the following command on the administration
chown db2inst2:db2igrp /db2home/bcuinst2/sqllib/*
where the instance and group accounts are
db2iuptcommand to update the db2home directory and replace the common files based on the installed DB2 software. To update an instance with
db2iupdt, you must first stop all processes that are running for the instance. The
db2iuptcommand replaces files under the ~/sqllib directory. Listing 3 shows how this step was achieved on the test cluster.
Listing 3. Using
db2iupdtto update an instance based on the installed DB2 software.
beluga-bvn-05:/opt/IBM/dwe/db2/V9.7/instance # ./db2iupdt bcuinst2 DBI1070I Program db2iupdt completed successfully. bcuinst2@beluga-bvn-05:~> db2start 08/10/2010 15:09:16 1 0 SQL1063N DB2START processing was successful. 08/10/2010 15:09:17 5 0 SQL1063N DB2START processing was successful. 08/10/2010 15:09:17 2 0 SQL1063N DB2START processing was successful. 08/10/2010 15:09:17 3 0 SQL1063N DB2START processing was successful. 08/10/2010 15:09:17 7 0 SQL1063N DB2START processing was successful. 08/10/2010 15:09:17 6 0 SQL1063N DB2START processing was successful. 08/10/2010 15:09:18 4 0 SQL1063N DB2START processing was successful. 08/10/2010 15:09:18 8 0 SQL1063N DB2START processing was successful. 08/10/2010 15:09:30 0 0 SQL1063N DB2START processing was successful. SQL1063N DB2START processing was successful.
- Each file in ~/sqllib should be owned by the instance user ID and db2 instance group. This can be achieved by issuing the following command on the administration node:
You should implement a DB2 instance backup and recovery strategy so that you are prepared to quickly and efficiently recover from scenarios where the instance has become compromised, and as a result is unavailable. By putting together a backup and recovery strategy for an instance, you can potentially avoid having to do a complete operating system restore when faced with a compromised instance. This can significantly reduce the amount of downtime you experience in such circumstances.
|DB2 Instance Backup Script||db2_instance_backup.zip||10KB|
|DB2 Instance Backup OS Script||db2_instance_backup_os.zip||10KB|
- Read other DB2 best practices papers on developerWorks.
- The DB2 Database for Linux, UNIX, and Windows Information Center provides information describing how to use the DB2 family of products and features, as well as related IBM InfoSphere Information Server products and features.
- Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
- Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM products and tools as well as IT industry trends.
- Follow developerWorks on Twitter.
- Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
Get products and technologies
- Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.
- Participate in the discussion forum.
- Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.