In certain applications, there is a need for additional functionality that is not typically provided by a relational database. As part of your planning process, consider the following functional areas where additional products might be required, and make sure you integrate the necessary research, education, and development time of these additional products into your project plan:
Information Integration - Are there other data sources that must be accessed by the application, and will this access occur at the application code level or at the database level through the use of federation?
This topic includes products available to implement replication and federation of data and databases.
Content Management - Is there a need (current or future) to handle complex data objects or structures that have more sophisticated requirements than standard relational database data types? If so, what other tools and capabilities will be needed, and how will they be integrated?
Business Intelligence - Is there a need to summarize and analyze the data within the databases that are being used and which may require the use of OLAP tools and different structures of the data? If so, can this be done without impact to the production system, or is an additional copy of the data required (which could require replication or other similar processes)?
Web Integration - If the current GUI is awkward or out-dated, or if a larger client population needs access to the application, it may be time to consider using a Web front end to the application. If so, what development and runtime tools are needed? What education will be needed for the staff to be able to implement this technology?
Database Administration - Even after the database structure, objects, and data have been ported to DB2, are there other areas that need to be addressed when administering DB2? For example, how will performance be maintained? What will be the statistics collection strategy?
In addition to the fundamentals of database administration (creating databases, performing a database backup, maintaining indexes, and so on), there are other important considerations as well:
Backup and recovery - To prevent loss of data, you should have a tried-and-tested strategy for backing up the data in your databases and restoring it in case of a failure or error. The larger the volume of data you have, the more sophisticated your recovery strategy needs to be, especially if the database must always be online.
Replication - Replication allows you to transfer data between systems for the purpose of building new data stores, or for duplicating some or all of the original data in another DBMS.
High availability - High availability ensures that your data sources are always available for use (regardless of failures in hardware and software). This concept is closely related to the backup and recovery process, and is often implemented at the same time.
Federation - Federation allows you to access data from local and remote heterogeneous data sources in a single query.
Note: The following information is taken from the DB2 Data Recovery and High Availability Guide and Reference manual, which offers extensive information about both database backup, restore, and recovery, as well as techniques for ensuring high availability. There are also a number of articles and tutorials about high availability in the IBM developerWorks Information Managementtechnical library.
The concept of a database backup is similar to other system backups: making a copy of the database objects and data, and storing it on a different medium in case of failure or damage to the original. The simplest case of a backup involves shutting down the database and backing it up. Of course, in a 24x7 environment, this is not a possibility and you will have to use a different method such as online backup. You can then rebuild the database from the backup image if it becomes damaged or corrupted in some way at a later time. The rebuilding of the database is called recovery. Version recovery is the restoration of a previous backup image of the database, using an image that was created during a backup operation. Rollforward recovery involves the restoration of a previous backup image, as well as the re-application of transactions recorded in the database transaction log files.
Crash recovery is the automatic recovery of the database if a failure occurred before all transactions were completed and committed. During this process, the transaction log files are examined and incomplete transactions are undone, and committed transactions are redone.
A database can become unusable because of hardware failure, software failure, or both. You may, at one time or another, encounter storage problems, power interruptions, and application failures. Different failure scenarios require different recovery actions. You need to protect your data against the possibility of loss by having a well-rehearsed recovery strategy in place. Some of the questions that you should answer when developing your recovery strategy are:
Will the database be recoverable?
How much time can be spent recovering the database?
How much time will pass between full backup operations?
How much storage space can be allocated for backup copies and archived transaction logs?
Are table space level backups sufficient, or will full database backups also be necessary?
A database recovery strategy should ensure that all information is available when it is required for database recovery. It should include a regular schedule for taking database backups and, in the case of partitioned database systems, include backups when the system is scaled (database partitions or nodes are added/dropped).
Your overall strategy should also include procedures for recovering command scripts, applications, user-defined functions (UDFs), stored procedures, and operating system libraries.
Many commercial backup-and-restore applications are available for all types of file systems. These programs enable you to back up your file systems onto any media that you choose, and typically provide easy-to-use restore operations. An example of such a program is
IBM Tivoli Storage Manager. Tivoli Storage Manager runs in all environments that DB2 supports.
Your operating system also provides backup-and-restore utilities. See your operating system documentation for more information about the backup utilities.
Replication is the process of copying of data from one place to another. Data is extracted, transported to some other location, and then loaded at the receiving location. Data may be filtered and transformed during replication. There may be other requirements for replication, such as time constraints. In most cases, replication must not interfere with existing applications and have minimal impact on production systems. The replication processes need to be managed and monitored.
In DB2, SQL replication can be set up and configured using the DB2 Replication Center. A more heavy-duty type of replication called Q-Replication can also be configured; however, this form of replication requires WebSphere MQ software.
Note: The following information is taken from the DB2 Data Recovery and High Availability Guide and Reference manual, which offers extensive information about both database backups, restore, and recovery, as well as techniques for ensuring high availability. There are also a number of articles about high availability in the developerWorks technical library.
High availability (HA) is the term that is used to describe systems that run and are available to customers more or less all the time. For this to occur:
Transactions must be processed efficiently, without appreciable performance degradations (or even loss of availability) during peak operating periods.
Systems must be able to recover quickly when hardware or software failures occur, or when disaster strikes. DB2 has an advanced continuous checkpointing system and a parallel recovery capability that allow for extremely fast crash recovery. The ability to recover quickly can also depend on having a proven backup and recovery strategy in place.
Software that powers the enterprise databases must be continuously running and available for transaction processing. To keep the database manager running, you must ensure that another database manager can take over if it fails. This is called failover. Failover capability allows for the automatic transfer of workload from one system to another when there is hardware failure.
Failover protection can be achieved by keeping a copy of your database on another machine that constantly receives updates from the primary system. Log shipping is the process of copying log files to a standby machine, either from an archive device, or through a user exit program running against the primary database. With this approach, the primary database is restored to the standby machine, using either the DB2 restore utility or the split mirror function. The secondary database on the standby machine continuously applies the log files. If the primary database fails, any remaining log files are copied over to the standby machine. After applying the remaining log files, all clients are reconnected to the secondary database on the standby machine.
Failover support can be provided using platform-specific software that you can add to your system. For example:
High Availability Cluster Multi-Processing, Enhanced Scalability, for AIX
Microsoft Cluster Server, for Windows operating systems
Sun Cluster, or VERITAS Cluster Server, for the Solaris Operating Environment
Multi-Computer/ServiceGuard, for Hewlett-Packard
Failover strategies are usually based on clusters of systems. A cluster is a group of connected systems that work together as a single system. Each processor is known as a node within the cluster. Clustering allows servers to back each other up when failures occur, by picking up the workload of the failed server. Failover software may use heartbeat monitoring or keepalive packets between systems to confirm availability. Heartbeat monitoring involves system services that maintain constant communication between all the nodes in a cluster. If a heartbeat is not detected, failover to a backup system starts. End users are usually not aware that a system has failed. The two most common failover strategies on the market are known as idle standby and mutual takeover, although the configurations associated with these terms may also be associated with different terms that depend on the vendor.
Note: The following excerpts are taken from the DB2
Administration Guide for Federated Systems, which offers extensive information about planning, configuring, using, and administering federated systems.
A DB2 federated system is a special type of distributed database management system (DBMS). A federated system consists of a DB2 instance that operates as a federated server, a database that acts as the federated database, one or more data sources, and clients (users and applications) that access the database and data sources. With a federated system you can send distributed requests to multiple data sources within a single SQL statement. For example, you can join data that is located in a DB2 table, an Oracle table, and a Sybase view from a single SQL statement.
The power of a federated system is in its ability to:
Join data from local tables and remote data sources, as if all the data is local
Take advantage of the data source processing strengths, by sending distributed requests to the data sources for processing
Compensate for SQL limitations at the data source by processing parts of a distributed request at the federated server
The DB2 server in a federated system is referred to as the federated server. Any number of DB2 instances can be configured to function as federated servers. You can use existing DB2 instances as your federated server, or you can create new ones specifically for the federated system. The DB2 federated instance that manages the federated system is called a server because it responds to requests from end users and client applications. The federated server often sends parts of the requests it receives to the data sources for processing.
Typically, a federated system data source is a relational DBMS instance (such as Oracle or Sybase) and one or more databases that are supported by the instance. However, there are other types of data sources that you can include in your federated system, such as: