Best practices for data warehousing environments

The best practice papers described in this section provide practical guidelines and techniques to help you use Db2® software efficiently in data warehousing environments.

Data warehousing environments are data management systems typically designed to optimize the performance of data analysis queries on large data repositories. The IBM® Smart Analytics System environment, which incorporates IBM Db2 Warehouse software, Db2 Database software, and IBM Cognos® software, represents a best practice configuration of hardware and software for data warehousing environments. Although many of the scenarios documented in the best practice papers referred to in this section were extensively tested with the IBM Smart Analytics System software, they also apply to other product configurations.

AIX operating system-level backup and recovery for an IBM Smart Analytics System environment

Use the AIX operating system-level backup and recovery for an IBM Smart Analytics System environment paper to learn about the best practices for developing a backup and recovery strategy for the operating system and configuration of an IBM Smart Analytics System solution that is based on the AIX® operating system and POWER® processors. The paper focuses on recovering the configuration for a replacement device and the configured operating system for a replacement server from the backup files of the IBM Smart Analytics System components. The objective of the backup and recovery strategy is to resume normal operating behavior as quickly as possible when you replace a server or device.
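
The procedures themselves are detailed in the paper; as a rough illustration of the kind of AIX operating system-level backup involved (the target paths and volume group name here are hypothetical, not taken from the paper), a root volume group image and a non-root volume group backup can be created as follows:

  # Create a bootable backup image of the root volume group (rootvg),
  # regenerating /image.data first (-i); the target file is hypothetical.
  mksysb -i /backup/node01_rootvg.mksysb

  # Back up a non-root volume group (the name datavg is hypothetical),
  # regenerating its volume group data files first (-i).
  savevg -i -f /backup/node01_datavg.savevg datavg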

You can download this best practice paper from the IBM Hybrid Data Management Community site: AIX operating system-level backup and recovery for an IBM Smart Analytics System.

Building a data migration strategy with IBM InfoSphere Optim High Performance Unload

IBM Optim High Performance Unload for Db2 V4.02 is a high-speed tool for unloading, extracting, and migrating data in Db2 databases. High Performance Unload (HPU) is designed to extract and migrate data from Db2 table space containers. A data migration strategy that uses HPU minimizes storage needs and automates many manual tasks.

HPU uses named pipes and parallel LOAD operations to stream data from the source database to the target database without ever staging the data on disk. You can direct HPU to determine where different partition maps, software levels, and key constraints are in use and to handle these differences automatically during data migration.
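
As a rough sketch of this named-pipe pattern (not taken from the paper; the database, table, and file names are hypothetical, and the HPU control file contents and invocation flags can vary by release), an unload can feed a LOAD on the target database through a pipe so that the data never lands on disk:

  # Create a named pipe and start the unload in the background; an HPU
  # control file (not shown) would name the pipe as its delimited output.
  mkfifo /tmp/sales.pipe
  db2hpu -f unload_sales.ctl &

  # Load the streamed data into the target table; nothing is staged on disk.
  db2 "CONNECT TO TARGETDB"
  db2 "LOAD FROM /tmp/sales.pipe OF DEL REPLACE INTO mart.sales_fact"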

HPU can also unload subsets of data from the target database without accessing the Db2 software layer. With this functionality, you can migrate data from larger production systems to smaller pre-production or development environments.

You can download the Building a data migration strategy with IBM InfoSphere Optim High Performance Unload best practice paper from the IBM Hybrid Data Management Community site: Building a data migration strategy with IBM InfoSphere Optim High Performance Unload.

Building a recovery strategy for an IBM Smart Analytics System data warehouse

Identifying and planning for recovery scenarios that are most likely to occur are the key factors in determining the speed with which you can recover from data loss or corruption. The recommendations in the Building a recovery strategy for an IBM Smart Analytics System data warehouse best practice paper help you develop a strategy that meets your data warehouse backup and recovery needs. The paper focuses on IBM Db2 software in the IBM Smart Analytics System environment with configurations that are based on System x® and Power Systems servers.

This paper is intended for those who are involved in planning, configuring, designing, implementing, or administering a data warehouse that is based on Db2 software.
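
For context, a typical building block of such a strategy is an online backup of all database partitions that includes the log files needed for a consistent restore; a minimal sketch (the database name and backup path are hypothetical) is:

  # Single system view backup of all database partitions, taken online,
  # compressed, and including the logs required to restore consistently.
  db2 "BACKUP DATABASE BCUDB ON ALL DBPARTITIONNUMS ONLINE TO /db2backup COMPRESS INCLUDE LOGS"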

You can download this best practice paper from the IBM Hybrid Data Management Community site: Building a recovery strategy for an IBM Smart Analytics System data warehouse.

Deploying the IBM Banking Data Warehouse to IBM Db2 10.5 with BLU Acceleration

Implementing industry models can help accelerate projects and reduce risk in a wide variety of industry sectors when enterprises develop and implement business intelligence applications. The IBM Industry Models solutions cover a range of industries that include banking, healthcare, retail, and telecommunications. The IBM Industry Models solutions provide you with an extensive and extensible data model for your industry sector. Use the logical data model as provided by IBM to build a physical model that is customized for your reporting requirements, and then deploy and populate a best practice IBM Db2 10.5 with BLU Acceleration database. This paper introduces the logical data model concepts and then focuses on what you must do to transform a non-vendor-specific logical data model into a best-practice production IBM Db2 10.5 with BLU Acceleration schema. Specifically, this paper uses the IBM Industry Models Banking Data Warehouse model pertaining to Involved Party and Social Media entities. It guides you through the recommended process for transforming the logical data model for dimensional warehousing into a physical database design for production use in your environment.

You can download this best practice paper from the IBM Hybrid Data Management Community site: Deploying the IBM Banking Data Warehouse to IBM Db2 10.5 with BLU Acceleration.

Deploying the IBM Banking Data Warehouse to IBM InfoSphere BigInsights

An IBM industry model solution is a comprehensive set of industry-specific pre-designed models that form the basis of a business and software solution, optimized for business challenges in a particular sector. Domain areas include data warehousing, business intelligence, business process management, service-oriented architecture, business terminology, and business glossary templates. The IBM Industry Models solutions cover a range of industries that include banking, healthcare, retail, and telecommunications. The IBM Industry Models solutions provide you with an extensive and extensible data model for your industry sector. Use the logical data model as provided by IBM to build a physical data model that is customized for your reporting requirements, and then deploy and populate an IBM InfoSphere® BigInsights™ environment. This paper shows how you can deploy the IBM Banking Data Warehouse (BDW) solution to BigInsights. It introduces the logical data model concepts and then focuses on what you must do to transform a non-vendor-specific logical data model into a production-ready BigInsights Big SQL schema.

You can download this best practice paper from the IBM Hybrid Data Management Community site: Deploying the IBM Banking Data Warehouse to IBM InfoSphere BigInsights.

Expanding an IBM Smart Analytics System database and redistributing data

This paper recommends best practices for expanding the database and redistributing data after the hardware build, installation, and cluster expansion are complete. The options available for hardware expansion and the planning required to add hardware to an IBM Smart Analytics System are discussed in the paper Expanding an IBM Smart Analytics System.
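
As a rough sketch of the redistribution step that the paper covers (the partition group name and partition numbers are hypothetical), the new database partitions are added to a database partition group and the data is then rebalanced across all partitions:

  # Add the new database partitions to the partition group (table space
  # containers are created like those on the existing partitions), then
  # rebalance the data evenly across the expanded set of partitions.
  db2 "ALTER DATABASE PARTITION GROUP pdpg ADD DBPARTITIONNUMS (9 TO 16)"
  db2 "REDISTRIBUTE DATABASE PARTITION GROUP pdpg UNIFORM"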

You can download this best practice paper from the IBM Hybrid Data Management Community site: Expanding an IBM Smart Analytics System database and redistributing data.

Expanding an IBM Smart Analytics System environment

Data warehouse environments continue to experience an explosion in data growth. As a result, you might need added capacity to cope with increased enterprise demands. To help you meet these demands, the IBM Smart Analytics System software is a flexible data warehousing solution that supports a building block approach to expansion.

The Expanding an IBM Smart Analytics System best practice paper describes how to identify, plan for, and prepare for an expansion of your system and describes the milestones and options that are involved in an expansion project. This paper provides details on the following recommendations:
  • Use good capacity planning practices, which can assist in early detection of trends in resource usage. You can create and document a performance baseline for each workload and a forecast baseline for the next 12 months. Align your service level objectives with capacity planning indicators so that you can easily compare metrics.
  • Begin planning for storage expansion when storage capacity reaches 60% and is projected to reach 80% within 12 months. A monitoring sketch that tracks these fill levels follows this list.
  • You can add a data module to expand storage capacity and reduce the data volume per database partition. Alternatively, you can add a user module to increase the capacity of the system to accommodate users. When adding modules, remember that multi-generation environments might need multiple high availability groups. Ensure that you incorporate failover and failback testing into your expansion plan.
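
A minimal sketch of how you might track table space fill levels against those thresholds with the MON_GET_TABLESPACE table function (the 60% and 80% figures come from the guidance above; the query itself is only an illustration):

  -- Percentage of pages in use per table space, summed across all members (-2).
  SELECT tbsp_name,
         DEC(100.0 * SUM(tbsp_used_pages) / NULLIF(SUM(tbsp_total_pages), 0), 5, 2)
           AS pct_used
  FROM TABLE(MON_GET_TABLESPACE(NULL, -2)) AS t
  GROUP BY tbsp_name
  ORDER BY pct_used DESC;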

You can download this best practice paper from the IBM Hybrid Data Management Community site: Expanding an IBM Smart Analytics System.

Frequently asked questions about database administration for an IBM Smart Analytics System environment

The Frequently asked questions about database administration for an IBM Smart Analytics System environment best practice paper answers some of the frequently asked questions about database administration on an IBM Smart Analytics System data warehouse database. The frequently asked questions are grouped into the following categories:
  • Database implementation and design
  • Database administration and maintenance
  • Advanced statistics maintenance
  • Troubleshooting and resolution

You can download this best practice paper from the IBM Hybrid Data Management Community site: Frequently asked questions about database administration for an IBM Smart Analytics System environment.

Frequently asked questions about system maintenance for the IBM Smart Analytics System

Use the Frequently asked questions about system maintenance for the IBM Smart Analytics System best practice paper to obtain answers to some of the frequently asked questions about system maintenance in IBM Smart Analytics System environments and InfoSphere® Balanced Warehouse® environments. You can download this best practice paper from the IBM Hybrid Data Management Community site: Frequently asked questions about system maintenance for the IBM Smart Analytics System.

Implementing Db2 workload management in a data warehouse

Using a staged approach, the Implementing Db2 workload management in a data warehouse best practice paper guides you through the steps to implement the best-practice workload management configuration on IBM Db2 software. These steps create sufficient controls to help ensure a stable, predictable system for most data warehouse environments. This initial configuration is intended to be a good base on which you can perform additional tuning and configuration changes to achieve your specific workload management objectives.

This best practice paper presents a set of definitions representing the different stages of maturity for a workload management configuration in a Db2 database. These stages range from the basic stage 0 configuration to the advanced stage 3 configuration. A specific configuration template and process are provided so that you can progress from a stage 0 configuration to a stage 2 configuration. General descriptions and advice about common stage 3 scenarios are also given.
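
For a flavor of what such a configuration is built from (the service class, workload, and threshold names and values below are illustrative only, not the template from the paper), Db2 workload management objects are created with DDL such as:

  -- Route work from a reporting application into its own service class
  -- and cap how many of its queries run concurrently.
  CREATE SERVICE CLASS sc_reports;
  CREATE WORKLOAD wl_reports APPLNAME('cognos_rs') SERVICE CLASS sc_reports;
  GRANT USAGE ON WORKLOAD wl_reports TO PUBLIC;
  CREATE THRESHOLD th_reports_concurrency
    FOR SERVICE CLASS sc_reports ACTIVITIES ENFORCEMENT DATABASE
    WHEN CONCURRENTDBCOORDACTIVITIES > 10
    STOP EXECUTION;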

You can download this best practice paper from the IBM Hybrid Data Management Community site: Implementing Db2 workload management in a data warehouse.

Ingesting data into an IBM Smart Analytics System

The key goals of ingest application design are to balance the required rate of ingest with the availability of processing resources, to ingest data without affecting data availability, and to maintain flexibility in the volumes of data that are ingested. Identifying service level objectives for the speed and volume of data that is ingested into the data warehouse helps determine the architecture, design, and development of the data ingest application.
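
For illustration only (the file, table, and column names are hypothetical, and the paper itself weighs several ingest options), a delimited-file ingest with the Db2 INGEST utility looks like this:

  INGEST FROM FILE sales_20240115.del
    FORMAT DELIMITED BY ','
    (
      $sale_date  DATE 'yyyy-mm-dd',
      $store_id   INTEGER EXTERNAL,
      $amount     DECIMAL(12,2) EXTERNAL
    )
    INSERT INTO mart.daily_sales (sale_date, store_id, amount)
    VALUES ($sale_date, $store_id, $amount);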

The Ingesting data into an IBM Smart Analytics System best practice paper is targeted at people who are involved in the design and development of data ingest applications that are based on Db2 database software. In particular, the focus is on ingesting data into the IBM Smart Analytics System environment with configurations that are based on System x and Power Systems servers. You can download this best practice paper from the IBM Hybrid Data Management Community site: Ingesting data into an IBM Smart Analytics System.

Managing data growth

Because of business needs for data retention and regulatory compliance, enterprises need to manage increasingly large databases ranging from hundreds of gigabytes to many terabytes, or even petabytes, in size. As data continues to grow at an exponential rate, DBAs and IT professionals in these organizations face daunting challenges when designing and operating such large databases. The data must be well organized to effectively cope with data growth and to meet service requirements. This document communicates best practices for managing data growth using the rich set of features in IBM Db2 software.
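
One Db2 feature commonly used for this purpose is table partitioning, which lets you roll out aged data by detaching a partition; a minimal sketch (the table, column, and partition names are hypothetical) is:

  -- Range-partitioned fact table with one partition per month.
  CREATE TABLE mart.sales_fact (
      sale_date  DATE           NOT NULL,
      store_id   INTEGER        NOT NULL,
      amount     DECIMAL(12,2)
  )
  PARTITION BY RANGE (sale_date)
    (STARTING '2023-01-01' ENDING '2023-12-31' EVERY 1 MONTH);

  -- Roll out the oldest month into its own table for archiving; generated
  -- partition names such as PART0 can be checked in SYSCAT.DATAPARTITIONS.
  ALTER TABLE mart.sales_fact DETACH PARTITION part0 INTO mart.sales_fact_2023_01;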

You can download this best practice paper from the IBM Hybrid Data Management Community site: Managing data growth.

Managing data warehouse performance with IBM InfoSphere Optim Performance Manager

This paper is targeted at those who are involved in implementing IBM InfoSphere Optim™ Performance Manager (OPM) 5.1 in a data warehouse environment that is based on Db2 9.7 software. In particular, this paper focuses on the IBM Smart Analytics System environment with configurations that are based on System x and Power Systems servers.

The goal of an OPM implementation in a warehouse environment is to identify which database operations and application workloads are consuming resources on individual database partitions and across the entire database. Identifying the appropriate indicators and workloads to monitor in your environment is a key step in measuring the performance of your database workloads against your service level objectives.

When implementing OPM in a production environment, it is important to configure OPM to collect only the data that you need to monitor your service level objectives. When more detailed metrics are needed, such as when you introduce a new application or troubleshoot a query, you can collect them for defined periods.

You can download this best practice paper from the IBM Hybrid Data Management Community site: Managing data warehouse performance with IBM InfoSphere Optim Performance Manager.

Multi-temperature data management

Using faster, more expensive storage devices for hot data and slower, less expensive storage devices for cold data optimizes the performance of the queries that matter most while helping to reduce overall cost. This paper presents a strategy for managing a multi-temperature data warehouse by storing data on different types of storage devices based on the temperature of the data. It provides guidelines and recommendations for each of the following tasks:
  • Identifying and characterizing data into temperature tiers
  • Designing the database to accommodate multiple data temperatures
  • Moving data from one temperature tier to another
  • Using Db2 workload management to allocate more resources to requests for hot data than to requests for cold data
  • Planning a backup and recovery strategy when a data warehouse includes multiple data temperature tiers
The content of this paper applies to data warehouses that are based on Db2 Version 10.1 or later.
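
A minimal sketch of the Db2 10.1 storage group mechanics behind such tiering (the paths, names, and tiering choice below are illustrative, not the paper's recommendations):

  -- Define storage tiers backed by different classes of storage devices.
  CREATE STOGROUP sg_hot  ON '/db2/ssd01',  '/db2/ssd02';
  CREATE STOGROUP sg_cold ON '/db2/sata01', '/db2/sata02';

  -- Place a current quarter's table space on the hot tier ...
  CREATE TABLESPACE ts_sales_2024q1 USING STOGROUP sg_hot;

  -- ... and move it to the cold tier when the data cools; the table space
  -- is rebalanced onto the new storage group's paths.
  ALTER TABLESPACE ts_sales_2024q1 USING STOGROUP sg_cold;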

You can download this best practice paper from the IBM Hybrid Data Management Community site: Multi-temperature data management.

Optimizing analytic workloads using Db2 10.5 with BLU Acceleration

BLU Acceleration is a collection of technologies for analytic queries, introduced in Db2 Version 10.5. At its heart, BLU Acceleration is about providing faster answers to more questions and analyzing more data at a lower cost. Db2 with BLU Acceleration delivers order-of-magnitude benefits in performance, storage savings, and time to value.

This paper gives you an overview of these technologies, recommendations on hardware and software selection, guidelines for identifying the optimal workloads for BLU Acceleration, and information about capacity planning, memory, and I/O.

A section on system configuration shows you how IBM’s focus on simplicity enables you to set up Db2 so that it automatically makes optimal configuration choices for analytic workloads. Other sections describe how to implement and use Db2 with BLU Acceleration. Unlike other best practice papers that present multiple choices, this paper focuses on how BLU Acceleration works and what it does under the covers. This gives you an appreciation of the simplicity built into BLU Acceleration and shows how it delivers “Super Analytics, Super Easy”.
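
As a small illustration of that simplicity (the database and table names are hypothetical), setting the DB2_WORKLOAD registry variable to ANALYTICS before creating a database establishes analytic defaults, including column-organized tables:

  db2set DB2_WORKLOAD=ANALYTICS        # set before creating the database
  db2 "CREATE DATABASE BLUDB"
  db2 "CONNECT TO BLUDB"
  # With DB2_WORKLOAD=ANALYTICS, new tables are column-organized by default;
  # ORGANIZE BY COLUMN is included here only to make the choice explicit.
  db2 "CREATE TABLE mart.sales_fact (sale_date DATE, store_id INT, amount DECIMAL(12,2)) ORGANIZE BY COLUMN"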

You can download the Optimizing analytic workloads using Db2 10.5 with BLU Acceleration best practice paper from the IBM Hybrid Data Management Community site: Optimizing analytic workloads using Db2 10.5 with BLU Acceleration.

Performance monitoring in a data warehouse

Monitoring a data warehouse system is important to help ensure that it performs optimally. This paper describes the most important Db2 software and operating system metrics for monitoring the performance of the IBM Smart Analytics System, the IBM PureData™ System for Operational Analytics, or a system with a similar architecture. The paper also presents a general methodology for finding the causes of performance problems. This approach starts with the operating system metrics and drills down to the Db2 metrics that explain the behavior that is seen at the operating system level and that help identify the causes of performance problems. The approach is illustrated with information about typical performance problems.
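
As one small example of the kind of Db2 metric involved (the query is an illustration, not taken from the paper), the data-page buffer pool hit ratio can be derived from the MON_GET_BUFFERPOOL table function:

  -- Logical versus physical data-page reads per buffer pool,
  -- summed across all members (-2).
  SELECT bp_name,
         SUM(pool_data_l_reads) AS logical_reads,
         SUM(pool_data_p_reads) AS physical_reads,
         DEC(100.0 * (SUM(pool_data_l_reads) - SUM(pool_data_p_reads))
             / NULLIF(SUM(pool_data_l_reads), 0), 5, 2) AS data_hit_ratio_pct
  FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS t
  GROUP BY bp_name
  ORDER BY bp_name;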

You can download this paper from the IBM Hybrid Data Management Community site: Performance monitoring in a data warehouse.

Physical database design for data warehouse environments

This paper provides best practice recommendations that you can apply when designing a physical data model to support the competing workloads that exist in a typical 24x7 data warehouse environment. It also provides a sample scenario with completed logical and physical data models. You can download a script file that contains the DDL statements to create the physical database model for the sample scenario.

This paper targets experienced users who are involved in the design and development of the physical data model of a data warehouse in Db2 or IBM InfoSphere Warehouse Version 9.7 environments.
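
For a flavor of the kinds of decisions involved (all names and choices below are illustrative only, not the paper's sample model), a warehouse fact table typically combines a distribution key, range partitioning, and multidimensional clustering:

  CREATE TABLE mart.sales_fact (
      sale_date   DATE          NOT NULL,
      store_id    INTEGER       NOT NULL,
      region_id   SMALLINT      NOT NULL,
      amount      DECIMAL(12,2)
  )
  DISTRIBUTE BY HASH (store_id)                 -- spread rows evenly across database partitions
  PARTITION BY RANGE (sale_date)                -- enable partition elimination and roll-out
    (STARTING '2024-01-01' ENDING '2024-12-31' EVERY 1 MONTH)
  ORGANIZE BY DIMENSIONS (region_id);           -- MDC clustering on a low-cardinality column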

You can download the Physical database design for data warehouse environments best practice paper from the IBM Hybrid Data Management Community site: Physical database design for data warehouse environments.

Query optimization in a data warehouse

In large data warehouse systems, it is critical to optimize query workloads to maximize system utilization and minimize processing times. The Query optimization in a data warehouse best practice paper describes techniques for optimizing data warehouse query workloads. The paper contains IBM Smart Analytics System scenarios in which Db2 software manages multiple database partitions in a cluster. The scenarios describe optimization methods that can help improve performance in a short time. The guidelines in this paper might not apply to transactional applications.
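
A common starting point for this kind of tuning is capturing an access plan; a minimal sketch (the database and file names are hypothetical) using the explain facility and the db2exfmt tool:

  # Explain a query without executing it, then format the captured plan.
  db2 "CONNECT TO SALESDB"
  db2 "SET CURRENT EXPLAIN MODE EXPLAIN"
  db2 -tvf slow_report_query.sql        # the statement is explained, not run
  db2 "SET CURRENT EXPLAIN MODE NO"
  db2exfmt -d SALESDB -1 -o slow_report_query_plan.txt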

You can download this best practice paper from the IBM Hybrid Data Management Community site: Query optimization in a data warehouse.

Storage optimization with deep compression

This paper communicates best practices for using the Db2 Storage Optimization Feature with the Db2 product. You can use the Db2 Storage Optimization Feature to apply compression on various types of persistently stored data and temporary data.
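
A minimal sketch of applying compression to an existing table and rebuilding its dictionary (the table name is hypothetical; the paper discusses which compression type suits which data):

  db2 "ALTER TABLE mart.sales_fact COMPRESS YES ADAPTIVE"   # or COMPRESS YES STATIC for classic row compression
  db2 "REORG TABLE mart.sales_fact RESETDICTIONARY"         # rebuild the table and its compression dictionary
  db2 "RUNSTATS ON TABLE mart.sales_fact WITH DISTRIBUTION AND SAMPLED DETAILED INDEXES ALL"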

You can download this best practice paper from the IBM Hybrid Data Management Community site: Storage optimization with deep compression.

Temporal data management with Db2 V10

The temporal features in the IBM Db2 Version 10 product provide rich functionality for time-based data management. For example, you can choose to record the complete history of data changes for a database table so that you can "go back in time" and query any past state of your data. You can also indicate the business validity of data by assigning a pair of date or timestamp values to a row to indicate when the information is deemed valid in the real world. Using new and standardized SQL syntax, you can easily insert, update, delete, and query data in the past, present, or future.
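
A minimal sketch of a system-period temporal table that enables such "go back in time" queries (the table and column names are hypothetical):

  CREATE TABLE policy (
      policy_id   INTEGER        NOT NULL PRIMARY KEY,
      premium     DECIMAL(12,2),
      sys_start   TIMESTAMP(12)  NOT NULL GENERATED ALWAYS AS ROW BEGIN,
      sys_end     TIMESTAMP(12)  NOT NULL GENERATED ALWAYS AS ROW END,
      trans_start TIMESTAMP(12)  GENERATED ALWAYS AS TRANSACTION START ID,
      PERIOD SYSTEM_TIME (sys_start, sys_end)
  );
  CREATE TABLE policy_history LIKE policy;
  ALTER TABLE policy ADD VERSIONING USE HISTORY TABLE policy_history;

  -- Query the table as it looked at a past point in time.
  SELECT * FROM policy FOR SYSTEM_TIME AS OF TIMESTAMP '2024-01-01-00.00.00';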

You can download this best practice paper from the IBM Hybrid Data Management Community site: Temporal data management with Db2 V10.

Transforming IBM Industry Models into a production data warehouse

Implementing an industry model can help accelerate projects in a wide variety of industry sectors by reducing the effort required to create a database design optimized for data warehousing and business intelligence.

IBM Industry Models cover a range of industries which include banking, healthcare, retail, and telecommunications. The example that is chosen in this document is from a subset of the IBM Insurance Information Warehouse model pertaining to Solvency II (SII) regulations.

Many of the most important partitioned database design decisions depend on the queries that are generated by reporting and analytical applications. This paper explains how to translate reporting needs into database design decisions and guides you through the recommended process for transforming a logical data model for dimensional warehousing into a physical database design for production use in your environment.

You can download the Transforming IBM Industry Models into a production data warehouse best practice paper from the IBM Hybrid Data Management Community site: Transforming IBM Industry Models into a production data warehouse.

Tuning and monitoring database system performance

Most Db2 systems go through some kind of performance evolution. You must first configure the software and hardware of the system. In many ways, configuration sets the stage for how the system behaves when it is in operation. Then, after you deploy the system, it is important to monitor its performance so that you can detect any problems that might develop. If such problems develop, you reach the next phase: troubleshooting. Each phase depends on the previous ones: without proper preparation in the earlier phases, you are much more likely to face difficult problems in the current phase.
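
As one small illustration of the monitoring phase (the query is a sketch, not taken from the paper), the MON_GET_PKG_CACHE_STMT table function shows which statements are consuming the most processor time:

  -- Ten most CPU-expensive statements in the package cache, across all members (-2).
  SELECT SUBSTR(stmt_text, 1, 80) AS statement,
         num_executions,
         total_cpu_time
  FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS t
  ORDER BY total_cpu_time DESC
  FETCH FIRST 10 ROWS ONLY;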

Use the Tuning and monitoring database system performance paper to learn about best practices for the performance evolution of a Db2 data server. Topics range from important principles of initial hardware and software configuration to monitoring techniques that help you understand system performance under both operational and troubleshooting conditions. The paper provides a stepwise, methodical approach for troubleshooting a performance problem. You can download this best practice paper from the IBM Hybrid Data Management Community site: Tuning and monitoring database system performance.

Using IBM InfoSphere Optim High Performance Unload as part of a recovery strategy in an IBM Smart Analytics System

IBM InfoSphere Optim High Performance Unload (HPU) for Db2 V4.2 software is a high-speed tool for unloading, extracting, and repartitioning data in Db2 databases. HPU unloads large volumes of data quickly by reading directly from full, incremental, and delta backup images or table space containers rather than through the Db2 software layer.

The Using IBM InfoSphere Optim High Performance Unload as part of a recovery strategy in an IBM Smart Analytics System paper describes how to incorporate HPU into a recovery strategy and when to use HPU to meet your recovery objectives. In addition, the paper contrasts the use of output files and named pipes, covers introducing parallelism, describes the HPU command and control file, and provides best practices for creating control files. Finally, the paper details how to install and configure HPU in an IBM Smart Analytics System environment. You can download this best practice paper from the IBM Hybrid Data Management Community site: Using IBM InfoSphere Optim High Performance Unload as part of a recovery strategy in an IBM Smart Analytics System.