Skip to main content

skip to main content

developerWorks  >  Information Management  >

Get started with Informix Warehouse Feature, Part 1: Model your data warehouse using Design Studio

developerWorks
Go to the previous pagePage 3 of 11 Go to the next page

Document options
PDF format - Fits A4 and Letter

PDF - Fits A4 and Letter
1890 KB (56 pages)

Get Adobe® Reader®

Discuss


My developerWorks needs you!

Connect to your technical community


Rate this tutorial

Help us improve this content


Data warehousing – An overview

In order to quickly respond to the market and keep a competitive advantage, organizations need an integrated and historic view of their businesses. To support this requirement, an IT platform can be created to consolidate the data from the different heterogeneous systems across the company's business functions and operations, and even external sources, into one or more common databases called a data warehouse.

Data warehouses are designed with the business-context in mind (its terms, data, and performance measures) and typically take the form of multi-dimensional databases to better support decision-support applications.

In the broadest sense of the term, a data warehouse refers to a database that contains very large stores of historical data. The data is stored as a series of snapshots and aggregated data at a specific time, which allows a user to reconstruct history, make trend and projection analysis, discover gaps and problems, and make accurate comparisons between different time periods using Business Intelligence (BI) tools available in the market, like IBM Cognos. In addition, by using Data Mining tools, the business analysts can analyze the patterns in the data values and even predict business behavior based on different variables. A data warehouse includes processes in place to clean, integrate, and transform the data that it retrieves before it is loaded into the warehouse database so the repository contains a truthful integrated business view of a specific function or view across multiple functions within the organization.

The term data warehouse can mean different things to different people. A data-warehousing environment can encompass any of the following forms:

  • Data warehouse
  • Data mart
  • Operational data store
  • Repository

Please, refer to the "Overview of Data Warehousing" page in the IDS V11.50 Information Center for further explanation on these types of repositories (see Resources).

Figure 3 summarizes the software technologies and tools that are typically present or needed in the infrastructure of an end-to-end data warehousing solution. It also illustrates how the data flows from the sources to the targets in the process of creating and maintaining the data warehouse. As we move from the operational to the analytical world, the database systems' purpose and design change paradigms from OLTP to OLAP. The data warehouse itself can comprise one or more databases of the repositories types (ODS, data staging repository, data mart, data warehouse) mentioned earlier. Some of these could be the final data warehouse version that the front-end BI tools will use, and some could be used as staging databases in more complex data consolidation projects. Depending on the business need, you may want to put different warehouse repositories available to different business intelligence tools (reporting, dashboard, mashups, OLAP analysis and data mining). Last but not least, the extract-transform-load (ETL) or extract-load-transform (ELT) tools are needed to automate the initial and periodic tasks of consolidating and summarizing data from the different data sources (typically, OLTP database systems, external repositories and files) into the warehousing databases. Depending on the DW project, the ETL/ELT tools and processes can be deployed in several places and can be working in several stages to deal with the complexity of data extraction, transformation, cleansing, aggregation, and loading.


Figure 3. Overview of a typical Data Warehousing Infrastructure
Overview of Data Warehousing Infrastructure

The Informix Warehouse role in an Informix-based DW solution

To be effective, a data warehouse has to contain trusted information. It also has to be integrated, subject-oriented, non-volatile, and time-variant. Therefore, the design of the data warehouse and the periodic process to populate it is very important. Designers must make sure the design is consistent with the business queries that will be conducted within the warehouse. Another important aspect of data warehouses is data acquisition, which can be defined as transferring data from a source to the target warehouse. Data acquisition is one of the most expensive parts of building a data warehouse. This process will often be conducted with an extract-transform-load (ETL) or an extract-load-transform (ELT) tool.

It is here, in the data warehouse design and ELT processes, where Informix Warehouse Feature helps. This tutorial series explains by example how to use the tools available within the Informix Warehouse Feature to accomplish these tasks for Informix warehouse repositories.

Once the initial data has been transferred to the data warehouse, the process must be repeated consistently to incrementally update the warehouse according to the new data and changes in the operational sources. Data acquisition is a continuous process, and the goal is to make sure the warehouse is updated on a regular basis. When this update occurs in real time, the process is then called change data capture. This has become a separate field, and there are a number of products currently available to deal with it, including IBM InfoSphere Change Data Capture® (CDC), which can use Informix as either source or target system.

If very sophisticated and specialized ETL and data cleansing tools are needed to satisfy all types of data integration needs, the IBM InfoSphere DataStage® and QualityStage® products, which also work with Informix, are ideal for this purpose. InfoSphere CDC, DataStage, and QualityStage are products in the IBM InfoSphere® Information Server offering.



Back to top


Differences between OLTP and OLAP: A change in approaches

IT systems are divided into transactional (OLTP) and analytical (OLAP). In general, you can assume that OLTP systems provide source data to data warehouses — the OLAP systems — that will help analyze it.

OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE) predefined and pre-optimized within the application. The main emphasis for OLTP systems is put on very fast query and transaction processing to support the mission-critical operation of the organization. ERP, SCM, CRM systems to support Manufacturing, Sales, Finance/Accounting, Distribution, Payroll, and Marketing functions are typical examples of OLTP systems.

OLAP (On-line Analytical Processing) is characterized by relatively low volume of transactions. Queries are often very complex, performed ad-hoc, and involve aggregations. For OLAP systems, a response time is an effectiveness measure because the users of these systems are typically business analysts and executives who need a strategic view of several functions of the business.

The design of a data warehouse / OLAP database is fundamentally different than a OLTP database. The data warehouse is specifically designed in the form of dimensional databases, where the tables will be structured in cubes, star, or snowflakes schemas to facilitate fast query times and multi-dimensional analysis. In these schemas, the measures of the business (for example, cost, revenue) are stored in what are called the fact tables, whereas the data based on which we analyze those measures (for example, time, geography, product, customer) is stored in what are called the dimension tables. For more information, you can consult the "IBM Informix Database Design and Implementation Guide – Dimensional Databases" in the IDS V11.50 Information Center (see Resources).

Table 1 summarizes the major differences between OLTP and OLAP systems design. (Click here to see a detailed version of this comparison.) The detailed table is partially based on information from the book Building the Data Warehouse (William H. Inmon; Wiley, John & Sons, Incorporated, March 2002; ISBN: 9780471081302) and on www.rainmakerworks.com.


Table 1. Differences between OLTP and OLAP systems
CharacteristicOnline Transactional Systems (OLTP)
Operational Systems
Online Analytical Processing (OLAP)
Decision Support Systems (DSS)
Main PurposeTo support day-to-day operations, control, and run fundamental business tasksTo support managerial, strategic planning and problem solving, and decision support needs
QueriesShort transactions; relatively simple SQLLonger transactions; complex SQL with analytics
UpdatesRandom updates; few rows accessedSequential/bulk updates; many rows accessed
Processing speed and response timesSub-second response timeSeconds to minutes response time
Database modelER Modeling; minimizes redundancyDimensional Modeling; okay to have redundancy
Data normalizationNormalized data (5NF); minimizes duplicatesDe-normalized data (3NF); duplicates are okay
IndexesFew indexes; avoids index maintenance cost in writesOkay to have more indexes; mostly read-only operations
Workload predictability and tuningPre-compiled queries; repeated execution of queriesAd-hoc queries; unpredictable load


After this brief review of data warehousing concepts and OLAP design, and putting IDS and the Informix Warehouse Feature in context, let's take a look at the fictional business scenario you'll use in this tutorial and the rest of series to illustrate the capabilities of the Informix Warehouse tools.



Back to top



Go to the previous pagePage 3 of 11 Go to the next page