Get started with Informix Warehouse Feature, Part 1: Model your data warehouse using Design Studio

The new IBM® Informix® Warehouse Feature provides an integrated and simplified software platform to design and deploy a warehouse repository on your existing IBM Informix Dynamic Server (IDS) infrastructure. This tutorial, the beginning of a series, gives you a first look at the Informix Warehouse Feature and its components. It will first introduce you to the context behind the new tools available and the value they add in a data warehouse scenario based on Informix, and then, it will give you a hands-on and example-driven view of the Informix Warehouse Client component, the Design Studio, by going through the steps for creating a simplified data warehouse and data extract-load-transform (ELT) project, identifying the source and the target databases involved, and designing the database physical model you have planned for your target Informix warehouse.

Pamela Siebert (psiebert@us.ibm.com), IDS Staff Software Test Engineer, IBM

Pamela SiebertPamela Siebert is a quality assurance software tester and has been working with various Informix and DB2 database server products over five years. Currently, she works with Java tools and connectivity, with a focus on drivers for IBM data servers.



Veronica Gomes (vgomes@us.ibm.com), IT Specialist, IDS Competitive Technologies and Enablement, IBM

Veronica GomesVeronica Gomes is an IT specialist in the IDS Competitive Technologies and Enablement team. She used to work as a specialist of IBM Information Management products in IBM ChannelWorks and was a member of the Informix and DB2 Technical Support team for several years.



Jing Shan (shanj@us.ibm.com), Warehousing Tools and Data Studio Software Developer, IBM

Jing ShanJing Shan is a software developer at IBM Silicon Valley Lab. She was SQW development team lead for Informix Warehouse Feature v11.50 release and has worked on various components in SQL Warehousing for the past three years.



27 April 2009

Before you start

About this series

This tutorial series, Get started with Informix Warehouse Feature, introduces you to the highlights and capabilities of the new client and server software tools available for you in the Informix Warehouse option. These tools enable you to create and deploy a project for modeling the databases in your Informix data warehousing platform and automating the data transformation and movement processes to easily acquire and integrate data from different sources into your Informix warehouse.

The tutorials in this series are organized following this line of deployment: First, you will use the client component of the Informix Warehouse software — the Design Studio — to create the data warehouse (DW) project around Informix and build the data models for source and target databases. Then, you will continue using Design Studio to create and test the jobs for extracting, loading, and transforming (ELT) data from various sources (typically, OLTP databases and external files) into the target warehouse repository on Informix. Finally, the series illustrates the use of the server components in the Informix Warehouse — the Administration Console and SQW services — to manage and monitor resources associated to your warehouse, including the ELT jobs you programmed before in Design Studio.

About this tutorial

This tutorial is the first part of the Get started with Informix Warehouse Feature series. Get an introduction to the Informix Warehouse option, and learn more about its components and architecture. Review some of the main concepts in data warehousing, and put the Informix Warehouse Feature in context in a data warehousing solution using Informix. Set up and follow an imaginary business scenario that will run the entire tutorial series, where the Informix Warehouse Feature is used to design and populate the new warehouse repository. Finally, start using the Informix Warehouse: Get your hands on the Design Studio tool to create a new DW/ELT project, define the connections to your data sources and targets, create the physical data models for those databases using two different methods, visualize the entity-relationship (ER) diagrams of these models, and deploy the model for your target warehouse in the physical database.

Objectives

When you have completed this tutorial, you will:

  • Know the components and architecture of a solution involving Informix Warehouse
  • Be familiar with the functionality of the components in the software
  • Be familiar with Informix Warehouse's Design Studio — the client component of Informix Warehouse — to create a new DW/ELT project, connect to the data sources, and design the data models of the source and target (the warehouse) databases involved

Prerequisites

Downloading the required products

Some of the products mentioned here are available in trial versions (see the Resources section). Informix Dynamic Server Developer Edition is available at no cost. For any software that you do not find available in trial or demo resources, please consult with your IBM representative.

This tutorial is written based on an Informix Dynamic Server and Informix Warehouse installation on a Windows® platform. However, most of the information in this tutorial applies in the context of a solution on UNIX® or Linux® too. The tutorial is written for Informix database users who are familiar with and have working knowledge of Informix tools such as dbaccess, SQL language and scripts, Windows environments, and Eclipse-based IDE/GUI tools. You should also have skills and experience in relational database design, ER models, data warehousing, and ETL tools.

This tutorial assumes that you have:

  • Informix Warehouse Feature Client installed. For more information about how to install the product, please refer to the Resources section.
  • Informix Dynamic Server (IDS) 11.50.xC3 or later installed and a basic working knowledge of IDS. For download information, please refer to the Resources section.

System requirements

To run the examples in this tutorial, you need a system with at least the Informix Warehouse Feature client installed and an Informix Dynamic Server installed, with a database server instance up and running (local or remote).

This tutorial works on a fictional and simplified example of a business scenario in the Retail industry. You will design and deploy an Informix data mart for Sales and Cost analysis for the retail company Sports Stores. The new warehouse repository will combine and aggregate data from one Informix operational database and a few external files.

You will use the sample databases stores_demo as the source operational (OLTP) system, and sales_demo as the target warehouse (OLAP) system. To create the databases, follow these steps (this tutorial assumes you run these commands as user admin, which will be the schema under which the tables will be defined):

  1. Drop any databases called stores_demo and sales_demo, if they already exist.
  2. Create and populate the sample stores_demo database with the logging option. The dbaccessdemo script can be found in $INFORMIXDIR/bin:
    dbaccessdemo -log
  3. Create an empty sales_demo database with the logging option:
    dbaccess – - 
    > create database sales_demo with log ;

Introduction to Informix Warehouse Feature

The Informix Warehouse Feature simplifies the design and deployment of an Informix warehouse. It allows customers to more easily enable business applications on Informix databases, supplying a state-of-the-art Extract-Load-Transform (ELT) tool in an easy-to-use GUI/Eclipse environment. This platform provides the foundation you need to cost effectively deploy and build next-generation analytic solutions using the IBM Informix Dynamic Server.

Using Informix for a data warehouse is an ideal solution for Informix users who want to build end-to-end business intelligence (BI) and reporting solutions using data from various sources, including IDS. Users can more effectively leverage Informix for BI using Informix Warehouse capabilities to create and populate a data warehouse repository, and then utilize front-end analysis and reporting tools, like IBM Cognos®, to provide BI dashboards and other types of analytic applications and reports on top of the new warehouse repository. Informix users can deploy an end-to-end warehouse solution, simplifying operational complexity and reducing costs by using a single database server product for both operational and warehouse data.

Informix Warehouse has a component-based architecture with client and server pieces. The following software components are provided in Informix Warehouse:

  • Informix Warehouse client:
    • Informix Warehouse Design Studio
  • Informix Warehouse server:
    • WebSphere® Application Server
    • Informix Warehouse Administration Console
      • SQL Warehousing (SQW)

Informix Warehouse Client

Informix Warehouse Design Studio is an Eclipse-based common design environment for defining the source and target databases involved in your DW project, creating and reverse engineering physical data models of your databases, and building SQL-based data flows and control flows to quickly and easily build in-database data movements and transformations into your warehouse.

Design Studio is fully integrated with technology pieces from InfoSphere Data Architect® (IDA) for the graphical environment that allows data modeling of your databases (from scratch—empty template, from templates, or using reverse engineering). Design Studio is also fully integrated with what is called the SQL Warehousing (SQW) Tool, for the graphical environment and SQL-code generation of data and control flows (which are sequences of ELT jobs). Therefore, this tutorial only refers to Design Studio as the single Client component in the Informix Warehouse Feature.

Informix Warehouse Server

WebSphere Application Server is included to specifically support the Warehouse Administration Console and SQW run-time services in charge of scheduling, executing, and monitoring the SQL-based ELT jobs (control flows) defined and packaged using Design Studio.

Informix Warehouse Administration Console is a Web-based application for administering database and system resources related to your warehouse, as well as deploying, scheduling, and monitoring the control flows previously created in Design Studio through processes called the SQL Warehousing (SQW) services. In order to support these SQW run-time services, Informix Warehouse Console comes with WebSphere Application Server, but you can run the Administration Console on any other Java-based application server. The Administration Console allows you to:

  • Manage common resources, such as database connections and machine resources
  • Schedule the execution of control flows (sequences of ELT data flows)
  • Monitor the execution status

Informix Warehouse Architecture

Typically, you can arrange these Informix Warehouse Feature components along with the source and target database nodes in a three-group architecture like the one illustrated in Figure 1. Figure 1 illustrates the interaction between the different nodes in the platform. Notice that the Informix Warehouse client component relates with both the existing data sources (stored in the supported DBMS —IDS included— or in external files) and the new or existing data target (our datawarehouse, on IDS) for the purpose of designing and testing the ELT jobs. Whenever these jobs are ready to be deployed as approved and automated processes in the warehouse, the Informix Warehouse client component interacts with the Informix Warehouse server component to deploy these ELT jobs as packages or applications. The Informix Warehouse server component takes these packaged ELT jobs and allows you to schedule them for automatic execution on the target IDS data warehouse, and to monitor the status of these jobs execution as well. This architecture diagram also provides a basis for planning your installation across multiple computers, and illustrates the integration points that will be needed between the source and target data nodes and the new Informix Warehouse components (client and server).

Figure 1. Architecture overview of Informix Warehouse Feature with source and target database nodes
Informix Warehouse Features components

Now, focusing only on the new software, you can install the Informix Warehouse client and server components and the IDS for your warehouse repository in different or common machines, depending on whether you look for a multi-tier architecture or whether the operating systems you run on the computers support the different software groups' requisites. For example, you could install the data warehouse server (IDS) and the Informix Warehouse server components on the same computer or on two different computers. Consult the system requirements for the client and server components of the Informix Warehouse when planning your architecture. Figure 2 illustrates the different architecture layouts for this solution:

Figure 2. Logical groups of Informix Warehouse components and IDS data warehouse on different computers
Logical groups of Informix Warehouse components on different computers

The next section of this tutorial reviews some of the key concepts in the context of a data warehousing solution and the role that the Informix Warehouse Feature plays to help in the implementation of such solutions.


Data warehousing – An overview

In order to quickly respond to the market and keep a competitive advantage, organizations need an integrated and historic view of their businesses. To support this requirement, an IT platform can be created to consolidate the data from the different heterogeneous systems across the company's business functions and operations, and even external sources, into one or more common databases called a data warehouse.

Data warehouses are designed with the business-context in mind (its terms, data, and performance measures) and typically take the form of multi-dimensional databases to better support decision-support applications.

In the broadest sense of the term, a data warehouse refers to a database that contains very large stores of historical data. The data is stored as a series of snapshots and aggregated data at a specific time, which allows a user to reconstruct history, make trend and projection analysis, discover gaps and problems, and make accurate comparisons between different time periods using Business Intelligence (BI) tools available in the market, like IBM Cognos. In addition, by using Data Mining tools, the business analysts can analyze the patterns in the data values and even predict business behavior based on different variables. A data warehouse includes processes in place to clean, integrate, and transform the data that it retrieves before it is loaded into the warehouse database so the repository contains a truthful integrated business view of a specific function or view across multiple functions within the organization.

The term data warehouse can mean different things to different people. A data-warehousing environment can encompass any of the following forms:

  • Data warehouse
  • Data mart
  • Operational data store
  • Repository

Please, refer to the "Overview of Data Warehousing" page in the IDS V11.50 Information Center for further explanation on these types of repositories (see Resources).

Figure 3 summarizes the software technologies and tools that are typically present or needed in the infrastructure of an end-to-end data warehousing solution. It also illustrates how the data flows from the sources to the targets in the process of creating and maintaining the data warehouse. As we move from the operational to the analytical world, the database systems' purpose and design change paradigms from OLTP to OLAP. The data warehouse itself can comprise one or more databases of the repositories types (ODS, data staging repository, data mart, data warehouse) mentioned earlier. Some of these could be the final data warehouse version that the front-end BI tools will use, and some could be used as staging databases in more complex data consolidation projects. Depending on the business need, you may want to put different warehouse repositories available to different business intelligence tools (reporting, dashboard, mashups, OLAP analysis and data mining). Last but not least, the extract-transform-load (ETL) or extract-load-transform (ELT) tools are needed to automate the initial and periodic tasks of consolidating and summarizing data from the different data sources (typically, OLTP database systems, external repositories and files) into the warehousing databases. Depending on the DW project, the ETL/ELT tools and processes can be deployed in several places and can be working in several stages to deal with the complexity of data extraction, transformation, cleansing, aggregation, and loading.

Figure 3. Overview of a typical Data Warehousing Infrastructure
Overview of Data Warehousing Infrastructure

The Informix Warehouse role in an Informix-based DW solution

To be effective, a data warehouse has to contain trusted information. It also has to be integrated, subject-oriented, non-volatile, and time-variant. Therefore, the design of the data warehouse and the periodic process to populate it is very important. Designers must make sure the design is consistent with the business queries that will be conducted within the warehouse. Another important aspect of data warehouses is data acquisition, which can be defined as transferring data from a source to the target warehouse. Data acquisition is one of the most expensive parts of building a data warehouse. This process will often be conducted with an extract-transform-load (ETL) or an extract-load-transform (ELT) tool.

It is here, in the data warehouse design and ELT processes, where Informix Warehouse Feature helps. This tutorial series explains by example how to use the tools available within the Informix Warehouse Feature to accomplish these tasks for Informix warehouse repositories.

Once the initial data has been transferred to the data warehouse, the process must be repeated consistently to incrementally update the warehouse according to the new data and changes in the operational sources. Data acquisition is a continuous process, and the goal is to make sure the warehouse is updated on a regular basis. When this update occurs in real time, the process is then called change data capture. This has become a separate field, and there are a number of products currently available to deal with it, including IBM InfoSphere Change Data Capture® (CDC), which can use Informix as either source or target system.

If very sophisticated and specialized ETL and data cleansing tools are needed to satisfy all types of data integration needs, the IBM InfoSphere DataStage® and QualityStage® products, which also work with Informix, are ideal for this purpose. InfoSphere CDC, DataStage, and QualityStage are products in the IBM InfoSphere® Information Server offering.

Differences between OLTP and OLAP: A change in approaches

IT systems are divided into transactional (OLTP) and analytical (OLAP). In general, you can assume that OLTP systems provide source data to data warehouses — the OLAP systems — that will help analyze it.

OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE) predefined and pre-optimized within the application. The main emphasis for OLTP systems is put on very fast query and transaction processing to support the mission-critical operation of the organization. ERP, SCM, CRM systems to support Manufacturing, Sales, Finance/Accounting, Distribution, Payroll, and Marketing functions are typical examples of OLTP systems.

OLAP (On-line Analytical Processing) is characterized by relatively low volume of transactions. Queries are often very complex, performed ad-hoc, and involve aggregations. For OLAP systems, a response time is an effectiveness measure because the users of these systems are typically business analysts and executives who need a strategic view of several functions of the business.

The design of a data warehouse / OLAP database is fundamentally different than a OLTP database. The data warehouse is specifically designed in the form of dimensional databases, where the tables will be structured in cubes, star, or snowflakes schemas to facilitate fast query times and multi-dimensional analysis. In these schemas, the measures of the business (for example, cost, revenue) are stored in what are called the fact tables, whereas the data based on which we analyze those measures (for example, time, geography, product, customer) is stored in what are called the dimension tables. For more information, you can consult the "IBM Informix Database Design and Implementation Guide – Dimensional Databases" in the IDS V11.50 Information Center (see Resources).

Table 1 summarizes the major differences between OLTP and OLAP systems design. (Click here to see a detailed version of this comparison.) The detailed table is partially based on information from the book Building the Data Warehouse (William H. Inmon; Wiley, John & Sons, Incorporated, March 2002; ISBN: 9780471081302) and on www.rainmakerworks.com.

Table 1. Differences between OLTP and OLAP systems
CharacteristicOnline Transactional Systems (OLTP) Operational SystemsOnline Analytical Processing (OLAP) Decision Support Systems (DSS)
Main PurposeTo support day-to-day operations, control, and run fundamental business tasksTo support managerial, strategic planning and problem solving, and decision support needs
QueriesShort transactions; relatively simple SQLLonger transactions; complex SQL with analytics
UpdatesRandom updates; few rows accessedSequential/bulk updates; many rows accessed
Processing speed and response timesSub-second response timeSeconds to minutes response time
Database modelER Modeling; minimizes redundancyDimensional Modeling; okay to have redundancy
Data normalizationNormalized data (5NF); minimizes duplicatesDe-normalized data (3NF); duplicates are okay
IndexesFew indexes; avoids index maintenance cost in writesOkay to have more indexes; mostly read-only operations
Workload predictability and tuningPre-compiled queries; repeated execution of queriesAd-hoc queries; unpredictable load

After this brief review of data warehousing concepts and OLAP design, and putting IDS and the Informix Warehouse Feature in context, let's take a look at the fictional business scenario you'll use in this tutorial and the rest of series to illustrate the capabilities of the Informix Warehouse tools.


Business scenario

This tutorial series will step you through the process of designing and deploying a simple Informix data warehouse for a fictional Retail company Sports Stores that uses the well-known Informix sample OLTP database stores_demo as its source operational system. The desired warehouse repository is a Data Mart for Sales and Cost Analysis that will be stored in the newly created OLAP database sales_demo.

To review the steps to prepare these two databases for the hands-on exercises in this tutorial, please refer to the "System requirements" section.

Informix Warehouse can help Sports Stores by providing graphical tools for modeling the data mart tables in the sales_demo database and loading its tables with combined and transformed data extracted from the existing operational data sources (in this case, the stores_demo database — also on Informix — and a few plain-text files).

The existing OLTP source database: The stores_demo database

The stores_demo schema contains the transactional data for the Sports Stores retail. At the beginning of this tutorial, the "System requirements" section describes how to create and populate this database. Figure 4 shows the logical data model of the stores_demo schema. For more information about this OLTP database, you can consult the "IBM Informix Guide to SQL: Reference – Appendix A. The stores_demo Database" in the IDS V11.50 Information Center (see Resources).

Figure 4. Logical data model of the operational database stores_demo
Physical data model of the operational database stores_demo

(Click here to see a larger image of Figure 4.)

Table 2 describes the nine tables that are in the physical model:

Table 2. Description of the tables in the physical data model of the stores_demo schema
Physical table nameDescription
customerContains information about the retail stores' customers that place orders from the distributor.
ordersContains information about orders placed by the customers of the distributor.
itemsAn order can include one or more items. One row exists in the items table for each item in an order.
stockThe distributor carries 41 types of sporting goods from various manufacturers. More than one manufacturer can supply an item.
catalogDescribes each item in stock. Retail stores use this table when placing orders with the distributor.
cust_callsAll customer calls for information on orders, shipments, or complaints are logged. The cust_calls table contains information about these types of customer calls.
call_typeThe call codes associated with customer calls are stored in the call_type table.
manufactStores information about the nine manufacturers whose sporting goods are handled by the distributor.
stateContains the names and postal abbreviations for the 50 states of the United States.

The new OLAP target database: The sales_demo database

The data mart database sales_demo will contain the star schema and the aggregated data for the Sports Store retail chain that is required by BI analysts for sales and cost analysis. Again, the "System requirements" at the beginning of this tutorial describes how to create an empty sales_demo database. Figure 5 shows the intended logical data model of the sales_demo schema, the OLAP database for your sample data mart. Notice the star-form created by the dimension tables around the fact table:

Figure 5. Logical data model of the sales_demo schema
Physical data model of the sales_demo schema

Table 3 describes the dimension tables and fact table in the physical data model of the sales_demo schema:

Table 3. Description of the tables in the physical data model of the sales_demo schema
Physical table nameDescription
timeDimension table. It contains time information about the sale.
customerDimension table. It contains information about sales customers.
productDimension table. It contains information about the products sold through the data warehouse.
geographyDimension table. It contains information about the sales district and region.
salesFact table. It contains information about product sales and has a pointer to each dimension table.

This tutorial uses Informix Warehouse's Design Studio to design the sales_demo database from scratch. You can use the SQL script code in Listing 1 as guidance:

Listing 1. Sample SQL script code intended for sales_demo database
-- The following statements create the tables for the sales_demo database:

create table geography (
district_code serial,
district_name char(15),
state_code char(2),
state_name char(18),
region smallint);

create table product (
product_code integer,
product_name char(31),
vendor_code char(3),
vendor_name char(15),
product_line_code smallint,
product_line_name char(15));

create table customer (
customer_code integer,
customer_name char(31),
company_name char(20));

create table time (
time_code int,
order_date date,
month_code smallint,
month_name char(10),
quarter_code smallint,
quarter_name char(10),
year integer );

create table sales (
customer_code integer,
district_code smallint,
time_code integer,
product_code integer,
units_sold smallint,
revenue money (8,2),
cost money (8,2),
net_profit money(8,2));

In real-life scenarios you may need to use a combination of templates and reverse engineering to aid in the process of creating the data model of a large data warehouse.

For more information about this database, you can consult the "IBM Informix Guide to SQL: Reference – Appendix B. The sales_demo and superstores_demo Databases" in the IDS V11.50 Information Center (see Resources).

Based on this imaginary business scenario, this tutorial series will use the different tools in the Informix Warehouse Feature to design and deploy a platform for a data warehousing solution for the retail company Sports Stores.

In the following sections of this tutorial, see how to start using Design Studio (the client tool of the Informix Warehouse) to create a new Eclipse-based workspace. Then, define and test the database connections to both the existing operational sources you later want to extract data from (in our case, just stores_demo, on Informix) and the new datamart you want to design and populate (the currently empty Informix database sales_demo). After this first setup, you will use the information provided in this "Business scenario" section to guide the creation of the tables in the sales_demo schema and complete the data model of this new OLAP system. In order to produce the data model for the existing stores_demo OLTP system, you will use reverse engineering to discover the tables and relationships, so Design Studio can automatically generate this model from the Informix catalog information about the stores_demo database.


Getting started with the Informix Warehouse Client

Launching Design Studio

  1. Select Start > All Programs > IBM Informix Warehouse Feature > DesignStudio.
    Figure 6. Launching the Informix Warehouse Client from Start menu
    Launching the Informix Warehouse Client from Start menu

    Alternatively, you can launch Design Studio's Eclipse IDE by running the command eclipse.exe from C:\Program Files\ISWarehouse\ds\ (default location of Design Studio on Windows) or from your actual Design Studio directory.

    On UNIX, this tends to be installed at /opt/IBM/ISWarehouse. The Informix Warehouse Feature Client can be launched from /opt/IBM/ISWarehouse/ds/ by executing the command eclipse.

  2. When the following Workspace Launcher window appears, change the name of the workspace to workspace_demo and click OK.
    Figure 7. Informix Warehouse Feature Client splash screen
    Informix Warehouse Feature Client splash screen
  3. A Welcome screen is shown after the workspace is created for the first time. Close the screen, as shown in Figure 8:
    Figure 8. Closing the Welcome screen
    Closing the Welcome screen

    The newly created (blank) workspace will appear, as depicted in Figure 9:

    Figure 9. A blank workspace
    A blank workspace

Now you're ready to start working with Design Studio. In the next steps you will identify and define the connections and data models of the databases participating in the DW/ELT project.

Defining the database connections

Data warehousing project in Informix Warehouse

This tutorial uses DW/ETL or DW/ELT Project to reference the type of warehousing application that will be developed using Design Studio. Informix Warehouse allows you to create, design, test, and debug what this tutorial calls Data Warehousing applications or projects, to include the Data Models of the source and target databases (or its set of tables involved in the data movement and transformation project), the database connections involved, the Data Flows and Control Flows, File Formats, SQL Scripts, and other objects related to projects for extracting-loading-and-transforming data from different heterogeneous sources to populate a data warehouse repository (in Informix).

In order to start working with databases in your data warehousing application (ELT project), you need to create and test the database connections to the different databases participating in this project as either source or target systems (in this case, the connections to both sample databases stores_demo and sales_demo).

Create two new database connections (one for sales_demo and another one for stores_demo) using the Informix JDBC driver®. Optionally, you can use the IBM Data Server Driver for JDBC and SQLJ. This tutorial connects using the Informix JDBC driver.

  1. On the Data Source Explorer tab, right-click on Database Connections > New…
    Figure 10. Creating a new database connection
    Creating a new database connection

    A New Connection wizard screen appears.

    Figure 11. New Connection wizard
    New Connection wizard

Updating Informix JDBC® Driver

Informix Warehouse Feature 11.50 comes with Informix JDBC Driver 3.50.JC2. We strongly advise you to update to JDBC 3.50.JC3 or later. You can do this by clicking on the little triangle icon, located on the right of the JDBC Driver option, and editing the driver properties.

  1. Since this tutorial uses the Informix database and Informix JDBC® driver, select Informix under the Select a database manager option and click on the drop-down menu under the JDBC driver option to select Informix 11.5 – Informix JDBC Driver Default.
    Figure 12. New Connection wizard, part 2
    New Connection wizard, part 2
  2. One database at a time, fill out the connection information for both stores_demo and sales_demo database connections. Test the connections to ensure your database connection information is correct and that you are able to connect to both databases. (Figures 13 and 14 illustrate the steps for the stores_demo database only.)
    Figure 13. Testing the new connection
    Testing the new connection
  3. Click on Next to edit the filtering option.
  4. Uncheck the box next to Disable filter to allow filtering. Click on the radio button next to Selection. Check the box next to admin, the user that was used when creating the database (if you used a different user, check that option instead).
    Figure 14. Using filter option to select admin schema
    Using filter option to select admin schema

Important

Make sure all the database connections are online (you are connected to both data sources) in the Data Source Explorer.

  1. Click on Finish. A new database connection is created on the Data Source Explorer panel, as depicted in Figure 15.
  2. Follow the steps above for both the stores_demo and the sales_demo databases. Figure 15 shows that both connections are created and alive:
    Figure 15. Data Source Explorer showing both connections created and active
    Data Source Explorer showing both connections created
    You can browse through the objects of the databases on the Data Source Explorer panel on the left side and see the properties of the database connection's objects in the Properties window on the right side of the workspace.

Creating a new DW / ELT project in Design Studio

Now, let's create a new project for our data warehouse and its ELT jobs.

Notice the Data Project Explorer located in the upper left panel, as shown in Figure 16. It should be empty.

Figure 16. Data Project Explorer
Data Project Explorer
  1. To create the Data Warehousing project, click on File > New > Data Warehousing Project
    Figure 17. Creating Data Warehousing project
    Creating Data Warehousing project

    The New Project wizard screen appears.

  2. Enter Sales Demo - ELT and DW Project in the Project name field.
    Figure 18. New Project wizard
    New Project wizard
  3. Click on Finish. The new project now appears on the left side, under the Data Project Explorer.
    Figure 19. New project is created
    New Project is created

In the next step, you're going to generate the data model for the existing OLTP database stores_demo using reverse engineering to discover it and pick up just the tables you need for the DW/ELT project and visualizing them in an ER representation.


Creating physical data model from reverse engineering

Now, let's create the physical data model for the existing stores_demo database using the reverse engineering option. In this section, you'll also create an overview diagram to easily see the Entity-Relationship diagram.

  1. Right click on the Sales Demo – ELT and DW project, and select New > Physical Data Model, as illustrated in Figure 20:
    Figure 20. Creating a physical data model
    Creating a physical data model
  2. The New Physical Data Model wizard appears. Change the Destination Folder location to the project name, Sales Demo – ELT and DW project. To do so, you use the Browse… button and select the project.
  3. Rename the file name in the File name field to stores_demo_model.
  4. Change Database to Informix, and change Version to 11.50.
  5. Select the radio button next to Create from reverse engineering, then select Next.

    Figure 21 illustrates the New Physical Data Model dialog window with the values suggested in the steps above:
    Figure 21. New Physical Data Model wizard
    New Physical Data Model wizard
  6. Select the stores_demo connection, and click on Next again.
    Figure 22. Selecting stores_demo connection
    Selecting stores_demo connection
  7. Select the admin schema (or a different schema if you did not use admin when you created the stores_demo database), then click on Next.
    Figure 23. Selecting the schema
    Selecting the schema
  8. The next screen asks you to select the database elements. Keep the default values, and click on Next.
    Figure 24. Selecting database elements
    Selecting database elements
  9. Generate the overview diagram by checking the box next to Overview in the Generate diagrams area. Also, check the box next to Infer implicit relationships, then click on Finish.
    Figure 25. Finishing the Physical Data Model wizard
    Finishing the Physical Data Model wizard

    The newly created database model stores_demo_model will appear on the left panel, under the Data Models folder, and the database model Properties will appear on the right panel. Save all the changes.

  10. If you need to display the visual ER diagram of the discovered stores_demo data model with the tables under admin user/schema, go to Data Project Explorer > Data Diagrams. Then, go to the data diagram for stores_demo, stores_demo_model.dbm, and double-click on the schema/user name admin. The ER diagram will appear on the right side.
    Figure 26. Opening the Overview diagram (ER model) of stores_demo database for admin schema
    Opening the Overview diagram for stores_demo

    If you do not see the ER model (overview diagram), you might need to go to Data Models > stores_demo_model.dbm > admin schema, right-click on admin, select Add to Overview Diagram, and create a new diagram that includes the tables under admin schema.

  11. Arrange the model to your visual preferences. The following are suggestions to help arrange the entities and relationships in the Overview diagram displayed inside the canvas:
    1. To zoom out the diagram, click on the Zoom out button on the Palette, located on the right side of the workspace, and then click on the diagram area to zoom it out.
    2. Use the Select button to drag and drop the entities to make them closer to one another and wherever they would better go in the diagram.
    3. In order to automatically arrange the relationship connectors to the new order of the entities that you created, select Select All Connectors from the Select drop-down list on the top toolbar (or alternatively use the Menu option: Diagram > Select > All Connectors). All the relationship connectors will be selected.
    4. Next, select Arrange Selected Connectors from the Arrange drop-down list on the top toolbar. This action automatically arranges the connectors based on the new positions where you placed the entities.
    5. Now, you can zoom in again and use the Select tool to move the entities and relationships according to your preferences.
    6. Continue iterating this way until you see an ER diagram you feel comfortable with. Then save the changes.
  12. An overview diagram like the one in Figure 27 is obtained:
    Figure 27. Overview diagram (ER model) for stores_demo after arranging entities and relationships
    Overview diagram for stores_demo

Modifying the stores_demo diagram

  1. Open the Overview diagram for stores_demo.

Tip

If the diagram gets messy, you can always right-click on some blank space there and click on the Arrange All menu.

  1. While you are still in the stores_demo diagram window, you can remove the ER entities call_type and catalog because they will not be used to populate the data mart sales_demo. After removing the two entities, rearrange the remaining entities so they can be visible in the same space.
    Figure 28. Stores_demo diagram showing the relationships
    Stores_demo diagram showing the relationships
  2. Save all the changes.
  3. You could decide to create the missing non-identifying relationship between tables customer and state. To do so, you can use a relationship connector that links state(code) (parent) with customer(state) (child).

Creating physical data model from a template

Let's now create the physical data model for sales_demo from scratch (from an empty template).

  1. Create a new physical data model using the sales_demo database and the specifications from before, except select Create from template this time, then click on Next.
    Figure 29. Template option
    Template option
  2. The next screen provides an option to use an existing template. Go with the default (empty template), and click on Finish.
    Figure 30. Selecting a template
    Selecting a template

    Both data models have been created under the Data Models folder in the Sales Demo – ELT and DW Project. The data diagrams are created under the Data Diagrams folder in the project.

    Figure 31. Location of the data models and data diagrams
    Location of the data models and data diagrams
  3. Notice that the sales_demo.dbm does not have a database specified. It is named Database. Let's change that to sales_demo:
    1. Click on Database under sales_demo.dbm (you might need to double click on sales_demo.dbm to expand the tree).
    2. The Properties window appears on the lower right panel. Change the name "Database" to sales_demo, and select Save.
      Figure 32. Changing the default database name
      Changing the default database name
  4. Do the same for Schema. To do so, expand the database folder (now sales_demo) and change the schema property name.

The expanded tree of the sales_demo physical data model should now look like Figure 33 (notice the Diagrams folder is also expanded):

Figure 33. Expanded tree of sales_demo physical data model
Expanded tree of sales_demo physical data model

Creating new tables for sales_demo

The diagrams are representative of the databases themselves in the form of Entity-Relationship model. In this section, explore the diagrams and learn how to manipulate them. You'll also create new tables for the sales_demo database.

  1. Double-click to open the Diagram1 under the Diagrams folder. You see a blank diagram with the Palette on the right. This palette allows you to drag and drop objects on to the blank canvas to create entities and relationships.
    Figure 34. Blank diagram with palette
    Blank diagram with palette

    Explore the Palette and notice the Data and Geometric Shapes folders. You're going to drag and drop the Table object on to the blank canvas to create a table.

  2. Drag and drop the Table object on to the canvas.
    Figure 35. Drag and drop Table object
    Drag and drop Table object
  3. Let's now make a fact table called sales. To rename the table from Table1 to sales, click on the name and enter sales. Alternatively, you can click on the Table1 table and the properties window will appear on the lower panel where you can rename Table1 to sales.
  4. Add the columns to the sales table. Click on the Columns tab in Properties window, and select the Add Column button (the plus sign icon) to add the columns to the sales table. Add the following columns, as illustrated in Figure 36: customer_code, district_code, time_code, product_code, units_sold, revenue, cost, net_profit. Make sure Not Null is specified for customer_code, district_code, time_code, and product_code. They are primary keys, but leave the Primary Key check box unchecked for all of them because you do not want to worry about referential integrity after the first bulk load for now.
    Figure 36. sales fact table
    sales fact table
  5. Repeat the steps for the dimensional tables: geography (Figure 37), product (Figure 38), customer (Figure 39), and time (Figure 40).
    Figure 37. geography dimensional table
    geography dimensional table
    Figure 38. product dimensional table
    product dimensional table
    Figure 39. customer dimensional table
    customer dimensional table
    Figure 40. time dimensional table
    time dimensional table

Column definitions from Listing 1

Use the script provided in Listing 1 as guidance for the tables and columns definitions in the sales_demo database.

  1. Click on Save. Figure 41 illustrates how the diagram will look:
    Figure 41. Updated sales_demo diagram
    Updated sales_demo diagram
  2. Expand the schema to view the newly created tables.
    Figure 42. New tables under admin schema with the sales table expanded
    New tables under admin schema with the sales table expanded

    You want to update the database in the actual server with the new tables.

  3. Right-click on the schema (in this case, admin), and click on Generate DDL....
    Figure 43. Generate DDL
    Generate DDL
  4. The Generate DDL wizard appears. Use the default option (CREATE statements only), and click on Next.
    Figure 44. Generate DDL wizard screen 1
    Generate DDL wizard screen 1
  5. The second screen asks you which objects should be included in the script. Keep the default, and click on Next.
    Figure 45. Generate DDL wizard screen 2
    Generate DDL wizard screen 2
  6. The next screen allows the DDL to be reviewed. Notice the folder and file name. They can be edited to your preference. This example uses the default name and folder. You want to run this script against the server right away, so check the box next to Run DDL on server, and click on Next.
    Figure 46. Generate DDL wizard screen 3
    Generate DDL wizard screen 3

Warning

Please note that if you want the tables created under the admin schema, you have to be connected as user admin. If you are connected as informix user, it will create tables under the informix schema.

  1. Select the sales_demo database in the next screen, and click on Next.
    Figure 47. Generate DDL wizard screen 4
    Generate DDL wizard screen 4
  2. View the summary screen, and click on Finish.
    Figure 48. Generate DDL wizard screen 5
    Generate DDL wizard screen 5
  3. To make sure the script was run successfully, take a look at the SQL Results view in the lower right panel. It also displays the job status on the right.
    Figure 49. Reviewing the job status
    Reviewing the job status

    This view is useful if the script execution is not successful. You can expand the status and analyze every DDL statement executed. Sometimes it would say the table already existed in the database server.

  4. You can verify if the tables are created by opening a dbaccess session and viewing the database.

    Figure 50. Verifying the tables are created
    Verifying the tables are created

Congratulations!! You have completed this tutorial! Now you have created your new Informix-based warehouse repository, sales_demo, and you are ready to go through the other tutorials in this series to create and deploy the processes needed to populate your data warehouse.


Conclusion

This first tutorial allowed you to get familiar with the Informix Warehouse Feature so you can easily start using it.

This tutorial also introduced you to the context, architecture, and components of Informix Warehouse. It illustrated with an example of how to start using the Client component of the software, Design Studio, in order to create a DW/ELT project, identify the connections, and create the data models of the databases that will later participate as sources and targets in the data movement and transformation processes that will help to populate your data warehouse.

This tutorial guided you through the process of designing the data models of these databases to include the tables in the scope of the DW solution using two different techniques: Discovering the Entity-Relational model of a database using reverse engineering for existing repositories with tables, or modeling the database from scratch (from an empty template), based on the design we had originally planned for the new warehouse repository, using the graphical environment provided by Design Studio. These data models remain part of the DW/ELT project and are exportable as templates or DDL scripts, which you can validate, analyze impact of the desired changes, and execute them directly against the actual databases using Design Studio.

The following parts of this tutorial series will show you how to continue using Design Studio to now design the data and control flows that will do the task of populating the new data warehouse repository you have created in this tutorial, to finally show you how to deploy and monitor those ELT jobs using the Informix Warehouse Server component, the Administration Console.

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=385827
ArticleTitle=Get started with Informix Warehouse Feature, Part 1: Model your data warehouse using Design Studio
publish-date=04272009