Cognos Dynamic Cubes overview
In a dimensional data warehouse, you model relational database tables using a star or snowflake schema. This type of data warehouse differs from a traditional OLAP model in the following ways:
- It stores information about the data in fact and dimension tables rather than in proprietary OLAP data structures.
- It describes the relationships within the data using joins between the dimension and fact tables, the collection of dimension keys in a fact table, and the different attribute columns in a dimension table.
IBM® Cognos® Dynamic Cubes adds an in-memory relational OLAP component to the dynamic query mode server to provide a multidimensional view of a relational data warehouse with accelerated performance. You can then perform OLAP analysis using the Cognos Dynamic Cubes server.
Cognos Dynamic Cubes differs from Cognos dimensionally-modeled relational (DMR) data sources for the following reasons:
- It provides increased scalability and the ability to share data caches between users for better performance.
- It allows you to create a dynamic cube data source that is pre-loaded with dimensions.
- It allows for a richer set of dimensional modeling options and the explicit management of the member and data caches of a dynamic cube.
The benefits of Cognos Dynamic Cubes can be achieved only when using a dynamic cube as a data source. To use a dynamic cube as a data source, you must use the dynamic query mode.
Cognos Dynamic Cubes introduces a performance layer in the Cognos query stack to allow low-latency, high-performance OLAP analytics over large relational data warehouses. By using the power and scale of a relational database, Cognos Dynamic Cubes can provide OLAP analytics over terabytes of warehouse data.
Cognos Dynamic Cubes uses the database and data cache for scalability, and also uses a combination of caching, optimized aggregates (in-memory and in-database), and optimized SQL to achieve performance. The Cognos Dynamic Cubes solution includes the following characteristics:
- It uses simple, multi-pass SQL that is optimized for the relational database.
- It is able to minimize the movement of data between the relational database and the Cognos Dynamic Cubes server.
This data control is achieved by caching only the data that is required and by moving appropriate calculations and filtering operations to the database. At run time, only fact data is retrieved on demand.
- It is aggregate-aware, and able to identify and use both in-memory and in-database aggregates to achieve optimal performance.
Aggregate awareness (aggregate tables that are created in the database and modeled into a dynamic cube) uses specialized log files to allow the dynamic query mode server to decompose queries to take advantage of the aggregate tables.
- It optimizes aggregates (in-memory and in-database) using workload-specific analysis.
Aggregate Advisor, part of IBM Cognos Dynamic Query Analyzer, analyzes the performance of dynamic cubes using log files and provides suggestions for improving cube performance.
- It can achieve low latency over large data volumes, such as billions of rows or more of fact data and millions of members in a dimension.
By using virtual cubes, companies can still present the complete view of the data, but need to refresh only smaller sets of data, leaving pre-cached query results for larger static sets. Users experience better performance for queries run against pre-cached results.
Evaluating your data
Before starting to model a cube, it is important to understand how your data affects the processing in IBM Cognos Cube Designer.
Referential integrity in data warehouses
Most databases today support referential integrity. However, it is typically disabled or made declarative, with referential integrity enforced instead during extract, transform, and load (ETL) processing. Erroneous modifications made to the data during or outside of the ETL process can create cases where a fact table has no matching dimension records.
Each data point in a dynamic cube is defined by a member from each dimension in the cube. When a value is required for a data point, the SQL generated by Cognos Dynamic Cubes omits the filter on the table associated with any dimension whose member is the All member. This results in smaller and faster-executing SQL queries.
When a dimension is in scope, the join between the fact and dimension table is specified in the SQL query and the dimension is filtered by an explicit set of dimension key values. When the member of a dimension is the All member, dynamic cubes will not specify a filter for that dimension. All records are included, even records with invalid or missing dimension key values. This difference causes a discrepancy between values, depending upon which dimensions are involved in a query.
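To illustrate the discrepancy, consider a hypothetical schema (the table and column names Sales_fact, Product_dim, Product_key, and Revenue are placeholders, not names from a Cognos sample) in which some fact rows carry a Product_key with no matching dimension row. The two generated query shapes might look similar to this:

```sql
-- Dimension in scope: the join and key filter exclude fact rows whose
-- Product_key has no matching row in the dimension table.
select sum(F.Revenue)
from Sales_fact F
join Product_dim D on D.Product_key = F.Product_key
where D.Product_key in (101, 102, 103);

-- All member: no join or filter on the Product dimension is generated,
-- so every fact row is included, orphaned rows and all.
select sum(F.Revenue)
from Sales_fact F;
```

If orphaned fact rows exist, the second total is larger than the sum of the first query over all key values, which is the discrepancy described above.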
Even if you do not expect your fact records to contain invalid or unknown dimension key values, you should validate your records before implementing Cognos Dynamic Cubes. Run an SQL query similar to the following for each dimension in a dynamic cube to determine whether there are any fact records with invalid dimension key values. Any returned data is the set of invalid dimension key values. If no data is returned, there are no referential integrity errors.
select distinct FACT.Key from FactTable FACT where not exists (select * from DimensionTable DIM where DIM.Key = FACT.Key)
The SQL query can also be used as a subquery, to obtain the full set of records from the fact table.
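As a sketch of that subquery form, using the same placeholder names FactTable, DimensionTable, and Key from the query above:

```sql
-- Returns the full fact records whose Key has no match in the dimension table.
select FACT.*
from FactTable FACT
where not exists (
    select * from DimensionTable DIM where DIM.Key = FACT.Key
);
```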
If your fact table might contain records with invalid or unknown dimension key values, a common practice is to create a row in the dimension table to represent these dimension keys. New fact rows with invalid or unknown dimension key values can be assigned this dimension key value until the fact records and the dimension table can be updated with correct information. With this practice, records with problematic dimension key values are visible, regardless of which dimensions are involved in a report or analysis.
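A minimal sketch of this practice, again using the placeholder names FactTable, DimensionTable, and Key; the reserved key value -1 and the Description column are illustrative assumptions, not part of any particular schema:

```sql
-- Add a placeholder dimension row that represents unknown or invalid keys.
insert into DimensionTable (Key, Description)
values (-1, 'Unknown');

-- Reassign orphaned fact rows to the placeholder row until the fact records
-- and the dimension table can be updated with correct information.
update FactTable
set Key = -1
where not exists (
    select * from DimensionTable DIM where DIM.Key = FactTable.Key
);
```

Because the placeholder row is inserted first, the update matches only the remaining orphaned fact rows.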
You should also validate snowflake dimensions.
You may have a situation where tables in a snowflake dimension are joined on a column for which the outer table does not contain values for rows in the inner table. In this case, the inner dimension table joins to the fact table, but the outer dimension table does not join to the inner dimension table.
To ensure that snowflake dimensions do not have this type of referential integrity error, run an SQL query similar to the following. In this example, the dimension is built from two tables, D1_outer and D2_inner, with D2_inner joined to the fact table. Key is the column on which the two dimension tables are joined.
select distinct INNER.Key from D2_inner INNER where not exists (select * from D1_outer OUTER where OUTER.Key = INNER.Key)
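Note that INNER and OUTER are reserved words in many SQL dialects, so the aliases above may need to be changed for your database. As a sketch, an equivalent check written with a left outer join and non-reserved aliases:

```sql
-- Inner-table keys with no matching row in the outer table indicate
-- a referential integrity error in the snowflake dimension.
select distinct inr.Key
from D2_inner inr
left join D1_outer outr on outr.Key = inr.Key
where outr.Key is null;
```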