Cubing — Module 7: Optional: Creating and deploying a virtual cube

In this module, you will partition your data effectively and create a virtual cube so that you can improve performance and increase the availability of your cube for queries.

Virtual cubes merge different cubes together to allow a single query destination that returns merged results from the cubes that make up the single virtual cube. Virtual cubes can drastically improve the response time of the cube server queries by partitioning data to optimize the use of the cache.

Term: Virtual cube

A virtual cube is a logical cube that is defined in terms of exactly two existing cubes.

Learn more about virtual cubes

A virtual cube can be made up of two standard cubes, two virtual cubes, or one virtual cube and a standard cube. If more than two cubes must be combined, the cubes can be merged in pairs. The resulting virtual cubes can be merged with other cubes.

The cubes are merged based on the dimension names. Dimensions with the same name in both cubes will be merged. Dimensions from one cube that do not have a corresponding dimension with the same name in the other cube are added to the virtual cube.

The two cubes that are combined can belong to different cube models, which means that the internal structures can be completely different. The cubes that make up a virtual cube must share at least one dimension. The virtual cubes and the cubes that make up the virtual cubes must reside in the same database. The cubes that form the virtual cube must also be from the same cube server.

Two cubes are considered to share a dimension if a dimension with a particular name exists in both cubes. For example, if two cubes both have a dimension that is named [Time], in the context of virtual cubes, the two cubes share the [Time] dimension.

When a query is issued against a virtual cube, that query is routed to the cubes that make up the virtual cube. This produces two intermediary results that are aggregated according to the merge operator of the virtual cube. The merge operator can be one of the following seven operators:

SUM
MINUS
PRODUCT
DIVIDE
MAX
MIN
NOP (returns data from the first cube in the virtual cube definition)

Up until this point, all 48 months of sales information has been stored in a single cube, the SalesCube cube. Fact data is added nightly, which requires rebuilding the data cache and refreshing the MQTs. This nightly maintenance requires a significant amount of time, during which users cannot query for sales data. Valuable time is lost during the nightly maintenance period.

You decide to implement virtual cubes. The virtual cube you will create contains two cubes, and the cube with data for the current month is refreshed every night. The other cube is refreshed every month, instead of every night. As a result, the amount of maintenance time that is required is drastically reduced. Therefore, the managers in other geographies who need to access the data warehouse are impacted by only a few hours of maintenance a month, instead of by a few hours of maintenance every day.

To improve performance, you decide to partition the GOSALESDW.SLS_SALES_FACT fact table into two different views, GOSALESDW.fact_recent and GOSALESDW.fact_historic. The new GOSALESDW.fact_recent view will select data from the current month, which creates a view of a smaller amount of data. The GOSALESDW.fact_historic view will select the rows with sales data from the remaining months of data. The GOSALESDW.fact_recent view will be used to create the SalesCubeRecent cube, and the GOSALESDW.fact_historic view will be used to create the SalesCubeHistoric cube.

The sample cubes are already created for you as a sample metadata file. This metadata file is included as a part of the samples directory when you install InfoSphere™ Warehouse 9.7.1. You will be using the virtual_cube_model.xml file that is included as a part of the installation image. If you are using InfoSphere Warehouse 9.7, you can download the sample cube model.

Important: You cannot use the existing tutorial_model.xml file to create the cubes from the new views, because the joins that are created while you work on the tutorial will be broken. To ensure that your joins will work and to avoid errors, use the sample virtual_cube_model.xml file to complete this module. Similarly, do not try to use the virtual_cube_model.xml file for any other parts of this tutorial. The joins in the two files are different.

The SalesCubeRecent cube is the smaller cube and queries a smaller amount of data. This cube is used to record the daily sales information for the current month, creating a smaller set of data. The maintenance period that is necessary for this smaller cube is also reduced.

The larger SalesCubeHistoric cube will contain a larger amount of data. This cube is refreshed monthly to include the data from the SalesCubeRecent cube. Because the SalesCubeHistoric cube is refreshed every month, the maintenance time that is required is significantly reduced. Query results for the SalesCubeHistoric cube are also pre-cached so that results are returned quickly to sales managers.

After you create the virtual cube, the performance of the cache refresh and of queries improves because of several factors:

The MQT refresh time is now reduced, because only the small MQTs that are based on the SalesCubeRecent cube must be refreshed every night.
Large MQTs that are based on the SalesCubeHistoric cube are updated only monthly.
Because query results from the SalesCubeHistoric cube are pre-cached, the performance for queries that are run against the sales data of the entire time period is improved.
Because of the smaller size of the SalesCubeRecent cube, the performance for queries that are run against the sales data of the current month or the entire time period is improved.

Learning objectives

After completing the lessons in this module, you will understand the concepts and know how to do the following tasks:

Creating views that partition a fact table
Importing sample cubes with the Design Studio
Creating virtual cubes
Deploying virtual cubes with the Administration Console

Time required

This module should take approximately 20 minutes to complete.

Prerequisites

You must complete Module 1, Module 2, Module 3, and Module 4 of this tutorial before you begin this module.

Module 7: Optional: Creating and deploying a virtual cube

Learning objectives

Time required

Prerequisites

Lessons in this module