Dimensional modeling: Identify the measures

During this step of the dimensional model design cycle, you identify the measures and the type of measures that are included in the dimensional model.

To learn more about measures, see Measures.

When you define the measures, you collect the following metadata:

Fact table name
Alias
Grain
Business definition
Load frequency and statistics
Usage statistics
How archived data is handled
How and when to purge data
Data quality and accuracy
Grain of date and time dimensions
Keys and how keys are generated
Data source information
Measures
Dimensions
Table owner contact information

When you identify the measures of the dimensional model, you perform the following steps:

Identify measures that are true to the grain

When you design the measures of a model, you identify the measures that are true to the grain of each level of the model. When you identified the grain of the dimensional model, you identified preliminary measures. You can determine other detailed measures by looking at the grain definition. For example, if you have detailed measures, such as cost per individual product or manufacturing labor cost per product, you identify all of the remaining measures in the model.

Identify measure types

Determine what types of measures are used in the dimensional model. Each measure in a fact table should have a default aggregation (or derivation) rule.

Determine how to handle year-to-date measures

Year-to-date measures are numeric values that consist of an aggregated total from the start of year to the current date. You must make sure that such measures are not included in a fact table with the atomic level line items.

Assume that a fact table stores sales data for the year 2005. The sales for each month are additive, and you add the sales to produce year-to-date totals. If you create a year-to-date fact such as Sales_$$_Year_To_Date, then when you query this fact in August 2005, you would get the sum of all sales to August 2005.

Dimensional modelers may include aggregated year-to-date measures inside the fact table to improve performance and also reduce complexities when year-to-date queries are formed. However, to avoid confusion, calculate these measures in the report application.

Handle year-to-date measures with the following methods:

OLAP-based applications
SQL functions in views or stored procedures

Determine how to handle events

If the table is an event-based fact table, you must determine how events are handled.

Event fact tables are used to record events, such as Web page clicks and employee or student attendance. Events do not always result in measures. If you handle event-based scenarios where there are no measures, use event fact tables that consist of either pseudo facts or factless facts.

Keep in mind the following considerations that are associated with an event-based fact table:

Event-based fact tables typically have pseudo facts or no facts at all.
Use pseudo facts to perform count operations.
A factless fact event table has only foreign keys and no facts. Use the foreign keys to perform count operations.

Predict fact table growth

Predict the size and growth of the fact table to determine how you can tune the performance of the dimensional model.

You can estimate the size and growth of the fact table by performing one of the following steps:

Understand the business

For example, assume that the retail sales business generates a gross revenue of $100 million. Also assume that the average price of a line item is $2. To calculate the amount of line items that you need, multiply the gross revenue by the average price of a line item:

100000000 / 2 = 50000000 rows

Therefore, 50 million rows are inserted into the star schema for the retail sales business process at this grain.

Calculate from a technical perspective

Determine the size of the foreign keys, degenerate dimensions, and measures. Multiply these columns by the number of rows that could possibly be inserted, assuming that all products that sell in all stores on all days. For example, in the retail sales business process, you take the following steps:

Calculate the number of rows inside each of the dimensions:

Time dimension: 4 rows
Date dimension: 365 rows for 1 year
Product dimension: 100 rows (100 products)
Store dimensions: 2 rows (2 stores)
Customer dimension: 1000000 customers
Supplier dimension: 50 suppliers
Employee dimension: 10 employees

Calculate the base level of fact records by multiplying together the number of rows for each dimension. Use the numbers that you gathered in the previous step:
```
4 * 365 * 100 * 2 * 1000000 * 50 * 10 = 730000000 rows
```
Note: This number might be larger than you expect. The number only applies if every product is sold in every store by every employee to every customer.
Calculate the maximum fact table size growth:
```
Number of foreign keys: 8
Number of degenerate dimensions: 1
Number of measures: 8
```
Assume that the fact table takes 4 bytes for an INTEGER column, and calculate the size of a single row:
```
(8 + 1 + 8) * 4 bytes = 68 bytes
```
Calculate the maximum data growth for a single year for the dimensional model:
```
730000000 rows * 68 bytes = 45 GB
```

Step 4: Identify the measures

Identify measures that are true to the grain

Identify measure types

Determine how to handle year-to-date measures

Determine how to handle events

Predict fact table growth