During this step of the dimensional model design cycle,
you identify the measures and the type of measures that are included
in the dimensional model.
To learn more about measures, see Measures.
When you define the measures, you collect the following metadata:
- Fact table name
- Alias
- Grain
- Business definition
- Load frequency and statistics
- Usage statistics
- How archived data is handled
- How and when to purge data
- Data quality and accuracy
- Grain of date and time dimensions
- Keys and how keys are generated
- Data source information
- Measures
- Dimensions
- Table owner contact information
Identify measures that are true
to the grain
When you design the measures of a model, you identify
the measures that are true to the grain of each level of the model.
When you
identified the grain of the dimensional
model, you identified preliminary measures. You can determine
other detailed measures by looking at the grain definition. For example,
if you have detailed measures, such as cost per individual product
or manufacturing labor cost per product, you identify
all of the remaining measures in the model.
Identify measure types
Determine
what types of measures are used in the dimensional model. Each measure
in a fact table should have a default aggregation (or derivation)
rule.
Determine how to handle year-to-date
measures
Year-to-date measures are numeric
values that consist of an aggregated total from the start of year
to the current date. You must make sure that such measures are not
included in a fact table with the atomic level line items.
Assume
that a fact table stores sales data for the year 2005. The sales for
each month are additive, and you add the sales to produce year-to-date
totals. If you create a year-to-date fact such as Sales_$$_Year_To_Date, then when you query this fact in August 2005, you would get the
sum of all sales to August 2005.
Dimensional modelers may include
aggregated year-to-date measures inside the fact table to improve
performance and also reduce complexities when year-to-date queries
are formed. However, to avoid confusion, calculate these measures
in the report application.
Handle year-to-date measures with
the following methods:
- OLAP-based applications
- SQL functions in views or stored procedures
Determine how to handle events
If the table is an event-based fact table, you must determine
how events are handled.
Event fact tables are used to record
events, such as Web page clicks and employee or student attendance.
Events do not always result in measures. If you handle event-based
scenarios where there are no measures, use event fact tables that
consist of either pseudo facts or factless facts.
Keep in mind
the following considerations that are associated with an event-based
fact table:
- Event-based fact tables typically have pseudo facts or no facts
at all.
- Use pseudo facts to perform count operations.
- A factless fact event table has only foreign keys and no facts.
Use the foreign keys to perform count operations.
Predict fact table growth
Predict the size and growth of the fact table to determine how
you can tune the performance of the dimensional model.
You can
estimate the size and growth of the fact table by performing one of
the following steps:
- Understand the business
- For example, assume that the retail sales business generates a
gross revenue of $100 million. Also assume that the average price
of a line item is $2. To calculate the amount of line items that you
need, multiply the gross revenue by the average price of a line item:
100000000 / 2 = 50000000 rows
Therefore, 50 million rows
are inserted into the star schema for the retail sales business process
at this grain.
- Calculate from a technical perspective
- Determine the size of the foreign keys, degenerate dimensions,
and measures. Multiply these columns by the number of rows that could
possibly be inserted, assuming that all products that sell in all
stores on all days. For example, in the retail sales business process,
you take the following steps:
- Calculate the number of rows inside each of the dimensions:
Time dimension: 4 rows
Date dimension: 365 rows for 1 year
Product dimension: 100 rows (100 products)
Store dimensions: 2 rows (2 stores)
Customer dimension: 1000000 customers
Supplier dimension: 50 suppliers
Employee dimension: 10 employees
- Calculate the base level of fact records by multiplying together
the number of rows for each dimension. Use the numbers that you gathered
in the previous step:
4 * 365 * 100 * 2 * 1000000 * 50 * 10 = 730000000 rows
Note: This number might be larger than you expect. The number only
applies if every product is sold in every store by every employee
to every customer.
- Calculate the maximum fact table size growth:
Number of foreign keys: 8
Number of degenerate dimensions: 1
Number of measures: 8
Assume that the fact table takes
4 bytes for an INTEGER column, and calculate the size of a single
row:(8 + 1 + 8) * 4 bytes = 68 bytes
Calculate
the maximum data growth for a single year for the dimensional model:730000000 rows * 68 bytes = 45 GB