Typically, many properties are the result of
an aggregation.
The level of individual purchases is too fine-grained for prediction,
so the properties of many purchases must be aggregated to a meaningful
focus level. Normally, aggregation is done to all focus levels. In
the example of forecasting sales for individual stores, this means
aggregation to store and day.
Basic aggregation
In
most
cases,
aggregation
means summing up the individual values. In general, aggregation is
defined by an aggregation function and its arguments, the set of values
to which this function is applied. The most common aggregation function
is SUM. Other functions might also make sense, for example AVG or
MAX.
The argument can be the value of a column or a measure
from the input model. If the values to be aggregated are not immediately
available in the input model, you can also compute the argument values
of an aggregation by using an expression over columns and measures.
The
following list shows simple aggregations:
- Total sales
(SALES)
- Summing SALES_AMOUNT to the focus levels Store and
Day
- Number of sales transactions (SALES_TRX)
- Counting the number of sales for the focus levels
- Total profit (SALES_PROFIT)
- Summing the differences
between SALES_AMOUNT and cost to
the focus levels. The summation is done over the expression (SALES_AMOUNT
- cost).
Aggregation to a higher level
Sometimes,
information needs to be aggregated to a level higher than the focus
level. For example, to compare daily results to weekly sales, it is
necessary to first sum the sales amounts for weeks instead of days:
- Total week sales (SALES_WK)
- Summing SALES_AMOUNT
to the week level. The week level is higher
in a hierarchy than the focus level day.
- Average
sales in month (SALES_AVG_MTH)
- Averaging individual daily
store sales in month. This amounts
to summing SALES_AMOUNT to the day level and the store level, and
then taking the average across the month.
Aggregation for a range of values
When
analyzing sales data, an important input into forecasts
is the sales behavior in comparable earlier periods or in adjacent
periods of time. The extent of such periods directly depends on the
value in the time portion of the focus, because the periods are defined
relatively to some point in time. Therefore, values cannot simply
be aggregated to some hierarchy level, but must be computed individually
for each row of data.
An example for an aggregation on a rolling
window looks like this:
- Past 7 day sales (SALES_7_DAYS)
- Computed by summing the values of the daily sales amounts for
the seven shopping days that immediately precede the current day.