Aggregation

Show Me
Typically, many properties are the result of an aggregation. The level of individual purchases is too fine-grained for prediction, so the properties of many purchases must be aggregated to a meaningful focus level. Normally, aggregation is done to all focus levels. In the example of forecasting sales for individual stores, this means aggregation to store and day.

Basic aggregation

In most cases, aggregation means summing up the individual values. In general, aggregation is defined by an aggregation function and its arguments, the set of values to which this function is applied. The most common aggregation function is SUM. Other functions might also make sense, for example AVG or MAX.

The argument can be the value of a column or a measure from the input model. If the values to be aggregated are not immediately available in the input model, you can also compute the argument values of an aggregation by using an expression over columns and measures.

The following list shows simple aggregations:
Total sales (SALES)
Summing SALES_AMOUNT to the focus levels Store and Day
Number of sales transactions (SALES_TRX)
Counting the number of sales for the focus levels
Total profit (SALES_PROFIT)
Summing the differences between SALES_AMOUNT and cost to the focus levels. The summation is done over the expression (SALES_AMOUNT - cost).

Aggregation to a higher level

Sometimes, information needs to be aggregated to a level higher than the focus level. For example, to compare daily results to weekly sales, it is necessary to first sum the sales amounts for weeks instead of days:
Total week sales (SALES_WK)
Summing SALES_AMOUNT to the week level. The week level is higher in a hierarchy than the focus level day.
Average sales in month (SALES_AVG_MTH)
Averaging individual daily store sales in month. This amounts to summing SALES_AMOUNT to the day level and the store level, and then taking the average across the month.

Aggregation for a range of values

When analyzing sales data, an important input into forecasts is the sales behavior in comparable earlier periods or in adjacent periods of time. The extent of such periods directly depends on the value in the time portion of the focus, because the periods are defined relatively to some point in time. Therefore, values cannot simply be aggregated to some hierarchy level, but must be computed individually for each row of data.

An example for an aggregation on a rolling window looks like this:
Past 7 day sales (SALES_7_DAYS)
Computed by summing the values of the daily sales amounts for the seven shopping days that immediately precede the current day.


Feedback | Information roadmap