Tabular versus Transactional Data
Data used by association rule models may be in transactional or tabular format, as described below. These are general descriptions; specific requirements may vary, as discussed in the documentation for each model type. When scoring, the data to be scored must be in the same format as the data used to build the model: models built using tabular data can score only tabular data, and models built using transactional data can score only transactional data.
Transactional Format
Transactional data have a separate record for each transaction or item. If a customer makes multiple purchases, for example, each would be a separate record, with associated items linked by a customer ID. This is also sometimes known as till-roll format.
Customer | Purchase
---|---
1 | jam
2 | milk
3 | jam
3 | bread
4 | jam
4 | bread
4 | milk
The Apriori, CARMA, and Sequence nodes can all use transactional data.
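To illustrate how separate records are linked by a customer ID, the following minimal Python sketch groups the example records above into one set of items per customer. The names `transactions` and `group_by_customer` are hypothetical and chosen only for illustration; they are not part of any modeling node or product API.

```python
from collections import defaultdict

# Transactional records: one (customer ID, item) pair per purchase,
# matching the example table above.
transactions = [
    (1, "jam"),
    (2, "milk"),
    (3, "jam"),
    (3, "bread"),
    (4, "jam"),
    (4, "bread"),
    (4, "milk"),
]


def group_by_customer(records):
    """Collect the items purchased by each customer into a set.

    Hypothetical helper: the customer ID is the key that links
    the separate records belonging to one customer.
    """
    baskets = defaultdict(set)
    for customer, item in records:
        baskets[customer].add(item)
    return dict(baskets)


baskets = group_by_customer(transactions)
for customer in sorted(baskets):
    print(customer, sorted(baskets[customer]))
```

Each resulting set corresponds to the associated items for one customer, which is the grouping an association model works with when it reads transactional data.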
Tabular Format
Tabular data (also known as basket or truth-table data) have items represented by separate flags, where each flag field represents the presence or absence of a specific item. Each record represents a complete set of associated items. Flag fields can be categorical or numeric, although certain models may have more specific requirements.
Customer | Jam | Bread | Milk
---|---|---|---
1 | T | F | F
2 | F | F | T
3 | T | T | F
4 | T | T | T
The Apriori, CARMA, GSAR, and Sequence nodes can all use tabular data.
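The relationship between the two formats can be sketched in Python: the example below pivots transactional records into one flag row per customer. This is an illustrative sketch only, not how any modeling node converts data internally; `to_tabular`, `transactions`, and `ITEMS` are hypothetical names, and the `"T"`/`"F"` flag values simply mirror the example table.

```python
# Transactional records: one (customer ID, item) pair per purchase.
transactions = [
    (1, "jam"),
    (2, "milk"),
    (3, "jam"),
    (3, "bread"),
    (4, "jam"),
    (4, "bread"),
    (4, "milk"),
]

# Assumed list of items that become flag fields, in column order.
ITEMS = ["jam", "bread", "milk"]


def to_tabular(records, items):
    """Pivot (customer, item) records into one flag row per customer.

    Every flag starts as "F" (absent) and is set to "T" (present)
    when a matching record is found. Items not listed in `items`
    are ignored.
    """
    rows = {}
    for customer, item in records:
        row = rows.setdefault(customer, {name: "F" for name in items})
        if item in row:
            row[item] = "T"
    # Return rows ordered by customer ID, with the ID as the first field.
    return [{"Customer": c, **rows[c]} for c in sorted(rows)]


table = to_tabular(transactions, ITEMS)
for row in table:
    print(row)
```

Each record in the result represents a complete set of associated items, with one flag field per item, as in the tabular example above.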