Sequence Node Expert Options
For those with detailed knowledge of the Sequence node's operation, the following expert options allow you to fine-tune the model-building process. To access expert options, set the Mode to Expert on the Expert tab.
Set maximum duration. If this option is selected, sequences will be limited to those with a duration (the time between the first and last item set) less than or equal to the value specified. If you haven't specified a time field, the duration is expressed in terms of rows (records) in the raw data. If the time field used is a time, date, or timestamp field, the duration is expressed in seconds. For numeric fields, the duration is expressed in the same units as the field itself.
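As an illustration only, the duration rule above can be sketched in Python. The helper names `sequence_duration` and `within_max_duration` are hypothetical and are not part of IBM SPSS Modeler. The sketch assumes that numeric time values are compared in their own units and that date/time values are converted to seconds, as described above.

```python
from datetime import datetime

def sequence_duration(times):
    """Return the time between the first and last item set.

    Hypothetical helper: numeric times are returned in their own units,
    datetime values are returned as seconds.
    """
    delta = max(times) - min(times)
    # Subtracting datetimes yields a timedelta; report it in seconds.
    if hasattr(delta, "total_seconds"):
        return delta.total_seconds()
    return delta

def within_max_duration(times, max_duration):
    """True if the sequence duration is less than or equal to the limit."""
    return sequence_duration(times) <= max_duration

# Numeric time field: duration is in the field's own units.
numeric_duration = sequence_duration([1, 2, 5, 6])

# Timestamp field: duration is in seconds.
stamp_duration = sequence_duration(
    [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30)]
)
```

With these inputs, `numeric_duration` is 5 and `stamp_duration` is 1800.0 seconds, so a maximum duration of 10 would keep the first sequence but not a numeric sequence spanning times 1 to 20.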
Set pruning value. The CARMA algorithm used in the Sequence node periodically removes (prunes) infrequent item sets from its list of potential item sets during processing to conserve memory. Select this option to adjust how often pruning occurs; the number you specify determines the pruning frequency. Enter a smaller value to decrease the memory requirements of the algorithm (but potentially increase the training time required), or enter a larger value to speed up training (but potentially increase memory requirements).
Set maximum sequences in memory. If this option is selected, the CARMA algorithm will limit its memory store of candidate sequences during model building to the number of sequences specified. Select this option if IBM® SPSS® Modeler is using too much memory during the building of Sequence models. Note that the maximum sequences value you specify here is the number of candidate sequences tracked internally as the model is built. This number should be much larger than the number of sequences you expect in the final model.
Constrain gaps between item sets. This option allows you to specify constraints on the time gaps that separate item sets. If selected, item sets with time gaps smaller than the Minimum gap or larger than the Maximum gap that you specify will not be considered to form part of a sequence. Use this option to avoid counting sequences that include long time intervals or those that take place in a very short time span.
Note: If the time field used is a time, date, or timestamp field, the time gap is expressed in seconds. For numeric fields, the time gap is expressed in the same units as the time field.
For example, consider the following list of transactions.
| ID | Time | Content |
|---|---|---|
| 1001 | 1 | apples |
| 1001 | 2 | bread |
| 1001 | 5 | cheese |
| 1001 | 6 | dressing |
If you build a model on these data with the minimum gap set to 2, you would get the following sequences:
apples -> cheese
apples -> dressing
bread -> cheese
bread -> dressing
You would not see sequences such as apples -> bread because the gap between apples and bread is smaller than the minimum gap.
Similarly, consider the following alternative data.
| ID | Time | Content |
|---|---|---|
| 1001 | 1 | apples |
| 1001 | 2 | bread |
| 1001 | 5 | cheese |
| 1001 | 20 | dressing |
If the maximum gap were set to 10, you would not see any sequences with dressing, because the gap between cheese and dressing is too large for them to be considered part of the same sequence.
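The two gap examples can be reproduced with a minimal Python sketch. This is not the CARMA algorithm and not an SPSS Modeler API; the function `two_item_sequences` is a hypothetical helper that enumerates only two-item sequences for a single ID, and it assumes the gap is the difference between the time values of the two item sets in the pair.

```python
from itertools import combinations

def two_item_sequences(transactions, min_gap=None, max_gap=None):
    """List (earlier, later) item pairs whose time gap satisfies the limits.

    transactions: list of (time, item) tuples for a single ID.
    Hypothetical illustration only; it ignores support and confidence.
    """
    ordered = sorted(transactions, key=lambda t: t[0])
    result = []
    # combinations() preserves input order, so t1 <= t2 for every pair.
    for (t1, first), (t2, second) in combinations(ordered, 2):
        gap = t2 - t1
        if min_gap is not None and gap < min_gap:
            continue  # item sets are too close together
        if max_gap is not None and gap > max_gap:
            continue  # item sets are too far apart
        result.append((first, second))
    return result

first_data = [(1, "apples"), (2, "bread"), (5, "cheese"), (6, "dressing")]
second_data = [(1, "apples"), (2, "bread"), (5, "cheese"), (20, "dressing")]

with_min_gap = two_item_sequences(first_data, min_gap=2)
with_max_gap = two_item_sequences(second_data, max_gap=10)
```

Under these assumptions, `with_min_gap` contains the four pairs listed for the first table, and no pair in `with_max_gap` includes dressing.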