Sequence Model Nuggets
Sequence model nuggets represent the sequences found for a particular output field discovered by the Sequence node and can be added to streams to generate predictions.
When you run a stream containing a Sequence node, the node adds to the data, for each prediction from the sequence model, a pair of fields: one containing the prediction and one containing its associated confidence value. By default, three pairs of fields are added, containing the top three predictions and their confidence values. You can change the number of predictions either in the Sequence node model options when you build the model or on the Settings tab after adding the model nugget to a stream. See the topic Sequence Model Nugget Settings for more information.
The new field names are derived from the model name. The field names are
$S-sequence-n for the prediction field (where n indicates the nth prediction)
and $SC-sequence-n for the confidence field. In a stream with multiple Sequence model nuggets
in a series, the new field names will include numbers in the prefix to distinguish them from each
other. The first Sequence model nugget in the stream will use the usual names, the second will use
names starting with $S1- and $SC1-, the third will use names starting with
$S2- and $SC2-, and so on. Predictions are displayed in order by confidence, so that
$S-sequence-1 contains the prediction with the highest confidence, $S-sequence-2
contains the prediction with the next highest confidence, and so on. For records where the number of
available predictions is smaller than the number of predictions requested, remaining predictions
contain the value $null$. For example, if only two predictions can be made for a
particular record, the values of $S-sequence-3 and $SC-sequence-3 will be $null$.
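The naming and padding rules above can be sketched in a few lines of Python. This is only an illustration of the convention described in this topic, not the product's implementation; the function names `prediction_field_names` and `build_row`, and the use of a plain dictionary for a record, are assumptions made for the example.

```python
NULL = "$null$"  # placeholder used when fewer predictions are available than requested


def prediction_field_names(model_name, n_predictions=3, nugget_index=0):
    """Return (prediction_field, confidence_field) name pairs for one nugget.

    nugget_index is 0 for the first nugget in the stream, 1 for the second,
    and so on. The first nugget gets no number in the prefix.
    """
    number = "" if nugget_index == 0 else str(nugget_index)
    return [
        (f"$S{number}-{model_name}-{n}", f"$SC{number}-{model_name}-{n}")
        for n in range(1, n_predictions + 1)
    ]


def build_row(model_name, predictions, n_predictions=3, nugget_index=0):
    """Lay out (value, confidence) predictions in order of descending confidence.

    Slots with no available prediction are filled with $null$.
    """
    names = prediction_field_names(model_name, n_predictions, nugget_index)
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    row = {}
    for i, (pred_field, conf_field) in enumerate(names):
        if i < len(ranked):
            row[pred_field], row[conf_field] = ranked[i]
        else:
            row[pred_field] = row[conf_field] = NULL
    return row
```

For instance, calling `build_row("sequence", [("bread", 0.66), ("milk", 0.8)])` places `milk` in `$S-sequence-1`, `bread` in `$S-sequence-2`, and `$null$` in both `$S-sequence-3` and `$SC-sequence-3`.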
For each record, the rules in the model are compared to the set of transactions processed for the current ID so far, including the current record and any previous records with the same ID and earlier timestamp. The k rules with the highest confidence values that apply to this set of transactions are used to generate the k predictions for the record, where k is the number of predictions specified on the Settings tab after adding the model to the stream. (If multiple rules predict the same outcome for the transaction set, only the rule with the highest confidence is used.) See the topic Sequence Model Nugget Settings for more information.
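As a rough illustration of this selection step, the following Python sketch picks the k highest-confidence applicable rules and keeps only the strongest rule per predicted outcome. It rests on simplifying assumptions that are not taken from the product: each rule is a hypothetical `(antecedent, consequent, confidence)` tuple, each antecedent element is a single item, and a rule "applies" when its antecedent items occur in order across distinct transactions in the history.

```python
def rule_applies(antecedent, history):
    """True if the antecedent items occur, in order, in successive transactions.

    history is a list of transactions (sets of items) for one ID, oldest first.
    Each transaction can satisfy at most one antecedent element.
    """
    pos = 0
    for transaction in history:
        if pos < len(antecedent) and antecedent[pos] in transaction:
            pos += 1
    return pos == len(antecedent)


def top_k_predictions(rules, history, k=3):
    """Return up to k (consequent, confidence) pairs, highest confidence first."""
    best = {}
    for antecedent, consequent, confidence in rules:
        if rule_applies(antecedent, history):
            # If several rules predict the same outcome, keep the most confident.
            if confidence > best.get(consequent, -1.0):
                best[consequent] = confidence
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]
```

If fewer than k distinct outcomes are predicted, the returned list is simply shorter; the remaining prediction fields would then hold $null$.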
As with other types of association rule models, the data format must match the format used in building the sequence model. For example, models built using tabular data can be used to score only tabular data. See the topic Scoring Association Rules for more information.
Note: When scoring data using a Sequence model nugget in a stream, any tolerance or gap settings that you selected in building the model are ignored for scoring purposes.
Predictions from Sequence Rules
The node handles the records in a time-dependent manner (or order-dependent, if no timestamp field was used to build the model). Records should be sorted by the ID field and timestamp field (if present). However, predictions are not tied to the timestamp of the record to which they are added. They simply refer to the most likely items to occur at some point in the future, given the history of transactions for the current ID up to the current record.
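The required ordering can be illustrated with a small Python sketch. The record layout (a list of dictionaries) and the field names `id` and `timestamp` are hypothetical; in a stream, the same effect would normally be achieved by sorting the data before it reaches the nugget.

```python
def sort_for_scoring(records, id_field="id", time_field=None):
    """Order records by ID and, if a timestamp field is given, by timestamp.

    Python's sort is stable, so when no timestamp field is used the original
    order of the records within each ID is preserved.
    """
    if time_field is None:
        return sorted(records, key=lambda r: r[id_field])
    return sorted(records, key=lambda r: (r[id_field], r[time_field]))
```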
Note that the predictions for each record do not necessarily depend on that record's transactions. If the current record's transactions do not trigger a specific rule, rules will be selected based on the previous transactions for the current ID. In other words, if the current record doesn't add any useful predictive information to the sequence, the prediction from the last useful transaction for this ID is carried forward to the current record.
For example, suppose you have a Sequence model with the single rule
Jam -> Bread (0.66)
and you pass it the following records.
ID | Purchase | Prediction |
---|---|---|
001 | jam | bread |
001 | milk | bread |
Notice that the first record generates a prediction of bread, as you
would expect. The second record also contains a prediction of bread, because there's no rule
for jam followed by milk; therefore, the milk transaction doesn't add any
useful information, and the rule Jam -> Bread still applies.
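The carry-forward behavior in this example can be reproduced with a short Python sketch. It is a simplified model of the behavior described here, not the product's scoring code: records are hypothetical `(ID, purchase)` pairs already sorted by ID, each record holds a single item, and a rule applies when its antecedent items appear in order in the ID's purchase history.

```python
from collections import defaultdict

# Single rule from the example: Jam -> Bread (0.66)
RULES = [(("jam",), "bread", 0.66)]


def applies(antecedent, history):
    """True if the antecedent items appear in order within the purchase history."""
    pos = 0
    for purchase in history:
        if pos < len(antecedent) and purchase == antecedent[pos]:
            pos += 1
    return pos == len(antecedent)


def score(records, rules):
    """Attach the highest-confidence prediction to each (ID, purchase) record."""
    history = defaultdict(list)  # purchases seen so far, per ID
    scored = []
    for rec_id, purchase in records:
        history[rec_id].append(purchase)
        matches = [(c, conf) for a, c, conf in rules if applies(a, history[rec_id])]
        # No applicable rule yields $null$; otherwise take the most confident rule.
        best = max(matches, key=lambda m: m[1], default=("$null$", None))
        scored.append((rec_id, purchase, best[0]))
    return scored
```

Scoring the two records for ID 001 gives `bread` for both: the `milk` record matches no rule on its own, but the earlier `jam` purchase is still part of the history, so the Jam -> Bread rule continues to apply.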
Generating New Nodes
The Generate menu allows you to create new SuperNodes based on the sequence model.
- Rule SuperNode. Creates a SuperNode that can detect and count occurrences of sequences in scored data. This option is disabled if no rule is selected. See the topic Generating a Rule SuperNode from a Sequence Model Nugget for more information.
- Model to Palette. Returns the model to the Models palette. This is useful in situations where a colleague may have sent you a stream containing the model and not the model itself.