Building sequence rule models
The Sequence Rule mining function computes sequence rules. In previous versions of the Intelligent Miner® for Data, the Sequence Rule mining function was called the Sequential Patterns mining function.
You can build a sequence rule model by using the BuildSeqRuleModel procedure.
Syntax
IDMMX.BuildSeqRuleModel(<modelName>,
<inputTable>,
<sequenceColumn>,
<groupColumn>,
<minSupport>,
<minConfidence>,
<maxRuleLength>
[,<optionsString>])
Input parameters
With the BuildSeqRuleModel procedure, you must specify the following parameters:
- <modelName>
- The name of the model that you want to build.
The model is stored in the IDMMX.RuleModels table. If a model with the same name already exists, the previous model is replaced with the new model.
This parameter is of type VARCHAR. Its size is 240.
- <inputTable>
- The name of the
input table or the input view.
The BuildSeqRuleModel procedure starts a mining run on this table.
The columns of the input table that are unlikely to be useful to create a model are ignored by the Easy Mining procedure. These are, for example, key columns.
This parameter is of type VARCHAR. Its size is 240.
- <sequenceColumn>
- The name of the sequence
column.
A sequence contains the item sets that have the same sequence ID.
- <groupColumn>
- The name of the column that contains the group ID or the transaction
ID.
The remaining columns of the input table, with the exception of the sequence column, are used as item columns.
An item set contains items that have the same sequence ID and the same group ID.
The item sets in a sequence are sorted according to the value in the group column.
This parameter is of type VARCHAR. Its size is 128.
- <minSupport>
- The minimum support value for all sequence rules, expressed as a percentage.
You can specify a value between 0 and 100. If you specify 0, the value for minimum support is automatically determined to produce a result that contains at least a few sequence rules.
This parameter is of type REAL.
- <minConfidence>
- The minimum confidence value for all sequence rules, expressed as a percentage.
You can specify a value between 0 and 100. If you specify 0, the default value of 25% is automatically used as the lower limit for the confidence of a sequence rule.
This parameter is of type REAL.
- <maxRuleLength>
- The value for the maximum rule length.
The rule length determines the maximum number of item sets that occur in a sequence rule. To limit the rule length, specify a value of at least 2. For example, if you specify 3 as the maximum rule length, a sequence rule can contain up to two item sets in the rule body and one item set in the rule head. If you specify a negative value or 0, the maximum rule length is not limited.
This parameter is of type INTEGER.
- <optionsString>
- The optional
parameter string that you want to use.
This parameter is of type VARCHAR. Its size is 32672.
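The way the sequence column and the group column partition the input rows can be illustrated with a small sketch. It is not part of the Easy Mining interface: the rows, the column roles, and the helper name build_sequences are hypothetical, and the sketch assumes a single item column.

```python
from collections import defaultdict

# Hypothetical input rows: (sequence ID, group ID, item).
# The three positions mirror <sequenceColumn>, <groupColumn>, and one item column.
rows = [
    ("C1", "2024-01-05", "digital camera"),
    ("C1", "2024-01-05", "rechargeable batteries"),
    ("C1", "2024-04-02", "photo printer"),
    ("C1", "2024-02-10", "memory card"),
    ("C2", "2024-03-01", "memory card"),
]

def build_sequences(rows):
    """Group rows into sequences of item sets, sorted by group value."""
    grouped = defaultdict(lambda: defaultdict(set))
    for seq_id, group_id, item in rows:
        # Items with the same sequence ID and group ID form one item set.
        grouped[seq_id][group_id].add(item)
    return {
        seq_id: [(g, frozenset(items)) for g, items in sorted(groups.items())]
        for seq_id, groups in grouped.items()
    }

sequences = build_sequences(rows)
for seq_id, item_sets in sequences.items():
    print(seq_id, item_sets)
```

In this sketch, customer C1 ends up with three item sets ordered by date, and the first item set contains both the camera and the batteries because they share a group ID.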
Example
You might want to build the model BANK.PRODUCT_SEQ_RULES to
compute the sequence rules for the products that are bought by bank
customers.
call IDMMX.BuildSeqRuleModel('BANK.PRODUCT_SEQ_RULES',
'BANK.CUSTOMER_PRODUCTS2',
'CLIENT_ID',
'DATE',
5, 30, 3);
Output
The FindSeqRules procedure uses sequence rules to find relationships in your data. Sequential relationships are represented as sequence rules. Sequence rules describe patterns in sequences. Depending on the business area, sequences might be, for example, purchases of customers or defects of cars over time.
<digital camera and rechargeable batteries> >>> <memory card> ==> <photo printer>
where:
- <digital camera and rechargeable batteries>
- represents an individual item set that is part of the rule body
- >>>
- represents a temporal ordering of item sets in ascending order
- <memory card>
- represents an individual item set that is part of the rule body
- ==>
- splits the sequence rule into a sequence rule head and a sequence rule body
- <photo printer>
- represents an item set that is included in the sequence rule head
You can interpret the sequence rule above like this: If customers buy a digital camera together with rechargeable batteries at one purchase and a memory card in a later purchase, they will buy a photo printer during a subsequent purchase.
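The notation can also be read mechanically. The following sketch splits a rule written in this notation into its body item sets and its head item set. The function name parse_rule is hypothetical, and treating the word "and" as the item separator is an assumption based on how the example rule is written here, not a documented format.

```python
def parse_rule(text):
    """Split '<a and b> >>> <c> ==> <d>' into (body, head).

    body is a list of item sets in temporal order; head is one item set.
    """
    def item_set(part):
        # Drop the angle brackets, then split on the assumed ' and ' separator.
        inner = part.strip().strip("<>")
        return frozenset(item.strip() for item in inner.split(" and "))

    # '==>' separates the rule body from the rule head.
    body_text, head_text = text.split("==>")
    # '>>>' separates the ordered item sets within the rule body.
    body = [item_set(part) for part in body_text.split(">>>")]
    return body, item_set(head_text)

rule = "<digital camera and rechargeable batteries> >>> <memory card> ==> <photo printer>"
body, head = parse_rule(rule)
print(body)
print(head)
```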
- Confidence
- The confidence value
represents the validity of the rule. A confidence
value of 50% means that in 50% of the cases where a particular rule
body is present in a sequence, a particular rule head is also present
after the item sets of the rule body.
For example, in the sequence rule above, a confidence value of 50% means that 50% of the customers who bought a digital camera together with rechargeable batteries at one purchase and a memory card in a later purchase, bought a photo printer during a subsequent visit.
- Support
- The support value indicates how many sequences are covered by
a sequence rule. The support value is expressed as a percentage of
the total number of sequences.
For example, a support value of 2% in the following sequence rule means that 2% of all customers purchased all three sets of objects in this particular sequence. Note that they might have bought other items along with the items in the sequence rule, and that they might have made other purchases as well.
<digital camera and rechargeable batteries> >>> <memory card> ==> <photo printer>
- Lift
- The lift value indicates how much
the confidence value is different
from the expected confidence value.
The lift value is computed by dividing the confidence value by the support value of the sequence rule head.
If the support value of the sequence rule head in the above example is 10% and the confidence value of the sequence rule is 50%, the value for lift is 50% divided by 10% = 5.
A lift value of 5 means that customers who buy a digital camera together with rechargeable batteries at one purchase and a memory card in a later purchase are 5 times more likely than average customers to buy a photo printer during a subsequent purchase.
- Mean time difference
- This value indicates the mean time difference between the time
stamp of the first item set and the time stamp of the last item set
in a sequence.
If the type of the group column is numeric, this value is the mean value of the differences between pairs of subsequent groups of transactions.
- Standard deviation of time difference
- This value indicates the standard deviation
of the time difference
between the time stamp of the first item set and the time stamp of
the last item set in a sequence.
If the type of the group column is numeric, this value is the standard deviation of the differences between pairs of subsequent groups of transactions.
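The definitions of support, confidence, and lift can be checked on a toy data set. The sketch below is an illustration under stated assumptions, not the algorithm that the mining function uses: the five sequences are invented, the helper names contains and rule_measures are hypothetical, and a sequence is assumed to cover a rule when the rule's item sets occur in order as subsets of its own item sets.

```python
def contains(sequence, pattern):
    """True if the item sets of `pattern` occur in order in `sequence`."""
    pos = 0
    for wanted in pattern:
        # Advance to the next item set that includes all wanted items.
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # The next pattern item set must come from a later group.
    return True

def rule_measures(sequences, body, head):
    """Return (support, confidence, lift) as fractions, not percentages.

    Assumes at least one sequence contains the body and one contains the head.
    """
    n = len(sequences)
    n_body = sum(contains(s, body) for s in sequences)
    n_rule = sum(contains(s, body + [head]) for s in sequences)
    n_head = sum(contains(s, [head]) for s in sequences)
    support = n_rule / n
    confidence = n_rule / n_body
    lift = confidence / (n_head / n)  # confidence / support of the rule head
    return support, confidence, lift

# Invented sequences, each already sorted by group value.
sequences = [
    [{"digital camera", "rechargeable batteries"}, {"memory card"}, {"photo printer"}],
    [{"digital camera", "rechargeable batteries", "camera bag"}, {"memory card"}],
    [{"photo printer"}],
    [{"memory card"}, {"digital camera", "rechargeable batteries"}],
    [{"camera bag"}],
]
body = [{"digital camera", "rechargeable batteries"}, {"memory card"}]
head = {"photo printer"}

support, confidence, lift = rule_measures(sequences, body, head)
print(support, confidence, lift)
```

With these invented sequences, one of five sequences covers the whole rule (support 20%), two contain the rule body (confidence 50%), and two contain the rule head (head support 40%), which gives a lift of 1.25. The fourth sequence does not count toward the body because its item sets appear in the wrong order.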