Building sequence rule models

The Sequence Rule mining function computes sequence rules. In previous versions of the Intelligent Miner® for Data, the Sequence Rule mining function was called the Sequential Patterns mining function.

You can build a sequence rule model by using the BuildSeqRuleModel procedure.

Syntax

IDMMX.BuildSeqRuleModel(<modelName>,
                        <inputTable>,
                        <sequenceColumn>,
                        <groupColumn>,
                        <minSupport>,
                        <minConfidence>,
                        <maxRuleLength>
                        [,<optionsString>])

Input parameters

With the BuildSeqRuleModel procedure, you specify the following parameters. All parameters except <optionsString> are required:

<modelName>

The name of the model that you want to build.

The model is stored in the IDMMX.RuleModels table. If a model with the same name already exists, the previous model is replaced with the new model.

This parameter is of type VARCHAR. Its size is 240.

<inputTable>

The name of the input table or the input view.

The BuildSeqRuleModel procedure starts a mining run on this table.

The Easy Mining procedure ignores columns of the input table that are unlikely to be useful for creating a model, for example, key columns.

This parameter is of type VARCHAR. Its size is 240.

<sequenceColumn>

The name of the sequence column.

A sequence contains the item sets that have the same sequence ID.

<groupColumn>

The name of the column that contains the group ID or the transaction ID.

The remaining columns of the input table, with the exception of the sequence column, are used as item columns.

An item set contains items that have the same sequence ID and the same group ID.

The item sets in a sequence are sorted according to the value in the group column.

This parameter is of type VARCHAR. Its size is 128.
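The way rows are grouped into item sets and sequences can be sketched in Python. This is only an illustration of the grouping described above, not the procedure's implementation: the sample rows, the PRODUCT item column, and the build_sequences helper are hypothetical, and the group column is assumed to hold ISO date strings so that sorting them yields chronological order.

```python
from collections import defaultdict

# Hypothetical input rows: (sequence ID, group ID, item).
# Column meaning assumed here: CLIENT_ID, DATE, PRODUCT.
rows = [
    (1, "2024-01-05", "digital camera"),
    (1, "2024-01-05", "rechargeable batteries"),
    (1, "2024-01-20", "memory card"),
    (1, "2024-02-10", "photo printer"),
    (2, "2024-03-01", "memory card"),
    (2, "2024-02-01", "digital camera"),
]

def build_sequences(rows):
    """Group rows into item sets and order the item sets within each sequence."""
    # Items with the same sequence ID and the same group ID form one item set.
    item_sets = defaultdict(set)
    for seq_id, group_id, item in rows:
        item_sets[(seq_id, group_id)].add(item)

    # Item sets with the same sequence ID form one sequence,
    # sorted by the value in the group column.
    sequences = defaultdict(list)
    for seq_id, group_id in sorted(item_sets):
        sequences[seq_id].append((group_id, frozenset(item_sets[(seq_id, group_id)])))
    return dict(sequences)

sequences = build_sequences(rows)
for seq_id, ordered_item_sets in sequences.items():
    print(seq_id, ordered_item_sets)
```

In this sketch, the two rows of client 1 that share the same date end up in one item set, and the item sets of client 2 are reordered by date even though the rows arrive out of order.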

<minSupport>

The minimum support value for all sequence rules, expressed as a percentage.

You can specify a value between 0 and 100. If you specify 0, the value for minimum support is automatically determined to produce a result that contains at least a few sequence rules.

This parameter is of type REAL.

<minConfidence>

The minimum confidence value for all sequence rules, expressed as a percentage.

You can specify a value between 0 and 100. If you specify 0, the default value of 25% is automatically used as the lower limit for the confidence of a sequence rule.

This parameter is of type REAL.

<maxRuleLength>

The value for the maximum rule length.

The rule length is the number of item sets that occur in a sequence rule. To limit the rule length, specify a value of at least 2. For example, if you specify 3 as the maximum rule length, a sequence rule can contain up to two item sets in the rule body and one item set in the rule head. If you specify 0 or a negative value, the maximum rule length is not limited.

This parameter is of type INTEGER.

<optionsString>

The optional parameter string that you want to use.

This parameter is of type VARCHAR. Its size is 32672.

Example

You might want to build the model BANK.PRODUCT_SEQ_RULES to compute the sequence rules for the products that are bought by bank customers.

Use the following command to run the Easy Mining procedure:

call IDMMX.BuildSeqRuleModel('BANK.PRODUCT_SEQ_RULES',
                             'BANK.CUSTOMER_PRODUCTS2',
                             'CLIENT_ID',
                             'DATE',
                              5, 30, 3);

Output

The FindSeqRules procedure uses sequence rules to find relationships in your data. Sequential relationships are represented as sequence rules. Sequence rules describe patterns in sequences. Depending on the business area, sequences might be, for example, purchases of customers or defects of cars over time.

For example, customers might buy a digital camera and rechargeable batteries. A couple of weeks later, they buy a memory card and, again a couple of weeks later, they buy a photo printer. The sequence rule of this pattern looks like this:

<digital camera and rechargeable batteries> >>> 
<memory card> ==> <photo printer>

where:

<digital camera and rechargeable batteries>: represents an individual item set that is part of the rule body
>>>: represents a temporal ordering of item sets in ascending order
<memory card>: represents an individual item set that is part of the rule body
==>: separates the sequence rule body (on the left) from the sequence rule head (on the right)
<photo printer>: represents an item set that is included in the sequence rule head

You can interpret the sequence rule above like this: If customers buy a digital camera together with rechargeable batteries at one purchase and a memory card in a later purchase, they are likely to buy a photo printer during a subsequent purchase.

Sequence rules include the following attributes:

Confidence

The confidence value represents the validity of the rule. A confidence value of 50% means that in 50% of the cases where a particular rule body is present in a sequence, a particular rule head is also present after the item sets of the rule body.

For example, in the sequence rule above, a confidence value of 50% means that 50% of the customers who bought a digital camera together with rechargeable batteries at one purchase and a memory card in a later purchase bought a photo printer during a subsequent purchase.
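The confidence definition can be checked on a small data set. The following Python sketch assumes that an item set of a rule matches any purchase that contains all of its items, and that the matched purchases must occur in order. The sample sequences and the contains helper are hypothetical and are not part of the product.

```python
def contains(sequence, pattern):
    """Return True if the item sets of pattern occur in sequence in order.

    Each pattern item set must be a subset of a distinct, later item set.
    """
    pos = 0
    for item_set in sequence:
        if pos < len(pattern) and pattern[pos] <= item_set:
            pos += 1
    return pos == len(pattern)

body = [{"digital camera", "rechargeable batteries"}, {"memory card"}]
head = [{"photo printer"}]

# Hypothetical purchase histories, one list of item sets per customer.
sequences = [
    [{"digital camera", "rechargeable batteries"}, {"memory card"}, {"photo printer"}],
    [{"digital camera", "rechargeable batteries", "tripod"}, {"memory card"}],
    [{"memory card"}, {"photo printer"}],
    [{"digital camera", "rechargeable batteries"}, {"memory card"}, {"camera bag"}, {"photo printer"}],
    [{"digital camera", "rechargeable batteries"}, {"memory card"}],
]

# Sequences that contain the rule body, and those that also contain the head afterwards.
with_body = [s for s in sequences if contains(s, body)]
with_rule = [s for s in sequences if contains(s, body + head)]

confidence = 100.0 * len(with_rule) / len(with_body)
print(f"confidence = {confidence:.0f}%")  # prints "confidence = 50%"
```

Four of the five sample sequences contain the rule body, and two of those four also contain the rule head afterwards, which gives a confidence of 50%.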

Support

The support value indicates how many sequences are covered by a sequence rule. The support value is expressed as a percentage of the total number of sequences.

For example, a support value of 2% in the following sequence rule means that 2% of all customers purchased all three item sets in this particular order. Note that they might have bought other items along with the items in the sequence rule, and that they might have made other purchases as well.

<digital camera and rechargeable batteries> >>>
<memory card> ==> <photo printer>
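As a rough illustration of the support definition, the following Python sketch counts the sequences that contain the whole rule and divides by the total number of sequences. The sample data and the contains helper are hypothetical. Extra items within a purchase and extra purchases in between do not prevent a match, in line with the note above.

```python
def contains(sequence, pattern):
    """Return True if the item sets of pattern occur in sequence in order."""
    pos = 0
    for item_set in sequence:
        # A pattern item set matches a purchase that includes all of its items.
        if pos < len(pattern) and pattern[pos] <= item_set:
            pos += 1
    return pos == len(pattern)

rule = [
    {"digital camera", "rechargeable batteries"},  # rule body, first item set
    {"memory card"},                               # rule body, second item set
    {"photo printer"},                             # rule head
]

# Hypothetical purchase histories, one list of item sets per customer.
sequences = [
    [{"digital camera", "rechargeable batteries"}, {"memory card"}, {"photo printer"}],
    [{"digital camera", "rechargeable batteries", "tripod"}, {"memory card"}],
    [{"memory card"}, {"photo printer"}],
    [{"digital camera", "rechargeable batteries"}, {"memory card"}, {"camera bag"}, {"photo printer"}],
    [{"digital camera", "rechargeable batteries"}, {"memory card"}],
]

covered = sum(1 for s in sequences if contains(s, rule))
support = 100.0 * covered / len(sequences)
print(f"support = {support:.0f}%")  # prints "support = 40%"
```

Two of the five sample sequences contain all three item sets in order, so the support is 40%. The fourth sequence still counts although it includes an unrelated purchase between the memory card and the photo printer.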

Lift

The lift value indicates how much the confidence value differs from the expected confidence value.

The lift value is computed by dividing the confidence value by the support value of the sequence rule head.

If the support value of the sequence rule head <photo printer> in the above example is 10% and the confidence value of the sequence rule is 50%, the value for lift is 50% divided by 10% = 5.

A lift value of 5 means that customers who buy a digital camera together with rechargeable batteries at one purchase and a memory card in a later purchase are 5 times more likely than average customers to buy a photo printer during a subsequent purchase.
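The lift arithmetic can be written as a small Python function. This is a sketch of the formula stated above (confidence divided by the support of the rule head). The function name and the guard against a zero support value are illustrative additions.

```python
def lift(confidence_pct, head_support_pct):
    """Lift of a sequence rule: confidence divided by the support of the rule head.

    Both arguments are percentages, so the units cancel out.
    """
    if head_support_pct <= 0:
        raise ValueError("the support of the rule head must be greater than 0")
    return confidence_pct / head_support_pct

# Values from the example: confidence 50%, support of the rule head 10%.
print(lift(50.0, 10.0))  # prints 5.0
```

A lift above 1 indicates that the rule head occurs more often after the rule body than it does on average, and a lift below 1 indicates that it occurs less often.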

Mean time difference

This value indicates the mean time difference between the time stamp of the first item set and the time stamp of the last item set in a sequence.

If the type of the group column is numeric, this value is the mean value of the differences between pairs of subsequent groups of transactions.

Standard deviation of time difference

This value indicates the standard deviation of the time difference between the time stamp of the first item set and the time stamp of the last item set in a sequence.

If the type of the group column is numeric, this value is the standard deviation of the differences between pairs of subsequent groups of transactions.
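For a group column that contains time stamps, the two time statistics can be sketched in Python as follows. The dates are hypothetical, each pair holds the time stamps of the first and the last item set of one supporting sequence, and the use of the population standard deviation is an assumption because the text does not state which variant is computed.

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical (first item set, last item set) time stamps,
# one pair per sequence that supports the rule.
spans = [
    (date(2024, 1, 5), date(2024, 2, 4)),
    (date(2024, 3, 1), date(2024, 4, 10)),
    (date(2024, 5, 1), date(2024, 6, 20)),
]

# Time difference in days between the first and the last item set of each sequence.
differences_in_days = [(last - first).days for first, last in spans]

mean_diff = mean(differences_in_days)
# Assumption: population standard deviation; the product might use the sample variant.
std_diff = pstdev(differences_in_days)

print(differences_in_days)  # prints [30, 40, 50]
print(f"mean = {mean_diff} days, standard deviation = {std_diff:.2f} days")
```

A small standard deviation relative to the mean suggests that the pattern unfolds over a similar period for most customers.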