The Association mining function and the Sequence Rules mining function can read groups of items out of tables with different table layouts. The following examples show the table layouts that are supported. They also show how these table layouts are defined in your settings.
In the following table, there is a group ID field and an item field. Each record corresponds to one item in a group. This means that each group is represented by as many records as there are items in the group.
In this table, the group ID is mandatory because it is required to identify the items that belong to the same group.
This table shows the most flexible table layout because neither the maximum transaction length nor the number of different items per group are limited after the table format is defined.
Group ID | Item |
---|---|
Smith | checking account |
Smith | saving account |
Smith | credit card |
Jackson | checking account |
Jackson | credit card |
Jackson | loan |
Jackson | custody account |
Douglas | checking account |
Douglas | custody account |
IDMMX.DM_RuleSettings()..DM_useRuleDataSpec(...) ..DM_setGroup('Group ID')
In the table layout with limited transaction length and in the most inflexible table layout, each group is represented by exactly one record. The group ID is not required to find the items that belong to the same group. Therefore the group ID is optional for these table layouts for Associations. For Sequence Rules, the sequence and the group ID are mandatory.
In the following table, each item field contains the item at a given position in the group, or NULL if there is no such item in this group. The field value identifies the item, however, the field name is not relevant.
The table layout of this table limits the maximum transaction length to the number of item fields after the table is created.
Group ID | Item1 | Item2 | Item3 | Item4 |
---|---|---|---|---|
Smith | checking account | saving account | credit card | - |
Jackson | checking account | credit card | loan | custody account |
Douglas | checking account | custody account | - | - |
IDMMX.DM_RuleSettings()..DM_useRuleDataSpec(...) ..DM_setItemFormat(1) ..DM_setFldUsageType('Group ID', 2)
The command DM_setFldUsageType deactivates the field Group ID. The group field is not required because each transaction corresponds to one row in the data.
The command DM_setItemFormat(1) sets the item format in the generated rules to format ID = 1. This means that only the item field values, for example, saving account, appear as items in the generated rules.
If there are more than one item columns and you do not set the item format to 1, the default value 0 is used. Item format 0 means that the items in the generated rules are compositions of item column name and item value, for example, Item2=saving Account.
IDMMX.DM_RuleSettings()..DM_useRuleDataSpec(...) ..DM_setItemFormat(1) ..DM_setGroup('Group ID')
Group ID | Items |
---|---|
Smith | <item>checking account</item><item>saving account</item><item>credit account</item> |
Jackson | <item>checking account</item><item>credit card</item><item>loan</item><item>custody account</item> |
Douglas | <item>checking account</item><item>custody account</item> |
IDMMX.DM_RuleSettings()..DM_useRuleDataSpec(...) ..DM_setFldUsageType('Group ID',2) ..DM_setItemFormat(101)
The item format 101 specifies that, with exception of the deactivated Group ID field, the remaining fields are considered as item fields of the format 101. Format 101 is the multi-value format.
In the following table, each item field corresponds to one item. This means that if your table contains N different items, there are N item fields. The field identifies the item, and the field value states if the item is part of the group. In this example, "Y" indicates that the item is a member of the group. NULL, an empty string, or a blank string indicates that it is not a member of the group.
Each string that is not NULL, empty, or blank is considered as 'Yes'-indicator. Therefore you can use any string as 'Yes'-indicator.
Group ID | checking account | saving account | credit card | loan | custody account |
---|---|---|---|---|---|
Smith | Y | Y | Y | - | - |
Jackson | Y | - | Y | Y | Y |
Douglas | Y | - | - | - | Y |
The table layout above is the basic form of a most inflexible table layout because the maximum transaction length and the number of different items are set when the table is created.
IDMMX.DM_RuleSettings()..DM_useRuleDataSpec(...) ..DM_setItemFormat(2) ..DM_setFldUsageType('Group ID', 2)
The command DM_setItemFormat(2) specifies that all columns except the deactivated column Group ID are item columns of item format 2. Item format 2 means that only the item column names and not the item values are included in the generated rules.
Some of the item fields contain numeric values. There is no restriction to categorical values.
Group ID | checking account | saving account | credit card | loan | custody account |
---|---|---|---|---|---|
Smith | 122.76 | 25,183.15 | 1 | - | - |
Jackson | -3,000 | - | 5 | long term | high activity |
Douglas | 11,877.43 | - | - | - | low activity |
IDMMX.DM_RuleSettings()..DM_useRuleDataSpec(...) ..DM_setItemFomat(0) ..DM_setFldUsageType('Group ID', 2)
If you join two different tables for your input table, some entries might be duplicated in the resulting view. The duplicated items do not cause problems because the mining function ignores duplicated items in a group.
For example, you might want to join the following two tables:
Group ID | checking account | saving account | credit card | loan | custody account |
---|---|---|---|---|---|
Smith | Y | Y | Y | - | - |
Jackson | Y | - | Y | Y | Y |
Douglas | Y | - | - | - | Y |
If you join the two tables above, the following table is created.
Group ID | Item1 | Item2 | Item3 | Item4 | Customer access |
---|---|---|---|---|---|
Smith | checking account | saving account | credit card | - | Internet |
Smith | checking account | saving account | credit card | - | ATM/Desk |
Jackson | checking account | credit card | loan | custody account | Internet |
Douglas | checking account | custody account | - | - | Internet |
Intelligent Miner® supports different ways how the item names in Association rule models or in Sequence rule models are built from the content and the column names of the training data table. These different ways are called item formats. Some of these item formats are described in Definition of the input table layout.
This format is another version of the format 0. Format 10 does not have restrictions, it accepts all item values.
This format is another version of format 1. Format 11 does not have restrictions, it accepts all item values.
<item>itemValue1</item>...<item>itemValueN</item>
DM_RuleSettings..DM_getItemFormat( fldName )