Properties view of the Sequences operator

Use this view to define the properties for the Sequences operator.

A Sequences operator is a graphical icon representing a mining task that you place on the mining editor canvas to find typical sequences of events in the connected input table.

In the Properties view, set the properties for this operator by completing the fields in the following tabs:
  • General
  • Model Name
  • Mining Settings
  • Name Maps
  • Taxonomy
  • Column Properties
  • Item Format
  • Rule Filter

General tab

Label
You can rename the operator by specifying a new name. This new name appears on the operator icon in the mining editor canvas.
Description
You can add a description for the operator. When the generated model is stored in the IDMMX.RuleModels table, the description is copied into the DESCRIPTION column of the table.

Model Name tab

Prefix
The prefix for the name of the model created by the operator.

Default value: Name of the data warehousing project for the mining flow

Model name
The name of the model that you want to build. The model is stored in the IDMMX.RuleModels table. If a model with the same name already exists, the previous model is replaced with the new model. The maximum name length is 240 characters.
Default value: System determined

Mining Settings tab

The columns that you select from the Available columns table and move to the Output columns table will appear in the scorer output; the others are ignored.

Group column
The name of the column that contains the group ID. The group ID marks several items or events as parts of a particular transaction group. This is typically, but not necessarily, a timestamp column. Initially, the value of this property is <none>, but you must select a column from the input table. The columns of the input table that are not defined as the group column or the sequence column are used as item columns.
Default value: <none>
Sequence column
The name of the column that contains the sequence ID. The sequence ID marks several items or events as related to one object (for example, a customer). Initially, the value of this property is <none>, but you must select a column from the input table. The columns of the input table that are not defined as the group column or the sequence column are used as item columns.
Default value: <none>
Maximum rule length
The value for the maximum rule length. The rule length is the number of item sets that occur in a sequences rule. You must specify either 0 or a value greater than or equal to 2. For example, if you specify 3 as the maximum rule length, a sequences rule can contain up to two item sets in the rule body and one item set in the rule head. If you specify 0, the rule length is not limited.
Default value: 2
Maximum number of rules
The value for the maximum number of rules that are generated in the model. If you specify 0, the maximum number of rules is not limited.
Default value: 10000
Minimum confidence
The minimum confidence for all rules expressed as a percentage. You can specify a value between 0 and 100. If you specify 0, the default value of 25% is automatically used as the lower limit for the confidence of a rule.
Default value: 25
Minimum support
The minimum support for all rules expressed as a percentage. You can specify a value between 0 and 100. If you specify 0, the value for minimum support is automatically determined to produce a result that contains at least some rules.
Default value: 2
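To make the interplay of these thresholds concrete, here is a minimal Python sketch that scores one candidate rule. It assumes that support is the percentage of sequence IDs whose ordered item sets contain the pattern, and that confidence is rule support divided by body support; these are the usual definitions in sequential pattern mining, and Intelligent Miner's exact computation may differ. The record layout and all function names are hypothetical, and the automatic behavior for a value of 0 in Minimum confidence or Minimum support is not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input layout: one (sequence ID, group ID, item) tuple per row.
Record = Tuple[str, int, str]
Sequence = List[frozenset]


def build_sequences(records: Iterable[Record]) -> Dict[str, Sequence]:
    """Collect the items of each sequence ID into item sets ordered by group ID."""
    groups: Dict[str, Dict[int, set]] = defaultdict(lambda: defaultdict(set))
    for seq_id, group_id, item in records:
        groups[seq_id][group_id].add(item)
    return {s: [frozenset(g[k]) for k in sorted(g)] for s, g in groups.items()}


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """True if the pattern's item sets occur in order, each in a later group."""
    pos = 0
    for itemset in pattern:
        # Matching each item set as early as possible is sufficient here.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(sequences: Dict[str, Sequence], pattern: Sequence) -> float:
    """Percentage of sequence IDs whose sequence contains the pattern."""
    hits = sum(contains(seq, pattern) for seq in sequences.values())
    return 100.0 * hits / len(sequences)


def evaluate(sequences, body: Sequence, head: frozenset,
             min_support=2.0, min_confidence=25.0, max_rule_length=2):
    """Return (support, confidence, kept) for the rule body => head."""
    rule_support = support(sequences, body + [head])
    body_support = support(sequences, body)
    confidence = 100.0 * rule_support / body_support if body_support else 0.0
    # Rule length counts item sets: the body item sets plus the single head item set.
    length_ok = max_rule_length == 0 or len(body) + 1 <= max_rule_length
    kept = length_ok and rule_support >= min_support and confidence >= min_confidence
    return rule_support, confidence, kept


records = [
    ("c1", 1, "bread"), ("c1", 2, "milk"), ("c1", 3, "jam"),
    ("c2", 1, "bread"), ("c2", 2, "jam"),
    ("c3", 1, "milk"), ("c3", 2, "bread"),
    ("c4", 1, "bread"),
]
seqs = build_sequences(records)
print(evaluate(seqs, [frozenset({"bread"})], frozenset({"jam"})))  # (50.0, 50.0, True)
print(evaluate(seqs, [frozenset({"bread"}), frozenset({"milk"})], frozenset({"jam"})))  # (25.0, 100.0, False)
```

With the default maximum rule length of 2, the second rule is discarded even though its confidence is 100%, because it contains three item sets.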
Number of Bins
The number of bins that are automatically created for numerical columns. The minimum number is 2. The maximum number is not limited.

Default value: 5
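This section does not state how the bin boundaries are chosen. The following Python sketch assumes equal-width intervals between the column minimum and maximum, which is only one common strategy; the function name is hypothetical and the sketch is meant to show what "number of bins" controls, not how Intelligent Miner computes the bins.

```python
from typing import List


def equal_width_bins(values: List[float], num_bins: int = 5) -> List[int]:
    """Assign each value a bin index 0..num_bins-1 (equal-width binning is an assumption)."""
    if num_bins < 2:
        raise ValueError("the number of bins must be at least 2")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant column falls into a single bin
    width = (hi - lo) / num_bins
    # min(...) keeps the maximum value in the last bin instead of opening an extra one.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


print(equal_width_bins([0, 10, 20, 35, 50]))  # [0, 1, 2, 3, 4]
```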

Optional Parameters
With optional parameter strings, you can modify default parameters of the Easy Mining procedures. This option is intended for advanced users only. For the supported optional parameters, refer to the Intelligent Miner® Easy Mining Procedures documentation.

Example for the sequences operator: DM_setPowerOptions('-buf 2000000'), DM_setItemFormat(10)

Name Maps tab

The tab shows a table that allows you to define the item ID column and the name column for each name mapping (virtual) table. The table has three columns:

Map Name (= Port)
The table has one row for every Names sub-port that is connected to a name mapping table. This column contains the name of the sub-port that is referenced by the row (the sub-port name is used as identifier for the name mapping at the same time). The values in this column cannot be edited; to add or delete rows in the table, add or delete the corresponding sub-ports in the mining editor.
Item ID Column
This column contains the name of the column in the name mapping table that contains the item IDs.
Default value: <none> (undefined). You must explicitly define this value.
Item Name Column
This column contains the name of the column in the name mapping table that contains the item names.
Default value: <none> (undefined). You must explicitly define this value.

The name mapping table can have more than two columns. Only the columns that are selected as Item ID Column and Item Name Column are used; all other columns are ignored.
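Conceptually, a name mapping reduces the mapping table to an ID-to-name lookup built from the two selected columns. The Python sketch below illustrates this; the table rows, the column names PROD_ID, PROD_NAME, and SUPPLIER, and the function name are hypothetical, and falling back to the raw item ID for unmapped IDs is an assumption made for the example.

```python
from typing import Dict, Iterable, Mapping


def build_name_map(rows: Iterable[Mapping[str, object]],
                   id_column: str, name_column: str) -> Dict[object, object]:
    """Keep only the selected Item ID and Item Name columns; other columns are ignored."""
    return {row[id_column]: row[name_column] for row in rows}


rows = [
    {"PROD_ID": 101, "PROD_NAME": "Orange juice", "SUPPLIER": "X"},
    {"PROD_ID": 102, "PROD_NAME": "Bread", "SUPPLIER": "Y"},
]
names = build_name_map(rows, "PROD_ID", "PROD_NAME")
# Assumed fallback: an item ID without a mapping is shown as the ID itself.
print([names.get(i, i) for i in (101, 102, 103)])  # ['Orange juice', 'Bread', 103]
```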

When you select one of the Names sub-ports, an additional Property tab is displayed. On this tab, you can enter the item ID column and the name column for that particular name mapping table. This is an alternative way to enter the same information individually at the sub-port level instead of the operator level.

Taxonomy tab

The tab shows a table that allows you to define various properties for each category mapping (virtual) table. The table has six columns:

Category Map
The table has one row for every Category sub-port that is connected to a category mapping table. This column contains the name of the sub-port that is referenced by the row (the sub-port name is used as identifier for the category mapping at the same time). The values in this column cannot be edited; to add or delete rows in the table, add or delete the corresponding sub-ports in the mining editor.
Child Column
This column contains the name of the column in the category mapping table that contains the child item or category IDs.
Default value: <none> (undefined). You must explicitly define this value.
Parent Column
This column contains the name of the column in the category mapping table that contains the parent category IDs.
Default value: <none> (undefined). You must explicitly define this value.
Recursive Map
In this column you can specify if the category map is a recursive map. Select “Yes” if the map is a recursive map, “No” otherwise. A recursive category map can hold relations between more than two consecutive levels in the category hierarchy.
Default value: No
Name Mapping
You can assign a name mapping table to the category mapping table. The name mapping table contains names for the category IDs in the category mapping table. In this column, you can select one of the existing map names.
Default value: <none> (undefined)
Description
In this column you can enter a description of the category mapping. The description is for informational purposes only.
Default value: Empty
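A category mapping table relates child IDs to parent IDs. In a recursive map the same table spans several hierarchy levels, so the ancestors of an item are found by following child-to-parent links repeatedly. The following Python sketch shows that lookup under the simplifying assumption that each child has exactly one parent; the sample hierarchy and the function name are hypothetical.

```python
from typing import Dict, List


def ancestors(item: str, parent_of: Dict[str, str]) -> List[str]:
    """Follow child -> parent links upward; a recursive map can span several levels."""
    chain: List[str] = []
    seen = {item}
    while item in parent_of:
        item = parent_of[item]
        if item in seen:  # guard against cycles in malformed mapping data
            raise ValueError(f"cycle in category map at {item!r}")
        seen.add(item)
        chain.append(item)
    return chain


# Hypothetical recursive map (child -> parent) spanning three levels.
recursive_map = {"Orange juice": "Juice", "Juice": "Drink",
                 "Cola": "Soft drink", "Soft drink": "Drink"}
print(ancestors("Orange juice", recursive_map))  # ['Juice', 'Drink']
print(ancestors("Bread", recursive_map))         # []
```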

Column Properties tab

The tab shows a table that allows you to define various properties for each column in the input (virtual) table. The table has the following columns:

Input Column
This column contains the names of all columns of the input table. The column that is defined as group column (association rules) or sequence column (sequence rules) is not displayed. For all the other columns of the input table, column properties can be defined here. The values in this column cannot be edited.
SQL Type
This column contains the SQL Data Type of the input table column. The values in this column cannot be edited.
Field Type
Data-mining algorithms distinguish between numerical input data and categorical input data. Additionally, categorical input data can be of type set-valued categorical.
The mining function determines automatically whether the input data is numerical or categorical. However, you can override the input-data type for a column by setting the column type to categorical.
The mining function cannot determine whether the input data is of type set-valued categorical. If an input column is of this type, you must explicitly set the column type to set-valued categorical.
Default value: System Determined.
Field Usage Type
By default, a data mining algorithm automatically decides which of the columns of the input table are used to create a mining model. You can override this setting by specifying a column as “Active” or “Inactive”. You can also specify that a column is used as "Weight" column.
Default value: System Determined.
Name Mapping
Here you can assign a name mapping to the input table column. Assign the name mappings that are defined on the Name Maps tab to the columns whose item IDs are to be mapped to names.
Default value: <none>
Taxonomy
Here you can specify whether the taxonomy that is defined on the Taxonomy tab is used for this column. Select “Yes” if the taxonomy is to be used, “No” otherwise.
Default value: No
Weight
Here you can assign weight columns to input table's item columns. Select weight columns from the list of available weight columns. This list includes the columns whose field usage type you previously set to "Weight". Select <none> to remove the assignment.

In a table, columns can be considered as item columns or weight columns. By default, columns are considered as item columns. Optionally, you can specify columns as weight columns by selecting the field usage type "Weight" for a column. At least one column in a table must be an item column.

In the Weight column, you can assign weight columns to item columns.

  • For every row in the table that represents an item column, a list of weight columns is available. The list of weight columns contains the columns whose field usage type you previously set to "Weight". If you did not set the field usage type to Weight for any column, the list of weight columns is empty.
  • For every row in the table that represents a weight column, the list of weight columns is empty.

By default, weight columns are not assigned to item columns. Therefore, the entries in the Weight table column are set to <none>. After you have assigned a weight column to an item column, the entry in the Weight column shows the name of the assigned weight column.

For weight columns, the properties Field Type, Name Mapping, and Taxonomy are not available.

If you define a column as a weight column without assigning it to any item column, a validation error occurs.

You can remove a weight assignment by selecting <none>.

If you remove the field usage type "Weight" from a column by specifying one of the other field usage types, the previous weight assignments that might exist to other item columns are also removed.

Default value: <none>
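The weight rules above amount to a few consistency checks. The following Python sketch restates them as a hypothetical validator; the function name, the dictionary layout, and the message texts are illustrative assumptions and do not reproduce the product's actual validation messages.

```python
from typing import Dict, List, Optional


def validate_weights(usage: Dict[str, str],
                     weight_of: Dict[str, Optional[str]]) -> List[str]:
    """usage: column -> field usage type; weight_of: item column -> assigned weight column or None."""
    errors: List[str] = []
    weight_cols = {c for c, u in usage.items() if u == "Weight"}
    if len(weight_cols) == len(usage):
        errors.append("at least one column must be an item column")
    for column, weight in weight_of.items():
        if weight is None:
            continue  # <none>: no weight assigned
        if column in weight_cols:
            errors.append(f"{column} is a weight column and cannot be assigned a weight")
        elif weight not in weight_cols:
            errors.append(f"{weight} is not a weight column")
    assigned = {w for w in weight_of.values() if w is not None}
    # A weight column that no item column uses triggers a validation error.
    for column in sorted(weight_cols - assigned):
        errors.append(f"weight column {column} is not assigned to any item column")
    return errors


usage = {"PRODUCT": "System Determined", "PRICE": "Weight", "QTY": "Weight"}
print(validate_weights(usage, {"PRODUCT": "PRICE"}))
# ['weight column QTY is not assigned to any item column']
```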

Item Format tab

Item format
The item format defines how the item names are built from the column value and the column name. It also specifies the table layout that is assumed in the input data. The item format applies to all input columns except the column that is defined as the group column.

Default value: [Column value] if only one item column exists and [Column name = column value] if more than one item column exists.

The following values are supported:

[Column name = column value]
The item name is built as [Column name = column value]. For example, if you have a column named MARITAL_STATUS that contains the value 'married', the item name is 'MARITAL_STATUS = married'.
[Field value]
The item name is built as [Field value]. For example, if you have a column named ITEM that contains the value 'Apple', the item name is 'Apple'.
[Field name]
The item name is built as [Field name]. For example, if you have a column named MARRIED that contains the value 'Y', the item name is 'MARRIED'.
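The three item formats differ only in which parts of a column/value pair end up in the item name. The Python sketch below reproduces the examples above; the function name is hypothetical, and it deliberately ignores the input table layout that each format also implies.

```python
def item_name(item_format: str, column: str, value: str) -> str:
    """Build an item name from a column name and a column value."""
    if item_format == "[Column name = column value]":
        return f"{column} = {value}"
    if item_format == "[Field value]":
        return value
    if item_format == "[Field name]":
        return column
    raise ValueError(f"unsupported item format: {item_format}")


print(item_name("[Column name = column value]", "MARITAL_STATUS", "married"))  # MARITAL_STATUS = married
print(item_name("[Field value]", "ITEM", "Apple"))                             # Apple
print(item_name("[Field name]", "MARRIED", "Y"))                               # MARRIED
```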

Rule Filter tab

With rule filters, you can determine the rules that you want to include in the results by specifying range constraints, count constraints, or item constraints.

The Rule Filter tab shows a table with the following columns:
Type
One of the following constraint types:
Range
Limits the search space for new rules by restricting the allowed range of rule property values.
Rule properties are, for example, support, confidence, lift, elapsed time, or length of the rules.
Count
If you have specified the maximum number of rules, you can select the "best" rules by using count constraints.
Item
Limits the rules in the result to the rules that satisfy a particular condition. For example, you might want to include only the rules in the result that include the item "drink" in the rule body.
Description
A textual description of the defined constraint.

To edit the constraints, click the Edit icon above the table. The Rule Filter editor opens. The Rule Filter editor provides the following tabs:

Range
Limits the search space for new rules by restricting the allowed range of rule property values.

Rule properties are, for example, support, confidence, lift, elapsed time, or length of the rules.

Range constraints are grouped in the following properties:
Statistical properties
The following statistical properties are available:
  • Support
  • Confidence
  • Lift
  • Support*Confidence
Length limits
The following length limits are available:
  • Number of items
  • Number of item sets
  • Number of items in the rule body
  • Number of items in the rule head
Time step limits
The following time-step limits are available:
  • Total elapsed time from beginning to end of the rule
  • Elapsed time between adjacent parts of the rule
Cost/weight limits
The following cost/weight limits are available:
  • Weight of the rule
  • Weight of the rule body
  • Weight of the rule head
  • Support times weight of the rule
  • Weight of each item
  • Expected revenue
  • Total business volume that supports the rule
For each of the properties, you must set a lower limit, an upper limit, and a predicate. For example, assume that the complete set of values ranges from 1 to 7, the lower limit is 3, and the upper limit is 5. The predicates have the following meaning:
Subset including limits
From all values, the defined subset includes the lower limit and the upper limit. Based on the assumption above, the values are 3, 4, and 5.
Subset excluding limits
From all values, the defined subset excludes the lower limit and the upper limit. Based on the assumption above, the only value is 4.
All but subset including limits
All values but the defined subset including the lower limit and the upper limit. Based on the assumption above, the values are 1, 2, 6, and 7.
All but subset excluding limits
All values but the defined subset excluding the lower limit and the upper limit. Based on the assumption above, the values are 1, 2, 3, 5, 6, and 7.
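The four predicates can be restated as boolean tests on a single value. The following Python sketch, with a hypothetical function name, reproduces the worked example above (values 1 to 7, lower limit 3, upper limit 5).

```python
PREDICATES = (
    "Subset including limits",
    "Subset excluding limits",
    "All but subset including limits",
    "All but subset excluding limits",
)


def in_range(value: float, lower: float, upper: float, predicate: str) -> bool:
    """Return True if value satisfies the given range predicate."""
    including = lower <= value <= upper
    excluding = lower < value < upper
    checks = {
        "Subset including limits": including,
        "Subset excluding limits": excluding,
        "All but subset including limits": not including,
        "All but subset excluding limits": not excluding,
    }
    return checks[predicate]


for predicate in PREDICATES:
    print(predicate, [v for v in range(1, 8) if in_range(v, 3, 5, predicate)])
# Subset including limits [3, 4, 5]
# Subset excluding limits [4]
# All but subset including limits [1, 2, 6, 7]
# All but subset excluding limits [1, 2, 3, 5, 6, 7]
```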
Count
If you have specified the maximum number of rules, you can select the "best" rules by using count constraints.
Count constraints are grouped in the following properties:
Statistical properties
The following statistical properties are available:
  • Support
  • Confidence
  • Lift
  • Support*Confidence
Length limits
The following length limits are available:
  • Number of items
  • Number of item sets
Time step limits
The following time-step limits are available:
  • Total elapsed time from beginning to end of the rule
  • Elapsed time between adjacent parts of the rule
Cost/weight limits
The following cost/weight limits are available:
  • Weight of the rule
  • Weight of the rule body
  • Weight of the rule head
  • Support times weight of the rule
  • Expected revenue
  • Total business volume that supports the rule

If you want to use count constraints, you must specify the maximum number of rules and at least one count constraint.

To set count constraints, follow these steps:
  1. Set the maximum number of rules to be included in the results by typing an integer greater than 0 in the appropriate entry field.
  2. Select count constraints by moving count constraints from the left list to the right list.
  3. Select the sort order, for example, ascending.
  4. Select the priority for the count constraints by using the arrows on the toolbar.
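One way to picture count constraints is as a prioritized multi-key sort followed by a cut at the maximum number of rules. The Python sketch below shows that idea; the rule dictionaries, property names, and function name are hypothetical, and whether Intelligent Miner breaks ties exactly this way is an assumption.

```python
from typing import Dict, List, Tuple

Rule = Dict[str, float]


def select_best(rules: List[Rule], max_rules: int,
                constraints: List[Tuple[str, str]]) -> List[Rule]:
    """constraints: (property, 'ascending' or 'descending') pairs, highest priority first."""
    if max_rules <= 0:
        raise ValueError("the maximum number of rules must be greater than 0")
    if not constraints:
        raise ValueError("at least one count constraint is required")
    ordered = list(rules)
    # Python's sort is stable, so sorting from lowest to highest priority
    # yields an ordering by the highest-priority key first.
    for prop, order in reversed(constraints):
        ordered.sort(key=lambda r, p=prop: r[p], reverse=(order == "descending"))
    return ordered[:max_rules]


rules = [
    {"id": 1, "support": 10, "confidence": 60},
    {"id": 2, "support": 30, "confidence": 40},
    {"id": 3, "support": 30, "confidence": 80},
    {"id": 4, "support": 5, "confidence": 90},
]
best = select_best(rules, 2, [("support", "descending"), ("confidence", "descending")])
print([r["id"] for r in best])  # [3, 2]
```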
Item
Limits the rules in the result to the rules that satisfy a particular condition. For example, you might want to include only the rules in the result that include the item "bread" in the rule body.
A filter condition for an item constraint contains one of the following sets of elements:
  • A column name, a category, and the area where you want the item to appear
  • A column name, a value, and the area where you want the item to appear
You can combine multiple filter conditions by using one of the following logical operators:
  • And
  • Or
  • Not
For example, a filter condition might look like this:
PRODUCT = 'Orange juice' isIn BODY 
AND NOT PRODUCT = 'Toys' categoryIsIn RULE
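To show how the example condition is interpreted, the following Python sketch evaluates it against two rules. It is a simplification under stated assumptions: items are (column, value) pairs, the rule body and head are flattened into plain sets (ignoring the order of item sets in a sequences rule), the category map is a hypothetical acyclic child-to-parent dictionary, and the function names merely mirror the isIn and categoryIsIn keywords.

```python
from typing import Dict, FrozenSet, Tuple

Item = Tuple[str, str]  # (column name, value)


def items_in(rule: Dict[str, FrozenSet[Item]], area: str) -> FrozenSet[Item]:
    """Select the items of the rule body, the rule head, or the whole rule."""
    if area == "BODY":
        return rule["body"]
    if area == "HEAD":
        return rule["head"]
    return rule["body"] | rule["head"]  # RULE


def is_in(rule, column: str, value: str, area: str) -> bool:
    """column = 'value' isIn AREA"""
    return (column, value) in items_in(rule, area)


def category_is_in(rule, column: str, category: str, area: str,
                   parent_of: Dict[str, str]) -> bool:
    """column = 'category' categoryIsIn AREA (assumes an acyclic child -> parent map)."""
    for col, value in items_in(rule, area):
        node = value
        while col == column and node in parent_of:
            node = parent_of[node]
            if node == category:
                return True
    return False


parent_of = {"Orange juice": "Drink", "Teddy bear": "Toys"}


def example_filter(rule) -> bool:
    # PRODUCT = 'Orange juice' isIn BODY AND NOT PRODUCT = 'Toys' categoryIsIn RULE
    return is_in(rule, "PRODUCT", "Orange juice", "BODY") and not category_is_in(
        rule, "PRODUCT", "Toys", "RULE", parent_of)


kept = {"body": frozenset({("PRODUCT", "Orange juice")}),
        "head": frozenset({("PRODUCT", "Bread")})}
dropped = {"body": frozenset({("PRODUCT", "Orange juice")}),
           "head": frozenset({("PRODUCT", "Teddy bear")})}
print(example_filter(kept), example_filter(dropped))  # True False
```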

On the Item page of the Filters wizard, the Columns container includes the names of the item columns. For association rules, it does not include the group column or the weight column; for sequence rules, it does not include the group column, the weight column, or the sequence column.

If you select a column in the Columns container, the Categories container displays the categories that are related to the selected column. If a name mapping is defined for the taxonomy, the names that are mapped to the categories are also displayed.

The values of the columns are displayed in the Values container. If a name mapping is defined for the selected column, the names that are mapped to the values are also displayed.


