Use this view to define the properties for the Sequences operator.
A Sequences operator is a graphical icon representing a mining task that you place on the mining editor canvas to find typical sequences of events in the connected input table.
In the Properties view, set the properties for this operator by completing the fields in the following tabs:
- General
- Model Name
- Mining Settings
- Name Maps
- Taxonomy
- Column Properties
- Item Format
- Rule Filter
General tab
- Label
- You
can rename the operator by specifying a new name. This new
name appears on the operator icon in the mining editor canvas.
- Description
- You can add a
description for the operator. When the generated
model is stored in the IDMMX.RuleModels table, the description is
copied into the DESCRIPTION column of the table.
Model Name tab
- Prefix
- The
prefix for the name of the model created by the operator.
- Default value: Name of the data warehousing project for the mining flow
- Model name
- The name of the model that you want to build. The model is stored
in the IDMMX.RuleModels table. If a model with the same name already
exists, the previous model is replaced with the new model. The maximum
name length is 240 characters.
- Default value: System determined
Mining Settings tab
The
columns that you
select from the Available columns table and move to the Output columns
table will appear in the scorer output; the others are ignored.
- Group column
- The name of the column that contains
the group ID. The group ID
marks several items or events as parts of a particular transaction
group. This is typically (but need not be) a timestamp column.
Initially, the value of this property is <none> , but it is
mandatory to select a column from the input table. The columns of
the input table that are not defined as GROUP column or SEQUENCE column
are used as item columns.
- Default value: <none>
- Sequence column
- The name of
the column that contains the sequence ID. The sequence
ID marks several items or events as related to one object (e.g. a
customer). Initially, the value of this property is <none>,
but it is mandatory to select a column from the input table. The columns
of the input table that are not defined as GROUP column or SEQUENCE
column are used as item columns.
- Default value: <none>
- Maximum rule length
- The value
for the maximum rule length. The rule length determines
the maximum number of item sets that occur in a sequences rule. You
must specify a value greater than or equal to 2, or 0. For example,
if you specify 3 as the maximum rule length, the sequences rule contains
two item sets in the rule body and one item set in the rule head.
If you specify 0, the maximum rule length is not limited.
- Default value: 2
- Maximum number of rules
- The value for the maximum number of rules that are generated in
the model. If you specify 0, the maximum number of rules is not limited.
- Default value: 10000
- Minimum confidence
- The minimum confidence for all rules expressed
as a percentage.
You can specify a value between 0 and 100. If you specify 0, the default
value of 25% is automatically used as the lower limit for the confidence
of a rule.
- Default value: 25
- Minimum support
- The minimum support for all rules expressed
as a percentage. You
can specify a value between 0 and 100. If you specify 0, the value
for minimum support is automatically determined to produce a result
that contains at least some rules.
- Default value: 2
- Number of Bins
- The number of bins that
are automatically created for numerical
columns. The minimum number is 2. The maximum number is not limited.
- Default value: 5
- Optional Parameters
- With optional
parameter strings, you can modify default parameters
of the Easy Mining procedures. This is for advanced users only. For
supported optional parameters refer to the Intelligent Miner® Easy Mining Procedures
documentation.
Example for the Sequences operator: DM_setPowerOptions('-buf 2000000'), DM_setItemFormat(10)
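The value ranges described for the Mining Settings fields can be summarized as a small validation routine. The following Python sketch is illustrative only: the `SequencesSettings` class, the `validate` function, and the `effective_min_confidence` helper are hypothetical names that are not part of the product. They restate the documented limits (mandatory group and sequence columns, maximum rule length of 0 or at least 2, percentages between 0 and 100, at least 2 bins).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SequencesSettings:
    """Hypothetical container mirroring the Mining Settings fields."""
    group_column: Optional[str] = None      # <none> until a column is selected
    sequence_column: Optional[str] = None   # <none> until a column is selected
    max_rule_length: int = 2                # 0 means "not limited"
    max_rules: int = 10000                  # 0 means "not limited"
    min_confidence: float = 25.0            # percent
    min_support: float = 2.0                # percent, 0 means "determine automatically"
    num_bins: int = 5


def validate(s: SequencesSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings are in range."""
    problems = []
    if not s.group_column:
        problems.append("Group column: a column must be selected")
    if not s.sequence_column:
        problems.append("Sequence column: a column must be selected")
    if s.max_rule_length != 0 and s.max_rule_length < 2:
        problems.append("Maximum rule length: must be 0 or at least 2")
    if not 0 <= s.min_confidence <= 100:
        problems.append("Minimum confidence: must be between 0 and 100")
    if not 0 <= s.min_support <= 100:
        problems.append("Minimum support: must be between 0 and 100")
    if s.num_bins < 2:
        problems.append("Number of Bins: must be at least 2")
    return problems


def effective_min_confidence(value: float) -> float:
    """A value of 0 falls back to the default lower limit of 25 percent."""
    return 25.0 if value == 0 else value


if __name__ == "__main__":
    settings = SequencesSettings(group_column="TS", sequence_column="CUSTOMER_ID")
    print(validate(settings))  # no problems expected for these values
```

The column names `TS` and `CUSTOMER_ID` are made-up examples of a timestamp group column and a customer sequence column.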
Name Maps tab
The tab shows a table that
allows you to define the item ID column and the name column for each
name mapping (virtual) table. The table has three columns:
- Map Name (= Port)
- The table has one row for every
Names sub-port that is connected
to a name mapping table. This column contains the name of the sub-port
that is referenced by the row (the sub-port name is used as identifier
for the name mapping at the same time). The values in this column
cannot be edited; to add or delete rows in the table, add or delete
the corresponding sub-ports in the mining editor.
- Item ID Column
- This column contains the name
of the column in the name mapping
table that contains the item IDs.
- Default value: <none>
(undefined). You must explicitly define
this value.
- Item Name Column
- This column contains the name of the column in the name mapping
table that contains the item names.
- Default value: <none>
(undefined). You must explicitly define
this value.
The name mapping table can have more than two columns. Only the columns
selected as Item ID and Item Name are used; other columns are ignored.
An additional property tab is displayed when you select one of the Names
sub-ports. On this tab, you can enter the item ID column and the name
column for that particular name mapping table. It is an alternative way
to enter this information at the sub-port level instead of at the
operator level.
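As an illustration of how a name mapping uses only the two selected columns, the following Python sketch builds a lookup from item IDs to item names. The function name `build_name_map` and the sample column names (`ITEM_ID`, `ITEM_NAME`, `SUPPLIER`) are assumptions made for this example; they do not come from the product.

```python
def build_name_map(rows, item_id_column, item_name_column):
    """Map item IDs to item names, reading only the two selected columns.

    rows is a list of dictionaries, one per row of the name mapping table.
    Any other columns in the rows are ignored.
    """
    return {row[item_id_column]: row[item_name_column] for row in rows}


# Hypothetical name mapping table with one extra column that is ignored.
rows = [
    {"ITEM_ID": 101, "ITEM_NAME": "Bread", "SUPPLIER": "A"},
    {"ITEM_ID": 102, "ITEM_NAME": "Milk", "SUPPLIER": "B"},
]

name_map = build_name_map(rows, "ITEM_ID", "ITEM_NAME")
print(name_map)  # {101: 'Bread', 102: 'Milk'}
```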
Taxonomy tab
The tab shows a table that
allows you to define various properties for each category mapping
(virtual) table. The table has six columns:
- Category Map
- The table has one row for every Category sub-port that
is connected
to a category mapping table. This column contains the name of the
sub-port that is referenced by the row (the sub-port name is used
as identifier for the category mapping at the same time). The values
in this column cannot be edited; to add or delete rows in the table,
add or delete the corresponding sub-ports in the mining editor.
- Child Column
- This column contains
the name of the column in the category mapping
table that contains the child item or category IDs.
- Default value: <none> (undefined). You must explicitly define
this value.
- Parent Column
- This
column contains the name of the column in the category mapping
table that contains the parent category IDs.
- Default value: <none>
(undefined). You must explicitly define
this value.
- Recursive Map
- In
this column you can specify if the category map is a recursive
map. Select “Yes” if the map is a recursive map, “No” otherwise.
A recursive category map can hold relations between more than two
consecutive levels in the category hierarchy.
- Default value: No
- Name Mapping
- You can
assign a name mapping table to the category mapping table.
The name mapping table contains names for the category IDs in the
category mapping table. In this column, you can select one of the
existing map names.
- Default value: <none> (undefined)
- Description
- In this column
you can enter a description of the category mapping.
This is just for informational purposes.
- Default value: Empty
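To illustrate what a recursive category map can express, the following Python sketch walks child-to-parent relations across several levels of a hierarchy held in a single map. The function name `ancestors`, the sample categories, and the assumption that each child has exactly one parent are made up for this example and are not taken from the product.

```python
def ancestors(item, category_map):
    """Return all categories above item, nearest category first.

    category_map is a list of (child, parent) pairs, as they might appear in
    the Child Column and Parent Column of a category mapping table.
    This sketch assumes that each child has exactly one parent.
    """
    parent_of = dict(category_map)
    result = []
    seen = set()          # guards against accidental cycles in the map
    current = item
    while current in parent_of and current not in seen:
        seen.add(current)
        current = parent_of[current]
        result.append(current)
    return result


# One recursive map that spans three levels above the item.
category_map = [
    ("Orange juice", "Juices"),
    ("Juices", "Drinks"),
    ("Drinks", "Groceries"),
]

print(ancestors("Orange juice", category_map))
# ['Juices', 'Drinks', 'Groceries']
```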
Column Properties tab
The
tab shows a table that allows you to define various properties for
each column in the input (virtual) table. The table has the following
columns:
- Input Column
- This column contains
the names of all columns of the input table.
The column that is defined as group column (association rules) or
sequence column (sequence rules) is not displayed. For all the other
columns of the input table, column properties can be defined here.
The values in this column cannot be edited.
- SQL Type
- This column contains the SQL Data Type
of the input table column.
The values in this column cannot be edited.
- Field Type
- Data-mining algorithms
distinguish between numerical input data
and categorical input data. Additionally, categorical input data can
be of type set-valued categorical.
- The
mining function determines automatically whether the input
data is numerical or categorical. However, you can override the input-data
type for a column by setting the column type to categorical.
- The mining function cannot determine whether the input data is
of type set-valued categorical. If an input column
is of this type, you must explicitly set the column type to set-valued
categorical.
- Default value: System Determined.
- Field Usage Type
- By default,
a data mining algorithm automatically decides which
of the columns of the input table are used to create a mining model.
You can override this setting by specifying a column as “Active” or
“Inactive”. You can also specify that a column is used as "Weight"
column.
- Default value: System Determined.
- Name Mapping
- Here you can assign a name mapping
to the input table's column.
The name mappings defined on the name mapping tab need to be assigned
to those columns whose item IDs are to be mapped to names.
- Default value: <none>
- Taxonomy
- Here you can specify if the taxonomy defined on the taxonomy tab
is used for this column. Select “Yes” if the taxonomy is to be used,
“No” otherwise.
- Default value: No
- Weight
- Here you can assign weight columns to input table's item columns.
Select weight columns from the list of available weight columns. This
list includes the columns whose field usage type you previously set
to "Weight". Select <none> to remove the assignment.
In a
table, columns can be considered as item columns or weight columns.
By default, columns are considered as item columns. Optionally, you
can specify columns as weight columns by selecting the field usage
type "Weight" for a column. At least one column in a table must be
an item column.
In the Weight column, you can assign weight
columns to item columns.
- For every row in the table that
represents an item column, a list
of weight columns is available. The list of weight columns contains
the columns whose field usage type you previously set to "Weight".
If you did not set the field usage type to Weight for any column,
the list of weight columns is empty.
- For every row in the
table that represents a weight column, the
list of weight columns is empty.
By default, weight columns
are not assigned to item columns.
Therefore the entries in the table column Weight are set to None.
After you have assigned a weight column to an item column, the entry
in the Weight column shows the name of the assigned weight column.
For weight columns, the properties field type, name mapping, and taxonomy
are not available.
If you define a column as a weight column without
assigning it to any item column, a validation error occurs.
You
can remove a weight assignment by selecting <none>.
If
you remove the field usage type "Weight" from a column by specifying
one of the other field usage types, the previous weight assignments
that might exist to other item columns are also removed.
- Default value: <none>
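The weight assignment rules above can be restated as a short check. The following Python sketch is a simplified model, not the product's validation logic: the function names `validate_weights` and `clear_weight_usage`, the dictionary layout, and the sample column names are assumptions. It treats every column whose field usage type is not "Weight" as an item column.

```python
def validate_weights(usage, weight_of):
    """Check weight assignments against the rules described above.

    usage maps each column to its field usage type, for example "Active",
    "Inactive", or "Weight". weight_of maps an item column to its assigned
    weight column, or to None if nothing is assigned.
    """
    problems = []
    weight_columns = {col for col, kind in usage.items() if kind == "Weight"}
    item_columns = set(usage) - weight_columns
    if not item_columns:
        problems.append("At least one column must be an item column")
    assigned = {w for w in weight_of.values() if w is not None}
    for col in sorted(weight_columns - assigned):
        problems.append(f"Weight column {col} is not assigned to any item column")
    return problems


def clear_weight_usage(column, usage, weight_of, new_usage="Active"):
    """Change a weight column to another usage type and drop its assignments."""
    usage[column] = new_usage
    for item, weight in weight_of.items():
        if weight == column:
            weight_of[item] = None


usage = {"PRODUCT": "Active", "PRICE": "Weight"}
print(validate_weights(usage, {"PRODUCT": None}))
# ['Weight column PRICE is not assigned to any item column']
print(validate_weights(usage, {"PRODUCT": "PRICE"}))
# []
```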
Item Format tab
- Item format
- The item format
defines how the item names are built from the
column value and the column name. It also specifies the table layout
that is assumed in the input data. The item format refers to all input
columns except the column that was defined as the group column.
- Default value: [Column value] if only one item column exists and [Column name
= column value] if more than one item column exists.
The following
values are supported:
- [Column name = column value]
- Item name is chosen as [Column name = column value], e.g. if you
have a column named MARITAL_STATUS that contains the value 'married'
the item name is 'MARITAL_STATUS = married'.
- [Field value]
- Item name is chosen as [Field value], e.g. if you have
a column
named ITEM that contains the value 'Apple' the item name is 'Apple'.
- [Field name]
- Item name is chosen as
[Field name], e.g. if you have a column
named MARRIED that contains the value 'Y' the item name is 'MARRIED'.
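The three item formats can be sketched as a single function that builds an item name from a column name and a column value. The function name `item_name` and the lowercase format identifiers are hypothetical; only the resulting item names follow the examples given above.

```python
def item_name(column, value, item_format):
    """Build an item name from a column name and a column value.

    item_format is one of the hypothetical identifiers
    "column name = column value", "field value", or "field name".
    """
    if item_format == "column name = column value":
        return f"{column} = {value}"
    if item_format == "field value":
        return str(value)
    if item_format == "field name":
        return column
    raise ValueError(f"Unsupported item format: {item_format}")


print(item_name("MARITAL_STATUS", "married", "column name = column value"))
# MARITAL_STATUS = married
print(item_name("ITEM", "Apple", "field value"))
# Apple
print(item_name("MARRIED", "Y", "field name"))
# MARRIED
```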
Rule Filter tab
With
rule filters, you can determine the rules that you want to include
in the results by specifying range constraints, count constraints,
or item constraints.
The Rule Filter tab shows a table with the following
columns:
- Type
- One of the following constraint types:
- Range
- Limits the search space for new rules by restricting the allowed range
of rule property values.
- Rule properties are, for example, support, confidence, lift, elapsed time,
or length of the rules.
- Count
- If you have specified the maximum number of rules, you can select the
"best" rules by using count constraints.
- Item
- Limits the rules in the result to those that include a particular
condition. For example, you might want to include only the rules in the result
that include the item "drink" in the rule body.
- Description
- A textual description of the defined constraint.
To edit the constraints, click the Edit icon
above the table. The Rule filter editor is opened. The Rule Filter editor
provides the following tabs:
- Range
- Limits the search space for new rules by restricting the allowed
range of rule property values.
Rule properties are, for example, support, confidence, lift, elapsed time,
or length of the rules.
Range constraints are grouped in the following properties:
- Statistical properties
- The following statistical properties are available:
- Support
- Confidence
- Lift
- Support*Confidence
- Length limits
- The following length limits are available:
- Number of items
- Number of item sets
- Number of items in the rule body
- Number of items in the rule head
- Time step limits
- The following time-step limits are available:
- Total elapsed time from beginning to end of the rule
- Elapsed time between adjacent parts of the rule
- Cost/weight limits
- The following cost/weight limits are available:
- Weight of the rule
- Weight of the rule body
- Weight of the rule head
- Support times weight of the rule
- Weight of each item
- Expected revenue
- Total business volume that supports the rule
For each of the properties, you must set a lower
limit, an upper limit, and a predicate. To illustrate the predicates, assume
that the complete set of values ranges from 1 to 7, the lower limit is 3, and
the upper limit is 5. The predicates have the following meaning:
- Subset including limits
- From all values, the defined subset includes the lower limit and
the upper limit. Based on the assumption above, the values are 3,
4, and 5.
- Subset excluding limits
- From all values, the defined subset excludes the lower limit and
the upper limit. Based on the assumption above, the only value is 4.
- All but subset including limits
- All values but the defined subset including the lower limit and
the upper limit. Based on the assumption above, the values are 1, 2, 6,
and 7.
- All but subset excluding limits
- All values but the defined subset excluding the lower limit and
the upper limit. Based on the assumption above, the values are 1,
2, 3, 5, 6, and 7.
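The four range predicates can be reproduced with a few comparisons. The following Python sketch uses the same assumption as the text (values 1 to 7, lower limit 3, upper limit 5). The function name `apply_predicate` and the lowercase predicate identifiers are hypothetical and serve only to illustrate the semantics.

```python
def apply_predicate(values, lower, upper, predicate):
    """Return the values that satisfy a range predicate.

    predicate is one of the hypothetical identifiers:
    "subset including limits", "subset excluding limits",
    "all but subset including limits", "all but subset excluding limits".
    """
    if predicate == "subset including limits":
        keep = lambda v: lower <= v <= upper
    elif predicate == "subset excluding limits":
        keep = lambda v: lower < v < upper
    elif predicate == "all but subset including limits":
        keep = lambda v: not (lower <= v <= upper)
    elif predicate == "all but subset excluding limits":
        keep = lambda v: not (lower < v < upper)
    else:
        raise ValueError(f"Unknown predicate: {predicate}")
    return [v for v in values if keep(v)]


values = list(range(1, 8))  # 1 to 7
print(apply_predicate(values, 3, 5, "subset including limits"))          # [3, 4, 5]
print(apply_predicate(values, 3, 5, "subset excluding limits"))          # [4]
print(apply_predicate(values, 3, 5, "all but subset including limits"))  # [1, 2, 6, 7]
print(apply_predicate(values, 3, 5, "all but subset excluding limits"))  # [1, 2, 3, 5, 6, 7]
```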
- Count
- If you have specified the maximum number of rules, you can select
the "best" rules by using count constraints.
Count constraints are grouped in the following properties:
- Statistical properties
- The following statistical properties are available:
- Support
- Confidence
- Lift
- Support*Confidence
- Length limits
- The following length limits are available:
- Number of items
- Number of item sets
- Time step limits
- The following time-step limits are available:
- Total elapsed time from beginning to end of the rule
- Elapsed time between adjacent parts of the rule
- Cost/weight limits
- The following cost/weight limits are available:
- Weight of the rule
- Weight of the rule body
- Weight of the rule head
- Support times weight of the rule
- Expected revenue
- Total business volume that supports the rule
If you want to use count constraints, you must
specify the maximum number of rules and at least one count constraint.
To set count constraints, follow these steps:
- Set the maximum number of rules to be included in the results
by typing an integer greater than 0 in the appropriate entry field.
- Select count constraints by moving them from the
left list to the right list.
- Select the sort order, for example, ascending.
- Select the priority for the count constraints by using the arrows
on the toolbar.
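One plausible reading of these steps is that the rules are sorted by the selected properties in priority order and the first N rules are kept. The following Python sketch implements that reading. It is an assumption about the behavior, not a description of the product's algorithm, and the function name `select_best`, the rule dictionaries, and the property keys are made up for the example.

```python
def select_best(rules, max_rules, constraints):
    """Keep the first max_rules rules after sorting by prioritized properties.

    constraints is a list of (property, order) pairs, highest priority first,
    where order is "ascending" or "descending".
    """
    if max_rules <= 0:
        raise ValueError("The maximum number of rules must be greater than 0")
    if not constraints:
        raise ValueError("At least one count constraint is required")
    ordered = list(rules)
    # Python's sort is stable, so sorting by the lowest priority first and the
    # highest priority last yields a multi-level ordering.
    for prop, order in reversed(constraints):
        ordered.sort(key=lambda rule: rule[prop], reverse=(order == "descending"))
    return ordered[:max_rules]


rules = [
    {"id": "r1", "confidence": 80, "support": 5},
    {"id": "r2", "confidence": 90, "support": 3},
    {"id": "r3", "confidence": 80, "support": 7},
]

best = select_best(rules, 2, [("confidence", "descending"), ("support", "descending")])
print([rule["id"] for rule in best])  # ['r2', 'r3']
```

In this example, confidence has the highest priority, and support breaks the tie between the two rules with a confidence of 80.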
- Item
- Limits the rules in the result to those that include
a particular condition. For example, you might want to include only
the rules in the result that include the item "bread" in the rule
body.
A filter condition for an item constraint contains one of
the following sets of elements:
- A column name, a category,
and the area where you want the item
to appear
- A column name, a value, and the area where you want
the item to
appear
You can combine several filter conditions by using
logical operators such as AND and NOT.
For example, a filter condition might look like this:
PRODUCT = 'Orange juice' isIn BODY
AND NOT PRODUCT = 'Toys' categoryIsIn RULE
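To make the example condition concrete, the following Python sketch evaluates it against simplified rules. Everything here is an assumption for illustration: a rule is modeled as a flat set of (column, value) items for the body and for the head, `RULE` is taken to mean the body and head together, and `categoryIsIn` is taken to mean that some item in the area belongs to the named category according to a child-to-parent map. The function names are hypothetical.

```python
def area_items(rule, area):
    """Return the items of the requested area: "BODY", "HEAD", or "RULE"."""
    if area == "RULE":
        return rule["BODY"] | rule["HEAD"]
    return rule[area]


def is_in(rule, column, value, area):
    """True if the item (column, value) appears in the area."""
    return (column, value) in area_items(rule, area)


def category_is_in(rule, column, category, area, parent_of):
    """True if an item of the column in the area belongs to the category."""
    for col, value in area_items(rule, area):
        if col != column:
            continue
        current, seen = value, set()
        while current in parent_of and current not in seen:
            seen.add(current)
            current = parent_of[current]
            if current == category:
                return True
    return False


def example_filter(rule, parent_of):
    """PRODUCT = 'Orange juice' isIn BODY AND NOT PRODUCT = 'Toys' categoryIsIn RULE"""
    return (is_in(rule, "PRODUCT", "Orange juice", "BODY")
            and not category_is_in(rule, "PRODUCT", "Toys", "RULE", parent_of))


parent_of = {"Teddy bear": "Toys"}  # hypothetical taxonomy: child -> parent
kept = {"BODY": {("PRODUCT", "Orange juice")}, "HEAD": {("PRODUCT", "Bread")}}
dropped = {"BODY": {("PRODUCT", "Orange juice")}, "HEAD": {("PRODUCT", "Teddy bear")}}

print(example_filter(kept, parent_of))     # True
print(example_filter(dropped, parent_of))  # False
```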
On the Item page of the Filters wizard, the Columns container includes the
names of item columns. For association rules, it does not include the group
column or the weight column. For sequence rules, it does not include the group
column, the weight column, or the sequence column.
If you select
a column in the Columns container, the Categories container displays
the categories that are related to the selected column. If a name
mapping is defined for the taxonomy, the names that are mapped to
the categories are also displayed.
The values of the columns
are displayed in the Values container. If a name mapping is defined
for the selected column, the names that are mapped to the values are
also displayed.