The Apriori node extracts a set of rules from the data, pulling out the rules with the
highest information content. Apriori offers five different methods of selecting rules and uses a
sophisticated indexing scheme to process large data sets efficiently. For large problems, Apriori is
generally faster to train; it has no arbitrary limit on the number of rules that can be retained,
and it can handle rules with up to 32 preconditions. Apriori requires that input and output fields
all be categorical but delivers better performance because it is optimized for this type of data.
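The min_supp and min_conf properties set on the Model tab correspond to the standard association-rule measures of support and confidence. The following standalone Python sketch is not SPSS Modeler scripting and not the node's own implementation; the records and field names are invented purely to show how the two measures are computed for one candidate rule.

# Illustrative only: support and confidence for a single candidate rule.
records = [
    {"BP": "HIGH", "Cholesterol": "NORMAL", "Drug": "drugA"},
    {"BP": "HIGH", "Cholesterol": "HIGH",   "Drug": "drugA"},
    {"BP": "LOW",  "Cholesterol": "HIGH",   "Drug": "drugC"},
    {"BP": "HIGH", "Cholesterol": "NORMAL", "Drug": "drugA"},
    {"BP": "LOW",  "Cholesterol": "NORMAL", "Drug": "drugX"},
]
antecedent = {"BP": "HIGH"}      # rule precondition
consequent = {"Drug": "drugA"}   # rule conclusion

def matches(record, condition):
    # True if the record satisfies every field=value pair in the condition
    return all(record.get(field) == value for field, value in condition.items())

n_total = len(records)
n_antecedent = sum(1 for r in records if matches(r, antecedent))
n_both = sum(1 for r in records if matches(r, antecedent) and matches(r, consequent))

support = 100.0 * n_antecedent / n_total     # % of records matching the antecedent
confidence = 100.0 * n_both / n_antecedent   # % of those also matching the consequent
print("support = %.1f%%, confidence = %.1f%%" % (support, confidence))
# A rule is kept only if support >= min_supp and confidence >= min_conf.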
Example
node = stream.create("apriori", "My node")
# "Fields" tab
node.setPropertyValue("custom_fields", True)
node.setPropertyValue("partition", "Test")
# For non-transactional
node.setPropertyValue("use_transactional_data", False)
node.setPropertyValue("consequents", ["Age"])
node.setPropertyValue("antecedents", ["BP", "Cholesterol", "Drug"])
# For transactional
node.setPropertyValue("use_transactional_data", True)
node.setPropertyValue("id_field", "Age")
node.setPropertyValue("contiguous", True)
node.setPropertyValue("content_field", "Drug")
# "Model" tab
node.setPropertyValue("use_model_name", False)
node.setPropertyValue("model_name", "Apriori_bp_choles_drug")
node.setPropertyValue("min_supp", 7.0)
node.setPropertyValue("min_conf", 30.0)
node.setPropertyValue("max_antecedents", 7)
node.setPropertyValue("true_flags", False)
node.setPropertyValue("optimize", "Memory")
# "Expert" tab
node.setPropertyValue("mode", "Expert")
node.setPropertyValue("evaluation", "ConfidenceRatio")
node.setPropertyValue("lower_bound", 7)
Table 1. apriorinode properties
apriorinode Properties | Values | Property description
consequents | field | Apriori models use Consequents and Antecedents in place of the standard target and input fields. Weight and frequency fields are not used. See the topic Common modeling node properties for more information.
antecedents | [field1 ... fieldN] |
min_supp | number |
min_conf | number |
max_antecedents | number |
true_flags | flag |
optimize | Speed, Memory | Use to specify whether model building should be optimized for speed or for memory.
use_transactional_data | flag | When the value is true, the score for each transaction ID is independent from other transaction IDs. When the data to be scored is too large to obtain acceptable performance, we recommend separating the data.
contiguous | flag |
id_field | string |
content_field | string |
mode | Simple, Expert |
evaluation | RuleConfidence, DifferenceToPrior, ConfidenceRatio, InformationDifference, NormalizedChiSquare |
lower_bound | number |
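The "Expert" tab properties work together: the evaluation measure and its lower_bound are honored only when mode is set to "Expert". As a short variation of the example above (assuming node refers to the Apriori node already created there):

node.setPropertyValue("mode", "Expert")
node.setPropertyValue("evaluation", "InformationDifference")
node.setPropertyValue("lower_bound", 10)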