Using Frequency and Weight Fields
Frequency and weight fields give some records more importance than others, for example, because you know that one section of the population is under-represented in the training data (weight) or because one record represents a number of identical cases (frequency).
- Values for a frequency field should be positive integers. Records with a negative or zero frequency weight are excluded from the analysis. Non-integer frequency weights are rounded to the nearest integer.
- Case weight values should be positive but need not be integer values. Records with a negative or zero case weight are excluded from the analysis.
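The two rules above can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: the function names are hypothetical, and the order of the steps (exclude first, then round) and the tie-breaking behaviour for values such as 2.5 are assumptions, because the text does not specify them.

```python
def clean_frequencies(values):
    """Drop non-positive frequencies, then round the rest to the nearest integer.

    Assumption: exclusion is applied before rounding. Python's round() uses
    round-half-to-even for exact ties, which may differ from the product.
    """
    kept = []
    for value in values:
        if value <= 0:
            continue  # negative or zero frequency: record is excluded
        kept.append(round(value))
    return kept


def clean_weights(values):
    """Drop non-positive case weights; non-integer weights are kept as they are."""
    return [value for value in values if value > 0]


frequencies = clean_frequencies([3, 0, -2, 2.6, 1.2])
weights = clean_weights([0.5, -1, 0, 2])
```

Here `frequencies` keeps three of the five records, with 2.6 rounded up and 1.2 rounded down, while `weights` keeps 0.5 unchanged.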
Scoring Frequency and Weight Fields
Frequency and weight fields are used in training models, but are not used in scoring, because the score for each record is based on its characteristics regardless of how many cases it represents. For example, suppose you have the data in the following table.
Married | Responded |
---|---|
Yes | Yes |
Yes | Yes |
Yes | Yes |
Yes | No |
No | Yes |
No | No |
No | No |
Based on this, you conclude that three out of four married people respond to the promotion, and two out of three unmarried people do not. So you score any new records accordingly, as shown in the following table.
Married | $-Responded | $RP-Responded |
---|---|---|
Yes | Yes | 0.75 (three/four) |
No | No | 0.67 (two/three) |
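The arithmetic behind these two scores can be reproduced with a short Python sketch. It assumes a deliberately simple model that predicts the most common response for each marital status and reports its share as the confidence; the `build_model` helper is hypothetical and is not how any particular modeling node works.

```python
from collections import Counter, defaultdict

# The seven training records from the first table: (Married, Responded).
records = [
    ("Yes", "Yes"), ("Yes", "Yes"), ("Yes", "Yes"), ("Yes", "No"),
    ("No", "Yes"), ("No", "No"), ("No", "No"),
]


def build_model(rows):
    """Map each Married value to (most common response, share of that response)."""
    counts = defaultdict(Counter)
    for married, responded in rows:
        counts[married][responded] += 1
    model = {}
    for married, responses in counts.items():
        prediction, hits = responses.most_common(1)[0]
        model[married] = (prediction, hits / sum(responses.values()))
    return model


model = build_model(records)
```

With this data, `model["Yes"]` is the prediction Yes with confidence 3/4, and `model["No"]` is the prediction No with confidence 2/3.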
Alternatively, you could store your training data more compactly, using a frequency field, as shown in the following table.
Married | Responded | Frequency |
---|---|---|
Yes | Yes | 3 |
Yes | No | 1 |
No | Yes | 1 |
No | No | 2 |
Since this represents exactly the same dataset, you will build the same model and predict responses based solely on marital status. If you have ten married people in your scoring data, you will predict Yes for each of them regardless of whether they are presented as ten separate records, or one with a frequency value of 10. Weight, although generally not an integer, can be thought of as similarly indicating the importance of a record. This is why frequency and weight fields are not used when scoring records.
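As a rough check of this reasoning, the following Python sketch builds the same simple majority-vote model from both layouts and then scores ten married people presented in two ways. The helper names are hypothetical, and the model is a simplification chosen for illustration.

```python
from collections import Counter, defaultdict


def build_model(rows):
    """rows are (Married, Responded, Frequency); frequency is used in training."""
    counts = defaultdict(Counter)
    for married, responded, frequency in rows:
        counts[married][responded] += frequency
    return {m: c.most_common(1)[0][0] for m, c in counts.items()}


def score(model, rows):
    """rows are (Married, Frequency); frequency is ignored when scoring."""
    return [model[married] for married, _frequency in rows]


# One row per case, each with an implicit frequency of 1.
expanded = (
    [("Yes", "Yes", 1)] * 3 + [("Yes", "No", 1)]
    + [("No", "Yes", 1)] + [("No", "No", 1)] * 2
)
# The compact layout from the frequency table.
compact = [("Yes", "Yes", 3), ("Yes", "No", 1), ("No", "Yes", 1), ("No", "No", 2)]

model_expanded = build_model(expanded)
model_compact = build_model(compact)

ten_separate = score(model_compact, [("Yes", 1)] * 10)
one_with_frequency = score(model_compact, [("Yes", 10)])
```

Both layouts produce the same model, and every scored record receives the prediction Yes whether it arrives as ten rows or as one row with a frequency of 10.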
Evaluating and Comparing Models
Some model types support frequency fields, some support weight fields, and some support both. But in all cases where they apply, they are used only for model building and are not considered when evaluating models using an Evaluation node or Analysis node, or when ranking models using most of the methods supported by the Auto Classifier and Auto Numeric nodes.
- When comparing models (using evaluation charts, for example), frequency and weight values are ignored. This enables a level comparison between models that use these fields and models that don't, but it also means that an accurate evaluation requires a dataset that represents the population without relying on a frequency or weight field. In practical terms, you can do this by making sure that models are evaluated using a testing sample in which the value of the frequency or weight field is always null or 1. (This restriction only applies when evaluating models; if frequency or weight values were always 1 for both training and testing samples, there would be no reason to use these fields in the first place.)
- If you are using the Auto Classifier node, frequency can be taken into account when ranking models based on Profit, so that ranking method is recommended in that case.
- If necessary, you can split the data into training and testing samples using a Partition node.
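To see why the testing sample matters, consider the following Python sketch. It is not a description of how the Evaluation or Analysis node computes its results; it assumes a plain accuracy measure that counts each row once, and it reuses the small frequency table from the scoring example as a stand-in testing sample. The `accuracy` and `expand` helpers are hypothetical.

```python
def accuracy(model, rows):
    """Share of rows predicted correctly; each row counts once, frequency ignored."""
    correct = sum(1 for married, responded, _f in rows if model[married] == responded)
    return correct / len(rows)


def expand(rows):
    """Replace each (Married, Responded, Frequency) row with Frequency rows of 1."""
    return [(m, r, 1) for m, r, f in rows for _ in range(f)]


model = {"Yes": "Yes", "No": "No"}  # predict the majority response per group
compact_test = [("Yes", "Yes", 3), ("Yes", "No", 1), ("No", "Yes", 1), ("No", "No", 2)]

acc_compact = accuracy(model, compact_test)            # 2 of 4 rows correct
acc_expanded = accuracy(model, expand(compact_test))   # 5 of 7 rows correct
```

Because the frequency is ignored, the compact sample reports 0.5 while the expanded sample, in which every frequency is 1, reports about 0.71. The expanded figure is the one that reflects the population the rows describe.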