Advanced Options (simulation)
Maximum Number of Cases. This specifies the maximum number of cases of simulated data, and associated target values, to generate. When sensitivity analysis is specified, this is the maximum number of cases for each iteration.
Target for stopping criteria. If your predictive model contains more than one target, then you can select the target to which stopping criteria are applied.
Stopping criteria. These choices specify criteria for stopping the simulation, potentially before the maximum number of allowable cases has been generated.
- Continue until maximum is reached. This specifies that simulated cases will be generated until the maximum number of cases is reached.
- Stop when the tails have been sampled. Use this option when you want to ensure that one of the tails of a specified target distribution has been adequately sampled. Simulated cases will be generated until the specified tail sampling is complete or the maximum number of cases is reached. If your predictive model contains multiple targets, then select the target to which this criterion will be applied from the Target for stopping criteria dropdown list.
Type. You can define the boundary of the tail region by specifying a value of the target such as 10,000,000 or a percentile such as the 99th percentile. If you choose Value in the Type dropdown list, then enter the value of the boundary in the Value text box and use the Side dropdown list to specify whether this is the boundary of the Left tail region or the Right tail region. If you choose Percentile in the Type dropdown list, then enter a value in the Percentile text box.
Frequency. Specify the number of values of the target that must lie in the tail region in order to ensure that the tail has been adequately sampled. Generation of cases will stop when this number has been reached.
- Stop when the confidence interval of the mean is within the specified threshold. Use this option when you want to ensure that the mean of a given target is known with a specified degree of accuracy. Simulated cases will be generated until the specified degree of accuracy has been achieved or the maximum number of cases is reached. To use this option, you specify a confidence level and a threshold. Simulated cases will be generated until the confidence interval associated with the specified level is within the threshold. For example, you can use this option to specify that cases are generated until the confidence interval of the mean at the 95% confidence level is within 5% of the mean value. If your predictive model contains multiple targets, then select the target to which this criterion will be applied from the Target for stopping criteria dropdown list.
Threshold Type. You can specify the threshold as a numeric value or as a percent of the mean. If you choose Value in the Threshold Type dropdown list, then enter the threshold in the Threshold as Value text box. If you choose Percent in the Threshold Type dropdown list, then enter a value in the Threshold as Percent text box.
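The three stopping criteria can be illustrated with a minimal Python sketch. This is not the product's implementation: the function `simulate`, its parameters, and the `min_cases` warm-up are hypothetical names chosen for illustration. The sketch assumes that "within the threshold" means the half-width of a normal-approximation confidence interval is no larger than the threshold, and it handles only a tail boundary given as a value (a percentile boundary would have to be estimated from the data generated so far).

```python
import math
from statistics import NormalDist


def simulate(draw_target, max_cases, tail=None, ci=None, min_cases=30):
    """Generate target values until a stopping rule fires or max_cases is hit.

    draw_target: hypothetical callable returning one simulated target value.
    tail: (boundary, side, frequency), side is "left" or "right", or None.
    ci:   (confidence_level, threshold, threshold_type), threshold_type is
          "value" or "percent", or None.
    """
    values = []
    tail_hits = 0
    mean = 0.0  # running mean (Welford's method)
    m2 = 0.0    # running sum of squared deviations from the mean

    for n in range(1, max_cases + 1):
        y = draw_target()
        values.append(y)
        delta = y - mean
        mean += delta / n
        m2 += delta * (y - mean)

        if tail is not None:
            boundary, side, frequency = tail
            in_tail = y <= boundary if side == "left" else y >= boundary
            if in_tail:
                tail_hits += 1
            # Stop once enough target values lie in the tail region.
            if tail_hits >= frequency:
                return values, "tail sampled"

        if ci is not None and n >= max(min_cases, 2):
            level, threshold, threshold_type = ci
            z = NormalDist().inv_cdf(0.5 + level / 2)
            half_width = z * math.sqrt(m2 / (n - 1)) / math.sqrt(n)
            # Assumption: a percent threshold is relative to the current mean.
            limit = threshold if threshold_type == "value" else abs(mean) * threshold / 100
            if half_width <= limit:
                return values, "mean CI within threshold"

    return values, "maximum reached"
```

For example, `simulate(draw, 100000, ci=(0.95, 5.0, "percent"))` would correspond to the 95% confidence level and 5% threshold described above, where `draw` stands in for whatever produces one target value from the predictive model.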
Number of cases to sample. This specifies the number of cases to use when automatically fitting distributions for simulated inputs to the active dataset. If your dataset is very large you might want to consider limiting the number of cases used for distribution fitting. If you select Limit to N cases, the first N cases will be used.
Goodness of fit criteria (Continuous). For continuous inputs, you can use the Anderson-Darling test or the Kolmogorov-Smirnov test of goodness of fit to rank distributions when fitting distributions for simulated inputs to the active dataset. The Anderson-Darling test is selected by default and is especially recommended when you want to ensure the best possible fit in the tail regions.
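As a rough illustration of ranking candidate distributions by a goodness-of-fit statistic, the following Python sketch computes the Kolmogorov-Smirnov and Anderson-Darling statistics from their textbook definitions. The candidate set (normal, uniform, exponential), the simple parameter estimates, and the helper names are assumptions made for this example; the source does not describe which distributions the product considers or how it estimates their parameters. A smaller statistic indicates a closer fit. The data is assumed to contain at least two distinct values.

```python
import math
from statistics import NormalDist, fmean, stdev


def ks_statistic(sorted_x, cdf):
    """Largest gap between the empirical CDF and the candidate CDF."""
    n = len(sorted_x)
    d = 0.0
    for i, x in enumerate(sorted_x, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ad_statistic(sorted_x, cdf, eps=1e-12):
    """Anderson-Darling A^2; the log terms weight the tails more heavily."""
    n = len(sorted_x)
    # Clamp CDF values away from 0 and 1 so the logarithms stay finite.
    f = [min(max(cdf(x), eps), 1 - eps) for x in sorted_x]
    s = sum((2 * i - 1) * (math.log(f[i - 1]) + math.log(1 - f[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


def candidate_cdfs(x):
    """Hypothetical candidate set with simple parameter estimates."""
    mu, sigma = fmean(x), stdev(x)
    lo, hi = min(x), max(x)
    cdfs = {
        "normal": NormalDist(mu, sigma).cdf,
        "uniform": lambda v: (v - lo) / (hi - lo),
    }
    if lo >= 0 and mu > 0:  # exponential only makes sense for non-negative data
        cdfs["exponential"] = lambda v: 1 - math.exp(-v / mu)
    return cdfs


def rank_fits(data, statistic=ad_statistic):
    """Return (name, statistic) pairs, best fit first."""
    x = sorted(data)
    scores = {name: statistic(x, cdf) for name, cdf in candidate_cdfs(x).items()}
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Calling `rank_fits(data)` ranks by Anderson-Darling, and `rank_fits(data, ks_statistic)` ranks by Kolmogorov-Smirnov instead.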
Empirical Distribution. For continuous inputs, the Empirical distribution is the cumulative distribution function of the historical data. You can specify the number of bins used for calculating the Empirical distribution for continuous inputs. The default is 100 and the maximum is 1000.
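To show how the number of bins affects an empirical distribution, here is a minimal Python sketch that builds a binned cumulative distribution function from historical data and draws values from it by inverse-transform sampling. The function names, the equal-width bins, and the linear interpolation within a bin are assumptions for illustration; the source states only the default (100) and maximum (1000) number of bins.

```python
import bisect
import random


def binned_ecdf(data, bins=100):
    """Return (edges, cum): bin edges and cumulative proportion at each edge."""
    if not 1 <= bins <= 1000:
        raise ValueError("bins must be between 1 and 1000")
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in data:
        k = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    edges = [lo + k * width for k in range(bins)] + [hi]
    cum, total = [0.0], 0
    for c in counts:
        total += c
        cum.append(total / len(data))
    return edges, cum


def sample(edges, cum, rng):
    """Draw one value by inverting the binned CDF."""
    u = rng.random()
    # Find the bin k with cum[k] <= u < cum[k + 1].
    k = min(bisect.bisect_right(cum, u) - 1, len(edges) - 2)
    span = cum[k + 1] - cum[k]
    frac = (u - cum[k]) / span if span > 0 else 0.0
    return edges[k] + frac * (edges[k + 1] - edges[k])
```

With more bins the binned function follows the historical data more closely, while fewer bins smooth it.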
SET THREADS command syntax. The number of threads might change if you run the simulation on a different computer because an internal algorithm is used to determine the number of threads for each computer.
User-missing values for inputs with a Categorical distribution. These controls specify whether user-missing values of inputs with a Categorical distribution are treated as valid. System-missing values and user-missing values for all other types of inputs are always treated as invalid. All inputs must have valid values for a case to be included in distribution fitting, computation of correlations, and computation of the optional contingency table.
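The validity rule for missing values can be sketched in Python as follows. The function name, the dictionary-based case representation, and the use of `None` for system-missing values are hypothetical choices for this example, not part of the product.

```python
def is_valid_case(case, user_missing, categorical, treat_user_missing_as_valid):
    """Return True if every input in the case has a valid value.

    case: dict mapping input name to value (None stands for system-missing).
    user_missing: dict mapping input name to its set of user-missing codes.
    categorical: set of input names that have a Categorical distribution.
    """
    for name, value in case.items():
        if value is None:  # system-missing is always invalid
            return False
        if value in user_missing.get(name, ()):
            # User-missing is valid only for Categorical inputs, and only
            # when the option to treat it as valid is selected.
            if not (name in categorical and treat_user_missing_as_valid):
                return False
    return True
```

Only cases for which this check passes would be used in distribution fitting, correlations, and the optional contingency table.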