Advanced Options (simulation)

Maximum Number of Cases. This specifies the maximum number of cases of simulated data, and associated target values, to generate. When sensitivity analysis is specified, this is the maximum number of cases for each iteration.

Target for stopping criteria. If your predictive model contains more than one target, then you can select the target to which stopping criteria are applied.

Stopping criteria. These choices specify criteria for stopping the simulation, potentially before the maximum number of allowable cases has been generated.

  • Continue until maximum is reached. This specifies that simulated cases will be generated until the maximum number of cases is reached.
  • Stop when the tails have been sampled. Use this option when you want to ensure that one of the tails of a specified target distribution has been adequately sampled. Simulated cases will be generated until the specified tail sampling is complete or the maximum number of cases is reached. If your predictive model contains multiple targets, select the target to which this criterion will be applied from the Target for stopping criteria dropdown list.

    Type. You can define the boundary of the tail region by specifying a value of the target such as 10,000,000 or a percentile such as the 99th percentile. If you choose Value in the Type dropdown list, then enter the value of the boundary in the Value text box and use the Side dropdown list to specify whether this is the boundary of the Left tail region or the Right tail region. If you choose Percentile in the Type dropdown list, then enter a value in the Percentile text box.

    Frequency. Specify the number of values of the target that must lie in the tail region in order to ensure that the tail has been adequately sampled. Generation of cases will stop when this number has been reached.

  • Stop when the confidence interval of the mean is within the specified threshold. Use this option when you want to ensure that the mean of a given target is known with a specified degree of accuracy. Simulated cases will be generated until the specified degree of accuracy has been achieved or the maximum number of cases is reached. To use this option, you specify a confidence level and a threshold. Simulated cases will be generated until the confidence interval associated with the specified level is within the threshold. For example, you can use this option to specify that cases are generated until the confidence interval of the mean at the 95% confidence level is within 5% of the mean value. If your predictive model contains multiple targets, select the target to which this criterion will be applied from the Target for stopping criteria dropdown list.

    Threshold Type. You can specify the threshold as a numeric value or as a percent of the mean. If you choose Value in the Threshold Type dropdown list, then enter the threshold in the Threshold as Value text box. If you choose Percent in the Threshold Type dropdown list, then enter a value in the Threshold as Percent text box.
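The two early-stopping criteria above can be sketched in Python. This is an illustration only, not the product's implementation. The names simulate_until, tail_sampled and ci_within are hypothetical. The sketch assumes that "within the threshold" refers to the half-width of a normal-approximation confidence interval, that a tail boundary is given as a value (not a percentile), and that a minimum of 30 cases is generated before any criterion is checked.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def simulate_until(draw, max_cases, stop, min_cases=30):
    """Call draw() until stop(values) is true or max_cases is reached."""
    values = []
    while len(values) < max_cases:
        values.append(draw())
        # stdev needs at least two values, so never check earlier than that.
        if len(values) >= max(min_cases, 2) and stop(values):
            break
    return values


def tail_sampled(boundary, side, frequency):
    """Criterion: `frequency` target values lie in the tail beyond `boundary`."""
    def stop(values):
        if side == "right":
            in_tail = sum(v >= boundary for v in values)
        else:
            in_tail = sum(v <= boundary for v in values)
        return in_tail >= frequency
    return stop


def ci_within(level, threshold, as_percent):
    """Criterion: CI half-width of the mean is no larger than the threshold.

    Assumption: normal-approximation interval; a percent threshold is
    taken relative to the absolute value of the current sample mean.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)

    def stop(values):
        m = mean(values)
        half_width = z * stdev(values) / math.sqrt(len(values))
        limit = abs(m) * threshold / 100.0 if as_percent else threshold
        return half_width <= limit
    return stop


# Hypothetical target: normally distributed with mean 100 and sd 20.
rng = random.Random(629111597)
target = lambda: rng.gauss(100.0, 20.0)

# Stop when the 95% CI of the mean is within 5% of the mean.
by_ci = simulate_until(target, 100_000, ci_within(0.95, 5.0, as_percent=True))

# Stop when 10 values lie in the right tail beyond 130.
by_tail = simulate_until(target, 100_000, tail_sampled(130.0, "right", 10))
```

Both criteria are re-evaluated from scratch after each case, which keeps the sketch short but is slow for large runs; a production implementation would more likely update running counts and sums incrementally.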

Number of cases to sample. This specifies the number of cases to use when automatically fitting distributions for simulated inputs to the active dataset. If your dataset is very large you might want to consider limiting the number of cases used for distribution fitting. If you select Limit to N cases, the first N cases will be used.

Goodness of fit criteria (Continuous). For continuous inputs, you can use the Anderson-Darling test or the Kolmogorov-Smirnov test of goodness of fit to rank distributions when fitting distributions for simulated inputs to the active dataset. The Anderson-Darling test is selected by default and is especially recommended when you want to ensure the best possible fit in the tail regions.
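To illustrate how the two statistics can rank candidate distributions, the following Python sketch computes the standard Kolmogorov-Smirnov D and Anderson-Darling A² statistics for a fitted CDF and sorts candidates by the chosen statistic (smaller is better). The function names, the two candidate distributions, and the moment-based parameter estimates are assumptions made for the example; the fitting procedure the product uses is not described here.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov D: largest gap between empirical and fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )


def ad_statistic(data, cdf, eps=1e-12):
    """Anderson-Darling A^2; the log terms give extra weight to the tails."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        # Clamp to (0, 1) so that log() is always defined.
        f_low = min(max(cdf(x), eps), 1.0 - eps)
        f_high = min(max(cdf(xs[n - i]), eps), 1.0 - eps)
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n


def fit_candidates(data):
    """Hypothetical candidate set: normal and exponential, fitted by moments."""
    m, s = mean(data), stdev(data)
    normal = NormalDist(m, s)
    return {
        "normal": normal.cdf,
        "exponential": lambda x: 1.0 - math.exp(-max(x, 0.0) / m),
    }


def rank_fits(data, candidates, statistic):
    """Return candidate names ordered from best fit to worst fit."""
    return sorted(candidates, key=lambda name: statistic(data, candidates[name]))


rng = random.Random(7)
sample = [rng.gauss(50.0, 5.0) for _ in range(200)]
ranking = rank_fits(sample, fit_candidates(sample), ad_statistic)
```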

Empirical Distribution. For continuous inputs, the Empirical distribution is the cumulative distribution function of the historical data. You can specify the number of bins used for calculating the Empirical distribution for continuous inputs. The default is 100 and the maximum is 1000.
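As a rough illustration of a binned empirical distribution, the Python sketch below groups the data into equal-width bins, accumulates the bin proportions into a cumulative distribution function, and interpolates linearly within a bin. Equal-width binning and linear interpolation are assumptions made for the example, and binned_ecdf and ecdf_at are hypothetical names; only the default of 100 bins and the maximum of 1000 come from the description above.

```python
from bisect import bisect_right


def binned_ecdf(data, bins=100):
    """Return (edges, cum): bin edges and the cumulative proportion at each edge."""
    if not 1 <= bins <= 1000:
        raise ValueError("bins must be between 1 and 1000")
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The maximum value falls on the last edge; keep it in the last bin.
        counts[min(int((x - lo) / width), bins - 1)] += 1
    edges = [lo + k * width for k in range(bins)] + [hi]
    cum = [0.0]
    running = 0
    for c in counts:
        running += c
        cum.append(running / len(data))
    return edges, cum


def ecdf_at(x, edges, cum):
    """Evaluate the binned CDF at x, interpolating linearly inside a bin."""
    if x <= edges[0]:
        return 0.0
    if x >= edges[-1]:
        return 1.0
    k = bisect_right(edges, x) - 1
    frac = (x - edges[k]) / (edges[k + 1] - edges[k])
    return cum[k] + frac * (cum[k + 1] - cum[k])
```

More bins follow the historical data more closely, while fewer bins smooth it; that trade-off is presumably why the number of bins is configurable.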

Replicate results. Setting a random seed allows you to replicate your simulation. Specify an integer or click Generate, which will create a pseudo-random integer between 1 and 2147483647, inclusive. The default is 629111597.
Note: For a particular random seed, results are replicated unless the number of threads is changed. On a particular computer, the number of threads is constant unless you change it by running the SET THREADS command. The number of threads might differ if you run the simulation on a different computer, because an internal algorithm determines the number of threads for each computer.
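The following Python sketch shows one plausible reason why a fixed seed reproduces results only for a fixed number of threads. It is a toy model, not the product's algorithm: it assumes that each worker receives its own random stream derived from the master seed and that cases are dealt to workers in turn. The simulate function and its parameters are hypothetical.

```python
import random


def simulate(seed, n_cases, n_streams):
    """Toy simulation: one random stream per worker, all derived from `seed`."""
    master = random.Random(seed)
    # Per-stream seeds drawn from the same range as the dialog: 1..2147483647.
    stream_seeds = [master.randrange(1, 2**31) for _ in range(n_streams)]
    streams = [random.Random(s) for s in stream_seeds]
    # Cases are dealt round-robin, so each case's value depends on n_streams.
    return [streams[i % n_streams].gauss(0.0, 1.0) for i in range(n_cases)]


same_a = simulate(629111597, 1000, n_streams=4)
same_b = simulate(629111597, 1000, n_streams=4)   # identical to same_a
other = simulate(629111597, 1000, n_streams=2)    # same seed, different split
```

In this model, same_a and same_b are identical because every stream is seeded and consumed in the same way, while other differs because the same seed is spread over a different number of streams.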

User-missing values for inputs with a Categorical distribution. These controls specify whether user-missing values of inputs with a Categorical distribution are treated as valid. System-missing values and user-missing values for all other types of inputs are always treated as invalid. All inputs must have valid values for a case to be included in distribution fitting, computation of correlations, and computation of the optional contingency table.
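The case-inclusion rule can be sketched in Python as follows. The data layout is an assumption made for the example: each case is a dictionary, a system-missing value is represented by None, and user-missing codes are listed per input. The function name valid_cases and its parameters are hypothetical.

```python
def valid_cases(cases, categorical, user_missing, user_missing_valid):
    """Keep only the cases in which every input has a valid value.

    cases              -- list of dicts mapping input name to value
    categorical        -- names of inputs with a Categorical distribution
    user_missing       -- dict mapping input name to a set of user-missing codes
    user_missing_valid -- treat user-missing values of categorical inputs as valid
    """
    def is_valid(name, value):
        if value is None:                      # system-missing: always invalid
            return False
        if value in user_missing.get(name, set()):
            # User-missing is valid only for categorical inputs, and only on request.
            return name in categorical and user_missing_valid
        return True

    return [
        case for case in cases
        if all(is_valid(name, value) for name, value in case.items())
    ]


cases = [
    {"region": 9, "income": 50.0},    # 9 is a user-missing code for region
    {"region": 1, "income": None},    # system-missing income
    {"region": 2, "income": -1.0},    # -1.0 is a user-missing code for income
    {"region": 1, "income": 40.0},
]
codes = {"region": {9}, "income": {-1.0}}
kept = valid_cases(cases, {"region"}, codes, user_missing_valid=True)
```

Only the cases returned by such a filter would contribute to distribution fitting, correlations, and the optional contingency table.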