To run a simulation, each input field must be specified as fixed or simulated. Simulated inputs are those whose values are uncertain and will be generated by drawing from a specified probability distribution. When historical data are available for the inputs to be simulated, the distributions that most closely fit the data can be automatically determined, along with any correlations between those inputs. You can also manually specify distributions or correlations if historical data are not available or you require specific distributions or correlations.
Fixed inputs are those whose values are known and remain constant for each case generated in the simulation. For example, suppose you have a linear regression model for sales as a function of a number of inputs including price, and you want to hold the price fixed at the current market price. You would then specify price as a fixed input.
For simulations based on a predictive model, each predictor in the model is an input field for the simulation. For simulations that do not include a predictive model, the fields that are specified on the Model tab are the inputs for the simulation.
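The difference between fixed and simulated inputs can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: the regression coefficients, the advertising input, and its normal distribution are all hypothetical values chosen for the example.

```python
import random

# Hypothetical regression coefficients, for illustration only.
INTERCEPT = 500.0
COEF_PRICE = -12.0
COEF_ADVERTISING = 3.5


def simulate_sales(n_cases, fixed_price, adv_mean, adv_sd, seed=0):
    """Generate cases where price is fixed and advertising is simulated."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n_cases):
        # Simulated input: a new value is drawn for every case.
        advertising = rng.gauss(adv_mean, adv_sd)
        # Fixed input: the same price is used for every case.
        sales = (INTERCEPT
                 + COEF_PRICE * fixed_price
                 + COEF_ADVERTISING * advertising)
        cases.append({"price": fixed_price,
                      "advertising": advertising,
                      "sales": sales})
    return cases


cases = simulate_sales(1000, fixed_price=20.0, adv_mean=100.0, adv_sd=15.0)
```

Every generated case carries the same price, while advertising (and therefore predicted sales) varies from case to case.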
Automatically fitting distributions and calculating correlations for simulated inputs. If the active dataset contains historical data for the inputs that you want to simulate, then you can automatically find the distributions that most closely fit the data for those inputs as well as determine any correlations between them. The steps are as follows:
- Verify that each of the inputs that you want to simulate is matched up with the correct field in the active dataset. Inputs are listed in the Input column and the Fit to column displays the matched field in the active dataset. You can match an input to a different field in the active dataset by selecting a different item from the Fit to dropdown list.
A value of -None- in the Fit to column indicates that the input could not be automatically matched to a field in the active dataset. By default, inputs are matched to dataset fields on name, measurement level and type (numeric or string). If the active dataset does not contain historical data for the input, then manually specify the distribution for the input or specify the input as a fixed input, as described below.
- Click Fit All.
The closest fitting distribution and its associated parameters are displayed in the Distribution column along with a plot of the distribution superimposed on a histogram (or bar chart) of the historical data. Correlations between simulated inputs are displayed on the Correlations settings. You can examine the fit results and customize automatic distribution fitting for a particular input by selecting the row for the input and clicking Fit Details. See the topic Fit Details (simulation) for more information.
You can run automatic distribution fitting for a particular input by selecting the row for the input and clicking Fit. Correlations for all simulated inputs that match fields in the active dataset are also automatically calculated.
- Cases with missing values for any simulated input are excluded from distribution fitting, computation of correlations, and computation of the optional contingency table (for inputs with a Categorical distribution). You can optionally specify whether user-missing values of inputs with a Categorical distribution are treated as valid. By default, they are treated as missing. For more information, see the topic Advanced Options (simulation).
- For continuous and ordinal inputs, if an acceptable fit cannot be found for any of the tested distributions, then the Empirical distribution is suggested as the closest fit. For continuous inputs, the Empirical distribution is the cumulative distribution function of the historical data. For ordinal inputs, the Empirical distribution is the categorical distribution of the historical data.
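The two notes above can be sketched as follows. This is a rough illustration under stated assumptions, not the product's algorithm: missing values are represented by `None`, the continuous Empirical draw interpolates linearly between sorted historical values (the interpolation is an assumption), and all function and field names are hypothetical.

```python
import random


def drop_incomplete_cases(rows, inputs):
    """Keep only rows in which every simulated input has a value."""
    return [row for row in rows
            if all(row.get(name) is not None for name in inputs)]


def sample_empirical_continuous(history, rng):
    """Inverse-CDF draw from the historical data, with linear interpolation."""
    data = sorted(history)
    if len(data) == 1:
        return data[0]
    position = rng.random() * (len(data) - 1)  # in [0, len(data) - 1)
    lower = int(position)
    fraction = position - lower
    return data[lower] + fraction * (data[lower + 1] - data[lower])


def sample_empirical_ordinal(history, rng):
    """Draw a category with probability equal to its observed frequency."""
    return rng.choice(list(history))


rows = [
    {"income": 42.0, "rating": "low"},
    {"income": None, "rating": "high"},    # excluded: income is missing
    {"income": 55.5, "rating": None},      # excluded: rating is missing
    {"income": 61.0, "rating": "medium"},
    {"income": 48.2, "rating": "low"},
]
complete = drop_incomplete_cases(rows, ["income", "rating"])

rng = random.Random(1)
income_draw = sample_empirical_continuous([r["income"] for r in complete], rng)
rating_draw = sample_empirical_ordinal([r["rating"] for r in complete], rng)
```

Note that a case is dropped if any simulated input is missing, so the "high" rating never appears in the draws even though its own field was valid in that row.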
Manually specifying distributions. You can manually specify the probability distribution for any simulated input by selecting the distribution from the Type dropdown list and entering the distribution parameters in the Parameters grid. Once you have entered the parameters for a distribution, a sample plot of the distribution, based on the specified parameters, will be displayed adjacent to the Parameters grid. Following are some notes on particular distributions:
- Categorical. The categorical distribution describes an input field that has a fixed number of values, referred to as categories. Each category has an associated probability such that the sum of the probabilities over all categories equals one. To enter a category, click the left-hand column in the Parameters grid and specify the category value. Enter the probability associated with the category in the right-hand column. Note: Categorical inputs from a PMML model have categories that are determined from the model and cannot be modified.
- Negative Binomial - Failures. Describes the distribution of the number of failures in a sequence of trials before a specified number of successes is observed. The parameter thresh is the specified number of successes and the parameter prob is the probability of success in any given trial.
- Negative Binomial - Trials. Describes the distribution of the number of trials required to observe a specified number of successes. The parameter thresh is the specified number of successes and the parameter prob is the probability of success in any given trial.
- Range. This distribution consists of a set of intervals with a probability assigned to each interval such that the sum of the probabilities over all intervals equals 1. Values within a given interval are drawn from a uniform distribution defined on that interval. Intervals are specified by entering a minimum value, a maximum value and an associated probability.
For example, you believe that the cost of a raw material has a 40% chance of falling in the range of $10 - $15 per unit and a 60% chance of falling in the range of $15 - $20 per unit. You would model the cost with a Range distribution consisting of the two intervals [10 - 15] and [15 - 20], setting the probability associated with the first interval to 0.4 and the probability associated with the second interval to 0.6. The intervals do not have to be contiguous and they can even be overlapping. For example, you could have specified the intervals $10 - $15 and $20 - $25 or $10 - $15 and $13 - $16.
- Weibull. The parameter c is an optional location parameter, which specifies where the origin of the distribution is located.
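As an illustration of the Categorical and Range descriptions above, the following sketch draws values from each using only the Python standard library. The function names are hypothetical and the code is not the product's implementation. It reuses the raw material cost example, with a 0.4 probability for the interval 10 to 15 and 0.6 for the interval 15 to 20.

```python
import random


def sample_categorical(categories, rng):
    """categories: list of (value, probability) pairs summing to 1."""
    values = [value for value, _ in categories]
    probs = [prob for _, prob in categories]
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return rng.choices(values, weights=probs, k=1)[0]


def sample_range(intervals, rng):
    """intervals: list of (minimum, maximum, probability) triples summing to 1."""
    probs = [prob for _, _, prob in intervals]
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("interval probabilities must sum to 1")
    # First pick an interval by its probability, then draw uniformly within it.
    low, high, _ = rng.choices(intervals, weights=probs, k=1)[0]
    return rng.uniform(low, high)


rng = random.Random(42)

cost_intervals = [(10.0, 15.0, 0.4), (15.0, 20.0, 0.6)]
costs = [sample_range(cost_intervals, rng) for _ in range(10_000)]
share_low = sum(cost < 15.0 for cost in costs) / len(costs)  # close to 0.4

region = sample_categorical([("north", 0.5), ("south", 0.3), ("west", 0.2)], rng)
```

Because an interval is chosen first and the value is drawn second, the same function handles non-contiguous or overlapping intervals without any special cases.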
Parameters for the following distributions have the same meaning as in the associated random variable functions available in the Compute Variable dialog box: Bernoulli, beta, binomial, exponential, gamma, lognormal, negative binomial (trials and failures), normal, Poisson and uniform.
See the topic Random variable functions for more information.
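The relationship between the two negative binomial variants can be made concrete by simulating the underlying trials directly. The sketch below is illustrative only and assumes that thresh is a positive integer and that the trial count includes the successes. The function name is hypothetical.

```python
import random


def negative_binomial_draw(thresh, prob, rng):
    """Run Bernoulli trials until `thresh` successes; return (failures, trials)."""
    if thresh < 1 or not 0.0 < prob <= 1.0:
        raise ValueError("thresh must be >= 1 and prob must be in (0, 1]")
    successes = 0
    failures = 0
    while successes < thresh:
        if rng.random() < prob:
            successes += 1
        else:
            failures += 1
    return failures, failures + successes


rng = random.Random(3)
draws = [negative_binomial_draw(thresh=3, prob=0.5, rng=rng) for _ in range(5000)]
mean_failures = sum(failures for failures, _ in draws) / len(draws)
```

Under these assumptions, each draw satisfies trials = failures + thresh, so the two variants describe the same experiment counted in two different ways.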
Specifying fixed inputs. Specify a fixed input by selecting Fixed from the Type dropdown list in the Distribution column and entering the fixed value. The value can be numeric or string depending on whether the input is numeric or string. String values should not be enclosed in quotation marks.
Specifying bounds on simulated values. Most distributions support specifying upper and lower bounds on the simulated values. You can specify a lower bound by entering a value into the Min text box and you can specify an upper bound by entering a value into the Max text box.
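The effect of Min and Max bounds can be illustrated with rejection sampling, where draws outside the bounds are discarded and redrawn. How the product actually enforces bounds is not stated here, so treat this technique as an assumption. The function name and the demand distribution are hypothetical.

```python
import random


def draw_with_bounds(draw, minimum=None, maximum=None, max_attempts=10_000):
    """Redraw from `draw()` until the value lies within the optional bounds."""
    for _ in range(max_attempts):
        value = draw()
        above_min = minimum is None or value >= minimum
        below_max = maximum is None or value <= maximum
        if above_min and below_max:
            return value
    raise RuntimeError("no draw fell within the specified bounds")


rng = random.Random(7)
demand = [
    draw_with_bounds(lambda: rng.gauss(100.0, 25.0), minimum=50.0, maximum=150.0)
    for _ in range(1000)
]
```

Either bound can be omitted by leaving it as `None`, which mirrors specifying only a lower bound or only an upper bound.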
Locking inputs. Locking an input, by selecting the check box in the column with the lock icon, excludes the input from automatic distribution fitting. This is most useful when you manually specify a distribution or a fixed value and want to ensure that it will not be affected by automatic distribution fitting. Locking is also useful if you intend to share your simulation plan with users who will be running it in the Run Simulation dialog and you want to prevent any changes to certain inputs. In that regard, specifications for locked inputs cannot be modified in the Run Simulation dialog.
Sensitivity Analysis. Sensitivity analysis allows you to investigate the effect of systematic changes in a fixed input or in a distribution parameter for a simulated input by generating an independent set of simulated cases (effectively, a separate simulation) for each specified value. To specify sensitivity analysis, select a fixed or simulated input and click Sensitivity Analysis. Sensitivity analysis is limited to a single fixed input or a single distribution parameter for a simulated input. See the topic Sensitivity Analysis (simulation) for more information.
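The idea of running a separate simulation for each specified value can be sketched as a loop over the values of one fixed input. The model below (sales as a linear function of price plus normal noise) and all of its numbers are hypothetical, and the code is not the product's implementation.

```python
import random
import statistics


def run_simulation(price, n_cases, seed):
    """One independent simulation: price is fixed, the noise term is simulated."""
    rng = random.Random(seed)
    # Hypothetical model: sales = 500 - 12 * price + noise
    return [500.0 - 12.0 * price + rng.gauss(0.0, 20.0) for _ in range(n_cases)]


def sensitivity_analysis(values, n_cases=2000):
    """Run a separate simulation for each value of the single varied input."""
    results = {}
    for index, price in enumerate(values):
        sales = run_simulation(price, n_cases, seed=index)
        results[price] = statistics.mean(sales)
    return results


mean_sales_by_price = sensitivity_analysis([18.0, 20.0, 22.0])
```

Each price gets its own independently seeded set of cases, so differences in mean sales across the results reflect the change in the fixed input rather than shared random draws.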
Fit status icons
Icons in the Fit to column indicate the fit status for each input field.
- No distribution has been specified for the input and the input has not been specified as fixed. In order to run the simulation, you must either specify a distribution for this input or define it to be fixed and specify the fixed value.
- The input was previously fit to a field that does not exist in the active dataset. No action is necessary unless you want to refit the distribution for the input to the active dataset.
- The closest fitting distribution has been replaced with an alternate distribution from the Fit Details dialog.
- The input is set to the closest fitting distribution.
- The distribution has been manually specified or sensitivity analysis iterations have been specified for this input.