SIMINPUT subcommand (SIMPLAN command)

The SIMINPUT subcommand specifies input fields whose values will be simulated. It is a required subcommand. When the MODEL subcommand is used, each input in the associated model file must be specified as either simulated or fixed. Use the FIXEDINPUT subcommand to specify fixed inputs.

INPUT keyword

The INPUT keyword specifies names and optional settings for one or more simulated inputs. The INPUT keyword is required. The specification for each input is the field name with an optional qualifier of the form (MAPTO=name FORMAT=format), where the parentheses are required. Use a blank space to separate specifications for multiple inputs. The keywords TO and ALL are not supported for variable lists specified on the INPUT keyword. For example:

INPUT=input1(MAPTO=field1 FORMAT=F,4) input2(FORMAT=DOLLAR)

  • MAPTO. Maps a simulated field to a field in the active dataset. The MAPTO keyword is only needed when you are automatically fitting to data in the active dataset and the name of a simulated field differs from the name of the associated field in the active dataset. The value of MAPTO should be the name of the field in the active dataset.
  • FORMAT. Specifies the output format of the field. The format consists of a format type, such as F, optionally followed by a comma and the number of decimal places. If the number of decimal places is omitted then it is assumed that there are 0 decimal places. The following formats are supported:
Table 1. Format specifications

Format specification    Definition
F,d                     Numeric
E,d                     Scientific notation
N,d                     Restricted numeric
DOT,d                   Numeric with dots
COMMA,d                 Numeric with commas
DOLLAR,d                Numeric with commas and dollar sign
PCT,d                   Numeric with percent sign
CCA,d                   Custom currency
CCB,d                   Custom currency
CCC,d                   Custom currency
CCD,d                   Custom currency
CCE,d                   Custom currency

If no format is specified, then the Numeric format F is used for numeric inputs.
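The rule for the FORMAT value (a format type, optionally followed by a comma and a number of decimal places, defaulting to 0) can be illustrated with a small sketch. This is not part of SIMPLAN; the function name parse_format is hypothetical, and treating the format type as case-insensitive is an assumption made for the illustration.

```python
# Format types taken from Table 1 above.
SUPPORTED_FORMATS = {"F", "E", "N", "DOT", "COMMA", "DOLLAR", "PCT",
                     "CCA", "CCB", "CCC", "CCD", "CCE"}


def parse_format(spec):
    """Split a FORMAT value such as 'F,4' into (format type, decimal places)."""
    name, sep, decimals = spec.partition(",")
    name = name.strip().upper()  # assumption: case-insensitive format type
    if name not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format type: {name!r}")
    # When the decimals part is omitted, 0 decimal places are assumed.
    places = int(decimals) if sep else 0
    if places < 0:
        raise ValueError("number of decimal places must not be negative")
    return name, places
```

For the example on the INPUT keyword, parse_format("F,4") yields ("F", 4) and parse_format("DOLLAR") yields ("DOLLAR", 0).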

OUTPUT. Specifies whether the inputs listed on the INPUT keyword are included in table and chart output. The default is YES.

TYPE keyword

The TYPE keyword specifies whether the probability distribution for this simulated input field is determined by automatically fitting to the data for this field in the active dataset or by manually specifying the distribution.

  • MANUAL(LOCK = YES | NO SAVEASFITTED = YES | NO). Indicates that the probability distribution is manually specified. When MANUAL is specified, the DISTRIBUTION keyword must be used unless the global SOURCE keyword specifies a simulation plan file (for more information, see the topic SOURCE Keyword (SIMPLAN command)).

    The LOCK keyword specifies whether the distribution for this simulated input field will be locked. Locked distributions will not be modified when automatically fitting distributions interactively in the user interface, using the Simulation Builder or the Run Simulation dialog. If you are creating a simulation plan that you or someone else will work with in the user interface and you want to prevent refitting a simulated input to historical data, then be sure to specify LOCK=YES for that input. Users who open the plan in the Run Simulation dialog will not be able to make any modifications to the distribution of the input. Users who open the plan in the Simulation Builder will be able to make changes to the distribution once they unlock the input.

    The SAVEASFITTED keyword appears in pasted syntax for a simulated input whose distribution was automatically fitted to historical data and then locked in the user interface. It stores the state of the input in the plan file so that the state can be restored in the user interface when the plan file is reopened. The default is SAVEASFITTED=NO.

  • AUTOFIT. Indicates that the probability distribution associated with this input field will be automatically determined from the data for the associated field in the active dataset. By default, the measurement level of the field is used to determine the set of distributions that are considered.

    For nominal input fields, the default set of distributions only includes the categorical distribution.

    For ordinal input fields, the default set of distributions includes the following: binomial, negative binomial and Poisson. The chi-square test is used to determine the distribution that most closely fits the data.

    For continuous input fields, the default set of distributions includes the following: beta, exponential, gamma, lognormal, normal, triangular, uniform and Weibull. By default, the Anderson-Darling test for goodness of fit is used to determine the distribution that most closely fits the data. Optionally, you can specify the Kolmogorov-Smirnov test for goodness of fit. See the topic AUTOFIT Subcommand (SIMPLAN command) for more information.

    You can override the default set of distributions by explicitly specifying one or more distributions on the AUTOFIT keyword, but you cannot mix distributions belonging to different measurement levels. For example, you can specify one or more of the distributions for ordinal data (BINOM, NEGBIN and POISSON) but you cannot specify those in conjunction with distributions for continuous data such as NORMAL.

    Note: For the negative binomial distribution (NEGBIN), AUTOFIT uses the form of the distribution that describes the probability of a given number of failures before a given number of successes occur. If you require the alternate parameterization that describes the probability of a given number of trials before a given number of successes occur, then manually specify the distribution with the DISTRIBUTION keyword. Also note that for the Weibull distribution, AUTOFIT only considers the case where the location parameter C equals 0.
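The restriction that an explicit AUTOFIT list cannot mix measurement levels can be sketched as a simple membership check. This is an illustration only, not how SIMPLAN validates its input. The function name measurement_level_of is hypothetical, and the keyword spellings for the continuous distributions are assumed to match those listed for the DISTRIBUTION keyword.

```python
# Default candidate sets per measurement level, as described above.
# Assumption: AUTOFIT uses the same keyword names as DISTRIBUTION.
DEFAULT_AUTOFIT_SETS = {
    "nominal": {"CATEGORICAL"},
    "ordinal": {"BINOM", "NEGBIN", "POISSON"},
    "continuous": {"BETA", "EXP", "GAMMA", "LNORMAL", "NORMAL",
                   "TRIANGULAR", "UNIFORM", "WEIBULL"},
}


def measurement_level_of(distributions):
    """Return the one measurement level shared by all names, or raise."""
    levels = set()
    for name in distributions:
        matches = [level for level, names in DEFAULT_AUTOFIT_SETS.items()
                   if name.upper() in names]
        if not matches:
            raise ValueError(f"unknown distribution: {name!r}")
        levels.add(matches[0])
    if len(levels) != 1:
        raise ValueError("distributions must all belong to one measurement level")
    return levels.pop()
```

With this check, ["BINOM", "POISSON"] is accepted as ordinal, while ["BINOM", "NORMAL"] is rejected because it mixes ordinal and continuous distributions.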

DISTRIBUTION keyword

The DISTRIBUTION keyword specifies the probability distribution for a simulated input field and is used when you want to explicitly specify the probability distribution rather than have it automatically determined from the data for the associated field in the active dataset. The DISTRIBUTION keyword can only be used with TYPE = MANUAL.

BERNOULLI(PROB=value). Bernoulli distribution.

BETA(SHAPE1=value SHAPE2=value). Beta distribution.

BINOM(N=value PROB=value). Binomial distribution.

CATEGORICAL(CATS=valuelist PROBS=valuelist CONTINGENCY=YES | NO**). Categorical distribution. The Categorical distribution describes an input field that has a fixed number of values, referred to as categories. Each category has an associated probability such that the sum of the probabilities over all categories equals one.

  • The CATS keyword specifies a list of the categories and the PROBS keyword specifies a list of the probabilities associated with each category. The n-th item in each list specifies the associated value for the n-th category. The number of values in each of the lists must be the same.
  • For string inputs with a Categorical distribution, the category values can be specified with or without quotation marks; however, numeric categories for such inputs must be enclosed in quotation marks.
  • The CONTINGENCY keyword specifies whether the input is included in a multiway contingency table that is computed from the active dataset and that describes associations between inputs with categorical distributions. By default, the input is not included in a contingency table. When CONTINGENCY=YES, MULTIWAY=YES must be specified on the CONTINGENCY subcommand.

    Because the contingency table is computed from the active dataset, inputs with CONTINGENCY=YES must either exist in the active dataset or be mapped to a field in the active dataset with the MAPTO keyword. In addition, when CONTINGENCY=YES, values specified for the CATS and PROBS keywords are ignored because the categories and category probabilities are determined from the contingency table.

    Inputs specified as TYPE=AUTOFIT that are fit to a categorical distribution are automatically included in the contingency table when MULTIWAY=YES is specified on the CONTINGENCY subcommand.

Note: If the source of your predictive model is a PMML model file and you are specifying a categorical distribution for an input that is categorical in the model (such as gender), then you should specify the same category values as used in the model (or a subset of those values).
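To make the constraints on CATS and PROBS concrete (equal list lengths, probabilities summing to one), here is a minimal sketch of drawing values from a categorical distribution. It is an illustration of the distribution itself, not of the SIMPLAN sampling engine; the function name, the seed argument and the numeric tolerance are assumptions.

```python
import math
import random


def draw_categorical(cats, probs, n, seed=0):
    """Draw n values from the categories, weighted by their probabilities."""
    if len(cats) != len(probs):
        raise ValueError("CATS and PROBS must have the same number of values")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to one")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.choices(cats, weights=probs, k=n)
```

For example, draw_categorical([1, 2, 3], [0.5, 0.25, 0.25], 1000) returns 1000 values, each of which is 1, 2 or 3.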

EMPIRICAL([SOURCE=AUTOFIT** | 'filespec']). Empirical distribution. The empirical distribution is calculated from the data in the active dataset corresponding to the input field. EMPIRICAL is only supported for inputs with a continuous or ordinal measurement level.

  • For continuous inputs, the empirical distribution is the cumulative distribution function of the data.
  • For ordinal inputs, the empirical distribution is the categorical distribution of the data.
  • For nominal input fields, use TYPE=AUTOFIT(CATEGORICAL).
  • The SOURCE keyword is deprecated. Use the global SOURCE keyword instead, which has the same specifications but applies to both contingency tables and parameters for empirical distributions. See the topic SOURCE Keyword (SIMPLAN command) for more information.
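The two forms of the empirical distribution described above can be sketched as follows: a step-function cumulative distribution for continuous data, and relative frequencies per value for ordinal data. This is a generic illustration under those definitions, not the product's implementation; the function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def ecdf(data):
    """Return the empirical CDF of the data as a function of x."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")

    def cdf(x):
        # Fraction of observations that are less than or equal to x.
        return bisect_right(xs, x) / n

    return cdf


def category_probs(data):
    """Return the relative frequency of each distinct value in the data."""
    n = len(data)
    return {value: count / n for value, count in Counter(data).items()}
```

For the data 3, 1, 2, 2 the empirical CDF evaluates to 0.75 at x = 2, because three of the four observations are less than or equal to 2.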

EXP(SCALE=value). Exponential distribution.

GAMMA(SHAPE=value SCALE=value). Gamma distribution.

LNORMAL(A=value B=value). Lognormal distribution.

NEGBIN(TYPE=FAILURES | TRIALS THRESHOLD=value PROB=value). Negative Binomial distribution. Two parameterizations of the negative binomial distribution are supported.

  • TYPE=FAILURES specifies a distribution that describes the probability of a given number of failures before a given number of successes occur.
  • TYPE=TRIALS specifies a distribution that describes the probability of a given number of trials before a given number of successes occur, and is the parameterization used in the command syntax function PDF.NEGBIN.
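The two parameterizations describe the same process counted in two ways: the number of trials equals the number of failures plus the number of successes. The sketch below uses the standard textbook probability mass functions to show this. It assumes that THRESHOLD is the required number of successes and restricts it to an integer for simplicity; the function names are hypothetical.

```python
from math import comb


def pmf_failures(k, r, p):
    """P(exactly k failures occur before the r-th success)."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k


def pmf_trials(n, r, p):
    """P(the r-th success occurs on trial number n)."""
    if n < r:
        return 0.0  # at least r trials are needed for r successes
    return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)
```

With r = 2 and p = 0.4, the probability of 3 failures under the failures form equals the probability of 5 trials under the trials form, since 5 = 3 + 2.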

NORMAL(MEAN=value STDDEV=value). Normal distribution.

POISSON(MEAN=value). Poisson distribution.

TRIANGULAR(MIN=value MAX=value MODE=value). Triangular distribution.

UNIFORM(MIN=value MAX=value). Uniform distribution.

USER_RANGES(MIN=valuelist MAX=valuelist PROBS=valuelist). User-defined ranges. This distribution consists of a set of intervals with a probability assigned to each interval such that the sum of the probabilities over all intervals equals 1. Values within a given interval are drawn from a uniform distribution defined on that interval.

  • The MIN keyword specifies a list of the left endpoints of each interval, the MAX keyword specifies a list of the right endpoints of each interval, and the PROBS keyword specifies a list of the probabilities associated with each interval. The n-th item in each list specifies the associated value for the n-th interval. The number of values in each of the lists must be the same. The specified endpoints are included in the intervals.
  • Intervals can be overlapping. For example, you can specify MIN = 10 12 and MAX = 15 20, which defines the two intervals [10,15] and [12,20].
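The two-stage draw described above (pick an interval according to its probability, then draw uniformly within it) can be sketched as follows. This illustrates the definition of the distribution rather than the product's sampling code; the function name and the seed argument are assumptions.

```python
import random


def draw_user_ranges(mins, maxs, probs, n, seed=0):
    """Draw n values: choose an interval by probability, then a uniform value in it."""
    if not (len(mins) == len(maxs) == len(probs)):
        raise ValueError("MIN, MAX and PROBS must have the same number of values")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    draws = []
    for _ in range(n):
        # Stage 1: select an interval index, weighted by its probability.
        i = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        # Stage 2: draw uniformly from the selected interval.
        draws.append(rng.uniform(mins[i], maxs[i]))
    return draws
```

Overlapping intervals need no special handling in this scheme: with MIN = 10 12 and MAX = 15 20, values between 12 and 15 can simply come from either interval.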

WEIBULL(A=value B=value [C=value]). Weibull distribution. The parameter C is an optional location parameter, which specifies where the origin of the distribution is located. Omitting the value of C is equivalent to setting its value to 0. When C equals 0, the distribution reduces to the Weibull distribution function in command syntax (PDF.WEIBULL).

Iterating distribution parameters

For any of the above distributions, you can specify multiple values for one of the distribution parameters. An independent set of simulated cases (effectively, a separate simulation) is generated for each specified value, allowing you to investigate the effect of varying the input. This is referred to as Sensitivity Analysis, and each set of simulated cases is referred to as an iteration.

  • The set of specified values for a given distribution parameter should be separated by spaces.
  • For the CATEGORICAL distribution, the CATS parameter can only specify a single set of category values, but you can specify multiple sets of values for the PROBS parameter, allowing you to vary the set of probabilities associated with the categories. Each set of probabilities should be separated by a semicolon.
  • For the USER_RANGES distribution, the MIN and MAX parameters can only specify a single set of intervals, but you can specify multiple sets of values for the PROBS parameter, allowing you to vary the set of probabilities associated with the specified intervals. Each set of probabilities should be separated by a semicolon.
  • You can only iterate distribution parameters for a single simulated input. An error results if you specify iterations of distribution parameters and there are multiple fields on the INPUT keyword.

Example: Normal distribution with iterated parameters

The following specification results in two iterations, one with NORMAL(MEAN=15 STDDEV=2) and one with NORMAL(MEAN=15 STDDEV=3).

DISTRIBUTION= NORMAL(MEAN=15 STDDEV=2 3)
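How one iterated parameter expands into separate iterations can be sketched as below. The representation of the parameters as a dictionary of value lists and the function name expand_iterations are assumptions made for the illustration; the rule that only one parameter may carry multiple values follows the description above.

```python
def expand_iterations(params):
    """Expand {'MEAN': [15], 'STDDEV': [2, 3]} into one parameter set per iteration."""
    iterated = [name for name, values in params.items() if len(values) > 1]
    if len(iterated) > 1:
        raise ValueError("only one distribution parameter can be iterated")
    if not iterated:
        # No iterated parameter: a single iteration.
        return [{name: values[0] for name, values in params.items()}]
    key = iterated[0]
    fixed = {name: values[0] for name, values in params.items() if name != key}
    # One iteration per value of the iterated parameter.
    return [{**fixed, key: value} for value in params[key]]
```

For the specification above, expand_iterations({"MEAN": [15], "STDDEV": [2, 3]}) gives two parameter sets, one with STDDEV 2 and one with STDDEV 3, both with MEAN 15.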

Example: Categorical distribution with iterated sets of probabilities

The example shows three iterations for the set of probabilities associated with the specified set of categories.

DISTRIBUTION= CATEGORICAL(CATS=1 2 3 PROBS=0.5 0.25 0.25; 0.4 0.3 0.3; 0.2 0.6 0.2)

The CATS keyword specifies the set of categories. The probabilities for the first iteration are specified by the set of values up to the first semicolon following the PROBS keyword. Thus the category with value 1 has probability 0.5, the category with value 2 has probability 0.25 and the category with value 3 has probability 0.25. The probabilities for the second and third iterations are (0.4, 0.3, 0.3) and (0.2, 0.6, 0.2) respectively.
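The way semicolons separate the sets of probabilities can be illustrated with a short sketch that splits the PROBS value into one list per iteration and checks each list. The function name split_prob_sets and the numeric tolerance are assumptions; this is not the parser used by the command.

```python
import math


def split_prob_sets(text, n_categories):
    """Split '0.5 0.25 0.25; 0.4 0.3 0.3' into one probability list per iteration."""
    sets = []
    for chunk in text.split(";"):
        probs = [float(token) for token in chunk.split()]
        if len(probs) != n_categories:
            raise ValueError("each set needs one probability per category")
        if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
            raise ValueError("each set of probabilities must sum to one")
        sets.append(probs)
    return sets
```

Applied to the PROBS value in the example with three categories, the sketch yields three lists, one for each iteration.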

Example: User Ranges distribution with iterated sets of probabilities

The example shows two iterations for the set of probabilities associated with the specified intervals.

DISTRIBUTION= USER_RANGES(MIN=10 13 17 MAX=12 16 20 PROBS=0.3 0.3 0.4; 0.2 0.3 0.5)

The MIN and MAX keywords specify the three intervals [10,12], [13,16] and [17,20]. For the first iteration, the interval from 10 to 12 has probability 0.3, the interval from 13 to 16 has probability 0.3 and the interval from 17 to 20 has probability 0.4. The probabilities for the second iteration are (0.2, 0.3, 0.5).

MINVAL keyword

The MINVAL keyword specifies the minimum allowed value for the simulated input field. If MINVAL is omitted, the minimum value is determined by the range of the associated probability distribution. If the specified value is less than the minimum allowed for the associated probability distribution, then the minimum for the probability distribution is used. MINVAL is not supported for the following distributions: Bernoulli, categorical, empirical, triangular, uniform and user ranges.

MAXVAL keyword

The MAXVAL keyword specifies the maximum allowed value for the simulated input field. If MAXVAL is omitted, the maximum value is determined by the range of the associated probability distribution. If the specified value is greater than the maximum allowed for the associated probability distribution, then the maximum for the probability distribution is used. MAXVAL is not supported for the following distributions: Bernoulli, categorical, empirical, triangular, uniform and user ranges.
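The way MINVAL and MAXVAL combine with the natural range of a distribution can be summarized in a small sketch: an omitted bound falls back to the distribution's own limit, and a specified bound that lies outside the distribution's range is replaced by that limit. The function name effective_bounds is hypothetical, and the sketch does not model the distributions for which these keywords are unsupported.

```python
import math


def effective_bounds(dist_min, dist_max, minval=None, maxval=None):
    """Return the (minimum, maximum) actually applied to a simulated input."""
    # A MINVAL below the distribution's minimum has no effect.
    low = dist_min if minval is None else max(minval, dist_min)
    # A MAXVAL above the distribution's maximum has no effect.
    high = dist_max if maxval is None else min(maxval, dist_max)
    return low, high
```

For a distribution defined on values from 0 upward, effective_bounds(0, math.inf, minval=-5, maxval=100) gives (0, 100): the minimum of the distribution replaces the specified -5, and the specified maximum of 100 is kept.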