DATASETTINGS subcommand (TCM MODEL command)

The DATASETTINGS subcommand specifies settings that are used to prepare the data for the analysis. It includes the following settings:
  • The dimension fields that define each time series for multidimensional data.
  • The field, or fields, that define the observations, the time interval between observations, and the time interval for the analysis.
  • The method by which data are aggregated or distributed when the time interval for the analysis differs from the time interval between the observations.
  • Handling of missing values.
DIMENSIONS
Specifies the dimension fields, if any, that identify each time series. Specify one or more field names. The TO keyword is supported, but the ALL keyword is not.
MAXDIMVALUES
Specifies the maximum number of distinct values per dimension field. The default is 10000. The value must be an integer that is greater than 100.
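Conceptually, each distinct combination of dimension-field values identifies one time series. The Python sketch below illustrates only that idea and is not how the procedure is implemented. The record layout (a list of dicts), the function name `split_by_dimensions`, and the choice to raise an error when a dimension exceeds the limit are all hypothetical. The `max_dim_values` parameter plays the role of MAXDIMVALUES.

```python
from collections import defaultdict

def split_by_dimensions(records, dimensions, max_dim_values=10000):
    """Group records into one series per combination of dimension values.

    records: list of dicts, one per observation (hypothetical layout).
    dimensions: names of the dimension fields, as listed on DIMENSIONS.
    max_dim_values: plays the role of MAXDIMVALUES.
    """
    # Check the number of distinct values in each dimension field.
    for dim in dimensions:
        distinct = {rec[dim] for rec in records}
        if len(distinct) > max_dim_values:
            # Assumption: treat an over-limit dimension as an error.
            raise ValueError(f"{dim} has {len(distinct)} distinct values")
    # Each distinct tuple of dimension values identifies one time series.
    series = defaultdict(list)
    for rec in records:
        series[tuple(rec[dim] for dim in dimensions)].append(rec)
    return dict(series)


records = [
    {"region": "East", "brand": "A", "sales": 10},
    {"region": "East", "brand": "B", "sales": 7},
    {"region": "West", "brand": "A", "sales": 4},
    {"region": "East", "brand": "A", "sales": 12},
]
by_series = split_by_dimensions(records, ["region", "brand"])
print(sorted(by_series))  # [('East', 'A'), ('East', 'B'), ('West', 'A')]
```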
TIMETYPE
Specifies how the time of each observation is defined. The keyword is required unless the active dataset has a date specification, in which case it cannot be specified. Date specifications are created with the DATE command.
DATETIMEVAR
Specifies that a field with a date, datetime, or time (duration) format defines the observations.
ISO8601
Specifies that the observations are defined by a string variable that represents dates, datetimes, or times (durations). Dates are specified in the form yyyy-mm-dd. Datetimes are specified in the form yyyy-mm-dd hh:mm:ss. Times are specified in the form hh:mm:ss. Leading zeros are optional for month, day, hour, minute, and second values. For example, 2014-1-1 is interpreted as 2014-01-01, and 3:2:1 is interpreted as 03:02:01.
CYCLICPERIODS
Specifies that the observations are defined by one or more integer fields that define a hierarchy of periodic levels. The field that defines the lowest level is referred to as the period field. Fields that define higher levels are referred to as cycle fields.

With this structure, you can describe series of observations that don't fit one of the standard time intervals. For example, a fiscal year with only 10 months can be described with a cycle field that represents years and a period field that represents months, where the length of one cycle is 10.

Field values for each level, except the highest, must be periodic with respect to the next highest level. Values for the highest level cannot be periodic. For example, in the case of the 10-month fiscal year, months are periodic within years and years are not periodic.

NONE
Specifies that the observations are defined by record order, so that the first record represents the first observation, the second record represents the second observation, and so on. The records are assumed to represent observations that are equally spaced in time. NONE cannot be used when the DIMENSIONS keyword is specified.
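The leading-zero rule described under ISO8601 can be illustrated with a small normalizer. This Python sketch is hypothetical (the function name `normalize_iso8601` is not part of the procedure). It recognizes only the three forms listed under ISO8601 and does not check value ranges such as month 13.

```python
import re

DATE = r"(\d{4})-(\d{1,2})-(\d{1,2})"
TIME = r"(\d{1,2}):(\d{1,2}):(\d{1,2})"

def normalize_iso8601(text):
    """Pad month, day, hour, minute, and second values to two digits."""
    text = text.strip()
    if m := re.fullmatch(DATE + " " + TIME, text):              # datetime
        y, *rest = m.groups()
        mo, d, h, mi, s = (int(x) for x in rest)
        return f"{y}-{mo:02d}-{d:02d} {h:02d}:{mi:02d}:{s:02d}"
    if m := re.fullmatch(DATE, text):                           # date
        y, mo, d = m.groups()
        return f"{y}-{int(mo):02d}-{int(d):02d}"
    if m := re.fullmatch(r"(\d+):(\d{1,2}):(\d{1,2})", text):   # time (duration)
        h, mi, s = (int(x) for x in m.groups())
        return f"{h:02d}:{mi:02d}:{s:02d}"
    raise ValueError(f"unrecognized ISO8601 value: {text!r}")


print(normalize_iso8601("2014-1-1"))        # 2014-01-01
print(normalize_iso8601("3:2:1"))           # 03:02:01
print(normalize_iso8601("2014-1-1 3:2:1"))  # 2014-01-01 03:02:01
```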
TIMEVAR
Specifies the field, or fields, that define each observation. It is required when TIMETYPE=DATETIMEVAR, TIMETYPE=ISO8601, or TIMETYPE=CYCLICPERIODS.
var
Specifies the date, datetime, or time field that defines the observations when TIMETYPE=DATETIMEVAR. It also specifies the string field that defines the observations when TIMETYPE=ISO8601. Specify a field name for var.
var(TYPE=PERIOD [INCREMENT=integer] [START=integer])
Specifies the period field when TIMETYPE=CYCLICPERIODS. Specify a field name for var.
  • The optional INCREMENT keyword specifies the integer increment between successive observations. The default is 1.
  • The optional START keyword specifies the starting value when there are one or more cycle fields. The value must be a positive integer. The default is 1. For example, a field with the periodic sequence 2, 3, 4, 5 has a starting value of 2. START is ignored if there are no cycle fields.
var(TYPE=CYCLE LEVEL=integer LENGTH=integer [START=integer])
Specifies a cycle field when TIMETYPE=CYCLICPERIODS. Specify a field name for var. An arbitrary number of cycle fields can be specified.
  • The LEVEL keyword is required and must specify a positive integer. The lowest level cycle field must have LEVEL=1. The next cycle field in the hierarchy must have LEVEL=2, and so on.
  • The LENGTH keyword is required and specifies the length of the cycle. The length of a cycle at a particular level is the periodicity of the next lowest level. Consider the example of a 10-month fiscal year that is described by a period field for month and a first-level cycle field for year. The cycle field for year has a length of 10 because the next lowest level represents months and there are 10 months in the specified fiscal year. The value of LENGTH must be a positive integer.
  • The optional START keyword specifies the starting value when the cycle field is periodic. The value must be a positive integer, and the default is 1. This setting is necessary for detecting missing values. For example, if a periodic field starts from 2 but the starting value is specified as 1, then the procedure assumes that there is a missing value for the first period in each cycle of that field. START is ignored for the highest level cycle field.
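To make the cycle/period hierarchy concrete, the following Python sketch handles the 10-month fiscal year example: a period field for month and a first-level cycle field for year with LENGTH=10. It maps each (year, month) pair to a sequential position and lists the positions that are absent, which shows how a START value that does not match the data leads to assumed missing values. The function name is hypothetical, and the sketch assumes a single cycle level and a period increment of 1.

```python
def missing_periods(observations, length, start=1):
    """List (cycle, period) pairs implied by the hierarchy but absent from the data.

    observations: (cycle, period) integer pairs; one cycle level, increment 1.
    length: LENGTH of the cycle field (periods per cycle).
    start: START of the period field (first period value in each cycle).
    """
    observations = list(observations)

    def position(cycle, period):
        # Sequential index; period values run from start to start + length - 1.
        if not start <= period < start + length:
            raise ValueError(f"period {period} outside {start}..{start + length - 1}")
        return cycle * length + (period - start)

    observed = {position(c, p) for c, p in observations}
    first_cycle = min(c for c, _ in observations)
    # Expect every position from the first period of the first cycle
    # through the last observation.
    expected = range(first_cycle * length, max(observed) + 1)
    return [(pos // length, pos % length + start)
            for pos in expected if pos not in observed]


# 10-month fiscal year: month 2 of fiscal year 2021 is absent.
data = [(2020, m) for m in range(1, 11)] + [(2021, 1), (2021, 3)]
print(missing_periods(data, length=10))  # [(2021, 2)]

# Months recorded as 2..10 while START=1: month 1 of every cycle is treated as missing.
data = [(y, m) for y in (2020, 2021) for m in range(2, 11)]
print(missing_periods(data, length=10))  # [(2020, 1), (2021, 1)]
```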
INPUTINTERVAL
Specifies the interval between successive observations and any additional information needed to specify the sequence of observations. The keyword can be specified only when TIMETYPE=DATETIMEVAR or TIMETYPE=ISO8601.
AUTODETECT
Specifies that the interval between successive observations will be automatically determined from the data. This is the default.
IRREGULAR
Specifies that successive observations are not equally spaced, such as observations that represent the time at which a sales order is processed. When IRREGULAR is specified, you must specify ANALYSISINTERVAL.
YEAR
Specifies that the interval between successive observations is one or more years. Use the INCREMENT keyword to specify the number of years between successive observations. The default is 1.
QUARTER
Specifies that the interval between successive observations is one quarter.
MONTH
Specifies that the interval between successive observations is one or more months. Use the INCREMENT keyword to specify the number of months between successive observations. The default is 1.
WEEK
Specifies that the interval between successive observations is one week.
DAY
Specifies that the interval between successive observations is one or more days.
  • Use the WKSTART keyword to specify the start of the week. The default is SUN, which specifies Sunday.
  • Use the DAYSWK keyword to specify the number of days per week. Specify an integer between 1 and 7. The default is 7.
  • Use the INCREMENT keyword to specify the number of days between successive observations. The default is 1.
HOUR
Specifies that the interval between successive observations is one or more hours and that observations are represented as a datetime.
  • Use the WKSTART keyword to specify the start of the week. The default is SUN, which specifies Sunday.
  • Use the DAYSWK keyword to specify the number of days per week. Specify an integer between 1 and 7. The default is 7.
  • Use the HRSDAY keyword to specify the number of hours per day. Specify an integer between 1 and 24. The default is 24.
  • Use the DAYSTART keyword to specify the starting hour of the day. Specify an integer between 0 and 23. The default is 0.
  • Use the INCREMENT keyword to specify the number of hours between successive observations. The default is 1.
MINUTE
Specifies that the interval between successive observations is one or more minutes and that observations are represented as a datetime.
  • Use the WKSTART keyword to specify the start of the week. The default is SUN, which specifies Sunday.
  • Use the DAYSWK keyword to specify the number of days per week. Specify an integer between 1 and 7. The default is 7.
  • Use the HRSDAY keyword to specify the number of hours per day. Specify an integer between 1 and 24. The default is 24.
  • Use the DAYSTART keyword to specify the starting hour of the day. Specify an integer between 0 and 23. The default is 0.
  • Use the INCREMENT keyword to specify the number of minutes between successive observations. The default is 1.
SECOND
Specifies that the interval between successive observations is one or more seconds and that observations are represented as a datetime.
  • Use the WKSTART keyword to specify the start of the week. The default is SUN, which specifies Sunday.
  • Use the DAYSWK keyword to specify the number of days per week. Specify an integer between 1 and 7. The default is 7.
  • Use the HRSDAY keyword to specify the number of hours per day. Specify an integer between 1 and 24. The default is 24.
  • Use the DAYSTART keyword to specify the starting hour of the day. Specify an integer between 0 and 23. The default is 0.
  • Use the INCREMENT keyword to specify the number of seconds between successive observations. The default is 1.
DURATION
Specifies that observations are represented by a field with a time (duration) format, as in hh:mm:ss.
  • The INTERVAL keyword specifies the interval between observations and is required.
  • Use the INCREMENT keyword to specify the number of hours, minutes, or seconds between observations. The default is 1.
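The WKSTART, DAYSWK, HRSDAY, and DAYSTART settings together define which hours count as valid observation times. The Python sketch below generates the expected hourly times for a Monday-to-Friday week with an 8-hour day that starts at 09:00. It is an illustration under stated assumptions, not the procedure's algorithm: the function names are hypothetical, weekday names are spelled as three-letter abbreviations in the sketch, INCREMENT is not modeled, and hours after midnight in a day that crosses midnight are assumed to belong to the day on which it started.

```python
from datetime import datetime, timedelta

# Order matches datetime.weekday(): Monday is 0.
WEEKDAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]

def is_valid_hour(ts, wkstart="SUN", dayswk=7, hrsday=24, daystart=0):
    """True if ts falls inside the working days and working hours.

    Assumption: a working day runs for hrsday hours from daystart, and the
    hours after midnight of a day that crosses midnight belong to the day
    on which it started.
    """
    offset = (ts.hour - daystart) % 24           # hours since the day started
    if offset >= hrsday:
        return False
    day = ts.date() if ts.hour >= daystart else ts.date() - timedelta(days=1)
    day_in_week = (day.weekday() - WEEKDAYS.index(wkstart)) % 7
    return day_in_week < dayswk                  # first dayswk days of the week

def hourly_sequence(first, last, **settings):
    """Expected hourly observation times between first and last, inclusive."""
    times, ts = [], first
    while ts <= last:
        if is_valid_hour(ts, **settings):
            times.append(ts)
        ts += timedelta(hours=1)
    return times


# Monday-to-Friday week, 8-hour day from 09:00 (09:00 through 16:00).
seq = hourly_sequence(datetime(2024, 1, 5, 8), datetime(2024, 1, 8, 10),
                      wkstart="MON", dayswk=5, hrsday=8, daystart=9)
print(len(seq), seq[0], seq[-1])  # 10 2024-01-05 09:00:00 2024-01-08 10:00:00
```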
ANALYSISINTERVAL
Specifies the time interval for the analysis. For example, if the time interval of the observations is DAY, you might choose MONTH as the time interval for analysis. The data are then aggregated from daily to monthly data before the model is built. You can also distribute the data from a longer to a shorter time interval. For example, if the observations are quarterly, you can distribute the data from quarterly to monthly data.
  • USEINPUT specifies that the time interval for analysis is the same as the interval between the observations. It is the default unless INPUTINTERVAL=IRREGULAR. If INPUTINTERVAL=IRREGULAR, ANALYSISINTERVAL must be specified and it cannot be specified as USEINPUT.
  • CYCLE(LEVEL=integer) applies when TIMETYPE=CYCLICPERIODS and there are one or more cycle fields (in addition to the period field). It specifies to aggregate the data to the cycle level specified by the LEVEL keyword. The allowed values for LEVEL are 1 to the number of cycle fields.
    Note: When TIMETYPE=CYCLICPERIODS, only CYCLE(LEVEL=integer) or USEINPUT can be specified for ANALYSISINTERVAL.
  • When TIMETYPE=DATETIMEVAR or TIMETYPE=ISO8601, the following rules apply:
    • ANALYSISINTERVAL can be set to any value that is longer than, or equal to, the value of INPUTINTERVAL. For example, if INPUTINTERVAL=QUARTER, you can set ANALYSISINTERVAL to QUARTER or YEAR. The one exception is that when INPUTINTERVAL=WEEK, ANALYSISINTERVAL cannot be set to MONTH or QUARTER.
    • ANALYSISINTERVAL can be set to certain values that are shorter than the value of INPUTINTERVAL. The values of INPUTINTERVAL, followed by the allowed values of ANALYSISINTERVAL for this case, are as follows:
      • YEAR: QUARTER, MONTH
      • QUARTER: MONTH
      • MONTH: DAY
      • WEEK: DAY
      • DAY: HOUR
      • HOUR or DURATION(INTERVAL=HOUR): MINUTE
      • MINUTE or DURATION(INTERVAL=MINUTE): SECOND
      Note: If the INCREMENT value for INPUTINTERVAL is greater than 1, ANALYSISINTERVAL cannot be shorter than INPUTINTERVAL.
  • When the active dataset has a date specification, aggregation is supported but distribution is not. For example, if the date specification is YEAR, QUARTER, ANALYSISINTERVAL can be set to YEAR but not to MONTH.
    • You cannot aggregate from MONTH to QUARTER for a date specification.
    • If the date specification has a cycle field, then CYCLE(LEVEL=1) can be specified for ANALYSISINTERVAL.
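The rules for TIMETYPE=DATETIMEVAR and TIMETYPE=ISO8601 can be encoded as a small lookup, which makes it easy to check a planned ANALYSISINTERVAL before running the procedure. In this Python sketch the function name `allowed_analysis_intervals` is hypothetical, DURATION(INTERVAL=HOUR) and DURATION(INTERVAL=MINUTE) are passed as "HOUR" and "MINUTE", and IRREGULAR, date specifications, and CYCLICPERIODS are not covered.

```python
# Intervals from shortest to longest.
ORDER = ["SECOND", "MINUTE", "HOUR", "DAY", "WEEK", "MONTH", "QUARTER", "YEAR"]

# Shorter ANALYSISINTERVAL values permitted for each INPUTINTERVAL.
SHORTER = {
    "YEAR": ["QUARTER", "MONTH"],
    "QUARTER": ["MONTH"],
    "MONTH": ["DAY"],
    "WEEK": ["DAY"],
    "DAY": ["HOUR"],
    "HOUR": ["MINUTE"],     # also DURATION(INTERVAL=HOUR)
    "MINUTE": ["SECOND"],   # also DURATION(INTERVAL=MINUTE)
}

def allowed_analysis_intervals(input_interval, increment=1):
    """ANALYSISINTERVAL values allowed when TIMETYPE=DATETIMEVAR or ISO8601."""
    # Equal or longer intervals (aggregation) are allowed...
    allowed = ORDER[ORDER.index(input_interval):]
    # ...except that WEEK cannot be aggregated to MONTH or QUARTER.
    if input_interval == "WEEK":
        allowed = [v for v in allowed if v not in ("MONTH", "QUARTER")]
    # Shorter intervals (distribution) are allowed only when INCREMENT is 1.
    if increment == 1:
        allowed = SHORTER.get(input_interval, []) + allowed
    return allowed


print(allowed_analysis_intervals("QUARTER"))             # ['MONTH', 'QUARTER', 'YEAR']
print(allowed_analysis_intervals("WEEK"))                # ['DAY', 'WEEK', 'YEAR']
print(allowed_analysis_intervals("MONTH", increment=3))  # ['MONTH', 'QUARTER', 'YEAR']
```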
GROUP
Specifies the method to use when multiple observations occur in the same time interval. For example, if INPUTINTERVAL is MONTH then multiple dates in the same month are grouped together. The specification for GROUP applies to all time series fields. You can, however, override that specification for particular fields. For example, to group values of temperature using the mean of the values, specify temperature(MEAN). The following methods are available:
SUM
The sum of the original values. This is the default.
MEAN
The mean of the original values.
MODE
The mode of the original values.
MIN
The minimum of the original values.
MAX
The maximum of the original values.
Note: Although grouping is a form of aggregation, it is done before missing values are handled, whereas formal aggregation is done after missing values are handled. When INPUTINTERVAL=IRREGULAR, aggregation is done only with the grouping function.
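As an illustration of GROUP with INPUTINTERVAL=MONTH, the Python sketch below combines all observations of one series that fall in the same month, using one of the five listed methods. The function name and the (date, value) record layout are hypothetical. Python's `statistics.mode` returns the first mode it encounters when values tie; the section does not say how the procedure breaks ties.

```python
import statistics
from collections import defaultdict
from datetime import date

# The five GROUP methods mapped to Python functions.
METHODS = {"SUM": sum, "MEAN": statistics.mean, "MODE": statistics.mode,
           "MIN": min, "MAX": max}

def group_by_month(observations, method="SUM"):
    """Combine observations of one series that fall in the same month.

    observations: (datetime.date, value) pairs (hypothetical layout).
    """
    buckets = defaultdict(list)
    for day, value in observations:
        buckets[(day.year, day.month)].append(value)
    combine = METHODS[method]
    return {month: combine(values) for month, values in sorted(buckets.items())}


obs = [(date(2024, 1, 3), 5), (date(2024, 1, 20), 7), (date(2024, 2, 9), 4)]
print(group_by_month(obs))         # {(2024, 1): 12, (2024, 2): 4}
print(group_by_month(obs, "MAX"))  # {(2024, 1): 7, (2024, 2): 4}
```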
AGGREGATE
Specifies the method to be used when the data are aggregated. Aggregation is done when the value for ANALYSISINTERVAL is longer than the value for INPUTINTERVAL; for example, when ANALYSISINTERVAL is YEAR and INPUTINTERVAL is MONTH. The specification for AGGREGATE applies to all time series fields. You can, however, override that specification for particular fields. For example, to aggregate values of temperature using the mean of the values, specify temperature(MEAN). The available methods are the same as for the GROUP keyword.
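The interaction between the default AGGREGATE method and a per-field override such as temperature(MEAN) can be sketched as follows, for quarterly data aggregated to years. This Python example is illustrative only: the function name is hypothetical, and the series are assumed to be regularly spaced and to begin at the start of an analysis period.

```python
import statistics

METHODS = {"SUM": sum, "MEAN": statistics.mean, "MODE": statistics.mode,
           "MIN": min, "MAX": max}

def aggregate_fields(fields, periods_per_group, default="SUM", overrides=None):
    """Aggregate regularly spaced series to a longer interval.

    fields: field name -> values at the input interval.
    periods_per_group: input periods per analysis period (4 quarters per year).
    default: method applied to every field, like AGGREGATE.
    overrides: field name -> method, like temperature(MEAN).
    """
    overrides = overrides or {}
    result = {}
    for name, values in fields.items():
        combine = METHODS[overrides.get(name, default)]
        result[name] = [combine(values[i:i + periods_per_group])
                        for i in range(0, len(values), periods_per_group)]
    return result


quarterly = {"sales": [10, 20, 30, 40, 50, 60, 70, 80],
             "temperature": [1.0, 2.0, 3.0, 6.0, 5.0, 6.0, 7.0, 10.0]}
print(aggregate_fields(quarterly, 4, overrides={"temperature": "MEAN"}))
# {'sales': [100, 260], 'temperature': [3.0, 7.0]}
```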
DISTRIBUTE
Specifies the method to be used when data are distributed. Distribution is done when the value for ANALYSISINTERVAL is shorter than the value for INPUTINTERVAL; for example, when ANALYSISINTERVAL is QUARTER and INPUTINTERVAL is YEAR. The specification for DISTRIBUTE applies to all time series fields. You can, however, override that specification for particular fields. For example, to distribute values of temperature using the mean of the values, specify temperature(MEAN).
INTERPOLATE
Specifies the method to be used for interpolating missing values. The following options are available:
LINT
Replaces missing values by using a linear interpolation. The last valid value before the missing value and the first valid value after the missing value are used for the interpolation. If the first or last observation in the series has a missing value, then the two nearest non-missing values at the beginning or end of the series are used. This option is the default.
MEAN
Replaces missing values with the mean for the entire series.
KMEAN(integer)
Replaces missing values with the mean of the specified number of nearest preceding and subsequent values. The default is 2.
KMEDIAN(integer)
Replaces missing values with the median of the specified number of nearest preceding and subsequent values. The default is 2.
LTREND
Replaces missing values by fitting a simple linear regression model to all non-missing observations in the series and using that model to impute the missing values.
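The five INTERPOLATE options can be compared on a short series. The Python sketch below is one reading of the descriptions, not the procedure's code: missing values are represented as None, the edge rule for LINT is read as linear extrapolation from the two nearest valid values, k for KMEAN and KMEDIAN is read as up to k valid values on each side, and LTREND regresses value on position in the series. All function names are hypothetical.

```python
import statistics

def _valid(series):
    """Indices of non-missing values (None marks a missing value)."""
    return [i for i, v in enumerate(series) if v is not None]

def lint(series):
    """LINT: linear interpolation; needs at least two valid values."""
    valid, out = _valid(series), list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        before = [j for j in valid if j < i]
        after = [j for j in valid if j > i]
        if before and after:
            a, b = before[-1], after[0]
        elif after:                               # gap at the start
            a, b = after[0], after[1]
        else:                                     # gap at the end
            a, b = before[-2], before[-1]
        slope = (series[b] - series[a]) / (b - a)
        out[i] = series[a] + slope * (i - a)
    return out

def mean_fill(series):
    """MEAN: replace missing values with the mean of the whole series."""
    m = statistics.mean(series[j] for j in _valid(series))
    return [m if v is None else v for v in series]

def k_fill(series, k=2, stat=statistics.mean):
    """KMEAN / KMEDIAN: up to k valid values on each side (assumed reading of k)."""
    valid, out = _valid(series), list(series)
    for i, v in enumerate(series):
        if v is None:
            before = [series[j] for j in valid if j < i][-k:]
            after = [series[j] for j in valid if j > i][:k]
            out[i] = stat(before + after)
    return out

def ltrend(series):
    """LTREND: least-squares line of value on position, fitted to valid values."""
    valid = _valid(series)
    t_bar = sum(valid) / len(valid)
    y_bar = sum(series[j] for j in valid) / len(valid)
    sxx = sum((j - t_bar) ** 2 for j in valid)
    sxy = sum((j - t_bar) * (series[j] - y_bar) for j in valid)
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return [intercept + slope * i if v is None else v for i, v in enumerate(series)]


s = [None, 2.0, None, 6.0, 7.0, None]
print(lint(s))                               # [0.0, 2.0, 4.0, 6.0, 7.0, 8.0]
print(mean_fill(s))                          # [5.0, 2.0, 5.0, 6.0, 7.0, 5.0]
print(k_fill(s, stat=statistics.median)[2])  # 6.0
```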
USERMISSING
Specifies whether user-missing values are treated as valid values.
EXCLUDE
Specifies that user-missing values are treated as invalid. This is the default.
INCLUDE
Specifies that user-missing values are treated as valid.
MAXMISSING
Specifies the maximum percentage of missing values that are allowed for any series. Series with more missing values than the specified maximum are excluded from the analysis. The value is specified as a percentage and the default is 25.
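The MAXMISSING rule amounts to a percentage check per series. In this Python sketch the function name is hypothetical and missing values are represented as None; a series exactly at the limit is kept, because only series with more missing values than the maximum are excluded.

```python
def filter_by_missing(series_by_name, max_missing=25.0):
    """Drop series whose percentage of missing (None) values exceeds max_missing."""
    kept = {}
    for name, values in series_by_name.items():
        pct_missing = 100.0 * sum(v is None for v in values) / len(values)
        if pct_missing <= max_missing:   # exactly at the limit is still kept
            kept[name] = values
    return kept


series = {"a": [1, None, 3, 4],      # 25% missing: kept
          "b": [None, None, 3, 4]}   # 50% missing: excluded
print(list(filter_by_missing(series)))  # ['a']
```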
CROSSDAYHRS
Specifies whether observations with times that cross a day boundary are aggregated to the values for the previous day. For example, for hourly observations with an eight-hour day that starts at 20:00, this setting specifies whether observations between 00:00 and 04:00 are included in the aggregated results for the previous day. CROSSDAYHRS only applies when INPUTINTERVAL is HOUR, MINUTE, or SECOND and ANALYSISINTERVAL is DAY.
EXCLUDE
Specifies that observations that cross a day boundary are not aggregated to the results for the previous day. This is the default.
INCLUDE
Specifies that observations that cross a day boundary are aggregated to the results for the previous day.
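For the example of an eight-hour day that starts at 20:00, the Python sketch below decides which calendar day's DAY total receives an hourly observation. The function names are hypothetical. The section only states that under EXCLUDE such observations are not added to the previous day; keeping them on their own calendar date, as the sketch does, is an assumption.

```python
from datetime import datetime, timedelta

def crosses_day_boundary(ts, daystart, hrsday):
    """True if ts falls after midnight in a day that started the previous evening."""
    # The day runs from daystart for hrsday hours; any part past 24:00 spills
    # into the next calendar date, up to hour daystart + hrsday - 24.
    return ts.hour < daystart + hrsday - 24

def aggregation_day(ts, daystart, hrsday, crossdayhrs="EXCLUDE"):
    """Calendar date whose DAY total receives the observation at ts."""
    if crossdayhrs == "INCLUDE" and crosses_day_boundary(ts, daystart, hrsday):
        return ts.date() - timedelta(days=1)
    # Assumption: under EXCLUDE the observation stays on its own calendar date.
    return ts.date()


ts = datetime(2024, 1, 6, 2)  # 02:00 inside a day that began at 20:00 on 5 January
print(aggregation_day(ts, daystart=20, hrsday=8, crossdayhrs="INCLUDE"))  # 2024-01-05
print(aggregation_day(ts, daystart=20, hrsday=8))                         # 2024-01-06
```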