DATASETTINGS subcommand (TCM MODEL command)
The DATASETTINGS subcommand specifies settings that are used to prepare the data for the analysis. It includes the following settings:
- The dimension fields that define each time series for multidimensional data.
- The field, or fields, that define the observations, the time interval between observations, and the time interval for the analysis.
- The method by which data are aggregated or distributed when the time interval for the analysis differs from the time interval between the observations.
- Handling of missing values.
- DIMENSIONS
- Specifies the dimension fields, if any, that identify each time series. Specify one or more field names. The keyword TO is supported but the keyword ALL is not supported.
- MAXDIMVALUES
- Specifies the maximum number of distinct values per dimension field. The default is 10000. The value must be an integer that is greater than 100.
- TIMETYPE
- Specifies how the time of each observation is defined. The keyword is required unless the active dataset has a date specification, in which case it cannot be specified. Date specifications are created with the DATE command.
- DATETIMEVAR
- Specifies that a field with a date, datetime, or time (duration) format defines the observations.
- ISO8601
- Specifies that the observations are defined by a string variable that represents dates or times (durations). Dates are specified in the form yyyy-mm-dd. Datetimes are specified in the form yyyy-mm-dd hh:mm:ss. Times are specified in the form hh:mm:ss. Two digits are not required for month, day, hours, minutes or seconds specifications. For example, 2014-1-1 is interpreted as 2014-01-01 and 3:2:1 is interpreted as 03:02:01.
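The lenient digit rule for ISO8601 strings can be illustrated with a short Python sketch. This is not SPSS syntax and it is not how the procedure parses values; the helper name `parse_observation` and its return types are assumptions chosen for illustration only.

```python
import re
from datetime import date, datetime, timedelta

# Hypothetical helper (not part of SPSS): accepts one or two digits for
# month, day, hours, minutes, and seconds, as the description allows.
_DATE = re.compile(r"^(\d{4})-(\d{1,2})-(\d{1,2})$")
_TIME = re.compile(r"^(\d{1,2}):(\d{1,2}):(\d{1,2})$")

def parse_observation(text):
    """Return a date, datetime, or timedelta (duration) for the string."""
    parts = text.strip().split()
    if len(parts) == 2:  # datetime: date part followed by time part
        d, t = _DATE.match(parts[0]), _TIME.match(parts[1])
        if d and t:
            year, month, day = map(int, d.groups())
            hour, minute, second = map(int, t.groups())
            return datetime(year, month, day, hour, minute, second)
    elif len(parts) == 1:
        d = _DATE.match(parts[0])
        if d:
            return date(*map(int, d.groups()))
        t = _TIME.match(parts[0])
        if t:
            hour, minute, second = map(int, t.groups())
            return timedelta(hours=hour, minutes=minute, seconds=second)
    raise ValueError(f"unrecognized date/time string: {text!r}")

print(parse_observation("2014-1-1").isoformat())  # 2014-01-01
```

In this sketch a time string is treated as a duration, which matches the "time (duration)" wording above.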
- CYCLICPERIODS
- Specifies that the observations are defined by one or more integer fields that define a hierarchy of periodic levels. The field that defines the lowest level is referred to as the period field. Fields that define higher levels are referred to as cycle fields.
With this structure, you can describe series of observations that don't fit one of the standard time intervals. For example, a fiscal year with only 10 months can be described with a cycle field that represents years and a period field that represents months, where the length of one cycle is 10.
Field values for each level, except the highest, must be periodic with respect to the next highest level. Values for the highest level cannot be periodic. For example, in the case of the 10-month fiscal year, months are periodic within years and years are not periodic.
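The 10-month fiscal year example can be made concrete with a small Python sketch that maps each (cycle, period) pair to a position in the series. It is an illustration only, not SPSS syntax; the function name `observation_index` and its parameters are hypothetical.

```python
def observation_index(cycle_value, period_value, first_cycle, length, start=1):
    """Map a (cycle, period) pair to a 0-based position in the series.

    Assumes one cycle field (for example, year) and one period field
    (for example, month) whose values run from start to start + length - 1.
    """
    if not start <= period_value < start + length:
        raise ValueError("period value is outside the cycle")
    return (cycle_value - first_cycle) * length + (period_value - start)

# A fiscal year with 10 months: month 10 of 2020 is followed by month 1 of 2021.
records = [(2020, 9), (2020, 10), (2021, 1), (2021, 2)]
positions = [observation_index(y, m, first_cycle=2020, length=10) for y, m in records]
print(positions)  # [8, 9, 10, 11]
```

The months repeat within each year (periodic), while the year values only increase (not periodic), which is the structure the paragraph above describes.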
- NONE
- Specifies that the observations are defined by record order so that the first record represents the first observation, the second record represents the second observation, and so on. It is then assumed that the records represent observations that are equally spaced in time. The value NONE cannot be used when the DIMENSIONS keyword is specified.
- TIMEVAR
- Specifies the field, or fields, that define each observation. It is required when TIMETYPE=DATETIMEVAR, TIMETYPE=ISO8601, or TIMETYPE=CYCLICPERIODS.
- var
- Specifies the date, datetime, or time field that defines the observations when TIMETYPE=DATETIMEVAR. It also specifies the string field that defines the observations when TIMETYPE=ISO8601. Specify a field name for var.
- var(TYPE=PERIOD [INCREMENT=integer] [START=integer])
- Specifies the period field when TIMETYPE=CYCLICPERIODS. Specify a field name for var.
- The optional INCREMENT keyword specifies the integer increment between successive observations. The default is 1.
- The optional START keyword specifies the starting value when there are one or more cycle fields. The value must be a positive integer. The default is 1. For example, a field with the periodic sequence 2,3,4,5 has a starting value of 2. START is ignored if there are no cycle fields.
- var(TYPE=CYCLE LEVEL=integer LENGTH=integer [START=integer])
- Specifies a cycle field when TIMETYPE=CYCLICPERIODS. Specify a field name for var. An arbitrary number of cycle fields can be specified.
- The LEVEL keyword is required and must specify a positive integer. The lowest level cycle field must have LEVEL=1. The next cycle field in the hierarchy must have LEVEL=2, and so on.
- The LENGTH keyword is required and specifies the length of the cycle. The length of a cycle at a particular level is the periodicity of the next lowest level. Consider the example of a 10-month fiscal year that is described by a period field for month and a first level cycle field for year. The cycle field for year has a length of 10 since the next lowest level represents months and there are 10 months in the specified fiscal year. The value of LENGTH must be a positive integer.
- The optional START keyword specifies the starting value when the cycle field is periodic. The value must be a positive integer, and the default is 1. This setting is necessary for detecting missing values. For example, if a periodic field starts from 2 but the starting value is specified as 1, then the procedure assumes that there is a missing value for the first period in each cycle of that field. START is ignored for the highest level cycle field.
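The effect of START on missing-value detection can be illustrated with a small Python sketch. It is not SPSS syntax and not the procedure's implementation; the function name `missing_periods` and the representation of one cycle as a list of integers are assumptions made for illustration.

```python
def missing_periods(observed, length, start=1):
    """List the period values that a cycle should contain but does not.

    The expected values run from start to start + length - 1. Values in
    `observed` that fall outside that range are ignored in this sketch.
    """
    seen = set(observed)
    return [p for p in range(start, start + length) if p not in seen]

observed = [2, 3, 4, 5]  # one cycle of a periodic field that starts from 2

print(missing_periods(observed, length=4, start=2))  # []
print(missing_periods(observed, length=4, start=1))  # [1]
```

With the correct starting value nothing is reported as missing. With the default of 1, the first period of every cycle looks missing, which is the situation the START description warns about.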
- INPUTINTERVAL
- Specifies the interval between successive observations and any additional information needed to specify the sequence of observations. The keyword can be specified only when TIMETYPE=DATETIMEVAR or TIMETYPE=ISO8601.
- AUTODETECT
- Specifies that the interval between successive observations will be automatically determined from the data. This is the default.
- IRREGULAR
- Specifies that successive observations are not equally spaced, such as observations that represent the time at which a sales order is processed. When IRREGULAR is specified, you must specify ANALYSISINTERVAL.
- YEAR
- Specifies that the interval between successive observations is one or more years. Use the INCREMENT keyword to specify the number of years between successive observations. The default is 1.
- QUARTER
- Specifies that the interval between successive observations is one quarter.
- MONTH
- Specifies that the interval between successive observations is one or more months. Use the INCREMENT keyword to specify the number of months between successive observations. The default is 1.
- WEEK
- Specifies that the interval between successive observations is one week.
- DAY
- Specifies that the interval between successive observations is one or more days.
- Use the WKSTART keyword to specify the start of the week. The default is SUN, which specifies Sunday.
- Use the DAYSWK keyword to specify the number of days per week. Specify an integer between 1 and 7. The default is 7.
- Use the INCREMENT keyword to specify the number of days between successive observations. The default is 1.
- HOUR
- Specifies that the interval between successive observations is one or more hours and that observations are represented as a datetime.
- Use the WKSTART keyword to specify the start of the week. The default is SUN, which specifies Sunday.
- Use the DAYSWK keyword to specify the number of days per week. Specify an integer between 1 and 7. The default is 7.
- Use the HRSDAY keyword to specify the number of hours per day. Specify an integer between 1 and 24. The default is 24.
- Use the DAYSTART keyword to specify the starting hour of the day. Specify an integer between 0 and 23. The default is 0.
- Use the INCREMENT keyword to specify the number of hours between successive observations. The default is 1.
- MINUTE
- Specifies that the interval between successive observations is one or more minutes and that observations are represented as a datetime.
- Use the WKSTART keyword to specify the start of the week. The default is SUN, which specifies Sunday.
- Use the DAYSWK keyword to specify the number of days per week. Specify an integer between 1 and 7. The default is 7.
- Use the HRSDAY keyword to specify the number of hours per day. Specify an integer between 1 and 24. The default is 24.
- Use the DAYSTART keyword to specify the starting hour of the day. Specify an integer between 0 and 23. The default is 0.
- Use the INCREMENT keyword to specify the number of minutes between successive observations. The default is 1.
- SECOND
- Specifies that the interval between successive observations is one or more seconds and that observations are represented as a datetime.
- Use the WKSTART keyword to specify the start of the week. The default is SUN, which specifies Sunday.
- Use the DAYSWK keyword to specify the number of days per week. Specify an integer between 1 and 7. The default is 7.
- Use the HRSDAY keyword to specify the number of hours per day. Specify an integer between 1 and 24. The default is 24.
- Use the DAYSTART keyword to specify the starting hour of the day. Specify an integer between 0 and 23. The default is 0.
- Use the INCREMENT keyword to specify the number of seconds between successive observations. The default is 1.
- DURATION
- Specifies that observations are represented by a field with a time (duration) format, as in hh:mm:ss.
- The INTERVAL keyword specifies the interval between observations and is required.
- Use the INCREMENT keyword to specify the number of hours, minutes, or seconds between observations. The default is 1.
- ANALYSISINTERVAL
- Specifies the time interval for the analysis. For example, if the time interval of the observations is Days, you might choose Months for the time interval for analysis. The data are then aggregated from daily to monthly data before the model is built. You can also choose to distribute the data from a longer to a shorter time interval. For example, if the observations are quarterly then you can distribute the data from quarterly to monthly data.
- USEINPUT specifies that the time interval for analysis is the same as the interval between the observations. It is the default unless INPUTINTERVAL=IRREGULAR. If INPUTINTERVAL=IRREGULAR, ANALYSISINTERVAL must be specified and it cannot be specified as USEINPUT.
- CYCLE(LEVEL=integer) applies when TIMETYPE=CYCLICPERIODS and there are one or more cycle fields (in addition to the period field). It aggregates the data to the cycle level that is specified by the LEVEL keyword. The allowed values for LEVEL are 1 to the number of cycle fields.
Note: When TIMETYPE=CYCLICPERIODS, only CYCLE(LEVEL=integer) or USEINPUT can be specified for ANALYSISINTERVAL.
- When TIMETYPE=DATETIMEVAR or TIMETYPE=ISO8601, the following rules apply:
- ANALYSISINTERVAL can be set to any value that is longer than, or equal to, the value of INPUTINTERVAL. For example, if INPUTINTERVAL=QUARTER then you can set ANALYSISINTERVAL to QUARTER or YEAR. The one exception is that when INPUTINTERVAL=WEEK, ANALYSISINTERVAL cannot be set to MONTH or QUARTER.
- ANALYSISINTERVAL can be set to certain values that are shorter than the value of INPUTINTERVAL. The values of INPUTINTERVAL, followed by the allowed values of ANALYSISINTERVAL for this case, are as follows:
- YEAR: QUARTER, MONTH
- QUARTER: MONTH
- MONTH: DAY
- WEEK: DAY
- DAY: HOUR
- HOUR or DURATION(INTERVAL=HOUR): MINUTE
- MINUTE or DURATION(INTERVAL=MINUTE): SECOND
Note: If INCREMENT > 1 (for INPUTINTERVAL) then ANALYSISINTERVAL cannot be shorter than INPUTINTERVAL.
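The rules for TIMETYPE=DATETIMEVAR and TIMETYPE=ISO8601 can be summarized as a small Python check. This is a sketch of the rules as stated here, not SPSS syntax or the procedure's own validation. The function name `analysis_interval_allowed` is hypothetical, the ordering of intervals from shortest to longest is an assumption, and IRREGULAR and DURATION inputs are left out for brevity.

```python
# Assumed ordering of intervals, shortest to longest.
ORDER = ["SECOND", "MINUTE", "HOUR", "DAY", "WEEK", "MONTH", "QUARTER", "YEAR"]

# Shorter analysis intervals that the list above allows for each input interval.
DISTRIBUTE_TO = {
    "YEAR": {"QUARTER", "MONTH"},
    "QUARTER": {"MONTH"},
    "MONTH": {"DAY"},
    "WEEK": {"DAY"},
    "DAY": {"HOUR"},
    "HOUR": {"MINUTE"},
    "MINUTE": {"SECOND"},
}

def analysis_interval_allowed(input_interval, analysis_interval, increment=1):
    """Return True if the combination is permitted by the stated rules."""
    i = ORDER.index(input_interval)
    a = ORDER.index(analysis_interval)
    if a >= i:
        # Longer or equal is allowed, except WEEK to MONTH or QUARTER.
        return not (input_interval == "WEEK"
                    and analysis_interval in {"MONTH", "QUARTER"})
    # Shorter is allowed only for listed pairs and only when INCREMENT is 1.
    return increment == 1 and analysis_interval in DISTRIBUTE_TO.get(input_interval, set())

print(analysis_interval_allowed("QUARTER", "YEAR"))            # True
print(analysis_interval_allowed("WEEK", "MONTH"))              # False
print(analysis_interval_allowed("MONTH", "DAY", increment=2))  # False
```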
- When the active dataset has a date specification, aggregation is supported but distribution is not supported. For example, if the date specification is YEAR, QUARTER then ANALYSISINTERVAL can be set to YEAR but it cannot be set to MONTH.
- You cannot aggregate from MONTH to QUARTER for a date specification.
- If the date specification has a cycle field, then CYCLE(LEVEL=1) can be specified for ANALYSISINTERVAL.
- GROUP
- Specifies the method to use when multiple observations occur in the same time interval. For example, if INPUTINTERVAL is MONTH then multiple dates in the same month are grouped together. The specification for GROUP applies to all time series fields. You can, however, override that specification for particular fields. For example, to group values of temperature using the mean of the values, specify temperature(MEAN). The following methods are available:
- SUM
- The sum of the original values. This is the default.
- MEAN
- The mean of the original values.
- MODE
- The mode of the original values.
- MIN
- The minimum of the original values.
- MAX
- The maximum of the original values.
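The grouping methods listed above can be illustrated with a short Python sketch that collapses several observations in the same month into one value. It is not SPSS syntax and does not reproduce the procedure's internals; the function name `group_by_month` and the (date, value) tuple layout are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, mode

# One Python function per grouping method named in the list.
METHODS = {"SUM": sum, "MEAN": mean, "MODE": mode, "MIN": min, "MAX": max}

def group_by_month(observations, method="SUM"):
    """Combine all values that fall in the same (year, month) interval."""
    buckets = defaultdict(list)
    for day, value in observations:
        buckets[(day.year, day.month)].append(value)
    func = METHODS[method]
    return {key: func(values) for key, values in sorted(buckets.items())}

sales = [
    (date(2024, 1, 3), 10),
    (date(2024, 1, 20), 30),
    (date(2024, 2, 5), 7),
]

print(group_by_month(sales))  # {(2024, 1): 40, (2024, 2): 7}
print(group_by_month(sales, "MAX"))
```

A per-field override such as temperature(MEAN) would correspond to calling the function with a different method for that one field.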
Note: Although grouping is a form of aggregation, it is done before any handling of missing values whereas formal aggregation is done after any missing values are handled. When the time interval of the observations is specified as Irregular, aggregation is done only with the grouping function.
- AGGREGATE
- Specifies the method to be used when the data are aggregated. Aggregation is done when the value for ANALYSISINTERVAL is longer than the value for INPUTINTERVAL; for example, when ANALYSISINTERVAL is YEAR and INPUTINTERVAL is MONTH. The specification for AGGREGATE applies to all time series fields. You can, however, override that specification for particular fields. For example, to aggregate values of temperature using the mean of the values, specify temperature(MEAN). The available methods are the same as for the GROUP keyword.
- DISTRIBUTE
- Specifies the method to be used when data are distributed. Distribution is done when the value for ANALYSISINTERVAL is shorter than the value for INPUTINTERVAL; for example, when ANALYSISINTERVAL is QUARTER and INPUTINTERVAL is YEAR. The specification for DISTRIBUTE applies to all time series fields. You can, however, override that specification for particular fields. For example, to distribute values of temperature using the mean of the values, specify temperature(MEAN).
- INTERPOLATE
- Specifies the method to be used for interpolating missing values. The following options are available:
- LINT
- Replaces missing values by using a linear interpolation. The last valid value before the missing value and the first valid value after the missing value are used for the interpolation. If the first or last observation in the series has a missing value, then the two nearest non-missing values at the beginning or end of the series are used. This option is the default.
- MEAN
- Replaces missing values with the mean for the entire series.
- KMEAN(integer)
- Missing values are replaced by the mean of the specified number of nearest preceding and subsequent values. The default is 2.
- KMEDIAN(integer)
- Missing values are replaced by the median of the specified number of nearest preceding and subsequent observations. The default is 2.
- LTREND
- This option uses all non-missing observations in the series to fit a simple linear regression model, which is then used to impute the missing values.
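Three of the interpolation options can be sketched in Python to show what each one computes. This is an illustration under stated assumptions, not the procedure's implementation. Missing values are represented by `None`, the function names are hypothetical, and KMEAN and KMEDIAN are read here as using up to k valid values on each side of the gap.

```python
from statistics import mean, median

def lint(series):
    """LINT: linear interpolation between the nearest valid neighbors."""
    valid = [i for i, v in enumerate(series) if v is not None]
    if len(valid) < 2:
        raise ValueError("need at least two non-missing values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        before = [j for j in valid if j < i]
        after = [j for j in valid if j > i]
        if before and after:
            a, b = before[-1], after[0]
        elif after:   # gap at the start: use the first two valid values
            a, b = after[0], after[1]
        else:         # gap at the end: use the last two valid values
            a, b = before[-2], before[-1]
        slope = (series[b] - series[a]) / (b - a)
        out[i] = series[a] + slope * (i - a)
    return out

def _neighbors(series, i, k):
    """Up to k valid values before position i and up to k after it."""
    before = [x for x in series[:i] if x is not None][-k:]
    after = [x for x in series[i + 1:] if x is not None][:k]
    return before + after

def kmean(series, k=2):
    """KMEAN(k): mean of the nearest valid values around each gap."""
    return [mean(_neighbors(series, i, k)) if v is None else v
            for i, v in enumerate(series)]

def kmedian(series, k=2):
    """KMEDIAN(k): median of the nearest valid values around each gap."""
    return [median(_neighbors(series, i, k)) if v is None else v
            for i, v in enumerate(series)]

print(lint([None, 2.0, 4.0, None, 8.0, None]))  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(kmean([1.0, 2.0, None, 4.0, 9.0]))        # [1.0, 2.0, 4.0, 4.0, 9.0]
print(kmedian([1.0, 2.0, None, 4.0, 9.0]))      # [1.0, 2.0, 3.0, 4.0, 9.0]
```

The first and last results of `lint` show the edge rule from the LINT description: a missing value at either end is filled from the two nearest valid values on that side.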
- USERMISSING
- Specifies whether user-missing values are treated as valid values.
- EXCLUDE
- Specifies that user-missing values are treated as invalid. This is the default.
- INCLUDE
- Specifies that user-missing values are treated as valid.
- MAXMISSING
- Specifies the maximum percentage of missing values that are allowed for any series. Series with more missing values than the specified maximum are excluded from the analysis. The value is specified as a percentage and the default is 25.
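The MAXMISSING threshold amounts to a percentage comparison per series, sketched here in Python. The function name `within_max_missing` and the use of `None` for a missing value are assumptions for illustration; this is not SPSS syntax.

```python
def within_max_missing(series, max_missing=25.0):
    """Return True if the series is kept, False if it would be excluded.

    A series is excluded only when its percentage of missing values is
    greater than max_missing, so a series exactly at the limit is kept.
    """
    missing = sum(1 for v in series if v is None)
    return 100.0 * missing / len(series) <= max_missing

print(within_max_missing([1, None, 3, 4]))     # True  (25% missing)
print(within_max_missing([1, None, None, 4]))  # False (50% missing)
```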
- CROSSDAYHRS
- Specifies whether observations with times that cross a day boundary are aggregated to the values for the previous day. For example, for hourly observations with an eight-hour day that starts at 20:00, this setting specifies whether observations between 00:00 and 04:00 are included in the aggregated results for the previous day. CROSSDAYHRS only applies when INPUTINTERVAL is HOUR, MINUTE, or SECOND and ANALYSISINTERVAL is DAY.
- EXCLUDE
- Specifies that observations that cross a day boundary are not aggregated to the results for the previous day. This is the default.
- INCLUDE
- Specifies that observations that cross a day boundary are aggregated to the results for the previous day.
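The eight-hour day that starts at 20:00 can be worked through in a short Python sketch. It is an illustration only, not SPSS syntax. The function name `assigned_day` is hypothetical, and the behavior shown for the EXCLUDE case (leaving the observation on its own calendar date) is an assumption, since the description only says that such observations are not added to the previous day.

```python
from datetime import datetime, timedelta

def assigned_day(ts, day_start=20, hrs_day=8, include_cross_day=True):
    """Return the calendar date whose daily result receives this observation."""
    end = day_start + hrs_day  # 28 here: the day runs past midnight to 04:00
    crosses = end > 24 and ts.hour < end - 24
    if crosses and include_cross_day:
        return (ts - timedelta(days=1)).date()
    return ts.date()

late_evening = datetime(2024, 3, 1, 21, 0)
after_midnight = datetime(2024, 3, 2, 1, 0)

print(assigned_day(late_evening))                             # 2024-03-01
print(assigned_day(after_midnight))                           # 2024-03-01
print(assigned_day(after_midnight, include_cross_day=False))  # 2024-03-02
```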