Time fields in forecasting data
A time field is identified by a time icon in front of the field label in the Data pane.
You can define a time field through either of the following properties: Data type or Represents Time.
Data type
A field is recognized as a time field if it has one of the following data types: Date, Time, or Timestamp. Data type is inherited from the data source and cannot be changed.
Date, Time, and Timestamp data types are designed to support the full range of date and time formats that are covered by the ISO 8601 basic and extended formats. The following table shows the supported data types together with an example of format and a data example for each.
Data type | Format example | Data example |
---|---|---|
Date | yyyy-mm-dd | 2019-07-01 |
Time | hh:mm:ss | 12:34:56 |
Timestamp | yyyy-mm-dd'T'hh:mm:ss | 2019-07-01T12:34:56 |
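To check whether values match the formats in the table before import, a small script can help. The following is a minimal sketch using Python's standard `datetime` module. The function name `classify_time_value` is hypothetical, and the sketch covers only the three extended-format examples shown in the table, not the full ISO 8601 range.

```python
from datetime import datetime

# Format strings mirror the three table rows (an assumption: only the
# extended-format examples shown in the table are covered here).
FORMATS = (
    ("Date", "%Y-%m-%d"),
    ("Time", "%H:%M:%S"),
    ("Timestamp", "%Y-%m-%dT%H:%M:%S"),
)

def classify_time_value(text):
    """Return the matching data type name, or None if no format matches."""
    for kind, fmt in FORMATS:
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue
        return kind
    return None

if __name__ == "__main__":
    for sample in ("2019-07-01", "12:34:56", "2019-07-01T12:34:56", "07/01/2019"):
        print(sample, "->", classify_time_value(sample))
```

A value that returns `None` here is a candidate for the Unsupported date format error and probably needs to be reformatted first.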
Represents Time
A field is recognized as a time field if the data property Represents is set to Time. Text and Integer fields that contain time data are also recognized as time fields. Time fields are defined automatically during data import or enrichment. The possible definitions are Date, Year, Quarter, Season, Month, Week, Day, Hour, Minute, or Second.
If time fields are not recognized automatically, you can specify them as time fields. Ensure that the field values are in one of the supported formats; otherwise, you might receive an Unsupported date format error.
Nested time fields
You can drag multiple time fields into the same visualization slot to specify a nested time field. For example, a field that represents Week can be dragged into the slot along with a field that represents Day to create a forecast by Days of the Week.
Nested fields in the slot must be in time hierarchy order. For example, Week must be placed above Day.
Nested fields cannot skip levels in the time hierarchy when doing so would result in ambiguity. The following table describes acceptable hierarchies.
Time field | Acceptable lower fields |
---|---|
Year | Quarter, Month, Week, Day |
Quarter | Month |
Month | Day |
Week | Day |
Day (of Year, Month, or Week) | Hour, Time |
Hour | Minute |
Minute | Second |
If Year is absent in the time hierarchy, then the system defaults to the current year. This can cause issues due to differences between leap and non-leap years. Consider providing the Year in such instances.
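The acceptable hierarchies table can be expressed as a lookup, which makes it easy to test a proposed nesting before building a visualization. This is an illustrative sketch only: `ACCEPTABLE_LOWER` and `is_valid_nesting` are hypothetical names, and the rule that every adjacent pair of fields must appear in the table is an assumption about how the product applies it.

```python
# Each key is a time field; the value is the set of fields that the
# table lists as acceptable directly below it.
ACCEPTABLE_LOWER = {
    "Year": {"Quarter", "Month", "Week", "Day"},
    "Quarter": {"Month"},
    "Month": {"Day"},
    "Week": {"Day"},
    "Day": {"Hour", "Time"},
    "Hour": {"Minute"},
    "Minute": {"Second"},
}

def is_valid_nesting(fields):
    """Check a slot's fields, listed from top to bottom.

    Assumption: every adjacent (upper, lower) pair must appear in the table.
    """
    return all(
        lower in ACCEPTABLE_LOWER.get(upper, set())
        for upper, lower in zip(fields, fields[1:])
    )

if __name__ == "__main__":
    print(is_valid_nesting(["Week", "Day"]))   # Week above Day
    print(is_valid_nesting(["Day", "Week"]))   # wrong order
    print(is_valid_nesting(["Year", "Hour"]))  # skips levels
```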
Chronological data order
Specified time fields define a chronological order for the time points in the visualization. They are used to sort the points on the visualization in chronological order when forecasting is enabled. The chronological order includes the historical points, along with the new forecasted points. Any other sorting criteria that are specified for the visualization are ignored when forecasting is enabled. For example, the first day of the week is always Sunday even if you specify Monday as the first day of the week in the custom sort order.
Any invalid time labels are moved to the beginning of the sequence and excluded from building the model and computing the forecast.
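The ordering behavior described above can be approximated as follows. This is a sketch under stated assumptions, not the product's implementation: labels are assumed to be ISO dates, and `order_points` is a hypothetical helper.

```python
from datetime import date

def _parse(label):
    """Return a date for a valid ISO label, or None for an invalid one."""
    try:
        return date.fromisoformat(label)
    except ValueError:
        return None

def order_points(points):
    """Order (label, value) pairs chronologically.

    Returns (display_order, model_input): invalid labels lead the display
    order and are left out of the list used for modeling.
    """
    invalid = [p for p in points if _parse(p[0]) is None]
    valid = sorted(
        (p for p in points if _parse(p[0]) is not None),
        key=lambda p: _parse(p[0]),
    )
    return invalid + valid, valid

if __name__ == "__main__":
    data = [("2019-07-03", 3), ("n/a", 9), ("2019-07-01", 1)]
    display_order, model_input = order_points(data)
    print(display_order)
    print(model_input)
```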
Time interval detection
Time interval detection is possible when the data is ordered chronologically. The time interval is the size of the smallest interval between any two adjacent time points, such as "2 weeks". If varying time intervals are detected, they must all be integer multiples of the smallest interval. Otherwise, the data is deemed irregular and cannot be forecast. Missing time points that arise where a gap spans multiple intervals are filled in at the detected interval, and the corresponding measure values are set to missing. If the number of missing values is larger than 33% of the series length, a Too many missing values error is reported.
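The detection rules can be illustrated with a short sketch. Several things here are assumptions: the function name `detect_interval` is hypothetical, intervals are treated as fixed-length `timedelta` values (calendar units such as months would need different handling), and "series length" is taken to mean the length of the series after the missing points are filled in.

```python
from datetime import date, timedelta

def detect_interval(dates, max_missing_ratio=0.33):
    """Detect the interval of a sorted list of at least two unique dates.

    Returns (step, filled_dates, missing_dates). Raises ValueError for
    irregular data or when too many points are missing.
    """
    gaps = [later - earlier for earlier, later in zip(dates, dates[1:])]
    step = min(gaps)
    # Every gap must be an integer multiple of the smallest gap.
    if any(gap % step for gap in gaps):
        raise ValueError("Irregular time intervals")
    count = (dates[-1] - dates[0]) // step + 1
    filled = [dates[0] + i * step for i in range(count)]
    present = set(dates)
    missing = [d for d in filled if d not in present]
    # Assumption: the 33% threshold is relative to the filled series length.
    if len(missing) > max_missing_ratio * len(filled):
        raise ValueError("Too many missing values")
    return step, filled, missing

if __name__ == "__main__":
    series = [date(2019, 7, 1), date(2019, 7, 15), date(2019, 8, 12)]
    step, filled, missing = detect_interval(series)
    print(step.days, "days;", "missing:", missing)
```

In this example the gaps are 14 and 28 days, so the detected interval is 2 weeks and one time point (2019-07-29) is filled in with a missing measure value.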
Measure fields
One or more fields of any type can be specified as measure fields for forecasting analysis by adding them to a corresponding visualization slot. Each measure field is analyzed separately. Multiple time series can also be specified by adding a field to the Color slot, splitting the measure values by the categories of the specified field.
All measure field values that correspond to the same time point are summarized by using one of the following summarization levels: Sum, Minimum, Maximum, Average, Count, and Count distinct. The field must be numeric to support Sum, Minimum, Maximum, or Average summarization. All possible data types and summarization levels are supported for forecasting. However, consider the following points:
- A small number of distinct measure values can result in unexpected or uninformative forecasts, for example, when the Count distinct summary is used.
- Zero measure values can unduly influence results, especially when they represent missing measurements.
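The summarization step can be pictured as grouping rows by time point and reducing each group to one value. The following sketch is illustrative only; `SUMMARIES` and `summarize` are hypothetical names, and the row layout of (time point, value) pairs is an assumption.

```python
from collections import defaultdict
from statistics import mean

# One reducer per summarization level named in the text.
SUMMARIES = {
    "Sum": sum,
    "Minimum": min,
    "Maximum": max,
    "Average": mean,
    "Count": len,
    "Count distinct": lambda values: len(set(values)),
}

def summarize(rows, level):
    """Collapse (time_point, value) rows to one value per time point."""
    groups = defaultdict(list)
    for time_point, value in rows:
        groups[time_point].append(value)
    reducer = SUMMARIES[level]
    return {tp: reducer(values) for tp, values in sorted(groups.items())}

if __name__ == "__main__":
    rows = [
        ("2019-07-01", 4),
        ("2019-07-01", 6),
        ("2019-07-02", 5),
        ("2019-07-01", 4),
    ]
    print(summarize(rows, "Sum"))
    print(summarize(rows, "Count distinct"))
```

Count distinct reduces each time point to a small integer, which shows why that summary can leave the forecast with few different values to work with.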
Interpolating missing values
Missing values are computed and filled in by the Linear Interpolation algorithm. The computation is based on the nearest neighbors in a chronologically ordered time series with a detected time interval. The new value is (previous value + next value) / 2. For example, with the values [3, 6, missing, 12], the interpolated value to replace the missing value is (6 + 12) / 2, or 9. The interpolation algorithm can also handle contiguous missing values.
Data points with missing values at the first or the last historical time points are excluded from the series before the model is built. Missing values at the last historical time points are then forecast along with the future time points.
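The interpolation of interior missing values can be sketched as follows. For a single missing point, the weighted formula reduces to (previous value + next value) / 2. How the product spaces values across a run of contiguous missing points is not stated, so the evenly spaced weighting used here is an assumption, and `interpolate` is a hypothetical name. Missing values at either end of the series are left untouched, because they have no neighbor on one side.

```python
def interpolate(values):
    """Fill interior None entries by linear interpolation.

    Leading and trailing None entries are left as None.
    """
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        delta = result[right] - result[left]
        for i in range(left + 1, right):
            # Evenly spaced between the two nearest known neighbors.
            result[i] = result[left] + delta * (i - left) / span
    return result

if __name__ == "__main__":
    print(interpolate([3, 6, None, 12]))
    print(interpolate([2, None, None, 8]))
```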