Building a time series experiment

Use AutoAI to create a time series experiment to predict future activity (such as stock prices or temperatures) over a specified date/time range.

Tech preview: This is a technology preview and is not yet supported for use in production environments.

Time series overview

A time series experiment is a method of forecasting that uses historical observations to predict future values. The experiment automatically builds many pipelines that incorporate machine learning models, such as random forest regression and support vector machines (SVMs), as well as statistical time series models, such as ARIMA and Holt-Winters, and then recommends the best pipeline according to the pipeline performance evaluated on a holdout data set or backtest data sets.
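
As a point of reference for the statistical model families named above, the following is a minimal standalone sketch of a Holt-Winters forecast on synthetic monthly data, using the statsmodels library. AutoAI builds, tunes, and ranks such models for you; this example only illustrates the underlying technique.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly series with a trend and yearly seasonality (hypothetical data)
index = pd.date_range("2018-01-01", periods=60, freq="MS")
values = 100 + 0.5 * np.arange(60) + 10 * np.sin(2 * np.pi * np.arange(60) / 12)
series = pd.Series(values, index=index)

# Fit a Holt-Winters model with additive trend and additive seasonality
model = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=12)
fitted = model.fit()

# Forecast the next 12 time points (the "forecast window")
print(fitted.forecast(12))
```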

Unlike a standard AutoAI experiment, which builds a set of pipelines to completion and then ranks them, a time series experiment evaluates pipelines earlier in the process and completes and tests only the best-performing pipelines.

AutoAI time series pipeline generation process

For details on the various stages of training and testing a time series experiment, see Time series implementation details.

Data requirements

Note the current data requirements for training a time series experiment.

Choose data from your project, or upload it from your file system or from the asset browser, and then click Continue. You can click the Preview icon next to the data source name to review your data. Optionally, add a second file as holdout data for testing the trained pipelines.
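
As an illustration of the expected shape of the training data, the hypothetical example below has one date/time column and two numeric columns that could be used as prediction columns; the column names and values are invented.

```python
import pandas as pd

# Hypothetical training data: a date/time column plus one or more numeric
# prediction columns, with one row per time point, in time order.
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=5, freq="D"),
    "store_sales": [120.0, 132.5, 101.0, 98.5, 140.0],
    "online_sales": [80.0, 75.5, 90.0, 88.0, 95.0],
})
print(df)
```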

Choosing a compute configuration

When you configure your experiment, you can choose a compute configuration that supplies the computing resources for your experiment. The configuration you choose governs how many prediction targets your experiment can have.

Compute configuration Resources Limits
Medium 4 CPU and 16 GB RAM Up to 10 prediction columns
Large 8 CPU and 32 GB RAM Up to 20 prediction columns

Configuring a time series experiment

When you configure the details for an experiment, click Yes to Enable time series, and fill in the experiment details.

Field Description
Prediction columns The time series columns that you want to predict based on the previous values. You can specify one or more columns to predict.
Date/time column The column that indicates the date/time at which the time series values occur.
Lookback window A parameter that indicates how many previous time series values are used to predict the current time point.
Forecast window The range you want to predict based on the data in the lookback window.

The prediction summary at the bottom of the panel shows you the type of experiment, the CUH allocated, and the optimized metric selected for the experiment.
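
To make the lookback window and forecast window concrete, the sketch below (an illustration only, not the AutoAI implementation) turns a univariate series into supervised-learning samples: each sample's features are the previous lookback values, and its targets are the next forecast values.

```python
import numpy as np

def make_windows(series, lookback, forecast):
    """Slice a 1-D series into (features, targets) pairs where the features
    are the previous `lookback` values and the targets are the next
    `forecast` values."""
    X, y = [], []
    for start in range(len(series) - lookback - forecast + 1):
        X.append(series[start : start + lookback])
        y.append(series[start + lookback : start + lookback + forecast])
    return np.array(X), np.array(y)

series = np.arange(10)                      # toy data: 0, 1, ..., 9
X, y = make_windows(series, lookback=4, forecast=2)
print(X[0], y[0])                           # [0 1 2 3] [4 5]
```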

Configuring prediction settings

To configure more details for your time series experiment, open the Experiment settings panel.

General data source settings

On the General panel for prediction settings, you can optionally change the metric that is used to optimize the experiment, specify the algorithms to consider, or set the number of pipelines to generate. For details on the available metrics and algorithms, see AutoAI time series implementation details.

Field Description
Duplicate rows Not supported for time series experiments.
Subsample data Not supported for time series experiments.
Text feature engineering Not supported for time series experiments.
Final training data set Not supported for time series experiments.
Training and holdout data Choose to reserve some data from your training data set to test the experiment.
Select features to include In a time series experiment, columns other than the prediction columns and the date/time column are ignored when training the model.
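
To underline the training and holdout row above: unlike a standard experiment, a time series holdout must preserve time order, so it is taken from the most recent rows rather than sampled at random. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Minimal sketch of a time-ordered train/holdout split (hypothetical data).
# A random split would break the time order, so the most recent rows are held out.
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=100, freq="D"),
    "sales": range(100),                  # hypothetical prediction column
}).sort_values("date")

holdout_size = 20                         # hypothetical holdout length
train_df = df.iloc[:-holdout_size]        # oldest rows: used for training
holdout_df = df.iloc[-holdout_size:]      # most recent rows: used for evaluation
print(len(train_df), len(holdout_df))     # 80 20
```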

Time series configuration details

On the Time series panel for prediction settings, configure the details for how to train the experiment and generate predictions.

Field Description
Date/time column View or change the date/time column for the experiment.
Lookback window View or update the number of previous time series values used to predict the current time point.
Forecast window View or update the range that you want to predict based on the data in the lookback window.

Configuring time series data

To configure the time series data, you can adjust the settings related to backtesting the experiment. Backtesting provides a means of validating a time series model by using historical data.

In a typical machine learning experiment, you can hold back part of the data randomly to test the resulting model for accuracy. To perform validation on a time series model, you must preserve the time order relationship between the training data and testing data.

The backtesting sequence follows these steps:

  1. The training data length is determined based on the number of backtests, gap length, validation length, and holdout size.
  2. Starting from the oldest data, the experiment is trained using the training data and evaluated on the first validation data set, skipping over any data in the gap if gap length is nonzero.
  3. The training data window is advanced by the holdout size and gap length to form a new training set. The experiment is trained using this new data and evaluated on the next validation data set.
  4. Step 3 is repeated for the remaining backtesting periods.
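
As an illustration of this rolling scheme (not the AutoAI implementation), scikit-learn's TimeSeriesSplit supports a comparable setup, where n_splits roughly corresponds to the number of backtests, test_size to the holdout/validation length, and gap to the gap length:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

series = np.arange(30)                    # toy time-ordered data

# Rough mapping of the settings described above onto TimeSeriesSplit:
#   number of backtests -> n_splits, validation length -> test_size, gap length -> gap
tscv = TimeSeriesSplit(n_splits=3, test_size=5, gap=2)

for fold, (train_idx, val_idx) in enumerate(tscv.split(series), start=1):
    print(f"backtest {fold}: train rows {train_idx[0]}..{train_idx[-1]}, "
          f"validate rows {val_idx[0]}..{val_idx[-1]}")
```

In this sketch the training window grows with each backtest, and the rows inside the gap are excluded from both training and validation, which mirrors the gap-length behavior described in the table that follows.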

To adjust the backtesting configuration:

  1. Open Experiment settings.
  2. Click the Time series tab in the Data sources panel.
  3. Optionally adjust the settings as shown in the table.
Field Description
Number of backtests Backtesting is similar to cross-validation for date/time periods. Optionally customize the number of backtests for your experiment.
Holdout The size of the holdout set and each validation set for backtesting. The validation length can be adjusted by changing the holdout length.
Gap length The number of time points between the training data set and validation data set for each backtest. When the parameter value is non-zero, the time series values in the gap will not be used to train the experiment or evaluate the current backtest.

The visualization for the configuration settings lets you examine the backtesting flow. The graphic is interactive, so you can manipulate the settings from the graphic as well as from the configuration fields. For example, by adjusting the gap length, you can see model validation results on earlier time periods of the data without increasing the number of backtests.

Interpreting the experiment results

After you run your time series experiment, you can examine the resulting pipelines to get insights into the experiment details.

Learn more

AutoAI overview

Parent topic: AutoAI