Building a time series experiment
Use AutoAI to create a time series experiment to predict future activity (such as stock prices or temperatures) over a specified date/time range.
**Tech preview** This is a technology preview and is not yet supported for use in production environments.
Time series overview
A time series experiment is a method of forecasting that uses historical observations to predict future values. The experiment automatically builds many pipelines that incorporate machine learning models, such as random forest regression and support vector machines (SVMs), as well as statistical time series models, such as ARIMA and Holt-Winters. It then recommends the best pipeline according to the pipeline performance evaluated on a holdout data set or backtest data sets.
Unlike a standard AutoAI experiment, which builds a set of pipelines to completion and then ranks them, a time series experiment evaluates pipelines earlier in the process and completes and tests only the best-performing pipelines.
For details on the various stages of training and testing a time series experiment, see Time series implementation details.
Data requirements
Note the current data requirements for training a time series experiment.
- The training data must be a single file in CSV format. Joining data is not currently supported for time series experiments.
- The file must contain one or more time series columns and optionally contain a timestamp column. For a list of supported date/time formats see AutoAI time series implementation details.
- If the data source contains a timestamp column, the data must be sampled at a uniform frequency; that is, the difference between the timestamps of adjacent rows is always the same. For example, data can be in increments of one minute, one hour, or one day. The specified timestamp column is used to determine the lookback window, which improves the model accuracy. Note: If the file size is larger than 1 GB, sort the data in descending order by timestamp; only the first 1 GB is used to train the experiment.
- If the data source does not contain a timestamp column, make sure the data is sampled at regular intervals and sorted in ascending order by the date/time at which it was sampled; that is, the value in the first row is the oldest and the value in the last row is the most recent. Note: If the file size is larger than 1 GB, truncate the file so that it is smaller than 1 GB.
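The sampling requirements above can be checked before you upload a file. The following is a minimal sketch that uses only the Python standard library; the timestamp format and sample values are illustrative, not a required format:

```python
from datetime import datetime

# Hypothetical timestamp column parsed from a CSV file.
timestamps = [
    "2021-01-01 00:00",
    "2021-01-01 01:00",
    "2021-01-01 02:00",
    "2021-01-01 03:00",
]

parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M") for t in timestamps]

# Rows must be sorted in ascending order (oldest first).
is_ascending = all(a < b for a, b in zip(parsed, parsed[1:]))

# Adjacent rows must be separated by the same interval (uniform frequency).
deltas = [b - a for a, b in zip(parsed, parsed[1:])]
is_uniform = all(d == deltas[0] for d in deltas)

print(is_ascending, is_uniform)  # True True
```

If either check fails, sort the file by timestamp or resample it to a uniform frequency before training.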
Choose data from your project, or upload it from your file system or from the asset browser, then click Continue. To review your data, click the Preview icon next to the data source name. Optionally, add a second file as holdout data for testing the trained pipelines.
Choosing a compute configuration
When you configure your experiment, you can choose a compute configuration that supplies the computing resources for your experiment. The configuration you choose governs how many prediction targets your experiment can have.
Compute configuration | Resources | Limits |
---|---|---|
Medium | 4 CPU and 16 GB | up to 10 prediction columns |
Large | 8 CPU and 32 GB | up to 20 prediction columns |
Configuring a time series experiment
When you configure the details for an experiment, click Yes to Enable time series, and fill in the experiment details.
Field | Description |
---|---|
Prediction columns | The time series columns that you want to predict based on the previous values. You can specify one or more columns to predict. |
Date/time column | The column that indicates the date/time at which the time series values occur. |
Lookback window | A parameter that indicates how many previous time series values are used to predict the current time point. |
Forecast window | The range you want to predict based on the data in the lookback window. |
The prediction summary at the bottom of the panel shows you the type of experiment, the CUH allocated, and the optimized metric selected for the experiment.
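To see how the lookback and forecast windows relate, consider how they slice a series into supervised (input, target) pairs: each set of previous values in the lookback window is used to predict the values in the forecast window that follows it. A sketch with illustrative window sizes:

```python
# Illustrative series and window sizes (not tied to any particular data set).
series = [10, 12, 13, 15, 14, 16, 18, 17, 19, 21]
lookback, forecast = 4, 2

pairs = []
for i in range(len(series) - lookback - forecast + 1):
    x = series[i : i + lookback]                        # previous values (lookback window)
    y = series[i + lookback : i + lookback + forecast]  # values to predict (forecast window)
    pairs.append((x, y))

print(len(pairs))   # 5
print(pairs[0])     # ([10, 12, 13, 15], [14, 16])
```

A larger lookback window gives the model more history per prediction but yields fewer training pairs from the same amount of data.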
Configuring prediction settings
To configure more details for your time series experiment, open the Experiment settings panel.
General data source settings
On the General panel for prediction settings, you can optionally change the metric used to optimize the experiment, specify the algorithms to consider, or set the number of pipelines to generate. For details on the available metrics and algorithms, see AutoAI time series implementation details.
Field | Description |
---|---|
Duplicate rows | Not supported for time series experiments. |
Subsample data | Not supported for time series experiments. |
Text feature engineering | Not supported for time series experiments. |
Final training data set | Not supported for time series experiments. |
Training and holdout data | Choose to reserve some data from your training data set to test the experiment. |
Select features to include | In a time series experiment, columns other than the prediction column and the date/time column are ignored when training the model. |
Time series configuration details
On the Time series panel for prediction settings, configure the details for how to train the experiment and generate predictions.
Field | Description |
---|---|
Date/time column | View or change the date/time column for the experiment. |
Lookback window | View or update the number of previous time series values used to predict the current time point. |
Forecast window | View or update the range that you want to predict based on the data in the lookback window. |
Configuring time series data
To configure the time series data, adjust the settings related to backtesting the experiment. Backtesting provides a means of validating a time series model by using historical data.
In a typical machine learning experiment, you can hold back part of the data randomly to test the resulting model for accuracy. To perform validation on a time series model, you must preserve the time order relationship between the training data and testing data.
The backtesting sequence follows these steps:
1. The training data length is determined based on the number of backtests, the gap length, the validation length, and the holdout size.
2. Starting from the oldest data, the experiment is trained on the training data and evaluated on the first validation data set, skipping over any data in the gap if the gap length is nonzero.
3. The training data window is advanced by the holdout size and the gap length to form a new training set. The experiment is trained on this new data and evaluated on the next validation data set.
4. Step 3 is repeated for the remaining backtesting periods.
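The backtesting sequence can be sketched as index arithmetic over the rows of the series. The following is an illustrative sketch, assuming equal-length validation windows (the holdout size) each separated from the training window by a fixed gap; it is not AutoAI's exact internal implementation:

```python
def backtest_splits(n_points, n_backtests, holdout, gap=0):
    """Return (train_indices, validation_indices) for each backtest."""
    # Training length is whatever remains after reserving the backtest windows.
    train_len = n_points - n_backtests * (holdout + gap)
    splits = []
    start = 0
    for _ in range(n_backtests):
        train = range(start, start + train_len)
        valid_start = start + train_len + gap   # skip over the gap, if any
        valid = range(valid_start, valid_start + holdout)
        splits.append((train, valid))
        start += holdout + gap                  # advance the training window
    return splits

# 20 rows, 2 backtests, holdout of 3 points, gap of 1 point.
for train, valid in backtest_splits(20, 2, holdout=3, gap=1):
    print(f"train {train.start}-{train.stop - 1}, validate {list(valid)}")
```

Here the first backtest trains on rows 0-11 and validates on rows 13-15; the second advances the window by four rows and validates on rows 17-19, preserving the time order between training and validation in every split.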
To adjust the backtesting configuration:
- Open Experiment settings.
- Click the Time series tab in the Data sources panel.
- Optionally adjust the settings as shown in the table.
Field | Description |
---|---|
Number of backtests | Backtesting is similar to cross-validation for date/time periods. Optionally customize the number of backtests for your experiment. |
Holdout | The size of the holdout set and each validation set for backtesting. The validation length can be adjusted by changing the holdout length. |
Gap length | The number of time points between the training data set and the validation data set for each backtest. When this value is nonzero, the time series values in the gap are not used to train the experiment or to evaluate the current backtest. |
The visualization for the configuration settings lets you examine the backtesting flow. The graphic is interactive, so you can manipulate the settings from the graphic as well as from the configuration fields. For example, by adjusting the gap length, you can see model validation results on earlier time periods of the data without increasing the number of backtests.
Interpreting the experiment results
After you run your time series experiment, you can examine the resulting pipelines to get insights into the experiment details. To view details:
- Hover over nodes on the visualization to get details about the pipelines as they are being generated.
- Toggle to the Progress Map view to see a different view of the training process. You can hover over each node in the process for details.
- After the final pipelines are completed and written to the leaderboard, you can click a pipeline to see the performance details.
- Click the link to View discarded pipelines to view the algorithms used for the pipelines not selected as top performers.
- Save the experiment code as a notebook that you can review.
- Save a particular pipeline as a notebook that you can review.
Next steps
- Follow a step-by-step tutorial to train a univariate time series model to predict minimum temperatures using sample data.
- Follow a step-by-step tutorial to train a multivariate time series model to predict energy consumption for 3 customers using sample data.
- Learn about scoring a deployed time series model.
- Learn about using the API for AutoAI time series experiments.
Additional resources
- For an introduction to forecasting with AutoAI time series experiments, see the blog post Right on time(series): Introducing Watson Studio’s AutoAI Time Series.
- For more information about creating a time series experiment, see this blog post about creating a new time series experiment.
Parent topic: AutoAI