Machine Learning From Weather Forecast
JeanFrancoisPuget
Using weather data in machine learning is promising. For instance, everyone knows that weather forecasts influence buying patterns, be it for apparel, food, or travel. Wouldn't it be nice to capture the effect of weather forecasts on these?
All we need to do is use weather data in addition to the other data we have, then apply our favorite machine learning toolbox.
It sounds simple but there is a catch. I try to explain what it is below.
For the sake of simplicity, we will assume we want to predict future sales, but what follows applies to any situation where we want to use weather forecasts as part of a machine learning application.
If you omit weather data, then sales forecasting is a classical problem to which several statistical and machine learning techniques can be applied, for instance ARIMA. These techniques deal with time series: past sales data is ordered by time, and the model extrapolates the time series. In a nutshell, the model finds trends in past sales data and applies those trends to the current data.
One way to assess model accuracy is to run it against historical data and compare predictions with actual sales. For this you must use historical data that was not used when creating the model. See Overfitting In Machine Learning if it is not clear why.
Assuming you use held-out historical data, you would, for each week w, run your model with all weeks wi with wi < w as input, then compare the output of the model with the actual sales at week w. If your model is good, the predicted values will be close to the actual values.
You can then use the model to predict future sales.
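The week-by-week evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `naive_trend_forecast` is a hypothetical stand-in model (it is not ARIMA), and the sales figures are made up.

```python
from statistics import mean


def naive_trend_forecast(history):
    """Hypothetical stand-in model: last value plus the average weekly change."""
    if len(history) < 2:
        return history[-1]
    changes = [b - a for a, b in zip(history, history[1:])]
    return history[-1] + mean(changes)


def walk_forward_backtest(sales, model, first_week=4):
    """For each week w, predict from the weeks before w and keep the actual value."""
    results = []
    for w in range(first_week, len(sales)):
        predicted = model(sales[:w])  # only weeks wi < w are visible to the model
        results.append((w, predicted, sales[w]))
    return results


def mean_absolute_error(results):
    """Average gap between predicted and actual sales over the backtest."""
    return mean(abs(predicted - actual) for _, predicted, actual in results)


# Made-up weekly sales, for illustration only.
weekly_sales = [100, 104, 103, 110, 115, 113, 120, 126]
backtest = walk_forward_backtest(weekly_sales, naive_trend_forecast)
print(mean_absolute_error(backtest))
```

Any forecasting model that maps a history to a next-week prediction can be passed in place of the stand-in.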
If you want to predict with weather forecasts, your model will take two time series as input, past sales and past weather forecasts, and it will output a sales forecast for the coming week.
The issue is that past forecasts aren't generally available. Most weather forecast providers store past observed weather, but they don't store past weather forecasts because doing so would require far more storage capacity.
The usual way to get around the lack of past weather forecasts is to approximate them with past observed weather: the weather forecast for past week w is taken to be the observed weather for week w.
While this seems appealing, it creates an issue. When we use actual weather in place of weather forecasts, we assume perfect weather forecasts. But when we use our model on current data, we will have the current weather forecast as input, which is unlikely to be perfect. Our model will assume it is perfect anyway, and it may rely on it too much. This may hurt the accuracy of our sales forecast.
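A small simulation can make this over-reliance concrete. The sketch below rests entirely on assumptions made for illustration: sales depend linearly on the actual temperature, the forecast is the actual temperature plus Gaussian error, and all numbers are synthetic. It fits one linear model on observed weather and another on forecasts, then scores both on forecast inputs, which is all we have at prediction time.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(slope, intercept, xs, ys):
    """Mean squared error of the fitted line on the given data."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def simulate(n, forecast_error_sd, rng):
    """Synthetic weeks: observed temperature, an imperfect forecast, and sales."""
    observed = [rng.gauss(15.0, 8.0) for _ in range(n)]
    forecast = [t + rng.gauss(0.0, forecast_error_sd) for t in observed]
    # Assumption: sales are driven by the actual weather plus unrelated noise.
    sales = [200.0 + 5.0 * t + rng.gauss(0.0, 10.0) for t in observed]
    return observed, forecast, sales


rng = random.Random(0)
obs_train, fc_train, sales_train = simulate(5000, 4.0, rng)
_, fc_live, sales_live = simulate(5000, 4.0, rng)

trained_on_observed = fit_line(obs_train, sales_train)
trained_on_forecast = fit_line(fc_train, sales_train)

# Both models are scored on forecasts, as they would be in production.
error_observed_model = mse(*trained_on_observed, fc_live, sales_live)
error_forecast_model = mse(*trained_on_forecast, fc_live, sales_live)
print("slopes:", trained_on_observed[0], trained_on_forecast[0])
print("errors:", error_observed_model, error_forecast_model)
```

In this setup the model trained on observed weather learns a steeper slope, so it trusts the weather input more than the forecast deserves, and its error on forecast inputs is larger than that of the model trained on forecasts.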
The cure for this issue is to use actual past forecasts. Unfortunately, as said above, this data isn't generally stored. One way to cope with its absence is to reconstruct it by running weather forecasting models on past weather data. For each week in the past, we would run the weather forecast model with previous weeks as input and store the result. This is what our weather forecast team at IBM does using Deep Thunder technology. This is the rich man's solution: expensive but quite effective.
If you cannot reconstruct past weather forecasts, a poor man's solution is to add noise to past weather data when you use it as a proxy for weather forecasts. This way you no longer assume perfect weather forecasts, and your model will probably be better off.
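Adding noise can be as simple as the sketch below. The post does not say which kind of noise to use, so the zero-mean Gaussian noise, the `error_sd` parameter, and the helper name `noisy_weather_proxy` are all assumptions made for illustration. A sensible choice for `error_sd` would be the typical forecast error at the horizon you care about, if you can estimate it.

```python
import random


def noisy_weather_proxy(observed, error_sd, seed=None):
    """Return observed values perturbed with zero-mean Gaussian noise.

    Hypothetical helper: error_sd is an assumed forecast error magnitude,
    expressed in the same unit as the observations.
    """
    rng = random.Random(seed)
    return [value + rng.gauss(0.0, error_sd) for value in observed]


# Made-up weekly average temperatures, for illustration only.
observed_temps = [12.1, 14.3, 9.8, 17.5, 20.2]
pseudo_forecast = noisy_weather_proxy(observed_temps, error_sd=2.0, seed=42)
print(pseudo_forecast)
```

The noisy series would then replace raw observed weather as the weather input when training the sales model.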
Choose the solution you want, but do not use raw past weather as a proxy for past weather forecasts. It will lead to disappointing results.