Gold Medal Winning Solution to Sales Forecasting Kaggle Competition
Jean-Francois Puget
I had the pleasure of teaming up with Kaggle grandmaster Gilberto Titericz Junior, aka Giba, currently ranked among the very top Kagglers.
Our solution is a mix of deep learning and gradient boosting models. It is described in more detail in our Kaggle writeup. The most notable aspect for us is that we were never near the top during the competition: we sat around 45th place on the public leaderboard at the end, yet jumped to 8th place on the private leaderboard. This may sound abstract, but its significance becomes clear once you know how the public/private test data split was done. The public leaderboard is computed on the predictions made for the next 5 days, while the private leaderboard is computed on the predictions made for days 6 through 16. Our model was therefore ranked 45th on the first 5 days of predictions and 8th on the longer-term ones. In other words, our model gets relatively better as the prediction horizon grows, which means we capture trends better than most.
I think this is due to the way we did cross validation. Cross validation is a general technique for evaluating model performance when you have a rather small dataset. The idea is to repeatedly split the dataset into a train set and a validation set, train a model on the train set, and evaluate it on the validation set. Depending on the split, a given piece of the dataset can end up in either the train set or the validation set.
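To make this concrete, here is a minimal sketch of plain K-fold cross validation with scikit-learn. The synthetic data and the RandomForestRegressor are placeholders for illustration, not the models we actually used:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Placeholder data; in the competition this would be the sales dataset.
X, y = make_regression(n_samples=1000, n_features=10, random_state=0)

scores = []
for train_idx, valid_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])        # train on the train fold
    preds = model.predict(X[valid_idx])          # evaluate on the held-out fold
    scores.append(mean_squared_error(y[valid_idx], preds))

print(f"CV MSE: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

Averaging the score across folds gives a more reliable performance estimate than a single split.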
When dealing with time series forecasting, as in the Corporación Favorita competition, we must be careful because time is not symmetric: the validation dataset should always lie in the future relative to the train dataset. We describe our cross validation approach in detail in our writeup, but the main idea is to create several train/validation splits based on time, then use all these splits as usual in cross validation, as sketched below.
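Here is a minimal sketch of such time-based splits, assuming a pandas DataFrame with a `date` column. The column names, cutoff dates, and the 16-day validation horizon are illustrative choices echoing the competition setup, not our actual code:

```python
import pandas as pd

# Placeholder daily data; in practice this would be the sales history.
df = pd.DataFrame({
    "date": pd.date_range("2017-01-01", "2017-08-15", freq="D"),
    "sales": range(227),
})

# Each split trains on everything before a cutoff date and validates on
# the following 16 days, so validation is always in the future.
cutoffs = pd.to_datetime(["2017-06-14", "2017-06-30", "2017-07-15", "2017-07-31"])
splits = []
for cutoff in cutoffs:
    train = df[df["date"] < cutoff]
    valid = df[(df["date"] >= cutoff) & (df["date"] < cutoff + pd.Timedelta(days=16))]
    splits.append((train, valid))
    print(f"cutoff {cutoff.date()}: {len(train)} train rows, {len(valid)} valid rows")
```

Scoring a model on each of these splits and averaging, just as in ordinary cross validation, rewards models that hold up over the full forecast horizon rather than only the next few days.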
A side effect of this competition is that I became the only Kagg