Predicting the future with Monte Carlo simulations over IBM Cloud Functions
Who hasn’t dreamed of hopping into a time machine for a quick jump? We’re all fascinated by the idea of seeing into the future: which sports team will take the Cup, what will happen to stock prices, or what is the chance a particular event is going to occur, succeed, or fail? While time travel is still considered science fiction, predicting the future outcomes of certain events is actually more science than fiction, thanks to Monte Carlo simulations.
Monte Carlo simulations are mathematical methods that have been around for decades and are used to estimate the outcomes of hard-to-predict events. This makes Monte Carlo the perfect candidate for a broad range of scenarios, from mathematical simulations to weather forecasts and complex financial predictions. The first step is to create a model that represents your scenario; for example, the model can describe stock prices or weather conditions. Monte Carlo methods then run the model many times with randomly sampled inputs and use the resulting distribution of outcomes to estimate what is likely to happen. The more samples you draw, the better your estimate tends to be.
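The textbook example of estimating π shows how more samples lead to a better estimate. The sketch below is our own illustration, not part of the experiment described in this post. It draws random points in the unit square and counts how many land inside the quarter circle of radius 1. The function name `estimate_pi` is hypothetical.

```python
import math
import random


def estimate_pi(num_samples: int, seed: int = 0) -> float:
    """Estimate pi by sampling random points in the unit square.

    The fraction of points inside the quarter circle of radius 1
    approximates pi / 4, so multiplying it by 4 estimates pi.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        est = estimate_pi(n)
        print(f"{n:>9} samples: pi ~ {est:.5f} (error {abs(est - math.pi):.5f})")
```

The error typically shrinks as the sample count grows, roughly in proportion to 1/√n. That is why running many more samples, or forecasts, is worth the extra computation.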
Understanding the logic behind Monte Carlo is beyond the scope of this blog, but we’re going to demonstrate how IBM Cloud Functions can provide a phenomenal boost to a Monte Carlo simulation, which is considered an important High-Performance Computing workload. IBM Cloud Functions is a serverless functions-as-a-service platform that executes code in response to incoming events and costs nothing when not in use. As we show below, we completed the entire Monte Carlo simulation in about 90 seconds with 1000 concurrent invocations. Running the same code on a laptop with 4 CPU cores took about 247 minutes, at almost 100% CPU utilization.
Monte Carlo simulations for stock prices
Using Monte Carlo simulations to estimate stock prices also has a long history. Nevertheless, this remains a hot research topic, with dozens of recent research papers and blogs. The general idea is to use past stock prices as input and run Monte Carlo simulations to generate a forecast for the future stock price. You may wonder how this is possible, given how many uncertainties are involved in predicting a price. This is exactly where Monte Carlo excels: estimating the outcome of what is hard to predict.
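One common way to turn past prices into forecasts is geometric Brownian motion. It is assumed here for illustration and is not necessarily the model Ido and Ohad used. The method estimates the mean and volatility of daily log returns from the price history, then simulates many random price paths forward from the last known price. The function `simulate_final_prices` and the synthetic price history below are hypothetical.

```python
import math
import random
import statistics


def simulate_final_prices(history, days, num_forecasts, seed=0):
    """Return num_forecasts simulated prices, `days` trading days ahead.

    Assumption: prices follow geometric Brownian motion, so each daily log
    return is drawn from a normal distribution whose mean and standard
    deviation are estimated from the historical prices.
    """
    log_returns = [math.log(b / a) for a, b in zip(history, history[1:])]
    mu = statistics.mean(log_returns)
    sigma = statistics.stdev(log_returns)
    rng = random.Random(seed)

    finals = []
    for _ in range(num_forecasts):
        price = history[-1]  # every forecast starts from the last known price
        for _ in range(days):
            price *= math.exp(rng.gauss(mu, sigma))
        finals.append(price)
    return finals


if __name__ == "__main__":
    # Hypothetical synthetic history (about 3 years of trading days);
    # this is not real IBM stock data.
    gen = random.Random(42)
    history = [150.0]
    for _ in range(750):
        history.append(history[-1] * math.exp(gen.gauss(0.0002, 0.012)))

    finals = simulate_final_prices(history, days=250, num_forecasts=2000)
    cuts = statistics.quantiles(finals, n=20)  # 19 cut points, 5% apart
    print(f"median {statistics.median(finals):.2f}, "
          f"90% range {cuts[0]:.2f} to {cuts[-1]:.2f}")
```

Each forecast is one random path. The simulation's result is the spread of final prices across many paths, not any single path.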
Monte Carlo simulations are well suited to long-term predictions: the more input data you use and the more forecasts you generate, the further ahead you can predict and the more accurate the estimate. While Monte Carlo is a fascinating topic, getting reliable stock estimates is far from simple and requires extensive knowledge, research, and special techniques. But it’s the combination of Monte Carlo and Cloud Functions that really got us excited.
Gil Vernik, Ohad Zohar, Ido Yehezkel (from left to right)
Intrigued by Monte Carlo algorithms, two students at the Technion’s computer science department decided to explore different platforms that could be used to run Monte Carlo simulations. The students, Ido Yehezkel and Ohad Zohar, wrote Python code that runs a number of forecasts, each predicting stock prices for a specific number of days. They successfully tested their code locally on a small number of forecasts. Then they wanted to take the next step and run the same code at massive scale and with high parallelism, processing more than 100,000 forecasts that each predict 1095 days of stock prices. Running so many forecasts on a local computer was not feasible because of the amount of computation required. Cloud-based VMs would have added the complexity of parallelizing the forecasts inside each VM.
Looking for an alternative way to address the challenge with minimal coding effort, Ido and Ohad approached me, and a collaboration was born. We decided to explore how we could leverage IBM Cloud Functions for Monte Carlo simulations. Instead of running all the computations locally, we would let IBM Cloud Functions run them at massive scale.
This would allow us to focus on the business logic of running Monte Carlo predictions and worry less about how to make it happen in parallel. We came to the conclusion that the PyWren-IBM-Cloud framework was the perfect way to go.
Boosting Monte Carlo predictions with PyWren over IBM Cloud Functions
The goal of PyWren, developed by RISELab at UC Berkeley, is to provide a simple ‘push to the cloud’ experience: users focus on their Python code, while PyWren takes care of executing that code in the cloud. PyWren is a Python library that leverages serverless computing to execute any Python function with its dependencies. PyWren runs the function at massive scale while monitoring executions, collecting results, and much more. PyWren is unique in that users don’t need to be familiar with serverless code or be concerned with invocation details. It knows how to ‘deliver’ a user’s Python code to the serverless cloud.
PyWren-IBM-Cloud is an open source project based on PyWren. It is not, however, merely a reimplementation of PyWren’s API on top of IBM Cloud Functions. It can be viewed as an advanced extension of PyWren that runs broader types of jobs, offers better object storage integration, and more. You can read more about PyWren-IBM-Cloud here.
Ido and Ohad wrote two Python functions: predict(), which runs a number of forecasts, each predicting a specific number of days, and combine(), which summarizes the results of predict(). Their approach was to execute multiple stock prediction forecasts (predict()), each running as a separate invocation in IBM Cloud Functions. Once all the forecasts were completed, a single combiner (a ‘reducer’ in the map-reduce paradigm) aggregated the results from all the invocations and generated a graph of the resulting predictions.
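The predict()/combine() split maps naturally onto map-reduce. Below is a minimal local sketch of that shape. It assumes each predict() call receives an integer call id, which it uses as its random seed, runs 100 forecasts, and returns their final prices. combine() receives the list of all predict() results. The model and parameter values are hypothetical placeholders, since the students' actual functions are not shown here. Locally, Python's built-in map stands in for the parallel Cloud Functions invocations that PyWren-IBM-Cloud would launch.

```python
import math
import random
import statistics

# Hypothetical model parameters; not the students' actual inputs.
LAST_PRICE = 150.0         # last known closing price (made up)
MU, SIGMA = 0.0002, 0.015  # assumed daily log-return mean and volatility
DAYS = 1095                # forecast horizon used in the post
FORECASTS_PER_CALL = 100   # 100,000 forecasts / 1000 invocations


def predict(call_id):
    """Map step: run FORECASTS_PER_CALL forecasts, return their final prices."""
    rng = random.Random(call_id)  # a distinct, reproducible seed per call
    finals = []
    for _ in range(FORECASTS_PER_CALL):
        price = LAST_PRICE
        for _ in range(DAYS):
            price *= math.exp(rng.gauss(MU, SIGMA))
        finals.append(price)
    return finals


def combine(results):
    """Reduce step: flatten the per-call lists and summarize them."""
    all_finals = [p for chunk in results for p in chunk]
    return {
        "forecasts": len(all_finals),
        "median": statistics.median(all_finals),
        "mean": statistics.mean(all_finals),
    }


if __name__ == "__main__":
    # Locally, the built-in map plays the role of the parallel invocations.
    partial_results = list(map(predict, range(8)))
    print(combine(partial_results))
```

Each predict() call is independent and depends only on its call id. This independence is what lets a framework like PyWren-IBM-Cloud fan the calls out as separate serverless invocations and hand the collected results to a single combine().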
The main challenge was how to scale the forecast functions across thousands of IBM Cloud Function invocations. The students were concerned that additional code and complex testing processes would be needed to monitor and coordinate the executions, and then summarize the results.
This is exactly where PyWren-IBM-Cloud came to the rescue. With only two lines of additional code, they managed to execute 100,000 forecasts distributed across 1000 concurrent Cloud Function invocations.
import pywren_ibm_cloud as pywren
pw = pywren.ibm_cf_executor()
pw.map_reduce(predict, range(1000), combine)  # 1000 predict() invocations, then combine()
res = pw.get_result()
As you can see, this is incredibly simple, with no need to write additional code or learn how to scale the code as serverless actions. We simply let PyWren-IBM-Cloud scale the function on IBM Cloud Functions. That’s it.
Try it yourself
We created a public Jupyter notebook here that contains all the code and steps needed to create an input data set for you to experiment with. Just open an IBM Cloud account and then sign up for IBM Cloud Functions and IBM Cloud Object Storage. You can import the example notebook into any Jupyter notebook environment or use Watson Studio.
Results of the experiment
As mentioned, estimating stock prices is a complex matter, and many researchers are working on more accurate estimation models. The goal of our experiment was not to demonstrate how well Monte Carlo predicts stock prices, but to show how IBM Cloud Functions can be used as a platform for Monte Carlo simulations. We’ll leave it to you to decide what else it can predict.
In our experiment, we ran a Monte Carlo simulation with and without IBM Cloud Functions to forecast the value of IBM stock. As input, we used IBM daily stock prices for 2014, 2015, and 2016, and we generated a prediction of IBM stock prices for the end of 2019. We ran a simulation consisting of 100,000 forecasts, each predicting 1095 days.
Running this simulation on a laptop with a 2.5 GHz Intel Core i7 (4 cores) and 16 GB of memory took about 247 minutes, with almost 100% CPU utilization. We then used PyWren-IBM-Cloud to run exactly the same code. PyWren-IBM-Cloud distributed all 100,000 forecasts across 1000 concurrent IBM Cloud Functions invocations, and the entire simulation completed in about 90 seconds!
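The numbers above are easy to sanity-check. The sketch below assumes the 100,000 forecasts were split evenly across the 1000 invocations (100 forecasts each), which the post implies but does not state outright. It compares wall-clock times only: about 247 minutes locally versus about 90 seconds in the cloud. Both helper names are hypothetical.

```python
def forecasts_per_invocation(total_forecasts, invocations):
    """Split total_forecasts as evenly as possible across invocations."""
    base, extra = divmod(total_forecasts, invocations)
    # The first `extra` invocations take one additional forecast each.
    return [base + 1 if i < extra else base for i in range(invocations)]


def speedup(local_seconds, cloud_seconds):
    """Wall-clock speedup of the cloud run over the local run."""
    return local_seconds / cloud_seconds


chunks = forecasts_per_invocation(100_000, 1000)
print(len(chunks), set(chunks))         # 1000 {100}
print(f"{speedup(247 * 60, 90):.0f}x")  # 165x
```

Because both timings are approximate, the roughly 165x figure is an approximation too. It also ignores differences in cost and setup between a laptop and the cloud.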
PyWren-IBM-Cloud is the perfect tool to scale embarrassingly parallel algorithms. Monte Carlo is only one family of those algorithms. We plan to explore and publish more use cases and examples where PyWren-IBM-Cloud does the magic by scaling user code in the cloud. On the technical side, we are still working to add more capabilities and improve existing flows. PyWren-IBM-Cloud is an open source project and IBM Cloud has a free plan … so enough words … let’s try it!
The author would like to thank Josep Sampe for his technical support and enthusiasm for this work. Josep is a major open source contributor to PyWren-IBM-Cloud.
Monte Carlo simulations are used today for weather forecasting, estimating damage from natural disasters, breast cancer research, modeling DNA damage from electrons, and more. Readers are welcome to suggest additional use cases for us to try out.
Disclaimer: The author of this blog and the students mentioned did not invent the Monte Carlo method for handling the unpredictability of stock prices, nor do they provide any prediction results based on long-term data sources.