Object storage data (pre-)processing, hyperparameter optimization, searching and processing logs, heavy computational tasks (e.g., Monte Carlo simulations or genome analytics), downloading large volumes of data, web scraping and model scoring are just a few examples of scenarios where a lot of CPU-, memory- and/or network-intensive work needs to be done.

A common way to handle this programmatically is to run a “for” loop and kick off asynchronous processing within that loop. In Python, this typically means using multiprocessing.Pool or concurrent.futures, where the map operation is called with the function to be executed as one parameter and the list of (many) objects to be processed as the other. The remainder of this post focuses on multiprocessing.Pool for the sake of simplicity, but there is an equivalent approach for concurrent.futures.

Below is a conceptual example of how this works. The map operation receives two parameters:

import multiprocessing

def convert_image(image_url):
    # <logic downloading the image and converting it somehow>
    ...

with multiprocessing.Pool() as pool:
    # image1, image2, image3 are the inputs to be processed
    results = pool.map(convert_image, [image1, image2, image3])

# Pool.map returns the results themselves, so no .get() is needed
print(results[0])
print(results[1])
…
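For completeness, the equivalent approach with concurrent.futures mentioned above might look like the following sketch (reusing the hypothetical convert_image function and inputs):

from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    # executor.map returns an iterator, so materialize it into a list
    results = list(executor.map(convert_image, [image1, image2, image3]))

print(results[0])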

The beauty of this approach is that it is entirely serverless from the developer’s point of view: to use it, a developer only has to pass in the operation to be executed n times and the n objects they’d like to have processed. However, with the original versions of these libraries, the restriction is that they can only take advantage of the CPU cores, memory, network bandwidth, etc. available to the (virtual) machine the Python process is running on.

Wouldn’t it be nice if, for each call of Pool.map with n objects as a parameter, n containers got spun up behind the scenes (or a smaller number, in case the elements are chunked)? Each container would handle its part of the work and vanish automatically once the work was completed. This would also demonstrate nicely how the often-discussed relevance of cloud to developers can be realized: the developer only has to write code, and when the code is executed, hundreds or thousands of CPU cores are (de-)provisioned transparently behind the scenes:

Lithops + IBM Cloud Code Engine

This foundational approach is implemented in the open source Lithops project as a client-side library.

The integration of Lithops with IBM Cloud Code Engine, our next-generation serverless offering, provides unprecedented flexibility. Among several other things, Code Engine allows you to allocate a large number of parallel containers behind the scenes. Each of them can be provisioned within seconds, with up to 32 GB of memory, 8 CPU cores and a maximum execution time of 2 hours (and these are just the defaults every user gets out of the box; we can raise them further on a per-user basis).

The code changes required to adopt Lithops in an existing Python program are minimal. In the ideal case, it’s literally a matter of following a so-called “drop-in library” approach: changing an import statement from multiprocessing.Pool to lithops.multiprocessing.Pool.
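As a sketch of what that swap looks like in practice (reusing the hypothetical convert_image function from above):

# Before: standard library, limited to the local machine
# from multiprocessing import Pool

# After: Lithops drop-in, fanning out to serverless containers
from lithops.multiprocessing import Pool

with Pool() as pool:
    results = pool.map(convert_image, [image1, image2, image3])

The rest of the program stays untouched: Pool.map keeps the same signature and still returns a list of results.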

The initial config instructions are also very minimal — you can find them documented here.
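For orientation only, a minimal configuration might look roughly like the sketch below, here passed as a plain Python dict to the FunctionExecutor. The exact keys depend on your Lithops version and chosen storage backend, so treat the key names as assumptions and rely on the linked instructions for the authoritative format:

import lithops

# Assumed key names for a Code Engine + IBM COS setup; verify against the docs
config = {
    'lithops': {'backend': 'code_engine', 'storage': 'ibm_cos'},
    'ibm': {'iam_api_key': '<IAM_API_KEY>'},
    'ibm_cos': {'storage_bucket': '<BUCKET_NAME>', 'region': '<REGION>'},
}

fexec = lithops.FunctionExecutor(config=config)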

Beyond that, Lithops allows for pure parallelization across a conceptually unlimited pool of resources, plus the application of a reduce/aggregation step at the end. That can be accomplished by simply passing in the reduce function as a third parameter, for example: map_reduce(business_logic, <list_of_data_elements>, reduce_operation).
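A minimal sketch of what this can look like with the Lithops FunctionExecutor API (the function names and data below are purely illustrative):

import lithops

def my_map_function(x):
    # Each invocation can run in its own container behind the scenes
    return x * 2

def my_reduce_function(results):
    # Receives the list of all map outputs and aggregates them
    return sum(results)

fexec = lithops.FunctionExecutor()
fexec.map_reduce(my_map_function, [1, 2, 3, 4], my_reduce_function)
print(fexec.get_result())  # prints 20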

The price doesn’t change: 10 cores for 100 seconds and 100 cores for 10 seconds both come to the same 1,000 core-seconds, so the bill is identical. Obviously, depending on the nature of the problem, there is a certain percentage of overhead for distributing the work, which needs to be taken into account.

Another advantage is that instead of an approach where capacity has to be allocated based on the most expensive operation in your Python program, each individual map operation in a longer Python program dynamically allocates exactly the capacity it needs. This allows for a significant performance boost in combination with significant cost savings (see diagram below).

Obviously, this approach can be leveraged in interactive scenarios (data science and others), where a data scientist wants to run some heavy processing operation while waiting in front of their screen for it to finish, whether in pure Python with an editor, in a Jupyter notebook (e.g., in Watson Studio or elsewhere), etc. It’s also applicable to continuously running backend applications written in Python (or, basically, any other piece of Python code that needs to do some heavy lifting).

As indicated in the diagrams above, from a developer perspective, this looks like a single program running on a single computer. In reality, a very large amount of distributed capacity is being (de-)allocated dynamically. All of this takes us one step further towards our vision of a serverless supercomputer, where we treat the cloud as a single computer in itself vs. a collection of VMs, containers, web servers, app servers and database servers. Watch this space.

Get started

Try it out for yourself by using IBM Cloud Code Engine and following the instructions captured here.

We would like to express our special thanks to URV, in particular Josep Sampe and Pedro Garcia-Lopez, who have been absolutely instrumental in developing Lithops and who contributed the multiprocessing API. We very much enjoy and appreciate the collaboration.
