On demand data in Python, Part 3

Coroutines and asyncio

Learn how to dramatically improve the performance of software with input/output processing

In Part 1 of this series, you looked at Python iterators, and in Part 2 you learned about itertools. In this part, you're going to learn about coroutines, a special sort of generator function. You'll also learn about another powerful but tricky standard library module: asyncio.

Imagine you walk into a tiny restaurant. There are three tables and only one server. You know what to expect. The server will come to you with the menu, and return for your order. Once the cook has prepared the order, the server will bring it to you. After you finish eating the server brings you the check, and returns to the table when you're ready with payment.

The other tables are happily enjoying their meals because while you are thinking of what to order, or while the cook is preparing your food, or while you are eating your food, the server is available to handle any of these steps with other tables. You might have to wait a few minutes from time to time when more than one table needs attention from the server at the same time, but such waits shouldn't be too noticeable.

If it takes the average dinner party an hour from walking in the door to walking out the door as the only occupied table, it might take an hour and ten minutes if there is one other table, and an hour and twenty if both other tables are occupied. That's not too bad.

Now imagine instead that you walk in, but it turns out that the server only attends to one table until they have completed all the steps. You could wait up to two hours before you even begin your meal because the server deals with the other tables first. This would not be a popular restaurant.

Synchronous versus Asynchronous

Surprisingly, much of the computer code we write works like this very inefficient restaurant. In computer terms, the unpopular restaurant case is called serial operation and the server's actions are said to be synchronous. The restaurant situation we're used to, where the server switches among the tables as each one progresses through its needs, is called concurrent operation and the server's actions are said to be asynchronous.

The reason I'm spending so much time on this analogy is that it illustrates one of the most important techniques a developer should learn in order to write scalable applications that use databases, networks, and other resources that depend on input/output (I/O). Real-world restaurants use asynchronous processes because they wouldn't be desirable or competitive otherwise. Ideally, real-world programs would tend to use asynchronous processes too, but doing so takes the right tools, libraries, skill, and practice by the developer. This tutorial, the third part of the series, is a gentle introduction to how to do so in Python.

I do want to mention that Python's facilities to support asynchronous programming are quite layered and tricky, with many different ways to do things. Some of these facilities are relatively recent additions and still have some of the trimmings from their experimental stage. Nevertheless, the topic is important, and it's well worth persevering. I will deliberately guide you through a pragmatic subset of these facilities, and once you are familiar with the basic ideas, you can explore other approaches on your own.

Coroutines

You learned in the previous tutorial about generator functions, and how these were different from regular functions. When the caller invokes regular functions, the process starts at the top and exits at one place depending on the function's logic. With generators, the caller can enter and exit a single function multiple times, suspending and resuming its execution.

A function that can be entered and exited multiple times, suspended and resumed each time, is called a coroutine. A generator is just a simplified sort of coroutine. Python has several types of coroutines, but the focus of this tutorial is the type that's designed to support asynchronous programming. Let's go back to the restaurant analogy. The menu/order/eat/check/payment sequence for each table is a separate coroutine, but the server suspends and resumes attention to each table so that all three coroutines can be in progress at the same time, possibly in different stages of the process. The well-trained brain of the server acts as a scheduler for juggling these concurrent coroutines.

In the synchronous restaurant, everything is a regular function. It is entered once, when the party arrives, and exited once, when they leave. Only one such function is running at a time, so parties might have to wait as long as two hours to start their own dining experience.

In the asynchronous restaurant, a coroutine function is entered for the first time when the party arrives, creating a coroutine object, and exited for the last time when they leave, in which case the coroutine object is no longer needed. However, after the server brings the menus they can suspend the coroutine object for that particular table and check to see if any of the other tables need attention. Same thing after any given table has ordered, received their check, etc.
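
The difference between a coroutine function and a coroutine object can be checked directly. The following is a minimal sketch rather than part of the restaurant program: visit_table is a hypothetical name, and asyncio.run assumes Python 3.7 or newer, whereas the listings in this tutorial use the older event loop calls.

```python
import asyncio
import inspect

async def visit_table(table_number):
    #Hypothetical stand-in for one table's whole visit
    await asyncio.sleep(0) #Suspend once, handing control to the event loop
    return table_number

#Calling a coroutine function does not run its body.
#It only creates a coroutine object
visit = visit_table(1)
print(inspect.iscoroutinefunction(visit_table)) #True
print(inspect.iscoroutine(visit)) #True

#The body runs only once an event loop drives the coroutine object
result = asyncio.run(visit)
print(result) #1
```

Once the coroutine object has run to completion it is used up, just as a table's visit is over once the party leaves.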

Restaurant server code

Taking advantage of the fact that Python is almost as readable as pseudocode, here is an actual implementation of the server coroutine.

async def serve_table(table_number):
    await get_menus()
    print('Welcome. Please sit at table', table_number, 'Here are your menus')
    order = await get_order()
    print('Table', table_number, 'what will you be having today?')
    await prepare_order(order)
    print('Table', table_number, 'here is your meal:', order)
    await eat()
    print('Table', table_number, 'here is your check')
    await get_payment()
    print('Thanks for visiting us! (table', table_number, ')')

Rather than just def, this function is defined using async def. This marks it as an asynchronous coroutine function. I'll mention in passing that there are also asynchronous coroutine generator functions, which have a yield statement somewhere in the body, but those are a special case and beyond the scope of this tutorial series. Honestly, the zoo of function/generator/coroutine types in Python 3 is rather bewildering, but again I'm going to ignore some of the possibilities in this tutorial series and present a simple pathway to get you started.

Within the body of serve_table is a series of await expressions. Each one calls a coroutine function to create a coroutine object and runs that object, and whenever the awaited coroutine has to wait, control is yielded to any other coroutines that are ready to run. This is the equivalent of the restaurant server starting a process such as having the cook begin preparing a meal and at the same time checking to see if any of the other tables need attention.

This juggling of tasks happens in the well-trained server's brain, and the equivalent of this in Python is called the event loop. We'll return to this in a moment.
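
This hand-off can be watched in a tiny experiment. The sketch below is only an illustration: the table coroutine and the log list are hypothetical names, asyncio.sleep(0) stands in for any wait, and asyncio.run and asyncio.gather (both covered later in this tutorial, the former assuming Python 3.7 or newer) are used just to get the two coroutines running.

```python
import asyncio

async def table(name, log):
    log.append(name + ' orders')
    await asyncio.sleep(0) #Hand control back to the event loop
    log.append(name + ' eats')

async def main():
    log = []
    #Run the two table coroutines concurrently
    await asyncio.gather(table('A', log), table('B', log))
    return log

log = asyncio.run(main())
print(log)
```

Both tables get to order before either one eats, because each coroutine steps aside at its await rather than running straight through to the end.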

More coroutines

Let's look at the implementations of the other coroutines invoked by serve_table.

async def get_menus():
    delay_minutes = random.randrange(3) #0 to 2 minutes
    await asyncio.sleep(delay_minutes) #Pretend a second is a minute

async def get_order():
    delay_minutes = random.randrange(10)
    await asyncio.sleep(delay_minutes)
    order = random.choice(['Special of the day', 'Fish & Chips', 'Pasta'])
    return order

async def prepare_order(order):
    delay_minutes = random.randrange(10, 20) #10 to 19 minutes
    await asyncio.sleep(delay_minutes)
    print('   [Order ready from kitchen: ', order, ']')

async def eat():
    delay_minutes = random.randrange(20, 40)
    await asyncio.sleep(delay_minutes)

async def get_payment():
    delay_minutes = random.randrange(10)
    await asyncio.sleep(delay_minutes)

These functions use a sleep timer to simulate taking time to do some processing. The random.randrange function picks an integer at random from a range that includes the start value (0 if you give only one argument) but excludes the stop value, so random.randrange(3) returns 0, 1, or 2. The function asyncio.sleep is a special coroutine which suspends the calling coroutine for the given number of seconds. During this sleep period, the event loop is, of course, free to run any other coroutine that's ready. As usual, you invoke this using the await keyword.

I'll take this moment to mention that you can only use the await keyword in the body of an asynchronous coroutine function (that is, one defined using async def). Using await anywhere else is a syntax error.

Notice that the get_order coroutine returns a value. This value becomes the result of the await expression in the caller, which is how serve_table assigns it to order.
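
Both points, the returned value and the syntax restriction, can be tried out in a few lines. This is a minimal sketch under some assumptions: get_order is shortened to a fixed order, take_order and bad_source are hypothetical names introduced for illustration, and asyncio.run assumes Python 3.7 or newer.

```python
import asyncio

async def get_order():
    await asyncio.sleep(0.01)
    return 'Pasta'

async def take_order():
    #The value returned by get_order becomes the value of the await expression
    order = await get_order()
    return order

order = asyncio.run(take_order())
print(order) #Pasta

#Using await inside a regular def is rejected when the code is compiled
bad_source = "def regular():\n    await get_order()\n"
try:
    compile(bad_source, '<example>', 'exec')
    rejected = False
except SyntaxError:
    rejected = True
print(rejected) #True
```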

Pulling it all together: the event loop

I mentioned the event loop earlier. You need some special set-up code to get into asynchronous mode, creating an event loop that schedules and manages coroutines as you have coded them to run as cooperating tasks. In asyncio, a coroutine that has been scheduled on an event loop is wrapped in an object called a task, so you'll often see the two words used loosely for the same thing. When a coroutine awaits something that isn't ready yet, such as a sleep timer, control passes back to the event loop, which can then resume another coroutine. The event loop is like the well-trained brain of the server.

Here is code for running the restaurant serve coroutines we've defined so far.

#Create coroutines for three tables
gathered_coroutines = asyncio.gather(
    serve_table(1),
    serve_table(2),
    serve_table(3)
)

#asyncio uses event loops to manage its operation
loop = asyncio.get_event_loop()
#This is the entry from synchronous to asynchronous code. It will block
#Until the coroutine passed in has completed
loop.run_until_complete(gathered_coroutines)
#We're done with the event loop
loop.close()

The asyncio.gather function takes one or more coroutines and schedules them all to run, and the awaitable it returns only completes after all the gathered coroutines have completed. It's used here to run the coroutines for three tables in the event loop, which is obtained using asyncio.get_event_loop. The call to loop.run_until_complete runs the given awaitable until it completes. Because it's passed a gathered set of three coroutines, it ends up running until all three of those are complete. Of course, each serve_table coroutine invokes additional coroutines, such as get_menus and get_order, invoked using await and then scheduled by the event loop.
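
On Python 3.7 or newer, the same set-up is often written with asyncio.run, which creates an event loop, runs one coroutine to completion, and closes the loop. The sketch below is an alternative under that version assumption, not the approach used in this tutorial's listings. Here serve_table is a shortened stand-in that returns its table number, and main is a name introduced for illustration.

```python
import asyncio
import random

async def serve_table(table_number):
    #Shortened stand-in for the tutorial's serve_table
    await asyncio.sleep(random.random() / 100)
    return table_number

async def main():
    #gather runs the three coroutines concurrently and collects their
    #return values in a list, in the order the coroutines were passed in
    return await asyncio.gather(
        serve_table(1),
        serve_table(2),
        serve_table(3)
    )

results = asyncio.run(main())
print(results) #[1, 2, 3]
```

The results come back in the order the coroutines were given to asyncio.gather, whichever table happens to finish first.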

The full program

Listing 1. serve_tables.py is the entire program
import random
import asyncio

async def get_menus():
    delay_minutes = random.randrange(3) #0 to 2 minutes
    await asyncio.sleep(delay_minutes) #Pretend a second is a minute

async def get_order():
    delay_minutes = random.randrange(10)
    await asyncio.sleep(delay_minutes)
    order = random.choice(['Special of the day', 'Fish & Chips', 'Pasta'])
    return order

async def prepare_order(order):
    delay_minutes = random.randrange(10, 20) #10 to 19 minutes
    await asyncio.sleep(delay_minutes)
    print('   [Order ready from kitchen: ', order, ']')

async def eat():
    delay_minutes = random.randrange(20, 40)
    await asyncio.sleep(delay_minutes)

async def get_payment():
    delay_minutes = random.randrange(10)
    await asyncio.sleep(delay_minutes)

async def serve_table(table_number):
    await get_menus()
    print('Welcome. Please sit at table', table_number, 'Here are your menus')
    order = await get_order()
    print('Table', table_number, 'what will you be having today?')
    await prepare_order(order)
    print('Table', table_number, 'here is your meal:', order)
    await eat()
    print('Table', table_number, 'here is your check')
    await get_payment()
    print('Thanks for visiting us! (table', table_number, ')')

#Create coroutines for three tables
gathered_coroutines = asyncio.gather(
    serve_table(1),
    serve_table(2),
    serve_table(3)
)

#asyncio uses event loops to manage its operation
loop = asyncio.get_event_loop()
#This is the entry from synchronous to asynchronous code. It will block
#Until the coroutine passed in has completed
loop.run_until_complete(gathered_coroutines)
#We're done with the event loop
loop.close()

Here is an example of output from running this program.

Welcome. Please sit at table 1 Here are your menus
Welcome. Please sit at table 2 Here are your menus
Table 1 what will you be having today?
Welcome. Please sit at table 3 Here are your menus
Table 3 what will you be having today?
Table 2 what will you be having today?
   [Order ready from kitchen:  Pasta ]
Table 1 here is your meal: Pasta
   [Order ready from kitchen:  Fish & Chips ]
Table 3 here is your meal: Fish & Chips
   [Order ready from kitchen:  Special of the day ]
Table 2 here is your meal: Special of the day
Table 3 here is your check
Table 1 here is your check
Thanks for visiting us! (table 3 )
Thanks for visiting us! (table 1 )
Table 2 here is your check
Thanks for visiting us! (table 2 )

Notice a delay of a few seconds between most of these lines. This is the sleep delay in the various coroutines which simulates the time it takes to do things in the restaurant. Time is compressed, with one second of the program representing one minute in the restaurant. Because the sleep delays are of a random length, the messages can appear in a different order each time you run the program.

Also, notice that the program doesn't always begin neatly with table 1, then table 2, then table 3. The random delay in get_menus decides which table is welcomed first, and asyncio.gather schedules the coroutines you give it without promising any particular order in which they start.

The main thing to appreciate here is the flow of cooperative multitasking. Study the full listing above, while running and tweaking the code until you have a good feel for how coroutines release and regain control. Sometimes all three serve_table coroutine objects are suspended inside one of the other coroutines, all of which happen to be waiting on a sleep delay. That's when you don't see any output for a few seconds. At those times the event loop is patiently waiting for the next sleep timer to expire so that it can resume the coroutine that set it.
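
One way to get that feel is to time the same waits run one after another and then run together. The following is a rough sketch, not part of the restaurant program: pause, one_after_another, all_together, and timed are hypothetical names, the 0.2 second delays are arbitrary, and asyncio.run assumes Python 3.7 or newer.

```python
import asyncio
import time

async def pause(seconds):
    await asyncio.sleep(seconds)

async def one_after_another():
    #Like the unpopular restaurant: each wait finishes before the next starts
    await pause(0.2)
    await pause(0.2)
    await pause(0.2)

async def all_together():
    #Like the restaurant we're used to: the three waits overlap
    await asyncio.gather(pause(0.2), pause(0.2), pause(0.2))

def timed(coroutine):
    #Run a coroutine object to completion and return the elapsed seconds
    start = time.perf_counter()
    asyncio.run(coroutine)
    return time.perf_counter() - start

serial_seconds = timed(one_after_another())
concurrent_seconds = timed(all_together())
print('One after another: %.2f seconds' % serial_seconds)
print('All together: %.2f seconds' % concurrent_seconds)
```

The first figure should come out near the sum of the three delays and the second near the longest single delay, though the exact numbers vary from run to run.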

Adding a coroutine

I mentioned how you get delays between the lines of output when running the program in Listing 1. It is more user-friendly to show some sort of progress indicator. You can use the magic of cooperative multitasking to implement this. The coroutine function below displays a dot a couple of times a second as a progress indicator.

async def progress_indicator(delay, loop):
    while True:
        try:
            await asyncio.sleep(delay)
        except asyncio.CancelledError:
            break
        #Print a dot, with no newline afterward & force the output to appear immediately
        print('.', end='', flush=True)
        #Check if this is the last remaining task, and exit if so
        active_tasks = [ task for task in asyncio.Task.all_tasks(loop)
                              if not task.done() ]
        if len(active_tasks) == 1:
            break

This function takes two parameters: the minimum delay between printing dots, and the event loop object. This is the same kind of object you saw created near the bottom of Listing 1. There are quite a few bits of asyncio to which you'll want to pass a loop object, just to make sure you're keeping the cooperation among a controlled group of coroutines. In this case, pass the event loop to asyncio.Task.all_tasks, which then returns a set of all the tasks (that is, scheduled coroutines) in that event loop, including those which have completed. To get only the ones that have not completed, filter the result further using task.done.

Say you create a coroutine object from this function, passing in a delay of 0.5. It goes straight into an infinite loop, in a way you might remember from infinite generators in the previous tutorial. It then invokes the sleep delay but accounts for an exception if an external entity cancels the coroutine, which can happen in several ways. In such cases, the coroutine is interrupted with the asyncio.CancelledError exception, causing us to break out of the infinite loop.

After the coroutine resumes normally, it prints a dot and then checks whether all other coroutines have run their course. If progress_indicator is the single remaining coroutine, it breaks out of the infinite loop.
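
The cancellation branch can be exercised on its own. The sketch below is an alternative design, not the one used in this tutorial's listings: instead of counting active tasks, the caller cancels the indicator once the tables are done. It assumes Python 3.7 or newer for asyncio.create_task and asyncio.run, which is also relevant because asyncio.Task.all_tasks was removed in Python 3.9. The shortened serve_table, the dots counter, and main are introduced for illustration.

```python
import asyncio

async def progress_indicator(delay):
    dots = 0
    while True:
        try:
            await asyncio.sleep(delay)
        except asyncio.CancelledError:
            #The caller cancelled this task, so stop printing dots
            break
        print('.', end='', flush=True)
        dots += 1
    return dots

async def serve_table(table_number):
    #Shortened stand-in for the tutorial's serve_table
    await asyncio.sleep(0.3)
    print('Thanks for visiting us! (table', table_number, ')')

async def main():
    #Schedule the indicator as a task so it runs alongside the tables
    indicator = asyncio.create_task(progress_indicator(0.05))
    await asyncio.gather(serve_table(1), serve_table(2), serve_table(3))
    #All tables are done, so interrupt the indicator's sleep
    indicator.cancel()
    return await indicator

dots_printed = asyncio.run(main())
```

Because progress_indicator catches asyncio.CancelledError and returns normally, awaiting the task after cancelling it yields the dot count rather than raising an exception.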

Listing 2. A full listing updated to use the progress_indicator coroutine
import random
import asyncio

async def get_menus():
    delay_minutes = random.randrange(3) #0 to 2 minutes
    await asyncio.sleep(delay_minutes) #Pretend a second is a minute

async def get_order():
    delay_minutes = random.randrange(10)
    await asyncio.sleep(delay_minutes)
    order = random.choice(['Special of the day', 'Fish & Chips', 'Pasta'])
    return order

async def prepare_order(order):
    delay_minutes = random.randrange(10, 20) #10 to 19 minutes
    await asyncio.sleep(delay_minutes)
    print('   [Order ready from kitchen: ', order, ']')

async def eat():
    delay_minutes = random.randrange(20, 40)
    await asyncio.sleep(delay_minutes)

async def get_payment():
    delay_minutes = random.randrange(10)
    await asyncio.sleep(delay_minutes)

async def progress_indicator(delay, loop):
    while True:
        try:
            await asyncio.sleep(delay)
        except asyncio.CancelledError:
            break
        #Print a dot, with no newline afterward & force the output to appear immediately
        print('.', end='', flush=True)
        #Check if this is the last remaining task, and exit if so
        active_tasks = [ task for task in asyncio.Task.all_tasks(loop)
                              if not task.done() ]
        if len(active_tasks) == 1:
            break

async def serve_table(table_number):
    await get_menus()
    print('Welcome. Please sit at table', table_number, 'Here are your menus')
    order = await get_order()
    print('Table', table_number, 'what will you be having today?')
    await prepare_order(order)
    print('Table', table_number, 'here is your meal:', order)
    await eat()
    print('Table', table_number, 'here is your check')
    await get_payment()
    print('Thanks for visiting us! (table', table_number, ')')

#asyncio uses event loops to manage its operation
loop = asyncio.get_event_loop()

#Create coroutines for three tables
gathered_coroutines = asyncio.gather(
    serve_table(1),
    serve_table(2),
    serve_table(3),
    progress_indicator(0.5, loop)
)

#This is the entry from synchronous to asynchronous code. It will block
#Until the coroutine passed in has completed
loop.run_until_complete(gathered_coroutines)
#We're done with the event loop
loop.close()

Notice that now the event loop is obtained before creating the gathering of coroutines. That's because the loop must be passed to progress_indicator, as you can see in the list of coroutines to be gathered.

The following output is from a sample run:

.Welcome. Please sit at table 3 Here are your menus
..Welcome. Please sit at table 2 Here are your menus
Welcome. Please sit at table 1 Here are your menus
........Table 3 what will you be having today?
......Table 1 what will you be having today?
....Table 2 what will you be having today?
..........   [Order ready from kitchen:  Fish & Chips ]
Table 3 here is your meal: Fish & Chips
..............   [Order ready from kitchen:  Fish & Chips ]
Table 2 here is your meal: Fish & Chips
......   [Order ready from kitchen:  Pasta ]
Table 1 here is your meal: Pasta
..............................Table 3 here is your check
Thanks for visiting us! (table 3 )
..........Table 2 here is your check
..Thanks for visiting us! (table 2 )
......................Table 1 here is your check
..........Thanks for visiting us! (table 1 )
.

The progress indicator dots appear regularly, about every half second.

What sort of multitasking is this?

If you've ever done multithreading or multiprocessing in Python, you might wonder how this asyncio cooperative multitasking approach compares. The main difference is that in the asyncio approach, you're not actually trying to have two coroutines do something at the same physical moment in time, just as a restaurant server can't give table 1 menus at the exact same time that they serve table 3 its meal. What the asyncio event loop is doing is taking advantage of the natural downtimes within tasks, allowing coroutines to do work when there is work to do, but then cede control to other coroutines when they go idle.

A coroutine doesn't have control of when it gets to run again, and there is a reason this is called cooperative multitasking. If one coroutine spends too long without yielding control back to the event loop, it blocks everything, causing unnecessary delay, and you lose the multitasking benefits. This means you must first of all make sure your program is suited to be implemented this way, and you must then carefully code your program by breaking it into coroutines which release control to each other at suitable times. This can be trickier than it sounds because you could innocently call a long-running regular function from a coroutine, and the problem won't be readily apparent.
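
The effect of such an innocent blocking call can be measured. In this rough sketch, ticker, careless, cooperative, and first_tick_delay are all hypothetical names, time.sleep stands in for any long-running regular function, and asyncio.run assumes Python 3.7 or newer.

```python
import asyncio
import time

async def ticker(ticks):
    #Record the time of each tick, ideally 0.05 seconds apart
    for _ in range(4):
        await asyncio.sleep(0.05)
        ticks.append(time.perf_counter())

async def careless():
    time.sleep(0.3) #Regular blocking call: the event loop is stuck here

async def cooperative():
    await asyncio.sleep(0.3) #Releases control while waiting

async def first_tick_delay(other):
    #Run the ticker next to another coroutine and report how long
    #the ticker had to wait for its first tick
    ticks = []
    start = time.perf_counter()
    await asyncio.gather(ticker(ticks), other())
    return ticks[0] - start

blocked_delay = asyncio.run(first_tick_delay(careless))
smooth_delay = asyncio.run(first_tick_delay(cooperative))
print('First tick next to careless: %.2f seconds' % blocked_delay)
print('First tick next to cooperative: %.2f seconds' % smooth_delay)
```

Next to the careless coroutine the first tick cannot arrive until the blocking call has finished, even though the ticker only asked to sleep for 0.05 seconds.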

As a general rule of thumb, asyncio event loops are best for programs that frequently connect to networks, or that do a lot of querying of a database and the like. Waiting for a remote server or database to respond to a request or query is an ideal time to release control to the event loop. In the past, programmers tended to use threads in such cases, but asyncio event loops are a much clearer and more flexible way to program than multithreading. One complication is that to gain the full benefits of asyncio event loops you need your network and database APIs to be written as asyncio coroutines. Luckily, there are now many Python third-party libraries implemented to take advantage of asyncio.

Nevertheless, you might sometimes run into a case where you want to use asyncio but need to use a library that does not support asyncio. In other words, you need to call synchronous code from asynchronous code without spoiling the multitasking. You can do this with asyncio executors which run the synchronous code in a separate thread or process. I wanted to mention this because you might be wondering, but further detail is outside the scope of these tutorials.
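
For the curious, here is roughly what that looks like. This is a minimal sketch using the event loop's run_in_executor method with its default thread pool. slow_lookup is a hypothetical stand-in for a synchronous library call, heartbeat and main are names introduced for illustration, and asyncio.get_running_loop and asyncio.run assume Python 3.7 or newer.

```python
import asyncio
import time

def slow_lookup(key):
    #Hypothetical synchronous library call that blocks while it works
    time.sleep(0.2)
    return key.upper()

async def heartbeat(beats):
    #Record the time of each beat to show the event loop stays responsive
    for _ in range(3):
        await asyncio.sleep(0.05)
        beats.append(time.perf_counter())

async def main():
    loop = asyncio.get_running_loop()
    beats = []
    start = time.perf_counter()
    #Passing None selects the loop's default thread pool executor
    lookup = loop.run_in_executor(None, slow_lookup, 'pasta')
    value, _ = await asyncio.gather(lookup, heartbeat(beats))
    return value, beats[0] - start

value, first_beat_delay = asyncio.run(main())
print(value) #PASTA
```

The blocking call runs in a worker thread, so the heartbeat coroutine keeps ticking on the event loop while the lookup is in progress.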

Conclusion

As you get more and more proficient with asyncio, you'll learn of other exotic concepts related to the technique, including the impressively named "futures." You'll also learn that there are other constructs in which a coroutine can release control to the event loop, including async with and async for. The asynchronous generators that make async for convenient require Python 3.6 or newer, so I won't cover them since this tutorial series has Python 3.5 as the minimum requirement, but in the next tutorial, you will learn about async with, along with other cool techniques.

publish-date=07022018