Charming Python

SimPy simplifies complex models

Simulate discrete simultaneous events for fun and profit


Content series:

This content is part # of # in the series: Charming Python

Stay tuned for additional content in this series.

This content is part of the series:Charming Python

Stay tuned for additional content in this series.

I learned of the SimPy package when I was contacted by one of its creators, Klaus Müller. Dr. Müller had read the Charming Python installments that presented techniques for using Python 2.2+ generators to implement semi-coroutines and "weightless" threads. In particular -- and much to my gratification -- he found that these techniques were useful for implementing Simula-67 style simulations in Python.

It turned out that Tony Vignaux and Chang Chui had previously created another Python library that was conceptually closer to Simscript, and that used standard threading rather than my semi-coroutine technique. Working together, the group determined that the generator-based style was much more efficient and recently launched a GPL'd project on SourceForge, called SimPy (see Related topics for a link to the SimPy home page), currently at beta status. Professor Vignaux hopes to use the unified SimPy package in his future university teaching at the University of Victoria at Wellington; I believe the library is also quite applicable to a variety of applied problems.

Prior to my recent correspondence and investigation, I admit I had no background in simulation as a programming field. I suspect most readers of this installment share my limitation. Although there is some novelty involved in the way one thinks about this style of programming, simulations are useful for understanding the behavior of resource-limited real-life systems. Whether you are interested in limited bandwidth networks, automobile traffic behavior, market and commercial optimization, biological/evolutionary interactions, or other "stochastic" systems, SimPy provides a simple Python tool for such modeling.

In this column, I'll be using the fairly simple example of a grocery store checkout area with multiple aisles. Using the presented simulation, we can ask questions about the economic and wait-time implications of various modifications to scanner technologies, shopper habits, staffing needs, and so on. The nice thing about this modeling is that it lets you make policy decisions in advance while having a clear idea of the implications of changes made. Obviously, most readers will not run a grocery store specifically, but the techniques can be widely applied to a large variety of systems.

Simulation concepts

There are just three abstract/parent classes provided by the SimPy library, and they correspond to the three basic concepts of a simulation. A number of other general functions and constants are used to control the run of a simulation, but the important concepts are tied to the classes.

The central concept in a simulation is a process. A process is simply an object that does something, then sometimes waits for a while before it is ready to do a next thing. In SimPy you can also "passivate" a process, which means that it will not do anything further unless called upon to do so by another process. It is often useful to think of a process as trying to complete a goal. Often a process is programmed as a loop in which multiple actions are performed. Between each action, you insert a Python "yield" statement, which lets the simulation scheduler carry out the actions of each waiting process before returning control.

Many of the actions that processes perform depend on the use of resources. A resource is simply anything that is limited in availability. In a biological model, a resource might be a food supply; in a network model, a resource could be a router or a finite-bandwidth channel; in our market simulation, a resource is a checkout aisle. The only thing a resource does is restrict its usage to one particular process at any given time. Under the SimPy programming model, a process alone determines how long it wants to retain a resource, the resource itself is just passive. In a real-life system, the SimPy model may or may not fit the conceptual scheme; it is easy to imagine a resource that inherently limits its utilization (for example, a server computer that severs a connection if it does not get a satisfactory response in a required timeframe). But as a programming matter, it does not matter greatly whether a process or a resource is the "active" party (just make sure you understand your intention).

The final SimPy class is a monitor. A monitor is not really essential, it is merely a convenience. All a monitor does is record events reported to it, and save statistics about those events (averages, counts, variances, etc). The provided Monitor class is a useful tool for logging simulation measures, but you could record events by any other technique you wished also. In fact, my example subclasses Monitor to provide some (slight) enhanced capabilities.

Setting up shop: programming a simulation

Most of the time in my articles, I present sample applications all at once, but in this case I think it is more useful to walk you through each step of the grocery store application instead. You can clip together each portion if you wish; the SimPy authors will include my example in future releases.

The first step in a SimPy simulation is a few general import statements:

Listing 1. Importing the SimPy library
#!/usr/bin/env python
from __future__ import generators
from SimPy import Simulation
from SimPy.Simulation import hold, request, release, now
from SimPy.Monitor import Monitor
import random
from math import sqrt

Some of the examples accompanying SimPy use an import * style, but I prefer to be more explicit about the namespaces I populate. For Python 2.2 (the minimum version needed for SimPy-) you will need to import the generators feature as indicated. After Python 2.3, this will not be necessary.

For my application, I define a few runtime constants that describe the scenarios I am interested in during a particular simulation run. As I change the scenario, I must edit these constants within the main script. If this were a more fleshed-out application, I might configure these parameters with command-line options, environment variables, or a configuration file. But for now, this style is adequate:

Listing 2. Configuring the simulation parameters
AISLES = 5         # Number of open aisles
ITEMTIME = 0.1     # Time to ring up one item
AVGITEMS = 20      # Average number of items purchased
CLOSING = 60*12    # Minutes from store open to store close
AVGCUST = 1500     # Average number of daily customers
RUNS = 10          # Number of times to run the simulation

The main thing our simulation needs to do is define one or more processes. For the grocery store simulation, the process we are interested in is a customer who checks out at an aisle.

Listing 3. Defining what a customer does
class Customer(Simulation.Process):
    def __init__(self):
        # Randomly pick how many items this customer is buying
        self.items = 1 + int(random.expovariate(1.0/AVGITEMS))
    def checkout(self):
        start = now()           # Customer decides to check out
        yield request, self, checkout_aisle
        at_checkout = now()     # Customer gets to front of line
        yield hold, self, self.items*ITEMTIME
        leaving = now()         # Customer completes purchase
        yield release, self, checkout_aisle

Each customer has decided to purchase a certain number of items. (Our simulation does not cover the selection of items from the grocery aisles; customers simply arrive at checkout with their carts). I am not confident that exponential-variate distribution is genuinely an accurate model here. At the low end it feels right, but I feel like there should be a fuzzy top limit on how many items a real-life shopper ever buys. In any case, you can see how easy it would be to adjust our simulation should better model information become available.

The actions a customer takes is the thing to focus on. The "execution method" of a customer is .checkout(). This process method is often named .run() or .execute(), but in my example, .checkout() seemed most descriptive. You can use whatever name you wish. The only real actions taken by a Customer object are checking the simulated time at a couple of points, and logging some durations to the waittime and checkouttime monitors. But in between the actions are the crucial yield statements. In the first case, the customer requests a resource (a checkout aisle). Customers cannot do anything else until they get the needed resource. Once at the checkout aisle, the customer actually checks out -- which takes an amount of time proportionate to the number of items purchased. Finally, once through the checkout, the customer releases the resource so that other customers can use it.

The above code defines what a Customer class does, but we need to create some actual customer objects before we run a simulation. We could generate a customer object for each customer who will shop during the day, and assign each one an appropriate checkout time. But a more parsimonious approach is to let a factory object generate the needed customer objects "as each customer enters the store." The simulation is not really interested simultaneously in all the customers that will shop during a day, but only in the ones who contend for checkout aisles at the same time. Notice that the Customer_Factory class is itself part of the simulation -- it is a process. Despite possible associations with manufactured robot workers, a la Fritz Lang's Metropolis, you should just think of the customer factory as a programming convenience; it does not correspond directly to anything in the modeled domain.

Listing 4. Generating the customer stream
class Customer_Factory(Simulation.Process):
    def run(self):
        while 1:
            c = Customer()
            Simulation.activate(c, c.checkout())
            arrival = random.expovariate(float(AVGCUST)/CLOSING)
            yield hold, self, arrival

As I mentioned earlier, I want to collect some statistics that the current SimPy Monitor class does not address. Namely, I am not just interested in the average checkout time, but also in the worst case in a given scenario. So I created an enhanced monitor that collects minimum and maximum tallied values.

Watching the simulation with a monitor
class Monitor2(Monitor):
    def __init__(self):
        self.min, self.max = (int(2**31-1),0)
    def tally(self, x):
        Monitor.tally(self, x)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

The final step in our simulation is, of course, to run it. In most of the standard examples, a simulation is run just once. But for my grocery store, I decided to loop through several simulations, each one corresponding to a day of business. This turns out to be a good idea, since some statistics differ fairly significantly from day to day (as different random values are used for customer arrivals and item counts).

Listing 6. Running the simulation day-by-day
for run in range(RUNS):
    waittime = Monitor2()
    checkouttime = Monitor2()
    checkout_aisle = Simulation.Resource(AISLES)
    cf = Customer_Factory()
    Simulation.activate(cf,, 0.0)

    #print "Customers:", checkouttime.count()
    print "Waiting time average: %.1f" % waittime.mean(), \
          "(std dev %.1f, maximum %.1f)" % (sqrt(waittime.var()),waittime.max)
    #print "Checkout time average: %1f" % checkouttime.mean(), \
    #      "(standard deviation %.1f)" % sqrt(checkouttime.var())

Three's a crowd: some outcomes (and what they mean)

When I first thought of my grocery store model, I thought a simulation would answer a few direct questions. For example, I imagined that an owner might have a choice between buying improved scanners (reducing ITEMTIME) or hiring more clerks (increasing AISLES). I thought I would just run the simulation under each scenario -- assuming given labor and technology costs -- and determine which was cheaper.

Once I played with the simulation, I realized something that is probably more interesting than I had anticipated. Looking at all my collected data, I noticed that I did not know what it was I was trying to optimize. For example, is it more important to reduce average checkout time, or to reduce the worst case time? Which makes for better overall customer satisfaction? Moreover, how do I compare the time a customer spends waiting to arrive at a checkout aisle with the time spent ringing up the items? In my personal experience, I get impatient waiting in the line, but I feel less worried once my items are being rung up (even if it takes a while).

Not owning a grocery store, of course, I don't know the answer to all these questions. But the simulation does let me determine exactly what the tradeoffs are; and it is simple enough to tweak behaviors (including those not yet explicitly parameterized -- for example, "do customers really arrive consistently throughout the day?").

Let me show just one closing example that illustrates the value of the model. I wrote above that the behavior of complex systems is hard to conceptualize. Here is something I consider a proof of that fact. What do you think happens when available aisles are reduced from six to five (with other parameters constant)? I would initially expect slightly worst checkout times. The reality is different:

Listing 7. Two sample runs with varying aisles
% python
Waiting time average: 0.5 (std dev 0.9, maximum 4.5)
Waiting time average: 0.3 (std dev 0.6, maximum 3.7)
Waiting time average: 0.4 (std dev 0.8, maximum 5.6)
Waiting time average: 0.4 (std dev 0.8, maximum 5.2)
Waiting time average: 0.4 (std dev 0.8, maximum 5.8)
Waiting time average: 0.3 (std dev 0.6, maximum 5.2)
Waiting time average: 0.5 (std dev 1.1, maximum 5.2)
Waiting time average: 0.5 (std dev 1.0, maximum 5.4)

% python
Waiting time average: 2.1 (std dev 2.3, maximum 9.5)
Waiting time average: 1.8 (std dev 2.3, maximum 10.9)
Waiting time average: 1.3 (std dev 1.7, maximum 7.3)
Waiting time average: 1.7 (std dev 2.1, maximum 9.5)
Waiting time average: 4.2 (std dev 5.6, maximum 21.3)
Waiting time average: 1.6 (std dev 2.6, maximum 12.0)
Waiting time average: 1.3 (std dev 1.6, maximum 7.5)
Waiting time average: 1.5 (std dev 2.1, maximum 11.2)

Rather than being 1/5 worse or something like that, eliminating one checkout aisle causes average waiting time to increase by around 4 times. Moreover, the unluckiest customer (during these particular runs) goes from a 6 minute wait to a 21 minute wait. If I were the manager, I think knowing this threshold condition would be extremely important to customer satisfaction. Who would have known?

Downloadable resources

Related topics

ArticleTitle=Charming Python: SimPy simplifies complex models