This entry is a follow-on to my most recent entry. The idea is that random variables are **the way** to describe the uncertain quantities that arise in managing development efforts. They are a natural extension of the fixed variables we all grew up with. In fact, a fixed variable is a random variable that has probability one of taking a given value and probability zero of taking any other value. In this entry, I explain how you can (with computer assistance) calculate with random variables.

Suppose you want to add two random variables, v1 and v2. This need might arise if you have two serialized tasks in a Gantt chart, each described by a distribution as explained in the previous entry, and you would like to know how long it would take to complete both of them, *i.e.*, the sum of their durations.

How would you proceed? First note that the sum would be another random variable. Therefore what you need is the probability distribution of the sum. In general there is no simple formula for that distribution, but there is an effective, commonly used numerical approach, known as Monte Carlo simulation.

The idea behind Monte Carlo simulation is to use a pseudo-random number generator to take a sample value of v1 and a sample value of v2 and then add them. Note that the values are selected according to the probability distribution of each variable, so the more likely values are taken more often. Now save that sum and do the same thing many times, say 100,000 times, storing each of the sums. You can then estimate the probability of each sum by counting how often it occurs in the collection of saved sums (some sums are more frequent than others) and dividing by the number of samples. (In practice you have to round the sums, or group them into bins, to get the counts.) What you get is an approximation of the distribution of the sum.
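The procedure can be sketched in a few lines of Python. This is only an illustration, not the simulator used for the figures in this entry: the function name `simulate_sum`, the two sampler functions, and the task durations and weights are all made up for the example, and the two variables are assumed to be independent.

```python
import random
from collections import Counter


def simulate_sum(sample_v1, sample_v2, n_samples=100_000, ndigits=1):
    """Approximate the distribution of v1 + v2 by repeated sampling.

    sample_v1 and sample_v2 are zero-argument functions that each return
    one random draw. Returns {rounded sum: estimated probability}.
    """
    counts = Counter()
    for _ in range(n_samples):
        total = sample_v1() + sample_v2()   # one sample of the sum
        counts[round(total, ndigits)] += 1  # round so nearby sums share a bucket
    return {value: count / n_samples for value, count in sorted(counts.items())}


# Hypothetical task durations in days (illustrative numbers, not from this entry).
rng = random.Random(42)  # fixed seed so the run is repeatable


def draw_v1():
    # 2, 3, or 4 days; the weights make 3 days the most likely value.
    return rng.choices([2, 3, 4], weights=[0.2, 0.5, 0.3])[0]


def draw_v2():
    # 1 or 2 days.
    return rng.choices([1, 2], weights=[0.6, 0.4])[0]


distribution = simulate_sum(draw_v1, draw_v2)
for value, probability in distribution.items():
    print(f"sum = {value}: estimated probability {probability:.3f}")
```

For these made-up inputs the exact probabilities can be worked out by hand (0.12, 0.38, 0.38, and 0.12 for sums of 3, 4, 5, and 6 days), which gives a simple check that the estimates settle near the right values.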

Let's look at an example. Suppose v1 has a triangular distribution with L = 3, E = 4, H = 7 (the low, most likely, and high values), as shown in Figure 2, and v2 has a triangular distribution with L = 1, E = 6, H = 7, as seen in Figure 3.

The distribution of the sum, found using the Monte Carlo simulator in Focal Point, is given by figure 3.

First note that the sum is not another triangular distribution. It is smoother. This is to be expected from the mathematics of probability. On the other hand, the distribution of the sum makes sense. For example, we would expect the most likely value of the sum to be close to 10, the sum of the two most likely values, and the simulation found 9.98. Some of the discrepancy is due to chance and would diminish with more samples, although the most likely value of a sum does not in general have to equal the sum of the most likely values exactly. Also note that the probability is zero below 4, the sum of the lows, and above 14, the sum of the highs.
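Readers without access to Focal Point can try a rough version of this experiment with Python's standard library. The sketch below is not the Focal Point simulator: it assumes v1 and v2 are independent, and the bin width of 0.1 and the seed are arbitrary choices made for the illustration.

```python
import random
from collections import Counter

N_SAMPLES = 100_000
BIN_WIDTH = 0.1  # assumed bin size used to group the sums

rng = random.Random(1)  # fixed seed so the run is repeatable

# Note the argument order of random.triangular: (low, high, mode).
sums = [
    rng.triangular(3, 7, 4) + rng.triangular(1, 7, 6)  # v1 + v2, drawn independently
    for _ in range(N_SAMPLES)
]

# Bucket each sum by its bin index, then convert counts to probabilities.
counts = Counter(int(s / BIN_WIDTH) for s in sums)
probabilities = {index * BIN_WIDTH: count / N_SAMPLES for index, count in counts.items()}

modal_bin_start = max(probabilities, key=probabilities.get)  # most frequent bin
mean_sum = sum(sums) / N_SAMPLES

print(f"smallest sampled sum: {min(sums):.2f}")
print(f"largest sampled sum:  {max(sums):.2f}")
print(f"mean of the sums:     {mean_sum:.2f}")
print(f"most frequent bin:    [{modal_bin_start:.1f}, {modal_bin_start + BIN_WIDTH:.1f})")
```

No sample can fall below 4 or above 14, and the mean of the sums should land near 9.33, because the mean of a triangular distribution is (L + E + H)/3 and means add. The most frequent bin depends somewhat on the bin width and the seed, so it is best read as an estimate.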

For fun, here is the distribution of the product for the variables:

The reader can check if this looks sensible. Note also that the product does not have a triangular distribution. The peak is much smoother.
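One way to make that check is to sample the product directly. The following is a minimal sketch under the same assumptions as before (independent variables, Python's standard library rather than Focal Point, an arbitrary seed), and it reports a few summary numbers instead of drawing the curve.

```python
import random
import statistics

N_SAMPLES = 100_000
rng = random.Random(7)  # fixed seed so the run is repeatable

# random.triangular takes (low, high, mode); v1 and v2 are drawn independently.
products = sorted(
    rng.triangular(3, 7, 4) * rng.triangular(1, 7, 6)
    for _ in range(N_SAMPLES)
)

mean_product = statistics.fmean(products)
p10 = products[int(0.10 * N_SAMPLES)]  # about 10% of the samples fall below this
p90 = products[int(0.90 * N_SAMPLES)]  # about 90% of the samples fall below this

print(f"range of samples: {products[0]:.2f} to {products[-1]:.2f}")
print(f"mean: {mean_product:.2f}")
print(f"10th to 90th percentile: {p10:.2f} to {p90:.2f}")
```

Two sanity checks follow from the inputs: every sample has to lie between 3 (the product of the lows) and 49 (the product of the highs), and for independent variables the mean of the product equals the product of the means, here about 21.8.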

So random variables can be used in place of fixed variables in any computation. They have all of the utility of fixed variables and also enable us to express uncertainty. They may seem foreign at first, but they are worth the trouble to learn. Like anything else, they become intuitive after a while.
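To make the "in place of fixed variables" idea concrete, here is a small sketch of a sample-based random variable that supports `+` and `*` like an ordinary number. The `RandomVar` class, its method names, and the daily rate of 800 are hypothetical and exist only for this illustration; the sketch also assumes that separately created variables are independent.

```python
import random
import statistics

N_SAMPLES = 50_000


class RandomVar:
    """A random variable represented by a list of Monte Carlo samples (hypothetical helper)."""

    def __init__(self, samples):
        self.samples = list(samples)

    @classmethod
    def triangular(cls, low, mode, high, rng):
        # random.triangular expects (low, high, mode), hence the reordering.
        return cls(rng.triangular(low, high, mode) for _ in range(N_SAMPLES))

    @classmethod
    def fixed(cls, value):
        # A fixed variable: the same value with probability one.
        return cls([value] * N_SAMPLES)

    def _combine(self, other, op):
        if not isinstance(other, RandomVar):
            other = RandomVar.fixed(other)  # plain numbers act as fixed variables
        return RandomVar(op(a, b) for a, b in zip(self.samples, other.samples))

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    __radd__ = __add__  # addition and multiplication are commutative,
    __rmul__ = __mul__  # so number-on-the-left reuses the same code

    def mean(self):
        return statistics.fmean(self.samples)

    def percentile(self, p):
        ordered = sorted(self.samples)
        return ordered[min(int(p / 100 * len(ordered)), len(ordered) - 1)]


rng = random.Random(3)  # fixed seed so the run is repeatable
v1 = RandomVar.triangular(3, 4, 7, rng)
v2 = RandomVar.triangular(1, 6, 7, rng)
daily_rate = 800  # hypothetical fixed cost per day

cost = (v1 + v2) * daily_rate  # the same expression you would write with fixed numbers
print(f"mean cost: {cost.mean():.0f}")
print(f"80% of outcomes lie between {cost.percentile(10):.0f} and {cost.percentile(90):.0f}")
```

Combining samples position by position is a deliberate choice: it keeps a variable consistent with itself (so `v1 + v1` behaves like `2 * v1`) while separately sampled variables are treated as independent.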

## Comments (4)

1. PankajSinha commented: Great article Murray, thanks for sharing. Focal Point can really get some method out of chaotic random variables which can really help with decision making!

2. JimDensmore commented: By extension, my experience is that probability distributions of results are smoother than new users expect. A naive approach to a situation where cost per user is given by the triangular distribution ($1,000, $1,500, $3,000) and the number of users is (20, 27, 30) is to assume that the total cost will be between $20,000 and $90,000. Well, the assumption is technically correct of course, but in practice the 10% value in the result distribution is much further to the right than most intuitively expect - in this case, it's at $33,000. The 90% is ~$64,000 - even further from the $90,000 absolute limit. In other words, if one propagates the low and high values of the multiplication as $20k/$90k instead of doing the arithmetic with Monte Carlo simulation, it significantly skews the results, it just doesn't reflect reality.

3. SriramMahalingam commented: Great article that explains. Equally revealing is Jim's example in @2.

4. mcantor@us.ibm.com commented: Thanks Pankaj, Jim, Sriram,