 | Level: Advanced Joe Marasco, CEO, Ravenflow
15 Aug 2002 from The Rational Edge: While shipwrecked on a desert island, our fictional project manager comes across a seemingly simple math problem. He takes us through the solution, makes some observations, and poses a challenge at the end for readers.
In the heart of the baseball season, our old friend Roscoe Leroy returns with a tale from his past. Whilst shipwrecked on a desert island, he came across a seemingly simple math problem. He takes us through the solution, makes some observations, and poses a challenge at the end for our readers.
Roscoe Sets the Stage
"Remember the time I got shipwrecked in the South Pacific with my pal named Monday?" Roscoe began. Monday, it should be noted, was Roscoe's answer to Robinson Crusoe's Friday--his traveling companion and overall helper. Roscoe and Monday had spent many a night under the stars, and, if two guys had to be marooned, those two had a chance of surviving together without killing each other.
"Yeah," I replied. "You guys were really lucky; as I recall, you were the only survivors. The island was tropical; there was plenty of food, and you had shelter from the elements. All you had to do was wait to be rescued."
"Well," snapped back Roscoe, "all we could do was wait. There was no way we could accelerate anything. Our biggest problem was boredom; there were no books in the flotsam and jetsam, and we desperately needed a way to pass the time without going bonkers."
It turns out that in going through their salvage inventory, Roscoe and Monday discovered a mint set of baseball cards. After awhile, Monday suggested they use the statistics on the baseball cards to construct two teams, so that they could conduct some "fantasy" baseball games to pass the time. Fortunately, they had an ample supply of pencils and paper. Roscoe had even rescued his pocket slide rule.
"As soon as Monday came up with the idea," said Roscoe, "I was keen on it. It would give us something to do, and it would be a harmless form of competition. So I set about figuring out how to create the game.
"And that is how the thorny problem came about."
Simulating the Batter
"In all these simulations," continued Roscoe, "you need to devise a sort of random number generator."
"Well," I said, "what devices did you have at your disposal?"
"Not much," responded Roscoe. "All we had was three identical dice of the usual variety, with one to six spots on each. Still, I figured it would be easy to simulate probabilities with them, since that was really what we needed to do. We decided that the simplest thing to do was roll the three dice simultaneously, add up the number of spots, and use that total to determine success or failure."
"Probabilities? I'm not sure I understand," I said.
"Sure, probabilities! That's how the fantasy baseball concept works. For example, assume you have a hitter at the plate with a batting average of .250. At the lowest level of sophistication (forget, for a minute, about walks) we need a way of randomly deciding if he gets a hit for this at-bat, so we need to create a random event that has a probability of occurring 250 times out of a thousand." I had never known Roscoe to be that interested in probability and statistics, but I was about to be impressed.
"Now in reality, it's more complicated. In our game, for example, we ignored the quality of the pitcher, and complicated situations like sacrifice flies, and so on."
"Well, assuming you have made these simplifications, how do you actually simulate a batter's appearance at the plate?" I asked.
"We have only two things to think about," Roscoe replied. "Plate appearances and official at-bats. When a player gets a walk, it counts as a plate appearance, but not as an official at-bat. So basically, you need two numbers: the percentage of plate appearances that yield official at-bats for that player, and then his batting average. Suppose, for example, that a player walks in one out of every ten plate appearances.
1
What you would do is then first determine if the player walks, by asking for a successful trial of an event with probability 0.1. With the three dice, you'd need to know what combined number on a given throw has that probability. So imagine that you roll the dice, and you get that total--then, bingo! The batter walks to first base, and you're done.
"On the other hand, if you don't roll that total, then the player does
not walk; instead, he has an official at-bat. Now you take his batting
average, say .250, and you roll for that probability. If you're successful,
he makes a hit; if not, he is out. Then, of course, you can roll for what
kind of hit (single, double, triple, home run) by using those statistics
per at-bat. And so on. I used batting average as the generic example,
but basically you can refine the simulation to your heart's content, depending
on how many of the "corner cases" you want to include.
2
It will always, however, come down to simulating an event with a certain
probability. And, because sometimes we will need probabilities for things
other than batting average, we need to be able to cover the entire range
from zero (total failure) to one (certain success.)"
"Fair enough. I get the gist of it. So what's the problem?" I replied in turn.
"The problem," said Roscoe, "is this: Can we figure out how to simulate
probabilities using a device as simple as three dice, and still have enough
granularity to make the simulation reasonable? For example, if we can
only simulate .250, .500, and .750, we don't have enough values to do
a good simulation."
"Well, if you throw all three dice simultaneously, you get totals that range from 3 to 18; that's sixteen different outcomes, so that's a start," I responded.
"Indeed," said Roscoe, "that is exactly how we proceeded."
First Steps
"Monday got right on the basic combinatorics. With 3 six-sided dice, there were 6 x 6 x 6 possible outcomes, or 216 possibilities. Of course, as there were only 16 different totals, many of these yielded the same answer. So Monday constructed the following table:
|
Total
|
Number of Ways
| | 3 | 1 | | 4 | 3 | | 5 | 6 | | 6 | 10 | | 7 | 15 | | 8 | 21 | | 9 | 25 | | 10 | 27 | | 11 | 27 | | 12 | 25 | | 13 | 21 | | 14 | 15 | | 15 | 10 | | 16 | 6 | | 17 | 3 | | 18 | 1 | | | | Sum | 216 |
|
"Looks good to me," I replied. "There is some obvious symmetry. For example, both 3 and 18 come up exactly once, as you would expect. And 4 and 17 are the same, and so on. The most frequent occurrences are 10 and 11, as there are lots of combinations that will yield those totals. And the number of total ways adds up to 216, so you can't be too far off the mark. But it looks like there are, so far, only 8 distinct probabilities in the offing."
"Appearances can be deceiving," said Roscoe. "But let's add in the probabilities we have so far to double check our work." He then produced the following table:
3
|
Total
|
Number of Ways
|
Probability
| | 3 | 1 | 0.00463 | | 4 | 3 | 0.01389 | | 5 | 6 | 0.02778 | | 6 | 10 | 0.04630 | | 7 | 15 | 0.06944 | | 8 | 21 | 0.09722 | | 9 | 25 | 0.11574 | | 10 | 27 | 0.12500 | | 11 | 27 | 0.12500 | | 12 | 25 | 0.11574 | | 13 | 21 | 0.09722 | | 14 | 15 | 0.06944 | | 15 | 10 | 0.04630 | | 16 | 6 | 0.02778 | | 17 | 3 | 0.01389 | | 18 | 1 | 0.00463 | | | | | Sum | 216 | 1.00000 |
|
"Whoop de do!" I exclaimed. "Roscoe can divide by 216."
"Calm down, Sonny," said Roscoe. "The fun is only beginning."
Second Steps
"Well, of course, it should be obvious that if you want to simulate a probability of 0.00463, all you require to be successful is that the shooter roll a 3. You could ask him to roll an 18, which is the symmetrical result; however, we are looking for distinct probabilities, so we pick one or the other. We pick 3." Roscoe waited a second, and then sprang his first surprise.
"But what happens if you define 'success' as rolling either a 3 or a 4? Now the probability of rolling a 3 is 0.00463, and the probability of rolling a 4 is 0.01389, so the probability of rolling either a 3 or a 4 is simply 0.00463 + 0.01389, or 0.01852. So you can see that we can create some new distinct probabilities by considering multiple totals as 'successful.'"
"Let me see if I get this," I replied. "To simulate 0.00463, I demand that the shooter roll a 3. To simulate 0.01389, I require that he roll a 4. And to simulate 0.01852, I ask that he roll either a 3 or a 4. Is it that simple?"
"Yeah, you've got the hang of it. But how do you start to figure out all the other possibilities?" Roscoe smiled that smile that bordered on a smirk. I started to crank up my "thinking on my feet" engine.
Generating More Probabilities
"Hmm," I said. "Let's add a column to your table. We can generate some new probabilities by considering more 'cumulative' outcomes." So I augmented Roscoe's table, as follows:
|
Total
|
Number of Ways
|
Probability
|
Probability of "N or Less"
| | 3 | 1 | 0.00463 | 0.00463 | | 4 | 3 | 0.01389 | 0.01852 | | 5 | 6 | 0.02778 | 0.04630 | | 6 | 10 | 0.04630 | 0.09259 | | 7 | 15 | 0.06944 | 0.16204 | | 8 | 21 | 0.09722 | 0.25926 | | 9 | 25 | 0.11574 | 0.37500 | | 10 | 27 | 0.12500 | 0.50000 | | 11 | 27 | 0.12500 | 0.62500 | | 12 | 25 | 0.11574 | 0.74074 | | 13 | 21 | 0.09722 | 0.83796 | | 14 | 15 | 0.06944 | 0.90741 | | 15 | 10 | 0.04630 | 0.95370 | | 16 | 6 | 0.02778 | 0.98148 | | 17 | 3 | 0.01389 | 0.99537 | | 18 | 1 | 0.00463 | 1.00000 | | | | | | Sum | 216 | 1.00000 | |
|
"Well, you're on the right track," Roscoe offered. "Your fourth column definitely adds some new probabilities. For example, the probability of shooting either a 3 or a 4 is 0.00463 + 0.01389, which you have calculated to be 0.01852. You express this '3 or 4' as the probability of '4 or less.' By '5 or less' you mean that rolling a total of either 3, 4, or 5 defines a successful outcome.
"In fact, that's exactly how Monday and I proceeded. Problem is, you haven't gone nearly far enough."
"Before you jump ahead," I said, "let's look at how many probabilities we have so far. I'll shade the non-redundant ones in the table." My table now looked like this:
|
Total
|
Number of Ways
|
Probability
|
Probability of "N or Less"
| | 3 | 1 | 0.00463 | 0.00463 | | 4 | 3 | 0.01389 | 0.01852 | | 5 | 6 | 0.02778 | 0.04630 | | 6 | 10 | 0.04630 | 0.09259 | | 7 | 15 | 0.06944 | 0.16204 | | 8 | 21 | 0.09722 | 0.25926 | | 9 | 25 | 0.11574 | 0.37500 | | 10 | 27 | 0.12500 | 0.50000 | | 11 | 27 | 0.12500 | 0.62500 | | 12 | 25 | 0.11574 | 0.74074 | | 13 | 21 | 0.09722 | 0.83796 | | 14 | 15 | 0.06944 | 0.90741 | | 15 | 10 | 0.04630 | 0.95370 | | 16 | 6 | 0.02778 | 0.98148 | | 17 | 3 | 0.01389 | 0.99537 | | 18 | 1 | 0.00463 | 1.00000 | | | | | | Sum | 216 | 1.00000 | |
|
"So I count 8 in the first column and 14 in the second column, for a total of 22. Is that what you get?" I asked.
"Not exactly," said Roscoe. "We have what might be called an 'accidental
degeneracy.' Note that the probability of rolling a 6 is identical to
the probability of rolling a '5 or less.' That's because there are 10
ways to make a 6, and (1 + 3 + 6 = 10) ways to make a 3 or 4 or 5, which
is 5 or less. So I guess you have 21 distinct probabilities so far.
"But, as I said, you haven't gone far enough. There are lots more combinations."
Of Course, We've Already Left the World of Baseball
Things were getting a little hairy. "What did you guys do next?" I asked Roscoe.
"Well, a couple of things became apparent," replied Roscoe. "In fact,
Monday sat me down for a chat, and what he said made a lot of sense.
"First thing he told me was that if we wanted to generate baseball probabilities, this was not the best way to proceed. He came up with a scheme whereby we could roll a single die several times and generate what we needed for baseball in about 3 rolls. I had to agree with him on that," Roscoe continued.
"But he also said that the problem of creating probabilities out of
the total of 3 identical dice rolled simultaneously was interesting, in
and of itself. He was now more interested in that than in the original
problem. So we decided that we would concentrate our focus on seeing if
we could figure that one out in all its generality.
"This sometimes happens in the real world, by the way," remarked Roscoe. "We start out trying to solve one problem, only to discover a new, more interesting, problem in the process. I think it is called serendipity, or something like that."
Reality Is Ugly
"The next thing we decided was that figuring out all the combinations was going to be laborious. We started making some tables but quickly gave up. Monday said he wanted to sleep on it a bit." Roscoe lit up a stogie, and I figured the rest of the tale would come out now.
"Wouldn't you know it, that Monday came back with the answer the next day," continued Roscoe.
4
"Monday concluded that the first thing to do was to decide how many probabilities were possible as an upper limit. Since there were only 216 different ways to roll the dice, that had to be the maximum. So the best we could possibly do was to cover the interval from zero to one in 216 steps. From a granularity point of view, the optimum solution would be to have equal steps of 1/216, or 0.00463."
"Well," I said, "we know we can generate the first one!"
Roscoe grinned.
"Why was it important to try to determine the maximum possible number of probabilities?" I asked.
"Well," said Roscoe, "if you don't do that, you have a problem. Suppose
you find that there are 87 distinct probabilities.
5
How do you know if you've got them all, and not missed some? Once you
know the maximum possible, you can stop if you attain it. If not, you
have to figure out why you couldn't get the others.
"Actually, all we have to do is figure out if we can do 108 possibilities up to a probability of 0.5. Then we can get the other 108 by taking the complementary solution," said Roscoe.
That made sense. This was a common trick in this area. If you know that rolling a 3 has a probability of 0.00463, then rolling anything but a 3 will have a probability of (1 - 0.00463), or 0.99537. So getting to a probability of 0.5 is always good enough.
Monday's Solution
"Monday started out systematically. First, he worked with the number of ways a total could be rolled, knowing that we can always convert to probabilities by dividing the number of ways by 216. That, it turns out, is easier to think about than decimal numbers."
"His first approach," continued Roscoe, "was to see if all of the first 9 'ways' could be constructed. Here is the little table he came up with:
|
Ways
|
Need to Roll
| | 1 | 3 | | 2 | 3 or 18 | | 3 | 4 | | 4 | 3 or 4 | | 5 | 3 or 4 or 18 | | 6 | 5 | | 7 | 3 or 5 | | 8 | 3 or 5 or 18 | | 9 | 4 or 5 |
|
"I'm beginning to see," I said. "What Monday is going to try to do is see if he can cover the entire set of 'ways' up to 108, using the elements in the 'ways' column of 1, 3, 6, 10, 15, 21, 25, and 27. Very clever."
"Yes," replied Roscoe, "that's the idea. But remember, he can only use each element twice. The symmetry of the problem is helpful, but notice that 'ways' get used up, so we need to be careful. As an example, suppose we used 3 and 18, and needed one more 'way.' We would be stuck. So there are constraints we have to watch out for."
"Wow," I said, "so the answer is still in doubt."
"Monday was equal to the task, it turns out," said Roscoe. "Here is the rest of his logic: To get 10 ways, you just use a roll of 6. By using 6 and its companion of 15, we have two '10s' to work with, so we can now get from 1 to 29. We can now add 1 to 29 to anything, provided we don't need any additional totals of 3, 4, 5, 6, 15, 16, 17, or 18. We have basically used up the 'ways' elements corresponding to 1, 3, 6, and 10."
"Well, 29 is still a long way from 108," I said.
Roscoe completed Monday's solution. "If you now use one of the '21s,' say 8, you extend the range from 29 to 50. And you still have both '15s', both '25s', and both '27s' to work with. Using the second 21 extends the range from 50 to 71. The pair of '27s' takes us to 125, well past the 108 we needed. So it is possible."
"What you are saying, then," I responded, "is that every way is possible, so that we can uniformly cover the interval in steps of 1/216. That is pretty amazing. Did you actually construct the table?"
"It was easy, once we knew it could be done," said Roscoe. "Here it is."
"Note," said Roscoe, "that for some of these the answer is not unique. But remember also that it doesn't have to be. We just need to find one set of totals that gives us the right number of 'ways' to get there."
Lessons Learned
Roscoe seemed less than totally happy at the end of the story. "What's bugging you, Roscoe?" I ventured.
"Well, first of all," he replied, "I got suckered once again. I thought
I had a baby problem on my hands, and it turned out to be much more complex
than I first thought. That's always annoying.
"Second, one of my time-honored techniques washed out on me," he continued.
"Usually when a problem starts to grow teeth, I revert to a simpler instance
of the problem to gain insight. But in this case, the simpler case of
the sum of two dice was totally useless.
"Third, Monday's solution is convincing, especially when you actually have the table in hand. But even there, it seems like he used some heuristics instead of a logical proof. Although I am the last guy in the world to criticize anyone for getting the answer any way he can. Never let it be said that I was an advocate of excess purity, let alone elegance."
Now, I take a much more clement approach. Roscoe, and especially his
buddy Monday, had done a great job using pencil, paper, slide rule, and
common sense. In fact, had Roscoe lost his slide rule in the shipwreck,
Monday still could have solved the problem, as the division by 216 is
a purely cosmetic event--we can state probabilities as 67/216 if we
have to.
6
That the problem can be
solved on a desert island, with no computers, no Microsoft Excel, and
no pivot tables, by using just time, energy, and intellectual curiosity,
is wonderful. It also means that Blaise Pascal could have solved the problem
in the seventeenth century; he had all the tools he needed back then.
And he didn't lack for intelligence or intellectual curiosity, either.
Roscoe and Monday wound up being able to extract a uniform probability distribution to about half a percent by simply rolling three dice and looking at the total, so long as they used the algorithm of specifying the probability first and then deciding whether the roll was successful or not. In the original baseball example, it meant they could get batting averages to within 5 points.
7
"And here's the last laugh," Roscoe concluded. "That Monday showed he really understood the problem with the following observation. If you do the same experiment with four dice, you can prove that it is impossible to cover the interval from zero to one uniformly, as you could with three. That is, you can generate many more probabilities, but you cannot have them spread out equally. Now that is an interesting result, and it is true not only for four dice, but for any number of dice greater than three. Put that in your pipe and smoke it!"
I encourage readers to see if they can rediscover Monday's impossibility proof for four or more dice.
Notes
1
We note here that statistics such as official at-bats and walks (from which plate appearances can be calculated) are the kinds of numbers that appear on the baseball cards mentioned above in the salvage inventory.
2
Another example of a "corner case" would be the
probability, in the case of an out, that the batter hits into a double
play if there is a man on first base. There are actually people who construct
these kinds of fantasy simulations for a living; as baseball mavens record
almost every kind of statistic imaginable, this is a natural result. It
should also be noted that our "scheme" for simulating the batter's plate
appearance is certainly not unique; there are many ways to achieve the
same result. For example, you could recompute the batting average based
on plate appearances, and then compute walks, singles, doubles, triples,
and home runs from renormalized statistics. There are many, many alternative
formulations, of which we have picked just one as an example. However,
regardless of the formulation, one of the big challenges before the widespread
availability of computers was figuring out how to do the probability simulation
using simple, low-cost devices available to most people.
3
Note that the probabilities have, in some cases, 5 significant figures. On his desert island, Roscoe and his slide rule could do 3 at best. 0.00463 represents 3 significant figures; the others should be appropriately rounded. The display is less jarring this way, but we should not attribute any significance to figures smaller than thousandths.
4
Some of you have come to the conclusion that Roscoe is an avatar for your humble author. If that is true, then Monday is an avatar for my son David, who figured out the solution to this problem. I may be good at discovering interesting problems, but David is even better at solving them. And the name of the avatar is Monday, because that is the day we come to work.
5
As Roscoe himself did in an early attempt to solve this problem.
6
Not to mention the obvious: Roscoe could still do "long division" by hand if he had to.
7
Note that averages of .400 or greater for a single season have not been achieved in more than 60 years. The recently deceased Ted Williams hit .406 in 1941, and is the last player to have batted over .400 for a season.
About the author  | 
|  | Recently appointed CEO of Ravenflow, an Emeryville, California-based company delivering precision requirements validation for software developers, Joe Marasco served as senior vice president and business unit manager for Rational Software prior to the company's acquisition by IBM. He held numerous positions of responsibility in marketing, development, and the field sales organization, overseeing initiatives for Apex and Visual Modeler for Microsoft Visual Studio. After retiring from Rational in 2003, he published The Software Development Edge, a collection of his essays on software project management originally published in The Rational Edge. He holds a Ph.D. in physics from the University of Geneva, Switzerland, and an M.B.A. from the University of California, Irvine. |
Rate this page
|  |