Skip to main content

skip to main content

developerWorks  >  Rational  >

Desert Island Math

developerWorks
Document options

Document options requiring JavaScript are not displayed


My developerWorks needs you!

Connect to your technical community


Rate this page

Help us improve this content


Level: Advanced

Joe Marasco, CEO, Ravenflow

15 Aug 2002

from The Rational Edge: While shipwrecked on a desert island, our fictional project manager comes across a seemingly simple math problem. He takes us through the solution, makes some observations, and poses a challenge at the end for readers.

Illustration In the heart of the baseball season, our old friend Roscoe Leroy returns with a tale from his past. Whilst shipwrecked on a desert island, he came across a seemingly simple math problem. He takes us through the solution, makes some observations, and poses a challenge at the end for our readers.

Roscoe Sets the Stage

"Remember the time I got shipwrecked in the South Pacific with my pal named Monday?" Roscoe began. Monday, it should be noted, was Roscoe's answer to Robinson Crusoe's Friday--his traveling companion and overall helper. Roscoe and Monday had spent many a night under the stars, and, if two guys had to be marooned, those two had a chance of surviving together without killing each other.

"Yeah," I replied. "You guys were really lucky; as I recall, you were the only survivors. The island was tropical; there was plenty of food, and you had shelter from the elements. All you had to do was wait to be rescued."

"Well," snapped back Roscoe, "all we could do was wait. There was no way we could accelerate anything. Our biggest problem was boredom; there were no books in the flotsam and jetsam, and we desperately needed a way to pass the time without going bonkers."

It turns out that in going through their salvage inventory, Roscoe and Monday discovered a mint set of baseball cards. After awhile, Monday suggested they use the statistics on the baseball cards to construct two teams, so that they could conduct some "fantasy" baseball games to pass the time. Fortunately, they had an ample supply of pencils and paper. Roscoe had even rescued his pocket slide rule.

"As soon as Monday came up with the idea," said Roscoe, "I was keen on it. It would give us something to do, and it would be a harmless form of competition. So I set about figuring out how to create the game.

"And that is how the thorny problem came about."



Back to top


Simulating the Batter

"In all these simulations," continued Roscoe, "you need to devise a sort of random number generator."

"Well," I said, "what devices did you have at your disposal?"

"Not much," responded Roscoe. "All we had was three identical dice of the usual variety, with one to six spots on each. Still, I figured it would be easy to simulate probabilities with them, since that was really what we needed to do. We decided that the simplest thing to do was roll the three dice simultaneously, add up the number of spots, and use that total to determine success or failure."

"Probabilities? I'm not sure I understand," I said.

"Sure, probabilities! That's how the fantasy baseball concept works. For example, assume you have a hitter at the plate with a batting average of .250. At the lowest level of sophistication (forget, for a minute, about walks) we need a way of randomly deciding if he gets a hit for this at-bat, so we need to create a random event that has a probability of occurring 250 times out of a thousand." I had never known Roscoe to be that interested in probability and statistics, but I was about to be impressed.

"Now in reality, it's more complicated. In our game, for example, we ignored the quality of the pitcher, and complicated situations like sacrifice flies, and so on."

"Well, assuming you have made these simplifications, how do you actually simulate a batter's appearance at the plate?" I asked.

"We have only two things to think about," Roscoe replied. "Plate appearances and official at-bats. When a player gets a walk, it counts as a plate appearance, but not as an official at-bat. So basically, you need two numbers: the percentage of plate appearances that yield official at-bats for that player, and then his batting average. Suppose, for example, that a player walks in one out of every ten plate appearances. 1 What you would do is then first determine if the player walks, by asking for a successful trial of an event with probability 0.1. With the three dice, you'd need to know what combined number on a given throw has that probability. So imagine that you roll the dice, and you get that total--then, bingo! The batter walks to first base, and you're done.

"On the other hand, if you don't roll that total, then the player does not walk; instead, he has an official at-bat. Now you take his batting average, say .250, and you roll for that probability. If you're successful, he makes a hit; if not, he is out. Then, of course, you can roll for what kind of hit (single, double, triple, home run) by using those statistics per at-bat. And so on. I used batting average as the generic example, but basically you can refine the simulation to your heart's content, depending on how many of the "corner cases" you want to include. 2 It will always, however, come down to simulating an event with a certain probability. And, because sometimes we will need probabilities for things other than batting average, we need to be able to cover the entire range from zero (total failure) to one (certain success.)"

"Fair enough. I get the gist of it. So what's the problem?" I replied in turn.

"The problem," said Roscoe, "is this: Can we figure out how to simulate probabilities using a device as simple as three dice, and still have enough granularity to make the simulation reasonable? For example, if we can only simulate .250, .500, and .750, we don't have enough values to do a good simulation."

"Well, if you throw all three dice simultaneously, you get totals that range from 3 to 18; that's sixteen different outcomes, so that's a start," I responded.

"Indeed," said Roscoe, "that is exactly how we proceeded."



Back to top


First Steps

"Monday got right on the basic combinatorics. With 3 six-sided dice, there were 6 x 6 x 6 possible outcomes, or 216 possibilities. Of course, as there were only 16 different totals, many of these yielded the same answer. So Monday constructed the following table:

Total Number of Ways
31
43
56
610
715
821
925
1027
1127
1225
1321
1415
1510
166
173
181
Sum216

"Looks good to me," I replied. "There is some obvious symmetry. For example, both 3 and 18 come up exactly once, as you would expect. And 4 and 17 are the same, and so on. The most frequent occurrences are 10 and 11, as there are lots of combinations that will yield those totals. And the number of total ways adds up to 216, so you can't be too far off the mark. But it looks like there are, so far, only 8 distinct probabilities in the offing."

"Appearances can be deceiving," said Roscoe. "But let's add in the probabilities we have so far to double check our work." He then produced the following table: 3

Total Number of Ways Probability
310.00463
430.01389
560.02778
6100.04630
7150.06944
8210.09722
9250.11574
10270.12500
11270.12500
12250.11574
13210.09722
14150.06944
15100.04630
1660.02778
1730.01389
1810.00463
Sum2161.00000

"Whoop de do!" I exclaimed. "Roscoe can divide by 216."

"Calm down, Sonny," said Roscoe. "The fun is only beginning."



Back to top


Second Steps

"Well, of course, it should be obvious that if you want to simulate a probability of 0.00463, all you require to be successful is that the shooter roll a 3. You could ask him to roll an 18, which is the symmetrical result; however, we are looking for distinct probabilities, so we pick one or the other. We pick 3." Roscoe waited a second, and then sprang his first surprise.

"But what happens if you define 'success' as rolling either a 3 or a 4? Now the probability of rolling a 3 is 0.00463, and the probability of rolling a 4 is 0.01389, so the probability of rolling either a 3 or a 4 is simply 0.00463 + 0.01389, or 0.01852. So you can see that we can create some new distinct probabilities by considering multiple totals as 'successful.'"

"Let me see if I get this," I replied. "To simulate 0.00463, I demand that the shooter roll a 3. To simulate 0.01389, I require that he roll a 4. And to simulate 0.01852, I ask that he roll either a 3 or a 4. Is it that simple?"

"Yeah, you've got the hang of it. But how do you start to figure out all the other possibilities?" Roscoe smiled that smile that bordered on a smirk. I started to crank up my "thinking on my feet" engine.



Back to top


Generating More Probabilities

"Hmm," I said. "Let's add a column to your table. We can generate some new probabilities by considering more 'cumulative' outcomes." So I augmented Roscoe's table, as follows:

Total Number of Ways Probability Probability of
"N or Less"
310.004630.00463
430.013890.01852
560.027780.04630
6100.046300.09259
7150.069440.16204
8210.097220.25926
9250.115740.37500
10270.125000.50000
11270.125000.62500
12250.115740.74074
13210.097220.83796
14150.069440.90741
15100.046300.95370
1660.027780.98148
1730.013890.99537
1810.004631.00000
Sum2161.00000

"Well, you're on the right track," Roscoe offered. "Your fourth column definitely adds some new probabilities. For example, the probability of shooting either a 3 or a 4 is 0.00463 + 0.01389, which you have calculated to be 0.01852. You express this '3 or 4' as the probability of '4 or less.' By '5 or less' you mean that rolling a total of either 3, 4, or 5 defines a successful outcome.

"In fact, that's exactly how Monday and I proceeded. Problem is, you haven't gone nearly far enough."

"Before you jump ahead," I said, "let's look at how many probabilities we have so far. I'll shade the non-redundant ones in the table." My table now looked like this:

Total Number of Ways Probability Probability of
"N or Less"
310.004630.00463
430.013890.01852
560.027780.04630
6100.046300.09259
7150.069440.16204
8210.097220.25926
9250.115740.37500
10270.125000.50000
11270.125000.62500
12250.115740.74074
13210.097220.83796
14150.069440.90741
15100.046300.95370
1660.027780.98148
1730.013890.99537
1810.004631.00000
Sum2161.00000

"So I count 8 in the first column and 14 in the second column, for a total of 22. Is that what you get?" I asked.

"Not exactly," said Roscoe. "We have what might be called an 'accidental degeneracy.' Note that the probability of rolling a 6 is identical to the probability of rolling a '5 or less.' That's because there are 10 ways to make a 6, and (1 + 3 + 6 = 10) ways to make a 3 or 4 or 5, which is 5 or less. So I guess you have 21 distinct probabilities so far.

"But, as I said, you haven't gone far enough. There are lots more combinations."



Back to top


Of Course, We've Already Left the World of Baseball

Things were getting a little hairy. "What did you guys do next?" I asked Roscoe.

"Well, a couple of things became apparent," replied Roscoe. "In fact, Monday sat me down for a chat, and what he said made a lot of sense.

"First thing he told me was that if we wanted to generate baseball probabilities, this was not the best way to proceed. He came up with a scheme whereby we could roll a single die several times and generate what we needed for baseball in about 3 rolls. I had to agree with him on that," Roscoe continued.

"But he also said that the problem of creating probabilities out of the total of 3 identical dice rolled simultaneously was interesting, in and of itself. He was now more interested in that than in the original problem. So we decided that we would concentrate our focus on seeing if we could figure that one out in all its generality.

"This sometimes happens in the real world, by the way," remarked Roscoe. "We start out trying to solve one problem, only to discover a new, more interesting, problem in the process. I think it is called serendipity, or something like that."



Back to top


Reality Is Ugly

"The next thing we decided was that figuring out all the combinations was going to be laborious. We started making some tables but quickly gave up. Monday said he wanted to sleep on it a bit." Roscoe lit up a stogie, and I figured the rest of the tale would come out now.

"Wouldn't you know it, that Monday came back with the answer the next day," continued Roscoe. 4

"Monday concluded that the first thing to do was to decide how many probabilities were possible as an upper limit. Since there were only 216 different ways to roll the dice, that had to be the maximum. So the best we could possibly do was to cover the interval from zero to one in 216 steps. From a granularity point of view, the optimum solution would be to have equal steps of 1/216, or 0.00463."

"Well," I said, "we know we can generate the first one!"

Roscoe grinned.

"Why was it important to try to determine the maximum possible number of probabilities?" I asked.

"Well," said Roscoe, "if you don't do that, you have a problem. Suppose you find that there are 87 distinct probabilities. 5 How do you know if you've got them all, and not missed some? Once you know the maximum possible, you can stop if you attain it. If not, you have to figure out why you couldn't get the others.

"Actually, all we have to do is figure out if we can do 108 possibilities up to a probability of 0.5. Then we can get the other 108 by taking the complementary solution," said Roscoe.

That made sense. This was a common trick in this area. If you know that rolling a 3 has a probability of 0.00463, then rolling anything but a 3 will have a probability of (1 - 0.00463), or 0.99537. So getting to a probability of 0.5 is always good enough.



Back to top


Monday's Solution

"Monday started out systematically. First, he worked with the number of ways a total could be rolled, knowing that we can always convert to probabilities by dividing the number of ways by 216. That, it turns out, is easier to think about than decimal numbers."

"His first approach," continued Roscoe, "was to see if all of the first 9 'ways' could be constructed. Here is the little table he came up with:

Ways Need to Roll
13
23 or 18
34
43 or 4
53 or 4 or 18
65
73 or 5
83 or 5 or 18
94 or 5

"I'm beginning to see," I said. "What Monday is going to try to do is see if he can cover the entire set of 'ways' up to 108, using the elements in the 'ways' column of 1, 3, 6, 10, 15, 21, 25, and 27. Very clever."

"Yes," replied Roscoe, "that's the idea. But remember, he can only use each element twice. The symmetry of the problem is helpful, but notice that 'ways' get used up, so we need to be careful. As an example, suppose we used 3 and 18, and needed one more 'way.' We would be stuck. So there are constraints we have to watch out for."

"Wow," I said, "so the answer is still in doubt."

"Monday was equal to the task, it turns out," said Roscoe. "Here is the rest of his logic: To get 10 ways, you just use a roll of 6. By using 6 and its companion of 15, we have two '10s' to work with, so we can now get from 1 to 29. We can now add 1 to 29 to anything, provided we don't need any additional totals of 3, 4, 5, 6, 15, 16, 17, or 18. We have basically used up the 'ways' elements corresponding to 1, 3, 6, and 10."

"Well, 29 is still a long way from 108," I said.

Roscoe completed Monday's solution. "If you now use one of the '21s,' say 8, you extend the range from 29 to 50. And you still have both '15s', both '25s', and both '27s' to work with. Using the second 21 extends the range from 50 to 71. The pair of '27s' takes us to 125, well past the 108 we needed. So it is possible."

"What you are saying, then," I responded, "is that every way is possible, so that we can uniformly cover the interval in steps of 1/216. That is pretty amazing. Did you actually construct the table?"

"It was easy, once we knew it could be done," said Roscoe. "Here it is."

"Note," said Roscoe, "that for some of these the answer is not unique. But remember also that it doesn't have to be. We just need to find one set of totals that gives us the right number of 'ways' to get there."



Back to top


Lessons Learned

Roscoe seemed less than totally happy at the end of the story. "What's bugging you, Roscoe?" I ventured.

"Well, first of all," he replied, "I got suckered once again. I thought I had a baby problem on my hands, and it turned out to be much more complex than I first thought. That's always annoying.

"Second, one of my time-honored techniques washed out on me," he continued. "Usually when a problem starts to grow teeth, I revert to a simpler instance of the problem to gain insight. But in this case, the simpler case of the sum of two dice was totally useless.

"Third, Monday's solution is convincing, especially when you actually have the table in hand. But even there, it seems like he used some heuristics instead of a logical proof. Although I am the last guy in the world to criticize anyone for getting the answer any way he can. Never let it be said that I was an advocate of excess purity, let alone elegance."

Now, I take a much more clement approach. Roscoe, and especially his buddy Monday, had done a great job using pencil, paper, slide rule, and common sense. In fact, had Roscoe lost his slide rule in the shipwreck, Monday still could have solved the problem, as the division by 216 is a purely cosmetic event--we can state probabilities as 67/216 if we have to. 6 That the problem can be solved on a desert island, with no computers, no Microsoft Excel, and no pivot tables, by using just time, energy, and intellectual curiosity, is wonderful. It also means that Blaise Pascal could have solved the problem in the seventeenth century; he had all the tools he needed back then. And he didn't lack for intelligence or intellectual curiosity, either.

Roscoe and Monday wound up being able to extract a uniform probability distribution to about half a percent by simply rolling three dice and looking at the total, so long as they used the algorithm of specifying the probability first and then deciding whether the roll was successful or not. In the original baseball example, it meant they could get batting averages to within 5 points. 7

"And here's the last laugh," Roscoe concluded. "That Monday showed he really understood the problem with the following observation. If you do the same experiment with four dice, you can prove that it is impossible to cover the interval from zero to one uniformly, as you could with three. That is, you can generate many more probabilities, but you cannot have them spread out equally. Now that is an interesting result, and it is true not only for four dice, but for any number of dice greater than three. Put that in your pipe and smoke it!"

I encourage readers to see if they can rediscover Monday's impossibility proof for four or more dice.



Back to top


Notes

1 We note here that statistics such as official at-bats and walks (from which plate appearances can be calculated) are the kinds of numbers that appear on the baseball cards mentioned above in the salvage inventory.

2 Another example of a "corner case" would be the probability, in the case of an out, that the batter hits into a double play if there is a man on first base. There are actually people who construct these kinds of fantasy simulations for a living; as baseball mavens record almost every kind of statistic imaginable, this is a natural result. It should also be noted that our "scheme" for simulating the batter's plate appearance is certainly not unique; there are many ways to achieve the same result. For example, you could recompute the batting average based on plate appearances, and then compute walks, singles, doubles, triples, and home runs from renormalized statistics. There are many, many alternative formulations, of which we have picked just one as an example. However, regardless of the formulation, one of the big challenges before the widespread availability of computers was figuring out how to do the probability simulation using simple, low-cost devices available to most people.

3 Note that the probabilities have, in some cases, 5 significant figures. On his desert island, Roscoe and his slide rule could do 3 at best. 0.00463 represents 3 significant figures; the others should be appropriately rounded. The display is less jarring this way, but we should not attribute any significance to figures smaller than thousandths.

4 Some of you have come to the conclusion that Roscoe is an avatar for your humble author. If that is true, then Monday is an avatar for my son David, who figured out the solution to this problem. I may be good at discovering interesting problems, but David is even better at solving them. And the name of the avatar is Monday, because that is the day we come to work.

5 As Roscoe himself did in an early attempt to solve this problem.

6 Not to mention the obvious: Roscoe could still do "long division" by hand if he had to.

7 Note that averages of .400 or greater for a single season have not been achieved in more than 60 years. The recently deceased Ted Williams hit .406 in 1941, and is the last player to have batted over .400 for a season.



About the author

Joe Marasco

Recently appointed CEO of Ravenflow, an Emeryville, California-based company delivering precision requirements validation for software developers, Joe Marasco served as senior vice president and business unit manager for Rational Software prior to the company's acquisition by IBM. He held numerous positions of responsibility in marketing, development, and the field sales organization, overseeing initiatives for Apex and Visual Modeler for Microsoft Visual Studio. After retiring from Rational in 2003, he published The Software Development Edge, a collection of his essays on software project management originally published in The Rational Edge. He holds a Ph.D. in physics from the University of Geneva, Switzerland, and an M.B.A. from the University of California, Irvine.




Rate this page


Please take a moment to complete this form to help us better serve you.



 


 


Not
useful
Extremely
useful
 


Share this....

digg Digg this story del.icio.us del.icio.us Slashdot Slashdot it!



Back to top