IBM®
Skip to main content
    Country/region [select]      Terms of use
 
 
      
     Home      Products      Services & solutions      Support & downloads      My account     

developerWorks > Security >
developerWorks
Make your software behave: Playing the numbers
e-mail it!
Contents:
How do pseudo-random number generators work?
Borland's implementation of Random ()
Sewing the seeds of destruction
Try not to be so predictable
How to cheat in online gambling
How to read "secret" Netscape messages
Resources
About the authors
Rate this article
Subscriptions:
dW newsletters
dW Subscription
(CDs and downloads)
Truly secure software needs an accurate random number generator

Gary McGraw, Reliable Software Technologies
John Viega, Reliable Software Technologies

04 Apr 2000

Computers, being completely deterministic machines, are particularly bad at behaving randomly (software missteps aside). So when programmers need a truly random number, or a series of them, they must approximate random number generation by various means. In this installment, the first of three articles on the subject, Gary McGraw and John Viega examine how it's done, and present a variety of tricks that can be accomplished as a result. In the next installment in this series, Gary and John will discuss how to approach truly random number generation through hardware.

In this article, we turn to a topic that's recently been bandied about in the news: random number generation. Well, that's not quite accurate. Actually, an exploit that was based on a problem with random number generation was in the news; the cause of the problem itself barely garnered a mention! You may have seen the CNN story on Reliable Software Technology's Internet gambling exploit (see Resources), or perhaps you recall the stories a few years ago about Netscape's SSL implementation being compromised. Even though these two applications sound very different, the root cause of each problem is the same.

The problem occurs when many developers assume that a function called random() in their favorite language is aptly named. Unfortunately, that's a flawed assumption. Typically, a call to random() is really a call to what is known as a "pseudo-random" number generator, and pseudo-random numbers turn out to be not very random. That fact has a profound impact on software security. We'll provide two examples to show why this is so.

Randomness is not as cut-and-dried as you might think: Some streams of numbers are more random than others. Computers, being completely deterministic machines, are particularly bad at behaving randomly (bad OS software aside). In fact, the only true sources for (partially) random numbers involve measuring physical phenomena, such as the timing of radioactive decay, which can be distilled into purely random sequences using some mathematical tricks.

Without access to physical devices, computer programs that need random numbers are forced to generate the numbers themselves. But the determinism of computers makes this algorithmically quite difficult. As a result, most programmers turn to pseudo-random numbers. In this article, we'll look at pseudo-random number generators (PRNGs). We'll go into how they work, and show why they don't work well for security. In our next installment, we'll look at hardware-based solutions for randomness. Then, we'll look at software solutions that straddle the middle ground, offering better security than a pseudo-random number generator, while being more practical than more secure hardware-based solutions.

How do pseudo-random number generators work?
We have established that random() really invokes a pseudo-random number generator. But what is a pseudo-random number generator? Suppose we want to generate a random number between 1 and 10, where every number has an equal probability of appearing. Ideally, we would generate a value on the range from 0 to 1 where every value will occur with equal probability, regardless of the previous value, then multiply that value by 10. Note that there is an infinite number of values between 0 and 1. Also, note that computers do not offer infinite precision!

To write code to implement something like the algorithm presented above, a pseudo-random number generator typically produces an integer on the range from 0 to N, and returns that number divided by N. The resulting number is always between 0 and 1. Subsequent calls to the generator take the integer result from the first run and pass it through a function to produce a new integer between 0 and N, then return the new integer divided by N. That means the number of unique values returned by any pseudo-random number generator is limited by the number of integers between 0 and N.

In most common random number generators, N is 232 -1 (approximately 4 billion), which is the largest value that will fit into a 32-bit number. Put another way, there are at most 4 billion possible values produced by the sort of number generator encountered most often. To tip our hand a bit, this 4 billion number is not all that large.

A number known as the "seed" is provided to a pseudo-random number generator as an initial integer to pass through the function. The seed gets the ball rolling. The output of a pseudo-random number generator contains nothing that is unpredictable. Each value returned by a pseudo-random number generator is completely determined by the previous value it returned (and ultimately, the seed that started it all). If we know the integer used to compute any one value, then we can figure out every subsequent value returned from the generator.

In the end, a pseudo-random number generator is a deterministic program that produces a completely predictable series of numbers (called a stream). A well-written PRNG creates a sequence that shares many of the same properties as a sequence of real random numbers. For example:

  • The PRNG can generate any number in a range with equal probability.
  • The PRNG can produce a stream with any statistical distribution.
  • The stream of numbers produced by the PRNG can lack discernible patterns.

What PRNGs can't do is be unpredictable. If you know the seed, and you know the algorithm, you can easily figure out the sequence.

On one hand, pseudo-random number generators have many useful applications. They work well for Monte Carlo simulations (a group of methods that use random numbers to simulate physical events or to solve mathematical problems) and other statistical sampling or simulation models. These applications typically require only moderate sophistication in their "randomness," requiring only that patterns in a sequence don't resonate with naturally occurring sequences.

On the other hand, pseudo-random number generators are hard to use in a secure manner in applications that require unpredictability, such as shuffling virtual cards or encrypting data.

An example:
The pseudo-random number generator distributed with Borland compilers is reproduced below. This generator is in a class of generators called "linear congruent" generators. This type of generator is incredibly common, and takes on a very specific mathematical form:


Xn+1 = (aXn + b) mod c

In other words, the (n+1)th number is equal to the nth number times some constant value a, plus some constant value b. If the result is greater than or equal to some constant c, then its value is forced to be in range, by taking the remainder when dividing the result by c. Note that a, b, and c are usually prime numbers. Donald Knuth, in "The Art of Computer Programming" (see Resources), goes into detail about how to pick good values for these constants.

Back to the Borland generator. If we know that the current value of RandSeed is 12345, then the next integer produced will be 1655067934. The same thing happens every time you set the seed to 12345.

Borland's implementation of Random ()

Sewing the seeds of destruction
Based on historical precedent, seeds for PRNGs are usually produced by reference to the system clock. The idea is to use some aspect of system time as the seed. This implies if you can figure out what time a generator is seeded, you will know every value produced by the generator (including the order numbers will appear). The upshot of all this is that there is nothing unpredictable about pseudo-random numbers seeded with a clock. As we'll see, this fact has a profound impact on shuffling algorithms and cryptographic keys.

You might think that you would be safe if you could come up with a truly random number as a seed for a PRNG. After all, if we start our deterministic stream at a random beginning point, we're OK, right? Wrong! Even if a bad guy doesn't know the particular number used to seed the generator, an attacker can still predict PRNG-generated numbers by observing your program and making some good guesses. The problem is that an attacker only needs to see one single number as it comes out of the generator to predict the next one (and all other subsequent ones). If you ask for numbers between 0 and 50, and the attacker can see those numbers, the attacker's job becomes harder, but it still isn't a big deal.

Here's why. There are only 4 billion possible places on a standard random number wheel (with many random-number generation algorithms, including linear-congruent generators; all the numbers between 0 and 4,294,967,295 are generated exactly once, then the sequence repeats). If we observe enough numbers, even if the numbers are adjusted to be somewhere between 0 and 50, we will eventually be able to figure out how the generator was seeded. We can do this by trying all 4 billion possible seeds in order and looking for a sub-sequence in the PRNG stream that matches the one your program is exhibiting. Four billion is not a big number these days. Given a handful of good quality PCs, the supposedly-big-enough space can be "brute forced" in almost real time.

One of our problems here is that most machines currently use 32-bit PRNGs. Fortunately, we can solve this one. Using a truly random number to seed a 64-bit PRNG would make things a whole lot harder to crack (not impossible, mind you, even with a good PRNG), since cracking the 64-bit space would probably take months, even with some serious computational horsepower. If you use a good 128-bit PRNG along with a truly random seed, you might hope that's good enough (although not always, it turns out; we'll discuss why in a little while). The reasoning behind the upswing in the seed space is exactly the same as with cryptographic keys: The more bits you use, the harder both keys are to break (as long as everything is being used properly).

Let's explore the crypto key/seed analogy a bit. A brute-force cryptographic attack involves attempting every possible key in order to decrypt a secret message. Likewise, a brute-force attack against a PRNG involves examining every possible seed. A significant body of research studying the necessary lengths of cryptographic keys already exists. Generally speaking, things look like this (in October 1999):

AlgorithmWeak keyTypical keyStrong key
DES40 or 5656Triple-DES
RC46080128
RSA512768 or 10242048
ECC125170230

People used to think that cracking 56-bit DES in real time would take too long to be feasible, but history has shown otherwise. In January 1997, a secret DES key was recovered in 96 days. Later efforts broke keys in 41 days, then 56 hours, and, in January 1999, in 22 hours and 15 minutes. The leap forward in cracking ability doesn't bode well for small key lengths or small sets of PRNG seeds!

People have even gone so far as to invent special machines to crack cryptographic algorithms. In 1998, the EFF created a special-purpose machine to crack DES messages. The purpose of the machine was to emphasize just how vulnerable DES (a popular, government-sanctioned cryptographic algorithm) really is. (To read more on the DES cracker, see Resources.) The ease of "breaking" DES is directly related to the length of its key. Special machines to uncover RNG seeds are not outside the realm of possibility.

Does a 128-bit seed solve all problems? It provides enough of a search space to rule out brute-force attacks. But other types of attacks can be waged against a generator. In practice, it depends on the type of algorithm you are using to generate random numbers. Linear congruent generators (and, in fact, all polynomial congruent generators) turn out to be easy to break with relatively little information, and they should be avoided at all costs. Two articles from now, we'll look at a PRNG algorithm based on cryptographic primitives that seems to do a better job than traditional generators.

Try not to be so predictable
We've focused on one important problem with traditional PRNGs, but we're not out of the woods yet! We don't want to mislead you into thinking that using a 64-bit PRNG (or better) along with a good algorithm will give you any security. Even 128-bit PRNGs are completely predictable! If you want security, you need to seed the number generator with a truly random number. You can't hard-code a value from the top of your head into your program. Attackers are likely to notice when your code generates the same sequence of numbers every time you run it. Likewise, you can't ask someone to type in a number manually, because people aren't a good source of randomness.

Other common sources for seeds when calling a random number generator include network addresses, host names, people's names, and the programmer's mother's maiden name. These kinds of data are also used frequently when picking cryptographic keys. Note that these values don't change very often, if at all. If an attacker can figure out how you're picking your data (usually a safe bet), it becomes a matter of hunting down the requisite information. And, sometimes, the information can be as easy to guess as a clock value, or even easier.

Every pseudo-random number generator algorithm has its comparative advantages and disadvantages, and there are many algorithms to choose from. No such algorithm is appropriate for all tasks. In fact, none of the pseudo-random number generation algorithms based on well-understood mathematics is (by itself) all that good for security, for the reasons described above.

If you have an application where pseudo-random numbers are essential, several good sources exist for learning more about properties of the different algorithms. The classic reference is Chapter 3 of Donald Knuth's "The Art of Computer Programming, Volume 2 (Seminumerical Algorithms)" (see Resources). In addition, researchers in the pLab at the University of Salzburg's math department spend all of their time working on random number generation in theory and in practice. Their Web page (see Resources) provides many excellent pointers to help you choose which generator is right for your application.

Using pseudo-random number generators incorrectly leads to some spectacular security problems. Don't let this happen to you.

How to cheat in online gambling
The Software Security Group at Reliable Software Technologies (RST) (see Resources) recently discovered a serious flaw in the implementation of Texas Hold 'em Poker that is distributed by ASF Software, Inc. (see Resources). The exploit allows a cheating player to calculate the exact deck being used for each hand in real time. That means a player using the exploit knows the cards in every opponent's hand as well as the cards that will make up the flop (cards placed face up on the table after rounds of betting). A cheater can "know when to hold 'em and know when to fold 'em" every time. A malicious attacker could use the exploit to bilk innocent players of actual money without ever being caught.

The flaw exists in the card shuffling algorithm used to generate each deck. Ironically, the code was publicly displayed at http://www.planetpoker.com/ppfaq.htm with the idea of showing how fair the game is to interested players (the page has since been taken down). In the code, a call to randomize() is included to produce a random deck before each deck is generated. The implementation, built with Delphi 4 (a Pascal IDE), seeds the random number generator with the number of milliseconds since midnight, according to the system clock. That means the output of the random number generator is easily predicted. As we've discussed, a predictable random number generator is a very serious security problem.

The shuffling algorithm used in the ASF software always starts with an ordered deck of cards, and then generates a sequence of random numbers used to reorder the deck. In a real deck of cards, there are 52! (approximately 2226) possible unique shuffles. Recall that the seed for a 32-bit random number generator must be a 32-bit number, meaning that there are just over 4 billion possible seeds. Since the deck is reinitialized and the generator reseeded before each shuffle, only 4 billion possible shuffles can result from this algorithm. Four billion possible shuffles is alarmingly less than 52!.

To make matters worse, the flawed algorithm chooses the seed for the random number generator using the Pascal function Randomize(). This particular Randomize() function chooses a seed based on the number of milliseconds since midnight. There are a mere 86,400,000 milliseconds in a day. Since this number was being used as the seed for the random number generator, the number of possible decks now reduces to 86,400,000. Eighty-six million is alarmingly less than four billion. But that's not all. It gets worse.

The system clock seed gave Reliable Software Technologies an idea that reduced the number of possible shuffles even further. By synchronizing their program with the system clock on the server generating the pseudo-random number, they are able to reduce the number of possible combinations down to a number on the order of 200,000 possibilities. After that move, the system is broken, since searching through this tiny set of shuffles is trivial and can be done on a PC in real time.

The RST-developed tool to exploit this vulnerability requires five cards from the deck to be known. Based on the five known cards, the program searches through the few hundred thousand possible shuffles and deduces which one is a perfect match. In the case of Texas Hold 'em Poker, this means the program takes as input the two cards that the cheating player is dealt, plus the first three community cards that are dealt face up (the flop). These five cards are known after the first of four rounds of betting, and are enough to determine (in real time, during play) the exact shuffle. The figure shows the graphical interface that RST slapped on its gambling exploit. The Site Parameters box in the upper left is used to synchronize the clocks. The Game Parameters box in the upper right is used to enter the five cards and initiate the search. The figure is a screen shot taken after all cards have been determined by the program. The cheating attacker knows who holds what cards, what the rest of the flop looks like, and who is going to win in advance.

The interface for Reliable Software Technology's Internet poker exploit

Once the program knows the five cards, it generates shuffles until it discovers the shuffle that contains the five known cards in the proper order. Since the Randomize() function is based on the server's system time, it is not very difficult to guess a starting seed with a reasonable degree of accuracy. (The closer you get, the fewer possible shuffles you have to look through.) Here's the kicker though: After finding a correct seed once, it is possible to synchronize the exploit program with the server to within a few seconds. This post facto synchronization allows the program to determine the seed being used by the random number generator, and to identify the shuffle being used during all future games in under one second!

Technical details aside, this exploit garnered spectacular press coverage. The coverage emphasizes the human side of any such discovery. Visit the RST Web site (listed in Resources) for the original press release, the CNN video clip, and a New York Times story.

How to read "secret" Netscape messages
In a separate but equally significant development, a study by Ian Goldberg and David Wagner from January 1996 demonstrates how important random numbers are to cryptographic security (see Resources). Their exploit showed how serious flaws in one of Netscape's early implementations of SSL made it possible to decrypt encoded communications. This flaw is ancient history, of course, but the lessons are important enough to bear repeating.

SSL encrypts messages (like most cryptographic algorithms) with a secret key. The idea is to create a key that is a large random number known only to the sender of a message and the receiver. As in the poker shuffling case, if the key is predictable, then the whole system collapses like a house of cards. Simply put, it is essential that the secret keys come from an unpredictable source. (Sound familiar?)

Netscape's problem was choosing a bad way to seed their pseudo-random number generator. Their seed could be completely determined from knowledge of the time of day, the process ID, and the parent process ID. Not good.

An attacker using the same box as the attacked browser (say, on a multiuser machine) can easily find out process IDs (in UNIX) using the ps command. Goldberg and Wagner also detail an attack that doesn't require knowledge of the process IDs (based on the fact that there are usually only 15 bits in each). All that remains is guessing the time of day. Sniffers can be used to snag the precise time a packet goes by, which gives a great starting point for guessing the time of day on the system running Netscape. Getting within a second is pretty easy, and the milliseconds field is crackable in real time (as the poker exploit demonstrated).

Goldberg and Wagner used their attack successfully against Netscape version 1.1. Both the 40-bit international version and the 128-bit domestic version were vulnerable. The take-home lesson is simple: If you need to generate random numbers to use in cryptography, do it with care.

Over the next two installments of this article, we'll provide some pointers (and a bunch of code) on doing random number generation properly, both in hardware and in software. If you want to read up in advance, an excellent, but dry, resource is "Randomness Recommendations for Security" (RFC 1750). Another good place to look is in Bruce Schneier's excellent book, "Applied Cryptography" (see Resources).

Resources

About the authors
Gary McGraw is the vice president of corporate technology at Reliable Software Technologies in Dulles, VA. Working with Consulting Services and Research, he helps set technology research and development direction. McGraw began his career at Reliable Software Technologies as a Research Scientist, and he continues to pursue research in Software Engineering and computer security. He holds a dual Ph.D. in Cognitive Science and Computer Science from Indiana University and a B.A. in Philosophy from the University of Virginia. He has written more than 40 peer-reviewed articles for technical publications, consults with major e-commerce vendors including Visa and the Federal Reserve, and has served as principal investigator on grants from Air Force Research Labs, DARPA, National Science Foundation, and NIST's Advanced Technology Program.
McGraw is a noted authority on mobile code security and co-authored both "Java Security: Hostile Applets, Holes, & Antidotes" (Wiley, 1996) and "Securing Java: Getting down to business with mobile code" (Wiley, 1999) with Professor Ed Felten of Princeton. Along with RST co-founder and Chief Scientist Dr. Jeffrey Voas, McGraw wrote "Software Fault Injection: Inoculating Programs Against Errors" (Wiley, 1998). McGraw regularly contributes to popular trade publications and is often quoted in national press articles.


John Viega is a Senior Research Associate, Software Security Group co-founder, and Senior Consultant at Reliable Software Technologies. He is the Principal Investigator on a DARPA-sponsored grant for developing security extensions for standard programming languages. John has authored over 30 technical publications in the areas of software security and testing. He is responsible for finding several well-publicized security vulnerabilities in major network and e-commerce products, including a recent break in Netscape's security. He is also a prominent member of the open-source software community, having written Mailman, the GNU Mailing List Manager, and, more recently, ITS4, a tool for finding security vulnerabilities in C and C++ code. Viega holds a M.S. in Computer Science from the University of Virginia.



e-mail it!
Rate this article

This content was helpful to me:

Strongly disagree (1)Disagree (2)Neutral (3)Agree (4)Strongly agree (5)

Comments?



developerWorks > Security >
developerWorks
  About IBM  |  Privacy  |  Terms of use  |  Contact