|Make your software behave: Playing the numbers|
In this article, we turn to a topic that's recently been bandied about in the news: random number generation. Well, that's not quite accurate. Actually, an exploit that was based on a problem with random number generation was in the news; the cause of the problem itself barely garnered a mention! You may have seen the CNN story on Reliable Software Technology's Internet gambling exploit (see Resources), or perhaps you recall the stories a few years ago about Netscape's SSL implementation being compromised. Even though these two applications sound very different, the root cause of each problem is the same.
The problem occurs when many developers assume that a function called
Randomness is not as cut-and-dried as you might think: Some streams of numbers are more random than others. Computers, being completely deterministic machines, are particularly bad at behaving randomly (bad OS software aside). In fact, the only true sources for (partially) random numbers involve measuring physical phenomena, such as the timing of radioactive decay, which can be distilled into purely random sequences using some mathematical tricks.
Without access to physical devices, computer programs that need random numbers are forced to generate the numbers themselves. But the determinism of computers makes this algorithmically quite difficult. As a result, most programmers turn to pseudo-random numbers. In this article, we'll look at pseudo-random number generators (PRNGs). We'll go into how they work, and show why they don't work well for security. In our next installment, we'll look at hardware-based solutions for randomness. Then, we'll look at software solutions that straddle the middle ground, offering better security than a pseudo-random number generator, while being more practical than more secure hardware-based solutions.
How do pseudo-random number generators work?
To write code to implement something like the algorithm presented above, a pseudo-random number generator typically produces an integer on the range from 0 to N, and returns that number divided by N. The resulting number is always between 0 and 1. Subsequent calls to the generator take the integer result from the first run and pass it through a function to produce a new integer between 0 and N, then return the new integer divided by N. That means the number of unique values returned by any pseudo-random number generator is limited by the number of integers between 0 and N.
In most common random number generators, N is 232 -1 (approximately 4 billion), which is the largest value that will fit into a 32-bit number. Put another way, there are at most 4 billion possible values produced by the sort of number generator encountered most often. To tip our hand a bit, this 4 billion number is not all that large.
A number known as the "seed" is provided to a pseudo-random number generator as an initial integer to pass through the function. The seed gets the ball rolling. The output of a pseudo-random number generator contains nothing that is unpredictable. Each value returned by a pseudo-random number generator is completely determined by the previous value it returned (and ultimately, the seed that started it all). If we know the integer used to compute any one value, then we can figure out every subsequent value returned from the generator.
In the end, a pseudo-random number generator is a deterministic program that produces a completely predictable series of numbers (called a stream). A well-written PRNG creates a sequence that shares many of the same properties as a sequence of real random numbers. For example:
What PRNGs can't do is be unpredictable. If you know the seed, and you know the algorithm, you can easily figure out the sequence.
On one hand, pseudo-random number generators have many useful applications. They work well for Monte Carlo simulations (a group of methods that use random numbers to simulate physical events or to solve mathematical problems) and other statistical sampling or simulation models. These applications typically require only moderate sophistication in their "randomness," requiring only that patterns in a sequence don't resonate with naturally occurring sequences.
On the other hand, pseudo-random number generators are hard to use in a secure manner in applications that require unpredictability, such as shuffling virtual cards or encrypting data.
In other words, the
Back to the Borland generator. If we know that the current value of RandSeed is 12345, then the next integer produced will be 1655067934. The same thing happens every time you set the seed to 12345.
Sewing the seeds of destruction
You might think that you would be safe if you could come up with a truly random number as a seed for a PRNG. After all, if we start our deterministic stream at a random beginning point, we're OK, right? Wrong! Even if a bad guy doesn't know the particular number used to seed the generator, an attacker can still predict PRNG-generated numbers by observing your program and making some good guesses. The problem is that an attacker only needs to see one single number as it comes out of the generator to predict the next one (and all other subsequent ones). If you ask for numbers between 0 and 50, and the attacker can see those numbers, the attacker's job becomes harder, but it still isn't a big deal.
Here's why. There are only 4 billion possible places on a standard random number wheel (with many random-number generation algorithms, including linear-congruent generators; all the numbers between 0 and 4,294,967,295 are generated exactly once, then the sequence repeats). If we observe enough numbers, even if the numbers are adjusted to be somewhere between 0 and 50, we will eventually be able to figure out how the generator was seeded. We can do this by trying all 4 billion possible seeds in order and looking for a sub-sequence in the PRNG stream that matches the one your program is exhibiting. Four billion is not a big number these days. Given a handful of good quality PCs, the supposedly-big-enough space can be "brute forced" in almost real time.
One of our problems here is that most machines currently use 32-bit PRNGs. Fortunately, we can solve this one. Using a truly random number to seed a 64-bit PRNG would make things a whole lot harder to crack (not impossible, mind you, even with a good PRNG), since cracking the 64-bit space would probably take months, even with some serious computational horsepower. If you use a good 128-bit PRNG along with a truly random seed, you might hope that's good enough (although not always, it turns out; we'll discuss why in a little while). The reasoning behind the upswing in the seed space is exactly the same as with cryptographic keys: The more bits you use, the harder both keys are to break (as long as everything is being used properly).
Let's explore the crypto key/seed analogy a bit. A brute-force cryptographic attack involves attempting every possible key in order to decrypt a secret message. Likewise, a brute-force attack against a PRNG involves examining every possible seed. A significant body of research studying the necessary lengths of cryptographic keys already exists. Generally speaking, things look like this (in October 1999):
People used to think that cracking 56-bit DES in real time would take too long to be feasible, but history has shown otherwise. In January 1997, a secret DES key was recovered in 96 days. Later efforts broke keys in 41 days, then 56 hours, and, in January 1999, in 22 hours and 15 minutes. The leap forward in cracking ability doesn't bode well for small key lengths or small sets of PRNG seeds!
People have even gone so far as to invent special machines to crack cryptographic algorithms. In 1998, the EFF created a special-purpose machine to crack DES messages. The purpose of the machine was to emphasize just how vulnerable DES (a popular, government-sanctioned cryptographic algorithm) really is. (To read more on the DES cracker, see Resources.) The ease of "breaking" DES is directly related to the length of its key. Special machines to uncover RNG seeds are not outside the realm of possibility.
Does a 128-bit seed solve all problems? It provides enough of a search space to rule out brute-force attacks. But other types of attacks can be waged against a generator. In practice, it depends on the type of algorithm you are using to generate random numbers. Linear congruent generators (and, in fact, all polynomial congruent generators) turn out to be easy to break with relatively little information, and they should be avoided at all costs. Two articles from now, we'll look at a PRNG algorithm based on cryptographic primitives that seems to do a better job than traditional generators.
Try not to be so predictable
Other common sources for seeds when calling a random number generator include network addresses, host names, people's names, and the programmer's mother's maiden name. These kinds of data are also used frequently when picking cryptographic keys. Note that these values don't change very often, if at all. If an attacker can figure out how you're picking your data (usually a safe bet), it becomes a matter of hunting down the requisite information. And, sometimes, the information can be as easy to guess as a clock value, or even easier.
Every pseudo-random number generator algorithm has its comparative advantages and disadvantages, and there are many algorithms to choose from. No such algorithm is appropriate for all tasks. In fact, none of the pseudo-random number generation algorithms based on well-understood mathematics is (by itself) all that good for security, for the reasons described above.
If you have an application where pseudo-random numbers are essential, several good sources exist for learning more about properties of the different algorithms. The classic reference is Chapter 3 of Donald Knuth's "The Art of Computer Programming, Volume 2 (Seminumerical Algorithms)" (see Resources). In addition, researchers in the pLab at the University of Salzburg's math department spend all of their time working on random number generation in theory and in practice. Their Web page (see Resources) provides many excellent pointers to help you choose which generator is right for your application.
Using pseudo-random number generators incorrectly leads to some spectacular security problems. Don't let this happen to you.
How to cheat in online gambling
The flaw exists in the card shuffling algorithm used to generate each deck. Ironically, the code was publicly displayed at http://www.planetpoker.com/ppfaq.htm with the idea of showing how fair the game is to interested players (the page has since been taken down). In the code, a call to randomize() is included to produce a random deck before each deck is generated. The implementation, built with Delphi 4 (a Pascal IDE), seeds the random number generator with the number of milliseconds since midnight, according to the system clock. That means the output of the random number generator is easily predicted. As we've discussed, a predictable random number generator is a very serious security problem.
The shuffling algorithm used in the ASF software always starts with an ordered deck of cards, and then generates a sequence of random numbers used to reorder the deck. In a real deck of cards, there are 52! (approximately 2226) possible unique shuffles. Recall that the seed for a 32-bit random number generator must be a 32-bit number, meaning that there are just over 4 billion possible seeds. Since the deck is reinitialized and the generator reseeded before each shuffle, only 4 billion possible shuffles can result from this algorithm. Four billion possible shuffles is alarmingly less than 52!.
To make matters worse, the flawed algorithm chooses the seed for the random number generator using the Pascal function
The system clock seed gave Reliable Software Technologies an idea that reduced the number of possible shuffles even further. By synchronizing their program with the system clock on the server generating the pseudo-random number, they are able to reduce the number of possible combinations down to a number on the order of 200,000 possibilities. After that move, the system is broken, since searching through this tiny set of shuffles is trivial and can be done on a PC in real time.
The RST-developed tool to exploit this vulnerability requires five cards from the deck to be known. Based on the five known cards, the program searches through the few hundred thousand possible shuffles and deduces which one is a perfect match. In the case of Texas Hold 'em Poker, this means the program takes as input the two cards that the cheating player is dealt, plus the first three community cards that are dealt face up (the flop). These five cards are known after the first of four rounds of betting, and are enough to determine (in real time, during play) the exact shuffle. The figure shows the graphical interface that RST slapped on its gambling exploit. The Site Parameters box in the upper left is used to synchronize the clocks. The Game Parameters box in the upper right is used to enter the five cards and initiate the search. The figure is a screen shot taken after all cards have been determined by the program. The cheating attacker knows who holds what cards, what the rest of the flop looks like, and who is going to win in advance.
Once the program knows the five cards, it generates shuffles until it discovers the shuffle that contains the five known cards in the proper order. Since the
Technical details aside, this exploit garnered spectacular press coverage. The coverage emphasizes the human side of any such discovery. Visit the RST Web site (listed in Resources) for the original press release, the CNN video clip, and a New York Times story.
How to read "secret" Netscape messages
SSL encrypts messages (like most cryptographic algorithms) with a secret key. The idea is to create a key that is a large random number known only to the sender of a message and the receiver. As in the poker shuffling case, if the key is predictable, then the whole system collapses like a house of cards. Simply put, it is essential that the secret keys come from an unpredictable source. (Sound familiar?)
Netscape's problem was choosing a bad way to seed their pseudo-random number generator. Their seed could be completely determined from knowledge of the time of day, the process ID, and the parent process ID. Not good.
An attacker using the same box as the attacked browser (say, on a multiuser machine) can easily find out process IDs (in UNIX) using the ps command. Goldberg and Wagner also detail an attack that doesn't require knowledge of the process IDs (based on the fact that there are usually only 15 bits in each). All that remains is guessing the time of day. Sniffers can be used to snag the precise time a packet goes by, which gives a great starting point for guessing the time of day on the system running Netscape. Getting within a second is pretty easy, and the milliseconds field is crackable in real time (as the poker exploit demonstrated).
Goldberg and Wagner used their attack successfully against Netscape version 1.1. Both the 40-bit international version and the 128-bit domestic version were vulnerable. The take-home lesson is simple: If you need to generate random numbers to use in cryptography, do it with care.
Over the next two installments of this article, we'll provide some pointers (and a bunch of code) on doing random number generation properly, both in hardware and in software. If you want to read up in advance, an excellent, but dry, resource is "Randomness Recommendations for Security" (RFC 1750). Another good place to look is in Bruce Schneier's excellent book, "Applied Cryptography" (see Resources).