![]() |
|
|||||||||||||||
|
||||||||||||||||
|
| Make your software behave: Beating the bias | ||||
| How to approach truly random number generation through hardware
We know from our last installment of this column that random numbers often have security-critical ramifications. The problem is that deterministic machines don't have an easy time being random. The best answer is to use a good physical measurement to generate random numbers (or at least the seed for a pseudo-random number generator). Many natural phenomena will suffice. The trick is they must have some measurable property with behavior that is at least as good as random (if not actually random). (This sort of randomness is introduced at the quantum mechanics level. It isn't known whether randomness is really an intrinsic part of quantum mechanics, or if our scientific theories just aren't good enough to predict effects that we should be able to predict.) Real hardware devices for collecting randomness are based on measuring such things as radioactive decay of atoms or fluctuations in heat emissions (some Pentium III processors do the latter). In this installment, we'll look at hardware sources of random numbers. These sources can sometimes offer you higher assurance than all-software solutions (although there are a number of broken hardware random number generators, too). Suffice it to say that we do realize that hardware-based solutions aren't always feasible. For example, you might wish to distribute an application to hundreds of thousands of users around the globe, and still need good random numbers on each and every desktop. Until every computer in the world is fitted with a good random number generator in hardware, there will always be a need for software that comes as close as possible to true randomness. Before we dive in, we need to consider whether it is even possible to fill such a need for randomness. We discussed traditional pseudo-random number generators in our last installment, and talked about how absolutely awful they are for security-critical purposes. There's simply no way PRNG algorithms will suffice. Hardware (covered below) is a good option, but an expensive and unwieldy one. Fortunately, there is a middle ground. It's possible to do a reasonable randomness job with software, without resorting to hardware-based solutions. Of course, hardware will always be capable of doing better than the software ever could, even if it doesn't always, and you need to keep that in mind. We'll cover alternative software solutions in our next installment. How do real random number generators work? The only way to do something non-deterministic on a computer is to collect data from some natural stochastic process. One of our favorite ones involves the use of an electronic Geiger counter that generates a pulse every time it detects a radioactive decay. The time between decays has a strong, pure random component. In particular, nobody can predict whether the time until the next decay will be greater than or less than the time since the previous decay. That yields one bit of random information. We could potentially get more. Let's say, for the sake of example, that we're observing some material that we expect to decay every second, but the decay could happen up to a tenth of a second off in either direction. Instead of comparing the times between two decays, we could time the decay, and use the individual digits as individual Base 10 random numbers. Maybe the 100ths of a second would be one digit, and the 1000ths of a second another. We could siphon off digits in proportion to how accurate our timer is. This technique may or may not be good enough; it's hard to tell without trying it, and doing some statistical analysis of the data! What if the 100ths of a second digit is almost always 0, but infrequently is 1? That isn't very random. What if our clock isn't all that functional, and the 1000ths digit is always even, or at least far more likely to be even? These kinds of problems are actually pretty commonly encountered. All sorts of things can go wrong, biasing the data. On the other hand, if we are conservative, and try to milk the decay for a single bit of data in a conservative way, there's less of a chance of measurement-induced bias. Unfortunately, if there is some bias, it's very difficult to tell how much bias there is. What we care about is how many bits of entropy (true randomness) are really available. We'd like to be able to measure that number with some precision. To make things easier on ourselves, we'll shoot for just one. Any sort of source has a potential for bias. One popular hardware generator was examined, and found to return 1's 49.81% of the time, meaning that the generator was ever so slightly biased toward zero. Perhaps the underlying natural phenomena is biased in these cases, but usually our measurement tools, being less than perfect, are at fault. Beating the bias, confounding the correlation A potential problem with the above technique for removing bias is that there can be a correlation between two successive bits (or even between non-adjacent bits) that an attacker might leverage, even after you do the XOR. In fact, the XOR can help with any bias, but it amplifies the correlation between bits. Consider the technique we gave for extracting a single bit from timed events. What if we used the same technique to try to get randomness from someone typing at the keyboard? The basic idea is to time the duration between keystrokes. If a duration is greater than the previous duration, a 1 is generated; otherwise a 0 is generated. This technique is pretty common in programs that need to generate a key, and have a user sitting in front of the computer who won't be able to complain if made to bang on the keyboard for a few minutes. For example, some versions of PGP use this technique (others perform similar magic with mouse events). But all is not peachy keen. Here's what can go wrong. The timing between key-presses isn't a completely random phenomenon. We've seen programs that provide sample text for the user to type over and over again. The problem is, some keys are easier to reach than others. If you're typing the word "kiss," the time between typing the "i" and the "s" is almost certainly going to be greater than the time between typing the two "s" characters. The event that caused one bit can definitely be related to the event causing the next bit, leading to an ultimate increase in bias. How do we get rid of this kind of correlation? One solution is to take multiple sources of random data, and XOR the streams of data together. People traditionally use a pseudo-random stream as the second source of data. Of course, if someone can break the pseudo-random stream, then the correlation will still be discernible (and can be used against you, as we will show). So what? Who cares if there's a little bit of bias in the random number generator? In theory, the bias makes the job of an attacker easier. However, in practice, a small bias isn't all that likely to lead to a break of a system. For example, Bruce Schneier notes that even with a generator that produces 55% 0's, there are still 0.99277 bits of entropy per bit generated. That means a well-directed brute-force search on a 168-bit key would, on average, succeed in 2,109 attempts, instead of the 2,111 attempts that would be expected if no bias existed. Nonetheless, most people would rather err on the side of security, just in case they're wrong. A very common solution for removing bias that solves all of the above problems is to apply an algorithm to a stream of random bytes that will, by nature, remove any statistical trends. Compression algorithms and hash functions are both commonly used. What's a good hardware source? Probably the best source for random numbers in common use is measuring radioactive decay. It's not at all easy to bias, and comes with relatively little natural bias. Another good source is measuring the thermal noise off a semiconductor diode, although if improperly shielded, nearby hardware can bias the output. Several hardware devices for generating random numbers are commercially available. Probably the most widely used device is the ComScire QNG (see Resources), which is an external device that connects to a PC using the parallel port. The device is believed to be well-designed, and has done extremely well in statistical analyses of its output. It is capable of generating 20,000 bits (2,500 bytes) per second, which is more than sufficient for most security-critical applications. Even when it is not sufficient, data from the generator could be stored in advance, or multiple generators could be used. Another benefit of this package is that the device driver does statistical tests on numbers as you generate them, and returns an error if data that are statistically non-random start being generated. As of this writing, the device is $295, and is available from ComScire. An unbiased review of this generator by Robert Davies (see Resources) claims that it "seems to be the only generator specifically designed for statistical purposes and where the manufacturer has made a real effort to understand the effect of bias and correlation on the numbers that are finally produced". A similar device is the RBG 1210 from Tundra (see Resources). Their device has sometimes exhibited a small bias, but otherwise has passed bevies of statistical tests. Using a transformation to remove bias is definitely recommended. This product used to have some assembly problems that often caused total failure, but they are believed to be fixed. The RBG 1210 is an internal card for a PC, and has a variable sampling rate. If you sample too much, successive bits will have a high correlation, because the internal hardware doesn't change output fast enough to meet the sampling rate. However, this device should produce up to 160,000 bits (20,000 bytes) per second. Another one that has great fun-and-games appeal is lavarand (see Resources). This hilarious approach relies on the chaos inherent in a set of operating Lava Lite lamps. A digital camera is set up in front of six lava lamps and is used to take a picture every once in a while. The resulting "noisy" digital image is then cryptographically hashed into a 140-bit seed. This seed is then fed through an excellent pseudo-random number generator (which we will discuss in the next installment). Don't use the lavarand in your security-critical code, however! Remember, everybody, including bad guys, can see the lavarand sequence on the Web. SGI doesn't sell lavarand installations, either. It wouldn't be all that hard to build lavarand yourself if you follow the fine, detailed advice on the lavarand Web page. Nonetheless, lavarand is not intended to compete with other devices we have discussed. Most of the installations are costly (what with the high price of lava lamps these days), and the generator only outputs about 550 bits (a bit less than 70 bytes) per second. It's also space-intensive for such a device, requiring a lamp and a camera. Finally, it's prone to failures such as broken hardware. To solve the later problem, SGI uses six lamps on automatic timers, of which three are always on. Everything in their system seems to work extraordinarily well, just with a fairly low bandwidth. Part of the problem is that a photograph needs to be taken and scanned. But the bulk of the problem is that the data have high potential for statistical bias, so a lot of effort is spent removing any such traces, using seven cryptographic hashes and a fairly slow (but very good) pseudo-random number generator. Much of the security community became excited last year when Intel announced that they would start adding thermal-based hardware random number generators to their chipsets, at essentially no additional cost to the consumer. People became even more excited when prominent cryptographers declared the generator an excellent source of random numbers. So far, a few CPUs with hardware RNGs onboard have been shipped. All Pentium IIIs with an 8xx chipset (810 or higher) should have this functionality. As we get our hands on hardware with this generator, we'll pass along some code for you to use when this functionality is available. (We have heard a nasty rumor that the RNG is being nixed long-term due to a lack of interest. We hope this does turn out to be only a rumor.) Other lesser-known products are out there for generating random numbers in hardware. A few resources are on the Web that can show those of you who are adept at building hardware how to build your own hardware random number generator on the cheap (see Resources). How good are my random numbers? Plenty of statistical tests are out there. They take various forms, but the common thread is that they all examine streams of data from a generator statistically, trying to see if any patterns can be found in the data that would not be present if the data were truly random. The ComScire device performs these tests as you use it, and will fail at runtime if the tests don't succeed. That's a nice safety measure, but, in practice, people often make do only with checking the stream up-front before first use, and perhaps, on rare occasions, to ensure that the generator is running at full steam. The most widely recognized test suite for ensuring the randomness of a data stream is the DIEHARD package by George Marsaglia (see Resources). It performs numerous tests on a data stream. Another reasonable package for such tests is pLab (see Resources). What these tests do or don't do doesn't really matter; we just have to trust that they're testing the quality of our generator, and listen well if the generator we are using fails them. There are pitfalls in building a hardware device for random numbers that test suites such as DIEHARD won't catch. For example, some types of bias will not show up in most statistical tests. For this reason, we strongly recommend using a well-trusted commercial hardware source for your random numbers, if at all possible, instead of building your own, because you might end up testing your generator improperly. If you still feel a great need to test a generator yourself, we recommend the essay "True Random Number Generators" (see Resources). Conclusion Next time, we'll stake out a middle ground, where the random numbers are really good, even if they're generated without use of a hardware device, and even if they're a bit slower in coming. We'll discuss a more powerful class of pseudo-random number generators, and talk about how to collect randomness from good sources without resorting to specialized hardware.
| ||||||||||||||||
| About IBM | Privacy | Terms of use | Contact |