![]() |
|
|||||||||||||||
|
||||||||||||||||
|
| Make your software behave: Everything to hide | ||||
| Knowing the right cryptographic algorithm to use, as well as when and how to use it, is essential to developing secure software
In this first of three installments on cryptography, Gary and John discuss the goals that cryptographic systems are designed to meet, as well as what can go wrong in these systems. In addition, they discuss pre-computer-era cryptography, to explain the foundations upon which our modern algorithms are based. Cryptography is a subject most developers know a little about -- usually just enough to be dangerous. The most commonly encountered mistakes include failing to apply cryptography when it's called for, and incorrect application of cryptography, even when the need has been properly identified. The usual result of the misuse (or nonuse) of cryptography is that software applications become vulnerable to a smorgasbord of different attacks. Over the next three installments of our software security column, we'll take on three cryptography topics:
Three installments might seem like a lot of treatment, but it really isn't. Cryptography is a huge area with many interesting facets and subtle issues. There is no way we can give this topic the full justice it deserves in this small space. To do so would require an entire book. The good news is that several good books devoted to cryptography are available. We highly recommend Bruce Schneier's Applied Cryptography (see Resources), which has the advantage of being fairly complete, and is aimed at all audiences, including people with little or no experience in cryptography. Not only does Schneier's book go into more detail than we do, but it also covers fairly esoteric algorithms that we don't cover in this column. For example, if you want to implement a protocol for secure voting -- one where no outsider can figure out who you voted for, and no one can forge a vote -- this book will tell you how to do it. We won't. We're only going to discuss the most important, commonly encountered stuff here. The material we cover should give you a good enough overview to be able to intelligently add cryptography to your software, in most cases. After all, you're probably a programmer, not a cryptographer. In this first of three installments on the subject, we'll set the stage by providing some background on cryptography. We'll discuss the goals that cryptographic systems are designed to meet. Then, we'll talk about what sorts of things can go wrong in these systems. Finally, we'll touch on traditional, pre-computer-era cryptography, so we can get an understanding of the foundations upon which modern algorithms are built. People have been scrambling messages for thousands of years, and they have learned a few things in the process. In the next installment, we'll talk about the two primary classes of cryptographic algorithms: symmetric (secret key) cryptography and public key cryptography. In the third installment, we'll discuss a handful of other common cryptographic algorithms and their applications, including cryptographic hash algorithms and digital signatures. We're not going to get into any code, for the time being. In a few months, we'll come back to this topic, and actually get our hands dirty with some common cryptographic libraries. For now, we'll just give you a sense of what types of algorithms are out there, which ones you should use, which ones you should avoid, and common pitfalls that are often encountered. Ultimate goals of cryptography Confidentiality: Ensuring confidentiality of data amounts to ensuring that only authorized parties are able to understand the data. We're not trying to keep unauthorized parties from knowing there is some data (perhaps whizzing by on the wire). In fact, they can even copy it all as long as they can't understand it. We're controlling access by scrambling things up. In most cases, sensitive data must be transferred over an insecure medium, such as a computer network, while remaining confidential. Only authorized people are allowed access to understand. Usually the term authorized is loosely defined to imply that anyone who has a particular secret (usually a cryptographic key) is allowed to decode the data, and thus understand it. In a perfect world, we could make a list of whom we want to be able to read a particular piece of data, and no one else in the world would ever be able to read it. In practice, confidentiality is supplied through encryption -- a message is scrambled using some special information and a specialized cryptographic algorithm. The original message in its pre-encrypted form is called the plaintext. After encryption, the scrambled version of the message is called ciphertext. Cryptographic algorithms themselves are often called ciphers. Usually, only people with secret information are able to unscramble (decrypt) the message. By using cryptography, software developers can keep data confidential. Authentication: Data authentication ensures that whoever supplies or accesses sensitive data is an authorized party. Usually, this involves having some sort of password or key that is used to prove who you are. In a perfect world we would always be able to tell, with absolute certainty, with whom we are communicating. We would never have to worry about lost or stolen passwords or keeping keys secret. There are several practical ways to provide authentication. Having possession of a secret key that can be authenticated via standard cryptographic techniques is one standard way. We will discuss common approaches to the authentication problem, including digital signature systems. Integrity: Data integrity is maintained only if authorized parties are allowed to modify data. In the case of messages going over a network, the ultimate goal is to be sure that when a message arrives, it is the same message that was originally sent -- that is, the message has not been altered en route. Data integrity mechanisms usually work by detecting if a message has been altered. Cryptographic hashes (non-reversible transformations) are checksums of a sort that allow us to guarantee data integrity. Generic encryption can also be used to ensure data integrity. Non-repudiation: In communication, data non-repudiation involves two notions. First, the sender should be able to prove that the intended recipient actually got a sent message. Second, the recipient should be able to prove that the alleged sender actually sent the message in question. Absolute non-repudiation is fairly difficult to demonstrate in practice, because it is very difficult to show that a person's key has not been compromised. In a perfect world, cryptography would provide a total, unbreakable solution for each of the four central characteristics. But the real world tends not to be so perfect. In fact, weaknesses are often uncovered in commonly used cryptographic algorithms. And misuse of good cryptographic algorithms themselves is also all too common. There is even a science of breaking cryptographic algorithms called cryptanalysis. We'll discuss cryptanalytic attacks on cryptography later. All this trouble aside, one important thing to understand is that well-applied cryptography is powerful stuff. In practice, the cryptographic part of a system is the most difficult part of a system to attack directly. Other aspects of a system's software tend to be cruftier. To demonstrate this fact, let's draw an analogy with your car. Consider your car key. Because of the way car locks are designed and implemented, there are many cars out there that a particular key will fit. If you find other cars of the same make and year as yours, you'll probably find that your key will open perhaps one in 50 of them. One of the authors tested this theory once as a teenager in the parking lot of a rock concert, and managed to open someone else's Toyota with the key to his after trying only six cars. Here's the deal, though. Do thieves break into cars by carrying around a set of keys, and trying each one until they can open your car? Nope. There are far easier ways to break into a car. Thieves use a slim-jim or some similar device to pop the lock, or maybe they break a window, or perhaps they hold you up at gunpoint and steal your car, or maybe they just keep looking until they find a car that's been left unlocked or in an otherwise unsecured state. Although the key-lock combination may seem to be a fairly meager form of protection, it turns out to be a big enough barrier in practice, that bad guys generally turn to other forms of attack. Cryptography is pretty much the same. First off, the notion of a key in cryptography is a metaphor named directly after its hardware counterpart -- without a key, you can't _lock_ or _unlock_ a particular piece of data. In the car case, an attacker gets a one in 50 or so chance that a given key will work; in cryptography, there are generally billions of possible keys (for most types of cryptographic algorithms, the number is usually 2n, where n is the length of the key in bits). However, computers can automatically try out keys at mind-boggling speed. A thief might take several minutes to try out 50 different car keys, but a computer takes a mere fraction of a second (a bazillionth) to try out 50 cryptographic keys. Fortunately, in many cases, a 128-bit key is powerful enough to define many more keys than all the world's computers could possibly try out within a reasonable amount of time. As in the car situation, attackers don't often try the direct _try each key out_ approach, mostly because there are usually much easier ways to attack. Security is a chain, so one weak link is all it takes to break the chain. Used properly, cryptography is quite a strong link. Of course, one link alone doesn't provide much of a chain. In other words, by itself, cryptography is not security. The two most common ways around the good use of cryptography involve attacking machines at either end of the cryptographic pipeline. That is, it is often easier for an attacker to hack your machine and steal your plaintext before it is encrypted or after it is decrypted. There are so many software security flaws out there that a direct attack on a host computer is often the attacker's best bet. Common attacks on cryptography Known-ciphertext attacks. In this attack, the cryptanalyst has an encrypted version of a message (or multiple messages encrypted by the same algorithm) called ciphertext. The ciphertext message is manipulated in an attempt to reconstruct the original plaintext, and hopefully determine the cryptographic key needed to decrypt the message (and subsequent messages). One common version of a known-ciphertext attack is the brute-force attack, where each possible key is used to decrypt the ciphertext (recall our car analogy). After each key is attempted, the ciphertext is examined to see if it seems to be a valid message. Given 264 possible keys that may have been used to encrypt a message, the expected number of keys that must be examined before finding the actual key is 263 . That's an awful lot of keys. Known-plaintext attacks. In this attack, the cryptanalyst has both the ciphertext of a message and the associated plaintext for that message, and tries to figure out the cryptographic key that decrypts the cyphertext to the known plaintext. Once that key is known, perhaps future messages will be decipherable as well. Chosen-plaintext attacks. This type of attack is similar to a known-plaintext attack, but the cryptanalyst is able to watch as a particular piece of plaintext (chosen by the analyst) is encrypted. If the cryptanalyst can use feedback from previous messages to construct new messages and see how those are encrypted, this becomes known as an adaptive-chosen-plaintext attack. Chosen-ciphertext attacks. In this attack, the cryptanalyst has some way of choosing encrypted messages and seeing their decrypted version. The goal is to determine the key used to decrypt the message. Related-key attacks. In this attack, the cryptanalyst uses knowledge about the relationship of different keys and their associated outputs to guess the key used to encrypt something. Side-channel attacks. Sometimes seemingly incidental information can be determined from the execution of the encryption or decryption that gives hints as to the key in use. For example, if there is a predictable relationship between the time it takes to encode a particular string and the key used to encode it, then a cryptanalyst might be able to greatly reduce the number of keys he or she would have to examine in a brute-force attack. This sort of attack usually requires known ciphertext and chosen plaintext. Plenty of attacks are not really mathematical in nature. Instead, they rely on nefarious means to obtain a key. Examples include theft and bribery. Theft and bribery are certainly quite effective, and are often very easy to carry out. Another example is using a hardware device to launch a "Tempest" attack. In this attack, sophisticated hardware is placed within a few hundred feet of the monitor on which sensitive data will be displayed. The radiation fluctuation from the monitor delivers lots of information, allowing an attacker to "see" the plaintext as it is displayed. General recommendations One sweeping recommendation applies to every use of cryptography. Unfortunately, it's one that is very often ignored by software developers -- never roll your own cryptography. Some people place lots of faith in security by obscurity. They assume that keeping an algorithm secret is as good as relying on mathematically sound cryptographic strength. This is just plain wrong. There is no advantage, against a determined and experienced cryptanalyst, in hiding a bad algorithm. The best bet is to use a published, well-used, tried-and-tested algorithm that's been through the mill for a few years. Attacking a scrambling algorithm usually proceeds with an attacker considering input/output pairs. The algorithm is treated as a black box. With a solid algorithm, knowing the exact process reveals nothing about the secret key used to scramble the plaintext. In most software application cases, a cryptographic algorithm is available to an attacker anyway (embedded in the code). Keeping an embedded function secret is nearly impossible, and gives no strength to your cryptography. Consider RC4, a proprietary symmetric cipher owned by RSA Security. RSA guards RC4 as a trade secret and has not released the code except to licensees. However, in 1994, someone reverse-engineered the algorithm and posted it to the Internet anonymously. Actual licensees of the code have verified that the algorithm posted on the Internet is a correct (input/output-compatible) version of RC4. In the case of RC4, public disclosure of the algorithm (its release into the world) wasn't much of a problem in terms of security, because the algorithm was designed to withstand attack under the assumption that attackers know the algorithm. Ron Rivest, a highly respected cryptographer, created RC4. By contrast, most home-grown algorithms are created by amateur cryptographers, and are easily defeated by an expert cryptanalyst. In fact, most home-grown algorithms we have seen involve simply XORing the data with some constant, and then performing some other slight transformations to the text. This sort of encryption is almost no better than no encryption at all. A telling version of what goes wrong with the na? use of cryptography was recently the focus of a flurry of press attention. In the case in question, Netscape used a very simple scrambling algorithm to try to hide POP3/IMAP passwords it was storing on a client machine. The encryption algorithm used by Netscape to scramble passwords is exceptionally weak. Tim Hollebeek, a Reliable Software Technologies research associate, and John Viega, co-author of this series of articles, were able to deduce the algorithm after only eight hours of work. No reverse engineering of the software was involved. Instead, a few hundred carefully chosen passwords were analyzed using pencil and paper. The algorithm turns out to be a simple combination of XOR with a constant key, and a substitution cipher weaker than those found in puzzle magazines. For more details, see Resources. Once the cipher is known, recovering a POP3 or IMAP password stored on a machine is trivial. An attacker requires either physical access to the victim's machine, or the ability to run code on it. A mobile code attack need only steal a copy of a file with the scrambled password in it. That is, only read access is required. Current JavaScript attacks are able to do this. Reliable Software Technologies has created a working password-snagging attack in the lab. A successful attack allows the bad guy to download and read a victim's e-mail from a remote machine. Since careful use of the hack would not leave too many obvious clues, a victim's e-mail could be snooped indefinitely. The only workaround is to turn off the "remember password" feature. Although stealing mail alone is a very serious security and privacy problem, more dangerous scenarios exist. The largest risk is that people use the same password for POP3 and other logins to remote machines (and maybe even their PGP passphrase). In particular, many people use IMAP or POP3 to access work-related e-mail from home, and their mail password is also the login password they use at work. In fact, the login account and the mail account are often the same. Home computers are notoriously insecure and easy to penetrate. A malicious attacker can read the POP3 password stored on an insecure home computer (often over the Internet) and use it to log in to a more secure machine run by the victim's employer. The attacker can then take control of an account, read sensitive information, attack more privileged accounts, and set up remote monitoring systems inside a corporate network. The Reliable Software Technologies exploit code could also be used as a payload to deliver a much more insidious version of the Melissa virus. Cryptography export laws The cryptography laws are meant to protect national security, guarding against use of strong cryptography by terrorists and drug traffickers. However, all other countries on the planet already have the same algorithms the U.S. is futilely trying to protect. As they say in Tennessee, there is no sense in closing the barn door after the horse is gone. Silly cryptography laws only serve to hurt American commerce, because U.S. companies can't sell products with strong cryptography to people in other countries. Currently, no special permission is needed to export cryptography of 56-bit strength or less. Anything stronger requires an export license. By the time you read this, the laws may have changed, so don't take our word for it! The best we can do is recommend that if you want to export software with encryption, check with a lawyer. Countries other than the U.S. have export restrictions on cryptography too; some even have import restrictions. It's up to you to check your local laws. There are some sneaky ways to get past legal restrictions on strong cryptography. Some companies do a reasonable job. They don't export cryptography. Instead, they import it. They either develop the cryptographic software outside the country, or get someone outside the country to add cryptography to their software. This sort of solution is probably not feasible for everybody. A more practical approach is to design your software to work with no cryptography, but also to interoperate easily with off-the-shelf, freely available cryptography packages -- packages that are available to people outside the U.S., of course. People can download the cryptographic package and simply drop it into your application themselves. This approach is fairly common, with some packages even providing detailed instructions as to how to download and install the appropriate cryptographic package along with the software distribution. Traditional cryptography Cryptography has been used to protect military and diplomatic secrets being sent through insecure channels for thousands of years. All traditional cryptosystems work by agreeing on a cipher and perhaps a key before messages can be sent. Modern cryptographers believe public ciphers are the best sort to use. In the old days, the cipher had to remain secret or the security of all communications would be compromised. If any kind of key was used, it was nowhere near as powerful as the keys found in current modern approaches. Prior to this century, most cryptographic systems were based on character substitution or character transposition (called substitution ciphers and transposition ciphers). Perhaps the best known traditional cipher is the Caesar cipher, which is generally believed to be one of the first encryption algorithms invented. In the Caesar cipher, a plaintext message was converted to ciphertext by rotating each letter a fixed number of places down in the alphabet. When using a one-character rotation, the letter "A" would be encrypted to "B," "B" would be encrypted to "C," and "Z" would be encrypted to "A." For example, the string "HELLO WORLD," when rotated one character, is "IFMMP XPSME." With this algorithm, there are 25 keys (in English, anyway) that can possibly determine the number of letters by which each character is rotated. If an attacker knows the cipher being used, trying all 25 possible decodings to find the key does not take long. In most cases, only one possible decoding will make any sense. Caesar ciphers are a special case of substitution ciphers. In their simplest form, substitution ciphers are a permutation of the 26 letters of the alphabet. Here's an example of a substitution cipher:
To encrypt a message, we look up each character in the plaintext in the top row, replacing it with the letter in the bottom row. For example, "HELLO WORLD" would encrypt to "ITSSG VGKSR." To recover the plaintext of this message, we look up the ciphertext letters in the bottom row, replacing them with letters from the top row. Assume the cipher shown here can be securely distributed. Also assume that a potential attacker guesses that a straightforward substitution cipher is being used, but doesn't know the exact permutation. There are somewhere between 288 and 289 different permutations of a 26-letter alphabet. In a society without computers, a brute force attack is clearly not feasible. Nonetheless, such ciphers tend not to be too difficult to break, given enough ciphertext. The key is to count the instances of each letter of ciphertext. Since E is the most commonly used letter in the English language, given a large enough ciphertext, the most common letter in that text is highly likely to be an E. If it isn't, then it's probably an A, I, or O. Experts in traditional cryptanalysis can perform this sort of statistical attack quite quickly with only a few dozen letters of ciphertext. A substitution cipher can get arbitrarily more complex, hopefully removing elements from the ciphertext that lead to easy analysis. For example, we could make a cipher whose ciphertext is difficult to analyze using statistics by having the same ciphertext letter represent different plaintext letters, depending on the context. We could have one ciphertext digit map to combinations of plaintext digits. We can also "cascade" encryption, so that the encryption of one digit is dependent on the decryption of the previous digit. For example, notice in our previous example that the "L" appears three times in plaintext, and the corresponding ciphertext letter "S" also appears three times. Let's modify our algorithm so that we rotate our plaintext letter n digits, and then encrypt the rotated digit. The value of n will depend on the previous letter of ciphertext. If there is no previous letter, we do not rotate. Otherwise, if the previous letter of ciphertext is "A," we will rotate one place before performing our table lookup. If the previous letter is "Y," we will rotate 25 places. Let's try our new cipher on the string "HELLO WORLD." We first encrypt the letter "H." Since there is no previous ciphertext, we translate it based on our table, so it encrypts to "I." We now consider "E." Since the previous ciphertext letter was an "I," we rotate the "E" nine places, getting "N." We then translate the letter "N" based on our table to "F." Our ciphertext so far is "IF." Following this procedure for each letter, we get "IFKCK YFBFP." It's not too hard to reverse this process if you know the cipher. The letter "F" shows up three times in the ciphertext, but each time it maps to a completely different letter in the plaintext (E, O, and L). The letter "K" also appears twice, mapping to both "L" and "D." This scheme makes it significantly more difficult to perform a statistical attack against the ciphertext. Substitution is the primary primitive in traditional ciphers, but transposing the actual positions of the characters is another primitive. Most modern cryptographic algorithms are built on these same basic principles. The difference is that modern algorithms do a whole lot more substitution and transposition. Traditional algorithms were, of course, applied by hand. Even with a fairly simple cipher, scrambling and descrambling a message was quite time-consuming, even with a fairly short message. Then along came computers. (Cryptography was high among the first uses of computers.) Computers can use algorithms that are more complex by many orders of magnitude, and still encrypt and decrypt a lot faster than a human can. In that way, modern ciphers are worlds beyond traditional ciphers. In our next installment, we'll talk about essentials of modern cryptography, namely symmetric (secret key) cryptography, and public key cryptography.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| About IBM | Privacy | Terms of use | Contact |