![]() |
|
|||||||||||||||
|
||||||||||||||||
|
| Make your software behave: Cryptography essentials | ||||
| Using hashing algorithms for data integrity and authentication
One-way functions One-way functions may not sound immediately useful because you can't get the plaintext back out of a one-way computed ciphertext. Why would you want to compute a cipher that you can't undo? Of course, functions that are almost one-way are pretty useful, as all public key functions are, at their essence, one-way functions with "trap doors." Functions that are good candidates for public key cryptography are those that are easy to compute in one direction, but very difficult to compute in the other direction unless you know some secret. Thus, we find public key algorithms are based on factoring and other hard mathematical tricks. Hash functions The idea is to take a piece of plaintext and convert it to a piece of (usually smaller) ciphertext in a way that is irreversible. Ideally, all possible plaintexts will hash to a unique ciphertext, but that is not usually what happens. Most of the time, there are infinitely many different strings that can produce the exact same hash value. However, with a good cryptographic hashing function, it should be extremely difficult in practice to find two intelligible strings that hash to the same value. Another property of a good hash function is that the output will not reflect the input in any discernable way. Hash functions usually produce constant-sized digests. Many algorithms produce very small digests, however, the security of the algorithm rests largely in the size of the resulting digest. We recommend selecting an algorithm that provides a digest size of no fewer than 128 bits. SHA-1, which gives a 160-bit hash, is a good hash function to use. You can use hash functions to ensure data integrity, much like a traditional checksum. If you publicly release a regular cryptographic hash for a document, anyone can verify the hash, assuming they know the hashing algorithm. Most hashing algorithms that people use in practice are published and well-understood. Again, let us remind you that using proprietary cryptographic algorithms, including hash functions, is usually a bad idea. Internet distribution Be forewarned -- there is still a problem with this distribution scheme. What if you, as the software consumer, somehow download the wrong checksum? For example, let's say that we distribute the "xyzzy" software package. One night, some cracker breaks into the distribution machine and replaces the xyzzy software with a slightly modified version, including a malicious Trojan horse. The attacker also replaces our publicly distributed hash with the hash of the Trojaned copy of the release. Now, when some innocent user downloads the target software package, a malicious copy is what arrives. The victim also downloads a cryptographic checksum, and tests it against the software package. It checks out, and the malicious code appears safe for use. Obviously, the hash alone is not a complete solution if we can't guarantee that the hash itself has not been modified. In short, we need a way to authenticate the hash. Authentication problems Message authentication codes
MACs are useful in lots of other situations, too. If you need to perform basic message authentication without resorting to encryption (perhaps for efficiency reasons), MACs are the right tool for the job. Even when you are already using encryption, MACs are an excellent way to ensure that the encrypted bit stream is not maliciously modified in transit. If designed carefully, a good MAC can help solve other common protocol problems. One widespread problem with many protocols is evidenced during what is called a playback attack (or a capture-replay attack). Say we send a request to our bank, asking to transfer $50 into John Doe' s bank account from ours. If John Doe intercepts that communication, he can later send an exact duplicate of the message to the bank! In some cases the bank might believe we sent two requests! Playback attacks
If the server ever sees a request with an out-of-date timestamp (say, more than 60 seconds old), it refuses the request. This may or may not be sufficient, because it still results in a 60-second window in which playback attacks can occur. You might consider forbidding two requests from occurring in the same unit of time, and cache information about what valid requests have arrived in the past 60 seconds. That solution will probably work if you can handle the special case where the same transaction is performed twice in the same unit of time. There's a much easier solution, though. When you calculate your MAC, hash not only the data and the secret key, but also a unique, ordered sequence number. The remote host need only keep track of the last sequence number it serviced, and make sure that it never services a request older than the next expected sequence number. This is a common approach. In many cases, authentication may not really be an issue. For example, consider using a cryptographic hash to authenticate users logging into a machine from the console. When the user enters a passphrase for the first time in many systems, the passphrase itself is never actually stored. Instead, a cryptographic hash of the passphrase is stored. That's because most users feel better if they believe system administrators can't retrieve their password at will. Assuming the operating system can be trusted (which is a laughably big assumption), we can assume our database of crypto hashed passphrases is correct. When the user tries to log in, and types in a passphrase, the login program hashes it, and compares the newly hashed passphrase to the stored hash. If the two are equal, we assume the user typed in the right passphrase, and login proceeds. Telnet protocol Of course, many other important security considerations surround the implementation of password authentication systems. This problem is a big enough topic that we'll discuss it in even more gory detail in another article. Other attacks Another attack on cryptographic hashes is much easier to carry out than the average brute-force attack. Consider the following scenario: Alice shows Bob a document and a cryptographic hash to validate the document where Alice agrees to pay Bob $5 per widget. Bob doesn't want to store the document on his server, so he just stores the cryptographic hash. Alice would like to only pay $1 per widget, so she would like to create a second document that gives the same hash value as the $5 one, then go to court, claiming that Bob over-billed her. When she gets to court, Bob will present a hash value, believing that Alice_s document will not hash to that value, because it isn't the original document she showed him. If her attack is successful, Alice will be able to demonstrate that the document she has does indeed hash to the value Bob stored, and the courts will rule in her favor. But how? Alice uses what is called a birthday attack. In this attack, she creates two documents, one with the $5 per widget price, the other with the $1 per widget price. Then, in each document, she identifies n places where a cosmetic change can be made (for example, places where a space could be replaced with a tab). A good value of n is usually one more than half the bit length of the resulting hash output (so n = m/2+1 if we assign m to the bit length of the hash output). For a 64-bit hash algorithm, she would select 33 places in each document. She then iterates through the different permutations of each document, creating and storing a hash value. Generally, she will expect to find two documents that hash to the same value after hashing about To get the same security against birthday attacks as you'd want against brute-force attacks, given a symmetric cipher with key length p, you should pick a hashing algorithm that gives a digest of size p*2. Therefore, for applications that require very high levels of security, it is a good idea to demand hashing algorithms that yield message digests that are 256 bits, or even 512 bits. What's a good hashing algorithm to use? We're particularly fond of SHA-1. Bruce Schneier recommends the same algorithm. But SHA-1 is not sufficient if a hash of length longer than 160 bits is required. For a big bit hash, try using a symmetric cipher adapted to perform hashing. One good example is the GOST hash algorithm, derived from the GOST cipher, with a hash length of 256 bits. Longer hash lengths are likely to require some coding in order to adapt a symmetric cipher. Schneier outlines constructions for building such algorithms in Applied Cryptography. Neither SHA-1 nor the GOST hash algorithm has intellectual property restrictions of any significance. Digital signatures
Even in hand-written signatures, these ideals are merely conceptual, and don't really reflect reality. For example, forging signatures is possible, even though few people are skilled enough to actually do it. However, signatures tend not to be abused all that often, and thus hold up in court very well. All-in-all, ink signatures are a "good enough" solution. Electronic signatures can do at least as well as physical signatures. This fact often amazes people, because they figure such a signature is akin to the signature files that people often place at the bottom of e-mail messages (a bunch of ASCII). If that were what digital signatures were like, then they wouldn't be very useful at all. It would be far to easy to copy a signature from one file, and append it directly to another file for a perfect forgery. It would also be possible to modify a document easily after it's signed, and no one would be the wiser. Thankfully, digital signatures are completely different. Most digital signature systems use a combination of public key cryptography and cryptographic hashing algorithms. As we have explained, public key cryptosystems are usually used to encrypt a message using the receiver's public key, which the receiver then decrypts with the corresponding private key. A private key can also be used to encrypt a message that can only be decrypted using the corresponding public key. If a person keeps their private key completely private, (you haven't been hacked lately, have you?) being able to decrypt a message using the corresponding public key then constitutes proof that the person in question encrypted the original message. Digital signatures are not just useful for signing documents -- they're useful for almost all authentication needs. For example, they are often used in conjunction with encryption in order to achieve both data privacy and data authentication. A digital signature for a document is usually constructed by cryptographically hashing the document, and then encrypting the hash using a private key. The resulting ciphertext is called the signature. Anyone can validate the signature by hashing the document themselves, and then decrypting the signature (using either a public key or a shared secret key), and comparing the two hashes. If the two hashes are equal, then the signature is considered validated (assuming the person doing the validation believes the public key they used really belongs to you). The signature doesn't need to be stored with the document. Also, the signature applies to any exact digital replica of the document. The signature can also be replicated, but there is no way to apply it to other documents, as the resulting hash would not match the decrypted hash. The problem with digital signatures Most public key algorithms, including RSA and ElGamal, can be easily extended to support digital signatures. In fact, a good software package that supports one of these algorithms should also support digital signatures. We recommend using your favorite public key cryptography algorithm for digital signatures. Try to use built-in primitives instead of building your own construction out of an encryption algorithm and a hash function.
What's next? In our next installment, we'll go into nitty-gritty detail about how to add password management to a software application, and (perhaps more importantly) how not to. Cryptography is a huge field, but only one facet of software security. We won't even pretend we'll cover it completely.
| |||||||||||||||||||
| About IBM | Privacy | Terms of use | Contact |