More subtle problems can lead to dictionary-like attacks as well. For example, say that some pseudo-random algorithm, rather than a human user, selects the key. This is likely to be an improvement, but maybe not enough of one. Attacker Mallory might decide to cryptanalyze the key-generation algorithm rather than the encryption per se. A less than adequate key generator might produce all kinds of statistical regularities in the keys it creates. It would take an amazingly bad algorithm to produce only 100,000 possible keys (as humans might); but a less than perfect key generator might very well, for example, produce significantly more ones than zeros in even-index key bits. A few statistical regularities in generated keys can knock several orders of magnitude off the effort Mallory needs to guess keys. A key generator need not be incapable of ever generating a particular key K in order to be weak -- it is enough that K is significantly more or less likely to occur than other keys. It is not good enough for a protocol to be secure "some of the time".
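We can put a rough number on how much a biased generator helps Mallory. The sketch below uses Shannon's formula for the entropy of a single biased bit; the 0.75 bias figure and the 128-bit key size are purely illustrative, not drawn from any real generator.

```python
import math

def bit_entropy(p_one):
    """Shannon entropy (in bits) of one key bit that is 1 with probability p_one."""
    if p_one in (0.0, 1.0):
        return 0.0
    return -(p_one * math.log2(p_one) + (1 - p_one) * math.log2(1 - p_one))

# A perfect generator: every bit is 1 with probability exactly 0.5.
perfect = 128 * bit_entropy(0.5)

# A hypothetical flawed generator: the 64 even-index bits are 1 with
# probability 0.75, as in the example above.
biased = 64 * bit_entropy(0.5) + 64 * bit_entropy(0.75)

print(perfect)  # 128.0 bits of entropy in a 128-bit key
print(biased)   # roughly 115.9 bits -- Mallory's search shrank by a factor near 2^12
```

Even this mild, hard-to-notice bias silently discards about 12 bits of key strength.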
Plain text messages often have properties that aid cryptanalysis. For the purpose of explanation, consider messages written in English and encoded in ASCII text files (other file types have other regularities). A few concepts are general and important in understanding plain text regularities. These concepts are entropy, rate-of-language, and unicity distance. The significance of these concepts is that statistical regularities in plain texts are nearly as helpful for cryptanalysts as would be actually knowing the exact messages in question.
Entropy: The amount of underlying information content of a message. For tutorial users familiar with compression programs, we can mention that if a message is (losslessly) compressible, it ipso facto has an entropy less than its bit-length. Take a simple example of a message with less entropy than its length might suggest. Suppose we create a database field called "sex" and have it store six ASCII characters. However, the allowable values are restricted to "male" and "female". Assuming the two values are equally likely, this database field contains just one bit of entropy, even though it occupies 48 bits of storage space (assuming 8-bit bytes and so on).
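The "sex" field example can be checked directly with Shannon's entropy formula. This sketch assumes, as above, that the two values are equally likely:

```python
import math

def shannon_entropy(probabilities):
    """Entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# The "sex" field: two equally likely values, whatever the storage size.
field_entropy = shannon_entropy([0.5, 0.5])
storage_bits = 6 * 8   # six ASCII characters at 8 bits each

print(field_entropy)   # 1.0 bit of information
print(storage_bits)    # 48 bits of storage
```

If the values were not equally likely (say, 90% "female" in some database), the entropy would be even lower than one bit.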
Rate-of-language: The amount of underlying information added by each successive letter of a message. English prose, for example, turns out to contain something like 1.3 bits of entropy (information) per letter. This might seem outrageous to claim -- after all there are more than 2^(1.3) letters in English! But the problem is that some letters occur a lot more than others, and pairs (digraphs) and triplets (trigraphs) of letters cluster together also. The rate of English doesn't depend just on the alphabet, but on the patterns in the whole text. The low rate of English prose is what makes it such a good compression candidate.
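The connection to compression can be demonstrated with any general-purpose compressor. The sketch below uses Python's zlib on a short English sample; zlib is only a crude model of English, so the figure it yields sits well above the true rate of about 1.3 bits per letter, but well below the 8 bits per character that ASCII storage uses.

```python
import zlib

# A short sample of ordinary English prose.
sample = (
    b"Plain text messages often have properties that aid cryptanalysis. "
    b"Some letters occur far more often than others, and pairs and triplets "
    b"of letters cluster together as well. The low rate of English prose is "
    b"what makes it such a good candidate for compression."
)

compressed = zlib.compress(sample, 9)
bits_per_char = 8 * len(compressed) / len(sample)

print(round(bits_per_char, 2))  # well under 8 bits per character
```

A longer sample and a better statistical model would push the estimate further down toward the true rate.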
Unicity distance: The length of cipher text necessary for an attacker to determine whether a guessed decryption key unlocks a uniquely coherent message. For example, if Alice encrypts the single letter "A", attacker Mallory might try various keys and wind up with possible messages "Q", "Z", "W". With this little plain text, Mallory has no way of knowing whether he has come across the right decryption key. However, he is pretty safe in assuming that "Launch rockets at 7 p.m." is a real message, while "qWsl*(dk883 slOO1234 >" is an unsuccessful decryption. The actual mathematics of unicity distance depend on the key length and the redundancy of the plain text, but for DES and English prose as plain text, unicity distance is about eight characters of text.
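The standard formula behind that figure is U = H(K) / D, where H(K) is the entropy of the key and D is the redundancy of the plain text in bits per character. Plugging in DES's 56-bit key and the roughly 1.3 bits of information per character of ASCII English:

```python
# Unicity distance U = H(K) / D
key_entropy = 56          # bits of entropy in a DES key
redundancy = 8 - 1.3      # ASCII stores 8 bits/char; English carries ~1.3 bits/char

unicity = key_entropy / redundancy

print(round(unicity, 1))  # about eight characters of cipher text
```

Past that length, almost certainly only one key yields a coherent English decryption.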
An attacker might take a number of approaches in breaking a protocol. The protocol itself -- and also the consistency with which it is followed -- will affect which attacks are possible in a given case. The attacks described in the next section are all most relevant to encryption protocols as such, and only indirectly relevant to other sorts of protocols that might be compromised (for example, digital signatures, online secure betting, authentication, secret sharing; some of these other protocols might still involve regular encryption in some of their steps). This tutorial will not attempt to detail just how each of these attacks might proceed (it depends on too many non-general issues); but in a general way, the fewer angles for attack a protocol leaves open, the more secure it is likely to be.
Cipher text only: This attack is almost always open to an attacker. The idea is that based solely on the encrypted message, an attacker tries to deduce the plain text. Brute-force attack on the key is one example of this type of attack.
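A toy example makes the idea concrete. The sketch below "encrypts" a message by XOR'ing every byte with a single key byte -- a deliberately weak, hypothetical cipher with only 256 keys -- and then mounts a cipher-text-only brute-force attack, keeping any decryption that looks like English (which is exactly the unicity-distance reasoning above).

```python
def xor_decrypt(ciphertext, key):
    """XOR every byte with the one-byte key (encryption is the same operation)."""
    return bytes(b ^ key for b in ciphertext)

def looks_like_english(text):
    """Crude plausibility test: printable ASCII, mostly letters and spaces."""
    if not all(32 <= b < 127 for b in text):
        return False
    plausible = sum(1 for b in text if chr(b).isalpha() or b == 32)
    return plausible / len(text) > 0.8

secret = b"Launch rockets at 7 p.m."
ciphertext = xor_decrypt(secret, 0x5A)   # "encrypt" with the arbitrary key 0x5A

# Mallory sees only the ciphertext, but the keyspace is tiny:
candidates = [k for k in range(256)
              if looks_like_english(xor_decrypt(ciphertext, k))]

print(candidates)  # the true key 0x5A survives (a few false positives may too)
```

Real ciphers have keyspaces vastly too large for this, of course; the point is only that nothing beyond the cipher text itself was needed.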
Known plain text: In some cases, an attacker might know some or all of the encrypted plain text. This knowledge might make it easier for the attacker to determine the key and/or decipher other messages using the protocol. Typical examples of known plain text exposure come when an attacker knows that encrypted content consists of file types that contain standard headers, or when an attacker knows the message concerns a named subject. In other cases, entire messages might get leaked by means other than a break of the encryption, thus helping an attacker break other messages.
Chosen plain text: An attacker might have a way of inserting specially selected plain text into messages prior to their encryption. Initially, this might seem unlikely to occur; but let's look at a plausible example. Suppose Alice runs a mail server that filters out suspected e-mail viruses. Furthermore, she forwards an encrypted copy of suspect e-mails to virus expert Bob. Attacker Mallory can deliberately mail a virus (or something that resembles one) to Alice, knowing that its specific content will appear in a message from Alice to Bob.
A few less commonly available attacks can add significantly to Mallory's chances of success:
Adaptive chosen plain text: This attack is just a more specialized version of a general chosen plain text attack. Each time Mallory inserts one chosen plain text and intercepts its encrypted version, he determines some statistical property of the encryption. Later chosen plain texts are selected in order to exercise different properties of the encryption.
Chosen key: An attacker might have a means of encrypting messages using a specified key, or of forcing the key used to have certain desired properties. For example, if a key is indirectly derived from a different part of the protocol, an attacker might be able to hack that other part of the protocol, creating usable key properties.
Chosen cipher text: An attacker might be able to determine how selected cipher texts become decrypted. For example, Mallory might spoof an encrypted message from Bob to Alice. Upon attempting to decrypt the message, Alice will wind up with gibberish. But Alice might then mail this gibberish back to Bob or store it in an insecure way. By choosing cipher texts (or really pseudo-cipher texts) with desired properties, Mallory might gain insight into the actual decryption.
There are attacks on ciphers, and then there are compromises of ciphers. There are many ways of breaking a protocol that have little to do with analysis of the mathematical behavior of its algorithms.
The greatest vulnerabilities of actual encryption systems usually come down to human factors. One colorful term for such human vulnerabilities is "rubber-hose cryptanalysis." That is, people can be tortured, threatened, harassed, or otherwise coerced into revealing keys and secrets. Another colorful term emphasizing a different style of human factor vulnerabilities is "purchase-key attack" -- that is, people can be bribed, cajoled, or tempted to reveal information.
Of course, still other human factor vulnerabilities arise in real-world encryption. You can search people's drawers for passwords on scribbled notes. You can look over someone's shoulder while they read confidential messages or type in secret passwords. You can call people and pretend to be someone who has a legitimate reason to need the secrets (Kevin Mitnick, the [in]famous hacker, has called this "social engineering"). In many cases it is enough just to ask people what their passwords are!
Technical, but non-cryptological, means are also available to determined attackers. For example, workstations that need to protect truly high-level secrets should have TEMPEST shielding. Remote detection devices can pull off the characters and images displayed on a computer screen unbeknownst to its user. Maybe you do not need to worry about an attacker having this level of technology for your letter to your Aunt Jane, but if you are in the business of launching bombs (or even just protecting gigadollars of bank transactions), it is worth considering.
Any cryptographic system is only as good as its weakest link.
Implicit in much of this tutorial is the concept of computational feasibility. Some attacks on cryptographic protocols are feasible on computers, while others exceed the capabilities that even steadily improving computers will ever attain. Of course, just because one line of attack is computationally infeasible does not mean that a whole protocol, or even an algorithm involved, is secure. Attackers can try approaches other than those you protect yourself against.
We refer to a protocol that is computationally invulnerable to any form of attack as "computationally secure." Keep in mind that "human factor" approaches are really properly described as "compromises" rather than as attacks per se (especially in this context). However, it turns out that we can do even better than computational security. Let's take a look in the next section.
A "one-time pad" (OTP) is an encryption technique that provably produces unconditional security. An OTP has several distinguishing properties: (1) The key used in OTP encryption/decryption must be as long as the message encoded; (2) The key used in OTP encryption must be truly random data; (3) Each bit of an OTP key is used to encode one bit of the message, typically by XOR'ing them. Mathematically, (3) is not strictly necessary -- there are other ways to do it right -- but practically, inventing other variants just invites design mistakes. A lot of "snake-oil" cryptographers claim to avoid requirement #2. Don't trust them. Using pseudo-random data (including anything you can generate on a deterministic state machine like a computer CPU) makes the encryption less than unconditionally secure. It comes down to entropy: If you can specify how to generate N bits of key using M < N bits of program code, ipso facto, the key contains less than N bits of entropy.
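The mechanics of requirement (3) fit in a few lines. One caveat, per requirement (2): Python's os.urandom draws from the operating system's cryptographic random source, which is excellent in practice but is still produced by a deterministic machine seeded with environmental noise -- a textbook-perfect OTP key would instead come from a genuinely physical random source.

```python
import os

def otp_xor(data, key):
    """XOR each byte of the message with the corresponding key byte (requirement 3).

    XOR is its own inverse, so the same function both encrypts and decrypts.
    """
    assert len(key) == len(data)   # requirement 1: key as long as the message
    return bytes(d ^ k for d, k in zip(data, key))

message = b"Launch rockets at 7 p.m."
key = os.urandom(len(message))     # requirement 2 -- approximated; see caveat above

ciphertext = otp_xor(message, key)
recovered = otp_xor(ciphertext, key)

assert recovered == message
```

Note also what the name demands: the pad is used one time. Reusing an OTP key for a second message destroys its unconditional security entirely.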
It is actually quite easy to see why an OTP is unconditionally secure. Suppose Mallory intercepts a cipher text C and wants to decrypt it (say, by brute-force attack on the key). For any candidate plain text M of the right length, there exists a key K such that M = C XOR K. Mallory can attempt decryptions until the end of time, but based on the cipher text alone, with the key unknown, he has no way to determine whether he has hit upon the correct key -- every possible message of that length is an equally valid decryption.
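This argument can be acted out directly: given any intercepted cipher text, Mallory can manufacture a key that "decrypts" it to any message he likes. Both messages and both keys below are invented for illustration (and the "real" key is a stand-in sequence, not truly random data).

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))

real_message = b"Launch rockets at 7 p.m."
real_key = bytes(range(24))            # illustrative stand-in for a random 24-byte key
ciphertext = xor_bytes(real_message, real_key)

# Mallory proposes a different, equally coherent plain text of the same length...
decoy = b"Stand down until further"
# ...and there is always a key that "proves" it: K = C XOR M.
decoy_key = xor_bytes(ciphertext, decoy)

assert xor_bytes(ciphertext, real_key) == real_message
assert xor_bytes(ciphertext, decoy_key) == decoy
```

Since every same-length plain text corresponds to some key, the cipher text carries no information at all about which message was actually sent.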