IBM®
Skip to main content
    Country/region [select]      Terms of use
 
 
      
     Home      Products      Services & solutions      Support & downloads      My account     

developerWorks > Security >
developerWorks
Protecting passwords: Part 1
e-mail it!
Contents:
Password privacy
Don't leave your keys
Storing your hash
Details from the crypt
Want salt with that?
Taking a crack at ciphertext
Adding users to a password database
Decoding the code
Cleaner code
Resources
About the authors
Rate this article
Subscriptions:
dW newsletters
dW Subscription
(CDs and downloads)
Storing passwords

Gary McGraw, Cigital
John Viega, Cigital

01 Aug 2000

Like many security technologies we have discussed in this series on developerWorks, the idea is simple and elegant, but getting everything exactly right is harder than it first appears. In this installment, we'll focus on the problem of storing passwords. Next time, we'll discuss authenticating users with passwords.

Almost everyone who has used a computer in the last decade knows what a password is. For better or for worse, username/password pairs are by far the most popular method for authenticating users. The underlying premise of passwords is that the user and an authenticating agent (for example, the user's machine) share a secret password. When it comes time to authenticate, the user types the password. If it matches the one stored by the authenticating agent, the connection is approved.

Password privacy
The first hurdle to overcome in any password system is the privacy problem. Most people don't want other people to be able to read (or use) their passwords. If your code stores my password in plaintext, you can read it at will. Even worse, if your security isn't absolutely perfect, other people can read the passwords too! Yuck.

So how can we solve these problems? Some people say, "trust us" we won't read your password unless you need us to." But there is no reason to believe that. Even if the people running the service are telling the truth, they might be forced to reveal your password by a court of law. But let's say that's fine with you; you trust the organization running the authenticating agent. The question then becomes one of what they are doing to protect your password from attackers.

Don't leave your keys
One solution is to encrypt your password. However, this solution is nontrivial. A key is required to encrypt and decrypt your password. This transforms the question from a password storage problem into a password key storage (and use) problem. Of course, the authenticating software needs the password key to be able to validate you when you log in. We don't want our authentication software to require human intervention if we can help it, so the key will probably need to sit where our software can access it. The problem is that if the software can access it, a successful attacker can too. Our goal thus becomes finding a way to make the cryptographic key as hard as possible to find. That's a form of "security through obscurity," which is a practice we should avoid if at all possible. It turns out that hiding keys isn't a very easy problem anyway. (We'll discuss key hiding in the near future, when we talk about license management software.)

Storing your hash
Fortunately there is a better way. In fact, that better way not only gets rid of the key storage problem, it also solves the password storage problem. In a nutshell, the authenticating entity in our solution stores and uses a cryptographic hash of the password. When a user types in the password, the password gets sent to the authentication agent, which hashes it, and compares the hash with its stored hash. If the two hashes are the same, authentication succeeds!

Now the authenticating entity no longer has to hold on to the plaintext password. But that doesn't mean the machine can't hold on to it! We've said that the authenticating agent performs the password check. We implied that the agent then "forgets" the password. A hostile administrator could hold onto it, maybe in a secret file of users' passwords. Maybe an attacker could gain access to this file. Or maybe the administrator is an attacker. There's obviously no need for the administrator to compromise the system in question, given the access we expect an administrator to have. The problem is that lots of people tend to use the same passwords on different machines. Any smart attacker will try your old passwords on new accounts before trying anything else. Next, an attacker will try variations on your old passwords. There are some complex solutions to this problem (we can use public key cryptography, for example -- see Tried and true encryption on developerWorks for more information on public key cryptography), but for today, we simply fall back into trusting the administrator and the authentication system.

Details from the crypt
One common implementation of a hash-based password storage mechanism is the Posix crypt() call. The crypt() call doesn't technically perform a one-way hash. It encrypts a string of zeros using a modified version of DES that essentially uses your password as an encryption key. In practice, this works out to be functionally the same as a one-way hash. When encrypting all zeros, only the person who knows the password should be able to produce the key that will produce the same ciphertext. The nice feature that the crypt() solution has over a traditional cryptographic one-way hash is the guarantee that no two passwords will encrypt to the same string. With cryptographic hashes, there is always some very small possibility that an attacker will hit upon a collision, where a second password (once hashed) gives the exact same ciphertext as the first (once hashed). Generally, the odds of a collision are so small that it is not a significant risk.

Unlike cryptographic hash functions such as MD5 and SHA-1, which can operate on arbitrary length strings, the crypt() call is pretty limited. It can only handle passwords of eight characters or fewer. If you have a 12-character password, only the first eight characters really matter. The crypt() call will ignore everything after eight. There are 96 different characters that can go into a password. That means there are under 253 unique passwords. That sounds like a pretty big number (the actual number is about 7,213 trillion). It is, but not in cryptographic terms. Let's assume for a second that every password maps uniquely to a single piece of ciphertext. With a few hundred terabytes of storage, we could store every possible password and ciphertext pair, and look them up in a hurry. We'd need to precompute all the data, but we'd only have to do it once. With pretty reasonable resources, that might only take a year. A large enough organization could probably get it done in a few weeks at worst.

Want salt with that?
When crypt() was designed, this kind of an attack (called a dictionary attack) was foreseen. However, people didn't expect that it would ever be possible to make a complete, usable dictionary, even though it is possible these days. What they expected was that particular passwords would be very common -- for example, dictionary words, proper nouns, and the like. To prevent a common database of passwords from being created, a step was added to the crypt() process. The string of zeros isn't directly encrypted using your password as a key. What happens is that a two byte salt is chosen. A salt is basically some random but completely public data that anybody can see. The salt is concatenated to the password to create a key. That way, when the same password gets put through crypt() process multiple times, it always encrypts to a different string (assuming you always get a different salt).

Only 64 unique characters can be used in the salt, and the salt can only be two bytes. Therefore, there are only 4,096 different salts. That makes a comprehensive dictionary attack about 4,000 times harder (and about 4,000 times more space intensive). Under the salt approach, the space of possible keys (the key space) goes up from about 253 to 266. When crypt() was originally written, that was a huge key space. These days, it isn't so big, especially considering that the salt should be considered compromised. The salt is generally saved with the ciphertext as it is commonly assumed that an attacker is somehow able to get that text anyway. Unless the password database is made unreadable to all except those with special privileges, getting to the text is usually possible.

Taking a crack at ciphertext
The implications of getting to the ciphertext are important. Once someone has the ciphertext, matching it with a brute-force attack -- trying all 253 passwords until the password that encrypts to the given text with the right given salt -- is only a matter of time. Many people believe that the government has the computational resources to perform such a crack in a matter of minutes, or perhaps even seconds. A brute-force attack is definitely within the reach of an organization with reasonable resources. Storing a full dictionary for every possible password with every possible salt would take up a few million terabytes of disc space, which currently can be done, but only at an extraordinary cost. In a few years, it's likely to be a more practical approach. Without a precomputed dictionary, it's still not too hard to get machines that can try on the order of 218 different passwords per second. A sizable effort distributed across 8,000 or so machines of this quality would be able to crack absolutely any password in no more than 48 days. Of course, most passwords fall into a small set of a few million, making them much, much easier to crack. (We'll spend some more time on the password picking problem next time.) Note that the numbers above pertain to precomputing every password and its crypted value given a single salt. So it would require a massive amount of computing power to build a precomputed dictionary with all salts, and take somewhere in the scope of a decade.

We discussed desirable sizes of key spaces when we discussed symmetric key cryptography. Our preference lies with a key space of 128 bits or better. This level of security would be nice. The problem is that the more bits of security you want, the longer passwords have to be. And people want shorter passwords, not longer ones. You would need to have a 20-character password, each character randomly chosen, to get 128 bits of security. That is way too many characters for most people. Plus, those characters each need to be completely random. Because people tend to pick poor passwords, an attacker only has to search a small fraction of the space before finding a password.

Adding users to a password database
Let's consider the problem of adding a user to a file-stored password database in C (we'll cover authenticating the user in a bit). We'll assume that the user in question is sitting in front of the computer, and that we can read the user's input from standard input. First, we'll look at na? implementations with several problems. We'll discuss those problems, and refine our program to fix them.

Here's a sample program that adds a new user to a UNIX-style password database. You should change the value of PW_FILE as appropriate, and run "touch" on that file before using this program. And remember that this example code has some problems, so don't use it in any production systems.

Decoding the code
Now that we have something to talk about, we can turn to the question of what's wrong with the code in the sample above. The answer? Plenty. The most obvious problem is that the password gets echoed to the screen when we run the program. Oops. We can fix that by adding an "echo" parameter to the ReadLine function, and then use the ioctl system call to cause the terminal attached to stdin to refrain from echoing if the parameter is set to zero. We should also check to make sure there is a terminal connected to stdin (there won't be if input to the program is piped in).

A second problem is our misuse of the crypt() library. It's fairly common to see crypt(pw, pw) in code. Unfortunately, that's very bad practice. The problem is that the salt is the first two characters of the password. The salt must be stored in the clear. Therefore, anyone who can see the password database only has to guess six characters instead of eight. Crack programs will find a break much more quickly. Plus, we've given up all the benefits of having a salt. All users who have the password "blah" will have the same encrypted password, "blk1x.w.IslDw." A stored dictionary attack is a very feasible attack on most passwords in this case. Storing several million of the most likely passwords only takes up a few dozen GB of disk space.

We really need to select a random salt. Because the salt is public, it doesn't actually matter much if we use "secure" random number generation to pick a salt; pseudo-random number generators seeded on the clock are OK for this purpose. The risk is that an attacker can have some influence on the random number. The attacker can increase the likelihood that the selected salt is one they have already done extensive precomputation on. This attack is far more theoretical than practical, as most crack programs are highly effective without ever needing to precompute anything. Still, we might as well play things safe, and get our numbers from a reasonable random number source. For our example, we'll use the Linux-based random number library we developed in a previous article, but compromise by using "strong" random numbers that may not be 100% secure.

The third problem with the sample code above is that we should be more paranoid about leaving around plaintext passwords for any longer than necessary, even in memory we control. Sometimes attackers will have access to a program's memory as it runs, or they may be able to force a core dump that can later be examined to reclaim the password. We should try to minimize these windows of vulnerability by minimizing the amount of time in which the password is stored in an unencrypted format. As soon as we read in the password, we should encrypt it, then zero out the memory containing the password. Even when we compare to make sure that two passwords are equal to confirm that the user typed in the right password, we should compare them based on the resulting ciphertexts and not on the plaintexts.

A fourth problem is accessibility of the database. The user running this program must be able to read and write the database. That's not really a great idea in most cases. Heck, we'd even like to prevent anyone other than the program from reading the database if possible. If logins are coming in over the network, instead of on the local machine, it's less likely that people will be able to read and modify the database. But if there's an exploitable stack overflow, or some other problem that the attacker can leverage to run code, then mucking with the database is certainly not impossible. In the UNIX world, the answer to this problem is to use setuid programming, which comes with its own set of risks, and is a complicated topic. We'll get to that topic right after we finish up with authentication. For now, we'll leave our code broken in this respect and we'll go back and fix it in a future article.

Cleaner code
Based on our newfound insight, here's a mostly fixed version of the example program (note that it requires the secure random number library we built in a previous column):

Of course, we still don't like crypt() because it limits passwords to eight characters, and limits the number of salts we can have. We could provide a replacement version of crypt that uses MD5 as follows:


md5_crypt.h:

#include 
#define DIGEST_LEN_BYTES 16

char *md5_crypt(char *pw, char *salt);


md5_crypt.c:

#include "md5_crypt.h"
char *md5_crypt(char *pw, char *salt) {
  MD5_CTX context;
  char digest[DIGEST_LEN_BYTES];
  char *result = malloc(strlen(salt) + DIGEST_LEN_BYTES + 1);

  if(!strlen(salt)) abort();
  MD5Init(&context);
  MD5Update(&context, salt, strlen(salt));
  MD5Update(&context, pw, strlen(pw));
  MD5Final(digest, &context);
  strcpy(result, digest);
  strcat(result, salt);
}

This function has slightly different behavior from standard crypt, because the salt is variable length. In particular, it is not valid to say md5_crypt(typedpw, storedpw). With traditional crypt this kind of code is OK, because only the first two bytes of the second parameter are considered. Unfortunately, because the salt isn't fixed size, md5_crypt could never distinguish a salt with a postpended MD5 hash from a salt alone -- even if the salt were placed before the hash. Unlike traditional crypt, we place the salt after the hash, so that we can easily call md5_crypt when checking a password, as follows:

md5_crypt(typedpw, storedpw[DIGEST_LEN_BYTES]);

In our next installment, we will discuss how to create a simple authentication system based on passwords.

Resources

About the authors
Gary McGraw is the vice president of corporate technology at Reliable Software Technologies in Dulles, VA. Working with Consulting Services and Research, he helps set technology research and development direction. McGraw began his career at Reliable Software Technologies as a Research Scientist, and he continues to pursue research in Software Engineering and computer security. He holds a dual Ph.D. in Cognitive Science and Computer Science from Indiana University and a B.A. in Philosophy from the University of Virginia. He has written more than 40 peer-reviewed articles for technical publications, consults with major e-commerce vendors including Visa and the Federal Reserve, and has served as principal investigator on grants from Air Force Research Labs, DARPA, National Science Foundation, and NIST's Advanced Technology Program.
McGraw is a noted authority on mobile code security and co-authored both "Java Security: Hostile Applets, Holes, & Antidotes" (Wiley, 1996) and "Securing Java: Getting down to business with mobile code" (Wiley, 1999) with Professor Ed Felten of Princeton. Along with RST co-founder and Chief Scientist Dr. Jeffrey Voas, McGraw wrote "Software Fault Injection: Inoculating Programs Against Errors" (Wiley, 1998). McGraw regularly contributes to popular trade publications and is often quoted in national press articles.


John Viega is a Senior Research Associate, Software Security Group co-founder, and Senior Consultant at Reliable Software Technologies. He is the Principal Investigator on a DARPA-sponsored grant for developing security extensions for standard programming languages. John has authored over 30 technical publications in the areas of software security and testing. He is responsible for finding several well-publicized security vulnerabilities in major network and e-commerce products, including a recent break in Netscape's security. He is also a prominent member of the open-source software community, having written Mailman, the GNU Mailing List Manager, and, more recently, ITS4, a tool for finding security vulnerabilities in C and C++ code. Viega holds a M.S. in Computer Science from the University of Virginia.



e-mail it!
Rate this article

This content was helpful to me:

Strongly disagree (1)Disagree (2)Neutral (3)Agree (4)Strongly agree (5)

Comments?



developerWorks > Security >
developerWorks
  About IBM  |  Privacy  |  Terms of use  |  Contact