Protecting sensitive data in memory
Security-conscious programmers often need to protect sensitive data in memory, such as passwords and cryptographic keys. In order to do this effectively, a programmer should keep sensitive data in memory for as short a time as possible, and should try to ensure that the data never gets written to disk.
What if an attacker breaks into the machine through some means and captures, say, a password? In that case, the attacker gains some level of access to an account in your application. Worse, people tend to reuse passwords across different kinds of accounts to keep things simple, so one captured password may open accounts elsewhere as well.
Some might say that those who use the same password for multiple accounts deserve what they get. This theory offloads too much of the security burden onto users, who are probably more concerned about other things; it is the responsibility of the application to be reasonably diligent with regard to security. Additionally, such a lax approach isn't fair to those people who don't reuse passwords, because it is still fairly easy for an attacker to compromise the account at hand.
Therefore, you should use care with any data that a user might consider sensitive. Your primary high-level goal should be to keep sensitive data in memory for as short a time as possible. And when it is necessary to store such data, you should do your best to prevent that data from ever being recovered by unauthorized parties.
These goals are simple, but they are often tough to realize in practice. In this article, we'll explore the best methods for protecting data in real applications.
What to protect against
The most important thing to avoid is putting sensitive data in a file on the file system. If you absolutely must store sensitive data here for any extended length of time (that is, if you cannot use a cryptographic checksum), you should use encryption to protect the data. We'll discuss this in more detail below.
Additionally, you should prevent your program from leaving
memory dumps around when the program crashes. Memory dumps are stored
in regular old files, and it's very easy to get ASCII strings that
were in memory at the time of a crash out of such files. You can
forbid core dumps by using the
Generally, the best way to use
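On POSIX systems, one common way to forbid core dumps is to set the process's core-file size limit to zero. The sketch below does this from Python through the standard resource module, which wraps the setrlimit() system call. The function name disable_core_dumps is illustrative, and the module is available only on Unix-like platforms.

```python
import resource


def disable_core_dumps():
    """Set the soft and hard core-file size limits to zero for this process.

    Returns the limits the OS reports afterwards so the caller can verify them.
    """
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Calling this once during startup, before any secret is read, keeps the limit in force for the rest of the process's life; child processes inherit it as well.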
Another way your data can make it to disk is by being swapped out. The operating system can decide to take parts of your running program in memory and save them to disk. In C, you may be able to "lock" your data to keep it from swapping out; your program will generally need administrative privileges to do this successfully, but it never hurts to try. Here's a simple way to lock memory when possible:
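The same system call can be reached from Python through ctypes. What follows is a minimal sketch, assuming a Unix-like system whose C library exports mlock(); the helper name try_lock is hypothetical. A refused request is reported rather than treated as fatal, in keeping with the "never hurts to try" approach.

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like C library that exports mlock().
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.mlock.restype = ctypes.c_int


def try_lock(buf):
    """Ask the OS to keep the pages holding `buf` (a bytearray) out of swap.

    Returns True on success and False if the request was refused, for
    example because the process lacks the privilege or would exceed its
    locked-memory limit. Do not resize `buf` afterwards: resizing may
    move the data to a different, unlocked address.
    """
    # A temporary ctypes view exposes the buffer's address to the C call.
    view = (ctypes.c_char * len(buf)).from_buffer(buf)
    return _libc.mlock(ctypes.addressof(view), len(buf)) == 0
```

A caller can ignore a False result and carry on, or log it so that an administrator knows the buffer may be swapped out.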
There are some potentially negative consequences here. First, if your process locks two buffers that happen to live on the same page, then unlocking either one unlocks the entire page, and with it both buffers. Second, when locking lots of data, it is easy to lock more pages than necessary (the operating system doesn't move data around once it has been allocated), which can slow the machine down significantly.
Therefore, you should allocate all memory that might need to contain sensitive data at the same time, preferably in one big chunk. Assuming all the data fits onto a single page, you should lock the entire chunk when you need secure memory. As soon as you have no need for secure memory at a particular moment in time, unlock the entire chunk (there's no need to risk hampering performance when there is no data to secure).
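The lifecycle just described (one chunk, locked only while needed) can be sketched in Python as a context manager. This is an illustration under assumptions, not a hardened implementation: it assumes a Unix-like C library that exports mlock() and munlock(), uses an anonymous mmap so the chunk starts on its own page, and the name secure_chunk is hypothetical.

```python
import contextlib
import ctypes
import ctypes.util
import mmap

# Assumption: a Unix-like C library that exports mlock() and munlock().
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
for _name in ("mlock", "munlock"):
    getattr(_libc, _name).argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    getattr(_libc, _name).restype = ctypes.c_int


@contextlib.contextmanager
def secure_chunk(size=mmap.PAGESIZE):
    """Yield (chunk, locked): one page-aligned buffer and whether locking worked."""
    chunk = mmap.mmap(-1, size)                  # anonymous, page-aligned mapping
    view = (ctypes.c_char * size).from_buffer(chunk)
    addr = ctypes.addressof(view)
    locked = _libc.mlock(addr, size) == 0        # may be refused without privileges
    try:
        yield chunk, locked
    finally:
        ctypes.memset(addr, 0, size)             # erase before giving the memory up
        if locked:
            _libc.munlock(addr, size)            # unlock the whole chunk at once
        del view                                 # release the buffer so close() succeeds
        chunk.close()
```

Inside a with block, the caller writes secrets into the chunk by slice assignment; on exit the chunk is zeroed, unlocked, and released in that order.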
Unlocking a chunk of memory looks exactly the same as locking it,
except that you call
If you require lots of sensitive data, it is possible to lock all
memory in the address space using the
In most cases, these calls are not available: often, programs are unable to run with administrative privileges; other times, the language being used does not support page locking at all. In such cases, your best bet is to use as small a chunk of memory as possible, and to use it and erase it as quickly as possible, thus minimizing your window of vulnerability. If you are constantly accessing the buffer from the time you place the data in it until the time you erase it, then you minimize your risk; paging rules will likely (but not definitely) keep the page in question from swapping.
Another way to get a block of memory that will not swap is to use a
RAM disk. That is, the operating system will provide you with a "disk
drive" that is really part of the system memory. It's much easier to
You may also have problems actually erasing data from memory, which we discuss in the next section.
Erasing data from memory
Of course, at some point you may have no choice but to handle a password in its raw format. For example, the user doesn't directly enter a checksum; this needs to be computed from input text. In such a case, you should erase the password immediately after computing the checksum.
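As a rough illustration of "compute, then erase immediately", the sketch below keeps the raw password in a mutable bytearray and zeroes it as soon as the checksum exists. SHA-256 stands in for whatever cryptographic checksum the application uses, and the function name is hypothetical; the library may still make internal copies that this code cannot reach.

```python
import hashlib


def checksum_and_erase(password):
    """Return the hex SHA-256 digest of `password` (a bytearray), then zero it.

    The buffer is overwritten in place even if hashing raises an exception.
    """
    try:
        return hashlib.sha256(password).hexdigest()
    finally:
        for i in range(len(password)):
            password[i] = 0
```

After the call, the caller's buffer holds only zero bytes, so the only long-lived value derived from the password is the checksum.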
To erase data in memory, write over the data itself. The following is not sufficient in a C program:
This is insufficient because we haven't erased the actual memory that stores the password; we've only erased a pointer to the password. If we know the password is null-terminated, we can erase it as such:
In other languages, we might have a more difficult time erasing sensitive data. High-level languages often have data types that are immutable. The program can only write to an immutable object once, at creation time. For example, consider the following Python code:
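A minimal sketch of such code follows, with a hard-coded stand-in for text read from the user and illustrative variable names. The program uses the password, then assigns an empty string to the variable in an attempt to discard it.

```python
import hashlib

password = "".join(["s3", "cret"])   # stand-in for text read from the user
still_held = password                # any other reference: a caller, a buffer, a log record
digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
password = ""                        # rebinds the name; the original string is not overwritten
```

Here still_held continues to refer to the original characters, and even without it, nothing in this code writes over the memory that held them.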
Even after this code has run, the unencrypted password may still exist
in memory. This is because assigning
You might think to fix the problem by directly overwriting each character of the string. You could try the following:
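A sketch of that attempt, using a hypothetical literal in place of real input. The try/except is there only so the snippet runs to completion and shows what the interpreter reports.

```python
password = "s3cret"   # stand-in for a password read from the user
error = None

try:
    for i in range(len(password)):
        password[i] = "\0"        # attempt to overwrite one character in place
except TypeError as exc:
    error = exc                   # str objects reject item assignment
    print("cannot overwrite:", exc)
```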
Unfortunately, this approach will not work, because Python does not allow a program to overwrite any part of a string. Nor do you have any way of knowing when the language will actually write over the memory that held it. Strings in Java, Perl, Tcl, and most other high-level languages have the exact same problem.
The only solution to this is to use mutable data structures. That is, you must only use data structures that allow you to dynamically replace elements. For example, in Python you can use lists to store an array of characters. However, every time you add or remove an element from a list, the language might copy the entire list behind your back, depending on the implementation details. To be safe, if you have to dynamically resize a data structure, you should create a new one, copy data, and then write over the old one. For example:
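One way to follow that advice is sketched below, assuming the sensitive data is held as a list of single characters. The function name grow_and_erase and the NUL filler are illustrative choices, not part of any library.

```python
def grow_and_erase(old, new_size, filler="\0"):
    """Copy `old` into a new, larger list and overwrite `old` in place.

    The new list is allocated at its final size up front, so it never
    needs to be resized (and possibly copied) by the interpreter.
    """
    if new_size < len(old):
        raise ValueError("new_size must be at least len(old)")
    new = [filler] * new_size
    for i in range(len(old)):
        new[i] = old[i]       # copy one element across
        old[i] = filler       # then write over the original slot
    return new
```

For example, growing list("abc") to five elements returns the three characters followed by two fillers, and leaves the old list holding only fillers.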
The inability to use immutable data types for sensitive data is quite inconvenient. For example, in most high-level languages, you can no longer use standard input functions to read in a string; you must read characters one at a time. This can be quite a task in and of itself, depending on the language.
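In Python, one way to approximate this is to read directly from a file descriptor into a preallocated bytearray, so that no immutable string or bytes object ever holds the input. This sketch assumes a Unix-like system (os.readv is not available elsewhere) and uses a hypothetical function name. It does not disable terminal echo, and it cannot control copies held by the operating system.

```python
import os


def read_secret_into(fd, buf):
    """Read one line from `fd` into the bytearray `buf`, one byte at a time.

    Stops at a newline, at end of input, or when `buf` is full, and
    returns the number of bytes stored. The newline is not stored.
    """
    count = 0
    one = bytearray(1)                    # mutable single-byte scratch buffer
    while count < len(buf):
        if os.readv(fd, [one]) == 0:      # end of input
            break
        if one[0] == ord("\n"):
            break
        buf[count] = one[0]
        count += 1
    one[0] = 0                            # erase the scratch byte as well
    return count
```

The caller owns buf and remains responsible for overwriting it once the secret is no longer needed.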
A similar problem exists in some languages that support garbage collection, even if they provide mutable data types. The garbage collector may copy sensitive memory while it is in use (for efficiency purposes). Implementations that reclaim memory mainly through reference counting and never relocate live objects, such as the standard C implementation of Python, will not have this problem. However, even if most Java implementations do not have this problem, some of them might.
Erasing the disk
Usually, "deleting" a file means simply removing a file system entry that points to a file. The file will still exist somewhere, at least until it gets overwritten. Unfortunately, traces of the file can remain even after it gets overwritten. Disk technology is such that even files that have been overwritten can be recovered, given the right equipment and know-how. Some people claim that if you want to securely delete a file, you should overwrite it seven times. The first time, overwrite it with all ones; the second time, with all zeros. Then, overwrite it with an alternating pattern of ones and zeros. Finally, overwrite the file four times with random data, such as that generated from /dev/urandom or a similar source.
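The seven passes described above can be sketched as follows. The function name and the remove parameter are illustrative, os.urandom stands in for reading /dev/urandom directly, and each pass is built in memory, so a real tool would write large files in smaller pieces.

```python
import os


def overwrite_file(path, remove=True):
    """Overwrite the file at `path` in place seven times, then optionally delete it."""
    size = os.path.getsize(path)
    # Passes 1-3: all ones, all zeros, alternating bits (0xAA is 10101010).
    fixed_patterns = [b"\xff", b"\x00", b"\xaa"]
    with open(path, "r+b") as f:
        for pass_number in range(7):
            if pass_number < len(fixed_patterns):
                data = fixed_patterns[pass_number] * size
            else:
                data = os.urandom(size)   # passes 4-7: random data
            f.seek(0)
            f.write(data)
            f.flush()
            os.fsync(f.fileno())          # ask the OS to push this pass to the device
    if remove:
        os.remove(path)
```

Opening the file in "r+b" mode matters: it writes over the existing contents in place, whereas "wb" would truncate the file first and might leave the old blocks untouched.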
Unfortunately, this technique probably isn't sufficient. It is widely believed that the United States government has disk recovery technology that can thwart such a scheme. If you are really concerned about this, then we recommend implementing Peter Gutmann's 35-pass scheme as a bare minimum (see Resources).
Of course, anyone who tells you that some fixed number of overwrites is enough is misleading you. No one knows how many times will be sufficient. If you want to take no chances at all, then you need to ensure that the bits of interest are never written to disk unencrypted: keep them encrypted on disk and decrypt them directly into locked memory. There is no other alternative.
Future programming languages may effectively address this security requirement. Until then, developers will be forced to make tough trade-offs when working with sensitive data.