Make your software behave: Security by obscurity
Many businesses choose to hide their code in an attempt to keep software secrets out of the hands of hackers. But keeping this information hidden isn't always as easy as it seems, and even if the information is successfully obscured, there are plenty of other ways for hackers to find security vulnerabilities.

The technical side of business places lots of emphasis on keeping secrets: Design documents are not published, code is treated as a trade secret, and sometimes algorithms themselves are kept secret. Software is often the mechanism used to keep secrets out of reach of attackers and competitors; so it is not surprising that the approach taken makes a great deal of difference. In this article, we discuss the implications of trying to keep things secret using software.

There are a lot of good reasons for keeping secrets. Every company has intellectual property to protect, often including algorithms built right into the software being sold to customers. Companies also have cryptographic keys that must remain private in order to retain their utility. Despite popular trends toward openness, including the open source movement, most software companies still embrace secrecy when it comes to their computer programs. The problem is, secrecy is often used as a crutch and may not be effective.

Probably the most popular way to keep secrets in code is to hide away the source and release only an executable version in machine code. Not releasing source code certainly can help keep hackers from stealing your secrets. However, doing so is not nearly as effective as many people believe. There are plenty of problems with this technique (often labeled "security by obscurity"), but the main problem stems from a false belief that code compiled into binary will remain secret just because the source is not available. This is wrong. Simply put, if your code runs, determined people can eventually find out exactly what it is doing.
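To make that last claim concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any real product: the function `check_license`, the string `hunter2-master-key`, and the helper `find_strings` are all hypothetical names invented for the example. Python bytecode stands in for a shipped binary; native machine code takes more effort to inspect, but the principle is the same.

```python
import marshal

# Hypothetical "proprietary" source that is never shipped to the customer.
source = '''
def check_license(user_key):
    SECRET = "hunter2-master-key"   # hypothetical embedded secret
    return user_key == SECRET
'''

# Compile the source and serialize the result, roughly what a .pyc file carries.
compiled = compile(source, "<shipped>", "exec")
blob = marshal.dumps(compiled)


def find_strings(code_obj):
    """Recursively collect every string constant stored in a code object."""
    found = []
    for const in code_obj.co_consts:
        if isinstance(const, str):
            found.append(const)
        elif hasattr(const, "co_consts"):  # a nested function's code object
            found.extend(find_strings(const))
    return found


# The "attacker" has only the compiled blob, not the source text.
recovered = find_strings(marshal.loads(blob))
print(recovered)
```

In this sketch, `recovered` contains the embedded key even though the source text was never distributed.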
Watching software behave

As an example, consider a problem discovered in Netscape Communicator's security near the end of 1999. This particular problem affected users of Netscape mail who chose to save their POP mail password using the mail client. In this case, a "feature" placed in Netscape for the convenience of users turned out to introduce a large security problem.

Obviously, saving a user's mail password means storing it somewhere permanent. The question is, where and how is the information stored? Software security attackers ask this sort of question all the time. Clearly, Netscape's programmers needed to make sure that casual users (including attackers) could not read the password directly off the disk, while at the same time providing access to the password for the program that must use it in the POP protocol. To solve this problem, Netscape's developers tried to encrypt the password before storing it, making it unreadable (in theory, anyway).

The programmers chose a "password encryption algorithm" that they believed was good enough, considering that the source code wasn't to be made available. (Of course, most of the source was eventually made available in the form of Mozilla, but the password storing code we're focusing on here did not show up in the release.) Unfortunately, the algorithm that made its way into Communicator was seriously flawed.

As it turns out, on Windows machines the "encrypted" password is stored in the registry. Not good. A relevant software security tip for Windows programmers is always to assume that people can read any entries you put into the registry! If you choose to store something there, make sure it is well-protected with strong cryptography.
While experimenting with a simple program designed to find passwords hidden with XOR (that is, passwords XOR'ed against a simple pattern of ones and zeros), software security gurus at Cigital noticed that similar plain text versions of Netscape mail passwords stored in the registry tended to look similar to each other in encrypted form. That sort of thing is usually a sign of bad cryptography. With good cryptography, a one-character change to the password affects about half the bits in the encrypted version. In this case, no such change was observed. From this point, systematic changes to a few hundred test passwords revealed a pattern in the way the encrypted version changed. By looking carefully for patterns, enough information was ultimately gathered to construct (or, rather, reconstruct) an algorithm that behaved in exactly the same way as Netscape Communicator's encryption routine. All this was accomplished without ever having to see any actual code.

Given a copy of the encryption algorithm, it was easy to come up with a little program that decrypted actual Netscape POP passwords. The encryption algorithm used in the product was a poor one, developed in-house and apparently without the help of a cryptographer. By comparison with any algorithm an actual cryptographer would endorse, the Communicator algorithm was laughable. In this case, perhaps the folks at Netscape felt that their algorithm was good enough since it was never published and was thus "secret." That is, they were relying on security by obscurity! When it comes to cryptography, though, this is a bad idea. Good cryptographic algorithms remain good even if other people know exactly what they do, and bad algorithms will be broken, even if the algorithm is never directly published. It all comes down to math.

Consider the Enigma machine built by the Axis powers in World War II.
Cryptographers, including Alan Turing, were able to figure out everything about the German crypto algorithm simply by observing encoded messages. The Allies never saw an actual Enigma machine until after the code was completely broken (and by then, they didn't need one). Even the algorithms once considered good by World War II-era standards are considered lousy by today's. In fact, some ciphers considered unbreakable by the people who broke the German Enigma can today be easily broken using modern cryptanalysis techniques. In the end, it takes a real expert in cryptography to have any hope of designing a reasonably good algorithm. The people responsible for the Netscape algorithm were not good cryptographers. The question is why Netscape chose not to use a "real" cryptographic algorithm like DES or Blowfish.

It turned out that a very similar Netscape POP password encryption algorithm had been broken more than a year before Cigital rediscovered the problem. Flaws in the original algorithm were clearly pointed out to Netscape, and in the version of the browser Cigital tested, Netscape had "fixed" the problem. The fix should have involved using a real algorithm. Instead, Netscape chose to make superficial changes to their fundamentally flawed algorithm, making it only slightly more complex. Once again, however, Netscape should not have relied on security by obscurity; instead, they should have turned to professional cryptographers.

The moral of our story is simple: Some malicious hackers will get curious and begin to explore anytime they see any anomaly while using your software. Any hint at the presence of a vulnerability is enough. For example, a simple program crash can often be a sign of an exploitable vulnerability. Of course, hackers do not have to be passive about their exploration. People interested in breaking your software will prod it with inputs you might not have been expecting, hoping to see behavior that indicates a security problem.
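The similarity test that exposed the Netscape scheme is a good example of this kind of prodding. The sketch below is hedged: Netscape's actual routine is not reproduced in this article, so `toy_xor_encrypt` and its three-byte key are hypothetical stand-ins for a weak homegrown scheme, and SHA-256 (a hash function, not a cipher) is used only as a convenient example of a well-designed primitive whose output changes in about half its bits when the input changes slightly.

```python
import hashlib


def toy_xor_encrypt(password: bytes, key: bytes = b"\x5a\xa5\x3c") -> bytes:
    """Hypothetical weak scheme: XOR each byte against a short repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(password))


def bit_diff_fraction(a: bytes, b: bytes) -> float:
    """Fraction of bit positions in which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (8 * len(a))


# Two test passwords that differ in a single character.
p1, p2 = b"secret01", b"secret02"

# Weak scheme: the key cancels out, so the outputs differ only where the inputs do.
weak = bit_diff_fraction(toy_xor_encrypt(p1), toy_xor_encrypt(p2))

# Well-designed primitive (stand-in): a small input change flips roughly half the bits.
strong = bit_diff_fraction(hashlib.sha256(p1).digest(),
                           hashlib.sha256(p2).digest())

print(f"toy XOR scheme: {weak:.3f} of output bits changed")
print(f"SHA-256:        {strong:.3f} of output bits changed")
```

With these two test passwords, the toy scheme's outputs differ in only 2 of 64 bits, which is exactly the kind of pattern an attacker looks for when probing a program from the outside.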
One common technique is to feed very long inputs to a program wherever input can be accepted. This sort of test will often cause a program crash. If the crash affects the machine in a particularly well-defined way, then the hacker may have found an exploitable buffer overflow condition (see Resources). Usually, an attacker does not have to look directly at your software to know that a problem exists. The output of Dr. Watson is all that is necessary (clicking on the "More Details" button in the Windows 98 program crash dialog is also sufficient). From there, an attacker can intelligently construct inputs to your program to glean more information, ultimately producing an exploit -- all this without ever having looked at your code!

Serious security problems are found this way all the time. For example, a recent buffer overflow in Microsoft Outlook was uncovered in exactly this manner, and eEye found a handful of bugs in Microsoft's IIS Web server using the same technique. The take-home message here is always to be security conscious when writing code, even if you are not going to show the code to anyone. Attackers can often infer what your code does by simply examining its behavior.

Reverse engineering

Some hackers can read machine code, but even that is not a necessary skill. It is very easy to acquire and use reverse engineering tools that can turn standard machine code into something much easier to digest, such as assembly, or, in many cases, C source code.

Disassemblers are a very mature technology. Most debugging environments, in fact, come with excellent disassemblers. A disassembler takes some machine code and translates it into the equivalent assembly. It is true that assembly loses much of the high-level information that makes C code easy to read. For example, looping constructs are usually converted to counters and jump statements in assembly -- structures that are nowhere near as easy to understand as the original code.
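As a small, hedged illustration of that last point: Python's standard `dis` module disassembles Python bytecode rather than native machine code, but it shows the same effect, with a readable loop turning into counters, comparisons, and jump instructions. The function `count_up` is a hypothetical example, and exact opcode names vary between Python versions.

```python
import dis


def count_up(n):
    # A plain high-level loop: easy to read in source form.
    total = 0
    i = 0
    while i < n:
        total += i
        i += 1
    return total


# Collect the opcode names a disassembler reports for this function.
opnames = [ins.opname for ins in dis.get_instructions(count_up)]

# The loop structure survives only as jump instructions.
jumps = [name for name in opnames if "JUMP" in name]

dis.dis(count_up)  # print the full disassembly listing
print("jump instructions:", jumps)
```

Everything the function does is still present in the listing; it is simply less convenient to read than the source.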
Still, a few good hackers can read and comprehend assembly as easily as most C programmers can read and comprehend C code. For most people, understanding assembly is a much slower process than understanding C, but all it takes is time. All the information on what your program does is there for the potential attacker to see. That means that enough effort can reveal any secret you try to hide in your code.

Decompilers make an attacker's life even easier than disassemblers do, because they are designed to turn machine code directly into code in some high-level language such as C or the Java language. Decompilers are not as mature a technology as disassemblers, though. They often work, but sometimes they are unable to convert constructs into high-level code, especially if the machine code was hand-written in assembly, and not some high-level language.

Machine code that is targeted to high-level machines is more likely to be understood easily after it comes out the end of a reverse engineering tool. For example, programs that run on the Java Virtual Machine can often be brought back to something very similar to the original source code with a reverse engineering tool, because little information gets thrown away in the process of compiling a Java program from source. On the other hand, C programs produced by decompilers do not often look the same as the originals, because much information tends to get thrown away. If the programmer has compiled with debugging options enabled, decompilation performance can be enhanced.

In general, you should always code defensively, assuming that an attacker will finagle access to your source code. Always keep in mind that looking at binary code is usually just as good as having the source in hand. We will spend more time looking at reverse engineering tools in a future column.

Code obfuscation

In these cases, there is no absolute way to protect your secrets from people who have access to the binaries.
Keep this in mind when you are writing client software: Attackers can read your client, and modify it however they want! If you are able to use a client-server architecture, try to unload secrets onto the server, where they have a better chance of being kept secure. Sometimes, though, this isn't feasible. Netscape is not willing to save POP passwords on some central server (and people would probably get very upset if they decided to do so, since that sort of move could easily be interpreted as an invasion of privacy).

In these cases, the best a programmer can do is try to raise the bar as high as possible. The general idea, called code obfuscation, is to transform the code in such a way that it becomes more difficult for attackers to read and understand. Sometimes obfuscation will break reverse-engineering tools (usually decompilers, but rarely disassemblers). One common form of obfuscation is to rename all the variables in your code with arbitrary names. This obfuscation is not very effective, though; it turns out not to raise the anti-attacker bar very high.

Code obfuscation is a relatively uncharted area. Not much work has been done in identifying program transformations. Most of the transformations that have been identified are not very good -- that is, they don't provide much of a hurdle, or can be quite easily undone. A few of the existing obfuscating transformations can raise the bar significantly, but they usually need to be applied in bulk. Unfortunately, currently available tools for automatic code obfuscation do not perform these kinds of transformations. Instead, they perform only simple obfuscations that are easily overcome by a skilled attacker. We will discuss the state of the art in code obfuscation in future columns.

COTS security

Conclusion

In general, your decision as to whether or not to provide source code should be based on business factors, not security considerations.
Do not use source availability, or the lack thereof, as an excuse to convince yourself that you've been duly diligent when it comes to security.