The recent furor in Kansas and Pennsylvania over the teaching of intelligent design as an alternative to Darwin's theory of natural selection (or as some less accurately describe it, the theory of evolution) has generated much heat, but little light. The debate, if one can call it that, appears to have degenerated into both sides preaching to their respective choirs. It is not the purpose of this article to get down in the mud and revisit that controversy. Rather, I have been touched by some rather embarrassing side-products of the kerfuffle.
It has become clear to me that the distinction between a "scientific theory" and a "scientific law" is quite muddled in the minds of many people who wonder: "What is required for a scientific theory to be considered true?"
I will attempt to explain why this is the wrong question, since one can never do an experiment that "proves" a theory. I will set the stage by exploring in depth the nature of scientific proof. Then I'll show how these same principles can be applied to software development in some interesting and unexpected ways.
When you search for a theory to explain something, the best case is that you find one that is perfectly general. That is, your theory works all the time, in all circumstances, with no exceptions. If you had such a theory, you could begin describing it with the phrase, "It is always true that..."
Now, despite millennia of scientific research, it turns out that we have very few theories that fit this bill. Almost all theories have a "region of applicability." For example, we know today that the classical theory of mechanics, whose rules were established by Isaac Newton in the seventeenth century, works just fine until we get down to atomic dimensions. At that point, the classical theory no longer makes accurate predictions, and we need quantum mechanics, another theory, to explain what is going on. In a similar vein, at speeds that are low compared to the speed of light, classical theory works just fine; at speeds approaching that of light, one needs Einstein's theory of special relativity to make predictions. Note that a theory, therefore, is only "correct" or "very approximately correct" in a certain region of experience. Outside that region, another theory is required, and that other theory may or may not encompass the region of the previously held theory. So sometimes we have theories that are "more general" than others, with the "others" being true under some limiting condition.
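To make the idea of a region of applicability concrete, here is a minimal Python sketch that compares the classical kinetic energy with the special-relativistic expression at several speeds. The function names, the 1 kg mass, and the sample speeds are illustrative assumptions of mine; the only physics used is the standard pair of formulas for kinetic energy.

```python
import math

C = 299_792_458.0  # speed of light in metres per second


def classical_ke(mass, v):
    """Newtonian kinetic energy: one half m v squared."""
    return 0.5 * mass * v ** 2


def relativistic_ke(mass, v):
    """Special-relativistic kinetic energy: (gamma - 1) m c squared."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return (gamma - 1.0) * mass * C ** 2


def relative_error(v, mass=1.0):
    """Fraction by which the classical value misses the relativistic one."""
    exact = relativistic_ke(mass, v)
    return abs(exact - classical_ke(mass, v)) / exact


# Illustrative speeds, expressed as fractions of the speed of light.
for fraction in (0.01, 0.1, 0.5, 0.9):
    error = relative_error(fraction * C)
    print(f"v = {fraction:4.2f} c   classical error = {error:.2%}")
```

At one percent of the speed of light the two formulas agree to better than a tenth of a percent, while at ninety percent the classical value is off by more than half. The classical theory is not "wrong"; it has simply been taken outside its region of applicability.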
Of course, if one has a theory that one believes to be "general" -- that is, it applies in all cases -- one must test it. Science is essentially an experimental exercise, and testing a general theory consists of progressively pushing the boundaries, successively removing limiting conditions in an attempt to show that the theory is indeed applicable in all cases.
A scientific experiment defines a region of applicability under which it is performed. The experiment, limited to that region of applicability, concludes with a result that either agrees with the theory or disagrees with it. That situation can be summarized by the 2x2 matrix shown in Figure 1.
Figure 1: Theory and experiment
Of course, in the real world, we cannot know whether we are on the left-hand side of the matrix or the right-hand side -- we don't know whether the theory is correct or not. All we can know, by doing an experiment, is whether the experiment agrees or disagrees with the proposed theory; that is, all we can "see" is the top-half or bottom-half of the chart. Let's analyze these four possible outcomes.
In the case that the experiment agrees (shown in the top half of Figure 1), we have two outcomes which may or may not be related:
A. The theory is incorrect, or limited, but no experiment, including the one we just performed, has yet disproved it (upper left).
B. The experiment shows the theory to be correct, and the theory grows in its acceptance and possibly its applicability (upper right).
I will have more to say about these two outcomes in a moment.
In the case that the experiment disagrees (shown in the bottom half of Figure 1), we have two conflicting yet important cases to consider. The results are inconsistent with the theory. This means that either:
C. The theory is correct, and the experimenter made a mistake (lower right).
D. The experiment is correct, and the theory must be rejected, modified, or restricted (lower left).
Let's take a look at these latter two outcomes first.
Experiments that disagree
Obviously, outcome C can be dangerous, because it takes only one "contrary data point" to falsify a theory. In all cases, when the experimenter makes a mistake, we get a false result. This is why it is important to check and double-check the experimental work, repeating experiments at different labs with different scientists to see if we can "replicate" the findings. But let's turn our attention to outcome D, which has two sub-cases:
D1: The theory is generally false.
D2: This experiment shows the theory to be false outside some region of applicability.
That is, the experiment has discovered a new region in which the theory no longer works. It may still be valid in the more restricted region in which it was previously tested. That is why there are three alternatives: reject, modify, or restrict the theory to some new range.
Sometimes, the scientific community is reluctant to reject accepted doctrine based on one contrary experiment. Often theories are modified or "patched" to accommodate new data. Only after several negative experiments and multiple patches does the community start to search for a better theory that explains all the results. This was Thomas Kuhn's revelation in his book The Structure of Scientific Revolutions, wherein he introduced the concept of a "paradigm shift."
Experiments that agree
Now let's return to the top half of the chart. We have agreement and a willingness to believe the experiment; that is, we have no reason to suspect that the experimenter made an error. Does that mean that the theory is "true"?
Of course, if the theory is actually based on verifiable truth (the upper-right box), we have a consistent result. I say "verifiable" truth, because some theories can't be proved factual until some technological breakthrough allows the necessary objectivity. For example, the idea that the earth is round was a theory of the ancient Greeks,1 but modern space travel has rendered this a visible fact. So, until we no longer need to "experiment" in order to add more evidence for the theory, what we have is another "positive data point" and therefore more reason to believe that the theory is pointing toward the truth. Perhaps this experiment extends the region of applicability, in that no previous experiment confirmed the theory in this region. So while the result improves our confidence in the theory, it can never "prove" that the theory is true for all cases. This is because no experiment ever tests the theory for all cases -- only a particular set of circumstances.
What about that last box, the one in the upper left? In this case, the underlying theory is incorrect, but the experiment does not demonstrate the contradiction. What went wrong here? Most likely, the theory has not been tested in the region in which it will fail. The test was limited to the region of applicability where the theory still works, so all we have done is to "confirm" that it works in that limited region. When we go to a region where the experimental test was not applied, all bets are off. That is why theories can continue to be "experimentally verified" for many years before they are found to be generally wanting. It is because no experiment has yet tested them in the region in which they will fail. Much of science consists of "pushing the boundaries," so that the region of applicability of a theory encompasses more and more of the available space. In general, it will take decades or centuries of the combined work of many experimenters to adequately populate the space available to a theory. And sometimes it takes a really novel experiment to discover the set of circumstances in which a long-established theory is found to be wanting.
As more and more of the available space is filled, the theory takes on increased validity and morphs into something we call a law. For example, we talk about Newton's Law of Universal Gravitation, which says that two bodies attract each other in proportion to the product of their masses and in inverse proportion to the square of the distance separating them. At everyday scales, with weak gravitational fields and speeds far below that of light, we have yet to perform an experiment that produces any significant variance with the theory.
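For readers who prefer symbols to words, the verbal statement of the law corresponds to the usual textbook formula. The symbols are my own labels and do not appear in the text above: F is the magnitude of the attractive force, m_1 and m_2 are the two masses, r is the distance between them, and G is the gravitational constant.

```latex
F = G \, \frac{m_1 m_2}{r^2}
```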
Yet even here there is discussion; it has been difficult to reconcile gravitation with quantum mechanics at the theoretical level. This really doesn't vitiate universal gravitation as a "law"; it just recognizes that at really small distances, other forces become more important.
Theories should not only "explain," a posteriori, data that has been obtained experimentally. Rather, a good theory should be able to predict, a priori, what the experimental results will be. If the results obtained agree with those predicted, we have a much more compelling argument in support of the theory. Really good theories have great predictive power based on a very small number of assumptions. Somewhat weaker theories predict loosely even after many input parameters have been carefully "adjusted." Another way to say this: be wary of theories that have too many "knobs." In some sense, this is Occam's Razor in action: we tend to favor simple theories over complicated ones. And we certainly have a bias against theories that suppose mechanisms that are difficult or impossible to measure, such as the ether, or "hidden variables" in quantum mechanics.
John Walker points out that in the observational sciences, such as astronomy and cosmology, one cannot run experiments but must test theories by making predictions with them and then seeing if they match the results of observations. This is sometimes called "retrodiction." For example, the Big Bang theory makes very precise predictions about the primordial abundances of hydrogen, helium, and deuterium. Precision measurements can potentially falsify this theory even though, in a sense, the experiment was run only once, some thirteen billion years ago.
So here's the moral of the story thus far:
- It is impossible to prove a theory to be correct in the general case. If we discover something to be absolutely true, it's no longer a theory.
- One negative experiment can destroy a theory or limit its region of applicability.
- A positive experiment merely adds more supporting data or extends the region of applicability.
- Experiments must be designed so that we can always figure out the range of parameters for which the theory is actually being tested.
Can we ever know everything?
The standard "reductionist" view of modern Western science is that we continue to delve deeper and deeper into nature's mysteries, and eventually we will refine our theories to the point that we can explain everything. This is the old "peeling back the onion" school of thought.
Certainly there has been amazing progress in this arena. Starting with first principles, we can compute the energy levels of the hydrogen atom with amazing precision. This is a testament both to quantum theory and our ability to do very delicate experiments to substantiate it. Moving on from there, we have shown that our theories, however esoteric, ultimately can stand experimental test and verification. The question is, can this process continue forever, so that in some far distant future, we understand everything?
In theory, nothing would seem to prevent us from getting there. But, as Yogi Berra once said, "In theory there is little difference between theory and practice. In practice, there is." Of course, we should take that last bit to mean, "there's a HUGE practical difference between theory and practice."
First of all, we have learned via quantum mechanics and chaos theory that predictability with infinite precision is just not possible. Quantum mechanics has changed our world view such that we now understand that we can predict, at the atomic level, how things will work probabilistically, but not deterministically -- that is, we can be right on average, but can never have certainty about any one event. Similarly, because we can never know initial conditions with infinite precision, chaos theory teaches that long-range predictions can be very difficult, because small changes in those initial conditions can lead to very large differences later in time. This is what makes long-range weather prediction impossible.
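The sensitivity to initial conditions can be seen in a few lines of Python. The sketch below uses the logistic map, a standard toy model of chaotic behavior; it is not a weather model, and the starting value, the size of the perturbation, and the function names are all assumptions chosen for illustration.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), returning every value visited."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def gaps(x0, delta, steps):
    """Step-by-step distance between two runs whose starts differ by delta."""
    a = logistic_trajectory(x0, steps)
    b = logistic_trajectory(x0 + delta, steps)
    return [abs(p - q) for p, q in zip(a, b)]


# Two starting points that agree to about ten decimal places.
g = gaps(0.2, 1e-10, 100)
for n in (0, 10, 20, 30, 40, 50):
    print(f"step {n:3d}   gap = {g[n]:.3e}")
```

For the first several steps the two runs are indistinguishable for practical purposes; a few dozen steps later they bear no resemblance to each other, even though the rule that generates them is completely deterministic.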
Here is a key observation: Every theory in physics, chemistry, geology, biology, and astronomy ultimately leads to mathematical expressions. So any limitation on our mathematical apparatus will sooner or later carry over to the sciences. We already know that in a very real sense mathematics is "incomplete." Kurt Gödel showed in 1931 that no consistent system of mathematical logic powerful enough to express basic arithmetic can also be complete; there will always be statements that it can neither prove nor refute. The statement of his theorem, taken from Wikipedia, is:
For any formal theory in which basic arithmetical facts are provable, it is possible to construct an arithmetical statement which, if the theory is consistent, is true but not provable or refutable in the theory.
This is a very "deep" and important result. It says that there are statements about math that we will never be able to prove true or false within the system of logic that contains them. The interesting and fortuitous thing is that this theorem itself can be proven!
So there is already reason to believe that some knowledge is inaccessible to us.
But Gödel is not alone. We know of at least two other areas where we believe we will never be able to get the answers to certain questions.
The first area is represented by the "Halting Problem" in computer science. Wikipedia describes the problem as follows:
Given a description of a program and its initial input, determine whether the program, when executed on this input, ever halts (completes). The alternative is that it runs forever without halting.
Alan Turing proved in 1936 that a general algorithm to solve the Halting Problem for all possible inputs cannot exist. We say that the Halting Problem is undecidable over Turing machines. So, once again, there is a class of problem for which we know conclusively that no general solution is possible.
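The flavor of Turing's argument can be sketched in Python. This is an illustration of the self-reference trick only, not a proof. Every name here is hypothetical: `halts` stands for any candidate decider someone might propose, and `make_paradox` builds a program designed to do the opposite of whatever that decider predicts. The paradox program is written as a generator so that a caller can step it under a budget instead of hanging forever; running out of budget is evidence of non-termination, not proof.

```python
def make_paradox(halts):
    """Build a program that contradicts whatever `halts` says about it."""
    def paradox():
        if halts(paradox):
            while True:   # the decider said "halts", so loop forever
                yield     # yielding lets a caller step us under a budget
        # the decider said "runs forever", so finish immediately
    return paradox


def finishes_within(program, max_steps=1000):
    """Step a generator-style program; True if it completes in the budget."""
    running = program()
    for _ in range(max_steps):
        try:
            next(running)
        except StopIteration:
            return True
    return False  # still running after max_steps


# Two naive candidate deciders (hypothetical); each is wrong about its paradox.
def always_yes(program):
    return True


def always_no(program):
    return False


for decider in (always_yes, always_no):
    program = make_paradox(decider)
    print(decider.__name__, "predicted halting:", decider(program),
          "| observed finishing:", finishes_within(program))
```

Whatever decider is plugged in, the paradox program built from it behaves in the opposite way, which is the heart of why no general decider can exist.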
The second class of problem comes from the world of combinatorics.2 The most common example is the notorious "Traveling Salesman Problem," which, at first sight, appears almost innocuous:
Given a number of cities and the costs of traveling from any city to any other city, what is the cheapest round-trip route that visits each city once and then returns to the starting city?
It turns out that this problem is hard even for a modest number of cities, and its complexity grows very quickly as the number of cities becomes large. In fact, beyond a number of cities that is surprisingly small, it becomes impossible in practical terms to solve the problem by direct enumeration even on a computer, because the number of possible routes that needs to be examined grows much too fast.3 This problem is one of many that are called "NP Complete."4 For large instances, often the best we can do in practice on this class of problem is to find approximate solutions. To the best of our current knowledge, there are no efficient exact algorithms for problems that are NP Complete; that is, none whose running time grows only polynomially with the size of the problem.
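A direct, try-every-route approach is easy to write, which is part of what makes the problem look innocuous. The following minimal sketch uses a made-up table of travel costs for four cities; the names `DIST`, `brute_force_tsp`, and `tours_to_check` are illustrative assumptions.

```python
from itertools import permutations
from math import factorial

# Hypothetical travel costs: DIST[a][b] is the cost of going from city a to b.
DIST = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]


def tour_cost(tour, dist):
    """Total cost of visiting the cities in order and returning to the start."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def brute_force_tsp(dist):
    """Try every round trip that starts at city 0 and keep the cheapest."""
    n = len(dist)
    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    best = min(candidates, key=lambda tour: tour_cost(tour, dist))
    return best, tour_cost(best, dist)


def tours_to_check(n):
    """Number of distinct round trips from a fixed starting city."""
    return factorial(n - 1)


print(brute_force_tsp(DIST))
for n in (5, 10, 15, 20, 25):
    print(f"{n} cities -> {tours_to_check(n):,} tours to examine")
```

With four cities there are only six tours to compare. The loop at the end shows how quickly that count explodes, which is why practical solvers rely on far cleverer methods than enumeration.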
The point is, we have several indications from the worlds of mathematics and computer science that there are certain problems that we cannot solve and certain propositions that we will never be able to prove true or false. But is this tragic? Personally, I can live with the idea that mankind will never know everything. We still may be able to approximate more than we can prove. There may be aspects of things that turn out to be unknowable. So be it.
Implications for the real world of software
There are two areas in the world of software development that directly play off all these notions of the "limits of knowledge" and "what constitutes a proof." The first of these is the role of testing in software development, and the second is the notion of iterative development itself.
Can a program ever be proven "correct"?
No. There may be some very trivial toy programs that under very special conditions can be proven "correct" -- that is, we can mathematically prove that they will never give an incorrect result.5 In the real world of programs, however, the answer is no. At first, one is tempted to believe that it is simply a question of combinatorics. That is, if one considers all the combinations of all the paths through any non-trivial piece of software, one quickly comes to the conclusion that exhaustively testing any piece of software is an impossible task. This is analogous to the NP Complete problem. But, in fact, it is even worse than that. It turns out that deciding, for arbitrary programs, whether they are correct is at least as hard as solving the Halting Problem. And we know from Turing's work that that is an impossible task.
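A back-of-the-envelope calculation shows how quickly the combinatorics get out of hand. The sketch below assumes a simplified model in which a program contains some number of independent two-way branches, so that each added branch doubles the number of paths, and a hypothetical test rig that can exercise a billion paths per second. Both assumptions are mine, chosen only to put rough numbers on the argument.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def path_count(independent_branches):
    """Paths through code with this many independent if/else decisions."""
    return 2 ** independent_branches


def years_to_test_all(independent_branches, tests_per_second=1e9):
    """Time to run every path once at the assumed testing rate."""
    seconds = path_count(independent_branches) / tests_per_second
    return seconds / SECONDS_PER_YEAR


for branches in (10, 32, 64, 100):
    print(f"{branches:3d} branches -> {path_count(branches):.3e} paths, "
          f"{years_to_test_all(branches):.3e} years to test them all")
```

Even under these generous assumptions, a mere 64 independent decisions already put exhaustive path testing out of reach by centuries, and real programs also have loops and input data to multiply the count further.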
So why do we test programs? Just as we can only conclude that a theory is correct in the region of applicability that our experiments cover, we can only maintain that a program is correct for the set of tests that we have subjected it to. Just as one "negative experiment" can invalidate a theory, one "failed test" can manifest a bug in a piece of software that was previously considered faultless. So we can never certify that a piece of software is "bug free." All we can do is continue to increase our confidence in it by subjecting it to greater and more intensive sets of tests.
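Here is a small illustration of a program that is "confirmed" by every test in a limited region and then falsified by a single test outside it. The deliberately incomplete leap-year function, the helper name, and the choice of test years are my own assumptions; the Gregorian rule itself (century years are leap years only if divisible by 400) is standard.

```python
def is_leap_year(year):
    """Deliberately incomplete 'theory': ignores the century rules."""
    return year % 4 == 0


def failing_cases(cases):
    """Return the (year, expected) pairs that the function gets wrong."""
    return [(year, expected) for year, expected in cases
            if is_leap_year(year) != expected]


# Every test in this region "confirms" the theory.
tested_region = [(1996, True), (2004, True), (2023, False), (2024, True)]

# Pushing the boundary outward exposes the defect.
wider_region = tested_region + [(1900, False), (2000, True)]

print("failures in tested region:", failing_cases(tested_region))
print("failures in wider region: ", failing_cases(wider_region))
```

The four passing tests said nothing false; they simply never probed the region where the function breaks. The year 1900 plays the role of the one negative experiment.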
Much of software testing concerns itself with "corner cases" and "stupid input." It is an unfortunate fact of life that a piece of software has to stand up to all sorts of things that the designer didn't foresee -- that some joker attempts to enter characters from the Cyrillic alphabet where digits were called for, as an example. Software testing is hard, and doing it right is an important part of our profession. We're getting better at it all the time, but we also need to be very aware of its limitations.
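As a sketch of what defending against "stupid input" can look like, consider a field that is supposed to hold a whole number. The function name, the allowed range, and the sample inputs below are hypothetical. One design choice is worth noting: Python's built-in `int()` accepts decimal digits from other scripts, so this sketch insists on ASCII digits explicitly instead of relying on `int()` alone.

```python
def parse_quantity(text, minimum=0, maximum=1000):
    """Parse a whole number typed by a user, rejecting anything surprising."""
    stripped = text.strip()
    # isdigit() alone would accept digits from other scripts, so we also
    # require ASCII; the empty string and signed numbers fail isdigit().
    if not (stripped.isascii() and stripped.isdigit()):
        raise ValueError(f"not a plain whole number: {text!r}")
    value = int(stripped)
    if not minimum <= value <= maximum:
        raise ValueError(f"out of range {minimum}..{maximum}: {value}")
    return value


def rejected(text):
    """True if parse_quantity refuses the input."""
    try:
        parse_quantity(text)
    except ValueError:
        return True
    return False


# Corner cases a designer might not foresee, including Cyrillic text.
for sample in [" 42 ", "пять", "", "-5", "1001", "٤٢"]:
    print(repr(sample), "rejected" if rejected(sample) else "accepted")
```

Each of these samples is itself a tiny experiment. The list can never be complete, which is exactly the limitation described above.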
We can only show that a program has bugs. We can never show that it has no remaining bugs. The most valuable test is the one that fails. All the others just improve our confidence, but do not "validate" the program in any way, shape, or form.
How does iterative development help?
Since we can't know everything up front, iterative development helps us get close to where we want to go. Waterfall planning and development, which is a vestige of classical project management lore, makes the somewhat arrogant and naïve assumption that you can anticipate everything that will happen up front and plan for it. Nothing could be further from the truth.
Just as we have seen that there are definite theoretical limits to what is knowable, we also recognize that there are surprises on every project. In the software world in particular, new technologies are often found less than wonderful once we begin to use them; software vendors whose components we planned to reuse go out of business; and magnificent architectures turn out to produce software that just runs too darned slow to be useful. So we have to make mid-course corrections. This is not the exception -- it is the rule.
There is an interesting parallel here with experimental science. Contrary to popular belief, science does not progress in an orderly manner at all times, as I noted regarding Thomas Kuhn and his notion of paradigm shifts. The scientific process is one of constantly confronting surprises; we perform experiments and we get negative results. Our experience clashes with our theory. So once we are sure the experiments are valid, we begin to tinker with the theory.
A good software development project managed in an iterative fashion is like a series of experiments. We choose our experiment with the objective of proving or disproving some assumption about how the final system will look. We do the experiments that focus on the highest risk elements first. For example, if a new technology is crucial to the success of the piece of software under development, we have one iteration focus on wringing it out as completely as possible. If the experiment "fails," then we know that our "theory" -- that this technology will help us achieve the end goal -- is false, and we have to "modify the theory" by taking another approach. If the experiment confirms that the technology is robust, we continue down that path and test some other part of the theory on the next iteration.
In so doing, we make course corrections as necessary. We cope with the "surprises" in an orderly way. We don't expect every experiment to be successful -- quite the contrary. What is important in iterative development is to smoke out the parts of our "theory" that are false as quickly as possible and to use the "surprises" to effectively mold a product that works.
The polar opposite of the iterative development approach is to create a detailed, up-front plan before the project work begins, then stick to that plan at all costs -- as if the theory never needs testing. The problem is, the single test that occurs at the end of that process almost always results in disappointment for the team members, the project manager, and the user.
We have taken a long and winding path, so it is worthwhile to review:
- All knowledge is provisional and subject to experimental test.
- Experiments can only find defects in theories and never "prove" them correct.
- Theories become "laws" through an accumulation of evidence.
- There are known limitations to what is provable and probably to what is computable or knowable.
- The analog to scientific experiment is software testing.
- Programs can never be proven to be bug-free.
- The best test is the one that fails, just as the best experiment is the one that invalidates a theory.
- Iterative development acknowledges lack of perfect knowledge up front.
- Iterations are like scientific experiments.
- We use iteration results to reduce risk by indicating needed course corrections.
Software development is not an exact science, but there are a lot of parallels between good software development and good science. In particular, it is useful to borrow from math and science, which have a few millennia on computer software in terms of maturity, so that we don't attempt to do the impossible. We have learned that there are both practical and theoretical limits to knowledge, and adding massive computing power to the mix doesn't change that.
1 It is interesting to note that the first calculation of the diameter of the earth was made following a nice experiment by Eratosthenes, a contemporary of Archimedes in the third century B.C. Of course this calculation was based on the notion of a spherical earth, not a flat one. Eratosthenes' result agrees quite nicely with the actual value. This is the same Eratosthenes who invented the "sieve" for finding all the prime numbers up to a given limit.
2 From the Wikipedia: "Combinatorics is a branch of mathematics that studies collections (usually finite) of objects that satisfy specified criteria. In particular, it is concerned with "counting" the objects in those collections. ... (It) is as much about problem solving as theory building, though it has developed powerful theoretical methods, especially since the later twentieth century. Much of combinatorics is about graphs, to whose study all types of combinatorics can contribute."
3 Some numbers: Direct algorithms can handle about 50 cities. More sophisticated algorithms work up to about 200. In 2001, a solution was found for a collection of about 15,000 cities in Germany, using an impressive array of large computers. The current record appears to be that of almost 25,000 cities in all of Sweden, accomplished in 2004, once again with the dedication of an incredible amount of processing power.
4 "NP" stands for "Non-deterministic Polynomial time."
5 Some finite state machines fall into this category, for example.

Recently appointed CEO of Ravenflow, an Emeryville, California-based company delivering precision requirements validation for software developers, Joe Marasco served as senior vice president and business unit manager for Rational Software prior to the company's acquisition by IBM. He held numerous positions of responsibility in marketing, development, and the field sales organization, overseeing initiatives for Apex and Visual Modeler for Microsoft Visual Studio. After retiring from Rational in 2003, he published The Software Development Edge, a collection of his essays on software project management originally published in The Rational Edge. He holds a Ph.D. in physics from the University of Geneva, Switzerland, and an M.B.A. from the University of California, Irvine.




