Explanatory power is a bit of a loaded term. I believe that we can come to a good understanding of what it is and how it related to machine learning by comparing and contrasting linear regression with neural nets.
The IBM Analytics Education Series has a good introductory analytics presentation that includes brief descriptions of neural nets and linear regression. It's a good video worth your viewing, though it does make one point with which I don't agree.
The speaker says that a challenge with neural nets in business applications is that they are black box, meaning that you can understand the inputs and the outputs but not really how it is deriving the outputs. Later, the speaker says that linear regression is a preferred technique because it has a very strong predictive and explanatory power.
It's not really true that linear regression has more explanatory power than neural nets. Rather, it is easier to understand the problems and the answers that can be solved by linear regression. By comparison, neural nets tend to be used to provide cognitive computing power to harder problems than linear regression can solve.
To put this another way, when you use linear regression, you actually begin by assuming linearity of the relation you want to predict. As the speaker points out, you can also make a nonlinear assumption, and you can accommodate this using a data transformation, for example. But the high order bit is that you are asked to assume the data relationship, and that assumption is what is giving you the illusion of explanatory power. You can explain that the data follows a line, but this is due to your own assumption. Note that an important aspect of completing a linear regression model is determining the R^{2} or goodness of fit of the model. This is the part where you make sure that your assumption of linearity is valid. And if the assumption is invalid, then the model has no predictive value, so it does not matter that you can explain how it operates.
Under the interpretation that explanatory power is akin to predictive power, it turns out that neural nets have greater predictive power because they can produce results for a wider array of applications than linear regression can. There a neat table that relates the cognitive power of a neural net to the number of hidden layers. From the table, you can see that when a relationship actually is linear, a neural net can solve it without even using any hidden layers of neurons. When one or two hidden layers of neurons is present, neural nets transcend the capabilities of linear regression, in part because they do not require you to make any assumption about what the data relationship actually is.
And that's where the confusions comes in. The linear regression model requires you to assume linearity and so you know at least what geometric shape the relationship looks like. The neural net requires no such assumption, but nor does the trained neural network give you any hint at what the relationship is. The lack of knowing the relationship is confused for having less explanatory power.
But if you look at this a bit more abstractly, the trained linear regression model has the same exact problem of not providing any additional insight. A neural net is really just a pile of numbers giving constant weights to the neural connections that can convert inputs to outputs. Similarly, a linear regression model is just a pile of numbers that give constant weights to inputs to be linearly combined into an output. Sure you know the data relationship, but that's because you assumed it. The actual linear regression model gives you no insight into why one dimension has a large slope constant where another has a small slope.
An analogy I like to use is that the value of the neural net is not diminished by our inability to explain how it is that the little gray cells which implement our personal neural nets can produce the cognitive results that they do, and who among us would prefer to have cognitive powers defined by linear regression instead?
In terms of explanatory power, our biological neural nets perform an additional key function that we have not hitherto been able to achieve with artificial neural nets. We are able to construct additional information in the output that reveals causal relationships, or insights into the reasons for the phenomena it predicts. Put simply: we say why something is true. We provide a rationale. This is an aspect of explanatory power that, when achieved, dramatically increases the value and utility of any cognitive analytic. Theorem provers and Prolog programs have been able to do this for the applications to which they apply. In the area of unstructured information processing and data mining, you can see a demo of this concept in Watson Paths.
Smarter Everyone, Smarter Everything, Smarter Everywhere
Explanatory Power in Machine Learning 
But Neural Networks Learn
As an interesting possible counterexample to my last blog about MLR models not understanding the knowledge they learn, consider the neural network. Our brains are neural networks, and we are capable of learning at all levels of Bloom's Taxonomy, not just the knowledge level. Shouldn't artificial neural networks be able to achieve the same things? 
Human Learning vs. Machine Learning
John M. Boyer
Tags:
socbiz
analytics
sociallearning
watson
ibm
cognitivecomputing
ibmwatson
smarterworkforce
ai
1 Comment
7,141 Views
Ever since my first blog entry in this recent series on artificial intelligence, I've been highlighting the lesser, calculational nature of machine intelligence and learning as well as the valuable role it nonetheless can play in driving more effective human understanding and decisions. I've been doing this by articulating mainly what machines do, as that is the primary interest of mine and most who would read a developerWorks blog. Still, our interests will be served by taking an entry to discuss human learning as a counterpoint or contrast. While we're on the subject of human learning and Bloom's Taxonomy, it makes sense to digress for a bit and mention the IBM Social Learning product. This is a SaaS educational platform intended to help enterprises achieve a Smarter Workforce. A few reasons for the digression are
The IBM Social Learning product has a very nice feature that enables educational administrators to implement Bloom's Taxonomy in their learning materials. A component of the product is the Kenexa LCMS, or learning content management system, which includes various subcomponents like a course designer and a metadata dictionary. The educational administrator can add any metadata tag, such as "Learning Goal", and any tag values, such as "Basic Knowledge", "Comprehension", "Application", etc. Once this is done, the educational administrator can use the metadata tag values to classify any learning item in the LCMS accord to Learning Goal. Once these classified learning materials are published, learners can use the "Learning Goal" as a new faceted search criterion in the platform's learning library. A learner would be able to isolate and focus on "knowledge" level learning in a subject area before proceeding to comprehension and then application, for example. This will enable learners to effectively use the natural way in which their learning blooms, i.e. Bloom's Taxonomy. 
Machine Learning versus Machine Intelligence
Machine learning today is every bit as calculated, as simulated, as is machine intelligence. It is easier to use machine intelligence to highlight how much greater human cognition is, which is why I've been using a machine intelligence algorithm over the last several entries. However, the conclusion drawn so far is that, while machine intelligence is only simulated, it is still quite effective and valuable as an aid to human insight and decision making. Machine learning offers another leap forward in the effectiveness and hence value of machine intelligence, so let's see what that is.
A line has a formula that looks like this: Y=C_{1}X_{1}+C_{0}, where C_{1} is a constant that governs the slant (slope) of the line, and C_{0} is a constant that governs how high or low the line is (C_{0} happens to be the point where the line meets the Yaxis, and the line slopes up or down from there). If we had more variables, then MLR would just compute more constants to go with each of them. For example, if we wanted to use two variable predictors of a dependent variable, then we'd be using MLR to create a line of the form Y=C_{2}X_{2}+C_{1}X_{1}+C_{0}. 
Alphabeta pruning optimizes the competitive prescriptive analytic algorithm
In the interest of space last time, I had to leave out an advanced topic on optimizing a "next best action" algorithm. Again, you can look at the full source we're discussing by just using the web browser's View Source on this page.
The optimization is known as alphabeta pruning. In the code snippet below, you see that we break the jloop that is scoring the response moves of a given move based on some condition involving the variables alpha and beta. Why does it make sense to stop looking at the competitive response moves for a given move? To see why, I've added the function declaration so we can discuss where the alpha value comes from and what it means.
KalahGame.scoreMove = function(board, player, move, alpha, lookahead) {
...
this.copyBoard(newBoard, board);
extraTurn = this.makeMove(newBoard, player, move);
moveScore = this.evalBoard(newBoard, player)  this.evalBoard(board, player);
...
player ^= 3;
beta = 10000;
for (j = 1; j <= 6; j++) {
if (newBoard[player][j] > 0) {
lookaheadScore = this.scoreMove(newBoard, player, j, beta, lookahead);
if (beta < lookaheadScore) {
beta = lookaheadScore;
}
if (moveScore  beta < alpha)
break;
}
}
moveScore = beta;
...
return moveScore;
};
Understanding alphabeta pruning requires you to take a more global view of the recursion that is doing the evaluation. The alpha values passed into scoreMove() are the beta values from the calling level of the Minimax algorithm. It will help to keep at least the player's moves and the opponents responses in mind as we go through this.
Let's say that scoreMove() has been called to score a player's K^{th} move. Beforehand, moves 1 to K1 will have been fully explored by depthfirst recursion, including the opponents responses, the player's counterresponses, and so on. The alpha value received by scoreMove() for move K reflects the best fully explored "net" score for the player on moves 1 to K1. Within scoreMove(), we first compute the raw benefit of the new move K, storing the result in moveScore. Now comes the alphabeta pruning trick. The jloop successively explores each opponent response move for the player's move K, and clearly the beta value takes on the value of the highest scoring response move that the opponent can make. The final score for move K is the raw benefit to the player of move K minus the benefit beta that the opponent can realize in response.
Thoughtprovoking question: Do we really need to know the absolute best move that the opponent can make in response to the player's move K? Or do we just need to find an opponent move that is good enough that, when subtracted from the raw benefit of move K, proves that the player would be better off choosing the earlier move associated with the alpha value? Of course, the answer is that we only need a good enough opponent move, and this is why we break the jloop when we find that move. If we were to continue the jloop, all we do is unnecessary work that might (or might not) find an even better opponent response move that would make move K look like an even worse decision for the player. But there is no need to do this extra work. Once the expression "(moveScorebeta < alpha)" becomes true, we have proven that move K is less beneficial than one of the moves 1 to K1.
From a practical standpoint, this optimization averages better than double the runtime performance of the "whatif" logic. Who doesn't want double, right? Well, this "whatif" analysis is a combinatorial explosion of analysis; to put that in perspective, you get less than one extra move of lookahead due to this optimization. Yet despite this dash of cold water about how much deeper you could take the "what if" logic due to alphabeta pruning, it remains true that, for a given level of explorative depth, everybody wants the result twice as fast or more, so alphabeta pruning is very handy.
