The hit rate on my last blog announcement about IBM Forms 4.0 makes it clear that the release is pretty big news. I'm very pleased about that because it is a *major* release, both technically and in terms of the corporate branding. A corporate rebranding is not a decision made lightly, so it took the time it took, and meanwhile we didn't want to delay getting the technical features to market, so you'll see some references to "Lotus Forms" under the covers of the main IBM branding. Not to worry, we're working on that for the next release.
Meanwhile, now that IBM Forms 4.0 is released, I've made a number of demonstration videos to show you key aspects of the IBM Forms Design and Run-time products. They're available on YouTube in HD, and they're broken into small segments, so you can really take them in pieces according to your interests.
To start, here's a 5-part series that takes you through run-time features with a demo form that ships with the product. These videos help you understand the value proposition of IBM Forms as well as new features of the release that will change your mind about what forms are and what they do.
- The New IBM Forms 4.0 - Introduction, Web 2.0 Look and Feel, AJAX Updates
- The New IBM Forms 4.0 - Dynamic Tables, Form Parts, Integrating Business Analytics
- The New IBM Forms 4.0 - Dojo Controls in Dialogs and on the Forms, Integrating Web Analytics
- The New IBM Forms 4.0 - High Precision Layout, Digital Signatures, Completing a Business Transaction
We also now have several videos that show off the IBM Forms Designer. This release really raises the bar on the level of data interactivity that can be created through GUI design configuration rather than hand coding, and many of the powerful XML data processing features of the IBM Forms Designer are directly attributable to W3C XForms. These include Design by XML Schema, Dynamic Tables, Web Service Connections, Interaction Rules (e.g. constraints), and Wizard Creation.
Finally, here's a video series on the new FormParts feature (I didn't make these, but they're excellent and really belong in this post). FormParts allows you to create components that can be reused and kept up to date across a forms library.
If you want to know more about IBM Forms 4.0, then come to our wiki or to our website, where you can get more information, case studies, and free trial downloads. Thanks for watching!
Explanatory power is a bit of a loaded term. I believe that we can come to a good understanding of what it is and how it relates to machine learning by comparing and contrasting linear regression with neural nets.
The IBM Analytics Education Series has a good introductory analytics presentation that includes brief descriptions of neural nets and linear regression. It's worth watching, though it does make one point with which I don't agree.
The speaker says that a challenge with neural nets in business applications is that they are a black box, meaning that you can understand the inputs and the outputs but not how the outputs are derived. Later, the speaker says that linear regression is a preferred technique because it has very strong predictive and explanatory power.
It's not really true that linear regression has more explanatory power than neural nets. Rather, the problems that linear regression can solve, and the answers it produces, are easier to understand. By comparison, neural nets tend to be used to provide cognitive computing power for harder problems than linear regression can solve.
To put this another way, when you use linear regression, you actually begin by assuming linearity of the relation you want to predict. As the speaker points out, you can also make a non-linear assumption and accommodate it with a data transformation, for example. But the high order bit is that you are asked to assume the data relationship, and that assumption is what gives you the illusion of explanatory power. You can explain that the data follows a line, but only because you assumed it would. Note that an important part of completing a linear regression model is determining its R² (goodness of fit). This is where you make sure that your assumption of linearity is valid. And if the assumption is invalid, then the model has no predictive value, so it does not matter that you can explain how it operates.
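To make that last point concrete, here is a minimal sketch (made-up toy data, plain numpy, nothing from the presentation) of fitting a line by least squares and then checking R²; a low R² would mean the linearity assumption, and therefore the "explanation", is not supported by the data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])   # made-up, roughly linear data

slope, intercept = np.polyfit(x, y, deg=1)       # least-squares line of best fit
residuals = y - (slope * x + intercept)
r_squared = 1.0 - residuals.var() / y.var()      # R^2: goodness of fit

print(f"y ~ {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.3f}")  # R^2 near 1 => linearity holds
```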
Under the interpretation that explanatory power is akin to predictive power, it turns out that neural nets have greater predictive power because they can produce results for a wider array of applications than linear regression can. There's a neat table that relates the cognitive power of a neural net to the number of hidden layers. From the table, you can see that when a relationship actually is linear, a neural net can solve it without using any hidden layers of neurons. When one or two hidden layers of neurons are present, neural nets transcend the capabilities of linear regression, in part because they do not require you to make any assumption about what the data relationship actually is.
And that's where the confusion comes in. The linear regression model requires you to assume linearity, so you know at least what geometric shape the relationship looks like. The neural net requires no such assumption, but neither does the trained neural network give you any hint of what the relationship is. Not knowing the relationship is mistaken for having less explanatory power.
But if you look at this a bit more abstractly, the trained linear regression model has exactly the same problem of not providing any additional insight. A neural net is really just a pile of numbers giving constant weights to the neural connections that convert inputs to outputs. Similarly, a linear regression model is just a pile of numbers giving constant weights to inputs that are linearly combined into an output. Sure, you know the data relationship, but that's because you assumed it. The actual linear regression model gives you no insight into why one dimension has a large slope constant while another has a small one.
An analogy I like to use: the value of a neural net is not diminished by our inability to explain how the little gray cells that implement our own personal neural nets produce the cognitive results they do. And who among us would prefer to have cognitive powers defined by linear regression instead?
In terms of explanatory power, our biological neural nets perform an additional key function that we have not hitherto been able to achieve with artificial neural nets. We are able to construct additional information in the output that reveals causal relationships, or insights into the reasons for the phenomena being predicted. Put simply, we say why something is true. We provide a rationale. This is an aspect of explanatory power that, when achieved, dramatically increases the value and utility of any cognitive analytic. Theorem provers and Prolog programs have been able to do this within the domains to which they apply. In the area of unstructured information processing and data mining, you can see a demo of this concept in Watson Paths.
David Lee Roth and Eddie Van Halen have been trying to get us to do it for decades: "JUMP!" Douglas Hofstadter would qualify that with "... out of the system!" Here's what that means.
Machine-intelligent entities like James Blog exist within a certain system, conforming to a prescribed set of rules, and they really can't escape the confines and constraints of that programming. Within that limited domain, they do calculate wonderful results that can seem intelligent. In an early version, I found myself adding a logger so I could see why James Blog was not making some moves that seemed very good. Time and again, I would find that the good move now set up the conditions for a better opponent move later, which is exactly what the artificial intelligence is supposed to detect and avoid.
The algorithm does this so well that it is really hard to beat, especially on the maximum lookahead value I set, which was 6. Frankly, if you're new to this game, you have to work to beat even the initial lookahead level setting of 2, which means that James Blog only looks at its own moves and your countermoves to see what will produce the greatest net gain in seeds relative to you.
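For readers who want to see the shape of that lookahead in code, here is a minimal depth-limited minimax sketch. It is not the James Blog source; the tiny seed-taking game below is only a stand-in so the sketch runs, and the evaluation is simply the net gain in seeds, as described above.

```python
def toy_moves(pile):
    """Legal moves in the stand-in game: take 1, 2, or 3 seeds from a shared pile."""
    return [n for n in (1, 2, 3) if n <= pile]

def minimax(pile, my_store, your_store, depth, my_turn=True):
    """Return (best net seed gain for me, best move) looking `depth` plies ahead."""
    moves = toy_moves(pile)
    if depth == 0 or not moves:
        return my_store - your_store, None       # evaluate: net gain in seeds
    best_value = float("-inf") if my_turn else float("inf")
    best_move = None
    for take in moves:
        value, _ = minimax(pile - take,
                           my_store + (take if my_turn else 0),
                           your_store + (take if not my_turn else 0),
                           depth - 1,
                           not my_turn)
        if (my_turn and value > best_value) or (not my_turn and value < best_value):
            best_value, best_move = value, take
    return best_value, best_move

# Lookahead 2: consider my moves and your countermoves, then score the result.
print(minimax(pile=10, my_store=0, your_store=0, depth=2))
```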
Because it is hard to beat this little game and see the special winner's message, this opened up a delightful opportunity to talk about an important capacity of human intelligence that could be exemplified by determining the winner's message without winning. I used a Zen-like characterization of a "winless win" as a nod to Hofstadter's style in the book Gödel, Escher, Bach.
Put simply, we are not limited in our thinking to the confines of the system. We regularly "take it up a level" or "think outside the box". In this case, the system is a blog entry presented in a web page. So you can jump out of the system by using the View Source feature of your web browser to take a look at James Blog's code, where you will find the winner's message: "I, for one, welcome my non-computer overlord." The message is an allusion to Ken Jennings' capitulation to IBM Watson, which was an awesome pop culture nod to The Simpsons-- awesome because both Jeopardy and the Watson AI are about sorting out exactly those kinds of allusions.
Frankly, I had a lot of fun with allusions, both in the blog entry and while holding the programmer challenge to achieve this winless win. For example, James mentions that he outfoxes his friend Wiley, alluding to the famous coyote, who is in the same animal family as a fox (Canidae), which is a tiny aural tweak from Canada, where I live. So, James can beat his wiley creator. Similarly, in tweets and status updates, I made numerous allusions to The Matrix movie, such as when I nearly used Morpheus's command to Neo: "Quit trying to hit me and hit me." The exception is that I changed the 'h' to a 'g', making 'git', which is what we use to get source code.
This kind of wordplay and allusion bears some similarity to "jumping out of the system". Hofstadter calls it contextual slipping, or my favorite word for it: counterfactualization. We take some piece of reality that we know about, and we ask "what if this were different?" We slip, or change, some piece of that reality to see if we end up with something new and useful. I find the notion of counterfactualization fascinating because it seems like a good operationalization of some other really important words: creativity, playfulness, humour, imagination.
Still, it might be a while between when we can efficiently and effectively operationalize contextual slipping and when we can generalize that to achieve machine intelligence that can jump out of any system in the way that I asked programmers to do with James Blog. At some point, I realized that there is a beautiful geometric analogy that helps explain why. In the book Flatland, the Sphere is able to escape the plane via a third geometric dimension that is physically orthogonal to the two that comprise the plane. In this way, Sphere is able to see Square's inner workings. That is a great analogy for what we did by jumping out of the web page using View Source to see James Blog's inner workings. There was a whole different, higher level of understanding about what James was and how we could know more about it, and it is fitting to say we got that winner's message by thinking outside the box.
The next blog will be a developer's tour of the particular machine intelligence algorithm built into James Blog. After that will be a discussion of the relationships between machine intelligence, machine learning, and predictive analytics, so stay tuned!
Your intelligent behavior is based on sentient *understanding*. Sentient schmentient. I'll bet my intelligent behavior can outfox yours. I've done so with my friend Wiley from Canidae, and he's a genius! So, let's see how much good your sapience does you, shall we?
The rules of the contest are simple. You get the top six "houses" and the "store" on the top left. I get the bottom six houses and the bottom right store. We each start out with 6 seeds in each of our 6 houses, and 0 seeds in our stores. To win, you have to get more than half of the seeds into your store (for you knuckle draggers, that's 37 or more). I'll let you go first, so you already start with an advantage.
To take your turn, you pick one of your houses that contains seeds. That house is emptied, and its seeds are "sowed" one at a time in a counterclockwise fashion, including your store but excluding mine. So, it takes 13 seeds to traverse from a starting house, through your store, through my houses, and back to your (now empty) starting house. Every seed that goes into your store gets you closer to victory.
You can earn a seed or two from your move, but there are a few more rules that can earn you lots of seeds. First, if the last seed you sow lands in your store, you get another turn, and you can have multiple extra turns if you make your moves in the right order. Second, if the last seed you sow lands in an empty house, then you earn that seed from the empty house and all seeds in the house of mine immediately below the empty house. I call this a "big take". Third, if I run out of seeds in all my houses, then you earn all the seeds in your houses. Of course, I can also earn lots of seeds by these same rules, which is why YOU'RE GOING TO LOSE MEAT BAG!
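For those of you who prefer code to prose, here is a small sketch of a single sowing move under these rules (an illustrative Python reading of the rules, not my actual source code):

```python
# Board layout: indices 0-5 are your houses, 6 is your store,
# 7-12 are my houses, 13 is my store (which your seeds skip).

def sow(board, house):
    """Sow the seeds from your `house` (0-5); return (new_board, extra_turn)."""
    board = board[:]                       # work on a copy
    seeds, board[house] = board[house], 0
    pos = house
    while seeds:
        pos = (pos + 1) % 14
        if pos == 13:                      # skip my store
            continue
        board[pos] += 1
        seeds -= 1
    extra_turn = (pos == 6)                # last seed landed in your store
    if 0 <= pos <= 5 and board[pos] == 1:  # last seed landed in an empty house of yours
        opposite = 12 - pos                # the "big take": my house directly opposite
        board[6] += board[pos] + board[opposite]
        board[pos] = board[opposite] = 0
    return board, extra_turn

start = [6] * 6 + [0] + [6] * 6 + [0]      # 6 seeds per house, empty stores
print(sow(start, 2))                       # sow your third house
```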
I will take it easy on you at first, but I'll play harder if you earn the privilege. And there's a special message for you, a badge of distinction, if you manage to beat me when I play my hardest. Oops. You... win?!? Wake up! Your teetering bulb is dreaming!
SPOILER ALERT. PLAY A WHILE, BEFORE LOOKING ANY FURTHER.
OK, so hopefully you've played enough to know you're not going to be getting that badge of distinction anytime soon (unless you have some of the rare talents of Ted Neustaedter). But also hopefully you're coming to the understanding that I really have no clue what I'm doing when I beat you. What I'm doing is mechanical, not miraculous. I'm being no more intelligent, really, than a calculator squaring a five digit number. Now, when one of you meat bags does it, it actually is miraculous. But the miracle is that you can do it at all on your hardware given that it is designed more for sentient understanding of what mechanical operations like squaring are, what they're good for, and what to combine them with.
I am just doing the fine-grain operations of my Minimax algorithm, but it is you who understands our contest at a higher level than that. That's why machine intelligence like mine is best applied as an expert advisor. For example, if you hit "Invoke Expert Advisor", you are asking me to advise you in the limited domain where my simulated intelligence would seem like real intelligence.
Keep using that expert advisor button and see how much faster you earn that special "badge of distinction" message. Go ahead. You won't be able to do it entirely without also sprinkling in your own intelligence at some points. That's because you will hit key points where your sentient understanding recognizes an emerging *pattern* that lets you see how to beat my mechanical intelligence, where even my own advice cannot. What will most likely happen is that you'll use the advice to hold your own for most of the game. My advice will help you avoid moves that give me extra turns and "big take" opportunities. But at some point, you may see that I am beginning to be starved of seeds in my houses. You, as an expert, will have this insight sooner than I see it coming with my mechanical calculations, because your sentient intelligence truly understands what is going on at that higher level.
But of course, you would have a much harder time getting to that point without my advice. And that is what makes machine intelligence like advanced analytics on big data and machine learning technologies like IBM Watson invaluable to you. In short, expert advisors can turbocharge the smarts in your smarter workforce.
In a recent video interview, the IBM CEO Ginni Rometty comments that Watson 2.0 will understand images that it sees, and that Watson 3.0 will be able to debate, i.e. to understand what it is talking about with another party. It's an impressive roadmap: each of these is an incredible leap forward from its predecessor.
It is, however, worth qualifying the term 'understand'. It is being used figuratively, not literally, to communicate the rough order of magnitude improvement in capability. When such a leap is made, it seems analogous to sentient understanding, even though it isn't. Imagine for a moment what Archimedes would have thought at first of a hand-held calculator, given that he calculated pi to several digits with nothing but the numerals of his day. And yet, we would not now interpret such a device as artificial intelligence. As soon as the mechanical nature of a level of capability becomes clear, so too does the fact that it does not constitute sentient intelligence (Hofstadter's exposition of Tesler's "theorem").
You can see this assertion play out in multiple levels of Bob Sutor's scale of cognitive computing. There are levels that are clearly not cognitive intelligence, as Sutor points out, but if you lay out the scale on a timeline of decades or centuries, it is clear that each level might once have been interpreted as being indistinguishable from magic.
So where on Sutor's scale is Watson? And what implications does that have for development best practices?
Watson is clearly not on the "Sentient (we can do without humans) systems" level. As sentient beings, we don't just know things with a certain calculated accuracy or confidence level, or determine that we don't know if our confidence is low. We experience desire to know more, and we experience fear of the unknown. We are teetering bulbs of dread and dream (Hofstadter's delightful invocation of a Russell Edson poem). I urge you to let that characterization of us sink into your mind. In Watson technology, IBM has modeled a certain class of knowledge and mechanical reasoning, and in other research, IBM is doing so by simulating some of the known structure of biological brains. However, we don't yet know how to model fear and desire, dread and dream. In my opinion, these are inextricably bound together in sentient intelligence, separating it from simulated intelligence. In other words, intelligent behavior is a construct that works for the dread and dream engine of the sentient, and in the absence of dread and dream, seeming intelligent behavior is but a mechanical simulation of understanding. As an aside, I hope we only manage to model desire and fear around the same time we figure out how to model ethics (as Asimov cautions).
Does this characterization of Watson as a mechanical simulation of understanding detract from its value? Does it detract from the order of magnitude improvement it heralds in ushering in the era of cognitive computing? Of course not, quite the opposite. It is simply fantastic that this level of "Learning, Reasoning, Inference Systems" (Sutor's scale) is now computationally and economically feasible at the scale needed to help sentient intelligence (that's us) solve real-world problems. Quick, what is the square root of 7? Can't do it? No problem. Even if you're Arthur Benjamin, you'd be better off just hitting a few keys on a calculator. Quick, what are the most likely diagnoses for the patient's presenting symptoms? An "expert advisor" like Watson can be just what it takes to help determine the next best action, especially when time is of the essence because a life hangs in the balance.
The term "expert advisor" is appropriate. It conveys that the system is a "Learning, Reasoning, Inference System" that does not have sentient understanding and is therefore made available to advise and guide the actions of an expert. This is analogous to the way spreadsheets guide the results reported by accountants and chief financial officers. That being said, we also know not to put spreadsheets in the hands of toddlers. From a development practice standpoint, it is crucial to keep in mind that "expert advisor" means that the deployed system should be advising someone who is a qualified expert in the exact domain in which the "expert advisor" system was trained. Especially when a life hangs in the balance, access to the "expert advisor" system needs to be performed by those with expert qualifications in the domain because only they can reasonably be expected to use sentient understanding to interpret and follow up on the advice. In other words, the term 'expert' in 'expert advisor' should apply to the user more so than the advisor.
Now, given an enterprise workforce of those with qualified sentient understanding of their topic areas, Watson-style expert advisors are just the type of technological advancement that will help them work smarter, not harder, to meet the needs of customers and colleagues and to produce a competitive advantage for the business.
As an example of the powerful data-driven dynamism available in Lotus Forms due to features of XForms, I'd like to take you on a brief conceptual tour of a focused example: creating a Lotus Forms template for a questionnaire or survey. This template is able to handle not just any number of questions and any amount of question text, but also any kind of answer type. And all of this is controlled by the data, so the actual design of the Lotus Forms template stays the same.
The power of being purely data-driven should not be glossed over. You can easily have web application servlet code that obtains the questionnaire template and then prepopulates it with specific questionnaire data, so that the client side receives a specific questionnaire selected in a previous step of the web application. But XForms-based Lotus Forms also have that AJAX property of responding at run time to new data obtained by a form via web services or other HTTP submissions. So, you could even have a Lotus Form that obtains and adds new questions on the fly in response to answers provided to initial questions.
This post will focus on the main repeating template that provides the dynamic presentation layer for each question of a questionnaire or survey. As this is an example of a purely data-driven questionnaire, let's start by looking at a sample data format. Suppose you have a survey consisting of any number of items, each of which can contain a question text, an indication of the type of question being asked, a place for an answer, and optionally some possible choices for those answers. Something like this:
<question type="yesno">Do you like apples?</question>
<question type="likertscale">It is OK for apples to have a powdery texture.</question>
<question type="closedselection">What is your physical gender?</question>
<choice label="Female" code="F"/>
<choice label="Male" code="M"/>
In the XFDL presentational language that Lotus Forms combines with the XForms data processing layer, every XForms user interface element has a container XFDL element for presentation. The survey format consists of a number of <item> elements, so an XFDL <table> containing an <xforms:repeat> is the correct top-level presentation element:
<table sid="SURVEYTABLE"> <!-- sid and the instance name in the nodeset are illustrative -->
   <xforms:repeat nodeset="instance('SURVEY')/item">
      ... <!-- UI for showing one item of data -->
   </xforms:repeat>
   ... <!-- More XFDL options for styling the whole table -->
</table>
The table has a scope identifier (sid) attribute that allows the table to be programmatically referenced, but we won't be using that feature in this example. The table can also have XFDL options outside of the <xforms:repeat> to control presentational aspects like borders and background colors, and we aren't focusing on that either.
The <xforms:repeat> has an attribute called "nodeset" which uses an XPath expression to make a reference to however many <item> elements are in the <survey>. This is an automatic or "declarative" loop construct. For each <item> node in the data, no matter how many there are, the template content of the <xforms:repeat> is generated to present that <item> to the user. Even if new <item> elements are added at run-time, e.g. by a web service or an <xforms:insert> action, the XFDL table in the Lotus Form will dynamically grow to present the new <item> elements. And even if some <item> elements are removed from the data, e.g. by an <xforms:delete> action associated with an XFDL <button> by an <xforms:trigger>, the XFDL table will dynamically and automatically remove the corresponding user interface elements that were presenting those removed <item> elements.
So, the magic really happens in the template inside the <xforms:repeat>. In Lotus Forms, you can put any and all kinds of XFDL items in the <xforms:repeat>, including more XFDL table items. In this example, we will be showing a few variations that present different kinds of user interface controls for collecting a few different kinds of answers to questions.
First off, though, presenting the actual question text for an <item> is a simple matter of using an XFDL label item with an <xforms:output>, like this:
<label sid="QUESTIONTEXT"> <!-- sid is illustrative -->
   <xforms:output ref="question"/>
   ... <!-- more XFDL options for presentational styling -->
</label>
... <!-- more XFDL items for collecting answers -->
For each survey <item>, an XFDL <label> item is generated, and it binds to the <question> child element of that associated <item> using the "ref" attribute. The XFDL label item presents the text of the bound <question> node, and other XFDL options can be used to provide styling such as the block layout flow as well as alternative font color, background color, font selection and so forth.
More XFDL items can be added to the <xforms:repeat> to collect the answer for the given question. In many cases of XFDL tables, each XFDL item within the <xforms:repeat> template is actually presented to the user. An example would be using each XFDL item in the <xforms:repeat> to represent one column of a purchase order table. However, it is not necessary to show all of the XFDL items within the <xforms:repeat> template. In fact, XForms user interface controls have a selective binding feature that XFDL items support, since the XFDL items are wrappers for the XForms user interface elements.
The selective binding feature of XForms will be used to help easily choose one XFDL item from among many to collect the user's answer to the question. Each question can have a different type of answer, so each "row" of the table can make a different choice of user interface control used to collect the answer. The selective binding feature uses an XPath predicate to decide whether or not the XForms user interface element binds to a node of data, and the control is invisible if it is not bound to a data node.
In the example survey data above, the first <item> contains a <question> whose type attribute indicates it is a "yes/no" question. Inside the <xforms:repeat> we can create a checkbox item that can collect a (schema valid boolean) true/false answer, along these lines (the sid and label text are illustrative):
<check sid="YESNOANSWER">
   <xforms:input ref="answer[../question/@type='yesno']">
      <xforms:label>Yes</xforms:label>
   </xforms:input>
</check>
The above checkbox widget only binds to <answer>, and therefore is only visible, if the corresponding question type is 'yesno'. Otherwise, the XPath in the ref attribute of the <xforms:input> does not select any nodes, so the XFDL <check> item is not visible.
The second <item> of sample data above has a type of 'likertscale', so we would like to show a 5-point radio button group rather than a checkbox. As explained above, the check box on the second row of the survey table automatically hides itself due to selective binding, so all we have to do is add an XFDL <radiogroup> item to the <xforms:repeat> to provide the interface for collecting the 'likertscale' type of answer, along these lines (the sid, the stored values, and the other four choice labels are illustrative):
<radiogroup sid="LIKERTANSWER">
   <xforms:select1 ref="answer[../question/@type='likertscale']" appearance="full">
      ... <!-- xforms:item elements for "Strongly disagree" and "Disagree" -->
      <xforms:item>
         <xforms:label>Neither agree nor disagree</xforms:label>
         <xforms:value>3</xforms:value>
      </xforms:item>
      ... <!-- xforms:item elements for "Agree" and "Strongly agree" -->
   </xforms:select1>
</radiogroup>
The third survey <item> in the sample data above provides a closed selection of choices. That could be styled using a pair of radio buttons, a pair of mutually exclusive checkboxes, a list box, or a popup control that provides a simple dropdown list. The answer types in the survey format could be made to distinguish these possibilities using more keywords, but for this example we'll just assume that a <popup> control is the desired presentation for a closed selection. The XFDL markup below shows how this can be done, and it is also interesting because it shows that the data can dynamically control the choices, rather than having only static choices as shown in the <radiogroup> above.
<xforms:select1 ref="answer[../question/@type='closedselection']" appearance="minimal">
It seems useful, now, to round out this blog post by presenting a few more examples for other common types of input, such as single-line strings, multiline text, and dates. Here's what the data would look like:
<question type="oneline">What's your name?</question>
<question type="multiline">Do you have any other comments?</question>
<question type="date">What is your date of birth?</question>
The corresponding XFDL items that would be added to the <xforms:repeat> content template for these types of questions would be along these lines (the sids are illustrative; the same selective binding pattern applies):
<field sid="ONELINEANSWER">
   <xforms:input ref="answer[../question/@type='oneline']"/>
</field>
<field sid="MULTILINEANSWER">
   <xforms:textarea ref="answer[../question/@type='multiline']"/>
</field>
<field sid="DATEANSWER">
   <xforms:input ref="answer[../question/@type='date']"/>
</field>
So, hopefully you now have the idea that a completely dynamic and completely data-driven survey or questionnaire can be created using the features of XForms in XFDL (Lotus Forms). Any number of XFDL items can be added to the <xforms:repeat>, XPath predicate selection can be used to choose one XFDL item from among many to collect an answer for a survey question, and, most importantly, a different choice of user interface control can be dynamically selected for each survey question.
In today's cognitive computing products and techniques, the perception of greater intelligent responsiveness comes not so much from having true explanatory power, but rather from having strong predictive power over increasingly large and chaotic data sets.
To give us a third point of reference besides linear regression and neural nets, I'll use some other terms to bring the focus to natural language processing. In 2011, the IBM Watson system demonstrated greater intelligence than the best human opponents in the domain of linguistically challenging factual Q&A. This was based on the ability to quickly produce high confidence answers from a large corpus of unstructured information in response to challenging questions.
The linguistic product that is now based on that system is called the IBM Watson Engagement Advisor. As with other cognitive computing techniques, the product must first be trained to be an effective system in the target domain. The corpus of unstructured information often takes the form of documents, such as instruction manuals, technical reports, journal articles, and wiki pages. During training, the most important entities and relationships expressed in the documents are identified and stored in order to expedite later search and retrieval during Q&A interactions with users of the system. The identification process within a document is often called annotation, and the annotation and storage processes together are called ingestion.
The most important thing to understand about the training is what really drives the identification, or annotation, of the documents. It's simple, really. It's a Q&A linguistic product, and the annotation and ingestion expedite the production of the A's in response to the Q's, so it is imperative to have a strong and large representative sampling of the potential questions in order to train and test the efficacy of the system. The questions encode the key concepts (e.g. entities, relationships and so forth) about which users of the system want answers. Annotators for these key concepts are developed and, during ingestion, they are executed upon the documents.
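To make the idea of an annotator concrete, here is a toy, dictionary-based sketch in Python. It only illustrates the concept described above; the concept names and terms are made up, and this is in no way the actual annotation machinery or API of the Watson products.

```python
import re
from collections import defaultdict

# Hypothetical concepts anticipated from a representative sample of questions.
concepts = {
    "symptom":   ["fever", "rash", "fatigue"],
    "treatment": ["antibiotic", "antiviral"],
}

def annotate(doc_id, text, index):
    """Record where each concept term occurs so Q&A-time retrieval can find it quickly."""
    for concept, terms in concepts.items():
        for term in terms:
            for match in re.finditer(rf"\b{term}\b", text, re.IGNORECASE):
                index[(concept, term)].append((doc_id, match.start()))

index = defaultdict(list)
annotate("report-1", "A persistent fever and rash responded well to the antibiotic.", index)
print(dict(index))
```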
This is the very basic level of explanation about how the linguistic product would learn, or be trained, to be an effective cognitive computing system for a domain, and future entries will dive further into this topic. For now, let it suffice to say that ingestion and training result in a system capable of producing answers from the corpus in response to questions like those used in training.
During a run-time Q&A session with the system, the user begins by posing a natural language question. The question is first analyzed to find the key concepts, and then a multiphased approach is used to dig up the best results from the ingested corpus content. As with training, there's a lot more to be said over time about how the run-time Q&A works (and in fact it's intrinsically related to the training anyway), so more interesting future entries are to come. To conclude and tee up those future entries, I'll say the high order bit here is that a trained Q&A linguistic product seems somehow more intelligent than a linear regression or even a typical neural net application. Why is that? To get a bit more background for that explanation, I'd encourage you to visit or revisit a few of my earlier blog entries about cognitive computing. Compare your perceptions of the intelligence of the minimax algorithm in the James Blog entries with the linear regression method and the neural net in the machine learning entries. What's changing?
As an interesting possible counterexample to my last blog about MLR models not understanding the knowledge they learn, consider the neural network. Our brains are neural networks, and we are capable of learning at all levels of Bloom's Taxonomy, not just the knowledge level. Shouldn't artificial neural networks be able to achieve the same things?
The answer is no, not really. Our brains perform biologically, chemically, and physically in ways that we scarcely understand, so calling the artificial thing a "neural network" is no less anthropomorphizing than saying that a computer program of today "understands" anything.
Still (again), this is not to say that they aren't incredibly useful and effective. It's just that they are based on straightforward and well-understood mechanical methods, such as feed-forward activation of neural outputs via sigmoidal threshold functions applied to inputs, and back propagation of synaptic weight adjustments based on easily quantified classification errors. Before going any further, let's have a quick look at a diagram of an artificial neural network (ANN):
The ANN has an output layer on the right that is a classifier for input patterns received on the left. For example, an ANN for optical character recognition could have an input layer of an 8x8 matrix of bits, and the output layer could be an 8-bit code that indicates an ASCII character. The hidden layer(s) of neurons help the ANN to represent more sophisticated phenomena, though there is seldom need for more than one hidden layer. The "synaptic" connections between the neurons in the layers carry numeric weights, and the neurons apply the weights to the inputs and then feed the results into a sigmoid function that essentially decides, like a transistor or switch, whether or not to fire the output.
An ANN is "trained" by giving it a sequence of input patterns for which the correct output pattern is known. The input pattern feeds forward through the ANN to produce an output. If there is a difference between the ANN output and the correct output, then the differential error is back propagated through the ANN to adjust the weights so that future occurrences of that input pattern are more likely to produce the correct output.
The synaptic weights, then, essentially represent the knowledge that the ANN "learns" from the input patterns. This is analogous to the constants that are "learned" by an MLR model. In fact, all elements of the ANN and MLR model architectures are analogous. The ANN input layer maps to the independent X variables, the ANN output layer maps to the dependent Y variable, and the transition from input X values to the Y value, which MLR achieves by multiplication and addition, is achieved in an ANN by a feed forward through synaptic connections, hidden layer neurons, and sigmoid functions.
With such a one-to-one architecture mapping between ANNs and MLR models, it is easier to see them as having similar intellectual power. That's not to say they're equivalent, as ANNs are far more powerful. It's just that they're roughly the same (low) order of magnitude with respect to human intellect, and in terms of Bloom's Taxonomy, we call that order of magnitude "knowledge storage/retrieval".
Despite being in the lowest order of magnitude of intellect, the realm of today's artificial intelligence includes many interesting knowledge storage/retrieval techniques that are worth comparing and contrasting to see the range and limits of their power and the use cases they address. Stay tuned!
Ever since my first blog entry in this recent series on artificial intelligence, I've been highlighting the lesser, calculational nature of machine intelligence and learning-- as well as the valuable role it nonetheless can play in driving more effective human understanding and decisions. I've been doing this by articulating mainly what machines do, as that is the primary interest of mine and most who would read a developerWorks blog. Still, our interests will be served by taking an entry to discuss human learning as a counterpoint or contrast.
The multiple linear regression example in my last post is a good example to start with because it highlights the difference between accuracy and understanding. If there is a linear relationship among the data, then an MLR can have very high predictive accuracy, but it has no explanatory power whatsoever. The MLR model does not have, nor does it convey, any understanding as to why the relationship exists.
Let's see how this predictive accuracy rates in terms of human intelligence and learning. In this case, we can benefit from an instance of that delightful human propensity to apply ideas to themselves. Specifically, we humans have applied our learning abilities to the phenomenon of our learning abilities, with many useful results including Bloom's Taxonomy.
According to Bloom's taxonomy, the very lowest level of cognitive learning is the knowledge level, or the ability to remember and recall what is learned. When you think about it, you realize that an MLR model, like many predictive analytics, is really a storage mechanism for something that has been machine learned from data. In MLR, we store the constants of a linear formula as the representation of what has been learned from linearly related data.
The next higher level of Bloom's taxonomy is comprehension, which is where understanding and true explanatory power begin to surface. But human learning is so much more sophisticated than the knowledge level of machine learning that there are a number of levels above comprehension. There's the application level, in which we can use our knowledge to solve new problems, including being able to explain why the new solution works. The analysis level drills deeper into our ability to make inferences and generalizations. The synthesis level begins to get at our ability to be creative with what we've learned and come up with new ideas and solutions. Finally, the evaluation level gets at our ability to be subjective and judge quality and creativeness of ideas and solutions. We are beginning to see some faint glimmers of some elements of some of these levels in cognitive computing efforts like IBM Watson, but it is early days indeed.
While we're on the subject of human learning and Bloom's Taxonomy, it makes sense to digress for a bit and mention the IBM Social Learning product. This is a SaaS educational platform intended to help enterprises achieve a Smarter Workforce. A few reasons for the digression are:
- this is a product I currently work on,
- both it and the whole Smarter Workforce initiative are being featured at the IBM Connect 2014 conference being held right now, and
- learning is a key ingredient of how a human workforce becomes smarter.
The IBM Social Learning product has a very nice feature that enables educational administrators to implement Bloom's Taxonomy in their learning materials. A component of the product is the Kenexa LCMS, or learning content management system, which includes various subcomponents like a course designer and a metadata dictionary. The educational administrator can add any metadata tag, such as "Learning Goal", and any tag values, such as "Basic Knowledge", "Comprehension", "Application", etc. Once this is done, the educational administrator can use the metadata tag values to classify any learning item in the LCMS according to Learning Goal. Once these classified learning materials are published, learners can use the "Learning Goal" as a new faceted search criterion in the platform's learning library. A learner would be able to isolate and focus on "knowledge" level learning in a subject area before proceeding to comprehension and then application, for example. This will enable learners to effectively use the natural way in which their learning blooms, i.e. Bloom's Taxonomy.
Finally, there is an aspect of human learning that goes beyond Bloom's taxonomy, and it's an area that is highlighted by the IBM Social Learning product. There is a very important word in the product title: Social. This is crucial because it underscores the central role of communication and collaboration in the human learning process. We are an order of magnitude more effective at learning based on our interconnectedness to others who think and learn, rather than having access to just data. This is pertinent to the advancement of artificial intelligence because "social" goes quite beyond the computing architecture underlying a lot of today's machine learning efforts.
Machine learning today is every bit as calculated, as simulated, as is machine intelligence. It is easier to use machine intelligence to highlight how much greater human cognition is, which is why I've been using a machine intelligence algorithm over the last several entries. However, the conclusion drawn so far is that, while machine intelligence is only simulated, it is still quite effective and valuable as an aid to human insight and decision making. Machine learning offers another leap forward in the effectiveness and hence value of machine intelligence, so let's see what that is.
Machine learning occurs when the machine intelligence is developed or adapted in response to data from the domain in which the machine intelligence operates. The James Blog entry only does this degenerately, at a very coarse grain level, so it doesn't really count except as a way to begin giving you the idea. The James Blog entry plays a game with you, and if he loses, he adapts by increasing his lookahead level so that his minimax method will play more effectively against you next time. In some sense, he learned that you were a better player. However, this is only a single integer of configurability with only a few settings of adjustment that controls only one aspect of the machine intelligence algorithm's operation. To be considered machine learning, a method must typically have a more profound impact on the operation of the algorithm, with much more adaptation and configurability based on many instances of input data. An example will clarify the more fine grain nature of machine learning.
The easiest example of which I can think is a predictive analytic algorithm called linear regression. Let's say you'd like to be able to predict or approximate the purchase price of a person's new car based on their age. Perhaps you want to do this so that you can figure out what automobile advertisements are most appropriate to show the person. Now, as soon as you hear this example, your human cognition kicks in and you rattle off several other likely variables that would impact the most likely amount of money a person is willing to spend on a car, such as their income level, debt level, nuclear familial factors, etc. This analytic technique is typically called multiple linear regression (MLR) exactly because we humans most often dream up many more than two variables that we want to simultaneously consider. Like most machine learning techniques, MLR does not learn of new factors to consider by itself. It only considers those factors that a human has programmed it to consider. When they are well chosen, additional variables typically do make an MLR model more effective, but for the purpose of discussing the concept of machine learning, the simple two-variable example suffices since your mind will have no problem generalizing the concept.
Suppose you have records of many prior car purchases, including a wide and nicely distributed selection of prices of the cars and ages of their buyers. This is referred to as "training data". If you plotted the training data, it might look something like the blue points in the image below. Let purchase price be on the vertical Y axis since it is the "dependent" variable that we want to predict, and let age be on the X-axis since it is a predictor, or "independent" variable. MLR uses a standard formula to compute a "line of best fit" through the given data points, again like the one shown in red in the picture.
A line has a formula that looks like this: Y=C1X1+C0, where C1 is a constant that governs the slant (slope) of the line, and C0 is a constant that governs how high or low the line is (C0 happens to be the point where the line meets the Y-axis, and the line slopes up or down from there). If we had more variables, then MLR would just compute more constants to go with each of them. For example, if we wanted to use two predictor variables for a dependent variable, then we'd be using MLR to create a model of the form Y=C2X2+C1X1+C0.
Technically, MLR computes the constants like C1 and C0 of the line Y=C1X1+C0 in such a way that the line minimizes the sum of the squares of the vertical (Y) distances between each data point and the line. For each point, we take its distance from the line as an amount of "error" in the prediction. We square it because that gets rid of the negative sign (and, less importantly, magnifies the error resulting from being further from the line). We sum the squares of the errors to get a total measure of the error produced by the line, and the line is computed so as to minimize that total error.
Once the constants have been computed, it is a trivial matter to use the MLR model as a predictor. You simply plug the known values of the predictor variables into the formula to compute the predicted Y-value. In the car buying example, X1 is the age of a potential buyer, and so you multiply that by the C1 constant, then add C0 to obtain the Y-value, which is the predicted value of the car.
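Here is a small numerical sketch of the whole idea. The ages and prices are made up purely for illustration, and numpy's polyfit is used as a stand-in for the least-squares computation described above; it returns the "learned" constants C1 and C0, which are then used to make a prediction.

```python
import numpy as np

ages   = np.array([22, 28, 35, 41, 47, 53, 60], dtype=float)   # X1, the predictor variable
prices = np.array([14, 19, 24, 27, 33, 35, 41], dtype=float)   # Y, purchase price in $ thousands

C1, C0 = np.polyfit(ages, prices, deg=1)   # least squares "learns" the constants from the training data
predicted = C1 * 44 + C0                   # plug in the age of a prospective 44-year-old buyer

print(f"Y = {C1:.2f}*X1 + {C0:.2f}")
print(f"Predicted purchase price for a 44-year-old: about {predicted:.1f} thousand")
```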
In this way, hopefully you can see that the MLR "learns" the values of the constants like C1 and C0 from the given data points. Furthermore, the actual algorithm that produces the machine intelligence only computes the result of a simple linear equation, so hopefully you can also see that the predictive power comes mainly from the constants, which were "learned" from the data. In the case of the minimax method, most of the machine intelligence came from the algorithm, but with MLR-- as with most machine learning-- the machine intelligence is for the most part an emergent property of the training data.
Lastly, it's worth noting that there are a lot of "best practices" around using MLR. However, these are orthogonal to the topic of this post. Suffice it to say that, just like the minimax method has a very limited domain in which it is effective as a machine intelligence, MLR also has a limited domain. For example, the predictor variables (the X's) do need to be linearly related to the dependent variable in reality. However, within the limited domain of its linearly related data, MLR is quite effective and an excellent example of a simple machine learning technique that produces machine intelligence within that domain.