Go immediately to this link and join me in congratulating my long-time friend and sensei, Tetsuya Nasukawa, for creating one of the most significant inventions at IBM in the past 100 years:
I hope you're not too jaded to feel the power of this, or, to quote the article, the "clout." IBM has had a lot of pretty important inventions. To be named an "icon" among those is no mean feat. TAKMI is at the heart of IBM's flagship text analytics product, IBM Content Analytics (ICA).
I don't know whether it was in an article, or just a personal conversation, but Nasukawa-san once said,
"I didn't invent TAKMI to do something humans could do, better; I wanted TAKMI to do something that humans could not do."
I have watched many people struggle to find the "magic" in the software, and get frustrated. I think if that quote were written at the top of the Text Miner user interface, it might help insight-seekers remember just exactly what insight is - something you didn't/couldn't see on your own.
On a cognitive level, it seems that humans can easily see, and usually just as easily describe, regular distribution patterns in everyday events, around the house, in books, etc. We "get" and like patterns. But when it comes to irregularity, all we seem to be able to do is to smell it - to simply note that something is off. We can't seem to grasp that irregularity, too, is a pattern. Finding that pattern is one of the strengths of TAKMI. From the above article:
"it is easy for TAKMI to identify irregular distribution of trouble-related expressions"
I might suggest reading that sentence more than once (I did, and I discovered deeper meaning with each reading). Implicit in this statement is that, in life, there is trouble, and that it is easy for humans to see a regular distribution of trouble. For example, after an earthquake, we are not surprised to hear about electrical outages. TAKMI could show us the same. Another example: we can see that organizations tightly associated with Al-Qaeda are implicated in terrorist activities. There is a regular distribution of negative information delineating this relationship, on the internet.
TAKMI goes to the other side, to discover an irregular distribution of trouble - trouble that deviates ever so slightly from the everyday trouble. These are the kinds of troubles that humans cannot detect. As Scott Spangler, senior technical staff member in text mining and software development at IBM Almaden Research Center and co-author of Mining the Talk: Unlocking the Business Value in Unstructured Information, puts it:
“But what unstructured information can tell you is the answer to questions you didn’t even know you needed to worry about. It lets you know what you don’t know.”
Seems like we're harping on that same idea again - don't go looking for what you already know, and especially do not go looking for what you think you know! I often say, "let TAKMI do the talking." This is not easy for many people. The difference in approaches to data analysis is actually the essential difference between good science - which we are in short supply of these days - and bad science. Here's the key:
Be descriptive, not prescriptive.
Without your intervention, TAKMI will find all the patterns there are to find in your data corpus. TAKMI is not trying to find anything - it is simply setting up distribution norms for the entire data set (the particular data set), then distribution norms for subsets of the same data, and finally ratios between those subsets and the whole, which may indicate deviation, or not. For example, if your palette consists of all the colors of the spectrum, does green (for example) stand out more than any other color? Now, in a collection of red hues, will green stand out more than, say, magenta? How much green will you need to get it to stand out? Will it require a large patch, or will one tiny green dot stand out like a beacon in a field of reds?
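To make the "norms, then ratios" idea a little more concrete, here is a minimal sketch in Python. It is not TAKMI's actual algorithm, which the article does not spell out; the function names, the toy reports, and the choice of a simple document-frequency ratio are all my own assumptions for illustration. It computes how often each term shows up across all reports, how often it shows up in one subset, and the ratio between the two.

```python
from collections import Counter


def document_frequency(docs):
    """Fraction of documents in which each term appears at least once."""
    counts = Counter()
    for terms in docs:
        counts.update(set(terms))  # count each term once per document
    total = len(docs)
    return {term: n / total for term, n in counts.items()} if total else {}


def deviation_ratios(all_docs, subset_docs):
    """Ratio of a term's frequency in the subset to its frequency overall.

    A ratio near 1.0 means the term is distributed "regularly" in the subset;
    a ratio well above 1.0 means it is over-represented there.
    Assumes subset_docs is drawn from all_docs.
    """
    norm = document_frequency(all_docs)
    local = document_frequency(subset_docs)
    return {term: local[term] / norm[term] for term in local if term in norm}


# Hypothetical accident reports: (car model, terms mentioned in the report).
reports = [
    ("model_a", ["brake"]),
    ("model_a", ["brake"]),
    ("model_a", ["brake", "tire"]),
    ("model_a", ["tire"]),
    ("model_b", ["brake"]),
    ("model_b", ["brake", "gas pedal"]),
    ("model_b", ["gas pedal"]),
    ("model_b", ["brake"]),
]

all_docs = [terms for _, terms in reports]
model_b_docs = [terms for model, terms in reports if model == "model_b"]

ratios = deviation_ratios(all_docs, model_b_docs)
for term, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{term}: {ratio:.2f}")
```

In this toy data, "brake" is the most common term everywhere, so its ratio for model_b comes out at 1.0: frequent, but regular. "gas pedal" is rare overall yet concentrated in model_b, so its ratio is 2.0. The frequent term is the one a human would notice; the high ratio is the green dot in the field of reds.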
Too abstract. If a large number of cars run off the road and crash into trees, it is pretty intuitive that brakes are at fault. I will wager that this is also the case statistically - that if we analyze all the data in the world related to cars running off the road and crashing into trees, we would find faulty brakes to be the number one cause.
So, duh. Would you pay me $2 million for software that tells you that? Do you think auto manufacturers don't already focus on braking technology in striving to avoid such accidents? Remember: what humans cannot do...
...because you just might never think, nor would there be large, sweeping data trends to hint, that the gas pedal (that's the other pedal, you know, next to the brake) might cause cars to run off the road and crash into trees.