Icons of Progress


Bringing Order to Unstructured Data

In 1997, IBM researchers at the company’s Tokyo Research Laboratory pioneered a prototype for a powerful new tool capable of analyzing text. The system, known as TAKMI—for Text Analysis and Knowledge Mining—was a watershed development: for the first time, researchers could efficiently capture and utilize the wealth of buried knowledge residing in enormous databases of text.

By that time, text was searchable, if you knew what to look for. But the challenge was to understand what was inside the databases and know how to take advantage of the massive textual content that you could not read through and digest.

The development of TAKMI quietly set the stage for the coming transformation in business intelligence. Prior to 1997, the field of analytics dealt strictly with numerical and other “structured” data—the type of tagged information that is housed in fixed fields within databases, spreadsheets and other data collections, and that can be analyzed by standard statistical data mining methods.

The technological clout of TAKMI lay in its ability to read “unstructured” data—the data and metadata found in the words, grammar and other textual elements comprising everything from books, journals, text messages and emails, to health records and audio and video files. Analysts today estimate that 80 to 90 percent of any organization’s data is unstructured. And with the rising use of interactive web technologies, such as blogs and social media platforms, churning out ever-expanding volumes of content, that data is growing at a rate of 40 to 60 percent per year.

The key to this success was natural language processing (NLP) technology. Most data mining researchers at the time treated English text as a bag of words, extracting words from character strings by splitting on white space. Japanese text, however, does not use white space to separate words, so IBM researchers in Tokyo applied NLP to extract words, analyze their grammatical features, and identify relationships among them. This in-depth analysis led to better text mining results, which is why leading-edge text mining technology originated in Japan.
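The contrast described above can be illustrated with a minimal Python sketch. It is not TAKMI's implementation: the toy dictionary, the example sentences, and the greedy longest-match segmenter (the helper names `whitespace_tokens` and `longest_match_tokens` are hypothetical) are assumptions chosen only to show why splitting on white space works for English but returns a Japanese sentence as a single unsegmented string.

```python
def whitespace_tokens(text):
    """Bag-of-words style extraction: split on white space only."""
    return text.split()


def longest_match_tokens(text, dictionary):
    """Greedy longest-match segmentation against a word list.

    A deliberately simple stand-in for real morphological analysis:
    at each position, take the longest dictionary entry that matches,
    falling back to a single character when nothing matches.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy dictionary, just large enough for the example sentence.
TOY_DICTIONARY = {"電源", "が", "入ら", "ない"}

english = "the power does not turn on"
japanese = "電源が入らない"  # roughly: "the power does not turn on"

print(whitespace_tokens(english))
print(whitespace_tokens(japanese))
print(longest_match_tokens(japanese, TOY_DICTIONARY))
```

White-space splitting yields six words for the English sentence but a single token for the Japanese one, while the dictionary-based pass recovers four word units. Production systems go further than this sketch by also assigning grammatical features and relationships to each unit, as the paragraph above notes.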

Being able to make sense of such data would open up huge opportunities for enterprises of all kinds. “Structured information can give you answers to questions that you already know to ask,” explains Scott Spangler, senior technical staff member in text mining and software development at IBM Almaden Research Center and co-author of Mining the Talk: Unlocking the Business Value in Unstructured Information. “But what unstructured information can tell you is the answer to questions you didn’t even know you needed to worry about. It lets you know what you don’t know.”

With the TAKMI capabilities, data could be extracted and put to work to spot trends and monitor critical business issues ranging from product failures and ill-received advertisements, to customer behavior and employee engagement. It could bring the operational efficiencies of information management to the field of knowledge management by providing the means for informed problem solving and context-based decision making.

From a technological perspective, the TAKMI framework shifted the focus of document handling from searching and organizing documents to building knowledge. While the existing strategies and products of the time relied on information retrieval and document-clustering technologies to identify keywords and analyze their distribution, TAKMI dug deeper, drawing on natural language processing, data mining and visualization to identify rules and patterns and to extract, contextualize, analyze and present concepts. The resulting output was sophisticated, actionable business intelligence.

In the system’s prototypical use, analyzing call center logs from IBM PC Help Centers in Japan and the US, it used semantic analysis to determine that most customers calling during June and July 1998 were asking whether they could safely install Microsoft® Windows® 98 on their machines. By posting a response on its help center homepage, IBM was able to improve its service to customers while freeing up its help lines.

Moreover, TAKMI demonstrated its power by identifying product failures in their early stages, which led to significant cost savings, often in the range of millions of dollars. Unlike case analysis, TAKMI can easily identify when trouble-related expressions in call logs are distributed unevenly and concentrate on specific products. Because problems are usually unexpected, traditional approaches to trouble detection that rely on manually assigned trouble codes have limitations.
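One simple way to picture an "irregular distribution" of trouble-related expressions is to compare how often an expression co-occurs with a product against how often it would be expected to if expressions and products were independent. The Python sketch below does this with a lift ratio. It is an illustrative assumption rather than the measure TAKMI uses, and the product names, expressions and counts are made-up toy data.

```python
from collections import Counter


def expression_lift(records):
    """Return observed/expected co-occurrence ratios.

    records: iterable of (product, expression) pairs, one per mention
    extracted from a call log. A ratio well above 1 means the expression
    shows up with that product more often than independence would predict.
    """
    records = list(records)
    total = len(records)
    pair_counts = Counter(records)
    product_counts = Counter(product for product, _ in records)
    expression_counts = Counter(expression for _, expression in records)

    lift = {}
    for (product, expression), observed in pair_counts.items():
        expected = product_counts[product] * expression_counts[expression] / total
        lift[(product, expression)] = observed / expected
    return lift


# Made-up toy data: (product, trouble-related expression) mentions.
toy_records = (
    [("Model A", "overheats")] * 6
    + [("Model A", "no sound")] * 2
    + [("Model B", "overheats")] * 1
    + [("Model B", "no sound")] * 7
)

lift = expression_lift(toy_records)
for pair, ratio in sorted(lift.items(), key=lambda item: -item[1]):
    print(pair, round(ratio, 2))
```

In this toy data, "overheats" is concentrated on Model A, so that pair gets the highest ratio. No predefined trouble code is needed: the signal comes directly from the expressions found in the text.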

Beyond IBM, TAKMI technologies are helping healthcare professionals provide better care for their patients. In 2007, IBM Research, along with IBM Global Business Services, teamed with Japan’s National Cancer Center to develop an extension of the system for use in mining the enormous body of existing biomedical information. MedTAKMI-CDI gathers, interprets and analyzes clinical data from multiple sources, providing information about patient groups based on categories such as diagnosis, lab test results, age and therapeutic response. By analyzing these patterns, clinicians can generate analytic rules to help them best treat given groups of patients.

In 2009, TAKMI debuted commercially as IBM® Content Analytics, a stand-alone analytics platform. Of particular value to clients is the system’s ability to bridge previously disconnected structured and unstructured data, analyzing enterprise content from emails, blog posts and chat logs alongside structured data, such as sales figures or customer postal codes.

Today, IBM’s continuing dedication to driving innovations in intelligence is evident in other products in its analytics lineup aimed at unearthing the value in customer-generated content—wherever that content is found. Its predictive analytics software, released in 2010, augments its text-mining capabilities to include text and other data from social media sources in an effort to detect, track and even predict customer attitudes and behavior. The software is also capable of interpreting slang, industry jargon and the ubiquitous balls of virtual emotion known as emoticons.

Analytics will continue to play a critical role in advancing business, science and social progress across nearly every existing industry. In fact, a 2010 white paper coauthored by IBM’s Institute for Business Value and the MIT Sloan Management Review indicated that top-performing organizations use analytics five times as much as their lower-performing peers.

Thanks to breakthrough technologies like TAKMI and its descendants, text and other unstructured data previously locked away is now taking its proper place in helping to make the world work better.


Selected team members who contributed to this Icon of Progress:

  • Tetsuya Nasukawa, IBM Senior Researcher
  • Kohichi Takeda, Distinguished Engineer, Manager of Analytics and Intelligence, IBM Research - Tokyo
  • Hideo Watanabe, Manager, Knowledge Infrastructure Group, IBM’s Tokyo Research Laboratory
  • Shiho Ogino, Research Scientist, Computer Science
  • Akiko Murakami, Research Scientist, Computer Science
  • Hiroshi Kanayama, Research Scientist, Human-Computer Interaction
  • Hironori Takeuchi, Research Scientist, Computer Science
  • Issei Yoshida, Research Scientist, Math Science
  • Yuta Tsuboi, Research Scientist, Computer Science
  • Daisuke Takuma, Research Scientist, Math Science