Big Data Analytics

Cross-lingual text mining

Discovering knowledge from large volumes of multilingual text data just got easier with new text mining technology from IBM Research. Using globally distributed databases, this cross-lingual text mining technology developed by the research team in Tokyo allows users to search through – and find value in – data written in a language they don’t understand.

Knowledge Discovery

For example, manufacturers selling products in the U.S., Europe and Asia could quickly identify defects or complaints from tens of thousands of customer contact reports that call center operators store in local customer languages. The cross-lingual text mining technology extracts context from the portions of the text that the user wishes to analyze and translates it into the user’s preferred language. It then analyzes the data and returns results, highlighting irregularities such as defects or complaints that previously went unnoticed because of language barriers.

“Finding accurate translation pairs (to match one language to another) was a challenge in developing the technology. Often, notes taken by call center operators are truncated or not grammatically correct,” said Tetsuya Nasukawa, a senior technical staff member at IBM Research – Tokyo.

“The terms being analyzed may not be defined in general translation dictionaries. So, this text mining compares how each concept is expressed in the textual database of the source’s native language – and in the textual database of the requested foreign language to determine the translation pairs.”
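As a rough illustration of that idea, and not IBM’s actual implementation, the sketch below pairs a term in one language with the candidate term in another whose usage looks most similar. It assumes, purely for illustration, that every contact report carries a language-independent category label; the records, the labels and the function names (category_profile, cosine, best_translation) are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Toy contact reports as (category, text) pairs. The shared category
# labels are an assumption made for this sketch, not a documented
# feature of the IBM system.
en_records = [
    ("power", "battery drains fast"),
    ("power", "battery overheats"),
    ("display", "screen flickers"),
    ("display", "screen cracked"),
]
de_records = [
    ("power", "akku schnell leer"),
    ("power", "akku wird heiss"),
    ("display", "bildschirm flackert"),
    ("display", "bildschirm gesprungen"),
]


def category_profile(records, term):
    """Count how often `term` occurs in reports of each category."""
    return Counter(cat for cat, text in records if term in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_translation(term, source_records, target_records, candidates):
    """Pick the candidate whose category profile best matches the term's."""
    src = category_profile(source_records, term)
    scored = [(cosine(src, category_profile(target_records, c)), c) for c in candidates]
    return max(scored)[1]


print(best_translation("battery", en_records, de_records, ["akku", "bildschirm"]))
```

Because the comparison relies only on how terms are distributed in each database, it needs no dictionary entry for the term itself, which is the property the quote describes. A real system would also have to handle tokenization, noisy notes and far larger candidate sets.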

To move from a search tool to a technique that extracts valuable information from any language domain, which users can apply to trend analysis, claim processing and other fields, the team in Tokyo combined two technologies: TAKMI (Text Analysis and Knowledge Mining), which finds noteworthy features, trends and important issues without requiring anyone to read all of the data, and additional technology that extracts translation pairs from any language domain.

Last year, IBM’s text mining research team received the Field Innovation Award from The Japanese Society for Artificial Intelligence in recognition of its pioneering text mining research and development effort.
