Real-time transliteration using InfoSphere Streams custom Java operator and ICU4J

Integrating a Java transliteration module with a custom Java operator of InfoSphere Streams

From the developerWorks archives

Bharath Kumar Devaraju

Date archived: January 13, 2017 | First published: December 13, 2012

With the ever-growing importance of Internet monitoring and sentiment analysis, there is an immediate need to identify patterns in big data by performing text analytics. One challenge is that a single country can have multiple languages, which makes it hard to run text analytics effectively because rules are not available for every language. In India, for example, each state has a different official language, and data is available in both English and local languages. This article describes how to bring about consistency during the transliteration process and how to use IBM® InfoSphere® Streams® to prepare linguistic data and apply text analytics or pattern recognition logic.

This content is no longer being updated or maintained. The full article is provided "as is" in a PDF file. Given the rapid evolution of technology, some steps and illustrations may have changed.
