Open architecture helps Watson understand natural language

Editor’s note: This is a guest post by IBM Senior Technical Staff Member and Apache UIMA Project Management Committee Chairman Marshall Schor. Meet Mr. Schor and Watson at the 2011 Impact Conference, April 10-14.

Natural language is messy. Slang, puns and the context of when and where something is spoken all influence meaning. Watson tackled the problem of understanding the natural language of Jeopardy! with a mess of algorithms – managed by an open source architecture.

The open source Unstructured Information Management Architecture (Apache UIMA™), which IBM Research donated to the Apache Software Foundation in 2006, is what makes Watson’s hundreds of independent algorithms – written in different languages – work together. Watson combines legacy code written in C and C++, developed before Java became popular, with pattern matching algorithms written in Prolog. The majority of the algorithms are coded in Java because it is the most popular general-purpose, high-performance, object-oriented language in use today.

IBM Researchers came up with UIMA about a decade ago to connect colleagues who worked on language processing and unstructured information analytics. UIMA (an OASIS standard) wrapped the independent algorithms in a common architecture so they could work together. When UIMA-AS was added to take advantage of multi-core machines and networks of machines, it was a natural fit for Watson.
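To make the idea of "wrapping independent algorithms in a common architecture" concrete, here is a minimal Java sketch. It is not the Apache UIMA API: the names `SharedAnalysis`, `Analyzer`, `CapitalizedWordAnalyzer`, `TokenCountAnalyzer` and `Pipeline` are hypothetical and exist only for illustration. The point is that each algorithm sees the same shared record and the same one-method contract, so components written by different people can be chained without knowing about each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these types are hypothetical, not Apache UIMA classes.

/** Shared record that every component reads from and appends findings to. */
class SharedAnalysis {
    final String text;
    final List<String> annotations = new ArrayList<>();

    SharedAnalysis(String text) {
        this.text = text;
    }
}

/** Common contract that wraps an otherwise independent algorithm. */
interface Analyzer {
    void process(SharedAnalysis analysis);
}

/** Flags each token that starts with an uppercase letter. */
class CapitalizedWordAnalyzer implements Analyzer {
    public void process(SharedAnalysis analysis) {
        for (String token : analysis.text.split("\\s+")) {
            if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                analysis.annotations.add("Capitalized:" + token);
            }
        }
    }
}

/** Records the number of whitespace-separated tokens. */
class TokenCountAnalyzer implements Analyzer {
    public void process(SharedAnalysis analysis) {
        int count = analysis.text.trim().split("\\s+").length;
        analysis.annotations.add("TokenCount:" + count);
    }
}

class Pipeline {
    /** Runs every analyzer, in order, over one shared analysis record. */
    static SharedAnalysis run(String text, List<Analyzer> analyzers) {
        SharedAnalysis analysis = new SharedAnalysis(text);
        for (Analyzer analyzer : analyzers) {
            analyzer.process(analysis);
        }
        return analysis;
    }

    public static void main(String[] args) {
        List<Analyzer> analyzers = new ArrayList<>();
        analyzers.add(new CapitalizedWordAnalyzer());
        analyzers.add(new TokenCountAnalyzer());
        SharedAnalysis result = run("Watson played Jeopardy in 2011", analyzers);
        System.out.println(result.annotations);
    }
}
```

In this sketch, adding another algorithm means writing one more `Analyzer` and appending it to the list; the existing components stay untouched.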

Watson runs on POWER7 because it suits highly parallelized applications and offers high bandwidth between memory and the 32 cores of each node. UIMA scales out its components across thousands of these cores so Watson can answer a single Jeopardy! clue in about three seconds.
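The scale-out idea can be sketched on a single machine with a thread pool. This is only an analogy under stated assumptions: it uses the standard `java.util.concurrent` library rather than UIMA-AS, and the class `ScaleOutSketch`, the method `analyzeInParallel` and the toy analyzers are hypothetical. It shows independent analyses of the same text being submitted to a fixed number of workers and their results collected in submission order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative sketch only: a local thread pool standing in for scale-out,
// not how UIMA-AS or Watson distributes work.
class ScaleOutSketch {

    /** Applies each independent analyzer to the same text on a pool of workers. */
    static List<String> analyzeInParallel(String text,
                                          List<Function<String, String>> analyzers,
                                          int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Submit every analyzer as its own task so they can run concurrently.
            List<Future<String>> pending = new ArrayList<>();
            for (Function<String, String> analyzer : analyzers) {
                Callable<String> task = () -> analyzer.apply(text);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order; get() waits for each task.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for analyzers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an analyzer failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Function<String, String>> analyzers = new ArrayList<>();
        analyzers.add(s -> "length=" + s.length());
        analyzers.add(s -> "upper=" + s.toUpperCase());
        System.out.println(analyzeInParallel("clue", analyzers, 2));
    }
}
```

Because the analyzers do not depend on each other, raising `workers` lets more of them run at once, which is the property the paragraph above attributes to running many components across many cores.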

Algorithms at work: Watson learning across categories

Where else is Watson’s software?

UIMA is embedded in several IBM products, including IBM InfoSphere Warehouse, which performs text analytics for both structured and unstructured content. InfoSphere BigInsights has been used to run UIMA analytics within Apache’s Hadoop framework for scalable, distributed computing, to analyze and process a broad set of information including unstructured content.
