December 31, 2015 | Written by: Jeffrey O'Brien
Categorized: Watson APIs
For us humans, there are well-documented benefits to learning more than one language. Exposure to a second or third language, especially at a young age, increases a brain’s neuroplasticity, which can improve creativity and academic performance. Bilingual children also often demonstrate a greater ability to perform complex tasks and to subsequently pick up other languages. From a purely unscientific point of view, polyglots also grow up to be some of the most interesting guests at cocktail parties.
For IBM Watson, the reasons to learn languages other than English – the cognitive computing system’s native tongue, if you will – are even more pronounced. Simply put, if Watson is to achieve the greatest possible impact in the world, it must understand as much as possible. Which means it needs to expand its repertoire beyond English fluency. “We believe people who speak many different languages want to use Watson—and most people on the planet do not have English as their native language,” says Michael Karasick, head of innovations for IBM’s Watson team.
Since being introduced via Jeopardy in 2011, Watson has been rapidly boning up on Spanish and Portuguese. Now it’s tackling what many consider the single most difficult language to learn: Japanese.
Watson has generated significant interest in Japan. In the year since the Watson Group was founded, Japanese researchers, clients and prospects have all been talking excitedly about potential applications in everything from robotics and elder care to education and wealth management. “It shouldn’t be surprising, really. Artificial Intelligence, humanoid robots, facial recognition—all of these things have a long and rich history in Japan,” says Karasick. “That type of curiosity and knack for exploration make us think there’s real business for Watson in Japan.”
Tapping the obvious interest is no easy task, however. The Foreign Service Institute classifies Japanese as “exceptionally difficult” for native English speakers to learn, partly because of its reliance on kanji, a complex system of logographic characters borrowed from Chinese. But even without the intimidating characters, the rules of discourse seem opaque. “There is a certain level of indirection when people speak Japanese. English is more literal while Japanese is more subtle,” says Salim Roukos, who runs the multilingual natural language processing group that’s tasked with teaching Watson new languages. “With Japanese, there’s a certain politeness that adds to the complexity. And there are many different ways of expressing the same idea.”
The written language doesn’t use spaces between characters, making it harder to tell where one word ends and the next begins. Japanese also contains countless idioms, phrases whose literal translation (à la “kick the bucket” in English) doesn’t convey their meaning. And unlike in many languages, the context of a Japanese sentence often lives elsewhere in the conversation: subjects and objects are routinely omitted and must be inferred from what came before.
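To see why unspaced text is hard for software, consider the classic longest-match approach to word segmentation. This is a deliberately toy sketch, not Watson’s actual method; the tiny dictionary and sample sentence are illustrative assumptions.

```python
# Toy greedy longest-match segmenter for unspaced Japanese text.
# The dictionary and example are hypothetical; real systems use large
# lexicons and statistical models rather than greedy matching.
DICTIONARY = {"私", "は", "寿司", "を", "食べる"}  # "I", topic marker, "sushi", object marker, "eat"

def segment(text, dictionary, max_len=4):
    """Scan left to right, always taking the longest dictionary match."""
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # Fall back to a single character when nothing matches.
            if candidate in dictionary or length == 1:
                words.append(candidate)
                i += length
                break
    return words

print(segment("私は寿司を食べる", DICTIONARY))
# → ['私', 'は', '寿司', 'を', '食べる']
```

Greedy matching fails on ambiguous boundaries, which is one reason statistical learning from large corpora, the approach the article describes, wins out in practice.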
Luckily, Watson comes to this challenge armed with a tool that we humans do not possess, namely a solid-state memory, which is a huge plus when it comes to the type of pattern recognition necessary when learning a language. “Watson is like having a research assistant who never forgets,” says Karasick. “Watson can find the signal in an enormous amount of noise.”
Watson’s learning method starts with the consumption of copious amounts of data. In cancer research, that means reading stacks of scientific papers. With cooking, it’s digesting recipes. Learning a language begins with absorbing the meaning of words and phrases and the rules of sentence construction. Roukos and his team feed Watson annotated sentences in a parsing exercise known as syntactic sentence diagramming. “Quantity matters,” says Roukos. “Quantity leads to quality.”
With Japanese, non-native speakers often begin learning the spoken language to avoid the kanji, but speech recognition adds another layer of complexity for a computer, and so Watson starts by reading. The cognitive system consumes roughly 10,000 sentences and deconstructs them in the form of diagrams that indicate syntax and semantic structure. Researchers and Japanese speakers then review the diagrams, fix the errors, and feed corrections back into the system so Watson can learn from its mistakes.
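The parse-review-correct cycle described above can be sketched in a few lines. Everything here is a hypothetical stand-in: the article doesn’t describe Watson’s internal data structures, so the crude parser, the reviewer function, and the tag names are assumptions made purely for illustration.

```python
# Minimal sketch of the review-and-correct loop: a system produces draft
# diagrams, human reviewers fix the errors, and the corrected diagrams
# accumulate into a treebank the system can learn from.
def naive_parse(sentence):
    """Placeholder parser: tags every word as a NOUN (deliberately crude)."""
    return [(word, "NOUN") for word in sentence.split()]

def review(diagram, gold):
    """Reviewers replace wrong tags with their corrections."""
    return [(word, gold.get(word, tag)) for word, tag in diagram]

treebank = []  # corrected diagrams accumulate into a corpus
sentence = "Watson reads sentences"
gold_tags = {"reads": "VERB"}  # a correction supplied by a human reviewer

draft = naive_parse(sentence)
corrected = review(draft, gold_tags)
treebank.append(corrected)
print(corrected)
# → [('Watson', 'NOUN'), ('reads', 'VERB'), ('sentences', 'NOUN')]
```

The point of the design is the feedback loop: each corrected batch teaches the system something, so later batches need fewer and fewer corrections, which matches the single-pass behavior the article describes next.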
The subsequent batches typically require only a single pass, and upon completion of the full set, Watson possesses a so-called “treebank,” or corpus, of about 1 million words. But that’s really only the end of the beginning. “It’s a fairly lengthy process going from syntactic sentence diagramming to understanding to answering questions,” says Karasick. So while Watson climbs the Japanese learning curve more quickly in the beginning, it’s much slower than humans when it comes to knowing and applying the logic beneath the language.
The fact that humans and Watson are good at different things makes them good partners. The more languages Watson understands, the greater its utility will be in various endeavors and disciplines. “Chemistry is chemistry. Medicine is medicine. There are variations in these fields that are cultural, but the domains are the domains,” says Karasick. “The more languages Watson knows, the more it can help practitioners in those fields.”
Watson and humans do have one thing in common when it comes to learning languages—and it’s something that Roukos knows first-hand. He speaks English, Arabic, and French fluently, as well as “some Armenian.” His proficiency with languages greatly improves his ability to quickly grasp the context and meaning in unfamiliar languages. And the same, he says, is true of Watson. The more languages it learns, the better and faster it becomes at mastering new languages.
Which means it’s only a matter of time before Watson also conquers the cocktail party.