March 5, 2013 | Written by: IBM Research Editorial Staff
Big Data is big. It’s 2.5 quintillion bytes of data every day big. Proper noun Big. Dr. Dimitri Kanevsky, a Master Inventor with IBM Research, is applying techniques for discovering meaning in Big Data to speech transcription and translation.
He recently earned a Tan Chin Tuan Exchange Fellowship in Engineering from Nanyang Technological University in Singapore for his work in this area. While there, he lectured on topics ranging from patents to methods for optimizing Big Data, and on how those methods are used in speech and translation technologies.
“The award was given to me for developing methods that allow machines to operate on large data that is mostly sparse, meaning the data contains a small amount of significant information hidden among large volumes of data.
“I led a team that applied these methods to speech, creating a new field: sparse representation in speech,” Kanevsky said.
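To make the idea of a sparse representation concrete, here is a minimal sketch of matching pursuit, a generic textbook technique and not the method Kanevsky's team developed. It approximates a signal as a weighted sum of just a few "atoms" from a dictionary, so that only the small amount of significant information is kept. The function names and the toy four-dimensional dictionary are illustrative assumptions.

```python
import math


def normalize(v):
    """Scale a nonzero vector to unit length (matching pursuit assumes unit-norm atoms)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, max_atoms, tol=1e-9):
    """Greedy sparse approximation: signal ~= sum(coeff * atom) over a few atoms.

    atoms: list of unit-norm vectors, each the same length as signal.
    Returns ({atom_index: coefficient}, residual).
    """
    residual = list(signal)
    coeffs = {}
    for _ in range(max_atoms):
        # Pick the atom most correlated with the part of the signal still unexplained.
        scores = [dot(residual, a) for a in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(scores[k]))
        c = scores[best]
        if abs(c) < tol:
            break  # nothing significant left to explain
        coeffs[best] = coeffs.get(best, 0.0) + c
        # Subtract that atom's contribution from the residual.
        residual = [r - c * x for r, x in zip(residual, atoms[best])]
    return coeffs, residual
```

For example, with the four standard basis vectors plus a normalized all-ones vector as the dictionary, the signal [0, 5, 0, -2] is explained by just two atoms: the coefficients come back as {1: 5.0, 3: -2.0} with a zero residual. The greedy choice keeps the sketch short; real systems typically use more careful solvers for overcomplete dictionaries.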
Dr. Dimitri Kanevsky delivering his lecture, “Why I Care About Hessian-Free Optimization,” at Nanyang Technological University, Singapore
Kanevsky, who has been deaf since early childhood, spoke to the NTU students through a combination of an iPad, Skype, and a human stenographer to put his words on screen (watch one of his NTU lectures here). Over a wireless Internet loop, Kanevsky spoke; a stenographer typed; the text appeared on the classroom screen via a tool called Streamtext; students read and responded; the stenographer heard the students and typed again; all in a seamless, real-time flow.
Why all the moving parts? Because understanding speech, like finding valuable information in any kind of data, requires more than just a clever algorithm.
Current speech recognition technology is not accurate enough to understand and transcribe (much less translate) a lecture. Variation in English accents alone, the speaker’s distance from the microphone, and background noise all make it difficult to accurately decode everything being said. This is why smartphone voice recognition, with its small vocabulary, won’t work in these environments. On top of that, the delay added by translation would be unacceptable in a live lecture or discussion.
Kanevsky’s system instead takes advantage of off-the-shelf components, rather than the expensive proprietary technology used for television closed captioning; it works over the web; and, most importantly, it captures the important spoken information.
Applying Big Data computing to speech
For a machine to truly process speech data, it needs cognitive computing: a system whose architecture imitates how the human brain understands information. IBM Watson’s ability to understand natural language is just the first piece of a complex cognitive computing puzzle. But as cognitive computing is applied to Big Data, it will also revolutionize speech recognition and speech translation.
“One of the biggest challenges facing researchers who develop cognitive computing for Big Data is to develop faster methods to process large amounts of data through these systems. That is what my team is developing: efficient and fast algorithms for speech transcription and translation,” Kanevsky said.
The techniques for finding useful information in Big Data and for understanding, transcribing, or translating speech are intertwined. Kanevsky illustrates this with the example of searching an audio archive for recordings of particular spoken phrases.
Kanevsky and a team of collaborators earned a patent in 1997 for developing a way to search audio using speech recognition. It formed the basis for data mining that involves pattern recognition technologies.
“You have to transcribe all the spoken phrases stored in those archives. Then, when someone searches for a phrase like ‘I have a dream,’ the system will find all strings for ‘I have a dream’ within the stored data, and produce links to audio that contains this spoken phrase,” Kanevsky said.
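Kanevsky's archive example can be sketched in a few lines, assuming a speech recognizer has already produced a word-level transcript with a start time for each word of every recording. The sketch builds an inverted index (word to positions), checks each occurrence of the phrase's first word for a full match, and returns links in the W3C Media Fragments `#t=seconds` style. The transcript format and the names `build_index` and `search_phrase` are hypothetical illustrations, not the patented system.

```python
from collections import defaultdict


def build_index(transcripts):
    """Map each word to every (audio_id, position) where it was recognized.

    Hypothetical input format: {audio_id: [(word, start_seconds), ...]}.
    """
    index = defaultdict(list)
    for audio_id, words in transcripts.items():
        for pos, (word, _start) in enumerate(words):
            index[word.lower()].append((audio_id, pos))
    return index


def search_phrase(phrase, transcripts, index):
    """Return links to every place in the archive where the phrase was spoken."""
    terms = phrase.lower().split()
    if not terms:
        return []
    hits = []
    # Only positions where the first word occurs can start a match.
    for audio_id, pos in index.get(terms[0], []):
        words = transcripts[audio_id]
        window = [w.lower() for w, _ in words[pos:pos + len(terms)]]
        if window == terms:
            start = words[pos][1]
            hits.append(f"{audio_id}#t={start}")  # link to the matching moment
    return hits
```

For a toy archive in which "speech.wav" contains the words "I have a dream today" starting at 10.0 seconds, `search_phrase("I have a dream", transcripts, index)` returns `['speech.wav#t=10.0']`. Indexing once and searching many times is what makes this practical for large archives: each query touches only the positions of its first word instead of rescanning every transcript.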
As data mining techniques identify small, relevant chunks of information in Big Data more efficiently and are applied to speech and translation technologies, they create reusable processes for decoding the phrases that must be analyzed to produce a final, decoded phrase.
Applying transcription and translation to all parts of life
Kanevsky’s work to route speech transcriptions and translations over the Internet, more than 15 years ago, was the world’s first. He has since put similar technology into glasses that overlay text describing or translating whatever the user is looking at. He also patented the Artificial Passenger, which converses with drivers to keep them awake.
The next great speech challenge is machine translation across different languages. Kanevsky’s team is now working on an automatic speech-to-text tool, based on cognitive computing concepts, that simplifies spoken English for meetings between IBM researchers in the U.S. and China.
“Demonstrations of real-time transcription provided by human writers helped start the work on developing an automatic means of transcribing meetings between our teams in China and the US,” Kanevsky said.