Dynamic artificial bat ears enrich speech signals
Bats use biosonar to navigate their night flights through jungles and forests. Their system of ultrasonic pulses can pinpoint sound more precisely than man-made technical sonar. Prof. Rolf Müller, an IBM Faculty Award winner, and his team at Virginia Tech designed artificial bat ears to explore how these flying mammals’ unique abilities can be replicated and harnessed, and their work caught our attention.
My neuromorphic computing team, together with IBM Watson speech expert Xiaodong Cui and his colleagues, saw the promise of the dynamic periphery of Müller’s artificial ears (the movable outer ears that give bats their biosonar precision) to improve our ability to understand speech and the sounds we humans hear. We brought one of Müller’s PhD students, Anupam Gupta, on board as an intern to help us explore the use of these bat-inspired artificial ears for speech processing. And we found that these ears are good not only for sonar but for speech as well.
In our paper, “Horseshoe bat inspired reception dynamics embed dynamic features into speech signals”, which I will present at the 172nd meeting of the Acoustical Society of America this week, we show that special dynamic features modeled after bats’ ability to quickly change the shape of their ears can improve the accuracy of automatic speech recognition (ASR) systems and speaker localization. These dynamic features could lead to devices that hear like a bat, improving hearing aids, directional microphones, and any application where specific sounds need to be pinpointed and understood.
Think of a busy, noisy city street. Just hearing the person next to you can be a challenge. But her words could be captured by a shape-shifting hearing aid and then translated into words you can understand, without the cacophony of traffic, construction, and other voices muddying what you hear.
Biosonar algorithms for robo-bat ears
Bats’ ultrasonic echoes fall into the frequency band between 10 and 200 kHz, so they are mostly too high-pitched for human ears, which hear between 20 Hz and 20 kHz. (Plus, our ears don’t move.) So, to harness a bat’s biosonar frequency and precision, Gupta joined us to write code that could translate speech signals into ultrasonic pulses the artificial ears could receive, and then back into speech that we could understand.
The first step: compile a data set. We kept it simple, using recordings of letters and digits spoken by 11 US English speakers, taken from Carnegie Mellon University’s open source speech database.
Each sample, such as the sound of the letter “A” or the number “1,” was acquired over a microphone and translated into an ultrasonic signal, played out over an ultrasonic speaker, received by the dynamic periphery of the artificial bat ears (with an embedded microphone), and then translated by our software back into its original “A” or “1.”
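To make that round trip concrete, here is a minimal sketch of one way to move a speech-band signal into the ultrasonic range and back. It uses heterodyning (multiplying by a carrier tone), which is an assumption for illustration; this post does not say which translation method our software used. The sampling rate, the 40 kHz carrier, the test tones, and all function names are hypothetical, and the acoustic path through the speaker and the moving ears is skipped entirely.

```python
import numpy as np

def lowpass_fft(x, fs, cutoff_hz):
    """Idealized brick-wall low-pass: zero every FFT bin above cutoff_hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def to_ultrasonic(speech, fs, carrier_hz):
    """Shift the speech band up so it sits around carrier_hz."""
    t = np.arange(len(speech)) / fs
    return speech * np.cos(2 * np.pi * carrier_hz * t)

def from_ultrasonic(received, fs, carrier_hz, speech_bw_hz):
    """Mix back down with the same carrier, then keep only the speech band."""
    t = np.arange(len(received)) / fs
    mixed = 2.0 * received * np.cos(2 * np.pi * carrier_hz * t)
    return lowpass_fft(mixed, fs, speech_bw_hz)

# Hypothetical settings: 200 kHz sampling, 40 kHz carrier, 4 kHz speech band.
fs = 200_000
carrier_hz = 40_000
t = np.arange(int(0.05 * fs)) / fs

# Stand-in for a recorded letter or digit: two tones inside the speech band.
speech = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

ultrasonic = to_ultrasonic(speech, fs, carrier_hz)
recovered = from_ultrasonic(ultrasonic, fs, carrier_hz, speech_bw_hz=4_000)

# Frequency of the strongest component in the up-shifted signal.
freqs = np.fft.rfftfreq(len(ultrasonic), d=1.0 / fs)
peak_hz = freqs[np.argmax(np.abs(np.fft.rfft(ultrasonic)))]
```

In this idealized setting the recovered signal matches the input because nothing happens between the two mixing steps. In the real setup, the moving ears reshape the ultrasonic signal in between, and that reshaping is the part we care about.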
Artificial ears in a real (noisy) world
Analyzing these limited acoustic data sets of only letters and numbers, we reported that the artificial ears “enriched speech signals with dynamic, direction dependent time-frequency patterns.” Next, we compared the sound processed by the artificial ear against the original speech data to get an idea of the ears’ accuracy. To do so, we fed both the original speech data and the set processed by the artificial ear into a classifier for speech recognition. The artificial ears’ ability to move improved the accuracy of the classifier’s speech recognition. For example, with the moving ears the classifier recognized 67 percent of speech signals from our limited data set, compared to only 35 percent of the acoustic data without the dynamic periphery.
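For readers who want to see how such a comparison is scored, here is a small sketch. The 67 and 35 percent figures are the ones reported above; the helper function and the toy labels are hypothetical and are not our evaluation code.

```python
def recognition_accuracy(predicted, actual):
    """Fraction of items whose predicted label matches the true label."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("label lists must be non-empty and the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical toy labels, only to show how the score is computed.
toy_actual = ["A", "1", "C", "2"]
toy_predicted = ["A", "1", "B", "2"]
toy_accuracy = recognition_accuracy(toy_predicted, toy_actual)

# Figures reported in this post.
dynamic_accuracy = 0.67  # with the dynamic periphery
static_accuracy = 0.35   # without the dynamic periphery

# Absolute gain in percentage points, and the ratio between the two scores.
gain_points = round((dynamic_accuracy - static_accuracy) * 100, 1)
relative_gain = dynamic_accuracy / static_accuracy
```

On the reported figures this works out to a gain of 32 percentage points, or a little over 1.9 times the recognition rate obtained without the dynamic periphery.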
With more data to analyze, we can next begin to test the system against industry benchmarks and look into a new bio-inspired learning algorithm. And maybe a “hearing” app that could turn a smartphone microphone into an IoT-enabled directional microphone, helping us hear the real-world sounds we want to hear, isn’t too far away.
We look forward to continuing to collaborate with Gupta and Prof. Müller on this effort.
“Horseshoe bat inspired reception dynamics embed dynamic features into speech signals”, The Journal of the Acoustical Society of America
“1aSC31 – Shape changing artificial ear inspired by bats enriches speech signals”, 5th Joint Meeting Acoustical Society of America and Acoustical Society of Japan
Anupam K Gupta1,2, Jin-Ping Han2, Philip Caspers1, Xiaodong Cui2, Rolf Müller1
1 Dept. of Mechanical Engineering, Virginia Tech, Blacksburg, VA, USA
2 IBM T. J. Watson Research Center, Yorktown, NY, USA