September 27, 2019 | Written by: Brian Kingsbury, Ron Hoory, and Gakuto Kurata
The 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019) took place in Graz, Austria, earlier this month. IBM Research AI was proud to support the conference as a silver sponsor and to share our latest research results in 17 papers presented there, listed below.
Many of IBM’s clients need to provide customer care over a wide variety of channels, including e-mail, text chat, and spoken interactions over the telephone. Our clients must provide their customers with a uniformly good experience across these different channels and extract actionable insights from these interactions. To meet these goals, we need to improve the underlying speech technologies: speech-to-text, text-to-speech, and spoken language understanding.
IBM serves a wide range of clients in many different industries, each with its own terminology and requirements. To support a broad set of speech applications, we focus on two goals:
- building strong base models that give our clients good “out-of-the-box” performance, and
- supplying tools that empower our clients to customize models for their own use cases.
Because we are researchers, we also pursue fundamental, exploratory work that may not go into products or services in the near term.
Our INTERSPEECH 2019 papers reflect both our short- and long-term goals and show the depth and diversity of our speech research. We invite you to read the papers that interest you and to reach out to the authors if you want to learn more.
“Identifying Mood Episodes Using Dialogue Features from Clinical Interviews,” Z. Aldeneh, M. Jaiswal, M. Picheny, M. McInnis, and E. Mower Provost
“Forget a Bit to Learn Better: Soft Forgetting for CTC-based Automatic Speech Recognition,” K. Audhkhasi, G. Saon, Z. Tüske, B. Kingsbury, and M. Picheny
“Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text,” M. K. Baskar, S. Watanabe, R. Astudillo, T. Hori, L. Burget, and J. Černocký
“Acoustic Model Optimization Based On Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition,” X. Cui and M. Picheny
“Direct Neuron-wise Fusion of Cognate Neural Networks,” T. Fukuda, M. Suzuki, and G. Kurata
“Adversarial Black-Box Attacks on Automatic Speech Recognition Systems Using Multi-Objective Evolutionary Optimization,” S. Khare, R. Aralikatte, and S. Mani
“High quality, lightweight and adaptable TTS using LPCNet,” Z. Kons, S. Shechtman, A. Sorin, C. Rabinovitz, and R. Hoory
“Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation,” G. Kurata and K. Audhkhasi
“Multi-task CTC Training with Auxiliary Feature Reconstruction for End-to-end Speech Recognition,” G. Kurata and K. Audhkhasi
“Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition,” K. C. Mac, X. Cui, W. Zhang, and M. Picheny
“Challenging the Boundaries of Speech Recognition: The MALACH Corpus,” M. Picheny, Z. Tüske, B. Kingsbury, K. Audhkhasi, X. Cui, and G. Saon
“A New Approach for Automating Analysis of Responses on Verbal Fluency Tests from Subjects At-Risk for Schizophrenia,” M. Pietrowicz, C. Agurto, R. Norel, E. Eyigoz, G. Cecchi, Z. Bilgrami, and C. Corcoran
“Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks,” L. Sarı, S. Thomas, and M. Hasegawa-Johnson
“Detection and Recovery of OOVs for Improved English Broadcast News Captioning,” S. Thomas, K. Audhkhasi, Z. Tüske, Y. Huang, and M. Picheny
“Advancing sequence-to-sequence based speech recognition,” Z. Tüske, K. Audhkhasi, and G. Saon
“Few-Shot Audio Classification with Attentional Graph Neural Networks,” S. Zhang, Y. Qin, K. Sun, and Y. Lin
“A Highly-Efficient Distributed Deep Learning System For Automatic Speech Recognition,” W. Zhang, X. Cui, U. Finkler, G. Saon, A. Kayi, A. Buyuktosunoglu, B. Kingsbury, D. Kung, and M. Picheny