IBM Research AI at INTERSPEECH 2019

The 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019) took place in Graz, Austria, earlier this month. IBM Research AI was proud to support the conference as a silver sponsor and to share our latest research results in the 17 papers listed below.

Many of IBM’s clients need to provide customer care over a wide variety of channels, including e-mail, text chat, and spoken interactions over the telephone. Our clients must provide their customers with a uniformly good experience across these different channels and extract actionable insights from these interactions. To meet these goals, we need to improve the underlying speech technologies: speech-to-text, text-to-speech, and spoken language understanding.

IBM serves clients in many different industries, each with its own terminology and requirements. To support a broad set of speech applications, we focus on two goals:

  1. building strong base models that give our clients good “out-of-the-box” performance, and
  2. supplying tools that empower our clients to customize models for their own use cases.

Because we are researchers, we also pursue fundamental, exploratory work that may not go into products or services in the near term.

Our INTERSPEECH 2019 papers reflect our short- and long-term goals and show the depth and diversity of our speech research. We invite you to read the papers that interest you and to reach out to the authors if you want to learn more.

“Identifying Mood Episodes Using Dialogue Features from Clinical Interviews,” Aldeneh, M. Jaiswal, M. Picheny, M. McInnis, and E. Mower Provost

“Forget a Bit to Learn Better: Soft Forgetting for CTC-based Automatic Speech Recognition,” K. Audhkhasi, G. Saon, Z. Tüske, B. Kingsbury, and M. Picheny

“Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text,” M. K. Baskar, S. Watanabe, R. Astudillo, T. Hori, L. Burget, and J. Černocký

“Acoustic Model Optimization Based On Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition,” X. Cui and M. Picheny

“Direct Neuron-wise Fusion of Cognate Neural Networks,” T. Fukuda, M. Suzuki, and G. Kurata

“Adversarial Black-Box Attacks on Automatic Speech Recognition Systems Using Multi-Objective Evolutionary Optimization,” S. Khare, R. Aralikatte, and S. Mani

“High quality, lightweight and adaptable TTS using LPCNet,” Z. Kons, S. Shechtman, A. Sorin, C. Rabinovitz, and R. Hoory

“Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation,” G. Kurata and K. Audhkhasi

“Multi-task CTC Training with Auxiliary Feature Reconstruction for End-to-end Speech Recognition,” G. Kurata and K. Audhkhasi

“Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition,” K. C. Mac, X. Cui, W. Zhang, and M. Picheny

“Challenging the Boundaries of Speech Recognition: The MALACH Corpus,” M. Picheny, Z. Tüske, B. Kingsbury, K. Audhkhasi, X. Cui, and G. Saon

“A New Approach for Automating Analysis of Responses on Verbal Fluency Tests from Subjects At-Risk for Schizophrenia,” M. Pietrowicz, C. Agurto, R. Norel, E. Eyigoz, G. Cecchi, Z. Bilgrami, and C. Corcoran

“Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks,” L. Sarı, S. Thomas, and M. Hasegawa-Johnson

“Detection and Recovery of OOVs for Improved English Broadcast News Captioning,” S. Thomas, K. Audhkhasi, Z. Tüske, Y. Huang, and M. Picheny

“Advancing sequence-to-sequence based speech recognition,” Z. Tüske, K. Audhkhasi, and G. Saon

“Few-Shot Audio Classification with Attentional Graph Neural Networks,” S. Zhang, Y. Qin, K. Sun, and Y. Lin

“A Highly-Efficient Distributed Deep Learning System For Automatic Speech Recognition,” W. Zhang, X. Cui, U. Finkler, G. Saon, A. Kayi, A. Buyuktosunoglu, B. Kingsbury, D. Kung, and M. Picheny


Distinguished Research Staff Member, IBM Research

Ron Hoory

Senior Technical Staff Member

Gakuto Kurata

Senior Technical Staff Member, IBM Research-Tokyo
