IBM Research at INTERSPEECH 2020

The 21st INTERSPEECH Conference will take place as a fully virtual event from October 25 to October 29. INTERSPEECH is the world’s largest conference devoted to speech processing and applications, and is the premier conference of the International Speech Communication Association.

Speech technology research at IBM Research AI currently centers on Spoken Customer Care, where our goals are to improve customer experience in human-machine interaction (voice bots) and enhance analytics capabilities in human-human call center interaction. IBM Research AI has contributed 10 papers to the INTERSPEECH technical program, covering topics including text-to-speech synthesis, automatic speech recognition, speaker diarization, and spoken language understanding. These papers address several research questions central to our work:

  • How do we make speech-to-text systems more accurate?
  • How can automatic systems reliably listen to a multi-party conversation and know who spoke when?
  • How can we synthesize speech that has natural and controllable expression?
  • How can we best extract meaning from spoken utterances?

These exciting pieces of work are the result of research done at five of IBM’s global research laboratories in Bangalore, Haifa, São Paulo, Tokyo, and Yorktown Heights, and also in collaboration with our academic partners.

Setting the stage for IBM’s leadership in this field are two papers: “Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard” and “New Advances in Speaker Diarization”. The first paper establishes new record performance in automatic speech recognition on the 300-hour and 2000-hour Switchboard conversational telephone speech benchmarks by developing a single-headed attention, LSTM-based encoder-decoder model. The second paper pushes the envelope on state-of-the-art performance for speaker diarization (the task of determining “who spoke when?”) with novel improvements to speaker clustering, the use of multiple speaker embeddings, and refined neural network-based modeling techniques.

Given the growing importance of spoken conversational systems for IBM’s customer care business, the remaining papers cover significant work as well: two papers on spoken language understanding, one on speech synthesis, and five others on speech recognition and speaker recognition.

Full list of accepted papers:

Speaker recognition and diarization

  1. Hagai Aronowitz, Weizhong Zhu, Masayuki Suzuki, Gakuto Kurata, Ron Hoory, “New Advances in Speaker Diarization”
  2. Shai Rozenberg, Hagai Aronowitz, Ron Hoory, “Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition”

Text-to-speech synthesis

  1. Alexander Sorin, Slava Shechtman, Ron Hoory, “Principal Style Components: Expressive Style Control and Cross-Speaker Transfer in Neural TTS”

Spoken language understanding

  1. Hong-Kwang J. Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis Lastras, “End-to-End Spoken Language Understanding without Full Transcripts”
  2. Ashish Mittal, Samarth Bharadwaj, Shreya Khare, Saneem Chemmengath, Karthik Sankaranarayanan, Brian Kingsbury, “Representation based meta-learning for few-shot spoken intent recognition”

Speech recognition

  1. Gakuto Kurata, George Saon, “Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-end Speech”
  2. Takashi Fukuda, Samuel Thomas, “Implicit Transfer of Privileged Acoustic Information in Generalized Knowledge Distillation Framework”
  3. Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury, “Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard”
  4. Alexandros Koumparoulis, Gerasimos Potamianos, Samuel Thomas, Edmilson Morais, “Resource-adaptive Deep Learning for Visual Speech Recognition”
  5. Samuel Thomas, Kartik Audhkhasi, Brian Kingsbury, “Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings”

Brian Kingsbury

Distinguished Research Staff Member, IBM Research

Ron Hoory

Senior Technical Staff Member, IBM Research