
Automatic Speech Recognition – Are All Tests Comparable?


Key Points:
– Access to appropriate domain data is the dominant factor in determining speech recognition performance. For this reason, Watson offers a cloud-based API with a general model with the option to customize.
– One system’s general speech recognition model shouldn’t be compared to other speech models that have been pre-customized with data that is not universally accessible.
– An important point of distinction is that Watson allows businesses to build their own customized model rather than contribute their data to a central database. This lets clients maintain control of their critical private and proprietary information.

Learn about Watson Speech to Text

 

Automatic speech recognition, the ability to identify words and phrases in spoken language and convert them to text in real time, provides nearly endless opportunities for the people who use these AI systems, from improving customer satisfaction and enabling remote communication between doctors and patients to improving accessibility for people who are deaf or blind. Platforms and applications built on automatic speech recognition are only as good as the system’s understanding of language, and the way this understanding is measured. Achieving human parity, meaning an error rate on par with that of a human listening to two people in conversation, has long remained a significant industry challenge – as has measuring it consistently.

For accurate comparison of systems, training and testing must be consistent
First, training and testing must be consistent, especially on highly customized data sets. Consider this: if you suffered from migraines, would you seek medical advice from a neurologist or a podiatrist? While both are skilled doctors, they are experts in differing domains, with a neurologist trained specifically on the language of the brain. Would it be fair to judge the podiatrist against the neurologist in this instance? Similarly, for speech recognition systems and the models they apply to be truly comparable, both systems must be trained and tested on the same data. Using proprietary data on which one system is already highly trained while the other is not can greatly skew outcomes.

For this reason, benchmark corpora such as SWITCHBOARD, on which IBM regularly reports, have been considered the standard controlled data sets for automatic speech recognition testing for twenty years and counting. Another industry corpus, known as “CallHome,” is also available for benchmark testing and is widely used for determining a system’s word error rate (WER). By using SWITCHBOARD or CallHome, systems are tested against the exact same data set, eliminating additional variables that may skew findings.
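Word error rate, the metric reported on corpora like SWITCHBOARD and CallHome, is simply the word-level edit distance between the system’s hypothesis and the reference transcript, divided by the length of the reference. A minimal sketch of that calculation (the function name and whitespace tokenization are illustrative; real scoring pipelines apply additional text normalization):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # One-row dynamic-programming table for word-level Levenshtein distance:
    # d[j] = edit distance between the reference prefix seen so far and hyp[:j].
    d = list(range(len(hyp) + 1))
    for i, rw in enumerate(ref, 1):
        diag, d[0] = d[0], i  # diag holds the value diagonally up-left
        for j, hw in enumerate(hyp, 1):
            diag, d[j] = d[j], min(
                d[j] + 1,            # deletion: reference word missed
                d[j - 1] + 1,        # insertion: extra hypothesis word
                diag + (rw != hw),   # substitution, or free exact match
            )
    return d[-1] / len(ref)
```

For example, a six-word reference transcribed with two dropped words scores a WER of 2/6, regardless of which system produced the hypothesis – which is exactly why a shared test set makes the numbers comparable.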

Access to domain data is a critical factor in measuring performance
Second, it is inaccurate and misleading to compare one system’s general model against speech models that have been pre-customized with data that are not universally accessible. Access to appropriate domain data is the dominant factor in determining speech recognition performance. For this reason, Watson offers a cloud-based API with a general model and the option to customize it. An important point of distinction is that Watson allows businesses to build their own customized model rather than contribute their data to a central database, so clients maintain control of their critical private and proprietary information.

Rather than providing a highly specialized data set upon which businesses build their applications, Watson provides the tools and services customers can use to build recognition capabilities in their respective domains against their own data sets. For example, Invoca uses the Watson Speech Recognition API to drive intelligent marketing insights from phone call data. Another example is Mizuho Bank in Japan, which uses the API to deliver relevant information to call center agents in real time so they are better prepared to respond to customers.

As AI technology becomes more mainstream in its applications across industries and disciplines, the capability for systems to understand natural language and interpret it accurately will be monumentally important. It will be the responsibility of all organizations involved in the advancement of AI to appropriately train and test systems against recognized, standardized methodologies in an ethical, accountable and impactful way.

Learn more about Watson Speech to Text
Watson Speech to Text converts audio and voice into written text. Use the Speech to Text API to transcribe calls in a contact center, to identify what is being discussed, when to escalate calls, and to understand content from multiple speakers. You can also use Speech to Text to create voice-controlled applications – and even customize the model to improve accuracy for the language and content you care about most, such as product names, sensitive subjects, or names of individuals.
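One way to picture this kind of customization is as biasing recognition toward domain vocabulary. The toy sketch below is illustrative only, not Watson’s actual mechanism: it reranks a recognizer’s n-best hypotheses, boosting the score of any hypothesis that contains the customer’s own terms (the scoring scheme and the `boost` value are invented for the example):

```python
def pick_best(nbest, custom_words, boost=0.5):
    """Rerank (hypothesis, score) pairs, favoring hypotheses that
    contain domain vocabulary. Returns the winning hypothesis text."""
    custom = {w.lower() for w in custom_words}

    def biased_score(item):
        text, score = item
        # Count how many words in this hypothesis are domain terms.
        hits = sum(1 for w in text.lower().split() if w in custom)
        return score + boost * hits

    return max(nbest, key=biased_score)[0]

# A general model may slightly prefer a phonetically similar but wrong
# reading; adding "Watson" as a domain term flips the ranking.
nbest = [("what sun api", 0.9), ("watson api", 0.8)]
print(pick_best(nbest, []))           # general model's top choice
print(pick_best(nbest, ["Watson"]))   # domain-biased choice
```

The point of the illustration: the same acoustic evidence yields different transcripts depending on whose vocabulary shapes the model, which is why customization against a customer’s own data matters.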

Learn more about Watson’s Speech-to-Text API and sign up for a 30-day free trial.
(This post was co-authored by Michael Picheny, George Saon and Bhavik Shah)

 


