As the name suggests, “Speech to Text (STT)” is a mechanism by which machines convert speech or audio into text. STT is also known as Automatic Speech Recognition (ASR). The technology is used in a wide variety of situations across industries.
For instance, it helps automate customer service in a call center. The STT engine converts a caller’s speech into text, which can then be analyzed to identify the intent of the call, the entities the caller refers to (specific nouns in the sentence), and the caller’s emotion and sentiment, among other things. This information helps formulate the best possible response. Another widely used scenario is transcribing audio files to analyze trends and patterns in public information about a brand or product. The former use case relies on live transcription, while the latter relies on batch transcription.
IBM Watson is a pioneer in artificial intelligence (AI), natural language processing (NLP), and speech technology, with a client list that includes several large enterprises. To cater to Indian audiences, Watson Speech to Text can now transcribe audio in Indian English and Hindi! These new services are built on next-generation models that offer very high throughput and greater transcription accuracy.
How to access IBM Watson Speech to Text
You can try the new language features for free on IBM Cloud by logging in with your existing IBM Cloud ID or by creating a free account. To access them, first add the Watson Speech to Text service to your IBM Cloud account. The “Lite” plan lets you use the service for up to 500 minutes a month at no cost.
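Because the Lite plan is capped at 500 minutes a month, it can help to estimate how much of that allowance a batch of test files would use before uploading it. The sketch below assumes WAV input (Python's standard `wave` module cannot read MP3); the helper names are hypothetical, and IBM's own usage metering may round differently, so treat the result only as an estimate.

```python
import io
import wave
from typing import BinaryIO, Iterable, Union

LITE_PLAN_MINUTES = 500  # monthly free allowance on the Lite plan, as stated above


def wav_minutes(source: Union[str, BinaryIO]) -> float:
    """Duration of a WAV file, given as a path or binary file object, in minutes."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate() / 60


def lite_minutes_left(durations: Iterable[float], already_used: float = 0.0) -> float:
    """Lite-plan minutes remaining after transcribing audio of the given durations.

    A negative result means the batch would exceed this month's allowance.
    """
    return LITE_PLAN_MINUTES - already_used - sum(durations)


def silent_wav(seconds: int, rate: int = 8000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    clip = wav_minutes(silent_wav(120))  # a 2-minute clip
    print(clip)  # 2.0
    # Ten such clips after 300 minutes already used this month:
    print(lite_minutes_left([clip] * 10, already_used=300))  # 180.0
```

In practice you would pass real file paths to `wav_minutes` instead of the synthetic `silent_wav` buffer.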
Tips for testing the new transcription service
Once you have created the service, try transcription through the Watson Speech to Text REST API. The IBM Cloud API docs are an excellent reference: they describe each API call and provide sample code to help you start testing in minutes. Watson offers software development kits (SDKs) for several programming languages; I use the Python SDK. Check out my basic template: Jupyter Notebook (Test Indian English & Hindi models-Template.ipynb).
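A minimal sketch of such a test with the Python SDK is shown below. It is not the author's notebook. It assumes the `ibm-watson` package, whose `SpeechToTextV1.recognize` method accepts the same `model` and `content_type` values used in the REST call, and a response shaped as `results` → `alternatives` → `transcript`. The helper names `make_client`, `transcript_from_result` and `transcribe` are hypothetical, and the credentials are placeholders for the values shown in your service credentials.

```python
from typing import Any, Dict


def make_client(api_key: str, service_url: str):
    """Create a Watson Speech to Text client (needs `pip install ibm-watson`)."""
    # Imported inside the function so the helpers below work without the SDK installed.
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
    from ibm_watson import SpeechToTextV1

    client = SpeechToTextV1(authenticator=IAMAuthenticator(api_key))
    client.set_service_url(service_url)  # copy this URL from the service credentials
    return client


def transcript_from_result(result: Dict[str, Any]) -> str:
    """Join the top-ranked alternative of every result in a recognize response."""
    return " ".join(
        block["alternatives"][0]["transcript"].strip()
        for block in result.get("results", [])
        if block.get("alternatives")
    )


def transcribe(client, audio_path: str, model: str, content_type: str) -> str:
    """Send one audio file to the recognize method and return the plain transcript."""
    with open(audio_path, "rb") as audio:
        response = client.recognize(audio=audio, content_type=content_type, model=model)
    return transcript_from_result(response.get_result())


# Example (placeholder credentials, hypothetical file name):
# stt = make_client("YOUR_API_KEY", "YOUR_SERVICE_URL")
# print(transcribe(stt, "call.mp3", model="hi-IN_Telephony", content_type="audio/mp3"))
```

Keeping only the first alternative of each result keeps the sketch short; the response may also carry confidence scores that are worth inspecting when comparing the Indian English and Hindi models.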
You can also use Postman by following these steps:
Make a POST request to your service URL with “/v1/recognize” appended.
Use “Basic” authentication with “apikey” as the Username and your API key value as the Password.
In “Params”, add “model” with the value “hi-IN_Telephony” or “en-IN_Telephony”, depending on the language of the audio file being transcribed.
Add the header “Content-Type” with the value “audio/xxx”, where xxx is the audio format (for example, audio/mp3).
In the Body, select “binary” and add the audio file to be transcribed.
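For readers who prefer code to Postman, the steps above amount to a single HTTP request. The sketch below builds that request with Python's standard library rather than the Watson SDK; the function names are hypothetical, and `YOUR_SERVICE_URL` and `YOUR_API_KEY` are placeholders for the values in your service credentials.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_recognize_request(service_url: str, api_key: str, audio: bytes,
                            model: str = "en-IN_Telephony",
                            audio_format: str = "mp3") -> urllib.request.Request:
    """Build the same POST /v1/recognize request that the Postman steps describe."""
    # Append /v1/recognize to the service URL and pass the model as a query parameter.
    query = urllib.parse.urlencode({"model": model})
    url = f"{service_url.rstrip('/')}/v1/recognize?{query}"
    # Basic authentication: the username is the literal string "apikey",
    # the password is your API key.
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": f"audio/{audio_format}",  # e.g. audio/mp3
    }
    # The raw audio bytes form the request body (Postman's "binary" option).
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs real credentials)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (placeholder credentials, hypothetical file name):
# with open("call.mp3", "rb") as f:
#     req = build_recognize_request("YOUR_SERVICE_URL", "YOUR_API_KEY", f.read(),
#                                   model="hi-IN_Telephony")
# print(send(req))
```

Building the request separately from sending it makes it easy to inspect the URL and headers before spending any of the Lite-plan minutes.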
If you have a particular requirement related to speech transcription or would like to understand more about the technology, reach out to me for a free consultation.
Happiest Minds Technologies, positioned as ‘Born Digital. Born Agile’, is an IT company that has capabilities spanning digital solutions, infrastructure, product engineering, and security, delivered across industry sectors.