Question & Answer
Question
Answer
The Speech to Text service accepts a maximum of 100 MB of audio data per request over the synchronous HTTP and WebSocket interfaces. With the asynchronous HTTP interface, you can submit up to 1 GB of audio data per request.
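As a rough illustration of these limits, the sketch below checks the size of an audio payload before it is sent and reports which interface could accept it. The function name and return labels are hypothetical and are not part of any IBM SDK, and treating 1 MB as 10**6 bytes is an assumption (the more conservative reading of the limits).

```python
# Limits as stated above. Treating 1 MB as 10**6 bytes is an assumption.
SYNC_LIMIT_BYTES = 100 * 10**6    # synchronous HTTP and WebSocket interfaces
ASYNC_LIMIT_BYTES = 1000 * 10**6  # asynchronous HTTP interface (1 GB)


def pick_interface(size_bytes: int) -> str:
    """Return a label for the interfaces that can accept a payload of this size.

    The labels are illustrative only; they are not service parameters.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= SYNC_LIMIT_BYTES:
        return "sync-or-websocket"  # fits every interface, including asynchronous
    if size_bytes <= ASYNC_LIMIT_BYTES:
        return "async-only"
    return "too-large"  # compress or split the audio before sending
```

In practice the size would come from something like os.path.getsize(path) on the audio file you intend to submit.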
One way to maximize the amount of audio data that you can pass with a speech recognition request is to use a format that offers compression. There are two basic types of compression: lossy and lossless. Lossless compression preserves the original audio exactly, while lossy compression achieves smaller files by discarding some audio information. For this reason, the audio format and compression algorithm that you choose can have a direct impact on the accuracy of speech recognition.
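To see why compression matters for the request limit, the following sketch estimates how many seconds of audio fit in a given number of bytes at a given bitrate. It is plain arithmetic, not a service API. The 16 kHz, 16-bit, mono PCM figures and the 32 kbps compressed bitrate are example values chosen for illustration, and container or header overhead is ignored.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def max_duration_seconds(limit_bytes: int, bitrate_bps: float) -> float:
    """Approximate seconds of audio that fit in limit_bytes at a constant bitrate."""
    if bitrate_bps <= 0:
        raise ValueError("bitrate_bps must be positive")
    return limit_bytes * 8 / bitrate_bps


# Assumption: 100 MB taken as 100 * 10**6 bytes.
LIMIT_BYTES = 100 * 10**6

# Example values only: uncompressed 16 kHz, 16-bit, mono audio.
uncompressed = max_duration_seconds(LIMIT_BYTES, pcm_bitrate_bps(16_000, 16, 1))

# Example values only: a compressed stream at a constant 32 kbps.
compressed = max_duration_seconds(LIMIT_BYTES, 32_000)

print(f"uncompressed: about {uncompressed / 60:.0f} minutes")
print(f"compressed:   about {compressed / 60:.0f} minutes")
```

Under these example values the compressed stream holds eight times as much audio in the same request, which is the trade-off to weigh against any loss in recognition accuracy.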
You can read more on data limits and on strategies to improve transcription accuracy in the Data limits and compression section of the documentation.
Document Information
Modified date:
12 January 2023
UID
ibm1KB0010954