The Text to Speech service was updated on December 1, 2016. The service now includes a new voice, es-LA_SofiaVoice, for Latin American Spanish, and it now offers voice transformation for all US English voices. For more information about these and all recent updates to the service, see the Release notes.
The IBM® Text to Speech service provides an Application Programming Interface (API) that uses IBM's speech-synthesis capabilities to convert written text to natural-sounding speech. The service streams the results back to the client with minimal delay. The service offers the following features:
HTTP and WebSocket interfaces: Supports speech synthesis via both HTTP REST and WebSocket interfaces. Both interfaces enable the use of SSML for all supported languages. The WebSocket interface also supports the SSML <mark> element as well as optional word timing information for all words of the input text, which you can use to synchronize the audio with the input, for example, for use with robots. See Using the HTTP interface and Using the WebSocket interface.
Audio formats: Produces audio in Ogg format with the Opus codec (the default), Waveform Audio File Format (WAV), Free Lossless Audio Codec (FLAC), Linear 16-bit Pulse-Code Modulation (PCM), mu-law (u-law), or basic audio. See Specifying an audio format.
Voices: Synthesizes text to audio in a variety of languages, including English, French, German, Italian, Japanese, Spanish, and Brazilian Portuguese. The service offers at least one male or female voice, sometimes both, for each language, and it covers different dialects, such as US and UK English and Castilian, Latin American, and North American Spanish. The audio uses appropriate cadence and intonation. See Specifying a voice.
SSML: Accepts plain text or text that is tagged with the Speech Synthesis Markup Language (SSML), an XML-based markup language that provides annotations of text for speech synthesis applications. See Specifying SSML input.
Expressiveness: Augments SSML with an expressive element that lets you indicate a speaking style of GoodNews, Apology, or Uncertainty. Currently available only for the US English Allison voice. See Using expressive SSML.
Voice transformation: Extends SSML by adding a voice transformation element that lets you expand the range of possible voices by controlling aspects such as pitch, rate, and timbre. The service also offers two built-in virtual voices, Young and Soft. Currently available only for US English voices. See Using voice transformation SSML.
Customization: Provides a customization interface that lets you specify how the service pronounces unusual words that occur in your input. You can define pronunciations with the International Phonetic Alphabet (IPA) or IBM Symbolic Phonetic Representation (SPR). See Understanding customization.
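As a rough illustration of how the interface, audio format, voice, and SSML features fit together, the following Python sketch builds the URL for a single HTTP synthesis request. It is not taken from the API reference: the base URL, the `/v1/synthesize` path, the `voice`, `accept`, and `text` query parameter names, and the MIME types in `AUDIO_FORMATS` are all assumptions made for the example, as are the helper names `to_ssml` and `build_synthesize_url`. Only the voice name es-LA_SofiaVoice comes from this page. The sketch makes no network call and omits authentication.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Assumed base URL for the service; confirm it against the API reference.
BASE_URL = "https://stream.watsonplatform.net/text-to-speech/api"

# Assumed MIME types for the audio formats listed in the features above.
AUDIO_FORMATS = {
    "ogg": "audio/ogg;codecs=opus",   # described above as the default format
    "wav": "audio/wav",
    "flac": "audio/flac",
    "pcm": "audio/l16;rate=22050",    # the sampling rate is an assumption
    "mulaw": "audio/mulaw;rate=22050",
    "basic": "audio/basic",
}


def to_ssml(text):
    """Escape plain text and wrap it in a minimal SSML <speak> element."""
    return "<speak>" + escape(text) + "</speak>"


def build_synthesize_url(text, voice, audio="ogg"):
    """Return a URL for a hypothetical GET request to the HTTP interface.

    The parameter names (voice, accept, text) are assumptions.
    """
    if audio not in AUDIO_FORMATS:
        raise ValueError("unsupported audio format: " + audio)
    query = urlencode({
        "voice": voice,
        "accept": AUDIO_FORMATS[audio],
        "text": text,
    })
    return BASE_URL + "/v1/synthesize?" + query


if __name__ == "__main__":
    ssml = to_ssml("Hola & bienvenidos")
    print(build_synthesize_url(ssml, voice="es-LA_SofiaVoice", audio="wav"))
```

Escaping the text before wrapping it keeps characters such as & and < from being read as markup, which matters because the service accepts either plain text or SSML as input.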
For information about the pricing plans available for the service, see the Text to Speech service in Bluemix®.
The Text to Speech service can be used in voice-driven and screenless interfaces, as well as in interfaces for people with disabilities. It can be used in situations where audio is the preferred method of output, including home automation solutions, assistance tools for the vision-impaired, reading text and email messages aloud to drivers, video script narration and voice-over, and reading-based educational tools.
You can see a quick demo of the Text to Speech service in action. The demo lets you enter text from which you can generate speech with different voices, including expressiveness and transformation where supported. Applications in Watson Developer Cloud Starter Kits also demonstrate the Text to Speech service.
We are always looking to improve and learn from your experience with our services. You can ask programming-related questions in the Watson forums on Stack Overflow. You can submit comments or ask product-related questions about this service in the Watson forum on dW Answers. You can also read general posts about Watson services that are written by IBM researchers, developers, and other experts on the Watson blog.