Tutorial

This page provides instructions to help you get started quickly with the IBM® Text to Speech service by running examples in a simple cURL-based tutorial. cURL is a popular command-line interface for working with URLs. It provides an easy way to get started with the service's HTTP API.

Satisfy the prerequisites

Before starting the tutorial, make sure that you meet the following prerequisites:

  • Obtain Bluemix® credentials: For information about obtaining HTTP basic authentication credentials for working with a Watson service in Bluemix, see Getting service credentials in Bluemix. The instructions include information about registering for Bluemix to obtain an ID if you do not already have one.

  • Install the cURL executable: If you have not already installed cURL, download the version for your operating system from curl.haxx.se. You must install a version that supports the Secure Sockets Layer (SSL) protocol. Make sure that the directory containing the installed binary is included in your PATH environment variable.
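One way to confirm the second prerequisite is to inspect the output of curl --version, whose Protocols line lists https when the binary was built with SSL support. The following is a minimal sketch; has_https is a hypothetical helper name introduced only for this illustration.

```shell
# has_https is a hypothetical helper: it reads `curl --version` output on
# standard input and succeeds if the Protocols line lists https.
has_https() {
  grep -Eq '^Protocols:.*[[:space:]]https([[:space:]]|$)'
}

# Report whether curl is on PATH and whether it can speak HTTPS.
if ! command -v curl >/dev/null 2>&1; then
  echo "curl not found on PATH"
elif curl --version | has_https; then
  echo "curl found with HTTPS support"
else
  echo "curl found, but without HTTPS support"
fi
```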

Synthesize text

This section shows three simple cURL command-line examples that you can run from your local system. For more information about the parameters used in the examples, see Using the HTTP interface.

The first example generates an audio file in Waveform Audio File Format (WAV). In this command and in those that follow, replace <username>:<password> with the values of the username and password from your HTTP basic authentication credentials for the service in Bluemix. Concatenate the two values with an embedded colon to create a single string of the form username:password. Note that you must use your service credentials, not your Bluemix ID and password.
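To avoid typing the credentials into every command, you might keep them in environment variables and build the username:password string from those. The sketch below assumes two variable names, TTS_USERNAME and TTS_PASSWORD, and a helper named tts_auth; all three are hypothetical and are not defined by the service.

```shell
# tts_auth prints "username:password" built from two environment variables.
# TTS_USERNAME and TTS_PASSWORD are hypothetical names chosen for this sketch;
# set them to your service credentials, not your Bluemix ID and password.
tts_auth() {
  if [ -z "${TTS_USERNAME:-}" ] || [ -z "${TTS_PASSWORD:-}" ]; then
    echo "Set TTS_USERNAME and TTS_PASSWORD to your service credentials" >&2
    return 1
  fi
  printf '%s:%s\n' "$TTS_USERNAME" "$TTS_PASSWORD"
}
```

With the variables set, the result can be passed to cURL as -u "$(tts_auth)" in place of <username>:<password>.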

curl -X POST -u <username>:<password> \
--header "Content-Type: application/json" \
--header "Accept: audio/wav" \
--data "{\"text\":\"hello world\"}" \
--output hello_world.wav \
"https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize"

The arguments to the command specify the following values:

  • -X POST specifies that the command is to use the HTTP POST request method.

  • -u provides your HTTP basic authentication credentials (username and password) for contacting the service.

  • --header specifies an HTTP header parameter for the call to the service. The command passes values for two headers: The Content-Type header identifies the type of the input as JSON, and Accept indicates that the service is to return the audio signal as WAV output (audio/wav).

  • --data specifies the string that the service is to convert into an audio signal, "hello world". The string is passed as the value of the text field of a JSON object.

The final argument tells the command the URL to contact, in this case, the synthesize method of the Text to Speech service. The command uses default values for the service's other parameters.
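The escaped quotes in the --data argument are easy to get wrong when the text changes. As an illustration, the JSON body could be built by a small helper. This is a sketch only: json_text_body is a hypothetical name, and it escapes only backslashes and double quotes, so it is not suitable for text that contains newlines or other control characters.

```shell
# json_text_body is a hypothetical helper that wraps a string in the
# {"text":"..."} object used by the examples. It escapes backslashes and
# double quotes only; control characters are NOT handled.
json_text_body() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text":"%s"}\n' "$escaped"
}

json_text_body 'hello world'
# prints: {"text":"hello world"}
```

The output can then be supplied to cURL as --data "$(json_text_body 'hello world')".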

The --output option writes the audio returned by the service to a local file named hello_world.wav. After you run the command, you can play the file to hear the audio created from the string. Note that the service generates a WAV file that is missing the length information. Nonetheless, the file can be played in many standard audio players.
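If the request fails (for example, because of bad credentials), the output file may contain an error message rather than audio. A quick sanity check is to look for the RIFF and WAVE markers that begin every WAV file. The sketch below assumes a hypothetical helper named is_wav; it inspects only those markers and does not validate the rest of the file.

```shell
# is_wav is a hypothetical helper: it succeeds if the file starts with "RIFF"
# and carries the "WAVE" form type at byte offset 8.
is_wav() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "RIFF" ] &&
    [ "$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)" = "WAVE" ]
}

# Check the file written by the first example, if it exists.
if is_wav hello_world.wav; then
  echo "hello_world.wav looks like a WAV file"
else
  echo "hello_world.wav is missing or is not a WAV file"
fi
```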

If you experience SSL errors, you can use the cURL -k option to disable certificate verification. Use of the option with production data is discouraged.

The second example is very similar to the first. However, it omits the Accept header that requests WAV output, so the service generates output in Ogg format with the Opus codec, which is the default format for the synthesize method. The --output option writes the audio to a file named hello_world.ogg, which you can play from your local system.

curl -X POST -u <username>:<password> \
--header "Content-Type: application/json" \
--data "{\"text\":\"hello world\"}" \
--output hello_world.ogg \
"https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize"

The third example calls the GET version of the synthesize method. The command passes the -u option as before and the value GET with the -X option. It uses the following query parameters:

  • accept indicates that the audio signal is to be returned in WAV format (audio/wav).

  • text passes the string to be synthesized (the Spanish text "hola mundo", with the space URL-encoded as %20).

  • voice indicates that the text is to be spoken by the male Spanish voice named Enrique (es-ES_EnriqueVoice).

The --output option directs the command to write the output to a local file named hola_mundo.wav.
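Because the text travels in the query string, characters such as spaces must be percent-encoded, which is why "hola mundo" appears as hola%20mundo in the request URL. The following sketch shows one way to encode an arbitrary value in a POSIX shell. The urlencode function is a hypothetical helper written for this illustration; it leaves only the unreserved characters (letters, digits, and - . _ ~) unencoded.

```shell
# urlencode is a hypothetical helper that percent-encodes its first argument.
# It dumps the input as hex bytes with od, passes unreserved ASCII characters
# through unchanged, and emits %XX for every other byte.
urlencode() {
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
    [ -n "$hex" ] || continue
    case "$hex" in
      2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
        # Unreserved character: convert the hex byte back to a character.
        printf "\\$(printf '%03o' "0x$hex")" ;;
      *)
        # Any other byte: emit it as %XX with uppercase hex digits.
        printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
    esac
  done
  echo
}

urlencode "hola mundo"
# prints: hola%20mundo
```

The encoded value can be substituted into the text parameter of the URL. cURL's own --get and --data-urlencode options are another way to build an encoded query string.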

curl -X GET -u <username>:<password> \
--output hola_mundo.wav \
"https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize?accept=audio/wav&text=hola%20mundo&voice=es-ES_EnriqueVoice"