
Speech to Text

API Reference
IBM Speech to Text API reference

Introduction

The IBM® Speech to Text service provides an API that enables you to add IBM's speech recognition capabilities to your applications. The service transcribes speech from various languages and audio formats to text with low latency. For most languages, the service supports two sampling rates, broadband and narrowband.

The Speech to Text API consists of the following groups of related calls:

  • Models includes calls that return information about the models (languages and sampling rates) available for transcription.

  • WebSockets includes a single call that establishes a persistent connection with the service over the WebSocket protocol.

  • Sessionless includes HTTP calls that provide a simple means of transcribing audio without the overhead of establishing and maintaining a session.

  • Sessions provides a collection of HTTP calls with which a client can maintain a long, multi-turn exchange, or session, with the service or establish multiple parallel conversations with a particular instance of the service.

  • Asynchronous provides a non-blocking HTTP interface for transcribing audio. You can register a callback URL to be notified of job status and, optionally, results, or you can poll the service to learn job status and retrieve results manually.

  • Custom models provides an HTTP interface for creating custom language models. The interface lets you expand the vocabulary of a base language model with domain-specific terminology.

  • Custom corpora provides an HTTP interface for managing the corpora associated with a custom language model. You add a corpus to a custom model to extract words from the corpus into the model's vocabulary.

  • Custom words provides an HTTP interface for managing individual words in a custom language model. You can add, list, and delete words from a custom model.

Usage guidelines for customization

The following usage information pertains to methods of the customization interface that is used with custom models, corpora, and words:

  • The customization interface is a beta release that is available for US English and Japanese only.

  • In all cases, you must be the owner of a custom language model to use any of the methods described in this documentation with that model.

  • Each custom language model is identified by a customization ID, which is a Globally Unique Identifier (GUID). A GUID is a hexadecimal string that has the same format as Watson service credentials: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. You specify a custom model's GUID with the customization_id parameter of calls associated with the model.
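A customization ID must follow the GUID shape described above before it can be passed as the customization_id parameter. The following helper is a hypothetical client-side check, not part of any SDK:

```javascript
// Hypothetical helper (not part of any Watson SDK): check that a string has
// the xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx hexadecimal GUID shape before
// using it as a customization_id parameter.
function isCustomizationId(id) {
  return /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(id);
}

console.log(isCustomizationId('0f2891f0-2b9e-4c3a-9d1e-123456789abc')); // true
console.log(isCustomizationId('not-a-guid'));                           // false
```

Rejecting malformed IDs locally avoids a round trip that would fail with a 400-level response.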

HTTP API endpoint


https://stream.watsonplatform.net/speech-to-text/api

WebSocket API endpoint


wss://stream.watsonplatform.net/speech-to-text/api

The code examples on this tab use the client library that is provided for Node.js.

GitHub

https://github.com/watson-developer-cloud/node-sdk

Node Package Manager


npm install watson-developer-cloud

The code examples on this tab use the client-side library that is provided for Java.

GitHub

https://github.com/watson-developer-cloud/java-sdk

Maven


<dependency>
  <groupId>com.ibm.watson.developer_cloud</groupId>
  <artifactId>java-sdk</artifactId>
  <version>3.5.3</version>
</dependency>

Gradle


compile 'com.ibm.watson.developer_cloud:java-sdk:3.5.3'

Synchronous and asynchronous requests

The Java SDK supports both synchronous (blocking) and asynchronous (non-blocking) execution of all methods. All methods return an instance of the Java ServiceCall interface.

  • To call a method synchronously, use the execute method of the ServiceCall interface. You can also call the execute method directly from an instance of the service, as shown in Get models. Note that the method can throw an unchecked RuntimeException.

  • To call a method asynchronously, use the enqueue method of the ServiceCall interface to receive a callback when the response arrives. The ServiceCallback interface of the method's argument provides onResponse and onFailure methods that you override to handle the callback.

Example synchronous request


ServiceCall<List<SpeechModel>> call = service.getModels();
List<SpeechModel> models = call.execute();

Example asynchronous request


ServiceCall<List<SpeechModel>> call = service.getModels();
call.enqueue(new ServiceCallback<List<SpeechModel>>() {
  @Override public void onResponse(List<SpeechModel> models) {
    . . .
  }
  @Override public void onFailure(Exception e) {
    . . .
  }
});

More information

An interactive tool for testing calls to the API and viewing live responses from the service is available in the Speech to Text API explorer. Descriptions of Node classes referred to in this reference are available in the Node documentation for the Watson Developer Cloud Node.js SDK. Descriptions of Java classes referred to in this reference are available in the Javadoc for the Watson Developer Cloud Java SDK. Detailed information about using the service is available at Using the Speech to Text service.

Authentication

You authenticate to the Speech to Text API by providing the username and password that are provided in the service credentials for the service instance that you want to use. The API uses HTTP basic authentication.

After creating an instance of the Speech to Text service, select Service Credentials from the navigation on the left side of its dashboard page to see the username and password that are associated with the instance. For more information, see Obtaining credentials for Watson services.

Applications can also use tokens to establish authenticated communications with Watson services without embedding their service credentials in every call. You write an authentication proxy in Bluemix to obtain a token for your client application, which can then use the token to call the service directly. You use your service credentials to obtain a token for that service. For more information, see Using tokens with Watson services.
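As a sketch of the token workflow described above, the proxy's first step is to build the token-request URL for the service it wants to call. The authorization endpoint shown here is the one historically documented for watsonplatform.net services; verify it against your service documentation:

```javascript
// Sketch: build the URL that an authentication proxy would request (with
// basic auth using the service credentials) to obtain a token for a
// specific Watson service. The authorization endpoint is an assumption
// based on the historical watsonplatform.net token service.
function tokenRequestUrl(serviceUrl) {
  return 'https://stream.watsonplatform.net/authorization/api/v1/token' +
    '?url=' + encodeURIComponent(serviceUrl);
}

// The proxy would GET this URL as {username}:{password} and return the
// token body to the client application.
console.log(tokenRequestUrl('https://stream.watsonplatform.net/speech-to-text/api'));
```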

Replace {username} and {password} with your service credentials. For Java, use either of the two constructors shown.


curl -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/{method}"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

SpeechToText service = new SpeechToText("{username}", "{password}");

Request logging

By default, Bluemix collects data from all requests and uses the data to improve the Watson services. If you do not want to share your data, you can disable request logging by setting the X-Watson-Learning-Opt-Out header to true for each request; data is not collected for any request that includes the header. With the Node.js and Java SDKs, you can instead set the X-Watson-Learning-Opt-Out header to true when you create the service instance, as shown in the examples; data is then not collected for any calls made by that instance of the service. For more information, see Controlling request logging for Watson services.


curl -u "{username}":"{password}" \
--header "X-Watson-Learning-Opt-Out: true" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/{method}"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}',
  headers: {
    'X-Watson-Learning-Opt-Out': 'true'
  }
});

Map<String, String> headers = new HashMap<String, String>();
headers.put("X-Watson-Learning-Opt-Out", "true");

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");
service.setDefaultHeaders(headers);

Response handling

The Speech to Text service uses standard HTTP response codes to indicate whether a method completed successfully. A 200-level response always indicates success. A 300-level response indicates that the requested resource has not been modified. A 400-level response indicates a failure with the request. A 500-level response typically indicates an internal server error. Response codes are listed with the individual calls.

Response codes that indicate success are not readily available with the Node.js SDK. In general, the absence of an error object indicates a 200-level success response. For errors, the response code is indicated in the error object that is returned.

The Java SDK raises equivalent exceptions, which are listed with the individual methods. The exceptions include the error message returned by the service. All methods can throw the following common exceptions.

Common exceptions thrown

Exception Description
IllegalArgumentException An illegal argument was passed to a method that accepts one or more arguments.
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)

Error format

Name Description
error string Description of the error.
code integer HTTP status code.
code_description string Response message that describes the problem.
Name Description
Exception string The name of the exception that was raised.
status integer The HTTP status code.
error string A description of the error.

Example error


{
  "error": "Model en-US_Broadband not found",
  "code": 404,
  "code_description": "No Such Resource"
}

{
  Error: Model en-US_Model not found
  . . .
  code: 404,
  error: 'Model en-US_Model not found',
  code_description: 'No Such Resource'
}

SEVERE: GET https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_Broadband, status: 404, error: Model en-US_Broadband not found
Exception in thread "main" com.ibm.watson.developer_cloud.service.exception.NotFoundException: Model en-US_Broadband not found
   . . .
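The error object that the Node SDK passes to callbacks carries the fields shown in the Node example error above (code, error, code_description). A hypothetical handler, not part of the SDK, might branch on those fields like this:

```javascript
// Sketch of a callback error handler keyed on the fields of the error
// object shown in the Node example error (code, error, code_description).
function describeError(err) {
  if (!err) {
    return 'success';          // no error object: a 200-level response
  }
  if (err.code === 401) {
    return 'invalid credentials: ' + err.error;
  }
  if (err.code === 404) {
    return 'not found: ' + err.error;
  }
  return 'error ' + err.code + ': ' + err.error;
}

console.log(describeError({ code: 404, error: 'Model en-US_Model not found',
                            code_description: 'No Such Resource' }));
// not found: Model en-US_Model not found
```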

Models

Get models

Retrieves a list of all models available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things.


GET /v1/models

getModels(params, callback())

ServiceCall<List<SpeechModel>> getModels()

Request

No arguments.

Example request


curl -X GET -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/models"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

speech_to_text.getModels(null, function(error, models) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(models, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

List<SpeechModel> models = service.getModels().execute();
System.out.println(models);

Response

ModelSet
Name Description
models object[ ] An array of Model objects that provides information about the available models.

Returns a List of Java SpeechModel objects.

Model (Java SpeechModel object)
Name Description
name string The name of the model for use as an identifier in calls to the service (for example, en-US_BroadbandModel).
rate integer The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
description string A brief description of the model.
sessions string The URI for the model for use with the /v1/sessions method. (Returned only for requests for a single model; see Get a model.)
language string The language identifier for the model (for example, en-US).
url string The URI of the model.
supported_features object A SupportedFeatures object that describes the additional service features supported with the model.
SupportedFeatures
Name Description
custom_language_model boolean Indicates whether the customization interface can be used with the language model.
speaker_labels boolean Indicates whether the speaker_labels parameter can be used with the language model.

Response codes

Status Description
200 OK The request succeeded.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
415 Unsupported Media Type The request specified an unacceptable media type.

Exceptions thrown

Exception Description
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)
UnsupportedException The request specified an unacceptable media type. (HTTP response code 415.)

Example response


{
  "models": [
    {
      "name": "fr-FR_BroadbandModel",
      "language": "fr-FR",
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/fr-FR_BroadbandModel",
      "rate": 16000,
      "supported_features": {
        "custom_language_model": false,
        "speaker_labels": false
      },
      "description": "French broadband model."
    },
    {
      "name": "en-US_NarrowbandModel",
      "language": "en-US",
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_NarrowbandModel",
      "rate": 8000,
      "supported_features": {
        "custom_language_model": true,
        "speaker_labels": true
      },
      "description": "US English narrowband model."
    },
    . . .
  ]
}

[
  {
    "name": "fr-FR_BroadbandModel",
    "rate": 16000,
    "description": "French broadband model."
  },
  {
    "name": "en-US_NarrowbandModel",
    "rate": 8000,
    "description": "US English narrowband model."
  },
  {
    "name": "pt-BR_BroadbandModel",
    "rate": 16000,
    "description": "Brazilian Portuguese broadband model."
  },
  {
    "name": "ja-JP_NarrowbandModel",
    "rate": 8000,
    "description": "Japanese narrowband model."
  },
  . . .
]

Get a model

Retrieves information about a single specified model that is available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things.


GET /v1/models/{model_id}

getModel(params, callback())

ServiceCall<SpeechModel> getModel(String modelName)

Request

Parameter Type Description
model_id path string The identifier of the desired model:
  • ar-AR_BroadbandModel
  • en-UK_BroadbandModel
  • en-UK_NarrowbandModel
  • en-US_BroadbandModel
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
Parameter Description
model_id modelName string The identifier of the desired model:
  • ar-AR_BroadbandModel
  • en-UK_BroadbandModel
  • en-UK_NarrowbandModel
  • en-US_BroadbandModel
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel

Example request


curl -X GET -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_BroadbandModel"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  model_id: 'en-US_BroadbandModel'
};

speech_to_text.getModel(params, function(error, model) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(model, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

SpeechModel model = service.getModel("en-US_BroadbandModel").execute();
System.out.println(model);

Response

Returns a single instance of a Model object with results for the specified model.

Returns a single Java SpeechModel object for the specified model. The information is the same as that described for the JSON Model object.

Response codes

Status Description
200 OK The request succeeded.
404 Not Found The specified model_id was not found.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
415 Unsupported Media Type The request specified an unacceptable media type.

Exceptions thrown

Exception Description
NotFoundException The specified modelName was not found. (HTTP response code 404.)
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)
UnsupportedException The request specified an unacceptable media type. (HTTP response code 415.)

Example response


{
  "rate": 16000,
  "name": "en-US_BroadbandModel",
  "language": "en-US",
  "sessions": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions?model=en-US_BroadbandModel",
  "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_BroadbandModel",
  "supported_features": {
    "custom_language_model": true,
    "speaker_labels": false
  },
  "description": "US English broadband model."
}

{
  "name": "en-US_BroadbandModel",
  "rate": 16000,
  "sessions": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions?model=en-US_BroadbandModel",
  "description": "US English broadband model."
}

WebSockets

Recognize audio

Sends audio and returns transcription results for recognition requests over a WebSocket connection. Requests and responses are enabled over a single TCP connection that abstracts much of the complexity of the request to offer efficient implementation, low latency, high throughput, and an asynchronous response. By default, only final results are returned for any request; to enable interim results, set the interim_results parameter (interimResults in the Java SDK) to true.

The service imposes a data size limit of 100 MB per connection. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.
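A client streaming audio from an arbitrary source may want to track how much it has sent against the 100 MB per-connection limit described above. This guard is a hypothetical client-side helper, not an SDK call:

```javascript
// Sketch (not an SDK call): track data sent over one WebSocket connection
// against the service's 100 MB per-connection limit.
const MAX_BYTES = 100 * 1024 * 1024;

function withinDataLimit(sentSoFar, chunkLength) {
  return sentSoFar + chunkLength <= MAX_BYTES;
}

console.log(withinDataLimit(0, 5 * 1024 * 1024));                // true
console.log(withinDataLimit(99 * 1024 * 1024, 2 * 1024 * 1024)); // false
```

When the next chunk would exceed the limit, the client can send the stop message, open a new connection, and continue streaming there.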


/v1/recognize

RecognizeStream createRecognizeStream(params)

void recognizeUsingWebSocket(InputStream audio, RecognizeOptions options,
  RecognizeCallback callback)

Request

The client establishes a connection with the service by using the WebSocket constructor to create an instance of a WebSocket connection object. The constructor sets the following basic parameters for the connection and for all recognition requests sent over it.

Parameter Type Description
X-Watson-Authorization-Token header string Provides an authentication token for the service. The token is used instead of service credentials. You must pass a valid token via either this header or the watson-token query parameter. For more information, see Authentication.
watson-token query string Provides an authentication token for the service. The token is used instead of service credentials. You must pass a valid token via either this query parameter or the X-Watson-Authorization-Token header. For more information, see Authentication.
model query string The identifier of the model to be used for all recognition requests sent over the connection:
  • ar-AR_BroadbandModel
  • en-UK_BroadbandModel
  • en-UK_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
customization_id query string The GUID of a custom language model that is to be used for all requests sent over the connection. The base language model of the specified custom language model must match the model specified with the model parameter. By default, no custom model is used.
x-watson-learning-opt-out query boolean Indicates whether to opt out of data collection for requests sent over the connection. If true, no data is collected; if false (the default), data is collected for all requests and results. You can also opt out of request logging by passing a value of true with the X-Watson-Learning-Opt-Out request header; see Request logging.

The client initiates and manages recognition requests by sending JSON-formatted text messages to the service over the connection. The client sends the audio data to be transcribed as a binary message (blob).

Parameter Description
action string Indicates the action to be performed:
  • start initiates a recognition request. The message must include the content-type parameter; it can also include any optional parameters described in this table. After sending this text message, the client sends the audio data as a binary message (blob).
  • stop indicates that all audio data for the request has been sent to the service.
  • no-op avoids the 30-second session timeout by touching the session and keeping it alive.
content-type string The MIME type of the audio:
  • audio/flac
  • audio/l16 (Also specify the sampling rate and number of channels; for example, audio/l16; rate=48000; channels=2. Ensure that the rate matches the rate at which the audio is captured and specify a maximum of 16 channels.)
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/ogg;codecs=opus
  • audio/mulaw (Also specify the sampling rate at which the audio is captured.)
  • audio/basic (Use audio in this format only with narrowband models.)
For additional information about the supported audio formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
continuous boolean Indicates whether multiple final results that represent consecutive phrases separated by long pauses are returned. If true, the service transcribes the entire audio stream until it terminates rather than stopping at the first half-second of non-speech; if false (the default), recognition ends after the first end-of-speech (EOS) incident is detected.
inactivity_timeout integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
interim_results boolean Indicates whether the service is to return interim results. If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false (the default), the response is a single SpeechRecognitionEvent with final results only.
keywords string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
word_alternatives_threshold float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false.
timestamps boolean Indicates whether time alignment is returned for each word. The default is false.
profanity_filter boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. If true, smart formatting is performed; if false (the default), no formatting is performed. Applies to US English transcription only.
speaker_labels boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. If true, speaker labels are returned; if false (the default), they are not. Speaker labels can be returned only for the following language models:
  • en-US_NarrowbandModel
  • es-ES_NarrowbandModel
  • ja-JP_NarrowbandModel
Setting speaker_labels to true forces the continuous and timestamps parameters to be true, as well, regardless of whether the user specifies false for the parameters. For more information, see Speaker labels.
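A few of the rules in the table above can be checked on the client before a start message is sent: content-type is required, and keywords_threshold must lie between 0 and 1 and be accompanied by one or more keywords. The following validator is a hypothetical client-side sketch of those rules, not something the service or SDKs provide:

```javascript
// Hypothetical client-side check (not performed by the service or SDKs) of
// a WebSocket start message against the parameter table above.
function validateStartMessage(msg) {
  const errors = [];
  if (msg.action !== 'start') {
    errors.push('action must be "start"');
  }
  if (!msg['content-type']) {
    errors.push('content-type is required');
  }
  if (msg.keywords_threshold !== undefined) {
    if (msg.keywords_threshold < 0 || msg.keywords_threshold > 1) {
      errors.push('keywords_threshold must be between 0 and 1');
    }
    if (!Array.isArray(msg.keywords) || msg.keywords.length === 0) {
      errors.push('keywords_threshold requires one or more keywords');
    }
  }
  return errors;
}

console.log(validateStartMessage({
  action: 'start',
  'content-type': 'audio/flac',
  keywords: ['tornado'],
  keywords_threshold: 0.5
})); // [] (no errors)
```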

Pass the audio stream to be transcribed via the method's audio argument, and pass a Java BaseRecognizeCallback object to handle events from the WebSocket connection via the callback argument. Pass all other parameters for the recognition request as a Java RecognizeOptions object via the options argument.

Parameter Description
audio object An InputStream object that passes the audio to be transcribed in the format specified by the contentType parameter.
callback object A Java BaseRecognizeCallback object that implements the RecognizeCallback interface to handle events from the WebSocket connection. Override the definitions of the object's default methods to respond to events as needed by your application.
content_type contentType string The MIME type of the audio:
  • audio/flac
  • audio/l16 (Also specify the sampling rate and number of channels; for example, audio/l16; rate=48000; channels=2. Ensure that the rate matches the rate at which the audio is captured and specify a maximum of 16 channels.)
  • audio/wav (Provide audio with a maximum of nine channels.) The Node SDK uses a default value of audio/wav.
  • audio/ogg;codecs=opus
  • audio/mulaw (Also specify the sampling rate at which the audio is captured.)
  • audio/basic (Use audio in this format only with narrowband models.)
For additional information about the supported audio formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
model string The identifier of the model to be used for the recognition request:
  • ar-AR_BroadbandModel
  • en-UK_BroadbandModel
  • en-UK_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
continuous boolean Indicates whether multiple final results that represent consecutive phrases separated by long pauses are returned. If true, the service transcribes the entire audio stream until it terminates rather than stopping at the first half-second of non-speech; if false (the default), recognition ends after the first end-of-speech (EOS) incident is detected. The Node SDK uses a default value of true.
inactivity_timeout inactivityTimeout integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. The Node SDK uses a default value of 600.
interim_results interimResults boolean Indicates whether the service is to return interim results. If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false (the default), the response is a single SpeechRecognitionEvent with final results only. The Node SDK uses a default value of true.
keywords string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold keywordsThreshold float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives maxAlternatives integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned. The Node SDK uses a default value of 3.
word_alternatives_threshold wordAlternativesThreshold float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence wordConfidence boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false. The Node SDK uses a default value of true.
timestamps boolean Indicates whether time alignment is returned for each word. The default is false. The Node SDK uses a default value of true.
profanity_filter profanityFilter boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting smartFormatting boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. If true, smart formatting is performed; if false (the default), no formatting is performed. Applies to US English transcription only.
speaker_labels speakerLabels boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. If true, speaker labels are returned; if false (the default), they are not. Speaker labels can be returned only for the following language models:
  • en-US_NarrowbandModel
  • es-ES_NarrowbandModel
  • ja-JP_NarrowbandModel
Setting speaker_labels speakerLabels to true forces the continuous and timestamps parameters to be true, as well, regardless of whether the user specifies false for the parameters. For more information, see Speaker labels.
X-Watson-Learning-Opt-Out boolean Indicates whether to opt out of data collection for the call. If true, no data is collected from the call; if false (the default), data is collected. See Request logging.
watson-token string Provides an authentication token for the service as an alternative to providing service credentials. For more information, see Authentication.

Example request


var token = "{authentication-token}";
var wsURI = "wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize?watson-token=" + token
     + '&model=en-US_BroadbandModel';

var websocket = new WebSocket(wsURI);
websocket.onopen = function(evt) { onOpen(evt) };
websocket.onclose = function(evt) { onClose(evt) };
websocket.onmessage = function(evt) { onMessage(evt) };
websocket.onerror = function(evt) { onError(evt) };

function onOpen(evt) {
  var message = {
    action: 'start',
    'content-type': 'audio/flac',
    continuous: true,
    'interim_results': true,
    'max-alternatives': 3,
    keywords: ['colorado', 'tornado', 'tornadoes'],
    'keywords_threshold': 0.5
  };
  websocket.send(JSON.stringify(message));

  // Prepare and send the audio file.
  websocket.send(blob);

  websocket.send(JSON.stringify({action: 'stop'}));
}

function onMessage(evt) {
  console.log(evt.data);
}

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var fs = require('fs');

var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  model: 'en-US_BroadbandModel',
  content_type: 'audio/flac',
  continuous: true,
  'interim_results': true,
  'max_alternatives': 3,
  'word_confidence': false,
  timestamps: false,
  keywords: ['colorado', 'tornado', 'tornadoes'],
  'keywords_threshold': 0.5
};

// Create the stream.
var recognizeStream = speech_to_text.createRecognizeStream(params);

// Pipe in the audio.
fs.createReadStream('audio-file.flac').pipe(recognizeStream);

// Pipe out the transcription to a file.
recognizeStream.pipe(fs.createWriteStream('transcription.txt'));

// Get strings instead of buffers from 'data' events.
recognizeStream.setEncoding('utf8');

// Listen for events.
recognizeStream.on('results', function(event) { onEvent('Results:', event); });
recognizeStream.on('data', function(event) { onEvent('Data:', event); });
recognizeStream.on('error', function(event) { onEvent('Error:', event); });
recognizeStream.on('close', function(event) { onEvent('Close:', event); });
recognizeStream.on('speaker_labels', function(event) { onEvent('Speaker_Labels:', event); });

// Display events on the console.
function onEvent(name, event) {
  console.log(name, JSON.stringify(event, null, 2));
}

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognizeOptions options = new RecognizeOptions.Builder()
  .model("en-US_BroadbandModel")
  .contentType("audio/flac").continuous(true)
  .interimResults(true).maxAlternatives(3)
  .keywords(new String[]{"colorado", "tornado", "tornadoes"})
  .keywordsThreshold(0.5).build();

BaseRecognizeCallback callback = new BaseRecognizeCallback() {
  @Override
  public void onTranscription(SpeechResults speechResults) {
    System.out.println(speechResults);
  }

  @Override
  public void onDisconnected() {
    System.exit(0);
  }
};

try {
  service.recognizeUsingWebSocket
    (new FileInputStream("audio-file.flac"), options, callback);
}
catch (FileNotFoundException e) {
  e.printStackTrace();
}

Response

Successful recognition returns one or more instances of a SpeechRecognitionEvent object depending on the input and the value of the interim_results parameter.

Returns a Java SpeechResults object that contains the results that are provided in a JSON SpeechRecognitionEvent object. The response includes one or more instances of the object depending on the input and the value of the interimResults parameter.

Response handling

The WebSocket constructor returns an instance of a WebSocket connection object. You assign application-specific handler functions to the following properties of the object to handle events associated with the connection. Each event handler must accept a single argument: the event from the connection that causes it to execute.

Event Description
onopen Status of the connection's opening.
onmessage Response messages for the connection, including the results of the request as one or more JSON SpeechRecognitionEvent objects.
onerror Errors for the connection or request.
onclose Status of the connection's closing.

The createRecognizeStream method returns a RecognizeStream object. You use the object's on method to define event handlers that capture the following events associated with the connection and the recognition request. For more information about handling stream events with Node.js, see the Node.js Documentation.

Event Description
results Interim and final results for the request as a JSON SpeechRecognitionEvent object.
data Final transcription results for the request.
speaker_labels Speaker label results for the request as a JSON SpeakerLabelsResult object.
error Errors for the connection or request.
close Status of the connection's closing.

The callback parameter of the recognizeUsingWebSocket method accepts a Java object of type BaseRecognizeCallback, which implements the RecognizeCallback interface to handle events from the WebSocket connection. You override the definitions of the following default empty methods of the object to handle events associated with the connection and the recognition request.

Method Description
void onConnected() Status of the connection's opening.
void onTranscription(SpeechResults speechResults) Results for the request.
void onError(Exception e) Errors for the connection or request.
void onDisconnected() Status of the connection's closing.

The connection can produce the following return codes.

Return code Description
1000 The connection closed normally.
1002 The service is closing the connection due to a protocol error.
1006 The connection was closed abnormally.
1009 The frame size exceeded the 4 MB limit.
1011 The service is terminating the connection because it encountered an unexpected condition that prevents it from fulfilling the request.
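
The return codes above can be handled in the connection's onclose handler. The following sketch is illustrative only; the describeCloseCode helper is not part of the service or any SDK:

```javascript
// Illustrative helper: map WebSocket close codes from the service to
// the descriptions in the table above.
function describeCloseCode(code) {
  var descriptions = {
    1000: 'The connection closed normally.',
    1002: 'The service is closing the connection due to a protocol error.',
    1006: 'The connection was closed abnormally.',
    1009: 'The frame size exceeded the 4 MB limit.',
    1011: 'The service is terminating the connection because it ' +
          'encountered an unexpected condition that prevents it from ' +
          'fulfilling the request.'
  };
  return descriptions[code] || 'Unknown close code: ' + code;
}

// Usage with the connection object from the earlier example:
// websocket.onclose = function(evt) {
//   console.log(describeCloseCode(evt.code));
// };
```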

Example response


{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several torn "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}
{
  . . .
}
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}
{
  "results": [
    {
      "keywords_result": {
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.94,
            "confidence": 0.913,
            "end_time": 5.62
          }
        ],
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "confidence": 1.0,
            "end_time": 2.15
          }
        ]
      },
      "alternatives": [
        {
          "confidence": 0.891,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

Results: {
  "results": [
    {
      "alternatives": [
        {
          "transcript": "so "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}
Results: {
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several to "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}
. . .
Results: {
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado once "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}
Results: {
  "results": [
    {
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "confidence": 1,
            "end_time": 2.15
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.94,
            "confidence": 0.913,
            "end_time": 5.62
          }
        ]
      },
      "alternatives": [
        {
          "confidence": 0.891,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
Data: "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
Close: 1000

{
  "result_index": 0,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "so "
        }
      ]
    }
  ]
}
{
  "result_index": 0,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "several tornadoes to "
        }
      ]
    }
  ]
}
. . .
{
  "result_index": 0,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado one son "
        }
      ]
    }
  ]
}
{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "confidence": 0.891,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "end_time": 2.15,
            "confidence": 1.0
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.94,
            "end_time": 5.62,
            "confidence": 0.913
          }
        ]
      }
    }
  ]
}

Sessionless

Recognize audio

Sends audio and returns transcription results for a sessionless recognition request. By default, the method returns only final results; to receive interim results, use the Sessions or WebSockets interface or, with the Java SDK, set the interimResults parameter to true. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

You specify the parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. This method is preferred to the multipart approach for submitting a sessionless recognition request.

For requests to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (response code 408) if it receives no data chunk and has no audio to transcribe for 30 seconds. The server also closes the connection (response code 400) if it detects no speech for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

This call is the same as the session-based recognize call, but it omits the session_id sessionId parameter and includes the model parameter.


POST /v1/recognize

recognize(params, callback())

ServiceCall<SpeechResults> recognize(File audio)
ServiceCall<SpeechResults> recognize(File audio, RecognizeOptions options)

Request

Parameter Type Description
Content-Type header string The MIME type of the audio:
  • audio/flac
  • audio/l16 (Also specify the sampling rate and number of channels; for example, audio/l16; rate=48000; channels=2. Ensure that the rate matches the rate at which the audio is captured and specify a maximum of 16 channels.)
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/ogg;codecs=opus
  • audio/mulaw (Also specify the sampling rate at which the audio is captured.)
  • audio/basic (Use audio in this format only with narrowband models.)
For additional information about the supported audio formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
Transfer-Encoding header string Must be set to chunked to send the audio in streaming mode. The data does not need to exist fully before being streamed to the service.
body body stream The audio to be transcribed in the format specified by the Content-Type header. With cURL, include a separate --data-binary option for each file of the request.
model query string The identifier of the model to be used for the recognition request:
  • ar-AR_BroadbandModel
  • en-UK_BroadbandModel
  • en-UK_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
customization_id query string The GUID of a custom language model that is to be used with the request. The base language model of the specified custom language model must match the model specified with the model parameter. By default, no custom model is used.
continuous query boolean Indicates whether multiple final results that represent consecutive phrases separated by long pauses are returned. If true, such phrases are returned; if false (the default), recognition ends after the first end-of-speech (EOS) incident is detected.
inactivity_timeout query integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 response code. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
keywords query string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold query float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives query integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
word_alternatives_threshold query float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence query boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false.
timestamps query boolean Indicates whether time alignment is returned for each word. The default is false.
profanity_filter query boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting query boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. If true, smart formatting is performed; if false (the default), no formatting is performed. Applies to US English transcription only.
speaker_labels query boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. If true, speaker labels are returned; if false (the default), they are not. Speaker labels can be returned only for the following language models:
  • en-US_NarrowbandModel
  • es-ES_NarrowbandModel
  • ja-JP_NarrowbandModel
Setting speaker_labels to true forces the continuous and timestamps parameters to be true, as well, regardless of whether the user specifies false for the parameters. For more information, see Speaker labels.
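
As the example request later in this section shows, the keywords parameter is passed in the URL as a single comma-separated value of quoted strings, which is why the URL contains percent-encoded quotation marks (%22) and commas (%2C). The following sketch shows how such a query string can be assembled; the buildRecognizeQuery helper is illustrative and not part of any SDK:

```javascript
// Illustrative sketch: build the query string for a sessionless
// /v1/recognize request. An array value (such as keywords) is joined
// into one comma-separated list of quoted strings before encoding.
function buildRecognizeQuery(params) {
  return Object.keys(params).map(function(name) {
    var value = params[name];
    if (Array.isArray(value)) {
      value = value.map(function(k) { return '"' + k + '"'; }).join(',');
    }
    return name + '=' + encodeURIComponent(value);
  }).join('&');
}

var query = buildRecognizeQuery({
  timestamps: true,
  keywords: ['colorado', 'tornado', 'tornadoes'],
  keywords_threshold: 0.5,
  continuous: true
});
// query is:
// 'timestamps=true&keywords=%22colorado%22%2C%22tornado%22%2C%22tornadoes%22&keywords_threshold=0.5&continuous=true'
```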

Pass the audio file to be transcribed via the method's audio argument. Pass all other parameters for the recognition request as a Java RecognizeOptions object via the options argument.

Parameter Description
audio stream File The audio to be transcribed in the format specified by the content_type contentType parameter.
content_type string contentType string The MIME type of the audio:
  • audio/flac
  • audio/l16 (Also specify the sampling rate and number of channels; for example, audio/l16; rate=48000; channels=2. Ensure that the rate matches the rate at which the audio is captured and specify a maximum of 16 channels.)
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/ogg;codecs=opus
  • audio/mulaw (Also specify the sampling rate at which the audio is captured.)
  • audio/basic (Use audio in this format only with narrowband models.)
If you omit the contentType, the method attempts to derive it from the extension of the audio file. For additional information about the supported audio formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
model string The identifier of the model to be used for the recognition request:
  • ar-AR_BroadbandModel
  • en-UK_BroadbandModel
  • en-UK_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
customization_id customizationId string The GUID of a custom language model that is to be used with the request. The base language model of the specified custom language model must match the model specified with the model parameter. By default, no custom model is used.
continuous boolean Indicates whether multiple final results that represent consecutive phrases separated by long pauses are returned. If true, such phrases are returned; if false (the default), recognition ends after the first end-of-speech (EOS) incident is detected.
inactivity_timeout inactivityTimeout integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 response code. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
keywords string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold keywordsThreshold float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives maxAlternatives integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
word_alternatives_threshold wordAlternativesThreshold float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence wordConfidence boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false.
timestamps boolean Indicates whether time alignment is returned for each word. The default is false.
profanity_filter profanityFilter boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting smartFormatting boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. If true, smart formatting is performed; if false (the default), no formatting is performed. Applies to US English transcription only.
speaker_labels speakerLabels boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. If true, speaker labels are returned; if false (the default), they are not. Speaker labels can be returned only for the following language models:
  • en-US_NarrowbandModel
  • es-ES_NarrowbandModel
  • ja-JP_NarrowbandModel
Setting speaker_labels speakerLabels to true forces the continuous and timestamps parameters to be true, as well, regardless of whether the user specifies false for the parameters. For more information, see Speaker labels.
interimResults boolean Indicates whether the service is to return interim results. If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.

Example request


curl -X POST -u "{username}":"{password}" \
--header "Content-Type: audio/flac" \
--data-binary "@audio-file1.flac" \
--data-binary "@audio-file2.flac" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?timestamps=true&word_alternatives_threshold=0.9&keywords=%22colorado%22%2C%22tornado%22%2C%22tornadoes%22&keywords_threshold=0.5&continuous=true"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var fs = require('fs');

var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var files = ['audio-file1.flac', 'audio-file2.flac'];
files.forEach(function(file) {
  var params = {
    audio: fs.createReadStream(file),
    content_type: 'audio/flac',
    timestamps: true,
    word_alternatives_threshold: 0.9,
    keywords: ['colorado', 'tornado', 'tornadoes'],
    keywords_threshold: 0.5,
    continuous: true
  };

  speech_to_text.recognize(params, function(error, transcript) {
    if (error)
      console.log('Error:', error);
    else
      console.log(JSON.stringify(transcript, null, 2));
  });
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognizeOptions options = new RecognizeOptions.Builder()
  .contentType("audio/flac").timestamps(true)
  .wordAlternativesThreshold(0.9)
  .keywords(new String[]{"colorado", "tornado", "tornadoes"})
  .keywordsThreshold(0.5).continuous(true).build();

String[] files = {"audio-file1.flac", "audio-file2.flac"};
for (String file : files) {
  SpeechResults results = service.recognize(new File(file), options).execute();
  System.out.println(results);
}

Response

Returns one or more instances of a SpeechRecognitionEvent object depending on the input.

Returns a Java SpeechResults object that contains the results that are provided in a JSON SpeechRecognitionEvent object. The response includes one or more instances of the object depending on the input and the value of the interimResults parameter.

SpeechRecognitionEvent (Java SpeechResults object)
Name Description
results object[ ] An array of SpeechRecognitionResult objects that can include interim and final results. Final results are guaranteed not to change; interim results might be replaced by further interim results and final results. The service periodically sends updates to the results list; the result_index is set to the lowest index in the array that has changed; it is incremented for new results.
result_index integer An index that indicates a change point in the results array.
speaker_labels object[ ] An array of SpeakerLabelsResult objects that identifies which words were spoken by which speakers in a multi-person exchange. Returned in the response only if speaker_labels is true.
warnings string[ ] An array of warning messages about invalid query parameters or JSON fields included with the request. Each warning includes a descriptive message and a list of invalid argument strings, for example, "Unknown arguments: invalid_arg_1, invalid_arg_2." The request succeeds despite the warnings.
SpeechRecognitionResult (Java Transcript object)
Name Description
final boolean Indicates whether the results for this utterance are final: true if the results are not updated further; false if they are interim results that can still change.
alternatives object[ ] An array of SpeechRecognitionAlternative objects that provide alternative transcripts.
keywords_result object A KeywordResults object that provides a dictionary (or associative array) whose keys are the strings specified for keywords if both that parameter and keywords_threshold are specified. A keyword for which no matches are found is omitted from the object. The object itself is omitted if no keywords are found.
keywords_result Map A Map of strings to Lists of KeywordResult objects. The Map provides a dictionary (or associative array) of keywords to their matches in the audio.
  • Each string is a key that represents one of the keywords if both that parameter and keywords_threshold are specified. A keyword for which no matches are found is omitted from the Map.
  • A List of KeywordResult objects is returned for each keyword for which at least one match is found. Each element of the list provides information about the occurrences of the keyword in the audio.
The Map is omitted if no keywords are found in the audio or if keyword spotting is not requested.
word_alternatives object[ ] An array of WordAlternativeResults objects that provide word alternative hypotheses found for words of the input audio if a word_alternatives_threshold is specified.
SpeechRecognitionAlternative (Java SpeechAlternative object)
Name Description
transcript string A transcription of the audio.
confidence number A confidence score for the transcript in the range of 0 to 1. Available only for the best alternative and only in results marked as final.
timestamps string[ ] Time alignments for each word from the transcript as a list of lists. Each inner list consists of three elements: the word followed by its start and end time in seconds. For example, [["hello",0.0,1.2],["world",1.2,2.5]]. Available only for the best alternative.
word_confidence string[ ] A confidence score for each word of the transcript as a list of lists. Each inner list consists of two elements: the word and its confidence score in the range of 0 to 1. For example, [[\"hello\",0.95],[\"world\",0.866]]. Available only for the best alternative and only in results marked as final.
KeywordResults
Name Description
{keyword} list Each keyword entered via the keywords parameter and, for each keyword, an array of KeywordResult objects that provides information about its occurrences in the input audio. The keys of the list are the actual keyword strings. A keyword for which no matches are spotted in the input is omitted from the array.
KeywordResult (Java KeywordsResult object)
Name Description
normalized_text string The specified keyword normalized to the spoken phrase that matched in the audio input.
start_time number The start time in seconds of the keyword match.
end_time number The end time in seconds of the keyword match.
confidence number The confidence score of the keyword match in the range of 0 to 1.
WordAlternativeResults (Java SpeechWordAlternatives object)
Name Description
start_time number The start time in seconds of the word from the input audio that corresponds to the word alternatives.
end_time number The end time in seconds of the word from the input audio that corresponds to the word alternatives.
alternatives object[ ] An array of WordAlternativeResult objects that provides word alternative hypotheses for a word from the input audio.
WordAlternativeResult (Java WordAlternative object)
Name Description
confidence number The confidence score of the word alternative hypothesis in the range of 0 to 1.
word string A word alternative hypothesis for a word from the input audio.
SpeakerLabelsResult (Java SpeakerLabel object)
Name Description
from number The start time of a word from the transcript. The value matches the start time of a word from the timestamps array.
to number The end time of a word from the transcript. The value matches the end time of a word from the timestamps array.
speaker integer The numeric identifier that the service assigns to a speaker from the audio. Speaker IDs begin at 0 initially but can evolve and change across interim results and between interim and final results as the service processes the audio. They are not guaranteed to be sequential, contiguous, or ordered.
confidence number A score that indicates how confident the service is in its identification of the speaker in the range of 0 to 1.
final boolean An indication of whether the service might further change word and speaker-label results. A value of true means that the service guarantees not to send any further updates for the current or any preceding results; false means that the service might send further updates to the results.
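
The field descriptions above can be applied as follows to consume a SpeechRecognitionEvent: take the first (best) alternative of each final result for the transcript, and read keyword matches from the keywords_result dictionary. The summarizeEvent helper below is an illustrative sketch, not part of any SDK:

```javascript
// Illustrative sketch: extract the best final transcript and the number
// of matches per spotted keyword from a SpeechRecognitionEvent object.
function summarizeEvent(event) {
  var summary = { transcript: null, keywords: {} };
  (event.results || []).forEach(function(result) {
    if (!result.final) return; // skip interim results; they can still change
    // The first alternative is the best one and carries the confidence.
    summary.transcript = result.alternatives[0].transcript;
    // keywords_result maps each spotted keyword to an array of matches,
    // each with normalized_text, start_time, end_time, and confidence.
    Object.keys(result.keywords_result || {}).forEach(function(keyword) {
      summary.keywords[keyword] = result.keywords_result[keyword].length;
    });
  });
  return summary;
}
```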

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The request failed, probably as a result of a user input error (for example, audio not matching the specified format) or because of an inactivity timeout. Specific messages include
  • Model model not found
  • This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
  • speaker_labels is not a supported feature for model model
406 Not Acceptable The request specified an Accept header with an incompatible content type.
408 Request Timeout The connection was closed because of a 30-second inactivity (session) timeout.
413 Payload Too Large The request passed an audio file that exceeded the currently supported data limit.
415 Unsupported Media Type The request specified an unacceptable media type.
500 Internal Server Error The service experienced an internal error.
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
BadRequestException The request failed, probably as a result of a user input error (for example, audio not matching the specified format) or because of an inactivity timeout. (HTTP response code 400.)
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)
ServiceResponseException The connection was closed after 30 seconds of inactivity (session timeout). (HTTP response code 408.)
RequestTooLargeException The request passed an audio file that exceeded the currently supported data limit. (HTTP response code 413.)
UnsupportedException The request specified an unacceptable media type. (HTTP response code 415.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Example response


{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 0.09,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "latest"
            }
          ],
          "end_time": 0.6
        },
        {
          "start_time": 0.6,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "weather"
            }
          ],
          "end_time": 0.85
        },
        . . .
        {
          "start_time": 6.85,
          "alternatives": [
            {
              "confidence": 0.9988,
              "word": "on"
            }
          ],
          "end_time": 7.0
        },
        {
          "start_time": 7.0,
          "alternatives": [
            {
              "confidence": 0.9953,
              "word": "Sunday"
            }
          ],
          "end_time": 7.71
        }
      ],
      "keywords_result": {
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 6.26,
            "confidence": 0.999,
            "end_time": 6.85
          }
        ],
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 4.7,
            "confidence": 0.964,
            "end_time": 5.52
          }
        ]
      },
      "alternatives": [
        {
          "timestamps": [
            [
              "the",
              0.03,
              0.09
            ],
            [
              "latest",
              0.09,
              0.6
            ],
            . . .
            [
              "on",
              6.85,
              7.0
            ],
            [
              "Sunday",
              7.0,
              7.71
            ]
          ],
          "confidence": 0.968,
          "transcript": "the latest weather report a line of severe thunderstorms with several possible tornadoes is approaching Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 0.09,
          "alternatives": [
            {
              "confidence": 1,
              "word": "latest"
            }
          ],
          "end_time": 0.6
        },
        . . .
        {
          "start_time": 0.85,
          "alternatives": [
            {
              "confidence": 0.9979,
              "word": "report"
            }
          ],
          "end_time": 1.52
        }
      ],
      "keywords_result": {},
      "alternatives": [
        {
          "timestamps": [
            [
              "the",
              0.03,
              0.09
            ],
            . . .
            [
              "report",
              0.85,
              1.52
            ]
          ],
          "confidence": 0.983,
          "transcript": "the latest weather report "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 0.14,
          "alternatives": [
            {
              "confidence": 1,
              "word": "a"
            }
          ],
          "end_time": 0.28
        },
        . . .
        {
          "start_time": 5.33,
          "alternatives": [
            {
              "confidence": 0.9953,
              "word": "Sunday"
            }
          ],
          "end_time": 6.04
        }
      ],
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 3.03,
            "confidence": 0.953,
            "end_time": 3.85
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.59,
            "confidence": 0.999,
            "end_time": 5.18
          }
        ]
      },
      "alternatives": [
        {
          "timestamps": [
            [
              "a",
              0.14,
              0.28
            ],
            . . .
            [
              "Sunday",
              5.33,
              6.04
            ]
          ],
          "confidence": 0.983,
          "transcript": "a line of severe thunderstorms with several possible tornadoes is approaching Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "confidence": 0.983,
          "timestamps": [
            [
              "the",
              0.03,
              0.09
            ],
            . . .
            [
              "report",
              0.85,
              1.52
            ]
          ],
          "transcript": "the latest weather report "
        }
      ],
      "keywords_result": {},
      "word_alternatives": [
        {
          "start_time": 0.09,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "latest"
            }
          ],
          "end_time": 0.6
        },
        . . .
        {
          "start_time": 0.85,
          "alternatives": [
            {
              "confidence": 0.9979,
              "word": "report"
            }
          ],
          "end_time": 1.52
        }
      ]
    }
  ]
}
{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "confidence": 0.983,
          "timestamps": [
            [
              "a",
              0.14,
              0.28
            ],
            . . .
            [
              "Sunday",
              5.33,
              6.04
            ]
          ],
          "transcript": "a line of severe thunderstorms with several possible tornadoes is approaching Colorado on Sunday "
        }
      ],
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 3.03,
            "end_time": 3.85,
            "confidence": 0.953
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.59,
            "end_time": 5.18,
            "confidence": 0.999
          }
        ]
      },
      "word_alternatives": [
        {
          "start_time": 0.14,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "a"
            }
          ],
          "end_time": 0.28
        },
        . . .
        {
          "start_time": 5.33,
          "alternatives": [
            {
              "confidence": 0.9953,
              "word": "Sunday"
            }
          ],
          "end_time": 6.04
        }
      ]
    }
  ]
}

Recognize multipart

Sends audio and returns transcription results for a sessionless recognition request submitted as multipart form data. Returns only the final results; to enable interim results, use Sessions or WebSockets. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

You specify a few parameters of the request via request headers and query parameters, but you specify most parameters as multipart form data in the form of JSON metadata, in which only the part_content_type parameter is required. You then specify the audio files for the request as subsequent parts of the form data.
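A minimal sketch of assembling that metadata part in plain Node (the helper name is ours, not part of any SDK); part_content_type is the only required field:

```javascript
// Build the JSON metadata string that forms the first part of the
// multipart request. Only part_content_type is required; the remaining
// recognition parameters are optional.
function buildMetadata(partContentType, options = {}) {
  if (!partContentType) {
    throw new Error('part_content_type is required');
  }
  return JSON.stringify({ part_content_type: partContentType, ...options });
}

// Mirrors the example request below: two FLAC parts with keyword spotting.
const metadata = buildMetadata('audio/flac', {
  data_parts_count: 2,
  timestamps: true,
  word_alternatives_threshold: 0.9,
  keywords: ['colorado', 'tornado', 'tornadoes'],
  keywords_threshold: 0.5,
  continuous: true
});
```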

The multipart approach is intended for two use cases:

  • For use with browsers for which JavaScript is disabled. Multipart requests based on form data do not require the use of JavaScript.

  • When the parameters used with the recognition request are greater than the 8 KB limit imposed by most HTTP servers and proxies. This can occur, for example, if you want to spot a very large number of keywords. Passing the parameters as form data avoids this limit.

For requests to transcribe audio with more than one audio file or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (response code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (response code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Not supported. Use the sessionless recognize call; see Recognize audio.


POST /v1/recognize

Request

Parameter Type Description
Content-Type header string Must be multipart/form-data to indicate the content type of the payload. cURL automatically sets the header to multipart/form-data when you use the --form option.
Transfer-Encoding header string Must be set to chunked to send the audio in streaming mode or to send a request that includes more than one audio part.
model query string The identifier of the model to be used for the recognition request:
  • ar-AR_BroadbandModel
  • en-UK_BroadbandModel
  • en-UK_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
customization_id query string The GUID of a custom language model that is to be used with the request. The base language model of the specified custom language model must match the model specified with the model parameter. By default, no custom model is used.
metadata form data object A Metadata object that describes the following parts of the request, which contain the audio data. This must be the first part of the request. The Content-Type of the parts is ignored.
upload form data file One or more audio files for the request. To send multiple audio files, set Transfer-Encoding to chunked. With cURL, include a separate --form option for each file of the request.
Metadata
Parameter Description
part_content_type string The MIME type of the audio in the following parts:
  • audio/flac
  • audio/l16 (Also specify the sampling rate and number of channels; for example, audio/l16; rate=48000; channels=2. Ensure that the rate matches the rate at which the audio is captured and specify a maximum of 16 channels.)
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/ogg;codecs=opus
  • audio/mulaw (Also specify the sampling rate at which the audio is captured.)
  • audio/basic (Use audio in this format only with narrowband models.)
All data parts must have the same MIME type. For additional information about the supported audio formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
data_parts_count integer The number of audio data parts (audio files) sent with the request. Server-side end-of-stream detection is applied to the last (and possibly the only) data part. If omitted, the number of parts is determined from the request itself.
sequence_id integer The sequence ID for all data parts of this recognition task. If omitted, no sequence ID is associated with the request. Available only for session-based requests.
continuous boolean Indicates whether multiple final results that represent consecutive phrases separated by long pauses are returned. If true, such phrases are returned; if false (the default), recognition ends after the first end-of-speech (EOS) incident is detected.
inactivity_timeout integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 response code. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
keywords string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
word_alternatives_threshold float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false.
timestamps boolean Indicates whether time alignment is returned for each word. The default is false.
profanity_filter boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. If true, smart formatting is performed; if false (the default), no formatting is performed. Applies to US English transcription only.
speaker_labels boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. If true, speaker labels are returned; if false (the default), they are not. Speaker labels can be returned only for the following language models:
  • en-US_NarrowbandModel
  • es-ES_NarrowbandModel
  • ja-JP_NarrowbandModel
Setting speaker_labels to true forces the continuous and timestamps parameters to be true, as well, regardless of whether the user specifies false for the parameters. For more information, see Speaker labels.
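The service applies keywords_threshold on the server, but the rule described above (a spot counts when its confidence is greater than or equal to the threshold) is easy to mirror on the client, for example to re-filter a returned keywords_result with a stricter cutoff. A sketch in plain Node (helper name ours):

```javascript
// Re-filter a keywords_result object with a (possibly stricter) threshold,
// keeping only spots whose confidence meets or exceeds it.
function filterKeywordSpots(keywordsResult, threshold) {
  const kept = {};
  for (const [keyword, spots] of Object.entries(keywordsResult)) {
    const passing = spots.filter((spot) => spot.confidence >= threshold);
    if (passing.length > 0) kept[keyword] = passing;
  }
  return kept;
}
```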

Example request


curl -X POST -u "{username}":"{password}" \
--header "Transfer-Encoding: chunked" \
--form metadata="{\"data_parts_count\":2,
  \"part_content_type\":\"audio/flac\",
  \"timestamps\":true,
  \"word_alternatives_threshold\":0.9,
  \"keywords\":[\"colorado\",\"tornado\",\"tornadoes\"],
  \"keywords_threshold\":0.5,
  \"continuous\":true}" \
--form upload="@audio-file1.flac" \
--form upload="@audio-file2.flac" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"

Response

Returns one or more instances of a SpeechRecognitionEvent object depending on the input.
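A sketch (plain Node, names ours) of walking a SpeechRecognitionEvent of the shape shown in the example response: it collects the top transcript of each final result along with any spotted keywords.

```javascript
// Collect the most likely transcript of each final result, plus any
// keyword spots, from a SpeechRecognitionEvent-shaped object.
function summarizeEvent(event) {
  const transcripts = [];
  const keywordSpots = [];
  for (const result of event.results) {
    if (!result.final) continue;
    // The first alternative is the most likely hypothesis.
    transcripts.push(result.alternatives[0].transcript.trim());
    for (const [keyword, spots] of Object.entries(result.keywords_result || {})) {
      for (const spot of spots) {
        keywordSpots.push({ keyword, confidence: spot.confidence });
      }
    }
  }
  return { transcripts, keywordSpots };
}
```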

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The request failed, probably as a result of a user input error (for example, audio not matching the specified format) or because of an inactivity timeout. Specific messages include
  • Model model not found
  • This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
  • speaker_labels is not a supported feature for model model
406 Not Acceptable The request specified an Accept header with an incompatible content type.
408 Request Timeout The connection was closed after 30 seconds of inactivity (session timeout).
413 Payload Too Large The request passed an audio file that exceeded the currently supported data limit.
415 Unsupported Media Type The request specified an unacceptable media type.
500 Internal Server Error The service experienced an internal error.
503 Service Unavailable The service is currently unavailable.

Example response


{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 0.09,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "latest"
            }
          ],
          "end_time": 0.6
        },
        . . .
        {
          "start_time": 0.85,
          "alternatives": [
            {
              "confidence": 0.9979,
              "word": "report"
            }
          ],
          "end_time": 1.52
        }
      ],
      "keywords_result": {},
      "alternatives": [
        {
          "timestamps": [
            [
              "the",
              0.03,
              0.09
            ],
            . . .
            [
              "report",
              0.85,
              1.52
            ]
          ],
          "confidence": 0.983,
          "transcript": "the latest weather report "
        }
      ],
      "final": true
    },
    {
      "word_alternatives": [
        {
          "start_time": 0.14,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "a"
            }
          ],
          "end_time": 0.29
        },
        . . .
        {
          "start_time": 5.33,
          "alternatives": [
            {
              "confidence": 0.9951,
              "word": "Sunday"
            }
          ],
          "end_time": 6.04
        }
      ],
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 3.03,
            "confidence": 0.947,
            "end_time": 3.85
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.59,
            "confidence": 0.998,
            "end_time": 5.18
          }
        ]
      },
      "alternatives": [
        {
          "timestamps": [
            [
              "a",
              0.14,
              0.29
            ],
            . . .
            [
              "Sunday",
              5.33,
              6.04
            ]
          ],
          "confidence": 0.985,
          "transcript": "a line of severe thunderstorms with several possible tornadoes is approaching Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

Sessions

Create a session

Creates a session and locks recognition requests to a single engine. You can use the session for multiple recognition requests so that each request is processed with the same engine. The session expires after 30 seconds of inactivity. Use the Get status method to prevent the session from expiring.

The method returns a session cookie in the Set-Cookie response header. You must pass this cookie with each request that uses the session. For more information, see Using cookies with sessions.

The method returns a session cookie in the cookie_session field of the Session object. You must pass this cookie with the cookie_session parameter of the observeResult method.
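With cURL, the cookie round-trip is handled by the --cookie-jar and --cookie options, but if you manage HTTP yourself you must replay the name=value pair from the Set-Cookie header on later session requests. A sketch in plain Node (the cookie name and value below are illustrative, not from the service):

```javascript
// Reduce a Set-Cookie response header to the name=value pair that must be
// sent back in the Cookie request header on subsequent session requests,
// dropping attributes such as Path or Secure.
function sessionCookie(setCookieHeader) {
  return setCookieHeader.split(';')[0].trim();
}
```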


POST /v1/sessions

createSession(params, callback())

ServiceCall<SpeechSession> createSession()
ServiceCall<SpeechSession> createSession(String model)
ServiceCall<SpeechSession> createSession(SpeechModel model)

Request

Parameter Type Description
model query string The identifier of the model to be used by the new session:
  • ar-AR_BroadbandModel
  • en-UK_BroadbandModel
  • en-UK_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
customization_id query string The GUID of a custom language model that is to be used with the new session. The base language model of the specified custom language model must match the model specified with the model parameter. By default, no custom model is used.
body body string An empty request body: {}. With cURL, use the --data option to pass the empty data.
Parameter Description
model string The identifier of the model to be used by the new session:
  • ar-AR_BroadbandModel
  • en-UK_BroadbandModel
  • en-UK_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
You can provide only one of the two model arguments.
model object A Java SpeechModel object that identifies the model to be used by the new session:
  • SpeechModel.AR_AR_BROADBANDMODEL
  • SpeechModel.EN_UK_BROADBANDMODEL
  • SpeechModel.EN_UK_NARROWBANDMODEL
  • SpeechModel.EN_US_BROADBANDMODEL (the default)
  • SpeechModel.EN_US_NARROWBANDMODEL
  • SpeechModel.ES_ES_BROADBANDMODEL
  • SpeechModel.ES_ES_NARROWBANDMODEL
  • SpeechModel.FR_FR_BROADBANDMODEL
  • SpeechModel.JA_JP_BROADBANDMODEL
  • SpeechModel.JA_JP_NARROWBANDMODEL
  • SpeechModel.PT_BR_BROADBANDMODEL
  • SpeechModel.PT_BR_NARROWBANDMODEL
  • SpeechModel.ZH_CN_BROADBANDMODEL
  • SpeechModel.ZH_CN_NARROWBANDMODEL
You can provide only one of the two model arguments.

Example request


curl -X POST -u "{username}":"{password}" \
--cookie-jar cookies.txt \
--data "{}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

speech_to_text.createSession({}, function(error, session) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(session, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

SpeechSession session = service.createSession().execute();
System.out.println(session);
session.getSessionId();

Response

Returns a Java SpeechSession object that contains the information about the new session that is provided in a JSON Session object.

Session (Java SpeechSession object)
Name Description
recognize string The URI for REST recognition requests.
recognizeWS string The URI for WebSocket recognition requests. The URI is needed only for working with WebSockets.
observe_result string The URI for REST results observers.
session_id string The identifier for the new session.
new_session_uri string The URI for the new session.
cookie_session string The cookie for the new session.
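A sketch (plain Node, helper name ours) of pulling out the fields of the Session object that later calls use:

```javascript
// Extract the fields of the Session JSON that subsequent calls need:
// the session identifier plus the recognition and observer URIs.
function sessionEndpoints(session) {
  return {
    sessionId: session.session_id,
    recognizeUri: session.recognize,
    observeUri: session.observe_result
  };
}
```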

Response codes

Status Description
201 Created The session was successfully created.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
415 Unsupported Media Type The request specified an unacceptable media type.
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)
UnsupportedException The request specified an unacceptable media type. (HTTP response code 415.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Example response


{
  "recognize": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/recognize",
  "recognizeWS": "wss://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/recognize",
  "observe_result": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/observe_result",
  "session_id": "0ac1b5dfc2e8fc490a41e29e67c27931",
  "new_session_uri": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931"
}

{
  "recognize": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/e0ec707b639fc870069e938c324f1e8a/recognize",
  "recognizeWS": "wss://stream.watsonplatform.net/speech-to-text/api/v1/sessions/e0ec707b639fc870069e938c324f1e8a/recognize",
  "observe_result": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/e0ec707b639fc870069e938c324f1e8a/observe_result",
  "session_id": "e0ec707b639fc870069e938c324f1e8a",
  "new_session_uri": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/e0ec707b639fc870069e938c324f1e8a",
  "cookie_session": "e0ec707b639fc870069e938c324f1e8aas123sd12e"
}

Get status

Checks whether a specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be initialized to indicate that you can send another recognition request. You can also use this method to prevent the session from expiring after 30 seconds of inactivity. The request must pass the cookie that was returned by the Create a session method.
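To keep a session alive, you need to touch it inside the 30-second inactivity window. One way to schedule the keep-alive pings, sketched in plain Node (the margin value is our choice, not a service requirement):

```javascript
// Decide whether a keep-alive status request is due, leaving a safety
// margin inside the 30-second session inactivity timeout.
function pingDue(lastActivityMs, nowMs, timeoutMs = 30000, marginMs = 10000) {
  return nowMs - lastActivityMs >= timeoutMs - marginMs;
}
```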


GET /v1/sessions/{session_id}/recognize

getRecognizeStatus(params, callback()) DEPRECATED

ServiceCall<SpeechSessionStatus> getRecognizeStatus(SpeechSession session)

Request

Parameter Type Description
session_id path string The identifier of the session whose status is to be checked.
Parameter Description
session_id string The identifier of the session whose status is to be checked.
session object A Java SpeechSession object that identifies the session whose status is to be checked.

Example request


curl -X GET -u "{username}":"{password}" \
--cookie cookies.txt \
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/{session_id}/recognize"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'session_id': '{session_id}'
}

speech_to_text.getRecognizeStatus(params, function(error, status) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(status, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

SpeechSessionStatus status = service.getRecognizeStatus({session}).execute();
System.out.println(status);

Response

RecognizeStatus
Name Description
session object A SessionStatus object that provides information about the session.

Returns a Java SpeechSessionStatus object that contains the information about the session that is provided in a JSON SessionStatus object.

SessionStatus (Java SpeechSessionStatus object)
Name Description
recognize string The URI for REST recognition requests.
recognizeWS string The URI for WebSocket recognition requests. The URI is needed only for working with WebSockets.
state string The state of the session. The state must be initialized to perform a new recognition request on the session.
observe_result string The URI for REST results observers.
model string The URI for information about the model that is used with the session.
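A sketch (plain Node, helper name ours) of gating a new recognition request on the reported state; it accepts both the wrapped RecognizeStatus shape returned by the HTTP interface and a flat SessionStatus object:

```javascript
// A session can accept a new recognition request only while its state is
// 'initialized'. Accepts either the wrapped shape ({ session: { state } })
// or a flat SessionStatus object.
function canRecognize(status) {
  const session = status.session || status;
  return session.state === 'initialized';
}
```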

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The request failed because of an inactivity timeout or because it failed to pass the session cookie. If an existing session is closed, session_closed is set to true.
404 Not Found The specified session_id was not found, possibly because of an invalid session cookie.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
415 Unsupported Media Type The request specified an unacceptable media type.

Exceptions thrown

Exception Description
BadRequestException The session timed out due to inactivity, or the request failed to pass the session cookie. (HTTP response code 400.)
NotFoundException The specified session was not found, possibly because of an invalid session cookie. (HTTP response code 404.)
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)
UnsupportedException The request specified an unacceptable media type. (HTTP response code 415.)

Example response


{
  "session": {
    "recognize": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/recognize",
    "recognizeWS": "wss://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/recognize",
    "state": "initialized",
    "observe_result": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/observe_result",
    "model": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_BroadbandModel"
  }
}

{
  "recognize": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/recognize",
  "state": "initialized",
  "observe_result": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/observe_result",
  "model": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_BroadbandModel"
}

Observe result

Requests results for a recognition task within a specified session. You can submit this method multiple times for the same recognition task. To see interim results, set the interim_results parameter to true. The request must pass the cookie that was returned by the Create a session method; with the Node library, pass it via the cookie_session parameter.

To see results for a specific recognition task, specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of the recognition request. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (response code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout.

Omit the sequence ID to observe results for an ongoing recognition task. If no recognition task is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.
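When interim_results is true, the observer receives a stream of SpeechRecognitionEvent objects. A sketch (plain Node, names ours) of separating interim transcript updates from final transcripts across such a stream:

```javascript
// Separate a stream of SpeechRecognitionEvent objects into interim
// transcript updates and final transcripts, using each result's final flag.
function splitInterim(events) {
  const finals = [];
  const interims = [];
  for (const event of events) {
    for (const result of event.results) {
      const transcript = result.alternatives[0].transcript;
      (result.final ? finals : interims).push(transcript);
    }
  }
  return { finals, interims };
}
```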

Not supported. To obtain interim results, use the interimResults parameter of the session-based recognize method; see Recognize audio.


GET /v1/sessions/{session_id}/observe_result

observeResult(params, callback()) DEPRECATED

Request

Parameter Type Description
session_id path string The identifier of the session whose results you want to observe.
sequence_id query integer The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.
interim_results query boolean Indicates whether the service is to return interim results. If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.
Parameter Description
session_id string The identifier of the session whose results you want to observe.
cookie_session string The cookie for the session whose results you want to observe. The session cookie is returned by the Create a session method.
interim_results boolean Indicates whether the service is to return interim results. If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.

Example request


curl -X GET -u "{username}":"{password}" \
--cookie cookies.txt \
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/{session_id}/observe_result?interim_results=true"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'session_id': '{session_id}',
  'cookie_session': '{cookie_session}',
  'interim_results': true
};

speech_to_text.observeResult(params, function(error, interim_results) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(interim_results, null, 2));
});

Response

Returns one or more instances of a SpeechRecognitionEvent object depending on the input and the value of the interim_results parameter.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The request failed because of a user input error (for example, audio not matching the specified format), because of an inactivity timeout, or because it failed to pass the session cookie. If an existing session is closed, session_closed is set to true.
404 Not Found The specified session_id was not found, possibly because of an invalid session cookie, or a specified sequence_id does not match the sequence ID of a recognition task.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
408 Request Timeout The session closed after 30 seconds of inactivity (session timeout). The session is destroyed with session_closed set to true.
413 Payload Too Large The request passed an audio file that exceeded the currently supported data limit. The session is destroyed with session_closed set to true.
415 Unsupported Media Type The request passed an unacceptable media type.
500 Internal Server Error The service experienced an internal error. The session is destroyed with session_closed set to true.

Example response


{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several torn "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}{
  . . .
}{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}{
  "results": [
    {
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "confidence": 1.0,
            "end_time": 2.15
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.94,
            "confidence": 0.913,
            "end_time": 5.62
          }
        ]
      },
      "alternatives": [
        {
          "confidence": 0.891,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
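Because interim results arrive as a stream of back-to-back JSON objects (as in the example response above), a client must split the stream itself. A minimal Python sketch, for illustration only (the helper name split_json_stream is mine, not part of the service or its SDKs):

```python
import json

def split_json_stream(raw: str):
    """Split a stream of back-to-back JSON objects into Python dicts."""
    decoder = json.JSONDecoder()
    pos, objects = 0, []
    while pos < len(raw):
        obj, end = decoder.raw_decode(raw, pos)
        objects.append(obj)
        pos = end
        # Skip any whitespace between consecutive objects.
        while pos < len(raw) and raw[pos].isspace():
            pos += 1
    return objects

# Two concatenated interim-result events, abbreviated from the example above.
stream = (
    '{"results": [{"alternatives": [{"transcript": "several torn "}],'
    ' "final": false}], "result_index": 0}'
    '{"results": [{"alternatives": [{"transcript": "several tornadoes "}],'
    ' "final": false}], "result_index": 0}'
)
events = split_json_stream(stream)
```

Each decoded object is one SpeechRecognitionEvent; a client would typically display interim events as they arrive and keep only the event whose final field is true.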

Recognize audio

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final transcription results for the request. To see interim results, set the interim_results parameter to true in a call to the Observe result method; with the Java SDK, set the interimResults parameter to true.

The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. The request must pass the cookie that was returned by the Create a session method.

You specify the parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. This method is preferred to the multipart approach for submitting a session-based recognition request.

For requests to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the session (response code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (response code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter.

This call is the same as the sessionless recognize call, but this call requires the session_id (Java: sessionId) parameter and omits the model parameter.


POST /v1/sessions/{session_id}/recognize

recognize(params, callback())

ServiceCall<SpeechResults> recognize(File audio, RecognizeOptions options)

Request

Parameter Type Description
session_id path string The identifier of the session to be used.
Content-Type header string The MIME type of the audio:
  • audio/flac
  • audio/l16 (Also specify the sampling rate and number of channels; for example, audio/l16; rate=48000; channels=2. Ensure that the rate matches the rate at which the audio is captured and specify a maximum of 16 channels.)
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/ogg;codecs=opus
  • audio/mulaw (Also specify the sampling rate at which the audio is captured.)
  • audio/basic (Use audio in this format only with narrowband models.)
For additional information about the supported audio formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
Transfer-Encoding header string Must be set to chunked to send the audio in streaming mode. The data does not need to exist fully before being streamed to the service.
body body stream The audio to be transcribed in the format specified by the Content-Type header. With cURL, include a separate --data-binary option for each file of the request; see the sessionless recognize audio request for an example.
sequence_id query integer The sequence ID of this recognition task. If omitted, no sequence ID is associated with the request.
continuous query boolean Indicates whether multiple final results that represent consecutive phrases separated by long pauses are returned. If true, such phrases are returned; if false (the default), recognition ends after the first end-of-speech (EOS) incident is detected.
inactivity_timeout query integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 response code. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
keywords query string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold query float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives query integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
word_alternatives_threshold query float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence query boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false.
timestamps query boolean Indicates whether time alignment is returned for each word. The default is false.
profanity_filter query boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting query boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. If true, smart formatting is performed; if false (the default), no formatting is performed. Applies to US English transcription only.
speaker_labels query boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. If true, speaker labels are returned; if false (the default), they are not. Speaker labels can be returned only for the following language models:
  • en-US_NarrowbandModel
  • es-ES_NarrowbandModel
  • ja-JP_NarrowbandModel
Setting speaker_labels to true forces the continuous and timestamps parameters to be true, as well, regardless of whether the user specifies false for the parameters. For more information, see Speaker labels.

Pass the audio file to be transcribed via the method's audio argument. Pass all other parameters for the recognition request as a Java RecognizeOptions object via the options argument.

Parameter Description
session_id (Node) / sessionId (Java) string The identifier of the session to be used. With the Java SDK, you must provide one of sessionId or session.
session object (Java only) A Java SpeechSession object that identifies the session to be used. You must provide one of sessionId or session.
audio stream (Node) / File (Java) The audio to be transcribed in the format specified by the content_type (Java: contentType) parameter.
content_type (Node) / contentType (Java) string The MIME type of the audio:
  • audio/flac
  • audio/l16 (Also specify the sampling rate and number of channels; for example, audio/l16; rate=48000; channels=2. Ensure that the rate matches the rate at which the audio is captured and specify a maximum of 16 channels.)
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/ogg;codecs=opus
  • audio/mulaw (Also specify the sampling rate at which the audio is captured.)
  • audio/basic (Use audio in this format only with narrowband models.)
If you omit the contentType, the method attempts to derive it from the extension of the audio file. For additional information about the supported audio formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
continuous boolean Indicates whether multiple final results that represent consecutive phrases separated by long pauses are returned. If true, such phrases are returned; if false (the default), recognition ends after the first end-of-speech (EOS) incident is detected.
inactivity_timeout (Node) / inactivityTimeout (Java) integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 response code. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
keywords string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold (Node) / keywordsThreshold (Java) float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives (Node) / maxAlternatives (Java) integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
word_alternatives_threshold (Node) / wordAlternativesThreshold (Java) float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence (Node) / wordConfidence (Java) boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false.
timestamps boolean Indicates whether time alignment is returned for each word. The default is false.
profanity_filter (Node) / profanityFilter (Java) boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting (Node) / smartFormatting (Java) boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. If true, smart formatting is performed; if false (the default), no formatting is performed. Applies to US English transcription only.
speaker_labels (Node) / speakerLabels (Java) boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. If true, speaker labels are returned; if false (the default), they are not. Speaker labels can be returned only for the following language models:
  • en-US_NarrowbandModel
  • es-ES_NarrowbandModel
  • ja-JP_NarrowbandModel
Setting speaker_labels (Java: speakerLabels) to true forces the continuous and timestamps parameters to be true as well, regardless of whether you specify false for the parameters. For more information, see Speaker labels.
interimResults boolean (Java only) Indicates whether the service is to return interim results. If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.
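The Java SDK's note that it derives the content type from the file extension suggests a simple lookup. The sketch below is illustrative only; the mapping, the ".raw" extension choice for audio/l16, and the helper name are my assumptions, not the SDK's actual logic:

```python
# Illustrative extension-to-MIME lookup; the actual SDK mapping may differ.
AUDIO_MIME_BY_EXTENSION = {
    ".flac": "audio/flac",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg;codecs=opus",
    ".raw": "audio/l16",  # audio/l16 also requires rate (and channels) parameters
}

def guess_content_type(filename: str):
    """Return a best-guess MIME type for an audio file, or None if unknown."""
    lowered = filename.lower()
    for ext, mime in AUDIO_MIME_BY_EXTENSION.items():
        if lowered.endswith(ext):
            return mime
    return None
```

When the extension is ambiguous or missing, passing contentType explicitly is the safer choice, since formats such as audio/l16 cannot be used without their rate parameter anyway.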

Example request


curl -X POST -u "{username}":"{password}" \
--cookie cookies.txt \
--header "Content-Type: audio/flac" \
--data-binary "@audio-file.flac" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/{session_id}/recognize?max_alternatives=3&word_confidence=true&keywords=colorado,tornado,tornadoes&keywords_threshold=0.5"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var fs = require('fs');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'session_id': '{session_id}',
  audio: fs.createReadStream('audio-file.flac'),
  'content_type': 'audio/flac',
  'max_alternatives': 3,
  'word_confidence': true,
  keywords: ['colorado', 'tornado', 'tornadoes'],
  'keywords_threshold': 0.5
};

speech_to_text.recognize(params, function(error, transcript) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(transcript, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognizeOptions options = new RecognizeOptions.Builder()
  .sessionId("{session_id}").contentType("audio/flac")
  .maxAlternatives(3).wordConfidence(true)
  .keywords(new String[]{"colorado", "tornado", "tornadoes"})
  .keywordsThreshold(0.5).build();

SpeechResults results = service.recognize(new File("audio-file.flac"), options)
  .execute();
System.out.println(results);

Response

Returns one or more instances of a SpeechRecognitionEvent object depending on the input.

Returns a Java SpeechResults object that contains the results that are provided in a JSON SpeechRecognitionEvent object. The response includes one or more instances of the object depending on the input and the value of the interimResults parameter.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The request failed because of a user input error (for example, audio not matching the specified format), because the session is in the wrong state, because of an inactivity timeout, or because it failed to pass the session cookie. Specific messages include
  • Model model not found
  • This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
  • speaker_labels is not a supported feature for model model
  • Cookie must be set.
If an existing session is closed, session_closed is set to true.
404 Not Found The specified session_id was not found, possibly because of an invalid session cookie.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
408 Request Timeout The session closed after 30 seconds of inactivity (session timeout). The session is destroyed with session_closed set to true.
413 Payload Too Large The request passed an audio file that exceeded the currently supported data limit. The session is destroyed with session_closed set to true.
415 Unsupported Media Type The request specified an unacceptable media type.
500 Internal Server Error The service experienced an internal error. The session is destroyed with session_closed set to true.
503 Service Unavailable The session is already processing a request. Concurrent requests are not allowed on the same session. The session remains alive after this error.

Exceptions thrown

Exception Description
BadRequestException The request failed because of a user input error (for example, audio not matching the specified format), because the session is in the wrong state, because of an inactivity timeout, or because it failed to pass the session cookie. The session is closed. (HTTP response code 400.)
NotFoundException The specified session or sessionId was not found, possibly because of an invalid session cookie. (HTTP response code 404.)
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)
ServiceResponseException The session closed after 30 seconds of inactivity (session timeout). The session is closed. (HTTP response code 408.)
RequestTooLargeException The request passed an audio file that exceeded the currently supported data limit. The session is closed. (HTTP response code 413.)
UnsupportedException The request specified an unacceptable media type. (HTTP response code 415.)
InternalServerErrorException The service experienced an internal error. The session is closed. (HTTP response code 500.)
ServiceUnavailableException The session is already processing a request. Concurrent requests are not allowed on the same session. The session remains alive after this error. (HTTP response code 503.)

Example response


{
  "results": [
    {
      "keywords_result": {
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.94,
            "confidence": 0.913,
            "end_time": 5.62
          }
        ],
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "confidence": 1.0,
            "end_time": 2.15
          }
        ]
      },
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday ",
          "confidence": 0.891,
          "word_confidence": [
            [
              "several",
              1.0
            ],
            [
              "tornadoes",
              1.0
            ],
            . . .
            [
              "on",
              0.311
            ],
            [
              "Sunday",
              0.986
            ]
          ]
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
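The word_confidence pairs in such a response lend themselves to simple client-side post-processing. A Python sketch for illustration (the helper name low_confidence_words and the 0.5 threshold are mine; the response fragment is abbreviated from the example above):

```python
def low_confidence_words(result, threshold=0.5):
    """Collect (word, confidence) pairs below the threshold from the top alternative."""
    top = result["results"][0]["alternatives"][0]
    return [(word, conf) for word, conf in top.get("word_confidence", [])
            if conf < threshold]

# Abbreviated response with word_confidence pairs, as in the example above.
result = {
    "results": [{
        "alternatives": [{
            "transcript": "several tornadoes touch down on Sunday ",
            "confidence": 0.891,
            "word_confidence": [["several", 1.0], ["tornadoes", 1.0],
                                ["on", 0.311], ["Sunday", 0.986]],
        }],
        "final": True,
    }],
    "result_index": 0,
}
flagged = low_confidence_words(result)  # [("on", 0.311)]
```

A transcript reviewer could highlight flagged words for manual correction, since word_confidence is only present when the request set word_confidence=true.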

Recognize multipart

Sends audio and returns transcription results for a session-based recognition request submitted as multipart form data. By default, returns only the final transcription results for the request. To see interim results, set the parameter interim_results to true in a call to the /v1/sessions/{session_id}/observe_result method before this POST request finishes.

The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. The request must pass the cookie that was returned by the Create a session method.

You specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only the part_content_type parameter is required. You then specify the audio files for the request as subsequent parts of the form data.

The multipart approach is intended for two use cases:

  • For use with browsers for which JavaScript is disabled. Multipart requests based on form data do not require the use of JavaScript.

  • When the parameters used with the recognition request are greater than the 8 KB limit imposed by most HTTP servers and proxies. This can occur, for example, if you want to spot a very large number of keywords. Passing the parameters as form data avoids this limit.

For requests to transcribe audio with more than one audio file or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the session (response code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (response code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id parameter of the JSON metadata.
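The JSON metadata part described above can be assembled programmatically before being attached as the first part of the form data. A minimal Python sketch; the field names follow the cURL example below, the values are illustrative, and only part_content_type is required:

```python
import json

# Build the JSON metadata part for a multipart recognize request.
metadata = {
    "part_content_type": "audio/flac",  # required: MIME type of the audio parts
    "data_parts_count": 1,              # number of audio parts that follow
    "max_alternatives": 3,
    "word_confidence": True,
    "keywords": ["colorado", "tornado", "tornadoes"],
    "keywords_threshold": 0.5,
    "sequence_id": 1234,                # illustrative; enables observe_result polling
}
metadata_json = json.dumps(metadata)
```

The serialized string would then be sent as the metadata form field, with each audio file as a subsequent upload part.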

Node and Java SDKs: Not supported. Use the session-based recognize method; see Recognize audio.


POST /v1/sessions/{session_id}/recognize

Request

Parameter Type Description
session_id path string The identifier of the session to be used.
Content-Type header string Must be set to multipart/form-data to indicate the content type of the payload. cURL automatically sets the header to multipart/form-data when you use the --form option.
Transfer-Encoding header string Must be set to chunked to send the audio in streaming mode or to send a request that includes more than one audio part.
metadata form data object A Metadata object that describes the following parts of the request, which contain the audio data. This must be the first part of the request. The Content-Type of the parts is ignored.
upload form data file One or more audio files for the request. To send multiple audio files, set Transfer-Encoding to chunked. With cURL, include a separate --form option for each file of the request; see the sessionless recognize multipart request for an example.

Example request


curl -X POST -u "{username}":"{password}" \
--cookie cookies.txt \
--form metadata="{\"data_parts_count\":1,
  \"part_content_type\":\"audio/flac\",
  \"max_alternatives\":3,
  \"word_confidence\":true,
  \"keywords\":[\"colorado\",\"tornado\",\"tornadoes\"],
  \"keywords_threshold\":0.5}" \
--form upload="@audio-file.flac" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/{session_id}/recognize"

Response

Returns one or more instances of a SpeechRecognitionEvent object depending on the input.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The request failed because of a user input error (for example, audio not matching the specified format), because the session is in the wrong state, because of an inactivity timeout, or because it failed to pass the session cookie. Specific messages include
  • Model model not found
  • This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
  • speaker_labels is not a supported feature for model model
  • Cookie must be set.
If an existing session is closed, session_closed is set to true.
404 Not Found The specified session_id was not found, possibly because of an invalid session cookie.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
408 Request Timeout The session closed after 30 seconds of inactivity (session timeout). The session is destroyed with session_closed set to true.
413 Payload Too Large The request passed an audio file that exceeded the currently supported data limit. The session is destroyed with session_closed set to true.
415 Unsupported Media Type The request specified an unacceptable media type.
500 Internal Server Error The service experienced an internal error. The session is destroyed with session_closed set to true.
503 Service Unavailable The session is already processing a request. Concurrent requests are not allowed on the same session. The session remains alive after this error.

Example response


{
  "results": [
    {
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "confidence": 1.0,
            "end_time": 2.15
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.94,
            "confidence": 0.913,
            "end_time": 5.62
          }
        ]
      },
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday ",
          "confidence": 0.891,
          "word_confidence": [
            [
              "several",
              1.0
            ],
            [
              "tornadoes",
              1.0
            ],
            . . .
            [
              "on",
              0.311
            ],
            [
              "Sunday",
              0.986
            ]
          ]
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

Delete a session

Deletes an existing session and its engine. The request must pass the cookie that was returned by the Create a session method. You cannot send requests to a session after it is deleted. By default, a session expires after 30 seconds of inactivity if you do not delete it first.


DELETE /v1/sessions/{session_id}

deleteSession(params, callback())

ServiceCall<Void> deleteSession(SpeechSession session)

Request

Parameter Type Description
session_id path string The identifier of the session to be deleted.
Parameter Description
session_id string The identifier of the session to be deleted.
session object A Java SpeechSession object that identifies the session to be deleted.

Example request


curl -X DELETE -u "{username}":"{password}" \
--cookie cookies.txt \
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/{session_id}"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'session_id': '{session_id}'
};

speech_to_text.deleteSession(params, function(error, session) {
  if (error)
    console.log('Error:', error);
  else
    console.log('Session deleted: ', '{session_id}');
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.deleteSession(session).execute();  // session: the SpeechSession returned by createSession

Response

No response body.

Response codes

Status Description
204 No Content The session was successfully deleted. No content is returned.
400 Bad Request The request must set the cookie.
404 Not Found The specified session_id was not found, possibly because of an invalid session cookie.
406 Not Acceptable The request specified an Accept header with an incompatible content type.

Exceptions thrown

Exception Description
BadRequestException The request must set the cookie. (HTTP response code 400.)
NotFoundException The specified session was not found, possibly because of an invalid session cookie. (HTTP response code 404.)
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)

Asynchronous

Register a callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL if it is not already registered by sending a GET request to the callback URL. The service passes a random alphanumeric challenge string via the challenge_string query parameter of the request.

To be registered successfully, the callback URL must respond to the GET request from the service. The response must send status code 200 and must include the challenge string in its body. Upon receiving this response, the service responds to the original POST registration request with response code 201; the Java SDK returns a RecognitionCallback object that has a status of created.

The service sends only a single GET request to the callback URL. If the service does not receive a reply with a response code of 200 and a body that echoes the challenge string within five seconds, it does not white-list the URL; it instead sends status code 400 in response to the POST registration request. If the requested callback URL is already white-listed, the service responds to the initial registration request with response code 200 (in Java, a RecognitionCallback object that has a status of already created).

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of the challenge string in its response to the POST request. It sends this signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time. For more information about registering a callback URL, see Registering a callback.

Not supported.


POST /v1/register_callback

ServiceCall<RecognitionCallback> registerCallback(String callbackUrl, String secret)

Request

Parameter Type Description
callback_url query string An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.
user_secret query string A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.
body body string An empty request body: {}. With cURL, use the --data option to pass the empty data.
Parameter Description
callbackUrl string An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.
secret string A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you pass a value of null, the service does not send the header.

Example request


curl -X POST -u "{username}":"{password}"
--data "{}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/register_callback?callback_url=http://{user_callback_path}/results&user_secret=ThisIsMySecret"

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognitionCallback callback = service.registerCallback("http://{user_callback_path}/results",
  "ThisIsMySecret").execute();
System.out.println(callback);

Response

Returns a Java RecognitionCallback object that contains the information about the new callback that is provided in a JSON RegisterStatus object.

RegisterStatus (Java RecognitionCallback object)
Name Description
status string The current status of the callback registration: created if the callback URL was successfully white-listed as a result of the call or already created if the URL was already white-listed.
url string The callback URL that is successfully registered.

Response codes

Status Description
200 OK The callback was already registered (white-listed). The status included in the response is already created.
201 Created The callback was successfully registered (white-listed). The status included in the response is created.
400 Bad Request The callback registration failed. The request was missing a required parameter or specified an invalid argument; the client sent an invalid response to the service's GET request during the registration process; or the client failed to respond to the server's request before the five-second timeout.
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
BadRequestException The callback registration failed. The request was missing a required parameter or specified an invalid argument; the client sent an invalid response to the service's GET request during the registration process; or the client failed to respond to the server's request before the five-second timeout. (HTTP response code 400.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Example response


{
  "status": "created",
  "url": "http://{user_callback_path}/results"
}

Create a job

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the Check jobs or Check a job method to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS Check a job method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic recognition parameters as all HTTP REST and WebSocket recognition requests. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Not supported.


POST /v1/recognitions

ServiceCall<RecognitionJob> createRecognitionJob(File audio, RecognizeOptions recognizeOptions,
  RecognitionJobOptions recognitionJobOptions)

Request

The method supports the parameters common to recognition requests made with the service's WebSocket and HTTP interfaces; see the request parameters for the HTTP sessionless Recognize audio method for a list of supported parameters. It also supports the following parameters specific to the asynchronous HTTP interface.

Parameter Type Description
callback_url query string A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the POST register_callback method. Omit the parameter to poll the service for job completion and results.

You can include the same callback URL with any number of job creation requests. Use the user_token query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.
events query string[, string ...] If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are
  • recognitions.started generates a callback notification when the service begins to process the job.
  • recognitions.completed generates a callback notification when the job is complete. You must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted.
  • recognitions.completed_with_results generates a callback notification when the job is complete. The notification includes the results of the request.
  • recognitions.failed generates a callback notification if the service experiences an error while processing the job.
Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events.

If the job does not include a callback URL, omit the parameter.
user_token query string If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job. The token allows the user to maintain an internal mapping between jobs and notification events.

If the job does not include a callback URL, omit the parameter.
results_ttl query integer The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

Pass the audio file to be transcribed via the method's audio argument. The method supports all of the parameters common to recognition requests made with other methods, which you pass as a RecognizeOptions object via the recognizeOptions argument; see the Recognize audio method for a list of supported parameters. The method also supports the following parameters specific to the asynchronous HTTP interface, which you pass as a RecognitionJobOptions object via the recognitionJobOptions argument.

Parameter Description
callbackUrl string A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the POST register_callback method. Omit the parameter to poll the service for job completion and results.

You can include the same callback URL with any number of job creation requests. Use the user_token query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.
events string[ ] If the job includes a callback URL, an array of notification events to which to subscribe. Valid events are
  • recognitions.started generates a callback notification when the service begins to process the job.
  • recognitions.completed generates a callback notification when the job is complete. You must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted.
  • recognitions.completed_with_results generates a callback notification when the job is complete. The notification includes the results of the request.
  • recognitions.failed generates a callback notification if the service experiences an error while processing the job.
Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events.

If the job does not include a callback URL, omit the parameter.
userToken string If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job. The token allows the user to maintain an internal mapping between jobs and notification events.

If the job does not include a callback URL, omit the parameter.
resultsTtl integer The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

Example request


curl -X POST -u "{username}":"{password}"
--header "Content-Type: audio/flac"
--data-binary "@audio-file.flac"
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions?callback_url=http://{user_callback_path}/results&user_token=job25&continuous=true&timestamps=true"

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognizeOptions recognizeOptions = new RecognizeOptions.Builder().contentType("audio/flac")
  .continuous(true).timestamps(true).build();

RecognitionJobOptions jobOptions = new RecognitionJobOptions.Builder().userToken("job25")
  .build();

RecognitionJob job = service.createRecognitionJob(new File("audio-file.flac"),
  recognizeOptions, jobOptions).execute();
System.out.println(job);

Response

Returns a Java RecognitionJob object that contains the information about the new job that is provided in a JSON CreateStatus object.

CreateStatus (Java RecognitionJob object)
Name Description
id string The ID of the job.
status string The current status of the job, which is waiting when the job is initially created. Other possible statuses are processing, completed, and failed.
created string The date and time in Coordinated Universal Time (UTC) at which the job was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
url string The URL to use to request information about the job with the GET recognitions/{id} method.
warnings string[ ] An array of warning messages about invalid query parameters included with the request. Each warning includes a descriptive message and a list of invalid argument strings, for example, "unexpected query parameter 'user_token', query parameter 'callback_url' was not specified". The request succeeds despite the warnings.

Response codes

Status Description
201 Created The job was successfully created.
400 Bad Request The request specified an invalid argument. For example, the request passed audio that does not match the specified format, specified a callback URL that has not been white-listed, or specified both the recognitions.completed and recognitions.completed_with_results events. Specific messages include
  • Model model not found
  • This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
  • speaker_labels is not a supported feature for model model
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
BadRequestException The request specified an invalid argument. For example, the request passed audio that does not match the specified format, specified a callback URL that has not been white-listed, or specified both the recognitions.completed and recognitions.completed_with_results events. (HTTP response code 400.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Example response


{
  "id": "4bd734c0-e575-21f3-de03-f932aa0468a0",
  "status": "waiting",
  "created": "2016-08-17T19:15:17.926Z",
  "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions/4bd734c0-e575-21f3-de03-f932aa0468a0"
}

Check jobs

Returns the status and ID of all outstanding jobs associated with the service credentials with which it is called. The method also returns the creation and update times of each job, and, if a job was created with a callback URL and a user token, the user token for the job. To obtain the results for a job whose status is completed, use the Check a job method. A job and its results remain available until you delete them with the Delete a job method or until the job's time to live expires, whichever comes first.

Not supported.


GET /v1/recognitions

ServiceCall<List<RecognitionJob>> getRecognitionJobs()

Request

No arguments.

Example request


curl -X GET -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions"

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

List<RecognitionJob> jobs = service.getRecognitionJobs().execute();
System.out.println(jobs);

Response

JobsStatusList
Name Description
recognitions object[ ] An array of JobsStatus objects that provides the status for each of the user's current jobs. The array is empty if the user has no current jobs.

Returns a List of Java RecognitionJob objects that contain the information about a job that is provided in a JSON JobsStatus object.

JobsStatus (Java RecognitionJob object)
Name Description
id string The ID of the job.
status string The current status of the job:
  • waiting: The service is preparing the job for processing. The service returns this status when the job is initially created or when it is waiting for capacity to process the job. The job remains in this state until the service has the capacity to begin processing it.
  • processing: The service is actively processing the job.
  • completed: The service has finished processing the job. If the job specified a callback URL and the event recognitions.completed_with_results, the service sent the results with the callback notification. Otherwise, use the GET recognitions/{id} method to retrieve the results.
  • failed: The job failed.
created string The date and time in Coordinated Universal Time (UTC) at which the job was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
updated string The date and time in Coordinated Universal Time (UTC) at which the job was last updated by the service. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
user_token string The user token associated with the job, if the job was created with a callback URL and a user token.

Response codes

Status Description
200 OK The request succeeded.
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Example response


{
  "recognitions": [
    {
      "id": "4bd734c0-e575-21f3-de03-f932aa0468a0",
      "created": "2016-08-17T19:15:17.926Z",
      "updated": "2016-08-17T19:15:17.926Z",
      "status": "waiting",
      "user_token": "job25"
    },
    {
      "id": "4bb1dca0-f6b1-11e5-80bc-71fb7b058b20",
      "created": "2016-08-17T19:13:23.622Z",
      "updated": "2016-08-17T19:13:24.434Z",
      "status": "processing"
    },
    {
      "id": "398fcd80-330a-22ba-93ce-1a73f454dd98",
      "created": "2016-08-17T19:11:04.298Z",
      "updated": "2016-08-17T19:11:16.003Z",
      "status": "completed"
    }
  ]
}

[
  {
    "id": "4bd734c0-e575-21f3-de03-f932aa0468a0",
    "created": "2016-08-17T19:15:17.926Z",
    "updated": "2016-08-17T19:15:17.926Z",
    "status": "waiting",
    "user_token": "job25"
  },
  {
    "id": "4bb1dca0-f6b1-11e5-80bc-71fb7b058b20",
    "created": "2016-08-17T19:13:23.622Z",
    "updated": "2016-08-17T19:13:24.434Z",
    "status": "processing"
  },
  {
    "id": "398fcd80-330a-22ba-93ce-1a73f454dd98",
    "created": "2016-08-17T19:11:04.298Z",
    "updated": "2016-08-17T19:11:16.003Z",
    "status": "completed"
  }
]

Check a job

Returns information about a specified job. The response always includes the status of the job and its creation and update times. If the status is completed, the response also includes the results of the recognition request. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available. Use the Check jobs method to request information about all jobs associated with the calling user.

Not supported.


GET /v1/recognitions/{id}

ServiceCall<RecognitionJob> getRecognitionJob(String id)

Request

Parameter Type Description
id path string The ID of the job whose status is to be checked.
Parameter Description
id string The ID of the job whose status is to be checked.

Example request


curl -X GET -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions/{id}"

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognitionJob job = service.getRecognitionJob({id}).execute();
System.out.println(job);

Response

Returns a Java RecognitionJob object that contains the information about the job that is provided in a JSON JobStatus object.

JobStatus (Java RecognitionJob object)
Name Description
id string The ID of the job.
status string The current status of the job:
  • waiting: The service is preparing the job for processing. The service returns this status when the job is initially created or when it is waiting for capacity to process the job. The job remains in this state until the service has the capacity to begin processing it.
  • processing: The service is actively processing the job.
  • completed: The service has finished processing the job. If the job specified a callback URL and the event recognitions.completed_with_results, the service sent the results with the callback notification. Otherwise, use the GET recognitions/{id} method to retrieve the results.
  • failed: The job failed.
created string The date and time in Coordinated Universal Time (UTC) at which the job was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
updated string The date and time in Coordinated Universal Time (UTC) at which the job was last updated by the service. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
results object[ ] If the status is completed, the results of the recognition request as an array that includes one or more SpeechRecognitionEvent objects, depending on the input. In Java, the results are returned as a List of one or more SpeechResults objects; each has the same information as a SpeechRecognitionEvent object.

Response codes

Status Description
200 OK The request succeeded.
404 Not Found The specified job id was not found.
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
NotFoundException The specified job id was not found. (HTTP response code 404.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Example response


{
  "id": "4bd734c0-e575-21f3-de03-f932aa0468a0",
  "results": [
    {
      "result_index": 0,
      "results": [
        {
          "final": true,
          "alternatives": [
            {
              "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday ",
              "timestamps": [
                [
                  "several",
                  1,
                  1.52
                ],
                [
                  "tornadoes",
                  1.52,
                  2.15
                ],
                . . .
                [
                  "Sunday",
                  5.74,
                  6.33
                ]
              ],
              "confidence": 0.885
            }
          ]
        }
      ]
    }
  ],
  "created": "2016-08-17T19:11:04.298Z",
  "updated": "2016-08-17T19:11:16.003Z",
  "status": "completed"
}
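
The transcripts in a completed job's response like the one above can be collected by walking the nested results arrays. The helper name here is illustrative, not part of the service API:

```javascript
// Illustrative helper: gather the best-alternative transcripts from a
// completed job, whose outer results array nests SpeechRecognitionEvent
// objects that each carry their own results array.
function collectTranscripts(job) {
  var transcripts = [];
  (job.results || []).forEach(function (event) {
    (event.results || []).forEach(function (result) {
      if (result.final && result.alternatives.length > 0) {
        transcripts.push(result.alternatives[0].transcript);
      }
    });
  });
  return transcripts;
}
```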

Delete a job

Deletes the specified job. You cannot delete a job that the service is actively processing. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.

Not supported.


DELETE /v1/recognitions/{id}

ServiceCall<Void> deleteRecognitionJob(String id)

Request

Parameter Type Description
id path string The ID of the job that is to be deleted.
Parameter Description
id string The ID of the job that is to be deleted.

Example request


curl -X DELETE -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions/{id}"

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.deleteRecognitionJob({id}).execute();

Response

No response body.

Response codes

Status Description
204 No Content The job was successfully deleted.
400 Bad Request The service cannot delete a job that it is actively processing:
  • unable to delete the processing job
404 Not Found The specified job id was not found.
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
BadRequestException The service cannot delete a job that it is actively processing. (HTTP response code 400.)
NotFoundException The specified job id was not found. (HTTP response code 404.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Custom models

Create a custom model

Creates a new custom language model for a specified base language model. The custom language model can be used only with the base language model for which it is created. The new model is owned by the individual whose service credentials are used to create it.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


POST /v1/customizations

createCustomization(params, callback)

Request

Parameter Type Description
Content-Type header string The type of the input, application/json.
body body object A JSON CustomModel object that provides basic information about the new custom model.
CustomModel
Parameter Description
name string The name of the new custom model. Use a name that is unique among all custom models that are owned by the calling user. Use a localized name that matches the language of the custom model.
base_model_name string The name of the language model that is to be customized by the new model. You must use the name of one of the US English or Japanese models that is returned by the Get models method:
  • en-US_BroadbandModel
  • en-US_NarrowbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
The new custom model can be used only with the base language model that it customizes.
description string A description of the new custom model. Use a localized description that matches the language of the custom model.
Parameter Description
name string The name of the new custom model. Use a name that is unique among all custom models that are owned by the calling user. Use a localized name that matches the language of the custom model.
base_model_name string The name of the language model that is to be customized by the new model. You must use the name of one of the US English or Japanese models that is returned by the Get models method:
  • en-US_BroadbandModel
  • en-US_NarrowbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
The new custom model can be used only with the base language model that it customizes.
description string A description of the new custom model. Use a localized description that matches the language of the custom model.

Example request


curl -X POST -u "{username}":"{password}"
--header "Content-Type: application/json"
--data "{\"name\": \"Example model\",
  \"base_model_name\": \"en-US_BroadbandModel\",
  \"description\": \"Example custom language model\"}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}'
});

var params = {
  name: 'Example model',
  base_model_name: 'en-US_BroadbandModel',
  description: 'Example custom language model'
};

speech_to_text.createCustomization(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

CustomizationID
Name Description
customization_id string The GUID of the new custom language model.

Response codes

Status Description
201 Created The custom language model was successfully created.
400 Bad Request A required parameter is null or invalid. Specific failure messages include:
  • Required parameter 'name' is missing
  • Required parameter 'name' cannot be empty string
  • Required parameter 'name' cannot be null
  • The base model 'name' is not recognized
  • Customization is not supported for base model 'name'
401 Unauthorized The specified service credentials are invalid.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "customization_id": "74f4807e-b5ff-4866-824e-6bba1a84fe96"
}

List custom models

Lists information about all custom language models that are owned by the calling user. Use the language parameter to see all custom models for the specified language; omit the parameter to see the custom models for all languages.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


GET /v1/customizations

getCustomizations(params, callback)

Request

Parameter Type Description
language query string The language for which custom models are to be returned:
  • en-US (the default)
  • ja-JP
Parameter Description
language string The language for which custom models are to be returned:
  • en-US (the default)
  • ja-JP

Example request


curl -X GET -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations?language=en-US"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}'
});

speech_to_text.getCustomizations(null, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

Customizations
Name Description
customizations object[ ] An array of Customization objects that provides information about each available custom model. The array is empty if the user owns no custom models (if no language is specified) or owns no custom models for the specified language.
Customization
Name Description
customization_id string The GUID of the custom language model.
created string The date and time in Coordinated Universal Time (UTC) at which the custom language model was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
language string The language of the custom language model, en-US or ja-JP.
owner string The GUID of the service credentials for the owner of the custom language model.
name string The name of the custom language model.
description string The description of the custom language model.
base_model_name string The name of the base language model for which the custom language model was created.
status string The current status of the custom language model:
  • pending indicates that the model was created but is waiting either for training data to be added or for the service to finish analyzing added data.
  • ready indicates that the model contains data and is ready to be trained.
  • training indicates that the model is currently being trained.
  • available indicates that the model is trained and ready to use.
  • failed indicates that training of the model failed.
progress integer A percentage that indicates the progress of the model's current training. A value of 100 means that the model is fully trained.
Note: For this beta release, the progress field does not reflect the current progress of the training. The field changes from 0 to 100 when training is complete.
warnings string If the request included unknown query parameters, the following message:
  • Unexpected query parameter(s) [parameters] detected
where parameters is a list that includes a quoted string for each unknown parameter.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified language is not supported:
  • Language 'xx-XX' is not supported for customization
401 Unauthorized The specified service credentials are invalid.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "customizations": [
    {
      "customization_id": "74f4807e-b5ff-4866-824e-6bba1a84fe96",
      "created": "2016-06-01T18:42:25.324Z",
      "language": "en-US",
      "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
      "name": "Example model",
      "description": "Example custom language model",
      "base_model_name": "en-US_BroadbandModel",
      "status": "pending",
      "progress": 0
    },
    {
      "customization_id": "8391f918-3b76-e109-763c-b7732fae4829",
      "created": "2016-06-01T18:51:37.291Z",
      "language": "en-US",
      "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
      "name": "Example model two",
      "description": "Example custom language model two",
      "base_model_name": "en-US_NarrowbandModel",
      "status": "available",
      "progress": 100
    }
  ]
}
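When working with the customizations array, a client typically filters for models that are ready for recognition. A minimal sketch against the example response above (plain Node.js, no service call; the data is copied from the example and abbreviated):

```javascript
// Abbreviated response data from the List custom models method.
var result = {
  customizations: [
    { customization_id: '74f4807e-b5ff-4866-824e-6bba1a84fe96',
      name: 'Example model', status: 'pending', progress: 0 },
    { customization_id: '8391f918-3b76-e109-763c-b7732fae4829',
      name: 'Example model two', status: 'available', progress: 100 }
  ]
};

// Only models with status 'available' are trained and ready to use.
var usable = result.customizations.filter(function(model) {
  return model.status === 'available';
});

console.log(usable.map(function(model) { return model.name; }));
// [ 'Example model two' ]
```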

List a custom model

Lists information about a specified custom language model. Only the owner of a custom model can use this method to query information about the model.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


GET /v1/customizations/{customization_id}

getCustomization(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model for which information is to be returned. You must make the request with the service credentials of the model's owner.
Parameter Description
customization_id string The GUID of the custom language model for which information is to be returned. You must make the request with the service credentials of the model's owner.

Example request


curl -X GET -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.getCustomization(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

The method returns a single instance of a Customization object that provides information about the specified model.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: 'customization_id'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "customization_id": "74f4807e-b5ff-4866-824e-6bba1a84fe96",
  "created": "2016-06-01T18:42:25.324Z",
  "language": "en-US",
  "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
  "name": "Example model",
  "description": "Example custom language model",
  "base_model_name": "en-US_BroadbandModel",
  "status": "pending",
  "progress": 0
}

Train a custom model

Initiates the training of a custom language model with new corpora, custom words, or both. After adding corpora or words to the custom model, use this method to begin the actual training of the model on the new data. You can specify whether the custom model is to be trained with all words from its words resources or only with words that were added or modified by the user. Only the owner of a custom model can use this method to train the model.

This method is asynchronous and can take on the order of minutes to complete depending on the amount of data on which the service is being trained and the current load on the service. The method returns an HTTP 200 response code to indicate that the training process has begun.

You can monitor the status of the training by using the List a custom model method to poll the model's status in a loop, for example every 10 seconds. The method returns a Customization object that includes status and progress fields. A status of available means that the custom model is trained and ready to use. While training is in progress, the progress field indicates the progress of the training as a percentage complete. The service cannot accept subsequent training requests, or requests to add new corpora or words, until the existing request completes.

Alternatively, you can use the Monitor a custom model method to poll the model's status at a specified interval.
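The polling loop described above can be sketched as a small helper that checks the model's status at a fixed interval and gives up after a set number of attempts. The checkStatus function here is a hypothetical stand-in for a call that reads the status field of the Customization object returned by List a custom model:

```javascript
// Generic polling helper: calls checkStatus(callback) every `interval` ms,
// up to `times` attempts, until status is 'available' or 'failed'.
function pollUntilTrained(checkStatus, interval, times, done) {
  var attempts = 0;
  (function poll() {
    checkStatus(function(error, status) {
      if (error) return done(error);
      if (status === 'available' || status === 'failed')
        return done(null, status);
      if (++attempts >= times)
        return done(new Error('Customization is still pending'));
      setTimeout(poll, interval);
    });
  })();
}
```

In practice, checkStatus would wrap a call such as getCustomization; the 10-second interval suggested above corresponds to interval = 10000.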

Training can fail to start for the following reasons:

  • No training data (corpora or words) have been added to the custom model.

  • Pre-processing of corpora to generate a list of out-of-vocabulary (OOV) words is not complete.

  • Pre-processing of words to validate or auto-generate sounds-like pronunciations is not complete.

  • One or more words that were added to the custom model have invalid sounds-like pronunciations that you must fix.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


POST /v1/customizations/{customization_id}/train

trainCustomization(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model that is to be trained. You must make the request with the service credentials of the model's owner.
word_type_to_add query string The type of words from the custom model's words resource on which to train the model:
  • all (the default) trains the model on all new words, regardless of whether they were extracted from corpora or were added or modified by the user.
  • user trains the model only on new words that were added or modified by the user; the model is not trained on new words extracted from corpora.
body body string An empty request body: {}. With cURL, use the --data option to pass the empty data.
Parameter Description
customization_id string The GUID of the custom language model that is to be trained. You must make the request with the service credentials of the model's owner.
word_type_to_add string The type of words from the custom model's words resource on which to train the model:
  • all (the default) trains the model on all new words, regardless of whether they were extracted from corpora or were added or modified by the user.
  • user trains the model only on new words that were added or modified by the user; the model is not trained on new words extracted from corpora.

Example request


curl -X POST -u "{username}":"{password}"
--data "{}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/train"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.trainCustomization(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

An empty response body: {}.

Response codes

Status Description
200 OK Training of the custom model started successfully.
400 Bad Request A required parameter is null or invalid, the custom model is not ready to be trained, or the total number of words or OOV words exceeds the maximum threshold. Specific failure messages include:
  • No input data available for running training
  • Total number of words number exceeds maximum allowed
  • Total number of OOV words number exceeds maximum
  • Malformed GUID: 'customization_id'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization 'customization_id' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}

Reset a custom model

Resets a custom language model by removing all corpora and words from the model. Resetting a custom model initializes the model to its state when it was first created. Metadata such as the name and language of the model are preserved. Only the owner of a custom model can use this method to reset the model.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


POST /v1/customizations/{customization_id}/reset

resetCustomization(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model that is to be reset. You must make the request with the service credentials of the model's owner.
body body string An empty request body: {}. With cURL, use the --data option to pass the empty data.
Parameter Description
customization_id string The GUID of the custom language model that is to be reset. You must make the request with the service credentials of the model's owner.

Example request


curl -X POST -u "{username}":"{password}"
--data "{}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/reset"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.resetCustomization(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

An empty response body: {}.

Response codes

Status Description
200 OK The custom model was successfully reset.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: 'customization_id'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization 'customization_id' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}

Upgrade a custom model

Upgrades a custom language model to the latest release level of the Speech to Text service. The method bases the upgrade on the latest trained data stored for the custom model. If the corpora or words for the model have changed since the model was last trained, you must use the Train a custom model method to train the model on the new data. Only the owner of a custom model can use this method to upgrade the model.

Not supported. This method is not currently implemented by the service. It will be added for a future release of the API.

Not supported.


POST /v1/customizations/{customization_id}/upgrade_model

Request

Parameter Type Description
customization_id path string The GUID of the custom language model that is to be upgraded. You must make the request with the service credentials of the model's owner.
body body string An empty request body: {}. With cURL, use the --data option to pass the empty data.

Example request


curl -X POST -u "{username}":"{password}"
--data "{}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/upgrade_model"

Response

An empty response body: {}.

Response codes

Status Description
200 OK The custom model was successfully upgraded.
204 No Content The custom model was already current with the latest release level.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: 'customization_id'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}

Monitor a custom model

Monitors the status of an Add custom words or Train a custom model request. Those calls succeed if initial validation of new words is successful or the training operation successfully begins. This call checks the status of the service's asynchronous completion of those requests. Only the owner of a custom model can use this method to monitor the status of a request to add words to the model or to train the model.

You can specify the frequency with which the method is to check the status of the custom model and the number of times it is to check before giving up. If the operation being monitored succeeds, the method returns information about the custom model, including its status: ready if new words were successfully added or available if the model was successfully trained.

The method can return one of the following error messages:

  • Customization is still pending, try increasing interval or times params: The service's analysis of new words or training of the custom model failed to complete within the specified number of attempts. You can use the method to repeat the monitoring operation.

  • Customization training failed: The service's analysis of new words or training of the custom model failed. The model has the status failed.

  • Unexpected customization status: status: The custom model is in an unexpected state.

Not supported. For information about monitoring a request to train a custom model, see the Train a custom model method. For information about monitoring a request to add words to a custom model, see the Add custom words method.


whenCustomizationReady(params, callback)

Request

Parameter Description
customization_id string The GUID of the custom language model whose status is to be monitored. You must make the request with the service credentials of the model's owner.
interval integer The frequency in milliseconds at which the status of the custom model is to be checked. By default, the method checks the status every 5000 milliseconds (5 seconds).
times integer The maximum number of times that the status of the custom model is to be checked. By default, the method checks the status 30 times. The method returns an error if the operation is not complete after the specified number of attempts.
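Taken together, the interval and times parameters bound how long the method monitors the model: roughly interval times times milliseconds. A small sketch of deriving a times value for a target timeout (attemptsFor is an illustrative helper, not part of the SDK):

```javascript
// The monitor gives up after roughly interval * times milliseconds.
// Derive `times` from a desired overall timeout and a chosen interval.
function attemptsFor(timeoutMs, intervalMs) {
  return Math.ceil(timeoutMs / intervalMs);
}

// Defaults: 5000 ms * 30 attempts = 150 seconds of monitoring.
console.log(5000 * 30);  // 150000

// Ten minutes of monitoring at a 10-second interval:
console.log(attemptsFor(10 * 60 * 1000, 10000));  // 60
```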

Example request


var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  interval: 10000
};

speech_to_text.whenCustomizationReady(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

The method returns a single instance of a Customization object that provides information about the specified model, including its status.

Response codes

Status Description
400 Bad Request A required parameter is null or invalid. Specific failure messages include:
  • Malformed GUID: 'customization_id'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response for adding words


{
  "customization_id": "74f4807e-b5ff-4866-824e-6bba1a84fe96",
  "created": "2016-06-01T18:42:25.324Z",
  "language": "en-US",
  "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
  "name": "Example model",
  "description": "Example custom language model",
  "base_model_name": "en-US_BroadbandModel",
  "status": "ready",
  "progress": 0
}

Example response for training


{
  "customization_id": "74f4807e-b5ff-4866-824e-6bba1a84fe96",
  "created": "2016-06-01T18:42:25.324Z",
  "language": "en-US",
  "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
  "name": "Example model",
  "description": "Example custom language model",
  "base_model_name": "en-US_BroadbandModel",
  "status": "available",
  "progress": 100
}

Delete a custom model

Deletes an existing custom language model. The custom model cannot be deleted if another request, such as adding a corpus to the model, is currently being processed. Only the owner of a custom model can use this method to delete the model.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


DELETE /v1/customizations/{customization_id}

deleteCustomization(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model that is to be deleted. You must make the request with the service credentials of the model's owner.
Parameter Description
customization_id string The GUID of the custom language model that is to be deleted. You must make the request with the service credentials of the model's owner.

Example request


curl -X DELETE -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.deleteCustomization(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

An empty response body: {}.

Response codes

Status Description
200 OK The custom model was successfully deleted.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: 'customization_id'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization 'customization_id' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}

Custom corpora

Add a corpus

Adds a single corpus text file of new training data to the custom language model. Use multiple requests to submit multiple corpus text files. Only the owner of a custom model can use this method to add a corpus to the model. Note that adding a corpus does not affect the custom model until you train the model for the new data by using the Train a custom model method.

Submit a plain text file that contains sample sentences from the domain of interest to enable the service to extract words in context. The more sentences you add that represent the context in which speakers use words from the domain, the better the service's recognition accuracy. For guidelines about adding a corpus text file and for information about how the service parses a corpus file, see Preparing a corpus file.

The call returns an HTTP 201 response code if the corpus is valid. The service then asynchronously pre-processes the contents of the corpus and automatically extracts new words that it finds. This can take on the order of a minute or two to complete, depending on the total number of words and the number of new words in the corpus, as well as the current load on the service. You cannot submit requests to add additional corpora or words to the custom model, or to train the model, until the service's analysis of the corpus for the current request completes. Use the List a corpus or Monitor a corpus method to check the status of the analysis.

The service auto-populates the model's words resource with any word that is not found in its base vocabulary; these are referred to as out-of-vocabulary (OOV) words. You can use the List custom words method to examine the words resource. If necessary, you can use the Add custom words or Add a custom word method to correct problems, eliminate typographical errors, and modify how words are pronounced.

To add a corpus file that has the same name as an existing corpus, set the allow_overwrite parameter to true; otherwise, the request fails. Overwriting an existing corpus causes the service to process the corpus text file and extract OOV words anew. Before doing so, it removes any OOV words associated with the existing corpus from the model's words resource unless they were also added by another corpus or they have been modified in some way by the user.

The service limits the overall amount of data that you can add to a custom model to a maximum of 10 million total words from all corpora combined. Also, you can add no more than 30 thousand new custom words to a model; this includes words that the service extracts from corpora and words that you add directly.
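Because these limits apply to the combined total across all corpora, it can be useful to estimate a corpus file's word count locally before uploading it. A rough precheck (the whitespace-based count is an approximation, not the service's exact tokenization; in practice the text would be read from the corpus file, for example with fs.readFileSync):

```javascript
var MAX_TOTAL_WORDS = 10000000;  // 10 million words across all corpora
var MAX_NEW_WORDS = 30000;       // 30 thousand new custom words per model

// Approximate the number of words in corpus text by splitting on
// whitespace. The service's own tokenization may differ slightly.
function countWords(text) {
  var trimmed = text.trim();
  return trimmed.length === 0 ? 0 : trimmed.split(/\s+/).length;
}

var corpusText = 'the quick brown fox jumps over the lazy dog';
console.log(countWords(corpusText));                     // 9
console.log(countWords(corpusText) <= MAX_TOTAL_WORDS);  // true
```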

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


POST /v1/customizations/{customization_id}/corpora/{corpus_name}

addCorpus(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model to which a corpus is to be added. You must make the request with the service credentials of the model's owner.
corpus_name path string The name of the corpus that is to be added. The name cannot contain spaces and cannot be the string user, which is reserved by the service to denote custom words added or modified by the user. Use a localized name that matches the language of the custom model.
allow_overwrite query boolean Indicates whether the specified corpus is to overwrite an existing corpus with the same name. If a corpus with the same name already exists, the request fails unless allow_overwrite is set to true; by default, the parameter is false. The parameter has no effect if a corpus with the same name does not already exist.
body body file A plain text file that contains the training data for the corpus. Encode the file in UTF-8 if it contains non-ASCII characters; the service assumes UTF-8 encoding if it encounters non-ASCII characters. With cURL, use the --data-binary option to upload the file for the request.
Parameter Description
customization_id string The GUID of the custom language model to which a corpus is to be added. You must make the request with the service credentials of the model's owner.
name string The name of the corpus that is to be added. The name cannot contain spaces and cannot be the string user, which is reserved by the service to denote custom words added or modified by the user. Use a localized name that matches the language of the custom model.
corpus file Plain text that contains the training data for the corpus. Provide the text as a string, a buffer, or as a readable stream; a readable stream is recommended when reading a file from disk. Encode the file in UTF-8 if it contains non-ASCII characters; the service assumes UTF-8 encoding if it encounters non-ASCII characters.
allow_overwrite boolean Indicates whether the specified corpus is to overwrite an existing corpus with the same name. If a corpus with the same name already exists, the request fails unless allow_overwrite is set to true; by default, the parameter is false. The parameter has no effect if a corpus with the same name does not already exist.

Example request


curl -X POST -u "{username}":"{password}"
--data-binary "@MyCorpus.txt"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/corpora/MyCorpus"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var fs = require('fs');

var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  name: 'MyCorpus',
  corpus: fs.createReadStream('MyCorpus.txt')
};

speech_to_text.addCorpus(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

An empty response body: {}.

Response codes

Status Description
201 Created Addition of the corpus data was successfully started. The service is analyzing the data.
400 Bad Request A required parameter is null or invalid, or the specified corpus already exists. Specific failure messages include:
  • Malformed GUID: 'customization_id'
  • Corpus file not specified or empty
  • Corpus 'name' already exists - change its name, remove existing corpus before adding new one, or overwrite existing corpus by setting 'allow_overwrite' to 'true'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization 'customization_id' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request. You can also receive status code 500 Forwarding error if the service is currently busy handling a previous request for the custom model.

Example response


{}

List corpora

Lists information about all corpora that have been added to the specified custom language model. The information includes the total number of words and out-of-vocabulary (OOV) words, name, and status of each corpus. Only the owner of a custom model can use this method to list the model's corpora.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


GET /v1/customizations/{customization_id}/corpora

getCorpora(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model for which corpora are to be listed. You must make the request with the service credentials of the model's owner.
Parameter Description
customization_id string The GUID of the custom language model for which corpora are to be listed. You must make the request with the service credentials of the model's owner.

Example request


curl -X GET -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/corpora"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.getCorpora(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

Corpora
Name Description
corpora object[ ] An array of Corpus objects that provides information about corpora of the custom model. The array is empty if the custom model has no corpora.
Corpus
Name Description
name string The name of the corpus.
total_words integer The total number of words in the corpus. The value is 0 while the corpus is being processed.
out_of_vocabulary_words integer The number of OOV words in the corpus. The value is 0 while the corpus is being processed.
status string The status of the corpus:
  • analyzed indicates that the service has successfully analyzed the corpus. The custom model can be trained with data from the corpus.
  • being_processed indicates that the service is still analyzing the corpus. The service cannot accept requests to add new corpora or words, or to train the custom model.
  • undetermined indicates that the service encountered an error while processing the corpus.
error string If the status of the corpus is undetermined, the following message:
  • Analysis of corpus 'name' failed. Please try adding the corpus again by setting the 'allow_overwrite' flag to 'true'.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: 'customization_id'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "corpora": [
    {
      "name": "corpus1",
      "out_of_vocabulary_words": 191,
      "total_words": 5037,
      "status": "analyzed"
    },
    {
      "name": "corpus2",
      "out_of_vocabulary_words": 0,
      "total_words": 0,
      "status": "being_processed"
    },
    {
      "name": "corpus3",
      "out_of_vocabulary_words": 0,
      "total_words": 0,
      "status": "undetermined",
      "error": "Analysis of corpus 'corpus3.txt' failed. Please try adding the corpus again by setting the 'allow_overwrite' flag to 'true'."
    }
  ]
}
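A client that has added several corpora can inspect the status fields to decide whether the model can accept further requests and which corpora need to be re-added. A minimal sketch against the example response above (the data is copied from the example and abbreviated):

```javascript
// Abbreviated response data from the List corpora method.
var result = {
  corpora: [
    { name: 'corpus1', status: 'analyzed' },
    { name: 'corpus2', status: 'being_processed' },
    { name: 'corpus3', status: 'undetermined' }
  ]
};

// The model cannot accept new corpora, words, or training requests while
// any corpus is still being analyzed.
var busy = result.corpora.some(function(corpus) {
  return corpus.status === 'being_processed';
});

// Corpora with status 'undetermined' failed analysis and should be added
// again with allow_overwrite set to true.
var failed = result.corpora
  .filter(function(corpus) { return corpus.status === 'undetermined'; })
  .map(function(corpus) { return corpus.name; });

console.log(busy);    // true
console.log(failed);  // [ 'corpus3' ]
```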

List a corpus

Lists information about a specified corpus. The information includes the total number of words and out-of-vocabulary (OOV) words, name, and status of the corpus. Only the owner of a custom model can use this method to list information about a corpus from the model.

Not supported.


GET /v1/customizations/{customization_id}/corpora/{corpus_name}

getCorpus(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model for which a corpus is to be listed. You must make the request with the service credentials of the model's owner.
corpus_name path string The name of the corpus about which information is to be listed.
Parameter Description
customization_id string The GUID of the custom language model for which a corpus is to be listed. You must make the request with the service credentials of the model's owner.
name string The name of the corpus about which information is to be listed.

Example request


curl -X GET -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/corpora/MyCorpus"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  name: 'MyCorpus'
};

speech_to_text.getCorpus(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

The method returns a single instance of a Corpus object that provides information about the specified corpus.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID or corpus name is invalid:
  • Malformed GUID: 'customization_id'
  • Invalid value for corpus name 'corpus_name'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "name": "MyCorpus",
  "out_of_vocabulary_words": 191,
  "total_words": 5037,
  "status": "analyzed"
}

Monitor a corpus

Monitors the status of an Add corpus request. That call succeeds if initial validation of a corpus is successful. This call checks the status of the service's asynchronous processing of the contents of the corpus. Only the owner of a custom model can use this method to monitor the status of a request to add a corpus to the model.

You can specify the frequency with which the method is to check the status of the corpus and the number of times it is to check before giving up. If the analysis succeeds, the method returns information about all of the custom model's corpora. (Note that the service can analyze the addition of only a single corpus at a time.)

The method can return one of the following error messages:

  • Corpora is still being processed, try increasing interval or times params: The service's analysis of the corpus failed to complete within the specified number of attempts. You can use the method to repeat the monitoring operation.

  • Unexpected corpus analysis status: The corpus is in an unexpected state.

Not supported. For information about monitoring a request to add a corpus to a custom model, see the Add a corpus method.


whenCorporaAnalyzed(params, callback)

Request

Parameter Description
customization_id string The GUID of the custom language model for which analysis of a corpus is to be monitored. You must make the request with the service credentials of the model's owner.
interval integer The frequency in milliseconds at which the status of the corpus is to be checked. By default, the method checks the status every 5000 milliseconds (5 seconds).
times integer The maximum number of times that the status of the corpus is to be checked. By default, the method checks the status 30 times. The method returns an error if the operation is not complete after the specified number of attempts.

Example request


var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  interval: 10000
};

speech_to_text.whenCorporaAnalyzed(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

The method returns a Corpora object that contains an array of Corpus objects, each of which provides information about a corpus of the specified custom model.

Response codes

Status Description
400 Bad Request A required parameter is null or invalid. Specific failure messages include:
  • Malformed GUID: 'customization_id'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "corpora": [
    {
      "name": "corpus1",
      "out_of_vocabulary_words": 191,
      "total_words": 5037,
      "status": "analyzed"
    },
    {
      "name": "corpus2",
      "out_of_vocabulary_words": 15,
      "total_words": 1154,
      "status": "analyzed"
    }
  ]
}

Delete a corpus

Deletes an existing corpus from a custom language model. The service removes any out-of-vocabulary (OOV) words associated with the corpus from the custom model's words resource unless they were also added by another corpus or they have been modified in some way with the Add custom words or Add a custom word method. Removing a corpus does not affect the custom model until you train the model with the Train a custom model method. Only the owner of a custom model can use this method to delete a corpus from the model.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


DELETE /v1/customizations/{customization_id}/corpora/{corpus_name}

deleteCorpus(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model from which a corpus is to be deleted. You must make the request with the service credentials of the model's owner.
corpus_name path string The name of the corpus that is to be deleted.
Parameter Description
customization_id string The GUID of the custom language model from which a corpus is to be deleted. You must make the request with the service credentials of the model's owner.
name string The name of the corpus that is to be deleted.

Example request


curl -X DELETE -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/corpora/MyCorpus"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  name: 'MyCorpus'
};

speech_to_text.deleteCorpus(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

An empty response body: {}.

Response codes

Status Description
200 OK The corpus was successfully deleted.
400 Bad Request The specified customization ID or corpus name is invalid, including the case where the corpus does not exist for the custom model. Specific failure messages include:
  • Malformed GUID: 'customization_id'
  • Invalid value for corpus name 'corpus_name'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
405 Method Not Allowed No corpus name was specified with the request.
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization 'customization_id' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}

Custom words

Add custom words

Adds one or more custom words to a custom language model. The service populates the words resource for a custom model with out-of-vocabulary (OOV) words found in each corpus added to the model. You can use this method to add additional words or to modify existing words in the words resource. Only the owner of a custom model can use this method to add or modify custom words associated with the model. Adding or modifying custom words does not affect the custom model until you train the model for the new data by using the Train a custom model method.

You add custom words by providing a Words object, which is an array of Word objects, one per word; with the Node SDK, you pass the array of Word objects directly in the words parameter. You must use the Word object's word parameter to identify the word that is to be added. You can also provide one or both of the following optional parameters for each word:

  • The sounds_like parameter provides an array of one or more pronunciations for the word. Use the parameter to specify how the word can be pronounced by users. Use the parameter for words that are difficult to pronounce, foreign words, acronyms, and so on. For example, you might specify that the word IEEE can sound like i triple e. You can specify a maximum of five sounds-like pronunciations for a word. For information about pronunciation rules, see Using the sounds_like field.

  • The display_as parameter provides a different way of spelling the word in a transcript. Use the parameter when you want the word to appear different from its usual representation or from its spelling in corpora training data. For example, you might indicate that the word IBM(trademark) is to be displayed as IBM™. For more information, see Using the display_as field.

If you add a custom word that already exists in the words resource for the custom model, the new definition overrides the existing data for the word. If the service encounters an error with the input data, it returns a failure code and does not add any of the words to the words resource.
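Because the service rejects the entire request if any word is invalid, it can be convenient to check the documented limits client-side before calling Add custom words. The following helper is illustrative only, not part of the Watson SDK; it mirrors the rules stated in this reference (no spaces in the word itself, at most five sounds-like pronunciations, each at most 40 characters not counting spaces):

```javascript
// Illustrative pre-flight check of the documented Word limits.
// The service enforces the same rules server-side; this helper only
// lets you surface problems before making the request.
function validateWord(word) {
  var errors = [];

  // Words must not contain spaces; use - or _ to connect compound tokens.
  if (/\s/.test(word.word)) {
    errors.push("Word '" + word.word + "' contains spaces; use - or _ to connect tokens");
  }

  var soundsLike = word.sounds_like || [];

  // A word can have at most five sounds-like pronunciations.
  if (soundsLike.length > 5) {
    errors.push('Maximum number of sounds-like for a word exceeded');
  }

  // Each pronunciation can include at most 40 characters, not counting spaces.
  soundsLike.forEach(function(pronunciation) {
    if (pronunciation.replace(/ /g, '').length > 40) {
      errors.push("Pronunciation '" + pronunciation + "' exceeds 40 characters");
    }
  });

  return errors;
}
```

An empty array means the word passes these client-side checks; any strings returned describe problems to fix before calling addWords.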

The call returns an HTTP 201 response code if the input data is valid. The service then asynchronously pre-processes the words to add them to the model's words resource. The time that it takes for the analysis to complete depends on the number of new words that you add but is generally faster than adding a corpus or training a model.

You can monitor the status of the request by using the List a custom model method to poll the model's status, for example, by checking the status in a loop every 10 seconds. The method returns a Customization object that includes a status field. A status of ready means that the words have been added to the custom model. With the Node SDK, you can instead use the Monitor a custom model method to poll the model's status at a specified interval. The service cannot accept requests to add new corpora or words or to train the model until the existing request completes.

You can use the List custom words or List a custom word method to review the words that you add. Words with an invalid sounds_like field include an error field that describes the problem. If necessary, you can use the Add custom words or Add a custom word method to correct problems, eliminate typographical errors, and modify how words are pronounced.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


POST /v1/customizations/{customization_id}/words

addWords(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model to which words are to be added. You must make the request with the service credentials of the model's owner.
Content-Type header string The type of the input, application/json.
body body object A JSON Words object that provides information about one or more custom words.
Words
Name Description
words object[ ] An array of Word objects that provides information about each custom word that is to be added to the custom model.
Parameter Description
customization_id string The GUID of the custom language model to which words are to be added. You must make the request with the service credentials of the model's owner.
words object[ ] A JSON Words object that provides information about one or more custom words.
Word
Name Description
word string The custom word that is to be added to the custom model. Do not include spaces in the word. Use a - (dash) or _ (underscore) to connect the tokens of compound words.
sounds_like string[ ] An array of sounds-like pronunciations for the custom word. Specify how words that are difficult to pronounce, foreign words, acronyms, and so on can be pronounced by users.
  • For a word that is not in the service's base vocabulary, omit the parameter to have the service automatically generate a sounds-like pronunciation for the word.
  • For a word that is in the service's base vocabulary, use the parameter to specify additional pronunciations for the word. You cannot override the default pronunciation of a word; pronunciations you add augment the pronunciation from the base vocabulary.
A word can have at most five sounds-like pronunciations, and a pronunciation can include at most 40 characters not including spaces.
display_as string An alternative spelling for the custom word when it appears in a transcript. Use the parameter when you want the word to have a spelling that is different from its usual representation or from its spelling in corpora training data.

Example request


curl -X POST -u "{username}":"{password}" \
--header "Content-Type: application/json" \
--data "{\"words\":
  [{\"word\": \"HHonors\", \"sounds_like\": [\"hilton honors\", \"h honors\"], \"display_as\": \"HHonors\"},
  {\"word\": \"IEEE\", \"sounds_like\": [\"i triple e\"]}]}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  words: [
    {word: 'HHonors', 'sounds_like': ['hilton honors', 'h honors'], 'display_as': 'HHonors'},
    {word: 'IEEE', 'sounds_like': ['i triple e']}
  ]
};

speech_to_text.addWords(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

An empty response body: {}.

Response codes

Status Description
201 Created Addition of the custom words was successfully started. The service is analyzing the data.
400 Bad Request A required parameter is null or invalid, or the maximum number of sounds-like pronunciations for a word is exceeded. Specific failure messages include:
  • Malformed GUID: 'customization_id'
  • Required property 'property' is missing in JSON input
  • Word 'word' contains invalid character 'character'
  • Maximum number of sounds-like for a word exceeded
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization 'customization_id' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}

Add a custom word

Adds a custom word to a custom language model. The service populates the words resource for a custom model with out-of-vocabulary (OOV) words found in each corpus added to the model. You can use this method to add additional words or to modify existing words in the words resource. Only the owner of a custom model can use this method to add or modify a custom word for the model. Adding or modifying a custom word does not affect the custom model until you train the model for the new data by using the Train a custom model method.

Use the word_name path parameter to specify the custom word that is to be added or modified; with the Node SDK, use the word parameter instead. Use the WordDefinition object to provide one or both of the following optional parameters for the word:

  • The sounds_like parameter provides an array of one or more pronunciations for the word. Use the parameter to specify how the word can be pronounced by users. Use the parameter for words that are difficult to pronounce, foreign words, acronyms, and so on. For example, you might specify that the word IEEE can sound like i triple e. You can specify a maximum of five sounds-like pronunciations for a word. For information about pronunciation rules, see Using the sounds_like field.

  • The display_as parameter provides a different way of spelling the word in a transcript. Use the parameter when you want the word to appear different from its usual representation or from its spelling in corpora training data. For example, you might indicate that the word IBM(trademark) is to be displayed as IBM™. For more information, see Using the display_as field.

If you add a custom word that already exists in the words resource for the custom model, the new definition overrides the existing data for the word. If the service encounters an error, it does not add the word to the words resource. Use the List a custom word method to review the word that you add.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


PUT /v1/customizations/{customization_id}/words/{word_name}

addWord(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model to which a word is to be added. You must make the request with the service credentials of the model's owner.
word_name path string The custom word that is to be added to the custom model. Do not include spaces in the word. Use a - (dash) or _ (underscore) to connect the tokens of compound words.
Content-Type header string The type of the input, application/json.
body body object A JSON WordDefinition object that provides information about the custom word. Specify an empty JSON object to add a word with no sounds-like or display-as information.
WordDefinition
Name Description
sounds_like string[ ] An array of sounds-like pronunciations for the custom word. Specify how words that are difficult to pronounce, foreign words, acronyms, and so on can be pronounced by users.
  • For a word that is not in the service's base vocabulary, omit the parameter to have the service automatically generate a sounds-like pronunciation for the word.
  • For a word that is in the service's base vocabulary, use the parameter to specify additional pronunciations for the word. You cannot override the default pronunciation of a word; pronunciations you add augment the pronunciation from the base vocabulary.
A word can have at most five sounds-like pronunciations, and a pronunciation can include at most 40 characters not including spaces.
display_as string An alternative spelling for the custom word when it appears in a transcript. Use the parameter when you want the word to have a spelling that is different from its usual representation or from its spelling in corpora training data.
Parameter Description
customization_id string The GUID of the custom language model to which a word is to be added. You must make the request with the service credentials of the model's owner.
word string The custom word that is to be added to the custom model. Do not include spaces in the word. Use a - (dash) or _ (underscore) to connect the tokens of compound words.
sounds_like string[ ] An array of sounds-like pronunciations for the custom word. Specify how words that are difficult to pronounce, foreign words, acronyms, and so on can be pronounced by users.
  • For a word that is not in the service's base vocabulary, omit the parameter to have the service automatically generate a sounds-like pronunciation for the word.
  • For a word that is in the service's base vocabulary, use the parameter to specify additional pronunciations for the word. You cannot override the default pronunciation of a word; pronunciations you add augment the pronunciation from the base vocabulary.
A word can have at most five sounds-like pronunciations, and a pronunciation can include at most 40 characters not including spaces.
display_as string An alternative spelling for the custom word when it appears in a transcript. Use the parameter when you want the word to have a spelling that is different from its usual representation or from its spelling in corpora training data.

Example request


curl -X PUT -u "{username}":"{password}" \
--header "Content-Type: application/json" \
--data "{\"sounds_like\": [\"N. C. A. A.\", \"N. C. double A.\"], \"display_as\": \"NCAA\"}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words/NCAA"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  word: 'NCAA',
  'sounds_like': ['N. C. A. A.', 'N. C. double A.'],
  'display_as': 'NCAA'
};

speech_to_text.addWord(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

An empty response body: {}.

Response codes

Status Description
201 Created The custom word was successfully added to the custom model.
400 Bad Request The specified customization ID is invalid, or the maximum number of sounds-like pronunciations for a word is exceeded. Specific failure messages include:
  • Malformed GUID: 'customization_id'
  • Maximum number of sounds-like for a word exceeded
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization 'customization_id' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}

List custom words

Lists information about custom words from a custom language model. You can list all words from the custom model's words resource, only custom words that were added or modified by the user, or only OOV words that were extracted from corpora. You can also indicate the order in which the service is to return words; by default, words are listed in ascending alphabetical order. Only the owner of a custom model can use this method to query the words from the model.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


GET /v1/customizations/{customization_id}/words

getWords(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model from which words are to be queried. You must make the request with the service credentials of the model's owner.
word_type query string The type of words to be listed from the custom language model's words resource:
  • all (the default) shows all words.
  • user shows only custom words that were added or modified by the user.
  • corpora shows only OOV words that were extracted from corpora.
sort query string Indicates the order in which the words are to be listed. The parameter accepts one of two arguments, alphabetical or count, to indicate how the words are to be sorted. You can prepend an optional + or - to an argument to indicate whether the results are to be sorted in ascending or descending order.
  • alphabetical and +alphabetical list the words in ascending alphabetical order; this is the default ordering if you omit the sort parameter.
  • -alphabetical lists the words in descending alphabetical order.
  • count and -count list the words in descending order by the values of their count fields.
  • +count lists the words in ascending order by the values of their count fields.
For alphabetical ordering, the lexicographical precedence is numeric values, uppercase letters, and lowercase letters. For count ordering, values with the same count are not ordered. With cURL, URL encode the + symbol as %2B.
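Because a literal + in a query string is interpreted as a space, the + prefix must be URL-encoded as %2B, as the note above says. A small helper, illustrative only and not part of the SDK, shows how to build the query string safely with JavaScript's standard encodeURIComponent, which handles this encoding:

```javascript
// Build the query string for the List custom words method.
// encodeURIComponent turns '+' into '%2B', so '+alphabetical' and
// '+count' are transmitted correctly; '-' needs no encoding.
function wordsQuery(sort, wordType) {
  var parts = [];
  if (wordType) parts.push('word_type=' + encodeURIComponent(wordType));
  if (sort) parts.push('sort=' + encodeURIComponent(sort));
  return parts.length ? '?' + parts.join('&') : '';
}
```

For example, wordsQuery('+alphabetical') yields ?sort=%2Balphabetical, matching the cURL example request below.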
Parameter Description
customization_id string The GUID of the custom language model from which words are to be queried. You must make the request with the service credentials of the model's owner.
word_type string The type of words to be listed from the custom language model's words resource:
  • all (the default) shows all words.
  • user shows only custom words that were added or modified by the user.
  • corpora shows only OOV words that were extracted from corpora.

Example request


curl -X GET -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words?sort=%2Balphabetical"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.getWords(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

WordsList
Name Description
words object[ ] An array of WordData objects that provides information about each word in the custom model's words resource. The array is empty if the custom model has no words.
WordData
Name Description
word string A custom word from the custom model. The spelling of the word is used to train the model.
sounds_like string[ ] An array of pronunciations for the custom word. The array can include the sounds-like pronunciation automatically generated by the service if none is provided for the word; the service adds this pronunciation when it finishes pre-processing the word.
display_as string The spelling of the custom word that the service uses to display the word in a transcript. The field contains an empty string if no display-as value is provided for the word, in which case the word is displayed as it is spelled.
source string[ ] An array of sources that describes how the word was added to the custom model's words resource. For OOV words added from a corpus, includes the name of the corpus; if the word was added by multiple corpora, the names of all corpora are listed. If the word was modified or added by the user directly, the field includes the string user.
count integer A sum of the number of times the word is found across all corpora. For example, if the word occurs five times in one corpus and seven times in another, its count is 12. If you add a custom word to a model before it is added by any corpora, the count begins at 1; if the word is added from a corpus first and later modified, the count reflects only the number of times it is found in corpora.

Note: For custom models created prior to the existence of the count field, the field always remains at 0. To update the field for such models, add the model's corpora again and include the allow_overwrite parameter; see Add a corpus.
error object[ ] If the service discovered one or more problems with the custom word's definition that you need to correct, an array of WordError objects that describes each of the errors.
WordError
Name Description
{element} string A key-value pair that describes an error associated with the word's definition in the format "element": "message", where element is the aspect of the definition that caused the problem and message describes the problem. The following example describes a problem with one of the word's sounds-like definitions:
  • "sounds_like_string": "Numbers are not allowed in sounds-like. You can try for example 'suggested_string'."
You must correct the error before you can train the model.
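Because each WordError is a key-value pair whose key names the problematic element, scanning a WordsList response for flagged words takes a little unpacking. The following helper is illustrative, not part of the SDK; it flattens every error into a readable string so you can decide which words to fix before training:

```javascript
// Collect human-readable problem descriptions from a WordsList response.
// Each WordData object may carry an 'error' array of WordError objects,
// each a single {element: message} pair describing one problem.
function collectWordErrors(wordsList) {
  var problems = [];
  wordsList.words.forEach(function(wordData) {
    (wordData.error || []).forEach(function(wordError) {
      Object.keys(wordError).forEach(function(element) {
        problems.push(wordData.word + ': ' + element + ': ' + wordError[element]);
      });
    });
  });
  return problems;
}
```

Run against the example response below, this would report the sounds-like problem for the word 75.00 and nothing for the words whose definitions are valid.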

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: 'customization_id'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "words": [
    {
      "word": "75.00",
      "sounds_like": ["75 dollars"],
      "display_as": "75.00",
      "count": 1,
      "source": ["user"],
      "error": [{"75 dollars": "Numbers are not allowed in sounds_like. You can try for example 'seventy five dollars'."}]
    },
    {
      "word": "HHonors",
      "sounds_like": ["hilton honors","h honors"],
      "display_as": "HHonors",
      "count": 1,
      "source": ["corpus1"]
    },
    {
      "word": "IEEE",
      "sounds_like": ["i triple e"],
      "display_as": "IEEE",
      "count": 3,
      "source": ["corpus1","corpus2"]
    },
    {
      "word": "tomato",
      "sounds_like": ["tomatoh","tomayto"],
      "display_as": "tomato",
      "count": 1,
      "source": ["user"]
    }
  ]
}

List a custom word

Lists information about a custom word from a custom language model. Only the owner of a custom model can use this method to query a word from the model.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


GET /v1/customizations/{customization_id}/words/{word_name}

getWord(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model from which a word is to be queried. You must make the request with the service credentials of the model's owner.
word_name path string The custom word that is to be queried from the custom model.
Parameter Description
customization_id string The GUID of the custom language model from which a word is to be queried. You must make the request with the service credentials of the model's owner.
word string The custom word that is to be queried from the custom model.

Example request


curl -X GET -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words/NCAA"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  word: 'NCAA'
};

speech_to_text.getWord(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

The method returns a single instance of a WordData object that provides information about the specified word.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID or word is invalid, including the case where the word does not exist for the custom model. Specific failure messages include:
  • Malformed GUID: 'customization_id'
  • Invalid value for word 'word'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "word": "NCAA",
  "sounds_like": ["N. C. A. A.","N. C. double A."],
  "display_as": "NCAA",
  "count": 1,
  "source": ["corpus3","user"]
}

Delete a custom word

Deletes a custom word from a custom language model. You can remove any word that you added to the custom model's words resource via any means. However, if the word also exists in the service's base vocabulary, the service removes only the custom pronunciation for the word; the word remains in the base vocabulary.

Removing a custom word does not affect the custom model until you train the model with the Train a custom model method. Only the owner of a custom model can use this method to delete a word from the model.

Supported but not yet documented. For more information, see Watson Developer Cloud Java SDK.


DELETE /v1/customizations/{customization_id}/words/{word_name}

deleteWord(params, callback)

Request

Parameter Type Description
customization_id path string The GUID of the custom language model from which a word is to be deleted. You must make the request with the service credentials of the model's owner.
word_name path string The custom word that is to be deleted from the custom model.
Parameter Description
customization_id string The GUID of the custom language model from which a word is to be deleted. You must make the request with the service credentials of the model's owner.
word string The custom word that is to be deleted from the custom model.

Example request


curl -X DELETE -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words/NCAA"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  word: 'NCAA'
};

speech_to_text.deleteWord(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

Response

An empty response body: {}.

Response codes

Status Description
200 OK The custom word was successfully deleted from the custom model.
400 Bad Request The specified customization ID or word is invalid, including the case where the word does not exist for the custom model. Specific failure messages include:
  • Malformed GUID: 'customization_id'
  • Invalid value for word 'word'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requester:
  • Invalid customization_id 'customization_id' for user
405 Method Not Allowed No word name was specified with the request.
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization 'customization_id' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}