
Speech to Text

API Reference
IBM Speech to Text API reference

Introduction

The IBM® Speech to Text service provides an API that enables you to add IBM's speech recognition capabilities to your applications. The service transcribes speech from various languages and audio formats to text with low latency. For most languages, the service supports two sampling rates, broadband and narrowband. The service returns all JSON response content in the UTF-8 character set.

The Speech to Text API consists of the following groups of related calls:

  • Models includes methods that return information about the language models that are available for speech recognition.

  • WebSockets includes a single method that establishes a persistent connection with the service over the WebSocket protocol.

  • Sessionless includes a method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session.

  • Sessions provides methods that allow a client to maintain a long, multi-turn exchange, or session, with the service or to establish multiple parallel conversations with a particular instance of the service.

  • Asynchronous provides a non-blocking interface for transcribing audio. You can register a callback URL to be notified of job status and, optionally, results, or you can poll the service to learn job status and retrieve results manually.

  • Custom language models provides an interface for creating and managing custom language models. The interface lets you expand the vocabulary of a base model with domain-specific terminology.

  • Custom corpora provides an interface for managing the corpora associated with a custom language model. You add corpora to extract out-of-vocabulary (OOV) words from the corpora into the custom language model's vocabulary. You can add, list, and delete corpora from a custom language model.

  • Custom words provides an interface for managing individual words in a custom language model. You can add, modify, list, and delete words from a custom language model.

  • Custom acoustic models provides an interface for creating and managing custom acoustic models. The interface lets you adapt a base model for the audio characteristics of your environment and speakers.

  • Custom audio resources provides an interface for managing the audio resources associated with a custom acoustic model. You add audio resources that closely match the acoustic characteristics of the audio that you want to transcribe. You can add, list, and delete audio resources from a custom acoustic model.

The X-Watson-Metadata header allows you to associate a customer ID with data that is passed with a request. For more information, see Information security.

Usage guidelines for customization

The following information pertains to methods of the customization interface:

  • In all cases, you must use service credentials created for the instance of the service that owns a custom model to use the methods described in this documentation with that model. For more information, see Ownership of custom models.

  • How the service handles request logging for the customization interface depends on the request. The service does not log data that is used to build custom models, but it does log data when a custom model is used with a recognition request. For more information, see Request logging and data privacy.

  • Each custom model is identified by a customization ID, which is a Globally Unique Identifier (GUID). A GUID is a hexadecimal string that has the same format as Watson service credentials: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. You specify a custom model's GUID with the appropriate customization parameter of methods that support customization.
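A client can check the GUID format before sending a request. The following sketch is illustrative; the helper name is ours and is not part of any Watson SDK.

```javascript
// Hypothetical helper (not part of the Watson SDKs): checks that a string
// has the xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx hexadecimal GUID format
// used for customization IDs.
function isCustomizationId(id) {
  return /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(id);
}
```

For example, isCustomizationId('01234567-89ab-cdef-0123-456789abcdef') returns true, while any string that is not a 8-4-4-4-12 hexadecimal sequence returns false.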

For more information about using both custom language models and custom acoustic models, see The customization interface.

HTTP API endpoint


https://stream.watsonplatform.net/speech-to-text/api

WebSocket API endpoint


wss://stream.watsonplatform.net/speech-to-text/api

Important: If you have IBM® Cloud Dedicated, these might not be your endpoints. Check your endpoint URLs on the Service credentials page for your instance of the Speech to Text service.

The code examples on this tab use the client library that is provided for Node.js.

GitHub

https://github.com/watson-developer-cloud/node-sdk

Node Package Manager


npm install watson-developer-cloud

The code examples on this tab use the client-side library that is provided for Java.

GitHub

https://github.com/watson-developer-cloud/java-sdk

Maven


<dependency>
  <groupId>com.ibm.watson.developer_cloud</groupId>
  <artifactId>java-sdk</artifactId>
  <version>4.1.0</version>
</dependency>

Gradle


compile 'com.ibm.watson.developer_cloud:java-sdk:4.1.0'

Synchronous and asynchronous requests

The Java SDK supports both synchronous (blocking) and asynchronous (non-blocking) execution of all methods. All methods return an instance of the Java ServiceCall interface, which you use to execute the request.

  • To call a method synchronously, use the execute method of the ServiceCall interface. You can also call the execute method directly from an instance of the service, as shown in Get models. Note that the method can throw an unchecked RuntimeException.

  • To call a method asynchronously, use the enqueue method of the ServiceCall interface to receive a callback when the response arrives. The ServiceCallback interface of the method's argument provides onResponse and onFailure methods that you override to handle the callback.

Example synchronous request


ServiceCall<List<SpeechModel>> call = service.getModels();
List<SpeechModel> models = call.execute();

Example asynchronous request


ServiceCall<List<SpeechModel>> call = service.getModels();
call.enqueue(new ServiceCallback<List<SpeechModel>>() {
  @Override public void onResponse(List<SpeechModel> models) {
    . . .
  }
  @Override public void onFailure(Exception e) {
    . . .
  }
});

More information

An interactive tool for testing calls to the API and viewing live responses from the service is available in the Speech to Text API explorer. Descriptions of Node classes referred to in this reference are available in the Node documentation for the Watson Developer Cloud Node.js SDK. Descriptions of Java classes referred to in this reference are available in the Javadoc for the Watson Developer Cloud Java SDK. Detailed information about using the service is available at About Speech to Text.

Authentication

You authenticate to the Speech to Text API by providing the username and password of the service credentials for the instance of the service that you want to use. The API uses HTTP basic authentication. For information about creating a service instance and obtaining service credentials, see Service credentials for Watson services.

Applications can also use tokens to establish authenticated communications with Watson services without embedding their service credentials in every call. You write an authentication proxy in IBM Cloud to obtain a token for your client application, which can then use the token to call the service directly. You use your service credentials to obtain a token for that service. For more information, see Tokens for authentication.
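As a sketch of the token flow in Node, the following builds the URL for requesting a token from the Watson authorization endpoint. The endpoint shown follows the pattern documented in Tokens for authentication; treat the exact URL as an assumption to verify for your service instance.

```javascript
// Sketch: build the URL for requesting a Watson authorization token.
// The token endpoint and query parameter follow the documented pattern;
// verify them for your service instance before relying on this.
function buildTokenUrl(serviceUrl) {
  return 'https://stream.watsonplatform.net/authorization/api/v1/token' +
         '?url=' + encodeURIComponent(serviceUrl);
}

// The proxy then issues a GET to this URL with HTTP basic authentication
// (its service credentials) and receives a token in the response body, e.g.:
//   https.get(buildTokenUrl('https://stream.watsonplatform.net/speech-to-text/api'), ...)
```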

Replace {username} and {password} with your service credentials. For Java, you can use either of the two constructors shown.


curl -u "{username}:{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/{method}"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}'
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

SpeechToText service = new SpeechToText("{username}", "{password}");

Request logging

By default, all Watson services log requests and their results. Logging is done only to improve the services for future users. The logged data is not shared or made public. To prevent IBM from accessing your data for general service improvements, set the X-Watson-Learning-Opt-Out request header to true. (Any value other than false or 0 disables request logging.) With curl, you must set the header on each request that you do not want IBM to access. With the SDKs, you can set the header when you create the service instance, which applies it to every request made with that instance.

Request logging for the customization interface

The service does not log data (corpora, words, and audio resources) that are used to build custom models; your training data is never used to improve the service's base models. The service does log data when a custom model is used with a recognition request; you must set the X-Watson-Learning-Opt-Out request header to prevent logging for recognition requests. For more information, see Request logging and data privacy.


curl -u "{username}":"{password}" \
--header "X-Watson-Learning-Opt-Out: true" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/{method}"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}',
  headers: {
    'X-Watson-Learning-Opt-Out': 'true'
  }
});

Map<String, String> headers = new HashMap<String, String>();
headers.put("X-Watson-Learning-Opt-Out", "true");

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");
service.setDefaultHeaders(headers);

Response handling

The Speech to Text service uses standard HTTP response codes to indicate whether a method completed successfully. A 200-level response always indicates success. A 300-level response indicates that the requested resource has not been modified. A 400-level response indicates some sort of input failure, and a 500-level response typically indicates an internal system error. Response codes are listed with the individual calls.

Response codes that indicate success are not readily available with the Node.js SDK. In general, the lack of an error response indicates a 200-level success response. For errors, response codes are indicated in the error object that is returned.

The Java SDK raises equivalent exceptions, which are listed with the individual methods. The exceptions include the error message returned by the service. All methods that accept an argument can throw the following exception.

Exception Description
IllegalArgumentException An illegal argument was passed to the method.

Error format

Name Description
error string Description of the error.
code integer HTTP status code.
code_description string Response message that describes the problem.
warnings string[ ] Warnings associated with the error.
Name Description
Exception string The name of the exception that was raised.
status integer The HTTP status code.
error string A description of the error.

Example error


{
  "error": "Model en-US_Broadband not found",
  "code": 404,
  "code_description": "No Such Resource"
}

{
  Error: Model en-US_Model not found
  . . .
  code: 404,
  error: 'Model en-US_Model not found',
  code_description: 'No Such Resource'
}

SEVERE: GET https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_Broadband, status: 404, error: Model en-US_Broadband not found
Exception in thread "main" com.ibm.watson.developer_cloud.service.exception.NotFoundException: Model en-US_Broadband not found
   . . .

Models

Get models

Retrieves a list of all language models that are available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things.


GET /v1/models

listModels(params, callback)

ServiceCall<List<SpeechModel>> getModels()

Request

No arguments.

Example request


curl -X GET -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/models"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}'
});

speech_to_text.listModels(null, function(error, models) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(models, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

List<SpeechModel> models = service.getModels().execute();
System.out.println(models);

Response

SpeechModels
Name Description
models object[ ] An array of SpeechModel objects that provides information about the available models.

Returns a List of Java SpeechModel objects. Each object provides the same information as a JSON SpeechModel object.

SpeechModel (Java SpeechModel object)
Name Description
name string The name of the model for use as an identifier in calls to the service (for example, en-US_BroadbandModel).
language string The language identifier for the model (for example, en-US).
rate integer The sampling rate (minimum acceptable rate for audio) used by the model in Hertz.
url string The URI of the model.
description string A brief description of the model.
sessions string The URI for the model for use with the /v1/sessions method. (Returned only for requests for a single model; see Get a model.)
supported_features object A SupportedFeatures object that describes the additional service features supported with the model.
SupportedFeatures (Java SpeechModel.SupportedFeatures object)
Name Description
custom_language_model boolean Indicates whether the customization interface can be used to create a custom language model based on the model.
speaker_labels boolean Indicates whether the speaker_labels parameter can be used with the language model.

Response codes

Status Description
200 OK The request succeeded.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
415 Unsupported Media Type The request specified an unacceptable media type.

Exceptions thrown

Exception Description
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)
UnsupportedException The request specified an unacceptable media type. (HTTP response code 415.)

Example response


{
  "models": [
    {
      "name": "fr-FR_BroadbandModel",
      "language": "fr-FR",
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/fr-FR_BroadbandModel",
      "rate": 16000,
      "supported_features": {
        "custom_language_model": false,
        "speaker_labels": false
      },
      "description": "French broadband model."
    },
    {
      "name": "en-US_NarrowbandModel",
      "language": "en-US",
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_NarrowbandModel",
      "rate": 8000,
      "supported_features": {
        "custom_language_model": true,
        "speaker_labels": true
      },
      "description": "US English narrowband model."
    },
    . . .
  ]
}

[
  {
    "name": "fr-FR_BroadbandModel",
    "language": "fr-FR",
    "rate": 16000,
    "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/fr-FR_BroadbandModel",
    "description": "French broadband model.",
    "supported_features": {
      "custom_language_model": false,
      "speaker_labels": false
    }
  },
  {
    "name": "en-US_NarrowbandModel",
    "language": "en-US",
    "rate": 8000,
    "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_NarrowbandModel",
    "description": "US English narrowband model.",
    "supported_features": {
      "custom_language_model": true,
      "speaker_labels": true
    }
  },
  . . .
]
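Once the response is in hand, selecting models by capability is straightforward. As a sketch (the helper name is ours, not part of any SDK), the following filters the models array for speaker-label support:

```javascript
// Sketch: given a SpeechModels response object, return the names of models
// whose supported_features indicate support for the speaker_labels parameter.
function modelsWithSpeakerLabels(response) {
  return response.models
    .filter(function (m) { return m.supported_features.speaker_labels; })
    .map(function (m) { return m.name; });
}
```

Applied to the sample response above, this returns only en-US_NarrowbandModel, since the French broadband model reports speaker_labels as false.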

Get a model

Retrieves information about a single specified language model that is available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things.


GET /v1/models/{model_id}

getModel(params, callback)

ServiceCall<SpeechModel> getModel(String modelName)

Request

Parameter Description
model_id path string The identifier of the desired model:
  • ar-AR_BroadbandModel
  • en-GB_BroadbandModel
  • en-GB_NarrowbandModel
  • en-US_BroadbandModel
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • ko-KR_BroadbandModel
  • ko-KR_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
Parameter Description
model_id modelName string The identifier of the desired model:
  • ar-AR_BroadbandModel
  • en-GB_BroadbandModel
  • en-GB_NarrowbandModel
  • en-US_BroadbandModel
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • ko-KR_BroadbandModel
  • ko-KR_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel

Example request


curl -X GET -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_BroadbandModel"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}'
});

var params = {
  model_id: 'en-US_BroadbandModel'
};

speech_to_text.getModel(params, function(error, model) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(model, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

SpeechModel model = service.getModel("en-US_BroadbandModel").execute();
System.out.println(model);

Response

Returns a single instance of a SpeechModel object with results for the specified model.

Returns a single Java SpeechModel object for the specified model. The information is the same as that described for the JSON SpeechModel object.

Response codes

Status Description
200 OK The request succeeded.
404 Not Found The specified model_id was not found.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
415 Unsupported Media Type The request specified an unacceptable media type.

Exceptions thrown

Exception Description
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
NotFoundException The specified modelName was not found. (HTTP response code 404.)
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)
UnsupportedException The request specified an unacceptable media type. (HTTP response code 415.)

Example response


{
  "name": "en-US_BroadbandModel",
  "language": "en-US",
  "rate": 16000,
  "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_BroadbandModel",
  "description": "US English broadband model.",
  "sessions": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions?model=en-US_BroadbandModel",
  "supported_features": {
    "custom_language_model": true,
    "speaker_labels": true
  }
}
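The rate field is useful for matching a model to your audio. The following is a sketch of a hypothetical helper (not part of any SDK) that picks the broadband or narrowband variant for a language based on the audio's sampling rate; it assumes the language offers both variants, which is not true for every language (for example, fr-FR offers only a broadband model).

```javascript
// Hypothetical helper: choose a model variant from the audio sampling rate.
// Broadband models expect a minimum of 16000 Hz; narrowband models expect
// 8000 Hz, so lower-rate audio (e.g., telephony) maps to narrowband.
function chooseModel(language, audioRateHz) {
  var band = audioRateHz >= 16000 ? 'Broadband' : 'Narrowband';
  return language + '_' + band + 'Model';
}
```

For example, chooseModel('en-US', 44100) yields en-US_BroadbandModel, and chooseModel('en-US', 8000) yields en-US_NarrowbandModel.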

WebSockets

Recognize audio

Sends audio and returns transcription results for recognition requests over a WebSocket connection. Requests and responses flow over a single TCP connection, which abstracts much of the complexity of the request and offers an efficient implementation, low latency, high throughput, and an asynchronous response. By default, only final results are returned for any request; to enable interim results, set the interim_results interimResults parameter to true.

The service imposes a data size limit of 100 MB per utterance (per recognition request). You can send multiple utterances over a single WebSocket connection. The service automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to mono during transcoding. (For the audio/l16 format, you can specify the endianness.)

For complete documentation about submitting requests with the WebSocket interface, see The WebSocket interface.


/v1/recognize

RecognizeStream createRecognizeStream(params)

void recognizeUsingWebSocket(InputStream audio, RecognizeOptions options,
  RecognizeCallback callback)

Request

The client establishes a connection with the service by using the WebSocket constructor to create an instance of a WebSocket connection object. The constructor sets the following basic parameters for the connection and for all recognition requests sent over it.

Parameter Description
X-Watson-Authorization-Token header string Provides an authentication token for the service. The token is used instead of service credentials. You must pass a valid token via either this header or the watson-token query parameter. For more information, see Authentication.
watson-token query string Provides an authentication token for the service. The token is used instead of service credentials. You must pass a valid token via either this query parameter or the X-Watson-Authorization-Token header. For more information, see Authentication.
model query string The identifier of the model that is to be used for all recognition requests sent over the connection:
  • ar-AR_BroadbandModel
  • en-GB_BroadbandModel
  • en-GB_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • ko-KR_BroadbandModel
  • ko-KR_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
customization_id query string The GUID of a custom language model that is to be used for all requests sent over the connection. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.
acoustic_customization_id query string The GUID of a custom acoustic model that is to be used for all requests sent over the connection. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom acoustic model is used.
base_model_version query string The version of the specified base model that is to be used for all requests sent over the connection. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. For more information, see Base model version.
x-watson-learning-opt-out query boolean Indicates whether to opt out of data collection for requests sent over the connection. The default is false; data is collected for all requests and responses. You can also opt out of request logging by passing a value of true with the X-Watson-Learning-Opt-Out request header; see Request logging.

The client initiates and manages recognition requests by sending JSON-formatted text messages to the service over the connection. The client sends the audio data to be transcribed as a binary message (blob).

Parameter Description
action string The action to be performed:
  • start initiates a recognition request. The message must include the content-type parameter; it can also include any optional parameters described in this table. After sending this text message, the client sends the audio data as a binary message (blob).
    Between recognition requests, the client can send new start messages to modify the parameters that are to be used for subsequent requests. By default, the service continues to use the parameters specified with the previous start message.
  • stop indicates that all audio data for the request has been sent to the service.
content-type string The audio format (MIME type) of the audio:
  • audio/basic (Use only with narrowband models.)
  • audio/flac
  • audio/l16 (Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
  • audio/mp3
  • audio/mpeg
  • audio/mulaw (Specify the sampling rate of the audio.)
  • audio/ogg (The service automatically detects the codec of the input audio.)
  • audio/ogg;codecs=opus
  • audio/ogg;codecs=vorbis
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/webm (The service automatically detects the codec of the input audio.)
  • audio/webm;codecs=opus
  • audio/webm;codecs=vorbis
For information about the supported audio formats, including specifying the sampling rate, channels, and endianness for the indicated formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
customization_weight double If you specify a customization ID when you open the connection, you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.

The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.
inactivity_timeout integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
interim_results boolean Indicates whether the service is to return interim results. If true, interim results are returned as a stream of JSON SpeechRecognitionResults objects. If false (the default), the response is a single SpeechRecognitionResults object with final results only.
keywords string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. If you specify any keywords, you must also specify a keywords threshold. You can spot a maximum of 1000 keywords. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
word_alternatives_threshold float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false; no word confidence measures are returned.
timestamps boolean Indicates whether time alignment is returned for each word. The default is false; no timestamps are returned.
profanity_filter boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. For US English, also converts certain keyword strings to punctuation symbols. The default is false; no smart formatting is performed. Applies to US English and Spanish transcription only.
speaker_labels boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. The default is false; no speaker labels are returned. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.

To determine whether a language model supports speaker labels, use the Get models method and check that the attribute speaker_labels is set to true. You can also refer to Speaker labels.
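The start and stop text messages described above can be sketched as follows; the parameter values are illustrative, and the helper functions are ours rather than part of any SDK.

```javascript
// Sketch: build the start and stop text messages for a WebSocket
// recognition request. Parameter values here are illustrative.
function startMessage(contentType, options) {
  var msg = { action: 'start', 'content-type': contentType };
  for (var key in options) { msg[key] = options[key]; }
  return JSON.stringify(msg);
}

function stopMessage() {
  return JSON.stringify({ action: 'stop' });
}

// For example, a start message for 16 kHz linear PCM with interim results:
//   startMessage('audio/l16;rate=16000', { interim_results: true })
// The client sends this text message, then the audio as one or more binary
// messages, then stopMessage() to signal that all audio has been sent.
```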

Pass the audio stream to be transcribed via the method's audio argument, and pass a Java BaseRecognizeCallback object to handle events from the WebSocket connection via the callback argument. Pass all other parameters for the recognition request as a Java RecognizeOptions object via the options argument.

Parameter Description
audio object An InputStream object that passes the audio to be transcribed in the format specified by the contentType parameter.
callback object A Java BaseRecognizeCallback object that implements the RecognizeCallback interface to handle events from the WebSocket connection. Override the definitions of the object's default methods to respond to events as needed by your application.
content_type contentType string The audio format (MIME type) of the audio:
  • audio/basic (Use only with narrowband models.)
  • audio/flac
  • audio/l16 (Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
  • audio/mp3
  • audio/mpeg
  • audio/mulaw (Specify the sampling rate of the audio.)
  • audio/ogg (The service automatically detects the codec of the input audio.)
  • audio/ogg;codecs=opus
  • audio/ogg;codecs=vorbis
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/webm (The service automatically detects the codec of the input audio.)
  • audio/webm;codecs=opus
  • audio/webm;codecs=vorbis
For information about the supported audio formats, including specifying the sampling rate, channels, and endianness for the indicated formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
model string The identifier of the model that is to be used for the recognition request:
  • ar-AR_BroadbandModel
  • en-GB_BroadbandModel
  • en-GB_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • ko-KR_BroadbandModel
  • ko-KR_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
customization_id customizationId string The GUID of a custom language model that is to be used for all requests sent over the connection. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.
acoustic_customization_id string The GUID of a custom acoustic model that is to be used for all requests sent over the connection. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom acoustic model is used.
customization_weight customizationWeight double If you specify a customization ID with the request, you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for the request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.

The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.
inactivity_timeout inactivityTimeout integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. The Node SDK uses a default value of 600.
interim_results interimResults boolean Indicates whether the service is to return interim results. If true, interim results are returned as a stream of JSON SpeechRecognitionResults objects. If false (the default), the response is a single SpeechRecognitionResults object with final results only. The Node SDK uses a default value of true.
keywords string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. If you specify any keywords, you must also specify a keywords threshold. You can spot a maximum of 1000 keywords. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold keywordsThreshold float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives maxAlternatives integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned. The Node SDK uses a default value of 3.
word_alternatives_threshold wordAlternativesThreshold float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence wordConfidence boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false; no word confidence measures are returned. The Node SDK uses a default value of true.
timestamps boolean Indicates whether time alignment is returned for each word. The default is false; no timestamps are returned. The Node SDK uses a default value of true.
profanity_filter profanityFilter boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting smartFormatting boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. For US English, also converts certain keyword strings to punctuation symbols. The default is false; no smart formatting is performed. Applies to US English and Spanish transcription only.
speaker_labels speakerLabels boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. The default is false; no speaker labels are returned. Setting speaker_labels speakerLabels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.

To determine whether a language model supports speaker labels, use the Get models method and check that the attribute speaker_labels is set to true. You can also refer to Speaker labels.
X-Watson-Learning-Opt-Out boolean Indicates whether to opt out of data collection for the call. The default is false; data is collected for all requests and responses. See Request logging.
watson-token string Provides an authentication token for the service as an alternative to providing service credentials. For more information, see Authentication.
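The model and authentication token are passed as query parameters on the connection URL. A minimal sketch of assembling that URL (the helper name is illustrative, and the token value is a placeholder; query values should be URL-encoded):

```javascript
// Sketch: build the WebSocket URL for the recognize endpoint.
// buildRecognizeURI is a hypothetical helper, not part of any SDK.
function buildRecognizeURI(token, model) {
  return 'wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize'
    + '?watson-token=' + encodeURIComponent(token)
    + '&model=' + encodeURIComponent(model);
}
```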

Example request


var token = "{authentication-token}";
var wsURI = "wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
  + "?watson-token=" + token + '&model=en-US_BroadbandModel';

var websocket = new WebSocket(wsURI);
websocket.onopen = function(evt) { onOpen(evt) };
websocket.onclose = function(evt) { onClose(evt) };
websocket.onmessage = function(evt) { onMessage(evt) };
websocket.onerror = function(evt) { onError(evt) };

function onOpen(evt) {
  var message = {
    action: 'start',
    'content-type': 'audio/flac',
    'interim_results': true,
    'max_alternatives': 3,
    keywords: ['colorado', 'tornado', 'tornadoes'],
    'keywords_threshold': 0.5
  };
  websocket.send(JSON.stringify(message));

  // Prepare and send the audio file.
  websocket.send(blob);

  websocket.send(JSON.stringify({action: 'stop'}));
}

function onMessage(evt) {
  console.log(evt.data);
}
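Each message delivered to the onMessage handler is a JSON string. A small helper for pulling the current transcript and its finality out of a SpeechRecognitionResults message might look like the following sketch (the helper is illustrative, not part of any SDK):

```javascript
// Sketch: extract the top transcript from a recognition message.
// Returns null for messages that carry no results (e.g. state messages).
function parseTranscript(data) {
  var msg = JSON.parse(data);
  if (!msg.results || msg.results.length === 0) {
    return null;
  }
  var result = msg.results[msg.results.length - 1];
  return {
    transcript: result.alternatives[0].transcript,
    final: result.final === true
  };
}
```

It could be called from the onMessage handler above in place of the raw console.log.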

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var fs = require('fs');

var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  model: 'en-US_BroadbandModel',
  content_type: 'audio/flac',
  'interim_results': true,
  'max_alternatives': 3,
  'word_confidence': false,
  timestamps: false,
  keywords: ['colorado', 'tornado', 'tornadoes'],
  'keywords_threshold': 0.5
};

// Create the stream.
var recognizeStream = speech_to_text.createRecognizeStream(params);

// Pipe in the audio.
fs.createReadStream('audio-file.flac').pipe(recognizeStream);

// Pipe out the transcription to a file.
recognizeStream.pipe(fs.createWriteStream('transcription.txt'));

// Get strings instead of buffers from 'data' events.
recognizeStream.setEncoding('utf8');

// Listen for events.
recognizeStream.on('results', function(event) { onEvent('Results:', event); });
recognizeStream.on('data', function(event) { onEvent('Data:', event); });
recognizeStream.on('error', function(event) { onEvent('Error:', event); });
recognizeStream.on('close', function(event) { onEvent('Close:', event); });
recognizeStream.on('speaker_labels', function(event) { onEvent('Speaker_Labels:', event); });

// Displays events on the console.
function onEvent(name, event) {
  console.log(name, JSON.stringify(event, null, 2));
}

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognizeOptions options = new RecognizeOptions.Builder()
  .model("en-US_BroadbandModel").contentType("audio/flac")
  .interimResults(true).maxAlternatives(3)
  .keywords(new String[]{"colorado", "tornado", "tornadoes"})
  .keywordsThreshold(0.5).build();

BaseRecognizeCallback callback = new BaseRecognizeCallback() {
  @Override
  public void onTranscription(SpeechResults speechResults) {
    System.out.println(speechResults);
  }

  @Override
  public void onDisconnected() {
    System.exit(0);
  }
};

try {
  service.recognizeUsingWebSocket(
    new FileInputStream("audio-file.flac"), options, callback);
}
catch (FileNotFoundException e) {
  e.printStackTrace();
}

Response

Successful recognition returns one or more instances of a SpeechRecognitionResults object depending on the input and the value of the interim_results parameter.

Returns a Java SpeechResults object that contains the results that are provided in a JSON SpeechRecognitionResults object. The response includes one or more instances of the object depending on the input and the value of the interimResults parameter.

Response handling

The WebSocket constructor returns an instance of a WebSocket connection object. You assign application-specific calls to the following methods of the object to handle events associated with the connection. Each event handler must accept a single argument for the event from the connection that causes it to execute.

Event Description
onopen Status of the connection's opening.
onmessage Response messages for the connection, including the results of the request as one or more JSON SpeechRecognitionResults objects.
onerror Errors for the connection or request.
onclose Status of the connection's closing.

The createRecognizeStream method returns a RecognizeStream object. You use the object's on method to define event handlers that capture the following events associated with the connection and the recognition request. For more information about handling stream events with Node.js, see the Node.js Documentation.

Event Description
results Interim and final results for the request as JSON SpeechRecognitionResults objects.
data Final transcription results for the request.
speaker_labels Speaker label results for the request as a JSON SpeakerLabelsResult object.
error Errors for the connection or request.
close Status of the connection's closing.

The callback parameter of the recognizeUsingWebSocket method accepts a Java object of type BaseRecognizeCallback, which implements the RecognizeCallback interface to handle events from the WebSocket connection. You override the definitions of the following default empty methods of the object to handle events associated with the connection and the recognition request. The methods are called when their associated events occur.

Method Description
void onConnected() The WebSocket connection is made.
void onListening() The service is listening for audio.
void onTranscription(SpeechResults speechResults) Results for the request are received from the service.
void onTranscriptionComplete() Final results for the request have been returned by the service.
void onError(Exception e) An error occurs in the WebSocket connection.
void onInactivityTimeout(RuntimeException runtimeException) An inactivity timeout occurs for the request.
void onDisconnected() The WebSocket connection is closed.

The connection can produce the following return codes.

Return code Description
1000 The connection closed normally.
1002 The service is closing the connection due to a protocol error.
1006 The connection was closed abnormally.
1009 The frame size exceeded the 4 MB limit.
1011 The service is terminating the connection because it encountered an unexpected condition that prevents it from fulfilling the request.
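These codes can be inspected in a close handler. A sketch that maps them to messages (the mapping paraphrases the table above; the helper and its retry hint are illustrative assumptions, not part of the API):

```javascript
// Sketch: describe the documented WebSocket close codes.
var CLOSE_CODES = {
  1000: 'The connection closed normally.',
  1002: 'The service closed the connection due to a protocol error.',
  1006: 'The connection was closed abnormally.',
  1009: 'The frame size exceeded the 4 MB limit.',
  1011: 'The service encountered an unexpected condition.'
};

function describeClose(code) {
  return CLOSE_CODES[code] || 'Unknown close code: ' + code;
}
```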

Example response


{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several torn "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}{
  . . .
}{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}{
  "results": [
    {
      "keywords_result": {
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.94,
            "confidence": 0.913,
            "end_time": 5.62
          }
        ],
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "confidence": 1.0,
            "end_time": 2.15
          }
        ]
      },
      "alternatives": [
        {
          "confidence": 0.891,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
        "final": true
    }
  ],
  "result_index": 0
}

Results: {
  "results": [
    {
      "alternatives": [
        {
          "transcript": "so "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}
Results: {
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several to "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}
. . .
Results: {
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado once "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}
Results: {
  "results": [
    {
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "confidence": 1,
            "end_time": 2.15
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.94,
            "confidence": 0.913,
            "end_time": 5.62
          }
        ]
      },
      "alternatives": [
        {
          "confidence": 0.891,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
Data: "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
Close: 1000

{
  "result_index": 0,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "so "
        }
      ]
    }
  ]
}
{
  "result_index": 0,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "several tornadoes to "
        }
      ]
    }
  ]
}
. . .
{
  "result_index": 0,
  "results": [
    {
      "final": false,
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado one son "
        }
      ]
    }
  ]
}
{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "confidence": 0.891,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "end_time": 2.15,
            "confidence": 1.0
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.94,
            "end_time": 5.62,
            "confidence": 0.913
          }
        ]
      }
    }
  ]
}
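In the final responses above, the keywords_result object maps each spotted keyword to an array of matches. A sketch of flattening it for display (the helper name is illustrative):

```javascript
// Sketch: collect spotted keywords from a final result's keywords_result
// object, as returned when keywords and keywords_threshold are specified.
function listSpottedKeywords(result) {
  var spotted = [];
  var kr = result.keywords_result || {};
  Object.keys(kr).forEach(function(keyword) {
    kr[keyword].forEach(function(match) {
      spotted.push({
        keyword: keyword,
        text: match.normalized_text,
        start: match.start_time,
        end: match.end_time,
        confidence: match.confidence
      });
    });
  });
  return spotted;
}
```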

Sessionless

Recognize audio

Sends audio and returns transcription results for a sessionless recognition request. The method returns only final transcription results; the interim_results parameter is a no-op when used with this method. To enable interim results, use the Sessions or WebSockets interface. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. (For the audio/l16 format, you can specify the endianness.)

You specify the parameters of the request as request headers and query parameters. You provide the audio as the body of the request. This method is preferred to the multipart approach for submitting a sessionless recognition request.

For requests to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (response code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (response code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.
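With Node's core https module, streaming mode amounts to sending the request body in chunks with the Transfer-Encoding header set. A sketch of the request options (the credentials are placeholders, and the inactivity_timeout value of 60 is only an example):

```javascript
// Sketch: request options for a chunked, sessionless recognition request.
// buildStreamingOptions is a hypothetical helper, not part of any SDK.
function buildStreamingOptions(username, password) {
  return {
    method: 'POST',
    host: 'stream.watsonplatform.net',
    path: '/speech-to-text/api/v1/recognize?inactivity_timeout=60',
    auth: username + ':' + password,
    headers: {
      'Content-Type': 'audio/flac',
      'Transfer-Encoding': 'chunked'
    }
  };
}

// Usage (not run here):
//   var https = require('https');
//   var fs = require('fs');
//   var req = https.request(buildStreamingOptions('{username}', '{password}'),
//     function(res) { res.pipe(process.stdout); });
//   fs.createReadStream('audio-file.flac').pipe(req);
```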

This call is the same as the session-based recognize call, but this call omits the session_id sessionId parameter and includes the model parameter.


POST /v1/recognize

recognize(params, callback())

ServiceCall<SpeechResults> recognize(File audio)
ServiceCall<SpeechResults> recognize(File audio, RecognizeOptions options)

Request

Parameter Description
Content-Type header string The audio format (MIME type) of the audio:
  • audio/basic (Use only with narrowband models.)
  • audio/flac
  • audio/l16 (Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
  • audio/mp3
  • audio/mpeg
  • audio/mulaw (Specify the sampling rate of the audio.)
  • audio/ogg (The service automatically detects the codec of the input audio.)
  • audio/ogg;codecs=opus
  • audio/ogg;codecs=vorbis
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/webm (The service automatically detects the codec of the input audio.)
  • audio/webm;codecs=opus
  • audio/webm;codecs=vorbis
For information about the supported audio formats, including specifying the sampling rate, channels, and endianness for the indicated formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
Transfer-Encoding header string Must be set to chunked to send the audio in streaming mode. The data does not need to exist fully before being streamed to the service.
audio body stream The audio to be transcribed in the format specified by the Content-Type header. With cURL, include a separate --data-binary option for each file of the request.
model query string The identifier of the model that is to be used for the recognition request:
  • ar-AR_BroadbandModel
  • en-GB_BroadbandModel
  • en-GB_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • ko-KR_BroadbandModel
  • ko-KR_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
customization_id query string The GUID of a custom language model that is to be used with the request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.
acoustic_customization_id query string The GUID of a custom acoustic model that is to be used with the request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom acoustic model is used.
base_model_version query string The version of the specified base model that is to be used with the request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. For more information, see Base model version.
customization_weight query double If you specify a customization ID with the request, you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for the request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.

The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.
inactivity_timeout query integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 response code. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
keywords query string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. If you specify any keywords, you must also specify a keywords threshold. You can spot a maximum of 1000 keywords. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold query float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives query integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
word_alternatives_threshold query float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence query boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false; no word confidence measures are returned.
timestamps query boolean Indicates whether time alignment is returned for each word. The default is false; no timestamps are returned.
profanity_filter query boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting query boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. For US English, also converts certain keyword strings to punctuation symbols. The default is false; no smart formatting is performed. Applies to US English and Spanish transcription only.
speaker_labels query boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. The default is false; no speaker labels are returned. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.

To determine whether a language model supports speaker labels, use the Get models method and check that the attribute speaker_labels is set to true. You can also refer to Speaker labels.
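The keywords value in the cURL example for this method is the URL-encoded form of a comma-separated list of quoted terms. A sketch of producing such a query string from a plain parameter object (the helper is illustrative):

```javascript
// Sketch: serialize query parameters for the recognize endpoint.
// The caller supplies keywords as a single comma-separated string of
// quoted terms, which encodeURIComponent then percent-encodes.
function buildRecognizeQuery(params) {
  return Object.keys(params).map(function(key) {
    return key + '=' + encodeURIComponent(params[key]);
  }).join('&');
}
```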

Pass the audio file to be transcribed via the method's audio argument. Pass all other parameters for the recognition request as a Java RecognizeOptions object via the options argument.

Parameter Description
audio stream File The audio to be transcribed in the format specified by the content_type contentType parameter.
content_type string contentType string The audio format (MIME type) of the audio:
  • audio/basic (Use only with narrowband models.)
  • audio/flac
  • audio/l16 (Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
  • audio/mp3
  • audio/mpeg
  • audio/mulaw (Specify the sampling rate of the audio.)
  • audio/ogg (The service automatically detects the codec of the input audio.)
  • audio/ogg;codecs=opus
  • audio/ogg;codecs=vorbis
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/webm (The service automatically detects the codec of the input audio.)
  • audio/webm;codecs=opus
  • audio/webm;codecs=vorbis
For information about the supported audio formats, including specifying the sampling rate, channels, and endianness for the indicated formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
model string The identifier of the model that is to be used for the recognition request:
  • ar-AR_BroadbandModel
  • en-GB_BroadbandModel
  • en-GB_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • ko-KR_BroadbandModel
  • ko-KR_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
customization_id customizationId string The GUID of a custom language model that is to be used with the request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.
acoustic_customization_id string The GUID of a custom acoustic model that is to be used with the request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom acoustic model is used.
customization_weight customizationWeight double If you specify a customization ID with the request, you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for the request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.

The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.
inactivity_timeout inactivityTimeout integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 response code. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
keywords string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. If you specify any keywords, you must also specify a keywords threshold. You can spot a maximum of 1000 keywords. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold keywordsThreshold float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives maxAlternatives integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
word_alternatives_threshold wordAlternativesThreshold float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence wordConfidence boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false; no word confidence measures are returned.
timestamps boolean Indicates whether time alignment is returned for each word. The default is false; no timestamps are returned.
profanity_filter profanityFilter boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting smartFormatting boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. For US English, also converts certain keyword strings to punctuation symbols. The default is false; no smart formatting is performed. Applies to US English and Spanish transcription only.
speaker_labels speakerLabels boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. The default is false; no speaker labels are returned. Setting speaker_labels (speakerLabels) to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.

To determine whether a language model supports speaker labels, use the Get models method and check that the attribute speaker_labels is set to true. You can also refer to Speaker labels.
interimResults boolean Not supported.

Example request


curl -X POST -u "{username}":"{password}"
--header "Content-Type: audio/flac"
--header "Transfer-Encoding: chunked"
--data-binary "@audio-file1.flac"
--data-binary "@audio-file2.flac"
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?timestamps=true&word_alternatives_threshold=0.9&keywords=%22colorado%22%2C%22tornado%22%2C%22tornadoes%22&keywords_threshold=0.5"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var fs = require('fs');

var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var files = ['audio-file1.flac', 'audio-file2.flac'];
files.forEach(function(file) {
  var params = {
    audio: fs.createReadStream(file),
    content_type: 'audio/flac',
    timestamps: true,
    word_alternatives_threshold: 0.9,
    keywords: ['colorado', 'tornado', 'tornadoes'],
    keywords_threshold: 0.5
  };

  speech_to_text.recognize(params, function(error, transcript) {
    if (error)
      console.log('Error:', error);
    else
      console.log(JSON.stringify(transcript, null, 2));
  });
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognizeOptions options = new RecognizeOptions.Builder()
  .contentType("audio/flac").timestamps(true)
  .wordAlternativesThreshold(0.9)
  .keywords(new String[]{"colorado", "tornado", "tornadoes"})
  .keywordsThreshold(0.5).build();

String[] files = {"audio-file1.flac", "audio-file2.flac"};
for (String file : files) {
  SpeechResults results = service.recognize(new File(file), options).execute();
  System.out.println(results);
}

Response

Returns one or more instances of a SpeechRecognitionResults object depending on the input.

Returns a Java SpeechResults object that contains the results that are provided in a JSON SpeechRecognitionResults object. The response includes one or more instances of the object.

SpeechRecognitionResults (Java SpeechResults object)
Name Description
results object[ ] An array of SpeechRecognitionResult objects that can include interim results (if supported by the method) and final results. Final results are guaranteed not to change; interim results might be replaced by further interim results and final results. The service periodically sends updates to the results array; the result_index is set to the lowest index in the array that has changed; it is incremented for new results.
result_index integer An index that indicates a change point in the results array. The service increments the index only for additional results that it sends for new audio for the same request.
speaker_labels object[ ] An array of SpeakerLabelsResult objects that identifies which words were spoken by which speakers in a multi-person exchange. Returned in the response only if speaker_labels is true. When interim results are also requested for methods that support them, it is possible for a SpeechRecognitionResults object to include only the speaker_labels field.
warnings string[ ] An array of warning messages associated with the request:
  • Warnings for invalid query parameters or JSON fields can include a descriptive message and a list of invalid argument strings, for example, "Unknown arguments:" or "Unknown url query arguments:" followed by a list of the form "invalid_arg_1, invalid_arg_2."
  • The following warning is returned if the request passes a custom model that is based on an older version of a base model for which an updated version is available:

    "Using previous version of base model, because your custom model has been built with it. Please note that this version will be supported only for a limited time. Consider updating your custom model to the new base model. If you do not do that you will be automatically switched to base model when you used the non-updated custom model."
In both cases, the request succeeds despite the warnings.
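The interplay between the results array and result_index can be handled client-side by replacing everything from the change point onward. The following is a minimal sketch, not part of the SDK; the helper name applyUpdate is hypothetical:

```javascript
// Maintain a client-side view of the transcript from successive
// SpeechRecognitionResults updates. The service sets result_index to the
// lowest index in the results array that changed, so each update replaces
// the stored entries from that index onward.
function applyUpdate(accumulated, update) {
  // Discard previously stored results from the change point onward,
  // then append the entries carried by this update.
  accumulated.length = update.result_index;
  update.results.forEach(function(result) {
    accumulated.push(result);
  });
  return accumulated;
}
```

With interim results, an entry whose final field is false may be overwritten by a later update that reuses the same index; final entries are never revised.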
SpeechRecognitionResult (Java Transcript object)
Name Description
final boolean Indicates whether the results for this utterance are final. If true, the results for this utterance are not updated further; no additional results are sent for a result_index once its results are indicated as final. If false, the results are updated further.
alternatives object[ ] An array of SpeechRecognitionAlternative objects that provide alternative transcripts. The alternatives array can include additional requested output such as word confidence or timestamps.
keywords_result object A KeywordResults object that provides a dictionary (or associative array) whose keys are the strings specified for keywords if both that parameter and keywords_threshold are specified. A keyword for which no matches are found is omitted from the object. The object is empty if no keywords are found in the audio.
keywords_result Map A Map of strings to Lists of KeywordResult objects. The Map provides a dictionary (or associative array) of keywords to their matches in the audio.
  • Each string is a key that represents one of the keywords if both that parameter and keywords_threshold are specified. A keyword for which no matches are found is omitted from the Map.
  • A List of KeywordResult objects is returned for each keyword for which at least one match is found. Each element of the list provides information about the occurrences of the keyword in the audio.
The Map is omitted if no keywords are found in the audio or if keyword spotting is not requested.
word_alternatives object[ ] An array of WordAlternativeResults objects that provide alternative hypotheses found for words of the input audio if a word_alternatives_threshold is specified.
SpeechRecognitionAlternative (Java SpeechAlternative object)
Name Description
transcript string A transcription of the audio.
confidence number A score that indicates the service's confidence in the transcript in the range of 0 to 1. Available only for the best alternative and only in results marked as final.
timestamps string[ ] Time alignments for each word from the transcript as a list of lists. Each inner list consists of three elements: the word followed by its start and end time in seconds. For example, [["hello",0.0,1.2],["world",1.2,2.5]]. Available only for the best alternative.
word_confidence string[ ] A confidence score for each word of the transcript as a list of lists. Each inner list consists of two elements: the word and its confidence score in the range of 0 to 1. For example, [["hello",0.95],["world",0.866]]. Available only for the best alternative and only in results marked as final.
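The timestamps and word_confidence lists are parallel, per-word, and in transcript order, so they can be zipped into one record per word. A minimal sketch (the helper name wordDetails is hypothetical; the field names are those in the table above):

```javascript
// Combine the best alternative's timestamps and word_confidence lists
// into one object per word of the transcript.
function wordDetails(alternative) {
  var ts = alternative.timestamps || [];
  var wc = alternative.word_confidence || [];
  return ts.map(function(entry, i) {
    return {
      word: entry[0],
      start: entry[1],
      end: entry[2],
      // word_confidence entries parallel the timestamps entries;
      // confidence is undefined if word_confidence was not requested
      confidence: wc[i] ? wc[i][1] : undefined
    };
  });
}
```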
KeywordResults
Name Description
{keyword} list Each keyword entered via the keywords parameter and, for each keyword, an array of KeywordResult objects that provides information about its occurrences in the input audio. The keys of the list are the actual keyword strings. A keyword for which no matches are spotted in the input is omitted from the object.
KeywordResult (Java KeywordsResult object)
Name Description
normalized_text string The specified keyword normalized to the spoken phrase that matched in the audio input.
start_time number The start time in seconds of the keyword match.
end_time number The end time in seconds of the keyword match.
confidence number The confidence score of the keyword match in the range of 0 to 1.
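Because keywords_result is keyed by keyword string, consumers often want a flat, chronological list of all spotted occurrences instead. A minimal sketch (the helper name spottedKeywords is illustrative, not part of the SDK):

```javascript
// Flatten a keywords_result dictionary into a list of occurrences
// sorted by start time. Keywords with no matches are simply absent
// from the object, so only spotted keywords appear in the output.
function spottedKeywords(keywordsResult) {
  var hits = [];
  Object.keys(keywordsResult || {}).forEach(function(keyword) {
    keywordsResult[keyword].forEach(function(match) {
      hits.push({
        keyword: keyword,
        normalized_text: match.normalized_text,
        start_time: match.start_time,
        end_time: match.end_time,
        confidence: match.confidence
      });
    });
  });
  return hits.sort(function(a, b) { return a.start_time - b.start_time; });
}
```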
WordAlternativeResults (Java SpeechWordAlternatives object)
Name Description
start_time number The start time in seconds of the word from the input audio that corresponds to the word alternatives.
end_time number The end time in seconds of the word from the input audio that corresponds to the word alternatives.
alternatives object[ ] An array of WordAlternativeResult objects that provides alternative hypotheses for a word from the input audio.
WordAlternativeResult (Java WordAlternative object)
Name Description
confidence number The confidence score of the word alternative hypothesis in the range of 0 to 1.
word string An alternative hypothesis for a word from the input audio.
SpeakerLabelsResult (Java SpeakerLabel object)
Name Description
from number The start time of a word from the transcript. The value matches the start time of a word from the timestamps array.
to number The end time of a word from the transcript. The value matches the end time of a word from the timestamps array.
speaker integer The numeric identifier that the service assigns to a speaker from the audio. Speaker IDs begin at 0 initially but can evolve and change across interim results (if supported by the method) and between interim and final results as the service processes the audio. They are not guaranteed to be sequential, contiguous, or ordered.
confidence number A score that indicates the service's confidence in its identification of the speaker in the range of 0 to 1.
final boolean An indication of whether the service might further change word and speaker-label results. A value of true means that the service guarantees not to send any further updates for the current or any preceding results; false means that the service might send further updates to the results.
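Because each SpeakerLabelsResult entry's from and to values match a word's start and end times in the timestamps array, speaker IDs can be joined to words by time. A minimal sketch (the helper name labelWords is hypothetical):

```javascript
// Attach speaker IDs to words by matching each speaker-label entry
// (from/to) to the word with the same start/end times in the
// timestamps array of the best alternative.
function labelWords(timestamps, speakerLabels) {
  return timestamps.map(function(entry) {
    var word = entry[0], start = entry[1], end = entry[2];
    var label = speakerLabels.find(function(l) {
      return l.from === start && l.to === end;
    });
    return {
      word: word,
      start: start,
      end: end,
      // null when no speaker label covers this word
      speaker: label ? label.speaker : null
    };
  });
}
```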

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The request failed because of a user input error. For example, the request passed audio that does not match the indicated format, specified a custom language or custom acoustic model that is not in the available state, or experienced an inactivity timeout. Specific messages include
  • Model {model} not found
  • Requested model is not available
  • This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
  • speaker_labels is not a supported feature for model {model}
406 Not Acceptable The request specified an Accept header with an incompatible content type.
408 Request Timeout The connection was closed after 30 seconds of inactivity (session timeout).
413 Payload Too Large The request passed an audio file that exceeded the currently supported data limit.
415 Unsupported Media Type The request specified an unacceptable media type.
500 Internal Server Error The service experienced an internal error.
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
BadRequestException The request failed because of a user input error. For example, the request passed audio that does not match the indicated format, specified a custom language or custom acoustic model that is not in the available state, or experienced an inactivity timeout. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)
ServiceResponseException The connection was closed after 30 seconds of inactivity (session timeout). (HTTP response code 408.)
RequestTooLargeException The request passed an audio file that exceeded the currently supported data limit. (HTTP response code 413.)
UnsupportedException The request specified an unacceptable media type. (HTTP response code 415.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Example response


{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 0.09,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "latest"
            }
          ],
          "end_time": 0.6
        },
        {
          "start_time": 0.6,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "weather"
            }
          ],
          "end_time": 0.85
        },
        . . .
        {
          "start_time": 6.85,
          "alternatives": [
            {
              "confidence": 0.9988,
              "word": "on"
            }
          ],
          "end_time": 7.0
        },
        {
          "start_time": 7.0,
          "alternatives": [
            {
              "confidence": 0.9953,
              "word": "Sunday"
            }
          ],
          "end_time": 7.71
        }
      ],
      "keywords_result": {
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 6.26,
            "confidence": 0.999,
            "end_time": 6.85
          }
        ],
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 4.7,
            "confidence": 0.964,
            "end_time": 5.52
          }
        ]
      },
      "alternatives": [
        {
          "timestamps": [
            [
              "the",
              0.03,
              0.09
            ],
            [
              "latest",
              0.09,
              0.6
            ],
            . . .
            [
              "on",
              6.85,
              7.0
            ],
            [
              "Sunday",
              7.0,
              7.71
            ]
          ],
          "confidence": 0.968,
          "transcript": "the latest weather report a line of severe thunderstorms with several possible tornadoes is approaching Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 0.09,
          "alternatives": [
            {
              "confidence": 1,
              "word": "latest"
            }
          ],
          "end_time": 0.6
        },
        . . .
        {
          "start_time": 0.85,
          "alternatives": [
            {
              "confidence": 0.9979,
              "word": "report"
            }
          ],
          "end_time": 1.52
        }
      ],
      "keywords_result": {},
      "alternatives": [
        {
          "timestamps": [
            [
              "the",
              0.03,
              0.09
            ],
            . . .
            [
              "report",
              0.85,
              1.52
            ]
          ],
          "confidence": 0.983,
          "transcript": "the latest weather report "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 0.14,
          "alternatives": [
            {
              "confidence": 1,
              "word": "a"
            }
          ],
          "end_time": 0.28
        },
        . . .
        {
          "start_time": 5.33,
          "alternatives": [
            {
              "confidence": 0.9953,
              "word": "Sunday"
            }
          ],
          "end_time": 6.04
        }
      ],
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 3.03,
            "confidence": 0.953,
            "end_time": 3.85
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.59,
            "confidence": 0.999,
            "end_time": 5.18
          }
        ]
      },
      "alternatives": [
        {
          "timestamps": [
            [
              "a",
              0.14,
              0.28
            ],
            . . .
            [
              "Sunday",
              5.33,
              6.04
            ]
          ],
          "confidence": 0.983,
          "transcript": "a line of severe thunderstorms with several possible tornadoes is approaching Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "confidence": 0.983,
          "timestamps": [
            [
              "the",
              0.03,
              0.09
            ],
            . . .
            [
              "report",
              0.85,
              1.52
            ]
          ],
          "transcript": "the latest weather report "
        }
      ],
      "keywords_result": {},
      "word_alternatives": [
        {
          "start_time": 0.09,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "latest"
            }
          ],
          "end_time": 0.6
        },
        . . .
        {
          "start_time": 0.85,
          "alternatives": [
            {
              "confidence": 0.9979,
              "word": "report"
            }
          ],
          "end_time": 1.52
        }
      ]
    }
  ]
}

{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "confidence": 0.983,
          "timestamps": [
            [
              "a",
              0.14,
              0.28
            ],
            . . .
            [
              "Sunday",
              5.33,
              6.04
            ]
          ],
          "transcript": "a line of severe thunderstorms with several possible tornadoes is approaching Colorado on Sunday "
        }
      ],
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 3.03,
            "end_time": 3.85,
            "confidence": 0.953
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.59,
            "end_time": 5.18,
            "confidence": 0.999
          }
        ]
      },
      "word_alternatives": [
        {
          "start_time": 0.14,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "a"
            }
          ],
          "end_time": 0.28
        },
        . . .
        {
          "start_time": 5.33,
          "alternatives": [
            {
              "confidence": 0.9953,
              "word": "Sunday"
            }
          ],
          "end_time": 6.04
        }
      ]
    }
  ]
}

Recognize multipart

Sends audio and returns transcription results for a sessionless recognition request submitted as multipart form data. Returns only the final results; to enable interim results, use Sessions or WebSockets. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. (For the audio/l16 format, you can specify the endianness.)

You specify some parameters of the request via request headers and query parameters, but you specify most parameters as multipart form data in the form of JSON metadata, in which only the part_content_type parameter is required. You then specify the audio files for the request as subsequent parts of the form data.

The multipart approach is intended for two use cases:

  • For use with browsers for which JavaScript is disabled. Multipart requests based on form data do not require the use of JavaScript.

  • When the parameters used with the recognition request are greater than the 8 KB limit imposed by most HTTP servers and proxies. This can occur, for example, if you want to spot a very large number of keywords. Passing the parameters as form data avoids this limit.

For requests to transcribe audio with more than one audio file or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (response code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (response code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Not supported. Use the sessionless recognize call; see Recognize audio.


POST /v1/recognize

Request

Parameter Description
Content-Type header string Must be multipart/form-data to indicate the content type of the payload. cURL automatically sets the header to multipart/form-data when you use the --form option.
Transfer-Encoding header string Must be set to chunked to send the audio in streaming mode or to send a request that includes more than one audio part.
model query string The identifier of the model that is to be used for the recognition request:
  • ar-AR_BroadbandModel
  • en-GB_BroadbandModel
  • en-GB_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • ko-KR_BroadbandModel
  • ko-KR_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
customization_id query string The GUID of a custom language model that is to be used with the request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.
acoustic_customization_id query string The GUID of a custom acoustic model that is to be used with the request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom acoustic model is used.
base_model_version query string The version of the specified base model that is to be used with the request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. For more information, see Base model version.
metadata form data object A MultipartRecognition object that provides the parameters for the multipart recognition request. This must be the first part of the request and must consist of JSON-formatted data. The information describes the subsequent parts of the request, which pass the audio files to be transcribed.
upload form data file One or more audio files for the request. To send multiple audio files, set Transfer-Encoding to chunked. With cURL, include a separate --form option for each file of the request.
MultipartRecognition
Parameter Description
part_content_type string The audio format (MIME type) of the audio in the following parts:
  • audio/basic (Use only with narrowband models.)
  • audio/flac
  • audio/l16 (Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
  • audio/mp3
  • audio/mpeg
  • audio/mulaw (Specify the sampling rate of the audio.)
  • audio/ogg (The service automatically detects the codec of the input audio.)
  • audio/ogg;codecs=opus
  • audio/ogg;codecs=vorbis
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/webm (The service automatically detects the codec of the input audio.)
  • audio/webm;codecs=opus
  • audio/webm;codecs=vorbis
All data parts must have the same audio format. For information about the supported audio formats, including specifying the sampling rate, channels, and endianness for the indicated formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
data_parts_count integer The number of audio data parts (audio files) sent with the request. Server-side end-of-stream detection is applied to the last (and possibly the only) data part. If omitted, the number of parts is determined from the request itself.
customization_weight double If you specify a customization ID with the request, you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for the request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.

The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.
sequence_id integer The sequence ID for all data parts of this recognition task. If omitted, no sequence ID is associated with the request. Available only for session-based requests.
inactivity_timeout integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 response code. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
keywords string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. If you specify any keywords, you must also specify a keywords threshold. You can spot a maximum of 1000 keywords. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
word_alternatives_threshold float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false; no word confidence measures are returned.
timestamps boolean Indicates whether time alignment is returned for each word. The default is false; no timestamps are returned.
profanity_filter boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. For US English, also converts certain keyword strings to punctuation symbols. The default is false; no smart formatting is performed. Applies to US English and Spanish transcription only.
speaker_labels boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. The default is false; no speaker labels are returned. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.

To determine whether a language model supports speaker labels, use the Get models method and check that the attribute speaker_labels is set to true. You can also refer to Speaker labels.
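The metadata for a multipart request is ordinary JSON that is sent as the first form part, ahead of the audio parts. A minimal sketch of assembling it (the helper name buildMetadata is illustrative; part_content_type is the only required field):

```javascript
// Build the JSON metadata part for a multipart recognition request.
// The remaining options mirror the parameters described in the
// MultipartRecognition table.
function buildMetadata(contentType, files, options) {
  var metadata = Object.assign({
    part_content_type: contentType,  // required; must match every audio part
    data_parts_count: files.length   // optional; inferred from the request if omitted
  }, options || {});
  return JSON.stringify(metadata);
}
```

The resulting string would be attached with a form option such as curl's --form metadata=..., followed by one --form upload=@file option per audio part.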

Example request


curl -X POST -u "{username}":"{password}"
--header "Transfer-Encoding: chunked"
--form metadata="{\"data_parts_count\":2,
  \"part_content_type\":\"audio/flac\",
  \"timestamps\":true,
  \"word_alternatives_threshold\":0.9,
  \"keywords\":[\"colorado\",\"tornado\",\"tornadoes\"],
  \"keywords_threshold\":0.5}"
--form upload="@audio-file1.flac"
--form upload="@audio-file2.flac"
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"

Response

Returns one or more instances of a SpeechRecognitionResults object depending on the input.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The request failed because of a user input error. For example, the request passed audio that does not match the indicated format, specified a custom language or custom acoustic model that is not in the available state, or experienced an inactivity timeout. Specific messages include
  • Model {model} not found
  • Requested model is not available
  • This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
  • speaker_labels is not a supported feature for model {model}
406 Not Acceptable The request specified an Accept header with an incompatible content type.
408 Request Timeout The connection was closed after 30 seconds of inactivity (session timeout).
413 Payload Too Large The request passed an audio file that exceeded the currently supported data limit.
415 Unsupported Media Type The request specified an unacceptable media type.
500 Internal Server Error The service experienced an internal error.
503 Service Unavailable The service is currently unavailable.

Example response


{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 0.09,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "latest"
            }
          ],
          "end_time": 0.6
        },
        . . .
        {
          "start_time": 0.85,
          "alternatives": [
            {
              "confidence": 0.9979,
              "word": "report"
            }
          ],
          "end_time": 1.52
        }
      ],
      "keywords_result": {},
      "alternatives": [
        {
          "timestamps": [
            [
              "the",
              0.03,
              0.09
            ],
            . . .
            [
              "report",
              0.85,
              1.52
            ]
          ],
          "confidence": 0.983,
          "transcript": "the latest weather report "
        }
      ],
      "final": true
    },
    {
      "word_alternatives": [
        {
          "start_time": 0.14,
          "alternatives": [
            {
              "confidence": 1.0,
              "word": "a"
            }
          ],
          "end_time": 0.29
        },
        . . .
        {
          "start_time": 5.33,
          "alternatives": [
            {
              "confidence": 0.9951,
              "word": "Sunday"
            }
          ],
          "end_time": 6.04
        }
      ],
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 3.03,
            "confidence": 0.947,
            "end_time": 3.85
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.59,
            "confidence": 0.998,
            "end_time": 5.18
          }
        ]
      },
      "alternatives": [
        {
          "timestamps": [
            [
              "a",
              0.14,
              0.29
            ],
            . . .
            [
              "Sunday",
              5.33,
              6.04
            ]
          ],
          "confidence": 0.985,
          "transcript": "a line of severe thunderstorms with several possible tornadoes is approaching Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
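Each keywords_result object in the response above maps a requested keyword to zero or more matches, each with normalized text, start and end times, and a confidence. As a minimal sketch of consuming that structure (the helper below is illustrative and not part of any Watson SDK), a client might flatten the matches into a single list:

```javascript
// Illustrative helper: flattens the keywords_result entries of a
// SpeechRecognitionResults object into one array of keyword spots.
function extractKeywordSpots(response) {
  const spots = [];
  for (const result of response.results) {
    const keywordsResult = result.keywords_result || {};
    for (const keyword of Object.keys(keywordsResult)) {
      for (const match of keywordsResult[keyword]) {
        spots.push({
          keyword: keyword,                // the keyword as requested
          text: match.normalized_text,     // the spoken form that matched
          start: match.start_time,         // seconds from start of audio
          end: match.end_time,
          confidence: match.confidence
        });
      }
    }
  }
  return spots;
}
```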

Sessions

Create a session

Creates a session and locks recognition requests to that engine. You can use the session for multiple recognition requests so that each request is processed with the same engine. The session expires after 30 seconds of inactivity; for information about avoiding session timeouts, see Timeouts.

The method returns a session cookie in the Set-Cookie response header. You must pass this cookie with each request that uses the session. For more information, see Using cookies with sessions.

The method returns a session cookie in the cookie_session field of the SpeechSession object. You must pass this cookie with the corresponding parameter of the observeResult method.


POST /v1/sessions

createSession(params, callback())

ServiceCall<SpeechSession> createSession()
ServiceCall<SpeechSession> createSession(String model)
ServiceCall<SpeechSession> createSession(SpeechModel model)

Request

Parameter Description
model query string The identifier of the model that is to be used by the new session:
  • ar-AR_BroadbandModel
  • en-GB_BroadbandModel
  • en-GB_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • ko-KR_BroadbandModel
  • ko-KR_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
customization_id query string The GUID of a custom language model that is to be used with the new session. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.
acoustic_customization_id query string The GUID of a custom acoustic model that is to be used with the new session. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom acoustic model is used.
base_model_version query string The version of the specified base model that is to be used with the new session. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. For more information, see Base model version.
Parameter Description
model string The identifier of the model that is to be used by the new session:
  • ar-AR_BroadbandModel
  • en-GB_BroadbandModel
  • en-GB_NarrowbandModel
  • en-US_BroadbandModel (the default)
  • en-US_NarrowbandModel
  • es-ES_BroadbandModel
  • es-ES_NarrowbandModel
  • fr-FR_BroadbandModel
  • ja-JP_BroadbandModel
  • ja-JP_NarrowbandModel
  • ko-KR_BroadbandModel
  • ko-KR_NarrowbandModel
  • pt-BR_BroadbandModel
  • pt-BR_NarrowbandModel
  • zh-CN_BroadbandModel
  • zh-CN_NarrowbandModel
You can provide only one of the two model arguments.
model object A Java SpeechModel object that identifies the model that is to be used by the new session:
  • SpeechModel.AR_AR_BROADBANDMODEL
  • SpeechModel.EN_UK_BROADBANDMODEL
  • SpeechModel.EN_UK_NARROWBANDMODEL
  • SpeechModel.EN_US_BROADBANDMODEL (the default)
  • SpeechModel.EN_US_NARROWBANDMODEL
  • SpeechModel.ES_ES_BROADBANDMODEL
  • SpeechModel.ES_ES_NARROWBANDMODEL
  • SpeechModel.FR_FR_BROADBANDMODEL
  • SpeechModel.JA_JP_BROADBANDMODEL
  • SpeechModel.JA_JP_NARROWBANDMODEL
  • SpeechModel.KO_KR_BROADBANDMODEL
  • SpeechModel.KO_KR_NARROWBANDMODEL
  • SpeechModel.PT_BR_BROADBANDMODEL
  • SpeechModel.PT_BR_NARROWBANDMODEL
  • SpeechModel.ZH_CN_BROADBANDMODEL
  • SpeechModel.ZH_CN_NARROWBANDMODEL
You can provide only one of the two model arguments.
customization_id string The GUID of a custom language model that is to be used with the new session. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.
acoustic_customization_id string The GUID of a custom acoustic model that is to be used with the new session. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom acoustic model is used.

Example request


curl -X POST -u "{username}":"{password}"
--cookie-jar cookies.txt
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

speech_to_text.createSession({}, function(error, session) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(session, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

SpeechSession session = service.createSession().execute();
System.out.println(session);
session.getSessionId();

Response

Returns a Java SpeechSession object that contains the information about the new session that is provided in a JSON SpeechSession object.

SpeechSession (Java SpeechSession object)
Name Description
recognize string The URI for REST recognition requests.
recognizeWS string The URI for WebSocket recognition requests. The URI is needed only for working with WebSockets.
observe_result string The URI for REST results observers.
session_id string The identifier for the new session.
new_session_uri string The URI for the new session.
cookie_session string The cookie for the new session.

Response codes

Status Description
201 Created The session was successfully created.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
415 Unsupported Media Type The request specified an unacceptable media type.
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)
UnsupportedException The request specified an unacceptable media type. (HTTP response code 415.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Example response


{
  "recognize": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/recognize",
  "recognizeWS": "wss://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/recognize",
  "observe_result": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/observe_result",
  "session_id": "0ac1b5dfc2e8fc490a41e29e67c27931",
  "new_session_uri": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931"
}

{
  "recognize": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/e0ec707b639fc870069e938c324f1e8a/recognize",
  "recognizeWS": "wss://stream.watsonplatform.net/speech-to-text/api/v1/sessions/e0ec707b639fc870069e938c324f1e8a/recognize",
  "observe_result": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/e0ec707b639fc870069e938c324f1e8a/observe_result",
  "session_id": "e0ec707b639fc870069e938c324f1e8a",
  "new_session_uri": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/e0ec707b639fc870069e938c324f1e8a",
  "cookie_session": "e0ec707b639fc870069e938c324f1e8aas123sd12e"
}
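Outside of curl's --cookie-jar handling, a client that calls the session URIs directly must echo the session cookie on every request for that session. A minimal sketch, assuming the cookie value is the cookie_session field shown above; the cookie name is passed in by the caller because the actual name comes from the Set-Cookie header of the Create a session response:

```javascript
// Illustrative only: builds the Cookie request header value for
// subsequent calls on a session. "session" is a parsed SpeechSession
// object; "cookieName" must be whatever name the service returned in
// its Set-Cookie header (not hard-coded here, since the reference does
// not document a fixed name).
function sessionCookieHeader(session, cookieName) {
  if (!session.cookie_session) {
    throw new Error('SpeechSession has no cookie_session value');
  }
  return cookieName + '=' + session.cookie_session;
}
```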

Get status

Checks whether a specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The method blocks until the session is in the initialized state to indicate that you can send another recognition request. The request must pass the cookie that was returned by the Create a session method.


GET /v1/sessions/{session_id}/recognize

getSessionStatus(params, callback()) DEPRECATED

ServiceCall<SpeechSessionStatus> getRecognizeStatus(SpeechSession session)

Request

Parameter Description
session_id path string The identifier of the session whose status is to be checked.
Parameter Description
session_id string The identifier of the session whose status is to be checked.
session object A Java SpeechSession object that identifies the session whose status is to be checked.

Example request


curl -X GET -u "{username}":"{password}"
--cookie cookies.txt
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/{session_id}/recognize"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'session_id': '{session_id}'
}

speech_to_text.getSessionStatus(params, function(error, status) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(status, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

SpeechSessionStatus status = service.getRecognizeStatus({session}).execute();
System.out.println(status);

Response

SessionStatus
Name Description
session object A SpeechSession object that provides information about the session.

Returns a Java SpeechSessionStatus object that contains the information about the session that is provided in a JSON SpeechSession object.

SpeechSession (Java SpeechSessionStatus object)
Name Description
recognize string The URI for REST recognition requests.
recognizeWS string The URI for WebSocket recognition requests. The URI is needed only for working with WebSockets.
state string The state of the session. The state must be initialized for the session to accept another recognition request. Other internal states are possible, but they have no meaning for the user.
observe_result string The URI for REST results observers.
model string The URI for information about the model that is used with the session.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The request failed because of an inactivity timeout or because it failed to pass the session cookie. If an existing session is closed, session_closed is set to true.
404 Not Found The specified session_id was not found, possibly because of an invalid session cookie.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
415 Unsupported Media Type The request specified an unacceptable media type.

Exceptions thrown

Exception Description
BadRequestException The session timed out due to inactivity, or the request failed to pass the session cookie. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
NotFoundException The specified session was not found, possibly because of an invalid session cookie. (HTTP response code 404.)
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)
UnsupportedException The request specified an unacceptable media type. (HTTP response code 415.)

Example response


{
  "session": {
    "recognize": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/recognize",
    "recognizeWS": "wss://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/recognize",
    "state": "initialized",
    "observe_result": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/observe_result",
    "model": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_BroadbandModel"
  }
}

{
  "recognize": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/recognize",
  "state": "initialized",
  "observe_result": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/0ac1b5dfc2e8fc490a41e29e67c27931/observe_result",
  "model": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_BroadbandModel"
}
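Because concurrent recognition tasks on a session are not allowed, a client can gate its next recognition request on the state field of this response. A sketch of such a guard (the function is illustrative, not part of any SDK); note that the first example response above nests the fields under session, while the second exposes them at the top level, so the guard accepts both shapes:

```javascript
// Illustrative guard: a session accepts a new recognition request only
// when its state is "initialized"; any other internal state means the
// session is busy or unavailable.
function canAcceptRequest(statusResponse) {
  // Handle both the nested ({"session": {...}}) and the flat response shape.
  const session = statusResponse.session || statusResponse;
  return session.state === 'initialized';
}
```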

Observe result

Requests results for a recognition task within a specified session. You can submit this method multiple times for the same recognition task. To see interim results, set the interim_results parameter to true. The request must pass the cookie that was returned by the Create a session method; with the Node SDK, pass it via the cookie_session parameter.

To see results for a specific recognition task, specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of the recognition request. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (response code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. For more information, see Timeouts.

Omit the sequence ID to observe results for an ongoing recognition task. If no recognition task is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.

Not supported. To obtain interim results, use WebSockets.


GET /v1/sessions/{session_id}/observe_result

observeResult(params, callback()) DEPRECATED

Request

Parameter Description
session_id path string The identifier of the session whose results you want to observe.
sequence_id query integer The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.
interim_results query boolean Indicates whether the service is to return interim results. If true, interim results are returned as a stream of JSON SpeechRecognitionResults objects. If false, the response is a single SpeechRecognitionResults object with final results only.
Parameter Description
session_id string The identifier of the session whose results you want to observe.
cookie_session string The cookie for the session whose results you want to observe. The session cookie is returned by the Create a session method.
interim_results boolean Indicates whether the service is to return interim results. If true, interim results are returned as a stream of JSON SpeechRecognitionResults objects. If false, the response is a single SpeechRecognitionResults object with final results only.

Example request


curl -X GET -u "{username}":"{password}"
--cookie cookies.txt
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/{session_id}/observe_result?interim_results=true"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'session_id': '{session_id}',
  'cookie_session': '{cookie_session}',
  'interim_results': true
}

speech_to_text.observeResult(params, function(error, interim_results) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(interim_results, null, 2));
});

Response

Returns one or more instances of a SpeechRecognitionResults object depending on the input and the value of the interim_results parameter.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The request failed because of a user input error; for example, the request failed to pass the session cookie or experienced an inactivity timeout. If an existing session is closed, session_closed is set to true.
404 Not Found The specified session_id was not found, possibly because of an invalid session cookie, or a specified sequence_id does not match the sequence ID of a recognition task.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
408 Request Timeout The session was closed after 30 seconds of inactivity (session timeout). The session is destroyed with session_closed set to true.
413 Payload Too Large The request passed an audio file that exceeded the currently supported data limit. The session is destroyed with session_closed set to true.
415 Unsupported Media Type The request specified an unacceptable media type.
500 Internal Server Error The service experienced an internal error. The session is destroyed with session_closed set to true.

Example response


{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several torn "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}
{
  . . .
}
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": false
    }
  ],
  "result_index": 0
}
{
  "results": [
    {
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "confidence": 1.0,
            "end_time": 2.15
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.94,
            "confidence": 0.913,
            "end_time": 5.62
          }
        ]
      },
      "alternatives": [
        {
          "confidence": 0.891,
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
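With interim_results set to true, the body is a stream of concatenated JSON objects like those above rather than a single document, so it cannot be handed to JSON.parse whole. One way to split such a stream into individual objects (an illustrative sketch; it assumes well-formed JSON and skips braces inside quoted strings, but a real client may prefer an incremental JSON parser):

```javascript
// Illustrative parser for a stream of concatenated JSON objects, such
// as the interim SpeechRecognitionResults shown above. Tracks brace
// depth, ignoring braces that occur inside JSON string values.
function splitJsonStream(text) {
  const objects = [];
  let depth = 0, start = -1, inString = false, escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;        // previous char was a backslash
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') { if (depth === 0) start = i; depth++; }
    else if (ch === '}') {
      depth--;
      if (depth === 0) objects.push(JSON.parse(text.slice(start, i + 1)));
    }
  }
  return objects;
}
```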

Recognize audio

Sends audio and returns transcription results for a session-based recognition request. By default, the method returns only final transcription results. With the HTTP and Node interfaces, to see interim results, set the interim_results parameter to true in a call to the Observe result method before this POST request finishes. With the Java SDK, the method returns only final results, and the interim_results parameter is a no-op; to enable interim results, use WebSockets.

The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. (For the audio/l16 format, you can specify the endianness.) The request must pass the cookie that was returned by the Create a session method.

You specify the parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. This method is preferred to the multipart approach for submitting a session-based recognition request.

For requests to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the session (response code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (response code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds. For more information, see Timeouts.

To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter.

This call is the same as the sessionless recognize call, but it requires the session_id (sessionId in Java) parameter and omits the model parameter.


POST /v1/sessions/{session_id}/recognize

recognize(params, callback())

ServiceCall<SpeechResults> recognize(File audio, RecognizeOptions options)

Request

Parameter Description
session_id path string The identifier of the session that is to be used.
Content-Type header string The audio format (MIME type) of the audio:
  • audio/basic (Use only with narrowband models.)
  • audio/flac
  • audio/l16 (Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
  • audio/mp3
  • audio/mpeg
  • audio/mulaw (Specify the sampling rate of the audio.)
  • audio/ogg (The service automatically detects the codec of the input audio.)
  • audio/ogg;codecs=opus
  • audio/ogg;codecs=vorbis
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/webm (The service automatically detects the codec of the input audio.)
  • audio/webm;codecs=opus
  • audio/webm;codecs=vorbis
For information about the supported audio formats, including specifying the sampling rate, channels, and endianness for the indicated formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
Transfer-Encoding header string Must be set to chunked to send the audio in streaming mode. The data does not need to exist fully before being streamed to the service.
audio body stream The audio to be transcribed in the format specified by the Content-Type header. With cURL, include a separate --data-binary option for each file of the request; see the sessionless recognize audio request for an example.
customization_weight query double If you specify a customization ID when you create the session, you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.

The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.
sequence_id query integer The sequence ID of this recognition task. If omitted, no sequence ID is associated with the request.
inactivity_timeout query integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 response code. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
keywords query string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. If you specify any keywords, you must also specify a keywords threshold. You can spot a maximum of 1000 keywords. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold query float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives query integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
word_alternatives_threshold query float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence query boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false; no word confidence measures are returned.
timestamps query boolean Indicates whether time alignment is returned for each word. The default is false; no timestamps are returned.
profanity_filter query boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting query boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. For US English, also converts certain keyword strings to punctuation symbols. The default is false; no smart formatting is performed. Applies to US English and Spanish transcription only.
speaker_labels query boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. The default is false; no speaker labels are returned. Setting speaker_labels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.

To determine whether a language model supports speaker labels, use the Get models method and check that the attribute speaker_labels is set to true. You can also refer to Speaker labels.

Pass the audio file to be transcribed via the method's audio argument. Pass all other parameters for the recognition request as a Java RecognizeOptions object via the options argument.

Parameter Description
session_id string sessionId string The identifier of the session that is to be used. You must provide one of sessionId or session.
session object A Java SpeechSession object that identifies the session to be used. You must provide one of sessionId or session.
audio stream File The audio to be transcribed in the format specified by the content_type contentType parameter.
content_type string contentType string The audio format (MIME type) of the audio:
  • audio/basic (Use only with narrowband models.)
  • audio/flac
  • audio/l16 (Specify the sampling rate (rate) and optionally the number of channels (channels) and endianness (endianness) of the audio.)
  • audio/mp3
  • audio/mpeg
  • audio/mulaw (Specify the sampling rate of the audio.)
  • audio/ogg (The service automatically detects the codec of the input audio.)
  • audio/ogg;codecs=opus
  • audio/ogg;codecs=vorbis
  • audio/wav (Provide audio with a maximum of nine channels.)
  • audio/webm (The service automatically detects the codec of the input audio.)
  • audio/webm;codecs=opus
  • audio/webm;codecs=vorbis
If you omit the contentType, the method attempts to derive it from the extension of the audio file. For information about the supported audio formats, including specifying the sampling rate, channels, and endianness for the indicated formats, see Audio formats. The information includes links to a number of Internet sites that provide technical and usage details about the different formats.
customization_id customizationId string The GUID of a custom language model that is to be used with the request. The base model of the specified custom language model must match the model specified with the Create a session method. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.
acoustic_customization_id string The GUID of a custom acoustic model that is to be used with the request. The base model of the specified custom acoustic model must match the model specified with the Create a session method. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom acoustic model is used.
customization_weight customizationWeight double If you specify a customization ID with the request, you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for the request.

Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.

The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.
inactivity_timeout inactivityTimeout integer The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 response code. The default is 30 seconds. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity.
keywords string[ ] A list of keywords to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. If you specify any keywords, you must also specify a keywords threshold. You can spot a maximum of 1000 keywords. Omit the parameter or specify an empty array if you do not need to spot keywords.
keywords_threshold keywordsThreshold float A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.
max_alternatives maxAlternatives integer The maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
word_alternatives_threshold wordAlternativesThreshold float A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.
word_confidence wordConfidence boolean Indicates whether a confidence measure in the range of 0 to 1 is returned for each word. The default is false; no word confidence measures are returned.
timestamps boolean Indicates whether time alignment is returned for each word. The default is false; no timestamps are returned.
profanity_filter profanityFilter boolean Indicates whether profanity filtering is performed on the transcript. If true (the default), the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. If false, the service returns results with no censoring. Applies to US English transcription only.
smart_formatting smartFormatting boolean Indicates whether dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses are to be converted into more readable, conventional representations in the final transcript of a recognition request. For US English, also converts certain keyword strings to punctuation symbols. The default is false; no smart formatting is performed. Applies to US English and Spanish transcription only.
speaker_labels speakerLabels boolean Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. The default is false; no speaker labels are returned. Setting speaker_labels speakerLabels to true forces the timestamps parameter to be true, regardless of whether you specify false for the parameter.

To determine whether a language model supports speaker labels, use the Get models method and check that the attribute speaker_labels is set to true. You can also refer to Speaker labels.
interimResults boolean Not supported.
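In client code, the optional parameters above are sent as query parameters on the recognize URL, as in the cURL example that follows. The sketch below assumes a helper named buildRecognizeUrl (illustrative, not part of any SDK) and adds customization_weight and inactivity_timeout, which the example request does not show; keyword lists are joined into a single comma-separated value, which URLSearchParams percent-encodes.

```javascript
// Sketch: assemble the query string for a session-based recognize request.
// buildRecognizeUrl is an illustrative helper; the session ID is a placeholder.
function buildRecognizeUrl(baseUrl, sessionId, params) {
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    // Keyword lists are sent as one comma-separated value.
    query.set(name, Array.isArray(value) ? value.join(',') : String(value));
  }
  return `${baseUrl}/v1/sessions/${sessionId}/recognize?${query.toString()}`;
}

const url = buildRecognizeUrl(
  'https://stream.watsonplatform.net/speech-to-text/api',
  '{session_id}',
  {
    max_alternatives: 3,
    word_confidence: true,
    keywords: ['colorado', 'tornado', 'tornadoes'],
    keywords_threshold: 0.5,
    customization_weight: 0.3,
    inactivity_timeout: -1
  }
);
```

Note that URLSearchParams encodes the commas between keywords as %2C; the service decodes them on receipt.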

Example request


curl -X POST -u "{username}":"{password}" \
--cookie cookies.txt \
--header "Content-Type: audio/flac" \
--data-binary "@audio-file.flac" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/{session_id}/recognize?max_alternatives=3&word_confidence=true&keywords=colorado,tornado,tornadoes&keywords_threshold=0.5"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var fs = require('fs');

var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'session_id': '{session_id}',
  audio: fs.createReadStream('audio-file.flac'),
  'content_type': 'audio/flac',
  'max_alternatives': 3,
  'word_confidence': true,
  keywords: ['colorado', 'tornado', 'tornadoes'],
  'keywords_threshold': 0.5
};

speech_to_text.recognize(params, function(error, transcript) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(transcript, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognizeOptions options = new RecognizeOptions.Builder()
  .sessionId({session}).contentType("audio/flac")
  .maxAlternatives(3).wordConfidence(true)
  .keywords(new String[]{"colorado", "tornado", "tornadoes"})
  .keywordsThreshold(0.5).build();

SpeechResults results = service.recognize(new File("audio-file.flac"), options)
  .execute();
System.out.println(results);

Response

Returns one or more instances of a SpeechRecognitionResults object depending on the input.

Returns a Java SpeechResults object that contains the results that are provided in a JSON SpeechRecognitionResults object. The response includes one or more instances of the object depending on the input.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The request failed because of a user input error. For example, the request passed audio that does not match the indicated format, specified a custom language or custom acoustic model that is not in the available state, experienced an inactivity timeout, failed to pass the session cookie, or was used with a session that is in the wrong state. Specific messages include
  • Model {model} not found
  • Requested model is not available
  • This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
  • speaker_labels is not a supported feature for model {model}
  • Cookie must be set.
If an existing session is closed, session_closed is set to true.
The request failed because of a user input error. For example, the request passed audio that does not match the indicated format, specified a custom language or custom acoustic model that is not in the available state, experienced an inactivity timeout, or was used with a session that is in the wrong state. Specific messages include
  • Model {model} not found
  • Requested model is not available
  • This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
  • speaker_labels is not a supported feature for model {model}
If an existing session is closed, session_closed is set to true.
404 Not Found The specified session_id was not found, possibly because of an invalid session cookie.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
408 Request Timeout The session was closed due to inactivity (session timeout of 30 seconds). The session is destroyed with session_closed set to true.
413 Payload Too Large The request passed an audio file that exceeded the currently supported data limit. The session is destroyed with session_closed set to true.
415 Unsupported Media Type The request specified an unacceptable media type.
500 Internal Server Error The service experienced an internal error. The session is destroyed with session_closed set to true.
503 Service Unavailable The session is already processing a request. Concurrent requests are not allowed on the same session. The session remains alive after this error.

Exceptions thrown

Exception Description
BadRequestException The request failed because of a user input error. For example, the request passed audio that does not match the indicated format, specified a custom language or custom acoustic model that is not in the available state, experienced an inactivity timeout, failed to pass the session cookie, or was used with a session that is in the wrong state. The session is closed. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
NotFoundException The specified session or sessionId was not found, possibly because of an invalid session cookie. (HTTP response code 404.)
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)
ServiceResponseException The session was closed due to inactivity (session timeout of 30 seconds). The session is closed. (HTTP response code 408.)
RequestTooLargeException The request passed an audio file that exceeded the currently supported data limit. The session is closed. (HTTP response code 413.)
UnsupportedException The request specified an unacceptable media type. (HTTP response code 415.)
InternalServerErrorException The service experienced an internal error. The session is closed. (HTTP response code 500.)
ServiceUnavailableException The session is already processing a request. Concurrent requests are not allowed on the same session. The session remains alive after this error. (HTTP response code 503.)

Example response


{
  "results": [
    {
      "keywords_result": {
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.94,
            "confidence": 0.913,
            "end_time": 5.62
          }
        ],
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "confidence": 1.0,
            "end_time": 2.15
          }
        ]
      },
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday ",
          "confidence": 0.891,
          "word_confidence": [
            [
              "several",
              1.0
            ],
            [
              "tornadoes",
              1.0
            ],
            . . .
            [
              "on",
              0.311
            ],
            [
              "Sunday",
              0.986
            ]
          ]
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}
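A response like the one above is typically post-processed on the client. The following sketch (summarize is an illustrative helper; the field names follow the example response) collects keyword spots and flags words whose confidence falls below a caller-chosen threshold:

```javascript
// Sketch: pull keyword spots and low-confidence words out of a
// SpeechRecognitionResults-style response like the example above.
// minConfidence is a client-side choice, not a service parameter.
function summarize(response, minConfidence) {
  const spots = [];
  const uncertainWords = [];
  for (const result of response.results) {
    // keywords_result maps each keyword to an array of match objects.
    for (const [keyword, matches] of Object.entries(result.keywords_result || {})) {
      for (const m of matches) {
        spots.push({ keyword, start: m.start_time, end: m.end_time });
      }
    }
    // word_confidence pairs appear only on the best alternative.
    const best = result.alternatives[0];
    for (const [word, confidence] of best.word_confidence || []) {
      if (confidence < minConfidence) uncertainWords.push(word);
    }
  }
  return { spots, uncertainWords };
}
```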

Recognize multipart

Sends audio and returns transcription results for a session-based recognition request submitted as multipart form data. By default, returns only the final transcription results for the request. To see interim results, set the parameter interim_results to true in a call to the Observe result method before this POST request finishes.

The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. (For the audio/l16 format, you can specify the endianness.) The request must pass the cookie that was returned by the Create a session method.

You specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only the part_content_type parameter is required. You then specify the audio files for the request as subsequent parts of the form data.

The multipart approach is intended for two use cases:

  • For use with browsers for which JavaScript is disabled. Multipart requests based on form data do not require the use of JavaScript.

  • When the parameters used with the recognition request are greater than the 8 KB limit imposed by most HTTP servers and proxies. This can occur, for example, if you want to spot a very large number of keywords. Passing the parameters as form data avoids this limit.

For requests to transcribe audio with more than one audio file or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the session (response code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (response code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds. For more information, see Timeouts.

To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id parameter of the JSON metadata.
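The metadata described above is ordinary JSON that is attached as the first form part. A sketch (buildMultipartMetadata is an illustrative helper; per the description above, part_content_type is the only required field):

```javascript
// Sketch: build the JSON metadata part for a multipart recognition request.
// part_content_type is required; the remaining fields mirror the recognition
// parameters, as in the example request below.
function buildMultipartMetadata(partContentType, audioPartCount, options) {
  return JSON.stringify({
    part_content_type: partContentType,
    data_parts_count: audioPartCount,
    ...options
  });
}

const metadata = buildMultipartMetadata('audio/flac', 1, {
  max_alternatives: 3,
  word_confidence: true,
  keywords: ['colorado', 'tornado', 'tornadoes'],
  keywords_threshold: 0.5
});
```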

Not supported. Use the session-based recognize method; see Recognize audio.


POST /v1/sessions/{session_id}/recognize

Request

Parameter Description
session_id path string The identifier of the session that is to be used.
Content-Type header string Must be set to multipart/form-data to indicate the content type of the payload. cURL automatically sets the header to multipart/form-data when you use the --form option.
Transfer-Encoding header string Must be set to chunked to send the audio in streaming mode or to send a request that includes more than one audio part.
metadata form data object A MultipartRecognition object that provides the parameters for the multipart recognition request. This must be the first part of the request and must consist of JSON-formatted data. The information describes the subsequent parts of the request, which pass the audio files to be transcribed.
upload form data file One or more audio files for the request. To send multiple audio files, set Transfer-Encoding to chunked. With cURL, include a separate --form option for each file of the request; see the sessionless recognize multipart request for an example.

Example request


curl -X POST -u "{username}":"{password}" \
--cookie cookies.txt \
--form metadata="{\"data_parts_count\":1,
  \"part_content_type\":\"audio/flac\",
  \"max_alternatives\":3,
  \"word_confidence\":true,
  \"keywords\":[\"colorado\",\"tornado\",\"tornadoes\"],
  \"keywords_threshold\":0.5}" \
--form upload="@audio-file.flac" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/{session_id}/recognize"

Response

Returns one or more instances of a SpeechRecognitionResults object depending on the input.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The request failed because of a user input error. For example, the request passed audio that does not match the indicated format, specified a custom language or custom acoustic model that is not in the available state, experienced an inactivity timeout, failed to pass the session cookie, or was used with a session that is in the wrong state. Specific messages include
  • Model {model} not found
  • Requested model is not available
  • This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
  • speaker_labels is not a supported feature for model {model}
  • Cookie must be set.
If an existing session is closed, session_closed is set to true.
404 Not Found The specified session_id was not found, possibly because of an invalid session cookie.
406 Not Acceptable The request specified an Accept header with an incompatible content type.
408 Request Timeout The session was closed due to inactivity (session timeout of 30 seconds). The session is destroyed with session_closed set to true.
413 Payload Too Large The request passed an audio file that exceeded the currently supported data limit. The session is destroyed with session_closed set to true.
415 Unsupported Media Type The request specified an unacceptable media type.
500 Internal Server Error The service experienced an internal error. The session is destroyed with session_closed set to true.
503 Service Unavailable The session is already processing a request. Concurrent requests are not allowed on the same session. The session remains alive after this error.

Example response


{
  "results": [
    {
      "keywords_result": {
        "tornadoes": [
          {
            "normalized_text": "tornadoes",
            "start_time": 1.52,
            "confidence": 1.0,
            "end_time": 2.15
          }
        ],
        "colorado": [
          {
            "normalized_text": "Colorado",
            "start_time": 4.94,
            "confidence": 0.913,
            "end_time": 5.62
          }
        ]
      },
      "alternatives": [
        {
          "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday ",
          "confidence": 0.891,
          "word_confidence": [
            [
              "several",
              1.0
            ],
            [
              "tornadoes",
              1.0
            ],
            . . .
            [
              "on",
              0.311
            ],
            [
              "Sunday",
              0.986
            ]
          ]
        },
        {
          "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday "
        },
        {
          "transcript": "several tornadoes touch down is a line of severe thunderstorms swept through Colorado on Sunday "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

Delete a session

Deletes an existing session and its engine. The request must pass the cookie that was returned by the Create a session method. You cannot send requests to a session after it is deleted. By default, a session expires after 30 seconds of inactivity if you do not delete it first.


DELETE /v1/sessions/{session_id}

deleteSession(params, callback())

ServiceCall<Void> deleteSession(SpeechSession session)

Request

Parameter Description
session_id path string The identifier of the session that is to be deleted.
Parameter Description
session_id string The identifier of the session that is to be deleted.
session object A Java SpeechSession object that identifies the session that is to be deleted.

Example request


curl -X DELETE -u "{username}":"{password}" \
--cookie cookies.txt \
"https://stream.watsonplatform.net/speech-to-text/api/v1/sessions/{session_id}"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'session_id': '{session_id}'
};

speech_to_text.deleteSession(params, function(error) {
  if (error)
    console.log('Error:', error);
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.deleteSession({session}).execute();

Response

No response body.

Response codes

Status Description
204 No Content The session was successfully deleted.
400 Bad Request The request must set the cookie.
404 Not Found The specified session_id was not found, possibly because of an invalid session cookie.
406 Not Acceptable The request specified an Accept header with an incompatible content type.

Exceptions thrown

Exception Description
BadRequestException The request must set the cookie. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
NotFoundException The specified session was not found, possibly because of an invalid session cookie. (HTTP response code 404.)
ForbiddenException The request specified an Accept header with an incompatible content type. (HTTP response code 406.)

Asynchronous

Register a callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL if it is not already registered by sending a GET request to the callback URL. The service passes a random alphanumeric challenge string via the challenge_string query parameter of the request. The request includes an Accept header that specifies text/plain as the required response type.

To be registered successfully, the callback URL must respond to the GET request from the service. The response must send status code 200 and must include the challenge string in its body. Set the Content-Type response header to text/plain. Upon receiving this response, the service responds to the original POST registration request with response code 201 and a RegisterStatus object (a Java RecognitionCallback object) that has a status of created.

The service sends only a single GET request to the callback URL. If the service does not receive a reply with a response code of 200 and a body that echoes the challenge string sent by the service within five seconds, it does not white-list the URL; it instead sends status code 400 in response to the POST registration request. If the requested callback URL is already white-listed, the service responds to the initial registration request with response code 200 and a RegisterStatus object (a Java RecognitionCallback object) that has a status of already created.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of the challenge string in its response to the POST request. It sends this signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time. For more information, see Registering a callback URL.


POST /v1/register_callback

registerCallback(params, callback())

ServiceCall<RecognitionCallback> registerCallback(String callbackUrl, String secret)

Request

Parameter Description
callback_url query string An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.
user_secret query string A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.
Parameter Description
callback_url string An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.
user_secret string A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.
Parameter Description
callbackUrl string An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.
secret string A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you pass a value of null, the service does not send the header.

Example request


curl -X POST -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/register_callback?callback_url=http://{user_callback_path}/results&user_secret=ThisIsMySecret"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'callback_url': 'http://{user_callback_path}/results',
  'user_secret': 'ThisIsMySecret'
};

speech_to_text.registerCallback(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognitionCallback callback = service.registerCallback("http://{user_callback_path}/results",
  "ThisIsMySecret").execute();
System.out.println(callback);

Response

Returns a Java RecognitionCallback object that contains the information about the new callback that is provided in a JSON RegisterStatus object.

RegisterStatus (Java RecognitionCallback object)
Name Description
status string The current status of the registration: created if the callback URL was successfully white-listed as a result of the call or already created if the URL was already white-listed.
url string The callback URL that is successfully registered.

Response codes

Status Description
200 OK The callback was already registered (white-listed). The status included in the response is already created.
201 Created The callback was successfully registered (white-listed). The status included in the response is created.
400 Bad Request The callback registration failed. The request was missing a required parameter or specified an invalid argument; the client sent an invalid response to the service's GET request during the registration process; or the client failed to respond to the server's request before the five-second timeout.
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
BadRequestException The callback registration failed. The request was missing a required parameter or specified an invalid argument; the client sent an invalid response to the service's GET request during the registration process; or the client failed to respond to the server's request before the five-second timeout. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Example response


{
  "status": "created",
  "url": "http://{user_callback_path}/results"
}

Unregister a callback

Unregisters a callback URL that was previously white-listed with the Register a callback method for use with the asynchronous interface. Once unregistered, the URL can no longer be used with asynchronous recognition requests.

Not yet supported.


POST /v1/unregister_callback

unregisterCallback(params, callback())

Request

Parameter Description
callback_url query string The callback URL that is to be unregistered.

Example request


curl -X POST -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/unregister_callback?callback_url=http://{user_callback_path}/results"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'callback_url': 'http://{user_callback_path}/results'
};

speech_to_text.unregisterCallback(params, function(error, response) {
  if (error)
    console.log('Error:', error);
});

Response

No response body.

Response codes

Status Description
200 OK The callback URL was successfully unregistered.
400 Bad Request The request failed because of a user input error (for example, because it failed to pass a callback URL).
404 Not Found The specified callback_url was not found.
503 Service Unavailable The service is currently unavailable.

Create a job

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url callbackUrl parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token userToken parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url callbackUrl, events, and user_token userToken parameters. You must then use the Check jobs or Check a job method to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl resultsTtl parameter to specify how long the results are to remain available after the job is complete. For detailed usage information about the two approaches, including callback notifications, see Creating a job. Note that using the HTTPS Check a job method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic recognition parameters as all recognition methods. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding. (For the audio/l16 format, you can specify the endianness.)


POST /v1/recognitions

createJob(params, callback())

ServiceCall<RecognitionJob> createRecognitionJob(File audio, RecognizeOptions recognizeOptions,
  RecognitionJobOptions recognitionJobOptions)

Request

The method supports the parameters common to all recognition requests; see the request parameters for the sessionless Recognize audio method for a list of supported parameters. It also supports the following parameters specific to the asynchronous interface.

Parameter Description
callback_url query string A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the POST register_callback method. Omit the parameter to poll the service for job completion and results.

You can include the same callback URL with any number of job creation requests. Use the user_token query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.
events query string[, string...] If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are
  • recognitions.started generates a callback notification when the service begins to process the job.
  • recognitions.completed generates a callback notification when the job is complete. You must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted.
  • recognitions.completed_with_results generates a callback notification when the job is complete. The notification includes the results of the request.
  • recognitions.failed generates a callback notification if the service experiences an error while processing the job.
Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events.

If the job does not include a callback URL, omit the parameter.
user_token query string If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job. The token allows the user to maintain an internal mapping between jobs and notification events.

If the job does not include a callback URL, omit the parameter.
results_ttl query integer The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.
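The constraints on the events list can be checked on the client before the job is created. A minimal sketch (checkEvents and VALID_EVENTS are illustrative, not part of the service or any SDK; the event names and the incompatibility rule are those described above):

```javascript
// Sketch: validate an events list for a job creation request.
// Returns null if the list is acceptable, or a message describing the problem.
const VALID_EVENTS = [
  'recognitions.started',
  'recognitions.completed',
  'recognitions.completed_with_results',
  'recognitions.failed'
];

function checkEvents(events) {
  for (const e of events) {
    if (!VALID_EVENTS.includes(e)) return `unknown event: ${e}`;
  }
  // recognitions.completed and recognitions.completed_with_results
  // are incompatible; only one of the two may be subscribed.
  if (events.includes('recognitions.completed') &&
      events.includes('recognitions.completed_with_results')) {
    return 'recognitions.completed and recognitions.completed_with_results are incompatible';
  }
  return null;
}
```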

Pass the audio file to be transcribed via the method's audio argument. The method supports the parameters common to all recognition requests, which you pass as a RecognizeOptions object via the options argument; see the Recognize audio method for a list of supported parameters. The method also supports the following parameters specific to the asynchronous interface, which you pass as a RecognitionJobOptions object with the recognitionJobOptions argument.

Parameter Description
callback_url callbackUrl string A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the POST register_callback method. Omit the parameter to poll the service for job completion and results.

You can include the same callback URL with any number of job creation requests. Use the user_token query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.
events string[, string...] string[ ] If the job includes a callback URL, a list of notification events to which to subscribe. Specify the events either as a comma-separated list of strings or as an array of strings. Valid events are
  • recognitions.started generates a callback notification when the service begins to process the job.
  • recognitions.completed generates a callback notification when the job is complete. You must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted.
  • recognitions.completed_with_results generates a callback notification when the job is complete. The notification includes the results of the request.
  • recognitions.failed generates a callback notification if the service experiences an error while processing the job.
Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events.

If the job does not include a callback URL, omit the parameter.
user_token userToken string If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job. The token allows the user to maintain an internal mapping between jobs and notification events.

If the job does not include a callback URL, omit the parameter.
results_ttl resultsTtl integer The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

Example request


curl -X POST -u "{username}":"{password}"
--header "Content-Type: audio/flac"
--data-binary "@audio-file.flac"
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions?callback_url=http://{user_callback_path}/results&user_token=job25&timestamps=true"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var fs = require('fs');

var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'content_type': 'audio/flac',
  audio: fs.createReadStream('audio-file.flac'),
  'callback_url': 'http://{user_callback_path}/results',
  'user_token': 'job25',
  timestamps: true
};

speech_to_text.createJob(params, function(error, job) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(job, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognizeOptions recognizeOptions = new RecognizeOptions.Builder().contentType("audio/flac")
  .timestamps(true).build();

RecognitionJobOptions jobOptions = new RecognitionJobOptions.Builder().userToken("job25")
  .build();

RecognitionJob job = service.createRecognitionJob(new File("audio-file.flac"),
  recognizeOptions, jobOptions).execute();
System.out.println(job);

Response

Returns a Java RecognitionJob object that contains the information about the new job that is provided in a JSON RecognitionJob object.

RecognitionJob (Java RecognitionJob object)
Name Description
id string The identifier of the job.
status string The current status of the job, which is waiting when the job is initially created. Other possible statuses are processing, completed, and failed.
created string The date and time in Coordinated Universal Time (UTC) at which the job was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
url string The URL to use to request information about the job with the GET recognitions/{id} method.
warnings string[ ] An array of warning messages about invalid parameters included with the request. Each warning includes a descriptive message and a list of invalid argument strings, for example, "unexpected query parameter 'user_token', query parameter 'callback_url' was not specified". The request succeeds despite the warnings.

Response codes

Status Description
201 Created The job was successfully created.
400 Bad Request The request failed because of a user input error. For example, the request passed audio that does not match the indicated format, specified a callback URL that has not been white-listed, specified a custom language or custom acoustic model that is not in the available state, or specified both the recognitions.completed and recognitions.completed_with_results events. Specific messages include
  • Model {model} not found
  • Requested model is not available
  • This 8000hz audio input requires a narrow band model. See /v1/models for a list of available models.
  • speaker_labels is not a supported feature for model {model}
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
BadRequestException The request failed because of a user input error. For example, the request passed audio that does not match the indicated format, specified a callback URL that has not been white-listed, specified a custom language or custom acoustic model that is not in the available state, or specified both the recognitions.completed and recognitions.completed_with_results events. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Example response


{
  "id": "4bd734c0-e575-21f3-de03-f932aa0468a0",
  "status": "waiting",
  "created": "2016-08-17T19:15:17.926Z",
  "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions/4bd734c0-e575-21f3-de03-f932aa0468a0"
}

Check jobs

Returns the status and ID of the latest 100 outstanding jobs associated with the service credentials with which it is called. The method also returns the creation and update times of each job, and, if a job was created with a callback URL and a user token, the user token for the job. To obtain the results for a job whose status is completed or not one of the latest 100 outstanding jobs, use the Check a job method. A job and its results remain available until you delete them with the Delete a job method or until the job's time to live expires, whichever comes first.


GET /v1/recognitions

checkJobs(params, callback())

ServiceCall<List<RecognitionJob>> getRecognitionJobs()

Request

No arguments.

Example request


curl -X GET -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

speech_to_text.checkJobs(null, function(error, jobs) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(jobs, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

List<RecognitionJob> jobs = service.getRecognitionJobs().execute();
System.out.println(jobs);

Response

RecognitionJobs
Name Description
recognitions object[ ] An array of RecognitionJob objects that provides the status for each of the user's current jobs. The array is empty if the user has no current jobs.

Returns a List of Java RecognitionJob objects. Each object provides the same information as a JSON RecognitionJob object. The list is empty if the user has no outstanding jobs.

RecognitionJob (Java RecognitionJob object)
Name Description
id string The identifier of the job.
status string The current status of the job:
  • waiting: The service is preparing the job for processing. The service returns this status when the job is initially created or when it is waiting for capacity to process the job. The job remains in this state until the service has the capacity to begin processing it.
  • processing: The service is actively processing the job.
  • completed: The service has finished processing the job. If the job specified a callback URL and the event recognitions.completed_with_results, the service sent the results with the callback notification. Otherwise, use the GET recognitions/{id} method to retrieve the results.
  • failed: The job failed.
created string The date and time in Coordinated Universal Time (UTC) at which the job was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
updated string The date and time in Coordinated Universal Time (UTC) at which the job was last updated by the service. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
user_token string The user token associated with the job, if the job was created with a callback URL and a user token.

Response codes

Status Description
200 OK The request succeeded.
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Example response


{
  "recognitions": [
    {
      "id": "4bd734c0-e575-21f3-de03-f932aa0468a0",
      "created": "2016-08-17T19:15:17.926Z",
      "updated": "2016-08-17T19:15:17.926Z",
      "status": "waiting",
      "user_token": "job25"
    },
    {
      "id": "4bb1dca0-f6b1-11e5-80bc-71fb7b058b20",
      "created": "2016-08-17T19:13:23.622Z",
      "updated": "2016-08-17T19:13:24.434Z",
      "status": "processing"
    },
    {
      "id": "398fcd80-330a-22ba-93ce-1a73f454dd98",
      "created": "2016-08-17T19:11:04.298Z",
      "updated": "2016-08-17T19:11:16.003Z",
      "status": "completed"
    }
  ]
}

[
  {
    "id": "4bd734c0-e575-21f3-de03-f932aa0468a0",
    "created": "2016-08-17T19:15:17.926Z",
    "updated": "2016-08-17T19:15:17.926Z",
    "status": "waiting",
    "user_token": "job25"
  },
  {
    "id": "4bb1dca0-f6b1-11e5-80bc-71fb7b058b20",
    "created": "2016-08-17T19:13:23.622Z",
    "updated": "2016-08-17T19:13:24.434Z",
    "status": "processing"
  },
  {
    "id": "398fcd80-330a-22ba-93ce-1a73f454dd98",
    "created": "2016-08-17T19:11:04.298Z",
    "updated": "2016-08-17T19:11:16.003Z",
    "status": "completed"
  }
]
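A response like the examples above can be post-processed on the client. The following hypothetical helper, not part of the SDK, groups the returned jobs by status, for example to decide which results are ready to fetch and which jobs can be deleted:

```javascript
// Hypothetical helper: groups the jobs returned by the Check jobs method
// by their status. Field names match the JSON RecognitionJobs response.
function groupJobsByStatus(recognitions) {
  var groups = { waiting: [], processing: [], completed: [], failed: [] };
  recognitions.forEach(function (job) {
    if (groups[job.status]) {
      groups[job.status].push(job.id);
    }
  });
  return groups;
}
```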

Check a job

Returns information about a specified job. The response always includes the status of the job and its creation and update times. If the status is completed, the response also includes the results of the recognition request. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available. Use the Check jobs method to request information about the most recent jobs associated with the calling user.
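For jobs created without a callback URL, the client must poll for completion as described above. The sketch below shows one possible polling loop; checkFn is a placeholder for a call to the Check a job method (for example, a wrapper around speech_to_text.checkJob), and intervalMs is the delay between polls:

```javascript
// Hypothetical polling helper: repeatedly invokes a check function until
// the job reaches a terminal status (completed or failed), then invokes
// the done callback with the final job object.
function pollUntilDone(checkFn, intervalMs, done) {
  checkFn(function (error, job) {
    if (error) return done(error);
    if (job.status === 'completed' || job.status === 'failed') {
      return done(null, job);
    }
    // Job is still waiting or processing; check again after the interval.
    setTimeout(function () {
      pollUntilDone(checkFn, intervalMs, done);
    }, intervalMs);
  });
}
```

Choose a polling interval appropriate to the length of the audio; very frequent polling adds load without returning results sooner.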


GET /v1/recognitions/{id}

checkJob(params, callback())

ServiceCall<RecognitionJob> getRecognitionJob(String id)

Request

Parameter Description
id path string The identifier of the job whose status is to be checked.
Parameter Description
id string The identifier of the job whose status is to be checked.

Example request


curl -X GET -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions/{id}"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

params = {
  id: '{job_id}'
};

speech_to_text.checkJob(params, function(error, job) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(job, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

RecognitionJob job = service.getRecognitionJob("{job_id}").execute();
System.out.println(job);

Response

Returns a single Java RecognitionJob object for the specified job. The information is the same as that described for the JSON RecognitionJob object.

RecognitionJob (Java RecognitionJob object)
Name Description
id string The identifier of the job.
status string The current status of the job:
  • waiting: The service is preparing the job for processing. The service returns this status when the job is initially created or when it is waiting for capacity to process the job. The job remains in this state until the service has the capacity to process it.
  • processing: The service is actively processing the job.
  • completed: The service has finished processing the job. If the job specified a callback URL and the event recognitions.completed_with_results, the service sent the results with the callback notification. Otherwise, use the GET recognitions/{id} method to retrieve the results.
  • failed: The job failed.
created string The date and time in Coordinated Universal Time (UTC) at which the job was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
updated string The date and time in Coordinated Universal Time (UTC) at which the job was last updated by the service. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
results object[ ] If the status is completed, the results of the recognition request as a List that contains one or more Java SpeechResults objects depending on the input. Each object has the same information as a JSON SpeechRecognitionResults object.

Response codes

Status Description
200 OK The request succeeded.
404 Not Found The specified job id was not found.
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
NotFoundException The specified job id was not found. (HTTP response code 404.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Example response


{
  "id": "4bd734c0-e575-21f3-de03-f932aa0468a0",
  "results": [
    {
      "result_index": 0,
      "results": [
        {
          "final": true,
          "alternatives": [
            {
              "transcript": "several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday ",
              "timestamps": [
                [
                  "several",
                  1,
                  1.52
                ],
                [
                  "tornadoes",
                  1.52,
                  2.15
                ],
                . . .
                [
                  "Sunday",
                  5.74,
                  6.33
                ]
              ],
              "confidence": 0.885
            }
          ]
        }
      ]
    }
  ],
  "created": "2016-08-17T19:11:04.298Z",
  "updated": "2016-08-17T19:11:16.003Z",
  "status": "completed"
}

Delete a job

Deletes the specified job. You cannot delete a job that the service is actively processing. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.
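The time to live mentioned above can be estimated on the client. The helper below is a sketch that assumes the clock starts from the job's updated timestamp when the job finishes; resultsTtlMinutes is the results_ttl value passed at job creation, defaulting to one week:

```javascript
// Hypothetical helper: estimates when a finished job's results expire,
// given the job's `updated` timestamp (full ISO 8601) and the results_ttl
// value in minutes used when the job was created.
var DEFAULT_TTL_MINUTES = 7 * 24 * 60; // one week

function resultsExpireAt(updated, resultsTtlMinutes) {
  var ttl = resultsTtlMinutes || DEFAULT_TTL_MINUTES;
  return new Date(Date.parse(updated) + ttl * 60 * 1000);
}
```

The service's own deletion schedule is authoritative; treat this estimate as a hint for when to fetch or re-request results.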


DELETE /v1/recognitions/{id}

deleteJob(params, callback())

ServiceCall<Void> deleteRecognitionJob(String id)

Request

Parameter Description
id path string The identifier of the job that is to be deleted.
Parameter Description
id string The identifier of the job that is to be deleted.

Example request


curl -X DELETE -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions/{id}"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

params = {
  id: '{job_id}'
};

speech_to_text.deleteJob(params, function(error) {
  if (error)
    console.log('Error:', error);
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.deleteRecognitionJob("{job_id}").execute();

Response

No response body.

Response codes

Status Description
204 No Content The job was successfully deleted.
400 Bad Request The service cannot delete a job that it is actively processing:
  • Unable to delete the processing job
404 Not Found The specified job id was not found.
503 Service Unavailable The service is currently unavailable.

Exceptions thrown

Exception Description
BadRequestException The service cannot delete a job that it is actively processing. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
NotFoundException The specified job id was not found. (HTTP response code 404.)
ServiceUnavailableException The service is currently unavailable. (HTTP response code 503.)

Custom language models

Create a custom language model

Creates a new custom language model for a specified base model. The custom language model can be used only with the base model for which it is created. The model is owned by the instance of the service whose credentials are used to create it.


POST /v1/customizations

createLanguageModel(params, callback)

ServiceCall<Customization> createCustomization(String name, SpeechModel baseModel,
  String description)

Request

Parameter Description
Content-Type header string The type of the input, application/json.
create_language_model body object A JSON CreateLanguageModel object that provides basic information about the new custom language model.
CreateLanguageModel
Parameter Description
name string A user-defined name for the new custom language model. Use a name that is unique among all custom language models that you own. Use a localized name that matches the language of the custom model. Use a name that describes the domain of the custom model, such as Medical custom model or Legal custom model.
base_model_name string The name of the language model that is to be customized by the new custom language model. The new custom model can be used only with the base model that it customizes. To determine whether a base model supports custom language models, use the Get models method and check that the attribute custom_language_model is set to true. You can also refer to Language support for customization.
dialect string The dialect of the specified language that is to be used with the custom language model. The parameter is meaningful only for Spanish models, for which the service creates a custom language model that is suited for speech in one of the following dialects:
  • es-ES for Castilian Spanish (the default)
  • es-LA for Latin American Spanish
  • es-US for North American (Mexican) Spanish
A specified dialect must be valid for the base model. By default, the dialect matches the language of the base model; for example, en-US for either of the US English language models.
description string A description of the new custom language model. Use a localized description that matches the language of the custom model.
Parameter Description
name string The name of the new custom language model. Use a name that is unique among all custom language models that you own. Use a localized name that matches the language of the custom model.
base_model_name string The name of the language model that is to be customized by the new custom language model. The new custom model can be used only with the base model that it customizes. To determine whether a base model supports custom language models, use the Get models method and check that the attribute custom_language_model is set to true. You can also refer to Language support for customization.
baseModel object A Java SpeechModel object that identifies the base model that is to be customized by the new model. The new custom model can be used only with the base model that it customizes. To determine whether a base model supports custom language models, use the Get models method and check that the attribute custom_language_model is set to true. You can also refer to Language support for customization.
content_type string The type of the input, application/json.
dialect string The dialect of the specified language that is to be used with the custom language model. The parameter is meaningful only for Spanish models, for which the service creates a custom language model that is suited for speech in one of the following dialects:
  • es-ES for Castilian Spanish (the default)
  • es-LA for Latin American Spanish
  • es-US for North American (Mexican) Spanish
A specified dialect must be valid for the base model. By default, the dialect matches the language of the base model; for example, en-US for either of the US English language models.
description string A description of the new custom language model. Use a localized description that matches the language of the custom model.
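The dialect rules in the tables above can be sketched as a client-side check. The helper below is hypothetical and assumes that base model names begin with the five-character language identifier (for example, en-US_BroadbandModel); the service's own validation is authoritative:

```javascript
// Hypothetical helper: resolves the dialect for a new custom language
// model. Only Spanish (es-ES) base models accept an explicit dialect;
// all other models default to the base model's own language.
var SPANISH_DIALECTS = ['es-ES', 'es-LA', 'es-US'];

function resolveDialect(baseModelName, dialect) {
  var language = baseModelName.substring(0, 5); // e.g. 'en-US' or 'es-ES'
  if (!dialect) {
    // By default, the dialect matches the language of the base model.
    return language;
  }
  if (language !== 'es-ES' || SPANISH_DIALECTS.indexOf(dialect) === -1) {
    throw new Error("Invalid dialect value '" + dialect +
      "' specified for language '" + language + "'");
  }
  return dialect;
}
```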

Example request


curl -X POST -u "{username}":"{password}"
--header "Content-Type: application/json"
--data "{\"name\": \"Example model\",
  \"base_model_name\": \"en-US_BroadbandModel\",
  \"description\": \"Example custom language model\"}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  name: 'Example model',
  'base_model_name': 'en-US_BroadbandModel',
  description: 'Example custom language model',
  'content_type': 'application/json'
};

speech_to_text.createLanguageModel(params, function(error, customization) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(customization, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

Customization customization = service.createCustomization("Example model",
  SpeechModel.EN_US_BROADBANDMODEL, "Example custom language model").execute();
System.out.println(customization);

Response

LanguageModel (Java Customization object)
Name Description
customization_id string The customization ID (GUID) of the new custom language model.

Response codes

Status Description
201 Created The custom language model was successfully created.
400 Bad Request A required parameter is null or invalid. Specific failure messages include:
  • Required parameter '{name}' is missing
  • Required parameter '{name}' cannot be empty string
  • Required parameter '{name}' cannot be null
  • The base model '{name}' is not recognized
  • Invalid dialect value '{dialect}' specified for language '{language}'
  • Customization is not supported for base model '{name}'
401 Unauthorized The specified service credentials are invalid.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException A required parameter is null or invalid. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{
  "customization_id": "74f4807e-b5ff-4866-824e-6bba1a84fe96"
}

List custom language models

Lists information about all custom language models that are owned by an instance of the service. Use the language parameter to see all custom language models for the specified language; omit the parameter to see all custom language models for all languages. You must use credentials for the instance of the service that owns a model to list information about it.


GET /v1/customizations

listLanguageModels(params, callback)

ServiceCall<List<Customization>> getCustomizations(String language)

Request

Parameter Description
language query string The identifier of the language for which custom language models are to be returned (for example, en-US). Omit the parameter to see all custom language models owned by the requesting service credentials.
Parameter Description
language string The identifier of the language for which custom language models are to be returned (for example, en-US). Pass a value of null to see all custom language models owned by the requesting service credentials.

Example request


curl -X GET -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

speech_to_text.listLanguageModels(null, function(error, customizations) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(customizations, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

List<Customization> customizations = service.getCustomizations(null).execute();
System.out.println(customizations);

Response

LanguageModels
Name Description
customizations object[ ] An array of LanguageModel objects that provides information about each available custom language model. The array is empty if the requesting service credentials own no custom language models (if no language is specified) or own no custom language models for the specified language.

Returns a List of Java Customization objects. Each object provides the same information as a JSON LanguageModel object. The list is empty if the requesting service credentials own no custom language models.

LanguageModel (Java Customization object)
Name Description
customization_id string The customization ID (GUID) of the custom language model.
created string The date and time in Coordinated Universal Time (UTC) at which the custom language model was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
language string The language identifier of the custom language model (for example, en-US).
dialect string The dialect of the language for the custom language model. By default, the dialect matches the language of the base model; for example, en-US for either of the US English models. For Spanish models, the field indicates the dialect for which the model was created:
  • es-ES for Castilian Spanish (the default)
  • es-LA for Latin American Spanish
  • es-US for North American (Mexican) Spanish
versions string[ ] A list of the available versions of the custom language model. Each element of the array indicates a version of the base model with which the custom model can be used. Multiple versions exist only if the custom model has been upgraded; otherwise, only a single version is shown.
owner string The GUID of the service credentials for the instance of the service that owns the custom language model.
name string The name of the custom language model.
description string The description of the custom language model.
base_model_name string The name of the language model for which the custom language model was created.
status string The current status of the custom language model:
  • pending indicates that the model was created but is waiting either for training data to be added or for the service to finish analyzing added data.
  • ready indicates that the model contains data and is ready to be trained.
  • training indicates that the model is currently being trained.
  • available indicates that the model is trained and ready to use.
  • upgrading indicates that the model is currently being upgraded.
  • failed indicates that training of the model failed.
progress integer A percentage that indicates the progress of the custom language model's current training. A value of 100 means that the model is fully trained.
Note: The progress field does not currently reflect the progress of the training. The field changes from 0 to 100 when training is complete.
warnings string If the request included unknown query parameters, the following message:
  • Unexpected query parameter(s) [parameters] detected
where parameters is a list that includes a quoted string for each unknown parameter.
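Because only models in the available state can be used for recognition, a client typically filters the response by status. The following hypothetical helper, not part of the SDK, selects the usable models from a LanguageModels response:

```javascript
// Hypothetical helper: from a JSON LanguageModels response, selects the
// customization IDs of models that are trained and ready for use
// (status 'available').
function availableCustomizations(response) {
  return response.customizations
    .filter(function (model) { return model.status === 'available'; })
    .map(function (model) { return model.customization_id; });
}
```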

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request A required parameter is null or invalid. Specific failure messages include:
  • Language '{language}' is not supported for customization
401 Unauthorized The specified service credentials are invalid.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException The specified language is not supported. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{
  "customizations": [
    {
      "customization_id": "74f4807e-b5ff-4866-824e-6bba1a84fe96",
      "created": "2016-06-01T18:42:25.324Z",
      "language": "en-US",
      "dialect": "en-US",
      "versions": [
        "en-US_BroadbandModel.v07-06082016.06202016",
        "en-US_BroadbandModel.v2017-11-15"
      ],
      "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
      "name": "Example model",
      "description": "Example custom language model",
      "base_model_name": "en-US_BroadbandModel",
      "status": "pending",
      "progress": 0
    },
    {
      "customization_id": "8391f918-3b76-e109-763c-b7732fae4829",
      "created": "2016-06-01T18:51:37.291Z",
      "language": "en-US",
      "dialect": "en-US",
      "versions": [
        "en-US_BroadbandModel.v2017-11-15"
      ],
      "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
      "name": "Example model two",
      "description": "Example custom language model two",
      "base_model_name": "en-US_BroadbandModel",
      "status": "available",
      "progress": 100
    }
  ]
}

[
  {
    "customization_id": "74f4807e-b5ff-4866-824e-6bba1a84fe96",
    "created": "2016-06-01T18:42:25.324Z",
    "language": "en-US",
    "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
    "name": "Example model",
    "description": "Example custom language model",
    "base_model_name": "en-US_BroadbandModel",
    "status": "pending",
    "progress": 0
  },
  {
    "customization_id": "8391f918-3b76-e109-763c-b7732fae4829",
    "created": "2016-06-01T18:51:37.291Z",
    "language": "en-US",
    "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
    "name": "Example model two",
    "description": "Example custom language model two",
    "base_model_name": "en-US_NarrowbandModel",
    "status": "available",
    "progress": 100
  }
]

List a custom language model

Lists information about a specified custom language model. You must use credentials for the instance of the service that owns a model to list information about it.


GET /v1/customizations/{customization_id}

getLanguageModel(params, callback)

ServiceCall<Customization> getCustomization(String customizationId)

Request

Parameter Description
customization_id path string The GUID of the custom language model about which information is to be returned. You must make the request with service credentials created for the instance of the service that owns the custom model.
Parameter Description
customization_id customizationId string The GUID of the custom language model for which information is to be returned. You must make the request with service credentials created for the instance of the service that owns the custom model.

Example request


curl -X GET -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.getLanguageModel(params, function(error, customization) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(customization, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

Customization customization = service.getCustomization({customizationId}).execute();
System.out.println(customization);

Response

Returns a single instance of a LanguageModel object that provides information about the specified custom language model.

Returns a single Java Customization object for the specified custom language model. The information is the same as that described for the JSON LanguageModel object.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: '{customization_id}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException The specified customization ID is invalid. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{
  "customization_id": "74f4807e-b5ff-4866-824e-6bba1a84fe96",
  "created": "2016-06-01T18:42:25.324Z",
  "language": "en-US",
  "dialect": "en-US",
  "versions": [
    "en-US_BroadbandModel.v07-06082016.06202016",
    "en-US_BroadbandModel.v2017-11-15"
  ],
  "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
  "name": "Example model",
  "description": "Example custom language model",
  "base_model_name": "en-US_BroadbandModel",
  "status": "pending",
  "progress": 0
}

Train a custom language model

Initiates the training of a custom language model with new corpora, custom words, or both. After adding, modifying, or deleting corpora or words for a custom language model, use this method to begin the actual training of the model on the latest data. You can specify whether the custom language model is to be trained with all words from its words resource or only with words that were added or modified by the user. You must use credentials for the instance of the service that owns a model to train it.

The training method is asynchronous. It can take on the order of minutes to complete depending on the amount of data on which the service is being trained and the current load on the service. The method returns an HTTP 200 response code to indicate that the training process has begun.

You can monitor the status of the training by using the List a custom language model method to poll the model's status. Use a loop to check the status every 10 seconds. The method returns a LanguageModel object that includes status and progress fields. A status of available means that the custom model is trained and ready to use. The service cannot accept subsequent training requests, or requests to add new corpora or words, until the existing request completes.
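The polling loop described above can be sketched in Node as follows. The generic `pollStatus` helper and its wiring to `getLanguageModel` are illustrative only and are not part of the SDK; they assume the `speech_to_text` client and `params` objects from the examples in this reference.

```javascript
// Hypothetical helper: repeatedly invoke a status check until the model
// reaches a terminal state ('available' or 'failed'), waiting intervalMs
// milliseconds between checks. The check function follows the Node
// error-first callback convention.
function pollStatus(check, intervalMs, done) {
  check(function(error, status) {
    if (error) return done(error);
    if (status === 'available' || status === 'failed') return done(null, status);
    setTimeout(function() { pollStatus(check, intervalMs, done); }, intervalMs);
  });
}

// Example wiring (assumes the speech_to_text client and params from the
// examples in this reference):
//
// pollStatus(function(cb) {
//   speech_to_text.getLanguageModel(params, function(error, model) {
//     cb(error, model && model.status);
//   });
// }, 10000, function(error, status) {
//   console.log(error || 'Training finished with status: ' + status);
// });
```

An interval of 10000 milliseconds matches the recommended 10-second polling interval.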

Training can fail to start for the following reasons:

  • The service is currently handling another request for the custom model, such as another training request or a request to add a corpus or words to the model.

  • No training data (corpora or words) has been added to the custom model.

  • One or more words that were added to the custom model have invalid sounds-like pronunciations that you must fix.


POST /v1/customizations/{customization_id}/train

trainLanguageModel(params, callback)

ServiceCall<Void> trainCustomization(String customizationId,
  Customization.WordTypeToAdd wordTypeToAdd)

Request

Parameter Description
customization_id path string The GUID of the custom language model that is to be trained. You must make the request with service credentials created for the instance of the service that owns the custom model.
word_type_to_add query string The type of words from the custom language model's words resource on which to train the model:
  • all (the default) trains the model on all new words, regardless of whether they were extracted from corpora or were added or modified by the user.
  • user trains the model only on new words that were added or modified by the user; the model is not trained on new words extracted from corpora.
customization_weight query double Specifies a customization weight for the custom language model. The customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for speech recognition. Specify a value between 0.0 and 1.0. The default value is 0.3.

The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

The value that you assign is used for all recognition requests that use the model. You can override it for any recognition request by specifying a customization weight for that request.
Parameter Description
customization_id customizationId string The GUID of the custom language model that is to be trained. You must make the request with service credentials created for the instance of the service that owns the custom model.
word_type_to_add string The type of words from the custom language model's words resource on which to train the model:
  • all (the default) trains the model on all new words, regardless of whether they were extracted from corpora or were added or modified by the user.
  • user trains the model only on new words that were added or modified by the user; the model is not trained on new words extracted from corpora.
wordTypeToAdd Customization.WordTypeToAdd The type of words from the custom language model's words resource on which to train the model:
  • WordTypeToAdd.ALL (the default) trains the model on all new words, regardless of whether they were extracted from corpora or were added or modified by the user.
  • WordTypeToAdd.USER trains the model only on new words that were added or modified by the user; the model is not trained on new words extracted from corpora.
customization_weight double Specifies a customization weight for the custom language model. The customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for speech recognition. Specify a value between 0.0 and 1.0. The default value is 0.3.

The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

The value that you assign is used for all recognition requests that use the model. You can override it for any recognition request by specifying a customization weight for that request.
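As a sketch of how the optional parameters above combine, the params object below trains only on user-added words with a higher-than-default customization weight. The parameter names are taken from the tables above; the values are illustrative, not recommendations.

```javascript
// Illustrative params object for trainLanguageModel. word_type_to_add and
// customization_weight are the optional parameters described above.
var trainParams = {
  customization_id: '{customization_id}',
  word_type_to_add: 'user',     // train only on words added or modified by the user
  customization_weight: 0.4     // weight custom-model words more than the 0.3 default
};
```

Pass a params object like this to trainLanguageModel in place of the minimal params shown in the example request.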

Example request


curl -X POST -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/train"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.trainLanguageModel(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.trainCustomization({customizationId}, WordTypeToAdd.ALL).execute();

Response

An empty response body: {}.

No response body.

Response codes

Status Description
200 OK Training of the custom language model started successfully.
400 Bad Request A required parameter is null or invalid, the custom model is not ready to be trained, or the total number of words or OOV words exceeds the maximum threshold. Specific failure messages include:
  • No input data available for running training
  • Total number of words {number} exceeds maximum allowed
  • Total number of OOV words {number} exceeds {maximum}
  • Malformed GUID: '{customization_id}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException A required parameter is null or invalid, the custom model is not ready to be trained, or the total number of words or OOV words exceeds the maximum threshold. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ConflictException The service is currently busy handling a previous request for the custom model. (HTTP response code 409.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{}

Reset a custom language model

Resets a custom language model by removing all corpora and words from the model. Resetting a custom language model initializes the model to its state when it was first created. Metadata such as the name and language of the model are preserved, but the model's words resource is removed and must be re-created. You must use credentials for the instance of the service that owns a model to reset it.


POST /v1/customizations/{customization_id}/reset

resetLanguageModel(params, callback)

ServiceCall<Void> resetCustomization(String customizationId)

Request

Parameter Description
customization_id path string The GUID of the custom language model that is to be reset. You must make the request with service credentials created for the instance of the service that owns the custom model.
Parameter Description
customization_id customizationId string The GUID of the custom language model that is to be reset. You must make the request with service credentials created for the instance of the service that owns the custom model.

Example request


curl -X POST -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/reset"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.resetLanguageModel(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.resetCustomization({customizationId}).execute();

Response

An empty response body: {}.

No response body.

Response codes

Status Description
200 OK The custom language model was successfully reset.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: '{customization_id}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException The specified customization ID is invalid. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ConflictException The service is currently busy handling a previous request for the custom model. (HTTP response code 409.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{}

Upgrade a custom language model

Initiates the upgrade of a custom language model to the latest version of its base language model. The upgrade method is asynchronous. It can take on the order of minutes to complete depending on the amount of data in the custom model and the current load on the service. A custom model must be in the ready or available state to be upgraded. You must use credentials for the instance of the service that owns a model to upgrade it.

The method returns an HTTP 200 response code to indicate that the upgrade process has begun successfully. You can monitor the status of the upgrade by using the List a custom language model method to poll the model's status. Use a loop to check the status every 10 seconds. While it is being upgraded, the custom model has the status upgrading. When the upgrade is complete, the model resumes the status that it had prior to upgrade. The service cannot accept subsequent requests for the model until the upgrade completes.

For more information, see Upgrading custom models.


POST /v1/customizations/{customization_id}/upgrade_model

upgradeLanguageModel(params, callback)

ServiceCall<Void> upgradeCustomization(String customizationId)

Request

Parameter Description
customization_id customizationId path string The GUID of the custom language model that is to be upgraded. You must make the request with service credentials created for the instance of the service that owns the custom model.

Example request


curl -X POST -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/upgrade_model"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.upgradeLanguageModel(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.upgradeCustomization({customizationId}).execute();

Response

An empty response body: {}.

No response body.

Response codes

Status Description
200 OK Upgrade of the custom language model has started successfully.
400 Bad Request The specified customization ID is invalid, or the specified model cannot be upgraded:
  • Malformed GUID: '{customization_id}'
  • Custom model is up-to-date
  • No input data available to upgrade the model
  • Cannot upgrade failed custom model
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException The specified customization ID is invalid or the specified custom model cannot be upgraded. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ConflictException The service is currently busy handling a previous request for the custom model. (HTTP response code 409.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{}

Delete a custom language model

Deletes an existing custom language model. The custom model cannot be deleted if another request, such as adding a corpus to the model, is currently being processed. You must use credentials for the instance of the service that owns a model to delete it.


DELETE /v1/customizations/{customization_id}

deleteLanguageModel(params, callback)

ServiceCall<Void> deleteCustomization(String customizationId)

Request

Parameter Description
customization_id path string The GUID of the custom language model that is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.
Parameter Description
customization_id customizationId string The GUID of the custom language model that is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.

Example request


curl -X DELETE -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.deleteLanguageModel(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.deleteCustomization({customizationId}).execute();

Response

An empty response body: {}.

No response body.

Response codes

Status Description
200 OK The custom language model was successfully deleted.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: '{customization_id}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials, including the case where the custom model does not exist:
  • Invalid customization_id '{customization_id}' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException The specified customization ID is invalid. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ConflictException The service is currently busy handling a previous request for the custom model. (HTTP response code 409.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{}

Custom corpora

Add a corpus

Adds a single corpus text file of new training data to a custom language model. Use multiple requests to submit multiple corpus text files. You must use credentials for the instance of the service that owns a model to add a corpus to it. Note that adding a corpus does not affect the custom language model until you train the model for the new data by using the Train a custom language model method.

Submit a plain text file that contains sample sentences from the domain of interest to enable the service to extract words in context. The more sentences you add that represent the context in which speakers use words from the domain, the better the service's recognition accuracy. For guidelines about adding a corpus text file and for information about how the service parses a corpus file, see Preparing a corpus text file.

The call returns an HTTP 201 response code if the corpus is valid. The service then asynchronously processes the contents of the corpus and automatically extracts new words that it finds. This can take on the order of a minute or two to complete depending on the total number of words and the number of new words in the corpus, as well as the current load on the service. You cannot submit requests to add additional corpora or words to the custom model, or to train the model, until the service's analysis of the corpus for the current request completes. Use the List a corpus method to monitor the status of the analysis.

The service auto-populates the model's words resource with any word that is not found in its base vocabulary; these are referred to as out-of-vocabulary (OOV) words. You can use the List custom words method to examine the words resource. If necessary, you can use the Add custom words or Add a custom word method to correct problems, eliminate typographical errors, and modify how words are pronounced.

To add a corpus file that has the same name as an existing corpus, set the allow_overwrite allowOverwrite parameter to true; otherwise, the request fails. Overwriting an existing corpus causes the service to process the corpus text file and extract OOV words anew. Before doing so, it removes any OOV words associated with the existing corpus from the model's words resource unless they were also added by another corpus or they have been modified in some way by the user.

The service limits the overall amount of data that you can add to a custom model to a maximum of 10 million total words from all corpora combined. Also, you can add no more than 30 thousand custom (OOV) words to a model; this includes words that the service extracts from corpora and words that you add directly.
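To replace an existing corpus as described above, add the allow_overwrite flag to the request. A minimal Node sketch of the params object (illustrative values; the parameter names are those from the tables below, and the client setup from the examples in this reference is assumed):

```javascript
// Illustrative params for addCorpus when replacing an existing corpus of the
// same name; without allow_overwrite: true, the request would fail with
// "Corpus '{name}' already exists".
var overwriteParams = {
  customization_id: '{customization_id}',
  corpus_name: 'MyCorpus',  // no spaces, and not the reserved name 'user'
  corpus_file: 'Sample sentences from the domain of interest.', // a string, buffer, or readable stream
  allow_overwrite: true
};
```

Overwriting triggers a fresh analysis of the corpus file, so the same processing wait applies as for a new corpus.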


POST /v1/customizations/{customization_id}/corpora/{corpus_name}

addCorpus(params, callback)

ServiceCall<Void> addCorpus(String customizationId, String corpusName,
  File corpusFile, Boolean allowOverwrite)
ServiceCall<Void> addTextToCustomizationCorpus(String customizationId, String corpusName,
  Boolean allowOverwrite, File trainingData) DEPRECATED

Request

Parameter Description
customization_id path string The GUID of the custom language model to which a corpus is to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
corpus_name path string The name of the corpus that is to be added to the custom language model. The name cannot contain spaces and cannot be the string user, which is reserved by the service to denote custom words added or modified by the user. Use a localized name that matches the language of the custom model.
corpus_file body file A plain text file that contains the training data for the corpus. Encode the file in UTF-8 if it contains non-ASCII characters; the service assumes UTF-8 encoding if it encounters non-ASCII characters. With cURL, use the --data-binary option to upload the file for the request.
allow_overwrite query boolean Indicates whether the specified corpus is to overwrite an existing corpus with the same name. If a corpus with the same name already exists, the request fails unless allow_overwrite is set to true; by default, the parameter is false. The parameter has no effect if a corpus with the same name does not already exist.
Parameter Description
customization_id string The GUID of the custom language model to which a corpus is to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
corpus_name string The name of the corpus that is to be added. The name cannot contain spaces and cannot be the string user, which is reserved by the service to denote custom words added or modified by the user. Use a localized name that matches the language of the custom model.
corpus_file file A plain text file that contains the training data for the corpus. Provide the text as a string, a buffer, or as a readable stream; a readable stream is recommended when reading a file from disk. Encode the file in UTF-8 if it contains non-ASCII characters; the service assumes UTF-8 encoding if it encounters non-ASCII characters.
allow_overwrite boolean Indicates whether the specified corpus is to overwrite an existing corpus with the same name. If a corpus with the same name already exists, the request fails unless allow_overwrite is set to true; by default, the parameter is false. The parameter has no effect if a corpus with the same name does not already exist.
Parameter Description
customizationId string The GUID of the custom language model to which a corpus is to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
corpusName string The name of the corpus that is to be added. The name cannot contain spaces and cannot be the string user, which is reserved by the service to denote custom words added or modified by the user. Use a localized name that matches the language of the custom model.
corpusFile trainingData File A plain text file that contains the training data for the corpus. Encode the file in UTF-8 if it contains non-ASCII characters; the service assumes UTF-8 encoding if it encounters non-ASCII characters.
allowOverwrite boolean Indicates whether the specified corpus is to overwrite an existing corpus with the same name. If a corpus with the same name already exists, the request fails unless allowOverwrite is set to true; by default, the parameter is false. The parameter has no effect if a corpus with the same name does not already exist.

Example request


curl -X POST -u "{username}":"{password}" \
--data-binary "@MyCorpus.txt" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/corpora/MyCorpus"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var fs = require('fs');

var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  'corpus_name': 'MyCorpus',
  'corpus_file': fs.createReadStream('MyCorpus.txt')
};

speech_to_text.addCorpus(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.addCorpus({customizationId}, "MyCorpus", new File("MyCorpus.txt"), false)
  .execute();

Response

An empty response body: {}.

No response body.

Response codes

Status Description
201 Created Addition of the corpus data was successfully started. The service is analyzing the data.
400 Bad Request A required parameter is null or invalid, or the specified corpus already exists. Specific failure messages include:
  • Malformed GUID: '{customization_id}'
  • Corpus file not specified or empty
  • Corpus '{name}' already exists - change its name, remove existing file before adding new one, or overwrite existing file by setting 'allow_overwrite' to 'true'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request. You can also receive status code 500 Forwarding Error if the service is currently busy handling a previous request for the custom model.

Exceptions thrown

Exception Description
BadRequestException A required parameter is null or invalid, or the specified corpus already exists and allowOverwrite is not set to true. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ConflictException The service is currently busy handling a previous request for the custom model. (HTTP response code 409.)
InternalServerErrorException The service experienced an internal error. You can also receive this exception for a Forwarding Error if the service is currently busy handling a previous request for the custom model. (HTTP response code 500.)

Example response


{}

List corpora

Lists information about all corpora from a custom language model. The information includes the total number of words and out-of-vocabulary (OOV) words, name, and status of each corpus. You must use credentials for the instance of the service that owns a model to list its corpora.


GET /v1/customizations/{customization_id}/corpora

listCorpora(params, callback)

ServiceCall<List<Corpus>> getCorpora(String customizationId)

Request

Parameter Description
customization_id path string The GUID of the custom language model for which corpora are to be listed. You must make the request with service credentials created for the instance of the service that owns the custom model.
Parameter Description
customization_id customizationId string The GUID of the custom language model for which corpora are to be listed. You must make the request with service credentials created for the instance of the service that owns the custom model.

Example request


curl -X GET -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/corpora"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.listCorpora(params, function(error, corpora) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(corpora, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

List<Corpus> corpora = service.getCorpora({customizationId}).execute();
System.out.println(corpora);

Response

Corpora
Name Description
corpora object[ ] An array of Corpus objects that provides information about corpora of the custom model. The array is empty if the custom model has no corpora.

Returns a List of Java Corpus objects. Each object provides the same information as a JSON Corpus object. The list is empty if the custom model contains no corpora.

Corpus (Java Corpus object)
Name Description
name string The name of the corpus.
out_of_vocabulary_words integer The number of OOV words in the corpus. The value is 0 while the corpus is being processed.
total_words integer The total number of words in the corpus. The value is 0 while the corpus is being processed.
status string The status of the corpus:
  • analyzed indicates that the service has successfully analyzed the corpus. The custom model can be trained with data from the corpus.
  • being_processed indicates that the service is still analyzing the corpus. The service cannot accept requests to add new corpora or words, or to train the custom model.
  • undetermined indicates that the service encountered an error while processing the corpus.
error string If the status of the corpus is undetermined, the following message:
  • Analysis of corpus 'name' failed. Please try adding the corpus again by setting the 'allow_overwrite' flag to 'true'.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: '{customization_id}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException The specified customization ID is invalid. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{
  "corpora": [
    {
      "name": "corpus1",
      "out_of_vocabulary_words": 191,
      "total_words": 5037,
      "status": "analyzed"
    },
    {
      "name": "corpus2",
      "out_of_vocabulary_words": 0,
      "total_words": 0,
      "status": "being_processed"
    },
    {
      "name": "corpus3",
      "out_of_vocabulary_words": 0,
      "total_words": 0,
      "status": "undetermined",
      "error": "Analysis of corpus 'corpus3.txt' failed. Please try adding the corpus again by setting the 'allow_overwrite' flag to 'true'."
    }
  ]
}

[
  {
    "name": "corpus1",
    "out_of_vocabulary_words": 191,
    "status": "analyzed",
    "total_words": 5037
  },
  {
    "name": "corpus2",
    "out_of_vocabulary_words": 0,
    "status": "being_processed",
    "total_words": 0
  },
  {
    "name": "corpus3",
    "out_of_vocabulary_words": 0,
    "status": "undetermined",
    "total_words": 0,
    "error": "Analysis of corpus 'corpus3.txt' failed. Please try adding the corpus again by setting the 'allow_overwrite' flag to 'true'."
  }
]

List a corpus

Lists information about a corpus from a custom language model. The information includes the total number of words and out-of-vocabulary (OOV) words in the corpus, as well as the corpus's name and status. You must use credentials for the instance of the service that owns a model to list its corpora.


GET /v1/customizations/{customization_id}/corpora/{corpus_name}

getCorpus(params, callback)

ServiceCall<Corpus> getCorpus(String customizationId, String corpusName)

Request

Parameter Description
customization_id path string The GUID of the custom language model for which a corpus is to be listed. You must make the request with service credentials created for the instance of the service that owns the custom model.
corpus_name path string The name of the corpus about which information is to be listed.
Parameter Description
customization_id customizationId string The GUID of the custom language model for which a corpus is to be listed. You must make the request with service credentials created for the instance of the service that owns the custom model.
corpus_name corpusName string The name of the corpus about which information is to be listed.

Example request


curl -X GET -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/corpora/MyCorpus"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  'corpus_name': 'MyCorpus'
};

speech_to_text.getCorpus(params, function(error, corpus) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(corpus, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

Corpus corpus = service.getCorpus({customizationId}, "MyCorpus").execute();
System.out.println(corpus);

Response

Returns a single instance of a Corpus object that provides information about the specified corpus.

Returns a single Java Corpus object for the specified corpus. The information is the same as that described for the JSON Corpus object.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID or corpus name is invalid, including the case where the corpus does not exist for the custom model. Specific failure messages include:
  • Malformed GUID: '{customization_id}'
  • Invalid value for corpus name '{corpus_name}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException The specified customization ID or corpus name is invalid. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{
  "name": "MyCorpus",
  "out_of_vocabulary_words": 191,
  "total_words": 5037,
  "status": "analyzed"
}

Delete a corpus

Deletes an existing corpus from a custom language model. The service removes any out-of-vocabulary (OOV) words associated with the corpus from the custom model's words resource unless they were also added by another corpus or they have been modified in some way with the Add custom words or Add a custom word method. Removing a corpus does not affect the custom model until you train the model with the Train a custom language model method. You must use credentials for the instance of the service that owns a model to delete its corpora.


DELETE /v1/customizations/{customization_id}/corpora/{corpus_name}

deleteCorpus(params, callback)

ServiceCall<Void> deleteCorpus(String customizationId, String corpusName)

Request

Parameter Description
customization_id path string The GUID of the custom language model from which a corpus is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.
corpus_name path string The name of the corpus that is to be deleted from the custom language model.
Parameter Description
customization_id customizationId string The GUID of the custom language model from which a corpus is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.
corpus_name corpusName string The name of the corpus that is to be deleted from the custom language model.

Example request


curl -X DELETE -u "{username}":"{password}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/corpora/MyCorpus"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  'corpus_name': 'MyCorpus'
};

speech_to_text.deleteCorpus(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.deleteCorpus({customizationId}, "MyCorpus").execute();

Response

An empty response body: {}.

No response body.

Response codes

Status Description
200 OK The corpus was successfully deleted from the custom language model.
400 Bad Request The specified customization ID or corpus name is invalid, including the case where the corpus does not exist for the custom model. Specific failure messages include:
  • Malformed GUID: '{customization_id}'
  • Invalid value for corpus name '{corpus_name}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
405 Method Not Allowed No corpus name was specified with the request.
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException The specified customization ID or corpus name is invalid, including the case where the corpus does not exist for the custom model. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ServiceResponseException No corpus name was specified with the request. (HTTP response code 405.)
ConflictException The service is currently busy handling a previous request for the custom model. (HTTP response code 409.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{}

Custom words

Add custom words

Adds one or more custom words to a custom language model. The service populates the words resource for a custom model with out-of-vocabulary (OOV) words found in each corpus added to the model. You can use this method to add additional words or to modify existing words in the words resource. The words resource for a model can contain a maximum of 30,000 custom (OOV) words, including words that the service extracts from corpora and words that you add directly.

You must use credentials for the instance of the service that owns a model to add or modify custom words for the model. Adding or modifying custom words does not affect the custom model until you train the model for the new data by using the Train a custom language model method.

You add custom words by providing a CustomWords object, which is an array of CustomWord objects, one per word. You add custom words by providing an array of CustomWord objects, one per word. You add custom words by providing a comma-separated list of Word objects, one per word. You must use the object's word parameter to identify the word that is to be added. You can also provide one or both of the following optional parameters for each word:

  • The display_as The displayAs parameter provides a different way of spelling the word in a transcript. Use the parameter when you want the word to appear different from its usual representation or from its spelling in corpora training data. For example, you might indicate that the word IBM(trademark) is to be displayed as IBM™. For more information, see Using the display_as field.

  • The sounds_like parameter provides an array The soundsLike parameter provides a comma-separated list of one or more pronunciations for the word. Use the parameter to specify how the word can be pronounced by users. Use the parameter for words that are difficult to pronounce, foreign words, acronyms, and so on. For example, you might specify that the word IEEE can sound like I. triple E.. You can specify a maximum of five sounds-like pronunciations for a word. For information about pronunciation rules, see Using the sounds_like field.

If you add a custom word that already exists in the words resource for the custom model, the new definition overwrites the existing data for the word. If the service encounters an error with the input data, it returns a failure code and does not add any of the words to the words resource.

The call returns an HTTP 201 response code if the input data is valid. The call succeeds if the input data is valid. The service then asynchronously processes the words to add them to the model's words resource. The time that it takes for the analysis to complete depends on the number of new words that you add but is generally faster than adding a corpus or training a model.

You can monitor the status of the request by using the List a custom language model method to poll the model's status. Use a loop to check the status every 10 seconds. The method returns a LanguageModel object that includes a status field. A status of ready means that the words have been added to the custom model. The service cannot accept requests to add new corpora or words or to train the model until the existing request completes.
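The polling described above can be sketched as a small recursive loop with the Node SDK. This is a sketch, not a definitive implementation: the getCustomization method name and the terminal status values other than ready are assumptions based on the List a custom language model method described elsewhere in this reference.

```javascript
// Decide whether polling can stop for a given model status.
// 'ready' means the new words have been processed; 'available' and
// 'failed' are assumed to be other terminal states.
function isSettled(status) {
  return status === 'ready' || status === 'available' || status === 'failed';
}

// Sketch of a loop that checks the model's status every 10 seconds,
// as recommended above, until it settles.
function waitUntilReady(speechToText, customizationId, done) {
  speechToText.getCustomization(
    { customization_id: customizationId },
    function (error, model) {
      if (error) return done(error);
      if (isSettled(model.status)) return done(null, model.status);
      setTimeout(function () {
        waitUntilReady(speechToText, customizationId, done);
      }, 10000);
    }
  );
}
```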

You can use the List custom words or List a custom word method to review the words that you add. Words with an invalid sounds_like field include an error field that describes the problem. If necessary, you can use the Add custom words or Add a custom word method to correct problems, eliminate typographical errors, and modify how words are pronounced.
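When reviewing the words you added, you can scan the response for entries whose sounds_like analysis failed. The helper below simply filters Word objects that carry an error field; the usage comment assumes the listWords method and the words response array documented later in this reference.

```javascript
// Return only the words whose analysis produced an error field,
// for example an invalid sounds_like pronunciation.
function wordsWithErrors(words) {
  return words.filter(function (word) {
    return word.error !== undefined;
  });
}

// Usage sketch with the Node SDK's listWords method:
// speech_to_text.listWords({ customization_id: '{customization_id}' },
//   function (error, result) {
//     if (error) return console.log('Error:', error);
//     console.log(wordsWithErrors(result.words));
//   });
```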


POST /v1/customizations/{customization_id}/words

addWords(params, callback)

ServiceCall<Void> addWords(String customizationId, Word... words)

Request

Parameter Description
customization_id path string The GUID of the custom language model to which words are to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
Content-Type header string The type of the input, application/json.
custom_words body object A CustomWords object that provides information about one or more custom words that are to be added to the custom language model.
Parameter Description
customization_id string The GUID of the custom language model to which words are to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
words object[ ] An array of CustomWord objects that provides information about each custom word that is to be added to the custom model.
content_type string The type of the input, application/json.
Parameter Description
customizationId string The GUID of the custom language model to which words are to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
words object... A comma-separated list of Java Word objects, each of which provides information about a custom word to be added.
CustomWords
Name Description
words object[ ] An array of CustomWord objects that provides information about each custom word that is to be added to the custom model.
CustomWord (Java Word object)
Name Description
word string The custom word that is to be added to the custom language model. Do not include spaces in the word. Use a - (dash) or _ (underscore) to connect the tokens of compound words.
display_as displayAs string An alternative spelling for the custom word when it appears in a transcript. Use the parameter when you want the word to have a spelling that is different from its usual representation or from its spelling in corpora training data.
sounds_like soundsLike string[ ] string... An array A comma-separated list of sounds-like pronunciations for the custom word. Specify how words that are difficult to pronounce, foreign words, acronyms, and so on can be pronounced by users.
  • For a word that is not in the service's base vocabulary, omit the parameter to have the service automatically generate a sounds-like pronunciation for the word.
  • For a word that is in the service's base vocabulary, use the parameter to specify additional pronunciations for the word. You cannot override the default pronunciation of a word; pronunciations you add augment the pronunciation from the base vocabulary.
A word can have at most five sounds-like pronunciations, and a pronunciation can include at most 40 characters not including spaces.
Constructors:
  • Word()
  • Word(String word)
  • Word(String word, String displayAs)
  • Word(String word, String displayAs, String... soundsLike)
You can create an empty Word object, but you must add at least the name of the custom word before adding it to a custom model.
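Because the service rejects the entire request if any word in the input is invalid, it can help to check the documented limits client-side before calling the method: no spaces in the word itself, at most five sounds-like pronunciations, and at most 40 characters per pronunciation not counting spaces. A minimal sketch (the helper name is hypothetical):

```javascript
// Validate one custom word against the limits documented above.
// Returns an array of problem descriptions; empty means the word
// looks valid.
function validateCustomWord(customWord) {
  var problems = [];
  if (/\s/.test(customWord.word)) {
    problems.push('word contains spaces; use - or _ between tokens');
  }
  var soundsLike = customWord.sounds_like || [];
  if (soundsLike.length > 5) {
    problems.push('more than five sounds_like pronunciations');
  }
  soundsLike.forEach(function (pron) {
    if (pron.replace(/ /g, '').length > 40) {
      problems.push("pronunciation '" + pron + "' exceeds 40 characters");
    }
  });
  return problems;
}
```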

Example request


curl -X POST -u "{username}":"{password}"
--header "Content-Type: application/json"
--data "{\"words\":
  [{\"word\": \"HHonors\", \"sounds_like\": [\"hilton honors\", \"H. honors\"], \"display_as\": \"HHonors\"},
  {\"word\": \"IEEE\", \"sounds_like\": [\"I. triple E.\"]}]}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  words: [
    {word: 'HHonors', 'sounds_like': ['hilton honors', 'H. honors'], 'display_as': 'HHonors'},
    {word: 'IEEE', 'sounds_like': ['I. triple E.']}
  ],
  'content_type': 'application/json'
};

speech_to_text.addWords(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.addWords({customizationId},
  new Word("HHonors", "HHonors", "hilton honors", "H. honors"),
  new Word("IEEE", null, "I. triple E."))
.execute();

Response

An empty response body: {}.

No response body.

Response codes

Status Description
201 Created Addition of the custom words was successfully started. The service is analyzing the data.
400 Bad Request A required parameter is null or invalid, the JSON input is invalid, or the maximum number of sounds-like pronunciations for a word is exceeded. Specific failure messages include:
  • Malformed GUID: '{customization_id}'
  • Required property '{property}' is missing in JSON '{JSON}'
  • Word '{word}' contains invalid character '{character}'
  • Maximum number of sounds-like for a word exceeded
  • Maximum number of allowed phones of one item of sounds_like for word '{word}' exceeded
  • Malformed JSON: '{JSON}'
  • Wrong type of parameter '{parameter}' detected in the passed JSON
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException A required parameter is null or invalid, the JSON input is invalid, or the maximum number of sounds-like pronunciations for a word is exceeded. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ConflictException The service is currently busy handling a previous request for the custom model. (HTTP response code 409.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{}

Add a custom word

Adds a custom word to a custom language model. The service populates the words resource for a custom model with out-of-vocabulary (OOV) words found in each corpus added to the model. You can use this method to add a word or to modify an existing word in the words resource. The words resource for a model can contain a maximum of 30,000 custom (OOV) words, including words that the service extracts from corpora and words that you add directly.

You must use credentials for the instance of the service that owns a model to add or modify a custom word for the model. Adding or modifying a custom word does not affect the custom model until you train the model for the new data by using the Train a custom language model method.

Use the word_name path parameter to specify the custom word that is to be added or modified. Use the CustomWord object to provide one or both of the following optional parameters for the word:

Use the word_name parameter to specify the custom word that is to be added or modified. Use one or both of the following optional parameters to provide information about the word:

Specify a Word object that defines the new word. Use the object's word parameter to specify the custom word that is to be added or modified. Use one or both of the following optional parameters to provide information about the word:

  • The display_as The displayAs parameter provides a different way of spelling the word in a transcript. Use the parameter when you want the word to appear different from its usual representation or from its spelling in corpora training data. For example, you might indicate that the word IBM(trademark) is to be displayed as IBM™. For more information, see Using the display_as field.

  • The sounds_like parameter provides an array The soundsLike parameter provides a comma-separated list of one or more pronunciations for the word. Use the parameter to specify how the word can be pronounced by users. Use the parameter for words that are difficult to pronounce, foreign words, acronyms, and so on. For example, you might specify that the word IEEE can sound like I. triple E.. You can specify a maximum of five sounds-like pronunciations for a word. For information about pronunciation rules, see Using the sounds_like field.

If you add a custom word that already exists in the words resource for the custom model, the new definition overwrites the existing data for the word. If the service encounters an error, it does not add the word to the words resource. Use the List a custom word method to review the word that you add.


PUT /v1/customizations/{customization_id}/words/{word_name}

addWord(params, callback)

ServiceCall<Void> addWord(String customizationId, Word word)

Request

Parameter Description
customization_id path string The GUID of the custom language model to which a word is to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
word_name path string The custom word that is to be added to the custom language model. Do not include spaces in the word. Use a - (dash) or _ (underscore) to connect the tokens of compound words.
Content-Type header string The type of the input, application/json.
custom_word body object A CustomWord object that provides information about the custom word. Specify an empty JSON object to add a word with no sounds-like or display-as information.
CustomWord
Name Description
sounds_like string[ ] An array of sounds-like pronunciations for the custom word. Specify how words that are difficult to pronounce, foreign words, acronyms, and so on can be pronounced by users.
  • For a word that is not in the service's base vocabulary, omit the parameter to have the service automatically generate a sounds-like pronunciation for the word.
  • For a word that is in the service's base vocabulary, use the parameter to specify additional pronunciations for the word. You cannot override the default pronunciation of a word; pronunciations you add augment the pronunciation from the base vocabulary.
A word can have at most five sounds-like pronunciations, and a pronunciation can include at most 40 characters not including spaces.
display_as string An alternative spelling for the custom word when it appears in a transcript. Use the parameter when you want the word to have a spelling that is different from its usual representation or from its spelling in corpora training data.
Parameter Description
customization_id string The GUID of the custom language model to which a word is to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
word_name string The custom word that is to be added to the custom model. Do not include spaces in the word. Use a - (dash) or _ (underscore) to connect the tokens of compound words.
content_type string The type of the input, application/json.
sounds_like string[ ] An array of sounds-like pronunciations for the custom word. Specify how words that are difficult to pronounce, foreign words, acronyms, and so on can be pronounced by users.
  • For a word that is not in the service's base vocabulary, omit the parameter to have the service automatically generate a sounds-like pronunciation for the word.
  • For a word that is in the service's base vocabulary, use the parameter to specify additional pronunciations for the word. You cannot override the default pronunciation of a word; pronunciations you add augment the pronunciation from the base vocabulary.
A word can have at most five sounds-like pronunciations, and a pronunciation can include at most 40 characters not including spaces.
display_as string An alternative spelling for the custom word when it appears in a transcript. Use the parameter when you want the word to have a spelling that is different from its usual representation or from its spelling in corpora training data.
Parameter Description
customizationId string The GUID of the custom language model to which a word is to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
word object A Java Word object that provides information about the custom word to be added.

Example request


curl -X PUT -u "{username}":"{password}"
--header "Content-Type: application/json"
--data "{\"sounds_like\": [\"N. C. A. A.\", \"N. C. double A.\"], \"display_as\": \"NCAA\"}"
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words/NCAA"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  'word_name': 'NCAA',
  'sounds_like': ['N. C. A. A.', 'N. C. double A.'],
  'display_as': 'NCAA',
  'content_type': 'application/json'
};

speech_to_text.addWord(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.addWord({customizationId},
  new Word("NCAA", "NCAA", "N. C. A. A.", "N. C. double A."))
.execute();

Response

An empty response body: {}.

No response body.

Response codes

Status Description
201 Created The custom word was successfully added to the custom language model.
400 Bad Request The specified customization ID is invalid, or the maximum number of sounds-like pronunciations for a word is exceeded. Specific failure messages include:
  • Malformed GUID: '{customization_id}'
  • Maximum number of sounds-like for a word exceeded
  • Maximum number of allowed phones of one item of sounds_like for word '{word}' exceeded
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException The specified customization ID is invalid, or the maximum number of sounds-like pronunciations for a word is exceeded. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ConflictException The service is currently busy handling a previous request for the custom model. (HTTP response code 409.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{}

List custom words

Lists information about custom words from a custom language model. You can list all words from the custom model's words resource, only custom words that were added or modified by the user, or only out-of-vocabulary (OOV) words that were extracted from corpora. You can also indicate the order in which the service is to return words; by default, words are listed in ascending alphabetical order. You must use credentials for the instance of the service that owns a model to query information about its words.


GET /v1/customizations/{customization_id}/words

listWords(params, callback)

ServiceCall<List<WordData>> getWords(String customizationId, Word.Type type)
ServiceCall<List<WordData>> getWords(String customizationId, Word.Type type, Word.Sort sort)

Request

Parameter Description
customization_id path string The GUID of the custom language model from which words are to be queried. You must make the request with service credentials created for the instance of the service that owns the custom model.
word_type query string The type of words to be listed from the custom language model's words resource:
  • all shows all words. This is the default if you omit the parameter.
  • user shows only custom words that were added or modified by the user.
  • corpora shows only OOV words that were extracted from corpora.
sort query string The order in which the words are to be listed. The parameter accepts one of two arguments, alphabetical or count, to indicate how the words are to be sorted. You can prepend an optional + or - to an argument to indicate whether the results are to be sorted in ascending or descending order.
  • alphabetical and +alphabetical list the words in ascending alphabetical order. This is the default ordering if you omit the parameter.
  • -alphabetical lists the words in descending alphabetical order.
  • count and -count list the words in descending order by the values of their count fields.
  • +count lists the words in ascending order by the values of their count fields.
For alphabetical ordering, the lexicographical precedence is numeric values, uppercase letters, and lowercase letters. For count ordering, values with the same count are ordered alphabetically. With cURL, URL encode the + symbol as %2B.
Parameter Description
customization_id customizationId string The GUID of the custom language model from which words are to be queried. You must make the request with service credentials created for the instance of the service that owns the custom model.
word_type string The type of words to be listed from the custom language model's words resource:
  • all shows all words. This is the default if you omit the parameter.
  • user shows only custom words that were added or modified by the user.
  • corpora shows only OOV words that were extracted from corpora.
type Word.Type The type of words to be listed from the custom language model's words resource. Specify one of the following constants:
  • Type.ALL shows all words. This is the default if you pass null for the parameter.
  • Type.USER shows only custom words that were added or modified by the user.
  • Type.CORPORA shows only OOV words that were extracted from corpora.
sort string The order in which the words are to be listed. Specify one of the following strings:
  • +alphabetical lists the words in ascending alphabetical order. This is the default ordering if you omit the parameter.
  • -alphabetical lists the words in descending alphabetical order.
  • -count lists the words in descending order by the values of their count fields.
  • +count lists the words in ascending order by the values of their count fields.
For alphabetical ordering, the lexicographical precedence is numeric values, uppercase letters, and lowercase letters. For count ordering, values with the same count are not ordered.
sort Word.Sort The order in which the words are to be listed. Specify one of the following constants:
  • Sort.ALPHA and Sort.PLUS_ALPHA list the words in ascending alphabetical order. This is the default ordering if you pass null for the parameter.
  • Sort.MINUS_ALPHA lists the words in descending alphabetical order.
  • Sort.COUNT and Sort.MINUS_COUNT list the words in descending order by the values of their count fields.
  • Sort.PLUS_COUNT lists the words in ascending order by the values of their count fields.
For alphabetical ordering, the lexicographical precedence is numeric values, uppercase letters, and lowercase letters. For count ordering, values with the same count are ordered alphabetically.

Example request


curl -X GET -u "{username}":"{password}"\
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words?sort=%2Balphabetical"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}'
};

speech_to_text.listWords(params, function(error, words) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(words, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

List<WordData> words = service.getWords({customizationId}, Type.ALL, Sort.ALPHA).execute();
System.out.println(words);

Response

Words
Name Description
words object[ ] An array of Word objects that provides information about each word in the custom model's words resource. The array is empty if the custom model has no words.

Returns a List of Java WordData objects. Each object provides the same information as a JSON Word object. The list is empty if the custom model contains no words.

Word (Java WordData object)
Name Description
word string A custom word from the custom model. The spelling of the word is used to train the model.
sounds_like string[ ] An array of pronunciations for the custom word. The array can include the sounds-like pronunciation automatically generated by the service if none is provided for the word; the service adds this pronunciation when it finishes processing the word.
display_as string The spelling of the custom word that the service uses to display the word in a transcript. The field contains an empty string if no display-as value is provided for the word, in which case the word is displayed as it is spelled.
source string[ ] An array of sources that describes how the word was added to the custom model's words resource. For OOV words added from a corpus, includes the name of the corpus; if the word was added by multiple corpora, the names of all corpora are listed. If the word was modified or added by the user directly, the field includes the string user.
count integer A sum of the number of times the word is found across all corpora. For example, if the word occurs five times in one corpus and seven times in another, its count is 12. If you add a custom word to a model before it is added by any corpora, the count begins at 1; if the word is added from a corpus first and later modified, the count reflects only the number of times it is found in corpora.

Note: For custom models created prior to the existence of the count field, the field always remains at 0. To update the field for such models, add the model's corpora again and include the allow_overwrite parameter; see Add a corpus.
error object[ ] If the service discovered one or more problems that you need to correct for the custom word's definition, an array of WordError objects that describes each of the errors.
error Map<String, String>[ ] If the service discovered one or more problems that you need to correct for the custom word's definition, a List of key-value pairs that describes each of the errors. Each error is described in the format "element": "message", where element is the aspect of the definition that caused the problem and message describes the problem. The following example describes a problem with one of the word's sounds-like definitions:
  • "{sounds_like_string}": "Numbers are not allowed in sounds-like. You can try for example '{suggested_string}'."
You must correct the error before you can train the model.
WordError
Name Description
{element} string A key-value pair that describes an error associated with the word's definition in the format "element": "message", where element is the aspect of the definition that caused the problem and message describes the problem. The following example describes a problem with one of the word's sounds-like definitions:
  • "{sounds_like_string}": "Numbers are not allowed in sounds-like. You can try for example '{suggested_string}'."
You must correct the error before you can train the model.
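
Client code can surface these errors by walking a word's error array. A minimal JavaScript sketch, assuming a word object parsed from a listWords response (the helper name collectWordErrors is illustrative, not part of the service or SDK):

```javascript
// Each entry in a word's error array holds a single key-value pair:
// the key names the offending element, the value describes the problem.
function collectWordErrors(word) {
  const messages = [];
  for (const err of word.error || []) {
    for (const [element, message] of Object.entries(err)) {
      messages.push(element + ': ' + message);
    }
  }
  return messages;
}

// A word with an invalid sounds-like, as in the example response.
const word = {
  word: '75.00',
  sounds_like: ['75 dollars'],
  error: [{'75 dollars': "Numbers are not allowed in sounds_like. You can try for example 'seventy five dollars'."}]
};
console.log(collectWordErrors(word));
```

Words without an error field simply yield an empty list.
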

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: '{customization_id}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException The specified customization ID is invalid. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{
  "words": [
    {
      "word": "75.00",
      "sounds_like": ["75 dollars"],
      "display_as": "75.00",
      "count": 1,
      "source": ["user"],
      "error": [{"75 dollars": "Numbers are not allowed in sounds_like. You can try for example 'seventy five dollars'."}]
    },
    {
      "word": "HHonors",
      "sounds_like": ["hilton honors","H. honors"],
      "display_as": "HHonors",
      "count": 1,
      "source": ["corpus1"]
    },
    {
      "word": "IEEE",
      "sounds_like": ["I. triple E."],
      "display_as": "IEEE",
      "count": 3,
      "source": ["corpus1","corpus2","user"]
    },
    {
      "word": "NCAA",
      "sounds_like": ["N. C. A. A.","N. C. double A."],
      "display_as": "NCAA",
      "count": 1,
      "source": ["corpus3","user"]
    },
    {
      "word": "tomato",
      "sounds_like": ["tomatoh","tomayto"],
      "display_as": "tomato",
      "count": 1,
      "source": ["user"]
    }
  ]
}

[
  {
    "word": "75.00",
    "sounds_like": ["75 dollars"],
    "display_as": "75.00",
    "count": 1,
    "source": ["user"],
    "error": [{"75 dollars": "Numbers are not allowed in sounds_like. You can try for example 'seventy five dollars'."}]
  },
  {
    "word": "HHonors",
    "sounds_like": ["hilton honors","H. honors"],
    "display_as": "HHonors",
    "count": 1,
    "source": ["corpus1"]
  },
  {
    "word": "IEEE",
    "sounds_like": ["I. triple E."],
    "display_as": "IEEE",
    "count": 3,
    "source": ["corpus1","corpus2","user"]
  },
  {
    "word": "NCAA",
    "sounds_like": ["N. C. A. A.","N. C. double A."],
    "display_as": "NCAA",
    "count": 1,
    "source": ["corpus3","user"]
  },
  {
    "word": "tomato",
    "sounds_like": ["tomatoh","tomayto"],
    "display_as": "tomato",
    "count": 1,
    "source": ["user"]
  }
]

List a custom word

Lists information about a custom word from a custom language model. You must use credentials for the instance of the service that owns a model to query information about its words.


GET /v1/customizations/{customization_id}/words/{word_name}

getWord(params, callback)

ServiceCall<WordData> getWord(String customizationId, String wordName)

Request

Parameter Description
customization_id path string The GUID of the custom language model from which a word is to be queried. You must make the request with service credentials created for the instance of the service that owns the custom model.
word_name path string The custom word that is to be queried from the custom language model.
Parameter Description
customization_id customizationId string The GUID of the custom language model from which a word is to be queried. You must make the request with service credentials created for the instance of the service that owns the custom model.
word wordName string The custom word that is to be queried from the custom model.

Example request


curl -X GET -u "{username}":"{password}"\
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words/NCAA"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  'word': 'NCAA'
};

speech_to_text.getWord(params, function(error, word) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(word, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

WordData word = service.getWord({customizationId}, "NCAA").execute();
System.out.println(word);

Response

Returns a single instance of a Word object that provides information about the specified word.

Returns a single Java WordData object for the specified word. The information is the same as that described for the JSON Word object.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID or word is invalid, including the case where the word does not exist for the custom model. Specific failure messages include:
  • Malformed GUID: '{customization_id}'
  • Invalid value for word '{word}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException The specified customization ID or word is invalid, including the case where the word does not exist for the custom model. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{
  "word": "NCAA",
  "sounds_like": ["N. C. A. A.","N. C. double A."],
  "display_as": "NCAA",
  "count": 1,
  "source": ["corpus3","user"]
}

Delete a custom word

Deletes a custom word from a custom language model. You can remove any word that you added to the custom model's words resource via any means. However, if the word also exists in the service's base vocabulary, the service removes only the custom pronunciation for the word; the word remains in the base vocabulary. Removing a custom word does not affect the custom model until you train the model with the Train a custom language model method. You must use credentials for the instance of the service that owns a model to delete its words.


DELETE /v1/customizations/{customization_id}/words/{word_name}

deleteWord(params, callback)

ServiceCall<Void> deleteWord(String customizationId, String wordName)

Request

Parameter Description
customization_id path string The GUID of the custom language model from which a word is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.
word_name path string The custom word that is to be deleted from the custom language model.
Parameter Description
customization_id customizationId string The GUID of the custom language model from which a word is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.
word_name wordName string The custom word that is to be deleted from the custom model.

Example request


curl -X DELETE -u "{username}":"{password}"\
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words/NCAA"

var SpeechToTextV1 = require('watson-developer-cloud/speech-to-text/v1');
var speech_to_text = new SpeechToTextV1 ({
  username: '{username}',
  password: '{password}'
});

var params = {
  'customization_id': '{customization_id}',
  'word_name': 'NCAA'
};

speech_to_text.deleteWord(params, function(error, response) {
  if (error)
    console.log('Error:', error);
  else
    console.log(JSON.stringify(response, null, 2));
});

SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("{username}", "{password}");

service.deleteWord({customizationId}, "NCAA").execute();

Response

An empty response body: {}.

No response body.

Response codes

Status Description
200 OK The custom word was successfully deleted from the custom language model.
400 Bad Request The specified customization ID or word is invalid, including the case where the word does not exist for the custom model. Specific failure messages include:
  • Malformed GUID: '{customization_id}'
  • Invalid value for word '{word}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
405 Method Not Allowed No word name was specified with the request.
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Exceptions thrown

Exception Description
BadRequestException The specified customization ID or word is invalid, including the case where the word does not exist for the custom model. (HTTP response code 400.)
UnauthorizedException Access is denied due to invalid credentials. (HTTP response code 401.)
ServiceResponseException No word name was specified with the request. (HTTP response code 405.)
ConflictException The service is currently busy handling a previous request for the custom model. (HTTP response code 409.)
InternalServerErrorException The service experienced an internal error. (HTTP response code 500.)

Example response


{}

Custom acoustic models

Create a custom acoustic model

Creates a new custom acoustic model for a specified base model. The custom acoustic model can be used only with the base model for which it is created. The model is owned by the instance of the service whose credentials are used to create it.

Supported but not yet documented. See the createAcousticModel method in https://github.com/watson-developer-cloud/node-sdk.

Not yet supported.


POST /v1/acoustic_customizations

Request

Parameter Description
Content-Type header string The type of the input, application/json.
create_acoustic_model body object A CreateAcousticModel object that provides basic information about the new custom acoustic model.
CreateAcousticModel
Parameter Description
name string A user-defined name for the new custom acoustic model. Use a name that is unique among all custom acoustic models that you own. Use a localized name that matches the language of the custom model. Use a name that describes the acoustic environment of the custom model, such as Mobile custom model or Noisy car custom model.
base_model_name string The name of the language model that is to be customized by the new custom acoustic model. The new custom model can be used only with the base model that it customizes. To determine whether a base model supports custom acoustic models, refer to Language support for customization.
description string A description of the new custom acoustic model. Use a localized description that matches the language of the custom model.

Example request


curl -X POST -u "{username}":"{password}"\
--header "Content-Type: application/json"\
--data "{\"name\": \"Example acoustic model\",
  \"base_model_name\": \"en-US_BroadbandModel\",
  \"description\": \"Example custom acoustic model\"}"\
"https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations"

Response

AcousticModel
Name Description
customization_id string The customization ID (GUID) of the new custom acoustic model.

Response codes

Status Description
201 Created The custom acoustic model was successfully created.
400 Bad Request A required parameter is null or invalid. Specific failure messages include:
  • Required parameter '{name}' is missing
  • Required parameter '{name}' cannot be empty string
  • Required parameter '{name}' cannot be null
  • The base model '{name}' is not recognized
  • Customization is not supported for base model '{name}'
401 Unauthorized The specified service credentials are invalid.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "customization_id": "74f4807e-b5ff-4866-824e-6bba1a84fe96"
}

List custom acoustic models

Lists information about all custom acoustic models that are owned by an instance of the service. Use the language parameter to see all custom acoustic models for the specified language; omit the parameter to see all custom acoustic models for all languages. You must use credentials for the instance of the service that owns a model to list information about it.

Supported but not yet documented. See the listAcousticModels method in https://github.com/watson-developer-cloud/node-sdk.

Not yet supported.


GET /v1/acoustic_customizations

Request

Parameter Description
language query string The identifier of the language for which custom acoustic models are to be returned (for example, en-US). Omit the parameter to see all custom acoustic models owned by the requesting service credentials.

Example request


curl -X GET -u "{username}":"{password}"\
"https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations?language=en-US"

Response

AcousticModels
Name Description
customizations object[ ] An array of AcousticModel objects that provides information about each available custom acoustic model. The array is empty if the requesting service credentials own no custom acoustic models (if no language is specified) or own no custom acoustic models for the specified language.
AcousticModel
Name Description
customization_id string The customization ID (GUID) of the custom acoustic model.
created string The date and time in Coordinated Universal Time (UTC) at which the custom acoustic model was created. The value is provided in full ISO 8601 format (YYYY-MM-DDThh:mm:ss.sTZD).
language string The language identifier of the custom acoustic model (for example, en-US).
versions string[ ] A list of the available versions of the custom acoustic model. Each element of the array indicates a version of the base model with which the custom model can be used. Multiple versions exist only if the custom model has been upgraded; otherwise, only a single version is shown.
owner string The GUID of the service credentials for the instance of the service that owns the custom acoustic model.
name string The name of the custom acoustic model.
description string The description of the custom acoustic model.
base_model_name string The name of the language model for which the custom acoustic model was created.
status string The current status of the custom acoustic model:
  • pending indicates that the model was created but is waiting either for training data to be added or for the service to finish analyzing added data.
  • ready indicates that the model contains data and is ready to be trained.
  • training indicates that the model is currently being trained.
  • available indicates that the model is trained and ready to use.
  • upgrading indicates that the model is currently being upgraded.
  • failed indicates that training of the model failed.
progress integer A percentage that indicates the progress of the custom acoustic model's current training. A value of 100 means that the model is fully trained.
Note: The progress field does not currently reflect the progress of the training. The field changes from 0 to 100 when training is complete.
warnings string If the request included unknown query parameters, the following message:
  • Unexpected query parameter(s) [parameters] detected
where parameters is a list that includes a quoted string for each unknown parameter.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request A required parameter is null or invalid. Specific failure messages include:
  • Language '{language}' is not supported for customization
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "customizations": [
    {
      "customization_id": "74f4807e-b5ff-4866-824e-6bba1a84fe96",
      "created": "2016-06-01T18:42:25.324Z",
      "language": "en-US",
      "versions": [
        "en-US_BroadbandModel.v07-06082016.06202016",
        "en-US_BroadbandModel.v2017-11-15"
      ],
      "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
      "name": "Example model one",
      "description": "Example custom acoustic model",
      "base_model_name": "en-US_BroadbandModel",
      "status": "pending",
      "progress": 0
    },
    {
      "customization_id": "8391f918-3b76-e109-763c-b7732fae4829",
      "created": "2016-06-01T18:51:37.291Z",
      "language": "en-US",
      "versions": [
        "en-US_BroadbandModel.v2017-11-15"
      ],
      "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
      "name": "Example model two",
      "description": "Example custom acoustic model two",
      "base_model_name": "en-US_BroadbandModel",
      "status": "available",
      "progress": 100
    }
  ]
}
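
Once the JSON is parsed on the client, the customizations array can be filtered by status, for example to find models that are trained and ready to use. A brief JavaScript sketch (the helper name is illustrative):

```javascript
// Return the customization IDs of models whose status is 'available'.
function availableModelIds(response) {
  return (response.customizations || [])
    .filter(model => model.status === 'available')
    .map(model => model.customization_id);
}

// With the two models from the example response, only the second
// (status 'available') qualifies.
const response = {
  customizations: [
    { customization_id: '74f4807e-b5ff-4866-824e-6bba1a84fe96', status: 'pending' },
    { customization_id: '8391f918-3b76-e109-763c-b7732fae4829', status: 'available' }
  ]
};
console.log(availableModelIds(response));
```
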

List a custom acoustic model

Lists information about a specified custom acoustic model. You must use credentials for the instance of the service that owns a model to list information about it.

Supported but not yet documented. See the getAcousticModel method in https://github.com/watson-developer-cloud/node-sdk.

Not yet supported.


GET /v1/acoustic_customizations/{customization_id}

Request

Parameter Description
customization_id path string The GUID of the custom acoustic model about which information is to be returned. You must make the request with service credentials created for the instance of the service that owns the custom model.

Example request


curl -X GET -u "{username}":"{password}"\
"https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}"

Response

Returns a single instance of an AcousticModel object that provides information about the specified custom acoustic model.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: '{customization_id}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "customization_id": "74f4807e-b5ff-4866-824e-6bba1a84fe96",
  "created": "2016-06-01T18:42:25.324Z",
  "language": "en-US",
  "versions": [
    "en-US_BroadbandModel.v07-06082016.06202016",
    "en-US_BroadbandModel.v2017-11-15"
  ],
  "owner": "297cfd08-330a-22ba-93ce-1a73f454dd98",
  "name": "Example model one",
  "description": "Example custom acoustic model",
  "base_model_name": "en-US_BroadbandModel",
  "status": "pending",
  "progress": 0
}

Train a custom acoustic model

Initiates the training of a custom acoustic model with new or changed audio resources. After adding or deleting audio resources for a custom acoustic model, use this method to begin the actual training of the model on the latest audio data. The custom acoustic model does not reflect its changed data until you train it. You must use credentials for the instance of the service that owns a model to train it.

The training method is asynchronous. It can take on the order of minutes or hours to complete depending on the total amount of audio data on which the custom acoustic model is being trained and the current load on the service. Typically, training a custom acoustic model takes approximately two to four times the length of its audio data. The range of time depends on the model being trained and the nature of the audio, such as whether the audio is clean or noisy. The method returns an HTTP 200 response code to indicate that the training process has begun.

You can monitor the status of the training by using the List a custom acoustic model method to poll the model's status. Use a loop to check the status once a minute. The method returns an AcousticModel object that includes status and progress fields. A status of available indicates that the custom model is trained and ready to use. The service cannot accept subsequent training requests, or requests to add new audio resources, until the existing request completes.
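
The once-a-minute poll suggested above can be written as a small helper. In this sketch, getStatus stands in for a call to the List a custom acoustic model method that resolves to the model's status field; the helper name and error handling are illustrative:

```javascript
// Poll the model's status until training completes ('available') or
// fails ('failed'); any other status triggers another wait.
async function waitUntilTrained(getStatus, intervalMs = 60000) {
  for (;;) {
    const status = await getStatus();
    if (status === 'available') return status;
    if (status === 'failed') throw new Error('Training failed');
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```
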

You can use the optional custom_language_model_id query parameter to specify the GUID of a separately created custom language model that is to be used during training. Specify a custom language model if you have verbatim transcriptions of the audio files that you have added to the custom model or you have either corpora (text files) or a list of words that are relevant to the contents of the audio files. For information about creating a separate custom language model, see Creating a custom language model.

Training can fail to start for the following reasons:

  • The service is currently handling another request for the custom model, such as another training request or a request to add audio resources to the model.

  • The custom model contains less than 10 minutes or more than 50 hours of audio data.

  • One or more of the custom model's audio resources is invalid.
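
A client can pre-check the audio-duration limit listed above before requesting training. A trivial JavaScript sketch (the function name is illustrative):

```javascript
// The model must contain at least 10 minutes and at most 50 hours of
// audio for training to start.
function meetsAudioLimits(totalAudioSeconds) {
  return totalAudioSeconds >= 10 * 60 && totalAudioSeconds <= 50 * 3600;
}

console.log(meetsAudioLimits(5 * 60));    // false: under 10 minutes
console.log(meetsAudioLimits(60 * 60));   // true: one hour
console.log(meetsAudioLimits(60 * 3600)); // false: over 50 hours
```
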

Supported but not yet documented. See the trainAcousticModel method in https://github.com/watson-developer-cloud/node-sdk.

Not yet supported.


POST /v1/acoustic_customizations/{customization_id}/train

Request

Parameter Description
customization_id path string The GUID of the custom acoustic model that is to be trained. You must make the request with service credentials created for the instance of the service that owns the custom model.
custom_language_model_id query string The GUID of a custom language model that is to be used during training of the custom acoustic model. Specify a custom language model that has been trained with verbatim transcriptions of the audio resources or that contains words that are relevant to the contents of the audio resources.

Example request


curl -X POST -u "{username}":"{password}"\
"https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}/train?custom_language_model_id={customization_id}"

Response

An empty response body: {}.

Response codes

Status Description
200 OK Training of the custom acoustic model started successfully.
400 Bad Request A required parameter is null or invalid, or the custom model is not ready to be trained. Specific failure messages include:
  • No input data modified since last training
  • The following audio resources are invalid: '{resources}'. Fix errors before training.
  • Malformed GUID: '{customization_id}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}

Reset a custom acoustic model

Resets a custom acoustic model by removing all audio resources from the model. Resetting a custom acoustic model initializes the model to its state when it was first created. Metadata such as the name and language of the model are preserved, but the model's audio resources are removed and must be re-created. You must use credentials for the instance of the service that owns a model to reset it.

Supported but not yet documented. See the resetAcousticModel method in https://github.com/watson-developer-cloud/node-sdk.

Not yet supported.


POST /v1/acoustic_customizations/{customization_id}/reset

Request

Parameter Description
customization_id path string The GUID of the custom acoustic model that is to be reset. You must make the request with service credentials created for the instance of the service that owns the custom model.

Example request


curl -X POST -u "{username}":"{password}"\
"https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}/reset"

Response

An empty response body: {}.

Response codes

Status Description
200 OK The custom acoustic model was successfully reset.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: '{customization_id}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}

Upgrade a custom acoustic model

Initiates the upgrade of a custom acoustic model to the latest version of its base language model. The upgrade method is asynchronous. It can take on the order of minutes or hours to complete depending on the amount of data in the custom model and the current load on the service; typically, upgrade takes approximately twice the length of the total audio contained in the custom model. A custom model must be in the ready or available state to be upgraded. You must use credentials for the instance of the service that owns a model to upgrade it.

The method returns an HTTP 200 response code to indicate that the upgrade process has begun successfully. You can monitor the status of the upgrade by using the List a custom acoustic model method to poll the model's status. Use a loop to check the status once a minute. While it is being upgraded, the custom model has the status upgrading. When the upgrade is complete, the model resumes the status that it had prior to upgrade. The service cannot accept subsequent requests for the model until the upgrade completes.

If the custom acoustic model was trained with a separately created custom language model, you must use the custom_language_model_id query parameter to specify the GUID of that custom language model. The custom language model must be upgraded before the custom acoustic model can be upgraded. Omit the parameter if the custom acoustic model was not trained with a custom language model.

For more information, see Upgrading custom models.

Supported but not yet documented. See the upgradeAcousticModel method in https://github.com/watson-developer-cloud/node-sdk.

Not yet supported.


POST /v1/acoustic_customizations/{customization_id}/upgrade_model

Request

Parameter Description
customization_id path string The GUID of the custom acoustic model that is to be upgraded. You must make the request with service credentials created for the instance of the service that owns the custom model.
custom_language_model_id query string If the custom acoustic model was trained with a custom language model, the GUID of that custom language model. The custom language model must be upgraded before the custom acoustic model can be upgraded.

Example request


curl -X POST -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}/upgrade_model"

Response

An empty response body: {}.

Response codes

Status Description
200 OK Upgrade of the custom acoustic model has started successfully.
400 Bad Request A parameter is null or invalid, or the specified custom model cannot be upgraded:
  • Malformed GUID: '{customization_id}'
  • Custom model is up-to-date
  • No input data available to upgrade the model
  • Cannot upgrade failed custom model
  • The passed language custom model needs to be upgraded in order to upgrade the acoustic custom model.
  • Base model name mismatch detected. Please make sure that the base model name of the language custom model matches the base model name of the acoustic custom model.
  • Invalid model type for customization_id '{customization_id}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}

Delete a custom acoustic model

Deletes an existing custom acoustic model. The custom model cannot be deleted if another request, such as adding an audio resource to the model, is currently being processed. You must use credentials for the instance of the service that owns a model to delete it.

Supported but not yet documented. See the deleteAcousticModel method in https://github.com/watson-developer-cloud/node-sdk.

Not yet supported.


DELETE /v1/acoustic_customizations/{customization_id}

Request

Parameter Description
customization_id path string The GUID of the custom acoustic model that is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.

Example request


curl -X DELETE -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}"

Response

An empty response body: {}.

Response codes

Status Description
200 OK The custom acoustic model was successfully deleted.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: '{customization_id}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials, including the case where the custom model does not exist:
  • Invalid customization_id '{customization_id}' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}

Custom audio resources

Add an audio resource

Adds an audio resource to a custom acoustic model. Add audio content that reflects the acoustic characteristics of the audio that you plan to transcribe. You must use credentials for the instance of the service that owns a model to add an audio resource to it. Adding audio data does not affect the custom acoustic model until you train the model for the new data by using the Train a custom acoustic model method.

You can add individual audio files or an archive file that contains multiple audio files. Adding multiple audio files via a single archive file is significantly more efficient than adding each file individually.

  • You can add an individual audio file in any format that the service supports for speech recognition. Use the Content-Type header to specify the format of the audio file.

  • You can add an archive file (.zip or .tar.gz file) that contains audio files in any format that the service supports for speech recognition. All audio files added with the same archive file must have the same audio format. Use the Content-Type header to specify the archive type, application/zip or application/gzip. Use the Contained-Content-Type header to specify the format of the contained audio files; the default format is audio/wav.

You can use this method to add any number of audio resources to a custom model by calling the method once for each audio or archive file. But the addition of one audio resource must be fully complete before you can add another. You must add a minimum of 10 minutes and a maximum of 50 hours of audio that includes speech, not just silence, to a custom acoustic model before you can train it. No audio resource, audio- or archive-type, can be larger than 100 MB.
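
The limits above (a minimum of 10 minutes and a maximum of 50 hours of speech; no resource over 100 MB) can be checked client-side before training. The constants and helper below are an illustration of those documented limits, not an API call:

```python
MIN_MINUTES = 10                         # minimum audio that includes speech
MAX_MINUTES = 50 * 60                    # maximum: 50 hours
MAX_RESOURCE_BYTES = 100 * 1024 * 1024   # no single resource may exceed 100 MB

def ready_to_train(total_minutes_of_audio):
    """True if the accumulated audio falls within the trainable range."""
    return MIN_MINUTES <= total_minutes_of_audio <= MAX_MINUTES
```

The `total_minutes_of_audio` value comes back from the List audio resources method, so this check costs one GET request.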

The method is asynchronous. It can take several seconds to complete depending on the duration of the audio and, in the case of an archive file, the total number of audio files being processed. The service returns a 201 response code if the audio is valid. It then asynchronously analyzes the contents of the audio file or files and automatically extracts information about the audio such as its length, sampling rate, and encoding. You cannot submit requests to add additional audio resources to a custom acoustic model, or to train the model, until the service's analysis of all audio files for the current request completes.

To determine the status of the service's analysis of the audio, use the List an audio resource method to poll the status of the audio. The method accepts the GUID of the custom model and the name of the audio resource, and it returns the status of the resource. Use a loop to check the status of the audio every few seconds until it becomes ok.
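
The status check described above can be sketched in the same style as the upgrade polling loop. Again this is only an illustration: `get_audio_status` stands in for a call to the List an audio resource method, and `sleep` is injected for testability.

```python
import time

def wait_for_analysis(get_audio_status, interval=5, sleep=time.sleep):
    """Poll an audio resource's status every few seconds until the service
    finishes analyzing it.

    get_audio_status: a zero-argument callable that returns the resource's
    status string. Returns the final status: "ok" or "invalid".
    """
    status = get_audio_status()
    while status == "being_processed":
        sleep(interval)
        status = get_audio_status()
    return status
```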

Note: The sampling rate of an audio file must match the sampling rate of the base model for the custom model: for broadband models, at least 16 kHz; for narrowband models, at least 8 kHz. If the sampling rate of the audio is higher than the minimum required rate, the service down-samples the audio to the appropriate rate. If the sampling rate of the audio is lower than the minimum required rate, the service labels the audio file as invalid.

Supported but not yet documented. See the addAudio method in https://github.com/watson-developer-cloud/node-sdk.

Not yet supported.


POST /v1/acoustic_customizations/{customization_id}/audio/{audio_name}

Request

Parameter Description
customization_id path string The GUID of the custom acoustic model to which an audio resource is to be added. You must make the request with service credentials created for the instance of the service that owns the custom model.
audio_name path string The name of the audio resource that is to be added to the custom acoustic model. The name cannot contain spaces. Use a localized name that matches the language of the custom model.
Content-Type header string The audio format (MIME type) of the audio resource that is to be added to the custom acoustic model.

For an audio-type resource, one of the audio formats supported by the service for speech recognition:
  • audio/basic
  • audio/flac
  • audio/l16
  • audio/mp3
  • audio/mpeg
  • audio/mulaw
  • audio/ogg
  • audio/ogg;codecs=opus
  • audio/ogg;codecs=vorbis
  • audio/wav
  • audio/webm
  • audio/webm;codecs=opus
  • audio/webm;codecs=vorbis
The header supports required and optional rate, channels, and endianness specifications for applicable formats. For more information about the supported audio formats, see Audio formats.

For an archive-type resource, the media type of the archive file:
  • application/zip for a .zip file
  • application/gzip for a .tar.gz file
All audio files contained in the archive must have the same audio format.
Contained-Content-Type header string For an archive-type resource that contains audio files whose format is not audio/wav, specifies the format of the audio files. The header accepts all of the audio formats supported for use with speech recognition and with the Content-Type header, including the rate, channels, and endianness parameters that are used with some formats. For a complete list of supported audio formats, see Audio formats.
allow_overwrite query boolean Indicates whether the specified audio resource is to overwrite an existing resource with the same name. If a resource with the same name already exists, the request fails unless allow_overwrite is set to true; by default, the parameter is false. The parameter has no effect if a resource with the same name does not already exist.
audio_resource body stream The audio resource that is to be added to the custom acoustic model, an individual audio file or an archive file.

Example request


curl -X POST -u "{username}":"{password}" \
--header "Content-Type: audio/wav" \
--data-binary @audio1.wav \
"https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}/audio/audio1"
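
An archive-type upload differs from the request above only in its headers and query parameter. The sketch below builds (but does not send) such a request with Python's standard library; the header names and the `allow_overwrite` parameter come from the table above, while the audio format passed for `Contained-Content-Type` is just an example.

```python
import urllib.request

def archive_upload_request(customization_id, audio_name, archive_bytes,
                           contained_type="audio/l16;rate=16000"):
    """Build a POST request that adds a .zip archive of audio files.

    Contained-Content-Type describes the audio files inside the archive
    (the default, if omitted, is audio/wav). allow_overwrite=true replaces
    an existing resource with the same name.
    """
    url = ("https://stream.watsonplatform.net/speech-to-text/api/v1"
           f"/acoustic_customizations/{customization_id}"
           f"/audio/{audio_name}?allow_overwrite=true")
    return urllib.request.Request(
        url,
        data=archive_bytes,
        headers={
            "Content-Type": "application/zip",
            "Contained-Content-Type": contained_type,
        },
        method="POST",
    )
```

Sending the request would additionally require the service credentials (for example via an `Authorization` header or an opener with basic auth).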

Response

An empty response body: {}.

Response codes

Status Description
201 Created Addition of the audio resource was successfully started. The service is analyzing the data.
400 Bad Request A required parameter is null or invalid, the specified customization ID or audio resource is invalid, or the specified audio resource already exists. Specific failure messages include:
  • Malformed GUID: '{customization_id}'
  • Audio file not specified or empty
  • Invalid audio format detected
  • Audio '{name}' already exists - change its name, remove existing file before adding new one, or overwrite existing file by setting 'allow_overwrite' flag to 'true'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request. You can also receive status code 500 Forwarding Error if the service is currently busy handling a previous request for the custom model.

Example response


{}

List audio resources

Lists information about all audio resources from a custom acoustic model. The information includes the name of the resource and information about its audio data, such as its duration. It also includes the status of the audio resource, which is important for checking the service's analysis of the resource in response to a request to add it to the custom acoustic model. You must use credentials for the instance of the service that owns a model to list its audio resources.

Supported but not yet documented. See the listAudio method in https://github.com/watson-developer-cloud/node-sdk.

Not yet supported.


GET /v1/acoustic_customizations/{customization_id}/audio

Request

Parameter Description
customization_id path string The GUID of the custom acoustic model for which audio resources are to be listed. You must make the request with service credentials created for the instance of the service that owns the custom model.

Example request


curl -X GET -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}/audio"

Response

AudioResources
Name Description
total_minutes_of_audio double The total minutes of accumulated audio summed over all of the valid audio resources for the custom acoustic model. You can use this value to determine whether the custom model has too little or too much audio to begin training.
audio object[ ] An array of AudioResource objects that provides information about the audio resources of the custom acoustic model. The array is empty if the custom model has no audio resources.
AudioResource
Name Description
duration double The total seconds of audio in the audio resource.
name string The name of the audio resource.
details object An AudioDetails object that provides detailed information about the audio resource. The object is empty until the service finishes processing the audio.
status string The status of the audio resource:
  • ok indicates that the service has successfully analyzed the audio data. The data can be used to train the custom model.
  • being_processed indicates that the service is still analyzing the audio data. The service cannot accept requests to add new audio resources or to train the custom model until its analysis is complete.
  • invalid indicates that the audio data is not valid for training the custom model (possibly because it has the wrong format or sampling rate, or because it is corrupted). For an archive file, the entire archive is invalid if any of its audio files are invalid.
AudioDetails
Name Description
type string The type of the audio resource:
  • audio for an individual audio file
  • archive for an archive (.zip or .tar.gz) file that contains audio files
codec string For an audio-type resource, the codec in which the audio is encoded. Omitted for an archive-type resource.
frequency integer For an audio-type resource, the sampling rate of the audio in Hertz (samples per second). Omitted for an archive-type resource.
compression string For an archive-type resource, the format of the compressed archive:
  • zip for a .zip file
  • gzip for a .tar.gz file
Omitted for an audio-type resource.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID is invalid:
  • Malformed GUID: '{customization_id}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "total_minutes_of_audio": 5.9185787598292032,
  "audio": [
    {
      "duration": 3.8428750038146973,
      "name": "audio1",
      "details": {
        "codec": "pcm_s16le",
        "type": "audio",
        "frequency": 16000
      },
      "status": "ok"
    },
    {
      "duration": 351.2718505859375,
      "name": "audio2",
      "details": {
        "type": "archive",
        "compression": "zip"
      },
      "status": "ok"
    },
    {
      "duration": 0,
      "name": "audio3",
      "details": {},
      "status": "being_processed"
    }
  ]
}
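
A response like the one above can be checked before you attempt to train the model: every resource must have reached status ok. A small illustrative check, assuming the JSON response has already been parsed into a dictionary:

```python
def all_audio_ok(audio_resources):
    """True when the service has finished analyzing every audio resource
    and none of them were rejected as invalid."""
    return all(r["status"] == "ok" for r in audio_resources["audio"])
```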

List an audio resource

Lists information about an audio resource from a custom acoustic model. The method returns an AudioListing object whose fields depend on the type of audio resource you specify with the method's audio_name parameter:

  • For an audio-type resource, the object's fields match those of an AudioResource object: duration, name, details, and status.

  • For an archive-type resource, the object includes a container field whose fields match those of an AudioResource object. It also includes an audio field, which contains an array of AudioResource objects that provides information about the audio files that are contained in the archive.

The information includes the status of the specified audio resource, which is important for checking the service's analysis of the resource in response to a request to add it to the custom model. You must use credentials for the instance of the service that owns a model to list its audio resources.
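
Client code can tell the two AudioListing shapes apart by the presence of the container field. A minimal sketch, assuming the response has been parsed into a dictionary:

```python
def listing_status(listing):
    """Return the overall status of an AudioListing.

    Audio-type listings carry their own status; archive-type listings carry
    it on the container object that describes the archive as a whole.
    """
    if "container" in listing:        # archive-type resource
        return listing["container"]["status"]
    return listing["status"]          # audio-type resource
```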

Supported but not yet documented. See the getAudio method in https://github.com/watson-developer-cloud/node-sdk.

Not yet supported.


GET /v1/acoustic_customizations/{customization_id}/audio/{audio_name}

Request

Parameter Description
customization_id path string The GUID of the custom acoustic model for which an audio resource is to be listed. You must make the request with service credentials created for the instance of the service that owns the custom model.
audio_name path string The name of the audio resource about which information is to be listed.

Example request


curl -X GET -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}/audio/audio2"

Response

AudioListing
Name Description
duration double For an audio-type resource, the total seconds of audio in the resource. Omitted for an archive-type resource.
name string For an audio-type resource, the name of the resource. Omitted for an archive-type resource.
details object For an audio-type resource, an AudioDetails object that provides detailed information about the resource. The object is empty until the service finishes processing the audio. Omitted for an archive-type resource.
status string For an audio-type resource, the status of the resource:
  • ok indicates that the service has successfully analyzed the audio data. The data can be used to train the custom model.
  • being_processed indicates that the service is still analyzing the audio data. The service cannot accept requests to add new audio resources or to train the custom model until its analysis is complete.
  • invalid indicates that the audio data is not valid for training the custom model (possibly because it has the wrong format or sampling rate, or because it is corrupted).
Omitted for an archive-type resource.
container object For an archive-type resource, an object of type AudioResource that provides information about the resource. Omitted for an audio-type resource.
audio object[ ] For an archive-type resource, an array of AudioResource objects that provides information about the audio-type resources that are contained in the resource. Omitted for an audio-type resource.

Response codes

Status Description
200 OK The request succeeded.
400 Bad Request The specified customization ID or audio resource name is invalid, including the case where the audio resource does not exist for the custom model. Specific failure messages include:
  • Malformed GUID: '{customization_id}'
  • Invalid value for audio name '{audio_name}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{
  "container": {
    "duration": 351.2718505859375,
    "name": "audio2",
    "details": {
      "type": "archive",
      "compression": "zip"
    },
    "status": "ok"
  },
  "audio": [
    {
      "duration": 11.760937690734863,
      "name": "arl001.wav",
      "details": {
        "codec": "pcm_s16le",
        "type": "audio",
        "frequency": 16000
      },
      "status": "ok"
    },
    {
      "duration": 2.9024999141693115,
      "name": "arl002.wav",
      "details": {
        "codec": "pcm_s16le",
        "type": "audio",
        "frequency": 16000
      },
      "status": "ok"
    },
    . . .
  ]
}

Delete an audio resource

Deletes an existing audio resource from a custom acoustic model. Deleting an archive-type audio resource removes the entire archive of files; the current interface does not allow deletion of individual files from an archive resource. Removing an audio resource does not affect the custom model until you train the model on its updated data by using the Train a custom acoustic model method. You must use credentials for the instance of the service that owns a model to delete its audio resources.

Supported but not yet documented. See the deleteAudio method in https://github.com/watson-developer-cloud/node-sdk.

Not yet supported.


DELETE /v1/acoustic_customizations/{customization_id}/audio/{audio_name}

Request

Parameter Description
customization_id path string The GUID of the custom acoustic model from which an audio resource is to be deleted. You must make the request with service credentials created for the instance of the service that owns the custom model.
audio_name path string The name of the audio resource that is to be deleted from the custom acoustic model.

Example request


curl -X DELETE -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/acoustic_customizations/{customization_id}/audio/audio1"

Response

An empty response body: {}.

Response codes

Status Description
200 OK The audio resource was successfully deleted from the custom acoustic model.
400 Bad Request The specified customization ID or audio resource name is invalid, including the case where the audio resource does not exist for the custom model. Specific failure messages include:
  • Malformed GUID: '{customization_id}'
  • Invalid value for audio name '{audio_name}'
401 Unauthorized The specified service credentials are invalid or the specified customization ID is invalid for the requesting service credentials:
  • Invalid customization_id '{customization_id}' for user
405 Method Not Allowed No audio resource name was specified with the request.
409 Conflict The service is currently busy handling a previous request for the custom model:
  • Customization '{customization_id}' is currently locked to process your last request.
500 Internal Server Error An internal error prevented the service from satisfying the request.

Example response


{}

User data

Delete labeled data

Deletes all data that is associated with a specified customer ID. The method deletes all data for the customer ID, regardless of the method by which the information was added. The method has no effect if no data is associated with the customer ID. You must issue the request with credentials for the same instance of the service that was used to associate the customer ID with the data.

You associate a customer ID with data by passing the X-Watson-Metadata header with a request that passes the data. For more information about customer IDs and about using this method, see Information security.
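
For illustration, the sketch below attaches the X-Watson-Metadata header to an outgoing request built with Python's standard library; the `customer_id={id}` form of the header value is the one documented for the service, while the helper itself is not part of any SDK.

```python
import urllib.request

def tag_with_customer_id(request, customer_id):
    """Attach a customer ID to an outgoing request so the data it passes
    can later be deleted with DELETE /v1/user_data?customer_id=...

    The header value uses the documented form: customer_id={id}.
    """
    request.add_header("X-Watson-Metadata", f"customer_id={customer_id}")
    return request
```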

Supported but not yet documented.


DELETE /v1/user_data

Request

Parameter Description
customer_id query string The customer ID for which all data is to be deleted.

Example request


curl -X DELETE -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/user_data?customer_id={customer_id}"

Response

No response body.

Response codes

Status Description
200 OK The deletion request was successfully submitted.
400 Bad Request The request did not pass a customer ID:
  • No customer ID found in the request