Customizing Watson Speech Library for Embed
As a basic usage of both services, this example uses Text to Speech to synthesize an audio file and then passes that file to Speech to Text to recognize the utterances. Both services are shown with examples of customization.
For details on STT customization features and support for next generation models, view the STT API docs. Note that acoustic model customization is not supported.
For details on TTS customization features and support, view the TTS API docs. Note that custom pronunciation is supported but not voice transformation.
Dependencies
- S3 Compatible Storage
An S3-compatible storage service that supports HMAC (access key and secret key) credentials must exist. Watson Speech requires one bucket that it can read and write objects to. At install time, the bucket is populated with stock models and additional training data (to facilitate customization for some of the recent STT models); this additional data demands more storage and longer loading times. The bucket also stores customization artifacts, including custom training data and trained models. Quick connectivity checks for the storage service and the database are sketched after this list.
- PostgreSQL Database
A PostgreSQL database is required to manage metadata related to customization.
- Kubernetes Cluster
The Speech services are assumed to be running in a Kubernetes cluster. The commands below use the kubectl proxy command to route traffic to the services installed in the cluster.
- Installations of Watson Text to Speech and Watson Speech to Text Libraries for Embed
Installing the Speech Embed services with customization requires setting a number of configuration values. To make the installation easier, Helm charts are provided on GitHub at IBM/ibm-watson-embed-charts. For details on how to install, see the STT Run with Helm page and the TTS Run with Helm page.
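Before installing, it can help to confirm that the storage and database dependencies above are reachable with the credentials you plan to use. The commands below are only a minimal sketch, assuming the AWS CLI and the psql client are installed locally; the endpoint, bucket, host, and credential values are placeholders to replace with your own.

# Verify HMAC credentials against the S3-compatible bucket (placeholder values)
export AWS_ACCESS_KEY_ID=<access-key>
export AWS_SECRET_ACCESS_KEY=<secret-key>
aws --endpoint-url https://<s3-endpoint> s3 ls s3://<speech-bucket>

# Verify that the PostgreSQL database accepts connections (placeholder values)
psql "postgresql://<user>:<password>@<db-host>:5432/<db-name>" -c "SELECT 1;"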
Customization Example
- Start a local proxy server to route requests to the services installed in the cluster:

kubectl proxy
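The curl commands in the rest of this example go through the proxy's service URL scheme, which has the general form shown below; the pieces in angle brackets correspond to your namespace, the Helm-generated service name, and the service's port name.

# General form of a kubectl proxy service URL (illustrative)
http://localhost:8001/api/v1/namespaces/<namespace>/services/<scheme>:<service-name>:<port-name>/proxy/<api-path>

# If local port 8001 is already in use, start the proxy on another port
# and adjust the localhost:8001 prefix in the commands accordingly
kubectl proxy --port=8081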
- Create a new Text to Speech customization
You create a customization for a specific language, not for a specific voice. A customization can be used with any voice for its specified language. Omit the language parameter to use the default language, en-US.

Note that a header must be passed for customization requests. The header key is X-Watson-UserInfo and the required value is bluemix-instance-id=$UUID, where $UUID is formatted as a string like xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.

To facilitate copy-pasting the commands below, export the following variables:

export NAMESPACE=<your-namespace>
export INSTALL_NAME=<your-install-name-used-with-helm-chart>
export INSTANCE_ID="00000000-0000-0000-0000-000000000000"

Note that, due to the character limit on Kubernetes service names, and depending on whether you used a nameOverride, the part of the URL after the INSTALL_NAME in the commands below may need to be adjusted.

curl -X POST "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-tts-embed-customization:https/proxy/text-to-speech/api/v1/customizations" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}" \
  --header "Content-Type: application/json" \
  --data '{"name":"MyCustomModel", "language":"en-US", "description": "First example custom language model with acronym translations"}'

{"customization_id": "0fbee6df-7b4a-40b9-a6bc-9ccdcd42fb42"}

Extract the customization id, for example:

export CUSTOMIZATION_ID="0fbee6df-7b4a-40b9-a6bc-9ccdcd42fb42"
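As an alternative to copying the id by hand, you can capture it directly from the create response. This is a minimal sketch that assumes the jq JSON processor is installed:

# Create the customization and capture the returned customization_id (assumes jq is installed)
export CUSTOMIZATION_ID=$(curl -s -X POST "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-tts-embed-customization:https/proxy/text-to-speech/api/v1/customizations" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}" \
  --header "Content-Type: application/json" \
  --data '{"name":"MyCustomModel", "language":"en-US", "description": "First example custom language model with acronym translations"}' \
  | jq -r '.customization_id')
echo "${CUSTOMIZATION_ID}"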
- View the list of customizations
curl "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-tts-embed-customization:https/proxy/text-to-speech/api/v1/customizations" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}"
- Update your model with custom word-translation pairs
curl -X POST "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-tts-embed-customization:https/proxy/text-to-speech/api/v1/customizations/${CUSTOMIZATION_ID}/words" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}" \
  --header "Content-Type: application/json" \
  --data '{"words": [
    {"word": "NCAA", "translation": "N C double A"},
    {"word": "iPhone", "translation": "I phone"},
    {"word": "BTW", "translation": "By the way"},
    {"word": "NYSE", "translation": "New York Stock Exchange"},
    {"word": "TTS", "translation": "Text to Speech"}
  ]}'

{} # an empty JSON document indicates success

View the customization model. You should see the list of word-translation pairs.

curl "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-tts-embed-customization:https/proxy/text-to-speech/api/v1/customizations/${CUSTOMIZATION_ID}" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}"
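To inspect only the word-translation pairs rather than the full model, you can also query the words listing for the customization. This assumes the embedded service exposes the same list-custom-words endpoint as the public Text to Speech API:

# List only the custom words for this model
curl "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-tts-embed-customization:https/proxy/text-to-speech/api/v1/customizations/${CUSTOMIZATION_ID}/words" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}"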
- Use the updated model in a /synthesize call

curl "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-tts-embed-runtime:https/proxy/text-to-speech/api/v1/synthesize?customization_id=$CUSTOMIZATION_ID" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}" \
  --header "Content-Type: application/json" \
  --data '{"text":"This is a simple test of the IBM TTS product. My favorite team reached the NCAA tournament’s final four. I’m thinking of getting a new iPhone next week. BTW. Companies listed in NYSE are showing mixed results."}' \
  --header "Accept: audio/wav" \
  --output tts-result.wav

You can play the output on a Mac:

afplay tts-result.wav

To see any errors that occur when creating the synthesized audio, remove the --output flag.
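On Linux, a comparable playback command is aplay from the ALSA utilities (assuming they are installed), or any media player that can open WAV files:

# Play the synthesized audio on Linux
aplay tts-result.wav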
- Send the synthesized audio through Speech to Text

curl "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-stt-embed-runtime:https/proxy/speech-to-text/api/v1/recognize" \
  --header "Content-Type: audio/wav" \
  --data-binary @tts-result.wav

Notice that the transcription renders the acronyms as "i b m" and "nc double a", and it does not format "iphone" as iPhone.
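The response is JSON. To print just the transcript text, you can filter it with jq; this sketch assumes jq is installed and relies on the standard Speech to Text response shape (results and alternatives fields):

# Recognize the audio and print only the transcript text
curl -s "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-stt-embed-runtime:https/proxy/speech-to-text/api/v1/recognize" \
  --header "Content-Type: audio/wav" \
  --data-binary @tts-result.wav \
  | jq -r '.results[].alternatives[0].transcript'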
- Create a Speech to Text custom model to recognize these acronyms

curl -X POST "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-stt-embed-customization:https/proxy/speech-to-text/api/v1/customizations" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}" \
  --header "Content-Type: application/json" \
  --data '{ "name":"MyCustomModel", "description": "First example language model with custom words", "base_model_name": "en-US_Multimedia" }'

{"customization_id": "5859a77f-3329-4cdf-948c-28279cd8530b"}

Extract the customization id, for example:

export STT_CUSTOMIZATION_ID="5859a77f-3329-4cdf-948c-28279cd8530b"
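If you are not sure which base models are loaded in your install, you can list them from the runtime. This assumes the embedded service exposes the standard Speech to Text list-models endpoint:

# List the speech models available in this install
curl "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-stt-embed-runtime:https/proxy/speech-to-text/api/v1/models"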
- View the list of customizations

curl "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-stt-embed-customization:https/proxy/speech-to-text/api/v1/customizations" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}"

Notice that the model you just created has the status Pending. This means the model has been created but is waiting for training data to be added, or is still analyzing data that was added.
- Update the model with custom word-sound pairs

curl "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-stt-embed-customization:https/proxy/speech-to-text/api/v1/customizations/${STT_CUSTOMIZATION_ID}/words" \
  -X POST \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}" \
  --header "Content-Type: application/json" \
  --data '{"words": [
    {"word": "IBM", "sounds_like": ["I B M"]},
    {"word": "NCAA", "sounds_like": ["N C double A", "NC double A"]},
    {"word": "iPhone", "sounds_like": ["i phone", "iphone"]},
    {"word": "BTW", "sounds_like": ["by the way"]},
    {"word": "NYSE", "sounds_like": ["New York Stock Exchange"]},
    {"word": "TTS", "sounds_like": ["Text to Speech"]}
  ]}'

View the customization model:

curl "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-stt-embed-customization:https/proxy/speech-to-text/api/v1/customizations/${STT_CUSTOMIZATION_ID}" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}"

Notice that the status of the model is Ready. This means the model has training data and must be trained before it becomes Available.
- Train the model

curl -X POST "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-stt-embed-customization:https/proxy/speech-to-text/api/v1/customizations/${STT_CUSTOMIZATION_ID}/train" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}"
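Training is asynchronous. A simple way to wait for it to finish is to poll the model until its status changes; the sketch below assumes jq is installed and that, as in the public Speech to Text API, the status reads available once training completes:

# Poll the custom model until training completes (assumes jq is installed)
while true; do
  STATUS=$(curl -s "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-stt-embed-customization:https/proxy/speech-to-text/api/v1/customizations/${STT_CUSTOMIZATION_ID}" \
    --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}" \
    | jq -r '.status')
  echo "status: ${STATUS}"
  [ "${STATUS}" = "available" ] && break
  sleep 10
done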
- Use the model in a /recognize call

curl "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-stt-embed-runtime:https/proxy/speech-to-text/api/v1/recognize?customization_id=${STT_CUSTOMIZATION_ID}" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}" \
  --header "Content-Type: audio/wav" \
  --data-binary @tts-result.wav
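With the custom model trained, terms such as NCAA, iPhone, BTW, and NYSE should now come back in their custom written forms. To compare against the earlier uncustomized call, you can print just the transcript again; this sketch assumes jq is installed:

# Print the transcript produced with the custom model
curl -s "http://localhost:8001/api/v1/namespaces/${NAMESPACE}/services/https:${INSTALL_NAME}-ibm-watson-stt-embed-runtime:https/proxy/speech-to-text/api/v1/recognize?customization_id=${STT_CUSTOMIZATION_ID}" \
  --header "x-watson-userinfo: bluemix-instance-id=${INSTANCE_ID}" \
  --header "Content-Type: audio/wav" \
  --data-binary @tts-result.wav \
  | jq -r '.results[].alternatives[0].transcript'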