Advanced configuration for the Watson Speech services
Custom resource topics
The following information can help you understand and make the best use of the properties of the Watson Speech services custom resource:
- Usage notes provides important information about installing and modifying your Watson Speech services installation and configuration.
- Editing the custom resource provides brief instructions for editing the custom resource.
- Full custom resource shows an example of a full Watson Speech services custom resource with all possible properties and default values.
- License, version, storage class, and scaling properties describes the properties that you use to accept the product license and to specify the product version, storage classes, and scaling for your installation.
- Datastores properties describes the properties that you use to configure the Multicloud Object Gateway and PostgreSQL datastores for your installation.
- Speech microservices, models, and voices properties describes the properties that you use to tailor the installation and configuration of the Speech services microservices, models, and voices to meet your application needs.
- User data and node affinity properties describes the properties that you use to disable the storage and logging of user data and to specify node affinity.
- Configure autoscaling (HPA) for gateway resource describes the gateway size and autoscaling configuration that you can add to enable autoscaling for Watson Gateway, which is one of the dependencies of Watson Speech services.
- Configure Speech transcript enrichment provides information about enabling the enrichment features to improve the readability and usability of raw Automatic Speech Recognition (ASR) transcripts.
- Configure bucketSuffix when namespace name exceeds maximum length provides information about how to use bucketSuffix when the namespace name exceeds the maximum length.
Usage notes
Keep in mind the following usage notes when installing the Watson Speech services:
- Speech to Text models (sttModels) and Text to Speech voices (ttsVoices) are installed only if their corresponding runtimes (sttRuntime and ttsRuntime) are installed.
- You cannot uninstall individual microservices after they are installed. Changing their values in the custom resource from true to false has no effect. To remove any of the following microservices, you must uninstall the Watson Speech services in their entirety and reinstall only the microservices that you need: Speech to Text runtime (sttRuntime), Speech to Text asynchronous HTTP (sttAsync), Speech to Text customization (sttCustomization), Text to Speech runtime (ttsRuntime), and Text to Speech customization (ttsCustomization).
- You can change any of the other values later by editing the custom resource that is created during installation. For more information, see Editing the custom resource.
After installing the Watson Speech services, change aspects of the installation and configuration only if needed. The default values are sufficient for most users.
Editing the custom resource
You can edit the custom resource to modify many aspects of your Watson Speech services installation and configuration. To modify the custom resource, enter the following command:
oc edit watsonspeech ${CUSTOM_RESOURCE_SPEECH}
The command opens the most recent custom resource in your editor. Modify the custom resource for the operation you are performing, then save the custom resource and exit your editor.
The Watson Speech operator picks up the changes to the custom resource on its next reconciliation loop, which is a periodic process that it runs to ensure that your installation reflects the latest custom resource. It might take up to 20 minutes for the operator to pick up the latest changes. It then takes more time for your installation to be updated with the changes.
Full custom resource
The complete custom resource for the Speech services includes all of the properties that you can include with the Speech custom resource. This full version of the custom resource installs both runtimes for Watson Speech services. It shows the default values for all properties.
apiVersion: speech.watson.ibm.com/v1
kind: WatsonSpeech
metadata:
  name: speech-cr # The recommended name of the custom resource
  namespace: ${PROJECT_CPD_INST_OPERANDS} # The project (namespace) name where you plan to install the Speech services
spec:
  license:
    accept: true
  version: 5.4.0 # Omit this property to always install the latest version
  ##################
  # Storage classes
  ##################
  blockStorageClass: "portworx-db-gp3-sc" # The block storage class, for example, "portworx-db-gp3-sc"
  fileStorageClass: "portworx-shared-gp3" # The file storage class, for example, "portworx-shared-gp3"
  ########################
  # Configuration scaling
  ########################
  scaleConfig:
    stt:
      size: xsmall # Size of Speech to Text configuration: xsmall, small, medium, large, or custom
    tts:
      size: xsmall # Size of Text to Speech configuration: xsmall, small, medium, large, or custom
  ###############
  # Request CPUs
  ###############
  sttAMPatcher:
    resources:
      requestsCPU: 1
  ################################
  # Speech services microservices
  ################################
  tags:
    sttRuntime: true # Enables the Speech to Text runtime microservice
    sttAsync: false # Enables the Speech to Text asynchronous HTTP microservice
    sttCustomization: false # Enables the Speech to Text customization microservice
    ttsRuntime: true # Enables the Text to Speech runtime microservice
    ttsCustomization: false # Enables the Text to Speech customization microservice
  #############
  # Datastores
  #############
  global:
    datastores:
      ##################################
      # Multicloud Gateway object store
      ##################################
      s3:
        # Secrets
        authSecretName: "noobaa-account-watson-speech"
      ###########################
      # The PostgreSQL datastore
      ###########################
      postgressql:
        # Sizing configuration
        replicas: 3 # Number of replica nodes for PostgreSQL.
        databaseMemoryLimit: 5Gi # Maximum memory that PostgreSQL can use.
        databaseMemoryRequest: 1Gi # Default memory requested by PostgreSQL.
        databaseCPULimit: 1000m # Maximum CPU that PostgreSQL can use.
        databaseCPU: 500m # Default CPU requested by PostgreSQL.
        databaseStorageRequest: 5Gi # Maximum size of a PostgreSQL database storage request.
        # Storage configuration
        blockStorageClass: "{{ blockStorageClass }}" # Storage class that is used by PostgreSQL.
        # Secrets
        createSecret: true # True: Speech operator generates a secret.
                           # False: User provides existing secret via authSecretName property.
        authSecretName: "<speech-cr>-postgres-auth-secret" # Name of PostgreSQL secrets object.
    #########################
    certificate:
      duration: 26280h
      renewBefore: 2160h
    ############################################
    # Speech to Text previous-generation models
    ############################################
    defaultSTTModel: en-US_BroadbandModel # Default model for speech recognition
    sttModels:
      enUsBroadbandModel: # US English (en-US) Broadband model
        enabled: true
      enUsNarrowbandModel: # US English (en-US) Narrowband model
        enabled: true
      enUsShortFormNarrowbandModel: # US English (en-US) Short-Form Narrowband model
        enabled: true
      arMsBroadbandModel: # Modern Standard Arabic (ar-MS) Broadband model
        enabled: false
      deDeBroadbandModel: # German (de-DE) Broadband model
        enabled: false
      deDeNarrowbandModel: # German (de-DE) Narrowband model
        enabled: false
      enAuBroadbandModel: # Australian English (en-AU) Broadband model
        enabled: false
      enAuNarrowbandModel: # Australian English (en-AU) Narrowband model
        enabled: false
      enGbBroadbandModel: # UK English (en-GB) Broadband model
        enabled: false
      enGbNarrowbandModel: # UK English (en-GB) Narrowband model
        enabled: false
      esEsBroadbandModel: # Castilian Spanish (es-ES, es-AR, es-CL, es-CO, es-MX, and es-PE) Broadband models
        enabled: false
      esEsNarrowbandModel: # Castilian Spanish (es-ES, es-AR, es-CL, es-CO, es-MX, and es-PE) Narrowband models
        enabled: false
      frCaBroadbandModel: # Canadian French (fr-CA) Broadband model
        enabled: false
      frCaNarrowbandModel: # Canadian French (fr-CA) Narrowband model
        enabled: false
      frFrBroadbandModel: # French (fr-FR) Broadband model
        enabled: false
      frFrNarrowbandModel: # French (fr-FR) Narrowband model
        enabled: false
      itItBroadbandModel: # Italian (it-IT) Broadband model
        enabled: false
      itItNarrowbandModel: # Italian (it-IT) Narrowband model
        enabled: false
      jaJpBroadbandModel: # Japanese (ja-JP) Broadband model
        enabled: false
      jaJpNarrowbandModel: # Japanese (ja-JP) Narrowband model
        enabled: false
      koKrBroadbandModel: # Korean (ko-KR) Broadband model
        enabled: false
      koKrNarrowbandModel: # Korean (ko-KR) Narrowband model
        enabled: false
      nlNlBroadbandModel: # Dutch (nl-NL) Broadband model
        enabled: false
      nlNlNarrowbandModel: # Dutch (nl-NL) Narrowband model
        enabled: false
      ptBrBroadbandModel: # Brazilian Portuguese (pt-BR) Broadband model
        enabled: false
      ptBrNarrowbandModel: # Brazilian Portuguese (pt-BR) Narrowband model
        enabled: false
      zhCnBroadbandModel: # Mandarin Chinese (zh-CN) Broadband model
        enabled: false
      zhCnNarrowbandModel: # Mandarin Chinese (zh-CN) Narrowband model
        enabled: false
      ########################################
      # Speech to Text next-generation models
      ########################################
      enUsMultimedia: # US English (en-US) Multimedia model
        enabled: true
      enUsTelephony: # US English (en-US) Telephony model
        enabled: true
      arMsTelephony: # Modern Standard Arabic (ar-MS) Telephony model
        enabled: false
      csCZTelephony: # Czech (cs-CZ) Telephony model
        enabled: false
      deDeMultimedia: # German (de-DE) Multimedia model
        enabled: false
      deDeTelephony: # German (de-DE) Telephony model
        enabled: false
      enAuMultimedia: # Australian English (en-AU) Multimedia model
        enabled: false
      enAuTelephony: # Australian English (en-AU) Telephony model
        enabled: false
      enGbMultimedia: # UK English (en-GB) Multimedia model
        enabled: false
      enGbTelephony: # UK English (en-GB) Telephony model
        enabled: false
      enInTelephony: # Indian English (en-IN) Telephony model
        enabled: false
      enWwMedicalTelephony: # English (all supported dialects) Medical Telephony model
        enabled: false
      esEsMultimedia: # Castilian Spanish (es-ES) Multimedia model
        enabled: false
      esEsTelephony: # Castilian Spanish (es-ES) Telephony model
        enabled: false
      esLaTelephony: # Latin American Spanish (es-LA) Telephony model
        enabled: false
      frCaMultimedia: # Canadian French (fr-CA) Multimedia model
        enabled: false
      frCaTelephony: # Canadian French (fr-CA) Telephony model
        enabled: false
      frFrMultimedia: # French (fr-FR) Multimedia model
        enabled: false
      frFrTelephony: # French (fr-FR) Telephony model
        enabled: false
      hiInTelephony: # Indian Hindi (hi-IN) Telephony model
        enabled: false
      itItMultimedia: # Italian (it-IT) Multimedia model
        enabled: false
      itItTelephony: # Italian (it-IT) Telephony model
        enabled: false
      jaJpMultimedia: # Japanese (ja-JP) Multimedia model
        enabled: false
      jaJpTelephony: # Japanese (ja-JP) Telephony model
        enabled: false
      koKrMultimedia: # Korean (ko-KR) Multimedia model
        enabled: false
      koKrTelephony: # Korean (ko-KR) Telephony model
        enabled: false
      nlBeTelephony: # Belgian Dutch (nl-BE) Telephony model
        enabled: false
      nlNlMultimedia: # Netherlands Dutch (nl-NL) Multimedia model
        enabled: false
      nlNlTelephony: # Netherlands Dutch (nl-NL) Telephony model
        enabled: false
      ptBrMultimedia: # Brazilian Portuguese (pt-BR) Multimedia model
        enabled: false
      ptBrTelephony: # Brazilian Portuguese (pt-BR) Telephony model
        enabled: false
      svSeTelephony: # Swedish (sv-SE) Telephony model
        enabled: false
      zhCnTelephony: # Mandarin Chinese (zh-CN) Telephony model
        enabled: false
      ########################################
      # Speech to Text large speech models
      ########################################
      deDe: # German (de-DE) model
        enabled: false
      enUs: # US English (en-US) model
        enabled: false
      enGb: # UK English (en-GB) model
        enabled: false
      enAu: # Australian English (en-AU) model
        enabled: false
      enIn: # Indian English (en-IN) model
        enabled: false
      frFr: # French (fr-FR) model
        enabled: false
      frCa: # Canadian French (fr-CA) model
        enabled: false
      jaJp: # Japanese (ja-JP) model
        enabled: false
      ptPt: # Portugal Portuguese (pt-PT) model
        enabled: false
      ptBr: # Brazilian Portuguese (pt-BR) model
        enabled: false
      esEs: # Castilian Spanish (es-ES) model
        enabled: false
      esAr: # Argentinian Spanish (es-AR) model
        enabled: false
      esCl: # Chilean Spanish (es-CL) model
        enabled: false
      esCo: # Colombian Spanish (es-CO) model
        enabled: false
      esMx: # Mexican Spanish (es-MX) model
        enabled: false
      esPe: # Peruvian Spanish (es-PE) model
        enabled: false
    ########################################
    # Text to Speech enhanced neural voices
    ########################################
    defaultTTSVoice: en-US_MichaelV3Voice # Default voice for speech synthesis
    ttsVoices:
      enUSAllisonV3Voice: # US English (en-US) Allison enhanced neural voice
        enabled: true
      enUSLisaV3Voice: # US English (en-US) Lisa enhanced neural voice
        enabled: true
      enUSMichaelV3Voice: # US English (en-US) Michael enhanced neural voice
        enabled: true
      enUSEmilyV3Voice: # US English (en-US) Emily enhanced neural voice
        enabled: false
      enUSHenryV3Voice: # US English (en-US) Henry enhanced neural voice
        enabled: false
      enUSKevinV3Voice: # US English (en-US) Kevin enhanced neural voice
        enabled: false
      enUSOliviaV3Voice: # US English (en-US) Olivia enhanced neural voice
        enabled: false
      deDEBirgitV3Voice: # German (de-DE) Birgit enhanced neural voice
        enabled: false
      deDEDieterV3Voice: # German (de-DE) Dieter enhanced neural voice
        enabled: false
      deDEErikaV3Voice: # German (de-DE) Erika enhanced neural voice
        enabled: false
      enGBCharlotteV3Voice: # UK English (en-GB) Charlotte enhanced neural voice
        enabled: false
      enGBJamesV3Voice: # UK English (en-GB) James enhanced neural voice
        enabled: false
      enGBKateV3Voice: # UK English (en-GB) Kate enhanced neural voice
        enabled: false
      esESEnriqueV3Voice: # Castilian Spanish (es-ES) Enrique enhanced neural voice
        enabled: false
      esESLauraV3Voice: # Castilian Spanish (es-ES) Laura enhanced neural voice
        enabled: false
      esLASofiaV3Voice: # Latin American Spanish (es-LA) Sofia enhanced neural voice
        enabled: false
      esUSSofiaV3Voice: # North American Spanish (es-US) Sofia enhanced neural voice
        enabled: false
      frCALouiseV3Voice: # French Canadian (fr-CA) Louise enhanced neural voice
        enabled: false
      frFRNicolasV3Voice: # French (fr-FR) Nicolas enhanced neural voice
        enabled: false
      frFRReneeV3Voice: # French (fr-FR) Renee enhanced neural voice
        enabled: false
      itITFrancescaV3Voice: # Italian (it-IT) Francesca enhanced neural voice
        enabled: false
      jaJPEmiV3Voice: # Japanese (ja-JP) Emi enhanced neural voice
        enabled: false
      koKRJinV3Voice: # Korean (ko-KR) Jin enhanced neural voice
        enabled: false
      nlNLMerelV3Voice: # Netherlands Dutch (nl-NL) Merel enhanced neural voice
        enabled: false
      ptBRIsabelaV3Voice: # Brazilian Portuguese (pt-BR) Isabela enhanced neural voice
        enabled: false
      ##########################################
      # Text to Speech expressive neural voices
      ##########################################
      enAUHeidiExpressive: # Australian English (en-AU) Heidi expressive neural voice
        enabled: false
      enAUJackExpressive: # Australian English (en-AU) Jack expressive neural voice
        enabled: false
      enUSAllisonExpressive: # US English (en-US) Allison expressive neural voice
        enabled: false
      enUSEmmaExpressive: # US English (en-US) Emma expressive neural voice
        enabled: false
      enUSLisaExpressive: # US English (en-US) Lisa expressive neural voice
        enabled: false
      enUSMichaelExpressive: # US English (en-US) Michael expressive neural voice
        enabled: false
      enGBGeorgeExpressive: # UK English (en-GB) George expressive neural voice
        enabled: false
      ptBRLucasExpressive: # Brazilian Portuguese (pt-BR) Lucas expressive neural voice
        enabled: false
      ##########################################
      # Text to Speech natural voices
      ##########################################
      enAUHeidiNatural: # Australian English (en-AU) Heidi natural voice
        enabled: false
      enAUJackNatural: # Australian English (en-AU) Jack natural voice
        enabled: false
      enCAHannahNatural: # Canadian English (en-CA) Hannah natural voice
        enabled: false
      enGBChloeNatural: # UK English (en-GB) Chloe natural voice
        enabled: false
      enGBGeorgeNatural: # UK English (en-GB) George natural voice
        enabled: false
      enUSEllieNatural: # US English (en-US) Ellie natural voice
        enabled: false
      enUSEmmaNatural: # US English (en-US) Emma natural voice
        enabled: false
      enUSEthanNatural: # US English (en-US) Ethan natural voice
        enabled: false
      enUSJacksonNatural: # US English (en-US) Jackson natural voice
        enabled: false
      enUSVictoriaNatural: # US English (en-US) Victoria natural voice
        enabled: false
      esLAAlejandroNatural: # Latin American Spanish (es-LA) Alejandro natural voice
        enabled: false
      esLADanielaNatural: # Latin American Spanish (es-LA) Daniela natural voice
        enabled: false
      ptBRLucasNatural: # Brazilian Portuguese (pt-BR) Lucas natural voice
        enabled: false
      ptBRCamilaNatural: # Brazilian Portuguese (pt-BR) Camila natural voice
        enabled: false
  #####################
  # Backup and restore
  #####################
  backupRestore:
    components:
      backup:
        name: "backup"
        backupVolume: "{{ releaseName }}-aux-s3-backup-pvc"
        volumeSize: 10Gi
  ###################################
  # Storage and logging of user data
  ###################################
  sttRuntime:
    skipAudioAndResultLogging: "true" # If true, disables storage and logging of user data
  sttAMPatcher:
    skipAudioAndResultLogging: "true" # If true, disables storage and logging of user data
  ttsRuntime:
    skipAudioAndResultLogging: "true" # If true, disables storage and logging of user data
  ################
  # Node affinity
  ################
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: beta.kubernetes.io/arch
            operator: In
            values:
            - amd64
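The full custom resource is long because it lists every property with its default value. As a sketch, assuming that any property you omit falls back to the default shown in the full custom resource (an assumption, not a documented guarantee), a minimal custom resource that accepts the license, sets the storage classes, and otherwise keeps the defaults might look like the following. The name speech-cr and the Portworx storage classes are taken from the full example; substitute your own values.

```yaml
apiVersion: speech.watson.ibm.com/v1
kind: WatsonSpeech
metadata:
  name: speech-cr                          # Recommended name of the custom resource
  namespace: ${PROJECT_CPD_INST_OPERANDS}  # Project (namespace) where the Speech services are installed
spec:
  license:
    accept: true                           # Required; the installation fails if this is false
  # version is omitted, so the operator installs, and later upgrades to, the latest version
  blockStorageClass: "portworx-db-gp3-sc"
  fileStorageClass: "portworx-shared-gp3"
  # All other properties are assumed to take the defaults shown in the full custom resource
```

Because the version property is left out, this sketch opts in to automatic upgrades; add version back if you want to control when upgrades happen.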
License, version, storage class, and scaling properties
The Speech custom resource provides properties that you use to accept the product license and to specify the product version, storage class, and scaling for your installation:
- Acknowledging license acceptance
- Specifying a version
- Specifying storage classes
- Scaling your installation
- Specifying request CPUs
Acknowledging license acceptance
To install the Speech services, you must read, understand, and accept the license terms. You are strongly encouraged to read the full terms of the license agreement.
You must set the license accept property to true. The installation fails if you set the property to false.
license:
  accept: true
Specifying a version
The version property specifies the version of the Speech services that the operator is to install. The property's value is provided by IBM. Future releases of the Speech services will increment the version to greater values.
version: 5.4.0 # Omit this property to always install the latest version
The version property decouples the version of the Speech services (the operand) from the version of the Speech operator. The Speech operator can handle multiple versions of the Speech services, so you can continue to run older versions of the Speech services on newer versions of the operator and upgrade to later versions of the Speech services at your discretion.
Alternatively, you can omit the version property entirely from the custom resource. Doing so lets the operator manage the version of your Speech services (operand) automatically. If you omit the version property, the Watson Speech operator always installs the latest available version of the Speech services. It also automatically upgrades your installation to the latest version of the Speech services as new versions become available.
Specifying storage classes
The blockStorageClass and fileStorageClass properties specify the persistent storage that the Speech services are to use. The following example uses Portworx by specifying the storage classes "portworx-db-gp3-sc" and "portworx-shared-gp3":
##################
# Storage classes
##################
blockStorageClass: "portworx-db-gp3-sc" # The block storage class, for example, "portworx-db-gp3-sc"
fileStorageClass: "portworx-shared-gp3" # The file storage class, for example, "portworx-shared-gp3"
For more information about the available block and file storage classes for the persistent storage for Watson Speech services, see:
- Storage requirements in Information you need to complete this task
- Configuring persistent storage for Cloud Pak for Data
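If your cluster uses OpenShift Data Foundation rather than Portworx, the same two properties take the Data Foundation storage class names instead. The names below are the usual Data Foundation defaults (ocs-storagecluster-ceph-rbd for block storage and ocs-storagecluster-cephfs for file storage); treat them as an assumption and confirm the classes that actually exist in your cluster, for example with oc get storageclass.

```yaml
##################
# Storage classes
##################
blockStorageClass: "ocs-storagecluster-ceph-rbd"  # Assumed OpenShift Data Foundation block storage class
fileStorageClass: "ocs-storagecluster-cephfs"     # Assumed OpenShift Data Foundation file storage class
```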
Scaling your installation
The scaleConfig property specifies the size of the installation for the Speech services. Possible sizes include xsmall, small, medium, large, and custom. By default, the services use the following xsmall values:
########################
# Configuration scaling
########################
scaleConfig:
  stt:
    size: xsmall # Size of Speech to Text configuration: xsmall, small, medium, large, or custom
  tts:
    size: xsmall # Size of Text to Speech configuration: xsmall, small, medium, large, or custom
The values for the Speech to Text and Text to Speech services are independent. You can specify different values for the size properties. For example, the following values increase the size of the Speech to Text configuration but leave the Text to Speech configuration unchanged:
########################
# Configuration scaling
########################
scaleConfig:
  stt:
    size: medium
  tts:
    size: xsmall
You can start by using the default values and scale your configuration as your usage grows. During installation, these properties are specified by the watson_speech_stt_scale_config and watson_speech_tts_scale_config options.
- For more information about scaling your configuration, see Scaling up your Watson Speech services installation.
Specifying request CPUs
The sttAMPatcher microservice manages acoustic model customization for the Speech to Text service. It is automatically installed with the sttCustomization microservice.
The AM Patcher uses a dedicated number of CPUs to handle requests. The sttAMPatcher.resources.requestsCPU property specifies the number of CPUs that are dedicated to handling acoustic model training requests by the AM Patcher microservice.
###############
# Request CPUs
###############
sttAMPatcher:
  resources:
    requestsCPU: 1
If training of custom acoustic models fails with the following message, increase the value of the requestsCPU property from 1 to 5:
Unresponsive backend detected. Please try later.
Allocating more resources prevents this error and enables custom acoustic models to be trained as expected. Increasing the value of the property increases the size of the deployment. For more information, see Training of custom acoustic models is failing.
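Following the guidance above, the sttAMPatcher fragment after the increase would look like the following sketch. The value 5 comes from the recommendation in the text; because it enlarges the AM Patcher deployment, check that your cluster has spare CPU capacity before you apply it.

```yaml
###############
# Request CPUs
###############
sttAMPatcher:
  resources:
    requestsCPU: 5  # Raised from the default of 1 to avoid "Unresponsive backend detected" training failures
```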
Datastores properties
The Speech custom resource provides properties that you can use to configure the Multicloud Object Gateway and PostgreSQL datastores for your installation:
- Configuring the Multicloud Object Gateway
- Configuring the PostgreSQL datastore
The individual datastores are used by the following microservices:
- The Multicloud Object Gateway must be installed before you install the Watson Speech services. The Speech runtime microservices depend on it, as does sttCustomization. For installation instructions, see Installing Multicloud Object Gateway for IBM Cloud Pak for Data.
- The PostgreSQL datastore is installed only if at least one of the following microservices is enabled: sttAsync, sttCustomization, or ttsCustomization. If the dependent microservices are disabled at a later date, PostgreSQL remains installed but is not used.
Note: Prior to version 4.6.0, PostgreSQL was always installed with the Speech services. If you are an existing user who enabled only the runtime microservices, PostgreSQL remains installed but is not used. In this case, PostgreSQL also remains installed across upgrades.
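For example, the following tags fragment is a sketch that changes only sttAsync from its default. Because sttAsync is one of the microservices that depends on PostgreSQL, enabling it should also cause the operator to install the PostgreSQL datastore, sized according to the postgressql properties beneath global.datastores.

```yaml
tags:
  sttRuntime: true         # Base Speech to Text runtime; required by sttAsync
  sttAsync: true           # Enables /v1/recognitions; this microservice requires PostgreSQL
  sttCustomization: false
  ttsRuntime: true
  ttsCustomization: false
```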
Configuring the Multicloud Object Gateway
The Multicloud Object Gateway is an object storage solution that contains the following stateful data for the Speech services:
- For Speech to Text and Text to Speech, models and voices that are installed with the services.
- For Speech to Text, transcription results while a speech recognition job is in progress.
- For Speech to Text, binary patches for trained models, grammars and corpora for custom language models, and audio files for custom acoustic models.
You can use the following properties to configure Multicloud Object Gateway for your installation. The definition shows the default configuration values for the datastore's properties. These properties are indented beneath global.datastores.
The secret that is named by the authSecretName property is created as described in Creating secrets for services that use Multicloud Object Gateway.
##################################
# Multicloud Gateway object store
##################################
s3:
  # Secrets
  authSecretName: "noobaa-account-watson-speech"
Configuring the PostgreSQL datastore
PostgreSQL is an open-source relational database that contains the stateful data for the following Speech microservices:
- For Speech to Text, custom language models and custom acoustic models.
- For Speech to Text, all asynchronous HTTP jobs for the past week. Entries older than one week are automatically purged.
- For Text to Speech, custom models and speaker models.
You can use the following properties to configure the PostgreSQL datastore for your installation. The definition shows the default configuration values for the datastore's properties. These properties are indented beneath global.datastores.
The value {{ blockStorageClass }} is replaced by the value that is specified for the spec.blockStorageClass property. In the full custom resource shown earlier, the value is portworx-db-gp3-sc.
###########################
# The PostgreSQL datastore
###########################
postgressql:
  # Sizing configuration
  replicas: 3 # Number of replica nodes for PostgreSQL.
  databaseMemoryLimit: 5Gi # Maximum memory that PostgreSQL can use.
  databaseMemoryRequest: 1Gi # Default memory requested by PostgreSQL.
  databaseCPULimit: 1000m # Maximum CPU that PostgreSQL can use.
  databaseCPU: 500m # Default CPU requested by PostgreSQL.
  databaseStorageRequest: 5Gi # Maximum size of a PostgreSQL database storage request.
  enablePodMonitor: false # Enables the PostgreSQL operator to monitor the PostgreSQL pods.
  # Storage configuration
  blockStorageClass: "{{ blockStorageClass }}" # Storage class that is used by PostgreSQL.
  # Secrets
  authSecretName: "<speech-cr>-postgres-auth-secret" # Name of PostgreSQL secrets object.
- If you plan to use a user-provided secret for the PostgreSQL datastore, see Creating a secrets object for your PostgreSQL datastore and Updating secrets objects for your datastores after you install the Speech services.
- For more information about scaling up the number of replicas for the datastore, see Scaling up the PostgreSQL datastore.
- For more information about monitoring the PostgreSQL datastore, see Monitoring the PostgreSQL datastore for Watson Speech services.
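The full custom resource includes a createSecret property that controls whether the Speech operator generates the PostgreSQL secret. A sketch of a configuration that uses your own secret instead might look like the following. The secret name my-speech-postgres-secret is hypothetical; it must match a secrets object that you create beforehand, as described in Creating a secrets object for your PostgreSQL datastore. The fragment is indented beneath spec.

```yaml
global:
  datastores:
    postgressql:
      createSecret: false                          # Do not let the Speech operator generate a secret
      authSecretName: "my-speech-postgres-secret"  # Hypothetical name of your existing secret
```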
Speech microservices, models, and voices properties
The Speech custom resource provides a rich set of properties that you can use to tailor the installation and configuration of the Speech services to meet your application needs. These properties specify the Speech services microservices, models, and voices that are to be installed.
- Installing Speech services microservices
- Installing Speech to Text models
- Installing Text to Speech voices
Installing Speech services microservices
You use these five properties to specify the Speech microservices to install. The properties specify the functionality that is available for the Speech services. By default, the runtimes are enabled, but you can enable or disable them separately. The following example installs the Speech to Text and Text to Speech runtimes:
################################
# Speech services microservices
################################
tags:
  sttRuntime: true # Enables the Speech to Text runtime microservice
  sttAsync: false # Enables the Speech to Text asynchronous HTTP microservice
  sttCustomization: false # Enables the Speech to Text customization microservice
  ttsRuntime: true # Enables the Text to Speech runtime microservice
  ttsCustomization: false # Enables the Text to Speech customization microservice
The properties provide the following service capabilities:
sttRuntime
Speech to Text runtime, the base microservice for speech recognition. This value enables the /v1/recognize interfaces (synchronous HTTP and WebSocket). Enabling either of the other Speech to Text microservices automatically enables the Speech to Text runtime. Speech to Text models (stt-models) are installed only if the runtime is installed. During installation, this property is specified by the watson_speech_enable_stt_runtime option. For more information, see The synchronous HTTP interface and The WebSocket interface.
sttAsync
Speech to Text asynchronous HTTP. This value enables the /v1/recognitions interface. During installation, this property is specified by the watson_speech_enable_stt_async option. For more information, see The asynchronous HTTP interface.
sttCustomization
Speech to Text customization. This value enables the /v1/customizations and /v1/acoustic_customizations interfaces for language model and acoustic model customization. (The sttAMPatcher microservice, the backend microservice for acoustic model customization, is automatically installed with the sttCustomization microservice.) During installation, this property is specified by the watson_speech_enable_stt_customization option. For more information, see Understanding customization.
ttsRuntime
Text to Speech runtime, the base microservice for speech synthesis. This value enables the /v1/synthesize interfaces (HTTP and WebSocket). Enabling the Text to Speech customization microservice automatically enables the Text to Speech runtime. Text to Speech voices (tts-voices) are installed only if the runtime is installed. During installation, this property is specified by the watson_speech_enable_tts_runtime option. For more information, see The HTTP interface and The WebSocket interface.
ttsCustomization
Text to Speech customization. This value enables the /v1/customizations interface for customization. During installation, this property is specified by the watson_speech_enable_tts_customization option. For more information, see Understanding customization.
You can combine installation of the different microservices to install the following functionality:
- To install Speech to Text only, set ttsRuntime and ttsCustomization to false.
- To install Text to Speech only, set sttRuntime, sttAsync, and sttCustomization to false.
- To install both Speech to Text and Text to Speech without enabling customization, set sttCustomization and ttsCustomization to false.
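As an illustration of the first combination, the following tags fragment is a sketch that installs Speech to Text only, with asynchronous HTTP and customization enabled. Because individual microservices cannot be uninstalled later, settle on this set before you install.

```yaml
tags:
  sttRuntime: true         # Base Speech to Text microservice (/v1/recognize)
  sttAsync: true           # Asynchronous HTTP (/v1/recognitions)
  sttCustomization: true   # Language and acoustic model customization; also installs sttAMPatcher
  ttsRuntime: false        # Text to Speech is not installed
  ttsCustomization: false
```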
Installing Speech to Text models
You can use the following properties to specify the models to install. Installing all models substantially increases the memory requirements. You are therefore strongly encouraged to install only those models that you intend to use.
When choosing between previous- and next-generation models, prefer next-generation models where possible. Next-generation models offer improved speech recognition over previous-generation models. They also require fewer resources (CPU and memory) than previous-generation models. By default, the dynamic resource calculation feature automatically computes the exact amount of memory that is required for the selected models.
The defaultSTTModel property sets the default model for speech recognition. If you install and use models in languages other than US English, set the default to the model that you expect to use most often.
Speech to Text models are installed only if the sttRuntime microservice is installed. To install a model, set its enabled property to true in the custom resource. Set the property to false to indicate that the model is not to be installed. The following properties show the default values. By default, only the US English previous- and next-generation models are installed. These properties are indented beneath global. All available models are indented beneath sttModels. During installation, Speech to Text models are specified by the watson_speech_models option.
- For more information about all available previous-generation models, see Previous-generation languages and models.
- For more information about all available next-generation models, see Next-generation languages and models.
- You can change which models are installed at any time. For more information, see Updating models and voices for your Watson Speech services.
Important: Effective 31 July 2023, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. For more information, see the 22 February 2023 (version 4.6.3) service update in the Release notes for Speech to Text for IBM Cloud Pak for Data.
############################################
# Speech to Text previous-generation models
############################################
defaultSTTModel: en-US_BroadbandModel # Default model for speech recognition
sttModels:
  enUsBroadbandModel: # US English (en-US) Broadband model
    enabled: true
  enUsNarrowbandModel: # US English (en-US) Narrowband model
    enabled: true
  enUsShortFormNarrowbandModel: # US English (en-US) Short-Form Narrowband model
    enabled: true
  arMsBroadbandModel: # Modern Standard Arabic (ar-MS) Broadband model
    enabled: false
  deDeBroadbandModel: # German (de-DE) Broadband model
    enabled: false
  deDeNarrowbandModel: # German (de-DE) Narrowband model
    enabled: false
  enAuBroadbandModel: # Australian English (en-AU) Broadband model
    enabled: false
  enAuNarrowbandModel: # Australian English (en-AU) Narrowband model
    enabled: false
  enGbBroadbandModel: # UK English (en-GB) Broadband model
    enabled: false
  enGbNarrowbandModel: # UK English (en-GB) Narrowband model
    enabled: false
  esEsBroadbandModel: # Castilian Spanish (es-ES, es-AR, es-CL, es-CO, es-MX, and es-PE) Broadband models
    enabled: false
  esEsNarrowbandModel: # Castilian Spanish (es-ES, es-AR, es-CL, es-CO, es-MX, and es-PE) Narrowband models
    enabled: false
  frCaBroadbandModel: # Canadian French (fr-CA) Broadband model
    enabled: false
  frCaNarrowbandModel: # Canadian French (fr-CA) Narrowband model
    enabled: false
  frFrBroadbandModel: # French (fr-FR) Broadband model
    enabled: false
  frFrNarrowbandModel: # French (fr-FR) Narrowband model
    enabled: false
  itItBroadbandModel: # Italian (it-IT) Broadband model
    enabled: false
  itItNarrowbandModel: # Italian (it-IT) Narrowband model
    enabled: false
  jaJpBroadbandModel: # Japanese (ja-JP) Broadband model
    enabled: false
  jaJpNarrowbandModel: # Japanese (ja-JP) Narrowband model
    enabled: false
  koKrBroadbandModel: # Korean (ko-KR) Broadband model
    enabled: false
  koKrNarrowbandModel: # Korean (ko-KR) Narrowband model
    enabled: false
  nlNlBroadbandModel: # Dutch (nl-NL) Broadband model
    enabled: false
  nlNlNarrowbandModel: # Dutch (nl-NL) Narrowband model
    enabled: false
  ptBrBroadbandModel: # Brazilian Portuguese (pt-BR) Broadband model
    enabled: false
  ptBrNarrowbandModel: # Brazilian Portuguese (pt-BR) Narrowband model
    enabled: false
  zhCnBroadbandModel: # Mandarin Chinese (zh-CN) Broadband model
    enabled: false
  zhCnNarrowbandModel: # Mandarin Chinese (zh-CN) Narrowband model
    enabled: false
  ########################################
  # Speech to Text next-generation models
  ########################################
  enUsMultimedia: # US English (en-US) Multimedia model
    enabled: true
  enUsTelephony: # US English (en-US) Telephony model
    enabled: true
  arMsTelephony: # Modern Standard Arabic (ar-MS) Telephony model
    enabled: false
  csCZTelephony: # Czech (cs-CZ) Telephony model
    enabled: false
  deDeMultimedia: # German (de-DE) Multimedia model
    enabled: false
  deDeTelephony: # German (de-DE) Telephony model
    enabled: false
  enAuMultimedia: # Australian English (en-AU) Multimedia model
    enabled: false
  enAuTelephony: # Australian English (en-AU) Telephony model
    enabled: false
  enGbMultimedia: # UK English (en-GB) Multimedia model
    enabled: false
  enGbTelephony: # UK English (en-GB) Telephony model
    enabled: false
  enInTelephony: # Indian English (en-IN) Telephony model
    enabled: false
  enWwMedicalTelephony: # English (all supported dialects) Medical Telephony model
    enabled: false
  esEsMultimedia: # Castilian Spanish (es-ES) Multimedia model
    enabled: false
  esEsTelephony: # Castilian Spanish (es-ES) Telephony model
    enabled: false
  esLaTelephony: # Latin American Spanish (es-LA) Telephony model
    enabled: false
  frCaMultimedia: # Canadian French (fr-CA) Multimedia model
    enabled: false
  frCaTelephony: # Canadian French (fr-CA) Telephony model
    enabled: false
  frFrMultimedia: # French (fr-FR) Multimedia model
    enabled: false
  frFrTelephony: # French (fr-FR) Telephony model
    enabled: false
  hiInTelephony: # Indian Hindi (hi-IN) Telephony model
    enabled: false
  itItMultimedia: # Italian (it-IT) Multimedia model
    enabled: false
  itItTelephony: # Italian (it-IT) Telephony model
    enabled: false
  jaJpMultimedia: # Japanese (ja-JP) Multimedia model
    enabled: false
  jaJpTelephony: # Japanese (ja-JP) Telephony model
    enabled: false
  koKrMultimedia: # Korean (ko-KR) Multimedia model
    enabled: false
  koKrTelephony: # Korean (ko-KR) Telephony model
    enabled: false
  nlBeTelephony: # Belgian Dutch (nl-BE) Telephony model
    enabled: false
  nlNlMultimedia: # Netherlands Dutch (nl-NL) Multimedia model
    enabled: false
  nlNlTelephony: # Netherlands Dutch (nl-NL) Telephony model
    enabled: false
  ptBrMultimedia: # Brazilian Portuguese (pt-BR) Multimedia model
    enabled: false
  ptBrTelephony: # Brazilian Portuguese (pt-BR) Telephony model
    enabled: false
  svSeTelephony: # Swedish (sv-SE) Telephony model
    enabled: false
  zhCnTelephony: # Mandarin Chinese (zh-CN) Telephony model
    enabled: false
  ####################################
  # Speech to Text large speech models
  ####################################
  deDe: # German (de-DE) model
    enabled: false
  enUs: # US English (en-US) model
    enabled: false
  enGb: # UK English (en-GB) model
    enabled: false
  enAu: # Australian English (en-AU) model
    enabled: false
  enIn: # Indian English (en-IN) model
    enabled: false
  frFr: # French (fr-FR) model
    enabled: false
  frCa: # Canadian French (fr-CA) model
    enabled: false
  jaJp: # Japanese (ja-JP) model
    enabled: false
  ptPt: # Portugal Portuguese (pt-PT) model
    enabled: false
  ptBr: # Brazilian Portuguese (pt-BR) model
    enabled: false
  esEs: # Castilian Spanish (es-ES) model
    enabled: false
  esAr: # Argentinian Spanish (es-AR) model
    enabled: false
  esCl: # Chilean Spanish (es-CL) model
    enabled: false
  esCo: # Colombian Spanish (es-CO) model
    enabled: false
  esMx: # Mexican Spanish (es-MX) model
    enabled: false
  esPe: # Peruvian Spanish (es-PE) model
    enabled: false
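Because installing unused models raises memory requirements, it can help to generate the sttModels section from a short list of the models you actually want, so that every other key is written explicitly as disabled and a mistyped key is caught early. The following is a hypothetical sketch: stt_models_patch and AVAILABLE_STT_MODELS are illustrative names, the model list is only an excerpt of the keys above, and placing global under spec is an assumption based on the .spec.global.datastores paths used elsewhere in this page.

```python
"""Hypothetical sketch: build the sttModels section as a JSON merge patch."""
import json

# An excerpt of the model keys listed above; extend it with the keys you need.
AVAILABLE_STT_MODELS = (
    "enUsMultimedia",
    "enUsTelephony",
    "deDeMultimedia",
    "deDeTelephony",
)


def stt_models_patch(to_enable, default_model, available=AVAILABLE_STT_MODELS):
    """Enable only the keys in to_enable and write every other key as disabled."""
    to_enable = set(to_enable)
    unknown = to_enable - set(available)
    if unknown:
        # Catch typos before they reach the custom resource.
        raise ValueError(f"unknown model keys: {sorted(unknown)}")
    models = {key: {"enabled": key in to_enable} for key in available}
    # Assumption: global sits under spec, as in the .spec.global.datastores paths.
    return {"spec": {"global": {"defaultSTTModel": default_model, "sttModels": models}}}


if __name__ == "__main__":
    patch = stt_models_patch({"enUsTelephony", "deDeTelephony"}, "en-US_Telephony")
    print(json.dumps(patch, indent=2))
```

Writing false explicitly for each listed key is deliberate: a merge patch leaves keys it does not mention unchanged, so omitting a key would not turn off a model that is already enabled.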
- Installing Text to Speech voices
You can use the following properties to specify the voices to install. The available voices include enhanced neural voices, expressive neural voices, and natural voices. You must indicate the individual voices that you want to install. To install a voice, set its enabled property to true. Set the property to false to indicate that the voice is not to be installed. The properties in the following example enable installation of the default voices. By default, only a subset of the US English voices is installed.
By default, the dynamic resource calculation feature automatically computes the exact amount of memory that is required for the voices that you select to install. Installing more voices increases the memory requirements for the service. You are therefore encouraged to install only those voices that you intend to use.
The defaultTTSVoice property sets the default voice for speech synthesis. If you install and use voices in languages other than US English, set the default to the voice that you expect to use most often.
Text to Speech voices are installed only if the ttsRuntime microservice is installed. These properties are all indented beneath global. All available voices are indented beneath ttsVoices. During installation, Text to Speech voices are specified by the watson_speech_voices option.
- For more information about all available voices, see Using languages and voices.
- You can change which voices are installed at any time. For more information, see Updating models and voices for your Watson Speech services.
########################################
# Text to Speech enhanced neural voices
########################################
defaultTTSVoice: en-US_MichaelV3Voice # Default voice for speech synthesis
ttsVoices:
  enUSAllisonV3Voice: # US English (en-US) Allison enhanced neural voice
    enabled: true
  enUSLisaV3Voice: # US English (en-US) Lisa enhanced neural voice
    enabled: true
  enUSMichaelV3Voice: # US English (en-US) Michael enhanced neural voice
    enabled: true
  enUSEmilyV3Voice: # US English (en-US) Emily enhanced neural voice
    enabled: false
  enUSHenryV3Voice: # US English (en-US) Henry enhanced neural voice
    enabled: false
  enUSKevinV3Voice: # US English (en-US) Kevin enhanced neural voice
    enabled: false
  enUSOliviaV3Voice: # US English (en-US) Olivia enhanced neural voice
    enabled: false
  deDEBirgitV3Voice: # German (de-DE) Birgit enhanced neural voice
    enabled: false
  deDEDieterV3Voice: # German (de-DE) Dieter enhanced neural voice
    enabled: false
  deDEErikaV3Voice: # German (de-DE) Erika enhanced neural voice
    enabled: false
  enGBCharlotteV3Voice: # UK English (en-GB) Charlotte enhanced neural voice
    enabled: false
  enGBJamesV3Voice: # UK English (en-GB) James enhanced neural voice
    enabled: false
  enGBKateV3Voice: # UK English (en-GB) Kate enhanced neural voice
    enabled: false
  esESEnriqueV3Voice: # Castilian Spanish (es-ES) Enrique enhanced neural voice
    enabled: false
  esESLauraV3Voice: # Castilian Spanish (es-ES) Laura enhanced neural voice
    enabled: false
  esLASofiaV3Voice: # Latin American Spanish (es-LA) Sofia enhanced neural voice
    enabled: false
  esUSSofiaV3Voice: # North American Spanish (es-US) Sofia enhanced neural voice
    enabled: false
  frCALouiseV3Voice: # French Canadian (fr-CA) Louise enhanced neural voice
    enabled: false
  frFRNicolasV3Voice: # French (fr-FR) Nicolas enhanced neural voice
    enabled: false
  frFRReneeV3Voice: # French (fr-FR) Renee enhanced neural voice
    enabled: false
  itITFrancescaV3Voice: # Italian (it-IT) Francesca enhanced neural voice
    enabled: false
  jaJPEmiV3Voice: # Japanese (ja-JP) Emi enhanced neural voice
    enabled: false
  koKRJinV3Voice: # Korean (ko-KR) Jin enhanced neural voice
    enabled: false
  nlNLMerelV3Voice: # Netherlands Dutch (nl-NL) Merel enhanced neural voice
    enabled: false
  ptBRIsabelaV3Voice: # Brazilian Portuguese (pt-BR) Isabela enhanced neural voice
    enabled: false
  ##########################################
  # Text to Speech expressive neural voices
  ##########################################
  enAUHeidiExpressive: # Australian English (en-AU) Heidi expressive neural voice
    enabled: false
  enAUJackExpressive: # Australian English (en-AU) Jack expressive neural voice
    enabled: false
  enUSAllisonExpressive: # US English (en-US) Allison expressive neural voice
    enabled: false
  enUSEmmaExpressive: # US English (en-US) Emma expressive neural voice
    enabled: false
  enUSLisaExpressive: # US English (en-US) Lisa expressive neural voice
    enabled: false
  enUSMichaelExpressive: # US English (en-US) Michael expressive neural voice
    enabled: false
  enGBGeorgeExpressive: # UK English (en-GB) George expressive neural voice
    enabled: false
  ptBRLucasExpressive: # Brazilian Portuguese (pt-BR) Lucas expressive neural voice
    enabled: false
  ##########################################
  # Text to Speech natural voices
  ##########################################
  enAUHeidiNatural: # Australian English (en-AU) Heidi natural voice
    enabled: false
  enAUJackNatural: # Australian English (en-AU) Jack natural voice
    enabled: false
  enCAHannahNatural: # Canadian English (en-CA) Hannah natural voice
    enabled: false
  enGBChloeNatural: # UK English (en-GB) Chloe natural voice
    enabled: false
  enGBGeorgeNatural: # UK English (en-GB) George natural voice
    enabled: false
  enUSEllieNatural: # US English (en-US) Ellie natural voice
    enabled: false
  enUSEmmaNatural: # US English (en-US) Emma natural voice
    enabled: false
  enUSEthanNatural: # US English (en-US) Ethan natural voice
    enabled: false
  enUSJacksonNatural: # US English (en-US) Jackson natural voice
    enabled: false
  enUSVictoriaNatural: # US English (en-US) Victoria natural voice
    enabled: false
  esLAAlejandroNatural: # Latin American Spanish (es-LA) Alejandro natural voice
    enabled: false
  esLADanielaNatural: # Latin American Spanish (es-LA) Daniela natural voice
    enabled: false
  ptBRLucasNatural: # Brazilian Portuguese (pt-BR) Lucas natural voice
    enabled: false
  ptBRCamilaNatural: # Brazilian Portuguese (pt-BR) Camila natural voice
    enabled: false
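If you change defaultTTSVoice, the voice that it names should also be enabled under ttsVoices. The following hypothetical sketch checks that before you apply a change. It relies on an assumption inferred from the example above, not on a documented rule: each ttsVoices key appears to be the voice name with the "-" and "_" characters removed (en-US_MichaelV3Voice becomes enUSMichaelV3Voice). The function names are illustrative.

```python
"""Hypothetical check that defaultTTSVoice points at an enabled voice."""


def voice_key(voice_name: str) -> str:
    """Map a voice name to its ttsVoices key.

    Assumption: the keys in the example drop the '-' and '_' characters
    from the voice name (en-US_MichaelV3Voice -> enUSMichaelV3Voice).
    """
    return voice_name.replace("-", "").replace("_", "")


def check_default_voice(default_voice: str, tts_voices: dict) -> str:
    """Return the matching key, or raise if the default voice is missing or disabled."""
    key = voice_key(default_voice)
    entry = tts_voices.get(key)
    if entry is None:
        raise KeyError(f"{default_voice!r} maps to {key!r}, which is not under ttsVoices")
    if not entry.get("enabled", False):
        raise ValueError(f"default voice {default_voice!r} is not enabled")
    return key


if __name__ == "__main__":
    voices = {
        "enUSAllisonV3Voice": {"enabled": True},
        "enUSMichaelV3Voice": {"enabled": True},
        "enUSEmilyV3Voice": {"enabled": False},
    }
    print(check_default_voice("en-US_MichaelV3Voice", voices))
```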
User data and node affinity properties
The Speech custom resource includes properties that you can use to disable the storage and logging of user data and to specify node affinity:
- Disabling the storage and logging of user data
By default, the Speech to Text runtime, the Text to Speech runtime, and the Speech to Text customization AM patcher temporarily store payload data in the running container. The data includes audio files, recognition hypotheses, and annotations that represent user data. The default values of the skipAudioAndResultLogging properties allow the storage and logging of user data:
###################################
# Storage and logging of user data
###################################
sttRuntime:
  skipAudioAndResultLogging: "false" # If true, disables storage and logging of user data
sttAMPatcher:
  skipAudioAndResultLogging: "false" # If true, disables storage and logging of user data
ttsRuntime:
  skipAudioAndResultLogging: "false" # If true, disables storage and logging of user data
You can disable the storage and logging of user data by setting the properties to true. Setting these properties to true also removes sensitive information from container logs and might significantly reduce the size of the logs.
###################################
# Storage and logging of user data
###################################
sttRuntime:
  skipAudioAndResultLogging: "true" # If true, disables storage and logging of user data
sttAMPatcher:
  skipAudioAndResultLogging: "true" # If true, disables storage and logging of user data
ttsRuntime:
  skipAudioAndResultLogging: "true" # If true, disables storage and logging of user data
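Note that the skipAudioAndResultLogging values in the examples are quoted strings ("true" and "false") rather than booleans. If you generate this fragment with a script, a sketch like the following keeps the values as strings. It is hypothetical: user_data_logging is an illustrative name, and because this section does not show where the fragment sits in the full custom resource, the sketch produces only the three component entries shown above.

```python
"""Hypothetical sketch: generate the user-data logging properties shown above."""
import json

COMPONENTS = ("sttRuntime", "sttAMPatcher", "ttsRuntime")


def user_data_logging(skip: bool) -> dict:
    """Return skipAudioAndResultLogging for each component.

    The values are the quoted strings "true"/"false", matching the
    examples above, rather than JSON/YAML booleans.
    """
    value = "true" if skip else "false"
    return {component: {"skipAudioAndResultLogging": value} for component in COMPONENTS}


if __name__ == "__main__":
    print(json.dumps(user_data_logging(skip=True), indent=2))
```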
- Specifying node affinity
In Kubernetes, node affinity defines constraints for pods so that they land on the nodes that satisfy the affinity constraints. Affinity and anti-affinity settings greatly expand the types of constraints that you can express for your pods. For more information about node affinity, see Affinity and anti-affinity in the Kubernetes documentation.
By default, the Speech services use the following node affinity specifications. The default affinity allows any pod to run on any amd64 node.
################
# Node affinity
################
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: beta.kubernetes.io/arch
          operator: In
          values:
          - amd64
You can update the values for the affinity property if you want the Speech services deployment pods to land on specific nodes. For example, the following specification replaces the default values to enable affinity for the designated nodes:
################
# Node affinity
################
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/e2e-az-name
          operator: In
          values:
          - e2e-az1
          - e2e-az2
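To see which nodes a given affinity admits, it helps to recall how Kubernetes evaluates requiredDuringSchedulingIgnoredDuringExecution: the entries in nodeSelectorTerms are ORed, the matchExpressions inside one term are ANDed, and an empty term matches no node. The following is a simplified sketch of that rule, not the scheduler itself: it covers only the In, NotIn, Exists, and DoesNotExist operators, ignores matchFields, and the function names are illustrative.

```python
"""Simplified sketch of how required node affinity selects nodes."""


def expression_matches(expr: dict, labels: dict) -> bool:
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not covered by this sketch: {op}")


def node_matches(affinity: dict, labels: dict) -> bool:
    required = affinity["nodeAffinity"]["requiredDuringSchedulingIgnoredDuringExecution"]
    for term in required["nodeSelectorTerms"]:
        exprs = term.get("matchExpressions", [])
        # Expressions in one term are ANDed; a term without matchExpressions
        # (empty, or matchFields only) is treated as matching nothing here.
        if exprs and all(expression_matches(e, labels) for e in exprs):
            return True  # Terms are ORed: one matching term is enough.
    return False


# The custom affinity from the example above.
CUSTOM_AFFINITY = {
    "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
                {"matchExpressions": [
                    {"key": "kubernetes.io/e2e-az-name", "operator": "In",
                     "values": ["e2e-az1", "e2e-az2"]},
                ]},
            ]
        }
    }
}

if __name__ == "__main__":
    print(node_matches(CUSTOM_AFFINITY, {"kubernetes.io/e2e-az-name": "e2e-az2"}))
```

With this affinity, a node labeled kubernetes.io/e2e-az-name=e2e-az3, or a node without that label, is not eligible for the Speech services pods.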
Configure autoscaling (HPA) for gateway resource
Make changes to the Watson Speech custom resource (CR) to enable autoscaling and, optionally, to change the maximum number of replicas and the target CPU utilization percentage. You can edit the Speech CR by using the following command:
watsonspeech speech-cr -n ${PROJECT_CPD_INST_OPERANDS}
The following example is a sample CR:
apiVersion: speech.watson.ibm.com/v1
kind: WatsonSpeech
metadata:
  name: speech-cr
spec:
  # Required fields section
  scaleConfig:
    stt:
      size: large
    tts:
      size: large
  license:
    accept: true
  tags:
    sttAsync: false
    sttCustomization: true
    ttsCustomization: true
    sttRuntime: true
    ttsRuntime: true
  blockStorageClass: ocs-storagecluster-ceph-rbd
  fileStorageClass: ocs-storagecluster-cephfs
  global:
    datastores:
      gateway:
        stt:
          size: "large"
          autoscaling:
            enabled: true
            maxReplicas: 3
            cpuUtilization: 75
        tts:
          size: "large"
          autoscaling:
            enabled: true
            maxReplicas: 3
            cpuUtilization: 75
The following fields can be added:
- Gateway size and autoscaling configuration
  - .spec.global.datastores.gateway.<speech-service>.size: large - Size of the Watson Gateway service (large = 3 replicas, medium = 2 replicas). This setting also determines the minimum number of Watson Gateway replicas when autoscaling is enabled.
  - .spec.global.datastores.gateway.<speech-service>.autoscaling.enabled: true - Enables autoscaling for the Watson Gateway service.
  - .spec.global.datastores.gateway.<speech-service>.autoscaling.maxReplicas: 3 - Maximum number of replicas for the Watson Gateway service. The default value is 3.
  - .spec.global.datastores.gateway.<speech-service>.autoscaling.cpuUtilization: 75 - Target CPU utilization percentage of the Watson Gateway service: the threshold that determines when pods are scaled up or down based on their CPU usage. The default value is 75%.
Where <speech-service> is stt or tts, depending on whether you are scaling Speech to Text or Text to Speech.
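To reason about how size, maxReplicas, and cpuUtilization interact, the following sketch applies the standard Kubernetes Horizontal Pod Autoscaler rule, desiredReplicas = ceil(currentReplicas × currentCPU / targetCPU), with the upstream default tolerance of 0.1, clamped between a minimum taken from size and maxReplicas. It assumes that the gateway autoscaling is backed by a standard HPA and it omits stabilization windows, pod readiness, and missing-metric handling; the function and constant names are illustrative.

```python
"""Simplified sketch of the Kubernetes HPA replica calculation for the gateway."""
import math

# From the size description above: large = 3 replicas, medium = 2 replicas,
# which is also the minimum when autoscaling is enabled.
SIZE_TO_MIN_REPLICAS = {"large": 3, "medium": 2}


def desired_replicas(current_replicas: int, current_cpu: float, target_cpu: float,
                     size: str, max_replicas: int, tolerance: float = 0.1) -> int:
    min_replicas = SIZE_TO_MIN_REPLICAS[size]
    if max_replicas < min_replicas:
        raise ValueError("maxReplicas is smaller than the replica count implied by size")
    ratio = current_cpu / target_cpu
    if abs(ratio - 1.0) <= tolerance:
        # Within the tolerance band, the HPA leaves the replica count alone.
        desired = current_replicas
    else:
        # desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
        desired = math.ceil(current_replicas * ratio)
    # Clamp to [minReplicas, maxReplicas].
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 3 large replicas at 90% CPU against a 75% target, with room to grow to 6.
    print(desired_replicas(3, 90, 75, "large", max_replicas=6))
```

Under these assumptions, the values in the sample CR (size large, so a minimum of 3 replicas, and maxReplicas: 3) leave the HPA no room to scale; a maxReplicas above the replica count implied by size is what allows scale-up.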
Configure Speech transcript enrichment
After you install Watson Speech services, add the following spec to the Watson Speech custom resource (CR) to enable Speech transcript enrichment. This post-processing service adds punctuation and applies intelligent capitalization to enhance the structure and clarity of spoken content.
spec:
  sttRuntime:
    enrichments:
      enabled: true
When enrichment is enabled, the Speech Operator creates the Inference foundation models (watsonx_ai_ifm) custom resource and deploys the
mistral-small-3-1-24b-instruct-2503 model. Wait until the model finishes deploying
and becomes available before you use the enrichment feature. This process might take approximately
30 minutes.
After installation, if you enable the enrichment feature in an air-gapped environment, you must also mirror the specified mistral model. For more information, see Models for Watson Speech to Text.
For more information on the prerequisite software that must be installed, see Installing prerequisite software.
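Adding the enrichment spec to an existing CR should leave the other fields intact. One way to reason about that is JSON merge patch (RFC 7386), the semantics used by oc patch with --type merge: nested objects are merged key by key, a null value deletes a key, and any other value replaces the target. The following sketch implements that algorithm and applies the enrichment fragment from this section to a hypothetical existing spec; applying it to your cluster with oc patch watsonspeech speech-cr --type merge -p '<json>' instead of editing the CR is an assumption, not a documented procedure.

```python
"""Sketch of JSON merge patch (RFC 7386) applied to the enrichment setting."""
import copy
import json


def merge_patch(target, patch):
    """Return target with patch applied; neither argument is modified."""
    if not isinstance(patch, dict):
        # Non-object patch values replace the target outright.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# The enrichment fragment from this section.
ENRICHMENT_PATCH = {"spec": {"sttRuntime": {"enrichments": {"enabled": True}}}}

if __name__ == "__main__":
    # Hypothetical existing CR content, taken from the sample CR fields.
    current = {"spec": {"license": {"accept": True}, "tags": {"sttRuntime": True}}}
    print(json.dumps(merge_patch(current, ENRICHMENT_PATCH), indent=2))
```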
Configure bucketSuffix when namespace name exceeds maximum length
The Amazon S3 bucket name limit of 63
characters affects how Watson Speech services base model buckets are generated. By default, the base
model bucket follows the pattern speech-service-base-models-<bucketSuffix>,
where bucketSuffix is set to ibm-<release-name>-<namespace>.
Given this structure, the namespace value can be no longer than 21 characters to remain within the
S3 naming constraint. If you use a namespace that exceeds this limit, you must override the default
bucketSuffix in the Watson Speech services custom resource (CR) to ensure the
generated bucket name stays within the required length. If you do not override the default
bucketSuffix, the model and voice upload jobs fail and return an error.
spec:
  global:
    datastores:
      s3:
        bucketSuffix: "ibm-{{ releaseName }}-{{ speechNamespace }}"
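Before you install into a long namespace, you can check whether the default bucket name fits. The following sketch applies the pattern described in this section, speech-service-base-models-<bucketSuffix> with bucketSuffix defaulting to ibm-<release-name>-<namespace>, together with the S3 limit of 63 characters and the basic S3 character rules (lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit). The release names used with it are hypothetical, and the function names are illustrative.

```python
"""Hypothetical check of the generated base-model bucket name length."""
import re

S3_MAX_LEN = 63
BUCKET_PREFIX = "speech-service-base-models-"


def default_bucket_suffix(release_name: str, namespace: str) -> str:
    # Default pattern described above: ibm-<release-name>-<namespace>
    return f"ibm-{release_name}-{namespace}"


def bucket_name(bucket_suffix: str) -> str:
    return BUCKET_PREFIX + bucket_suffix


def is_valid_bucket_name(name: str) -> bool:
    """Length limit plus the basic S3 character rules."""
    if not 3 <= len(name) <= S3_MAX_LEN:
        return False
    return re.fullmatch(r"[a-z0-9][a-z0-9.-]*[a-z0-9]", name) is not None


def max_namespace_length(release_name: str) -> int:
    # Characters left for the namespace after the prefix, "ibm-",
    # the release name, and the separating "-".
    return S3_MAX_LEN - len(BUCKET_PREFIX) - len("ibm-") - len(release_name) - 1


if __name__ == "__main__":
    name = bucket_name(default_bucket_suffix("my-release", "a" * 30))
    print(len(name), is_valid_bucket_name(name))
```

If the check fails, override bucketSuffix with a shorter value so that the full bucket name stays within 63 characters.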