Advanced configuration for the Watson Speech services
Custom resource topics
The following information can help you understand and make the best use of the properties of the Watson Speech services custom resource:
- Usage notes provides important information about installing and modifying your Watson Speech services installation and configuration.
- Editing the custom resource provides brief instructions for editing the custom resource.
- Full custom resource shows an example of a full Watson Speech services custom resource with all possible properties and default values.
- License, version, storage class, and scaling properties describes the properties that you use to accept the product license and to specify the product version, storage classes, and scaling for your installation.
- Datastores properties describes the properties that you use to configure the Multicloud Object Gateway, PostgreSQL, and RabbitMQ datastores for your installation.
- Speech microservices, models, and voices properties describes the properties that you use to tailor the installation and configuration of the Speech services microservices, models, and voices to meet your application needs.
- User data and node affinity properties describes the properties that you use to disable the storage and logging of user data and to specify node affinity.
For sample custom resources that install both Watson Speech services, only Watson Speech to Text, or only Watson Text to Speech, see Sample Watson Speech services custom resources.
Usage notes
Keep in mind the following usage notes when installing the Watson Speech services:
- Speech to Text models (sttModels) and Text to Speech voices (ttsVoices) are installed only if their corresponding runtimes (sttRuntime and ttsRuntime) are installed.
- You cannot uninstall individual service microservices once they are installed. Changing their values in the custom resource from true to false has no effect. To remove any of the following microservices, you must uninstall the Watson Speech services in their entirety and reinstall only the microservices that you need: Speech to Text runtime (sttRuntime), Speech to Text asynchronous HTTP (sttAsync), Speech to Text customization (sttCustomization), Text to Speech runtime (ttsRuntime), and Text to Speech customization (ttsCustomization).
- You can change any of the other values later by editing the custom resource that is created during installation. For more information, see Editing the custom resource.
- After installing the Watson Speech services, change aspects of the installation and configuration only if needed. The default values are sufficient for most users.
Editing the custom resource
You can edit the custom resource to modify many aspects of your Watson Speech services installation and configuration. To modify the custom resource, enter the following command:
oc edit watsonspeech ${CUSTOM_RESOURCE_SPEECH}
The command opens the most recent custom resource in your editor. Modify the custom resource for the operation you are performing, then save the custom resource and exit your editor.
The Watson Speech operator picks up the changes to the custom resource on its next reconciliation loop, which is a periodic process that it runs to ensure that your installation reflects the latest custom resource. It might take up to 20 minutes for the operator to pick up the latest changes. It then takes more time for your installation to be updated with the changes.
Full custom resource
The complete custom resource for the Speech services lists every property that the Speech custom resource supports. This full version of the custom resource installs both runtimes for the Watson Speech services and shows the default values for all properties.
apiVersion: speech.watson.ibm.com/v1
kind: WatsonSpeech
metadata:
name: speech-cr # The recommended name of the custom resource
namespace: ${PROJECT_CPD_INST_OPERANDS} # The project (namespace) name where you plan to install the Speech services
spec:
license:
accept: true
version: 4.7.0 # Omit this property to always install the latest version
##################
# Storage classes
##################
blockStorageClass: "portworx-db-gp3-sc" # The block storage class, for example, "portworx-db-gp3-sc"
fileStorageClass: "portworx-shared-gp3" # The file storage class, for example, "portworx-shared-gp3"
########################
# Configuration scaling
########################
scaleConfig:
stt:
size: xsmall # Size of Speech to Text configuration: xsmall, small, medium, large, or custom
tts:
size: xsmall # Size of Text to Speech configuration: xsmall, small, medium, large, or custom
###############
# Request CPUs
###############
sttAMPatcher:
resources:
requestsCPU: 1
################################
# Speech services microservices
################################
tags:
sttRuntime: true # Enables the Speech to Text runtime microservice
sttAsync: false # Enables the Speech to Text asynchronous HTTP microservice
sttCustomization: false # Enables the Speech to Text customization microservice
ttsRuntime: true # Enables the Text to Speech runtime microservice
ttsCustomization: false # Enables the Text to Speech customization microservice
#############
# Datastores
#############
global:
datastores:
##################################
# Multicloud Gateway object store
##################################
s3:
# Secrets
authSecretName: "noobaa-account-watson-speech"
###########################
# The PostgreSQL datastore
###########################
postgressql:
# Sizing configuration
replicas: 3 # Number of replica nodes for PostgreSQL.
databaseMemoryLimit: 5Gi # Maximum memory that PostgreSQL can use.
databaseMemoryRequest: 1Gi # Default memory requested by PostgreSQL.
databaseCPULimit: 1000m # Maximum CPU that PostgreSQL can use.
databaseCPU: 500m # Default CPU requested by PostgreSQL.
databaseStorageRequest: 5Gi # Maximum size of a PostgreSQL database storage request.
# Storage configuration
blockStorageClass: "{{ blockStorageClass }}" # Storage class that is used by PostgreSQL.
# Secrets
createSecret: true # True: Speech operator generates a secret.
# False: User provides existing secret via authSecretName property.
authSecretName: "<speech-cr>-postgres-auth-secret" # Name of PostgreSQL secrets object.
#########################
# The RabbitMQ datastore
#########################
rabbitMQ:
# Sizing configuration
replicas: 3 # Number of replica pods for RabbitMQ.
cpuRequest: 200m # Default CPU requested by RabbitMQ.
cpuLimit: 200m # Maximum CPU that RabbitMQ can use.
memoryRequest: 256Mi # Default memory requested by RabbitMQ.
memoryLimit: 256Mi # Maximum memory that RabbitMQ can use.
# Storage configuration
blockStorageClass: "{{ blockStorageClass }}" # Storage class that is used by RabbitMQ.
pvSize: 5Gi # Size of the persistent volume for RabbitMQ.
# Secrets
authSecretName: "<speech-cr>-ibm-rabbitmq-auth-secret" # Name of RabbitMQ secrets object.
############################################
# Speech to Text previous-generation models
############################################
defaultSTTModel: en-US_BroadbandModel # Default model for speech recognition
sttModels:
enUsBroadbandModel: # US English (en-US) Broadband model
enabled: true
enUsNarrowbandModel: # US English (en-US) Narrowband model
enabled: true
enUsShortFormNarrowbandModel: # US English (en-US) Short-Form Narrowband model
enabled: true
arMsBroadbandModel: # Modern Standard Arabic (ar-MS) Broadband model
enabled: false
deDeBroadbandModel: # German (de-DE) Broadband model
enabled: false
deDeNarrowbandModel: # German (de-DE) Narrowband model
enabled: false
enAuBroadbandModel: # Australian English (en-AU) Broadband model
enabled: false
enAuNarrowbandModel: # Australian English (en-AU) Narrowband model
enabled: false
enGbBroadbandModel: # UK English (en-GB) Broadband model
enabled: false
enGbNarrowbandModel: # UK English (en-GB) Narrowband model
enabled: false
esEsBroadbandModel: # Castilian Spanish (es-ES, es-AR, es-CL, es-CO, es-MX, and es-PE) Broadband models
enabled: false
esEsNarrowbandModel: # Castilian Spanish (es-ES, es-AR, es-CL, es-CO, es-MX, and es-PE) Narrowband models
enabled: false
frCaBroadbandModel: # Canadian French (fr-CA) Broadband model
enabled: false
frCaNarrowbandModel: # Canadian French (fr-CA) Narrowband model
enabled: false
frFrBroadbandModel: # French (fr-FR) Broadband model
enabled: false
frFrNarrowbandModel: # French (fr-FR) Narrowband model
enabled: false
itItBroadbandModel: # Italian (it-IT) Broadband model
enabled: false
itItNarrowbandModel: # Italian (it-IT) Narrowband model
enabled: false
jaJpBroadbandModel: # Japanese (ja-JP) Broadband model
enabled: false
jaJpNarrowbandModel: # Japanese (ja-JP) Narrowband model
enabled: false
koKrBroadbandModel: # Korean (ko-KR) Broadband model
enabled: false
koKrNarrowbandModel: # Korean (ko-KR) Narrowband model
enabled: false
nlNlBroadbandModel: # Dutch (nl-NL) Broadband model
enabled: false
nlNlNarrowbandModel: # Dutch (nl-NL) Narrowband model
enabled: false
ptBrBroadbandModel: # Brazilian Portuguese (pt-BR) Broadband model
enabled: false
ptBrNarrowbandModel: # Brazilian Portuguese (pt-BR) Narrowband model
enabled: false
zhCnBroadbandModel: # Mandarin Chinese (zh-CN) Broadband model
enabled: false
zhCnNarrowbandModel: # Mandarin Chinese (zh-CN) Narrowband model
enabled: false
########################################
# Speech to Text next-generation models
########################################
enUsMultimedia: # US English (en-US) Multimedia model
enabled: true
enUsTelephony: # US English (en-US) Telephony model
enabled: true
arMsTelephony: # Modern Standard Arabic (ar-MS) Telephony model
enabled: false
csCZTelephony: # Czech (cs-CZ) Telephony model
enabled: false
deDeMultimedia: # German (de-DE) Multimedia model
enabled: false
deDeTelephony: # German (de-DE) Telephony model
enabled: false
enAuMultimedia: # Australian English (en-AU) Multimedia model
enabled: false
enAuTelephony: # Australian English (en-AU) Telephony model
enabled: false
enGbMultimedia: # UK English (en-GB) Multimedia model
enabled: false
enGbTelephony: # UK English (en-GB) Telephony model
enabled: false
enInTelephony: # Indian English (en-IN) Telephony model
enabled: false
enWwMedicalTelephony: # English (all supported dialects) Medical Telephony model
enabled: false
esEsMultimedia: # Castilian Spanish (es-ES) Multimedia model
enabled: false
esEsTelephony: # Castilian Spanish (es-ES) Telephony model
enabled: false
esLaTelephony: # Latin American Spanish (es-LA) Telephony model
enabled: false
frCaMultimedia: # Canadian French (fr-CA) Multimedia model
enabled: false
frCaTelephony: # Canadian French (fr-CA) Telephony model
enabled: false
frFrMultimedia: # French (fr-FR) Multimedia model
enabled: false
frFrTelephony: # French (fr-FR) Telephony model
enabled: false
hiInTelephony: # Indian Hindi (hi-IN) Telephony model
enabled: false
itItMultimedia: # Italian (it-IT) Multimedia model
enabled: false
itItTelephony: # Italian (it-IT) Telephony model
enabled: false
jaJpMultimedia: # Japanese (ja-JP) Multimedia model
enabled: false
jaJpTelephony: # Japanese (ja-JP) Telephony model
enabled: false
koKrMultimedia: # Korean (ko-KR) Multimedia model
enabled: false
koKrTelephony: # Korean (ko-KR) Telephony model
enabled: false
nlBeTelephony: # Belgian Dutch (nl-BE) Telephony model
enabled: false
nlNlMultimedia: # Netherlands Dutch (nl-NL) Multimedia model
enabled: false
nlNlTelephony: # Netherlands Dutch (nl-NL) Telephony model
enabled: false
ptBrMultimedia: # Brazilian Portuguese (pt-BR) Multimedia model
enabled: false
ptBrTelephony: # Brazilian Portuguese (pt-BR) Telephony model
enabled: false
svSeTelephony: # Swedish (sv-SE) Telephony model
enabled: false
zhCnTelephony: # Mandarin Chinese (zh-CN) Telephony model
enabled: false
########################################
# Text to Speech enhanced neural voices
########################################
defaultTTSVoice: en-US_MichaelV3Voice # Default voice for speech synthesis
ttsVoices:
enUSAllisonV3Voice: # US English (en-US) Allison enhanced neural voice
enabled: true
enUSLisaV3Voice: # US English (en-US) Lisa enhanced neural voice
enabled: true
enUSMichaelV3Voice: # US English (en-US) Michael enhanced neural voice
enabled: true
enUSEmilyV3Voice: # US English (en-US) Emily enhanced neural voice
enabled: false
enUSHenryV3Voice: # US English (en-US) Henry enhanced neural voice
enabled: false
enUSKevinV3Voice: # US English (en-US) Kevin enhanced neural voice
enabled: false
enUSOliviaV3Voice: # US English (en-US) Olivia enhanced neural voice
enabled: false
deDEBirgitV3Voice: # German (de-DE) Birgit enhanced neural voice
enabled: false
deDEDieterV3Voice: # German (de-DE) Dieter enhanced neural voice
enabled: false
deDEErikaV3Voice: # German (de-DE) Erika enhanced neural voice
enabled: false
enGBCharlotteV3Voice: # UK English (en-GB) Charlotte enhanced neural voice
enabled: false
enGBJamesV3Voice: # UK English (en-GB) James enhanced neural voice
enabled: false
enGBKateV3Voice: # UK English (en-GB) Kate enhanced neural voice
enabled: false
esESEnriqueV3Voice: # Castilian Spanish (es-ES) Enrique enhanced neural voice
enabled: false
esESLauraV3Voice: # Castilian Spanish (es-ES) Laura enhanced neural voice
enabled: false
esLASofiaV3Voice: # Latin American Spanish (es-LA) Sofia enhanced neural voice
enabled: false
esUSSofiaV3Voice: # North American Spanish (es-US) Sofia enhanced neural voice
enabled: false
frCALouiseV3Voice: # French Canadian (fr-CA) Louise enhanced neural voice
enabled: false
frFRNicolasV3Voice: # French (fr-FR) Nicolas enhanced neural voice
enabled: false
frFRReneeV3Voice: # French (fr-FR) Renee enhanced neural voice
enabled: false
itITFrancescaV3Voice: # Italian (it-IT) Francesca enhanced neural voice
enabled: false
jaJPEmiV3Voice: # Japanese (ja-JP) Emi enhanced neural voice
enabled: false
koKRJinV3Voice: # Korean (ko-KR) Jin enhanced neural voice
enabled: false
nlNLMerelV3Voice: # Netherlands Dutch (nl-NL) Merel enhanced neural voice
enabled: false
ptBRIsabelaV3Voice: # Brazilian Portuguese (pt-BR) Isabela enhanced neural voice
enabled: false
##########################################
# Text to Speech expressive neural voices
##########################################
enAUHeidiExpressive: # Australian English (en-AU) Heidi expressive neural voice
enabled: false
enAUJackExpressive: # Australian English (en-AU) Jack expressive neural voice
enabled: false
enUSAllisonExpressive: # US English (en-US) Allison expressive neural voice
enabled: false
enUSEmmaExpressive: # US English (en-US) Emma expressive neural voice
enabled: false
enUSLisaExpressive: # US English (en-US) Lisa expressive neural voice
enabled: false
enUSMichaelExpressive: # US English (en-US) Michael expressive neural voice
enabled: false
#####################
# Backup and restore
#####################
backupRestore:
components:
backup:
name: "backup"
backupVolume: "{{ releaseName }}-aux-s3-backup-pvc"
volumeSize: 10Gi
###################################
# Storage and logging of user data
###################################
sttRuntime:
skipAudioAndResultLogging: "true" # If true, disables storage and logging of user data
sttAMPatcher:
skipAudioAndResultLogging: "true" # If true, disables storage and logging of user data
ttsRuntime:
skipAudioAndResultLogging: "true" # If true, disables storage and logging of user data
################
# Node affinity
################
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: beta.kubernetes.io/arch
operator: In
values:
- amd64
License, version, storage class, and scaling properties
The Speech custom resource provides properties that you use to accept the product license and to specify the product version, storage class, and scaling for your installation:
- Acknowledging license acceptance
- Specifying a version
- Specifying storage classes
- Scaling your configuration
- Specifying request CPUs
Acknowledging license acceptance
To install the Speech services, you must read, understand, and accept the license terms. You are strongly encouraged to read the full terms of the license agreement. (The link requires you to leave the IBM web site.)
You must set the license accept property to true. The installation fails if you set the property to false.
license:
  accept: true
Specifying a version
The version property specifies the version of the Speech services that the operator is to install. The property's value is provided by IBM. Future releases of the Speech services will increment the version to greater values.
version: 4.7.0 # Omit this property to always install the latest version
The version property decouples the version of the Speech services from the version of the Speech operator. The Speech operator can handle multiple versions of the Speech services. You can continue to run older versions of the Speech services on newer versions of the operator. You can then upgrade to later versions of the Speech services at your discretion.
Alternatively, you can omit the version property entirely from the custom resource. Doing so allows you to manage your Speech services (operand) automatically. If you omit the version property, the Watson Speech operator always installs the latest available version of the Speech services. It also automatically upgrades your installation to the latest version of the Speech services as new versions become available.
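As a minimal sketch of the second option, the following fragment leaves out the version property so that the operator manages upgrades automatically. Only the license and version lines are shown; the rest of the spec section is assumed to be unchanged.

```yaml
spec:
  license:
    accept: true
  # The version property is omitted on purpose: the operator installs the
  # latest available version and upgrades the installation automatically.
  # To pin a release instead, add a line such as:
  # version: 4.7.0
```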
Specifying storage classes
The blockStorageClass and fileStorageClass properties specify the persistent storage that the Speech services are to use. For example, the following definition uses Portworx by specifying the storage classes portworx-db-gp3-sc and portworx-shared-gp3:
##################
# Storage classes
##################
blockStorageClass: "portworx-db-gp3-sc" # The block storage class, for example, "portworx-db-gp3-sc"
fileStorageClass: "portworx-shared-gp3" # The file storage class, for example, "portworx-shared-gp3"
For more information about the available block and file storage classes for the persistent storage for Watson Speech services, see:
- Storage requirements in Information you need to complete this task
- Setting up persistent storage
Scaling your configuration
The scaleConfig property specifies the size of the installation for the Speech services. Possible sizes include xsmall, small, medium, large, and custom. By default, the services use the following xsmall values:
########################
# Configuration scaling
########################
scaleConfig:
  stt:
    size: xsmall    # Size of Speech to Text configuration: xsmall, small, medium, large, or custom
  tts:
    size: xsmall    # Size of Text to Speech configuration: xsmall, small, medium, large, or custom
The values for the Speech to Text and Text to Speech services are independent. You can specify different values for the size properties. For example, the following values increase the size of the Speech to Text configuration but leave the Text to Speech configuration unchanged:
########################
# Configuration scaling
########################
scaleConfig:
  stt:
    size: medium
  tts:
    size: xsmall
You can start by using the default values and scale your configuration as your usage grows. During installation, these properties are specified by the watson_speech_stt_scale_config and watson_speech_tts_scale_config options.
- For more information about scaling your configuration, see Scaling up your Watson Speech services installation.
Specifying request CPUs
The sttAMPatcher microservice manages acoustic model customization for the Speech to Text service. It is automatically installed with the sttCustomization microservice. The AM Patcher uses a dedicated number of CPUs to handle requests. The sttAMPatcher.resources.requestsCPU property specifies the number of CPUs that are dedicated to handling acoustic model training requests by the AM Patcher microservice.
###############
# Request CPUs
###############
sttAMPatcher:
  resources:
    requestsCPU: 1
If training of custom acoustic models fails with the following message, increase the value of the requestsCPU property from 1 to 5:
Unresponsive backend detected. Please try later.
Allocating more resources prevents this error and enables custom acoustic models to be trained as expected. Increasing the value of the property increases the size of the deployment. For more information, see also Training of custom acoustic models is failing.
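For illustration, the change described above might look like the following fragment. The value 5 is the one that the text recommends; all other properties are assumed to be unchanged.

```yaml
sttAMPatcher:
  resources:
    requestsCPU: 5    # Increased from the default of 1 to address training failures
```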
Datastores properties
The Speech custom resource provides properties that you can use to configure the Multicloud Object Gateway, PostgreSQL, and RabbitMQ datastores for your installation:
- Configuring the Multicloud Object Gateway
- Configuring the PostgreSQL datastore
- Configuring the RabbitMQ datastore
The individual datastores are installed only with the following microservices:
- The Multicloud Object Gateway must be installed before you install any of the Watson Speech services. The Speech runtime microservices depend on it, as does sttCustomization. For installation instructions, see Installing Multicloud Object Gateway for IBM Cloud Pak for Data.
- The PostgreSQL datastore is installed only if at least one of the following microservices is enabled: sttAsync, sttCustomization, or ttsCustomization. If the dependent microservices are disabled at a later date, PostgreSQL remains installed but is not used. Note: Prior to version 4.6.0, PostgreSQL was always installed with the Speech services. If you are an existing user who enabled only the runtime microservices, PostgreSQL remains installed but is not used. In this case, PostgreSQL also remains installed across upgrades.
- The RabbitMQ datastore is installed only if the sttAsync microservice is enabled.
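The dependencies above can be read against the tags properties. The following fragment is an illustrative sketch, not a recommended configuration: it enables sttAsync, which by the rules above causes both PostgreSQL and RabbitMQ to be installed.

```yaml
tags:
  sttRuntime: true          # Depends on the Multicloud Object Gateway
  sttAsync: true            # Causes PostgreSQL and RabbitMQ to be installed
  sttCustomization: false   # If true, would also cause PostgreSQL to be installed
  ttsRuntime: true          # Depends on the Multicloud Object Gateway
  ttsCustomization: false   # If true, would also cause PostgreSQL to be installed
```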
Configuring the Multicloud Object Gateway
The Multicloud Object Gateway is an object storage solution that contains the following stateful data for the Speech services:
- For Speech to Text and Text to Speech, models and voices that are installed with the services.
- For Speech to Text, transcription results while a speech recognition job is in progress.
- For Speech to Text, binary patches for trained models, grammars and corpora for custom language models, and audio files for custom acoustic models.
You can use the following properties to configure the Multicloud Object Gateway for your installation. The definition shows the default configuration values for the datastore's properties. These properties are indented beneath global.datastores.
The secret that the authSecretName property identifies is created as described in Creating secrets for services that use Multicloud Object Gateway.
##################################
# Multicloud Gateway object store
##################################
s3:
  # Secrets
  authSecretName: "noobaa-account-watson-speech"
Configuring the PostgreSQL datastore
PostgreSQL is an open-source relational database that contains the stateful data for the following Speech microservices:
- For Speech to Text, custom language models and custom acoustic models.
- For Speech to Text, all asynchronous HTTP jobs for the past week. Entries older than one week are automatically purged.
- For Text to Speech, custom models and speaker models.
You can use the following properties to configure the PostgreSQL datastore for your installation. The definition shows the default configuration values for the datastore's properties. These properties are indented beneath global.datastores.
Note that the value {{ blockStorageClass }} is replaced by the value specified for the spec.blockStorageClass property. In the full custom resource shown earlier, the value is portworx-db-gp3-sc.
###########################
# The PostgreSQL datastore
###########################
postgressql:
  # Sizing configuration
  replicas: 3                     # Number of replica nodes for PostgreSQL.
  databaseMemoryLimit: 5Gi        # Maximum memory that PostgreSQL can use.
  databaseMemoryRequest: 1Gi      # Default memory requested by PostgreSQL.
  databaseCPULimit: 1000m         # Maximum CPU that PostgreSQL can use.
  databaseCPU: 500m               # Default CPU requested by PostgreSQL.
  databaseStorageRequest: 5Gi     # Maximum size of a PostgreSQL database storage request.
  enablePodMonitor: false         # Enables the PostgreSQL operator to monitor the PostgreSQL pods.
  # Storage configuration
  blockStorageClass: "{{ blockStorageClass }}"         # Storage class that is used by PostgreSQL.
  # Secrets
  authSecretName: "<speech-cr>-postgres-auth-secret"   # Name of PostgreSQL secrets object.
- If you plan to use a user-provided secret for the PostgreSQL datastore, see Creating a secrets object for your PostgreSQL datastore and Updating secrets objects for your datastores after you install the Speech services.
- For more information about scaling up the number of replicas for the datastore, see Scaling up the PostgreSQL datastore.
- For more information about monitoring the PostgreSQL datastore, see Monitoring the PostgreSQL datastore for Watson Speech services.
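As a hedged sketch, a user-provided secret might be configured as follows. The createSecret property comes from the full custom resource shown earlier, where false means that the user supplies an existing secret through authSecretName. The name my-postgres-auth-secret is a hypothetical placeholder for a secrets object that you create beforehand.

```yaml
global:
  datastores:
    postgressql:
      # Secrets
      createSecret: false                         # The Speech operator does not generate a secret
      authSecretName: "my-postgres-auth-secret"   # Hypothetical name of your existing secrets object
```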
Configuring the RabbitMQ datastore
The RabbitMQ datastore handles non-persistent message queuing for the Speech to Text asynchronous HTTP microservice (sttAsync).
You can use the following properties to configure the RabbitMQ datastore for your installation. The definition shows the default configuration values for the datastore's properties. These properties are indented beneath global.datastores.
Note that the value {{ blockStorageClass }} is replaced by the value specified for the spec.blockStorageClass property. In the full custom resource shown earlier, the value is portworx-db-gp3-sc.
#########################
# The RabbitMQ datastore
#########################
rabbitMQ:
  # Sizing configuration
  replicas: 3              # Number of replica pods for RabbitMQ.
  cpuRequest: 200m         # Default CPU requested by RabbitMQ.
  cpuLimit: 200m           # Maximum CPU that RabbitMQ can use.
  memoryRequest: 256Mi     # Default memory requested by RabbitMQ.
  memoryLimit: 256Mi       # Maximum memory that RabbitMQ can use.
  # Storage configuration
  blockStorageClass: "{{ blockStorageClass }}"             # Storage class that is used by RabbitMQ.
  pvSize: 5Gi                                              # Size of the persistent volume for RabbitMQ.
  # Secrets
  authSecretName: "<speech-cr>-ibm-rabbitmq-auth-secret"   # Name of RabbitMQ secrets object.
- If you plan to use a user-provided secret for the RabbitMQ datastore, see Creating a secrets object for your RabbitMQ datastore and Updating secrets objects for your datastores after you install the Speech services.
- For more information about using these properties to scale up the number of replicas for the datastore, see Scaling up the RabbitMQ datastore.
Speech microservices, models, and voices properties
The Speech custom resource provides a rich set of properties that you can use to tailor the installation and configuration of the Speech services to meet your application needs. These properties specify the Speech services microservices, models, and voices that are to be installed.
- Installing Speech services microservices
- Installing Speech to Text models
- Installing Text to Speech voices
Installing Speech services microservices
You use these five properties to specify the Speech microservices to install. The properties specify the functionality that is available for the Speech services. By default, the runtimes are enabled, but you can enable or disable them separately. The following example installs the Speech to Text and Text to Speech runtimes:
################################
# Speech services microservices
################################
tags:
  sttRuntime: true          # Enables the Speech to Text runtime microservice
  sttAsync: false           # Enables the Speech to Text asynchronous HTTP microservice
  sttCustomization: false   # Enables the Speech to Text customization microservice
  ttsRuntime: true          # Enables the Text to Speech runtime microservice
  ttsCustomization: false   # Enables the Text to Speech customization microservice
The properties provide the following service capabilities:
sttRuntime
Speech to Text runtime, the base microservice for speech recognition. This value enables the /v1/recognize interfaces (synchronous HTTP and WebSocket). Enabling either of the other Speech to Text microservices automatically enables the Speech to Text runtime. Speech to Text models (sttModels) are installed only if the runtime is installed. During installation, this property is specified by the watson_speech_enable_stt_runtime option. For more information, see The synchronous HTTP interface and The WebSocket interface.
sttAsync
Speech to Text asynchronous HTTP. This value enables the /v1/recognitions interface. During installation, this property is specified by the watson_speech_enable_stt_async option. For more information, see The asynchronous HTTP interface.
sttCustomization
Speech to Text customization. This value enables the /v1/customizations and /v1/acoustic_customizations interfaces for language model and acoustic model customization. (The sttAMPatcher microservice, the backend microservice for acoustic model customization, is automatically installed with the sttCustomization microservice.) During installation, this property is specified by the watson_speech_enable_stt_customization option. For more information, see Understanding customization.
ttsRuntime
Text to Speech runtime, the base microservice for speech synthesis. This value enables the /v1/synthesize interfaces (HTTP and WebSocket). Enabling the Text to Speech customization microservice automatically enables the Text to Speech runtime. Text to Speech voices (ttsVoices) are installed only if the runtime is installed. During installation, this property is specified by the watson_speech_enable_tts_runtime option. For more information, see The HTTP interface and The WebSocket interface.
ttsCustomization
Text to Speech customization. This value enables the /v1/customizations interface for customization. During installation, this property is specified by the watson_speech_enable_tts_customization option. For more information, see Understanding customization.
You can combine installation of the different microservices to install the following functionality:
- To install Speech to Text only, set ttsRuntime and ttsCustomization to false.
- To install Text to Speech only, set sttRuntime, sttAsync, and sttCustomization to false.
- To install both Speech to Text and Text to Speech without enabling customization, set sttCustomization and ttsCustomization to false.
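As an example of the first combination, the following sketch installs Speech to Text only. The values for sttAsync and sttCustomization are illustrative; set them to true if you need those interfaces.

```yaml
tags:
  sttRuntime: true          # Installs the Speech to Text runtime
  sttAsync: false           # Optional: set to true for the asynchronous HTTP interface
  sttCustomization: false   # Optional: set to true for customization
  ttsRuntime: false         # Text to Speech runtime is not installed
  ttsCustomization: false   # Text to Speech customization is not installed
```

Because installed microservices cannot be removed individually, decide on this combination before you install.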
Installing Speech to Text models
You can use the following properties to specify the models to install. Installing all models substantially increases the memory requirements. You are therefore strongly encouraged to install only those models that you intend to use.
When choosing between previous- and next-generation models, prefer next-generation models where possible. Next-generation models offer improved speech recognition over previous-generation models. They also require fewer resources (CPU and memory) than previous-generation models. By default, the dynamic resource calculation feature automatically computes the exact amount of memory that is required for the selected models.
The defaultSTTModel property sets the default model for speech recognition. If you install and use models in languages other than US English, set the default to the model you expect to use most often.
Speech to Text models are installed only if the sttRuntime microservice is installed.
To install a model, set its enabled property to true in the custom resource. Set the property to false to indicate that the model is not to be installed.
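For instance, a deployment that mostly transcribes UK English audio might enable the UK English previous-generation models and make one of them the default. This is a sketch only: the property names come from the full custom resource, while the identifier en-GB_BroadbandModel is assumed by analogy with the en-US_BroadbandModel default and should be checked against the list of available models.

```yaml
global:
  defaultSTTModel: en-GB_BroadbandModel   # Assumed identifier, by analogy with en-US_BroadbandModel
  sttModels:
    enGbBroadbandModel:                   # UK English (en-GB) Broadband model
      enabled: true
    enGbNarrowbandModel:                  # UK English (en-GB) Narrowband model
      enabled: true
```

Models that are not listed in such a fragment keep whatever values the custom resource already specifies for them.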
The following properties show the default values. By default, only the US English previous- and next-generation models are installed. These properties are indented beneath global.
All available models are indented beneath sttModels. During installation, Speech to Text models are specified by the watson_speech_models option.
- For more information about all available previous-generation models, see Previous-generation languages and models.
- For more information about all available next-generation models, see Next-generation languages and models.
- You can change which models are installed at any time. For more information, see Updating models and voices for your Watson Speech services.
############################################
# Speech to Text previous-generation models
############################################
defaultSTTModel: en-US_BroadbandModel   # Default model for speech recognition
sttModels:
  enUsBroadbandModel:                   # US English (en-US) Broadband model
    enabled: true
  enUsNarrowbandModel:                  # US English (en-US) Narrowband model
    enabled: true
  enUsShortFormNarrowbandModel:         # US English (en-US) Short-Form Narrowband model
    enabled: true
  arMsBroadbandModel:                   # Modern Standard Arabic (ar-MS) Broadband model
    enabled: false
  deDeBroadbandModel:                   # German (de-DE) Broadband model
    enabled: false
  deDeNarrowbandModel:                  # German (de-DE) Narrowband model
    enabled: false
  enAuBroadbandModel:                   # Australian English (en-AU) Broadband model
    enabled: false
  enAuNarrowbandModel:                  # Australian English (en-AU) Narrowband model
    enabled: false
  enGbBroadbandModel:                   # UK English (en-GB) Broadband model
    enabled: false
  enGbNarrowbandModel:                  # UK English (en-GB) Narrowband model
    enabled: false
  esEsBroadbandModel:                   # Castilian Spanish (es-ES, es-AR, es-CL, es-CO, es-MX, and es-PE) Broadband models
    enabled: false
  esEsNarrowbandModel:                  # Castilian Spanish (es-ES, es-AR, es-CL, es-CO, es-MX, and es-PE) Narrowband models
    enabled: false
  frCaBroadbandModel:                   # Canadian French (fr-CA) Broadband model
    enabled: false
  frCaNarrowbandModel:                  # Canadian French (fr-CA) Narrowband model
    enabled: false
  frFrBroadbandModel:                   # French (fr-FR) Broadband model
    enabled: false
  frFrNarrowbandModel:                  # French (fr-FR) Narrowband model
    enabled: false
  itItBroadbandModel:                   # Italian (it-IT) Broadband model
    enabled: false
  itItNarrowbandModel:                  # Italian (it-IT) Narrowband model
    enabled: false
  jaJpBroadbandModel:                   # Japanese (ja-JP) Broadband model
    enabled: false
  jaJpNarrowbandModel:                  # Japanese (ja-JP) Narrowband model
    enabled: false
  koKrBroadbandModel:                   # Korean (ko-KR) Broadband model
    enabled: false
  koKrNarrowbandModel:                  # Korean (ko-KR) Narrowband model
    enabled: false
  nlNlBroadbandModel:                   # Dutch (nl-NL) Broadband model
    enabled: false
  nlNlNarrowbandModel:                  # Dutch (nl-NL) Narrowband model
    enabled: false
  ptBrBroadbandModel:                   # Brazilian Portuguese (pt-BR) Broadband model
    enabled: false
  ptBrNarrowbandModel:                  # Brazilian Portuguese (pt-BR) Narrowband model
    enabled: false
  zhCnBroadbandModel:                   # Mandarin Chinese (zh-CN) Broadband model
    enabled: false
  zhCnNarrowbandModel:                  # Mandarin Chinese (zh-CN) Narrowband model
    enabled: false
  ########################################
  # Speech to Text next-generation models
  ########################################
  enUsMultimedia:                       # US English (en-US) Multimedia model
    enabled: true
  enUsTelephony:                        # US English (en-US) Telephony model
    enabled: true
  arMsTelephony:                        # Modern Standard Arabic (ar-MS) Telephony model
    enabled: false
  csCZTelephony:                        # Czech (cs-CZ) Telephony model
    enabled: false
  deDeMultimedia:                       # German (de-DE) Multimedia model
    enabled: false
  deDeTelephony:                        # German (de-DE) Telephony model
    enabled: false
  enAuMultimedia:                       # Australian English (en-AU) Multimedia model
    enabled: false
  enAuTelephony:                        # Australian English (en-AU) Telephony model
    enabled: false
  enGbMultimedia:                       # UK English (en-GB) Multimedia model
    enabled: false
  enGbTelephony:                        # UK English (en-GB) Telephony model
    enabled: false
  enInTelephony:                        # Indian English (en-IN) Telephony model
    enabled: false
  enWwMedicalTelephony:                 # English (all supported dialects) Medical Telephony model
    enabled: false
  esEsMultimedia:                       # Castilian Spanish (es-ES) Multimedia model
    enabled: false
  esEsTelephony:                        # Castilian Spanish (es-ES) Telephony model
    enabled: false
  esLaTelephony:                        # Latin American Spanish (es-LA) Telephony model
    enabled: false
  frCaMultimedia:                       # Canadian French (fr-CA) Multimedia model
    enabled: false
  frCaTelephony:                        # Canadian French (fr-CA) Telephony model
    enabled: false
  frFrMultimedia:                       # French (fr-FR) Multimedia model
    enabled: false
  frFrTelephony:                        # French (fr-FR) Telephony model
    enabled: false
  hiInTelephony:                        # Indian Hindi (hi-IN) Telephony model
    enabled: false
  itItMultimedia:                       # Italian (it-IT) Multimedia model
    enabled: false
  itItTelephony:                        # Italian (it-IT) Telephony model
    enabled: false
  jaJpMultimedia:                       # Japanese (ja-JP) Multimedia model
    enabled: false
  jaJpTelephony:                        # Japanese (ja-JP) Telephony model
    enabled: false
  koKrMultimedia:                       # Korean (ko-KR) Multimedia model
    enabled: false
  koKrTelephony:                        # Korean (ko-KR) Telephony model
    enabled: false
  nlBeTelephony:                        # Belgian Dutch (nl-BE) Telephony model
    enabled: false
  nlNlMultimedia:                       # Netherlands Dutch (nl-NL) Multimedia model
    enabled: false
  nlNlTelephony:                        # Netherlands Dutch (nl-NL) Telephony model
    enabled: false
  ptBrMultimedia:                       # Brazilian Portuguese (pt-BR) Multimedia model
    enabled: false
  ptBrTelephony:                        # Brazilian Portuguese (pt-BR) Telephony model
    enabled: false
  svSeTelephony:                        # Swedish (sv-SE) Telephony model
    enabled: false
  zhCnTelephony:                        # Mandarin Chinese (zh-CN) Telephony model
    enabled: false
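As an illustration of how these properties combine, the following fragment is a minimal sketch for a deployment that mostly transcribes German telephone audio. It rests on two assumptions: that the fragment sits beneath global in the custom resource, as described above, and that the identifier de-DE_Telephony follows the same naming pattern as the default en-US_BroadbandModel. Verify the identifier against the list of next-generation models before you use it.

```yaml
global:
  defaultSTTModel: de-DE_Telephony   # Assumed identifier; confirm it in the next-generation model list
  sttModels:
    # Turn off the US English defaults that this deployment does not need
    enUsBroadbandModel:
      enabled: false
    enUsNarrowbandModel:
      enabled: false
    enUsShortFormNarrowbandModel:
      enabled: false
    enUsMultimedia:
      enabled: false
    # Keep one US English next-generation model as a fallback
    enUsTelephony:
      enabled: true
    # Install the German next-generation models
    deDeMultimedia:
      enabled: true
    deDeTelephony:
      enabled: true
```

The sketch favors next-generation models, in line with the guidance above, and enables only three models to keep the memory footprint small.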
Installing Text to Speech voices
You can use the following properties to specify the voices to install. The available voices are enhanced neural voices and expressive neural voices. You must indicate the individual voices that you want to install. To install a voice, set its enabled property to true. Set the property to false to indicate that the voice is not to be installed. The properties in the following example enable installation of the default voices. By default, only a subset of the US English enhanced neural voices is installed.
By default, the dynamic resource calculation feature automatically computes the exact amount of memory that is required for the voices that you select to install. Installing more voices increases the memory requirements for the service. You are therefore encouraged to install only those voices that you intend to use.
The defaultTTSVoice property sets the default voice for speech synthesis. If you install and use voices in languages other than US English, set the default to the voice you expect to use most often.
Text to Speech voices are installed only if the ttsRuntime microservice is installed. These properties are all indented beneath global. All available voices are indented beneath ttsVoices. During installation, Text to Speech voices are specified by the watson_speech_voices option.
- For more information about all available voices, see Using languages and voices.
- You can change which voices are installed at any time. For more information, see Updating models and voices for your Watson Speech services.
########################################
# Text to Speech enhanced neural voices
########################################
defaultTTSVoice: en-US_MichaelV3Voice   # Default voice for speech synthesis
ttsVoices:
  enUSAllisonV3Voice:                   # US English (en-US) Allison enhanced neural voice
    enabled: true
  enUSLisaV3Voice:                      # US English (en-US) Lisa enhanced neural voice
    enabled: true
  enUSMichaelV3Voice:                   # US English (en-US) Michael enhanced neural voice
    enabled: true
  enUSEmilyV3Voice:                     # US English (en-US) Emily enhanced neural voice
    enabled: false
  enUSHenryV3Voice:                     # US English (en-US) Henry enhanced neural voice
    enabled: false
  enUSKevinV3Voice:                     # US English (en-US) Kevin enhanced neural voice
    enabled: false
  enUSOliviaV3Voice:                    # US English (en-US) Olivia enhanced neural voice
    enabled: false
  deDEBirgitV3Voice:                    # German (de-DE) Birgit enhanced neural voice
    enabled: false
  deDEDieterV3Voice:                    # German (de-DE) Dieter enhanced neural voice
    enabled: false
  deDEErikaV3Voice:                     # German (de-DE) Erika enhanced neural voice
    enabled: false
  enGBCharlotteV3Voice:                 # UK English (en-GB) Charlotte enhanced neural voice
    enabled: false
  enGBJamesV3Voice:                     # UK English (en-GB) James enhanced neural voice
    enabled: false
  enGBKateV3Voice:                      # UK English (en-GB) Kate enhanced neural voice
    enabled: false
  esESEnriqueV3Voice:                   # Castilian Spanish (es-ES) Enrique enhanced neural voice
    enabled: false
  esESLauraV3Voice:                     # Castilian Spanish (es-ES) Laura enhanced neural voice
    enabled: false
  esLASofiaV3Voice:                     # Latin American Spanish (es-LA) Sofia enhanced neural voice
    enabled: false
  esUSSofiaV3Voice:                     # North American Spanish (es-US) Sofia enhanced neural voice
    enabled: false
  frCALouiseV3Voice:                    # French Canadian (fr-CA) Louise enhanced neural voice
    enabled: false
  frFRNicolasV3Voice:                   # French (fr-FR) Nicolas enhanced neural voice
    enabled: false
  frFRReneeV3Voice:                     # French (fr-FR) Renee enhanced neural voice
    enabled: false
  itITFrancescaV3Voice:                 # Italian (it-IT) Francesca enhanced neural voice
    enabled: false
  jaJPEmiV3Voice:                       # Japanese (ja-JP) Emi enhanced neural voice
    enabled: false
  koKRJinV3Voice:                       # Korean (ko-KR) Jin enhanced neural voice
    enabled: false
  nlNLMerelV3Voice:                     # Netherlands Dutch (nl-NL) Merel enhanced neural voice
    enabled: false
  ptBRIsabelaV3Voice:                   # Brazilian Portuguese (pt-BR) Isabela enhanced neural voice
    enabled: false
  ##########################################
  # Text to Speech expressive neural voices
  ##########################################
  enAUHeidiExpressive:                  # Australian English (en-AU) Heidi expressive neural voice
    enabled: false
  enAUJackExpressive:                   # Australian English (en-AU) Jack expressive neural voice
    enabled: false
  enUSAllisonExpressive:                # US English (en-US) Allison expressive neural voice
    enabled: false
  enUSEmmaExpressive:                   # US English (en-US) Emma expressive neural voice
    enabled: false
  enUSLisaExpressive:                   # US English (en-US) Lisa expressive neural voice
    enabled: false
  enUSMichaelExpressive:                # US English (en-US) Michael expressive neural voice
    enabled: false
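As an illustration of the voice properties, the following fragment is a minimal sketch that installs two German voices alongside a single US English voice and makes a German voice the default. It assumes that the fragment sits beneath global in the custom resource and that the identifier de-DE_BirgitV3Voice follows the same naming pattern as the default en-US_MichaelV3Voice. Confirm the identifier in the list of languages and voices before you use it.

```yaml
global:
  defaultTTSVoice: de-DE_BirgitV3Voice   # Assumed identifier; confirm it in the voice list
  ttsVoices:
    # Turn off the US English defaults that this deployment does not need
    enUSAllisonV3Voice:
      enabled: false
    enUSLisaV3Voice:
      enabled: false
    # Keep one US English voice
    enUSMichaelV3Voice:
      enabled: true
    # Install the German voices
    deDEBirgitV3Voice:
      enabled: true
    deDEDieterV3Voice:
      enabled: true
```

Because each installed voice adds to the memory requirements, the sketch turns off the default voices that are not needed instead of only adding new ones.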
User data and node affinity properties
The Speech custom resource includes properties that you can use to disable the storage and logging of user data and to specify node affinity.
Disabling the storage and logging of user data
By default, the Speech to Text runtime, Text to Speech runtime, and Speech to Text customization AM patcher temporarily store payload data in the running container. The data includes audio files, recognition hypotheses, and annotations that represent user data. By default, the skipAudioAndResultLogging properties are set to false, which allows the storage and logging of user data:
###################################
# Storage and logging of user data
###################################
sttRuntime:
  skipAudioAndResultLogging: "false"   # If true, disables storage and logging of user data
sttAMPatcher:
  skipAudioAndResultLogging: "false"   # If true, disables storage and logging of user data
ttsRuntime:
  skipAudioAndResultLogging: "false"   # If true, disables storage and logging of user data
You can disable the storage and logging of user data by setting the properties to true. Setting these properties to true also removes sensitive information from container logs and might significantly reduce the size of the logs.
###################################
# Storage and logging of user data
###################################
sttRuntime:
  skipAudioAndResultLogging: "true"    # If true, disables storage and logging of user data
sttAMPatcher:
  skipAudioAndResultLogging: "true"    # If true, disables storage and logging of user data
ttsRuntime:
  skipAudioAndResultLogging: "true"    # If true, disables storage and logging of user data
Specifying node affinity
In Kubernetes, node affinity defines constraints for pods so that they land on the nodes that satisfy the affinity constraints. Affinity and anti-affinity settings greatly expand the types of constraints that you can express for your pods. For more information about node affinity, see Affinity and anti-affinity in the Kubernetes documentation.
By default, the Speech services use the following node affinity specifications. The default affinity allows any pod to run on any amd64 node.
################
# Node affinity
################
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: beta.kubernetes.io/arch
          operator: In
          values:
          - amd64
You can update the values for the affinity property if you want the Speech services deployment pods to land on specific nodes. For example, the following specification replaces the default values to enable affinity for the designated nodes:
################
# Node affinity
################
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/e2e-az-name
          operator: In
          values:
          - e2e-az1
          - e2e-az2
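The preceding example replaces the default architecture constraint entirely. If you want to keep the amd64 requirement and also restrict pods to a labeled group of nodes, one possible approach is to list both expressions in the same term. In Kubernetes, all expressions within a single matchExpressions list must be satisfied, whereas separate nodeSelectorTerms entries are alternatives. The following fragment is a sketch only; the label key example.com/speech-workload is hypothetical and stands in for whatever label you apply to your own nodes.

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        # Keep the default architecture constraint
        - key: beta.kubernetes.io/arch
          operator: In
          values:
          - amd64
        # Hypothetical label; replace it with a label that exists on your nodes
        - key: example.com/speech-workload
          operator: In
          values:
          - "true"
```

With this specification, a pod is scheduled only on nodes that are amd64 and also carry the label. If no node satisfies both expressions, the pods remain unscheduled, so confirm that the label is applied before you update the custom resource.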