Advanced configuration for the Watson Speech services

A custom resource enables you to specify how Watson Speech services (Watson™ Speech to Text and Watson Text to Speech) are installed in your environment. After you install Watson Speech services, you can customize the installation by modifying the properties of the custom resource.

Custom resource topics

The following information can help you understand and make the best use of the properties of the Watson Speech services custom resource:

For sample custom resources that install both Watson Speech services, only Watson Speech to Text, or only Watson Text to Speech, see Sample Watson Speech services custom resources.

Usage notes

Keep in mind the following usage notes when installing the Watson Speech services:

  • Speech to Text models (sttModels) and Text to Speech voices (ttsVoices) are installed only if their corresponding runtimes (sttRuntime and ttsRuntime) are installed.

  • You cannot uninstall individual service microservices once they are installed. Changing their values in the custom resource from true to false has no effect. To remove any of the following microservices, you must uninstall the Watson Speech services in their entirety and reinstall only the microservices that you need: Speech to Text runtime (sttRuntime), Speech to Text asynchronous HTTP (sttAsync), Speech to Text customization (sttCustomization), Text to Speech runtime (ttsRuntime), and Text to Speech customization (ttsCustomization).

  • You can change any of the other values later by editing the custom resource that is created during installation. For more information, see Editing the custom resource.

  • After installing the Watson Speech services, change aspects of the installation and configuration only if needed. The default values are sufficient for most users.

Editing the custom resource

You can edit the custom resource to modify many aspects of your Watson Speech services installation and configuration. To modify the custom resource, enter the following command:

oc edit watsonspeech ${CUSTOM_RESOURCE_SPEECH}

The command opens the most recent custom resource in your editor. Modify the custom resource for the operation you are performing, then save the custom resource and exit your editor.

The Watson Speech operator picks up the changes to the custom resource on its next reconciliation loop, which is a periodic process that it runs to ensure that your installation reflects the latest custom resource. It might take up to 20 minutes for the operator to pick up the latest changes. It then takes more time for your installation to be updated with the changes.

Note: Comments are not preserved in a custom resource. They are shown in examples for added clarity.

Full custom resource

The complete custom resource for the Speech services includes all of the properties that you can specify. This full version of the custom resource installs both runtimes for the Watson Speech services and shows the default values for all properties.

apiVersion: speech.watson.ibm.com/v1
kind: WatsonSpeech
metadata:
  name: speech-cr                     # The recommended name of the custom resource
  namespace: ${PROJECT_CPD_INST_OPERANDS}  # The project (namespace) name where you plan to install the Speech services
spec:
  license:
    accept: true
  version: 4.7.0                      # Omit this property to always install the latest version
##################
# Storage classes  
##################
  blockStorageClass: "portworx-db-gp3-sc"   # The block storage class, for example, "portworx-db-gp3-sc"
  fileStorageClass:  "portworx-shared-gp3"  # The file storage class, for example, "portworx-shared-gp3"
########################
# Configuration scaling
########################
  scaleConfig:  
    stt:  
      size: xsmall  # Size of Speech to Text configuration: xsmall, small, medium, large, or custom
    tts:  
      size: xsmall  # Size of Text to Speech configuration: xsmall, small, medium, large, or custom
###############
# Request CPUs
###############
  sttAMPatcher:
    resources:
      requestsCPU: 1
################################
# Speech services microservices
################################
  tags:
    sttRuntime: true         # Enables the Speech to Text runtime microservice
    sttAsync: false          # Enables the Speech to Text asynchronous HTTP microservice
    sttCustomization: false  # Enables the Speech to Text customization microservice
    ttsRuntime: true         # Enables the Text to Speech runtime microservice
    ttsCustomization: false  # Enables the Text to Speech customization microservice
#############
# Datastores
#############
  global:
    datastores:
##################################
# Multicloud Gateway object store
##################################
      s3:
        # Secrets
        authSecretName: "noobaa-account-watson-speech"
###########################
# The PostgreSQL datastore
###########################
      postgressql:  
        # Sizing configuration  
        replicas: 3                                 # Number of replica nodes for PostgreSQL.
        databaseMemoryLimit: 5Gi                    # Maximum memory that PostgreSQL can use.
        databaseMemoryRequest: 1Gi                  # Default memory requested by PostgreSQL.
        databaseCPULimit: 1000m                     # Maximum CPU that PostgreSQL can use.
        databaseCPU: 500m                           # Default CPU requested by PostgreSQL.
        databaseStorageRequest: 5Gi                 # Maximum size of a PostgreSQL database storage request.
        # Storage configuration  
        blockStorageClass: "{{ blockStorageClass }}"  # Storage class that is used by PostgreSQL.
        # Secrets  
        createSecret: true  # True: Speech operator generates a secret.
                            # False: User provides existing secret via authSecretName property.
        authSecretName: "<speech-cr>-postgres-auth-secret"  # Name of PostgreSQL secrets object.
#########################
# The RabbitMQ datastore
#########################
      rabbitMQ:
        # Sizing configuration
        replicas: 3                             # Number of replica pods for RabbitMQ.
        cpuRequest: 200m                        # Default CPU requested by RabbitMQ.
        cpuLimit: 200m                          # Maximum CPU that RabbitMQ can use.
        memoryRequest: 256Mi                    # Default memory requested by RabbitMQ.
        memoryLimit: 256Mi                      # Maximum memory that RabbitMQ can use.
        # Storage configuration
        blockStorageClass: "{{ blockStorageClass }}"  # Storage class that is used by RabbitMQ.
        pvSize: 5Gi                             # Size of the persistent volume for RabbitMQ.
        # Secrets  
        authSecretName: "<speech-cr>-ibm-rabbitmq-auth-secret"  # Name of RabbitMQ secrets object.
############################################
# Speech to Text previous-generation models
############################################
    defaultSTTModel: en-US_BroadbandModel # Default model for speech recognition
    sttModels:
      enUsBroadbandModel:            # US English (en-US) Broadband model
        enabled: true
      enUsNarrowbandModel:           # US English (en-US) Narrowband model
        enabled: true
      enUsShortFormNarrowbandModel:  # US English (en-US) Short-Form Narrowband model
        enabled: true
      arMsBroadbandModel:            # Modern Standard Arabic (ar-MS) Broadband model
        enabled: false
      deDeBroadbandModel:            # German (de-DE) Broadband model 
        enabled: false
      deDeNarrowbandModel:           # German (de-DE) Narrowband model 
        enabled: false
      enAuBroadbandModel:            # Australian English (en-AU) Broadband model
        enabled: false
      enAuNarrowbandModel:           # Australian English (en-AU) Narrowband model 
        enabled: false
      enGbBroadbandModel:            # UK English (en-GB) Broadband model
        enabled: false
      enGbNarrowbandModel:           # UK English (en-GB) Narrowband model
        enabled: false
      esEsBroadbandModel:            # Castilian Spanish (es-ES, es-AR, es-CL, es-CO, es-MX, and es-PE) Broadband models 
        enabled: false
      esEsNarrowbandModel:           # Castilian Spanish (es-ES, es-AR, es-CL, es-CO, es-MX, and es-PE) Narrowband models 
        enabled: false	
      frCaBroadbandModel:            # Canadian French (fr-CA) Broadband model 
        enabled: false	
      frCaNarrowbandModel:           # Canadian French (fr-CA) Narrowband model 
        enabled: false
      frFrBroadbandModel:            # French (fr-FR) Broadband model 
        enabled: false
      frFrNarrowbandModel:           # French (fr-FR) Narrowband model 
        enabled: false
      itItBroadbandModel:            # Italian (it-IT) Broadband model 
        enabled: false	
      itItNarrowbandModel:           # Italian (it-IT) Narrowband model 
        enabled: false	
      jaJpBroadbandModel:            # Japanese (ja-JP) Broadband model 
        enabled: false	
      jaJpNarrowbandModel:           # Japanese (ja-JP) Narrowband model 
        enabled: false	
      koKrBroadbandModel:            # Korean (ko-KR) Broadband model
        enabled: false	
      koKrNarrowbandModel:           # Korean (ko-KR) Narrowband model
        enabled: false
      nlNlBroadbandModel:            # Dutch (nl-NL) Broadband model
        enabled: false
      nlNlNarrowbandModel:           # Dutch (nl-NL) Narrowband model 
        enabled: false	
      ptBrBroadbandModel:            # Brazilian Portuguese (pt-BR) Broadband model 
        enabled: false	
      ptBrNarrowbandModel:           # Brazilian Portuguese (pt-BR) Narrowband model 
        enabled: false	
      zhCnBroadbandModel:            # Mandarin Chinese (zh-CN) Broadband model 
        enabled: false	
      zhCnNarrowbandModel:           # Mandarin Chinese (zh-CN) Narrowband model 
        enabled: false
########################################
# Speech to Text next-generation models
########################################
      enUsMultimedia:        # US English (en-US) Multimedia model 
        enabled: true
      enUsTelephony:         # US English (en-US) Telephony model 
        enabled: true
      arMsTelephony:         # Modern Standard Arabic (ar-MS) Telephony model 
        enabled: false
      csCZTelephony:         # Czech (cs-CZ) Telephony model
        enabled: false
      deDeMultimedia:        # German (de-DE) Multimedia model 
        enabled: false
      deDeTelephony:         # German (de-DE) Telephony model 
        enabled: false
      enAuMultimedia:        # Australian English (en-AU) Multimedia model 
        enabled: false     
      enAuTelephony:         # Australian English (en-AU) Telephony model 
        enabled: false
      enGbMultimedia:        # UK English (en-GB) Multimedia model 
        enabled: false
      enGbTelephony:         # UK English (en-GB) Telephony model 
        enabled: false
      enInTelephony:         # Indian English (en-IN) Telephony model
        enabled: false
      enWwMedicalTelephony:  # English (all supported dialects) Medical Telephony model
        enabled: false
      esEsMultimedia:        # Castilian Spanish (es-ES) Multimedia model
        enabled: false
      esEsTelephony:         # Castilian Spanish (es-ES) Telephony model
        enabled: false
      esLaTelephony:         # Latin American Spanish (es-LA) Telephony model
        enabled: false
      frCaMultimedia:        # Canadian French (fr-CA) Multimedia model
        enabled: false
      frCaTelephony:         # Canadian French (fr-CA) Telephony model 
        enabled: false
      frFrMultimedia:        # French (fr-FR) Multimedia model
        enabled: false
      frFrTelephony:         # French (fr-FR) Telephony model
        enabled: false
      hiInTelephony:         # Indian Hindi (hi-IN) Telephony model
        enabled: false
      itItMultimedia:        # Italian (it-IT) Multimedia model 
        enabled: false
      itItTelephony:         # Italian (it-IT) Telephony model 
        enabled: false
      jaJpMultimedia:        # Japanese (ja-JP) Multimedia model
        enabled: false
      jaJpTelephony:         # Japanese (ja-JP) Telephony model
        enabled: false
      koKrMultimedia:        # Korean (ko-KR) Multimedia model
        enabled: false
      koKrTelephony:         # Korean (ko-KR) Telephony model
        enabled: false
      nlBeTelephony:         # Belgian Dutch (nl-BE) Telephony model
        enabled: false
      nlNlMultimedia:        # Netherlands Dutch (nl-NL) Multimedia model
        enabled: false
      nlNlTelephony:         # Netherlands Dutch (nl-NL) Telephony model
        enabled: false
      ptBrMultimedia:        # Brazilian Portuguese (pt-BR) Multimedia model 
        enabled: false
      ptBrTelephony:         # Brazilian Portuguese (pt-BR) Telephony model 
        enabled: false
      svSeTelephony:         # Swedish (sv-SE) Telephony model 
        enabled: false
      zhCnTelephony:         # Mandarin Chinese (zh-CN) Telephony model
        enabled: false
########################################
# Text to Speech enhanced neural voices
########################################
    defaultTTSVoice: en-US_MichaelV3Voice # Default voice for speech synthesis
    ttsVoices:
      enUSAllisonV3Voice:    # US English (en-US) Allison enhanced neural voice 
        enabled: true	
      enUSLisaV3Voice:       # US English (en-US) Lisa enhanced neural voice 
        enabled: true	
      enUSMichaelV3Voice:    # US English (en-US) Michael enhanced neural voice 
        enabled: true	
      enUSEmilyV3Voice:      # US English (en-US) Emily enhanced neural voice 
        enabled: false	
      enUSHenryV3Voice:      # US English (en-US) Henry enhanced neural voice 
        enabled: false	
      enUSKevinV3Voice:      # US English (en-US) Kevin enhanced neural voice 
        enabled: false	
      enUSOliviaV3Voice:     # US English (en-US) Olivia enhanced neural voice 
        enabled: false	
      deDEBirgitV3Voice:     # German (de-DE) Birgit enhanced neural voice 
        enabled: false	
      deDEDieterV3Voice:     # German (de-DE) Dieter enhanced neural voice 
        enabled: false	
      deDEErikaV3Voice:      # German (de-DE) Erika enhanced neural voice 
        enabled: false	
      enGBCharlotteV3Voice:  # UK English (en-GB) Charlotte enhanced neural voice 
        enabled: false	
      enGBJamesV3Voice:      # UK English (en-GB) James enhanced neural voice 
        enabled: false	
      enGBKateV3Voice:       # UK English (en-GB) Kate enhanced neural voice 
        enabled: false
      esESEnriqueV3Voice:    # Castilian Spanish (es-ES) Enrique enhanced neural voice 
        enabled: false	
      esESLauraV3Voice:      # Castilian Spanish (es-ES) Laura enhanced neural voice 
        enabled: false	
      esLASofiaV3Voice:      # Latin American Spanish (es-LA) Sofia enhanced neural voice 
        enabled: false	
      esUSSofiaV3Voice:      # North American Spanish (es-US) Sofia enhanced neural voice 
        enabled: false	
      frCALouiseV3Voice:     # French Canadian (fr-CA) Louise enhanced neural voice 
        enabled: false
      frFRNicolasV3Voice:    # French (fr-FR) Nicolas enhanced neural voice 
        enabled: false	
      frFRReneeV3Voice:      # French (fr-FR) Renee enhanced neural voice 
        enabled: false	
      itITFrancescaV3Voice:  # Italian (it-IT) Francesca enhanced neural voice 
        enabled: false	
      jaJPEmiV3Voice:        # Japanese (ja-JP) Emi enhanced neural voice
        enabled: false
      koKRJinV3Voice:        # Korean (ko-KR) Jin enhanced neural voice
        enabled: false
      nlNLMerelV3Voice:      # Netherlands Dutch (nl-NL) Merel enhanced neural voice
        enabled: false
      ptBRIsabelaV3Voice:    # Brazilian Portuguese (pt-BR) Isabela enhanced neural voice
        enabled: false
##########################################
# Text to Speech expressive neural voices
##########################################
      enAUHeidiExpressive:   # Australian English (en-AU) Heidi expressive neural voice
        enabled: false
      enAUJackExpressive:    # Australian English (en-AU) Jack expressive neural voice
        enabled: false
      enUSAllisonExpressive: # US English (en-US) Allison expressive neural voice
        enabled: false
      enUSEmmaExpressive:    # US English (en-US) Emma expressive neural voice
        enabled: false
      enUSLisaExpressive:    # US English (en-US) Lisa expressive neural voice
        enabled: false
      enUSMichaelExpressive: # US English (en-US) Michael expressive neural voice
        enabled: false
#####################
# Backup and restore
#####################
  backupRestore:
    components:
      backup:
        name: "backup"
        backupVolume: "{{ releaseName }}-aux-s3-backup-pvc"
        volumeSize: 10Gi
###################################
# Storage and logging of user data
###################################
  sttRuntime:  
    skipAudioAndResultLogging: "true"  # If true, disables storage and logging of user data
  sttAMPatcher:  
    skipAudioAndResultLogging: "true"  # If true, disables storage and logging of user data
  ttsRuntime:  
    skipAudioAndResultLogging: "true"  # If true, disables storage and logging of user data
################
# Node affinity
################
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: beta.kubernetes.io/arch  
            operator: In  
            values:  
            - amd64

License, version, storage class, and scaling properties

The Speech custom resource provides properties that you use to accept the product license and to specify the product version, storage class, and scaling for your installation.

Acknowledging license acceptance

To install the Speech services, you must read, understand, and accept the license terms. You are strongly encouraged to read the full terms of the license agreement. (The link requires you to leave the IBM web site.)

You must set the license.accept property to true. The installation fails if you set the property to false.

  license:
    accept: true

Specifying a version

The version property specifies the version of the Speech services that the operator is to install. The property's value is provided by IBM. Future releases of the Speech services will increment the version to greater values.

  version: 4.7.0  # Omit this property to always install the latest version

The version property decouples the version of the Speech services from the version of the Speech operator. The Speech operator can handle multiple versions of the Speech services, so you can continue to run older versions of the Speech services on newer versions of the operator. You can then upgrade to later versions of the Speech services at your discretion.

Alternatively, you can omit the version property entirely from the custom resource. Doing so allows you to manage your Speech services (operand) automatically. If you omit the version property, the Watson Speech operator always installs the latest available version of the Speech services. It also automatically upgrades your installation to the latest version of the Speech services as new versions become available.
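As a minimal sketch of this second approach, the following fragment of the spec section accepts the license and leaves out the version property. It reuses only properties from the full custom resource shown earlier; all other properties are omitted for brevity.

```yaml
spec:
  license:
    accept: true
  # The version property is deliberately omitted. Per the description above,
  # the operator then installs the latest available version of the Speech
  # services and upgrades the installation as new versions become available.
```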

Specifying storage classes

The blockStorageClass and fileStorageClass properties specify the persistent storage that the Speech services are to use. The following example uses Portworx by specifying the storage classes "portworx-db-gp3-sc" and "portworx-shared-gp3":

##################
# Storage classes
##################
  blockStorageClass: "portworx-db-gp3-sc"   # The block storage class, for example, "portworx-db-gp3-sc"
  fileStorageClass:  "portworx-shared-gp3"  # The file storage class, for example, "portworx-shared-gp3"

For more information about the available block and file storage classes for the persistent storage for Watson Speech services, see:

Scaling your configuration

The scaleConfig property specifies the size of the installation for the Speech services. Possible sizes include xsmall, small, medium, large, and custom. By default, the services use the following xsmall values:

########################
# Configuration scaling
########################
  scaleConfig:
    stt:
      size: xsmall  # Size of Speech to Text configuration: xsmall, small, medium, large, or custom
    tts:
      size: xsmall  # Size of Text to Speech configuration: xsmall, small, medium, large, or custom

The values for the Speech to Text and Text to Speech services are independent. You can specify different values for the size properties. For example, the following values increase the size of the Speech to Text configuration but leave the Text to Speech configuration unchanged:

########################
# Configuration scaling
########################
  scaleConfig:
    stt:
      size: medium
    tts:  
      size: xsmall

You can start by using the default values and scale your configuration as your usage grows. During installation, these properties are specified by the watson_speech_stt_scale_config and watson_speech_tts_scale_config options.

Specifying request CPUs

The sttAMPatcher microservice manages acoustic model customization for the Speech to Text service. It is automatically installed with the sttCustomization microservice.

The AM Patcher uses a dedicated number of CPUs to handle requests. The sttAMPatcher.resources.requestsCPU property specifies the number of CPUs that are dedicated to handling acoustic model training requests by the AM Patcher microservice.

###############
# Request CPUs
###############
  sttAMPatcher:
    resources:
      requestsCPU: 1

If training of custom acoustic models fails with the following message, increase the value of the requestsCPU property from 1 to 5:

Unresponsive backend detected. Please try later.

Allocating more resources prevents this error and enables custom acoustic models to be trained as expected. Increasing the value of the property increases the size of the deployment. For more information, see also Training of custom acoustic models is failing.
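For illustration, the change described above might look like the following fragment, which raises requestsCPU from its default of 1 to 5. The value 5 comes from the guidance above; the surrounding properties of the custom resource are omitted.

```yaml
spec:
  sttAMPatcher:
    resources:
      # Raised from the default of 1 to address "Unresponsive backend detected"
      # failures during custom acoustic model training.
      requestsCPU: 5
```

Because a larger value increases the size of the deployment, it is probably worth making this change only after the error actually appears.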

Datastores properties

The Speech custom resource provides properties that you can use to configure the Multicloud Object Gateway, PostgreSQL, and RabbitMQ datastores for your installation.

The individual datastores are installed only with the following microservices:

  • The Multicloud Object Gateway must be installed before you install any of the Watson Speech services. The Speech runtime microservices depend on it, as does sttCustomization. For installation instructions, see Installing Multicloud Object Gateway for IBM Cloud Pak for Data.

  • The PostgreSQL datastore is installed only if at least one of the following microservices is enabled: sttAsync, sttCustomization, or ttsCustomization. If the dependent microservices are disabled at a later date, PostgreSQL remains installed but is not used.

    Note: Prior to version 4.6.0, PostgreSQL was always installed with the Speech services. If you are an existing user who enabled only the runtime microservices, PostgreSQL remains installed but is not used. In this case, PostgreSQL also remains installed across upgrades.

  • The RabbitMQ datastore is installed only if the sttAsync microservice is enabled.
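
To illustrate these dependencies, the following sketch of the tags section enables the asynchronous HTTP microservice. Under the rules above, this setting would cause both the PostgreSQL and RabbitMQ datastores to be installed. The comments only restate those rules; they are not output from the operator.

```yaml
spec:
  tags:
    sttRuntime: true
    sttAsync: true           # Requires both PostgreSQL and RabbitMQ
    sttCustomization: false  # Would require PostgreSQL if enabled
    ttsRuntime: true
    ttsCustomization: false  # Would require PostgreSQL if enabled
```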

Configuring the Multicloud Object Gateway

The Multicloud Object Gateway is an object storage solution that contains the following stateful data for the Speech services:

  • For Speech to Text and Text to Speech, models and voices that are installed with the services.
  • For Speech to Text, transcription results while a speech recognition job is in progress.
  • For Speech to Text, binary patches for trained models, grammars and corpora for custom language models, and audio files for custom acoustic models.

You can use the following properties to configure Multicloud Object Gateway for your installation. The definition shows the default configuration values for the datastore's properties. These properties are indented beneath global.datastores.

Note that the secret named by the authSecretName property is created in the procedure that is described in Creating secrets for services that use Multicloud Object Gateway.

##################################
# Multicloud Gateway object store
##################################
      s3:
        # Secrets
        authSecretName: "noobaa-account-watson-speech"

Configuring the PostgreSQL datastore

PostgreSQL is an open-source relational database that contains the following stateful data for the Speech microservices:

  • For Speech to Text, custom language models and custom acoustic models.
  • For Speech to Text, all asynchronous HTTP jobs for the past week. Entries older than one week are automatically purged.
  • For Text to Speech, custom models and speaker models.

You can use the following properties to configure the PostgreSQL datastore for your installation. The definition shows the default configuration values for the datastore's properties. These properties are indented beneath global.datastores.

Note that the value {{ blockStorageClass }} is replaced by the value specified for the spec.blockStorageClass property. In the full custom resource shown earlier, the value is portworx-db-gp3-sc.

###########################
# The PostgreSQL datastore
###########################
      postgressql:  
        # Sizing configuration  
        replicas: 3                                 # Number of replica nodes for PostgreSQL.
        databaseMemoryLimit: 5Gi                    # Maximum memory that PostgreSQL can use.
        databaseMemoryRequest: 1Gi                  # Default memory requested by PostgreSQL.
        databaseCPULimit: 1000m                     # Maximum CPU that PostgreSQL can use.
        databaseCPU: 500m                           # Default CPU requested by PostgreSQL.
        databaseStorageRequest: 5Gi                 # Maximum size of a PostgreSQL database storage request.
        enablePodMonitor: false                     # Enables the PostgreSQL operator to monitor the PostgreSQL pods.
        # Storage configuration  
        blockStorageClass: "{{ blockStorageClass }}"  # Storage class that is used by PostgreSQL.
        # Secrets  
        authSecretName: "<speech-cr>-postgres-auth-secret"  # Name of PostgreSQL secrets object.
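
As an illustration of the substitution, assume that the top-level blockStorageClass property is set to "portworx-db-gp3-sc", as in the full custom resource shown earlier. The following sketch shows that setting together with the PostgreSQL placeholder, and notes in a comment the value that the placeholder would then resolve to. Only the relevant properties are shown.

```yaml
spec:
  blockStorageClass: "portworx-db-gp3-sc"
  global:
    datastores:
      postgressql:
        # With the setting above, this placeholder resolves to
        # "portworx-db-gp3-sc" after substitution.
        blockStorageClass: "{{ blockStorageClass }}"
```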

Configuring the RabbitMQ datastore

The RabbitMQ datastore handles non-persistent message queuing for the Speech to Text asynchronous HTTP microservice (sttAsync).

You can use the following properties to configure the RabbitMQ datastore for your installation. The definition shows the default configuration values for the datastore's properties. These properties are indented beneath global.datastores.

Note that the value {{ blockStorageClass }} is replaced by the value specified for the spec.blockStorageClass property. In the full custom resource shown earlier, the value is portworx-db-gp3-sc.

#########################
# The RabbitMQ datastore
#########################
      rabbitMQ:
        # Sizing configuration
        replicas: 3                             # Number of replica pods for RabbitMQ.
        cpuRequest: 200m                        # Default CPU requested by RabbitMQ.
        cpuLimit: 200m                          # Maximum CPU that RabbitMQ can use.
        memoryRequest: 256Mi                    # Default memory requested by RabbitMQ.
        memoryLimit: 256Mi                      # Maximum memory that RabbitMQ can use.
        # Storage configuration
        blockStorageClass: "{{ blockStorageClass }}"  # Storage class that is used by RabbitMQ.
        pvSize: 5Gi                             # Size of the persistent volume for RabbitMQ.
        # Secrets  
        authSecretName: "<speech-cr>-ibm-rabbitmq-auth-secret"  # Name of RabbitMQ secrets object.

Speech microservices, models, and voices properties

The Speech custom resource provides a rich set of properties that you can use to tailor the installation and configuration of the Speech services to meet your application needs. These properties specify the Speech services microservices, models, and voices that are to be installed.

Installing Speech services microservices

You use the following five properties to specify the Speech microservices to install. The properties determine the functionality that is available for the Speech services. By default, the runtimes are enabled, but you can enable or disable them separately. The following example installs the Speech to Text and Text to Speech runtimes:

################################
# Speech services microservices
################################
  tags:
    sttRuntime: true         # Enables the Speech to Text runtime microservice
    sttAsync: false          # Enables the Speech to Text asynchronous HTTP microservice
    sttCustomization: false  # Enables the Speech to Text customization microservice
    ttsRuntime: true         # Enables the Text to Speech runtime microservice
    ttsCustomization: false  # Enables the Text to Speech customization microservice

The properties provide the following service capabilities:

sttRuntime

Speech to Text runtime, the base microservice for speech recognition. This value enables the /v1/recognize interfaces (synchronous HTTP and WebSocket). Enabling either of the other Speech to Text microservices automatically enables the Speech to Text runtime. Speech to Text models (sttModels) are installed only if the runtime is installed. During installation, this property is specified by the watson_speech_enable_stt_runtime option.

For more information, see The synchronous HTTP interface and The WebSocket interface.

sttAsync

Speech to Text asynchronous HTTP. This value enables the /v1/recognitions interface. During installation, this property is specified by the watson_speech_enable_stt_async option.

For more information, see The asynchronous HTTP interface.

sttCustomization

Speech to Text customization. This value enables the /v1/customizations and /v1/acoustic_customizations interfaces for language model and acoustic model customization. (The sttAMPatcher microservice, the backend microservice for acoustic model customization, is automatically installed with the sttCustomization microservice.) During installation, this property is specified by the watson_speech_enable_stt_customization option.

For more information, see Understanding customization.

ttsRuntime

Text to Speech runtime, the base microservice for speech synthesis. This value enables the /v1/synthesize interfaces (HTTP and WebSocket). Enabling the Text to Speech customization microservice automatically enables the Text to Speech runtime. Text to Speech voices (ttsVoices) are installed only if the runtime is installed. During installation, this property is specified by the watson_speech_enable_tts_runtime option.

For more information, see The HTTP interface and The WebSocket interface.

ttsCustomization

Text to Speech customization. This value enables the /v1/customizations interface for customization. During installation, this property is specified by the watson_speech_enable_tts_customization option.

For more information, see Understanding customization.

You can combine installation of the different microservices to install the following functionality:

  • To install Speech to Text only, set ttsRuntime and ttsCustomization to false.
  • To install Text to Speech only, set sttRuntime, sttAsync, and sttCustomization to false.
  • To install both Speech to Text and Text to Speech without enabling customization, set sttCustomization and ttsCustomization to false.
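
For example, the first combination in the list above could be sketched as follows. The fragment installs Speech to Text only, leaving the asynchronous HTTP and customization microservices at their default value of false. Only the tags section is shown.

```yaml
spec:
  tags:
    sttRuntime: true
    sttAsync: false
    sttCustomization: false
    ttsRuntime: false        # Text to Speech runtime is not installed
    ttsCustomization: false  # Text to Speech customization is not installed
```

Because individual microservices cannot be uninstalled later, choices like these are best made before the initial installation.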

Installing Speech to Text models

You can use the following properties to specify the models to install. Installing all models substantially increases the memory requirements. You are therefore strongly encouraged to install only those models that you intend to use.

When choosing between previous- and next-generation models, prefer next-generation models where possible. Next-generation models offer improved speech recognition over previous-generation models. They also require fewer resources (CPU and memory) than previous-generation models. By default, the dynamic resource calculation feature automatically computes the exact amount of memory that is required for the selected models.

The defaultSTTModel property sets the default model for speech recognition. If you install and use models in languages other than US English, set the default to the model you expect to use most often.

Speech to Text models are installed only if the sttRuntime microservice is installed. To install a model, set its enabled property to true in the custom resource. Set the property to false to indicate that the model is not to be installed. The following properties show the default values. By default, only the US English previous- and next-generation models are installed. These properties are indented beneath global. All available models are indented beneath sttModels. During installation, Speech to Text models are specified by the watson_speech_models option.

Important: Effective 31 July 2023, all previous-generation models will be removed from the service and the documentation. Most previous-generation models were deprecated on 15 March 2022. For more information, see the 22 February 2023 (version 4.6.3) service update in the Release notes for Speech to Text for IBM Cloud Pak® for Data.

############################################
# Speech to Text previous-generation models
############################################
    defaultSTTModel: en-US_BroadbandModel # Default model for speech recognition
    sttModels:
      enUsBroadbandModel:            # US English (en-US) Broadband model
        enabled: true
      enUsNarrowbandModel:           # US English (en-US) Narrowband model
        enabled: true
      enUsShortFormNarrowbandModel:  # US English (en-US) Short-Form Narrowband model
        enabled: true
      arMsBroadbandModel:            # Modern Standard Arabic (ar-MS) Broadband model
        enabled: false
      deDeBroadbandModel:            # German (de-DE) Broadband model 
        enabled: false	
      deDeNarrowbandModel:           # German (de-DE) Narrowband model 
        enabled: false	
      enAuBroadbandModel:            # Australian English (en-AU) Broadband model
        enabled: false
      enAuNarrowbandModel:           # Australian English (en-AU) Narrowband model 
        enabled: false
      enGbBroadbandModel:            # UK English (en-GB) Broadband model
        enabled: false
      enGbNarrowbandModel:           # UK English (en-GB) Narrowband model
        enabled: false
      esEsBroadbandModel:            # Castilian Spanish (es-ES, es-AR, es-CL, es-CO, es-MX, and es-PE) Broadband models 
        enabled: false
      esEsNarrowbandModel:           # Castilian Spanish (es-ES, es-AR, es-CL, es-CO, es-MX, and es-PE) Narrowband models 
        enabled: false	
      frCaBroadbandModel:            # Canadian French (fr-CA) Broadband model 
        enabled: false	
      frCaNarrowbandModel:           # Canadian French (fr-CA) Narrowband model 
        enabled: false 
      frFrBroadbandModel:            # French (fr-FR) Broadband model 
        enabled: false
      frFrNarrowbandModel:           # French (fr-FR) Narrowband model 
        enabled: false
      itItBroadbandModel:            # Italian (it-IT) Broadband model 
        enabled: false	
      itItNarrowbandModel:           # Italian (it-IT) Narrowband model 
        enabled: false	
      jaJpBroadbandModel:            # Japanese (ja-JP) Broadband model 
        enabled: false	
      jaJpNarrowbandModel:           # Japanese (ja-JP) Narrowband model 
        enabled: false	
      koKrBroadbandModel:            # Korean (ko-KR) Broadband model 
        enabled: false	
      koKrNarrowbandModel:           # Korean (ko-KR) Narrowband model 
        enabled: false
      nlNlBroadbandModel:            # Dutch (nl-NL) Broadband model 
        enabled: false
      nlNlNarrowbandModel:           # Dutch (nl-NL) Narrowband model 
        enabled: false	
      ptBrBroadbandModel:            # Brazilian Portuguese (pt-BR) Broadband model 
        enabled: false	
      ptBrNarrowbandModel:           # Brazilian Portuguese (pt-BR) Narrowband model 
        enabled: false	
      zhCnBroadbandModel:            # Mandarin Chinese (zh-CN) Broadband model 
        enabled: false	
      zhCnNarrowbandModel:           # Mandarin Chinese (zh-CN) Narrowband model 
        enabled: false
########################################
# Speech to Text next-generation models
########################################
      enUsMultimedia:        # US English (en-US) Multimedia model 
        enabled: true
      enUsTelephony:         # US English (en-US) Telephony model 
        enabled: true
      arMsTelephony:         # Modern Standard Arabic (ar-MS) Telephony model 
        enabled: false
      csCZTelephony:         # Czech (cs-CZ) Telephony model
        enabled: false
      deDeMultimedia:        # German (de-DE) Multimedia model 
        enabled: false
      deDeTelephony:         # German (de-DE) Telephony model 
        enabled: false
      enAuMultimedia:        # Australian English (en-AU) Multimedia model 
        enabled: false     
      enAuTelephony:         # Australian English (en-AU) Telephony model 
        enabled: false
      enGbMultimedia:        # UK English (en-GB) Multimedia model 
        enabled: false
      enGbTelephony:         # UK English (en-GB) Telephony model 
        enabled: false
      enInTelephony:         # Indian English (en-IN) Telephony model
        enabled: false
      enWwMedicalTelephony:  # English (all supported dialects) Medical Telephony model
        enabled: false
      esEsMultimedia:        # Castilian Spanish (es-ES) Multimedia model
        enabled: false
      esEsTelephony:         # Castilian Spanish (es-ES) Telephony model
        enabled: false
      esLaTelephony:         # Latin American Spanish (es-LA) Telephony model
        enabled: false
      frCaMultimedia:        # Canadian French (fr-CA) Multimedia model
        enabled: false
      frCaTelephony:         # Canadian French (fr-CA) Telephony model 
        enabled: false
      frFrMultimedia:        # French (fr-FR) Multimedia model
        enabled: false
      frFrTelephony:         # French (fr-FR) Telephony model
        enabled: false
      hiInTelephony:         # Indian Hindi (hi-IN) Telephony model
        enabled: false
      itItMultimedia:        # Italian (it-IT) Multimedia model 
        enabled: false
      itItTelephony:         # Italian (it-IT) Telephony model 
        enabled: false
      jaJpMultimedia:        # Japanese (ja-JP) Multimedia model
        enabled: false
      jaJpTelephony:         # Japanese (ja-JP) Telephony model
        enabled: false
      koKrMultimedia:        # Korean (ko-KR) Multimedia model
        enabled: false
      koKrTelephony:         # Korean (ko-KR) Telephony model
        enabled: false
      nlBeTelephony:         # Belgian Dutch (nl-BE) Telephony model
        enabled: false
      nlNlMultimedia:        # Netherlands Dutch (nl-NL) Multimedia model
        enabled: false
      nlNlTelephony:         # Netherlands Dutch (nl-NL) Telephony model
        enabled: false
      ptBrMultimedia:        # Brazilian Portuguese (pt-BR) Multimedia model 
        enabled: false
      ptBrTelephony:         # Brazilian Portuguese (pt-BR) Telephony model 
        enabled: false
      svSeTelephony:         # Swedish (sv-SE) Telephony model 
        enabled: false
      zhCnTelephony:         # Mandarin Chinese (zh-CN) Telephony model
        enabled: false
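
As an illustration of the properties above, the following sketch installs only the next-generation US English and German Telephony models and makes the German model the default. It is an example under stated assumptions, not a recommended configuration: the value de-DE_Telephony is assumed to follow the same naming pattern as the default en-US_BroadbandModel, and the default model is assumed to be one of the installed models. Models that are not listed keep their default values.

#################################################
# Example: next-generation Telephony models only
#################################################
    defaultSTTModel: de-DE_Telephony # Assumed identifier for the German Telephony model
    sttModels:
      enUsBroadbandModel:            # Previous-generation default, disabled here
        enabled: false
      enUsNarrowbandModel:           # Previous-generation default, disabled here
        enabled: false
      enUsShortFormNarrowbandModel:  # Previous-generation default, disabled here
        enabled: false
      enUsMultimedia:                # Next-generation default, disabled here
        enabled: false
      enUsTelephony:                 # US English (en-US) Telephony model
        enabled: true
      deDeTelephony:                 # German (de-DE) Telephony model
        enabled: true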

Installing Text to Speech voices

You can use the following properties to specify the voices to install. The available voices are enhanced neural voices and expressive neural voices. You must indicate the individual voices that you want to install. To install a voice, set its enabled property to true. Set the property to false to indicate that the voice is not to be installed. The properties in the following example enable installation of the default voices. By default, only a subset of the US English enhanced neural voices is installed.

By default, the dynamic resource calculation feature automatically computes the exact amount of memory that is required for the voices that you select to install. Installing more voices increases the memory requirements for the service. You are therefore encouraged to install only those voices that you intend to use.

The defaultTTSVoice property sets the default voice for speech synthesis. If you install and use voices in languages other than US English, set the default to the voice you expect to use most often.

Text to Speech voices are installed only if the ttsRuntime microservice is installed. These properties are all indented beneath global. All available voices are indented beneath ttsVoices. During installation, Text to Speech voices are specified by the watson_speech_voices option.

########################################
# Text to Speech enhanced neural voices
########################################
    defaultTTSVoice: en-US_MichaelV3Voice # Default voice for speech synthesis
    ttsVoices:
      enUSAllisonV3Voice:    # US English (en-US) Allison enhanced neural voice 
        enabled: true	
      enUSLisaV3Voice:       # US English (en-US) Lisa enhanced neural voice 
        enabled: true	
      enUSMichaelV3Voice:    # US English (en-US) Michael enhanced neural voice 
        enabled: true	
      enUSEmilyV3Voice:      # US English (en-US) Emily enhanced neural voice 
        enabled: false	
      enUSHenryV3Voice:      # US English (en-US) Henry enhanced neural voice 
        enabled: false	
      enUSKevinV3Voice:      # US English (en-US) Kevin enhanced neural voice 
        enabled: false	
      enUSOliviaV3Voice:     # US English (en-US) Olivia enhanced neural voice 
        enabled: false	
      deDEBirgitV3Voice:     # German (de-DE) Birgit enhanced neural voice 
        enabled: false
      deDEDieterV3Voice:     # German (de-DE) Dieter enhanced neural voice 
        enabled: false	
      deDEErikaV3Voice:      # German (de-DE) Erika enhanced neural voice 
        enabled: false	
      enGBCharlotteV3Voice:  # UK English (en-GB) Charlotte enhanced neural voice 
        enabled: false	
      enGBJamesV3Voice:      # UK English (en-GB) James enhanced neural voice 
        enabled: false	
      enGBKateV3Voice:       # UK English (en-GB) Kate enhanced neural voice 
        enabled: false
      esESEnriqueV3Voice:    # Castilian Spanish (es-ES) Enrique enhanced neural voice
        enabled: false	
      esESLauraV3Voice:      # Castilian Spanish (es-ES) Laura enhanced neural voice
        enabled: false	
      esLASofiaV3Voice:      # Latin American Spanish (es-LA) Sofia enhanced neural voice
        enabled: false
      esUSSofiaV3Voice:      # North American Spanish (es-US) Sofia enhanced neural voice
        enabled: false	
      frCALouiseV3Voice:     # Canadian French (fr-CA) Louise enhanced neural voice
        enabled: false
      frFRNicolasV3Voice:    # French (fr-FR) Nicolas enhanced neural voice 
        enabled: false	
      frFRReneeV3Voice:      # French (fr-FR) Renee enhanced neural voice 
        enabled: false	
      itITFrancescaV3Voice:  # Italian (it-IT) Francesca enhanced neural voice 
        enabled: false	
      jaJPEmiV3Voice:        # Japanese (ja-JP) Emi enhanced neural voice
        enabled: false
      koKRJinV3Voice:        # Korean (ko-KR) Jin enhanced neural voice
        enabled: false
      nlNLMerelV3Voice:      # Netherlands Dutch (nl-NL) Merel enhanced neural voice
        enabled: false
      ptBRIsabelaV3Voice:    # Brazilian Portuguese (pt-BR) Isabela enhanced neural voice
        enabled: false
##########################################
# Text to Speech expressive neural voices
##########################################
      enAUHeidiExpressive:   # Australian English (en-AU) Heidi expressive neural voice
        enabled: false
      enAUJackExpressive:    # Australian English (en-AU) Jack expressive neural voice
        enabled: false
      enUSAllisonExpressive: # US English (en-US) Allison expressive neural voice
        enabled: false
      enUSEmmaExpressive:    # US English (en-US) Emma expressive neural voice
        enabled: false
      enUSLisaExpressive:    # US English (en-US) Lisa expressive neural voice
        enabled: false
      enUSMichaelExpressive: # US English (en-US) Michael expressive neural voice
        enabled: false
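
As an illustration of the properties above, the following sketch keeps one US English voice, adds a German voice, and makes the German voice the default. It is an example under stated assumptions: the value de-DE_BirgitV3Voice is assumed to follow the same naming pattern as the default en-US_MichaelV3Voice, and the default voice is assumed to be one of the installed voices. Voices that are not listed keep their default values.

##########################################
# Example: one US English voice plus German
##########################################
    defaultTTSVoice: de-DE_BirgitV3Voice # Assumed identifier for the German Birgit voice
    ttsVoices:
      enUSAllisonV3Voice:    # US English (en-US) Allison enhanced neural voice
        enabled: true
      enUSLisaV3Voice:       # Default voice, disabled here
        enabled: false
      enUSMichaelV3Voice:    # Default voice, disabled here
        enabled: false
      deDEBirgitV3Voice:     # German (de-DE) Birgit enhanced neural voice
        enabled: true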

User data and node affinity properties

The Speech custom resource includes properties that you can use to disable the storage and logging of user data and to specify node affinity.

Disabling the storage and logging of user data

By default, the Speech to Text runtime, Text to Speech runtime, and Speech to Text customization AM patcher temporarily store payload data in the running container. The data includes audio files, recognition hypotheses, and annotations that represent user data. The skipAudioAndResultLogging properties default to the following values, which allow the storage and logging of user data:

###################################
# Storage and logging of user data
###################################
  sttRuntime:  
    skipAudioAndResultLogging: "false"  # If true, disables storage and logging of user data
  sttAMPatcher:  
    skipAudioAndResultLogging: "false"  # If true, disables storage and logging of user data
  ttsRuntime:  
    skipAudioAndResultLogging: "false"  # If true, disables storage and logging of user data

You can disable the storage and logging of user data by setting the properties to true. Setting these properties to true also removes sensitive information from container logs and might significantly reduce the size of the logs.

###################################
# Storage and logging of user data
###################################
  sttRuntime:  
    skipAudioAndResultLogging: "true"  # If true, disables storage and logging of user data
  sttAMPatcher:  
    skipAudioAndResultLogging: "true"  # If true, disables storage and logging of user data
  ttsRuntime:  
    skipAudioAndResultLogging: "true"  # If true, disables storage and logging of user data

Specifying node affinity

In Kubernetes, node affinity defines constraints for pods so that they land on the nodes that satisfy the affinity constraints. Affinity and anti-affinity settings greatly expand the types of constraints that you can express for your pods. For more information about node affinity, see Affinity and anti-affinity in the Kubernetes documentation.

By default, the Speech services use the following node affinity specifications. The default affinity allows any pod to run on any amd64 node.

################
# Node affinity
################
  affinity:
    nodeAffinity:  
      requiredDuringSchedulingIgnoredDuringExecution:  
        nodeSelectorTerms:  
        - matchExpressions:  
          - key: beta.kubernetes.io/arch  
            operator: In  
            values:  
            - amd64

You can update the values for the affinity property if you want the Speech services deployment pods to land on specific nodes. For example, the following specification replaces the default values to enable affinity for the designated nodes:

################
# Node affinity
################
  affinity:
    nodeAffinity: 
      requiredDuringSchedulingIgnoredDuringExecution: 
        nodeSelectorTerms: 
        - matchExpressions: 
          - key: kubernetes.io/e2e-az-name 
            operator: In 
            values: 
            - e2e-az1 
            - e2e-az2
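
Note that the preceding specification replaces the default architecture constraint. If you want pods to land on the designated nodes and still require amd64 nodes, one option is to list both expressions under the same matchExpressions entry, because Kubernetes requires a node to satisfy all expressions within a single term. The following sketch reuses the keys and values from the two examples above and is an illustration rather than a recommended configuration:

##########################################
# Example: architecture and zone together
##########################################
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: beta.kubernetes.io/arch     # Default constraint, retained
            operator: In
            values:
            - amd64
          - key: kubernetes.io/e2e-az-name   # Additional constraint for the designated nodes
            operator: In
            values:
            - e2e-az1
            - e2e-az2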