Scaling up your Watson Speech services installation

You can scale up your installation by changing the size of each of the Speech services in the custom resource. You can use the same procedures to scale down an installation.

Permissions you need for these tasks:
Be an administrator of the Red Hat® OpenShift® project to scale up your installation.

Before you begin

Keep the following in mind when scaling up your installation:

  • All deployments and Kubernetes resources are managed by the Speech operator.
  • Do not directly edit any Kubernetes resources (for example, deployments) except for the WatsonSpeech instance.
  • Make all changes to your deployment only by editing the Speech custom resource.
  • Your configuration is physically limited by the amount of hardware resources available in your Kubernetes cluster and namespace.

Installation scaling topics

Use the following information to scale up the size of your installation. The steps involve modifying the custom resource.

Updating the t-shirt size for your installation has no effect on the number of replicas that are used by your data stores. For maximum flexibility, scaling up the Speech services is separate from increasing the number of replicas for your data stores. For more information, see Scaling up your datastores.

Sizing your Speech services installation

The Speech services support standard t-shirt installation sizes as well as custom sizing, which you configure at installation time by using the custom resource. Both Speech services use this sizing, though the two services can be set to different sizes. You can also use the custom resource to update or reconfigure the sizes of existing services.

Speech sizing depends on the number of concurrent requests that an instance must process. The higher the number of concurrent requests, the higher the resource requirements.

You can create any number of instances of either the Speech to Text or the Text to Speech service. However, adding instances does not increase the resources that are available to them. Size your installation to support the total number of concurrent connections that you expect to handle across all of your service instances.
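For example, if you plan to run two Speech to Text instances that each need to handle up to 20 concurrent sessions with next-generation models, size the installation for 40 concurrent sessions (the small size in the table that follows).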

Note: Incoming requests are randomly distributed to runtime pods. If a pod does not have enough resources available to service the request, a 503 (Service Unavailable) error is returned to the client application. To achieve the concurrency advertised by the tables below, you must implement client-side retry logic so that requests can land on a runtime pod with sufficient resources to process them. Requests returned with 502 or 503 error codes should be retried. Setting the max retry count to 10 works well for most production applications.
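
The following is a minimal sketch of such client-side retry logic as a shell script that uses curl. It assumes that your Speech to Text endpoint URL and an access token are available in the STT_URL and STT_TOKEN environment variables, and that audio.flac is the file to transcribe; these names, the retry count, and the backoff interval are placeholders that you adjust for your environment:

  # Retry on 502/503 up to 10 times so that the request can land on a runtime
  # pod that has enough free resources to process it.
  for attempt in $(seq 1 10); do
    status=$(curl -s -o response.json -w "%{http_code}" -X POST \
      -H "Authorization: Bearer ${STT_TOKEN}" \
      -H "Content-Type: audio/flac" \
      --data-binary @audio.flac \
      "${STT_URL}/v1/recognize")
    if [ "${status}" != "502" ] && [ "${status}" != "503" ]; then
      break   # Success, or an error that retrying will not fix.
    fi
    sleep 2   # Brief pause before the next attempt.
  done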

Sizing for Speech to Text

The following table describes the t-shirt sizing for the Speech to Text service. The first column lists the available sizes. For each size, the remaining columns list the number of runtime pods, the maximum number of concurrent sessions depending on the type of Speech to Text models that you install, and the number of CPUs allocated per pod. The final column provides additional information about the sizes. For more information about using previous- or next-generation models, see Installing Speech to Text models.

| Size | Number of runtime pods | Maximum concurrent sessions (previous-generation models) | Maximum concurrent sessions (next-generation models) | CPUs allocated per pod | Notes |
| --- | --- | --- | --- | --- | --- |
| xsmall | 1 | 6 | 20 | 4 | The default configuration, which is meant for development purposes. |
| Minimum reserved CPU (small_mincpureq) | 2 | 12 | 40 | 4 | High availability is enabled. Every microservice has at least 2 instances running. |
| small | 2 | 12 | 40 | 4 | High availability is enabled. Every microservice has at least 2 instances running. |
| medium | 2 | 26 | 80 | 8 | High availability is enabled. Every microservice has at least 2 instances running. |
| large | 4 | 52 | 160 | 8 | High availability is enabled. Every microservice has at least 2 instances running. |
| custom | 2+ | 12+ | 40+ | 4 or 8, depending on configuration | |

By default, the custom size is equivalent to the small size, with high availability enabled.

The minimum value for the maximum number of sessions is 12, but the actual value depends on configuration parameters. The IBM Sales team has a sizing calculator that you can use to arrive at an accurate value for your installation.

Note: If you install a mix of previous- and next-generation models, the limits for previous-generation models apply.

Sizing for Text to Speech

The following table describes the t-shirt sizing for the Text to Speech service. The first column lists the available sizes. For each size, the remaining columns list the number of runtime pods and the maximum number of concurrent sessions. For more information about the available voices, see Installing Text to Speech voices.

| Size | Number of runtime pods | Maximum number of concurrent sessions |
| --- | --- | --- |
| xsmall | 1 | 6 |
| Minimum reserved CPU (small_mincpureq) | 2 | 12 |
| small | 2 | 12 |
| medium | 4 | 26 |
| large | 8 | 52 |
| custom | 2+ | 12+ |

Sizing configuration

When you install the Speech services, you choose one of the t-shirt size configurations. The resources required for the installation, in terms of CPUs and memory, depend on the configuration you select.

  • The development configuration (the xsmall size) is the default configuration. This configuration has a minimal footprint and is meant for development purposes and proofs of concept. It can handle only a few concurrent recognition sessions, and it is not highly available because some of the core microservices run as a single replica with no redundancy.
  • The production configuration (any of the other sizes) is a highly available solution that is intended to run production workloads. This configuration requires a minimum of three worker nodes.

The scaleConfig property of the custom resource provides the following size specifications by default:

  scaleConfig:
    stt:
      size: xsmall
    tts:
      size: xsmall

You can start with a configuration of xsmall or small. If that proves inadequate as your usage grows, you can update the custom resource with a larger size that is more appropriate.

To scale up your installation, you edit your custom resource. You can modify your custom resource in either of the following ways:

  • By using the procedure described in Editing the custom resource.

  • By entering the following oc patch command:

    oc patch watsonspeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INST_OPERANDS} \
    --type=merge --patch='{"spec": {"scaleConfig": {"<speech-service>": {"size": "<size>"} } } }'

    Where

    ${CUSTOM_RESOURCE_SPEECH}
    The name of the WatsonSpeech custom resource for your installation.
    ${PROJECT_CPD_INST_OPERANDS}
    The name of the namespace (or project) where you installed the Speech services.
    <speech-service>
    stt or tts, depending on whether you are scaling up Speech to Text or Text to Speech.
    <size>
    The size specification that you want to use, for example, small, medium, or one of the other supported sizes.

    For example, to scale up your Speech to Text installation from xsmall to small, specify the following command:

    oc patch watsonspeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INST_OPERANDS} \
    --type=merge --patch='{"spec": {"scaleConfig": {"stt": {"size": "small"} } } }'
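
After the patch is applied, you can confirm the new size by inspecting the scaleConfig section of the custom resource. The following command is one way to do this; it is a sketch that uses the same environment variables as the patch command:

  oc get watsonspeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INST_OPERANDS} \
  -o jsonpath='{.spec.scaleConfig}'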

Sizing for acoustic model training

The acoustic model training (AM-patcher) microservice supports three sizes: small, medium, and large. These sizes are not applied automatically; you must specify them in the custom resource (CR) based on the amount of audio that is used for acoustic model training. By default, the AM-patcher microservice remains scaled down when no training is in progress, which optimizes cluster resource usage, and it automatically scales up when a training request is received. The default size of the AM-patcher microservice is small, but you can change the size by editing the custom resource (CR).

  1. Use the following command to scale up the AM-patcher microservice:
    oc patch watsonspeech speech-cr -n <cpd-instance> --type=merge --patch='{"spec": {"sttAMPatcher": {"keepAlive": true } } }'

    spec.sttAMPatcher.keepAlive: By default, the parameter is set to false; that is, the AM-patcher pod is scaled down when no training is in progress. When you set it to true, the AM-patcher pod remains scaled up even when no training is in progress.

  2. Use the following command to size the AM-patcher pod:
    oc patch watsonspeech speech-cr -n <cpd-instance> --type=merge --patch='{"spec": {"sttAMPatcher": {"size": "<size>" } } }'

    spec.sttAMPatcher.size: By default, the size is set to small. You can choose small, medium, or large. These sizes define the thread count, CPU, memory, and ephemeral storage of the chunk container.

    spec.scaleConfig.stt.size: The number of replicas remains defined by the spec.scaleConfig.stt.size parameter.

Note: You can change the sizes without scaling up the AM-patcher microservice. When a training request is received, the AM-patcher automatically scales up.
Table 1. Sizes and resource allocation
| AM Patcher size | Thread count | CPU | Memory | Ephemeral storage (approximate) |
| --- | --- | --- | --- | --- |
| Small | 1 | ~1 | ~14 GB | ~20 GB |
| Medium | 2 | ~2 | ~17 GB | ~30 GB |
| Large | 4 | ~4 | ~20 GB | ~50 GB |

Use the following table to decide which size to use based on the length of the acoustic training audio:

Table 2. Recommended sizes
| Training audio length | AM Patcher size |
| --- | --- |
| < 10 hrs | Small |
| 10 - 20 hrs | Medium |
| 20 - 50 hrs | Large |
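
For example, based on Table 2, if your training audio totals about 15 hours, you might set the AM-patcher size to medium by using the same patch command that is shown in step 2 (speech-cr and <cpd-instance> are the same placeholders that are used in the earlier commands):

  oc patch watsonspeech speech-cr -n <cpd-instance> --type=merge \
  --patch='{"spec": {"sttAMPatcher": {"size": "medium" } } }'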

Custom sizing

If the fixed t-shirt sizing is not appropriate or does not provide adequate capacity (for example, you need something bigger than the large configuration), you can use custom sizing. The custom size must be at least as large as the small size, with a value of no less than 12 for the maximum number of sessions.

To scale up your installation to a custom sizing, follow the procedure that is described in Editing the custom resource.

Speech to Text custom sizing

Custom sizing for Speech to Text provides two more configuration fields:

maxConcurrentSessions
The maximum number of concurrent sessions the installation supports.
maxMinutesOfAudioPerDay
The maximum number of minutes of audio that the installation can transcribe in a 24-hour period.

Internally, the configuration converts maxMinutesOfAudioPerDay into a maxConcurrentSessions value. Therefore, it is more accurate to assess how many concurrent sessions your Speech to Text installation needs to support. You can specify one or both values. If you specify both values, the configuration converts maxMinutesOfAudioPerDay into a number of concurrent sessions and uses the larger of that computed value and maxConcurrentSessions for the actual sizing.

The following example shows the properties from a custom resource that specifies a custom sizing and maxConcurrentSessions:

  scaleConfig:
    stt:
      size: custom
      maxConcurrentSessions: 100
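
If you prefer to patch the custom resource rather than edit it directly, a command along the following lines applies the same custom sizing. This is a sketch that assumes the same environment variables that are used in the oc patch examples earlier in this topic:

  oc patch watsonspeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INST_OPERANDS} \
  --type=merge --patch='{"spec": {"scaleConfig": {"stt": {"size": "custom", "maxConcurrentSessions": 100} } } }'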

Text to Speech custom sizing

Custom sizing for Text to Speech provides two more configuration fields:

maxConcurrentSessions
The maximum number of concurrent sessions the installation supports.
maxCharactersPerDay
The maximum number of characters the installation converts from text to speech in a 24-hour period.

Internally, the configuration converts maxCharactersPerDay into a maxConcurrentSessions value. Therefore, it is more accurate to assess how many concurrent sessions your Text to Speech installation needs to support. You can specify one or both values. If you specify both values, the configuration converts maxCharactersPerDay into a number of concurrent sessions and uses the larger of that computed value and maxConcurrentSessions for the actual sizing.

The following example shows the properties from a custom resource that specifies a custom sizing and maxConcurrentSessions:

  scaleConfig:
    tts:
      size: custom
      maxConcurrentSessions: 65
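
As with Speech to Text, you can apply the same configuration with oc patch. The following sketch specifies both fields for Text to Speech; it assumes the same environment variables as the earlier examples, and the maxCharactersPerDay value is illustrative only:

  oc patch watsonspeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INST_OPERANDS} \
  --type=merge --patch='{"spec": {"scaleConfig": {"tts": {"size": "custom", "maxConcurrentSessions": 65, "maxCharactersPerDay": 1000000} } } }'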