Scaling up your Watson Speech services installation

You can scale up your installation by changing the size of each of the Speech services in the custom resource. You can use the same procedures to scale down an installation.

Permissions you need for these tasks: You must be an administrator of the Red Hat® OpenShift® project to scale up your installation.

Before you begin

Keep the following in mind when scaling up your installation:

All deployments and Kubernetes resources are managed by the Speech operator.
Do not directly edit any Kubernetes resources (for example, deployments) except for the WatsonSpeech instance.
Make all changes to your deployments only by editing the Speech custom resource.
Your configuration is physically limited by the amount of hardware resources available in your Kubernetes cluster and namespace.

Installation scaling topics

Use the following information to scale up the size of your installation. The steps involve modifying the custom resource.

Sizing your Speech services installation
- Sizing for Speech to Text
- Sizing for Text to Speech
Sizing configuration
Custom sizing

Updating the t-shirt size for your installation has no effect on the number of replicas that your datastores use. Scaling up the Speech services is completely separate from increasing the number of replicas for your datastores, which gives you maximum flexibility. For more information about increasing the number of replicas for your datastores, see Scaling up your datastores.

Sizing your Speech services installation

The Speech services support four standard t-shirt sizes (xsmall, small, medium, and large) as well as custom sizing. You configure the size at installation time by using the custom resource. Both Speech services use this sizing, though the two services can have different sizes. You can also use the custom resource to update or reconfigure the sizes of existing services.

Speech sizing depends on the number of concurrent requests that an instance has to process. The higher the number of concurrent requests, the higher the resource requirements.

You can create any number of instances of either the Speech to Text or the Text to Speech service. However, adding instances does not add resources: all instances of a service share the capacity of the installation. Size your installation to support the sum of the concurrent connections that you expect to handle across all of your service instances.

Sizing for Speech to Text

The following table describes the t-shirt sizing for the Speech to Text service. The first column lists the available sizes. For each size, the remaining columns list the number of runtime pods, the maximum number of concurrent sessions depending on the type of Speech to Text models that you install, and the number of CPUs allocated per pod. The final column provides additional information about the sizes. For more information about using previous- or next-generation models, see Installing Speech to Text models.

Size	Number of runtime pods	Maximum number of concurrent sessions with previous-generation models	Maximum number of concurrent sessions with next-generation models	Number of CPUs allocated per pod	Notes
xsmall	1	6	20	4	The default configuration, which is meant for development purposes.
small	2	12	40	4	High availability enabled. Every microservice has at least 2 instances running.
medium	2	26	80	8	High availability enabled. Every microservice has at least 2 instances running.
large	4	52	160	8	High availability enabled. Every microservice has at least 2 instances running.
custom	2+	12+	40+	4 or 8 depending on configuration	By default, custom is a small size that has high availability enabled. The minimum value for the maximum number of sessions is 12, but the actual value depends on configuration parameters. The IBM Sales team has a sizing calculator that you can use to arrive at an accurate value for your installation. Note: If you install a mix of previous- and next-generation models, the limits for previous-generation models apply.

Sizing for Text to Speech

The following table describes the t-shirt sizing for the Text to Speech service. The first column lists the available sizes. For each size, the remaining columns list the number of runtime pods and the maximum number of concurrent sessions. For more information about the available voices, see Installing Text to Speech voices.

Size	Number of runtime pods	Maximum number of concurrent sessions
xsmall	1	6
small	2	12
medium	4	26
large	8	52
custom	2+	12+
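
The two sizing tables can be turned into a quick lookup. The following is a minimal sketch, not part of the product: pick_size and its profile names (stt-previous, stt-next, tts) are hypothetical, and the limits are copied from the tables above. It sums the expected concurrent sessions across all service instances and prints the smallest size that covers the total, or custom when the total exceeds the large size.

```shell
# Hypothetical helper: choose the smallest t-shirt size from the sizing tables.
# Usage: pick_size <stt-previous|stt-next|tts> <sessions> [<sessions> ...]
# Pass one session count per service instance; the counts are summed.
pick_size() {
  local profile limits total entry n
  profile=$1
  shift
  case "$profile" in
    # Text to Speech and previous-generation Speech to Text share the same limits.
    stt-previous|tts) limits="xsmall:6 small:12 medium:26 large:52" ;;
    stt-next)         limits="xsmall:20 small:40 medium:80 large:160" ;;
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac
  total=0
  for n in "$@"; do
    total=$(( total + n ))
  done
  for entry in $limits; do
    # Each entry is <size>:<maximum concurrent sessions>.
    if [ "$total" -le "${entry#*:}" ]; then
      echo "${entry%%:*}"
      return 0
    fi
  done
  # The total exceeds the large size, so custom sizing is needed.
  echo "custom"
}
```

For example, `pick_size stt-next 30 25` sums to 55 sessions and prints medium. If you install a mix of previous- and next-generation models, use the stt-previous profile, because the previous-generation limits apply.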

Sizing configuration

When you install the Speech services, you choose one of the t-shirt size configurations. The resources required for the installation, in terms of CPUs and memory, depend on the configuration you select.

The development configuration (the xsmall size) is the default configuration. This configuration has a minimal footprint and is meant for development purposes and as a proof of concept. It can handle only a few concurrent recognition sessions, and it is not highly available because some of the core microservices run as a single replica with no redundancy.
The production configuration (any of the other sizes) is a highly available solution that is intended to run production workloads. This configuration requires a minimum of three worker nodes.

The scaleConfig property of the custom resource provides the following size specifications by default:

  scaleConfig:
    stt:
      size: xsmall
    tts:
      size: xsmall
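
Before you change a size, it can help to check what is currently set. The following is a minimal sketch, assuming that you are logged in with oc and that the CUSTOM_RESOURCE_SPEECH and PROJECT_CPD_INSTANCE environment variables are set as for the patch commands in this topic. show_scale_config is a hypothetical helper name.

```shell
# Hypothetical helper: print the scaleConfig that is currently set on the
# Speech custom resource. Requires an active oc login.
show_scale_config() {
  oc get watsonspeech "${CUSTOM_RESOURCE_SPEECH}" -n "${PROJECT_CPD_INSTANCE}" \
    -o jsonpath='{.spec.scaleConfig}{"\n"}'
}
```

Call `show_scale_config` with no arguments. The exact output format depends on your oc version.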

You can start with a configuration of xsmall or small. If that proves inadequate as your usage grows, you can update the custom resource with a larger size that is more appropriate.

To scale up your installation, you edit your custom resource. You can modify your custom resource in either of the following ways:

By using the procedure described in Editing the custom resource.
By entering the following oc patch command:
```
oc patch watsonspeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INSTANCE} \
--type=merge --patch='{"spec": {"scaleConfig": {"<speech-service>": {"size": "<size>"} } } }'
```
where

${PROJECT_CPD_INSTANCE}

The name of the namespace (or project) where you installed the Speech services.

<speech-service>

stt or tts, depending on whether you are scaling up Speech to Text or Text to Speech.

<size>

The size specification you want to use, for instance, small, medium, or one of the other supported sizes.

For example, to scale up your Speech to Text installation from xsmall to small, enter the following command:
```
oc patch watsonspeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INSTANCE} \
--type=merge --patch='{"spec": {"scaleConfig": {"stt": {"size": "small"} } } }'
```
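
If you scale often, a small wrapper can catch typos in the service name or size before anything is sent to the cluster. The following is a sketch only: build_scale_patch and scale_speech are hypothetical helper names, and the wrapper assumes the same environment variables as the command above.

```shell
# Hypothetical helper: build the merge patch for a t-shirt size.
# Usage: build_scale_patch <stt|tts> <xsmall|small|medium|large>
build_scale_patch() {
  case "$1" in
    stt|tts) ;;
    *) echo "service must be stt or tts" >&2; return 1 ;;
  esac
  case "$2" in
    xsmall|small|medium|large) ;;
    *) echo "unsupported size: $2" >&2; return 1 ;;
  esac
  printf '{"spec": {"scaleConfig": {"%s": {"size": "%s"}}}}\n' "$1" "$2"
}

# Hypothetical helper: validate the arguments, then apply the patch with oc.
scale_speech() {
  local patch
  patch=$(build_scale_patch "$1" "$2") || return 1
  oc patch watsonspeech "${CUSTOM_RESOURCE_SPEECH}" -n "${PROJECT_CPD_INSTANCE}" \
    --type=merge --patch="$patch"
}
```

For example, `scale_speech stt small` applies the same patch as the command above, while `scale_speech stt huge` fails without contacting the cluster.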

Custom sizing

If the fixed t-shirt sizing is not appropriate or does not provide adequate capacity (for example, you need something bigger than the large configuration), you can use custom sizing. The custom size must be at least as large as the small size, with a value of no less than 12 for the maximum number of sessions.

To scale up your installation to a custom sizing, follow the procedure described in Editing the custom resource.

Speech to Text custom sizing

Custom sizing for Speech to Text provides two additional configuration fields:

maxConcurrentSessions: The maximum number of concurrent sessions the installation supports.
maxMinutesOfAudioPerDay: The maximum minutes of audio the installation can transcribe in a 24-hour period.

Internally, the configuration converts maxMinutesOfAudioPerDay into a maxConcurrentSessions value. Therefore, it is more accurate to assess how many concurrent sessions your Speech to Text installation needs to support. You can specify one or both values. If you specify both values, the configuration converts maxMinutesOfAudioPerDay into a number of sessions and uses the maximum of that computed value and maxConcurrentSessions for the actual sizing.
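
The "larger value wins" rule can be illustrated with a short sketch. The conversion from maxMinutesOfAudioPerDay to sessions is internal to the configuration and is not described here, so the first argument below is a made-up, already-converted value, and effective_sessions is a hypothetical name.

```shell
# Hypothetical illustration of the rule described above.
# $1: number of sessions derived from maxMinutesOfAudioPerDay (made-up input;
#     the real conversion is internal to the configuration)
# $2: the explicit maxConcurrentSessions value
effective_sessions() {
  if [ "$1" -gt "$2" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}
```

With a derived value of 80 and an explicit maxConcurrentSessions of 100, `effective_sessions 80 100` prints 100, so the explicit value determines the sizing.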

The following example shows the properties from a custom resource that specifies a custom sizing and maxConcurrentSessions:

  scaleConfig:
    stt:
      size: custom
      maxConcurrentSessions: 100

Text to Speech custom sizing

Custom sizing for Text to Speech provides two additional configuration fields:

maxConcurrentSessions: The maximum number of concurrent sessions the installation supports.
maxCharactersPerDay: The maximum number of characters the installation converts from text to speech in a 24-hour period.

Internally, the configuration converts maxCharactersPerDay into a maxConcurrentSessions value. Therefore, it is more accurate to assess how many concurrent sessions your Text to Speech installation needs to support. You can specify one or both values. If you specify both values, the configuration converts maxCharactersPerDay into a number of sessions and uses the maximum of that computed value and maxConcurrentSessions for the actual sizing.

The following example shows the properties from a custom resource that specifies a custom sizing and maxConcurrentSessions:

  scaleConfig:
    tts:
      size: custom
      maxConcurrentSessions: 65
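
The custom sizing properties can also be expressed as a merge patch. This is a sketch under an assumption: the documented path for custom sizing is the procedure in Editing the custom resource, and the sketch assumes that oc patch with --type=merge accepts the same fields. build_custom_patch is a hypothetical helper name. It rejects values below 12, the minimum for custom sizing.

```shell
# Hypothetical helper: build a merge patch for custom sizing.
# Usage: build_custom_patch <stt|tts> <maxConcurrentSessions>
build_custom_patch() {
  case "$1" in
    stt|tts) ;;
    *) echo "service must be stt or tts" >&2; return 1 ;;
  esac
  # The custom size must be at least as large as the small size (12 sessions).
  if ! [ "$2" -ge 12 ] 2>/dev/null; then
    echo "maxConcurrentSessions must be an integer of at least 12" >&2
    return 1
  fi
  printf '{"spec": {"scaleConfig": {"%s": {"size": "custom", "maxConcurrentSessions": %d}}}}\n' "$1" "$2"
}
```

For example, `build_custom_patch tts 65` prints a patch that matches the Text to Speech properties above. You could pass it to the oc patch command through `--patch="$(build_custom_patch tts 65)"`.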