Scaling up your Watson Speech services installation
You can scale up your installation by changing the size of each of the Speech services in the custom resource. You can use the same procedures to scale down an installation.
- Permissions you need for these tasks:
- You must be an administrator of the Red Hat® OpenShift® project to scale up your installation.
Before you begin
Keep the following in mind when scaling up your installation:
- All deployments and Kubernetes resources are managed by the Speech operator.
- Do not directly edit any Kubernetes resources (for example, deployments) except for the WatsonSpeech instance.
- Make all changes to your deployments only by editing the Speech custom resource.
- Your configuration is physically limited by the amount of hardware resources available in your Kubernetes cluster and namespace.
Installation scaling topics
Use the following information to scale up the size of your installation. The steps involve modifying the custom resource.
Updating the t-shirt size for your installation has no effect on the number of replicas that are used by your datastores. For maximum flexibility, scaling up the Speech services is completely separate from increasing the number of replicas for your datastores. For more information about increasing the number of replicas for your datastores, see Scaling up your datastores.
Sizing your Speech services installation
The Speech services support four standard installation sizes, which you configure at the time of installation by using the custom resource. The Speech services support t-shirt sizing as well as custom sizing. Both Speech services use this sizing, though the size of the two services can be different. You can also use the custom resource to update or reconfigure the sizes of the existing services.
Speech sizing depends on the number of concurrent requests an instance has to process. The higher the number of concurrent requests, the higher the resource requirements.
You can create any number of instances of either the Speech to Text or the Text to Speech service. But adding new instances does not scale the resources available for the instances. You need to size your installation to support the sum of the number of concurrent connections you expect to handle across all of your service instances.
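As a rough sketch of that rule, the following shell function adds up the peak concurrent connections you expect for each service instance. The function name and the example numbers are hypothetical; only the "sum across all instances" rule comes from the text above.

```shell
# Hypothetical helper: sum the expected peak concurrent connections
# across all service instances (one argument per instance).
total_concurrent_sessions() {
  total=0
  for n in "$@"; do
    total=$(( total + n ))
  done
  echo "$total"
}

# Example: three instances that expect 5, 10, and 8 concurrent connections.
total_concurrent_sessions 5 10 8   # prints 23
```

The installation would then need a size whose maximum number of concurrent sessions is at least this total.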
Sizing for Speech to Text
The following table describes the t-shirt sizing for the Speech to Text service. The first column lists the available sizes. For each size, the remaining columns list the number of runtime pods, the maximum number of concurrent sessions depending on the type of Speech to Text models that you install, and the number of CPUs allocated per pod. The final column provides additional information about the sizes. For more information about using previous- or next-generation models, see Installing Speech to Text models.
Size | Number of runtime pods | Maximum number of concurrent sessions with previous-generation models | Maximum number of concurrent sessions with next-generation models | Number of CPUs allocated per pod | Notes |
---|---|---|---|---|---|
xsmall | 1 | 6 | 20 | 4 | The default configuration, which is meant for development purposes. |
small | 2 | 12 | 40 | 4 | High availability enabled. Every microservice has at least 2 instances running. |
medium | 2 | 26 | 80 | 8 | High availability enabled. Every microservice has at least 2 instances running. |
large | 4 | 52 | 160 | 8 | High availability enabled. Every microservice has at least 2 instances running. |
custom | 2+ | 12+ | 40+ | 4 or 8 depending on configuration | By default, custom is a small size that has high availability enabled. The minimum value for the maximum number of sessions is 12, but the actual value depends on configuration parameters. The IBM Sales team has a sizing calculator that you can use to arrive at an accurate value for your installation. Note: If you install a mix of previous- and next-generation models, the limits for previous-generation models apply. |
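To illustrate how the table can be read, here is a minimal sketch that picks the smallest t-shirt size whose session limit covers a given load. The function name `stt_size_for` is hypothetical, the limits are copied from the table above, and the sketch ignores high availability (it can return xsmall, which is meant for development only).

```shell
# Hypothetical helper: choose the smallest Speech to Text t-shirt size
# for a number of concurrent sessions.
#   $1 = required concurrent sessions
#   $2 = model generation: "previous" (default) or "next"
# For a mix of model generations, pass "previous", as the table note says.
stt_size_for() {
  sessions=$1
  generation=${2:-previous}
  if [ "$generation" = "next" ]; then
    set -- 20 40 80 160     # limits for xsmall, small, medium, large
  else
    set -- 6 12 26 52
  fi
  for size in xsmall small medium large; do
    if [ "$sessions" -le "$1" ]; then
      echo "$size"
      return 0
    fi
    shift
  done
  # Nothing fits: the load exceeds the large configuration.
  echo "custom"
}

stt_size_for 23 previous   # prints medium
stt_size_for 23 next       # prints small
```

A result of custom only indicates that the fixed sizes are too small; it does not compute the custom configuration itself.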
Sizing for Text to Speech
The following table describes the t-shirt sizing for the Text to Speech service. The first column lists the available sizes. For each size, the remaining columns list the number of runtime pods and the maximum number of concurrent sessions. For more information about the available voices, see Installing Text to Speech voices.
Size | Number of runtime pods | Maximum number of concurrent sessions |
---|---|---|
xsmall | 1 | 6 |
small | 2 | 12 |
medium | 4 | 26 |
large | 8 | 52 |
custom | 2+ | 12+ |
Sizing configuration
When you install the Speech services, you choose one of the t-shirt size configurations. The resources required for the installation, in terms of CPUs and memory, depend on the configuration you select.
- The development configuration (the xsmall size) is the default configuration. This configuration has a minimal footprint and is meant for development purposes and as a proof of concept. It can handle only a few concurrent recognition sessions, and it is not highly available because some of the core microservices have no redundancy (they are single-replica).
- The production configuration (any of the other sizes) is a highly available solution that is intended to run production workloads. This configuration requires a minimum of three worker nodes.
The scaleConfig property of the custom resource provides the following size specifications by default:
scaleConfig:
  stt:
    size: xsmall
  tts:
    size: xsmall
You can start with a configuration of xsmall or small. If that proves inadequate as your usage grows, you can update the custom resource with a larger size that is more appropriate.
To scale up your installation, you edit your custom resource. You can modify your custom resource in either of the following ways:
- By using the procedure described in Editing the custom resource.
- By entering the following oc patch command:
oc patch watsonspeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INSTANCE} \
  --type=merge --patch='{"spec": {"scaleConfig": {"<speech-service>": {"size": "<size>"} } } }'
where
${PROJECT_CPD_INSTANCE}
- The name of the namespace (or project) where you installed the Speech services.
<speech-service>
- stt or tts, depending on whether you are scaling up Speech to Text or Text to Speech.
<size>
- The size specification you want to use, for instance, small, medium, or one of the other supported sizes.
For example, to scale up your Speech to Text installation from xsmall to small, specify the following command:
oc patch watsonspeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INSTANCE} \
  --type=merge --patch='{"spec": {"scaleConfig": {"stt": {"size": "small"} } } }'
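If you script this step, it can help to build the patch from validated inputs instead of typing the JSON by hand. The following is a minimal sketch; the function name `build_scale_patch` is hypothetical, and it accepts only the fixed t-shirt sizes because custom sizing needs additional fields.

```shell
# Hypothetical helper: print the merge patch for a fixed t-shirt size.
#   $1 = speech service: stt or tts
#   $2 = size: xsmall, small, medium, or large
build_scale_patch() {
  service=$1
  size=$2
  case "$service" in
    stt|tts) ;;
    *) echo "unknown speech service: $service" >&2; return 1 ;;
  esac
  case "$size" in
    xsmall|small|medium|large) ;;
    *) echo "unsupported t-shirt size: $size" >&2; return 1 ;;
  esac
  printf '{"spec": {"scaleConfig": {"%s": {"size": "%s"}}}}\n' "$service" "$size"
}

# Possible usage (requires a logged-in oc session, so it is left commented out):
# oc patch watsonspeech "${CUSTOM_RESOURCE_SPEECH}" -n "${PROJECT_CPD_INSTANCE}" \
#   --type=merge --patch="$(build_scale_patch stt small)"
#
# Read the configured size back afterward:
# oc get watsonspeech "${CUSTOM_RESOURCE_SPEECH}" -n "${PROJECT_CPD_INSTANCE}" \
#   -o jsonpath='{.spec.scaleConfig.stt.size}'
```

The `oc get` check reads only the value stored in the custom resource. It does not confirm that the Speech operator has finished rolling out the new pods.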
Custom sizing
If the fixed t-shirt sizing is not appropriate or does not provide adequate capacity (for example, you need something bigger than the large configuration), you can use custom sizing. The custom size must be at least as large as the small size, with a value of no less than 12 for the maximum number of sessions.
To scale up your installation to a custom sizing, follow the procedure described in Editing the custom resource.
Speech to Text custom sizing
Custom sizing for Speech to Text provides two additional configuration fields:
maxConcurrentSessions
- The maximum number of concurrent sessions the installation supports.
maxMinutesOfAudioPerDay
- The maximum minutes of audio the installation can transcribe in a 24-hour period.
Internally, the configuration converts maxMinutesOfAudioPerDay into a maxConcurrentSessions value. Therefore, it is more accurate to assess how many concurrent sessions your Speech to Text installation needs to support. You can specify one or both values. If you specify both values, the configuration converts maxMinutesOfAudioPerDay into a number of sessions and uses the maximum of that computed value and maxConcurrentSessions for the actual sizing.
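The "maximum of the two values" rule can be sketched as follows. The conversion from minutes to sessions here is an assumption made only for illustration: it treats one session as transcribing at most 1,440 minutes of audio per day (real-time processing). The conversion that the product actually applies is not described here and may differ. The function name is hypothetical.

```shell
# Hypothetical sketch of the sizing rule when both fields are set.
#   $1 = maxMinutesOfAudioPerDay
#   $2 = maxConcurrentSessions
effective_stt_sessions() {
  minutes_per_day=$1
  explicit_sessions=$2
  # ASSUMPTION: one session handles at most 1440 audio minutes per day.
  # Round up so that partial sessions count as a whole session.
  computed=$(( (minutes_per_day + 1439) / 1440 ))
  # The larger of the two values drives the sizing.
  if [ "$computed" -gt "$explicit_sessions" ]; then
    echo "$computed"
  else
    echo "$explicit_sessions"
  fi
}

effective_stt_sessions 144000 65   # prints 100 under the assumption above
effective_stt_sessions 1440 65     # prints 65
```

Because the real conversion is internal, estimating concurrent sessions directly remains the more reliable input.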
The following example shows the properties from a custom resource that specifies a custom sizing and maxConcurrentSessions:
scaleConfig:
  stt:
    size: custom
    maxConcurrentSessions: 100
Text to Speech custom sizing
Custom sizing for Text to Speech provides two additional configuration fields:
maxConcurrentSessions
- The maximum number of concurrent sessions the installation supports.
maxCharactersPerDay
- The maximum number of characters the installation converts from text to speech in a 24-hour period.
Internally, the configuration converts maxCharactersPerDay into a maxConcurrentSessions value. Therefore, it is more accurate to assess how many concurrent sessions your Text to Speech installation needs to support. You can specify one or both values. If you specify both values, the configuration converts maxCharactersPerDay into a number of sessions and uses the maximum of that computed value and maxConcurrentSessions for the actual sizing.
The following example shows the properties from a custom resource that specifies a custom sizing and maxConcurrentSessions:
scaleConfig:
  tts:
    size: custom
    maxConcurrentSessions: 65