Scaling up your Watson Speech services installation
You can scale up your installation by changing the size of each of the Speech services in the custom resource. You can use the same procedures to scale down an installation.
Permissions you need for these tasks:
- You must be an administrator of the Red Hat® OpenShift® project to scale up your installation.
Before you begin
Keep the following in mind when scaling up your installation:
- All deployments and Kubernetes resources are managed by the Speech operator.
- Do not directly edit any Kubernetes resources (for example, deployments) except for the WatsonSpeech instance.
- Make all changes to your deployment only by editing the Speech custom resource.
- Your configuration is physically limited by the amount of hardware resources available in your Kubernetes cluster and namespace.
Installation scaling topics
Use the following information to scale up the size of your installation. The steps involve modifying the custom resource.
- Sizing your Speech services installation
- Sizing configuration
- Sizing for acoustic model training
- Custom sizing
Updating the t-shirt size for your installation, as described in these topics, has no effect on the number of replicas that are used by your data stores. For maximum flexibility, scaling up the Speech services is separate from increasing the number of replicas for your data stores. For more information about increasing the number of replicas for your data stores, see Scaling up your datastores.
Sizing your Speech services installation
The Speech services support four standard installation sizes, which you configure at the time of installation by using the custom resource. The Speech services support t-shirt sizing as well as custom sizing. Both Speech services use this sizing, though the size of the two services can be different. You can also use the custom resource to update or reconfigure the sizes of the existing services.
Speech sizing depends on the number of concurrent requests that an instance must process. The higher the number of concurrent requests, the higher the resource requirements.
You can create any number of instances of either the Speech to Text or the Text to Speech service, but adding new instances does not scale the resources that are available to the instances. You need to size your installation to support the sum of the concurrent connections that you expect to handle across all of your service instances. For example, if you expect three Speech to Text instances to handle 15 concurrent sessions each, size the installation for 45 concurrent sessions in total.
If an instance receives more concurrent requests than its size can handle, a 503 (Service Unavailable) error is returned to the client application. To achieve the concurrency that is advertised in the following tables, you must implement client-side retry logic so that requests can land on a runtime pod with sufficient resources to process them. Retry any requests that are returned with 502 or 503 error codes. Setting the maximum retry count to 10 works well for most production applications.
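The following bash sketch illustrates one way to implement such client-side retry logic with curl. It is an example only: the recognition endpoint path, bearer token, audio file, and backoff strategy are assumptions, so adapt them to your service instance, credentials, and audio format.

  # Placeholders: replace the endpoint URL and $TOKEN with values for your deployment.
  STT_URL="https://<speech-to-text-url>/v1/recognize"
  MAX_RETRIES=10

  for attempt in $(seq 1 $MAX_RETRIES); do
    # Send one recognition request and capture only the HTTP status code.
    status=$(curl -s -o response.json -w "%{http_code}" \
      -X POST "$STT_URL" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: audio/flac" \
      --data-binary @audio.flac)
    if [ "$status" != "502" ] && [ "$status" != "503" ]; then
      break    # success, or an error that retrying cannot fix
    fi
    sleep "$attempt"    # simple linear backoff before the next attempt
  done
  echo "Final HTTP status: $status"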
Sizing for Speech to Text
The following table describes the t-shirt sizing for the Speech to Text service. The first column lists the available sizes. For each size, the remaining columns list the number of runtime pods, the maximum number of concurrent sessions depending on the type of Speech to Text models that you install, and the number of CPUs allocated per pod. The final column provides additional information about the sizes. For more information about using previous- or next-generation models, see Installing Speech to Text models.
| Size | Number of runtime pods | Maximum number of concurrent sessions with previous-generation models | Maximum number of concurrent sessions with next-generation models | Number of CPUs allocated per pod | Notes |
|---|---|---|---|---|---|
| xsmall | 1 | 6 | 20 | 4 | The default configuration, which is meant for development purposes. |
| small | 2 | 12 | 40 | 4 | High availability is enabled. Every microservice has at least 2 instances running. |
| medium | 2 | 26 | 80 | 8 | High availability is enabled. Every microservice has at least 2 instances running. |
| large | 4 | 52 | 160 | 8 | High availability is enabled. Every microservice has at least 2 instances running. |
| custom | 2+ | 12+ | 40+ | 4 or 8 depending on configuration | By default, the custom size is a small size with high availability enabled. The minimum value for the maximum number of sessions is 12, but the actual value depends on configuration parameters. The IBM Sales team has a sizing calculator that you can use to arrive at an accurate value for your installation. Note: If you install a mix of previous- and next-generation models, the limits for previous-generation models apply. |
Sizing for Text to Speech
The following table describes the t-shirt sizing for the Text to Speech service. The first column lists the available sizes. For each size, the remaining columns list the number of runtime pods and the maximum number of concurrent sessions. For more information about the available voices, see Installing Text to Speech voices.
| Size | Number of runtime pods | Maximum number of concurrent sessions |
|---|---|---|
| xsmall | 1 | 6 |
| small | 2 | 12 |
| medium | 4 | 26 |
| large | 8 | 52 |
| custom | 2+ | 12+ |
Sizing configuration
When you install the Speech services, you choose one of the t-shirt size configurations. The resources required for the installation, in terms of CPUs and memory, depend on the configuration you select.
- The development configuration (the xsmall size) is the default configuration. This configuration has a minimal footprint and is meant for development purposes and as a proof of concept. It can handle only a few concurrent recognition sessions, and it is not highly available because some of the core microservices have no redundancy (they are single-replica).
- The production configuration (any of the other sizes) is a highly available solution that is intended to run production workloads. This configuration requires a minimum of three worker nodes.
The scaleConfig property of the custom resource provides the following size
specifications by default:
scaleConfig:
  stt:
    size: xsmall
  tts:
    size: xsmall
You can start with a configuration of xsmall or small. If that
proves inadequate as your usage grows, you can update the custom resource with a larger size that is
more appropriate.
To scale up your installation, you edit your custom resource. You can modify your custom resource in either of the following ways:
- By using the procedure described in Editing the custom resource.
- By entering the following oc patch command:

  oc patch watsonspeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INST_OPERANDS} \
    --type=merge --patch='{"spec": {"scaleConfig": {"<speech-service>": {"size": "<size>"} } } }'

  Where:
  - ${PROJECT_CPD_INST_OPERANDS}: The name of the namespace (or project) where you installed the Speech services.
  - <speech-service>: stt or tts, depending on whether you are scaling up Speech to Text or Text to Speech.
  - <size>: The size specification that you want to use, for instance, small, medium, or one of the other supported sizes.

For example, to scale up your Speech to Text installation from xsmall to small, specify the following command:

  oc patch watsonspeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INST_OPERANDS} \
    --type=merge --patch='{"spec": {"scaleConfig": {"stt": {"size": "small"} } } }'
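The same pattern applies to Text to Speech. For example, assuming that you want to move the Text to Speech service from xsmall to medium, the command might look like the following:

  oc patch watsonspeech ${CUSTOM_RESOURCE_SPEECH} -n ${PROJECT_CPD_INST_OPERANDS} \
    --type=merge --patch='{"spec": {"scaleConfig": {"tts": {"size": "medium"} } } }'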
Sizing for acoustic model training
The acoustic model training (AM-patcher) microservice supports different sizes: small, medium, and large. These sizes are not applied automatically; you must specify them in the custom resource (CR) based on the amount of audio that is used for acoustic model training. By default, the AM-patcher microservice remains scaled down when no training is in progress, which optimizes cluster resource usage and allocation, and it automatically scales up when a training request is received. The default size of the AM-patcher microservice is small, but you can change the size by using the custom resource (CR).
- Use the following command to scale up the AM-patcher microservice:
  oc patch watsonspeech speech-cr -n <cpd-instance> --type=merge --patch='{"spec": {"sttAMPatcher": {"keepAlive": true } } }'

  spec.sttAMPatcher.keepAlive: By default, the parameter is set to false, that is, the AM-patcher pod is scaled down when no training is in progress. When you set it to true, the AM-patcher pod remains scaled up even when no training is in progress.
- Use the following command to size the AM-patcher pod:
  oc patch watsonspeech speech-cr -n <cpd-instance> --type=merge --patch='{"spec": {"sttAMPatcher": {"size": "<size>" } } }'

  spec.sttAMPatcher.size: By default, the size is set to small. You can choose between the small, medium, and large sizes. These sizes define the threads, CPU, memory, and ephemeral storage of the chunk container.
spec.scaleConfig.stt.size: The number of replicas remains defined by the spec.scaleConfig.stt.size parameter.
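As a sketch only, these settings might look like the following in the custom resource. The field paths are taken from the oc patch commands in this section; the values shown are illustrative.

  spec:
    sttAMPatcher:
      keepAlive: false      # default: scale the AM-patcher down when no training is in progress
      size: medium          # small, medium, or large
    scaleConfig:
      stt:
        size: small         # continues to determine the number of AM-patcher replicas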
| AM Patcher size | Thread count | CPU | Memory | Ephemeral Storage (approx) |
|---|---|---|---|---|
| Small | 1 | ~1 | ~14 GB | ~20 GB |
| Medium | 2 | ~2 | ~17 GB | ~30 GB |
| Large | 4 | ~4 | ~20 GB | ~50 GB |
Use the following table to decide which size to use based on the length of the audio that is used for acoustic model training:
| Training Audio Length | AM Patcher size |
|---|---|
| < 10 hrs | Small |
| 10 - 20 hrs | Medium |
| 20 - 50 hrs | Large |
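For example, if your acoustic model training audio totals roughly 15 hours, the table points to the medium size, which you might apply with a command like the following (using the same speech-cr custom resource name and namespace placeholder as the earlier commands):

  oc patch watsonspeech speech-cr -n <cpd-instance> --type=merge --patch='{"spec": {"sttAMPatcher": {"size": "medium" } } }'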
Custom sizing
If the fixed t-shirt sizing is not appropriate or does not provide adequate capacity (for
example, you need something bigger than the large configuration), you can use
custom sizing. The custom size must be at least as large as the small size, with a
value of no less than 12 for the maximum number of sessions.
To scale up your installation to a custom sizing, follow the procedure that is described in Editing the custom resource.
Speech to Text custom sizing
Custom sizing for Speech to Text provides two more configuration fields:
- maxConcurrentSessions: The maximum number of concurrent sessions that the installation supports.
- maxMinutesOfAudioPerDay: The maximum number of minutes of audio that the installation can transcribe in a 24-hour period.
Internally, the configuration converts maxMinutesOfAudioPerDay into a maxConcurrentSessions value. Therefore, it is more accurate to assess how many concurrent sessions your Speech to Text installation needs to support. You can specify one or both values. If you specify both values, the configuration converts maxMinutesOfAudioPerDay into a number of sessions and uses the maximum of that computed value and maxConcurrentSessions for the actual sizing.
The following example shows the properties from a custom resource that specifies a
custom sizing and maxConcurrentSessions:
scaleConfig:
  stt:
    size: custom
    maxConcurrentSessions: 100
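As a sketch, a custom configuration that sets both fields might look like the following. The placement of maxMinutesOfAudioPerDay next to maxConcurrentSessions is an assumption based on the preceding example, and the numbers are illustrative only; the configuration uses whichever field implies the larger number of concurrent sessions.

scaleConfig:
  stt:
    size: custom
    maxConcurrentSessions: 100
    maxMinutesOfAudioPerDay: 250000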
Text to Speech custom sizing
Custom sizing for Text to Speech provides two more configuration fields:
- maxConcurrentSessions: The maximum number of concurrent sessions that the installation supports.
- maxCharactersPerDay: The maximum number of characters that the installation converts from text to speech in a 24-hour period.
Internally, the configuration converts maxCharactersPerDay into a maxConcurrentSessions value. Therefore, it is more accurate to assess how many concurrent sessions your Text to Speech installation needs to support. You can specify one or both values. If you specify both values, the configuration converts maxCharactersPerDay into a number of sessions and uses the maximum of that computed value and maxConcurrentSessions for the actual sizing.
The following example shows the properties from a custom resource that specifies a
custom sizing and maxConcurrentSessions:
scaleConfig:
  tts:
    size: custom
    maxConcurrentSessions: 65
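Similarly, a sketch that relies on maxCharactersPerDay for Text to Speech might look like the following; again, the field placement is an assumption based on the maxConcurrentSessions example, and the value is illustrative only.

scaleConfig:
  tts:
    size: custom
    maxCharactersPerDay: 5000000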