Installing Watson Speech to Text

Before you can use Watson™ Speech to Text or Watson Text to Speech, you must prepare the cluster and then install the services on it. Both services use the same installation steps and can be installed separately or together. If you install both services, they share datastores, which uses resources more efficiently and simplifies support.

Before you begin

Important: Watson Speech to Text version 1.2.x on IBM Cloud Pak for Data version 3.5 is out of service as of 1 May 2022. Watson Speech to Text version 1.2.x is no longer supported and is no longer available for installation. For more information, see Release notes for Speech to Text for IBM Cloud Pak for Data.

This procedure installs a single instance of both the Watson Speech to Text and Watson Text to Speech services.

Required role: To complete this task, you must be an administrator of the project (namespace) where you are deploying the service.

You must meet the system requirements for the cluster. For more information, see System requirements.

The Watson Speech services also have their own requirements, which every system that hosts a deployment must meet:

IBM Watson Speech Services for IBM® Cloud Pak for Data can run on the x86-64 architecture only.
CPUs must support the Advanced Vector Extensions 2 (AVX2) instruction set. For a list of CPUs that include this support, see the AVX Wikipedia page.
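The two host requirements above can be checked before you install. The following is a minimal sketch, not an IBM-provided tool: it assumes a Linux node that lists CPU feature flags in /proc/cpuinfo, and the function names (cpuinfo_has_avx2, is_x86_64, check_host) are hypothetical. Run it on each worker node that will host the services.

```python
"""Hypothetical pre-install host check for the Watson Speech services (a sketch).

Assumes a Linux node that lists CPU feature flags in /proc/cpuinfo.
These helpers are illustrative only and are not part of any IBM tool.
"""
import platform


def cpuinfo_has_avx2(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in the cpuinfo text lists avx2."""
    for line in cpuinfo_text.splitlines():
        # On x86 Linux, feature flags appear on lines like "flags\t\t: fpu ... avx2 ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def is_x86_64(machine: str) -> bool:
    """Return True for the machine names Python reports on x86-64 hosts."""
    # Linux reports "x86_64"; Windows reports "AMD64".
    return machine.lower() in ("x86_64", "amd64")


def check_host(cpuinfo_path: str = "/proc/cpuinfo") -> list[str]:
    """Return a list of problems found on this host (empty if none)."""
    problems = []
    machine = platform.machine()
    if not is_x86_64(machine):
        problems.append(f"unsupported architecture: {machine}")
    try:
        with open(cpuinfo_path, encoding="utf-8") as f:
            if not cpuinfo_has_avx2(f.read()):
                problems.append("CPU does not report the avx2 flag")
    except OSError as exc:
        problems.append(f"could not read {cpuinfo_path}: {exc}")
    return problems


if __name__ == "__main__":
    issues = check_host()
    print("OK" if not issues else "\n".join(issues))
```

The flag is matched as a whole word, so a line that lists only avx or avx512f does not count as AVX2 support.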

There are two typical installation configurations:

The development configuration, which is the configuration that is used in the default installation, has a minimal footprint and is meant for development and proof-of-concept use. It can handle only a few concurrent recognition sessions, and it is not highly available because some of the core components have no redundancy (a single replica).
The production configuration is a highly available solution that is intended to run production workloads. This configuration can be achieved by scaling up the development configuration after installation.

The following resources are required in addition to the minimum platform requirements.

Required resources	For a development configuration	For a production configuration
Minimum worker nodes	3	3
Minimum CPUs	Speech to Text: 11; Text to Speech: 5	Speech to Text: 19; Text to Speech: 10
Minimum memory	Speech to Text: 60 GB; Text to Speech: 20 GB	Speech to Text: 90 GB; Text to Speech: 40 GB
Minimum disk space per node	500 GB	500 GB

Note: Minimum memory requirements depend on the Watson Speech to Text models and Watson Text to Speech voices that are installed.
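To estimate the extra capacity to plan for, you can add up the per-service figures from the resource table. The following sketch copies the table values and assumes that, when both services are installed, their CPU and memory minimums simply add. The table does not state this explicitly, shared datastores can change the real figure, and the memory needed also depends on the models and voices you install. The function name extra_resources and the service keys are hypothetical.

```python
"""Hypothetical sizing helper for the Watson Speech services (a sketch).

Values are copied from the resource table. Assumption: when both services
are installed, their CPU and memory minimums simply add; shared datastores
and the installed models and voices may change the real figure.
"""

# Per-service minimums on top of the platform requirements: (CPUs, memory in GB).
MINIMUMS = {
    "development": {"speech-to-text": (11, 60), "text-to-speech": (5, 20)},
    "production": {"speech-to-text": (19, 90), "text-to-speech": (10, 40)},
}
MIN_WORKER_NODES = 3
MIN_DISK_PER_NODE_GB = 500


def extra_resources(configuration: str, services: list[str]) -> tuple[int, int]:
    """Return (CPUs, memory in GB) to add on top of the platform minimums."""
    table = MINIMUMS[configuration]  # raises KeyError for an unknown configuration
    cpus = sum(table[s][0] for s in services)
    memory_gb = sum(table[s][1] for s in services)
    return cpus, memory_gb


if __name__ == "__main__":
    cpus, mem = extra_resources("development", ["speech-to-text", "text-to-speech"])
    print(
        f"Add {cpus} CPUs and {mem} GB memory across at least "
        f"{MIN_WORKER_NODES} worker nodes with {MIN_DISK_PER_NODE_GB} GB disk each"
    )
```

Under this assumption, installing both services in the development configuration adds 16 CPUs and 80 GB of memory, and the production configuration adds 29 CPUs and 130 GB.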

Procedure

Ensure that you have the required permissions on the cluster and that IBM Cloud Pak for Data is already installed.
Complete the following tasks to prepare your cluster and finish the installation process.