Run with Kubernetes
A simple Kubernetes Deployment is an easy way to run a horizontally scalable Speech-to-Text service alongside your existing Kubernetes workloads. The runtime and models are configured in a single Kubernetes Deployment. initContainers copy the pretrained models into an emptyDir volume, from which the runtime loads them. Additional training data is pre-loaded as well, to support model customization training.
Deploying a static set of models to Kubernetes
- Access the container images from your cluster:
  To allow your Kubernetes cluster to access the container images, use the methods from the Kubernetes documentation to store your credentials as a Kubernetes Secret. For example, use the following command to create a Secret named ibm-entitlement-key.

      kubectl create secret docker-registry ibm-entitlement-key --docker-server=cp.icr.io --docker-username=<your-name> --docker-password=<your-password> --docker-email=<your-email>

  where:
  - your-name is your IBM Entitled Registry username
  - your-password is your IBM Entitled Registry password
  - your-email is your IBM Entitled Registry email address
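Before moving on, it can help to confirm that the Secret exists and has the type that image pulls expect. The following is a minimal sketch, assuming the Secret was created in the current namespace; `check_pull_secret` is a hypothetical helper name for illustration, not part of kubectl or any IBM tooling.

```shell
# Hypothetical helper: confirm that a Secret exists and is a registry credential.
# Secrets created with `kubectl create secret docker-registry` have the type
# kubernetes.io/dockerconfigjson.
check_pull_secret() {
  local name="${1:-ibm-entitlement-key}"
  local secret_type
  # Fails here if the Secret is missing or the cluster is unreachable.
  secret_type="$(kubectl get secret "$name" --output=jsonpath='{.type}')" || return 1
  if [ "$secret_type" = "kubernetes.io/dockerconfigjson" ]; then
    echo "ok: $name is a registry credential"
  else
    echo "unexpected type for $name: $secret_type" >&2
    return 1
  fi
}
```

Calling `check_pull_secret ibm-entitlement-key` returns a non-zero status if the Secret is missing or has a different type, so it can gate the later steps in a script.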
- Deploy in Kubernetes:
  To run the service in a Kubernetes cluster, ensure that you have the Kubernetes CLI (kubectl) installed on your local machine, and that you have logged in to the cluster.
  The YAML below includes Kubernetes manifests for a Service that routes to pods created by a Deployment. With the configurations below, two models are deployed:
  - en-US_Multimedia (watson-stt-en-us-multimedia)
  - en-US_Telephony (watson-stt-en-us-telephony)

  Note that these models (as well as de-DE_Telephony, fr-FR_Telephony, de-DE_Multimedia, and fr-FR_Multimedia) are enhanced for improved recognition accuracy when customized, so they require more storage for the pre-loaded training datasets. Copy the YAML below into a file called watson-stt-manifests.yaml:

      ---
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: ibm-watson-stt-embed
        labels:
          app.kubernetes.io/name: "ibm-watson-stt-embed"
          app.kubernetes.io/component: "runtime"
          app.kubernetes.io/instance: "example"
      spec:
        selector:
          matchLabels:
            app.kubernetes.io/name: "ibm-watson-stt-embed"
            app.kubernetes.io/component: "runtime"
            app.kubernetes.io/instance: "example"
        progressDeadlineSeconds: 1800
        template:
          metadata:
            labels:
              app.kubernetes.io/name: "ibm-watson-stt-embed"
              app.kubernetes.io/component: "runtime"
              app.kubernetes.io/instance: "example"
          spec:
            imagePullSecrets:
              - name: ibm-entitlement-key
            initContainers:
              - name: catalog
                image: cp.icr.io/cp/ai/watson-stt-generic-models:1.11.0
                # use args to not override license entrypoint
                args:
                  - cp
                  - catalog.json
                  - /opt/ibm/chuck.x86_64/var/catalog.json
                env:
                  - name: ACCEPT_LICENSE
                    value: "true"
                resources:
                  limits:
                    cpu: 1
                    ephemeral-storage: 1Gi
                    memory: 1Gi
                  requests:
                    cpu: 100m
                    ephemeral-storage: 1Gi
                    memory: 256Mi
                volumeMounts:
                  - name: chuck-var
                    mountPath: /opt/ibm/chuck.x86_64/var
              - name: watson-stt-en-us-multimedia
                image: cp.icr.io/cp/ai/watson-stt-en-us-multimedia:1.11.0
                args:
                  - sh
                  - -c
                  - cp -r model/* /models/
                env:
                  - name: ACCEPT_LICENSE
                    value: "true"
                resources:
                  limits:
                    cpu: 1
                    ephemeral-storage: 1Gi
                    memory: 1Gi
                  requests:
                    cpu: 100m
                    ephemeral-storage: 1Gi
                    memory: 256Mi
                volumeMounts:
                  - name: models
                    mountPath: /models
              - name: watson-stt-en-us-telephony
                image: cp.icr.io/cp/ai/watson-stt-en-us-telephony:1.11.0
                args:
                  - sh
                  - -c
                  - cp -r model/* /models/
                env:
                  - name: ACCEPT_LICENSE
                    value: "true"
                resources:
                  limits:
                    cpu: 1
                    ephemeral-storage: 1Gi
                    memory: 1Gi
                  requests:
                    cpu: 100m
                    ephemeral-storage: 1Gi
                    memory: 256Mi
                volumeMounts:
                  - name: models
                    mountPath: /models
              - name: prepare-models
                image: cp.icr.io/cp/ai/watson-stt-runtime:1.11.0
                args:
                  - prepare_models.sh
                env:
                  - name: ACCEPT_LICENSE
                    value: "true"
                  - name: CATALOG_PATH
                    value: "var/catalog.json"
                  # MODELS is a comma separated list of Model IDs
                  - name: MODELS
                    value: "en-US_Multimedia,en-US_Telephony"
                  - name: DEFAULT_MODEL
                    value: "en-US_Multimedia"
                resources:
                  limits:
                    cpu: 4
                    ephemeral-storage: 50Gi
                    memory: 50Gi
                  requests:
                    cpu: 1
                    ephemeral-storage: 5Gi
                    memory: 5Gi
                volumeMounts:
                  - name: chuck-var
                    mountPath: /opt/ibm/chuck.x86_64/var
                  - name: chuck-logs
                    mountPath: /opt/ibm/chuck.x86_64/logs
                  - name: tmp
                    mountPath: /tmp
                  - name: models
                    mountPath: /models
            containers:
              - name: runtime
                image: cp.icr.io/cp/ai/watson-stt-runtime:1.11.0
                resources:
                  limits:
                    cpu: 4
                    ephemeral-storage: 10Gi
                    memory: 10Gi
                  requests:
                    cpu: 1
                    ephemeral-storage: 5Gi
                    memory: 5Gi
                env:
                  - name: ACCEPT_LICENSE
                    value: "true"
                  - name: CATALOG_PATH
                    value: "var/catalog.json"
                  # MODELS is a comma separated list of Model IDs
                  - name: MODELS
                    value: "en-US_Multimedia,en-US_Telephony"
                  - name: DEFAULT_MODEL
                    value: "en-US_Multimedia"
                  - name: RESOURCES_CPU
                    valueFrom:
                      resourceFieldRef:
                        containerName: runtime
                        resource: requests.cpu
                  - name: RESOURCES_MEMORY
                    valueFrom:
                      resourceFieldRef:
                        containerName: runtime
                        resource: requests.memory
                ports:
                  - containerPort: 1080
                startupProbe:
                  tcpSocket:
                    port: 1080
                  failureThreshold: 30
                  periodSeconds: 10
                livenessProbe:
                  tcpSocket:
                    port: 1080
                  periodSeconds: 10
                readinessProbe:
                  httpGet:
                    path: /v1/miniHealthCheck
                    port: 1080
                  periodSeconds: 10
                volumeMounts:
                  - name: chuck-var
                    mountPath: /opt/ibm/chuck.x86_64/var
                  - name: chuck-logs
                    mountPath: /opt/ibm/chuck.x86_64/logs
                  - name: tmp
                    mountPath: /tmp
                lifecycle:
                  preStop:
                    exec:
                      command:
                        - /bin/sleep
                        - "15"
            volumes:
              - name: chuck-var
                emptyDir: {}
              - name: chuck-cache
                emptyDir: {}
              - name: chuck-logs
                emptyDir: {}
              - name: models
                emptyDir: {}
              - name: tmp
                emptyDir: {}
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: "kubernetes.io/arch"
                          operator: In
                          values:
                            - "amd64"
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: ibm-watson-stt-embed
      spec:
        type: ClusterIP
        selector:
          app.kubernetes.io/name: "ibm-watson-stt-embed"
          app.kubernetes.io/component: "runtime"
          app.kubernetes.io/instance: "example"
        ports:
          - name: runtime
            protocol: TCP
            port: 1080
            targetPort: 1080

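In the manifest, each model ID maps to an init container and image name by lowercasing the ID, replacing the underscore with a hyphen, and adding a watson-stt- prefix. The sketch below reproduces that mapping for the two models shown. Whether other model images follow the same pattern is an assumption that should be checked against the registry catalog, and the helper names `model_container_name` and `model_image` are hypothetical.

```shell
# Hypothetical helper: derive the init container name used in the manifest
# from a model ID (lowercase, "_" replaced by "-", "watson-stt-" prefix).
# The pattern is inferred only from en-US_Multimedia and en-US_Telephony.
model_container_name() {
  printf 'watson-stt-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr '_' '-')"
}

# Hypothetical helper: build the full image reference for a model ID.
# The registry path and default tag are copied from the manifest.
model_image() {
  local tag="${2:-1.11.0}"
  printf 'cp.icr.io/cp/ai/%s:%s\n' "$(model_container_name "$1")" "$tag"
}
```

For example, `model_image en-US_Telephony` prints the same image reference that the manifest uses for the telephony init container. When adding a model, the MODELS value in both the prepare-models and runtime containers lists it as well.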
- Run on Kubernetes
  Assuming the YAML above is copied into a file called watson-stt-manifests.yaml, create the service with:

      kubectl apply -f watson-stt-manifests.yaml

  Watch the pods progress to the running state. It may take a few minutes to pull the container images and for the pods to become ready.

      kubectl get pods --watch
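For scripted deployments, waiting on the rollout can be more convenient than watching pods interactively. The following is a minimal sketch, assuming the Deployment and container names from the manifest above and the current namespace; `wait_for_stt` is a hypothetical helper name.

```shell
# Hypothetical helper: block until the Deployment finishes rolling out, and
# print the tail of the model-preparation logs if it does not.
wait_for_stt() {
  local deploy="deployment/ibm-watson-stt-embed"
  # 1800s matches progressDeadlineSeconds in the manifest.
  if kubectl rollout status "$deploy" --timeout=1800s; then
    echo "rollout complete"
  else
    # prepare-models is the last init container and the slowest step.
    kubectl logs "$deploy" --container=prepare-models --tail=50 >&2
    return 1
  fi
}
```

The function returns a non-zero status on timeout or failure, so a calling script can stop before it tries to use the service.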
- Use the service
  The runtime container and service listen on port 1080. Set up a port-forward to the service with:

      kubectl port-forward svc/ibm-watson-stt-embed 1080

  In another terminal, download an example audio file:

      curl -sLo example.flac https://github.com/watson-developer-cloud/doc-tutorial-downloads/raw/master/speech-to-text/0001.flac

  Send it through the /recognize endpoint:

      curl "http://localhost:1080/speech-to-text/api/v1/recognize" \
        --header "Content-Type: audio/flac" \
        --data-binary @example.flac
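The request above does not name a model, so the runtime presumably falls back to DEFAULT_MODEL. To target the other deployed model, the sketch below assumes that the runtime accepts the same `model` query parameter as the hosted Watson Speech to Text API; that assumption, the helper name `recognize_url`, and the localhost port-forward address are all illustrative rather than taken from this page.

```shell
# Hypothetical helper: build the /recognize URL, optionally selecting a model.
# Assumes the port-forward from the previous step is active on localhost:1080.
recognize_url() {
  local model="${1:-}"
  local base="http://localhost:1080/speech-to-text/api/v1/recognize"
  if [ -n "$model" ]; then
    printf '%s?model=%s\n' "$base" "$model"
  else
    printf '%s\n' "$base"
  fi
}

# Usage (requires the running service, so it is left commented out):
# curl "$(recognize_url en-US_Telephony)" \
#   --header "Content-Type: audio/flac" \
#   --data-binary @example.flac
```

Model IDs contain only letters, hyphens, and underscores, so no URL encoding is applied here.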