Run with Kubernetes

Running a simple deployment on Kubernetes offers an easy way to deploy a horizontally scalable Speech-to-Text service alongside your existing Kubernetes workloads. The runtime and models are configured in a single Kubernetes Deployment. initContainers copy the pretrained models into an emptyDir volume, and the runtime loads them from there. Additional training data is also preloaded to support model customization training.

Deploying a static set of models to Kubernetes

Access the container images from your cluster:

To allow your Kubernetes cluster to access the container images, use the methods from the Kubernetes documentation to store your credentials as a Kubernetes Secret. For example, the following command creates a Secret named ibm-entitlement-key:
```
kubectl create secret docker-registry ibm-entitlement-key --docker-server=cp.icr.io --docker-username=<your-name> --docker-password=<your-password> --docker-email=<your-email>
```
where:
- your-name is your IBM Entitled Registry username
- your-password is your IBM Entitled Registry password
- your-email is your IBM Entitled Registry email address
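
Before deploying, it can help to confirm that the Secret exists and has the type Kubernetes expects for image pulls. The sketch below wraps that check in a shell function. The function name `check_pull_secret` is illustrative and not part of any IBM tooling, and it assumes the Secret lives in the current namespace.

```shell
# Hypothetical helper: verify that a Secret exists and is a registry credential.
# Secrets created with `kubectl create secret docker-registry` have the
# type kubernetes.io/dockerconfigjson.
check_pull_secret() {
  secret_name="${1:-ibm-entitlement-key}"
  # Read only the Secret's type; fail if the Secret cannot be fetched.
  secret_type="$(kubectl get secret "$secret_name" -o jsonpath='{.type}')" || return 1
  if [ "$secret_type" = "kubernetes.io/dockerconfigjson" ]; then
    echo "Secret $secret_name is ready for image pulls"
  else
    echo "Secret $secret_name has unexpected type: $secret_type" >&2
    return 1
  fi
}
```

Calling `check_pull_secret` with no argument checks the ibm-entitlement-key Secret used in the rest of this page.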

Deploy in Kubernetes:

To run the service in a Kubernetes cluster, ensure that the Kubernetes CLI (kubectl) is installed on your local machine and that you are logged in to the cluster.
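
These prerequisites can be checked from the command line. The following is a minimal sketch, assuming the manifests will be applied to the current namespace of the current kubectl context; the function name `check_cluster_access` is hypothetical.

```shell
# Hypothetical pre-flight check; run it before applying the manifests.
check_cluster_access() {
  # Confirm that kubectl is available in this shell.
  command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found" >&2; return 1; }
  # Confirm that the current context can reach the API server.
  kubectl cluster-info >/dev/null 2>&1 || { echo "cannot reach the cluster" >&2; return 1; }
  # Confirm that the current user may create Deployments in the current namespace.
  kubectl auth can-i create deployments >/dev/null 2>&1 || { echo "not allowed to create deployments" >&2; return 1; }
  echo "cluster access looks OK"
}
```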

The YAML below includes Kubernetes manifests for a Service that routes to pods created by a Deployment. With the configurations below, two models are deployed:

- en-US_Multimedia (watson-stt-en-us-multimedia)
- en-US_Telephony (watson-stt-en-us-telephony)

These models, along with de-DE_Telephony, fr-FR_Telephony, de-DE_Multimedia, and fr-FR_Multimedia, are enhanced for improved recognition accuracy when customized, so they require more storage for their preloaded training datasets. Copy the YAML below into a file called watson-stt-manifests.yaml.

```
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ibm-watson-stt-embed
  labels:
    app.kubernetes.io/name: "ibm-watson-stt-embed"
    app.kubernetes.io/component: "runtime"
    app.kubernetes.io/instance: "example"
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: "ibm-watson-stt-embed"
      app.kubernetes.io/component: "runtime"
      app.kubernetes.io/instance: "example"
  progressDeadlineSeconds: 1800
  template:
    metadata:
      labels:
        app.kubernetes.io/name: "ibm-watson-stt-embed"
        app.kubernetes.io/component: "runtime"
        app.kubernetes.io/instance: "example"
    spec:
      imagePullSecrets:
      - name: ibm-entitlement-key
      initContainers:
      - name: catalog
        image: cp.icr.io/cp/ai/watson-stt-generic-models:1.11.0
        # use args to not override license entrypoint
        args:
        - cp
        - catalog.json
        - /opt/ibm/chuck.x86_64/var/catalog.json
        env:
        - name: ACCEPT_LICENSE
          value: "true"
        resources:
          limits:
            cpu: 1
            ephemeral-storage: 1Gi
            memory: 1Gi
          requests:
            cpu: 100m
            ephemeral-storage: 1Gi
            memory: 256Mi
        volumeMounts:
        - name: chuck-var
          mountPath: /opt/ibm/chuck.x86_64/var

      - name: watson-stt-en-us-multimedia
        image: cp.icr.io/cp/ai/watson-stt-en-us-multimedia:1.11.0
        args:
        - sh
        - -c
        - cp -r model/* /models/
        env:
        - name: ACCEPT_LICENSE
          value: "true"
        resources:
          limits:
            cpu: 1
            ephemeral-storage: 1Gi
            memory: 1Gi
          requests:
            cpu: 100m
            ephemeral-storage: 1Gi
            memory: 256Mi
        volumeMounts:
        - name: models
          mountPath: /models

      - name: watson-stt-en-us-telephony
        image: cp.icr.io/cp/ai/watson-stt-en-us-telephony:1.11.0
        args:
        - sh
        - -c
        - cp -r model/* /models/
        env:
        - name: ACCEPT_LICENSE
          value: "true"
        resources:
          limits:
            cpu: 1
            ephemeral-storage: 1Gi
            memory: 1Gi
          requests:
            cpu: 100m
            ephemeral-storage: 1Gi
            memory: 256Mi
        volumeMounts:
        - name: models
          mountPath: /models

      - name: prepare-models
        image: cp.icr.io/cp/ai/watson-stt-runtime:1.11.0
        args:
        - prepare_models.sh
        env:
        - name: ACCEPT_LICENSE
          value: "true"
        - name: CATALOG_PATH
          value: "var/catalog.json"
        # MODELS is a comma separated list of Model IDs
        - name: MODELS
          value: "en-US_Multimedia,en-US_Telephony"
        - name: DEFAULT_MODEL
          value: "en-US_Multimedia"
        resources:
          limits:
            cpu: 4
            ephemeral-storage: 50Gi
            memory: 50Gi
          requests:
            cpu: 1
            ephemeral-storage: 5Gi
            memory: 5Gi
        volumeMounts:
        - name: chuck-var
          mountPath: /opt/ibm/chuck.x86_64/var
        - name: chuck-logs
          mountPath: /opt/ibm/chuck.x86_64/logs
        - name: tmp
          mountPath: /tmp
        - name: models
          mountPath: /models
      containers:
      - name: runtime
        image: cp.icr.io/cp/ai/watson-stt-runtime:1.11.0
        resources:
          limits:
            cpu: 4
            ephemeral-storage: 10Gi
            memory: 10Gi
          requests:
            cpu: 1
            ephemeral-storage: 5Gi
            memory: 5Gi
        env:
        - name: ACCEPT_LICENSE
          value: "true"
        - name: CATALOG_PATH
          value: "var/catalog.json"
        # MODELS is a comma separated list of Model IDs
        - name: MODELS
          value: "en-US_Multimedia,en-US_Telephony"
        - name: DEFAULT_MODEL
          value: "en-US_Multimedia"
        - name: RESOURCES_CPU
          valueFrom:
            resourceFieldRef:
              containerName: runtime
              resource: requests.cpu
        - name: RESOURCES_MEMORY
          valueFrom:
            resourceFieldRef:
              containerName: runtime
              resource: requests.memory
        ports:
        - containerPort: 1080
        startupProbe:
          tcpSocket:
            port: 1080
          failureThreshold: 30
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 1080
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /v1/miniHealthCheck
            port: 1080
          periodSeconds: 10
        volumeMounts:
        - name: chuck-var
          mountPath: /opt/ibm/chuck.x86_64/var
        - name: chuck-logs
          mountPath: /opt/ibm/chuck.x86_64/logs
        - name: tmp
          mountPath: /tmp
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sleep
              - "15"

      volumes:
        - name: chuck-var
          emptyDir: {}
        - name: chuck-cache
          emptyDir: {}
        - name: chuck-logs
          emptyDir: {}
        - name: models
          emptyDir: {}
        - name: tmp
          emptyDir: {}

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "kubernetes.io/arch"
                operator: In
                values:
                  - "amd64"

---
apiVersion: v1
kind: Service
metadata:
  name: ibm-watson-stt-embed
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: "ibm-watson-stt-embed"
    app.kubernetes.io/component: "runtime"
    app.kubernetes.io/instance: "example"
  ports:
    - name: runtime
      protocol: TCP
      port: 1080
      targetPort: 1080
```
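
In the manifest above, each model ID listed in MODELS is paired with an init container that copies that model into the models volume. Judging only from the two models shown, the init container and image name appear to be the model ID in lowercase, with the underscore replaced by a hyphen and a watson-stt- prefix. The sketch below encodes that inferred rule so a MODELS value can be checked against the init containers when adding a model. The rule is an assumption drawn from these two examples, not a documented convention, and both function names are hypothetical.

```shell
# Hypothetical helper: derive an init container name from a model ID.
# The naming rule is inferred from the two models in the manifest.
model_image_name() {
  printf 'watson-stt-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr '_' '-')"
}

# Print the expected init container name for each entry in a
# comma-separated MODELS value, one per line.
list_model_images() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r model_id; do
    if [ -n "$model_id" ]; then
      model_image_name "$model_id"
    fi
  done
}
```

For example, `list_model_images "en-US_Multimedia,en-US_Telephony"` prints watson-stt-en-us-multimedia and watson-stt-en-us-telephony, matching the init containers in the Deployment.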

Run on Kubernetes

Assuming the YAML above is copied into a file called watson-stt-manifests.yaml, create the service with:
```
kubectl apply -f watson-stt-manifests.yaml
```
Watch the pods progress to the running state. It may take a few minutes to pull the container images and for the pods to become ready.
```
kubectl get pods --watch
```
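
As an alternative to watching the pods, a script can block until the rollout finishes and then scale the Deployment horizontally. The following is a minimal sketch with hypothetical function names. The default timeout simply mirrors the progressDeadlineSeconds value of 1800 from the manifest.

```shell
# Hypothetical helper: block until the Deployment finishes rolling out.
# The default timeout mirrors progressDeadlineSeconds (1800) in the manifest.
wait_for_stt() {
  kubectl rollout status deployment/ibm-watson-stt-embed --timeout="${1:-1800s}"
}

# Hypothetical helper: change the number of runtime pods.
scale_stt() {
  kubectl scale deployment/ibm-watson-stt-embed --replicas="$1"
}
```

For example, `wait_for_stt && scale_stt 3` waits for the first pod and then requests three replicas. Each new pod runs the same initContainers and needs its own CPU, memory, and ephemeral storage, so the cluster must have capacity for every replica.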

Use the service

The runtime container and service listen on port 1080. Set up a port-forward to the service with:

```
kubectl port-forward svc/ibm-watson-stt-embed 1080
```
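
With the port-forward running, one way to confirm from a second terminal that the runtime is reachable is to request the readiness path that the Deployment's readinessProbe uses. This is a sketch only: the function name `stt_is_ready` is hypothetical, and it treats any 2xx or 3xx status as ready, which is the rule Kubernetes applies to httpGet probes.

```shell
# Hypothetical helper: succeed when the forwarded runtime answers its
# readiness path with a 2xx or 3xx status code.
stt_is_ready() {
  # -w prints only the HTTP status; curl reports 000 if it cannot connect.
  status="$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:1080/v1/miniHealthCheck")"
  [ "$status" -ge 200 ] && [ "$status" -lt 400 ]
}
```

A typical use is `stt_is_ready && echo "runtime is ready"`.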

In another terminal, download an example audio file:

```
curl -sLo example.flac https://github.com/watson-developer-cloud/doc-tutorial-downloads/raw/master/speech-to-text/0001.flac
```

Send it through the /recognize endpoint:

```
curl "http://localhost:1080/speech-to-text/api/v1/recognize" \
  --header "Content-Type: audio/flac" \
  --data-binary @example.flac
```
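
The request above does not name a model, so the runtime presumably falls back to the DEFAULT_MODEL set in the manifest (en-US_Multimedia). To use the other deployed model, the Watson Speech to Text API accepts a `model` query parameter on /recognize. The sketch below wraps the request in a hypothetical `recognize_flac` function. It assumes the port-forward is still running and that the embedded runtime honors the `model` parameter the same way the hosted API does.

```shell
# Hypothetical helper: transcribe a FLAC file with a chosen model.
# Usage: recognize_flac <audio-file> [model-id]
recognize_flac() {
  audio_file="$1"
  # Default to the model configured as DEFAULT_MODEL in the manifest.
  model_id="${2:-en-US_Multimedia}"
  curl -s "http://localhost:1080/speech-to-text/api/v1/recognize?model=${model_id}" \
    --header "Content-Type: audio/flac" \
    --data-binary "@${audio_file}"
}
```

For example, `recognize_flac example.flac en-US_Telephony` sends the downloaded file to the telephony model.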