Run with Kubernetes

Running a simple deployment on Kubernetes is an easy way to run a horizontally scalable Speech-to-Text service alongside your existing Kubernetes workloads. The runtime and models are configured in a single Kubernetes Deployment. Init containers copy the pretrained models into emptyDir volumes, from which the runtime loads them. Additional training data is also pre-loaded to support model customization training.
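The Deployment defined in step 2 does not set replicas, so it starts with a single pod. Scaling out means raising the replica count. The helper below is a minimal sketch; the name scale_stt is hypothetical. Keep in mind that an emptyDir volume belongs to one pod, so every new replica runs the init containers and copies the models again, and each replica reserves its own CPU and memory requests.

```shell
# Hypothetical helper: scale the ibm-watson-stt-embed Deployment (defined in
# step 2) to the given number of replicas.
scale_stt() {
  replicas=$1
  # Reject empty or non-numeric input before calling kubectl.
  case $replicas in
    ''|*[!0-9]*)
      echo "usage: scale_stt <replica-count>" >&2
      return 1
      ;;
  esac
  kubectl scale deployment/ibm-watson-stt-embed --replicas="$replicas"
}

# Usage: scale_stt 3
```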

Deploying a static set of models to Kubernetes

  1. Access the container images from your cluster:

    To allow your Kubernetes cluster to access the container images, use the methods from the Kubernetes documentation to store your credentials as a Kubernetes Secret. For example, the following command creates a Secret named ibm-entitlement-key:

    kubectl create secret docker-registry ibm-entitlement-key --docker-server=cp.icr.io --docker-username=<your-name> --docker-password=<your-password> --docker-email=<your-email>
    

    where:

    • your-name is your IBM Entitled Registry username
    • your-password is your IBM Entitled Registry password
    • your-email is your IBM Entitled Registry email address
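    To keep the password out of your shell history, the same command can read the credentials from environment variables. This is a sketch: the variable names IBM_ENTITLED_USER, IBM_ENTITLED_KEY, and IBM_ENTITLED_EMAIL and the function name are hypothetical, so set them however you manage secrets locally.

```shell
# Sketch: create the ibm-entitlement-key Secret from environment variables.
# IBM_ENTITLED_USER, IBM_ENTITLED_KEY and IBM_ENTITLED_EMAIL are hypothetical
# names; the values are still passed to kubectl as arguments.
create_entitlement_secret() {
  # Stop with an error message if any credential is missing.
  : "${IBM_ENTITLED_USER:?IBM_ENTITLED_USER is not set}"
  : "${IBM_ENTITLED_KEY:?IBM_ENTITLED_KEY is not set}"
  : "${IBM_ENTITLED_EMAIL:?IBM_ENTITLED_EMAIL is not set}"
  kubectl create secret docker-registry ibm-entitlement-key \
    --docker-server=cp.icr.io \
    --docker-username="$IBM_ENTITLED_USER" \
    --docker-password="$IBM_ENTITLED_KEY" \
    --docker-email="$IBM_ENTITLED_EMAIL"
}

# Usage: create_entitlement_secret && kubectl get secret ibm-entitlement-key
```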
  2. Deploy in Kubernetes:

    To run the service in a Kubernetes cluster, make sure that the Kubernetes CLI (kubectl) is installed on your local machine and that you are logged in to the cluster.
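    Both prerequisites can be checked from a script before anything is applied. The sketch below uses kubectl auth can-i, which exits non-zero when the action is not allowed or the cluster cannot be reached; the helper name check_cluster_access is hypothetical, and it only checks the two resource kinds that the manifests below create.

```shell
# Hypothetical helper: confirm kubectl is installed and that the current
# context may create the Deployment and Service used in this guide.
check_cluster_access() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi
  for resource in deployments services; do
    # "auth can-i" prints yes/no and exits non-zero on "no" or on errors.
    if ! kubectl auth can-i create "$resource" >/dev/null; then
      echo "current context cannot create $resource" >&2
      return 1
    fi
  done
  echo "ok"
}
```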

    The YAML below contains Kubernetes manifests for a Deployment and for a Service that routes traffic to the pods the Deployment creates. This configuration deploys two models:

    • en-US_Multimedia (watson-stt-en-us-multimedia)
    • en-US_Telephony (watson-stt-en-us-telephony)
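    The two image names above follow a visible pattern: the model ID is lowercased, the underscore becomes a hyphen, and watson-stt- is prepended. Assuming the same pattern holds for other models (for example de-DE_Telephony), the sketch below derives the image reference for an additional init container. The helper name model_image and the pattern itself for models beyond the two listed are assumptions; confirm each image exists in the registry before using it.

```shell
# Hypothetical helper: derive an image reference from a model ID, assuming the
# naming pattern of the two images listed above
# (en-US_Multimedia -> watson-stt-en-us-multimedia).
# The tag defaults to 1.11.0, the version used throughout this example.
model_image() {
  model_id=$1
  tag=${2:-1.11.0}
  # Lowercase the ID, then turn "_" into "-".
  name=$(printf '%s' "$model_id" | tr '[:upper:]' '[:lower:]' | tr '_' '-')
  printf 'cp.icr.io/cp/ai/watson-stt-%s:%s\n' "$name" "$tag"
}

# Usage: model_image de-DE_Telephony
```

    A new model also has to be added to the MODELS value of both the prepare-models and runtime containers.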

    Note that these models (as well as de-DE_Telephony, fr-FR_Telephony, de-DE_Multimedia, and fr-FR_Multimedia) are enhanced for better recognition accuracy when customized, so they need more storage for their pre-loaded training datasets. Copy the following YAML into a file named watson-stt-manifests.yaml:

    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ibm-watson-stt-embed
      labels:
        app.kubernetes.io/name: "ibm-watson-stt-embed"
        app.kubernetes.io/component: "runtime"
        app.kubernetes.io/instance: "example"
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: "ibm-watson-stt-embed"
          app.kubernetes.io/component: "runtime"
          app.kubernetes.io/instance: "example"
      progressDeadlineSeconds: 1800
      template:
        metadata:
          labels:
            app.kubernetes.io/name: "ibm-watson-stt-embed"
            app.kubernetes.io/component: "runtime"
            app.kubernetes.io/instance: "example"
        spec:
          imagePullSecrets:
          - name: ibm-entitlement-key
          initContainers:
          - name: catalog
            image: cp.icr.io/cp/ai/watson-stt-generic-models:1.11.0
            # Use args rather than command so the image's license entrypoint is not overridden
            args:
            - cp
            - catalog.json
            - /opt/ibm/chuck.x86_64/var/catalog.json
            env:
            - name: ACCEPT_LICENSE
              value: "true"
            resources:
              limits:
                cpu: 1
                ephemeral-storage: 1Gi
                memory: 1Gi
              requests:
                cpu: 100m
                ephemeral-storage: 1Gi
                memory: 256Mi
            volumeMounts:
            - name: chuck-var
              mountPath: /opt/ibm/chuck.x86_64/var
    
          - name: watson-stt-en-us-multimedia
            image: cp.icr.io/cp/ai/watson-stt-en-us-multimedia:1.11.0
            args:
            - sh
            - -c
            - cp -r model/* /models/
            env:
            - name: ACCEPT_LICENSE
              value: "true"
            resources:
              limits:
                cpu: 1
                ephemeral-storage: 1Gi
                memory: 1Gi
              requests:
                cpu: 100m
                ephemeral-storage: 1Gi
                memory: 256Mi
            volumeMounts:
            - name: models
              mountPath: /models
    
          - name: watson-stt-en-us-telephony
            image: cp.icr.io/cp/ai/watson-stt-en-us-telephony:1.11.0
            args:
            - sh
            - -c
            - cp -r model/* /models/
            env:
            - name: ACCEPT_LICENSE
              value: "true"
            resources:
              limits:
                cpu: 1
                ephemeral-storage: 1Gi
                memory: 1Gi
              requests:
                cpu: 100m
                ephemeral-storage: 1Gi
                memory: 256Mi
            volumeMounts:
            - name: models
              mountPath: /models
    
          - name: prepare-models
            image: cp.icr.io/cp/ai/watson-stt-runtime:1.11.0
            args:
            - prepare_models.sh
            env:
            - name: ACCEPT_LICENSE
              value: "true"
            - name: CATALOG_PATH
              value: "var/catalog.json"
            # MODELS is a comma-separated list of model IDs
            - name: MODELS
              value: "en-US_Multimedia,en-US_Telephony"
            - name: DEFAULT_MODEL
              value: "en-US_Multimedia"
            resources:
              limits:
                cpu: 4
                ephemeral-storage: 50Gi
                memory: 50Gi
              requests:
                cpu: 1
                ephemeral-storage: 5Gi
                memory: 5Gi
            volumeMounts:
            - name: chuck-var
              mountPath: /opt/ibm/chuck.x86_64/var
            - name: chuck-logs
              mountPath: /opt/ibm/chuck.x86_64/logs
            - name: tmp
              mountPath: /tmp
            - name: models
              mountPath: /models
          containers:
          - name: runtime
            image: cp.icr.io/cp/ai/watson-stt-runtime:1.11.0
            resources:
              limits:
                cpu: 4
                ephemeral-storage: 10Gi
                memory: 10Gi
              requests:
                cpu: 1
                ephemeral-storage: 5Gi
                memory: 5Gi
            env:
            - name: ACCEPT_LICENSE
              value: "true"
            - name: CATALOG_PATH
              value: "var/catalog.json"
            # MODELS is a comma-separated list of model IDs
            - name: MODELS
              value: "en-US_Multimedia,en-US_Telephony"
            - name: DEFAULT_MODEL
              value: "en-US_Multimedia"
            - name: RESOURCES_CPU
              valueFrom:
                resourceFieldRef:
                  containerName: runtime
                  resource: requests.cpu
            - name: RESOURCES_MEMORY
              valueFrom:
                resourceFieldRef:
                  containerName: runtime
                  resource: requests.memory
            ports:
            - containerPort: 1080
            startupProbe:
              tcpSocket:
                port: 1080
              failureThreshold: 30
              periodSeconds: 10
            livenessProbe:
              tcpSocket:
                port: 1080
              periodSeconds: 10
            readinessProbe:
              httpGet:
                path: /v1/miniHealthCheck
                port: 1080
              periodSeconds: 10
            volumeMounts:
            - name: chuck-var
              mountPath: /opt/ibm/chuck.x86_64/var
            - name: chuck-logs
              mountPath: /opt/ibm/chuck.x86_64/logs
            - name: tmp
              mountPath: /tmp
            lifecycle:
              preStop:
                exec:
                  command:
                  - /bin/sleep
                  - "15"
    
          volumes:
            - name: chuck-var
              emptyDir: {}
            - name: chuck-cache
              emptyDir: {}
            - name: chuck-logs
              emptyDir: {}
            - name: models
              emptyDir: {}
            - name: tmp
              emptyDir: {}
    
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: "kubernetes.io/arch"
                    operator: In
                    values:
                      - "amd64"
    
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: ibm-watson-stt-embed
    spec:
      type: ClusterIP
      selector:
        app.kubernetes.io/name: "ibm-watson-stt-embed"
        app.kubernetes.io/component: "runtime"
        app.kubernetes.io/instance: "example"
      ports:
        - name: runtime
          protocol: TCP
          port: 1080
          targetPort: 1080
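    MODELS and DEFAULT_MODEL are set twice in the manifest, once in prepare-models and once in runtime, and the two copies must agree. The sketch below checks that before anything is applied. It is plain text matching that assumes each "- name: X" line is directly followed by its "value:" line, as in the manifest above; it is not a YAML parser, and the function names are hypothetical.

```shell
# Print each distinct value of env var $1 found in file $2, without quotes
# or surrounding whitespace.
env_values() {
  awk -v key="$1" '
    $0 ~ ("- name: " key "[ \t]*$") {
      getline
      sub(/^[ \t]*value:[ \t]*/, "")
      gsub(/"/, "")
      sub(/[ \t]+$/, "")
      print
    }' "$2" | sort -u
}

# Check that MODELS and DEFAULT_MODEL each have a single value across all
# containers, and that DEFAULT_MODEL is one of the MODELS entries.
check_model_env() {
  models=$(env_values MODELS "$1")
  default=$(env_values DEFAULT_MODEL "$1")
  if [ -z "$models" ] || [ -z "$default" ]; then
    echo "MODELS or DEFAULT_MODEL not found in $1" >&2
    return 1
  fi
  # More than one distinct value means the containers disagree.
  if [ "$(printf '%s\n' "$models" | awk 'END {print NR}')" != 1 ] ||
     [ "$(printf '%s\n' "$default" | awk 'END {print NR}')" != 1 ]; then
    echo "MODELS or DEFAULT_MODEL differs between containers" >&2
    return 1
  fi
  case ",$models," in
    *",$default,"*) echo "ok: $models (default $default)" ;;
    *) echo "DEFAULT_MODEL $default is not listed in MODELS" >&2; return 1 ;;
  esac
}

# Usage: check_model_env watson-stt-manifests.yaml &&
#   kubectl apply --dry-run=client -f watson-stt-manifests.yaml
```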
    
  3. Run on Kubernetes:

    With the YAML saved in a file named watson-stt-manifests.yaml, create the Deployment and Service:

    kubectl apply -f watson-stt-manifests.yaml
    

    Watch the pods progress to the Running state. Pulling the container images and running the init containers can take several minutes before the pods become ready.

    kubectl get pods --watch
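    In a script, kubectl rollout status can block until the Deployment is available instead of watching pods by hand. The sketch below uses a 30-minute timeout to mirror progressDeadlineSeconds: 1800 in the manifest and, on failure, prints the tail of the prepare-models init container log. The helper name is hypothetical; image pull problems do not show up in container logs, so kubectl describe pods -l app.kubernetes.io/name=ibm-watson-stt-embed is the better first stop in that case.

```shell
# Hypothetical helper: wait for the Deployment to finish rolling out.
wait_for_rollout() {
  if kubectl rollout status deployment/ibm-watson-stt-embed --timeout=30m; then
    return 0
  fi
  # On failure or timeout, show the end of the prepare-models init container
  # log, the step that runs prepare_models.sh.
  kubectl logs deployment/ibm-watson-stt-embed -c prepare-models --tail=50
  return 1
}
```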
    
  4. Use the service:

    The runtime container and service listen on port 1080. Set up a port-forward to the service with:

    kubectl port-forward svc/ibm-watson-stt-embed 1080
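    The port-forward runs in the foreground. For scripting, one option is to run it in the background and poll /v1/miniHealthCheck, the path the readinessProbe in the manifest uses, until it answers through the forwarded port. This is a sketch; the helper name and its retry defaults are hypothetical.

```shell
# Hypothetical helper: poll the readiness path through the forwarded port.
# Arguments: URL (default below) and number of attempts (default 30, 1s apart).
wait_for_health() {
  url=${1:-http://localhost:1080/v1/miniHealthCheck}
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP error statuses, not only on connection errors.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      echo "healthy"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "no healthy response from $url after $attempts attempts" >&2
  return 1
}

# Usage:
#   kubectl port-forward svc/ibm-watson-stt-embed 1080 >/dev/null &
#   wait_for_health
```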
    

    In another terminal, download an example audio file:

    curl -sLo example.flac https://github.com/watson-developer-cloud/doc-tutorial-downloads/raw/master/speech-to-text/0001.flac
    

    Send it to the /recognize endpoint:

    curl "http://localhost:1080/speech-to-text/api/v1/recognize" \
      --header "Content-Type: audio/flac" \
      --data-binary @example.flac
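    The Speech to Text recognize method also accepts a model query parameter, which selects one of the models listed in MODELS; requests without it are expected to use DEFAULT_MODEL (en-US_Multimedia here). The wrapper below is a sketch around the same curl call; the function name and its defaults are hypothetical.

```shell
# Hypothetical wrapper around the /recognize call shown above.
# Arguments: audio file, model ID (default en-US_Multimedia),
# content type (default audio/flac).
recognize() {
  audio=$1
  model=${2:-en-US_Multimedia}
  content_type=${3:-audio/flac}
  curl -sS "http://localhost:1080/speech-to-text/api/v1/recognize?model=$model" \
    --header "Content-Type: $content_type" \
    --data-binary "@$audio"
}

# Usage (jq assumed to be installed), printing only the transcripts:
#   recognize example.flac en-US_Telephony | jq -r '.results[].alternatives[0].transcript'
```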