Run with Kubernetes

Running a simple Deployment on Kubernetes offers an easy way to deploy a horizontally scalable, static set of NLP models alongside your existing Kubernetes workloads. The entire definition for the Watson NLP Runtime and the set of models it serves fits inside a single Kubernetes Deployment resource, with no other external dependencies. Through the use of initContainers, the pretrained model images extract their model content into a volume shared with the runtime container, so no external storage is required.
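
In sketch form, the pattern looks like the following fragment (the complete, runnable manifest appears in step 2 below); the container names and mount path here match that manifest:

    initContainers:
    - name: english-syntax-model                # pretrained model image
      image: cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_en_stock:1.4.1
      volumeMounts:
      - name: model-directory                   # volume shared with the runtime container
        mountPath: "/app/models"
    containers:
    - name: watson-nlp-container                # runtime loads models from the shared volume
      image: cp.icr.io/cp/ai/watson-nlp-runtime:1.1.36
      env:
      - name: LOCAL_MODELS_DIR                  # tells the runtime where to find the models
        value: "/app/models"
      volumeMounts:
      - name: model-directory
        mountPath: "/app/models"
    volumes:
    - name: model-directory
      emptyDir: {}                              # pod-local scratch space; no external storage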

Figure: A simple Kubernetes deployment serving Watson NLP models

Deploying a static set of models to Kubernetes

  1. Access the container images from your cluster:

    To allow your Kubernetes cluster to access the container images, follow the Kubernetes documentation to store your IBM Entitled Registry credentials as a Kubernetes Secret. For example, the following command creates a Secret named ibm-entitlement-key:

    kubectl create secret docker-registry ibm-entitlement-key --docker-server=cp.icr.io --docker-username=<your-name> --docker-password=<your-password> --docker-email=<your-email>
    

    where:

    • cp.icr.io is the IBM Entitled Registry server
    • your-name is your IBM Entitled Registry username
    • your-password is your IBM Entitled Registry password
    • your-email is your IBM Entitled Registry email address
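
    You can verify that the Secret was created (this assumes the name ibm-entitlement-key used above):

    kubectl get secret ibm-entitlement-key
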
  2. Deploy in Kubernetes:

    To run the service in a Kubernetes cluster, ensure that you have the Kubernetes CLI (kubectl) installed on your local machine and that you are logged in to the cluster. Also ensure that the cluster can pull the container images from the IBM Entitled Registry using the ibm-entitlement-key Secret created in step 1.

    Below is an example YAML file you can use to deploy on your cluster. The runtime container exposes a gRPC endpoint on port 8085 and an HTTP REST gateway on port 8080:

    apiVersion: apps/v1 
    kind: Deployment 
    metadata: 
      name: watson-nlp-container 
    spec: 
      selector: 
        matchLabels: 
          app: watson-nlp-container 
      replicas: 1 
      template: 
        metadata: 
          labels: 
            app: watson-nlp-container 
        spec: 
          initContainers:
          - name: english-syntax-model
            image: cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_en_stock:1.4.1
            volumeMounts:
            - name: model-directory
              mountPath: "/app/models"
            env:
            - name: ACCEPT_LICENSE
              value: 'true'
          - name: english-tone-model
            image: cp.icr.io/cp/ai/watson-nlp_classification_ensemble-workflow_lang_en_tone-stock:1.4.1
            volumeMounts:
            - name: model-directory
              mountPath: "/app/models"
            env:
            - name: ACCEPT_LICENSE
              value: 'true'
          containers: 
          - name: watson-nlp-container 
            image: cp.icr.io/cp/ai/watson-nlp-runtime:1.1.36
            env:
            - name: ACCEPT_LICENSE
              value: "true"
            - name: LOCAL_MODELS_DIR
              value: "/app/models"
            resources: 
              requests: 
                memory: "4Gi" 
                cpu: "1000m" 
              limits: 
                memory: "8Gi" 
                cpu: "2000m"
            ports: 
            - containerPort: 8085 
            - containerPort: 8080 
            volumeMounts:
            - name: model-directory
              mountPath: "/app/models"
          imagePullSecrets:
          - name: ibm-entitlement-key
          volumes:
          - name: model-directory
            emptyDir: {}
    --- 
    apiVersion: v1 
    kind: Service 
    metadata: 
      name: watson-nlp-container 
    spec: 
      type: ClusterIP 
      selector: 
        app: watson-nlp-container 
      ports: 
      - port: 8085 
        protocol: TCP 
        targetPort: 8085
      - port: 8080 
        protocol: TCP 
        targetPort: 8080 
    
  3. Run on Kubernetes:

    Save the example manifest above to a file (here, Runtime/deployment/deployment.yaml) and apply it:

    kubectl apply -f Runtime/deployment/deployment.yaml 
    
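    Because each pod loads its models from its own initContainers, the Deployment scales horizontally without shared storage; for example, to run three replicas:

    kubectl scale deployment/watson-nlp-container --replicas=3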

    Check that the pod and service are running.

    kubectl get pods
    
    kubectl get svc
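
    The output will look similar to the following; the pod name suffix is generated by Kubernetes, and while the initContainers are still extracting models the pod STATUS shows Init:0/2:

    NAME                                     READY   STATUS    RESTARTS   AGE
    watson-nlp-container-<generated-suffix>  1/1     Running   0          2m

    NAME                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
    watson-nlp-container   ClusterIP   <cluster-ip>   <none>        8085/TCP,8080/TCP   2m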
    

Validating the runtime server

  1. Examine the boot log.
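
    Fetch the log from the runtime container, for example:

    kubectl logs deployment/watson-nlp-container -c watson-nlp-container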

    • Look for a Loading model . . . message for each of the models you wish to serve with the Runtime server, as in the following example:
    [STARTING RUNTIME]
    .
    .
    .
    {"channel": "MODEL-LOADER", "exception": null, "level": "info", "log_code": "<COM89711114I>", "message": "Loading model 'syntax_izumo_lang_en_stock'", "num_indent": 0, "thread_id": 140580835800896, "timestamp": "2023-04-28T15:38:45.047317"}
    .
    .
    .
    
    • Look for a Caikit Runtime is serving on port: . . . message at the end of the log file, as in the following example:
    .
    .
    .
    {"channel": "COMMON-SERVR", "exception": null, "level": "info", "log_code": "<COM10001001I>", "message": "Caikit Runtime is serving on port: 8085 with thread pool size: 5", "num_indent": 0, "thread_id": 140580835800896, "timestamp": "2023-04-28T15:38:51.622875"}
    [STARTING GATEWAY]
    2023/04/28 15:38:51 Running with INSECURE credentials
    2023/04/28 15:38:51 Serving proxy calls INSECURE
    
  2. Make a request to the running container:
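
    The Service is of type ClusterIP, so it is not directly reachable from outside the cluster. One way to test from your local machine is to forward the REST gateway port with kubectl port-forward:

    kubectl port-forward svc/watson-nlp-container 8080:8080

    With the port forward running, send a request to the REST endpoint: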

    curl -s \
      "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/SyntaxPredict" \
      -H "accept: application/json" \
      -H "content-type: application/json" \
      -H "grpc-metadata-mm-model-id: syntax_izumo_lang_en_stock" \
      -d '{ "raw_document": { "text": "This is a test sentence" }, "parsers": ["token"] }'
    

    The response is:

    {"text":"This is a test sentence",
      "producerId":{"name":"Izumo Text Processing","version":"0.0.1"},
        "tokens":[
            {"span":{"begin":0,"end":4,"text":"This"},"lemma":"this","partOfSpeech":"POS_PRON","dependency":{"relation":"DEP_NSUBJ","identifier":1,"head":2},"features":[]},
            {"span":{"begin":5,"end":7,"text":"is"},"lemma":"be","partOfSpeech":"POS_AUX","dependency":{"relation":"DEP_COP","identifier":3,"head":2},"features":[]},
            {"span":{"begin":8,"end":9,"text":"a"},"lemma":"a","partOfSpeech":"POS_DET","dependency":{"relation":"DEP_DET","identifier":4,"head":2},"features":[]},
            {"span":{"begin":10,"end":14,"text":"test"},"lemma":"test","partOfSpeech":"POS_NOUN","dependency":{"relation":"DEP_COMPOUND","identifier":5,"head":2},"features":[]},
            {"span":{"begin":15,"end":23,"text":"sentence"},"lemma":"sentence","partOfSpeech":"POS_NOUN","dependency":{"relation":"DEP_ROOT","identifier":2,"head":0},"features":[]}],
        "sentences":[{"span":{"begin":0,"end":23,"text":"This is a test sentence"}}],
        "paragraphs":[{"span":{"begin":0,"end":23,"text":"This is a test sentence"}}]}
    
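    The same prediction is also available over gRPC on port 8085; the grpc-metadata-mm-model-id header above corresponds to mm-model-id metadata on the gRPC call. As a sketch (this assumes the runtime exposes gRPC server reflection, so grpcurl can discover the service; otherwise you would need the proto files), forward the gRPC port and invoke the method named in the REST path above:

    kubectl port-forward svc/watson-nlp-container 8085:8085

    grpcurl -plaintext \
      -H 'mm-model-id: syntax_izumo_lang_en_stock' \
      -d '{ "raw_document": { "text": "This is a test sentence" }, "parsers": ["token"] }' \
      localhost:8085 watson.runtime.nlp.v1.NlpService/SyntaxPredict
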

To see a tutorial that takes you through the steps to build a standalone container image to serve Watson NLP models and run it on a Kubernetes or OpenShift cluster, check out Serve Models on Kubernetes or OpenShift using Standalone Containers on GitHub.

Once you have your runtime server working, see Accessing client libraries and tools to continue.