Run with Kubernetes

Running a simple Deployment on Kubernetes offers an easy way to deploy a horizontally scalable, static set of NLP models alongside your existing Kubernetes workloads. The entire definition for the Watson NLP Runtime and the set of models it serves fits inside a single Kubernetes Deployment resource, with no other external dependencies. Through the use of initContainers, the pretrained model images extract their model content into a volume shared with the runtime container, so no external storage is required.
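
In sketch form, the pattern looks like the following fragment (the complete, runnable manifest appears in step 2 below); the container names and mount path here match that manifest:

    initContainers:
    - name: english-syntax-model                # pretrained model image
      image: cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_en_stock:1.4.1
      volumeMounts:
      - name: model-directory                   # volume shared with the runtime container
        mountPath: "/app/models"
    containers:
    - name: watson-nlp-container                # runtime loads models from the shared volume
      image: cp.icr.io/cp/ai/watson-nlp-runtime:1.1.36
      env:
      - name: LOCAL_MODELS_DIR                  # tells the runtime where to find the models
        value: "/app/models"
      volumeMounts:
      - name: model-directory
        mountPath: "/app/models"
    volumes:
    - name: model-directory
      emptyDir: {}                              # pod-local scratch space; no external storage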

Figure: A simple Kubernetes deployment serving Watson NLP models

Deploying a static set of models to Kubernetes

  1. Access the container images from your cluster:

    To allow your Kubernetes cluster to access the container images, follow the Kubernetes documentation to store your IBM Entitled Registry credentials as a Kubernetes Secret. For example, the following command creates a Secret named ibm-entitlement-key:

    kubectl create secret docker-registry ibm-entitlement-key --docker-server=cp.icr.io --docker-username=<your-name> --docker-password=<your-password> --docker-email=<your-email>
    

    where:

    • cp.icr.io is the IBM Entitled Registry server
    • your-name is your IBM Entitled Registry username
    • your-password is your IBM Entitled Registry password
    • your-email is your IBM Entitled Registry email address
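
    You can verify that the Secret was created (this assumes the name ibm-entitlement-key used above):

    kubectl get secret ibm-entitlement-key
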
  2. Deploy in Kubernetes:

    To run the service in a Kubernetes cluster, ensure that you have the Kubernetes CLI (kubectl) installed on your local machine and that you are logged in to the cluster. Also ensure that the cluster can pull the container images from the IBM Entitled Registry using the ibm-entitlement-key Secret created in step 1.

    Below is an example YAML file you can use to deploy on your cluster. The runtime container exposes a gRPC endpoint on port 8085 and an HTTP REST gateway on port 8080:

    apiVersion: apps/v1 
    kind: Deployment 
    metadata: 
      name: watson-nlp-container 
    spec: 
      selector: 
        matchLabels: 
          app: watson-nlp-container 
      replicas: 1 
      template: 
        metadata: 
          labels: 
            app: watson-nlp-container 
        spec: 
          initContainers:
          - name: english-syntax-model
            image: cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_en_stock:1.4.1
            volumeMounts:
            - name: model-directory
              mountPath: "/app/models"
            env:
            - name: ACCEPT_LICENSE
              value: 'true'
          - name: english-tone-model
            image: cp.icr.io/cp/ai/watson-nlp_classification_ensemble-workflow_lang_en_tone-stock:1.4.1
            volumeMounts:
            - name: model-directory
              mountPath: "/app/models"
            env:
            - name: ACCEPT_LICENSE
              value: 'true'
          containers: 
          - name: watson-nlp-container 
            image: cp.icr.io/cp/ai/watson-nlp-runtime:1.1.36
            env:
            - name: ACCEPT_LICENSE
              value: "true"
            - name: LOCAL_MODELS_DIR
              value: "/app/models"
            resources: 
              requests: 
                memory: "4Gi" 
                cpu: "1000m" 
              limits: 
                memory: "8Gi" 
                cpu: "2000m"
            ports: 
            - containerPort: 8085 
            - containerPort: 8080 
            volumeMounts:
            - name: model-directory
              mountPath: "/app/models"
          imagePullSecrets:
          - name: ibm-entitlement-key
          volumes:
          - name: model-directory
            emptyDir: {}
    --- 
    apiVersion: v1 
    kind: Service 
    metadata: 
      name: watson-nlp-container 
    spec: 
      type: ClusterIP 
      selector: 
        app: watson-nlp-container 
      ports: 
      - port: 8085 
        protocol: TCP 
        targetPort: 8085
      - port: 8080 
        protocol: TCP 
        targetPort: 8080 
    
  3. Run on Kubernetes:

    Save the example manifest above to a file (here, Runtime/deployment/deployment.yaml) and apply it:

    kubectl apply -f Runtime/deployment/deployment.yaml 
    
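    Because each pod loads its models from its own initContainers, the Deployment scales horizontally without shared storage; for example, to run three replicas:

    kubectl scale deployment/watson-nlp-container --replicas=3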

    Check that the pod and service are running.

    kubectl get pods
    
    kubectl get svc
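
    The output will look similar to the following; the pod name suffix is generated by Kubernetes, and while the initContainers are still extracting models the pod STATUS shows Init:0/2:

    NAME                                     READY   STATUS    RESTARTS   AGE
    watson-nlp-container-<generated-suffix>  1/1     Running   0          2m

    NAME                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
    watson-nlp-container   ClusterIP   <cluster-ip>   <none>        8085/TCP,8080/TCP   2m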
    

Validating the runtime server

  1. Examine the boot log.
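
    Fetch the log from the runtime container, for example:

    kubectl logs deployment/watson-nlp-container -c watson-nlp-container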

    • Look for a Loading model . . . message for each of the models you wish to serve with the Runtime server, as in the following example:
    [STARTING RUNTIME]
    .
    .
    .
    {"channel": "MODEL-LOADER", "exception": null, "level": "info", "log_code": "<COM89711114I>", "message": "Loading model 'syntax_izumo_lang_en_stock'", "num_indent": 0, "thread_id": 140580835800896, "timestamp": "2023-04-28T15:38:45.047317"}
    .
    .
    .
    
    • Look for a Caikit Runtime is serving on port: . . . message at the end of the log file, as in the following example:
    .
    .
    .
    {"channel": "COMMON-SERVR", "exception": null, "level": "info", "log_code": "<COM10001001I>", "message": "Caikit Runtime is serving on port: 8085 with thread pool size: 5", "num_indent": 0, "thread_id": 140580835800896, "timestamp": "2023-04-28T15:38:51.622875"}
    [STARTING GATEWAY]
    2023/04/28 15:38:51 Running with INSECURE credentials
    2023/04/28 15:38:51 Serving proxy calls INSECURE
    
  2. Make a request to the running container:
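
    The Service is of type ClusterIP, so it is not directly reachable from outside the cluster. One way to test from your local machine is to forward the REST gateway port with kubectl port-forward:

    kubectl port-forward svc/watson-nlp-container 8080:8080

    With the port forward running, send a request to the REST endpoint: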

    curl -s \
      "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/SyntaxPredict" \
      -H "accept: application/json" \
      -H "content-type: application/json" \
      -H "grpc-metadata-mm-model-id: syntax_izumo_lang_en_stock" \
      -d '{ "raw_document": { "text": "This is a test sentence" }, "parsers": ["token"] }'
    

    The response is:

    {"text":"This is a test sentence",
      "producerId":{"name":"Izumo Text Processing","version":"0.0.1"},
        "tokens":[
            {"span":{"begin":0,"end":4,"text":"This"},"lemma":"this","partOfSpeech":"POS_PRON","dependency":{"relation":"DEP_NSUBJ","identifier":1,"head":2},"features":[]},
            {"span":{"begin":5,"end":7,"text":"is"},"lemma":"be","partOfSpeech":"POS_AUX","dependency":{"relation":"DEP_COP","identifier":3,"head":2},"features":[]},
            {"span":{"begin":8,"end":9,"text":"a"},"lemma":"a","partOfSpeech":"POS_DET","dependency":{"relation":"DEP_DET","identifier":4,"head":2},"features":[]},
            {"span":{"begin":10,"end":14,"text":"test"},"lemma":"test","partOfSpeech":"POS_NOUN","dependency":{"relation":"DEP_COMPOUND","identifier":5,"head":2},"features":[]},
            {"span":{"begin":15,"end":23,"text":"sentence"},"lemma":"sentence","partOfSpeech":"POS_NOUN","dependency":{"relation":"DEP_ROOT","identifier":2,"head":0},"features":[]}],
        "sentences":[{"span":{"begin":0,"end":23,"text":"This is a test sentence"}}],
        "paragraphs":[{"span":{"begin":0,"end":23,"text":"This is a test sentence"}}]}
    
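    The same prediction is also available over gRPC on port 8085; the grpc-metadata-mm-model-id header above corresponds to mm-model-id metadata on the gRPC call. As a sketch (this assumes the runtime exposes gRPC server reflection, so grpcurl can discover the service; otherwise you would need the proto files), forward the gRPC port and invoke the method named in the REST path above:

    kubectl port-forward svc/watson-nlp-container 8085:8085

    grpcurl -plaintext \
      -H 'mm-model-id: syntax_izumo_lang_en_stock' \
      -d '{ "raw_document": { "text": "This is a test sentence" }, "parsers": ["token"] }' \
      localhost:8085 watson.runtime.nlp.v1.NlpService/SyntaxPredict
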

To see a tutorial that takes you through the steps to build a standalone container image to serve Watson NLP models and run it on a Kubernetes or OpenShift cluster, check out Serve Models on Kubernetes or OpenShift using Standalone Containers on GitHub.

Once you have your runtime server working, see Accessing client libraries and tools to continue.