Run with Kubernetes
A simple Deployment on Kubernetes offers an easy way to run a horizontally scalable, static set of NLP models alongside your existing Kubernetes workloads. The entire definition for the Watson NLP Runtime and a set of models to serve fits inside a single Kubernetes Deployment resource, and no other external dependencies are required. Through the use of `initContainers`, pretrained model images extract their model content into a volume shared with the runtime container, without requiring external storage.
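The pattern looks like the following minimal sketch. The container and image names here are illustrative placeholders, not the real image references; the complete manifest appears later in this section.

```yaml
# Sketch of the init-container pattern (placeholder names and images).
# Each model init container unpacks its bundled model files into an
# emptyDir volume; the runtime container mounts the same volume and
# loads every model it finds there at startup.
spec:
  initContainers:
    - name: some-model          # hypothetical model container
      image: <model-image>
      volumeMounts:
        - name: model-directory
          mountPath: "/app/models"
  containers:
    - name: runtime             # hypothetical runtime container
      image: <runtime-image>
      volumeMounts:
        - name: model-directory
          mountPath: "/app/models"
  volumes:
    - name: model-directory
      emptyDir: {}
```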

Deploying a static set of models to Kubernetes
- Access the container images from your cluster:
  To allow your Kubernetes cluster to access the container images, use the methods from the Kubernetes documentation to store your credentials as a Kubernetes Secret. For example, use the following command to create a Secret named `ibm-entitlement-key`:

  ```sh
  kubectl create secret docker-registry ibm-entitlement-key \
    --docker-server=cp.icr.io \
    --docker-username=<your-name> \
    --docker-password=<your-password> \
    --docker-email=<your-email>
  ```

  where:

  - the registry server (`--docker-server`) is `cp.icr.io`
  - `<your-name>` is your IBM Entitled Registry username
  - `<your-password>` is your IBM Entitled Registry password
  - `<your-email>` is your IBM Entitled Registry email address
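  To confirm that the Secret was created, a standard `kubectl` check is:

  ```sh
  kubectl get secret ibm-entitlement-key
  ```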
- Deploy in Kubernetes:

  To run the service in a Kubernetes cluster, ensure that you have the Kubernetes CLI (`kubectl`) installed on your local machine and that you are logged in to the cluster. Further, ensure that the container images are in a registry that is accessible from your Kubernetes cluster; the manifest below pulls them from the IBM Entitled Registry using the Secret created in the previous step.

  Below is an example YAML file that you can use to deploy on your cluster:
  ```yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: watson-nlp-container
  spec:
    selector:
      matchLabels:
        app: watson-nlp-container
    replicas: 1
    template:
      metadata:
        labels:
          app: watson-nlp-container
      spec:
        initContainers:
          - name: english-syntax-model
            image: cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_en_stock:1.4.1
            volumeMounts:
              - name: model-directory
                mountPath: "/app/models"
            env:
              - name: ACCEPT_LICENSE
                value: 'true'
          - name: english-tone-model
            image: cp.icr.io/cp/ai/watson-nlp_classification_ensemble-workflow_lang_en_tone-stock:1.4.1
            volumeMounts:
              - name: model-directory
                mountPath: "/app/models"
            env:
              - name: ACCEPT_LICENSE
                value: 'true'
        containers:
          - name: watson-nlp-container
            image: cp.icr.io/cp/ai/watson-nlp-runtime:1.1.36
            env:
              - name: ACCEPT_LICENSE
                value: "true"
              - name: LOCAL_MODELS_DIR
                value: "/app/models"
            resources:
              requests:
                memory: "4Gi"
                cpu: "1000m"
              limits:
                memory: "8Gi"
                cpu: "2000m"
            ports:
              - containerPort: 8085
              - containerPort: 8080
            volumeMounts:
              - name: model-directory
                mountPath: "/app/models"
        imagePullSecrets:
          - name: ibm-entitlement-key
        volumes:
          - name: model-directory
            emptyDir: {}
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: watson-nlp-container
  spec:
    type: ClusterIP
    selector:
      app: watson-nlp-container
    ports:
      - port: 8085
        protocol: TCP
        targetPort: 8085
      - port: 8080
        protocol: TCP
        targetPort: 8080
  ```
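  The next step assumes that this manifest is saved as `Runtime/deployment/deployment.yaml`; adjust the path if you save it elsewhere. As an optional sanity check before applying, a client-side dry run validates the manifest without creating any resources (assuming kubectl 1.18 or later, where `--dry-run=client` is available):

  ```sh
  kubectl apply --dry-run=client -f Runtime/deployment/deployment.yaml
  ```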
- Run on Kubernetes:

  Run the following command:

  ```sh
  kubectl apply -f Runtime/deployment/deployment.yaml
  ```

  Check that the pod and service are running:

  ```sh
  kubectl get pods
  kubectl get svc
  ```
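  The init containers must finish extracting model content before the runtime container starts, so the first startup can take a few minutes. One way to wait for the Deployment to become ready is to watch the rollout:

  ```sh
  kubectl rollout status deployment/watson-nlp-container
  ```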
Validating the runtime server
- Examine the boot log:
  - Look for a `Loading model ...` message for each of the models you wish to serve with the Runtime server, as in the following example:
    ```
    [STARTING RUNTIME]
    . . .
    {"channel": "MODEL-LOADER", "exception": null, "level": "info", "log_code": "<COM89711114I>", "message": "Loading model 'syntax_izumo_lang_en_stock'", "num_indent": 0, "thread_id": 140580835800896, "timestamp": "2023-04-28T15:38:45.047317"}
    . . .
    ```

  - Look for a `Caikit Runtime is serving on port: ...` message at the end of the log file, as in the following example:
    ```
    . . .
    {"channel": "COMMON-SERVR", "exception": null, "level": "info", "log_code": "<COM10001001I>", "message": "Caikit Runtime is serving on port: 8085 with thread pool size: 5", "num_indent": 0, "thread_id": 140580835800896, "timestamp": "2023-04-28T15:38:51.622875"}
    [STARTING GATEWAY]
    2023/04/28 15:38:51 Running with INSECURE credentials
    2023/04/28 15:38:51 Serving proxy calls INSECURE
    ```
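  One way to retrieve this boot log, using the Deployment name from the manifest above:

  ```sh
  kubectl logs deployment/watson-nlp-container
  ```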
- Make a request to the running container:
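  The Service defined above is of type ClusterIP, so it is not reachable from outside the cluster by default. If you are testing from your local machine, one option is to forward the Service's REST port first (the command below assumes the Service name from the manifest above):

  ```sh
  kubectl port-forward svc/watson-nlp-container 8080:8080
  ```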
  ```sh
  curl -s \
    "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/SyntaxPredict" \
    -H "accept: application/json" \
    -H "content-type: application/json" \
    -H "grpc-metadata-mm-model-id: syntax_izumo_lang_en_stock" \
    -d '{ "raw_document": { "text": "This is a test sentence" }, "parsers": ["token"] }'
  ```

  The response is:
{"text":"This is a test sentence", "producerId":{"name":"Izumo Text Processing","version":"0.0.1"}, "tokens":[ {"span":{"begin":0,"end":4,"text":"This"},"lemma":"this","partOfSpeech":"POS_PRON","dependency":{"relation":"DEP_NSUBJ","identifier":1,"head":2},"features":[]}, {"span":{"begin":5,"end":7,"text":"is"},"lemma":"be","partOfSpeech":"POS_AUX","dependency":{"relation":"DEP_COP","identifier":3,"head":2},"features":[]}, {"span":{"begin":8,"end":9,"text":"a"},"lemma":"a","partOfSpeech":"POS_DET","dependency":{"relation":"DEP_DET","identifier":4,"head":2},"features":[]}, {"span":{"begin":10,"end":14,"text":"test"},"lemma":"test","partOfSpeech":"POS_NOUN","dependency":{"relation":"DEP_COMPOUND","identifier":5,"head":2},"features":[]}, {"span":{"begin":15,"end":23,"text":"sentence"},"lemma":"sentence","partOfSpeech":"POS_NOUN","dependency":{"relation":"DEP_ROOT","identifier":2,"head":0},"features":[]}], "sentences":[{"span":{"begin":0,"end":23,"text":"This is a test sentence"}}], "paragraphs":[{"span":{"begin":0,"end":23,"text":"This is a test sentence"}}]}
For a tutorial that walks through building a standalone container image to serve Watson NLP models and running it on a Kubernetes or OpenShift cluster, see Serve Models on Kubernetes or OpenShift using Standalone Containers on GitHub.
Once you have your runtime server working, see Accessing client libraries and tools to continue.