Run with Kubernetes and Knative Serving

This topic walks you through the steps to serve pretrained Watson NLP models using Knative Serving in a Red Hat OpenShift cluster.

Create a Knative Service to run the Watson NLP Runtime. Pods of this Knative Service specify Watson NLP pretrained model images as init containers. These init containers run to completion before the main application container starts, provisioning the models to the pod's emptyDir volume. When the Watson NLP Runtime container starts, it loads the models and begins serving them.

This approach keeps the models in container images that are separate from the runtime container image. To change the set of served models, you only need to update the Knative Service manifest.

Prerequisites

Before you begin, you need:

  • A Red Hat OpenShift cluster with Knative Serving (OpenShift Serverless) installed

  • The oc command-line interface, logged in to your cluster and project

  • An image pull secret named watson-nlp with credentials for the cp.icr.io container registry

  • The jq utility, used to format JSON responses in the test step

Step 1. Configure Knative

Configure Knative to enable init containers and emptyDir volumes.

Save the config-features configuration map to a file in your current directory.

oc get configmap/config-features -n knative-serving -o yaml > config-features.yaml

Using your preferred editor, add the following two kubernetes.podspec lines to the data section; the apiVersion and data lines are shown only for context. Do not modify any other section or content.

apiVersion: v1
data:
  kubernetes.podspec-init-containers: "enabled"
  kubernetes.podspec-volumes-emptydir: "enabled"
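
Alternatively, the same two keys can be set in place with a single oc patch command. This is an equivalent sketch using standard oc patch syntax; if you use it, you can skip the apply step below.

oc patch configmap/config-features -n knative-serving --type merge \
  -p '{"data":{"kubernetes.podspec-init-containers":"enabled","kubernetes.podspec-volumes-emptydir":"enabled"}}'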

Now, apply the configuration.

oc apply -f config-features.yaml
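
Optionally, confirm that both feature flags are now set. The backslash-escaped dots are standard jsonpath syntax for keys that contain periods.

oc get configmap/config-features -n knative-serving \
  -o jsonpath='{.data.kubernetes\.podspec-init-containers} {.data.kubernetes\.podspec-volumes-emptydir}{"\n"}'

The expected output is: enabled enabled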

Step 2. Deploy the model service

Create a Knative Service to run the Watson NLP Runtime. When a Service is created, Knative does the following:

  • Creates a new immutable revision for this version of the application.

  • Creates a Route, Ingress, Service, and Load Balancer for your application.

  • Automatically scales replicas based on request load, including scaling to zero active replicas.

To create the Knative Service, save the following example manifest to a file, for example knative-service.yaml:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: watson-nlp-kn
spec:
  template:
    metadata:
      annotations:
        queue.sidecar.serving.knative.dev/resourcePercentage: "10"
    spec:
      initContainers:
      - name: ensemble-workflow-lang-en-tone-stock
        image: cp.icr.io/cp/ai/watson-nlp_classification_ensemble-workflow_lang_en_tone-stock:1.4.1
        volumeMounts:
        - name: model-directory
          mountPath: "/app/models"
        env:
        - name: ACCEPT_LICENSE
          value: 'true'
        resources:
          requests:
            memory: "100Mi"
            cpu: "100m"
          limits:
            memory: "200Mi"
            cpu: "200m"
      containers:
      - name: watson-nlp-runtime
        image: cp.icr.io/cp/ai/watson-nlp-runtime:1.1.36
        env:
        - name: ACCEPT_LICENSE
          value: 'true'
        - name: LOCAL_MODELS_DIR
          value: "/app/models"
        - name: LOG_LEVEL
          value: debug
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "4Gi"
            cpu: "2"
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: model-directory
          mountPath: "/app/models"
      imagePullSecrets:
      - name: watson-nlp
      volumes:
      - name: model-directory
        emptyDir: {}
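
Apply the manifest, using whatever file name you chose (knative-service.yaml in this example):

oc apply -f knative-service.yaml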

Verify that the service has been created:

oc get configuration  

You should see output similar to the following:

NAME            LATESTCREATED         LATESTREADY           READY   REASON
watson-nlp-kn   watson-nlp-kn-00001   watson-nlp-kn-00001   True

To check the revisions of this service:

oc get revisions 

Set the URL for the service in an environment variable.

export SERVICE_URL=$(oc get ksvc watson-nlp-kn -o jsonpath="{.status.url}")
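
To confirm that the variable is set, print it. The exact host depends on your cluster's ingress domain and typically has the form https://watson-nlp-kn-<project>.<cluster-domain>.

echo ${SERVICE_URL}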

Step 3. Test Knative autoscaling

With the parameters used when creating the service, Knative autoscales pods based on request load, including scaling to zero when there are no requests.
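
Scale-to-zero is Knative's default behavior. If you later need to bound scaling, Knative supports revision-template annotations such as autoscaling.knative.dev/min-scale and autoscaling.knative.dev/max-scale. The following snippet is an illustrative sketch only and is not part of this tutorial's manifest:

spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "5"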

Run the following command to list the pods in your OpenShift Project:

oc get pods

Pods belonging to the Knative service have the prefix watson-nlp-kn. Initially, there should be none; if you do see any, wait a minute or two and they will be terminated automatically.
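
When no pods from the service are running, the command prints a message like the following (the project name will vary):

No resources found in <project> namespace.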

Run the following command to trigger the Knative service to start up pods:

curl ${SERVICE_URL}

If the command does not return promptly while the pod starts up, use Ctrl+C to break out of it.

You can watch the pods being created in response to the request, and then later being terminated, using the following command:

oc get pods -w

The output will be similar to the following:

NAME                READY   STATUS     RESTARTS   AGE
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   0/2     Init:0/1   0          15s
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   0/2     PodInitializing   0          75s
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   1/2     Running           0          76s
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   2/2     Running           0          2m
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   2/2     Terminating       0          3m
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   1/2     Terminating       0          3m20s
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   1/2     Terminating       0          3m30s
watson-nlp-kn-00001-deployment-6f8b5d7494-cdvqb   0/2     Terminating       0          3m32s

Use Ctrl+C to break out of the command.

Step 4. Test the service

Make an inference request against the model using the REST interface by executing the following command.

curl -X POST "${SERVICE_URL}/v1/watson.runtime.nlp.v1/NlpService/ClassificationPredict" \
  -H "accept: application/json" \
  -H "grpc-metadata-mm-model-id: classification_ensemble-workflow_lang_en_tone-stock" \
  -H "content-type: application/json" \
  -d '{ "rawDocument": { "text": "Watson nlp is awesome! works in knative" }}' | jq

You will see output similar to the following:

{
  "classes": [
    {
      "className": "satisfied",
      "confidence": 0.6308287
    },
    {
      "className": "excited",
      "confidence": 0.5176963
    },
    {
      "className": "polite",
      "confidence": 0.3245624
    },
    {
      "className": "sympathetic",
      "confidence": 0.1331128
    },
    {
      "className": "sad",
      "confidence": 0.023583649
    },
    {
      "className": "frustrated",
      "confidence": 0.0158445
    },
    {
      "className": "impolite",
      "confidence": 0.0021891927
    }
  ],
  "producerId": {
    "name": "Voting based Ensemble",
    "version": "0.0.1"
  }
}

Other Resources

To see a tutorial that takes you through the steps to deploy a Watson NLP model to the Knative Serving sandbox environment on IBM Technology Zone (TechZone), check out Watson NLP - Serve Models with Kubernetes or OpenShift.