Creating custom models

Certain algorithms in Watson Natural Language Processing Library for Embed can be trained with your own data to create a custom model.

Training models with Watson Studio

For the current release of Watson Natural Language Processing Library for Embed, you can work with Python notebooks in Watson Studio to train Watson NLP models with your own data. Training is currently supported for:

Text classification

  • Classifying text with a custom classification model

    IMPORTANT: The Classifying text with a custom classification model documentation includes instructions on training classification blocks as well as classification workflows. The difference is that blocks require a syntax analysis result as input, while workflows take raw text. Because the runtime API only supports raw text input, the runtime container only supports inference for workflows.

    To work around this limitation, you can assemble a workflow from a syntax model and your trained block. For example, in the case of a CNN block model, you can combine a syntax model and the CNN block model into a workflow that can then run in the Watson Natural Language Processing Library for Embed runtime:

    import watson_nlp
    from watson_nlp.workflows.classification.base_classifier import GloveCNN
    
    # Load your trained CNN block and a stock syntax model
    my_cnn_block = watson_nlp.load("/path/to/your/cnn/block")
    syntax_model = watson_nlp.load('syntax_izumo_en_stock')
    
    # Combine both models into a workflow and save it for use in the runtime
    my_cnn_workflow = GloveCNN(syntax_model=syntax_model, cnn_model=my_cnn_block)
    my_cnn_workflow.save("/some/output/path")
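
    Once the workflow has been saved, you can load it and run it on raw text. The following is a minimal sketch; the output path and sample text are placeholders:

    # Load the saved workflow and classify raw text
    my_cnn_workflow = watson_nlp.load("/some/output/path")
    prediction = my_cnn_workflow.run("This is a sample text")
    print(prediction)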
    

Entity detection

Sentiment extraction

IMPORTANT: Once you have completed the steps in the links above, you must also use Watson Studio to load the Dictionary and/or Regular Expression models as RBRGeneric models. This allows the Dictionary and/or Regular Expression models to use the Watson Natural Language Processing for Embed Rules API.

Complete these steps in Watson Studio:

import os
import watson_nlp

# Create the feature extractor from a set of regular expressions
# (module_folder and regexes are assumed to be defined earlier in the notebook)
language = 'en'
custom_regex_block = watson_nlp.resources.feature_extractor.RBR.train(module_path=module_folder, language=language, regexes=regexes)

# Save the feature extractor
feature_extractor_model_path = '/some/path'
custom_regex_block.save(feature_extractor_model_path)

# The AQL model is saved in a file "executor.zip" in the provided path, in this case "/some/path/executor.zip"
model_path = os.path.join(feature_extractor_model_path, 'executor.zip')

# Load the feature extractor's underlying AQL model as an RBRGeneric block
block_model = watson_nlp.blocks.rules.RBRGeneric(watson_nlp.toolkit.rule_utils.RBRExecutor.load(model_path), language)

# Save the block model. This model can now be loaded in the Watson NLP for Embed Rules API.
block_model.save('/path/to/watson/nlp/for/embed/rule/model')
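
Before moving the saved block into the runtime, you can load it back and run it in the notebook as a quick check. The following is a minimal sketch; the path and sample text are placeholders:

# Load the saved rule-based block and run it on a sample text
rule_model = watson_nlp.load('/path/to/watson/nlp/for/embed/rule/model')
response = rule_model.run('This is a sample text')
print(response)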

AQL rules

AQL (Annotation Query Language) is the language for expressing NLP rules in Watson Natural Language Processing.

Watson Natural Language Processing Library for Embed supports the AQL rules defined in the open source Elyra-based NLP visual editor. This can be downloaded from GitHub at https://github.com/CODAIT/nlp-editor.

IMPORTANT: As with the Feature Extractors above, you need to export the AQL model and load it as an RBRGeneric model type in Watson Studio. This allows the AQL model to use the Watson Natural Language Processing for Embed Rules API.

You can load the model in the standard format with the help of the toolkit class RBRExecutor and then pass it to RBRGeneric. Note that because the AQL model in standard format does not have a config.yml file, you cannot load it with RBRGeneric directly; you must prepare it first, as follows:

# model_path is the path to the exported AQL model; language is the fixed language code, for example 'en'
model = watson_nlp.blocks.rules.RBRGeneric(watson_nlp.toolkit.rule_utils.RBRExecutor.load(model_path), language)
response = model.run('This is a sample text')

Note that the RBRGeneric block requires the language to be fixed when the model is loaded. This is the language that is used for all invocations of the run() method on that model. In contrast, the RBRExecutor toolkit is more flexible: it allows an instance of the model to be executed on texts in different languages.
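
For example, to apply the same rules to texts in more than one language, one option, sketched here under the assumption that the underlying AQL rules cover those languages, is to wrap a single RBRExecutor instance in one RBRGeneric block per language:

executor = watson_nlp.toolkit.rule_utils.RBRExecutor.load(model_path)

# Each RBRGeneric block is bound to one language when it is constructed
model_en = watson_nlp.blocks.rules.RBRGeneric(executor, 'en')
model_fr = watson_nlp.blocks.rules.RBRGeneric(executor, 'fr')

response_en = model_en.run('This is a sample text')
response_fr = model_fr.run('Ceci est un exemple de texte')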

Finally, once the model has been loaded with the help of RBRExecutor and wrapped in RBRGeneric, you can save it and then load it back directly as an RBRGeneric block:

model = watson_nlp.blocks.rules.RBRGeneric(watson_nlp.toolkit.rule_utils.RBRExecutor.load(model_path), language)

# Path to save the model
model_path = '/some/path'

# Save the model, in a format compatible with RBRGeneric
model.save(model_path)

# Load the model directly as RBRGeneric
model = watson_nlp.load(model_path)

# Run the model
response = model.run('This is a sample text')

Deploying custom models

Once you have trained a custom model in Watson Studio, you can package it into a container image with the model builder tool. You can then push the custom image to a container registry and serve it in the same way as the other models.

  1. Install the model builder tool

    This Python tool takes one or more models and packages them as container images.

    pip install watson-embed-model-packager
    
  2. Build the model container image

    Run the following commands for the packaging tool. The first performs a setup step that writes a manifest for the models in /models, and the second performs the build from that manifest.

    cd
    python3 -m watson_embed_model_packager setup \
        --library-version watson_nlp:3.6.0 \
        --local-model-dir /models \
        --output-csv model-manifest.csv
    
    python3 -m watson_embed_model_packager build --config model-manifest.csv
    

    This creates a new Docker image for your model. You can verify that the image was created with the following command:

    docker images
    
  3. Push the model image to the container registry

    In order to use the model images on, for example, an OpenShift cluster, you need to push those images to a container registry.

    Run the following commands to log in to the registry, then tag and push the images.

    echo $(oc whoami -t) | docker login $REGISTRY -u $(oc whoami) --password-stdin --tls-verify=false
    
    cd /models
    for m in $(ls)
    do 
      docker tag watson-nlp_${m}:latest ${REGISTRY}/${PROJECT}/watson-nlp_${m}:latest
      docker push ${REGISTRY}/${PROJECT}/watson-nlp_${m}:latest --tls-verify=false
    done
    

    You can view the images with the following command:

    oc get is
    
  4. Update manifest

    Apply a Kubernetes manifest such as the following to deploy Watson NLP with your custom model. The custom model image runs as an init container that places the model in a shared emptyDir volume; the runtime container then loads it from the directory set in LOCAL_MODELS_DIR.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: watson-nlp-runtime
    spec:
      selector:
        matchLabels:
          app: watson-nlp-runtime
      replicas: 1
      strategy: 
        type: Recreate
      template:
        metadata:
          labels:
            app: watson-nlp-runtime
        spec:
          initContainers:
          - name: my-newly-trained-model
            image: image-registry.openshift-image-registry.svc:5000/$PROJECT/watson-nlp_model_[MY_CUSTOM_MODEL]
            volumeMounts:
            - name: model-directory
              mountPath: \"/app/models\"
            env:
            - name: ACCEPT_LICENSE
              value: 'true'
            resources:
              requests:
                memory: \"100Mi\"
                cpu: \"100m\"
              limits:
                memory: \"200Mi\"
                cpu: \"200m\"
          containers:
          - name: watson-nlp-runtime
            image: cp.icr.io/cp/ai/watson-nlp-runtime:1.1.36
            env:
            - name: ACCEPT_LICENSE
              value: 'true'
            - name: LOCAL_MODELS_DIR
              value: "/app/models"
            - name: LOG_LEVEL
              value: debug
            resources:
              requests:
                memory: "4Gi"
                cpu: "2"
                ephemeral-storage: "2Gi"
              limits:
                memory: "6Gi"
                cpu: "3"
                ephemeral-storage: "2Gi"
            ports:
            - containerPort: 8080
            - containerPort: 8085
            livenessProbe:
              httpGet:
                path: /swagger/
                port: 8080
              initialDelaySeconds: 150
            volumeMounts:
            - name: model-directory
              mountPath: "/app/models"
          volumes:
          - name: model-directory
            emptyDir:
              sizeLimit: 2Gi
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: watson-nlp-runtime-service
    spec:
      type: ClusterIP
      selector:
        app: watson-nlp-runtime
      ports:
      - port: 8080
        name: http-rest-svc
        protocol: TCP
        targetPort: 8080
      - port: 8085
        name: http-grpc-svc
        protocol: TCP
        targetPort: 8085
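
    Once the manifest has been applied (for example with oc apply), you can check that the runtime is serving by port-forwarding the REST port and requesting the Swagger page that the liveness probe uses. The following is a minimal Python sketch, assuming a local port-forward such as oc port-forward svc/watson-nlp-runtime-service 8080:8080:

    import requests

    # The runtime's liveness probe targets /swagger/ on port 8080,
    # so a 200 response indicates the runtime container is up and serving.
    response = requests.get("http://localhost:8080/swagger/")
    print(response.status_code)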