Classification

At a glance

The classification task encapsulates algorithms for document classification: classifying the input text into one or more of a pre-determined set of labels.

The task offers implementations of strong classification algorithms from three families: classic ML, deep learning, and transformers. It supports multi-label and multi-class problems, along with their respective special cases: single-label multi-class tasks and binary classification tasks.

Class definitions
`watson_nlp.blocks.classification.bert.BERT`
`watson_nlp.blocks.classification.transformer.Transformer`
`watson_nlp.workflows.classification.ensemble.Ensemble`

BERT is a transformer-based architecture built for multi-class and multi-label text classification on short texts. It uses Multilingual BERT pretrained models.

Transformer is a transformer-based architecture built for multi-class and multi-label text classification on short texts. It uses BERT and RoBERTa pretrained models.

Ensemble is a weighted ensemble of SVM and CNN algorithms; it computes the weighted mean of a set of classification predictions using their confidence scores. SVM is a support vector machine classifier that can be trained on feature vectors produced by any embedding or vectorization task, e.g., USE embeddings or TF-IDF vectors. SVM supports multi-class and multi-label text classification and produces confidence scores via Platt scaling. CNN is a simple convolutional network architecture built for multi-class and multi-label text classification on short texts. CNN uses GloVe embeddings.
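
As an illustration of the weighted-mean idea described for Ensemble, the following is a minimal sketch in plain Python. It is not the Watson NLP implementation: the function name `weighted_mean_ensemble`, the example scores, and the weights are all hypothetical, and the sketch assumes that a model which did not score a label contributes 0.0 for it.

```python
from typing import Dict, List


def weighted_mean_ensemble(
    predictions: List[Dict[str, float]], weights: List[float]
) -> Dict[str, float]:
    """Combine per-model confidence scores into one weighted mean per label."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Collect every label that any model scored.
    labels = sorted({label for scores in predictions for label in scores})

    # Assumption: a model that did not score a label contributes 0.0 for it.
    return {
        label: sum(w * scores.get(label, 0.0) for scores, w in zip(predictions, weights)) / total
        for label in labels
    }


# Hypothetical scores from two base models for the same input text.
svm_scores = {"frustrated": 0.80, "sad": 0.20}
cnn_scores = {"frustrated": 0.60, "sad": 0.40}

# Hypothetical weights: the second model counts three times as much as the first.
combined = weighted_mean_ensemble([svm_scores, cnn_scores], weights=[1.0, 3.0])
print(combined)
```

With these made-up numbers, the combined score for each label sits closer to the second model's score because that model carries the larger weight.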

Pretrained models

Model names are listed below. For language support, see Supported languages.

Model ID	Container Image
ensemble-workflow
classification_ensemble-workflow_lang_en_tone-stock	cp.icr.io/cp/ai/watson-nlp_classification_ensemble-workflow_lang_en_tone-stock:1.4.1
classification_ensemble-workflow_lang_fr_tone-stock	cp.icr.io/cp/ai/watson-nlp_classification_ensemble-workflow_lang_fr_tone-stock:1.4.1
transformer
classification_transformer_lang_multilingual_slate.153m.distilled.tone	cp.icr.io/cp/ai/watson-nlp_classification_transformer_lang_multilingual_slate.153m.distilled.tone:1.4.1
classification_transformer_lang_multilingual_slate.153m.distilled.tone-cpu	cp.icr.io/cp/ai/watson-nlp_classification_transformer_lang_multilingual_slate.153m.distilled.tone-cpu:1.4.1

The models have been tested on data from news reports and general web pages.

Running models

The Classification model request accepts the following fields:

Field	Type	Required / Optional / Repeated	Description
`raw_document`	`watson_core_data_model.nlp.RawDocument`	required	The input document on which to perform Classification predictions

Example requests

REST API

curl -s \
  "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/ClassificationPredict" \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -H "Grpc-Metadata-mm-model-id: classification_ensemble-workflow_lang_en_tone-stock" \
  -d '{ "raw_document": { "text": "I hate school. School is bad." } }'

Response

{"classes":[
  {"className":"frustrated", "confidence":0.74309075},
  {"className":"sad", "confidence":0.20021306},
  {"className":"impolite", "confidence":0.07343281},
  {"className":"excited", "confidence":0.029446114},
  {"className":"sympathetic", "confidence":0.02796789},
  {"className":"polite", "confidence":0.016257437},
  {"className":"satisfied", "confidence":0.01131451}],
 "producerId":{
  "name":"Voting based Ensemble",
  "version":"0.0.1"
  }
}
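
The response carries one confidence score per class, so a client has to decide how to turn the scores into labels. The sketch below shows two common readings using only the Python standard library: taking the single highest-scoring class (multi-class) and keeping every class at or above a cutoff (multi-label). The helper names `top_label` and `labels_above` and the cutoff of 0.15 are illustrative assumptions, not part of the Watson NLP API.

```python
import json

# Abbreviated copy of the JSON response shown above.
response_text = """
{"classes":[
  {"className":"frustrated", "confidence":0.74309075},
  {"className":"sad", "confidence":0.20021306},
  {"className":"impolite", "confidence":0.07343281}],
 "producerId":{"name":"Voting based Ensemble", "version":"0.0.1"}}
"""


def top_label(response: dict) -> str:
    """Multi-class reading: return the class with the highest confidence."""
    best = max(response["classes"], key=lambda c: c["confidence"])
    return best["className"]


def labels_above(response: dict, threshold: float) -> list:
    """Multi-label reading: return every class scoring at or above the threshold."""
    return [c["className"] for c in response["classes"] if c["confidence"] >= threshold]


response = json.loads(response_text)
print(top_label(response))           # frustrated
print(labels_above(response, 0.15))  # ['frustrated', 'sad']
```

A suitable cutoff depends on the model and the application, so the value used here should be tuned on held-out data rather than copied.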

Python

import grpc

from watson_nlp_runtime_client import (
    common_service_pb2,
    common_service_pb2_grpc,
    syntax_types_pb2,
)

channel = grpc.insecure_channel("localhost:8085")

stub = common_service_pb2_grpc.NlpServiceStub(channel)

request = common_service_pb2.ClassificationRequest(
    raw_document=syntax_types_pb2.RawDocument(text="I hate school. School is bad."),
)

response = stub.ClassificationPredict(
    request, metadata=[("mm-model-id", "classification_ensemble-workflow_lang_en_tone-stock")]
)

print(response)

Response

classes {
  class_name: "frustrated"
  confidence: 0.743090749
}
classes {
  class_name: "sad"
  confidence: 0.20021306
}
classes {
  class_name: "impolite"
  confidence: 0.0734328106
}
classes {
  class_name: "excited"
  confidence: 0.0294461139
}
classes {
  class_name: "sympathetic"
  confidence: 0.0279678907
}
classes {
  class_name: "polite"
  confidence: 0.0162574369
}
classes {
  class_name: "satisfied"
  confidence: 0.0113145104
}
producer_id {
  name: "Voting based Ensemble"
  version: "0.0.1"
}