Categories

At a glance

The Categories task assigns an input document to individual nodes within a hierarchical taxonomy. For example, given the text:

IBM announces new advances in quantum computing.

the extracted categories include "technology and computing/hardware/computer" and "technology and computing/operating systems", which are nodes at level 3 and level 2 of the hierarchical taxonomy, respectively.
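To make the level numbering concrete, the following minimal Python sketch treats a category as a slash-separated path and counts its segments. The helper name category_level is hypothetical and is not part of the Watson NLP API; it only illustrates how the levels quoted above can be read off a path.

```python
def category_level(path: str) -> int:
    """Return the depth of a slash-separated taxonomy path (top level = 1)."""
    return len([segment for segment in path.split("/") if segment])


examples = [
    "technology and computing/hardware/computer",
    "technology and computing/operating systems",
]

# Map each example path to its level in the taxonomy
levels = {path: category_level(path) for path in examples}

for path, level in levels.items():
    print(f"level {level}: {path}")
```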

This model differs from the classification model in that training starts from a set of seed phrases associated with each node in the taxonomy, and does not require labeled documents.
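The idea of classifying without labeled documents can be illustrated with a toy sketch. It is not the ESA algorithm used by this model: it replaces the Wikipedia-based concept space with plain word counts, and the seed phrases and node names below are invented for illustration. It only shows the general pattern of scoring a document against seed phrases attached to taxonomy nodes.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical seed phrases per taxonomy node (assumption, not the real model data)
SEEDS = {
    "technology and computing/hardware/computer": "quantum computing processor chip hardware",
    "food and drink/desserts and baking": "cake bake oven dessert pastry",
}


def rank_nodes(document: str, seeds: dict = SEEDS) -> list:
    """Rank taxonomy nodes by similarity between the document and each node's seeds."""
    doc_vec = bag_of_words(document)
    scores = {node: cosine(doc_vec, bag_of_words(phrases)) for node, phrases in seeds.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_nodes("IBM announces new advances in quantum computing.")
for node, score in ranking:
    print(f"{score:.3f}  {node}")
```

No labeled training documents appear anywhere in the sketch; the only supervision is the seed text attached to each node.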

Class definition
watson_nlp.workflows.categories.esa_hierarchical.ESAHierarchical

For language support, see Supported languages.

The implementation is motivated by Hierarchical Dataless Classification by Song & Roth (AAAI 2014). For implementation details, see algorithm details.

Pretrained models

Model names are listed below.

Model ID Container Image
categories_esa-workflow_lang_en_stock cp.icr.io/cp/ai/watson-nlp_categories_esa-workflow_lang_en_stock:1.4.1

The models have been tested on data from news reports and general web pages.

For details of the Categories type system, see Understanding model type systems.

Running models

The Categories model request accepts the following fields:

Field Type Required/Optional/Repeated Description
raw_document watson_core_data_model.nlp.RawDocument required The input document on which to perform Categories predictions
explanation bool optional Boolean flag indicating whether or not explanations should be computed and returned
limit int32 optional The maximum number of predicted categories. Defaults to 3 if not specified
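As a sketch of how the fields in the table map onto a JSON request body, the following Python snippet builds a payload and leaves out optional fields that were not supplied, so that the service-side defaults apply. The helper name build_categories_request is hypothetical; only the field names raw_document, explanation, and limit come from the table above.

```python
import json


def build_categories_request(text, explanation=None, limit=None):
    """Build a Categories request body; optional fields are omitted when not set."""
    body = {"raw_document": {"text": text}}
    if explanation is not None:
        body["explanation"] = explanation
    if limit is not None:
        body["limit"] = limit
    return body


# Only the required field plus limit; explanation is left to the service default
payload = build_categories_request(
    "IBM announces new advances in quantum computing.", limit=2
)
print(json.dumps(payload))
```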

Example requests

REST API

curl -s \
  "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/CategoriesPredict" \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -H "Grpc-Metadata-mm-model-id: categories_esa-workflow_lang_en_stock" \
  -d '{ "raw_document": { "text": "A solicitor from Loughborough is delighted when she gets the chance to take part in the final of Bake Off. However, her chances are scuppered when she finds out her arch rival is also going to compete. Unexpectedly, the solicitor is bitten by a zombie and therefore is disqualified from competing." }, "explanation": true, "limit": 2 }'

Response

{
  "categories": [
    {
      "labels": [
        "food & drink",
        "desserts and baking"
      ],
      "score": 0.74634,
      "explanation": [
        {"text": "bake"}
      ]
    },
    {
      "labels": [
        "sports",
        "field hockey"
      ],
      "score": 0.599151,
      "explanation": [
        {"text": "chances"}
      ]
    }
  ],
  "producerId": {
    "name": "Categories ESA Workflow",
    "version": "1.0.0"
  }
}
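A client will typically want to reduce this response to a few fields per category. The following Python sketch parses the JSON shown above and collects the label path, score, and explanation text for each category. Joining the labels with "/" assumes that the labels list is ordered from the top-level node downward, which the sample suggests but this page does not state.

```python
import json

# The sample REST response from above, pasted as a string
response_text = """
{"categories":[
  {"labels":["food & drink","desserts and baking"],
   "score":0.74634,
   "explanation":[{"text":"bake"}]},
  {"labels":["sports","field hockey"],
   "score":0.599151,
   "explanation":[{"text":"chances"}]}
  ],
 "producerId":{"name":"Categories ESA Workflow","version":"1.0.0"}}
"""

response = json.loads(response_text)

# One (path, score, explanation texts) tuple per predicted category.
# "explanation" is read with .get() because it is only requested optionally.
summary = [
    (
        "/".join(category["labels"]),
        category["score"],
        [item["text"] for item in category.get("explanation", [])],
    )
    for category in response["categories"]
]

for path, score, reasons in summary:
    print(f"{score:.3f}  {path}  {reasons}")
```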

Python

import grpc

from watson_nlp_runtime_client import (
    common_service_pb2,
    common_service_pb2_grpc,
    syntax_types_pb2,
)

# Open an insecure gRPC channel to the locally running runtime
channel = grpc.insecure_channel("localhost:8085")

stub = common_service_pb2_grpc.NlpServiceStub(channel)

# Build the request; explanation and limit are optional fields
request = common_service_pb2.CategoriesRequest(
    raw_document=syntax_types_pb2.RawDocument(text="A solicitor from Loughborough is delighted when she gets the chance to take part in the final of Bake Off. However, her chances are scuppered when she finds out her arch rival is also going to compete. Unexpectedly, the solicitor is bitten by a zombie and therefore is disqualified from competing."),
    explanation=True,
    limit=2,
)

# The model ID is passed as gRPC call metadata
response = stub.CategoriesPredict(
    request, metadata=[("mm-model-id", "categories_esa-workflow_lang_en_stock")]
)

print(response)

Response

categories {
  labels: "food & drink"
  labels: "desserts and baking"
  score: 0.74634
  explanation {
    text: "bake"
  }
}
categories {
  labels: "sports"
  labels: "field hockey"
  score: 0.599151
  explanation {
    text: "chances"
  }
}
producer_id {
  name: "Categories ESA Workflow"
  version: "1.0.0"
}

When to use Categorization instead of Classification?

Categorization models work well when documents can be mapped to taxonomy nodes based on the general-knowledge topics they discuss, and when those topics are well represented in Wikipedia. Categorization is not expected to work well in use cases that are highly domain-specific (with little or no representation in Wikipedia) or that require interpretation beyond general topics. For example, tasks such as sentiment or emotion classification are not a good match for Categorization models.