Detecting entities with a custom transformer model

If you don't have a fixed set of terms, or you cannot express the entities that you want to detect as regular expressions, you can build a custom transformer model. The model is based on the pretrained Slate IBM Foundation model.

Because the pretrained model is multilingual, a single custom model can cover several languages. You don't need a separate model for each language.

You need sufficient training data to achieve high quality: 2000 to 5000 examples per entity type. If you have GPUs available, use them for training.

Note:

Training transformer models is CPU and memory intensive. The predefined environments are not large enough to complete the training. Create a custom notebook environment with a larger amount of CPU and memory, and use that to run your notebook. If you have GPUs available, it's highly recommended to use them. See Creating your own environment template.

Input data format

The training data is a JSON array of objects. Each object represents one training instance and must have a text field and a mentions field. The text field holds the training sentence, and mentions is an array of JSON objects that give the text, type, and location of each mention. The location holds the begin and end character offsets of the mention within text, where end points one past the last character of the mention:

[
  {
    "text": str,
    "mentions": [{
      "location": {
        "begin": int,
        "end": int
      },
      "text": str,
      "type": str
    },...]
  },...
]

Example:

[
    {
        "id": 38863234,
        "text": "I'm moving to Colorado in a couple months.",
        "mentions": [
            {
                "text": "Colorado",
                "type": "Location",
                "location": {
                    "begin": 14,
                    "end": 22
                }
            },
            {
                "text": "couple months",
                "type": "Duration",
                "location": {
                    "begin": 28,
                    "end": 41
                }
            }
        ]
    }
]
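Because each mention carries both its text and its character offsets, the two can be checked against each other before training. The following is a minimal sketch, not part of the watson_nlp library: the function name check_training_records is hypothetical, and it assumes the records have already been loaded into Python lists and dictionaries (for example with json.load). It reports mentions whose offsets do not reproduce the mention text and counts mentions per entity type, which can be compared with the recommended amount of training data.

```python
from collections import Counter


def check_training_records(records):
    """Return (problems, counts) for records in the training data format.

    problems is a list of human-readable messages; counts maps each
    entity type to the number of mentions seen for it.
    """
    problems = []
    counts = Counter()
    for index, record in enumerate(records):
        text = record.get("text")
        mentions = record.get("mentions")
        if not isinstance(text, str) or not isinstance(mentions, list):
            problems.append(f"record {index}: missing 'text' or 'mentions'")
            continue
        for mention in mentions:
            begin = mention["location"]["begin"]
            end = mention["location"]["end"]
            # The slice text[begin:end] should reproduce the mention text.
            if text[begin:end] != mention["text"]:
                problems.append(
                    f"record {index}: span [{begin}, {end}) is "
                    f"{text[begin:end]!r}, expected {mention['text']!r}"
                )
            counts[mention["type"]] += 1
    return problems, counts


# The example record from this section, written as Python data.
sample = [
    {
        "id": 38863234,
        "text": "I'm moving to Colorado in a couple months.",
        "mentions": [
            {"text": "Colorado", "type": "Location",
             "location": {"begin": 14, "end": 22}},
            {"text": "couple months", "type": "Duration",
             "location": {"begin": 28, "end": 41}},
        ],
    }
]

problems, counts = check_training_records(sample)
print(problems)
print(dict(counts))
```

An empty problems list means that every mention's offsets match its text. Entity types with low counts are candidates for collecting more training data.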

Training your model

The transformer algorithm uses the pretrained Slate model, which is only available in Runtime 23.1.

To get the options available for configuring Transformer training, enter:

help(watson_nlp.workflows.entity_mentions.transformer.Transformer.train)

Sample code

import watson_nlp
from watson_nlp.toolkit.entity_mentions_utils.train_util import prepare_stream_of_train_records_from_JSON_collection

# load the syntax models for all languages to be supported
syntax_model = watson_nlp.load('syntax_izumo_en_stock')
syntax_models = [syntax_model]

# load the pretrained Slate model
pretrained_model_resource = watson_nlp.load('pretrained-model_slate.153m.distilled_many_transformer_multilingual_uncased')

# prepare the train and dev data
# entity_train_data is a directory with one or more JSON files in the input format specified above
train_data_stream = prepare_stream_of_train_records_from_JSON_collection('entity_train_data')
# this sample reuses the training directory for the dev data; point this at held-out data for a meaningful evaluation
dev_data_stream = prepare_stream_of_train_records_from_JSON_collection('entity_train_data')

# train a transformer workflow model
trained_workflow = watson_nlp.workflows.entity_mentions.transformer.Transformer.train(
    train_data_stream=train_data_stream,
    dev_data_stream=dev_data_stream,
    syntax_models=syntax_models,
    template_resource=pretrained_model_resource,
    num_train_epochs=3,
)
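The sample code reads the train and dev streams from the same directory. If you want the dev data to be held out from training, one option is to split the records into two directories first. The following is a minimal sketch that uses only the Python standard library; the function names split_records and write_split, the data.json file name, and the 10% default dev fraction are assumptions made for illustration, not part of watson_nlp.

```python
import json
import random
import tempfile
from pathlib import Path


def split_records(records, dev_fraction=0.1, seed=13):
    """Shuffle a copy of the records and return (train, dev) lists.

    At least one record always goes to dev, so pass two or more records.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    dev_size = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[dev_size:], shuffled[:dev_size]


def write_split(records, train_dir, dev_dir, dev_fraction=0.1, seed=13):
    """Write the train and dev portions as one JSON file per directory."""
    train, dev = split_records(records, dev_fraction, seed)
    for directory, part in ((train_dir, train), (dev_dir, dev)):
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
        (path / "data.json").write_text(
            json.dumps(part, indent=2), encoding="utf-8"
        )
    return len(train), len(dev)


# Demonstration with placeholder records in a temporary directory.
records = [{"text": f"sentence {i}", "mentions": []} for i in range(10)]
with tempfile.TemporaryDirectory() as tmp:
    n_train, n_dev = write_split(
        records, Path(tmp) / "train", Path(tmp) / "dev", dev_fraction=0.2
    )
print(n_train, n_dev)
```

The two resulting directories could then be passed to prepare_stream_of_train_records_from_JSON_collection in place of the single 'entity_train_data' directory, one for the train stream and one for the dev stream.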

Applying the model on new data

Apply the trained transformer workflow model to new data by using the run() method, as you would with any of the existing pretrained blocks.

Sample code

trained_workflow.run('Bruce is at Times Square')

Storing and loading the model

The custom transformer model can be stored like any other model, as described in Saving and loading custom models, by using ibm_watson_studio_lib.

To load the custom transformer model, extra steps are required:

  1. Ensure that you have an access token on the Access control page on the Manage tab of your project. Only project admins can create access tokens. The access token can have Viewer or Editor access permissions. Only editors can inject the token into a notebook.

  2. Add the project token to the notebook by clicking More > Insert project token from the notebook action bar and then run the cell.

    By running the inserted hidden code cell, a wslib object is created that you can use for functions in the ibm-watson-studio-lib library. For information on the available ibm-watson-studio-lib functions, see Using ibm-watson-studio-lib for Python.

  3. Download and extract the model to your local runtime environment:

    import zipfile
    model_zip = 'trained_workflow_file'
    model_folder = 'trained_workflow_folder'
    wslib.download_file('trained_workflow', file_name=model_zip)
    
    with zipfile.ZipFile(model_zip, 'r') as zip_ref:
        zip_ref.extractall(model_folder)
    
  4. Load the model from the extracted folder:

    trained_workflow = watson_nlp.load(model_folder)
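If the download in step 3 fails or produces an incomplete file, the load in step 4 can fail in a way that is harder to diagnose. The following is a minimal sketch of a defensive variant of the extraction step that uses only the Python standard library. The function name extract_model_archive is hypothetical; it checks that the downloaded file is a readable, non-empty zip archive before extracting it.

```python
import zipfile
from pathlib import Path


def extract_model_archive(model_zip, model_folder):
    """Extract model_zip into model_folder and return the folder as a Path.

    Raises ValueError if model_zip is missing, is not a zip archive,
    or contains no entries.
    """
    if not zipfile.is_zipfile(model_zip):
        raise ValueError(f"{model_zip} is not a readable zip archive")
    target = Path(model_folder)
    with zipfile.ZipFile(model_zip, "r") as zip_ref:
        if not zip_ref.namelist():
            raise ValueError(f"{model_zip} contains no files")
        target.mkdir(parents=True, exist_ok=True)
        zip_ref.extractall(target)
    return target
```

With this helper, the extraction in step 3 would become extract_model_archive(model_zip, model_folder), and the returned folder could be passed to watson_nlp.load as in step 4.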
    
