Extracting targets sentiment with a custom transformer model

You can train your own models for targets sentiment extraction based on the Slate IBM Foundation model. This pretrained model can be fine-tuned for your use case by training it on your specific input data.

Note: Training transformer models is CPU and memory intensive. Depending on the size of your training data, the environment might not be large enough to complete the training. If you run into issues with the notebook kernel during training, create a custom notebook environment with a larger amount of CPU and memory, and use that to run your notebook. Use a GPU-based environment for both training and inference, if one is available to you. See Creating your own environment template.

Input data format for training

You must provide a training data set and a development data set to the training function. The development data set is usually around 10% of the size of the training data set. Each training or development sample is represented as a JSON object that must have a text field and a target_mentions field. The text field contains the training example text. The target_mentions field is an array that contains one entry for each target mention, with its text, location, and sentiment.

Consider using Watson Knowledge Studio to enable your domain subject matter experts to easily annotate text and create training data.

The following is an example of an array with sample training data:

[
  {
    "text": "Those waiters stare at you your entire meal, just waiting for you to put your fork down and they snatch the plate away in a second.",
    "target_mentions": [
      {
        "text": "waiters",
        "location": {
          "begin": 6,
          "end": 13
        },
        "sentiment": "negative"
      }
    ]
  }
]
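Offsets that do not line up with the text are an easy annotation mistake to make. The following is a minimal sketch of a sanity check, assuming (as the sample above suggests) that begin and end are character offsets into text, with end being exclusive. The function name find_offset_errors is hypothetical and is not part of watson_nlp.

```python
def find_offset_errors(samples):
    """Return (sample_index, mention_text, begin, end) for every mention
    whose location does not select exactly the mention text.

    Assumption: begin/end are character offsets, end exclusive.
    """
    errors = []
    for index, sample in enumerate(samples):
        text = sample["text"]
        for mention in sample["target_mentions"]:
            begin = mention["location"]["begin"]
            end = mention["location"]["end"]
            if text[begin:end] != mention["text"]:
                errors.append((index, mention["text"], begin, end))
    return errors


# Shortened version of the sample above, used only for illustration
samples = [
    {
        "text": "Those waiters stare at you your entire meal.",
        "target_mentions": [
            {
                "text": "waiters",
                "location": {"begin": 6, "end": 13},
                "sentiment": "negative",
            }
        ],
    }
]

offset_errors = find_offset_errors(samples)
```

An empty result means every mention text matches its location. Any returned tuple points at a sample that should be corrected before training.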

The training and development data sets are created as data streams from arrays of JSON objects. To create the data streams, you may use the utility method read_json_to_stream. It requires the syntax analysis model for the language of your input data.

Sample code:

import watson_nlp
from watson_nlp.toolkit.targeted_sentiment.training_data_reader import read_json_to_stream

training_data_file = 'train_data.json'
dev_data_file = 'dev_data.json'

# Load the syntax analysis model for the language of your input data
syntax_model = watson_nlp.load('syntax_izumo_en_stock')

# Prepare train and dev data streams
train_stream = read_json_to_stream(json_path=training_data_file, syntax_model=syntax_model)
dev_stream = read_json_to_stream(json_path=dev_data_file, syntax_model=syntax_model)
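The two JSON files read above must exist before the streams are created. If you start from a single list of annotated samples, one possible way to produce them is sketched below, using only the Python standard library. The helper names split_train_dev and write_json_array, the 10% default, and the fixed seed are illustrative assumptions, not part of watson_nlp.

```python
import json
import random


def split_train_dev(samples, dev_fraction=0.1, seed=0):
    """Shuffle the samples and split them into (train, dev) lists.

    Assumption: a random split is acceptable for your data. At least one
    sample goes to the development set when the input is not empty.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    dev_size = max(1, round(len(shuffled) * dev_fraction)) if shuffled else 0
    return shuffled[dev_size:], shuffled[:dev_size]


def write_json_array(samples, path):
    """Write the samples to a file as one JSON array."""
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(samples, json_file, ensure_ascii=False, indent=2)


# Example usage (all_samples is a list of sample objects in the format shown earlier):
# train_samples, dev_samples = split_train_dev(all_samples)
# write_json_array(train_samples, 'train_data.json')
# write_json_array(dev_samples, 'dev_data.json')
```

A fixed seed keeps the split reproducible between notebook runs.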

Loading the pretrained model resources

The pretrained Slate IBM Foundation model needs to be loaded before passing it to the training algorithm.

For a list of available Slate models, see this table:

List of available Slate models and their descriptions

Model                                                                        Description
pretrained-model_slate.153m.distilled_many_transformer_multilingual_uncased  Generic, multi-purpose model
pretrained-model_slate.125m.finance_many_transformer_en_cased                Model pretrained on finance content
pretrained-model_slate.110m.cybersecurity_many_transformer_en_uncased        Model pretrained on cybersecurity content
pretrained-model_slate.125m.biomedical_many_transformer_en_cased             Model pretrained on biomedical content

To load the model:

# Load the pretrained Slate IBM Foundation model
pretrained_model_resource = watson_nlp.load('<pretrained Slate model>')
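If a notebook is reused across projects, it can help to pick the model name by content domain instead of pasting it by hand. The following is a small sketch; the SLATE_MODELS dictionary, its domain keys, and the slate_model_name helper are hypothetical names introduced here, and only the model names themselves come from the table above.

```python
# Model names copied from the table of available Slate models.
# The domain keys are labels chosen for this sketch.
SLATE_MODELS = {
    "generic": "pretrained-model_slate.153m.distilled_many_transformer_multilingual_uncased",
    "finance": "pretrained-model_slate.125m.finance_many_transformer_en_cased",
    "cybersecurity": "pretrained-model_slate.110m.cybersecurity_many_transformer_en_uncased",
    "biomedical": "pretrained-model_slate.125m.biomedical_many_transformer_en_cased",
}


def slate_model_name(domain="generic"):
    """Return the Slate model name for a domain, or raise ValueError."""
    try:
        return SLATE_MODELS[domain]
    except KeyError:
        known = ", ".join(sorted(SLATE_MODELS))
        raise ValueError(f"Unknown domain {domain!r}; expected one of: {known}")


# Example usage:
# pretrained_model_resource = watson_nlp.load(slate_model_name("finance"))
```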

Training the model

For all options that are available for configuring sentiment transformer training, enter:

help(watson_nlp.blocks.targeted_sentiment.SequenceTransformerTSA.train)

The train method will create a new targets sentiment block model.

The following is a sample call that uses the input data and pretrained model from the previous sections (Input data format for training and Loading the pretrained model resources):

# Train the model
custom_tsa_model = watson_nlp.blocks.targeted_sentiment.SequenceTransformerTSA.train(
    train_stream,
    dev_stream,
    pretrained_model_resource,
    num_train_epochs=5
)

Applying the model on new data

After you train the model on a data set, apply the model to new data by using the run() method, as you would with any of the existing pretrained blocks. Because the custom model is a block model, you need to run syntax analysis on the input text first and pass the results to the run() method.

Sample code:

input_text = 'new input text'

# Run syntax analysis first
syntax_model = watson_nlp.load('syntax_izumo_en_stock')
syntax_analysis = syntax_model.run(input_text, parsers=('token',))

# Apply the new model on top of the syntax predictions
tsa_predictions = custom_tsa_model.run(syntax_analysis)
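To apply the model to more than one text, the same two calls can be wrapped in a loop. The following is a minimal sketch; the helper name run_tsa_on_texts is hypothetical, and it relies only on the two run() calls shown in the sample above. The models are passed in as arguments so that the syntax model is loaded once and reused.

```python
def run_tsa_on_texts(texts, syntax_model, tsa_model):
    """Run syntax analysis and then the targets sentiment model on each text.

    Returns a list of (text, prediction) pairs in input order.
    """
    results = []
    for text in texts:
        # Same call pattern as in the single-text sample
        syntax_analysis = syntax_model.run(text, parsers=('token',))
        results.append((text, tsa_model.run(syntax_analysis)))
    return results


# Example usage, assuming syntax_model and custom_tsa_model from the samples above:
# predictions = run_tsa_on_texts(['first text', 'second text'], syntax_model, custom_tsa_model)
```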

Storing and loading the model

The custom targets sentiment model can be stored like any other model, as described in Saving and loading custom models, by using ibm_watson_studio_lib.

To load the custom targets sentiment model, additional steps are required:

  1. Ensure that you have an access token on the Access control page on the Manage tab of your project. Only project admins can create access tokens. The access token can have Viewer or Editor access permissions. Only editors can inject the token into a notebook.

  2. Add the project token to the notebook by clicking More > Insert project token from the notebook action bar. Then run the cell.

    By running the inserted hidden code cell, a wslib object is created that you can use for functions in the ibm-watson-studio-lib library. For information on the available ibm-watson-studio-lib functions, see Using ibm-watson-studio-lib for Python.

  3. Download and extract the model to your local runtime environment:

    import zipfile
    model_zip = 'custom_TSA_model_file'
    model_folder = 'custom_TSA'
    wslib.download_file('custom_TSA_model', file_name=model_zip)
    
    with zipfile.ZipFile(model_zip, 'r') as zip_ref:
        zip_ref.extractall(model_folder)
    
  4. Load the model from the extracted folder:

    custom_TSA_model = watson_nlp.load(model_folder)
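If the downloaded file is not a valid zip archive, the extraction in step 3 fails with an error that can be hard to interpret. The following sketch adds an explicit check before extracting, using only the Python standard library. The function name extract_model_archive is hypothetical, and the sketch assumes that the file was already downloaded as in step 3.

```python
import os
import zipfile


def extract_model_archive(model_zip, model_folder):
    """Extract a downloaded model archive and return the extracted top-level names.

    Raises ValueError if the file is not a zip archive.
    """
    if not zipfile.is_zipfile(model_zip):
        raise ValueError(f"{model_zip} is not a zip archive")
    with zipfile.ZipFile(model_zip, 'r') as zip_ref:
        zip_ref.extractall(model_folder)
    return sorted(os.listdir(model_folder))


# Example usage with the file names from step 3:
# extracted = extract_model_archive('custom_TSA_model_file', 'custom_TSA')
```

Checking the returned names is a quick way to confirm that the folder is not empty before you pass it to watson_nlp.load.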
    

Parent topic: Creating your own models