Creating your own models
Certain algorithms in Watson Natural Language Processing can be trained with your own data. For example, you can create custom models for entity extraction and for classifying data.
Starting with Runtime 23.1, you can use the new built-in transformer-based IBM foundation model called Slate to create your own models. The Slate model has been trained on a very large data set that was preprocessed to filter hate, bias, and profanity.
To create your own classification or entity extraction model, you can fine-tune the Slate model on your own data. To train the model in a reasonable time, use a GPU-based environment.
- Detecting entities with a custom dictionary
- Detecting entities with regular expressions
- Detecting entities with a custom transformer model
- Classifying text with a custom classification model
- Extracting sentiment with a custom transformer model
- Extracting targets sentiment with a custom transformer model
Language support for custom models
You can create custom models, and use the following pretrained dictionary and classification models, for the languages shown in the table. For a list of the language codes and the corresponding languages, see Language codes.
Custom model | Supported language codes |
---|---|
Dictionary models | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw (all languages supported in the Syntax part of speech tagging) |
Regexes | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw (all languages supported in the Syntax part of speech tagging) |
SVM classification with TFIDF | af, ar, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw |
SVM classification with USE | ar, de, en, es, fr, it, ja, ko, nl, pl, pt, ru, tr, zh_cn, zh_tw |
CNN classification with GloVe | ar, de, en, es, fr, it, ja, ko, nl, pt, zh_cn |
BERT Multilingual classification | af, ar, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw |
Transformer model | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw |
Stopword lists | ar, de, en, es, fr, it, ja, ko |
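The table can be turned into a small lookup to check a language code before you start training. The following is a minimal sketch that transcribes only three rows of the table. The names SUPPORTED_LANGUAGES, is_supported, and model_types_for are hypothetical helpers for illustration and are not part of the Watson Natural Language Processing library.

```python
# Language codes copied from three rows of the table above.
# Add the remaining rows in the same way if you need them.
SUPPORTED_LANGUAGES = {
    "SVM classification with USE": {
        "ar", "de", "en", "es", "fr", "it", "ja", "ko",
        "nl", "pl", "pt", "ru", "tr", "zh_cn", "zh_tw",
    },
    "CNN classification with GloVe": {
        "ar", "de", "en", "es", "fr", "it", "ja", "ko", "nl", "pt", "zh_cn",
    },
    "Stopword lists": {"ar", "de", "en", "es", "fr", "it", "ja", "ko"},
}


def is_supported(model_type, language_code):
    """Return True if the table lists language_code for model_type."""
    return language_code.lower() in SUPPORTED_LANGUAGES.get(model_type, set())


def model_types_for(language_code):
    """Return the transcribed model types that list language_code, sorted by name."""
    return sorted(
        name for name in SUPPORTED_LANGUAGES if is_supported(name, language_code)
    )
```

For example, is_supported("CNN classification with GloVe", "pl") is False because the table does not list pl for that row, while the same call for "SVM classification with USE" is True.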
Saving and loading custom models
If you want to use your custom model in another notebook, save it as a Data Asset to your project. This way, you can export the model as part of a project export.
Use the ibm-watson-studio-lib library to save and load custom models. For details on the available ibm-watson-studio-lib functions, see ibm-watson-studio-lib for Python.
To save a custom model in your notebook as a data asset to export and use in another project:
- Run the train() method to create a custom dictionary, regular expression, or classification model and assign this custom model to a variable. For example:

      custom_block = CNN.train(train_stream, embedding_model.embedding, verbose=2)
- If you want to save a custom dictionary or regular expression model, convert it to an RBRGeneric block. Converting the model is useful if you want to load and run it using the API for Watson Natural Language Processing for Embed. To date, Watson Natural Language Processing for Embed supports running dictionary and regular expression models only as RBRGeneric blocks. To convert a model to an RBRGeneric block, run the following commands:

      import os
      import watson_nlp

      # Create the custom regular expression model
      custom_regex_block = watson_nlp.resources.feature_extractor.RBR.train(module_folder, language='en', regexes=regexes)

      # Save the model to the local file system
      custom_regex_model_path = 'some/path'
      custom_regex_block.save(custom_regex_model_path)

      # The model was saved in a file "executor.zip" in the provided path, in this case "some/path/executor.zip"
      model_path = os.path.join(custom_regex_model_path, 'executor.zip')

      # Re-load the model as an RBRGeneric block
      custom_block = watson_nlp.blocks.rules.RBRGeneric(watson_nlp.toolkit.rule_utils.RBRExecutor.load(model_path), language='en')
- Save the model as a Data Asset to your project using ibm-watson-studio-lib:

      from ibm_watson_studio_lib import access_project_or_space
      wslib = access_project_or_space()
      wslib.save_data('<model name>', data=custom_block.as_bytes(), overwrite=True)
When you save a transformer model, you can optionally save it in CPU format. If you plan to use the model only in CPU environments, this format makes your custom model run more efficiently. To do that, set the CPU format option as follows:

    wslib.save_data('<model name>', data=custom_block.as_bytes(cpu_format=True), overwrite=True)
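The save step, with or without the CPU format option, can be wrapped in one helper. This is a sketch rather than part of either library: save_custom_model is a hypothetical name, and it uses only the save_data and as_bytes calls shown above. The wslib handle and the trained model are passed in as arguments, so the helper itself imports nothing.

```python
def save_custom_model(wslib, model, asset_name, cpu_format=False):
    """Save a trained model as a data asset in the current project.

    wslib      -- handle returned by access_project_or_space()
    model      -- trained custom model that provides as_bytes()
    asset_name -- name of the data asset to create or overwrite
    cpu_format -- set to True only for transformer models that will
                  run exclusively in CPU environments
    """
    if cpu_format:
        # Only transformer models are documented to accept this option.
        data = model.as_bytes(cpu_format=True)
    else:
        data = model.as_bytes()
    wslib.save_data(asset_name, data=data, overwrite=True)
```

In a notebook, you would call it after training, for example save_custom_model(wslib, custom_block, '<model name>'). Keeping cpu_format off by default avoids passing the option to dictionary, regular expression, or classic classification models, for which this page does not describe it.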
To load a custom model to a notebook that was imported from another project:
- Load the model using ibm-watson-studio-lib and watson-nlp:

      custom_model = watson_nlp.load(wslib.load_data('<model name>'))
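The load step mirrors the save step and can be wrapped in the same way. The following is a minimal sketch: load_custom_model is a hypothetical helper, and its load_fn parameter is an assumption introduced so that the deserializer is passed in explicitly. In a Watson Natural Language Processing notebook, you would pass watson_nlp.load.

```python
def load_custom_model(wslib, load_fn, asset_name):
    """Read a saved model asset from the project and deserialize it.

    wslib      -- handle returned by access_project_or_space()
    load_fn    -- deserializer to apply to the asset data;
                  in a Watson NLP notebook, pass watson_nlp.load
    asset_name -- name that was used when the model was saved
    """
    raw = wslib.load_data(asset_name)
    return load_fn(raw)
```

A call in the importing notebook would then look like custom_model = load_custom_model(wslib, watson_nlp.load, '<model name>').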