Important: IBM Cloud Pak® for Data Version 4.8 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.8 reaches end of support. For more information, see Upgrading from IBM Cloud Pak for Data Version 4.8 to IBM Software Hub Version 5.1.

Syntax analysis

The Watson Natural Language Processing Syntax block encapsulates syntax analysis functionality.

Block names

  • syntax_izumo_<language>_stock
  • syntax_izumo_<language>_stock-dp (Runtime 23.1 only)

Supported languages

The Syntax analysis block is available for the following languages. For a list of the language codes and the corresponding language, see Language codes.

Language codes to use for model syntax_izumo_<language>_stock: af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw

Language codes to use for model syntax_izumo_<language>_stock-dp: af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh

List of the supported languages for each syntax task
Task | Supported language codes
Tokenization | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw, zh
Part-of-speech tagging | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw, zh
Lemmatization | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw, zh
Sentence detection | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw, zh
Paragraph detection | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw, zh
Dependency parsing | af, ar, bs, cs, da, de, en, es, fi, fr, hi, hr, it, ja, nb, nl, nn, pt, ro, ru, sk, sr, sv
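
The table above can be turned into a small lookup, for example to check whether a task is listed for a language before requesting it. The following is a minimal sketch in plain Python that does not call watson_nlp. The names COMMON, DEPENDENCY, TASK_LANGUAGES, and supports are hypothetical helpers introduced for illustration, and the language sets are copied from the table.

```python
# Language codes copied from the table above. All names here are
# illustrative helpers, not part of the watson_nlp library.
COMMON = set(
    "af ar bs ca cs da de el en es fi fr he hi hr it ja ko nb nl nn "
    "pl pt ro ru sk sr sv tr zh_cn zh_tw zh".split()
)
DEPENDENCY = set(
    "af ar bs cs da de en es fi fr hi hr it ja nb nl nn pt ro ru sk sr sv".split()
)

TASK_LANGUAGES = {
    "tokenization": COMMON,
    "part_of_speech": COMMON,
    "lemmatization": COMMON,
    "sentence_detection": COMMON,
    "paragraph_detection": COMMON,
    "dependency_parsing": DEPENDENCY,
}


def supports(task: str, language: str) -> bool:
    """Return True if the table lists `language` for `task`."""
    return language in TASK_LANGUAGES[task]


print(supports("dependency_parsing", "en"))  # True
print(supports("dependency_parsing", "ko"))  # False

# Languages that are listed for tokenization but not for dependency parsing
print(sorted(COMMON - DEPENDENCY))
```

The last line makes the gap visible: dependency parsing is the only task in the table with a shorter language list.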

Capabilities

Use this block to perform tasks such as sentence detection, tokenization, part-of-speech tagging, lemmatization, and dependency parsing in different languages. Most use cases need only sentence detection, tokenization, and part-of-speech tagging; for these, use the syntax_izumo_<language>_stock model. To run dependency parsing in Runtime 23.1, use the syntax_izumo_<language>_stock-dp model.

In Runtime 22.2, dependency parsing is included in the syntax_izumo_<language>_stock model.

Note: Starting with the 4.8.4 release of Watson Studio, Runtime 22.2 is deprecated and will be removed in a future release. We recommend that you switch to Runtime 23.1.
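
The model choice described above can be written down as a small helper. This is a sketch only: choose_syntax_block and its runtime parameter are hypothetical and not part of watson_nlp, which takes just the block name when loading a model.

```python
def choose_syntax_block(language: str, runtime: str,
                        dependency_parsing: bool = False) -> str:
    """Return a Syntax block name for the given language code.

    `runtime` is a hypothetical parameter, expected to be "22.2" or "23.1".
    """
    if runtime not in ("22.2", "23.1"):
        raise ValueError(f"unexpected runtime: {runtime!r}")
    base = f"syntax_izumo_{language}_stock"
    # Only Runtime 23.1 has a separate "-dp" block for dependency parsing;
    # in Runtime 22.2 the stock block already includes it.
    if dependency_parsing and runtime == "23.1":
        return base + "-dp"
    return base


print(choose_syntax_block("en", "23.1"))
# syntax_izumo_en_stock
print(choose_syntax_block("en", "23.1", dependency_parsing=True))
# syntax_izumo_en_stock-dp
print(choose_syntax_block("en", "22.2", dependency_parsing=True))
# syntax_izumo_en_stock
```

The helper does not translate language codes. For Chinese, the stock block is listed with zh_cn and zh_tw while the -dp block is listed with zh, so the caller has to pass the code that matches the block.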

Part-of-speech (POS) tags follow the Universal Parts of Speech tagset (Universal POS tags), and dependency relations follow the Universal Dependencies v2 tagset (Universal Dependency Relations).

The following table shows, for the same example sentence, what each task produces and which parser attribute to request for it.

Capabilities of each syntax task based on an example
Capabilities | Examples | Parser attributes
Tokenization | "I don't like Mondays" --> "I", "do", "n't", "like", "Mondays" | token
Part-of-speech detection | "I don't like Mondays" --> "I"\POS_PRON, "do"\POS_AUX, "n't"\POS_PART, "like"\POS_VERB, "Mondays"\POS_PROPN | part_of_speech
Lemmatization | "I don't like Mondays" --> "I", "do", "not", "like", "Monday" | lemma
Dependency parsing | "I don't like Mondays" --> "I"-SUBJECT->"like"<-OBJECT-"Mondays" | dependency
Sentence detection | "I don't like Mondays" --> returns this sentence | sentence
Paragraph detection (experimental; currently returns results similar to sentence detection) | "I don't like Mondays" --> returns this sentence as being a paragraph | sentence

Dependencies on other blocks

None

Code sample

import watson_nlp

# Load Syntax for English
syntax_model = watson_nlp.load('syntax_izumo_en_stock')

# Detect tokens, lemma and part-of-speech
text = 'I don\'t like Mondays'
syntax_prediction = syntax_model.run(text, parsers=('token', 'lemma', 'part_of_speech'))

# Print the syntax result
print(syntax_prediction)

Output of the code sample:

{
  "text": "I don't like Mondays",
  "producer_id": {
    "name": "Izumo Text Processing",
    "version": "0.0.1"
  },
  "tokens": [
    {
      "span": {
        "begin": 0,
        "end": 1,
        "text": "I"
      },
      "lemma": "I",
      "part_of_speech": "POS_PRON"
    },
    {
      "span": {
        "begin": 2,
        "end": 4,
        "text": "do"
      },
      "lemma": "do",
      "part_of_speech": "POS_AUX"
    },
    {
      "span": {
        "begin": 4,
        "end": 7,
        "text": "n't"
      },
      "lemma": "not",
      "part_of_speech": "POS_PART"
    },
    {
      "span": {
        "begin": 8,
        "end": 12,
        "text": "like"
      },
      "lemma": "like",
      "part_of_speech": "POS_VERB"
    },
    {
      "span": {
        "begin": 13,
        "end": 20,
        "text": "Mondays"
      },
      "lemma": "Monday",
      "part_of_speech": "POS_PROPN"
    }
  ],
  "sentences": [
    {
      "span": {
        "begin": 0,
        "end": 20,
        "text": "I don't like Mondays"
      }
    }
  ],
  "paragraphs": [
    {
      "span": {
        "begin": 0,
        "end": 20,
        "text": "I don't like Mondays"
      }
    }
  ]
}
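
To work with the result programmatically, the token entries can be read field by field. The sketch below uses a trimmed copy of the output above, written as a plain Python dictionary, so it runs without watson_nlp. It assumes the prediction object can be converted to a dictionary of this shape (for example through a to_dict() method, which is an assumption and is not shown in this topic). The names result and rows are illustrative.

```python
# Trimmed copy of the documented output as a plain dict (illustrative).
result = {
    "text": "I don't like Mondays",
    "tokens": [
        {"span": {"begin": 0, "end": 1, "text": "I"},
         "lemma": "I", "part_of_speech": "POS_PRON"},
        {"span": {"begin": 2, "end": 4, "text": "do"},
         "lemma": "do", "part_of_speech": "POS_AUX"},
        {"span": {"begin": 4, "end": 7, "text": "n't"},
         "lemma": "not", "part_of_speech": "POS_PART"},
        {"span": {"begin": 8, "end": 12, "text": "like"},
         "lemma": "like", "part_of_speech": "POS_VERB"},
        {"span": {"begin": 13, "end": 20, "text": "Mondays"},
         "lemma": "Monday", "part_of_speech": "POS_PROPN"},
    ],
}

text = result["text"]
rows = []
for token in result["tokens"]:
    span = token["span"]
    # In this output, begin/end are character offsets into the input text,
    # with an exclusive end, so slicing the text reproduces the token.
    assert text[span["begin"]:span["end"]] == span["text"]
    rows.append((span["text"], token["lemma"], token["part_of_speech"]))

# Print one line per token: surface form, lemma, POS tag
for surface, lemma, pos in rows:
    print(f"{surface:<8} {lemma:<8} {pos}")
```

The offset check shows why "do" ends at 4 and "n't" begins at 4: the tokenizer splits "don't" into two adjacent tokens with no character between them.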