Syntax
At a glance
The Syntax model performs fundamental NLP tasks on the input text:
- Sentence detection
- Tokenization: can't -> ca, n't
- Part-of-Speech tagging: I thought -> I/PRON, thought/VERB
- Lemmatization: I thought -> I/I, thought/think
- Dependency parsing: I thought -> I/nsubj/thought, thought/root
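The slash notation in the list above (word/TAG, word/lemma) can be reproduced from a Syntax response. The following is a minimal sketch, assuming the tokens have the shape shown in the REST response under Example requests (span.text, lemma, partOfSpeech). The helper name slash_notation is hypothetical and not part of the Watson NLP API.

```python
from typing import Any


def slash_notation(tokens: list[dict[str, Any]], field: str) -> str:
    """Render tokens as 'word/value' pairs, such as 'I/PRON, thought/VERB'.

    `field` is a key of each token dict, for example "partOfSpeech" or "lemma".
    """
    pairs = []
    for token in tokens:
        value = str(token.get(field, ""))
        # Strip the enum prefix so "POS_PRON" reads as "PRON".
        if value.startswith("POS_"):
            value = value[len("POS_"):]
        pairs.append(f"{token['span']['text']}/{value}")
    return ", ".join(pairs)


# Hand-written tokens in the REST response shape (not real model output).
tokens = [
    {"span": {"text": "I"}, "lemma": "I", "partOfSpeech": "POS_PRON"},
    {"span": {"text": "thought"}, "lemma": "think", "partOfSpeech": "POS_VERB"},
]
print(slash_notation(tokens, "partOfSpeech"))  # I/PRON, thought/VERB
print(slash_notation(tokens, "lemma"))         # I/I, thought/think
```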
Class definition |
---|
watson_nlp.blocks.syntax.izumo.IzumoTextProcessing |
For language support, see Supported languages.
- Izumo: Provides high accuracy and throughput at moderate computational cost. The model combines curated human knowledge (dictionaries, complementary rules) with machine learning algorithms (Logistic Regression, Conditional Random Fields). The implementation has been tested over many years in IBM products.
Pretrained models
Model names are listed below.
Model ID | Container Image |
---|---|
syntax_izumo_lang_af_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_af_stock:1.4.1 |
syntax_izumo_lang_ar_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ar_stock:1.4.1 |
syntax_izumo_lang_bs_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_bs_stock:1.4.1 |
syntax_izumo_lang_ca_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ca_stock:1.4.1 |
syntax_izumo_lang_cs_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_cs_stock:1.4.1 |
syntax_izumo_lang_da_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_da_stock:1.4.1 |
syntax_izumo_lang_de_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_de_stock:1.4.1 |
syntax_izumo_lang_el_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_el_stock:1.4.1 |
syntax_izumo_lang_en_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_en_stock:1.4.1 |
syntax_izumo_lang_es_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_es_stock:1.4.1 |
syntax_izumo_lang_fi_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_fi_stock:1.4.1 |
syntax_izumo_lang_fr_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_fr_stock:1.4.1 |
syntax_izumo_lang_he_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_he_stock:1.4.1 |
syntax_izumo_lang_hi_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_hi_stock:1.4.1 |
syntax_izumo_lang_hr_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_hr_stock:1.4.1 |
syntax_izumo_lang_it_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_it_stock:1.4.1 |
syntax_izumo_lang_ja_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ja_stock:1.4.1 |
syntax_izumo_lang_ko_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ko_stock:1.4.1 |
syntax_izumo_lang_nb_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_nb_stock:1.4.1 |
syntax_izumo_lang_nl_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_nl_stock:1.4.1 |
syntax_izumo_lang_nn_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_nn_stock:1.4.1 |
syntax_izumo_lang_pl_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_pl_stock:1.4.1 |
syntax_izumo_lang_pt_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_pt_stock:1.4.1 |
syntax_izumo_lang_ro_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ro_stock:1.4.1 |
syntax_izumo_lang_ru_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ru_stock:1.4.1 |
syntax_izumo_lang_sk_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_sk_stock:1.4.1 |
syntax_izumo_lang_sr_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_sr_stock:1.4.1 |
syntax_izumo_lang_sv_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_sv_stock:1.4.1 |
syntax_izumo_lang_tr_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_tr_stock:1.4.1 |
syntax_izumo_lang_zh-cn_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_zh-cn_stock:1.4.1 |
syntax_izumo_lang_zh-tw_stock | cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_zh-tw_stock:1.4.1 |
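The model IDs and image names in the table follow a fixed pattern. The sketch below derives both from a language code, assuming that pattern holds for every row; the helper names are hypothetical, and the image tag is a parameter (defaulting to the 1.4.1 shown above) in case a different version is deployed.

```python
# Language codes copied from the pretrained-model table.
SUPPORTED_LANGS = frozenset(
    "af ar bs ca cs da de el en es fi fr he hi hr it ja ko "
    "nb nl nn pl pt ro ru sk sr sv tr zh-cn zh-tw".split()
)
REGISTRY = "cp.icr.io/cp/ai"


def syntax_model_id(lang: str) -> str:
    """Return the stock Izumo Syntax model ID for a language code."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"no stock Izumo Syntax model listed for {lang!r}")
    return f"syntax_izumo_lang_{lang}_stock"


def syntax_image(lang: str, tag: str = "1.4.1") -> str:
    """Return the container image reference that serves that model."""
    return f"{REGISTRY}/watson-nlp_{syntax_model_id(lang)}:{tag}"


print(syntax_model_id("en"))  # syntax_izumo_lang_en_stock
print(syntax_image("zh-cn"))  # cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_zh-cn_stock:1.4.1
```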
The Syntax models use predefined algorithms and models suited to each language. The output schema for part-of-speech tagging and dependency parsing follows the Universal Part-of-Speech v2 and Universal Dependency Relations v2 standards for all languages, so the Syntax models output the same set of part-of-speech tags and dependency relations for every supported language. For details about the Universal Part-of-Speech and Universal Dependency v2 standards, see Grammatical properties.
Both the Izumo models and the transformer-based syntax models were trained on Universal Dependencies corpora licensed for commercial use, as well as on additional data produced by a novel silver-data generation process invented by the IBM Research Yorktown and Tokyo labs. The models are continuously improved with feedback from users.
For Izumo models, the algorithms used to train the Syntax model for each language were chosen to provide a good trade-off between accuracy and runtime performance. The algorithms differ across language groups: as a general rule, simpler and faster algorithms are used for simpler languages such as English (Group A languages), and increasingly sophisticated (and slower) algorithms are used for more complex languages such as Arabic (Group B languages) and Chinese, Japanese, and Korean (Group C languages).
The transformer-based syntax model relies purely on machine learning, so the same algorithm is used for all supported languages. Currently, only the English stock model is available.
Running models
The Syntax model request accepts the following fields:
Field | Type | Required/Optional/Repeated | Description |
---|---|---|---|
raw_document | watson_core_data_model.nlp.RawDocument | required | The input document on which to perform Syntax predictions |
parsers | str | repeated | List containing any of the following strings: token, sentence, lemma, part_of_speech, dependency |
The model returns the following outputs:
Token
Groups a sequence of characters into a useful semantic unit for processing.
Sentence
Identifies sentence(s) within a text.
Lemma
Returns the base, or root, form of a word.
Part of Speech
Returns a part-of-speech code:
Name | Number | Description |
---|---|---|
POS_UNSET | 0 | Default value when no POS tagging is performed |
POS_ADJ | 1 | adjective |
POS_ADP | 2 | adposition |
POS_ADV | 3 | adverb |
POS_AUX | 4 | auxiliary |
POS_CCONJ | 5 | coordinating conjunction |
POS_DET | 6 | determiner |
POS_INTJ | 7 | interjection |
POS_NOUN | 8 | noun |
POS_NUM | 9 | numeral |
POS_PART | 10 | particle |
POS_PRON | 11 | pronoun |
POS_PROPN | 12 | proper noun |
POS_PUNCT | 13 | punctuation |
POS_SCONJ | 14 | subordinating conjunction |
POS_SYM | 15 | symbol |
POS_VERB | 16 | verb |
POS_X | 17 | other |
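The REST response reports part of speech by name (for example POS_PRON), while the table also gives a numeric code. To convert between the two outside the client library, a local mirror such as the following can help. It is a hypothetical copy of the table above, not a type from the Watson NLP libraries.

```python
from enum import IntEnum


class PartOfSpeech(IntEnum):
    """Local copy of the part-of-speech codes in the table above."""
    POS_UNSET = 0
    POS_ADJ = 1
    POS_ADP = 2
    POS_ADV = 3
    POS_AUX = 4
    POS_CCONJ = 5
    POS_DET = 6
    POS_INTJ = 7
    POS_NOUN = 8
    POS_NUM = 9
    POS_PART = 10
    POS_PRON = 11
    POS_PROPN = 12
    POS_PUNCT = 13
    POS_SCONJ = 14
    POS_SYM = 15
    POS_VERB = 16
    POS_X = 17


def pos_label(code: int) -> str:
    """Map a numeric code to its short tag, for example 16 -> 'VERB'.

    Raises ValueError for codes that are not in the table.
    """
    return PartOfSpeech(code).name.removeprefix("POS_")


print(pos_label(11))                  # PRON
print(int(PartOfSpeech["POS_NOUN"]))  # 8
```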
Dependency
Returns a dependency relation code:
Name | Number | Description |
---|---|---|
DEP_OTHER | 0 | other |
DEP_ACL | 1 | clausal modifier of noun (adjectival clause) |
DEP_ACL_RELCL | 38 | relative clause modifier |
DEP_ADVCL | 2 | adverbial clause modifier |
DEP_ADVMOD | 3 | adverbial modifier |
DEP_ADVMOD_EMPH | 39 | emphasizing word, intensifier |
DEP_ADVMOD_LMOD | 40 | locative adverbial modifier |
DEP_AMOD | 4 | adjectival modifier |
DEP_APPOS | 5 | appositional modifier |
DEP_AUX | 6 | auxiliary |
DEP_AUX_PASS | 41 | passive auxiliary |
DEP_CASE | 7 | case marking |
DEP_CC | 8 | coordinating conjunction |
DEP_CC_PRECONJ | 42 | preconjunct |
DEP_CCOMP | 9 | clausal complement |
DEP_CLF | 10 | classifier |
DEP_COMPOUND | 11 | compound |
DEP_COMPOUND_LVC | 44 | light verb construction |
DEP_COMPOUND_PRT | 45 | phrasal verb particle |
DEP_COMPOUND_REDUP | 46 | reduplicated compounds |
DEP_COMPOUND_SVC | 47 | serial verb compounds |
DEP_CONJ | 12 | conjunct |
DEP_COP | 13 | copula |
DEP_CSUBJ | 14 | clausal subject |
DEP_CSUBJ_PASS | 43 | clausal passive subject |
DEP_DEP | 15 | unspecified dependency |
DEP_DET | 16 | determiner |
DEP_DET_NUMGOV | 48 | pronominal quantifier governing the case of the noun |
DEP_DET_NUMNOD | 49 | pronominal quantifier agreeing in case with the noun |
DEP_DET_POSS | 50 | possessive determiner |
DEP_DISCOURSE | 17 | discourse element |
DEP_DISLOCATED | 18 | dislocated elements |
DEP_EXPL | 19 | expletive |
DEP_EXPL_IMPERS | 51 | impersonal expletive |
DEP_EXPL_PASS | 52 | reflexive pronoun used in reflexive passive |
DEP_EXPL_PV | 53 | reflexive clitic with an inherently reflexive verb |
DEP_FIXED | 20 | fixed multiword expression |
DEP_FLAT | 21 | flat multiword expression |
DEP_FLAT_FOREIGN | 54 | foreign words |
DEP_FLAT_NAME | 55 | names |
DEP_GOESWITH | 22 | goes with |
DEP_IOBJ | 23 | indirect object |
DEP_LIST | 24 | list |
DEP_MARK | 25 | marker |
DEP_NMOD | 26 | nominal modifier |
DEP_NMOD_POSS | 56 | possessive nominal modifier |
DEP_NMOD_TMOD | 57 | temporal modifier |
DEP_NSUBJ | 27 | nominal subject |
DEP_NSUBJ_PASS | 58 | passive nominal subject |
DEP_NUMMOD | 28 | numeric modifier |
DEP_NUMMOD_GOV | 59 | numeric modifier governing the case of the noun |
DEP_OBJ | 29 | object |
DEP_OBL | 30 | oblique nominal |
DEP_OBL_AGENT | 60 | agent modifier |
DEP_OBL_ARG | 61 | oblique argument |
DEP_OBL_LMOD | 62 | locative modifier |
DEP_OBL_TMOD | 63 | temporal modifier |
DEP_ORPHAN | 31 | orphan |
DEP_PARATAXIS | 32 | parataxis |
DEP_PUNCT | 33 | punctuation |
DEP_REPARANDUM | 34 | overridden disfluency |
DEP_ROOT | 35 | root |
DEP_VOCATIVE | 36 | vocative |
DEP_XCOMP | 37 | open clausal complement |
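Many relations in the table refine a more general one, following the Universal Dependencies convention of subtyping (DEP_NSUBJ_PASS refines DEP_NSUBJ, DEP_OBL_TMOD refines DEP_OBL). When an application only needs the base relations, a name-based collapse like the following may be enough. It is a sketch that assumes every name follows the DEP_<BASE>[_<SUBTYPE>] pattern used in the table; the function name is hypothetical.

```python
def base_relation(name: str) -> str:
    """Collapse a relation name to its lowercase base label.

    Examples: 'DEP_NSUBJ_PASS' -> 'nsubj', 'DEP_ROOT' -> 'root'.
    Assumes base names contain no underscore after the DEP_ prefix.
    """
    if not name.startswith("DEP_"):
        raise ValueError(f"not a dependency relation name: {name!r}")
    return name[len("DEP_"):].split("_", 1)[0].lower()


for rel in ("DEP_NSUBJ_PASS", "DEP_OBL_TMOD", "DEP_ROOT"):
    print(rel, "->", base_relation(rel))
```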
Example requests
REST API
curl -s \
"http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/SyntaxPredict" \
-H "accept: application/json" \
-H "content-type: application/json" \
-H "Grpc-Metadata-mm-model-id: syntax_izumo_lang_en_stock" \
-d '{ "raw_document": { "text": "This is a test sentence." }, "parsers": ["token","sentence","lemma","part_of_speech","dependency"] }'
Response
{"text":"This is a test sentence.", "producerId":{"name":"Izumo Text Processing", "version":"0.0.1"},
"tokens":[
{"span":{"begin":0, "end":4, "text":"This"}, "lemma":"this", "partOfSpeech":"POS_PRON", "dependency":{"relation":"DEP_NSUBJ", "identifier":1, "head":2}, "features":[]},
{"span":{"begin":5, "end":7, "text":"is"}, "lemma":"be", "partOfSpeech":"POS_AUX", "dependency":{"relation":"DEP_COP", "identifier":3, "head":2}, "features":[]},
{"span":{"begin":8, "end":9, "text":"a"}, "lemma":"a", "partOfSpeech":"POS_DET", "dependency":{"relation":"DEP_DET", "identifier":4, "head":2}, "features":[]},
{"span":{"begin":10, "end":14, "text":"test"}, "lemma":"test", "partOfSpeech":"POS_NOUN", "dependency":{"relation":"DEP_COMPOUND", "identifier":5, "head":2}, "features":[]},
{"span":{"begin":15, "end":23, "text":"sentence"}, "lemma":"sentence", "partOfSpeech":"POS_NOUN", "dependency":{"relation":"DEP_ROOT", "identifier":2, "head":0}, "features":[]},
{"span":{"begin":23, "end":24, "text":"."}, "lemma":"", "partOfSpeech":"POS_PUNCT", "dependency":{"relation":"DEP_PUNCT", "identifier":6, "head":2}, "features":[]}
],
"sentences":[
{"span":{"begin":0, "end":24, "text":"This is a test sentence."}}
],
"paragraphs":[
{"span":{"begin":0, "end":24, "text":"This is a test sentence."}}
]
}
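In the response above, each token's dependency.identifier numbers the token, dependency.head gives the identifier of the token it depends on, and a head of 0 marks the root (here, sentence). The following minimal sketch turns a parsed JSON response into (dependent, relation, head) triples. The function name is hypothetical, and the sample data is the response above trimmed to the fields the function reads.

```python
def dependency_arcs(response: dict) -> list[tuple[str, str, str]]:
    """Return (dependent, relation, head) triples from a Syntax JSON response.

    dependency.identifier names a token, dependency.head points at another
    token's identifier, and head 0 (or a missing head) marks the root.
    Tokens without a dependency field are skipped.
    """
    tokens = [t for t in response.get("tokens", []) if t.get("dependency")]
    text_by_id = {t["dependency"]["identifier"]: t["span"]["text"] for t in tokens}
    arcs = []
    for t in tokens:
        dep = t["dependency"]
        head = dep.get("head", 0)
        head_text = "ROOT" if head == 0 else text_by_id[head]
        arcs.append((t["span"]["text"], dep["relation"], head_text))
    return arcs


# The response above, trimmed to the fields this function reads.
response = {"tokens": [
    {"span": {"text": "This"}, "dependency": {"relation": "DEP_NSUBJ", "identifier": 1, "head": 2}},
    {"span": {"text": "is"}, "dependency": {"relation": "DEP_COP", "identifier": 3, "head": 2}},
    {"span": {"text": "a"}, "dependency": {"relation": "DEP_DET", "identifier": 4, "head": 2}},
    {"span": {"text": "test"}, "dependency": {"relation": "DEP_COMPOUND", "identifier": 5, "head": 2}},
    {"span": {"text": "sentence"}, "dependency": {"relation": "DEP_ROOT", "identifier": 2, "head": 0}},
    {"span": {"text": "."}, "dependency": {"relation": "DEP_PUNCT", "identifier": 6, "head": 2}},
]}
for dependent, relation, head in dependency_arcs(response):
    print(f"{dependent} --{relation}--> {head}")
```

Treating a missing head as 0 also covers output formats that omit zero-valued fields, as the protobuf text response on this page does for the root token.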
Python
import grpc
from watson_nlp_runtime_client import common_service_pb2, common_service_pb2_grpc

# Connect to the runtime's gRPC endpoint (plaintext channel, suitable for local testing)
client = common_service_pb2_grpc.NlpServiceStub(grpc.insecure_channel("localhost:8085"))
# Request every parser; the mm-model-id metadata selects the model to run
response = client.SyntaxPredict(
    common_service_pb2.SyntaxRequest(
        raw_document={"text": "This is a test sentence."},
        parsers=('token', 'sentence', 'lemma', 'part_of_speech', 'dependency')
    ),
    metadata=[("mm-model-id", "syntax_izumo_lang_en_stock")],
)
print(response)
Response
text: "This is a test sentence."
producer_id {
name: "Izumo Text Processing"
version: "0.0.1"
}
tokens {
span {
end: 4
text: "This"
}
lemma: "this"
part_of_speech: POS_PRON
dependency {
relation: DEP_NSUBJ
identifier: 1
head: 2
}
}
tokens {
span {
begin: 5
end: 7
text: "is"
}
lemma: "be"
part_of_speech: POS_AUX
dependency {
relation: DEP_COP
identifier: 3
head: 2
}
}
tokens {
span {
begin: 8
end: 9
text: "a"
}
lemma: "a"
part_of_speech: POS_DET
dependency {
relation: DEP_DET
identifier: 4
head: 2
}
}
tokens {
span {
begin: 10
end: 14
text: "test"
}
lemma: "test"
part_of_speech: POS_NOUN
dependency {
relation: DEP_COMPOUND
identifier: 5
head: 2
}
}
tokens {
span {
begin: 15
end: 23
text: "sentence"
}
lemma: "sentence"
part_of_speech: POS_NOUN
dependency {
relation: DEP_ROOT
identifier: 2
}
}
tokens {
span {
begin: 23
end: 24
text: "."
}
part_of_speech: POS_PUNCT
dependency {
relation: DEP_PUNCT
identifier: 6
head: 2
}
}
sentences {
span {
end: 24
text: "This is a test sentence."
}
}
paragraphs {
span {
end: 24
text: "This is a test sentence."
}
}
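The REST API example can also be reproduced from Python without the gRPC client. This is a minimal sketch using only the standard library; it assumes the runtime's REST endpoint listens on localhost:8080 as in the curl example, and the function names are hypothetical. It also checks parser names against the five values accepted by the parsers field before anything is sent.

```python
import json
import urllib.request

# Assumption: REST endpoint on localhost:8080, as in the curl example.
URL = "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/SyntaxPredict"
# Values accepted by the repeated `parsers` field of the request.
VALID_PARSERS = ("token", "sentence", "lemma", "part_of_speech", "dependency")


def build_syntax_request(text, model_id="syntax_izumo_lang_en_stock",
                         parsers=VALID_PARSERS):
    """Build (but do not send) a POST request mirroring the curl example."""
    unknown = [p for p in parsers if p not in VALID_PARSERS]
    if unknown:
        raise ValueError(f"unknown parser(s): {unknown}")
    body = {"raw_document": {"text": text}, "parsers": list(parsers)}
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "accept": "application/json",
            "content-type": "application/json",
            # Selects the model, like the Grpc-Metadata-mm-model-id curl header.
            "Grpc-Metadata-mm-model-id": model_id,
        },
        method="POST",
    )


def syntax_predict(text, **kwargs):
    """Send the request and return the decoded JSON (needs a running runtime)."""
    with urllib.request.urlopen(build_syntax_request(text, **kwargs)) as resp:
        return json.load(resp)
```

With a runtime running, syntax_predict("This is a test sentence.") should return a dictionary shaped like the REST response above. Note that urllib normalizes header-name capitalization (for example to Grpc-metadata-mm-model-id); HTTP header names are case-insensitive, so this should not change how the request is handled.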