Syntax

At a glance

The Syntax model performs fundamental NLP tasks on the input text:

Sentence detection
Tokenization: can't -> ca + n't
Part-of-Speech tagging: I thought -> I/PRON, thought/VERB
Lemmatization: I thought -> I/I, thought/think
Dependency parsing: I -> nsubj -> thought -> root

Class definition
`watson_nlp.blocks.syntax.izumo.IzumoTextProcessing`

For language support, see Supported languages.

Izumo: Provides high accuracy and throughput at moderate computational cost. The model is built using curated human knowledge (dictionaries, complementary rules) with machine learning algorithms (Logistic Regression, Conditional Random Fields). The implementation has been tested over many years in IBM products.

Pretrained models

Model names are listed below.

Model ID	Container Image
syntax_izumo_lang_af_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_af_stock:1.4.1
syntax_izumo_lang_ar_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ar_stock:1.4.1
syntax_izumo_lang_bs_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_bs_stock:1.4.1
syntax_izumo_lang_ca_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ca_stock:1.4.1
syntax_izumo_lang_cs_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_cs_stock:1.4.1
syntax_izumo_lang_da_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_da_stock:1.4.1
syntax_izumo_lang_de_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_de_stock:1.4.1
syntax_izumo_lang_el_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_el_stock:1.4.1
syntax_izumo_lang_en_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_en_stock:1.4.1
syntax_izumo_lang_es_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_es_stock:1.4.1
syntax_izumo_lang_fi_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_fi_stock:1.4.1
syntax_izumo_lang_fr_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_fr_stock:1.4.1
syntax_izumo_lang_he_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_he_stock:1.4.1
syntax_izumo_lang_hi_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_hi_stock:1.4.1
syntax_izumo_lang_hr_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_hr_stock:1.4.1
syntax_izumo_lang_it_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_it_stock:1.4.1
syntax_izumo_lang_ja_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ja_stock:1.4.1
syntax_izumo_lang_ko_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ko_stock:1.4.1
syntax_izumo_lang_nb_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_nb_stock:1.4.1
syntax_izumo_lang_nl_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_nl_stock:1.4.1
syntax_izumo_lang_nn_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_nn_stock:1.4.1
syntax_izumo_lang_pl_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_pl_stock:1.4.1
syntax_izumo_lang_pt_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_pt_stock:1.4.1
syntax_izumo_lang_ro_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ro_stock:1.4.1
syntax_izumo_lang_ru_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ru_stock:1.4.1
syntax_izumo_lang_sk_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_sk_stock:1.4.1
syntax_izumo_lang_sr_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_sr_stock:1.4.1
syntax_izumo_lang_sv_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_sv_stock:1.4.1
syntax_izumo_lang_tr_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_tr_stock:1.4.1
syntax_izumo_lang_zh-cn_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_zh-cn_stock:1.4.1
syntax_izumo_lang_zh-tw_stock	cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_zh-tw_stock:1.4.1
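
Every row in the table above follows the same naming pattern, so the image reference can be derived from a language code. The following is a minimal sketch of that pattern only; the helper names `syntax_model_id` and `container_image` are hypothetical and not part of any Watson NLP library, and the tag `1.4.1` is simply the one shown in the table.

```python
# Hypothetical helpers that reproduce the naming pattern seen in the table.
REGISTRY_PREFIX = "cp.icr.io/cp/ai/watson-nlp_"
IMAGE_TAG = "1.4.1"  # tag used by every row in the table


def syntax_model_id(lang: str) -> str:
    """Return the stock Izumo Syntax model ID for a language code such as 'en'."""
    return f"syntax_izumo_lang_{lang}_stock"


def container_image(model_id: str, tag: str = IMAGE_TAG) -> str:
    """Return the container image reference for a model ID."""
    return f"{REGISTRY_PREFIX}{model_id}:{tag}"


print(container_image(syntax_model_id("en")))
```

The sketch does not check that a language code is actually listed; consult the table for the supported set.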

The Syntax models use pre-defined algorithms and models suitable for each language. The output schema for part-of-speech tagging and dependency parsing follows the Universal Part-of-Speech v2 and Universal Dependency Relations v2 standards for all languages. This means that the Syntax models output the same set of part-of-speech tags and dependency relations for all supported languages. For details about the Universal Part-of-Speech and Universal Dependency v2 standards, see Grammatical properties.

Both the Izumo models and the transformer-based syntax models have been trained on Universal Dependency corpora that carry a commercial license, as well as on additional data generated through a novel silver data generation process invented by the IBM Research Yorktown and Tokyo labs. Models are continuously improved with feedback from users.

For Izumo models, the algorithms used to train the Syntax models for each language were chosen to provide a good trade-off between accuracy and runtime performance. The algorithms differ across language groups: as a general rule, simpler and faster algorithms are used for simpler languages such as English (Group A languages), and increasingly sophisticated (and slower) algorithms are used for more complex languages such as Arabic (Group B languages) and Chinese, Japanese, and Korean (Group C languages).

The transformer-based syntax model relies purely on machine learning, so the same algorithm is used for all supported languages. Currently, only the English stock model is available.

Running models

The Syntax model request accepts the following fields:

Field	Type	Required / Optional / Repeated	Description
`raw_document`	`watson_core_data_model.nlp.RawDocument`	required	The input document on which to perform Syntax predictions
`parsers`	`str`	repeated	List containing any of the following strings: `token`, `sentence`, `lemma`, `part_of_speech`, `dependency`

and returns the following responses:

Token

Groups a sequence of characters into a useful semantic unit for processing.

Sentence

Identifies sentence(s) within a text.

Lemma

Returns the base, or root, form of a word.

Part of Speech

Returns a part-of-speech code:

Name	Number	Description
POS_UNSET	0	Default value when no POS tagging performed
POS_ADJ	1	adjective
POS_ADP	2	adposition
POS_ADV	3	adverb
POS_AUX	4	auxiliary
POS_CCONJ	5	coordinating conjunction
POS_DET	6	determiner
POS_INTJ	7	interjection
POS_NOUN	8	noun
POS_NUM	9	numeral
POS_PART	10	particle
POS_PRON	11	pronoun
POS_PROPN	12	proper noun
POS_PUNCT	13	punctuation
POS_SCONJ	14	subordinating conjunction
POS_SYM	15	symbol
POS_VERB	16	verb
POS_X	17	other
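
When a client receives the numeric code rather than the name, the table above can be mirrored in a small lookup. This is a sketch that copies the names and numbers from the table; the `PartOfSpeech` class and the `ud_tag` helper are hypothetical and are not the generated protobuf enum shipped with the client library.

```python
from enum import IntEnum


class PartOfSpeech(IntEnum):
    """Hypothetical mirror of the part-of-speech codes listed in the table."""
    POS_UNSET = 0
    POS_ADJ = 1
    POS_ADP = 2
    POS_ADV = 3
    POS_AUX = 4
    POS_CCONJ = 5
    POS_DET = 6
    POS_INTJ = 7
    POS_NOUN = 8
    POS_NUM = 9
    POS_PART = 10
    POS_PRON = 11
    POS_PROPN = 12
    POS_PUNCT = 13
    POS_SCONJ = 14
    POS_SYM = 15
    POS_VERB = 16
    POS_X = 17


def ud_tag(code: int) -> str:
    """Return the short tag (for example 'PRON') for a numeric code."""
    return PartOfSpeech(code).name[len("POS_"):]


# Tags for the "I thought" example: pronoun (11) and verb (16).
print(ud_tag(11), ud_tag(16))
```

Note that `POS_UNSET` yields `UNSET`, which is not a Universal Part-of-Speech tag; it only signals that no tagging was performed.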

Dependency

Returns a dependency relation code:

Name	Number	Description
DEP_OTHER	0	other
DEP_ACL	1	clausal modifier of noun (adjectival clause)
DEP_ACL_RELCL	38	relative clause modifier
DEP_ADVCL	2	adverbial clause modifier
DEP_ADVMOD	3	adverbial modifier
DEP_ADVMOD_EMPH	39	emphasizing word, intensifier
DEP_ADVMOD_LMOD	40	locative adverbial modifier
DEP_AMOD	4	adjectival modifier
DEP_APPOS	5	appositional modifier
DEP_AUX	6	auxiliary
DEP_AUX_PASS	41	passive auxiliary
DEP_CASE	7	case marking
DEP_CC	8	coordinating conjunction
DEP_CC_PRECONJ	42	preconjunct
DEP_CCOMP	9	clausal complement
DEP_CLF	10	classifier
DEP_COMPOUND	11	compound
DEP_COMPOUND_LVC	44	light verb construction
DEP_COMPOUND_PRT	45	phrasal verb particle
DEP_COMPOUND_REDUP	46	reduplicated compounds
DEP_COMPOUND_SVC	47	serial verb compounds
DEP_CONJ	12	conjunct
DEP_COP	13	copula
DEP_CSUBJ	14	clausal subject
DEP_CSUBJ_PASS	43	clausal passive subject
DEP_DEP	15	unspecified dependency
DEP_DET	16	determiner
DEP_DET_NUMGOV	48	pronominal quantifier governing the case of the noun
DEP_DET_NUMMOD	49	pronominal quantifier agreeing in case with the noun
DEP_DET_POSS	50	possessive determiner
DEP_DISCOURSE	17	discourse element
DEP_DISLOCATED	18	dislocated elements
DEP_EXPL	19	expletive
DEP_EXPL_IMPERS	51	impersonal expletive
DEP_EXPL_PASS	52	reflexive pronoun used in reflexive passive
DEP_EXPL_PV	53	reflexive clitic with an inherently reflexive verb
DEP_FIXED	20	fixed multiword expression
DEP_FLAT	21	flat multiword expression
DEP_FLAT_FOREIGN	54	foreign words
DEP_FLAT_NAME	55	names
DEP_GOESWITH	22	goes with
DEP_IOBJ	23	indirect object
DEP_LIST	24	list
DEP_MARK	25	marker
DEP_NMOD	26	nominal modifier
DEP_NMOD_POSS	56	possessive nominal modifier
DEP_NMOD_TMOD	57	temporal modifier
DEP_NSUBJ	27	nominal subject
DEP_NSUBJ_PASS	58	passive nominal subject
DEP_NUMMOD	28	numeric modifier
DEP_NUMMOD_GOV	59	numeric modifier governing the case of the noun
DEP_OBJ	29	object
DEP_OBL	30	oblique nominal
DEP_OBL_AGENT	60	agent modifier
DEP_OBL_ARG	61	oblique argument
DEP_OBL_LMOD	62	locative modifier
DEP_OBL_TMOD	63	temporal modifier
DEP_ORPHAN	31	orphan
DEP_PARATAXIS	32	parataxis
DEP_PUNCT	33	punctuation
DEP_REPARANDUM	34	overridden disfluency
DEP_ROOT	35	root
DEP_VOCATIVE	36	vocative
DEP_XCOMP	37	open clausal complements
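
The names in the table appear to follow a regular pattern relative to Universal Dependencies labels: the `DEP_` prefix is dropped, the name is lowercased, and a subtype, if present, follows the base relation (for example, `DEP_ACL_RELCL` corresponds to `acl:relcl`). The sketch below applies that inferred pattern; `ud_relation` is a hypothetical helper, and the mapping is an assumption drawn from the table rather than a documented API.

```python
def ud_relation(name: str) -> str:
    """Convert a code name such as 'DEP_NSUBJ_PASS' to a UD-style label.

    Assumes the first underscore after the 'DEP_' prefix separates the
    base relation from its subtype.
    """
    prefix = "DEP_"
    if not name.startswith(prefix):
        raise ValueError(f"not a dependency code name: {name!r}")
    base, _, subtype = name[len(prefix):].lower().partition("_")
    return f"{base}:{subtype}" if subtype else base


for code_name in ("DEP_NSUBJ", "DEP_ACL_RELCL", "DEP_NSUBJ_PASS"):
    print(code_name, "->", ud_relation(code_name))
```

`DEP_OTHER` maps to `other`, which has no Universal Dependencies counterpart, so treat it as a fallback value.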

Example requests

REST API

curl -s \
  "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/SyntaxPredict" \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -H "Grpc-Metadata-mm-model-id: syntax_izumo_lang_en_stock" \
  -d '{ "raw_document": { "text": "This is a test sentence." }, "parsers": ["token","sentence","lemma","part_of_speech","dependency"] }'

Response

{"text":"This is a test sentence.", "producerId":{"name":"Izumo Text Processing", "version":"0.0.1"},
 "tokens":[
  {"span":{"begin":0, "end":4, "text":"This"}, "lemma":"this", "partOfSpeech":"POS_PRON", "dependency":{"relation":"DEP_NSUBJ", "identifier":1, "head":2}, "features":[]},
  {"span":{"begin":5, "end":7, "text":"is"}, "lemma":"be", "partOfSpeech":"POS_AUX", "dependency":{"relation":"DEP_COP", "identifier":3, "head":2}, "features":[]},
  {"span":{"begin":8, "end":9, "text":"a"}, "lemma":"a", "partOfSpeech":"POS_DET", "dependency":{"relation":"DEP_DET", "identifier":4, "head":2}, "features":[]},
  {"span":{"begin":10, "end":14, "text":"test"}, "lemma":"test", "partOfSpeech":"POS_NOUN", "dependency":{"relation":"DEP_COMPOUND", "identifier":5, "head":2}, "features":[]},
  {"span":{"begin":15, "end":23, "text":"sentence"}, "lemma":"sentence", "partOfSpeech":"POS_NOUN", "dependency":{"relation":"DEP_ROOT", "identifier":2, "head":0}, "features":[]},
  {"span":{"begin":23, "end":24, "text":"."}, "lemma":"", "partOfSpeech":"POS_PUNCT", "dependency":{"relation":"DEP_PUNCT", "identifier":6, "head":2}, "features":[]}
  ],
 "sentences":[
  {"span":{"begin":0, "end":24, "text":"This is a test sentence."}}
  ],
 "paragraphs":[
  {"span":{"begin":0, "end":24, "text":"This is a test sentence."}}
  ]
}
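
The `identifier` and `head` fields in the response above link each token to its syntactic head. The following sketch resolves those links offline, using a trimmed copy of the sample response. It assumes that a `head` of 0 (or a missing `head`) marks the root, which is inferred from the sample output and not stated elsewhere; `dependency_edges` is a hypothetical helper name.

```python
import json

# Trimmed copy of the sample response: only the fields used below are kept.
RESPONSE_JSON = """
{"tokens": [
  {"span": {"text": "This"}, "dependency": {"relation": "DEP_NSUBJ", "identifier": 1, "head": 2}},
  {"span": {"text": "is"}, "dependency": {"relation": "DEP_COP", "identifier": 3, "head": 2}},
  {"span": {"text": "a"}, "dependency": {"relation": "DEP_DET", "identifier": 4, "head": 2}},
  {"span": {"text": "test"}, "dependency": {"relation": "DEP_COMPOUND", "identifier": 5, "head": 2}},
  {"span": {"text": "sentence"}, "dependency": {"relation": "DEP_ROOT", "identifier": 2, "head": 0}},
  {"span": {"text": "."}, "dependency": {"relation": "DEP_PUNCT", "identifier": 6, "head": 2}}
]}
"""


def dependency_edges(response: dict) -> list:
    """Return (token text, relation, head text) triples for every token."""
    # Index tokens by their dependency identifier so heads can be looked up.
    by_id = {t["dependency"]["identifier"]: t for t in response["tokens"]}
    edges = []
    for token in response["tokens"]:
        dep = token["dependency"]
        head_id = dep.get("head", 0)  # assumption: 0 or absent means root
        head_text = by_id[head_id]["span"]["text"] if head_id else "ROOT"
        edges.append((token["span"]["text"], dep["relation"], head_text))
    return edges


edges = dependency_edges(json.loads(RESPONSE_JSON))
for text, relation, head in edges:
    print(f"{text} --{relation}--> {head}")
```

Looking heads up by `identifier` rather than by list position matters here, because the identifiers in the sample do not follow token order.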

Python

import grpc

from watson_nlp_runtime_client import common_service_pb2, common_service_pb2_grpc

# Open an insecure (plaintext) gRPC channel to the runtime and create the service stub.
client = common_service_pb2_grpc.NlpServiceStub(grpc.insecure_channel("localhost:8085"))

# Request all five parsers; the mm-model-id metadata entry selects the model.
response = client.SyntaxPredict(
    common_service_pb2.SyntaxRequest(
        raw_document={"text": "This is a test sentence."},
        parsers=("token", "sentence", "lemma", "part_of_speech", "dependency"),
    ),
    metadata=[("mm-model-id", "syntax_izumo_lang_en_stock")],
)

print(response)

Response

text: "This is a test sentence."
producer_id {
  name: "Izumo Text Processing"
  version: "0.0.1"
}
tokens {
  span {
    end: 4
    text: "This"
  }
  lemma: "this"
  part_of_speech: POS_PRON
  dependency {
    relation: DEP_NSUBJ
    identifier: 1
    head: 2
  }
}
tokens {
  span {
    begin: 5
    end: 7
    text: "is"
  }
  lemma: "be"
  part_of_speech: POS_AUX
  dependency {
    relation: DEP_COP
    identifier: 3
    head: 2
  }
}
tokens {
  span {
    begin: 8
    end: 9
    text: "a"
  }
  lemma: "a"
  part_of_speech: POS_DET
  dependency {
    relation: DEP_DET
    identifier: 4
    head: 2
  }
}
tokens {
  span {
    begin: 10
    end: 14
    text: "test"
  }
  lemma: "test"
  part_of_speech: POS_NOUN
  dependency {
    relation: DEP_COMPOUND
    identifier: 5
    head: 2
  }
}
tokens {
  span {
    begin: 15
    end: 23
    text: "sentence"
  }
  lemma: "sentence"
  part_of_speech: POS_NOUN
  dependency {
    relation: DEP_ROOT
    identifier: 2
  }
}
tokens {
  span {
    begin: 23
    end: 24
    text: "."
  }
  part_of_speech: POS_PUNCT
  dependency {
    relation: DEP_PUNCT
    identifier: 6
    head: 2
  }
}
sentences {
  span {
    end: 24
    text: "This is a test sentence."
  }
}
paragraphs {
  span {
    end: 24
    text: "This is a test sentence."
  }
}