I'm trying to use the TextExtract operator from the com.ibm.streams.text toolkit, and in my (uncompiledModule) .aql file I am doing an "extract part_of_speech". When I attempt to run this in Streams Studio, I get the following error:
RuntimeException: The Standard tokenizer does not support part of speech tagging. Use the Multilingual tokenizer and part of speech tagger, or another compatible tokenizer that supports part of speech tagging instead.
This seems to be because the default tokenizer used by Streams is the 'Standard' tokenizer. From some Google searches, it looks like in pure Java you could call "SystemT.setTokenizerConfig(TokenizerConfig)" to set the tokenizer. The same searches also suggest that the default tokenizer in BigInsights Eclipse usage is the Multilingual tokenizer.
Is there a way to change the tokenizer being used in Streams / Streams Studio to use the Multilingual tokenizer (or at least a tokenizer that can handle part_of_speech rules)?
I can't find much information in the Information Center on this, aside from statements like the following: "Part of speech extraction works only when Text Analytics is using the Multilingual tokenizer. If the system uses the Standard tokenizer, a part_of_speech extraction generates an error." So I understand what the problem is, but I can't (yet) see what the solution is.
Any help you can provide would be most appreciated. Thanks in advance.