--learnPhrases Parameters

The following parameters are available for --learnPhrases:

  • --corpus: Path to a plaintext file containing the corpus to train on.
  • --maxPhrases: Maximum number of phrases to extract. (default: 100)
  • --delimiter: Delimiter placed between the words of a phrase when phrases are written to output. (default: a single space)
  • --loadModel: Path to a language model that was previously trained and saved with --persistModel. The loaded model is used to re-output phrases; if the --corpus option is also passed, the model is trained further on that corpus.
  • --minPhraseFrequency: Minimum number of times a phrase must occur in the given corpus to be considered for the final output. (default: 2)
  • --minPhraseWordFrequency: Minimum number of times each word in a phrase must occur in the given corpus for the phrase to be considered for the final output. (default: 5)
  • --minScore: Minimum score that the scoring algorithm must assign to a phrase for the phrase to be output. (default: 100)
  • --pear: Path to the PEAR file used to tokenize the corpus. The path refers to a location on the machine on which Ontolection Trainer is running. If Ontolection Trainer runs on the same system as the Watson™ Explorer Engine instance it is connected to, the PEAR files can be found in the data/pears directory within the Watson Explorer Engine installation directory.
  • --persistModel: Path to save the trained language model to.
  • --phraseLength: Maximum length of extracted phrases. (default: 3)
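To make the interaction between the threshold parameters concrete, the following Python sketch shows one way such filters could combine. It is a hypothetical illustration, not Ontolection Trainer's implementation. The function name extract_phrases, the assumption that candidate phrases are contiguous word sequences of 2 to --phraseLength words, the order in which the filters are applied, and the caller-supplied score function (a stand-in for the undocumented scoring algorithm) are all assumptions. The keyword arguments mirror the documented defaults.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable


def extract_phrases(
    tokens: list[str],
    score: Callable[[tuple[str, ...]], float],  # hypothetical stand-in for the real scoring algorithm
    max_phrases: int = 100,                # --maxPhrases
    delimiter: str = " ",                  # --delimiter
    min_phrase_frequency: int = 2,         # --minPhraseFrequency
    min_phrase_word_frequency: int = 5,    # --minPhraseWordFrequency
    min_score: float = 100,                # --minScore
    phrase_length: int = 3,                # --phraseLength
) -> list[str]:
    """Sketch: filter candidate n-grams by frequency and score, then keep the top ones."""
    word_counts = Counter(tokens)
    # Assumption: candidates are contiguous n-grams of 2..phrase_length words.
    phrase_counts = Counter(
        tuple(tokens[i:i + n])
        for n in range(2, phrase_length + 1)
        for i in range(len(tokens) - n + 1)
    )
    candidates: list[tuple[float, tuple[str, ...]]] = []
    for phrase, count in phrase_counts.items():
        # The phrase itself must occur often enough.
        if count < min_phrase_frequency:
            continue
        # Every word in the phrase must occur often enough on its own.
        if any(word_counts[w] < min_phrase_word_frequency for w in phrase):
            continue
        s = score(phrase)
        # The phrase must reach the minimum score.
        if s < min_score:
            continue
        candidates.append((s, phrase))
    # Highest-scoring phrases first; ties keep first-seen order (stable sort).
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [delimiter.join(p) for _, p in candidates[:max_phrases]]
```

For example, with a toy corpus in which "new york city" repeats five times, raising the word-frequency threshold above five removes every candidate, because no single word occurs more than five times:

```python
tokens = ("new york city " * 5).split()
extract_phrases(tokens, score=lambda p: 100.0 * len(p), max_phrases=1, delimiter="_")
extract_phrases(tokens, score=lambda p: 100.0 * len(p), min_phrase_word_frequency=6)
```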