Removing Bad Clusters

Next, look for more words that belong on the stoplist by testing the clustering on different sets of documents and examining the labels that surface. On a corporate intranet, for example, issue general queries such as the name of the company, the names of its products, or your own name. Identify cluster labels that tend to repeat or that describe their clusters poorly, and discard them as follows:

  • If the label is a single word, tell Watson™ Explorer Engine to treat it as a clustering stopword.
  • If the label is two or three words long, and all of the words seem individually uninformative (each word would be a poor cluster label by itself), treat each of them as a clustering stopword.
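
The review procedure above can be sketched as a small offline helper. This is an illustration only and does not call any Watson Explorer Engine API: the function names, the `labels_by_query` dictionary, and the `uninformative_words` set are all hypothetical, and it assumes you have copied the cluster labels for each test query by hand and decided for yourself which words are uninformative.

```python
from collections import Counter


def repeated_labels(labels_by_query, min_queries=2):
    """Return labels (lowercased) that show up for at least min_queries queries.

    labels_by_query is a hypothetical dict: query string -> list of cluster
    labels that were observed for that query.
    """
    counts = Counter()
    for labels in labels_by_query.values():
        # Count each label once per query, ignoring case.
        counts.update({label.lower() for label in labels})
    return {label for label, n in counts.items() if n >= min_queries}


def stopword_candidates(bad_labels, uninformative_words):
    """Apply the two rules to labels a reviewer has already judged to be bad.

    uninformative_words is the reviewer's own list of words that would make
    poor cluster labels by themselves (an assumption, supplied by hand).
    """
    candidates = set()
    for label in bad_labels:
        words = label.lower().split()
        if len(words) == 1:
            # Rule 1: a single-word bad label becomes a stopword.
            candidates.add(words[0])
        elif len(words) in (2, 3) and all(w in uninformative_words for w in words):
            # Rule 2: every word is individually uninformative.
            candidates.update(words)
    return candidates


# Example with made-up labels from two test queries.
labels_by_query = {
    "acme": ["Home", "Click Here", "Product Pricing"],
    "widget": ["Home", "Click Here", "Widget Manuals"],
}
bad = repeated_labels(labels_by_query)
print(sorted(stopword_candidates(bad, {"click", "here"})))
# prints ['click', 'here', 'home']
```

A multi-word label whose words are not all uninformative yields no candidates here, because the two rules above do not cover that case. The resulting words would still need to be entered as clustering stopwords through the product's own configuration.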

