Netezza Divisive Clustering Nugget - Settings Tab

On the Settings tab, you can set options for scoring the model.

Include input fields. If selected, this option passes all the original input fields downstream, appending the extra modeling field or fields to each row of data. If you clear this check box, only the Record ID field and the extra modeling fields are passed on, so the stream runs more quickly.

Distance measure. The method to be used for measuring the distance between data points; greater distances indicate greater dissimilarities. The options are:

  • Euclidean. (default) The distance between two points is the length of the straight line joining them.
  • Manhattan. The distance between two points is calculated as the sum of the absolute differences between their coordinates.
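  • Canberra. Similar to Manhattan distance, but each coordinate difference is scaled by the magnitude of the values being compared, which makes the measure more sensitive to data points closer to the origin.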
  • Maximum. The distance between two points is calculated as the greatest of their differences along any coordinate dimension.
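
The four measures above can be illustrated with a minimal Python sketch. It is not the Netezza implementation: the function names are hypothetical, the formulas are the standard textbook definitions, and the Canberra version assumes the common convention of skipping coordinates where both values are zero.

```python
import math
from typing import Sequence


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    # Length of the straight line joining p and q.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p: Sequence[float], q: Sequence[float]) -> float:
    # Sum of the absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))


def canberra(p: Sequence[float], q: Sequence[float]) -> float:
    # Each absolute difference is divided by |a| + |b|, so differences
    # between small values (near the origin) weigh more heavily.
    total = 0.0
    for a, b in zip(p, q):
        denom = abs(a) + abs(b)
        if denom:  # assumption: a 0/0 term contributes nothing
            total += abs(a - b) / denom
    return total


def maximum(p: Sequence[float], q: Sequence[float]) -> float:
    # Greatest absolute difference along any single coordinate.
    return max(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    p, q = (1.0, 2.0), (4.0, 6.0)
    for measure in (euclidean, manhattan, canberra, maximum):
        print(measure.__name__, measure(p, q))
```

For the points (1, 2) and (4, 6), the coordinate differences are 3 and 4, so the Euclidean distance is 5, the Manhattan distance is 7, and the Maximum distance is 4.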

Applied hierarchy level. Divisive clustering produces a hierarchy of clusters; this option sets the level of that hierarchy that should be applied to the data when scoring.
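
As a rough illustration of what applying a hierarchy level can mean, the Python sketch below assumes a hypothetical binary cluster tree whose nodes are numbered heap-style (root 1, children 2n and 2n+1), with the root at level 0. The numbering scheme, the level convention, and the function name are assumptions made for this example only and may not match how the Netezza model stores its clusters.

```python
def cluster_at_level(leaf_id: int, level: int) -> int:
    """Return the ancestor of leaf_id at the given level (root = level 0).

    Assumes hypothetical heap-style numbering: root is 1 and the
    children of node n are 2n and 2n + 1.
    """
    if leaf_id < 1 or level < 0:
        raise ValueError("leaf_id must be >= 1 and level must be >= 0")
    depth = leaf_id.bit_length() - 1  # depth of the node in the tree
    if level >= depth:
        # The requested level is at or below this node; keep the node itself.
        return leaf_id
    # Each right shift moves one step up toward the root.
    return leaf_id >> (depth - level)


if __name__ == "__main__":
    for lvl in range(4):
        print(lvl, cluster_at_level(13, lvl))
```

Under these assumptions, a record that falls in leaf cluster 13 is reported as cluster 1 at level 0, cluster 3 at level 1, cluster 6 at level 2, and cluster 13 at level 3; a shallower level therefore yields fewer, broader clusters.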