Using a Distribution Node

Distribution nodes are used to show the distribution of symbolic values in a dataset. They are frequently used before manipulation nodes to explore the data and correct any imbalances. For example, if instances of respondents without children occur much more frequently than other types of respondents, you might want to reduce these instances so that a more useful rule can be generated in later data mining operations. A Distribution node will help you to examine and make decisions about such imbalances.

The Distribution node is unusual in that it produces both a graph and a table to analyze your data.

Figure 1. Distribution graph showing the number of people with or without children who responded to a marketing campaign
Figure 2. Distribution table showing the proportion of people with or without children who responded to a marketing campaign
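As a rough illustration of what the distribution table reports, the following Python sketch counts each symbolic value and computes its share of all records. It is not how the Distribution node is implemented; the function name distribution_table and the sample data are hypothetical and chosen only for this example.

```python
from collections import Counter


def distribution_table(values):
    """Return (value, count, percent) rows, most frequent value first.

    Hypothetical helper for illustration; not part of any product API.
    """
    counts = Counter(values)
    total = sum(counts.values())
    # An empty input yields no rows, so the division below is never reached.
    return [
        (value, n, round(100.0 * n / total, 2))
        for value, n in counts.most_common()
    ]


# Made-up sample: six respondents without children, two with children.
children = ["No"] * 6 + ["Yes"] * 2

for value, count, percent in distribution_table(children):
    print(f"{value:<6}{count:>6}{percent:>9.2f}%")
```

A heavily skewed percent column in such a table is the kind of imbalance that the Distribution node is meant to reveal.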

Once you have created a distribution table and graph and examined the results, you can use options from the menus to group values, copy values, and generate a number of nodes for data preparation. In addition, you can copy or export the graph and table information for use in other applications, such as MS Word or MS PowerPoint. See the topic Printing, saving, copying, and exporting graphs for more information.

To Select and Copy Values from a Distribution Table

  1. Click and drag over the rows to select a set of values. Alternatively, choose Select All from the Edit menu.
  2. From the Edit menu, choose Copy Table or Copy Table (inc. field names).
  3. Paste from the clipboard into the desired application.

    Note: The bars themselves are not copied; only the table values are. As a result, overlaid values are not displayed in the copied table.

To Group Values from a Distribution Table

  1. Ctrl+click each value that you want to include in the group.
  2. From the Edit menu, choose Group.

Note: When you group and ungroup values, the graph on the Graph tab is automatically redrawn to show the changes.

You can also:

  • Ungroup values by selecting the group name in the distribution list and choosing Ungroup from the Edit menu.
  • Edit groups by selecting the group name in the distribution list and choosing Edit group from the Edit menu. This opens a dialog box where values can be shifted to and from the group.
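Conceptually, grouping merges the counts of the selected values under a single group name. The following Python sketch shows that idea only; the function group_values, the field values, and the group name are hypothetical, and the sketch does not describe the product's internal behavior.

```python
from collections import Counter


def group_values(counts, members, group_name):
    """Merge the counts of every value in `members` under `group_name`.

    Values outside `members` keep their own counts. Hypothetical helper.
    """
    grouped = Counter()
    for value, n in counts.items():
        key = group_name if value in members else value
        grouped[key] += n
    return grouped


# Made-up counts for a marital status field.
counts = Counter({"Married": 8, "Single": 5, "Divorced": 2, "Widowed": 1})

grouped = group_values(counts, {"Divorced", "Widowed"}, "Formerly married")
for value, n in grouped.most_common():
    print(value, n)
```

Ungrouping is the reverse operation: the group's combined count is split back into the original values, which is why the original counts need to be kept alongside any grouping.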

Generate Menu Options

You can use options on the Generate menu to select a subset of data, derive a flag field, regroup values, reclassify values, or balance the data, working from either the graph or the table. These operations generate a data preparation node and place it on the stream canvas. To use the generated node, connect it to an existing stream. See the topic Generating Nodes from Graphs for more information.
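To make the idea of balancing concrete, the following Python sketch computes one possible set of reduction factors: each value is scaled down so that its expected count matches that of the least frequent value. This is an assumption made for illustration, not a description of the node that the Generate menu produces; the function name reduce_factors and the counts are hypothetical.

```python
from collections import Counter


def reduce_factors(counts):
    """Return a factor in (0, 1] per value that scales its count down to
    the count of the least frequent value. Hypothetical helper.
    """
    if not counts:
        return {}
    smallest = min(counts.values())
    return {value: smallest / n for value, n in counts.items()}


# Made-up counts: respondents without children dominate the data.
counts = Counter({"No": 600, "Yes": 200})

for value, factor in reduce_factors(counts).items():
    # Keeping each record with probability `factor` evens out the values.
    print(f"{value}: keep about {factor:.2f} of the records")
```

Under this scheme the rarest value always has a factor of 1.0, so none of its records are discarded.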