Binning (Grouping) Data Values

Instead of displaying all data values individually, you can bin them. Binning involves grouping individual data values into one instance of a graphic element. A bin may be a point that indicates the number of cases in the bin. Or it may be a histogram bar, whose height indicates the number of cases in the bin.

Use binning when you have such a large number of individual graphic elements in the chart that you cannot distinguish them. To bin, the Chart Editor:

  • Divides the data area into a grid of bins of the specified size. For points/markers, you can also specify the shape of the bin.
  • Counts the cases within each bin.
  • Assigns a color or size corresponding to the count.
  • Draws the graphic element on the bin. For points/markers, you can specify the position of the marker relative to each bin (midpoint versus centroid).
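
The steps above can be sketched in a few lines of Python. This is only an illustration of rectangular grid binning, not the Chart Editor's actual implementation: the function names grid_bin and marker_size, the treatment of points on the upper edge, and the linear size scale are all assumptions made for the example.

```python
from collections import Counter

def grid_bin(points, x_range, y_range, nx, ny):
    """Count points per cell of an nx-by-ny grid laid over the data area."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # point lies outside the data area
        # Scale each coordinate to a cell index; clamp so that a point on
        # the upper edge falls into the last cell (an assumption).
        i = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        j = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        counts[(i, j)] += 1
    return counts

def marker_size(count, max_count, min_size=4.0, max_size=20.0):
    """Hypothetical linear mapping from a bin count to a marker size."""
    return min_size + (max_size - min_size) * count / max_count

points = [(1, 1), (1.5, 1.2), (9, 9), (10, 10), (4, 6)]
counts = grid_bin(points, (0, 10), (0, 10), nx=2, ny=2)
largest = max(counts.values())
sizes = {cell: marker_size(n, largest) for cell, n in counts.items()}
```

Here the five points collapse into three occupied bins, and each bin would be drawn as a single marker whose size reflects its count.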

The option to bin markers/points is always available. Binning other elements is typically available only when the graph is a type of histogram or the data are unaggregated.

How to Bin Any Element

  1. From the menus choose:

    Options > Bin Element

  2. If necessary, use the Binning tab to refine the binning.

Without a selection, all available axes are binned. You can use the Binning tab to change the number of binned axes. You can also select a particular axis or element before binning to bin only the selected axis or element.

How to Remove Binning

  1. From the menus choose:

    Options > Un-Bin Element

When you remove binning for a histogram, the result may not be useful. You will end up with one bar for each unique value on the x axis. Because data values seldom coincide exactly, most bars will show a count of 1.
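
To see why an un-binned histogram is rarely informative, consider this small sketch. The sample values are made up for illustration, and counting with Counter is only a stand-in for what the chart displays.

```python
from collections import Counter

# Hypothetical continuous measurements; only one value repeats.
values = [12.1, 15.7, 15.7, 18.3, 20.9]

# Without binning, every unique value becomes its own bar.
bars = sorted(Counter(values).items())
```

Five cases produce four bars, and all but one of them have a height of 1, so the shape of the distribution is not visible.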

Using the Binning Tab with Marker/Point Elements

Count indicator. Choose whether the color or the size of the data markers indicates the number of data points in each bin. If you use size, circular markers work best. If you use color, be aware that the markers' border color is used when you first bin the data points. Because the marker border is often black, use the Marker tab to change the color. Do not use very dark or very light colors for the markers, because the variation in intensity of these colors is too small to let you distinguish the bin counts.

Position of markers. Choose where, in the bin, the symbol is displayed. Midpoint is the graphical middle of the bin and makes it less likely that the symbols will overlap. Centroid positions the symbol at the centroid location of the points it represents. The coordinates of the centroid are the weighted means for each axis. A missing value in any one of the variables excludes the case from the calculation. Changing the scale does not affect the calculation of the centroid.
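
The difference between the two positions can be illustrated as follows. This is a minimal sketch that assumes rectangular bins and per-case weights; the function names bin_midpoint and bin_centroid and the use of None for a missing value are assumptions for the example, not the Chart Editor's API.

```python
def bin_midpoint(x_lo, x_hi, y_lo, y_hi):
    """Graphical middle of a rectangular bin; independent of the data."""
    return ((x_lo + x_hi) / 2, (y_lo + y_hi) / 2)

def bin_centroid(cases):
    """Weighted mean per axis of the cases in a bin.

    cases is a list of (x, y, weight) tuples. A case with a missing
    (None) coordinate is excluded from the calculation.
    """
    valid = [(x, y, w) for x, y, w in cases if x is not None and y is not None]
    total = sum(w for _, _, w in valid)
    if total == 0:
        return None  # no usable cases in this bin
    cx = sum(x * w for x, _, w in valid) / total
    cy = sum(y * w for _, y, w in valid) / total
    return (cx, cy)

# One bin covering 0-4 on both axes; the third case has a missing y value.
cases = [(1, 1, 1), (3, 1, 1), (2, None, 1), (2, 4, 2)]
midpoint = bin_midpoint(0, 4, 0, 4)
centroid = bin_centroid(cases)
```

For this bin the midpoint is (2.0, 2.0), while the centroid is (2.0, 2.5) because the case with weight 2 pulls the marker upward.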

Bin layout. Choose how the bins are defined. Grid divides the data frame into rectangles to create each bin. Hex divides the data frame into hexagons to create each bin. The data points falling into each shape are binned together.

Bin size. Click Automatic to let the Chart Editor pick the number of bins based on the underlying data. Click Custom to enter a specific number of bins for each dimension.

Using the Binning Tab with Other Elements

Axis selection. Select an option for choosing the axes on which binning occurs.

For each axis, you can set the following options.

Automatic. Let the Chart Editor pick bin sizes and widths based on the underlying data.

Custom. Choose your own values for the bin sizes and widths. You can either specify the number of bins or the width of each bin. The width also affects the number of bins. For example, if the axis range is 0–100 and you specify the width as 5, there will be 20 bins. The greater the number of bins, the more detailed the histogram is. However, more bins may make the histogram too irregular to see the shape of the distribution. Note: If the variable being binned is a date, the units used for the width are days. Therefore, specifying a width of 30 indicates 30 days.
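
The relationship between bin width and the number of bins can be checked with a short calculation. The sketch below assumes that the bins tile the axis range and that a partial final bin is rounded up; bin_count is a hypothetical helper, and the dates are illustrative.

```python
import math
from datetime import date

def bin_count(axis_min, axis_max, width):
    """Number of fixed-width bins needed to cover the axis range."""
    return math.ceil((axis_max - axis_min) / width)

# Numeric axis: range 0-100 with a width of 5.
numeric_bins = bin_count(0, 100, 5)

# Date axis: the width is expressed in days, so measure the range in days.
span_days = (date(2001, 12, 31) - date(2001, 1, 1)).days
date_bins = bin_count(0, span_days, 30)
```

numeric_bins is 20, matching the example in the text. The date range spans 364 days, so a width of 30 yields 13 bins under the round-up assumption.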

Custom value for anchor. Specify the starting value of a bin. By default, the first bin includes the lowest data value, and the anchor is chosen so that the bin boundaries fall on convenient values. If you specify a value less than the lowest data value, this is the starting value of the first bin. For example, you might want the values of 0–5 to be included in the first bin even though the lowest value in the dataset is 6. In this case, you might set the Custom value for anchor to 0. Note: If the variable being binned is a date, the starting value must be a date literal in the date format specified for the variable on the Variable View tab in the Data Editor (for example, 01/01/2001).
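
The effect of a custom anchor can be sketched as follows. The helper names, the sample data, and the choice of bins that are closed on the left and open on the right are assumptions for illustration; the Chart Editor may treat boundary values differently.

```python
import math

def bin_edges(anchor, width, data_max):
    """Edges of fixed-width bins that start at anchor and cover data_max."""
    n_bins = math.floor((data_max - anchor) / width) + 1
    return [anchor + k * width for k in range(n_bins + 1)]

def histogram(data, anchor, width):
    """Count values per bin, where bin k covers [edge k, edge k+1)."""
    if anchor > min(data):
        raise ValueError("anchor must not exceed the lowest data value")
    edges = bin_edges(anchor, width, max(data))
    counts = [0] * (len(edges) - 1)
    for value in data:
        counts[math.floor((value - anchor) / width)] += 1
    return edges, counts

# The lowest value is 6, but an anchor of 0 makes the first bin start at 0.
data = [6, 8, 12, 19]
edges, counts = histogram(data, anchor=0, width=5)
```

With an anchor of 0 and a width of 5, the edges are 0, 5, 10, 15, and 20. The first bin is empty, but it is still drawn on the axis, which is the behavior the example in the text describes.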