Binning Variables
The Visual Binning main dialog box provides the following information for the scanned variables:
- Scanned Variable List
- Displays the variables you selected in the initial dialog box. To sort the list by measurement level (scale or ordinal) or by variable label or name, click the column headings.
- Cases Scanned
- Indicates the number of cases scanned. All scanned cases without user-missing or system-missing values for the selected variable are used to generate the distribution of values used in calculations in Visual Binning, including the histogram displayed in the main dialog box and cutpoints based on percentiles or standard deviation units.
- Missing Values
- Indicates the number of scanned cases with user-missing or system-missing values. Missing values are not included in any of the binned categories. See the topic User-Missing Values in Visual Binning for more information.
- Current Variable
- The name and variable label (if any) for the currently selected variable that will be used as the basis for the new, binned variable.
- Binned Variable
- Name and optional variable label for the new, binned variable.
- Name
- You must enter a name for the new variable. Variable names must be unique and must follow variable naming rules. See the topic Variable names for more information.
- Label
- You can enter a descriptive variable label up to 255 characters long. The default variable label is the variable label (if any) or variable name of the source variable with (Binned) appended to the end of the label.
- Minimum and Maximum
- Minimum and maximum values for the currently selected variable, based on the scanned cases and not including values defined as user-missing.
- Non-missing Values
- The histogram displays the distribution of non-missing values for the currently selected variable, based on the scanned cases.
- After you define bins for the new variable, vertical lines on the histogram indicate the cutpoints that define the bins.
- You can click and drag the cutpoint lines to different locations on the histogram, changing the bin ranges.
- You can remove bins by dragging cutpoint lines off the histogram.
Note: The histogram (displaying nonmissing values), the minimum, and the maximum are based on the scanned values. If you do not include all cases in the scan, the true distribution may not be accurately reflected, particularly if the data file has been sorted by the selected variable. If you scan zero cases, no information about the distribution of values is available.
- Grid
- Displays the values that define the upper endpoints of each bin and optional value labels for each bin.
- Value
- The values that define the upper endpoints of each bin. You can enter values or use Make Cutpoints to automatically create bins based on selected criteria. By default, a cutpoint with a value of HIGH is automatically included. This bin will contain any nonmissing values above the other cutpoints. The bin defined by the lowest cutpoint will include all nonmissing values lower than or equal to that value (or simply lower than that value, depending on how you define upper endpoints).
- Label
- Optional, descriptive labels for the values of the new, binned variable. Since the values of the new variable will simply be sequential integers from 1 to n, labels that describe what the values represent can be very useful. You can enter labels or use Make Labels to automatically create value labels.
Deleting a bin from the grid
- Right-click either the Value or Label cell for the bin.
- From the pop-up menu, select Delete Row.
Note: If you delete the HIGH bin, any cases with values higher than the last specified cutpoint value will be assigned the system-missing value for the new variable.
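The effect described in the note can be sketched in Python. This is an illustration only, not the product's implementation: the function name `bin_or_missing` and the `keep_high` flag are hypothetical, `None` stands in for the system-missing value, and upper endpoints are assumed to be included (<=).

```python
from bisect import bisect_left

def bin_or_missing(value, cutpoints, keep_high=True):
    """Return the 1-based bin number, or None when no bin applies.

    cutpoints must be sorted ascending. None is a stand-in for the
    system-missing value (an assumption made for this sketch).
    """
    # Index of the first cutpoint that is >= value (upper endpoints included).
    index = bisect_left(cutpoints, value)
    # A value above the last cutpoint belongs to the HIGH bin, if it exists.
    if index == len(cutpoints) and not keep_high:
        return None
    return index + 1

cutpoints = [25, 50, 75]
print(bin_or_missing(80, cutpoints, keep_high=True))   # 4 (the HIGH bin)
print(bin_or_missing(80, cutpoints, keep_high=False))  # None
```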
Deleting all labels or all defined bins
- Right-click anywhere in the grid.
- From the pop-up menu, select either Delete All Labels or Delete All Cutpoints.
- Upper Endpoints
- Controls treatment of upper endpoint values entered in the Value column of the grid.
- Included (<=)
- Cases with the value specified in the Value cell are included in the binned category. For example, if you specify values of 25, 50, and 75, cases with a value of exactly 25 will go in the first bin, since this will include all cases with values less than or equal to 25.
- Excluded (<)
- Cases with the value specified in the Value cell are not included in the binned category. Instead, they are included in the next bin. For example, if you specify values of 25, 50, and 75, cases with a value of exactly 25 will go in the second bin rather than the first, since the first bin will contain only cases with values less than 25.
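The difference between the two settings can be sketched in Python using the 25, 50, and 75 example from the text. The function name `assign_bin` and the `upper_included` parameter are hypothetical names chosen for this illustration; the sketch assumes sorted cutpoints and a final catch-all bin for values above the last cutpoint.

```python
from bisect import bisect_left, bisect_right

def assign_bin(value, cutpoints, upper_included=True):
    """Return the 1-based bin number for value.

    cutpoints must be sorted ascending. Values above the last cutpoint
    fall into one extra bin at the end (the HIGH bin).
    """
    # Included (<=): a value equal to a cutpoint stays in that cutpoint's bin.
    # Excluded (<): a value equal to a cutpoint moves to the next bin.
    locate = bisect_left if upper_included else bisect_right
    return locate(cutpoints, value) + 1

cutpoints = [25, 50, 75]
print(assign_bin(25, cutpoints, upper_included=True))   # 1
print(assign_bin(25, cutpoints, upper_included=False))  # 2
```

Values that do not coincide with a cutpoint, such as 24 or 26, land in the same bin under either setting.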
- Make Cutpoints
- Generates binned categories automatically for equal width intervals, intervals with the same number of cases, or intervals based on standard deviations. This option is not available if you scanned zero cases.
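The three kinds of cutpoints can be approximated in Python as follows. This is a rough sketch under stated assumptions, not the procedure the product uses: all function and parameter names are hypothetical, the percentile interpolation method is an arbitrary choice, and including the mean itself as a cutpoint in the standard deviation case is an assumption.

```python
import statistics

def equal_width_cutpoints(first_cutpoint, width, count):
    """Cutpoints spaced `width` apart, starting at `first_cutpoint`."""
    return [first_cutpoint + i * width for i in range(count)]

def equal_percentile_cutpoints(values, n_cutpoints):
    """Cutpoints that split the values into bins of roughly equal size.

    n_cutpoints cutpoints produce n_cutpoints + 1 bins.
    """
    return statistics.quantiles(values, n=n_cutpoints + 1, method="inclusive")

def std_dev_cutpoints(values, multiples=(1,)):
    """Cutpoints at the mean and at +/- each multiple of the sample
    standard deviation (the mean as a cutpoint is an assumption)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    offsets = sorted({-m for m in multiples} | {0} | set(multiples))
    return [mean + k * sd for k in offsets]

print(equal_width_cutpoints(10, 10, 3))  # [10, 20, 30]
print(std_dev_cutpoints([1, 2, 3]))      # [1.0, 2.0, 3.0]
```

Because each function derives cutpoints from the supplied values, a scan that covers only part of the data would yield cutpoints that reflect only that part.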
- Make Labels
- Generates descriptive labels for the sequential integer values of the new, binned variable, based on the values in the grid and the specified treatment of upper endpoints (included or excluded).
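How labels can follow from the grid values and the endpoint setting is sketched below in Python. The label wording is a hypothetical format chosen for clarity and is not the text that Make Labels actually produces; `make_labels` and `upper_included` are illustrative names, and the last label describes the catch-all HIGH bin.

```python
def make_labels(cutpoints, upper_included=True):
    """Build one label per bin from sorted, non-empty cutpoints."""
    # The comparison operators depend on whether upper endpoints
    # are included (<=) or excluded (<).
    upper_op = "<=" if upper_included else "<"
    lower_op = ">" if upper_included else ">="
    labels = [f"{upper_op} {cutpoints[0]}"]
    for low, high in zip(cutpoints, cutpoints[1:]):
        labels.append(f"{lower_op} {low} and {upper_op} {high}")
    labels.append(f"{lower_op} {cutpoints[-1]}")  # the HIGH bin
    return labels

print(make_labels([25, 50, 75], upper_included=True))
# ['<= 25', '> 25 and <= 50', '> 50 and <= 75', '> 75']
print(make_labels([25, 50, 75], upper_included=False))
# ['< 25', '>= 25 and < 50', '>= 50 and < 75', '>= 75']
```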
- Reverse scale
- By default, values of the new, binned variable are ascending sequential integers from 1 to n. Reversing the scale makes the values descending sequential integers from n to 1.
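Reversing the scale amounts to a simple mapping from each bin number to its mirror position. A minimal Python sketch, with the hypothetical function name `reverse_scale`:

```python
def reverse_scale(bin_number, n_bins):
    """Map ascending bin numbers 1..n to descending n..1."""
    return n_bins - bin_number + 1

print([reverse_scale(b, 4) for b in [1, 2, 3, 4]])  # [4, 3, 2, 1]
```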
- Copy Bins
- You can copy the binning specifications from another variable to the currently selected variable or from the selected variable to multiple other variables.