Tiles (Equal Count or Sum)

The tile binning method creates nominal fields that can be used to split scanned records into percentile groups (or quartiles, deciles, and so on) so that each group contains the same number of records, or so that the sum of the values in each group is equal. Records are ranked in ascending order based on the value of the specified bin field, so that records with the lowest values for the selected bin field are assigned a rank of 1, the next set of records is ranked 2, and so on. The threshold values for each bin are generated automatically based on the data and the tiling method used.

Tile name extension. Specify an extension used for field(s) generated using standard p-tiles. The default extension is _TILE plus N, where N is the tile number. You may also specify whether the extension is added to the start (Prefix) or end (Suffix) of the field name. For example, you could generate a new field called income_BIN4.

Custom tile extension. Specify an extension used for a custom tile range. The default is _TILEN. Note that N in this case will not be replaced by the custom number.

Available p-tiles are:

  • Quartile. Generate 4 bins, each containing 25% of the cases.
  • Quintile. Generate 5 bins, each containing 20% of the cases.
  • Decile. Generate 10 bins, each containing 10% of the cases.
  • Vingtile. Generate 20 bins, each containing 5% of the cases.
  • Percentile. Generate 100 bins, each containing 1% of the cases.
  • Custom N. Select to specify the number of bins. For example, a value of 3 would produce 3 banded categories (2 cut points), each containing 33.3% of the cases.

Note that if there are fewer discrete values in the data than the number of tiles specified, not all tiles will be used. In such cases, the new distribution is likely to reflect the original distribution of your data.
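The relationship between the number of bins, the number of cut points, and the nominal share of cases per bin can be checked with a few lines of Python. This is only an illustrative sketch: the names P_TILES and describe_tiles are hypothetical and are not part of the product.

```python
# Number of bins for each named p-tile option listed above.
P_TILES = {"Quartile": 4, "Quintile": 5, "Decile": 10, "Vingtile": 20, "Percentile": 100}

def describe_tiles(n_bins):
    """Return (number of cut points, nominal percent of cases per bin)."""
    if n_bins < 2:
        raise ValueError("at least 2 bins are needed to define a cut point")
    return n_bins - 1, 100 / n_bins

for name, n in P_TILES.items():
    cuts, pct = describe_tiles(n)
    print(f"{name}: {n} bins, {cuts} cut points, {pct:g}% of cases per bin")

# Custom N = 3 gives 2 cut points and roughly 33.3% of cases per bin.
print(describe_tiles(3))
```

The percentages are nominal targets. As noted above, the bins actually produced depend on how many discrete values the data contains.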

Tiling method. Specifies the method used to assign records to bins.

  • Record count. Seeks to assign an equal number of records to each bin.
  • Sum of values. Seeks to assign records to bins such that the sum of the values in each bin is equal. When targeting sales efforts, for example, this method can be used to assign prospects to decile groups based on value per record, with the highest value prospects in the top bin. For example, a pharmaceutical company might rank physicians into decile groups based on the number of prescriptions they write. While each decile would contain approximately the same number of scripts, the number of individuals contributing those scripts would not be the same, with the individuals who write the most scripts concentrated in decile 10. Note that this approach assumes that all values are greater than zero, and may yield unexpected results if this is not the case.
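The difference between the two methods can be made concrete with a small sketch of tiling by sum of values. It assumes a simple rule (sort ascending, accumulate the values, and close a bin each time the running total reaches the next equal share of the grand total). The function name tile_by_sum and the sample data are hypothetical, ties are not handled, and the product's actual algorithm may differ.

```python
import math

def tile_by_sum(values, n_tiles):
    """Assign 1-based tiles so that each tile holds roughly the same sum of values.

    Illustrative assumption: a record's tile is determined by the running total
    (in ascending value order) at that record, relative to the grand total.
    """
    if any(v <= 0 for v in values):
        raise ValueError("sum-of-values tiling assumes all values are greater than zero")
    total = sum(values)
    order = sorted(range(len(values)), key=lambda i: values[i])  # ascending by value
    tiles = [0] * len(values)
    running = 0
    for i in order:
        running += values[i]
        # Clamp to n_tiles to guard against floating-point overshoot.
        tiles[i] = min(n_tiles, math.ceil(running * n_tiles / total))
    return tiles

# Hypothetical prescription counts for six physicians, split into two tiles.
scripts = [1, 1, 2, 2, 4, 10]
tiles = tile_by_sum(scripts, 2)
print(tiles)  # [1, 1, 1, 1, 1, 2]
```

In this made-up data, each tile accounts for 10 scripts, but the lower tile contains five physicians and the upper tile only one, which mirrors the concentration effect described for the decile example.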

Ties. A tie condition results when values on either side of a cut point are identical. For example, if you are assigning deciles and more than 10% of records have the same value for the bin field, then they cannot all fit into the same bin without forcing the threshold one way or the other. Ties can be moved up to the next bin or kept in the current one, but they must be resolved so that all records with identical values fall into the same bin, even if this causes some bins to have more records than expected. The thresholds of subsequent bins may also be adjusted as a result, so the same set of numbers can be assigned differently depending on the method used to resolve ties.

  • Add to next. Select to move the tie values up to the next bin.
  • Keep in current. Keeps tie values in the current (lower) bin. This method may result in fewer total bins being created.
  • Assign randomly. Select to allocate the tie values randomly to a bin. This attempts to keep the number of records in each bin at an equal amount.

Example: Tiling by Record Count

The following table illustrates how simplified field values are ranked as quartiles when tiling by record count. Note that the results vary depending on the selected ties option.

Table 1. Tiling by record count example
Values    Add to Next    Keep in Current
10        1              1
13        2              1
15        3              2
15        3              2
20        4              3

The number of items per bin is calculated as:

total number of values / number of tiles

In the simplified example above, the desired number of items per bin is 1.25 (5 values / 4 quartiles). The value 13 (the second value in rank order) straddles the desired count threshold of 1.25 and is therefore treated differently depending on the selected ties option. In Add to Next mode, it is added to bin 2. In Keep in Current mode, it is left in bin 1, which pushes the range of values for bin 4 outside that of the existing data values. As a result, only three bins are created, and the thresholds for each bin are adjusted accordingly, as shown in the following table.

Table 2. Binning example result
Bin    Lower    Upper
1      >=10     <15
2      >=15     <20
3      >=20     <=20
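The worked example can be reproduced with a short Python sketch. It assumes a simple greedy rule: records are processed in ascending order, identical values are kept together, and a bin is closed once it holds at least the desired number of items. The names tile_by_count and lower_bounds are hypothetical, the Assign randomly option is not covered, and this is not necessarily how the product computes tiles. It is only a rule that happens to match the tables in this example.

```python
from itertools import groupby

def tile_by_count(values, n_tiles, ties="add_to_next"):
    """Assign 1-based tiles by record count; identical values always share a tile."""
    target = len(values) / n_tiles  # desired number of items per bin
    tile_of = {}
    current_bin, in_bin = 1, 0
    for value, group in groupby(sorted(values)):
        size = len(list(group))
        # Add to next: a group that would overfill a non-empty bin moves up.
        if ties == "add_to_next" and in_bin > 0 and in_bin + size > target:
            current_bin, in_bin = current_bin + 1, 0
        # Assumption: never use more bins than requested.
        tile_of[value] = min(current_bin, n_tiles)
        in_bin += size
        # Close the bin once it holds at least the desired number of items.
        if in_bin >= target:
            current_bin, in_bin = current_bin + 1, 0
    return [tile_of[v] for v in values]

def lower_bounds(values, tiles):
    """Smallest value assigned to each tile."""
    bounds = {}
    for v, t in zip(values, tiles):
        bounds[t] = min(v, bounds.get(t, v))
    return bounds

values = [10, 13, 15, 15, 20]
add_next = tile_by_count(values, 4, ties="add_to_next")
keep_cur = tile_by_count(values, 4, ties="keep_in_current")
print(add_next)                        # [1, 2, 3, 3, 4]
print(keep_cur)                        # [1, 1, 2, 2, 3]
print(lower_bounds(values, keep_cur))  # {1: 10, 2: 15, 3: 20}
```

With Keep in Current, the sketch yields only three bins whose lower bounds match Table 2, while Add to Next uses all four.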

Note: The speed of binning by tiles may benefit from enabling parallel processing.