Multidimensional Scaling Dissimilarity Measures for Binary Data
The following dissimilarity measures are available for binary data:
- Euclidean distance. Computed from a fourfold table as SQRT(b+c), where b and c represent the diagonal cells corresponding to cases present on one item but absent on the other.
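- Squared Euclidean distance. Computed from a fourfold table as b+c, the number of discordant cases (cases present on one item but absent on the other). Its minimum value is 0, and it has no fixed upper limit: the maximum equals the total number of observations n.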
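- Size difference. An index of asymmetry. Computed from a fourfold table as (b-c)**2/(n**2), where b and c represent the diagonal cells corresponding to cases present on one item but absent on the other and n is the total number of observations. It ranges from 0 to 1.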
- Pattern difference. Dissimilarity measure for binary data that ranges from 0 to 1. Computed from a fourfold table as bc/(n**2), where b and c represent the diagonal cells corresponding to cases present on one item but absent on the other and n is the total number of observations.
- Variance. Computed from a fourfold table as (b+c)/(4n), where b and c represent the diagonal cells corresponding to cases present on one item but absent on the other and n is the total number of observations. It ranges from 0 to 1.
- Lance and Williams. Computed from a fourfold table as (b+c)/(2a+b+c), where a represents the cell corresponding to cases present on both items, and b and c represent the diagonal cells corresponding to cases present on one item but absent on the other. This measure has a range of 0 to 1. (Also known as the Bray-Curtis nonmetric coefficient.)
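The formulas in the list above can be sketched in Python as follows. This is only an illustration, not the procedure's own implementation. It assumes the usual fourfold layout: a = present on both items, b = present on the first item only, c = present on the second item only, d = absent on both, with n = a+b+c+d. The function names are hypothetical, the size difference formula (b-c)**2/(n**2) is the commonly documented form and is assumed here, and returning NaN for an undefined Lance and Williams ratio is a choice made for this sketch.

```python
import math

def fourfold(x, y):
    """Count the cells of the fourfold table for two equal-length 0/1 sequences."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # present on both
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # present on x only
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # present on y only
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # absent on both
    return a, b, c, d

def binary_dissimilarities(a, b, c, d):
    """Return the six binary dissimilarity measures for one fourfold table."""
    n = a + b + c + d
    if n == 0:
        raise ValueError("the fourfold table is empty")
    lw_denominator = 2 * a + b + c
    return {
        "euclidean": math.sqrt(b + c),
        "squared_euclidean": b + c,
        # Assumed formula for size difference (see the note above).
        "size_difference": (b - c) ** 2 / n ** 2,
        "pattern_difference": b * c / n ** 2,
        "variance": (b + c) / (4 * n),
        # Undefined when a = b = c = 0; NaN is this sketch's own convention.
        "lance_williams": (b + c) / lw_denominator if lw_denominator else math.nan,
    }

x = [1, 1, 0, 1, 0, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0, 0, 0]
table = fourfold(x, y)            # (a, b, c, d) = (2, 2, 1, 3)
measures = binary_dissimilarities(*table)
```

For these two items, b+c = 3 and n = 8, so the squared Euclidean distance is 3, the variance is 3/32, and the Lance and Williams measure is 3/7.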
You may optionally change the Present and Absent fields to specify the values that indicate that a characteristic is present or absent. The procedure will ignore all other values.
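The effect of the Present and Absent values can be sketched in the same way. The function below is a hypothetical helper, not part of the procedure. It assumes that "ignore" means a case is left out of the fourfold table for a pair of items whenever either of its two values is neither the Present nor the Absent code; the procedure's exact handling may differ.

```python
def fourfold_with_codes(x, y, present=1, absent=0):
    """Build the fourfold table (a, b, c, d) using custom Present/Absent codes.

    Assumption for this sketch: a case whose value on either item is neither
    `present` nor `absent` is skipped for this pair of items.
    """
    a = b = c = d = 0
    valid = (present, absent)
    for xi, yi in zip(x, y):
        if xi not in valid or yi not in valid:
            continue  # value is neither Present nor Absent: ignore the case
        if xi == present and yi == present:
            a += 1  # present on both items
        elif xi == present:
            b += 1  # present on the first item only
        elif yi == present:
            c += 1  # present on the second item only
        else:
            d += 1  # absent on both items
    return a, b, c, d

item1 = ["Y", "N", "Y", "?", "N"]
item2 = ["Y", "Y", "N", "Y", 9]
coded_table = fourfold_with_codes(item1, item2, present="Y", absent="N")
```

Here the last two cases carry values other than "Y" or "N", so only the first three cases are counted.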