Multidimensional Scaling Dissimilarity Measures for Binary Data
The following dissimilarity measures are available for binary data:
Euclidean distance. Computed from a fourfold table as SQRT(b+c), where b and c represent the off-diagonal cells corresponding to cases present on one item but absent on the other.
Squared Euclidean distance. Computed as the number of discordant cases (b+c), that is, cases present on one item but absent on the other. Its minimum value is 0, and it has no upper limit.
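Size difference. An index of asymmetry. Computed from a fourfold table as (b-c)**2/(n**2), where b and c represent the off-diagonal cells corresponding to cases present on one item but absent on the other and n is the total number of observations. It ranges from 0 to 1.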
Pattern difference. Dissimilarity measure for binary data. Computed from a fourfold table as bc/(n**2), where b and c represent the off-diagonal cells corresponding to cases present on one item but absent on the other and n is the total number of observations. Its values lie between 0 and 1; because b+c cannot exceed n, the largest attainable value is 0.25.
Variance. Computed from a fourfold table as (b+c)/(4n), where b and c represent the off-diagonal cells corresponding to cases present on one item but absent on the other and n is the total number of observations. Its values lie between 0 and 1; because b+c cannot exceed n, the largest attainable value is 0.25.
Lance and Williams. Computed from a fourfold table as (b+c)/(2a+b+c), where a represents the cell corresponding to cases present on both items, and b and c represent the off-diagonal cells corresponding to cases present on one item but absent on the other. This measure has a range of 0 to 1. (Also known as the Bray-Curtis nonmetric coefficient.)
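The measures above can be sketched in a few lines of Python. This is an illustration only, not the procedure's own implementation. The function name binary_dissimilarities and the cell d (cases absent on both items) are assumptions introduced here. The size difference formula, (b-c)**2/(n**2), is the commonly documented form and should be treated as an assumption, as should the choice to return NaN for Lance and Williams when its denominator is zero.

```python
import math


def binary_dissimilarities(a, b, c, d):
    """Compute binary dissimilarity measures from one fourfold table.

    a: cases present on both items
    b, c: cases present on one item but absent on the other
    d: cases absent on both items (hypothetical name, used only to get n)
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the fourfold table contains no cases")

    discordant = b + c          # number of discordant cases
    lw_denominator = 2 * a + b + c

    return {
        "euclidean": math.sqrt(discordant),
        "squared_euclidean": discordant,
        # Assumption: commonly documented form of the size difference.
        "size_difference": (b - c) ** 2 / n ** 2,
        "pattern_difference": b * c / n ** 2,
        "variance": discordant / (4 * n),
        # Assumption: undefined (NaN) when no case is present on either item.
        "lance_williams": (
            discordant / lw_denominator if lw_denominator else float("nan")
        ),
    }


if __name__ == "__main__":
    # Example table: a=3, b=2, c=1, d=4, so n=10.
    for name, value in binary_dissimilarities(3, 2, 1, 4).items():
        print(f"{name}: {value:.4f}")
```

Only Lance and Williams uses the cell a directly; every other measure depends on b, c, and the total n.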
You can optionally change the Present and Absent fields to specify the values that indicate that a characteristic is present or absent. The procedure will ignore all other values.
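As a rough sketch of how Present and Absent values determine the fourfold table, the Python function below counts the four cells for two items. The name fourfold_table and its parameters are hypothetical. It also assumes that a case is skipped for the pair whenever either value is neither the Present nor the Absent code; the procedure's exact handling of such values is not described here.

```python
def fourfold_table(x, y, present=1, absent=0):
    """Count the cells (a, b, c, d) of a fourfold table for items x and y.

    a: present on both, b: present on x only,
    c: present on y only, d: absent on both.
    """
    a = b = c = d = 0
    valid = (present, absent)
    for xi, yi in zip(x, y):
        # Assumption: a case with any other value on either item is skipped.
        if xi not in valid or yi not in valid:
            continue
        if xi == present and yi == present:
            a += 1
        elif xi == present:
            b += 1
        elif yi == present:
            c += 1
        else:
            d += 1
    return a, b, c, d


if __name__ == "__main__":
    item1 = [1, 1, 0, 0, 1, 9]   # 9 is neither Present nor Absent
    item2 = [1, 0, 1, 0, 1, 1]
    print(fourfold_table(item1, item2))
```

Passing different codes, for example present="Y" and absent="N", corresponds to changing the Present and Absent fields.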