Distance Correlation

This feature requires the Statistics Base option.

The Distance Correlation procedure measures the dependence between two random variables or between sets of multivariate data. Unlike traditional correlation methods such as Pearson correlation, it can detect nonlinear as well as linear relationships.

Example
A retail company wants to understand whether customer demographics (age, income, education, marital status) influence their purchasing behavior. Traditional correlation methods may fail if the relationships are nonlinear or involve interactions between multiple factors.
The Distance Correlation procedure can be applied in this scenario, with the following benefits:
  • Captures nonlinear dependencies. Unlike Pearson correlation, distance correlation detects relationships even if they are not linear.
  • Handles multidimensional relationships. It can assess how a combination of demographic factors relates to purchasing patterns.
  • Provides a comprehensive measure. A significant distance correlation suggests that purchasing behavior depends on customer demographics, which can guide targeted marketing strategies.
Associated Statistics
  • Distance Correlation (dCor(X,Y)): Measures the dependence between customer demographics (X) and purchasing behavior (Y).
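  • Permutation-Based p-value: Assesses the statistical significance of the dependence. A small p-value (for example, p < 0.05) indicates that the observed dependence is unlikely to be due to chance. It does not by itself indicate how strong the relationship is.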
  • Distance Covariance (dCov(X,Y)): Quantifies the dependence between the variables on an unnormalized scale.
  • Distance Variance (dVar(X), dVar(Y)): Measures the variability within each individual variable. It is used to normalize the distance covariance into the distance correlation.
Data considerations
Only continuous variables can be used for analysis. At least 2 continuous variables are required to calculate distance correlation, and a maximum of 30 variables is allowed.

Obtaining Distance Correlation

  1. From the menu, click: Analyze > Correlate > Distance Correlation.
  2. Select two or more numeric variables for which to compute distance correlation. You can select up to 30 continuous variables from the source dataset for analysis.
  3. Optionally, select one identifier variable. If you specify an ID variable, the results display the distance matrix table with IDs.
  4. Click OK to run the procedure with the specified settings and generate the output.

This procedure pastes DISTANCE CORRELATION command syntax.