The selected variables exhibit diverse distributions:
Figure 1. Descriptive statistics
These metrics indicate substantial variability in income and employment durations among
loan applicants, which might influence financial relationships in nonlinear ways.
Distance Correlation Analysis:
Distance correlation identifies both linear and nonlinear associations between variable pairs.
Figure 2. Distance correlation coefficients
Key findings:
Strongest Relationship: employ vs. income shows the highest distance
correlation (dCor = 0.444), suggesting a moderate, possibly nonlinear relationship: more years
with an employer are likely associated with higher income.
Moderate Relationships: age vs. employ and age vs. income
(both around dCor = 0.28) reflect mild dependencies, possibly due to accumulated experience or
career progression with age.
Negligible or No Relationships: Debt-to-Income Ratio shows minimal association with any of the
other variables, with dCor values close to zero and wide confidence intervals encompassing 0.
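The text does not say how the confidence intervals were obtained. One common approach is a percentile bootstrap, sketched below with NumPy under that assumption; the function names, the number of resamples, and the seed are illustrative choices, not taken from the analysis. Note that the plain sample dCor used here is never negative, so an interval from this sketch can approach 0 but not cross it. An interval that encompasses 0 suggests a bias-corrected estimator, which is not shown.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D arrays (plain V-statistic form)."""
    def centered(v):
        v = np.asarray(v, dtype=float)
        d = np.abs(v[:, None] - v[None, :])  # pairwise distances
        return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()
    a, b = centered(x), centered(y)
    scale = np.sqrt((a * a).mean() * (b * b).mean())
    return float(np.sqrt(max((a * b).mean(), 0.0) / scale)) if scale > 0 else 0.0

def bootstrap_dcor_interval(x, y, n_resamples=200, level=0.95, seed=0):
    """Percentile bootstrap interval for dCor (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_resamples)
    for k in range(n_resamples):
        # Resample observation pairs with replacement, keeping x and y aligned.
        idx = rng.integers(0, len(x), size=len(x))
        estimates[k] = distance_correlation(x[idx], y[idx])
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(estimates, [tail, 1.0 - tail])
    return float(lower), float(upper)

# Illustrative input: an exactly linear pair, for which dCor equals 1.
x = np.linspace(0.0, 10.0, 60)
low, high = bootstrap_dcor_interval(x, 3.0 * x + 2.0)
```

For an exactly linear pair the interval collapses near 1. For weakly related pairs such as those involving Debt-to-Income Ratio, the interval would be expected to sit close to 0.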
Distance Correlation Estimates
Figure 3. Distance correlation estimates
Distance Covariance (dCov) reflects the absolute magnitude of joint variability between
variables in the distance metric space. Larger values indicate stronger joint dependence before
normalization.
Distance Correlation (dCor) is a normalized version of dCov, ranging from 0 (no
dependence) to 1 (perfect dependence). It enables comparison across variable pairs of different
scales.
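The two definitions above can be sketched in a few lines of Python. This is a minimal illustration using the standard sample estimator with NumPy, not necessarily the estimator behind the figures. The function names are hypothetical, and the text does not state whether the reported dCov is the statistic or its square.

```python
import numpy as np

def centered_distances(values):
    """Pairwise |v_i - v_j| matrix with row, column, and grand means removed."""
    v = np.asarray(values, dtype=float)
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_covariance(x, y):
    """Sample dCov: square root of the mean product of centered distances."""
    a, b = centered_distances(x), centered_distances(y)
    return float(np.sqrt(max((a * b).mean(), 0.0)))

def distance_correlation(x, y):
    """Sample dCor: dCov normalized by the two distance variances."""
    a, b = centered_distances(x), centered_distances(y)
    dcov2 = max((a * b).mean(), 0.0)
    scale = np.sqrt((a * a).mean() * (b * b).mean())
    return float(np.sqrt(dcov2 / scale)) if scale > 0 else 0.0

# Illustrative data: a purely quadratic relationship on a symmetric grid.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
pearson = float(np.corrcoef(x, y)[0, 1])  # essentially zero by symmetry
dcor = distance_correlation(x, y)         # clearly above zero
```

In this sketch, rescaling x (for example, multiplying it by 1000) changes dCov but leaves dCor unchanged, which is why dCor is the quantity to compare across pairs measured in different units.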
High dCov and dCor
The pair employ vs. income not only has the highest correlation (dCor = 0.444)
but also a notable dCov (0.0026), showing substantial shared variability.
age vs. employ and age vs. income follow next, with lower dCov
but still meaningful dCor values (~0.28), revealing moderate relationships despite differing
scales.
Low dCov and dCor
Variable pairs involving debtinc have dCov values near zero, and dCor values
also close to zero, confirming that debt-to-income ratio behaves independently of the other
variables in this dataset.
While dCor shows how strongly variables are linked relative to their own scale, dCov can help
reveal if the raw joint variability is trivial even if normalized correlation appears nonzero.
In the dataset, low dCov values (for example, income vs. debtinc: 0.0000)
reinforce the case that these variables are statistically unlinked.
Distance Variance
Figure 4. Distance variance
These values quantify the variability in
pairwise distances within each variable. Notably:
age and employment exhibit the highest internal distance
variability.
income has relatively low distance variance despite a wide numerical range,
possibly due to normalization and a skewed distribution.
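The effect of skew on distance variance can be illustrated with a small sketch. It assumes distance variance is the square root of the mean squared centered distance and that variables are Min-Max scaled first; neither detail is stated for Figure 4. The function names and the synthetic samples are illustrative only.

```python
import numpy as np

def distance_variance(values):
    """Sample distance variance of a 1-D array (dCov of the variable with itself)."""
    v = np.asarray(values, dtype=float)
    d = np.abs(v[:, None] - v[None, :])
    a = d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()
    return float(np.sqrt((a * a).mean()))

def min_max(values):
    """Rescale values linearly to the [0, 1] range."""
    v = np.asarray(values, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

# Synthetic samples, not taken from the dataset.
u = np.linspace(0.0, 1.0, 201)
even = min_max(u)          # values spread evenly over the range
skewed = min_max(u ** 4)   # most values bunched near the minimum, long right tail

dvar_even = distance_variance(even)
dvar_skewed = distance_variance(skewed)
```

After scaling to the same [0, 1] range, the skewed sample yields the smaller distance variance because most of its pairwise distances are short. This is one way a variable such as income could combine a wide raw range with a low distance variance.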
Pairwise Distance Scatter Plot
The selected scatter plot (age vs. debtinc) provides
a visual perspective on their pairwise distances:
Figure 5. Scatter plot of pairwise distances
The plot displays the Min-Max normalized distances for Age (X-axis) and Debt-to-Income Ratio
(Y-axis).
There is no discernible pattern or clustering, which aligns with the very low distance
correlation (dCor = 0.0021).
This is consistent with Age and Debt-to-Income Ratio
being statistically independent in this dataset.
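The points in such a plot can be produced along the following lines. This sketch assumes that "Min-Max normalized distances" means the pairwise distances of each variable are rescaled to [0, 1], with each pair of observations contributing one point. The function name and the sample values are hypothetical.

```python
import numpy as np

def normalized_pairwise_distances(values):
    """Min-Max normalized |v_i - v_j| for every unordered pair i < j."""
    v = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(v), k=1)  # each unordered pair once
    d = np.abs(v[i] - v[j])
    span = d.max() - d.min()
    return (d - d.min()) / span if span > 0 else np.zeros_like(d)

# Made-up values for illustration only, not taken from the dataset.
age = [20, 30, 50]
debtinc = [9.0, 17.0, 5.0]

x_points = normalized_pairwise_distances(age)      # X-axis coordinates
y_points = normalized_pairwise_distances(debtinc)  # Y-axis coordinates
```

Each position in x_points and y_points refers to the same pair of observations, so plotting one array against the other gives the scatter of pairwise distances. A structureless cloud, as described above, indicates that distances in one variable say little about distances in the other.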