The notion of data dimensionality underpins this entire concept. Dimensionality refers to the number of attributes (or features) that describe each record in a dataset. There is a tradeoff at work here: the higher the dimensionality, the more storage the dataset demands. Higher dimensionality also tends to make data sparse, since the records spread out across a vastly larger space, and that sparsity complicates necessary tasks such as outlier analysis.
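A minimal sketch of that sparsity effect, using NumPy (an assumption, not part of the original text): as the number of dimensions grows, pairwise distances between random points concentrate around a common value, so a would-be outlier stands out less and less.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

for dim in (2, 10, 100, 1000):
    # 500 random points uniformly distributed in a unit hypercube
    points = rng.random((500, dim))
    # Euclidean distances from the first point to all the others
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    # As dim grows, the gap between the nearest and farthest points
    # shrinks relative to the mean distance: distances "concentrate",
    # which is what makes outlier analysis harder in high dimensions.
    spread = (dists.max() - dists.min()) / dists.mean()
    print(f"dim={dim:5d}  relative spread of distances: {spread:.3f}")
```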
Dimensionality reduction counters these problems by limiting the "noise" in the data and enabling better visualization. A prime example is the wavelet transform, which aids image compression by preserving the relative distances between objects at various resolution levels.
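A rough sketch of wavelet-based compression using the PyWavelets library (`pywt`); the `haar` wavelet, the decomposition level, and the 90% discard threshold are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np
import pywt

rng = np.random.default_rng(seed=0)
image = rng.random((64, 64))  # stand-in for a grayscale image

# Multi-level 2-D wavelet decomposition of the image
coeffs = pywt.wavedec2(image, wavelet="haar", level=3)
flat, slices = pywt.coeffs_to_array(coeffs)

# Keep only the largest 10% of coefficients; zero out the rest.
threshold = np.quantile(np.abs(flat), 0.90)
flat[np.abs(flat) < threshold] = 0.0

# Reconstruct: most of the image's structure survives even though
# 90% of the wavelet coefficients were discarded.
kept = pywt.array_to_coeffs(flat, slices, output_format="wavedec2")
compressed = pywt.waverec2(kept, wavelet="haar")
print("max reconstruction error:", np.max(np.abs(image - compressed)))
```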
Feature extraction is another possible transformation for data, one that converts raw data into numeric features suitable for machine learning. It differs from principal component analysis (PCA), another means of reducing the dimensionality of large datasets, in which a sizable set of variables is transformed into a smaller set that still retains most of the information in the original.
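A brief sketch of that PCA reduction using scikit-learn; the Iris dataset and the choice of two components are illustrative assumptions made here for the example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data            # 150 samples, 4 features

pca = PCA(n_components=2)       # project onto 2 principal components
X_reduced = pca.fit_transform(X)

print("reduced shape:", X_reduced.shape)  # (150, 2)
# Fraction of the original variance the 2 components retain
print("variance retained:", pca.explained_variance_ratio_.sum())
```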