Geometric approaches


Data is often encoded as numeric vectors and is therefore naturally embedded in a Euclidean space whose dimension equals the number of features. Any distance function on this space endows it with a geometric structure, in which ordinary geometric intuition becomes a particularly powerful tool for reducing dimensionality. The classical starting point is PCA, which fits a linear (flat) subspace so that the total sum of squared distances from the data to the subspace (the errors) is minimized. The idea can be generalized by replacing the flat subspace with a possibly nonlinear, curved object (a so-called manifold) that is fitted to the data while distorting distances as little as possible. Four major methods of this kind are reviewed, namely MDS, ISOMAP, t-SNE, and random projections.
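
The PCA criterion mentioned above can be illustrated with a minimal sketch. It assumes NumPy and a synthetic data set, and the helper names `pca_fit` and `squared_error` are hypothetical, introduced only for this illustration. The sketch fits a flat subspace through the data mean using the singular value decomposition and measures the total squared distance of the points from that subspace.

```python
import numpy as np


def pca_fit(X, k):
    """Fit a k-dimensional flat subspace to the rows of X (PCA via SVD).

    Returns the data mean, the k orthonormal directions spanning the
    subspace, and all singular values of the centered data.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center the data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:k], s


def squared_error(X, mean, components):
    """Total squared distance from the rows of X to the fitted subspace."""
    Xc = X - mean
    projected = Xc @ components.T @ components     # orthogonal projection
    return float(np.sum((Xc - projected) ** 2))


# Synthetic example (an assumption): 200 points with 5 features,
# most of whose variance lies along the first two coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

mean, components, s = pca_fit(X, k=2)
err = squared_error(X, mean, components)

# Low-dimensional coordinates of each point within the fitted subspace.
Z = (X - mean) @ components.T
```

In this sketch the error of the fitted subspace equals the sum of the squared singular values that were discarded, `np.sum(s[2:] ** 2)`, and no other two-dimensional subspace through the mean gives a smaller value. The manifold methods listed above replace this flat subspace with a curved one.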

Publication Title

Dimensionality Reduction in Data Science