Faculty Publications

An Information-theoretic approach to dimensionality reduction in data science

Sambriddhi Mainali, University of Memphis
Max Garzon, University of Memphis
Deepak Venugopal, University of Memphis
Kalidas Jana, University of Memphis
Ching Chi Yang, University of Memphis
Nirman Kumar, University of Memphis

Abstract

Data reduction is crucial for turning large datasets into information, the major purpose of data science. The classic and richer area of dimensionality reduction (DR) has traditionally been based on feature extraction by combining primary features linearly, aiming to preserve covariances or correlations between the features. Nonlinear alternatives have been developed, including information-theoretic approaches that use mutual information as well as conditional entropy based on target features. Here, we extend this approach to a feature selection (reduction) strategy based on the conditional Shannon entropy of two random variables. Novel results include (a) a dimensionality reduction method, in two variants, based on conditional entropy between the predictors themselves, disregarding the influence of the target feature; (b) an error-prevention method for DR with genomic data, inspired by error detection and correction in information theory, that can be used for abiotic data as well; and (c) a comparative assessment of the performance of several machine learning models on input features selected by these methods. We assess the quality of the techniques by their performance on three application problems of varying difficulty (Malware Classification, BioTaxonomy, and Noisy Classification), with competitive outcomes. The analysis of the results yields some useful heuristics and suggests some problems of interest for further research.
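To make the central quantity concrete, the following is a minimal sketch of conditional Shannon entropy between two discrete predictors, computed as H(X | Y) = H(X, Y) - H(Y), together with a simple greedy filter that drops a predictor when an already kept predictor nearly determines it. The function names, the greedy rule, and the 0.1-bit threshold are illustrative assumptions; the abstract does not specify the two variants actually used in the paper, so this should not be read as the authors' method.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of a sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    """Empirical H(X | Y) = H(X, Y) - H(Y) for two equal-length discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    return entropy(list(zip(xs, ys))) - entropy(ys)


def select_features(columns, threshold=0.1):
    """Hypothetical greedy filter (an assumption, not the paper's algorithm).

    Keeps a predictor only if its conditional entropy given every
    already kept predictor exceeds `threshold` bits, i.e. no kept
    predictor (nearly) determines it. The target feature is not used.
    """
    kept = []
    for name, values in columns.items():
        if all(conditional_entropy(values, columns[k]) > threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" is a deterministic function of "a"; "c" is independent of "a".
a = [0, 0, 1, 1, 0, 1, 0, 1]
columns = {
    "a": a,
    "b": [1 - v for v in a],
    "c": [0, 1, 0, 1, 0, 0, 1, 1],
}
print(select_features(columns))  # ['a', 'c']
```

In this toy case H(b | a) is 0 bits, so "b" carries no information beyond "a" and is dropped, while H(c | a) is 1 bit, so "c" is kept. Real-valued predictors would first need to be discretized, which this sketch leaves out.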

Publication Title

International Journal of Data Science and Analytics

Recommended Citation

Mainali, S., Garzon, M., Venugopal, D., Jana, K., Yang, C., & Kumar, N. (2021). An Information-theoretic approach to dimensionality reduction in data science. International Journal of Data Science and Analytics, 12(3), 185-203. https://doi.org/10.1007/s41060-021-00272-2
