Improved outlier detection using sparse coding-based methods


Outlier detection is an active area of research in data mining, and a large number of algorithms exist. Our goal is to provide a guideline for choosing the most appropriate outlier detection algorithm for a given dataset without exploiting any domain- or application-specific information. Extensive experiments with a number of state-of-the-art algorithms on thousands of benchmark datasets revealed a clear trend. For datasets with low dimensionality and low difficulty level, traditional methods outperform sparse coding-based outlier detection (SCOD) algorithms, but the trend reverses as the dimensionality or difficulty level increases. The point where the trends for SCOD and traditional algorithms intersect defines a threshold: 250 for dimensionality and 21 for difficulty level.
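
The guideline stated in the abstract can be read as a simple decision rule. The following is a minimal sketch of that reading, not the authors' implementation. The function name, the return labels, and the choice to treat values exactly at a threshold as "traditional" are assumptions made for illustration. The abstract does not define here how the difficulty level is computed, so it is taken as a precomputed number.

```python
# Thresholds as reported in the abstract.
DIMENSIONALITY_THRESHOLD = 250
DIFFICULTY_THRESHOLD = 21


def suggest_detector_family(dimensionality: int, difficulty: float) -> str:
    """Suggest an algorithm family for a dataset (hypothetical helper).

    Assumption: SCOD is preferred once either quantity exceeds its
    threshold, since the abstract says the trend reverses as the
    dimensionality OR the difficulty level increases. Behaviour exactly
    at the threshold is not specified in the abstract; this sketch
    falls back to traditional methods there.
    """
    if dimensionality > DIMENSIONALITY_THRESHOLD:
        return "SCOD"
    if difficulty > DIFFICULTY_THRESHOLD:
        return "SCOD"
    return "traditional"
```

For example, under these assumptions a dataset with 40 features and a difficulty level of 10 would be routed to traditional methods, while a dataset with 600 features would be routed to SCOD regardless of its difficulty level.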

Publication Title

Pattern Recognition Letters