Machine Learning and Feature Selection Based Ransomware Detection Using Hexacodes


Ransomware attacks have increased over the past few years, resulting in huge financial losses to businesses across the globe. To counter ransomware attacks, executables (or binary files) are typically converted back to assembly-level language or source code for further examination. In this work, we propose a novel ransomware detection method based solely on hexacodes, without opcodes, which is a clear departure from earlier studies. We first extract the hexadecimal codes from the ransomware and then apply machine learning (ML) techniques together with a few feature selection methods. We use a dump and a parser to decode the binaries and extract the hexacodes. Apart from ransomware, files and benign executables are also used to train the classifiers. Of the several ML techniques and feature selection methods employed, random forest combined with information gain-based feature selection obtained the highest accuracy, 88.39%, in a tenfold cross-validation setup. We also performed a statistical significance test to corroborate our results. One significant observation is that random forest with only 30 features selected by information gain improved accuracy by 1% over the best model using all features. This architecture can be utilized as an early detection system.
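
As an illustration of the pipeline outlined above, the following is a minimal sketch, not the authors' implementation, of hexacode extraction followed by information gain-based feature ranking. It assumes that each feature is the presence or absence of a single-byte hexacode in a file; the abstract does not specify the exact feature representation, and the names `hex_tokens`, `information_gain`, and `select_top_k` are hypothetical.

```python
import math
from collections import Counter


def hex_tokens(data):
    """Return the set of two-character hexacodes (one per byte) found in data.

    Assumption: single-byte presence features; the paper may use another encoding.
    """
    return {f"{b:02x}" for b in data}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(present, labels):
    """Information gain of a binary feature with respect to the class labels."""
    n = len(labels)
    with_f = [y for p, y in zip(present, labels) if p]
    without_f = [y for p, y in zip(present, labels) if not p]
    conditional = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional


def select_top_k(samples, labels, k):
    """Rank hexacodes by information gain and return the k best (ties by name)."""
    token_sets = [hex_tokens(s) for s in samples]
    vocab = sorted(set().union(*token_sets))
    scored = [
        (information_gain([tok in ts for ts in token_sets], labels), tok)
        for tok in vocab
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tok for _, tok in scored[:k]]


if __name__ == "__main__":
    # Toy byte strings standing in for file contents; 1 = ransomware, 0 = benign.
    samples = [b"\x4d\x5a\xff", b"\x4d\x5a\xee\xff", b"\x4d\x5a\x00", b"\x4d\x5a\x00\x01"]
    labels = [1, 1, 0, 0]
    print(select_top_k(samples, labels, 2))
```

In a fuller pipeline, the selected hexacodes (30 in the configuration reported above) would form the input columns for a classifier such as random forest, evaluated with tenfold cross-validation.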

Publication Title

Advances in Intelligent Systems and Computing