Fernandes, P, Ó Ciardhuáin, S and Antunes, M (2025)

Distance-based feature selection using Benford’s law for malware detection

Computers & Security 158, pp.104625.

ISSN/ISBN: 0167-4048. DOI: 10.1016/j.cose.2025.104625



Abstract: Detecting malware in computer networks and data streams from Android devices remains a critical challenge for cybersecurity researchers. While machine learning and deep learning techniques have shown promising results, these approaches often require large volumes of labelled data, offer limited interpretability, and struggle to adapt to sophisticated threats such as zero-day attacks. Moreover, their high computational requirements restrict their applicability in resource-constrained environments. This research proposes an innovative approach that advances the state of the art by offering practical solutions for dynamic and data-limited security scenarios. By integrating natural statistical laws, particularly Benford’s law, with dissimilarity functions, a lightweight, fast, and scalable model is developed that eliminates the need for extensive training and large labelled datasets while improving resilience to data imbalance and scalability for large-scale cybersecurity applications. Although Benford’s law has demonstrated potential in anomaly detection, its effectiveness is limited by the difficulty of selecting relevant features. To overcome this, the study combines Benford’s law with several distance functions, including Median Absolute Deviation, Kullback–Leibler divergence, Euclidean distance, and Pearson correlation, enabling statistically grounded feature selection. Additional metrics, such as the Kolmogorov test, Jensen–Shannon divergence, and Z statistics, were used for model validation. This approach quantifies discrepancies between expected and observed distributions, addressing classic feature selection challenges like redundancy and imbalance. Validated on both balanced and unbalanced datasets, the model achieved strong results: 88.30% accuracy and 85.08% F1-score in the balanced set, 92.75% accuracy and 95.29% F1-score in the unbalanced set. 
The integration of Benford’s law with distance functions significantly reduced false positives and negatives. Compared to traditional Machine Learning methods, which typically require extensive training and large datasets to achieve F1 scores between 92% and 99%, the proposed approach delivers competitive performance while enhancing computational efficiency, robustness, and interpretability. This balance makes it a practical and scalable alternative for real-time or resource-constrained cybersecurity environments.
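Illustration (not taken from the paper): the general idea of scoring a numeric feature by how far its first-digit distribution departs from Benford's law can be sketched as follows. This is a minimal sketch using three of the distance functions named in the abstract. The function names (`benford_expected`, `first_digit`, `score_feature`, and so on) are hypothetical, and how the authors combine or threshold these scores to select features is not stated in the abstract, so the code only reports the raw scores.

```python
import math
from collections import Counter


def benford_expected():
    """Benford first-digit probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """Leading non-zero digit of |x|, or None for zero and non-finite values."""
    x = abs(x)
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the leading digit first, e.g. '9.87...e+02'.
    return int(f"{x:.15e}"[0])


def observed_distribution(values):
    """Relative frequency of each first digit 1..9 in `values`."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("no usable (non-zero, finite) values")
    counts = Counter(digits)
    return [counts.get(d, 0) / len(digits) for d in range(1, 10)]


def euclidean(obs, exp):
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, exp)))


def kl_divergence(obs, exp):
    """KL(obs || exp); digits with zero observed frequency contribute 0."""
    return sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)


def pearson(obs, exp):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mo, me = sum(obs) / len(obs), sum(exp) / len(exp)
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, exp))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    se = math.sqrt(sum((e - me) ** 2 for e in exp))
    return cov / (so * se) if so > 0 and se > 0 else 0.0


def score_feature(values):
    """Assumed helper: compare one feature column against Benford's law."""
    obs, exp = observed_distribution(values), benford_expected()
    return {
        "euclidean": euclidean(obs, exp),
        "kl_divergence": kl_divergence(obs, exp),
        "pearson": pearson(obs, exp),
    }
```

For instance, `score_feature([2**n for n in range(1000)])` should yield a small Euclidean distance and a Pearson correlation near 1, since powers of two are known to follow Benford's law closely, whereas a column of integers spread evenly over 100 to 999 gives a much larger distance.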


Bibtex:
@article{fernandes2025distance,
  title   = {Distance-based feature selection using Benford’s law for malware detection},
  journal = {Computers \& Security},
  volume  = {158},
  pages   = {104625},
  year    = {2025},
  issn    = {0167-4048},
  doi     = {10.1016/j.cose.2025.104625},
  url     = {https://www.sciencedirect.com/science/article/pii/S0167404825003141},
  author  = {Pedro Fernandes and Séamus {Ó Ciardhuáin} and Mário Antunes}
}


Reference Type: Journal Article

Subject Area(s): Computer Science, Statistics