Malware Detection in Executable Files Using Random Forest Machine Learning
Abstract
The expansion of digital infrastructure has been accompanied by a sharp rise in cyber threats, with malicious software (malware) posing a major risk to information security. Traditional signature-based and heuristic detection methods are limited against zero-day exploits and obfuscated, multi-variant malware because they depend on existing signature repositories and are prone to high false-positive rates. To address these limitations, this study proposes a static detection framework for Portable Executable (PE) files based on Random Forest, an ensemble machine learning technique. The dataset combines PE files collected from public malware repositories, including Malware Bazaar, with verified benign applications; the analysis was static, with no code executed, to keep the analysis environment safe. A total of 75 structural features spanning COFF headers, section characteristics, data directories, and configuration markers were extracted using the Python pefile library. The model was trained with an 80:20 train-test split and achieved an Out-of-Bag (OOB) score of 97.43%, indicating strong internal generalization. Evaluation on a test set of 332 unseen files produced a confusion matrix of 160 True Positives, 164 True Negatives, 5 False Positives, and 3 False Negatives, giving precision, recall, and F1-score of approximately 98%. Feature importance analysis identified MajorOperatingSystemVersion, MajorSubsystemVersion, and DllCharacteristics as key discriminators. Finally, the trained model was integrated into a web application built with Flask and MySQL that supports user file uploads and real-time inference reporting, offering a scalable, complementary defense layer for modern cybersecurity ecosystems.
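As a quick arithmetic check on the reported confusion matrix, the following minimal sketch recomputes the standard binary classification metrics from the four counts given in the abstract. It assumes malware is the positive class; the function name `binary_metrics` is illustrative and is not taken from the study.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, which holds for the counts used below.
    """
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)          # share of flagged files that are truly malware
    recall = tp / (tp + fn)             # share of malware files that were flagged
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total        # share of all files classified correctly
    return {
        "total": total,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts as reported in the abstract (malware assumed to be the positive class).
metrics = binary_metrics(tp=160, tn=164, fp=5, fn=3)
for name, value in metrics.items():
    if name == "total":
        print(f"{name}: {value}")
    else:
        print(f"{name}: {value:.4f}")
```

With these counts, the four cells sum to the 332 test files, and the positive-class precision, recall, and F1-score come out at roughly 97.0%, 98.2%, and 97.6% respectively, which is consistent with the approximate figure of 98% reported above.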
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.